High Availability Guide Release Version: 15.0.0 OpenStack contributors Jun 02, 2017

CONTENTS
Abstract
Contents
  Conventions
  Introduction to OpenStack high availability
  Configuring the basic environment
  Configuring the shared services
  Configuring the controller
  Configuring the networking services
  Configuring storage
  Configuring the compute node
  Appendix
Index

ABSTRACT
This guide describes how to install and configure OpenStack for high availability. It supplements the Installation Tutorials and Guides and assumes that you are familiar with the material in those guides.

Note: This guide documents the OpenStack Ocata, Newton, and Mitaka releases and may not apply to the end-of-life (EOL) releases Kilo and Liberty.

Warning: This guide is a work in progress and is changing rapidly while we continue to test and enhance the guidance. There are open TODO items throughout the guide, and they are also listed on the OpenStack manuals bug list. Please help where you are able.

CONTENTS
Conventions
The OpenStack documentation uses several typesetting conventions.

Notices
Notices take these forms:
Note: A comment with additional information that explains a part of the text.
Important: Something you must be aware of before proceeding.
Tip: An extra but helpful piece of practical advice.
Caution: Helpful information that prevents the user from making mistakes.
Warning: Critical information about the risk of data loss or security issues.

Command prompts
$ command
Any user, including the root user, can run commands that are prefixed with the $ prompt.
# command
The root user must run commands that are prefixed with the # prompt. You can also prefix these commands with the sudo command, if available, to run them.

Introduction to OpenStack high availability
High availability systems seek to minimize the following issues:
1. System downtime: Occurs when a user-facing service is unavailable beyond a specified maximum amount of time.
2. Data loss: Accidental deletion or destruction of data.

Most high availability systems guarantee protection against system downtime and data loss only in the event of a single failure. However, they are also expected to protect against cascading failures, where a single failure deteriorates into a series of consequential failures.

Many service providers guarantee a Service Level Agreement (SLA) that includes an uptime percentage for the computing service. This percentage is calculated from the available time and the system downtime, excluding planned outage time.

Redundancy and failover
High availability is implemented with redundant hardware running redundant instances of each service. If one piece of hardware running one instance of a service fails, the system can then fail over to another instance of the service that is running on hardware that did not fail.

A crucial aspect of high availability is the elimination of single points of failure (SPOFs). A SPOF is an individual piece of equipment or software that causes system downtime or data loss if it fails.
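The failover behavior described above can be sketched in a few lines of Python. This is a minimal, illustrative model only, not how OpenStack, Pacemaker, or any other cluster manager implements failover. The `ServiceInstance` class, its `healthy` flag, and the `select_instance` function are hypothetical names introduced here. The sketch assumes that some external monitor keeps each instance's health status up to date.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServiceInstance:
    """Hypothetical model of one running copy of a service on one host."""
    name: str
    host: str
    healthy: bool = True  # assumed to be kept up to date by an external monitor


def select_instance(instances: list[ServiceInstance],
                    active: ServiceInstance | None) -> ServiceInstance:
    """Return the instance that should serve requests.

    Keeps the current active instance while it is healthy; otherwise fails
    over to the first healthy instance running on a different host, that is,
    on hardware that did not fail.
    """
    if active is not None and active.healthy:
        return active
    for candidate in instances:
        if not candidate.healthy:
            continue
        if active is not None and candidate.host == active.host:
            continue  # skip instances that share the failed hardware
        return candidate
    # No redundant instance is left, so the service is down.
    raise RuntimeError("no healthy instance available on surviving hardware")


# Usage: two redundant API instances on separate (hypothetical) nodes.
pool = [ServiceInstance("api-1", "node-a"), ServiceInstance("api-2", "node-b")]
active = select_instance(pool, None)    # api-1 serves requests
pool[0].healthy = False                 # node-a fails
active = select_instance(pool, active)  # fails over to api-2 on node-b
```

The sketch raises an error when every candidate is unhealthy. Redundancy only helps while at least one instance survives; with a single instance, the service itself would be a SPOF.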
To eliminate SPOFs, check that mechanisms exist for redundancy of:
• Network components, such as switches and routers
• Applications and automatic service migration
• Storage components
• Facility services such as power, air conditioning, and fire protection

When a component fails and a backup system must take on its load, most high availability systems replace the failed component as quickly as possible to restore the necessary redundancy. This minimizes the time spent in a degraded protection state.

Most high availability systems fail in the event of multiple independent (non-consequential) failures. In this case, most implementations favor protecting data over maintaining availability.

High availability systems typically achieve an uptime percentage of 99.99% or more, which equates to at most about 53 minutes of cumulative downtime per year. To achieve this, high availability systems should keep recovery times after a failure to about one to two minutes, sometimes significantly less.

OpenStack currently meets such availability requirements for its own infrastructure services, meaning that an uptime of 99.99% is feasible for the OpenStack infrastructure proper. However, OpenStack does not guarantee 99.99% availability for individual guest instances.

This document discusses some common methods of implementing highly available systems, with an emphasis on the core OpenStack services and other open source services that are closely aligned with OpenStack.

You will need to address high availability concerns for any application software that you run on your OpenStack environment. The important thing is to make sure that your services are redundant and available. How you achieve that is up to you.

Stateless versus stateful services
The following are the definitions of stateless and stateful services:
Stateless service
A service that provides a response after your request and then requires no further attention.
To make a stateless service highly available, you need to provide redundant instances and load balance them. OpenStack services that are stateless include nova-api, nova-conductor, glance-api, keystone-api, neutron-api, and nova-scheduler.
Stateful service
A service where subsequent requests to the service depend on the results of the first request. Stateful services are more difficult to manage because a single action typically involves more than one request. Providing additional instances and load balancing does not solve the problem. For example, if the horizon user interface reset itself every time you went to a new page, it would not be very useful. OpenStack services that are stateful include the OpenStack database and message queue.

How you make stateful services highly available can depend on whether you choose an active/passive or an active/active configuration.

Active/passive versus active/active
Stateful services can be configured as active/passive or active/active, which are defined as follows:
active/passive configuration
Maintains a redundant instance that can be brought online when the active service fails. For example, OpenStack writes to the main database while maintaining a disaster recovery database that can be brought online if the main database fails.
A typical active/passive installation for a stateful service maintains a replacement resource that can be brought online when required. Requests are handled using a virtual IP address (VIP) that facilitates returning to service with minimal reconfiguration. A separate application (such as Pacemaker or Corosync) monitors these services, bringing the backup online as necessary.
active/active configuration
Each service also has a backup, but the main and redundant systems are managed concurrently. This way, if there is a failure, the user is unlikely to notice.
The backup system is already online and takes on increased load while the main system is fixed and brought back online.
Typically, an active/active installation for a stateless service maintains a redundant instance, and requests are load balanced using a virtual IP address and a load balancer such as HAProxy.
A typical active/active installation for a stateful service includes redundant services, with all instances having an identical state. In other words, updates to one instance of a database update all other instances. This way, a request to one instance is the same as a request to any other. A load balancer manages the traffic to these systems, ensuring that operational systems always handle the request.

Clusters and quorums
The quorum