Creating an OpenStack test image

These instructions should work to create a test OpenStack image for CentOS 6. I needed a minimal CentOS image for testing Puppet modules on.

An Ubuntu server running Devstack was used to build and test the images.

Packages needed on the Ubuntu server

sudo apt-get install vnc4server  vncviewer

Start the VNC server with options for a larger resolution

vncserver -geometry 1600x1200 -randr 1600x1200,1440x900,1024x768
Download the CentOS 6 minimal CD image

Create a qcow2 image to use to build the server in
kvm-img create -f qcow2 centos6-min.qcow2 10G
Boot the image
sudo kvm -m 1024 -cdrom <path-to-centos6-minimal>.iso -drive file=centos6-min.qcow2,if=virtio,index=0 -boot d -net nic -net user -nographic -vnc :2
In another terminal session connect to the vnc
vncviewer :2
Install the host as usual with a custom disk layout (/ = 9120 MB and swap for the rest)

Once the install is complete, shut down the host
shutdown -h now
Reboot the image without the ISO attached
sudo kvm -m 1024 -drive file=centos6-min.qcow2,if=virtio,index=0 -boot c -net nic -net user -nographic -vnc :2
In another terminal session connect to the vnc
vncviewer :2

Install any updates

sudo yum -y update
At this point I also disabled iptables (service iptables stop ; chkconfig iptables off) and disabled SELinux, as this image is for testing.
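
A minimal sketch of those two steps, assuming the stock CentOS 6 /etc/selinux/config layout:
# Stop iptables now and keep it off across reboots
service iptables stop
chkconfig iptables off
# Disable SELinux permanently (takes effect after the next reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config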

Remove the line specifying the hardware address (HWADDR) from /etc/sysconfig/network-scripts/ifcfg-eth0

Remove /etc/udev/rules.d/70-persistent-net.rules to ensure the new network adapter gets detected at boot time
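
Both of the above can be done from inside the guest with something like the following (a sketch, assuming eth0 is the only interface):
sed -i '/^HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0
rm -f /etc/udev/rules.d/70-persistent-net.rules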

Shut down the host again
shutdown -h now
The file centos6-min.qcow2 is ready to be loaded into glance
glance image-create --name="centos6-min" --is-public=true --container-format=ovf --disk-format=qcow2 < centos6-min.qcow2
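
To confirm the upload registered, the image should appear in the image list, and a quick boot makes a reasonable smoke test (a sketch; the flavour and instance name here are just examples, and the usual OpenStack credentials are assumed to be sourced):
glance image-list
nova boot --flavor m1.small --image centos6-min centos6-test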

Quick Galera Setup

Configuring the nodes

3 CentOS nodes with a minimal install

Galera1 :  192.168.0.231
Galera2 :  192.168.0.232
Galera3 :  192.168.0.233

On each node

rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
yum install Percona-XtraDB-Cluster-server
yum install Percona-XtraDB-Cluster-client
yum install xtrabackup
mkdir -p /mnt/data
mysql_install_db --datadir=/mnt/data --user=mysql
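
Depending on how /mnt/data was created it may still be owned by root; if so, hand it to the mysql user before initialising the datadir (a small sketch):
chown -R mysql:mysql /mnt/data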

On each node, put the following in /etc/my.cnf

Galera1

[mysqld_safe]
wsrep_urls=gcomm://192.168.0.231:4567,gcomm://192.168.0.232:4567,gcomm://192.168.0.233:4567,gcomm://

[mysqld]
datadir=/mnt/data
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=2
wsrep_cluster_name=galeracluster
wsrep_sst_method=rsync
wsrep_node_name=node1

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log-bin=mysqld-bin
server-id=1

Galera2

[mysqld_safe]
wsrep_urls=gcomm://192.168.0.231:4567,gcomm://192.168.0.232:4567,gcomm://192.168.0.233:4567,gcomm://

[mysqld]
datadir=/mnt/data
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=2
wsrep_cluster_name=galeracluster
wsrep_sst_method=rsync
wsrep_node_name=node2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log-bin=mysqld-bin
server-id=2

Galera3

[mysqld_safe]
wsrep_urls=gcomm://192.168.0.231:4567,gcomm://192.168.0.232:4567,gcomm://192.168.0.233:4567,gcomm://

[mysqld]
datadir=/mnt/data
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=2
wsrep_cluster_name=galeracluster
wsrep_sst_method=rsync
wsrep_node_name=node3

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log-bin=mysqld-bin
server-id=3

Start up the cluster

on Galera1

mysqld_safe &

On Galera2 and 3

mysqld_safe &

Look at the MySQL error log in /mnt/data for the following message; it means all 3 nodes are synced

130204  9:41:50 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 3/3 (joined/total),
        act_id     = 1,
        last_appl. = 0,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 52e67407-6eae-11e2-0800-8eebc3f6cc80
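
The same thing can be confirmed from the MySQL client on any node; with all three nodes joined, wsrep_cluster_size should be 3 and the local state should be Synced (a quick sketch):
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"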


Redirecting database traffic with iptables

To temporarily redirect traffic from one port to another use the following iptables rules
export source=3306
export destination=13306
export host=`hostname`
iptables -t nat -A PREROUTING -p tcp --dport $source -j REDIRECT --to-port $destination
iptables -t nat -A OUTPUT -p tcp -d 127.0.0.1 --dport $source -j REDIRECT --to-port $destination
iptables -t nat -A OUTPUT -p tcp -d $host --dport $source -j REDIRECT --to-port $destination
To check them
iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source      destination
REDIRECT   tcp  --  anywhere    anywhere            tcp dpt:mysql redir ports 13306

Chain POSTROUTING (policy ACCEPT)
target     prot opt source      destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source      destination
REDIRECT   tcp  --  anywhere    localhost           tcp dpt:mysql redir ports 13306
REDIRECT   tcp  --  anywhere    db2.example.com     tcp dpt:mysql redir ports 13306
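
A quick way to convince yourself the redirect is live (a sketch, assuming nc is installed and nothing else is listening on either port): start a listener on the destination port, then connect to the source port and check the data arrives on the listener.
nc -l 13306 &
echo hello | nc -w 2 127.0.0.1 3306
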
To remove them
iptables -t nat -D PREROUTING -p tcp --dport $source -j REDIRECT --to-port $destination
iptables -t nat -D OUTPUT -p tcp -d 127.0.0.1 --dport $source -j REDIRECT --to-port $destination
iptables -t nat -D OUTPUT -p tcp -d $host --dport $source -j REDIRECT --to-port $destination

Failures in AWS

In light of the recent failures I thought I would share my findings based on some AWS investigations I have carried out. There seems to have been a mind shift in how we provision and deploy infrastructures in the new Cloud world, which has led to some poor decisions being made.

With the increasing adoption of AWS to build and maintain applications and databases it is becoming commonplace to view a virtual server as a commodity that can be created and thrown away as required. Before AWS, when servers were purchased as physical machines and hosted in dedicated datacentres or co-location suites, every part of the hardware was investigated before going live to ensure there was no single point of failure:
  •  Are the hard disks mirrored?
  •  Is there a separate power supply, and is each power supply connected to a separate power source?
  •  How many NIC cards are there, and are they connected to different switches?

In the new world of AWS all that is needed is the press of a few buttons and a server appears with some storage attached to it. Very little thought is put into what happens when things fail; the common assumption is that Amazon will take care of everything, and if something fails we can always start a new instance.
What happens if you can’t start a new instance? In recent AWS failures there has been a rush to allocate new instances and Amazon has had to throttle requests to enable it to cope with the load.
22nd Oct 1:02 PM PDT We continue to work to resolve the issue affecting EBS volumes in a single availability zone in the US-EAST-1 region. The AWS Management Console for EC2 indicates which availability zone is impaired. EC2 instances and EBS volumes outside of this availability zone are operating normally. Customers can launch replacement instances in the unaffected availability zones but may experience elevated launch latencies or receive ResourceLimitExceeded errors on their API calls, which are being issued to manage load on the system during recovery. Customers receiving this error can retry failed requests. 
(from the AWS status page)
I’m using Multi-AZ RDS that will keep my data available.
Multi-AZ RDS instances are within the same Amazon region. A failure of a region could affect both the primary and backup instances of your data. If the backup instance is available, a failover to it can take at least 3 minutes, possibly a lot longer depending on the time taken to perform crash recovery on the data. If this failover is part of a large outage, the crash recovery could take longer than you expect because of the high load from other instances failing over.
I’m snapshotting the EBS volumes so I will be able to recreate the databases from that?
EBS snapshots are stored in EBS, and in both of the recent big outages EBS failures have been one of the main root causes. Your backup may not be there when you need it, and if it is you may not be able to create a new volume from it because of EBS load or problems.
I’ve got replication Slaves in other regions I can fail over to those
Manually failing over to slaves is a complex process. Depending on your replication topology you could have slaves with different numbers of transactions applied to them. Before you can allow your application to start again you need to complete the following (a sketch of the position check is shown after this list):
  •  Ensure all the remaining slaves are at a consistent point. Find the slave with the highest number of transactions applied to it and manually apply the missing transactions to the remaining slaves until they are all consistent
  •  Manually reconfigure the replication topology and ensure transactions are flowing down to all the slaves
  •  Manually reconfigure your application to point to the new master.
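
For classic MySQL replication, comparing slaves comes down to the master coordinates each one has executed; a hedged sketch of that check (host names are placeholders):
# Run against each surviving slave; the one with the highest
# (Relay_Master_Log_File, Exec_Master_Log_Pos) pair is the most advanced
mysql -h slave1 -e "SHOW SLAVE STATUS\G" | grep -E 'Relay_Master_Log_File|Exec_Master_Log_Pos'
mysql -h slave2 -e "SHOW SLAVE STATUS\G" | grep -E 'Relay_Master_Log_File|Exec_Master_Log_Pos'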

While doing this your application is down. The length of time to do this will depend on the amount of data involved and the complexity of your replication topology. 

What solutions exist to help


Obviously I’m biased toward Continuent Tungsten as they pay my wages, but the reason I work for them is that I believe it is one of the best solutions available and it works. There are others out there (MHA, Galera, etc.) which I have used, but I believe simple managed async replication is, at the moment, the best solution.


Setting up Fan in using Tungsten Replicator


Fan-in using Tungsten Replicator is fairly simple. You just need to make sure the data being replicated into the slave is unique and there will be no conflicts.

Fan In the entire Instance


In this example DB1 and DB2 are the masters and DB3 is the slave to fan in to. Before setting up replication you will need to ensure that the schemas on DB3 are already populated with the data you need.
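
One way to seed DB3 before the replicators are installed, assuming writes to the masters are paused while the dumps are taken (the schema names here are hypothetical):
mysqldump -h DB1 --databases schema_from_db1 | mysql -h DB3
mysqldump -h DB2 --databases schema_from_db2 | mysql -h DB3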

./tools/tungsten-installer \
    --master-slave \
    --master-host=DB1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=master1 \
    --home-directory=/opt/tungsten \
    --cluster-hosts=DB1,DB3 \
    --start-and-report

./tools/tungsten-installer \
    --master-slave \
    --master-host=DB2 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=master2 \
    --home-directory=/opt/tungsten2 \
    --thl-port=2114 \
    --rmi-port=10002 \
    --cluster-hosts=DB2,DB3 \
    --start-and-report

To test everything is working

mysql -h centos1 -e”create table test.c1 (a int)”
mysql -h centos2 -e”create table test.c2 (a int)”
mysql -h centos3 -e”use test;show tables;”

The output should look like this

+----------------+
| Tables_in_test |
+----------------+
| c1             |
| c2             |
+----------------+
2 rows in set (0.00 sec)

mysql> 
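
The two replication services themselves can also be checked on DB3 with trepctl; the second install used a non-default RMI port, so it has to be addressed explicitly (a sketch, assuming the default directory layout under the --home-directory values used above):
/opt/tungsten/tungsten/tungsten-replicator/bin/trepctl services
/opt/tungsten2/tungsten/tungsten-replicator/bin/trepctl -port 10002 services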

Fan In only specified Databases

You can filter the databases being replicated by adding the --property=replicator.filter.replicate.do option to the installer and enabling the replicate filter (--svc-extractor-filters=replicate).

In this example only databaseA will be replicated from DB1 and databaseB from DB2



./tools/tungsten-installer \
    --master-slave \
    --master-host=DB1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=master1 \
    --home-directory=/opt/tungsten \
    --cluster-hosts=DB1,DB3 \
    --svc-extractor-filters=replicate \
    --property=replicator.filter.replicate.do=databaseA \
    --start-and-report

./tools/tungsten-installer \
    --master-slave \
    --master-host=DB2 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=master2 \
    --home-directory=/opt/tungsten2 \
    --thl-port=2114 \
    --rmi-port=10002 \
    --cluster-hosts=DB2,DB3 \
    --svc-extractor-filters=replicate \
    --property=replicator.filter.replicate.do=databaseB \
    --start-and-report
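
A quick way to confirm the filter is doing its job, along the same lines as the earlier test (databaseC here is a hypothetical schema that is outside both do lists):
mysql -h DB1 -e "create database if not exists databaseC"
mysql -h DB1 -e "create table databaseA.f1 (a int)"
mysql -h DB1 -e "create table databaseC.f1 (a int)"
mysql -h DB3 -e "show tables in databaseA"
mysql -h DB3 -e "show tables in databaseC"
Only databaseA.f1 should appear on DB3; the last query should fail because databaseC is never replicated.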
