Puppet modules for Continuent Tungsten Installation

About 3 years ago we (myself and my colleague Jeff Mace) embarked on a journey to automate installations for the Continuent Tungsten and Tungsten Replicator products (both now owned by VMware). Initially this was driven by three different requirements:

  • Assist customer deployments, reducing the load on the Support and Deployment teams
  • Standardise QA host setup; we had many hosts with different configurations on them
  • Allow quick demo setups with Vagrant, both on VirtualBox and AWS

The initial target for this was the MySQL platform using Percona Server (at the time the only variant to offer a yum/apt repository). Initially we wrote our own module for installing and maintaining MySQL, but after several months of struggling we offloaded that work to the Puppet Labs MySQL module (https://forge.puppetlabs.com/puppetlabs/mysql).
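
For a sense of what that hand-off looks like, declaring a MySQL server through puppetlabs/mysql is roughly this (a minimal sketch; the parameter values are illustrative, not what our module actually passes):

# Delegate MySQL installation and configuration to puppetlabs/mysql
class { 'mysql::server':
  root_password           => 'secret',
  remove_default_accounts => true,
  override_options        => {
    'mysqld' => {
      'server_id' => '10',
      'log_bin'   => 'mysql-bin',  # binary logging, needed for replication
    },
  },
}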

Over the past 3 years it has been expanded to install the following databases:
  • MySQL (via the Puppet Labs MySQL module): standard Oracle MySQL, MariaDB and Percona Server
  • Oracle 11g/12c
  • Vertica
  • Hadoop (Cloudera 5)

It’s now at the point where a developer can spin up a new test VM using the following commands; setting up by hand used to be a multi-hour effort and a barrier for new people.

yum install puppet
puppet module install continuent/tungsten
echo "class { 'tungsten': installSSHKeys => true, installMysql => true }" | puppet apply

This module became a key component in the recent migration to internal VMware systems, allowing the quick deployment of around 1,000 VMs in a new vSphere environment. The deployment covered a range of MySQL flavours and versions, Oracle 11g and 12c, and a mix of Hadoop and Vertica test clusters. It was paired with a range of internal modules which stood up the complete host (users, test toolkits, network configuration, etc.) with no real manual intervention.


The initial adoption was painful (about 6 months), with a great deal of pushback from users who couldn’t understand why Puppet kept changing things back. After a period of education (and some moaning) the benefits became apparent to them.

https://forge.puppetlabs.com/continuent/tungsten


Update

As part of the resurrection and migration of some of the posts, a lot of the code samples have either gone missing or had their formatting messed up. I’m working my way through them and sorting them out.


Failures in AWS

In light of the recent failures I thought I would share my findings based on some AWS investigations I have carried out. There seems to have been a shift in mindset about how we provision and deploy infrastructure in the new cloud world, which has led to some poor decisions being made.

With the increasing adoption of AWS to build and maintain applications and databases, it is becoming commonplace to view a virtual server as a commodity that can be created and thrown away as required. Before AWS, when servers were purchased as physical machines and hosted in dedicated datacentres or co-location suites, every part of the hardware was investigated before going live to ensure there was no single point of failure:
  • Are the hard disks mirrored?
  • Is there a separate power supply, and is each power supply connected to a separate power source?
  • How many NICs are there, and are they connected to different switches?

In the new world of AWS all that is needed is the press of a few buttons and a server appears with some storage attached to it. Very little thought is put into what happens when things fail; the common assumption is that Amazon will take care of everything, and if something fails we can always start a new instance.
What happens if you can’t start a new instance? In recent AWS failures there has been a rush to allocate new instances, and Amazon has had to throttle requests to enable it to cope with the load.
22nd Oct 1:02 PM PDT We continue to work to resolve the issue affecting EBS volumes in a single availability zone in the US-EAST-1 region. The AWS Management Console for EC2 indicates which availability zone is impaired. EC2 instances and EBS volumes outside of this availability zone are operating normally. Customers can launch replacement instances in the unaffected availability zones but may experience elevated launch latencies or receive ResourceLimitExceeded errors on their API calls, which are being issued to manage load on the system during recovery. Customers receiving this error can retry failed requests. 
(from the AWS status page)
I’m using Multi-AZ RDS, so that will keep my data available.
Multi-AZ RDS instances are within the same Amazon region. A failure of a region could affect both the primary and backup instances of your data. If the backup instance is available, a failover to it can take at least 3 minutes, and possibly a lot longer depending on the time taken to perform crash recovery on the data. If this failover is part of a large outage, crash recovery could take longer than you expect because of the high load from other instances failing over at the same time.
I’m snapshotting the EBS volumes, so I will be able to recreate the databases from those.
In both of the recent big outages EBS failures have been one of the main root causes. Your backup may not be there when you need it, and even if it is, you may not be able to create a new volume from it because of EBS load or problems.
I’ve got replication slaves in other regions; I can fail over to those.
Manually failing over to slaves is a complex process. Depending on your replication topology you could have slaves with different numbers of transactions applied to them. Before you can allow your application to start again you need to complete the following:
  • Ensure all the remaining slaves are at a consistent point: find the slave with the highest number of transactions applied to it and manually apply the missing transactions to the remaining slaves until they are all consistent
  • Manually reconfigure the replication topology and ensure transactions are flowing down to all the slaves
  • Manually reconfigure your application to point to the new master.

While doing this your application is down. The length of time to do this will depend on the amount of data involved and the complexity of your replication topology. 
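
As a rough illustration of what the first two of those steps involve with classic MySQL binlog replication (a hedged sketch only; host names, credentials and binlog coordinates are made up, and GTID- or Tungsten-based setups differ):

# On each surviving slave, see how far it has applied: compare
# Relay_Master_Log_File / Exec_Master_Log_Pos across slaves
mysql -h slave1 -e 'SHOW SLAVE STATUS\G'
mysql -h slave2 -e 'SHOW SLAVE STATUS\G'

# After choosing the most advanced slave as the new master, repoint each
# remaining slave at it (coordinates below are examples only)
mysql -h slave2 -e "STOP SLAVE;
  CHANGE MASTER TO
    MASTER_HOST='new-master.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='secret',
    MASTER_LOG_FILE='mysql-bin.000042',
    MASTER_LOG_POS=1234;
  START SLAVE;"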

What solutions exist to help


Obviously I’m biased toward Continuent Tungsten, as they pay my wages, but the reason I work for them is that I believe it is one of the best solutions available and it works. There are others out there (MHA, Galera, etc.) which I have used, but I believe simple managed asynchronous replication is, at the moment, the best solution.


Installing Tungsten Replicator on Multiple Ports

Today I’ve been setting up Tungsten Replicator on hosts which have multiple MySQL instances running on different ports.

The hosts are tung1, tung2 and tung3 (all CentOS 6 running MySQL 5.1), with tung1 being the master and tung2 and tung3 being slaves.

The instances in this case are on ports 3101 and 3102. You need to do a separate install for each port.



./tools/tungsten-installer \
    --master-slave \
    --master-host=tung1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3101 \
    --service-name=tr3101 \
    --home-directory=/opt/tungsten/tr_3101 \
    --cluster-hosts=tung1,tung2,tung3 \
    --thl-port=2101 \
    --rmi-port=10001 \
    --start-and-report

./tools/tungsten-installer \
    --master-slave \
    --master-host=tung1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3102 \
    --service-name=tr3102 \
    --home-directory=/opt/tungsten/tr_3102 \
    --cluster-hosts=tung1,tung2,tung3 \
    --thl-port=2102 \
    --rmi-port=10003 \
    --start-and-report

It appears that the replicator needs two RMI ports, hence the jump: tr3101 uses 10001 and 10002, and tr3102 uses 10003 and 10004.

When you use trepctl you need to pass the -port option with the relevant RMI port to reach each replicator.
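
For example (a sketch; paths assume the default layout under the --home-directory values used above):

# talk to the replicator installed for MySQL port 3101 (RMI port 10001)
/opt/tungsten/tr_3101/tungsten/tungsten-replicator/bin/trepctl -port 10001 status

# and the one installed for MySQL port 3102 (RMI port 10003)
/opt/tungsten/tr_3102/tungsten/tungsten-replicator/bin/trepctl -port 10003 status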

Testing Tungsten Fan In replication

I need to end up with several MySQL servers replicating into a single instance, and Tungsten Replicator appears to be the only solution for this. To test it out I am using tungsten-sandbox (http://code.google.com/p/tungsten-toolbox/wiki/TungstenSandbox).



The machine needs MySQL Sandbox 3.0.24 installed (https://launchpad.net/mysql-sandbox/+download).
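
If it isn’t already present, one common way to install it is from CPAN (equally you can use the tarball from the link above):

cpan MySQL::Sandbox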


Create the directory to host the sandboxes

mkdir $HOME/tsb2

Get the MySQL binaries (I’m using 5.5.19 on a 64-bit Ubuntu box)

mkdir -p $HOME/opt/mysql
cd $HOME/opt/mysql

tar -xvf mysql-5.5.19-linux2.6-x86_64.tar.gz
mv mysql-5.5.19-linux2.6-x86_64 5.5.19

Download and untar the latest versions of Tungsten Replicator and Tungsten Sandbox.
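
Roughly along these lines (a sketch only; the archive names are placeholders for whichever releases you download, and the layout just needs to match the ../tungsten-sandbox relative path used below):

cd $HOME
tar -xzf tungsten-replicator-<version>.tar.gz    # placeholder archive name
tar -xzf tungsten-toolbox-<version>.tar.gz       # placeholder; provides the tungsten-sandbox script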


From within the Tungsten Replicator directory, run

../tungsten-sandbox -n 3 --topology=fan-in --hub=3 -m 5.5.19 -p 7300 -l 12300 -r 10300

where -n is the number of nodes to create and --hub is the node to fan into.

Once it has installed

cd $HOME/tsb2
./test_topology

Testing topology fan-in with 3 nodes.
Master nodes: [1 2] – Slave nodes: [3]
# node 3
1 inserted by node #1
2 inserted by node #2
appliedLastSeqno: 7
serviceName     : alpha
state           : ONLINE
appliedLastSeqno: 7
serviceName     : bravo
state           : ONLINE
appliedLastSeqno: 57
serviceName     : charlie
state           : ONLINE

More info
http://code.google.com/p/tungsten-toolbox/wiki/TungstenSandbox

Update:


It looks like in more recent versions of tungsten-sandbox the --hub option has been replaced by --fan-in, so the command line is now

../tungsten-sandbox -n 3 --topology=fan-in --fan-in=3 -m 5.5.19 -p 7300 -l 12300 -r 10300
