Continuous Integration with Hudson and Puppet
Problem Statement
You work on a Java project that has a few dozen software engineers, some of whom are seasoned engineers with training in agile methods. You have Hudson already building and running JUnit tests, but there's some code that nobody wants to touch which has no unit tests -- and never will. To make matters worse, it takes hours to do a full nightly build and hours more to install from scratch.
Deploying continuously built jars is all well and good, but sometimes the integration team wants the product to install more smoothly. Your requirements are that customers can migrate directly from the three earlier versions currently being supported. Upgrades require downtime, which customers don't like. The highest levels of management get monthly reports that measure how smoothly these activities occur.
Sounds pretty bleak, right? Wrong!
The Glass is Really Half-full
You have a great deal of respect for the executive management of your firm who, in their infinite wisdom, have given you a job that pays you very punctually on a semi-monthly basis. Better still, you are a perpetual optimist. You remember that old mentor from Russia who used to say "software that has no bugs has no customers either." You realize that years of work went into meeting the rapid pace of the business environment, and you wouldn't have a job were it not for that! Think of your poor dog, who would be so sad and hungry now if you didn't have the money for his dog food. Let's not let him down! Get back to work!
No, stop and think for a second. A lot of time has already gone into hands-free installation, DB load scripts, automated integration tests with web clients, etc. These exist because developers and QA needed to test something before it went to production. Now you're going to use them with Hudson, bringing even greater fame and glory to whom? Your boss, of course!
You have succeeded in convincing key stakeholders to share your vision:
- The only way to effectively manage a large network of systems is to “automate early and often.”
- Automation can help set and fine-tune successful processes in a way that reduces human error.
- Every superhero knows that with great power comes great responsibility. Automation sometimes backfires.
- A concerted effort to develop a single source of truth benefits everyone.
Your Action Plan
- Use a snapshot filesystem. This may be Linux LVM, Solaris 10 ZFS, or something bigger like an EMC or NetApp product. Then you will need a running Solaris zone or x86 virtualization using Xen, KVM, or VMware.
- Automate deployment to a staging box once your nightly build finishes.
- Then you want to scrap it, have each supported earlier release running in a Solaris zone, and run the upgrade.
- After all the installing and upgrading is done, keep the system running so the continuous integration server doesn't need to waste time during the day when users are committing code.
- Give basically the same tool you develop for your own selfish needs to the field engineers so they can fine-tune their processes, reducing human error.
About this example
- The Solaris method using zones and zfs snapshots is described here.
- We have many CVS branches, but currently we're trying to release PEOPLEMERGE_1_1
- 'puppetmaster.peoplemerge.com' is a SUSE box running on 192.168.200.2. It's the puppetmaster.
- 'testclient.peoplemerge.com' launches automated web tests and runs SUSE, with the IP 192.168.200.3 (that's a work in progress).
- 'hudson-build.peoplemerge.com' runs Solaris on SPARC, with the IP 192.168.200.4. It runs continuous integration every minute, running all JUnit tests. The agile team gets almost instantaneous feedback if a build broke. It also runs the nightly build.
- 'hudson-integration.peoplemerge.com' runs Solaris on SPARC, with the IP 192.168.200.5. As well as running Hudson, it is the global zone that can start, stop, and snapshot the other zones.
- 'testserver.peoplemerge.com' is another zone on the same physical box as hudson-integration, used to deploy app code, and has the IP 192.168.200.6. The global zone knows of its identity as 'testserver'.
- The global zone can reach testserver's zone on the filesystem at: /zones/testserver/root -- that resolves to '/' on testserver (not '/root'!).
- Eventually, another zone (again, same box as hudson-integration) will be used to create a database, and will have the IP 192.168.200.7 (that's a work in progress).
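The machines listed above can be captured as hosts entries so every box resolves the others the same way. This is only a sketch: the names and IPs come from the list above, but the short aliases (other than 'testserver'), the function names, and the default scratch file ./hosts.sample are my own assumptions. Point it at /etc/hosts only once you are happy with the output.

```shell
#!/bin/sh
# Print hosts entries for the example machines (names and IPs as listed above).
# The short aliases in the third column are assumptions for convenience.
example_hosts() {
    cat <<'EOF'
192.168.200.2 puppetmaster.peoplemerge.com puppetmaster
192.168.200.3 testclient.peoplemerge.com testclient
192.168.200.4 hudson-build.peoplemerge.com hudson-build
192.168.200.5 hudson-integration.peoplemerge.com hudson-integration
192.168.200.6 testserver.peoplemerge.com testserver
EOF
}

# Append only the entries whose FQDN is not already present in the target file.
# The target defaults to a scratch file so a trial run cannot damage /etc/hosts.
add_example_hosts() {
    hosts_file=${1:-./hosts.sample}
    touch "$hosts_file"
    example_hosts | while read -r ip fqdn short; do
        # awk exits 0 only if some host-name field on any line equals the FQDN.
        if ! awk -v h="$fqdn" '{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                               END { exit !found }' "$hosts_file"; then
            echo "$ip $fqdn $short" >> "$hosts_file"
        fi
    done
}
```

Running add_example_hosts twice against the same file leaves it unchanged the second time, which makes it safe to call from a setup script.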
Every day the following will occur
- At midnight the build server will kick off a build. When finished, it writes the name of the current build directory to the file /var/lib/build/latest-PEOPLEMERGE_1_1.
- The puppet client on the global zone is watching and waiting for that file to change. When it does, it jumps into action.
- First, it blows away the install from the previous day and restarts the zone with a clean Solaris installation.
- Next, it copies the build over using the puppet fileserver. Either the global zone or the test server can do this.
- Next, the test server runs the installer shell script.
- When the install is done, it makes another snapshot before any automated tests begin.
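The daily steps above can be sketched as one ordered shell function run on the global zone. Treat this as an illustration under assumptions, not the finished tool: it swaps the puppet fileserver copy for a plain cp from an assumed mount at /mnt/builds, it assumes zlogin is how the installer gets run inside the zone, and the function names and the DRY_RUN switch are made up for this example. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the nightly sequence on the global zone.
# Commands are only echoed unless DRY_RUN=0 is set in the environment.
ZONE=testserver
DATASET=storage/zones/testserver
BUILD_ROOT=/mnt/builds                      # assumption: build output is visible here
STAGING=/zones/$ZONE/root/var/lib/staging   # the zone's /var/lib/staging, seen from outside

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is the build directory name, e.g. the contents of latest-PEOPLEMERGE_1_1.
nightly_redeploy() {
    build=$1
    # Each step runs only if the previous one succeeded.
    run /usr/sbin/zoneadm -z "$ZONE" halt &&
    run /usr/sbin/zfs rollback "$DATASET@pre-install" &&
    run /usr/sbin/zoneadm -z "$ZONE" boot &&
    run mkdir -p "$STAGING" &&
    run cp -r "$BUILD_ROOT/$build/." "$STAGING/" &&
    # A real script may need to wait here until the zone has finished booting.
    run /usr/sbin/zlogin "$ZONE" /var/lib/staging/install_our_app.sh &&
    run /usr/sbin/zfs snapshot "$DATASET@snap-$build"
}
```

Calling nightly_redeploy with a build directory name prints the seven commands in order; setting DRY_RUN=0 runs them for real, stopping at the first failure.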
Getting started with Hudson
Download and install Hudson
Set up all your projects to poll CVS every minute and build them all.
Getting started with Puppet
For details, see Volcane's getting started guide. This explains how to set up the hosts file with these 4 servers.
Also consider buying "Pulling Strings with Puppet: Configuration Management Made Easy" by James Turnbull. The electronic version is great to mail out to your coworkers.
Quick-start instructions are below:
Add the following repository (the last argument is an alias of your choosing)
zypper ar http://download.opensuse.org/repositories/system:/management/openSUSE_11.0/ system-management
Install a server on SUSE
zypper in puppet-server
Install a client on SUSE
zypper in puppet
Solaris Installation
Prerequisite: You need Ruby installed, along with rdoc (Ruby's documentation library; Puppet loads rdoc/usage for its command-line help). This should verify both:
ruby -rrdoc/usage -e "puts :installed"
Install Facter
wget http://www.reductivelabs.com/downloads/facter/facter-1.3.8.tgz
gzcat facter-1.3.8.tgz |tar -xvf -
cd facter-1.3.8
ruby install.rb
facter --version
cd ..
Install Puppet
wget --no-check-certificate https://reductivelabs.com/downloads/puppet/puppet-0.22.4.tgz
gzcat puppet-0.22.4.tgz |tar -xvf -
cd puppet-0.22.4
ruby install.rb
On the server, put something simple in /etc/puppet/manifests/site.pp. (This will create "/tmp/testfile" if it doesn't exist.)
class test_class {
    file { "/tmp/testfile":
        ensure => present,
        mode   => 644,
        owner  => root,
        group  => root
    }
}
node 'solaris.peoplemerge.com' {
    include test_class
}
Don't forget the quotes, since you're probably using an FQDN (.peoplemerge.com)! The node also has to include the class, or nothing will be applied to it.
(re)Start the server
/etc/init.d/puppetmaster start
A certificate will be generated.
On the client, add the server's IP to /etc/hosts under the name 'puppet'. For example:
192.168.200.2 puppet
Start the client
/etc/init.d/puppet start
The client will generate a cert as well. TODO: on Solaris we need to write an init script or get one from Blastwave. Until then, run:
puppetd --verbose --waitforcert 60
If everything's working OK, the client should try to connect to the server, providing its public key. On the server:
puppetca --list
This shows the clients trying to connect. Let's say I'm coming from solaris.peoplemerge.com. It should show up in the list. Next:
puppetca --sign solaris.peoplemerge.com
Setting up Hudson
All your projects run on Hudson continuously. You add a nightly build that runs as a shell script. Click "Build periodically" and enter 0 0 * * * (every day at midnight).
Enter the following in the execute shell box:
BRANCH=PEOPLEMERGE_1_1
DATE=`date +%Y%m%d_%H%M`
DIR=${DATE}_${BRANCH}
#... do what you need to run the build, creating a dir /var/lib/build/$DIR
echo $DIR > /var/lib/build/latest-$BRANCH
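Because something else is watching the latest-$BRANCH file, it can help to write it in one step so a reader never sees a half-written name. The sketch below is an assumption about how you might do that, not part of the original job: publish_latest is a made-up helper, and the example directory name in the comments is only illustrative.

```shell
#!/bin/sh
# Publish the name of the newest build directory for one branch.
# The pointer file is written under a temporary name and then renamed,
# so a reader sees either the old contents or the new ones, never a partial write
# (rename is atomic as long as both names are on the same filesystem).
publish_latest() {
    build_root=$1   # e.g. /var/lib/build
    branch=$2       # e.g. PEOPLEMERGE_1_1
    dir=$3          # e.g. 20090101_0000_PEOPLEMERGE_1_1 (illustrative name)

    # Refuse to advertise a build directory that does not exist.
    if [ ! -d "$build_root/$dir" ]; then
        echo "publish_latest: $build_root/$dir is not a directory" >&2
        return 1
    fi

    tmp="$build_root/.latest-$branch.$$"
    echo "$dir" > "$tmp" && mv "$tmp" "$build_root/latest-$branch"
}
```

In the Hudson job, the final echo line could then become: publish_latest /var/lib/build "$BRANCH" "$DIR". A failed build that never created its directory leaves the previous pointer untouched.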
Set up the network
On the puppetmaster, mount the build server's build directory with sshfs:
mkdir -p /mnt/builds
sshfs -o allow_other hudson-build.peoplemerge.com:/var/lib/build /mnt/builds
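An sshfs mount can quietly disappear, so it is worth checking that the pointer file and the build it names are really visible before anything depends on them. A minimal sketch, assuming the mount point above; resolve_latest is a hypothetical helper, not something Puppet or sshfs provides.

```shell
#!/bin/sh
# Print the full path of the newest build for a branch, as seen through the mount.
# Fails with a message if the pointer file or the build directory is missing.
resolve_latest() {
    mount_point=$1   # e.g. /mnt/builds
    branch=$2        # e.g. PEOPLEMERGE_1_1
    pointer="$mount_point/latest-$branch"

    if [ ! -r "$pointer" ]; then
        echo "resolve_latest: cannot read $pointer (is sshfs mounted?)" >&2
        return 1
    fi

    # Take the first line only and strip whitespace, including the trailing newline.
    build=$(head -n 1 "$pointer" | tr -d '[:space:]')

    if [ -z "$build" ] || [ ! -d "$mount_point/$build" ]; then
        echo "resolve_latest: build directory '$build' not found under $mount_point" >&2
        return 1
    fi

    echo "$mount_point/$build"
}
```

For example, resolve_latest /mnt/builds PEOPLEMERGE_1_1 prints the build path on success and returns non-zero otherwise, which makes it easy to use as a guard in a cron job.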
Setting up Puppet
On puppetmaster, add the following to /etc/puppet/fileserver.conf
[autobuild]
path /mnt/builds
allow *
This Puppet manifest is my naive first attempt. The problem is that Puppet will not orchestrate tasks between systems. Maybe Capistrano or ControlTier can help out here. This process orchestration is critical.
class nightly-build {
    # Name of the newest build directory, read through the sshfs mount on the puppetmaster.
    # Note: generate() returns the command's output as-is, which may include a trailing newline.
    $builddir = generate("/bin/cat", "/mnt/builds/latest-PEOPLEMERGE_1_1")

    file {
        # A change in this pointer file triggers everything below.
        "/tmp/latest-PEOPLEMERGE_1_1":
            source => "puppet://$servername/autobuild/latest-PEOPLEMERGE_1_1",
            notify => [ Exec["sync-nightly"], File["/zones/testserver/root/var/lib/staging"], Exec["install-nightly"] ];
        "/zones/testserver/root/var/lib/staging":
            mode    => 755, owner => root, group => root,
            ensure  => directory,
            recurse => true,
            source  => "puppet://$servername/autobuild/$builddir";
    }

    exec {
        # Halt the zone, roll back to the clean snapshot, then boot it again.
        "sync-nightly":
            refreshonly => true,
            notify      => Exec["rollback-zone"],
            command     => "/usr/sbin/zoneadm -z testserver halt";
        "rollback-zone":
            refreshonly => true,
            notify      => Exec["restart-zone"],
            command     => "/usr/sbin/zfs rollback storage/zones/testserver@pre-install";
        "restart-zone":
            refreshonly => true,
            command     => "/usr/sbin/zoneadm -z testserver boot";
        # snap-zone is refreshonly, so it only runs if something notifies it.
        "install-nightly":
            refreshonly => true,
            notify      => Exec["snap-zone"],
            cwd         => "/home/staging",
            command     => "/var/lib/staging/install_our_app.sh";
        "snap-zone":
            refreshonly => true,
            command     => "/usr/sbin/zfs snapshot storage/zones/testserver@snap-$builddir";
    }
}
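Until the orchestration question is settled, one fallback is a small cron-driven shell driver on the global zone that notices a new build and then runs the steps strictly in order. The sketch below covers only the "has the build changed?" part, and it is my assumption of how that could look: build_changed, mark_build_done, and the state file are hypothetical names, not Puppet features.

```shell
#!/bin/sh
# Decide whether the pointer file names a build we have not processed yet.
# Returns 0 if the build is new, 1 if it is unchanged, 2 if the pointer is unreadable.
build_changed() {
    pointer=$1   # e.g. /mnt/builds/latest-PEOPLEMERGE_1_1
    state=$2     # hypothetical state file, e.g. /var/tmp/last-deployed-PEOPLEMERGE_1_1

    [ -r "$pointer" ] || return 2

    # cmp -s is silent and exits 0 only when both files are byte-for-byte identical.
    if [ -f "$state" ] && cmp -s "$pointer" "$state"; then
        return 1
    fi
    return 0
}

# Record the build we just finished with, so the next run skips it.
mark_build_done() {
    cp "$1" "$2"
}

# Typical use from cron (the ordered steps themselves are left out here):
#   if build_changed "$POINTER" "$STATE"; then
#       ... halt, roll back, boot, copy, install, snapshot, in that order ...
#       mark_build_done "$POINTER" "$STATE"
#   fi
```

Marking the build as done only after the last step succeeds means a failed run is retried on the next cron tick instead of being silently skipped.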
