May 29, 2014

Steps to building out a full-fledged cloud - local environment

In my current job, I have had to design a software product from scratch - without any inherited infrastructure, rules, or processes whatsoever!

It has been a great learning opportunity for me and my colleagues to test our ideas. We started off with a bold idea: our application would live only on the cloud. All collaboration, buildout, and testing would happen only on the cloud. Every developer would simply be handed a laptop, and from then on, every interaction would be with the cloud.

As a starting point, we wanted to model our SDLC (Software Development Lifecycle) around the following toolset.


  • Starting development from a high-end Mac, we wanted to compress our entire cloud application into one virtual machine
  • Vagrant (along with GitHub) turned out to be the perfect choice for scripting and sharing our development environment with other programmers on the team (see the Vagrantfile sketch after this list)
  • Scripting our development environment itself was based around Chef-Solo (bundled along with Vagrant)
  • Finally, an open source chef-server on Amazon EC2 for creating and setting up nodes
  • This (using the open source chef-server instead of relying on the Amazon OpsWorks service) also keeps the option open for us to migrate away from Amazon EC2 to another provider or a leased data center cloud in the future
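
To make this concrete, here is a minimal sketch of what such a Vagrantfile might look like. The box name, cookbook path, and recipe name are illustrative placeholders rather than our actual configuration.

# Vagrantfile (sketch): one VM for the whole cloud application,
# provisioned by the Chef Solo provisioner that ships with Vagrant
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu-12.04-amd64"    # any 64 bit Ubuntu base box

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "cookbooks"     # cookbooks shared with the team via GitHub
    chef.add_recipe "app"                 # hypothetical recipe name
  end
end
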
The ride was not all smooth and rosy, and we built up some good learnings and experience along the way.

Let's start off with the choice of Mac. We chose the 2.3 GHz Mac with 16 GB of memory. On this, our 64-bit Ubuntu VM is given 4 GB of memory. Inside this Ubuntu VM, we run a bunch of application and database services (to be discussed in future posts). When we tried running on a slightly lower-end Mac (2.0 GHz with 8 GB of memory), there was an appreciable slowdown in using both the machine and the VM - leading to complaints from engineers and a general loss of productivity. Lesson 1: do not skimp on the development environment; ensure that every engineer has a very high-end client/development laptop.

Even on the better client Mac, heavy use of the development environment occasionally forces a shutdown ("Your Mac was restarted because of a problem") - an unfortunate issue that still crops up.

Moving on to Vagrant. The bundled (and free!) VirtualBox provider is good enough to get started - however, once we were running our application services, we soon learnt that VirtualBox wasn't the right choice for stability and for sharing code. Upgrading to the paid VMware Fusion hypervisor and the Vagrant VMware provider plugin took care of that issue. However, re-running our scripts on top of VMware's built-in HGFS kernel module (shared folders) runs into issues; we need to delete and re-provision our virtual machine in order to trust the entire buildout.
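
Switching providers is a small change in the Vagrantfile. A sketch of the VMware Fusion provider block is below; the 4 GB figure mirrors the memory allocation mentioned earlier, and the CPU count is illustrative.

# select the VMware Fusion provider (requires the paid plugin)
config.vm.provider :vmware_fusion do |v|
  v.vmx["memsize"]  = "4096"    # 4 GB for the Ubuntu VM
  v.vmx["numvcpus"] = "2"       # illustrative CPU count
end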

This led to another (related) issue - how do we make it cheaper to destroy and recreate our development VMs at will? Experimentation is part of our development culture, and making it easier encourages it further. We innovated by downloading and standardizing the relevant Debian packages onto our Mac laptop, and then sharing the downloaded folder with our Vagrant VM. Two more steps got us the cheaper VM buildout we wanted:
  1. Adding the "shared" folder to "/etc/apt/sources.list"
  2. Changing Chef recipes to run in "vagrant" mode, reading and changing attributes to enable a cheaper buildout
Relevant script for #1 is below:

#!/bin/bash
# install chef and dpkg-dev from the pre-downloaded packages in the shared folder
sudo dpkg -i /home/vagrant/downloads/debs/chef_11.8.2-1.ubuntu.12.04_amd64.deb
sudo dpkg -i /home/vagrant/downloads/dpkg-dev/*.deb
# register the shared folder as a local apt repository (only once)
if [ `grep -c /home/vagrant/downloads/debs /etc/apt/sources.list` -eq 0 ]; then
  echo "deb file:/home/vagrant/downloads/debs /" \
    | sudo tee -a /etc/apt/sources.list
fi
# build the package index apt expects, then refresh
cd /home/vagrant/downloads/debs
dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
sudo apt-get update
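
The folder sharing itself is a single line in the Vagrantfile. A sketch is below, assuming the packages are mirrored into a downloads folder next to the Vagrantfile on the Mac; the host-side path is hypothetical, while the guest path matches the script above.

# share the pre-downloaded packages with the VM
config.vm.synced_folder "downloads", "/home/vagrant/downloads"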

A relevant check for #2 is below:


if node['chef-solo-var']['user'].eql?("vagrant")
  # run vagrant env commands
else
  # run cloud (amazon ec2) commands
end
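
One way to drive this check is to set the attribute from the Chef Solo block in the Vagrantfile, so the same recipes run unchanged on EC2 (where the attribute would come from the Chef server instead). A sketch, reusing the attribute names above:

config.vm.provision :chef_solo do |chef|
  # attribute consumed by the recipe check above
  chef.json = { "chef-solo-var" => { "user" => "vagrant" } }
end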

In the next post, we will discuss the other parameters involved in setting up the cloud.







Aug 29, 2013

AAA

Authentication, Authorization and Auditing are important techniques in information management. 

A quick metaphor for their importance: information is the life-giving blood stream of a 21st century service organization, and these techniques act as "valves" providing safe, reliable, and correct passageways for this information to flow.

In days long gone by, these techniques (concerns) were bundled and recreated inside every single app. Luckily, this wasteful practice soon changed: organization-wide user directory services were built out, and applications interrogated these new directory services to validate users and accept or deny information access appropriately.

The advent of the world wide internet changed this situation, as information could (and needed to) flow across multiple organizations. As a result, information became broader, and users became more productive. However, the internet is decentralized and user information is spread (fractured) across multiple directory services. This causes confusion for the application provider and the user.

The application provider needs to support multiple heterogeneous directory services, and users have to replicate their information and guarantee consistency across them.

Unfortunately, this is the reality of today. I am not aware of any elegant solution to this fundamental problem. Lots of open standards (CAS, OAuth, OpenID, SAML, CBA, ...) have been proposed; they succeeded marginally, but eventually failed to meet their desired goal. These standards have caused more work and confusion for the application developer, and provided no solution for the user to manage his information across different directories.

The situation is not so grim, however, and we hope to make progress on making the experience of both the user (of the application) and the application developer much more pleasant than it is today.

The idea utilizes a technique well known (and, one could argue, trivial) to all computer science students: introduce an abstraction which hides away the complexity of dealing with multiple directory services. For brevity, we can name this abstraction bis - brokered identity services. This works well in the decentralized internet, as the abstraction can now change independently of the user or the application developer.

For bis to be effective, the following assumptions must hold true:

  1. bis mediates (proxies) communication from an application to a given directory
  2. bis normalizes (translates) identity tokens from a given directory
  3. bis manipulates user entries, replicating data subsets in a given directory on an as-needed basis
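
To give these three assumptions a concrete shape, here is a minimal Ruby sketch of what a bis facade might look like. Every name in it (BrokeredIdentityService, the adapter objects, the normalized token fields) is hypothetical; it illustrates the idea, not any existing standard or product.

# a hypothetical broker sitting between applications and directories
class BrokeredIdentityService
  def initialize(directories)
    @directories = directories    # e.g. { corp_ldap: ..., google: ... }, each an adapter object
  end

  # 1. mediate: proxy an authentication request to the named directory
  def authenticate(directory, credentials)
    raw = @directories.fetch(directory).authenticate(credentials)
    normalize(directory, raw)
  end

  private

  # 2. normalize: translate a directory-specific token into one common shape
  # (3. replication of user data subsets is omitted from this sketch)
  def normalize(directory, raw_token)
    { subject: raw_token[:id], issuer: directory, expires_at: raw_token[:expiry] }
  end
end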

We will study each of these points in more detail in upcoming posts.

Apr 21, 2013

Make: the swiss army knife of software packaging

Most of the time spent developing software goes into the maintenance phase of its lifecycle (adding new features, fixing bugs). In SaaS, where software is sold to customers as a service, it's critical to keep developer productivity high in this phase.

The Unix utility make is the swiss army knife of the software automation toolset. Unfortunately, lots of new programmers get scared off by its peculiar tab syntax. However, there are plenty of rewards for programmers who spend a little extra time learning about the extensive feature set of make.

One of the niceties of make lies in its meta-programming features (the built-in eval function). Using eval, we can define new targets dynamically. I use this technique all the time to avoid boilerplate work (for me and other developers on the team).

Here's the tip/ trick in all its glory:

# we assume that each module is encapsulated 
# in its own directory (relative to current)
# note: $$ is needed to escape make from 
# interpreting $N (from $NF)
modules=$(shell \ls -l . | awk '/^d/ {print $$NF}')

# 'all' target is the default
# needs to be the first target 
# in the Makefile
all: $(modules)

# ensure that presence of directory
# does not imply to make 
# that the module has already 
# been built
.PHONY: $(modules)

# build process of a single module
# $(1) is the passed in module name
# note: the recipe line must begin with a tab
define process_module
$(1):
	@echo building $(1)
endef

# for each module 'x' evaluate the 
# macro process_module
$(foreach x,$(modules),$(eval $(call process_module,$(x))))
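
In this skeleton the recipe body only echoes the module name; in a real Makefile, the body of process_module is where the actual build commands for each module (a recursive make, a compiler invocation, a test run) would go - defined once and stamped out for every directory found at the top level.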