Category Archives: work

Two years ago we started our journey to write what would become a piece of enterprise server software in the Python language. Over time we’ve done some pretty nutty things that wouldn’t have been necessary if the Python VM wasn’t crap. The reason we started with Python was a constraint on how to communicate with a core component in the environment. In hindsight we probably should have written our own library from the start (we have done so today), but it was also an interesting ride.

Like everyone else we noticed that Python becomes slower and slower with each thread you add, especially on SMP systems, thanks to the glorious Global Interpreter Lock. With the help of python-multiprocessing we were later able to take advantage of the 8 cores available to us, at the cost of copying a lot of data between processes (5-60 processes depending on configuration) and consuming a heap of RAM (16-24GB was not uncommon). To reduce the work of using multiprocessing, python-orb was created (which could do with a bit more polish, but it suits our needs).

Later on we noticed that our software pretty much crawled to a halt at a regular interval. At last we started to realize that this might be caused by the Python garbage collector. After some investigation this turned out to be the case, and we decided to just skip the garbage collector altogether as it only helps when you have circular references in your application (Python is otherwise reference counted), and those can be fairly easily circumvented.
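To illustrate the point about reference counting versus cycles, here is a minimal sketch in Python 3. The `Node` class and the function name are made up for this example and are not taken from our code base. With the collector disabled, a parent/child cycle lingers until `gc.collect()` is called, whereas a weak back-reference lets plain reference counting free everything.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node, used only for this illustration."""

    def __init__(self, parent=None, weak_parent=False):
        self.children = []
        # A strong back-reference creates a parent <-> child cycle;
        # a weakref.ref does not keep the parent alive.
        if parent is not None and weak_parent:
            self.parent = weakref.ref(parent)
        else:
            self.parent = parent
        if parent is not None:
            parent.children.append(self)


def unreachable_after_drop(weak_parent):
    """Build a parent/child pair, drop it, and return how many
    unreachable objects the cycle collector finds afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()              # start from a clean slate
        root = Node()
        Node(parent=root, weak_parent=weak_parent)
        del root                  # the last external reference goes away
        return gc.collect()       # > 0 means a cycle was left behind
    finally:
        if was_enabled:
            gc.enable()


print("strong back-reference:", unreachable_after_drop(False))
print("weak back-reference:  ", unreachable_after_drop(True))
```

The strong variant reports a non-zero count (the exact number depends on the interpreter version), while the weak variant should leave nothing for the collector to find.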

Python being a dynamic language means that you pretty much have to make up for the rapid development and compact syntax with twice as many test cases (yes, your application will start with completely broken syntax, and typos until it’s time to execute that particular line of code). This is not really that bad as the tests too are rapidly developed, and you need to have tests to prove that your software does what you want even after a major refactoring.

When we found the problem we simply disabled the garbage collector in our test framework and started logging the return value of gc.collect() after each test method had run. In addition to this, we added support for running the garbage collector on demand in our software so that we could run it for some hours with tons of data and then see if a gc.collect() returned something. Some days later we had nailed the last of the few cyclic references and were ready to run the whole application with the garbage collector disabled. The result was a lot better performance, and the end of stop-the-world garbage collections. Win!
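The test-framework trick can be sketched roughly like this. It assumes the standard `unittest` module and Python 3, which is not necessarily what our framework looked like; `GcLoggingTestCase`, `ExampleTest` and the `leaks` dictionary are hypothetical names invented for the example.

```python
import gc
import unittest


class GcLoggingTestCase(unittest.TestCase):
    """Hypothetical base class that reports cyclic garbage left by each test."""

    leaks = {}  # test method name -> unreachable objects found after it ran

    def setUp(self):
        self._gc_was_enabled = gc.isenabled()
        gc.collect()   # clear leftovers from earlier tests
        gc.disable()   # from here on only reference counting frees objects

    def tearDown(self):
        found = gc.collect()   # number of unreachable objects found
        if self._gc_was_enabled:
            gc.enable()
        name = self.id().rsplit(".", 1)[-1]
        GcLoggingTestCase.leaks[name] = found
        if found:
            print("%s left %d unreachable object(s) behind" % (name, found))


class ExampleTest(GcLoggingTestCase):
    def test_plain_list(self):
        data = [1, 2, 3]
        self.assertEqual(sum(data), 6)

    def test_self_referencing_list(self):
        loop = []
        loop.append(loop)   # the list now refers to itself: a cycle


# Run the example tests and show what was logged.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
unittest.TextTestRunner(verbosity=0).run(suite)
print(GcLoggingTestCase.leaks)
```

Logging instead of failing the test keeps the suite green while you hunt the cycles down one by one; once they are gone, turning the log line into an assertion is a small step.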

The new version of our product relies on a much better virtual machine, namely the JVM. We do however still use Python a lot for non-performance-critical scripting, for analyzing data and so on. Last week I analyzed a lot of data to locate a bug, which involved loading up a blob of JSON data and juggling it around until something interesting popped up (and it did!). This is a prime example of what disabling the garbage collector can do for you on a daily basis, so here it comes:

> import cjson, time, gc

> def read_json_blob():
>   t0 = time.time()
>   fd = file("mytestfile")
>   data = fd.read()
>   fd.close()
>   t1 = time.time()
>   parsed = cjson.decode(data)
>   t2 = time.time()
>   print "read file in %.2fs, parsed json in %.2fs, total of %.2fs" % \
>                                                   (t1-t0, t2-t1, t2-t0)

> read_json_blob()
read file in 10.57s, parsed json in 531.10s, total of 541.67s

> gc.disable()
> read_json_blob()
read file in 0.59s, parsed json in 15.13s, total of 15.72s

> gc.collect()
0

Ok, so that’s 15 seconds instead of about 9 minutes until I’m able to start analyzing the data, and of course there was nothing for the garbage collector to collect afterwards. The file in question is a 1.2GB JSON text file, the disks perform at about 110MB/s sequential reads, and we have 8 cores of Intel Xeon E5520 2.27GHz to use (only one core used in this example).
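For anyone wanting to try the same thing on a current interpreter, here is a minimal sketch in Python 3. It swaps cjson for the standard library `json` module, and the helper names `gc_paused` and `load_json` are made up for the example. Whether you see anything like the speedup above depends on your interpreter version and your data, so measure both ways.

```python
import gc
import json
import time
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic collector for the duration of the block,
    then restore whatever state it was in before."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def load_json(path):
    """Read and parse a JSON file with the collector paused.

    Returns a (parsed_data, elapsed_seconds) tuple."""
    start = time.perf_counter()
    with gc_paused():
        with open(path, encoding="utf-8") as fd:
            parsed = json.load(fd)
    return parsed, time.perf_counter() - start
```

Calling `data, seconds = load_json("mytestfile")` then gives you the parsed blob and the time it took, and the collector is back in its previous state afterwards.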

I hope this saves someone else’s time as it has saved mine.

About one and a half years ago I got tired of using Trac and started looking for alternatives. There were (are?) a lot of issues with Trac, but one of the more visible usability problems is that you write filters in SQL. As I’m accustomed to filters in a fire-and-forget fashion, from my years with the Mantis BTS, this doesn’t really work for me. The Almighty Google Machine led me to a heap of people recommending Redmine as a drop-in replacement, with nice import scripts. A couple of days later I’d created my first Redmine instance and have not since looked back.

We’ve also started using Redmine in my project at work, and now the other projects are getting jealous of our fancy setup, hence this post.

Pre-reqs: One piece of hardware with Debian Lenny installed.

First start with adding the Debian Backports Apt repository to your sources.list:

# echo "deb http://www.backports.org/debian lenny-backports \
            main contrib non-free" >> /etc/apt/sources.list
# aptitude update
# aptitude install debian-backports-keyring
# aptitude update

Next up you’ll need an Apache module with a very fancy web page, Passenger:

# aptitude -t lenny-backports install \
                    libapache2-mod-passenger

You’re also going to need some database to store your crap in. I’m just going to base this on MySQL as that’s the DB that was already running on those machines I run Redmine on, and there’s no specific reason why I select version 5.1 here either:

# aptitude -t lenny-backports install mysql-server-5.1

During the installation you’ll be asked to enter a password for the root account on the MySQL database server. If you’re out of ideas I can really recommend installing the pwgen package which will happily generate a secure password for you:

# pwgen -sy | cat

Armed with a MySQL database and a secure password it’s now time to create the Redmine database:

# mysql -u root -p
mysql> create database redmine character set utf8;
Query OK, 1 row affected (0.00 sec)
mysql> create user 'redmine'@'localhost' identified by 'my_password';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on redmine.* to 'redmine'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye

…where you’d obviously use that fancy pwgen tool to generate yet another super secure password that you’ll forget before reading the rest of this text.

Armed with a database and a Ruby on Rails hungry Apache module you’re now ready to grab Redmine:

# cd /var/www
# wget http://rubyforge.org/frs/download.php/69449/redmine-0.9.3.tar.gz
# tar xvfz redmine-0.9.3.tar.gz

Now it’s time to remember that fancy password of yours:

# cd redmine-0.9.3
# cat <<EOF > config/database.yml
production:
  adapter: mysql
  database: redmine
  host: localhost
  username: redmine
  password: my_password
EOF

Ok, so now Redmine is configured to access the database, but Rails is missing, so let’s grab it:

# gem install rails -v=2.3.5
# aptitude install libopenssl-ruby libmysql-ruby

Got Rails! Next up, prepare Redmine, and then populate the database:

# RAILS_ENV=production rake config/initializers/session_store.rb
# RAILS_ENV=production rake db:migrate
# RAILS_ENV=production rake redmine:load_default_data

The last step here will ask for the default language, select something you can understand.

Ok, we’re getting closer to actually running Redmine for the first time. The following steps will hook up Redmine to be run by Apache:

# chown -R www-data:www-data files log tmp public/plugin_assets
# mv public/.htaccess public/.disabled_htaccess
# cat <<EOF > /etc/apache2/sites-available/redmine
<VirtualHost _default_:80>
 ServerName your.domain.name
 DocumentRoot /var/www/redmine-0.9.3/public
 RailsEnv production
</VirtualHost>
EOF
# a2ensite redmine
# /etc/init.d/apache2 restart

When you direct your browser to http://your.domain.name, Redmine will present itself. You should of course make sure that the rest of your Apache installation works properly and that no strange directories are exposed to evil visitors, but otherwise you should be good to go. Enjoy!

Got a new laptop at work yesterday, the long awaited ThinkPad X200.

Hands down best laptop ever. Everyone should throw away their old crappy laptops and get this one, or the X200s. I’ve started writing a page over at ThinkWiki on how to install Debian on it.

We have this cool Nespresso GEMINI CS220 at work and it has a serial interface. I just connected it to my laptop and fired up a terminal, but it wasn’t very talkative. So now I wonder if anyone else has played with this fine piece of equipment, and how to make it more social. It’s probably possible to gather a lot of statistics, and perhaps even modify its functionality. It would indeed rock to plot graphs over caffeine intake at the office over a long period of time.
