
Category Archives: code

Having spent quite a few total hours in GDB over the past weeks, it started to get boring to print *res->value.list->… etc. over and over to see what weird value was harassing my tests. I had heard that GDB had gained Python support some time ago, and decided to write some useful tools for debugging XMMS2 related code.

This is now what it looks like when you print a xmmsv_coll_t:

(gdb) print *coll
$11 = {
  "attributes": {
    "type": "id"
  }, 
  "type": "XMMS_COLLECTION_TYPE_ORDER", 
  "idlist": [], 
  "operands": [
    {
      "attributes": {}, 
      "type": "XMMS_COLLECTION_TYPE_UNIVERSE", 
      "idlist": [], 
      "operands": []
    }
  ]
}

…and a regular xmmsv_t:

(gdb) print *fetch
$15 = {
  "type": "cluster-list", 
  "data": {
    "type": "organize", 
    "data": {
      "id": {
        "aggregate": "first", 
        "type": "metadata", 
        "get": [
          "id"
        ]
      }
    }
  }, 
  "cluster-by": "id"
}

The code is a bit under 100 lines of Python and should be a nice inspiration for people who still haven’t added this huge help to their projects. The code can be found here, and checked out via:

git clone git://git.xmms.se/xmms2/xmmsgdb.git
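For the curious, a GDB pretty-printer is just a Python class with a to_string() method, hooked into gdb.pretty_printers. Here is a minimal sketch (the class name and the pre-decoded value are illustrative; the real code in xmmsgdb walks the C structures via the gdb.Value API):

```python
import json

class XmmsvPrinter:
    """GDB calls to_string() on a registered printer to render a value.

    Here `value` stands in for data already decoded from the inferior;
    the real xmmsgdb code extracts it via the gdb.Value API.
    """
    def __init__(self, value):
        self.value = value

    def to_string(self):
        return json.dumps(self.value, indent=2, sort_keys=True)

def register(lookup):
    """Hook a lookup function into GDB's printer chain."""
    try:
        import gdb          # only available when running inside GDB
        gdb.pretty_printers.append(lookup)
    except ImportError:
        pass                # running outside GDB, e.g. in tests

# Rendering a decoded collection gives output like the session above:
coll = {"type": "XMMS_COLLECTION_TYPE_ORDER",
        "attributes": {"type": "id"},
        "idlist": [],
        "operands": []}
print(XmmsvPrinter(coll).to_string())
```

The registration is guarded because the gdb module only exists inside GDB itself.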


As it’s always been a bit too far behind the scenes, I wanted to take some time to describe what measures have been taken to increase the quality of XMMS2, and what the future has in store.

Today we have a basic unit test framework built on top of libcunit. To reduce boilerplate code in the actual test suites, a number of macros have been defined. Here is an example of the basic structure of a simple test suite:

SETUP (mytestsuite) {
  /* setup what ever is needed
   * for each test case to run
   */
}

CLEANUP () {
  /* clean up such that the state
   * is restored to the state before
   * SETUP (mytestsuite) was run.
   */
}

CASE (mytestcase1) {
  /* the actual test */
}

CASE (mytestcase2) {
  ...
}

To guarantee correctness, SETUP will be executed before each CASE is run, and CLEANUP will be executed after each CASE has finished. Additionally, the whole SETUP, CASE, CLEANUP sequence is wrapped by the following checks both before and after:

VALGRIND_DO_LEAK_CHECK;
VALGRIND_COUNT_LEAKS(leak, d, r, s);

This imposes no runtime dependency, but injects markers such that if the test is executed under Valgrind, each test case will be inspected for memory leaks independently, and will fail if a leak is found.

That covers writing test cases and validating their resource management. Next up is getting a clear view of what has been tested, and this is where coverage reports come into play. To get coverage reporting, via gcov, the --coverage flag is appended to both CFLAGS and LINKFLAGS in the build system. When running the tests, a heap of .gcda and .gcno files will be emitted which, among other things, contain metadata about what lines were executed. To produce something that’s easy to inspect, lcov processes these files into a heap of HTML files using the following commands:

lcov -c -b $base-directory -d $metadata-directory -o coverage.info
genhtml -o cov coverage.info

The $base-directory in this case is used to resolve relative paths, as our build system outputs its artifacts in a sibling directory of the source directory. So for example the source files will be called ”../src/xmms/medialib.c”, where ”..” is relative to ”_build_”. The $metadata-directory is the directory to recursively scan for .gcda files. See the man page for further details.

So we now know that our tests produce the correct result, they don’t leak, and we’ve verified via coverage that our tests cover the complex code paths we want them to. Already this gives us a lot of confidence that what we’re doing is correct, but there’s one more tool we can use to increase that confidence: the beautiful free static analyzer from the Clang project. To perform a static analysis of the XMMS2 source code, simply issue the following commands:

scan-build ./waf configure
scan-build ./waf build

After a while the analysis is done, and you will be presented with a command that opens your browser with the static analysis report. This is the latest addition to our toolchain and will help us increase our code quality even further; there are still some warnings of varying severity left to fix.

Now on to the future. While working on getting Collections 2.0 into shape, I started working on a comfortable way of validating its correctness while getting to know the whole concept and the code behind it, so that I could easily modify its structure without breaking things.

The first step was to build the data structures via the C API like clients would, plus some basic validation of the result. This turned out to be pretty verbose, as the whole data structures had to be written out in code instead of generated from some kind of user interface. So the next step was to write a small JSON parser that constructed an xmmsv_t, which could be used to build the fetch specification, so that by looking at a test for a second you’d know exactly what the result would be. After this, the obvious step was to also construct an xmmsv_t with the expected result from JSON. Here a vision of an entirely code-free test suite started to grow, and some lines of code later an xmmsv_coll_t could also be constructed from JSON.

The envisioned test-runner is not committed yet, but what it does is scan a directory structure like this:

testcases/test_query_infos_order_by_tracknr/medialib.json
testcases/test_query_infos_order_by_tracknr/collection.json
testcases/test_query_infos_order_by_tracknr/query.json
testcases/test_query_infos_order_by_tracknr/expected.json
testcases/test_something_complex/medialib.json
testcases/test_something_complex/collection.json
testcases/test_something_complex/query.json
testcases/test_something_complex/expected.json

For each directory under ”testcases” it performs the same task as the current test framework does, but in a way that makes it easy for non-C-coders to contribute new test cases.
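The directory-scanning part of such a runner could be sketched like this (iter_testcases and the run_query callback are hypothetical names, not actual XMMS2 API):

```python
import json
import os

def iter_testcases(root):
    """Yield (name, {part: parsed JSON}) for each testcase directory."""
    for name in sorted(os.listdir(root)):
        case_dir = os.path.join(root, name)
        if not os.path.isdir(case_dir):
            continue
        parts = {}
        for part in ("medialib", "collection", "query", "expected"):
            with open(os.path.join(case_dir, part + ".json")) as fd:
                parts[part] = json.load(fd)
        yield name, parts

def run_suite(root, run_query):
    """Run every case through `run_query`; return the names that failed."""
    failures = []
    for name, case in iter_testcases(root):
        result = run_query(case["medialib"], case["collection"], case["query"])
        if result != case["expected"]:
            failures.append(name)
    return failures
```

The point is that adding a test is now just a matter of dropping four JSON files into a new directory.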

A bonus here is that it’s easy to re-use this collection of test cases for other types of tests, such as performance tests, which actually already work. When running the suite in performance mode, another directory is scanned for media libraries of different sizes (500, 1000, 2000, 4000, 8000, 16000 songs) on which each of the tests is executed, and performance metrics per test are dumped on stdout.

The idea is that these performance tests will emit data in a format that can be used for producing nice graphs based on different metrics. The script that produces the graphs would take as input a number of test-runs, so that you could easily compare multiple versions of the code to check for performance improvements and regressions.
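As a sketch of the comparison step such a script would need (the input format below is an assumption, not the suite’s actual output):

```python
def compare_runs(baseline, candidate):
    """Relative change per (test, size): negative means faster."""
    report = {}
    for test, sizes in baseline.items():
        for size, old in sizes.items():
            new = candidate[test][size]
            report[(test, size)] = (new - old) / old
    return report

# Made-up timings for one test across two runs of the suite:
old_run = {"query_infos": {500: 0.10, 1000: 0.25}}
new_run = {"query_infos": {500: 0.08, 1000: 0.30}}
for (test, size), change in sorted(compare_runs(old_run, new_run).items()):
    print("%s/%d: %+.0f%%" % (test, size, change * 100))
```

Feeding a report like this into a plotting tool gives the regression graphs described above.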

So that’s it, folks. If you have recommendations on further improvements, don’t hesitate to join the IRC channel for a chat, or perhaps drop me a mail.

Two years ago we started our journey to write what would become an enterprise server software in the Python language. Over time we’ve done some pretty nutty things that wouldn’t have been needed if the Python VM wasn’t crap. The reason we started with Python was a constraint on how to communicate with a core component in the environment. In hindsight we probably should have written our own library from the start (we have done so today), but it was also an interesting ride.

Like everyone else, we noticed that Python becomes slower and slower for each thread you add, especially on SMP systems, thanks to the glorious Global Interpreter Lock. With the help of python-multiprocessing we were later able to take advantage of the 8 cores available to us, at the cost of copying a lot of data between processes (5-60 processes depending on configuration) and consuming a heap of RAM (16-24GB was not uncommon). To reduce the work of using multiprocessing, python-orb was created (which could do with a bit more polish, but it suits our needs).

Later on we noticed that our software pretty much crawled to a halt at regular intervals. Eventually we realized that this might be caused by the Python garbage collector. After some investigation this turned out to be the case, and we decided to just skip the garbage collector altogether, as it only helps when you have circular references in your application (Python is otherwise reference counted), and those can be circumvented fairly easily.

Python being a dynamic language means that you pretty much have to make up for the rapid development and compact syntax with twice as many test cases (yes, your application will start with completely broken syntax and typos that go unnoticed until it’s time to execute that particular line of code). This is not really that bad, as the tests too are rapidly developed, and you need tests anyway to prove that your software does what you want even after a major refactoring.

When we found the problem, we simply disabled the garbage collector in our test framework and started logging gc.collect()’s result after each test method had run. In addition to this, we added support for running the garbage collector on demand in our software, so that we could run it for some hours with tons of data and then see if a gc.collect() returned something. Some days later we had nailed the last of the few cyclic references and were ready to run the whole application with the garbage collector disabled. The result was a lot better performance, and the end of stop-the-world garbage collections. Win!
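The trick is easy to reproduce outside our framework. With the collector disabled, everything acyclic is freed immediately by reference counting, so a non-zero gc.collect() after a test points straight at a cycle. A minimal sketch:

```python
import gc

def check_cycles(testcase):
    """Run one test with the collector disabled; return the number of
    cyclic (otherwise unreachable) objects it left behind."""
    gc.disable()
    gc.collect()             # start from a clean slate
    try:
        testcase()
        return gc.collect()  # refcounting already freed everything acyclic
    finally:
        gc.enable()

def leaky():
    node = []
    node.append(node)        # a reference cycle: the list refers to itself

def clean():
    data = list(range(10))   # plain refcounted objects, no cycle

print(check_cycles(leaky))   # non-zero: this test created cyclic garbage
print(check_cycles(clean))   # 0
```

Hook something like this into your test-runner’s teardown and the offending test names fall out for free.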

The new version of our product relies on a much better virtual machine, namely the JVM. We do, however, still use Python a lot for non-performance-critical scripting, and for analyzing data and so on. Last week I analyzed a lot of data to locate a bug; this involved loading up a blob of JSON data and juggling it around until something interesting popped up (and it did!). This is a prime example of what disabling the garbage collector can do for you on a daily basis, so here it comes:

> import cjson, time, gc

> def read_json_blob():
>   t0 = time.time()
>   fd = file("mytestfile")
>   data = fd.read()
>   fd.close()
>   t1 = time.time()
>   parsed = cjson.decode(data)
>   t2 = time.time()
>   print "read file in %.2fs, parsed json in %.2fs, total of %.2fs" % \
>                                                   (t1-t0, t2-t1, t2-t0)

> read_json_blob()
read file in 10.57s, parsed json in 531.10s, total of 541.67s

> gc.disable()
> read_json_blob()
read file in 0.59s, parsed json in 15.13s, total of 15.72s

> gc.collect()
0

Ok, so that’s 15 seconds instead of about 9 minutes until I’m able to start analyzing the data, and of course there was nothing for the garbage collector to collect afterwards. (The parser allocates millions of container objects, which trips the collector’s allocation thresholds over and over.) The file in question is a 1.2GB JSON text file, the disks perform at about 110MB/s sequential reads, and we have 8 cores of Intel Xeon E5520 2.27GHz to use (only one core is used in this example).

I hope this saves someone else’s time as it has saved mine.

The non-smart-phone world seems so distant now after being connected to The Hive<tm> around the clock for a little more than a year with the HTC Dream/Android G1. It’s not the best of phones, but it was the first and I can’t really say any other Android-based phone has impressed me much. There is some hope for the rumored Motorola Shadow but nevermind, this post is about the applications I’ve grown to love.

There are a couple of applications in my daily life, but some applications stand out more than others.

  1. ConnectBot
    This is hands down the best on-the-go SSH client I’ve ever used. It supports keys and multiple concurrent sessions, and it hooks up one of the hardware buttons to switch between the windows in GNU Screen. Gesture support involves scrolling up/down in the buffer or sending Page Up/Down, depending on whether you touch the left or right part of the screen. The trackball acts as Ctrl, which makes using a shell with high latency a breeze. There are bookmarks, and you can even tunnel ports to the phone, which is really nice if you have some web page hidden inside some network. Simply put, pure awesomeness. It’s not uncommon that I start my work day on the bus with this application.
  2. Google Listen
    I never really cared about podcasts before, but this completely changed when I found this wonderful application. With a flat rate data subscription, and the podcasts being downloaded to the phone or streamed as you listen, this sweet application makes podcasts really accessible. The only annoying thing is that it continues to play new podcasts in the queue with no way of stopping after just one, which causes me to wake up with strange voices in my head in the middle of the night. Another feature some iPhone fanboy friends of mine have in their podcast clients is the ability to increase playback speed, which would be very nice when listening to The Economist podcast. My current list of poison can be found here.
  3. Twidroid
    I wasn’t really into Twitter until I found this application. I haven’t tried many others, as I don’t feel limited with this one. It’s not mega awesome, but it’s well written and does its job well. It supports all the features you’d expect: it updates tweets in the background, it supports URL shortening services and photo sharing services, it hooks into the Share feature in Android, etc.
  4. Google Sky Map
    Using the accelerometer to navigate and GPS to fetch your position, it presents you with a 3D map of the universe around you. As a typical Swede I could only spot the Big Dipper and perhaps Orion’s Belt, so for me this app is a big +1. One dark night last summer I found myself amazed by having augmented my reality with the ability to see the stars that were right under me, only visible from other parts of Earth. A must-see at least.
  5. FxCamera
    A pretty simple but neat camera application that applies some fancy filters to your otherwise crappy photos. It’s a nice addition when you snap a photo and upload it to Facebook or Twitter directly from your phone. Features Toy Camera, Polaroid etc.
  6. Google Reader
    Ok, not really an Android application, but it is a custom version for mobile use, and I use it a lot while I travel by bus, or when I’m just too lazy to grab my laptop. A very effective way of getting your daily dose from The Hive<tm>.

So with the mentioned applications I’m pretty satisfied with the whole Android experience. The only area that’s currently lacking is in Tower Defense games, but that’s probably just a matter of time, and it’s probably good that there aren’t any worth playing yet ;).

As for firmware customizations I’ve done some experimentation. At first I used the JesusFreke firmware, which got discontinued, next up was CyanogenMod which was all the rage the whole autumn, and I recently switched to OpenEclair which is a rock solid Android 2.1 version for the G1 that I’m really satisfied with.

It’s nice to see that such a large community of hackers has spawned around the Android project, and I hope it grows even more. I haven’t had the time to get involved myself yet, except for some half-assed attempt to play with Scala, and a small XMMS2 client just to get a feel for the API. Hopefully time permits future adventures into Android hacking; I still have hopes, and it looks like Android is here to stay.

So to sum it up, I’m really satisfied with Android, although I find it a bit sad that no manufacturer has yet come even close to the iPhone’s touchscreen performance (although the S-E X10 Mini is pretty close, unfortunately with a molested UI).

GSoC Mentors Summit in all its glory, but all sessions and no hack made drax and me dull boys… enter Skidbladnir to bring joy to life!

After a day of slow sessions, with me hacking on Abraca and drax hacking on a new web 2.0 client, we decided that enough was enough: time to get some collaboration going.

I actually came up with the idea a really long time ago, back when Service Clients were just a vague idea in the minds of drax, theefer, and the wanderers.

As I live in Sweden, home of the fast Internets, I know that a whole lot of people would be very happy if their favorite music player had easy access to everyone’s favorite, The Pirate Bay, for getting more content.

A typical scenario would be that I’m playing some song by Timbuktu, and my music player automagically notices that I’m missing that new single that Timbuktu, one of Sweden’s most popular artists, officially released first to the world on The Pirate Bay *hint hint hint all other artists*, and then presents a link to that torrent for me to click on and download using my favorite torrent client.

This feature is so hot that ALL XMMS2 clients should have it, thus we wanted to do this as a Service Client.

So late Saturday afternoon, just before we left the Googleplex, I started to update the xmmsclient Python bindings to match the Service Client branch my student had written during GSoC. Meanwhile drax was working on getting his web client ready, plus some helpers to compute the string distance between Freebase data and some mock Pirate Bay torrent names. Due to jetlag my evening ended early, but when I woke up somewhere around 3AM I had a great message from The Pirate Bay waiting for me about getting early access to their upcoming webservice API. The rest of Sunday was spent frantically hacking the Python bindings so that we could have a running demo before I had to leave for the airport, and it worked! Around 2.45PM we made the first working request from the service client, and I ran to the bus.

So to summarize what this client does:

  1. Register as a service client that accepts an artist (string) as argument.
  2. Accept a request.
  3. Find albums by the artist in the media library.
  4. Find albums by the artist in Freebase.
  5. Find albums by the artist on The Pirate Bay.
  6. Subtract the albums in the media library from the albums returned by Freebase.
  7. Calculate the string distance between what’s left of the Freebase result and The Pirate Bay result, to map good names to correct but crappy torrent names.
  8. Return a list of albums missing from the media library for the artist, with links to download.
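Steps 6 and 7 can be sketched in a few lines of Python, with difflib standing in for the string-distance helper drax wrote (album and torrent names below are just examples):

```python
import difflib

def missing_albums(medialib, freebase):
    """Step 6: subtract the albums we already have from Freebase's list."""
    have = {album.lower() for album in medialib}
    return [album for album in freebase if album.lower() not in have]

def best_torrent(album, torrents):
    """Step 7: pick the torrent name closest to the proper album name."""
    def score(name):
        return difflib.SequenceMatcher(None, album.lower(), name.lower()).ratio()
    return max(torrents, key=score)

medialib = ["Oberoendeframkallande"]
freebase = ["Oberoendeframkallande", "Alla vill till himmelen"]
torrents = ["Timbuktu-Alla.Vill.Till.Himmelen-2005-SWE",
            "Some.Other.Album-2003-GRP"]

for album in missing_albums(medialib, freebase):
    print(album, "->", best_torrent(album, torrents))
```

The fuzzy matching is what makes the good names from Freebase line up with the crappy scene-style torrent names.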

Right… and the name Skidbladnir refers to the ship of Freyr that sails the Scandinavian intern^H waters with fair wind, and folds easily into one’s pocket.

It’s time again for the annual Hackathon, where a heap of creative people meet up and bash their heads against their keyboards until something cool comes out. Be there, write the code, spread the source…

Details can be found at the official page here.

PTSD, I miss google…
I want to go back…
I want to hack…
take me back…

Buildbot is nice… I use it for the XMMS2 project, and I use it at work. However, the hellspawn OS known as Windows likes to tell the user when some executable crashes. This might be nice and user-friendly, as it’s a pretty common scenario that applications crash on this OS, but when running unit tests in a buildbot slave it causes the slave to hang instead, as there’s nobody watching the screen and clicking the OK button in the dialog box. Killing the application with the Win32 API doesn’t help, as the message box is heavily guarded by the OS (…or rather, is the result of another application intercepting the crash). I bet others have stumbled upon this disturbing issue, and like me don’t know that much about the OS in question, so here’s one solution that works:

  • Disable problem reporting in ”Properties in My Computer”
  • Disable JIT debugging in ”Tools->Options->Debug” in Visual Studio versions
  • Enable Dr. Watson by running drwtsn32.exe -i

After reading an article over at wired.com earlier this week, I wondered how fast I type these days. The girl in the article managed to type 120 words per minute, and I know I was pretty fast in school when we worked on typewriters, so: time to put it to the test.

First I tried copying a Swedish text from paper while a friend stood next to me with a stopwatch; the result measured around 80 wpm (although I think I could type a bit faster given a couple more tries).

Next up was finding a program on the computer that just gave me random words and performed the same measurement. After a quick search with apt I found the package ‘typespeed’ and gave it a spin. A couple of tries later I didn’t even manage to get past 52 words per minute, but this test was in English and there were some words I hadn’t heard of before. Still, both 80 and 52 are WAY less than 120.

Ok… so finally time to get to the point. This ReCaptcha thing has been around for a while now. For those of you who haven’t heard about this wonderful project, here comes a short summary: ReCaptcha is like a regular Captcha, but with two words: one known word, and one word that a computer has failed to interpret while digitizing a book. So each time you solve a Captcha, you help open up the world of cyberspace to a dear old book. Each word is sent out to a lot of Captchas, thus providing a kind of voting mechanism that filters out the typos.

Ok… so finally time to get to the point (really). It would be pretty damn neat to have a typespeed program that fed you a specified ratio of unknown words for you to train your fingers on. This would of course not distort the actual word before displaying it, as is done with Captchas. Unfortunately I guess it’s a bit more common for people to have to fill in Captchas for blogs etc. than for people to try to improve their typing speed using some silly program.
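A toy sketch of the idea, mixing a configurable ratio of unknown words into the word stream (the function name and word lists are made up):

```python
import random

def word_stream(known, unknown, ratio, rng=random):
    """Yield words forever; roughly `ratio` of them come from `unknown`."""
    while True:
        pool = unknown if unknown and rng.random() < ratio else known
        yield rng.choice(pool)

# Example: one ReCaptcha-style unknown word in five, on average.
rng = random.Random(42)
stream = word_stream(["the", "quick", "brown", "fox"],
                     ["tergiversate", "mellifluous"], 0.2, rng)
print([next(stream) for _ in range(10)])
```

A real typespeed variant would of course also time the keystrokes and report the transcriptions back.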

Anyways, way past bedtime (perhaps the reason for the idea in the first place), so good night intarweb, see you tomorrow!

In January this year I rearranged my living room. After the furniture was in position and the cables were starting to find their new routes, I realized that my TV antenna cable was too short.

At the time it wasn’t unusual for me to get caught in front of the TV, zapping channels, watching random crap, and by doing so I also polluted my brain with those evil commercials that NOBODY wants to watch, but most of us watch anyway.

Instead of going out to buy a new one (so that I could continue to destroy my brain, bit by bit) I decided to simply ditch oldschool TV, and consume all content on demand.

Most series I follow were already available for download from great websites that really understand how to treat a consumer. The missing piece was my absolute favorite TV station, Swedish Television, which has given me hours of interesting content since I was a kid.

At the time Swedish Television was accessible only through the web browser, via Real Media or Windows Media Player, but after browsing some HTML source I managed to grab the URL to the Windows Media stream and was able to play it successfully with Xbox Media Center. Using the Python scripting interface to XBMC, I could replicate the web interface as a browsable directory tree with all the content Swedish Television decided to put online (which is quite a lot of material). I eventually released the script on the XBMC scripts portal for others to download, and it seemed like others liked the idea too. To this day I’ve had 16804 downloads, and lots of positive emails that made it all even more worthwhile.
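The scraping step boils down to fishing the stream URL out of the page source; a sketch under the assumption that the link sits plainly in the markup (the HTML and URL below are fabricated, and the real XBMC script did quite a bit more than this):

```python
import re

def find_stream_urls(html):
    """Collect mms:// and http(s):// links ending in .wmv or .asx."""
    return re.findall(r'(?:mms://|https?://)[^\s"\'<>]+\.(?:wmv|asx)', html)

# Fabricated page source; the real pages were more convoluted.
page = '''<object>
  <param name="URL" value="mms://example.invalid/svt/rapport_123.wmv">
</object>'''
print(find_stream_urls(page))
```

Once you have the URL, any player that speaks the protocol (like XBMC) can take over.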

I’ve been free from commercials for almost a year now, and I must say it feels great. Visiting friends who still watch oldschool TV feels like walking through a time portal, and it’s kind of scary to watch commercials when you’re not used to them, they feel pretty offensive (which of course is the purpose).

Ok, so 2007 has been a great year, but what about 2008? The future of Swedish Television on the web is uncertain. This weekend I had to update the script because the short news clips are now distributed using Macromedia Flash. Luckily the URL to the .flv files could be found in the HTML source, and XBMC can play all formats known to man, so I still have access to all the content I’m interested in. But Macromedia Flash isn’t the only new addition to the web portal. A couple of months ago Swedish Television started to distribute some content with Move Media Player, which allows for higher quality streams but requires the consumer to install additional software on their computers. When I saw the news I did a little digging to see if I could access the stream, unfortunately without much luck (it may still be possible to find out where to reach them). During the digging I found out a bit more about Move Media Player, and it seems to be using the VP7 codec, which FFMPEG currently doesn’t support (but hopefully will soon). So using VP7 on the Xbox is a no-go, for now at least.

FFMPEG not supporting VP7 isn’t that big of a deal. The frustration lies in that Swedish Television is funded by tax money and the obligatory TV licence that you must pay if you have a TV, or a computer with a TV card, and live in Sweden. We fund the production of their content, and they repay us with restrictions on that content by using proprietary and patent-damaged solutions.

This is NOT ok!

Swedish Television takes pride in their claim to be Free as in Freedom. I fully agree when it comes to their content, but by locking that Free as in Freedom content into proprietary and patent-damaged distribution forms they completely invalidate their claim.

Their locked-down agenda also stretches outside their own web portal. In an attempt to gain more publicity among the younger crowd of Internet-addicted kids, they started to publish some of their content on YouTube, which offers really crappy video quality and uses the non-Free Macromedia Flash format for distribution. I can see why they do this, as a LOT of people use YouTube; by creating a link between YouTube and their own web portal they are likely to get more publicity, at least to some degree.

However, YouTube isn’t the only external site with damaged distribution forms they’ve looked at. The latest way to gain more viewers has been to establish a relationship with a new site called Joost. This site is run by the founders of Skype, and is even more closed down, requiring you to create an account to watch the TV programs that the population of Sweden has collectively funded. Sorry, I don’t see the public interest in associating a person with a TV program; maybe someone can enlighten me? I haven’t verified what codec Joost uses, but as Skype uses VP7, that is probably a good bet. But to play content from Joost you would also have to blindly reimplement their P2P technology, which will most likely never happen.

I wonder how much money has poured into the pockets of Microsoft, Real Media and Move Media for providing the current solutions, and Joost for the new sidetrack, and the project of uploading their content to YouTube. I wonder how many hours of developer time those price tags could have translated to if Swedish Television had hired a group of good programmers to write the necessary tools and modifications to get a Free solution using Theora or Dirac up and running.

So, in my eyes the future of video on demand from Swedish Television using Free software, without stepping on patents, is in grave danger, especially when (and it’s probably just a question of when) the EU opens its arms to US patents. For those who agree with me, I strongly suggest you sign this petition in hope of seeing the Free as in Freedom content delivered in Free as in Freedom distribution forms:

http://www.namninsamling.se/index.php?sida=2&nid=1120

Update:
The Xbox Media Center script, Sveriges Television 0.94, can be found here.