MusicBrainz is having grief with its acoustic fingerprint server.

I thought I'd try a modification of the Markov-chaining playlist generator for matching tracks.

The good news is that the matching works well. Extracting 180 distinct features yields the correct answer 83.4% of the time when querying for the nearest neighbour. That number goes up to 87.8% if you query for the 3 nearest neighbours.

The bad news is that it works better the more features are extracted. This is bad because the matching process is finding the nearest neighbours to a vector in an n-dimensional space, where n is the number of features extracted. For example, if 180 features are extracted, then a query to find which track in the database is closest translates to finding the nearest neighbouring point in 180-dimensional space. This turns out to be a hard problem. Most spatial indexing systems (R-Trees, X-Trees, M-Trees) suffer from "the curse of dimensionality" -- query times grow with the square of the dimension.
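To make the query concrete, here's a minimal brute-force k-nearest-neighbour sketch (not the actual matcher). The track names and 4-dimensional vectors are toy stand-ins for the real 180-dimensional feature vectors; even this linear scan pays for every extra dimension in every distance computation, and tree-based indexes fare worse as the dimension grows.

```python
import math

def knn(query, database, k=3):
    """Brute-force k-nearest-neighbour search over feature vectors.

    Each distance computation is O(n) in the number of features n,
    and a linear scan visits every track; spatial indexes try to
    avoid the full scan but degrade as n rises.
    """
    scored = sorted(database.items(),
                    key=lambda kv: math.dist(query, kv[1]))
    return [track for track, _ in scored[:k]]

# Toy 4-dimensional feature vectors (illustrative values only).
db = {
    "track_a": [0.1, 0.9, 0.3, 0.5],
    "track_b": [0.2, 0.8, 0.4, 0.5],
    "track_c": [0.9, 0.1, 0.7, 0.2],
}
print(knn([0.12, 0.88, 0.32, 0.5], db, k=2))  # → ['track_a', 'track_b']
```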

The good news is that there appear to be techniques for avoiding the so-called "curse of dimensionality": the Pyramid Technique, iDistance, and the P+-tree.

The pieces

The puzzle

After a few iterations, here's the recipe I came up with.

The creation of the decision tree determines what constitutes a feature. Care must be taken to ensure that the decision tree creation process is tolerant of encoder artifacts.
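The post doesn't spell out the tree itself, but the idea of a decision tree turning continuous audio measurements into discrete feature values might look like this toy sketch. The measurement names and thresholds are purely illustrative; in practice the splits would be learned from pairs of original and re-encoded audio, so that thresholds land where encoder artifacts rarely push a value across them.

```python
# Toy sketch (not the actual MusicBrainz code): a fixed decision tree
# that quantises continuous measurements into a discrete feature id.
# Thresholds are illustrative; robustness comes from placing splits
# well clear of the perturbations typical encoders introduce.

def feature_id(spectral_flatness, zero_crossing_rate):
    if spectral_flatness < 0.25:               # tonal material
        return 0 if zero_crossing_rate < 0.1 else 1
    elif spectral_flatness < 0.6:              # mixed material
        return 2
    else:                                      # noise-like material
        return 3
```

A small perturbation of the inputs (say, ±0.02 from lossy encoding) leaves the leaf assignment unchanged for most points, which is exactly the tolerance the tree-creation process needs to guarantee.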

This could potentially map onto something similar to the current TRM approach:

There are quite a few improvements left to be made to the basic approach. First, it needs to scale well, both in database size and in number of dimensions; the M-Trees currently being used do not. There is also a large amount of tuning that could be done on the decision tree to improve the quality of matches.