Optimizing the search beast

One of the projects I’m working on is a product catalog update service. Basically, it provides a couple of web services that update the database following some rules. One of the services receives a review and updates the metadata according to an analysis of the text. When I joined the project, everything was already implemented but it was slow, so my job was to help optimize the code. I love it when projects follow the “design first, optimize later” idea.

The model has a Searcher class that looks for a series of terms in the text and returns the ones contained in it. The performance problem began when we started using a real database containing 50,000 terms. Let’s assume that the search algorithm itself is really fast, so the cost comes from how many searches we run, not from each one.
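To make the starting point concrete, here is a minimal Python sketch of what such a searcher might look like. It is not the project's actual code: the function name `find_terms_naive`, the sample data, and the use of a case-insensitive substring check as the "really fast" search are all assumptions made for illustration.

```python
def find_terms_naive(text, terms):
    """Return every term that occurs in text, testing all terms one by one.

    Assumption: a term "occurs" if it appears as a case-insensitive
    substring of the text. The real Searcher may match differently.
    """
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# Hypothetical sample data, standing in for a review and the term database.
review = "The noise cancelling headphones have great battery life"
catalog = ["battery life", "noise cancelling", "touch screen"]

print(find_terms_naive(review, catalog))
```

With this shape, every incoming review triggers one search per stored term, so the work grows with the size of the term database rather than with the length of the review.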

Since the text we receive has something like 1,000 words, our first idea was to build a single text from the 50,000 terms and look for each of the 1,000 words in it. It didn’t work, because we have to look for terms, not words, and terms can contain spaces, so it’s impossible to know where to cut the text.
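A small Python sketch can show why going word by word falls short. It simplifies the idea by putting the terms in a set instead of one big text, and the function name and sample data are made up for the example.

```python
def find_words_in_term_set(text, terms):
    """Look up each word of the text among the stored terms.

    Assumption: the text is cut on whitespace, which is exactly the
    step that cannot work for terms containing spaces.
    """
    term_set = {term.lower() for term in terms}
    return [word for word in text.lower().split() if word in term_set]


# Hypothetical sample data.
terms = ["battery", "battery life", "noise cancelling"]
text = "great battery life and noise cancelling"

found = find_words_in_term_set(text, terms)
print(found)  # only the single-word term is reported
```

Both multi-word terms are present in the text, yet neither is reported, because no single word equals "battery life" or "noise cancelling".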

Analyzing the stored terms, we realized that several terms were similar or, more interestingly, started with the same word. So we indexed the terms by their first word and looked for those words first. Once we knew which first words appear in the text, we could search only for the terms starting with those words, which reduces the number of terms to look for.
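The two-step idea could be sketched in Python roughly as follows. This is an illustration under assumptions, not the code we shipped: the names `build_first_word_index` and `find_terms_indexed` are hypothetical, terms are split on whitespace to get their first word, and matching is a case-insensitive substring check.

```python
def build_first_word_index(terms):
    """Group terms by their first word (built once, reused for every text)."""
    index = {}
    for term in terms:
        words = term.lower().split()
        if words:  # skip empty terms
            index.setdefault(words[0], []).append(term)
    return index


def find_terms_indexed(text, index):
    """Search first words, then only the terms that start with a found word."""
    lowered = text.lower()
    found = []
    for first_word, candidates in index.items():
        # Step 1: one cheap search per distinct first word.
        if first_word not in lowered:
            continue
        # Step 2: full search only for terms sharing that first word.
        for term in candidates:
            if term.lower() in lowered:
                found.append(term)
    return found


# Hypothetical sample data.
catalog = ["battery life", "battery charger", "noise cancelling", "touch screen"]
review = "The noise cancelling headphones have great battery life"

index = build_first_word_index(catalog)
print(find_terms_indexed(review, index))
```

Here "touch screen" is never searched in full, because its first word "touch" does not appear in the review. The saving grows with the number of terms that share a first word absent from the text.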

The number of “first words” was ~3,500, and searching for them ran really fast. In the end, the new code performed 25 times faster than the old one :). I was really impressed.
