13 August 2008 at 09:31 by Alan Capel - Head of Content
Posted under News

We have been working on changes to our search engine to improve the search results for our customers. We run our own search engine, with a dedicated team working on it, which gives us greater flexibility and lets us make changes more quickly.
We have spent a lot of time analysing customer behaviour and looking at where our current search engine lets us down. Customers value the breadth of our content, so we are keen to ensure that the relevance of search results is maintained and, where possible, improved.
Recent changes include the following:
Turning off ‘stemming’
Previously, the Alamy search engine used stemming: it reduced each search word to its main ‘stem’ and also included, in the search calculation, words that shared that ‘stem’. Sounds complicated?
Here are some examples:
- A search for ‘Dog’ would also return images keyworded ‘Dogs’. That is fine in the vast majority of cases (except, perhaps, when the images were of the ‘Isle of Dogs’!), so for plurals stemming usually worked.
However, it wasn’t just used on plurals:
- A search for ‘ski’ would also return images keyworded ‘skies’, so plenty of clear blue skies without a skier in sight would appear. Not good for the customer.
- A search for ‘Communication’ would also return images of ‘communism’.
- ‘Celebration’ would also return ‘celebrities’.
As you can see, many irrelevant images were appearing. A short while ago we asked for further examples and received some very valuable feedback from our contributors. We appreciated this help; in fact, it led us to the decision to turn off stemming completely. We could have tweaked the search mechanisms to account for these anomalies, but that would not have been a cost-effective, comprehensive or practical solution.
We have run a number of tests without stemming, and it vastly improved the results.
Now a search returns only images whose keywords or caption contain an exact match for the words searched for. Simply put, what you put in is what you get out.
Our keywording advice has always been to include plurals where relevant and not to rely on stemming, so we hope you will see the benefit of this change.
We apologise if you have been following the additional syntax rule of adding a ^ sign before any words you didn’t want stemmed. At the time, we felt this was the best solution. The good news is that the problem you were trying to solve no longer arises, and you do not need to remove the ^ sign.
As with all changes, we will review this one periodically over an extended period.
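To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not Alamy's search engine: the two-rule `toy_stem` function (loosely modelled on the first step of the Porter stemmer, turning ‘ies’ into ‘i’ and dropping a trailing ‘s’), the sample `IMAGES` keywords and the `search` helper are all hypothetical, chosen only to reproduce the ‘Dog’/‘Dogs’ and ‘ski’/‘skies’ cases above.

```python
def toy_stem(word: str) -> str:
    """Hypothetical two-rule stemmer, loosely modelled on step 1a of the
    Porter stemmer: 'ies' -> 'i', then a single trailing 's' is dropped."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"          # skies -> ski
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                # dogs -> dog (but 'dress' is left alone)
    return w


def exact(word: str) -> str:
    """No stemming: only letter case is normalised."""
    return word.lower()


# Hypothetical sample data: image id -> keywords applied by the contributor.
IMAGES = {
    "img1": ["dog", "park"],
    "img2": ["dogs", "beach"],
    "img3": ["ski", "skier", "snow"],
    "img4": ["blue", "skies"],
}


def search(query: str, images: dict[str, list[str]], stemming: bool) -> list[str]:
    """Return ids of images whose normalised keywords contain every query term."""
    normalise = toy_stem if stemming else exact
    terms = {normalise(t) for t in query.split()}
    return sorted(
        image_id
        for image_id, keywords in images.items()
        if terms <= {normalise(k) for k in keywords}
    )


print(search("ski", IMAGES, stemming=True))    # ['img3', 'img4']  (blue skies too)
print(search("ski", IMAGES, stemming=False))   # ['img3']
print(search("Dog", IMAGES, stemming=False))   # ['img1']  ('dogs' is not matched)
```

The last line shows why plurals matter in keywording once stemming is off: an image keyworded only ‘dogs’ is no longer found by a search for ‘dog’.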
Single character ‘stop words’
Our search engine ignores certain ‘stop words’, which helps search performance by reducing unnecessary work. Examples are words like ‘a’, ‘of’ and ‘if’.
This helps us, but by analysing customer searches we identified areas where it was counterproductive. All single characters were treated as stop words, which caused some problems:
- A search for ‘T shirt’ would treat the ‘T’ as a stop word, ignore it and return the same results as a search for ‘shirt’. For customers this was very frustrating, as they had to trawl through all the shirts to find the T shirts.
- This was happening more often than we would like; other examples include ‘L plates’, ‘X box’ and ‘Y fronts’.
Single characters are now included in searches and are no longer treated as stop words, so a search for ‘T shirt’ will return only images where both ‘T’ and ‘shirt’ appear.
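The stop-word change can be sketched in the same way. This minimal Python example is illustrative only: the `STOP_WORDS` set holds just the three examples from the post, and `query_terms` is a hypothetical helper, not part of Alamy's engine. The post does not say whether ‘a’ itself is still ignored; this sketch assumes the new rule applies to every single character, so ‘a’ is kept too.

```python
# Example stop words from the post; a real list would be much longer.
STOP_WORDS = {"a", "of", "if"}


def query_terms(query: str, single_chars_are_stop_words: bool) -> list[str]:
    """Split a query into the terms the engine actually searches for."""
    terms = []
    for term in query.lower().split():
        if len(term) == 1:
            # Old behaviour: every single character was discarded.
            # New behaviour (assumed here): every single character is kept.
            if single_chars_are_stop_words:
                continue
        elif term in STOP_WORDS:
            continue  # multi-character stop words are still ignored
        terms.append(term)
    return terms


print(query_terms("T shirt", single_chars_are_stop_words=True))       # ['shirt']
print(query_terms("T shirt", single_chars_are_stop_words=False))      # ['t', 'shirt']
print(query_terms("Isle of Dogs", single_chars_are_stop_words=False)) # ['isle', 'dogs']
```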
Apostrophes and special characters
Again, we have looked at where customers are seeing irrelevant images or missing relevant ones. We have improved how some special characters are handled and how we treat words that contain apostrophes. The end result is more relevant search results.
We are continuing to work on matching the annotation you apply with our search engine to ensure that customers get what they want. Keep an eye on the blog for further developments, including making use of the additional syntax and improving phrase searching.