Facepunch forums are pretty huge. There are over 20 million posts in the database. This is cool, but the database regularly locks up when searching. There are a number of ways to try to get around this: a better database server, faster hard drives, pruning posts, slave databases. But whatever you do, sooner or later, as your post count increases, you're going to have to do it again.
So last week I looked into Amazon CloudSearch. Put simply, you throw all your posts at this service and it provides an API for you to search them. This means that search isn't using your database anymore, so your database doesn't need to be half as powerful, and doesn't keep locking up.
Inserting Posts
First of all, let me start by saying that you can put anything in here. It doesn't have to be plain text; you can throw pretty much any document at it and it'll make it searchable.
The way I do it on our forum is simple. I added an "indexed" field to the posts table. When a post is created or edited, it sets the "indexed" field to 0.
Then I set up a cron job that scans the posts table every minute and uploads any posts with indexed=0 to CloudSearch. You upload to CloudSearch by POSTing batches of documents to it over HTTP. It couldn't be easier. Here's how I build mine.
So obviously I've selected the appropriate posts from the database, then I loop through them all, adding them to an array. I skip posts shorter than 10 characters, and I strip non-ASCII characters from the posts to save space (you don't have to do this, I just decided it'd be a waste of time preserving these characters since we're an English forum).
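A minimal Python sketch of that loop, not the forum's actual code. The row keys (`postid`, `text` and so on), the `post_` id prefix and the choice to check the length after stripping are assumptions made for illustration. The `type`/`id`/`fields` shape follows the 2013-01-01 document batch format; the older API version also expected extra keys per document, so check the docs for the version you're on.

```python
MIN_LENGTH = 10  # posts shorter than this are skipped


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def build_batch(rows):
    """Turn post rows (dicts) into a list of CloudSearch "add" operations."""
    batch = []
    for row in rows:
        text = strip_non_ascii(row["text"]).strip()
        # Assumption: the length check runs on the stripped text.
        if len(text) < MIN_LENGTH:
            continue
        batch.append({
            "type": "add",
            "id": f"post_{row['postid']}",  # hypothetical id scheme
            "fields": {
                "text": text,
                "date": row["date"],
                "forumid": row["forumid"],
                "threadid": row["threadid"],
                "userid": row["userid"],
            },
        })
    return batch
```

The extra fields ride along with every document so they can be used for filtering and faceting later.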
Then it's just a case of converting the array to JSON.
And sending that baby to Amazon.
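Those two steps could look roughly like this in Python, using only the standard library. The endpoint hostname you pass in is whatever your own domain was given, the `2013-01-01` API version is an assumption (this post may predate it), and the sketch assumes the domain's access policy accepts unsigned requests from your server.

```python
import json
import urllib.request

API_VERSION = "2013-01-01"  # assumption: use the version your domain was created with


def build_upload_request(doc_endpoint: str, batch: list) -> urllib.request.Request:
    """Serialize the batch to JSON and wrap it in a POST request."""
    body = json.dumps(batch).encode("utf-8")
    url = f"https://{doc_endpoint}/{API_VERSION}/documents/batch"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(doc_endpoint: str, batch: list, timeout: int = 30) -> bool:
    """Send the batch; return True if the service reports success.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx response,
    so callers should be ready to catch that and retry later.
    """
    request = build_upload_request(doc_endpoint, batch)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return result.get("status") == "success"
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before pointing it at a real domain.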
And if it succeeds, mark them as indexed.
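To show the whole "indexed" flag cycle end to end, here is a small sketch using SQLite. The table layout, the function names and the `upload` callback are all hypothetical stand-ins; the real forum runs on a different database and schema.

```python
import sqlite3


def fetch_unindexed(conn: sqlite3.Connection, limit: int = 1000):
    """Return (postid, text) rows that still need uploading."""
    cursor = conn.execute(
        "SELECT postid, text FROM posts WHERE indexed = 0 "
        "ORDER BY postid LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


def mark_indexed(conn: sqlite3.Connection, post_ids) -> None:
    """Flag the given posts as indexed in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE posts SET indexed = 1 WHERE postid = ?",
            [(post_id,) for post_id in post_ids],
        )


def run_once(conn: sqlite3.Connection, upload) -> int:
    """One cron tick: upload pending posts, flag them only on success."""
    rows = fetch_unindexed(conn)
    if not rows:
        return 0
    if not upload(rows):
        return 0  # leave indexed = 0 so the next run retries
    mark_indexed(conn, [postid for postid, _ in rows])
    return len(rows)
```

Because the flag only flips after a successful upload, a failed request costs nothing: the same posts get picked up again a minute later.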
The Fields
You might have noticed that I send date, forumid, threadid and userid with the posts. This lets us search and filter by those fields too. So if you only want to find threads containing the word "Butt" in General Discussion, posted last week, by me, you can do that easily.
But more than that: when you search, it will also categorize your results. It'll show you the top x forumids with that word in, and the top userids, and the top threadids.
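As a rough illustration of working with those categories, this Python sketch pulls the top values for one field out of a search response. It assumes the 2013-01-01 JSON response shape, where each facet has a `buckets` list of `value`/`count` pairs (the older API version names things differently), and the sample numbers are made up.

```python
def top_facets(response: dict, field: str, limit: int = 5):
    """Return up to `limit` (value, count) pairs for a facet, biggest first."""
    buckets = response.get("facets", {}).get(field, {}).get("buckets", [])
    ranked = sorted(buckets, key=lambda bucket: bucket["count"], reverse=True)
    return [(bucket["value"], bucket["count"]) for bucket in ranked[:limit]]


# Made-up response fragment, trimmed down to the part used here.
sample_response = {
    "facets": {
        "threadid": {
            "buckets": [
                {"value": "1125222", "count": 12},
                {"value": "900001", "count": 40},
            ]
        }
    }
}

for thread_id, count in top_facets(sample_response, "threadid"):
    print(f"thread {thread_id}: {count} matching posts")
```

A field that wasn't requested as a facet simply yields an empty list, so the same helper works for forumid, userid and threadid.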
This allows you to show the results in a way that lets people drill down to find what they want. For example, this search:
http://api.facepunch.com/#/page/1/forum//thread//user//search/max%20payne
You can see that it's most mentioned in the Max Payne thread, and most mentioned by "A Big Fat Ass". But it's also mentioned in the Max Payne 3 thread a lot, and maybe that's what we're looking for. So clicking on the thread restricts the results to that thread.
http://api.facepunch.com/#/page/1/forum//thread/1125222/user//search/max%20payne
Not a super example of why this is cool, but compare that process to vBulletin's search:
Searching shouldn't be THIS much work.
Getting Results
Searching is as easy as opening a URL, because that's all it is. When you create your search domain you'll be given a unique URL to query. You can do this from inside your site (I wrap it in an API), or you could query it directly via an AJAX request. Their dashboard lets you run test queries, so you can make sure it's working.
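Here is a sketch of building such a URL in Python. The parameter names (`q`, `size`, `facet.<field>`, `fq`) and the `threadid:<n>` filter shorthand are written from memory of the 2013-01-01 search API and should be treated as assumptions to check against the docs for your API version. The function name and the endpoint hostname are hypothetical.

```python
from urllib.parse import urlencode

API_VERSION = "2013-01-01"  # assumption: match your domain's API version


def build_search_url(search_endpoint, query, facets=(), threadid=None, size=10):
    """Build a search URL with optional facets and a thread filter."""
    params = [("q", query), ("size", str(size))]
    for field in facets:
        # Assumption: an empty JSON object asks for default facet options.
        params.append((f"facet.{field}", "{}"))
    if threadid is not None:
        # Assumption: fq narrows the results without affecting scoring.
        params.append(("fq", f"threadid:{threadid}"))
    return f"https://{search_endpoint}/{API_VERSION}/search?{urlencode(params)}"


url = build_search_url(
    "search-example.us-east-1.cloudsearch.amazonaws.com",  # hypothetical endpoint
    "max payne",
    facets=("forumid", "threadid", "userid"),
    threadid=1125222,
)
print(url)
```

Fetching that URL from your own server, rather than straight from the browser, keeps the endpoint private and gives you a place to cache or rate-limit searches.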
Price
This is where the usefulness will probably drop out for most people. The price starts at around $80 a month. So if you're providing search for one small thing, it probably wouldn't be worth it.
It scales with usage. So if you have a hundred million entries, and you're querying 1000 times a second, it's gonna cost a lot more than $80. But on the upside, performance will remain the same.
Right now we have nearly 3 million posts indexed and we're still on a single small search instance (it scales automatically). I am expecting this to change to the large type soon (which is around $350 a month).
Summary
For us, even at $350 a month, I think it's worth it, for these three reasons:
- Search Results are Instant
- Search Results are Better
- Takes pressure off the Database