Searching Beast and WordPress from a Rails app
published May 17, 2008
I did some work for ecolect recently, integrating search across their main site, a Beast forum and a WordPress blog. It was pretty straightforward once I had it figured out, but I couldn’t find a walkthrough on the net.
So, I decided to write one.
The Search Engine
Searching the Beast Forum
(Note: the Beast forum search isn’t live at the moment, as the forums are not fully functioning yet.)
In Beast, each Forum has many Topics (saved in the topics table), and each Topic has many Posts (saved in the posts table).
In this setup, the Beast tables live in the main site’s database, so the posts and topics tables that we want to search are already there. This made things pretty easy: you can search the Beast forum just like you would any other database table.
Searching a non-integrated Beast forum
If you don’t have the Beast tables integrated into your main site’s database, you can still search them. You just need to point the Topic and Post models to the correct database. This is a two-step process.
First, set up a database entry for your Beast forum in config/database.yml. Something like this:
Then, at the bottom of config/environment.rb, add the following lines:
I haven’t actually tried this, so let me know if you get it working or if you needed to make any changes to what I’ve written here.
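As a rough sketch of those two steps (not the original code, and untested against a real Beast install): the entry name beast, every connection value, and the helper name point_forum_models_at are placeholders invented for illustration. The YAML is embedded as a string only so the sketch runs on its own; in a real app it would live in config/database.yml.

```ruby
require "yaml"

# Hypothetical config/database.yml entry for the forum's database.
# All values are placeholders.
BEAST_DATABASE_YML = <<~YAML
  beast:
    adapter: mysql
    host: localhost
    database: beast_production
    username: beast
    password: secret
YAML

beast_config = YAML.safe_load(BEAST_DATABASE_YML).fetch("beast")

# Hypothetical helper: the body is what would go at the bottom of
# config/environment.rb, where the Topic and Post models are available.
# It is only defined here, not called, so the sketch runs outside Rails.
def point_forum_models_at(config)
  [Topic, Post].each { |model| model.establish_connection(config) }
end
```

Inside Rails, where database.yml has already been loaded, passing the entry name instead of a hash (for example Topic.establish_connection "beast") should have the same effect.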
The Topic Model
Even with the integrated setup there were a few wrinkles. First, although the Beast tables are in the database, there are no models associated with them. I wanted to search post bodies and topic titles, so I created Topic and Post models.
Here’s the Topic model. It’s only here so that the Post model can search a post’s topic title, so there’s not much to it.
The Post Model
The Post model is a bit more complicated.
The acts_as_ferret declaration makes the Post model searchable. Notice that the actual fields being searched are not taken directly from the database; they are both manipulated in some way. acts_as_ferret doesn’t really care if the stuff it is indexing is coming directly from the database or from methods you have added to your model.
Scrubbing the HTML tags
The post body is stored with HTML tags in it, so I wanted to search and show the posts with the tags scrubbed out. This is done using the Post#scrubbed_body method, which is just an ugly regexp that takes out anything between < and > signs.
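A minimal sketch of the two computed fields, using plain Ruby stand-ins instead of the real ActiveRecord models so that it runs without Rails. The regexp is one possible way to drop anything between < and >, and topic_title is a hypothetical name for the second indexed field; neither is taken from the original code.

```ruby
# Stand-ins for the ActiveRecord models (hypothetical names).
TopicSketch = Struct.new(:title)

PostSketch = Struct.new(:body, :topic) do
  # Remove everything from a "<" up to the next ">".
  def scrubbed_body
    body.to_s.gsub(/<[^>]*>/, "")
  end

  # Hypothetical method exposing the parent topic's title for indexing.
  def topic_title
    topic ? topic.title : nil
  end
end

# In the real model, the declaration would presumably name these methods:
#   acts_as_ferret :fields => [:scrubbed_body, :topic_title]

post = PostSketch.new("<p>Ferret is <b>fast</b></p>", TopicSketch.new("Search tips"))
puts post.scrubbed_body
puts post.topic_title
```

A regexp like this also eats ordinary text that happens to sit between a literal < and >, which is fine for search snippets but not a substitute for a real HTML sanitizer.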
The url method
I also wanted to link to the posts, so I created a Post#url method which is used in the view.
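A url method of this kind can be as small as a string built from the post's own columns. The sketch below assumes the posts table carries forum_id and topic_id, and the path layout is purely illustrative; check your forum's routes for the real one.

```ruby
# Stand-in for the Post model; only the columns the link needs.
PostLink = Struct.new(:id, :forum_id, :topic_id) do
  # Hypothetical path layout: the topic page, anchored at this post.
  def url
    "/forums/#{forum_id}/topics/#{topic_id}#posts-#{id}"
  end
end

puts PostLink.new(12, 1, 5).url
```

Keeping the link on the model means the search results view only has to call url on each hit, whichever model the hit came from.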
Finally, the actual search is done using the FullTextSearch mixin, which adds a class method, full_text_search, to the model. The FullTextSearch mixin is described in more detail below.
Searching the WordPress Blog
The only table from the WordPress blog that you really care about is the wp_posts table. To get access to it in your Rails app, make a WpPost model and point it at your WordPress database.
First, create a database entry in config/database.yml that looks like this:
Then, add the following line to the bottom of
Here’s the model:
There are a few things to note here:
- WordPress uses ID, rather than id, as its primary key. The line primary_key = "ID" lets Rails know about that. You also need to add an id method that returns ID so that ferret indexes things properly.
- You will need to scrub the HTML tags from the content and title; that’s what the scrubbed_content method and its title counterpart do.
- Finally, you don’t want the search index to include assets (which are stored in the wp_posts table as well) or any unpublished posts.
  - Real posts will have a post_type of "post".
  - An unpublished post won’t have its guid set.
This is taken care of by only returning titles or content if post_type == "post" and !guid.empty?.
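The points above can be sketched in plain Ruby. This is a stand-in class wrapping a hash of column values rather than the real ActiveRecord model, so it runs without Rails; the names WpPostSketch, indexable?, scrubbed_title and strip_tags are made up for illustration, and returning nil for rows that should stay out of the index is an assumption.

```ruby
class WpPostSketch
  # row is a Hash keyed by WordPress column name.
  def initialize(row)
    @row = row
  end

  # WordPress names its key column "ID"; expose it as id too.
  def id
    @row["ID"]
  end

  # Only published, real posts should contribute text to the index.
  def indexable?
    @row["post_type"] == "post" && !@row["guid"].to_s.empty?
  end

  def scrubbed_title
    strip_tags(@row["post_title"]) if indexable?
  end

  def scrubbed_content
    strip_tags(@row["post_content"]) if indexable?
  end

  private

  # Drop anything between "<" and the next ">".
  def strip_tags(text)
    text.to_s.gsub(/<[^>]*>/, "")
  end
end

published = WpPostSketch.new(
  "ID" => 7, "post_type" => "post", "guid" => "http://example.com/?p=7",
  "post_title" => "<em>Hello</em>", "post_content" => "<p>Body text</p>"
)
attachment = WpPostSketch.new(
  "ID" => 8, "post_type" => "attachment", "guid" => "http://example.com/a.png",
  "post_title" => "a.png", "post_content" => ""
)
draft = WpPostSketch.new(
  "ID" => 9, "post_type" => "post", "guid" => "",
  "post_title" => "Draft", "post_content" => "Not yet"
)
```

Because the gated methods return nothing for attachments and drafts, those rows end up with no searchable text.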
The FullTextSearch mixin
This is based on code from Roman Mackovcak’s article on full text search in Rails. All I did was extract the method he provides into a mixin so I could use it in multiple models.
To use the mixin in a model, the model needs to define SEARCH_FIELDS and have an acts_as_ferret declaration. SEARCH_FIELDS is an array of symbols giving the model fields to be searched.
You use it like this:
The full_text_search method returns an array of length two. The first value in the array is the number of search results, and the second is the search results themselves.
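For readers without the original code to hand, here is a hedged sketch of what a mixin with this contract could look like. It is not Roman Mackovcak's or this post's actual code. It assumes the model responds to acts_as_ferret's find_by_contents and that the result exposes total_hits. The way SEARCH_FIELDS is folded into the query string, the pagination defaults, and the FakeResults and FakePost stand-ins (there only so the sketch runs without Rails) are all assumptions.

```ruby
module FullTextSearch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Returns [total_hits, results], or nil for a blank query.
    def full_text_search(query, options = {})
      return nil if query.nil? || query.strip.empty?

      options = { :limit => 10, :page => 1 }.merge(options)
      page = options.delete(:page).to_i
      options[:offset] = options[:limit] * (page - 1)

      # Assumption: restrict the search to SEARCH_FIELDS by prefixing
      # each field name to the query.
      fields = const_get(:SEARCH_FIELDS)
      field_query = fields.map { |f| "#{f}:(#{query})" }.join(" OR ")

      results = find_by_contents(field_query, options)
      [results.total_hits, results]
    end
  end
end

# Stand-ins for acts_as_ferret's result object and an indexed model.
class FakeResults < Array
  attr_accessor :total_hits
end

class FakePost
  SEARCH_FIELDS = [:scrubbed_body, :topic_title]
  include FullTextSearch

  # Pretends every query matches the same three records.
  def self.find_by_contents(_query, options = {})
    all = ["first hit", "second hit", "third hit"]
    page = FakeResults.new(all[options[:offset], options[:limit]] || [])
    page.total_hits = all.size
    page
  end
end

total, posts = FakePost.full_text_search("ferret", :limit => 2, :page => 2)
```

Returning the total alongside the current page is what lets a view print a result count while paginating.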
Re-indexing the non-local models
As Ruben pointed out in the comments, I forgot to mention how I deal with re-indexing the WordPress and Beast database tables. This is necessary as these tables have data that is modified by another application, so your Rails app doesn’t know that changes have been made to them.
To deal with this, I wrote a simple Rake task that reindexes the WpPost and Post models, and then added a line to the crontab to run it hourly. Here’s the rake file, which I put in
and, for posterity, here’s the crontab line:
20 * * * * cd <rails_root> && /usr/local/bin/rake ferret:rebuild_nonlocal_indices >> <log_directory>/ferret_reindex.log 2>&1
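A task matching the name in that crontab line might look roughly like the sketch below. It assumes acts_as_ferret's rebuild_index class method and the standard Rails :environment task; the model list and the log message are guesses rather than the original file's contents.

```ruby
# Only needed to load this file on its own; a .rake file inside a
# Rails app already has the Rake DSL available.
require "rake"

namespace :ferret do
  desc "Rebuild ferret indices for tables that other applications write to"
  task :rebuild_nonlocal_indices => :environment do
    # Post and WpPost are only looked up when the task actually runs.
    [Post, WpPost].each do |model|
      puts "Rebuilding index for #{model.name}"
      model.rebuild_index
    end
  end
end
```

A full rebuild every hour is simple but does more work as the tables grow, so the cron interval is worth revisiting on a large forum or blog.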