Sometimes, there are cases when we need a Search Solution for a hobby project but don’t want to spend extra time and money in having a Lucene
or some other form of abstraction of Lucene
, such as Solr
or ElasticSearch
. If you’re working with MySQL
(or even better, with Postgres
), then you’re in luck – there’s a way to have a quick and dirty search solution for your dear project.
We could use ElasticSearch… but that would be expensive
Before we go on looking at the solution, let’s look at the problems with using ElasticSearch
for a hobby project in a bit more detail.
To use ElasticSearch
, you need to know what you want to index, and what analysers to use to effectively search through your documents.
You then have to implement a mechanism that triggers insertion or updating of new documents to your ElasticSearch
index – this is usually done through some form of messaging if you want to do this properly – if you go this route, you have to have something that’s getting notified about these new messages to insert documents to ElasticSearch
.
You could also go with the polling
(pull) route, where you poll the database for changes by looking at the timestamps and upserting
documents to ElasticSearch.
This is all a lot of extra effort, and quite a bit more money needed to set something like this up unless you have your own dedicated box.
MySQL LIKE Operator
Hmmmm… let’s see, one search solution could be as simple as using the LIKE
operator.
SELECT ...
WHERE column LIKE '%keyword%'
One problem with this though is that it scales poorly as your dataset becomes larger.
MySQL Full-text Search
The better way of implementing a poor man’s solution is using MySQL full-text search.
The first thing that you have to do in order to get full-text search working is indexing columns that you want to search against detailed in MySQL’s Manual
Generic Abstractions
Let’s first define some abstractions of data structures required for performing a search in the most generic way – keeping abstractions to the bare minimum avoiding headaches later.
public abstract class PaginatedDbQuery
{
public virtual int Skip { get; set; }
public virtual int Take { get; set; }
}
public class SearchResult<T> where T : class
{
public virtual IList<T> Hits { get; set; }
public virtual int HitCount { get; set; }
public virtual long TimeTakeInMs { get; set; }
}
Data Structures
In my example, I want to be abe to search jobs, with string match for location and keywords that search both job title and job description.
public class JobSearchQuery : PaginatedDbQuery
{
public const int DefaultTake = 25;
public string Keywords { get; set; }
public string Location { get; set; }
public bool IsRemote { get; set; }
public override int Take { get; set; } = DefaultTake;
}
public class JobSearchResultItem
{
public int Id { get; set; }
public string Title { get; set; }
public string LocationName { get; set; }
public bool IsRemote { get; set; }
public string SalaryText { get; set; }
public int CompanyId { get; set; }
public string CompanyName { get; set; }
public string CompanyImagePath { get; set; }
public IList<Skill> Skills { get; set; } = new List<Skill>();
}
public class JobSearchResult : SearchResult<JobSearchResultItem> { }
MySQL Full-text Search Query
Let’s create a query that performs the most complex query – in our case, a way to search against title and description along with a full text match on location.
SELECT
j.id AS Id
, j.title AS Title
, j.location_name AS LocationName
, j.is_remote AS IsRemote
, j.salary_text AS SalaryText
, c.id AS CompanyId
, c.name AS CompanyName
, c.image_path AS CompanyImagePath
, MATCH(j.title) AGAINST('search keywords') AS title_relevance
, MATCH(j.description) AGAINST('search keywords') AS description_relevance
FROM job AS j
JOIN company AS c
ON j.company_id = c.id
WHERE MATCH(j.title, j.description) AGAINST('search keywords')
AND j.location_name LIKE 'London'
ORDER BY title_relevance * 2 + description_relevance DESC
LIMIT 0, 10
Implementation
Now for our implementation, I’ve kept it simple – there are lots of things that can be improved, such as using a SQL Builder, or string builder at the very least for creating the query, but the implementation solves the problem at hand for now.
public interface IJobSearchService
{
JobSearchResult Search(JobSearchQuery query);
}
public class JobSearchService : IJobSearchService
{
public JobSearchResult Search(JobSearchQuery query)
{
var stopwatch = Stopwatch.StartNew();
var countSelect = @"SELECT COUNT(*)";
var select = @"SELECT
j.id AS Id
, j.title AS Title
, j.location_name AS LocationName
, j.is_remote AS IsRemote
, j.salary_text AS SalaryText
, c.id AS CompanyId
, c.name AS CompanyName
, c.image_path AS CompanyImagePath";
var whereClause = string.Empty;
var keywordWhereClause = false;
if (!string.IsNullOrWhiteSpace(query.Keywords))
{
var escapedKeywords = MysqlEscape(query.Keywords);
var matchTitle = $" , MATCH(j.title) AGAINST('{escapedKeywords}') AS title_relevance";
var matchDescription = $" , MATCH(j.description) AGAINST('{escapedKeywords}') AS description_relevance";
select += matchTitle;
select += matchDescription;
whereClause += $" WHERE MATCH(j.title, j.description) AGAINST('{escapedKeywords}')";
keywordWhereClause = true;
}
if (!string.IsNullOrWhiteSpace(query.Location))
{
var clause = "j.location_name LIKE @location";
if (keywordWhereClause)
{
whereClause += $" AND {clause}";
}
else
{
whereClause += $" WHERE {clause}";
}
}
var searchQuery = $@"
{select}
FROM job AS j
JOIN company AS c
ON j.company_id = c.id
{whereClause}";
var countQuery = $@"
{countSelect}
FROM job AS j
JOIN company AS c
ON j.company_id = c.id
{whereClause}";
if (keywordWhereClause)
{
searchQuery += " ORDER BY title_relevance * 2 + description_relevance DESC ";
}
searchQuery += " LIMIT @skip, @take";
using (var db = new MySqlConnection(Settings.ConnectionStrings.FindTechCareer))
{
var args = new
{
location = $"%{query.Location}%",
skip = query.Skip,
take = query.Take
};
var dbResults = db.Query<JobSearchResultItem>(searchQuery, args).ToList();
var totalCount = db.ExecuteScalar<int>(countQuery, args);
stopwatch.Stop();
return new JobSearchResult
{
Hits = dbResults,
HitCount = totalCount,
TimeTakeInMs = stopwatch.ElapsedMilliseconds
};
}
}
private static string MysqlEscape(string str)
{
return Regex.Replace(str, @"[\x00'""\b\n\r\t\cZ\\%_]", match =>
{
var v = match.Value;
switch (v)
{
case "\x00": // ASCII NUL (0x00) character
return "\\0";
case "\b": // BACKSPACE character
return "\\b";
case "\n": // NEWLINE (linefeed) character
return "\\n";
case "\r": // CARRIAGE RETURN character
return "\\r";
case "\t": // TAB
return "\\t";
case "\u001A": // Ctrl-Z
return "\\Z";
default:
return "\\" + v;
}
});
}
}
Next steps are to get rid of manual escaping and using a SQL Builder for creating our queries.