In the previous part, we wrote and tested our parser using Antlr 4. Now let’s generate a C# parser and hook it up to search an elasticsearch index.
Building the C# Parser
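Run the Antlr tool against the grammar, targeting C# (-Dlanguage=CSharp), writing the output to a Generated folder (-o Generated), and generating a visitor instead of a listener (-no-listener -visitor):
antlr4 -Dlanguage=CSharp -o Generated Predicate.g4 -no-listener -visitor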
You will need to add the Antlr4.Runtime.Standard NuGet package as a dependency to use the generated C# code. Its version should match the version of the Antlr tool that generated the code.
Adding Some Extra Abstraction
I keep the surface area of the interfaces to a minimum. This gives me a few benefits:
- The first, obvious benefit is simplicity. I don’t expose things that will never be used, which keeps the solution simple and makes it easier to understand and use.
- It partially decouples the parsing process from Antlr, allowing us to change the implementation later on without too many breaking changes.
- By introducing our own syntax tree as an abstraction, we can simplify the tree further by removing tokens that are useful to the user of the language but not to the code consuming the tree. In our case, ( and ) only tell us the order of evaluation, and that can be expressed by positioning the nodes correctly within the tree.
Simplified Parse Tree
This is what the simplified parse tree looks like:
using System.Collections.Generic;
// A node in our simplified syntax tree. Depending on Type, Value holds an
// Operand, an Operator or a BooleanOperator; operators have two children.
public class PredicateNode
{
    public PredicateNode(SymbolType type, object value)
    {
        Type = type;
        Value = value;
        Children = new List<PredicateNode>();
    }
    public SymbolType Type { get; set; }
    public object Value { get; set; }
    public IList<PredicateNode> Children { get; }
}
public enum SymbolType
{
    Operand,
    Operator,
    BooleanOperator
}
// A leaf value: a string literal, a number literal or a property reference.
public class Operand
{
    public Operand(OperandType type, object value)
    {
        OperandType = type;
        Value = value;
    }
    public OperandType OperandType { get; set; }
    public object Value { get; set; }
}
public enum OperandType
{
    String,
    Number,
    Property
}
public enum BooleanOperator
{
    And,
    Or
}
public enum Operator
{
    GreaterThan,
    GreaterThanEqual,
    LessThan,
    LessThanEqual,
    Equal,
    NotEqual,
    Contains
}
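Here is a small sketch, separate from the solution itself, that shows why the ( and ) tokens can be dropped. It builds the tree for (@salary > 60000 or @experience_years >= 7) and @current_job_title contains "Engineer" by hand and renders it back as a fully parenthesised string. The grouping survives purely through where the nodes sit in the tree.

The sketch repeats minimal copies of the types above so it compiles on its own. A few things in it are assumptions for illustration only:
- The Tree and PredicateTreePrinter helper classes are hypothetical.
- The operator symbols it prints (for example = and !=) may not match the grammar from the previous part exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Minimal copies of the tree types above, so this sketch compiles on its own.
public enum SymbolType { Operand, Operator, BooleanOperator }
public enum OperandType { String, Number, Property }
public enum BooleanOperator { And, Or }
public enum Operator { GreaterThan, GreaterThanEqual, LessThan, LessThanEqual, Equal, NotEqual, Contains }

public class Operand
{
    public Operand(OperandType type, object value) { OperandType = type; Value = value; }
    public OperandType OperandType { get; set; }
    public object Value { get; set; }
}

public class PredicateNode
{
    public PredicateNode(SymbolType type, object value)
    {
        Type = type;
        Value = value;
        Children = new List<PredicateNode>();
    }
    public SymbolType Type { get; set; }
    public object Value { get; set; }
    public IList<PredicateNode> Children { get; }
}

// Hypothetical helpers for building trees by hand.
public static class Tree
{
    public static PredicateNode Prop(string name) =>
        new PredicateNode(SymbolType.Operand, new Operand(OperandType.Property, name));
    public static PredicateNode Num(decimal value) =>
        new PredicateNode(SymbolType.Operand, new Operand(OperandType.Number, value));
    public static PredicateNode Str(string value) =>
        new PredicateNode(SymbolType.Operand, new Operand(OperandType.String, value));
    public static PredicateNode Cmp(Operator op, PredicateNode left, PredicateNode right) =>
        WithChildren(new PredicateNode(SymbolType.Operator, op), left, right);
    public static PredicateNode Bool(BooleanOperator op, PredicateNode left, PredicateNode right) =>
        WithChildren(new PredicateNode(SymbolType.BooleanOperator, op), left, right);

    private static PredicateNode WithChildren(PredicateNode node, PredicateNode left, PredicateNode right)
    {
        node.Children.Add(left);
        node.Children.Add(right);
        return node;
    }
}

// Hypothetical printer: wraps every boolean node in parentheses, showing that the
// grouping can be recovered from the shape of the tree alone.
public static class PredicateTreePrinter
{
    public static string Render(PredicateNode node)
    {
        switch (node.Type)
        {
            case SymbolType.BooleanOperator:
                var keyword = (BooleanOperator)node.Value == BooleanOperator.And ? "and" : "or";
                return $"({Render(node.Children[0])} {keyword} {Render(node.Children[1])})";
            case SymbolType.Operator:
                return $"{Render(node.Children[0])} {Symbol((Operator)node.Value)} {Render(node.Children[1])}";
            case SymbolType.Operand:
                var operand = (Operand)node.Value;
                return operand.OperandType switch
                {
                    OperandType.Property => "@" + operand.Value,
                    OperandType.String => "\"" + operand.Value + "\"",
                    _ => Convert.ToString(operand.Value, CultureInfo.InvariantCulture)
                };
            default:
                throw new ArgumentOutOfRangeException(nameof(node));
        }
    }

    // Assumed surface syntax; "=" and "!=" in particular may differ in the real grammar.
    private static string Symbol(Operator op) => op switch
    {
        Operator.GreaterThan => ">",
        Operator.GreaterThanEqual => ">=",
        Operator.LessThan => "<",
        Operator.LessThanEqual => "<=",
        Operator.Equal => "=",
        Operator.NotEqual => "!=",
        Operator.Contains => "contains",
        _ => throw new ArgumentOutOfRangeException(nameof(op))
    };
}
```

To use it, build the example tree with Tree.Bool(BooleanOperator.And, Tree.Bool(BooleanOperator.Or, ...), Tree.Cmp(Operator.Contains, ...)) and pass it to PredicateTreePrinter.Render. The result is ((@salary > 60000 or @experience_years >= 7) and @current_job_title contains "Engineer"). The inner parentheses come back even though no parenthesis token was ever stored.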
Parse Tree Generator
We use the visitor that Antlr generated to build our own parse tree:
using System;
using System.Globalization;
using System.Linq;
using Antlr4.Runtime.Tree;
// Converts Antlr's concrete parse tree into our simplified PredicateNode tree.
internal class PredicateSyntaxTreeBuilderVisitor : PredicateBaseVisitor<PredicateNode>
{
    public override PredicateNode VisitExpr(PredicateParser.ExprContext context)
    {
        return Visit(context.predicate());
    }
    public override PredicateNode VisitPredicate(PredicateParser.PredicateContext context)
    {
        // OpenParen predicate CloseParen: drop the parentheses and keep the inner predicate
        if (context.OpenParen() != null)
        {
            return Visit(context.predicate().First());
        }
        // predicate booleanOperator predicate
        if (context.booleanOperator() != null)
        {
            var booleanOperator = Visit(context.booleanOperator());
            booleanOperator.Children.Add(Visit(context.predicate()[0]));
            booleanOperator.Children.Add(Visit(context.predicate()[1]));
            return booleanOperator;
        }
        // operand operator operand
        if (context.@operator() != null)
        {
            var @operator = Visit(context.@operator());
            @operator.Children.Add(Visit(context.operand()[0]));
            @operator.Children.Add(Visit(context.operand()[1]));
            return @operator;
        }
        throw new Exception("Unhandled Predicate");
    }
    public override PredicateNode VisitOperand(PredicateParser.OperandContext context)
    {
        var terminal = (ITerminalNode)context.GetChild(0);
        var symbolType = terminal.Symbol.Type;
        switch (symbolType)
        {
            case PredicateLexer.String:
                // strip the surrounding double quotes
                return new PredicateNode(SymbolType.Operand, new Operand(OperandType.String, terminal.Symbol.Text.Trim('"')));
            case PredicateLexer.Number:
                // parse with the invariant culture so "4.5" means the same thing on every machine
                return new PredicateNode(SymbolType.Operand, new Operand(OperandType.Number, decimal.Parse(terminal.Symbol.Text, CultureInfo.InvariantCulture)));
            case PredicateLexer.Property:
                // strip the leading @
                return new PredicateNode(SymbolType.Operand, new Operand(OperandType.Property, terminal.Symbol.Text.Trim('@')));
        }
        throw new Exception("Unhandled Operand");
    }
    public override PredicateNode VisitBooleanOperator(PredicateParser.BooleanOperatorContext context)
    {
        var terminal = (ITerminalNode)context.GetChild(0);
        var symbolType = terminal.Symbol.Type;
        switch (symbolType)
        {
            case PredicateLexer.And:
                return new PredicateNode(SymbolType.BooleanOperator, BooleanOperator.And);
            case PredicateLexer.Or:
                return new PredicateNode(SymbolType.BooleanOperator, BooleanOperator.Or);
        }
        throw new Exception("Unhandled Boolean Operator");
    }
    public override PredicateNode VisitOperator(PredicateParser.OperatorContext context)
    {
        var terminal = (ITerminalNode) context.GetChild(0);
        var symbolType = terminal.Symbol.Type;
        switch (symbolType)
        {
            case PredicateLexer.GreaterThan:
                return CreateOperatorNode(Operator.GreaterThan);
            case PredicateLexer.GreaterThanEqual:
                return CreateOperatorNode(Operator.GreaterThanEqual);
            case PredicateLexer.LessThan:
                return CreateOperatorNode(Operator.LessThan);
            case PredicateLexer.LessThanEqual:
                return CreateOperatorNode(Operator.LessThanEqual);
            case PredicateLexer.Equal:
                return CreateOperatorNode(Operator.Equal);
            case PredicateLexer.NotEqual:
                return CreateOperatorNode(Operator.NotEqual);
            case PredicateLexer.Contains:
                return CreateOperatorNode(Operator.Contains);
        }
        throw new Exception("Unhandled Operator");
    }
    private PredicateNode CreateOperatorNode(Operator value)
    {
        return new PredicateNode(SymbolType.Operator, value);
    }
}
Writing Tests
The Directory Structure
From this point on, I’ll be following a stricter directory structure, which looks like this:
|-- docker-compose.yaml
|-- tests/
|   |-- Predicate.Evaluator.Acceptance.Tests/
|       |-- {test files go here}
|-- src/
    |-- Predicate.Parser/
    |   |-- {everything to do with parsing goes here, incl. the .g4 grammar and the .bat/.sh scripts that run antlr}
    |-- Predicate.Evaluator/
        |-- {everything to do with evaluating an expression goes here}
The Interface
We start with the simplest possible interface for evaluating a predicate expression for searching:
using System.Threading.Tasks;
public interface IPredicateEvaluator<T>
{
    Task<PredicateEvaluationOperation<T>> Evaluate(string predicateExpression);
}
public class PredicateEvaluationOperation<T>
{
    public PredicateEvaluationOperation(T result)
    {
        Result = result;
    }
    // An evaluation counts as successful when it produced a result.
    public bool IsSuccessful => Result != null;
    public T Result { get; }
}
To start with, we need an implementation that doesn’t do anything yet, so that we can get a good red (failing) test in place:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class ElasticSearchPredicateEvaluator : IPredicateEvaluator<CandidateSearchResult>
{
    public async Task<PredicateEvaluationOperation<CandidateSearchResult>> Evaluate(string predicateExpression)
    {
        // Placeholder so that the acceptance test fails until the real implementation exists.
        throw new NotImplementedException();
    }
}
// What the evaluator returns to its callers.
public class CandidateSearchResult
{
    public IList<CandidateSearchResultItem> Items { get; set; }
}
public class CandidateSearchResultItem
{
    public string CurrentJobTitle { get; set; }
    public int ExperienceInYears { get; set; }
    public int Salary { get; set; }
}
// The shape of the documents stored in the elasticsearch index.
public class CandidateDocument
{
    public string CurrentJobTitle { get; set; }
    public int ExperienceInYears { get; set; }
    public int Salary { get; set; }
}
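We will also need to add the NEST NuGet package as a dependency at this point. Note that the test fixture below already constructs the evaluator with an elastic client, an index name and a PropertyDetailsProvider. Those are introduced in the Evaluating An Expression section, so add that constructor (and the property-details types) to the placeholder if you want the red test to compile first.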
The Test Setup
To test our advanced search feature with elasticsearch as the underlying technology, we need to be able to spin up elasticsearch as a dependency. Here’s the docker-compose.yaml file that does that:
version: '3.3'
services:
  elasticsearch:
    image: elasticsearch:7.6.0
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      discovery.type: single-node
volumes:
  data01:
Now, let’s create an xUnit test project and write the test setup:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Nest;
using Xunit;
[CollectionDefinition("PredicateEvaluatorTest")]
public class PredicateEvaluatorTestCollection : ICollectionFixture<PredicateEvaluatorTestFixture>
{
}
public class PredicateEvaluatorTestFixture : IDisposable
{
    public static ElasticSearchPredicateEvaluator Evaluator { get; private set; }
    public PredicateEvaluatorTestFixture()
    {
        InitDependencies();
        SetupElasticSearch();
    }
    // Starts elasticsearch with docker-compose from the repository root.
    private void InitDependencies()
    {
        Console.WriteLine("Starting Dependencies");
        var slash = Path.DirectorySeparatorChar;
        var isWindows =
            System.Runtime.InteropServices.RuntimeInformation.IsOSPlatform(System.Runtime.InteropServices.OSPlatform
                .Windows);
        var toExec = new ProcessStartInfo
        {
            // bin/Debug/<framework> -> test project -> tests -> repository root
            WorkingDirectory = $"{Directory.GetCurrentDirectory()}{slash}..{slash}..{slash}..{slash}..{slash}..",
            Arguments = "-f docker-compose.yaml -p acceptance up -d --no-recreate",
            FileName = isWindows ? "docker-compose.exe" : "docker-compose",
            RedirectStandardOutput = true,
            RedirectStandardError = true
        };
        var process = Process.Start(toExec);
        // Drain both redirected streams before waiting, so a full pipe buffer can't deadlock the child process.
        var errorsTask = process.StandardError.ReadToEndAsync();
        var message = process.StandardOutput.ReadToEnd();
        var errors = errorsTask.Result;
        process.WaitForExit();
        Console.Write($"message: {message}");
        Console.Write($"errors: {errors}");
        Assert.Equal(0, process.ExitCode);
    }
    // Recreates the index and seeds it with the documents every test relies on.
    private void SetupElasticSearch()
    {
        var index = "candidates";
        var settings = new ConnectionSettings(new Uri("http://localhost:9200/"));
        settings.DisableDirectStreaming();
        settings.EnableDebugMode(d =>
        {
            Console.Write(d.DebugInformation);
        });
        var client = new ElasticClient(settings);
        client.Indices.Delete(index);
        client.Indices.Create(index, c => c.Map(x => x.AutoMap<CandidateDocument>()));
        client.IndexMany(new List<CandidateDocument>
        {
            new CandidateDocument { CurrentJobTitle = "Software Engineer", ExperienceInYears = 5, Salary = 70000 },
            new CandidateDocument { CurrentJobTitle = "Full-stack Engineer", ExperienceInYears = 7, Salary = 85000 },
            new CandidateDocument { CurrentJobTitle = "Marketing Manager", ExperienceInYears = 4, Salary = 60000 },
            new CandidateDocument { CurrentJobTitle = "Head of Security", ExperienceInYears = 9, Salary = 100000 },
            new CandidateDocument { CurrentJobTitle = "Automation Engineer", ExperienceInYears = 5, Salary = 73000 },
            new CandidateDocument { CurrentJobTitle = ".NET Developer", ExperienceInYears = 3, Salary = 50000 },
            new CandidateDocument { CurrentJobTitle = "Developer", ExperienceInYears = 4, Salary = 58000 },
            new CandidateDocument { CurrentJobTitle = "Junior Developer", ExperienceInYears = 1, Salary = 38000 },
        }, index);
        // Refresh so the newly indexed documents are visible to search, instead of sleeping for a fixed time.
        client.Indices.Refresh(index);
        Evaluator = new ElasticSearchPredicateEvaluator(client, index, new PropertyDetailsProvider());
    }
    public void Dispose()
    {
    }
}
Let’s pause here and analyse what our setup code is doing:
- We’re spinning up our dependencies, in this case elasticsearch, using docker-compose.
- We’re clearing our index and filling it with a few pre-defined documents that we’ll use throughout all our tests.
- We’re creating a reusable Evaluator that is shared by all our tests. This is fine because it’s stateless.
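One thing the setup does not handle is timing. docker-compose up -d returns once the container has started, not once elasticsearch is ready to accept requests. On a cold start, the first calls in SetupElasticSearch can therefore race the server.

Below is a minimal sketch of a readiness check, assuming you call it between InitDependencies() and SetupElasticSearch(). It is a polling helper around elasticsearch's cluster health API, using _cluster/health with wait_for_status=yellow. As far as I know, that endpoint answers with a non-success status (408) if the status isn't reached within its timeout. The Readiness class, its method names and the timings are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;

// Hypothetical helper, not part of the original fixture.
public static class Readiness
{
    // Calls probe until it returns true or timeout elapses. Exceptions thrown by the
    // probe (e.g. connection refused while the container boots) count as "not ready yet".
    public static bool WaitUntil(Func<bool> probe, TimeSpan timeout, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                if (probe())
                    return true;
            }
            catch (Exception)
            {
                // not ready yet; fall through and retry
            }
            if (stopwatch.Elapsed >= timeout)
                return false;
            Thread.Sleep(interval);
        }
    }

    // Asks elasticsearch to wait (server-side, up to 5s) for at least yellow cluster health.
    // A 2xx response means the node is up and the cluster can serve requests.
    public static bool ElasticsearchIsHealthy(HttpClient http, Uri baseUri)
    {
        using var response = http
            .GetAsync(new Uri(baseUri, "_cluster/health?wait_for_status=yellow&timeout=5s"))
            .GetAwaiter()
            .GetResult();
        return response.IsSuccessStatusCode;
    }
}
```

In the fixture constructor, this could look like Readiness.WaitUntil(() => Readiness.ElasticsearchIsHealthy(http, new Uri("http://localhost:9200/")), TimeSpan.FromMinutes(2), TimeSpan.FromSeconds(2)), followed by an Assert.True on the result. A generous timeout is worth it on the first run, when the image may still be downloading.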
The Test
Now, let’s write our first happy-path test. It combines a text match, a numeric comparison and a boolean operator, and asserts the results.
using System.Threading.Tasks;
using Xunit;
[Collection("PredicateEvaluatorTest")]
public class ElasticSearchPredicateEvaluator_Tests
{
    private readonly ElasticSearchPredicateEvaluator _evaluator;
    public ElasticSearchPredicateEvaluator_Tests()
    {
        _evaluator = PredicateEvaluatorTestFixture.Evaluator;
    }
    [Fact]
    public async Task It_Returns_CorrectResults()
    {
        var result = await _evaluator.Evaluate("@current_job_title contains \"Developer\" and @experience_years < 4");
        Assert.Equal(2, result.Result.Items.Count);
        Assert.Contains(result.Result.Items, c => c.CurrentJobTitle == "Junior Developer");
        Assert.Contains(result.Result.Items, c => c.CurrentJobTitle == ".NET Developer");
    }
}
The test itself is self-explanatory. Given the search expression @current_job_title contains "Developer" and @experience_years < 4, we expect to get 2 results back (.NET Developer and Junior Developer) based on our setup.
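You can double-check that 2 is the right expectation without running elasticsearch. The sketch below filters the same eight seed documents in memory.

It only approximates how a match query behaves on a text field under the standard analyzer: the text is lowercased and split into words, so ".NET Developer" yields the word "developer". The Candidate record and ExpectedResults class are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical in-memory stand-in for CandidateDocument.
public record Candidate(string CurrentJobTitle, int ExperienceInYears, int Salary);

public static class ExpectedResults
{
    // The same eight documents the fixture indexes.
    public static readonly IReadOnlyList<Candidate> Seed = new List<Candidate>
    {
        new Candidate("Software Engineer", 5, 70000),
        new Candidate("Full-stack Engineer", 7, 85000),
        new Candidate("Marketing Manager", 4, 60000),
        new Candidate("Head of Security", 9, 100000),
        new Candidate("Automation Engineer", 5, 73000),
        new Candidate(".NET Developer", 3, 50000),
        new Candidate("Developer", 4, 58000),
        new Candidate("Junior Developer", 1, 38000),
    };

    // Rough approximation of a match query on an analysed text field:
    // lowercase the text, split it into alphanumeric words, look for the term.
    public static bool MatchesWord(string text, string term) =>
        Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+").Contains(term.ToLowerInvariant());

    // In-memory equivalent of: @current_job_title contains "Developer" and @experience_years < 4
    public static List<Candidate> DevelopersUnderFourYears() =>
        Seed.Where(c => MatchesWord(c.CurrentJobTitle, "Developer") && c.ExperienceInYears < 4).ToList();
}
```

Three titles contain "Developer", but the plain "Developer" document has exactly 4 years of experience. The strict < 4 excludes it, which is why the test expects 2 results rather than 3.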
When we run the test, you should see elasticsearch spin up after a few seconds, or up to a few minutes if you don’t already have the elasticsearch image:
$ docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED              STATUS              PORTS                                            NAMES
2b552985b2b4   elasticsearch:7.6.0   "/usr/local/bin/dock…"   About a minute ago   Up About a minute   0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   acceptance_elasticsearch_1
You will also see the test fail, because the evaluator still throws a NotImplementedException.
Evaluating An Expression
Now that we have the test in place, let’s get it to pass by implementing our evaluator that talks to elasticsearch.
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Antlr4.Runtime;
using Nest;
public class ElasticSearchPredicateEvaluator : IPredicateEvaluator<CandidateSearchResult>
{
    private readonly IElasticClient _elasticClient;
    private readonly string _index;
    private readonly IPropertyDetailsProvider _propertyDetailsProvider;
    public ElasticSearchPredicateEvaluator(IElasticClient elasticClient, string index, IPropertyDetailsProvider propertyDetailsProvider)
    {
        _elasticClient = elasticClient;
        _index = index;
        _propertyDetailsProvider = propertyDetailsProvider;
    }
    public async Task<PredicateEvaluationOperation<CandidateSearchResult>> Evaluate(string predicateExpression)
    {
        // 1. Tokenise, rejecting the expression if the lexer produced any Discardable (unrecognised) tokens
        var charStream = new AntlrInputStream(predicateExpression);
        var lexer = new PredicateLexer(charStream);
        var stream = new CommonTokenStream(lexer);
        stream.Fill();
        var tokens = stream.Get(0, stream.Size);
        stream.Reset();
        if (tokens.Any(x => x.Type == PredicateLexer.Discardable))
            throw new Exception("Contains unknown tokens");
        // 2. Parse, failing on the first syntax error
        var parser = new PredicateParser(stream);
        parser.RemoveErrorListeners();
        parser.AddErrorListener(new ThrowingErrorListener());
        // 3. Convert Antlr's parse tree into our simplified tree
        var treeBuilder = new PredicateSyntaxTreeBuilderVisitor();
        var tree = treeBuilder.Visit(parser.expr());
        // 4. Translate the tree into a NEST query and run it
        var searchBuilder = new ElasticSearchQueryBuilder(_propertyDetailsProvider);
        var query = searchBuilder.BuildNestQuery(tree);
        var searchResult = await _elasticClient.SearchAsync<CandidateDocument>(new SearchRequest(_index)
        {
            Query = query
        });
        // 5. Map the matching documents to result items
        var resultItems = searchResult.Documents
            .Select(x => new CandidateSearchResultItem
            {
                CurrentJobTitle = x.CurrentJobTitle,
                Salary = x.Salary,
                ExperienceInYears = x.ExperienceInYears
            })
            .ToList();
        return new PredicateEvaluationOperation<CandidateSearchResult>(new CandidateSearchResult
        {
            Items = resultItems
        });
    }
}
// Turns the first syntax error into an exception instead of Antlr's default console output.
internal class ThrowingErrorListener : BaseErrorListener
{
    public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException ex)
    {
        // ex is null for some reports (e.g. extraneous or missing tokens), so wrap it instead of rethrowing it.
        throw new Exception($"Syntax error at line {line}:{charPositionInLine}: {msg}", ex);
    }
}
public interface IPropertyDetailsProvider
{
    PropertyDetails GetPropertyDetails(string propertyName);
}
public class PropertyDetails
{
    public PropertyDetails(ConcreteType type, string sourceName)
    {
        Type = type;
        SourceName = sourceName;
    }
    public ConcreteType Type { get; }
    // The field name in the elasticsearch index.
    public string SourceName { get; }
}
public enum ConcreteType
{
    Number,
    String
}
// Allow-list of the properties users can search on, mapped to their type and index field name.
public class PropertyDetailsProvider : IPropertyDetailsProvider
{
    public PropertyDetails GetPropertyDetails(string propertyName)
    {
        switch (propertyName)
        {
            case "current_job_title":
                return new PropertyDetails(ConcreteType.String, "currentJobTitle");
            case "experience_years":
                return new PropertyDetails(ConcreteType.Number, "experienceInYears");
            case "salary":
                return new PropertyDetails(ConcreteType.Number, "salary");
            default:
                return null;
        }
    }
}
// Walks our simplified tree and builds the equivalent NEST query.
internal class ElasticSearchQueryBuilder
{
    private readonly IPropertyDetailsProvider _propertyDetailsProvider;
    public ElasticSearchQueryBuilder(IPropertyDetailsProvider propertyDetailsProvider)
    {
        _propertyDetailsProvider = propertyDetailsProvider;
    }
    public QueryContainer BuildNestQuery(PredicateNode root)
    {
        return Visit(root);
    }
    private QueryContainer Visit(PredicateNode node)
    {
        if (node.Type == SymbolType.BooleanOperator)
            return VisitBooleanOperator(node);
        if (node.Type == SymbolType.Operator)
            return VisitOperator(node);
        throw new Exception("Unable to create a search query from the tree");
    }
    private QueryContainer VisitBooleanOperator(PredicateNode node)
    {
        var value = (BooleanOperator) node.Value;
        var predicate1 = node.Children[0];
        var predicate2 = node.Children[1];
        // NEST overloads && and || on QueryContainer to produce bool queries
        if (value == BooleanOperator.And)
            return Visit(predicate1) && Visit(predicate2);
        return Visit(predicate1) || Visit(predicate2);
    }
    private QueryContainer VisitOperator(PredicateNode node)
    {
        var value = (Operator) node.Value;
        var op1 = (Operand) node.Children[0].Value;
        var op2 = (Operand) node.Children[1].Value;
        // Runtime rules: the first operand must be a known property...
        if (op1.OperandType != OperandType.Property)
            throw new Exception("First operand must be a property");
        var prop1Name = (string) op1.Value;
        var prop1Details = _propertyDetailsProvider.GetPropertyDetails(prop1Name);
        if (prop1Details == null)
            throw new Exception($"Property {prop1Name} does not exist");
        // ...and the second operand must be a literal
        if (op2.OperandType == OperandType.Property)
            throw new Exception("Second operand cannot be a property");
        switch (value)
        {
            case Operator.Contains:
                if (op2.OperandType != OperandType.String)
                    throw new Exception("Second operand must be string");
                return new QueryContainer(new MatchQuery
                {
                    Query = (string) op2.Value,
                    Field = prop1Details.SourceName
                });
            case Operator.Equal:
                // op2 can be a string or a number (stored as decimal), so convert it to text explicitly
                return new QueryContainer(new MatchQuery
                {
                    Query = Convert.ToString(op2.Value, CultureInfo.InvariantCulture),
                    Field = prop1Details.SourceName
                });
            case Operator.NotEqual:
                return !(new QueryContainer(new MatchQuery
                {
                    Query = Convert.ToString(op2.Value, CultureInfo.InvariantCulture),
                    Field = prop1Details.SourceName
                }));
            case Operator.LessThan:
            case Operator.LessThanEqual:
            case Operator.GreaterThan:
            case Operator.GreaterThanEqual:
                if (op2.OperandType != OperandType.Number)
                    throw new Exception($"Second operand {op2.Value} must be a number");
                if (prop1Details.Type != ConcreteType.Number)
                    throw new Exception($"First operand {op1.Value} must be a property of type number");
                var rangeQuery = new NumericRangeQuery
                {
                    Field = prop1Details.SourceName
                };
                if (value == Operator.LessThan)
                    rangeQuery.LessThan = (double)(decimal)op2.Value;
                if (value == Operator.LessThanEqual)
                    rangeQuery.LessThanOrEqualTo = (double)(decimal)op2.Value;
                if (value == Operator.GreaterThan)
                    rangeQuery.GreaterThan = (double)(decimal)op2.Value;
                if (value == Operator.GreaterThanEqual)
                    rangeQuery.GreaterThanOrEqualTo = (double)(decimal)op2.Value;
                return new QueryContainer(rangeQuery);
            default:
                throw new Exception($"Unknown Operator {node.Value}");
        }
    }
}
Let’s break down what’s happening here.
PropertyDetailsProvider contains a few pre-defined properties that we want to expose to our clients. It maps each user-facing name (e.g. experience_years) to its type and to the field name in the index (e.g. experienceInYears). These are defined explicitly, rather than discovered dynamically, so that we control exactly what’s exposed to the user.
ElasticSearchQueryBuilder builds an elasticsearch query from the parse tree that the parser generated. Along the way it checks the tree against a few runtime rules and throws an exception if any of them is broken:
- the first operand must be a known property;
- the second operand must be a literal;
- contains requires a string;
- the comparison operators require a number and a numeric property.
ElasticSearchPredicateEvaluator ties everything together. It evaluates a predicate expression by parsing the string, building an elasticsearch query using ElasticSearchQueryBuilder, executing the query, and returning the result.
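It can help to see roughly what JSON the query builder ends up sending. The sketch below hand-builds elasticsearch query DSL with System.Text.Json.Nodes (.NET 6 or later), mirroring the cases in ElasticSearchQueryBuilder:
- contains and equal become a match query;
- the comparisons become a range query with lt/lte/gt/gte;
- and/or/not become a bool query with must/should/must_not.

The QueryDsl class is hypothetical. NEST's actual serialisation may differ in detail (for example, it can flatten nested bool queries), so treat this as an approximation rather than NEST's exact output.

```csharp
using System.Text.Json.Nodes;

// Hypothetical helpers that hand-build elasticsearch query DSL, mirroring the
// cases in ElasticSearchQueryBuilder. NEST's real output may differ in detail.
public static class QueryDsl
{
    // contains / equal: a full-text match query on the mapped field.
    public static JsonObject Match(string field, string text) =>
        new JsonObject { ["match"] = new JsonObject { [field] = new JsonObject { ["query"] = text } } };

    // <, <=, >, >=: a range query; op is one of "lt", "lte", "gt", "gte".
    public static JsonObject Range(string field, string op, double value) =>
        new JsonObject { ["range"] = new JsonObject { [field] = new JsonObject { [op] = value } } };

    // and / or / not: clauses of a bool query.
    public static JsonObject And(JsonNode left, JsonNode right) => Bool("must", left, right);
    public static JsonObject Or(JsonNode left, JsonNode right) => Bool("should", left, right);
    public static JsonObject Not(JsonNode query) => Bool("must_not", query);

    private static JsonObject Bool(string occurrence, params JsonNode[] clauses)
    {
        var array = new JsonArray();
        foreach (JsonNode clause in clauses)
            array.Add(clause);
        return new JsonObject { ["bool"] = new JsonObject { [occurrence] = array } };
    }
}
```

For the test expression, QueryDsl.And(QueryDsl.Match("currentJobTitle", "Developer"), QueryDsl.Range("experienceInYears", "lt", 4)).ToJsonString() gives a bool query. Its must array holds the match clause followed by the range clause. Both must be satisfied, which is what the && overload in VisitBooleanOperator expresses.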
Now, if we run our test again, it should pass.
If you want to see the full working solution, you can head over to this repository on my GitHub.
Future Follow-up
In a future part, I will add a front-end component with a property-selection dropdown and predictive auto-completion to tie everything together, along with error handling.