Aho-Corasick – implementation and animation
Today we will learn how to implement the Aho-Corasick algorithm. It is used for finding occurences of words in text and it is faster than other common algorithms. Code will be in ActionScript 3, but implementation in C, C++, C# or Java will be very similar.
You can also try a fullscreen version of an applett.
Aho-Corasick algorithm
We can divide the algorithm into 2 steps:- constructing a finite state automaton
- constructing a Trie with input words
- finding the right fall function for each node of trie
- moving throuhg automaton – reading each character from input, storing the results
In the visualization above, I have words “fast”, “sofa”, “so”, “take”. Searching text is in the bottom textfield. If an algorithm finds right words in some step, (more words can end in the same place), the nodes light up and the numbers by their indices are incremented.
Implementation
Trie
We learned how to implement a Trie in previous post about Trie. For our purpose, we add 5 more properties to the Node class:
// pointer to parent node public var parent:TNode; public var fall:TNode; // pointer - fall function // some unnecessary params for drawing public var x:Number; public var y:Number; public var width:Number;
Construction of fall edges
For constructing back edges (“fall” pointer), we need to do BFS (bredth-first search) on Trie and we need a Queue data structure for this. I used the queue from AS3DS package from PolygonLabs (“q” variable).
function MakeFallFunctions():void
{
trie.fall = trie; // fall function for the root is root
q.enqueue(trie); // I add the root to the queue
while (q.size != 0) // BFS
{
var n:TNode = q.dequeue();
var no:TNode;
var i:int;
for (i = 0; i < 4; i++) // adding children to queue
{
no = n.cn[i];
if (no == null) continue;
while (no.next != null) { q.enqueue(no); no = no.next; }
q.enqueue(no);
}
if (n == trie) continue; // root already has a fall function
// I move on fall function from parent to find the longest suffix
// or I reach the root
var fall:TNode = n.parent.fall;
while (fall.GetChild(n.c) == null && fall != trie) fall = fall.fall;
// if I found suffix, I set it to "fall" pointer
n.fall = fall.GetChild(n.c);
if (n.fall == null) n.fall = trie; // there is no suffix
if (n.fall == n) n.fall = trie;
}
}
Searching - moving through an automaton
function Search(e:MouseEvent):void
{
var text:String = "I will search for words in this text.";
var cState = trie; // current state
var n:TNode; // some other pointers
var no:TNode;
var b:String; // the character which we read
for(var i:int=0; i < text.length; i++)
{
n = cState;
b = text.charAt(i);
// I move on falls until I find the right child or reach the root
while (n.GetChild(b) == null && n != trie) n = n.fall;
// I got into the root
if (n == trie)
{
n = n.GetChild(b);
if (n == null) n = trie;
}
else n = n.GetChild(b); // or I found the right child
no = n; // I move on falls to root and report all sufices I found
while(no!= trie)
{
if(no.isEnd){/* I found the word, which ends in "no"*/}
no=no.fall;
}
cState = n;
}
}
Visualization and animation
I used very similar algorithm to draw the automaton, as I used in drawing a trie. You should just add some curves for fall edges.
Conclusion
My first implementation of Aho-Corasick was in C#, so now I just rewrote it to ActionScript 3. You will probably never need to write this algorithm in AS3, because, if you are looking for just few words, you can use String.indexOf("..."). If you are searching many words in a long text, It will be better to use another (faster) platform and another programming language. This post was meant to be an illustration of algorithm and a hint for programming Aho-Corasick in other languages.
great post, thanks for sharing
December 17th, 2010Full screen functionality really helps out. Thanks!
March 19th, 2011This is brilliant!
Thanks a lot! It really helps
August 14th, 2011It’s awesome! I like this visualization especially because of colors and beatifull animation! Good job, very helpfull
September 16th, 2011Do you do implementation of Aho-Corasick in delphi or pascal
November 10th, 2011Thank you very much!
to hernan: No I didn’t, but it will be very similar in Pascal. If you understand an algorithm and you know Pascal, there shouldn’t be any problems.
December 7th, 2011please mail me the aho curosick code in c
January 31st, 2012thanks.
please mail me the aho-corasick code in java. It’s really an urgent. Thank you!
February 16th, 2012Actually I am looking for the fall function in ahocorasick algorithm. Here I found the info. Good post.
February 22nd, 2012i want verilog code of aho corasick algorithm pls send me
February 23rd, 2012To pnss, shhhharry32 and kala: As you maybe noticed, I haven’t send any code to anyone of you. It’s not because you didn’t leave your e-mails. First, it’s because I have this code written only in ActionScript 3 and C#, and second, you have to make a donation if you want some additional code, which is not on this site.
February 23rd, 2012Hi this is too good….
Could u pls give me the confirmation that will i b able to implement the same AC algorithm in verilog code……… pls do reply m waiting..
thanks
August 22nd, 2012Hi,
Feel free to use my code in any way you want.
August 22nd, 2012I have never heard about verilog and I don’t know, if you will be able to implement an algorithm. If you are asking about “licencing”, ask Aho and Corasick, not me
[...] a neat animation of the algorithm in [...]
January 3rd, 2013