
Aho-Corasick – implementation and animation
Today we will learn how to implement the Aho-Corasick algorithm. It is used for finding occurences of words in text and it is faster than other common algorithms. Code will be in ActionScript 3, but implementation in C, C++, C# or Java will be very similar.
You can also try a fullscreen version of an applett.
Aho-Corasick algorithm
We can divide the algorithm into 2 steps:- constructing a finite state automaton
- constructing a Trie with input words
- finding the right fall function for each node of trie
- moving throuhg automaton - reading each character from input, storing the results
In the visualization above, I have words "fast", "sofa", "so", "take". Searching text is in the bottom textfield. If an algorithm finds right words in some step, (more words can end in the same place), the nodes light up and the numbers by their indices are incremented.
Implementation
Trie
We learned how to implement a Trie in previous post about Trie. For our purpose, we add 5 more properties to the Node class:
// pointer to parent node public var parent:TNode; public var fall:TNode; // pointer - fall function // some unnecessary params for drawing public var x:Number; public var y:Number; public var width:Number;
Construction of fall edges
For constructing back edges ("fall" pointer), we need to do BFS (bredth-first search) on Trie and we need a Queue data structure for this. I used the queue from AS3DS package from PolygonLabs ("q" variable).
function MakeFallFunctions():void { trie.fall = trie; // fall function for the root is root q.enqueue(trie); // I add the root to the queue while (q.size != 0) // BFS { var n:TNode = q.dequeue(); var no:TNode; var i:int; for (i = 0; i < 4; i++) // adding children to queue { no = n.cn[i]; if (no == null) continue; while (no.next != null) { q.enqueue(no); no = no.next; } q.enqueue(no); } if (n == trie) continue; // root already has a fall function // I move on fall function from parent to find the longest suffix // or I reach the root var fall:TNode = n.parent.fall; while (fall.GetChild(n.c) == null && fall != trie) fall = fall.fall; // if I found suffix, I set it to "fall" pointer n.fall = fall.GetChild(n.c); if (n.fall == null) n.fall = trie; // there is no suffix if (n.fall == n) n.fall = trie; } }
Searching - moving through an automaton
function Search(e:MouseEvent):void { var text:String = "I will search for words in this text."; var cState = trie; // current state var n:TNode; // some other pointers var no:TNode; var b:String; // the character which we read for(var i:int=0; i < text.length; i++) { n = cState; b = text.charAt(i); // I move on falls until I find the right child or reach the root while (n.GetChild(b) == null && n != trie) n = n.fall; // I got into the root if (n == trie) { n = n.GetChild(b); if (n == null) n = trie; } else n = n.GetChild(b); // or I found the right child no = n; // I move on falls to root and report all sufices I found while(no!= trie) { if(no.isEnd){/* I found the word, which ends in "no"*/} no=no.fall; } cState = n; } }
Visualization and animation
I used very similar algorithm to draw the automaton, as I used in drawing a trie. You should just add some curves for fall edges.
Conclusion
My first implementation of Aho-Corasick was in C#, so now I just rewrote it to ActionScript 3. You will probably never need to write this algorithm in AS3, because, if you are looking for just few words, you can use String.indexOf("..."). If you are searching many words in a long text, It will be better to use another (faster) platform and another programming language. This post was meant to be an illustration of algorithm and a hint for programming Aho-Corasick in other languages.
great post, thanks for sharing
December 17th, 2010Full screen functionality really helps out. Thanks!
March 19th, 2011This is brilliant!
Thanks a lot! It really helps
August 14th, 2011It’s awesome! I like this visualization especially because of colors and beatifull animation! Good job, very helpfull
September 16th, 2011Do you do implementation of Aho-Corasick in delphi or pascal
November 10th, 2011Thank you very much!
to hernan: No I didn’t, but it will be very similar in Pascal. If you understand an algorithm and you know Pascal, there shouldn’t be any problems.
December 7th, 2011please mail me the aho curosick code in c
January 31st, 2012thanks.
please mail me the aho-corasick code in java. It’s really an urgent. Thank you!
February 16th, 2012Actually I am looking for the fall function in ahocorasick algorithm. Here I found the info. Good post.
February 22nd, 2012i want verilog code of aho corasick algorithm pls send me
February 23rd, 2012To pnss, shhhharry32 and kala: As you maybe noticed, I haven’t send any code to anyone of you. It’s not because you didn’t leave your e-mails. First, it’s because I have this code written only in ActionScript 3 and C#, and second, you have to make a donation if you want some additional code, which is not on this site.
February 23rd, 2012Hi this is too good….
Could u pls give me the confirmation that will i b able to implement the same AC algorithm in verilog code……… pls do reply m waiting..
thanks
August 22nd, 2012Hi,
Feel free to use my code in any way you want.
August 22nd, 2012I have never heard about verilog and I don’t know, if you will be able to implement an algorithm. If you are asking about “licencing”, ask Aho and Corasick, not me
[...] a neat animation of the algorithm in [...]
January 3rd, 2013excellent
August 21st, 2013It was very helpful..:)
September 5th, 2013Thanks..:)
Can u please mail me the java implementation of this.
my emil id is: goyal.1234rahul@gmail.com
September 5th, 2013Hi you.
October 6th, 2013Can u send me your code? I am studying this algorithm but i haven’t understood it :(.
Please send your code to my email: sakura_shangai@yahoo.com
Thanks alot.
will this aho corasick algorithm fit for system using RFID
October 14th, 2013Hey Gr8!!!,
October 15th, 2013This is very useful and best animation for understanding algorithm.Will you please send me C code of aho corasick on my mail. panditanupama91@gmail.com.
Thank you.
hi please send me code in java programming
January 7th, 2014thanks for writing such a great post, it was a pleasure to read radar detectors canada
January 12th, 2014It’s truly very complicated in this full of activity life to listen news on TV,
March 9th, 2014therefore I just use web for that purpose, and obtain the most up-to-date news.
can any one explain that how it’s moving in the trie as we are moving forward in the text. In simple words please…
March 14th, 2014fRbABJwpiQm
February 20th, 2025