Aho-Corasick – implementation and animation

Today we will learn how to implement the Aho-Corasick algorithm. It is used for finding occurences of words in text and it is faster than other common algorithms. Code will be in ActionScript 3, but implementation in C, C++, C# or Java will be very similar.

You can also try a fullscreen version of an applett.

Aho-Corasick algorithm

We can divide the algorithm into 2 steps:
  1. constructing a finite state automaton
    • constructing a Trie with input words
    • finding the right fall function for each node of trie
  2. moving throuhg automaton - reading each character from input, storing the results

In the visualization above, I have words "fast", "sofa", "so", "take". Searching text is in the bottom textfield. If an algorithm finds right words in some step, (more words can end in the same place), the nodes light up and the numbers by their indices are incremented.

Implementation

Trie

We learned how to implement a Trie in previous post about Trie. For our purpose, we add 5 more properties to the Node class:

   // pointer to parent node
   public var parent:TNode;
   public var fall:TNode; // pointer - fall function
   // some unnecessary params for drawing
   public var x:Number;
   public var y:Number;
   public var width:Number;

Construction of fall edges

For constructing back edges ("fall" pointer), we need to do BFS (bredth-first search) on Trie and we need a Queue data structure for this. I used the queue from AS3DS package from PolygonLabs ("q" variable).

function MakeFallFunctions():void
{
   trie.fall = trie; // fall function for the root is root
   q.enqueue(trie);  // I add the root to the queue
   while (q.size != 0) // BFS
   {
      var n:TNode = q.dequeue();
      var no:TNode;
      var i:int;
      for (i = 0; i < 4; i++) // adding children to queue
      {
         no = n.cn[i];
         if (no == null) continue;
         while (no.next != null) { q.enqueue(no); no = no.next; }
         q.enqueue(no);
      }
      if (n == trie) continue; // root already has a fall function
      // I move on fall function from parent to find the longest suffix
      // or I reach the root
      var fall:TNode = n.parent.fall;
      while (fall.GetChild(n.c) == null && fall != trie) fall = fall.fall;
      // if I found suffix, I set it to "fall" pointer
      n.fall = fall.GetChild(n.c);
      if (n.fall == null) n.fall = trie; // there is no suffix
      if (n.fall == n) n.fall = trie;
   }
}

Searching - moving through an automaton

function Search(e:MouseEvent):void
{
   var text:String = "I will search for words in this text.";
   var cState = trie; // current state
   var n:TNode;       // some other pointers
   var no:TNode;
   var b:String;      // the character which we read
   for(var i:int=0; i < text.length; i++)
   {
      n = cState;
      b = text.charAt(i);
      
      // I move on falls until I find the right child or reach the root
      while (n.GetChild(b) == null && n != trie) n = n.fall;
      
      // I got into the root
        if (n == trie)
        {
            n = n.GetChild(b);
            if (n == null) n = trie;
        }
        else n = n.GetChild(b); // or I found the right child
      
      no = n; // I move on falls to root and report all sufices I found
      while(no!= trie)
      {
         if(no.isEnd){/* I found the word, which ends in "no"*/}
         no=no.fall;
      }
      cState = n;
   }
}

Visualization and animation

I used very similar algorithm to draw the automaton, as I used in drawing a trie. You should just add some curves for fall edges.

Conclusion

My first implementation of Aho-Corasick was in C#, so now I just rewrote it to ActionScript 3. You will probably never need to write this algorithm in AS3, because, if you are looking for just few words, you can use String.indexOf("..."). If you are searching many words in a long text, It will be better to use another (faster) platform and another programming language. This post was meant to be an illustration of algorithm and a hint for programming Aho-Corasick in other languages.

Old comments (closed because of spam)

24 Comments

  1. Daniel said:

    great post, thanks for sharing

    December 17th, 2010
  2. Chet said:

    Full screen functionality really helps out. Thanks!

    March 19th, 2011
  3. Diogo Neves said:

    This is brilliant! :) Thanks a lot! It really helps

    August 14th, 2011
  4. Slava said:

    It’s awesome! I like this visualization especially because of colors and beatifull animation! Good job, very helpfull

    September 16th, 2011
  5. hernan said:

    Do you do implementation of Aho-Corasick in delphi or pascal

    November 10th, 2011
  6. Ivan Kuckir said:

    Thank you very much! :)

    to hernan: No I didn’t, but it will be very similar in Pascal. If you understand an algorithm and you know Pascal, there shouldn’t be any problems.

    December 7th, 2011
  7. pnss said:

    please mail me the aho curosick code in c
    thanks.

    January 31st, 2012
  8. shhhharry32 said:

    please mail me the aho-corasick code in java. It’s really an urgent. Thank you!

    February 16th, 2012
  9. chandu said:

    Actually I am looking for the fall function in ahocorasick algorithm. Here I found the info. Good post.

    February 22nd, 2012
  10. kala said:

    i want verilog code of aho corasick algorithm pls send me

    February 23rd, 2012
  11. Ivan Kuckir said:

    To pnss, shhhharry32 and kala: As you maybe noticed, I haven’t send any code to anyone of you. It’s not because you didn’t leave your e-mails. First, it’s because I have this code written only in ActionScript 3 and C#, and second, you have to make a donation if you want some additional code, which is not on this site.

    February 23rd, 2012
  12. Girish S said:

    Hi this is too good….
    Could u pls give me the confirmation that will i b able to implement the same AC algorithm in verilog code……… pls do reply m waiting..

    thanks

    August 22nd, 2012
  13. Ivan Kuckir said:

    Hi,
    I have never heard about verilog and I don’t know, if you will be able to implement an algorithm. If you are asking about “licencing”, ask Aho and Corasick, not me :) Feel free to use my code in any way you want.

    August 22nd, 2012
  14. Finding substrings very very fast | koding notes said:

    [...] a neat animation of the algorithm in [...]

    January 3rd, 2013
  15. vardhan said:

    excellent

    August 21st, 2013
  16. rahul said:

    It was very helpful..:)
    Thanks..:)
    Can u please mail me the java implementation of this.

    September 5th, 2013
  17. rahul said:

    my emil id is: goyal.1234rahul@gmail.com

    September 5th, 2013
  18. Souchan said:

    Hi you.
    Can u send me your code? I am studying this algorithm but i haven’t understood it :(.
    Please send your code to my email: sakura_shangai@yahoo.com
    Thanks alot.

    October 6th, 2013
  19. nikko said:

    will this aho corasick algorithm fit for system using RFID

    October 14th, 2013
  20. A.Pandit said:

    Hey Gr8!!!,
    This is very useful and best animation for understanding algorithm.Will you please send me C code of aho corasick on my mail. panditanupama91@gmail.com.
    Thank you.

    October 15th, 2013
  21. revathy said:

    hi please send me code in java programming

    January 7th, 2014
  22. radar jammer forum said:

    thanks for writing such a great post, it was a pleasure to read radar detectors canada

    January 12th, 2014
  23. diet supplements said:

    It’s truly very complicated in this full of activity life to listen news on TV,
    therefore I just use web for that purpose, and obtain the most up-to-date news.

    March 9th, 2014
  24. Shail said:

    can any one explain that how it’s moving in the trie as we are moving forward in the text. In simple words please…

    March 14th, 2014