Regular Expressions : How Exactly they work internally ?

Regular Expressions : How Exactly they work internally ?

Regular expressions were playing major role in computers from several decades, they are not as simple as we think. They are need extensive computational power, a decade before regular expressions were a big headache for computer programmers and scientists because they were consuming lot of time to complete the task.

History behind Regular Expressions :

A neuroscientist  who tried to understand how the human brain could produce complex patterns using simple cells that are bound together. In 1943, Warren McCulloch and Walter Pitts published “A logical calculus of the ideas immanent in nervous activity”.  this paper had a great influence on computer science during that time. In 1956, mathematician Stephen Kleene developed this model one step further. In the paper “Representation of events in nerve nets and finite automata” he presents a simple algebra. The terms Regular Sets and Regular Expressions were born from then. As mentioned above, Kleene’s algebra had only two operations and one function. In 1968, the Unix pioneer Ken Thompson published the article “Regular Expression Search Algorithm” in Communications.

A Pattern in regular expression :

How to learn regular expressions ? How to understand regular expressions?. They look confusing to understand regular expression but regular expressions are an elegant way to do some string manipulations.

We have studied finite automata and the regular expressions all just works on this theory, but Most modern programming languages (Perl, Python, Ruby, Java (and JVM based languages), C#) do not use this approach rather they use recursive backtracking approach. But finite automata is really good to understand regular expressions with its state diagrams and not a bad idea to implement in this approach may be recursive backtracking is better. So do you need a help to understand regular expression? here it is…

Let us take one example, Validating an email address

If this is your regular expression

[a-z][a-z|0-9|]*([_][a-z|0-9]+)*([.][a-z|0-9]+([_][a-z|0-9]+)*)

The respective state diagram might look like below

Regular expression state diagram

Regular expression state diagram

 

Regular expression parser Source Code

If you want to try out with your own regular expression code you can better go through here 

http://www.codeproject.com/Articles/5412/Writing-own-regular-expression-parser