How to build a long regular expression

  #1  
October 29th, 2007, 08:21 AM
Usually, when you write a regular expression to extract text, you start from a simple expression. As you learn more about the target text, you keep extending it. The result is a long run of special symbols that is very hard to read and nearly impossible to improve.

We need a ‘smart’ regular expression. Instead of writing a one-line expression, we prepare a multi-line text from which we generate the long expression. Here is a simple example.

    space                    [\s/-]+
    word                     \w+
    words                    (?:{word}{space})*?{word}
    birthday                 (?<birthday>\d+\.\d+\.\d+)
    title                    {word}\.
    name                     {words}
    person                   {title}{space}{name}{space}{birthday}

This text consists of two columns separated by spaces. The first column is the pattern name and the second column is an easy-to-read regular expression, which may refer to other patterns as {name}. The resulting regular expression for the pattern ‘person’ will be:
    \w+\.[\s/-]+(?:\w+[\s/-]+)*?\w+[\s/-]+(?<birthday>\d+\.\d+\.\d+)

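A quick way to check the expanded expression is to run it against a sample string and read the named group. The following is only a sketch: the class name PersonPatternCheck, the helper ExtractBirthday and the sample input are made up for illustration, and the dots in the date part are escaped so that they match literal dots only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PersonPatternCheck
{
    // The expanded 'person' expression, with the dots in the date escaped.
    public const string Person =
        @"\w+\.[\s/-]+(?:\w+[\s/-]+)*?\w+[\s/-]+(?<birthday>\d+\.\d+\.\d+)";

    // Returns the captured birthday, or null when the text does not match.
    public static string ExtractBirthday(string text)
    {
        Match m = Regex.Match(text, Person);
        return m.Success ? m.Groups["birthday"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the original post.
        Console.WriteLine(ExtractBirthday("Mr. John Smith 29.10.1980"));
    }
}
```

For this sample the title is "Mr.", the lazy {words} part grows until it covers "John Smith", and the birthday group captures 29.10.1980.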
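You can do this with the following class. Note that it also prefixes each expanded template with an inline comment of the form (?#name), so the string it returns is longer than the expression shown above. The regex engine ignores these comments, so the ‘person’ pattern still matches the same text.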
    using System;
    using System.Collections.Specialized;
    using System.IO;
    using System.Text.RegularExpressions;

    public class Lexer
    {
        // Maps a template name to its unexpanded expression.
        private NameValueCollection col;

        public Lexer()
        {
            col = new NameValueCollection();
        }

        // Parses lines of the form "name   expression" into a Lexer.
        public static Lexer Create(string resource)
        {
            Lexer lex = new Lexer();
            using (StringReader sr = new StringReader(resource))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    Match m = Regex.Match(line, @"([\w_]+)\s+(.*)");
                    if (m.Success)
                    {
                        lex.col.Add(m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim());
                    }
                }
            }
            return lex;
        }

        // Recursively expands every {token} in the named template.
        public string GetExpression(string name)
        {
            if (string.IsNullOrEmpty(name)) return string.Empty;
            string res = col[name];
            if (res == null) throw new ArgumentException("Template not found: " + name, "name");

            // A template containing an alternation is wrapped in a group
            // so that it stays intact when embedded in another template.
            bool needGroup = res.IndexOf('|') >= 0;
            // Match {token}, but skip Unicode categories such as \p{L}.
            // Token names must be at least two characters long.
            Regex reg = new Regex(@"(?<!\\p){([a-zA-Z][\w_]+)}");
            Match m = reg.Match(res);
            while (m.Success)
            {
                string token = m.Groups[1].Value;
                string exp = GetExpression(token);
                if (exp != null && exp.Length > 0)
                    res = res.Replace("{" + token + "}", exp);
                m = m.NextMatch();
            }
            string result = res;
            if (needGroup)
            {
                result = "(?:" + res + ")";
            }
            // Prefix an inline comment so the origin of each part stays visible.
            result = "(?#" + name + ")" + result;

            return result;
        }
    }


Then we can create an instance of the class and get the regular expression:

    Lexer lex = Lexer.Create(txtLexerText.Text);
    string expr = lex.GetExpression("person");
    Regex reg = new Regex(expr);

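To try the idea without a form or a text box, the expansion step can also be sketched as a small console program. This is a compact variant, not the Lexer class from this post: it keeps the templates in a Dictionary, expands them with Regex.Replace and a MatchEvaluator, and leaves out the (?#name) comments and the automatic grouping of alternations. It also accepts one-character template names. The names TemplateDemo, Sample and Expand are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateDemo
{
    // Matches {name} placeholders, skipping Unicode categories such as \p{L}.
    private static readonly Regex Placeholder =
        new Regex(@"(?<!\\p)\{([a-zA-Z]\w*)\}");

    // The template table from the post, as name/expression pairs.
    public static readonly Dictionary<string, string> Sample =
        new Dictionary<string, string>
        {
            { "space", @"[\s/-]+" },
            { "word", @"\w+" },
            { "words", @"(?:{word}{space})*?{word}" },
            { "birthday", @"(?<birthday>\d+\.\d+\.\d+)" },
            { "title", @"{word}\." },
            { "name", @"{words}" },
            { "person", @"{title}{space}{name}{space}{birthday}" },
        };

    // Recursively replaces every {name} in the named template with its expansion.
    // Assumes the templates do not refer to themselves, directly or indirectly.
    public static string Expand(IDictionary<string, string> templates, string name)
    {
        string body;
        if (!templates.TryGetValue(name, out body))
            throw new ArgumentException("Template not found: " + name, "name");
        return Placeholder.Replace(body, m => Expand(templates, m.Groups[1].Value));
    }

    public static void Main()
    {
        // Prints the fully expanded 'person' expression.
        Console.WriteLine(Expand(Sample, "person"));
    }
}
```

Because the evaluator's return value is inserted literally, backslashes and dollar signs in a template need no extra escaping during expansion.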