Bytes IT Community

Colour Code Display Page Using Regular Expressions?

Hi,

I wanted a web page where I could post code and have it displayed with colour highlighting based on its syntax.

Most of the techniques I have seen for this rely on complex string manipulation, seeking back and forth through the string and substituting in the needed HTML.

I am convinced that this can be done with a few regular expressions. Unfortunately my knowledge of regular expressions is limited, so while I've built what I think should work, my testing was a complete failure.
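A minimal sketch of that idea, reduced to a single pattern rather than a chain of `preg_replace()` calls. It assumes the keyword lists and CSS class names from the PHP code in this post; `highlight_code()` is a hypothetical name, and the number rule (digits, an optional decimal part, an optional `f` suffix) is an assumption about the language. Each alternative is a named group, and `preg_replace_callback()` scans left to right without revisiting text it has already matched, so a keyword inside a comment or string is never highlighted separately. The negated class `[^"\\\n]` is the "everything except a quote mark" form, and the lazy `.*?` with the `s` modifier ends a block comment at the first `*/`.

```php
<?php
// Sketch: single-pass syntax highlighter built on one combined regex.
// highlight_code() is a hypothetical name; the keyword lists and CSS
// class names are the ones used in this post.
function highlight_code(string $code): string
{
    $datatypes = 'void|int|string|object|vector|float|struct';
    $keywords  = 'if|while|do|switch|case|else|return|break|for|default';
    $special   = 'OBJECT_SELF|OBJECT_INVALID|TRUE|FALSE';

    // Alternatives are tried in order at each position, so comments and
    // strings win over keywords and numbers that appear inside them.
    $pattern = '~'
        . '(?<comment>/\*.*?\*/|//[^\n]*)'              // /* block */ or // line
        . '|(?<string>"(?:[^"\\\\\n]|\\\\.)*")'         // "..." allowing \" escapes
        . '|(?<number>\b\d+(?:\.\d+)?f?\b)'             // 10, 0.123f (assumed syntax)
        . '|(?<datatype>\b(?:' . $datatypes . ')\b)'    // whole words only
        . '|(?<keyword>\b(?:' . $keywords . ')\b)'
        . '|(?<special>\b(?:' . $special . ')\b)'
        . '~s';                                         // s: "." also matches newlines

    $classes = [
        'comment'  => 'code_comment',
        'string'   => 'code_string',
        'number'   => 'code_number',
        'datatype' => 'code_key_data',
        'keyword'  => 'code_key_syntax',
        'special'  => 'code_key_special',
    ];

    // Escape <, > and & first. Quotes are left alone (ENT_NOQUOTES) so the
    // string rule can still see them; they are harmless in element text.
    $escaped = htmlspecialchars($code, ENT_NOQUOTES | ENT_SUBSTITUTE);

    return preg_replace_callback($pattern, function (array $m) use ($classes): string {
        // The non-empty named group is the alternative that matched.
        foreach ($classes as $group => $class) {
            if (isset($m[$group]) && $m[$group] !== '') {
                return '<span class="' . $class . '">' . $m[$group] . '</span>';
            }
        }
        return $m[0];
    }, $escaped);
}
```

Wrapping the result in `<pre>` keeps the indentation and line breaks, e.g. `echo '<pre>' . highlight_code($code) . '</pre>';`. Escaping with `ENT_NOQUOTES` before matching converts only `<`, `>` and `&`, so the quote marks the string rule depends on are still present.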

I wondered if anybody could help me out.

Below is the test code that I am running my PHP code against:

[CODE]
/****************
    This is a test function to check on
//  the regular expressions for code
    colouring.
*************/
//  Just checking.
void testDoTesting(int nTest = 0, string sTest = "thisstring")
{
    int nLoop = 1;
    float fNull = 0.123f;

    do
    {
        if (nLoop = 3)
        {
            nTest = 100;
        }
        else if (nLoop = 2)
        {
            return;
        }
        else
        {
            sTest = "Nah nah";
        }
    } while (nLoop <= 10);

    for (nLoop = 1; nLoop <= 10; nLoop++)
    {
        switch (nLoop)
        {
            //Included this function name as it has
            //void keyword in the middle.
            case 1: DoAvoidThisFunc(); break;
            case 2: sTest = "I reached number 2!";
            case default: return;
        }
    }
}
[/CODE]
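The `DoAvoidThisFunc` line in this test input is a good check for whole-word matching. A minimal sketch, assuming PCRE's `\b` word boundary is an acceptable definition of "not in the middle of another word": `\b` is zero-width, so the space, tab or parenthesis next to a keyword is checked but not consumed, and nothing has to be put back with `${1}` and `${3}`. `mark_words()` is a hypothetical helper. The alternatives are grouped with parentheses; inside square brackets, `void|int` would be a character class matching a single character from the set `v o i d | n t`.

```php
<?php
// Wrap whole-word occurrences of any of $words in a <span>.
// mark_words() is a hypothetical helper; $class is assumed to be a plain
// CSS class name (no "$" or "\", which preg_replace() treats specially).
function mark_words(string $text, array $words, string $class): string
{
    // An empty alternation would match the empty string at every boundary.
    if (!$words) {
        return $text;
    }

    // Escape regex metacharacters in each word before joining them.
    $quoted = array_map(function (string $w): string {
        return preg_quote($w, '/');
    }, $words);

    // (?:a|b) groups the alternatives; [a|b] would be a one-character class.
    // \b is zero-width, so the characters around a match stay in place.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';

    // '$0' (the whole match) is in single quotes, so PHP passes it through
    // to preg_replace() untouched.
    return preg_replace($pattern, '<span class="' . $class . '">$0</span>', $text);
}
```

With a pattern that consumes the character on each side, the single space in `int int` would be used up by the first match, so the second `int` would not be found; the zero-width version marks both.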
This is my PHP (with comments):

[PHP]
//This function loops through a list brought back from a MySQL query to build two arrays,
//which can be used in preg_replace as find/replace params.
//This is a list of all the functions in the database so that we can integrate URL links
//into the web page for easier navigation to the code for functions being used.
//$sql is a custom SQL result set object. $script is an optional URL param.
function getFuncKeywords($sql, $script = 0)
{
    //The base URL to use in the links (the ? is escaped).
    $nav = "\?navpage=code";

    //The start and finish elements of the search pattern. This was to try and avoid
    //having patterns which found keywords in the middle of other words. The problem
    //with tagging these on is that although it will (hopefully) narrow down the
    //search and replace to only correct elements of text, I think it will result in
    //the replace getting rid of the surrounding characters as well. In which case, is
    //there a way to use these only for the search pattern, but not replace them in the
    //returned string (or indicate to replace them with themselves)?

    //Most functions in C-based code will have a space, newline, tab or
    //opening parenthesis just prior to them (if they are nested within
    //another function call).
    $pat_start = "[\n|\s|\t|\(]";

    //Most functions will be followed by an opening parenthesis to list their
    //parameters.
    $pat_end = "[\n|\s|\(]";

    $func = array();
    $rep = array();

    $sql->fetch_row(1);

    //Cycle through SQL result rows
    do {
        //The pattern would be the above start and end character classes sitting around the
        //name of the function being searched for.
        $func[] = $pat_start . $sql->rows['NAME'] . $pat_end;

        //One aspect I am not sure will work is the use of back references.
        //As I understand it, this should result in:
        //${1} is what was prior to the keyword found.
        //${2} is the keyword found.
        //${3} is the rest of the text.
        //I had to escape the $ symbols at this point or the array value becomes a blank string.
        //I don't know if this will stop the back references from working later on when it does
        //the replace.
        //The addslashes bit was because a function definition may include literal quote
        //marks if there is a string param with a default initialisation.
        $rep[] = "\${1}<a href=\"" . $nav . "&func=" . $sql->rows['UID'] . "\" alt=\"" . addslashes($sql->rows['DEF']) . "\">\${2}</a>\${3}";

    } while ($sql->fetch_result());

    return array('find' => $func, 'replace' => $rep);
}

//This is the main function to process the regular expressions and return a colour-coded version.
//$code is the string with the code to be processed. $keylinks is the array returned by the above function.
function getFuncOutput($code, $keylinks)
{
    //Most keywords will have the same characters preceding them as functions.
    $pat_start = "[\n|\s|\t|\(]";

    //However, they may be followed by newlines or spaces, or more likely be followed by a
    //semicolon to terminate the line. It's also possible for them to appear in a function
    //call as its params, so commas and closing parentheses too.
    $pat_end = "[\n|\s|\;|\)|\,]";

    //The patterns to use within the character classes for the below patterns.
    //In general I am escaping virtually all characters to stop them from having some
    //other meaning in the patterns.

    //This is for finding literal numbers represented in the code (integers and floats).
    $val_number = "0-9\.\f";

    //This is for finding a literal string quotation within the code. It allows all punctuation
    //except for quote marks, which it uses to identify the start and end of a string.
    //(This currently lists which characters to accept; is there a way to say 'allow all characters
    //except the quote marks'?)
    $val_string = "0-9a-zA-Z\.\*\&\!\-\+\[\]\{\}\<\>\;\:\#\'\(\)\=\/\\\\";

    //This is for finding comment blocks in the code. Comments can include virtually all
    //characters, including backslashes, but we need to exclude forward slashes as they
    //typically indicate the end of a block (such as */).
    $val_comment = "0-9a-zA-Z\.\*\&\!\-\+\[\]\{\}\<\>\;\:\#\'\(\)\=\"\\\\";

    $val_datatypes = "void|int|string|object|vector|float|struct";
    $val_keywords = "if|while|do|switch|case|else|return|break|for|default";
    $val_special = "OBJECT_SELF|OBJECT_INVALID|TRUE|FALSE";

    //The basic form of the below patterns is to tag the above pattern start and end classes around
    //a character class with the above listed strings. Some of these have additional pattern
    //elements surrounding the class.
    //The replacement strings are HTML tags to create a span with a given CSS style. I use back
    //references because I don't know which of the keywords it may have found in the pipe-separated
    //classes. Otherwise I would have to create an entry for each individual keyword in an array.
    $return = $code;

    //To find literal numbers.
    $return = preg_replace($pat_start . "([" . $val_number . "]+)" . $pat_end, "${1}<span class=\"code_number\">${2}</span>${3}", $return);

    //To find literal strings I look for the valid character class values encased between
    //quote marks. I use * (zero or more) as the quantifier as there may be literal blank string values.
    $return = preg_replace($pat_start . "\"([" . $val_string . "]*)\"" . $pat_end, "${1}<span class=\"code_string\">${2}</span>${3}", $return);

    //The first set of comments are one-liners, where the comment is preceded by a
    //double forward slash, and does not end until the end of the line.
    $return = preg_replace($pat_start . "\/\/([" . $val_comment . "]*)\n", "${1}<span class=\"code_comment\">${2}</span>${3}", $return);

    //The second set are block comments, where the comment is preceded by a
    //forward slash and asterisk, and does not end until an asterisk then a forward slash.
    $return = preg_replace($pat_start . "\/\*([" . $val_comment . "]*)\*\/" . $pat_end, "${1}<span class=\"code_comment\">${2}</span>${3}", $return);

    $return = preg_replace($pat_start . "[" . $val_datatypes . "]" . $pat_end, "${1}<span class=\"code_key_data\">${2}</span>${3}", $return);
    $return = preg_replace($pat_start . "[" . $val_keywords . "]" . $pat_end, "${1}<span class=\"code_key_syntax\">${2}</span>${3}", $return);
    $return = preg_replace($pat_start . "[" . $val_special . "]" . $pat_end, "${1}<span class=\"code_key_special\">${2}</span>${3}", $return);

    //This is the last bit where we replace all instances of function names in the above created
    //arrays with their HTML URL link versions.
    //I don't know if the back references would work here given I had to escape them in the above
    //process.
    $return = preg_replace($keylinks['find'], $keylinks['replace'], $return);

    return $return;
}
[/PHP]
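For the function links, a sketch that avoids the back-reference escaping question by building each link in a `preg_replace_callback()` callback. Assumptions: the custom `$sql` result set is replaced by a plain array of rows with the `NAME`, `UID` and `DEF` keys used above, and `link_functions()` is a hypothetical name. A few details differ from the post: every `preg_*` pattern needs delimiters such as `/.../` (a pattern starting with `[` is taken as bracket-delimited, ending at the first unescaped `]`, so the rest is read as modifiers and `preg_replace()` warns about an unknown modifier and returns null); a `?` in a URL needs no backslash, since it is only special inside a pattern; attribute values are escaped with `htmlspecialchars()` rather than `addslashes()`; and `title` is used because `alt` is not an attribute of `<a>`.

```php
<?php
// Link every call or definition of a known function to its page.
// Assumption: $rows is a plain array of ['NAME' => ..., 'UID' => ..., 'DEF' => ...]
// (the keys from the post); link_functions() is a hypothetical name.
function link_functions(string $html, array $rows, string $baseUrl = '?navpage=code'): string
{
    $byName = [];
    foreach ($rows as $row) {
        $byName[(string) $row['NAME']] = $row;
    }
    if (!$byName) {
        return $html;
    }

    // One alternation of all names, each escaped with preg_quote().
    $names = implode('|', array_map(function ($name): string {
        return preg_quote((string) $name, '/');
    }, array_keys($byName)));

    // \b stops "Func" matching inside "MyFunc"; the lookahead (?=\s*\()
    // requires a following "(" without consuming it.
    $pattern = '/\b(?:' . $names . ')\b(?=\s*\()/';

    return preg_replace_callback($pattern, function (array $m) use ($byName, $baseUrl): string {
        $row = $byName[$m[0]];
        // htmlspecialchars() turns & into &amp; and " into &quot;, so the
        // attribute values cannot break out of their quotes.
        $href  = htmlspecialchars($baseUrl . '&func=' . $row['UID'], ENT_QUOTES);
        $title = htmlspecialchars($row['DEF'], ENT_QUOTES);
        return '<a href="' . $href . '" title="' . $title . '">' . $m[0] . '</a>';
    }, $html);
}
```

Run on HTML that has already been highlighted, this would also link a name inside a comment or string literal when it is followed by `(`; whether that matters is a judgement call for the page.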

Any and all help would be greatly appreciated.

Regards,
Rob.
Mar 7 '07