Ruby regex for removing C/Java-style /* ... */ comments


Question posted by: beatTheDevil (Newbie) on July 31st, 2007 06:52 PM
Hey guys,

As the title says, I'm trying to write a regular expression (regex/regexp) to remove the comments from code. This particular regex is meant to match /* ... */ comments.

I'm using Ruby 1.8.6.

Here's my regex:
multiline_comments = /\/\*(.*?)\*\//

When I try
myStr.gsub(multiline_comments, "")

I see no effect, even though the string has big fat comments in it. I tried this regex in irb with a couple of test strings, and it works perfectly. This leads me to think I don't understand some subtlety of file I/O, so here's all my code (cop-out, I know). I'm trying to write a very simple JavaScript compactor, but I want to preserve readability, so I'm only getting rid of unnecessary newlines, spaces between tokens, and comments. I DON'T want the whole file on one line like some compactors do. Anyway, here goes:

# Non-destructive JavaScript Packer
# =================================
#
# Reduces overall script filesize by removing comments
# and unnecessary whitespace. Does not affect variable naming,
# indentation, or line-by-line formatting in order to maximize
# readability.

def pack_line(file_line)

    return '' unless file_line

    #puts "The next line: " + file_line

    #kill one-line (//...) comments
    line_comments = /(\S*)\s*\/\/.*/
    intermed = file_line.gsub(line_comments, '\1')
    intermed += "\n" if intermed[intermed.length - 1] != "\n"

    #puts "\tAfter one-liner removal: " + intermed

    #kill unnecessary whitespace
    extra_whitespace = /([^(var|function|return|\s*)])[ \t]+(.*?)/
    intermed = intermed.gsub(extra_whitespace, '\1\2')

    #puts "\tAfter extra whitespace removal: " + intermed

    intermed
end

#performs the packing operation, returns a single string
#representing the packed document
def pack(file)
    lines = Array.new

    file.each_line do |line|
        lines.push pack_line(line)
    end

    intermed = lines.join

    #puts "\tBefore multi-liner removal: " + intermed

    #kill multi-line (/* ... */) comments
    multiline_comments = /\/\*(.*?)\*\//
    intermed = intermed.gsub(multiline_comments, '')

    #puts "\tAfter multi-liner removal: " + intermed

    #kill extra new lines
    extra_newlines = /(\r?\n){2,}/
    intermed = intermed.gsub(extra_newlines, "\n")

    #puts "\tFinally: " + intermed + "\n"

    intermed
end

#open file for reading and pass it to pack()
def init(in_file, out_file)
    file = File.new(in_file, "r")

    newDoc = pack(file)

    file.close

    if out_file then
        file = File.new(out_file, "w")

        file.puts(newDoc)

        file.close
    else
        puts newDoc
    end
end

#start the script with the command-line arg file name
puts init(ARGV[0], ARGV[1])


Any ideas? Thanks for all your help.
2 Answers Posted
beatTheDevil August 1st, 2007 11:22 PM
Newbie - 16 Posts
#2: Re: Ruby regex for removing C/Java-style /* ... */ comments

More specifically, even though it works fine on one-line strings, I think I've found that it can't match this style of comment across newlines ("\n"). Is there a way around this? I thought the '.' matched any character whatsoever...
roguesheep August 2nd, 2007 07:57 PM
Newbie - 1 Posts
#3: Re: Ruby regex for removing C/Java-style /* ... */ comments

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/11137
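For readers who can't follow the link: in Ruby, '.' matches any character except a newline unless the regex carries the /m modifier. That would explain why the pattern works on one-line test strings in irb but not on file contents. A minimal sketch, assuming this is the cause here; the sample string and variable names are made up for illustration and are not from the original script:

```ruby
# Hypothetical sample input with a comment that spans two lines.
source = "var a = 1; /* first\n   second */\nvar b = 2;\n"

# Without /m, "." stops at "\n", so the two-line comment is never matched.
single_line = /\/\*(.*?)\*\//

# With /m, "." also matches "\n", so the lazy .*? can run to the first "*/".
multi_line = /\/\*(.*?)\*\//m

unchanged = source.gsub(single_line, "")
stripped  = source.gsub(multi_line, "")

puts unchanged == source   # the comment is still there
puts stripped              # the comment is gone
```

Note also that gsub returns a new string and leaves the receiver alone, so a bare myStr.gsub(...) whose result is discarded will appear to do nothing; either assign the result or use gsub!.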