Bytes | Software Development & Data Engineering Community
Regular Expressions -- count lines with a specific pattern in a flat file

I have a CSV file like so:

"HDR",20060629133932,"9845","9083","0010"
1,"3","000000000690","000007","rsM4hJXR5Ik0O8RWghj tDBlUVAOZq7tO","BAR","0010","","",20.00
2,"3","000000000691","000007","65Xbp5dMcDFflPJnxWC rsJtV1jzcUjgd","BAR","0010","","",20.00
3,"3","000000000692","000007","SEjcf3eDA7hWmwGrNsL WoCWt1Geyh4GN","BAR","0010","","",20.00
4,"3","000000000693","000007","MJMkrp/kRMMGimeZo1uFOJzeDTVeOkFU","BAR","0010","","",20.00
5,"3","000000000694","000007","fDIBFgockQHhN+eVQxE BqqrJfZ78roja","BAR","0010","","",20.00
......and so on...

Each file has about a million records or more. Instead of iterating
through each line, counting line breaks, skipping the header and
footer records and counting only data records, I thought of writing a
regex pattern to do the same. Here's what I've written to count only data
records, i.e. rows that start with a number followed by a comma and then
any other text, ending with a line break.

numRecords = System.Text.RegularExpressions.Regex.Matches(ret,
"(?m)^[0-9]{1, 6}*$",
System.Text.RegularExpressions.RegexOptions.Multiline).Count;

I get a zero match collection count.

Sep 4 '06 #1
So using that form of .Matches means that you have to load the entire string
at once? Bet that's fast... ;-p Especially with the lookaround ...

However, it makes sense that it fails:

^[0-9]{1,6}*$

says start of line, then "between 1 and 6 digits" "zero or more times", then end of
line, with nothing else; well, the commas and quotes seem to get in the way.
Did you mean

^[0-9]{1-6},.*$

which is start of line, "between 1 and 6 digits", comma, "zero or more chars
except newline", end of line
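
As a rough illustration of that reading of the pattern, here is a minimal sketch that runs it over a small in-memory string. It assumes the quantifier is meant to be {1,6}, as in the original question; the class name, method name and shortened sample rows are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineCountSketch
{
    // Counts lines that start with 1 to 6 digits followed by a comma.
    // RegexOptions.Multiline makes ^ and $ match at every line, not just
    // at the start and end of the whole string.
    public static int CountDataRows(string text)
    {
        return Regex.Matches(text, @"^[0-9]{1,6},.*$", RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Shortened rows modeled on the sample file: one header, two data rows.
        string sample =
            "\"HDR\",20060629133932,\"9845\",\"9083\",\"0010\"\n" +
            "1,\"3\",\"000000000690\",\"000007\",\"BAR\",20.00\n" +
            "2,\"3\",\"000000000691\",\"000007\",\"BAR\",20.00\n";

        // The header line starts with a quote, so only the data rows match.
        Console.WriteLine(CountDataRows(sample)); // prints 2
    }
}
```

This still needs the whole file in one string, which is the cost discussed next.
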

However, for performance I would still suggest using line by line,
stream-based, reading, and also re-using a single Regex instance (ideally
precompiled):

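using System.IO;
using System.Text.RegularExpressions;

// Data rows start with 1 to 6 digits followed by a comma.
Regex re = new Regex("^[0-9]{1,6},.*$", RegexOptions.Compiled);
int count = 0;
// Read one line at a time so the whole file is never held in memory.
using (StreamReader reader = File.OpenText(path)) {
    while (!reader.EndOfStream) {
        string line = reader.ReadLine();
        if (!string.IsNullOrEmpty(line) && re.IsMatch(line))
            count++;
    }
}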

Marc
Sep 4 '06 #2
Sorry - typo by me: I meant {1,6} (as per your original example) wherever I
wrote {1-6}, i.e. "^[0-9]{1,6},.*$" - although given we don't care about
the right-hand side it may also work with just "^[0-9]{1,6},".

Marc
Sep 4 '06 #3
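
For reference, the shorter prefix-only pattern from the last reply could be combined with the line-by-line approach roughly as follows. This is only a sketch under stated assumptions: the class and method names are invented for illustration, the file path is taken from the first command-line argument, and the counter accepts any TextReader so it can be tried on a string as well as a file.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class DataRowCounter
{
    // Prefix-only pattern: 1 to 6 digits at the start of the line, then a comma.
    // The rest of the line is not inspected.
    private static readonly Regex DataRow =
        new Regex("^[0-9]{1,6},", RegexOptions.Compiled);

    // Counts matching lines from any TextReader, one line at a time.
    public static int Count(TextReader reader)
    {
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (DataRow.IsMatch(line))
                count++;
        }
        return count;
    }

    public static void Main(string[] args)
    {
        // args[0] is a hypothetical path to the CSV file.
        using (StreamReader reader = File.OpenText(args[0]))
        {
            Console.WriteLine(Count(reader));
        }
    }
}
```

Header and footer rows that begin with a quote are skipped, as is any row whose leading number has more than six digits.
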
