
Regex parsing - numeric values with whitespace

I have rows of 8 numerical values in a text file that I have to parse. Each
value must occupy 10 spaces and can be either a decimal or an integer.

// these are valid - each fits in a 10-character block
123.8
123.8
1234.567
12345
12345
1234.567

// these are not valid
12 34.56 // whitespace in the middle of the value
1234 // occupies 11 spaces

// these are valid lines
// there is one value in each 10 character block
12345.678912345.678912345.678912345.678912345.6789 12345.678912345.678912345.6789
345.6789 345.6789 45.6789 45.678912345.6789 345.678912345.6789
2345.6789

In reality the data can be in many different formats (single values, rows
of 8 followed by an integer, rows of 10...). Currently I am parsing this with
code. For this example I read a line, break it up into an array of 8 values
of 10 characters each, and check to see if each one is numeric. If they are
all numeric, move on to the next line.

My biggest problem is how to deal with the whitespace. It can come before
the value, after the value, but not in the middle of the value. The value
can be of any length up to 10. It can be placed anywhere in the 10
character block and padded with whitespace. There are a lot of possible
variations. Also, I would like to generalize it to create the regular
expression on the fly for different formats.

Is there a way, using regular expressions, to specify a total width
including varying amounts of whitespace (depending on the size and position
of the value)?
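
To make the question concrete, here is a rough sketch of what such a pattern might look like for a single block. It is written in Python purely for illustration. The names `block_pattern`, `row_is_valid` and the `layout` list of (width, integer_only) pairs are hypothetical, and the number syntax (ASCII digits with an optional fractional part, no sign or exponent) is an assumption. A lookahead pins the total width, and the rest of the pattern allows padding around the value but not inside it. The line is still cut into blocks by position, so this is not a single expression for the whole row.

```python
import re

def block_pattern(width, integer_only=False):
    """Build, on the fly, a pattern for one fixed-width block."""
    # Assumed number syntax: ASCII digits with an optional fractional part.
    number = r"[0-9]+" if integer_only else r"[0-9]+(?:\.[0-9]+)?"
    # The lookahead requires exactly `width` characters up to the end of the
    # block; the remainder allows spaces before and after the value only.
    return re.compile(rf"(?=.{{{width}}}\Z) *{number} *\Z")

def row_is_valid(line, layout):
    """Check a line (without its newline) against a hypothetical layout,
    given as a list of (width, integer_only) pairs."""
    pos = 0
    for width, integer_only in layout:
        if not block_pattern(width, integer_only).match(line[pos:pos + width]):
            return False
        pos += width
    return pos == len(line)  # nothing may be left over after the last block
```

A row of 8 values would use `[(10, False)] * 8` as its layout. The "rows of 8 followed by an integer" case would append one more pair, for example `(5, True)` if the integer field were 5 characters wide (that width is made up for the example).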

Thanks for any help.

David
May 10 '06 #1


David,

I don't think that you are going to get any real performance gains using
RegEx here. I also don't think that you are going to get a maintenance gain
either, since I can't think of a regex feature which will let you do this.

On top of that, you already have code that does this. It really doesn't
seem too difficult. Just read each line, read 10 characters, trim it, see if
it is a number, repeat.
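
As a minimal sketch of that loop (in Python for illustration only; `parse_row` and its parameters are hypothetical names, and "is a number" is assumed to mean ASCII digits with at most one decimal point):

```python
def parse_row(line, count=8, width=10):
    """Return the row's values as floats, or None if any block is invalid.

    `line` is expected without its trailing newline.
    """
    if len(line) != count * width:
        return None
    values = []
    for start in range(0, count * width, width):
        # Read one block and trim the padding on either side.
        text = line[start:start + width].strip(" ")
        # Assumed numeric check: only ASCII digits and at most one '.'.
        # A space left inside the value makes this test fail.
        if not (text.isascii() and text.replace(".", "", 1).isdigit()):
            return None
        values.append(float(text))
    return values
```

Other layouts only need different `count` and `width` arguments, so no pattern has to be generated per format.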

Why fix it if it isn't broken?
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 10 '06 #2
