Regex, TextReader...?

I have attached a block of text similar to the type that I am working
with.

I have been learning a lot about Regex - it is quite impressive. I can
easily capture bits of info, but I keep having trouble with line breaks.

I want to identify the start and end of blocks of text. Are there some
tips someone can share?

E.g., in my text, I can grab a collection of everyone's phone numbers with:
^"M:"\t"(?<PhoneNumber>[^"]*)"

But what if I wanted to grab many lines, up to the point where a certain
pattern matches? I use the ^ inside [^"] to say "not the quote", but can I
say "not 14 hyphens"?

The way I currently split this type of data is inefficient. I match every
occurrence of:
^-{14}
Then I use a lot of index arithmetic to split the file at the positions of
the matches. I am sure Regex must have some way to express a complex "not"
that marks the end of my match?

Thank you.


--------------
"M:" "3242310532"
"Subscriber Name:" "MR Regex"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""

"CURRENT CHARGES"
"Monthly Service Plan" $40.00
"Additional Local Airtime" $0.00
"Long Distance Charges" $0.00
"Roaming Charges" $0.00
"Network and Licensing Charges" $7.20
"Total Taxes:" $7.09
"Total Current Charges:" $47.20

"MONTHLY SERVICE PLAN" 11-Oct-03 to 10-Nov-03
"Service Plan Name" "Total"
"Mike Dispatch 40 (11-Oct-03 to 10-Nov-03)" $40.00
"Total Monthly Service Plan Charges" $40.00

"ADDITIONAL LOCAL AIRTIME"
"Service" "Total Mins. Used" "Free Mins. Used" "Included Mins.
Used" "Chargeable Mins. Used" "Total"
"Direct Connect Private (minutes)" 28:04 28:04 0:00 0:00 $0.00
"Total Additional Local Airtime Charges" $0.00

"LONG DISTANCE, ROAMING AND OTHER CALL CHARGES"
"Service" "Incl. LD Minutes" "Chargeable LD Minutes" "Total"
"Total Long Distance Charges" $0.00

"ROAMING"
"Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes"
"Roaming LD Charges" "Roaming Surcharge" "Total"
"Total Roaming Charges" $0.00

"WIRELESS WEB - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Wireless Web Premium Services Charges" $0.00

"PHONE - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Phone Premium Services Charges" $0.00

"PAGER SERVICES"
"Service" "Total Messages" "Included Messages" "Chargeable
Messages" "Total"
"Total Pager Charges" $0.00

"VALUE-ADDED SERVICES" 11-Oct-03 to 10-Nov-03
"Service" "Total"
"Wireless Web - Surf Sampler (11-Oct-03 to 10-Nov-03)" $0.00
"Total Value Added Service Charges" $0.00

"OTHER CHARGES AND CREDIT"
"Charge or Credit" "Total"
"Total Other Charges and Credits" $0.00

"NETWORK and LICENSING CHARGES"
"Service" "Total"
"911 Emergency Access Charge (11-Oct-03 to 10-Nov-03)" $0.25
"System Licensing Charge (11-Oct-03 to 10-Nov-03)" $6.95
"Total Network Licensing Charges" $7.20

"TAXES"
"" "Total"
"Total Taxes" $7.09

--------------
"M:" "9042437121"
"Subscriber Name:" "Fred 1"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""

"CURRENT CHARGES"
Nov 22 '05 #1
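
One way to approach the "not 14 hyphens" question, sketched in C# since the thread concerns .NET's Regex class. Rather than negating the separator, the text can be split on it, or each block can be captured lazily up to a lookahead for the next separator. The names RecordSplitter, SplitRecords and MatchRecords and the shortened sample are invented for illustration, and the sketch assumes a separator is exactly 14 hyphens alone on a line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // A separator is a line of exactly 14 hyphens. Multiline makes ^ match at
    // the start of every line; \r?\n accepts Windows and Unix line endings.
    private static readonly Regex Separator =
        new Regex(@"^-{14}[ \t]*\r?\n", RegexOptions.Multiline);

    // Alternative: capture everything after a separator up to (but not
    // including) the next separator or the end of the text. Singleline lets
    // '.' cross line breaks; the lookahead (?=...) does not consume anything.
    private static readonly Regex Record = new Regex(
        @"^-{14}[ \t]*\r?\n(?<Body>.*?)(?=^-{14}[ \t]*\r?$|\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns one string per record, without the separator lines.
    public static string[] SplitRecords(string text)
    {
        return Separator.Split(text)
                        .Where(block => block.Trim().Length > 0)
                        .ToArray();
    }

    // Returns the same records, found by matching instead of splitting.
    public static string[] MatchRecords(string text)
    {
        return Record.Matches(text)
                     .Cast<Match>()
                     .Select(m => m.Groups["Body"].Value)
                     .ToArray();
    }

    public static void Main()
    {
        // Shortened, hypothetical sample in the shape of the posted data.
        string text =
            "--------------\n" +
            "\"M:\"\t\"3242310532\"\n" +
            "\"Subscriber Name:\"\t\"MR Regex\"\n" +
            "--------------\n" +
            "\"M:\"\t\"9042437121\"\n" +
            "\"Subscriber Name:\"\t\"Fred 1\"\n";

        foreach (string record in SplitRecords(text))
        {
            Console.WriteLine("--- record ---");
            Console.WriteLine(record.TrimEnd());
        }
        Console.WriteLine("Records found by matching: " + MatchRecords(text).Length);
    }
}
```

Either method hands back whole records, which can then be searched one at a time with simpler patterns such as the phone-number one, so no index arithmetic is needed.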
4 Replies


Yes, you can do it in regex. The trick is to allow your pattern to match
more than one time. For example, if I had something like:

1234
34123
11313
113133
xxxxx

I could write something like:

(?<Numbers>^\d+$)+xxxxx

Which means that I need to look at the group's Captures collection
(match.Groups["Numbers"].Captures) instead of just the group's value, IIRC.

Note that in most uses of this technique, what you really need to write is
something like:

((?<Numbers> match numbers) match stuff between numbers)+xxxxx

so that the match can continue. You may also need to play around with the
singleline and multiline options.

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
Nov 22 '05 #2

Thank you Eric. I was already using a capture group: in my first example,
with my sample text, I used (?<PhoneNumber>[^"]*) to capture everything up
to the next " into my PhoneNumber collection.

In this simple example, where I want to capture the Field1 and Field5
values, I cannot reliably write a regex for the 'everything between the
numbers' part.

My attempt (doesn't work; the trouble spot is the [^Field5:]* part):
Field1:\s(?<F1>[0-9]*)[^Field5:]*Field5:\s(?<F5>[0-9.$]*)

Field1: 1234
Field2: 34123
Field3: 1313
Field4: 13133
Field5: $xxxx.00
Field6: 2342df
Field1: 2342
Field2: 33241
Field3: 2142
Field4: 543523
Field5: $342.00
Field6: 43254
Field1: 3415
Field2: 234235
Field3: 341
Field4: 13212533
Field5: $5234.00
Field6: 32415

Of course, I can run two separate captures, but...

You gave the example technique: ((?<Numbers> match numbers) match stuff
between numbers)+xxxxx

Does this +xxxxx match everything until the xxxxx is found? In my regex
tools (I use Expresso and Regex Workshop, both .NET tools) there are no
matches.

Thanks,

Masa

Nov 22 '05 #3
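
A note on the [^Field5:]* part of the attempt above: square brackets always describe a single character, so [^Field5:] means "one character that is not F, i, e, l, d, 5 or a colon", and the match stops at the F of Field2. A negative lookahead can express "not the text Field5:" instead. The sketch below is a hedged illustration in C#; the class name FieldPairs, the method Extract and the shortened two-record sample are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldPairs
{
    // (?:(?!Field5:).)* reads "any character, as long as the text 'Field5:'
    // does not start here". Singleline lets '.' match line breaks as well.
    private static readonly Regex Pair = new Regex(
        @"Field1:\s(?<F1>[0-9]*)(?:(?!Field5:).)*Field5:\s(?<F5>[0-9.$]*)",
        RegexOptions.Singleline);

    // Returns one "F1 -> F5" string per record found in the text.
    public static List<string> Extract(string text)
    {
        var pairs = new List<string>();
        foreach (Match m in Pair.Matches(text))
        {
            pairs.Add(m.Groups["F1"].Value + " -> " + m.Groups["F5"].Value);
        }
        return pairs;
    }

    public static void Main()
    {
        // Two of the records from the post, shortened.
        string text =
            "Field1: 2342\nField2: 33241\nField5: $342.00\nField6: 43254\n" +
            "Field1: 3415\nField2: 234235\nField5: $5234.00\nField6: 32415\n";

        foreach (string pair in Extract(text))
        {
            Console.WriteLine(pair);
        }
    }
}
```

A lazy .*? with the same Singleline option would do the same job on this data; the lookahead form just spells out the "not" explicitly. With the sample in Main, this prints 2342 -> $342.00 and 3415 -> $5234.00, one match per record.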

I'm a little confused about what you're trying to do. Given your example text,
what is the expected output that you want?

If I assume that you didn't mean to write xxxx.00 for the first Field5 value,
the following regex may do what you want:

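new Regex(@"
    (
        (?<S2>.*?)                   # anything before the next Field1
        Field1:\s(?<F1>[0-9]*)       # the Field1 value
        (?<S1>.+?)                   # anything between Field1 and Field5
        Field5:\s(?<F5>[0-9.\$]+)    # the Field5 value
    )+",
    // Singleline lets . cross line breaks, which S1 and S2 need to do here.
    RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);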

All the F1 values will be in the Captures collection of the F1 group, and all
the F5 values in the Captures collection of the F5 group. I named the S1 and
S2 groups so you could see what they're matching.
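
To show how the repeated groups are read back, here is a small sketch that runs the pattern above and walks the Captures collections of F1 and F5 side by side. It assumes RegexOptions.Singleline is set so that . can cross line breaks; the class name CaptureDemo, the helper CapturedValues and the two-record sample are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // The pattern from the post, compiled with IgnorePatternWhitespace and
    // Singleline. Singleline is what lets '.' in S1 and S2 cross line breaks.
    private static readonly Regex Fields = new Regex(@"
        (
          (?<S2>.*?)
          Field1:\s(?<F1>[0-9]*)
          (?<S1>.+?)
          Field5:\s(?<F5>[0-9.\$]+)
        )+",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Returns every value the named group captured across all repetitions.
    public static string[] CapturedValues(string text, string groupName)
    {
        Match m = Fields.Match(text);
        CaptureCollection captures = m.Groups[groupName].Captures;
        var values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string text =
            "Field1: 2342\nField2: 33241\nField5: $342.00\nField6: 43254\n" +
            "Field1: 3415\nField2: 234235\nField5: $5234.00\nField6: 32415\n";

        string[] f1 = CapturedValues(text, "F1");
        string[] f5 = CapturedValues(text, "F5");
        for (int i = 0; i < f1.Length; i++)
        {
            Console.WriteLine(f1[i] + " -> " + f5[i]);
        }
    }
}
```

Pairing by index only holds while every Field1 is followed by a Field5. Reading Groups["F1"].Value on its own would return just the last capture, which is why the Captures collection is needed here.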

I'd suggest using my Regex Workbench at
http://www.gotdotnet.com/Community/U...1-4EE2729D7322 -
it makes playing around with Regex much easier.

--
Eric Gunnerson

Nov 22 '05 #4

Thanks Eric. Actually, I was using your Regex Workbench already - it is
great! Thank you for sharing it.

Something is not clicking with me and these regular expressions. Even when I
paste your regex, I don't believe I am getting the results you intended.
In the sample I posted, I am trying to capture the Field1 and Field5
values. I can capture them separately, but I can't seem to grasp the 'skip
everything until a specific pattern is matched' part.

I am trying to break down your sample piece by piece. Does the @ at the
start do something?

Also, using Regex Workbench, using your sample in your first reply, I am
not getting any matches.
String:
1234
34123
11313
113133
xxxxx

Regex:
(?<Numbers>^\d+$)+xxxxx

I have tried every permutation I can think of with Multiline/Singleline, etc.
I feel like I am going crazy.

Thank you.

Masa

Nov 22 '05 #5
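
Two points from this last post can be checked with a short C# sketch. First, the @ belongs to C#, not to the regex: it starts a verbatim string literal in which backslashes are taken literally, so only the text between the quotes is the pattern. Second, the posted pattern finds nothing under any Multiline/Singleline combination because ^ and $ are zero-width and nothing in the pattern consumes the line break between two numbers. The amended pattern is one possible fix, not necessarily what Eric had in mind, and the class and member names (WhyNoMatch, Posted, Amended) are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhyNoMatch
{
    public const string Input = "1234\n34123\n11313\n113133\nxxxxx";

    // The pattern as posted: nothing in it ever consumes a line break.
    public const string Posted = @"(?<Numbers>^\d+$)+xxxxx";

    // One possible amendment: each repetition also consumes its line break.
    public const string Amended = @"(?:^(?<Numbers>\d+)\r?\n)+xxxxx";

    // Reports whether the pattern matches Input under the given options.
    public static bool Matches(string pattern, RegexOptions options)
    {
        return Regex.IsMatch(Input, pattern, options);
    }

    public static void Main()
    {
        // @"\d" and "\\d" are the same two-character string.
        Console.WriteLine(@"\d" == "\\d");

        RegexOptions[] combos =
        {
            RegexOptions.None,
            RegexOptions.Multiline,
            RegexOptions.Singleline,
            RegexOptions.Multiline | RegexOptions.Singleline
        };
        foreach (RegexOptions options in combos)
        {
            Console.WriteLine(options + ": " + Matches(Posted, options));
        }

        // With Multiline, ^ matches at the start of each line.
        Match m = Regex.Match(Input, Amended, RegexOptions.Multiline);
        foreach (Capture c in m.Groups["Numbers"].Captures)
        {
            Console.WriteLine(c.Value);
        }
    }
}
```

All four option combinations report no match for the posted pattern, while the amended one matches and yields the four numbers as separate captures. If a whole new Regex(@"...") expression is pasted into a regex tool rather than just the pattern text, that alone could also explain missing matches.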
