Regex, TextReader...?

I have attached a block of text similar to the type that I am working
with.

I have been learning a lot about Regex - it is quite impressive. I can
easily capture bits of info, but I keep having trouble with line breaks.

I want to identify the start and end of blocks of text. Are there some
tips someone can share?

EG: in my text, I can grab a collection of everyone's phone numbers with:
^"M:"\t"(?<PhoneNumber>[^"]*)"

But what if I wanted to grab many lines, up to the point where a certain
pattern matches? Inside a character class I use ^ to say "not a quote", but
can I say "not 14 hyphens"?

The way I split this type of data now is inefficient. I match all the
occurrences of:
^-{14}
Then I do a lot of index arithmetic to split the file at the positions of
the matches. Surely Regex has some way to express a complex "not" that
marks the end of my match?

Thank you.


--------------
"M:" "3242310532"
"Subscriber Name:" "MR Regex"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""

"CURRENT CHARGES"
"Monthly Service Plan" $40.00
"Additional Local Airtime" $0.00
"Long Distance Charges" $0.00
"Roaming Charges" $0.00
"Network and Licensing Charges" $7.20
"Total Taxes:" $7.09
"Total Current Charges:" $47.20

"MONTHLY SERVICE PLAN" 11-Oct-03 to 10-Nov-03
"Service Plan Name" "Total"
"Mike Dispatch 40 (11-Oct-03 to 10-Nov-03)" $40.00
"Total Monthly Service Plan Charges" $40.00

"ADDITIONAL LOCAL AIRTIME"
"Service" "Total Mins. Used" "Free Mins. Used" "Included Mins.
Used" "Chargeable Mins. Used" "Total"
"Direct Connect Private (minutes)" 28:04 28:04 0:00 0:00 $0.00
"Total Additional Local Airtime Charges" $0.00

"LONG DISTANCE, ROAMING AND OTHER CALL CHARGES"
"Service" "Incl. LD Minutes" "Chargeable LD Minutes" "Total"
"Total Long Distance Charges" $0.00

"ROAMING"
"Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes"
"Roaming LD Charges" "Roaming Surcharge" "Total"
"Total Roaming Charges" $0.00

"WIRELESS WEB - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Wireless Web Premium Services Charges" $0.00

"PHONE - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Phone Premium Services Charges" $0.00

"PAGER SERVICES"
"Service" "Total Messages" "Included Messages" "Chargeable
Messages" "Total"
"Total Pager Charges" $0.00

"VALUE-ADDED SERVICES" 11-Oct-03 to 10-Nov-03
"Service" "Total"
"Wireless Web - Surf Sampler (11-Oct-03 to 10-Nov-03)" $0.00
"Total Value Added Service Charges" $0.00

"OTHER CHARGES AND CREDIT"
"Charge or Credit" "Total"
"Total Other Charges and Credits" $0.00

"NETWORK and LICENSING CHARGES"
"Service" "Total"
"911 Emergency Access Charge (11-Oct-03 to 10-Nov-03)" $0.25
"System Licensing Charge (11-Oct-03 to 10-Nov-03)" $6.95
"Total Network Licensing Charges" $7.20

"TAXES"
"" "Total"
"Total Taxes" $7.09

--------------
"M:" "9042437121"
"Subscriber Name:" "Fred 1"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""

"CURRENT CHARGES"
Nov 22 '05 #1
Yes, you can do it in regex. The trick is to allow your pattern to match
more than one time. For example, if I had something like:

1234
34123
11313
113133
xxxxx

I could write something like:

(?<Numbers>^\d+$)+xxxxx

Which means that I need to look at the group's Captures collection
(match.Groups["Numbers"].Captures) instead of just the group's value, IIRC.

Note that in most uses of this technique, what you really need to write is
something like:

((?<Numbers> match numbers) match stuff between numbers)+xxxxx

so that the match can continue. You may also need to play around with the
Singleline and Multiline options (RegexOptions).
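As a rough sketch of how this could apply to the billing text in the question: a character class such as [^"] can only exclude single characters, but a negative lookahead, (?!...), can exclude a whole sequence such as a line of 14 hyphens. The class and method names below (BillBlocks, Split, PhoneOf) are made up for illustration, and the sketch assumes the fields are tab-separated and that each record starts with a line of exactly 14 hyphens.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BillBlocks
{
    // A line of exactly 14 hyphens starts a record. The Block group then takes
    // one character at a time, as long as that character is not the start of
    // another line of 14 hyphens. Multiline makes ^ match at every line start;
    // Singleline lets '.' match line breaks.
    private static readonly Regex BlockPattern = new Regex(
        @"^-{14}\r?\n(?<Block>(?:(?!^-{14}).)*)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // The phone number pattern from the question ("" is a quote in an @ string).
    private static readonly Regex PhonePattern = new Regex(
        @"^""M:""\t""(?<PhoneNumber>[^""]*)""",
        RegexOptions.Multiline);

    // Returns the body of every record, without its separator line.
    public static List<string> Split(string text)
    {
        var blocks = new List<string>();
        foreach (Match m in BlockPattern.Matches(text))
        {
            blocks.Add(m.Groups["Block"].Value);
        }
        return blocks;
    }

    // Returns the phone number of one record, or "" if there is none.
    public static string PhoneOf(string block)
    {
        return PhonePattern.Match(block).Groups["PhoneNumber"].Value;
    }

    public static void Main()
    {
        string rule = new string('-', 14);
        string text =
            rule + "\n\"M:\"\t\"3242310532\"\n\"Subscriber Name:\"\t\"MR Regex\"\n\n" +
            rule + "\n\"M:\"\t\"9042437121\"\n\"Subscriber Name:\"\t\"Fred 1\"\n";

        foreach (string block in Split(text))
        {
            Console.WriteLine("Phone: " + PhoneOf(block));
        }
    }
}
```

The lookahead is tested at every character, so on very large files something simpler, such as Regex.Split(text, @"^-{14}\r?\n", RegexOptions.Multiline), may be the better choice if only the split is needed.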

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
Nov 22 '05 #2
Thank you Eric. I was already using a capture group: in my first example,
against my sample text, I used (?<PhoneNumber>[^"]*) to capture everything
up to the next " into my PhoneNumber collection.

In this simple example, where I want to capture the Field1 and Field5
values, I cannot reliably match the 'everything between the numbers' part.

My attempt (doesn't work):
Field1:\s(<F1>[0-9]*)[^Field5:]*Field5:\s(?<F5>[0-9.$]*)
                     ^trouble^

Field1: 1234
Field2: 34123
Field3: 1313
Field4: 13133
Field5: $xxxx.00
Field6: 2342df
Field1: 2342
Field2: 33241
Field3: 2142
Field4: 543523
Field5: $342.00
Field6: 43254
Field1: 3415
Field2: 234235
Field3: 341
Field4: 13212533
Field5: $5234.00
Field6: 32415

Of course, I can run two separate captures, but...

You gave the example technique: ((?<Numbers> match numbers) match stuff
between numbers)+xxxxx

Does this +xxxxx match everything until the xxxxx is found? In my regex
apps (I use Expresso and Regex Workshop as .NET tools) there are no
matches.

Thanks,

Masa

Nov 22 '05 #3
I'm a little confused about what you're trying to do. Given the example text
in your post, what output do you expect?

If I assume that you didn't mean to write xxxx.00 for the Field5 value
there, the following regex may do what you want:

new Regex(@"
    (
      (?<S2>.*?)
      Field1:\s(?<F1>[0-9]*)
      (?<S1>.+?)
      Field5:\s(?<F5>[0-9.\$]+)
    )+",
    RegexOptions.IgnorePatternWhitespace);

All the F1 values will be in the Captures collection of one group, and all
the F5 values in the Captures collection of the other. I named the S1 and S2
groups so you could see what they're matching.
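A minimal sketch of running this pattern and reading the captures back follows. One assumption goes beyond the post: RegexOptions.Singleline is added, because without it '.' does not match a line break, so S1 cannot reach from the Field1 line to the Field5 line. The class name FieldCaptures, the helper CapturesOf, and the shortened sample input are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldCaptures
{
    // The pattern from the post, plus RegexOptions.Singleline (an assumption
    // added here) so that '.' can cross line breaks.
    private static readonly Regex Pattern = new Regex(@"
        (
          (?<S2>.*?)
          Field1:\s(?<F1>[0-9]*)
          (?<S1>.+?)
          Field5:\s(?<F5>[0-9.\$]+)
        )+",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Returns every value the named group captured during the single match.
    public static List<string> CapturesOf(string input, string groupName)
    {
        var values = new List<string>();
        Match match = Pattern.Match(input);
        if (match.Success)
        {
            foreach (Capture c in match.Groups[groupName].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        // Shortened, made-up records in the same shape as the sample.
        string input =
            "Field1: 1234\nField2: 34123\nField5: $40.00\nField6: 2342df\n" +
            "Field1: 2342\nField2: 33241\nField5: $342.00\nField6: 43254\n";

        Console.WriteLine(string.Join(", ", CapturesOf(input, "F1"))); // 1234, 2342
        Console.WriteLine(string.Join(", ", CapturesOf(input, "F5"))); // $40.00, $342.00
    }
}
```

Reading match.Groups["F1"].Value instead of the Captures collection would return only the last value captured, which is easy to mistake for a failed match.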

I'd suggest using my Regex Workbench at
http://www.gotdotnet.com/Community/U...1-4EE2729D7322 -
it makes playing around with Regex much easier.

--
Eric Gunnerson

Nov 22 '05 #4
Thanks Eric. Actually, I was using your Regex Workbench already - it is
great! Thank you for sharing it.

Something is not clicking with me and these regular expressions. Even when I
paste your regex, I don't believe I am getting the results you intended.
In the sample I posted, I am trying to capture the Field1 and Field5
values. I can capture them separately, but I can't seem to grasp how to 'skip
everything until a specific pattern is matched'.

I am trying to break down your sample piece by piece. Does the @ at the
start do something?
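For reference, the @ is C# syntax rather than part of the regex. It marks a verbatim string literal, in which backslashes are kept as typed, the string may span several lines, and a double quote is written as "". Only the text between the quotes is the pattern itself. A small sketch, with a made-up class name (VerbatimDemo), comparing the two spellings:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // Regular string literal: the compiler processes escapes, so each regex
    // backslash has to be written twice.
    public const string Regular = "Field1:\\s(?<F1>[0-9]*)";

    // Verbatim literal (@ prefix): backslashes are kept exactly as typed.
    public const string Verbatim = @"Field1:\s(?<F1>[0-9]*)";

    // Returns the digits that follow "Field1: " in the input, or "".
    public static string FirstField(string input)
    {
        return Regex.Match(input, Verbatim).Groups["F1"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);         // True
        Console.WriteLine(FirstField("Field1: 1234"));  // 1234
    }
}
```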

Also, in Regex Workbench, using the sample from your first reply, I am
not getting any matches.
String:
1234
34123
11313
113133
xxxxx

Regex:
(?<Numbers>^\d+$)+xxxxx

I have tried every permutation I can think of with Multiline/Singleline, etc.
I feel like I am going crazy.
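One likely reason for the empty result: ^ and $ are zero-width, so they test a position without consuming the line break. After ^\d+$ matches 1234, the regex is still sitting in front of the newline, where neither another ^\d+$ nor xxxxx can match. In .NET, $ in Multiline mode also matches only before \n, so text with \r\n line endings fails for a second reason. The sketch below is one possible variant that consumes the line break inside the repeated group; the class and method names (NumberLines, Numbers) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberLines
{
    // Each repetition matches one whole line of digits and its line break,
    // so the next repetition starts at the beginning of the following line.
    private static readonly Regex Pattern = new Regex(
        @"(?:^(?<Numbers>\d+)\r?\n)+xxxxx",
        RegexOptions.Multiline);

    // Returns every digit line that directly precedes the xxxxx line.
    public static List<string> Numbers(string input)
    {
        var result = new List<string>();
        Match match = Pattern.Match(input);
        if (match.Success)
        {
            foreach (Capture c in match.Groups["Numbers"].Captures)
            {
                result.Add(c.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string input = "1234\n34123\n11313\n113133\nxxxxx\n";
        Console.WriteLine(string.Join(", ", Numbers(input)));
        // prints: 1234, 34123, 11313, 113133
    }
}
```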

Thank you.

Masa

Nov 22 '05 #5
