I have attached a block of text similar to the type that I am working
with.
I have been learning a lot about Regex - it is quite impressive. I can
easily capture bits of info, but I keep having trouble with line breaks.
I want to identify the start and end of blocks of text. Are there some
tips someone can share?
E.g., in my text, I can grab a collection of everyone's phone numbers with:
^"M:"\t"(?<PhoneNumber>[^"]*)"
But what if I wanted to grab many lines, until a certain pattern is
matched? I use the ^ to say "not the quote", but can I say "not 14
hyphens"?
The way I have split this type of data is inefficient. I match all the
cases of:
^-{14}
Then I do a lot of arithmetic on the indexes of the matches to split the
file. I am sure Regex must have some way to express a complex "not"
pattern, to indicate the end of my match?
Thank you.
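One way to say "everything up to the next line of 14 hyphens" is a lazy match followed by a lookahead, so the separator ends the match without being consumed. A minimal sketch, assuming each record starts with a line of exactly 14 hyphens as in the sample data in this post; RecordSplitter and SplitRecords are made-up names for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // One record = a separator line, then everything (lazily) up to the
    // next separator line or the end of the input.
    // Multiline: ^ and $ match at line boundaries.
    // Singleline: '.' also matches line breaks, so Body can span lines.
    private static readonly Regex RecordPattern = new Regex(
        @"^-{14}\r?\n(?<Body>.*?)(?=^-{14}\r?$|\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static List<string> SplitRecords(string text)
    {
        var records = new List<string>();
        foreach (Match m in RecordPattern.Matches(text))
        {
            records.Add(m.Groups["Body"].Value);
        }
        return records;
    }

    public static void Main()
    {
        // Shortened stand-in for the billing data (two records).
        string text =
            "--------------\n\"M:\"\t\"3242310532\"\n\"Subscriber Name:\"\t\"MR Regex\"\n" +
            "--------------\n\"M:\"\t\"9042437121\"\n\"Subscriber Name:\"\t\"Fred 1\"\n";

        foreach (string record in SplitRecords(text))
        {
            Console.WriteLine("[" + record.Trim() + "]");
        }
    }
}
```

Each Body can then be searched on its own for the phone number and charges, which avoids the index arithmetic. The lookahead (?=...) is what expresses "stop before 14 hyphens".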
--------------
"M:" "3242310532"
"Subscriber Name:" "MR Regex"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""
"CURRENT CHARGES"
"Monthly Service Plan" $40.00
"Additional Local Airtime" $0.00
"Long Distance Charges" $0.00
"Roaming Charges" $0.00
"Network and Licensing Charges" $7.20
"Total Taxes:" $7.09
"Total Current Charges:" $47.20
"MONTHLY SERVICE PLAN" 11-Oct-03 to 10-Nov-03
"Service Plan Name" "Total"
"Mike Dispatch 40 (11-Oct-03 to 10-Nov-03)" $40.00
"Total Monthly Service Plan Charges" $40.00
"ADDITIONAL LOCAL AIRTIME"
"Service" "Total Mins. Used" "Free Mins. Used" "Included Mins. Used" "Chargeable Mins. Used" "Total"
"Direct Connect Private (minutes)" 28:04 28:04 0:00 0:00 $0.00
"Total Additional Local Airtime Charges" $0.00
"LONG DISTANCE, ROAMING AND OTHER CALL CHARGES"
"Service" "Incl. LD Minutes" "Chargeable LD Minutes" "Total"
"Total Long Distance Charges" $0.00
"ROAMING"
"Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes" "Roaming LD Charges" "Roaming Surcharge" "Total"
"Total Roaming Charges" $0.00
"WIRELESS WEB - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Wireless Web Premium Services Charges" $0.00
"PHONE - PREMIUM SERVICE"
"Service" "Total Events" "Event Type" "Total"
"Total Phone Premium Services Charges" $0.00
"PAGER SERVICES"
"Service" "Total Messages" "Included Messages" "Chargeable Messages" "Total"
"Total Pager Charges" $0.00
"VALUE-ADDED SERVICES" 11-Oct-03 to 10-Nov-03
"Service" "Total"
"Wireless Web - Surf Sampler (11-Oct-03 to 10-Nov-03)" $0.00
"Total Value Added Service Charges" $0.00
"OTHER CHARGES AND CREDIT"
"Charge or Credit" "Total"
"Total Other Charges and Credits" $0.00
"NETWORK and LICENSING CHARGES"
"Service" "Total"
"911 Emergency Access Charge (11-Oct-03 to 10-Nov-03)" $0.25
"System Licensing Charge (11-Oct-03 to 10-Nov-03)" $6.95
"Total Network Licensing Charges" $7.20
"TAXES"
"" "Total"
"Total Taxes" $7.09
--------------
"M:" "9042437121"
"Subscriber Name:" "Fred 1"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""
"CURRENT CHARGES"
Yes, you can do it in regex. The trick is to allow your pattern to match
more than one time. For example, if I had something like:
1234
34123
11313
113133
xxxxx
I could write something like:
(?<Numbers>^\d+$)+xxxxx
Which means that I need to look at the group's Captures collection
(match.Groups["Numbers"].Captures) instead of just the group's Value, IIRC.
Note that in most uses of this technique, what you really need to write is
something like:
((?<Numbers> match numbers) match stuff between numbers)+xxxxx
so that the match can continue. You may also need to play around with the
singleline and multiline options.
--
Eric Gunnerson
Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/
This posting is provided "AS IS" with no warranties, and confers no rights.
Thank you Eric. I was doing a capture group: in my first example using
my sample text, I used (?<PhoneNumber>[^"]*) to capture everything until
the next " in my PhoneNumber collection.
In this simple example, capturing the Field1 and Field5 values, I cannot
reliably regex the 'everything between numbers'.
My attempt (doesn't work):
Field1:\s(?<F1>[0-9]*)[^Field5:]*Field5:\s(?<F5>[0-9.$]*)
                       ^trouble^
Field1: 1234
Field2: 34123
Field3: 1313
Field4: 13133
Field5: $xxxx.00
Field6: 2342df
Field1: 2342
Field2: 33241
Field3: 2142
Field4: 543523
Field5: $342.00
Field6: 43254
Field1: 3415
Field2: 234235
Field3: 341
Field4: 13212533
Field5: $5234.00
Field6: 32415
Of course, I can run two separate captures, but...
You gave the example technique: ((?<Numbers> match numbers) match stuff
between numbers)+xxxxx
Does this +xxxxx match everything until the xxxxx is found? In my regex
apps (I use Expresso and Regex Workshop as .NET tools) there are no
matches.
Thanks,
Masa
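A note on the attempt with [^Field5:]*: square brackets define a character class, so [^Field5:] means "any single character other than F, i, e, l, d, 5 or :", not "anything other than the string Field5:". It therefore stops at the F of Field2. One way to express "not this string" is a negative lookahead checked before each character. A minimal sketch, using only the two sample records whose Field5 values are numeric; FieldPairs and Extract are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldPairs
{
    // (?:(?!Field5:).)* consumes one character at a time, but only while
    // the text at the current position does not start with "Field5:".
    // Singleline lets '.' match line breaks as well.
    private static readonly Regex Pattern = new Regex(
        @"Field1:\s(?<F1>[0-9]+)(?:(?!Field5:).)*Field5:\s(?<F5>[0-9.$]+)",
        RegexOptions.Singleline);

    public static List<string> Extract(string text)
    {
        var pairs = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            pairs.Add(m.Groups["F1"].Value + " -> " + m.Groups["F5"].Value);
        }
        return pairs;
    }

    public static void Main()
    {
        string text =
            "Field1: 2342\nField2: 33241\nField3: 2142\nField4: 543523\nField5: $342.00\nField6: 43254\n" +
            "Field1: 3415\nField2: 234235\nField3: 341\nField4: 13212533\nField5: $5234.00\nField6: 32415\n";

        foreach (string pair in Extract(text))
        {
            Console.WriteLine(pair);
        }
    }
}
```

Here each record is a separate Match, so Groups is enough and Captures is not needed. A lazy .*? between the two fields would likely work as well; the lookahead form just makes the "not Field5:" condition explicit.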
I'm a little confused about what you're trying to do. Given the example text
below, what is the expected output that you want?
If I assume that you didn't mean to write xxxx.00 for the Field5 value
below, the following regex may do what you want:
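// Singleline lets '.' match line breaks, so S1 and S2 can span lines.
Regex regex = new Regex(@"
(
(?<S2>.*?)
Field1:\s(?<F1>[0-9]*)
(?<S1>.+?)
Field5:\s(?<F5>[0-9.\$]+)
)+",
RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);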
All the F1 values will be in the Captures collection of the F1 group, and
all the F5 values in that of the F5 group. I named the S1 and S2 captures
so you could see what they're matching.
I'd suggest using my Regex Workbench at http://www.gotdotnet.com/Community/U...1-4EE2729D7322 -
it makes playing around with Regex much easier.
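To read the values back out of a single match like this, the Captures collections of the two groups can be walked in parallel. A minimal sketch, assuming RegexOptions.Singleline is enabled so that '.' in S1 and S2 can cross line breaks, and using only the two sample records whose Field5 values are numeric; CaptureCollections and MatchAll are illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureCollections
{
    public static Match MatchAll(string text)
    {
        // Assumption: Singleline is on, otherwise '.' stops at each line break.
        Regex regex = new Regex(@"
            (
                (?<S2>.*?)
                Field1:\s(?<F1>[0-9]*)
                (?<S1>.+?)
                Field5:\s(?<F5>[0-9.\$]+)
            )+",
            RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
        return regex.Match(text);
    }

    public static void Main()
    {
        string text =
            "Field1: 2342\nField2: 33241\nField3: 2142\nField4: 543523\nField5: $342.00\nField6: 43254\n" +
            "Field1: 3415\nField2: 234235\nField3: 341\nField4: 13212533\nField5: $5234.00\nField6: 32415\n";

        Match m = MatchAll(text);
        CaptureCollection f1 = m.Groups["F1"].Captures;
        CaptureCollection f5 = m.Groups["F5"].Captures;

        // One capture per repetition of the outer group, in document order.
        for (int i = 0; i < f1.Count && i < f5.Count; i++)
        {
            Console.WriteLine(f1[i].Value + " / " + f5[i].Value);
        }
    }
}
```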
--
Eric Gunnerson
Thanks Eric. Actually, I was using your Regex Workbench already - it is
great! Thank you for sharing it.
Something is not clicking with me and these regex expressions. Even when I
paste your regex, I don't believe I am getting the responses you intended.
In the sample I posted, I am trying to capture the field 1 and field 5
values. I can capture them separately, but can't seem to grasp the 'skip
everything until a specific pattern is matched'.
I am trying to break down your sample piece by piece. Does the @ at the
start do something?
Also, using Regex Workbench, using your sample in your first reply, I am
not getting any matches.
String:
1234
34123
11313
113133
xxxxx
Regex:
(?<Numbers>^\d+$)+xxxxx
I have tried every permutation I can think of with multiline/singleline, etc.
I feel like I am going crazy.
Thank you.
Masa
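On the @ question: in C#, @ in front of a string marks a verbatim string literal. Backslashes are not escape characters inside it, so regex escapes such as \d and \s can be written once instead of doubled, and the literal may span several lines. It is C# syntax rather than part of the regex, so presumably only the text between the quotes belongs in a regex tool's pattern box. A small sketch (VerbatimStringDemo is an illustrative name):

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    public static bool SamePattern()
    {
        string regular = "^\\d+\\s";  // backslashes doubled in a regular literal
        string verbatim = @"^\d+\s";  // written as-is in a verbatim literal
        return regular == verbatim;
    }

    public static void Main()
    {
        Console.WriteLine(SamePattern());                       // True
        Console.WriteLine(Regex.IsMatch("1234 ", @"^\d+\s"));   // True
    }
}
```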