Bytes | Software Development & Data Engineering Community

Parsing using RE?

Hello all

I have a huge string that I need to parse

Key <Delim1> Value <Delim2> Key <Delim1> Value <Delim2> Key <Delim1>
Value <Delim3>

Key <Delim1> Value <Delim2> Key <Delim1> Value <Delim2> Key <Delim1>
Value <Delim3>

This repeats a couple hundred thousand times.

<Delim1> separates the Key from the Value in a pair.
<Delim2> separates two different Key/Value pairs.
<Delim3> separates records.

I need to get the Key Value pairs and populate a table with that
information.

Would the .NET regular expressions be worthwhile, and how would I go
about doing it in a clean, optimized fashion?
Thanks

-Ravi Singh

Nov 16 '05 #1
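For the "populate a table" part of the question, here is a minimal sketch. It assumes a hypothetical long-format schema (RecordId, Key, Value), assumes keys and values never contain the delimiter strings, and uses nested String.Split calls with a System.Data.DataTable. The class name KeyValueTableLoader is made up for illustration.

```csharp
using System;
using System.Data;

public static class KeyValueTableLoader
{
    // Hypothetical schema: one row per key/value pair, tagged with its record number.
    public static DataTable Load(string input)
    {
        var table = new DataTable("KeyValues");
        table.Columns.Add("RecordId", typeof(int));
        table.Columns.Add("Key", typeof(string));
        table.Columns.Add("Value", typeof(string));

        // Records are separated by <Delim3>; empty trailing pieces are dropped.
        string[] records = input.Split(new[] { "<Delim3>" }, StringSplitOptions.RemoveEmptyEntries);

        table.BeginLoadData(); // suspend notifications and index maintenance during the bulk load
        for (int r = 0; r < records.Length; r++)
        {
            foreach (string pair in records[r].Split(new[] { "<Delim2>" }, StringSplitOptions.None))
            {
                // Split only at the first <Delim1>, so the value may not be split further.
                string[] kv = pair.Split(new[] { "<Delim1>" }, 2, StringSplitOptions.None);
                if (kv.Length == 2) // skip fragments (e.g. stray whitespace) with no <Delim1>
                    table.Rows.Add(r, kv[0].Trim(), kv[1].Trim());
            }
        }
        table.EndLoadData();
        return table;
    }
}
```

A long (RecordId, Key, Value) layout avoids having to know the set of keys up front; if every record has the same keys, pivoting them into one column per key is an alternative.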
Yes, definitely.

You'll need to write your own regexp, though.

I'd recommend an app called Expresso, a free regexp tester/builder.
http://www.ultrapico.com/Expresso.htm

If all the delimiters are unique, definitely use a regexp. Otherwise you'll be
looping with something like while (str.IndexOf("<Delim") >= 0) { ... } etc.
Using a regular expression to find the matches would be much quicker, and it
returns a collection of matches (and their fields).
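A minimal sketch of the "find all matches" approach, assuming the delimiters are the literal strings <Delim1>, <Delim2> and <Delim3> and that keys and values never contain a '<' character. One pattern captures each pair plus the delimiter that ends it, so a match ending in <Delim3> closes the current record. SinglePassParser is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassParser
{
    // Assumption: keys and values never contain '<', so "<DelimN>" is always a delimiter.
    // The "end" group records which delimiter closed the pair.
    private static readonly Regex PairPattern = new Regex(
        @"\s*(?<key>[^<]+?)\s*<Delim1>\s*(?<value>[^<]*?)\s*(?<end><Delim2>|<Delim3>)",
        RegexOptions.Compiled);

    // Returns one dictionary per record; each <Delim3> closes a record.
    // Pairs after the last <Delim3> (an unterminated record) are ignored.
    public static List<Dictionary<string, string>> Parse(string input)
    {
        var records = new List<Dictionary<string, string>>();
        var current = new Dictionary<string, string>();
        foreach (Match m in PairPattern.Matches(input))
        {
            current[m.Groups["key"].Value] = m.Groups["value"].Value;
            if (m.Groups["end"].Value == "<Delim3>")
            {
                records.Add(current);
                current = new Dictionary<string, string>();
            }
        }
        return records;
    }
}
```

Regex.Matches skips any text that does not fit the pattern, so malformed input is dropped silently rather than reported.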

If you get stuck, repost.

HTH
sam

Nov 16 '05 #2
using System;
using System.Text.RegularExpressions;

string input = "Key <Delim1> Value <Delim2> Key <Delim1> Value <Delim2> " +
    "Key <Delim1> Value <Delim3>Key <Delim1> Value <Delim2> Key <Delim1> " +
    "Value <Delim2> Key <Delim1> Value <Delim3>";

Regex delim1 = new Regex("<Delim1>");
Regex delim2 = new Regex("<Delim2>");
Regex delim3 = new Regex("<Delim3>");

// Note: String.Concat joins the array elements with no separator.
string[] rets3 = delim3.Split(input);
string[] rets2 = delim2.Split(String.Concat(rets3));
string[] rets1 = delim1.Split(String.Concat(rets2));

rets2 and rets1 are not what I expect them to be. =( Any ideas?

Thanks
-Ravi.
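One way to keep the structure with the same three Regex objects is to split hierarchically: records first, then the pairs inside each record, then each pair into key and value. Concatenating the pieces back into one string loses the boundaries the previous Split found, so keys and values run together. A sketch, assuming whitespace around keys and values is not significant; NestedSplitParser is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedSplitParser
{
    private static readonly Regex Delim1 = new Regex("<Delim1>");
    private static readonly Regex Delim2 = new Regex("<Delim2>");
    private static readonly Regex Delim3 = new Regex("<Delim3>");

    // Each record is a list (not a dictionary) so pair order and duplicate keys survive.
    public static List<List<KeyValuePair<string, string>>> Parse(string input)
    {
        var records = new List<List<KeyValuePair<string, string>>>();
        foreach (string record in Delim3.Split(input))
        {
            if (record.Trim().Length == 0) continue; // e.g. the empty piece after the last <Delim3>

            var pairs = new List<KeyValuePair<string, string>>();
            foreach (string pair in Delim2.Split(record))
            {
                string[] kv = Delim1.Split(pair, 2); // split at the first <Delim1> only
                if (kv.Length < 2) continue;         // assumption: skip malformed pairs
                pairs.Add(new KeyValuePair<string, string>(kv[0].Trim(), kv[1].Trim()));
            }
            records.Add(pairs);
        }
        return records;
    }
}
```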

Nov 16 '05 #3
I got it :-)

Thanks

Nov 16 '05 #4
Could you post the solution so we can see it? It might help someone
else in the same situation some day.

Nov 16 '05 #5
Ravi,
In addition to the other comments.

You could use a while loop with Match.NextMatch.

Something like:

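using System.Diagnostics;
using System.Text.RegularExpressions;

// "=" separates key from value; ";" separates pairs (optional after the last pair).
string pattern = @"(?<key>\w+)=(?<value>\w+)(?:;|$)";
string input = "a=1;b=2;c=3;d=4;e=5;";

Regex parser = new Regex(pattern, RegexOptions.Compiled);

// Walk the matches one at a time with NextMatch.
Match match = parser.Match(input);
while (match.Success)
{
    Debug.WriteLine(match.Groups["key"].Value, "key");
    Debug.WriteLine(match.Groups["value"].Value, "value");
    match = match.NextMatch();
}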

Here "=" is Delim1 and ";" is Delim2. Depending on how important Delim3 is, I
would consider using String.Substring to extract the input up to Delim3, then
use the above code...
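The "extract each record, then run the pair regex on it" idea could look like the sketch below. It keeps the simplified syntax from the example above ("=" for Delim1, ";" for Delim2) and assumes, hypothetically, that "|" stands in for Delim3. It uses String.IndexOf and String.Substring to cut out each record and Match.NextMatch to walk its pairs; RecordByRecordParser is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordByRecordParser
{
    // '=' stands in for Delim1 and ';' for Delim2, as in the example above.
    private static readonly Regex Pair =
        new Regex(@"(?<key>\w+)=(?<value>\w+)(?:;|$)", RegexOptions.Compiled);

    // Hypothetical: "|" stands in for Delim3 (the record separator).
    public static List<Dictionary<string, string>> Parse(string input, string recordDelimiter = "|")
    {
        if (string.IsNullOrEmpty(recordDelimiter))
            throw new ArgumentException("Record delimiter must be non-empty.", nameof(recordDelimiter));

        var records = new List<Dictionary<string, string>>();
        int start = 0;
        while (start < input.Length)
        {
            int end = input.IndexOf(recordDelimiter, start, StringComparison.Ordinal);
            if (end < 0) end = input.Length; // the last record may lack a trailing delimiter
            string record = input.Substring(start, end - start);

            // Walk the pairs of this one record with Match/NextMatch.
            var fields = new Dictionary<string, string>();
            for (Match m = Pair.Match(record); m.Success; m = m.NextMatch())
                fields[m.Groups["key"].Value] = m.Groups["value"].Value;
            if (fields.Count > 0) records.Add(fields);

            start = end + recordDelimiter.Length;
        }
        return records;
    }
}
```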

Hope this helps
Jay

Nov 16 '05 #6
using System;
using System.Text.RegularExpressions;

string input = "Key <Delim1> Value <Delim2> Key <Delim1> Value <Delim2> " +
    "Key <Delim1> Value <Delim2> Key <Delim1> Value <Delim2> Key <Delim1> " +
    "Value <Delim2> Key <Delim1> Value <Delim3>";

Regex delim1 = new Regex("<Delim1>");
Regex delim2 = new Regex("<Delim2>");
Regex delim3 = new Regex("<Delim3>");

string[] rets3 = delim3.Split(input);
string[] rets2 = delim2.Split(String.Concat(rets3));
string[] rets1 = delim1.Split(String.Concat(rets2));

There it is. I concatenate the pieces, though a String.Join might be more appropriate.

Thanks

Nov 16 '05 #7
Ravi Singh (UCSD) wrote:

Would the .NET regular expressions be worthwhile, and how would I go
about doing it in a clean, optimized fashion?

RegEx? I'd use Perl :)

hjf
Nov 16 '05 #8
Why not use String.Split? It should be faster and easier to implement.
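With String.Split, the overloads taking a string[] separator and a StringSplitOptions value (added in .NET 2.0) let each level split on a multi-character delimiter directly, with no regex involved. A minimal sketch, assuming the literal delimiter strings from the question and that whitespace around keys and values can be trimmed; StringSplitParser is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class StringSplitParser
{
    private static readonly string[] RecordSep = { "<Delim3>" };
    private static readonly string[] PairSep = { "<Delim2>" };
    private static readonly string[] KeyValueSep = { "<Delim1>" };

    public static List<Dictionary<string, string>> Parse(string input)
    {
        var records = new List<Dictionary<string, string>>();
        foreach (string record in input.Split(RecordSep, StringSplitOptions.RemoveEmptyEntries))
        {
            var fields = new Dictionary<string, string>();
            foreach (string pair in record.Split(PairSep, StringSplitOptions.RemoveEmptyEntries))
            {
                // count = 2: split only at the first <Delim1>.
                string[] kv = pair.Split(KeyValueSep, 2, StringSplitOptions.None);
                if (kv.Length == 2)
                    fields[kv[0].Trim()] = kv[1].Trim();
            }
            if (fields.Count > 0) records.Add(fields); // skips whitespace-only pieces such as a trailing newline
        }
        return records;
    }
}
```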

Nov 16 '05 #9
Here's a little snippet I wrote to do this kind of thing with just 2
delimiters, one to separate the key-value pairs and another to split
apart each actual pair. Since both delimiter parameters are arrays, however, you
can specify any number of different delimiters, so in your case you might
have outerDelimiters == { "<Delim2>", "<Delim3>" }, if I understand
correctly what it is you are after.

Though I haven't tested it, I'm pretty sure the String.Split method
will be much faster than using Regular Expressions; even a simple RE
requires the costly construction of some internal data structures to do
the job, and the RE routines will at least have to do everything that
String.Split() has to do anyway. However if your delimiters are not
predictable recurring strings, RE would be a better way.

The code:

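====================================================
using System;
using System.Collections.Specialized;

public class NameValueCollectionEx : NameValueCollection
{
    public void LoadFrom(string source, string[] outerDelimiters,
                         string[] innerDelimiters)
    {
        // The original snippet passed a bool here; the released .NET 2.0
        // String.Split overloads take a StringSplitOptions value instead.
        string[] pairs = source.Split(outerDelimiters, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Split each pair at the first inner delimiter only.
            string[] elements = pair.Split(innerDelimiters, 2, StringSplitOptions.None);
            if (elements.Length == 2) // skip fragments without a key/value delimiter
                this.Add(elements[0].Trim(), elements[1].Trim());
        }
    }
}
====================================================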

I don't think you can get things a whole lot more optimized than this.
Though if anyone feels inspired to do a performance comparison vs. RE,
I'd be interested in seeing the results.
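For anyone who wants to try that comparison, a rough Stopwatch-based harness is sketched below. The input is synthetic (hypothetical keys k0..k2 and values v0..v2), each approach runs only once, and the timing includes JIT and regex-compilation costs, so any numbers it prints are indicative at best; no results are implied here. SplitVsRegexBenchmark is a made-up name.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SplitVsRegexBenchmark
{
    // Builds a synthetic input in the thread's format (hypothetical data).
    public static string BuildInput(int records)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < records; i++)
            sb.Append("k0 <Delim1> v0 <Delim2> k1 <Delim1> v1 <Delim2> k2 <Delim1> v2 <Delim3>");
        return sb.ToString();
    }

    // Counts key/value pairs with nested String.Split calls.
    public static int CountWithSplit(string input)
    {
        int count = 0;
        foreach (string record in input.Split(new[] { "<Delim3>" }, StringSplitOptions.RemoveEmptyEntries))
            foreach (string pair in record.Split(new[] { "<Delim2>" }, StringSplitOptions.RemoveEmptyEntries))
                if (pair.Split(new[] { "<Delim1>" }, 2, StringSplitOptions.None).Length == 2)
                    count++;
        return count;
    }

    private static readonly Regex Pair = new Regex(
        @"(?<key>[^<]+?)\s*<Delim1>\s*(?<value>[^<]*?)\s*(?:<Delim2>|<Delim3>)",
        RegexOptions.Compiled);

    // Counts key/value pairs with a single Regex.Matches pass.
    public static int CountWithRegex(string input) => Pair.Matches(input).Count;

    // Times one cold run of each approach; repeat runs for a fairer measurement.
    public static void Compare(int records)
    {
        string input = BuildInput(records);

        var sw = Stopwatch.StartNew();
        int splitPairs = CountWithSplit(input);
        long splitMs = sw.ElapsedMilliseconds;

        sw.Restart();
        int regexPairs = CountWithRegex(input);
        long regexMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"Split: {splitPairs} pairs in {splitMs} ms; Regex: {regexPairs} pairs in {regexMs} ms");
    }
}
```

Checking that both counts agree before comparing times guards against one approach "winning" by parsing less.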

Joel

Nov 16 '05 #10
