Text Parsing with Qualifiers

Hi all,

Does anyone know of a GOOD example on parsing text with text qualifiers?

I am hoping to parse text with variable-length delimiters/qualifiers. Also,
qualified text could run onto multiple lines and contain characters like
vbCrLf (thus the multiple lines).

Anyhow, any help would be appreciated. Thanks!

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 20 '05 #1
7 Replies


Nak
> Does anyone know of a GOOD example on parsing text with text qualifiers?

What exactly do you mean by text qualifiers? Characters?

Parsing strings in .NET has become even easier than in VB6, and it's
certainly easier than C++. Have you taken a look at the String class? It
contains many methods for manipulating strings.

http://msdn.microsoft.com/library/de...classtopic.asp

You can even look into regular expressions for examining strings for
patterns. They are a bit fiddly to get the hang of at first, but they
are very useful. I recently replaced an HTML parsing routine with a
regular expression alternative, and the code size has been dramatically
reduced.

http://msdn.microsoft.com/library/de...classtopic.asp

Anyway, I hope this information helps :-) If you let me know a little
more about what kind of strings you want to manipulate, I might be
able to give you some more tips.

Nick.

--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
"No matter. Whatever the outcome, you are changed."

Fergus - September 5th 2003
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
"Lucas Tam" <RE********@rogers.com> wrote in message
news:Xn***************************@140.99.99.130.. .
> Hi all,
>
> I am hoping to parse text with variable-length delimiters/qualifiers. Also,
> qualified text could run onto multiple lines and contain characters like
> vbCrLf (thus the multiple lines).
>
> Anyhow, any help would be appreciated. Thanks!
>
> --
> Lucas Tam (RE********@rogers.com)
> Please delete "REMOVE" from the e-mail address when replying.
> http://members.ebay.com/aboutme/coolspot18/

Nov 20 '05 #2

"Nak" <a@a.com> wrote in news:O0**************@TK2MSFTNGP12.phx.gbl:
>> Does anyone know of a GOOD example on parsing text with text qualifiers?
>
> What exactly do you mean by text qualifiers? Characters?

Ya, I am hoping to parse strings like:
"""This is a Quote""",01/01/2003,"Some Interesting Text, Here"

etc etc.

I've seen sample code that only handles single-character delimiters/text
qualifiers, but I am hoping to find code that can handle text
qualifiers/delimiters of any length.

It's not TOO hard to parse such text, but if someone else has already
written some good code, might as well use it.
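
For reference, a minimal sketch of such a parser, in Python rather than VB.NET. The name split_qualified and its parameters are hypothetical, not from any library. It assumes that a doubled qualifier inside a qualified field stands for one literal qualifier (as in the """This is a Quote""" sample), and it treats the whole input as a single record, so line breaks are just ordinary characters.

```python
def split_qualified(text, delimiter=",", qualifier='"'):
    """Split text into fields; delimiter and qualifier may be any non-empty strings."""
    if not delimiter or not qualifier:
        raise ValueError("delimiter and qualifier must be non-empty")
    fields = []
    pos = 0
    while True:
        if text.startswith(qualifier, pos):
            # Qualified field: collect text up to the closing qualifier.
            pos += len(qualifier)
            parts = []
            while True:
                end = text.find(qualifier, pos)
                if end == -1:
                    raise ValueError("unterminated qualified field")
                parts.append(text[pos:end])
                pos = end + len(qualifier)
                if text.startswith(qualifier, pos):
                    # Doubled qualifier: keep one literal copy and carry on.
                    parts.append(qualifier)
                    pos += len(qualifier)
                else:
                    break
            fields.append("".join(parts))
            if pos == len(text):
                return fields
            if not text.startswith(delimiter, pos):
                raise ValueError("expected delimiter at offset %d" % pos)
            pos += len(delimiter)
        else:
            # Unqualified field: runs up to the next delimiter or the end.
            end = text.find(delimiter, pos)
            if end == -1:
                fields.append(text[pos:])
                return fields
            fields.append(text[pos:end])
            pos = end + len(delimiter)
```

Called on the sample line with the defaults, this should give the three fields "This is a Quote" (with its quotes), 01/01/2003 and Some Interesting Text, Here. Passing longer strings for delimiter and qualifier goes through exactly the same code path.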

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 20 '05 #3

Lucas Tam wrote:
> "Nak" <a@a.com> wrote in news:O0**************@TK2MSFTNGP12.phx.gbl:
>>> Does anyone know of a GOOD example on parsing text with text qualifiers?
>>
>> What exactly do you mean by text qualifiers? Characters?
>
> Ya, I am hoping to parse strings like:
> """This is a Quote""",01/01/2003,"Some Interesting Text, Here"
>
> etc etc.
>
> I've seen sample code that only handles single-character delimiters/text
> qualifiers, but I am hoping to find code that can handle text
> qualifiers/delimiters of any length.

So you really mean something like:

quotequotequoteThis is a Quotequotequotequotecomma01/01/2003commaquoteSome
Interesting Textcomma Herequote

( :-) )

> It's not TOO hard to parse such text, but if someone else has already
> written some good code, might as well use it.

If anyone has some public VB (.NET or otherwise) code for generic handling of
this sort of thing, I'd like to see it too, but in .NET the best option is a
custom Regex, probably with extra code to handle context.

--
Regards,
Mark Hurd, B.Sc.(Ma.) (Hons.)
Nov 20 '05 #4

Mark Hurd wrote:
> Lucas Tam wrote:
>> "Nak" <a@a.com> wrote in news:O0**************@TK2MSFTNGP12.phx.gbl:
>>>> Does anyone know of a GOOD example on parsing text with text
>>>> qualifiers?
>>>
>>> What exactly do you mean by text qualifiers? Characters?
>>
>> Ya, I am hoping to parse strings like:
>> """This is a Quote""",01/01/2003,"Some Interesting Text, Here"
>>
>> etc etc.
>>
>> I've seen sample code that only handles single-character delimiters/text
>> qualifiers, but I am hoping to find code that can handle text
>> qualifiers/delimiters of any length.
>
> So you really mean something like:
>
> quotequotequoteThis is a Quotequotequotequotecomma01/01/2003commaquoteSome
> Interesting Textcomma Herequote
>
> ( :-) )
>
>> It's not TOO hard to parse such text, but if someone else has already
>> written some good code, might as well use it.
>
> If anyone has some public VB (.NET or otherwise) code for generic handling
> of this sort of thing, I'd like to see it too, but in .NET the best option
> is a custom Regex, probably with extra code to handle context.


I should add: if you're talking about parsing anything more complex, you
should look at .NET versions of lex and yacc, etc.

--
Regards,
Mark Hurd, B.Sc.(Ma.) (Hons.)
Nov 20 '05 #5

"Mark Hurd" <ma******@ozemail.com.au> wrote in
news:#V**************@TK2MSFTNGP09.phx.gbl:
>> I've seen sample code that only handles single-character
>> delimiters/text qualifiers, but I am hoping to find code that can
>> handle text qualifiers/delimiters of any length.
>
> So you really mean something like:
>
> quotequotequoteThis is a
> Quotequotequotequotecomma01/01/2003commaquoteSome Interesting
> Textcomma Herequote

Exactly! I'm trying to build an import routine that is as flexible as
possible. Who knows, maybe someone does use odd delimiters like that : )

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 20 '05 #6

"Mark Hurd" <ma******@ozemail.com.au> wrote in news:eC6NYQ0eDHA.3248
@tk2msftngp13.phx.gbl:
> I should add: if you're talking about parsing anything more complex, you
> should look at .NET versions of lex and yacc, etc.


Ah, I used Yacc briefly with Java. I didn't know it existed with .NET.
Thanks for the tip!

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 20 '05 #7

Lucas Tam wrote:
> "Mark Hurd" <ma******@ozemail.com.au> wrote in
> news:#V**************@TK2MSFTNGP09.phx.gbl:
>>> I've seen sample code that only handles single-character
>>> delimiters/text qualifiers, but I am hoping to find code that can
>>> handle text qualifiers/delimiters of any length.
>>
>> So you really mean something like:
>>
>> quotequotequoteThis is a
>> Quotequotequotequotecomma01/01/2003commaquoteSome Interesting
>> Textcomma Herequote
>
> Exactly! I'm trying to build an import routine that is as flexible as
> possible. Who knows, maybe someone does use odd delimiters like that : )


When I posed the "comma"-separated values example, I was going to provide a
Regex for it, but I didn't have enough time then...

Here it is (one line, no breaks):

((((quote)(?<quoted>(([^q])|(q[^u])|(qu[^o])|(quo[^t])|(quot[^e])|((quote)(quote)))*)(quote)))|(?<unquoted>(([^c])|(c[^o])|(co[^m])|(com[^m])|(comm[^a]))*))((comma)|$)

I've put in a couple of pairs of brackets to highlight how this could be
produced by an automated generator...
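
A rough sketch of what such a generator might look like, in Python rather than VB.NET. The helper names not_token and build_pattern are hypothetical. It emits the same bracket layout as the pattern above, except that Python spells named groups (?P<name>...) where .NET uses (?<name>...).

```python
import re


def not_token(token):
    """Alternatives that each consume text which does not start `token`."""
    alternatives = []
    for i, ch in enumerate(token):
        # The first i characters of the token, then anything except the next one.
        alternatives.append("(" + re.escape(token[:i]) + "[^" + re.escape(ch) + "])")
    return alternatives


def build_pattern(qualifier, delimiter):
    """Build a field-matching pattern for arbitrary qualifier/delimiter strings."""
    q = "(" + re.escape(qualifier) + ")"
    d = "(" + re.escape(delimiter) + ")"
    # Inside a qualified field a doubled qualifier is also allowed.
    quoted_body = "|".join(not_token(qualifier) + ["(" + q + q + ")"])
    unquoted_body = "|".join(not_token(delimiter))
    return (
        "(((" + q + "(?P<quoted>(" + quoted_body + ")*)" + q + "))"
        + "|(?P<unquoted>(" + unquoted_body + ")*))"
        + "(" + d + "|$)"
    )
```

With qualifier "quote" and delimiter "comma" this reproduces the hand-written pattern; with a double quote and a comma it gives an ordinary CSV-style pattern. One caveat of this construction: a field whose content ends in a leading part of the token (for example a q directly before the closing quote) is not matched as intended. Consuming one character at a time behind a negative lookahead, such as (?:(?!quote).), is one way around that.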

The intended use of the above regex is to loop through all matches, checking
that there are no unmatched gaps (syntax errors) and ignoring the null match at
the end of the string. The <quoted> group needs to have quotequote reduced to
quote, e.g. .Replace("quotequote", "quote"), and only one of <quoted> or
<unquoted> should have any content.

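That loop could be sketched roughly as follows, in Python rather than VB.NET. The names FIELD and parse_fields are hypothetical, and the pattern is the one given earlier with Python's (?P<name>...) group syntax. The sketch treats a match that ends on $ as the last field, so the empty match that follows it at the end of the string is skipped.

```python
import re

# The pattern from the post; Python writes named groups as (?P<name>...).
FIELD = re.compile(
    r"((((quote)(?P<quoted>(([^q])|(q[^u])|(qu[^o])|(quo[^t])|(quot[^e])"
    r"|((quote)(quote)))*)(quote)))"
    r"|(?P<unquoted>(([^c])|(c[^o])|(co[^m])|(com[^m])|(comm[^a]))*))"
    r"((comma)|$)"
)


def parse_fields(text):
    fields = []
    expected = 0      # offset where the next match has to start
    finished = False  # True once a match ended on $ instead of "comma"
    for m in FIELD.finditer(text):
        if finished:
            break  # the null match at the end of the string
        if m.start() != expected:
            # A gap between matches means some text could not be parsed.
            raise ValueError("unmatched text at offset %d" % expected)
        quoted = m.group("quoted")
        if quoted is not None:
            # Reduce doubled qualifiers to a single literal one.
            fields.append(quoted.replace("quotequote", "quote"))
        else:
            fields.append(m.group("unquoted"))
        expected = m.end()
        # The last group in the pattern is (comma); it is None when $ matched.
        finished = m.group(FIELD.groups) is None
    return fields
```

Run on the quote/comma example from earlier in the thread (with a space where the line wrapped), this should return the fields quoteThis is a Quotequote, 01/01/2003 and Some Interesting Textcomma Here.
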
Can someone confirm whether there's an optimisation for this Regex using the
extended grouping features?
--
Regards,
Mark Hurd, B.Sc.(Ma.) (Hons.)
Nov 20 '05 #8
