Regex Issues

I have the following System.Text.RegularExpressions.Regex that is supposed
to remove this predefined list of garbage characters from contact names that
come in on import files :

Dim _dropContactGarbage As New Regex( _
"([" & Chr(0) & "-" & Chr(31) & "]+)|" & _
"([" & Chr(33) & "-" & Chr(38) & "]+)|" & _
"([" & Chr(40) & "-" & Chr(44) & "]+)|" & _
"([" & Chr(47) & "-" & Chr(47) & "]+)|" & _
"([" & Chr(58) & "-" & Chr(64) & "]+)|" & _
"([" & Chr(91) & "-" & Chr(96) & "]+)|" & _
"([" & Chr(123) & "-" & Chr(127) & "]+)|" & _
"([" & Chr(152) & "]+)|" & _
"([" & Chr(155) & "-" & Chr(159) & "]+)|" & _
"([" & Chr(166) & "-" & Chr(224) & "]+)|" & _
"([" & Chr(226) & "-" & Chr(255) & "]+)")

We use it like this:

value = _dropContactGarbage.Replace(value, "")

But the Regex constructor is throwing an ArgumentException whose Message
property says only "Parse ([". There is no inner exception. Normally, if I
have a string expression that's wrong, I would Console.WriteLine() it. But
in this case, it doesn't WriteLine correctly, because some of the characters
in the expression are control characters, so what it displays is not
visually correct.
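
One way to see what the pattern string really contains is to print every control or non-ASCII character as a \uXXXX escape before handing the string to Regex. The following is only a minimal sketch; the PatternDump module and the ToVisible helper are hypothetical names made up for illustration, not part of the original code.

```vbnet
Imports System
Imports System.Text

Module PatternDump
    ' Hypothetical helper: renders every character outside printable ASCII
    ' (below 32 or above 126) as a \uXXXX escape so it can be read on a console.
    Function ToVisible(ByVal pattern As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In pattern
            Dim code As Integer = AscW(c)
            If code < 32 OrElse code > 126 Then
                sb.Append("\u").Append(code.ToString("X4"))
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' First fragment of the pattern from the post.
        Dim fragment As String = "([" & Chr(0) & "-" & Chr(31) & "]+)"
        Console.WriteLine(ToVisible(fragment))   ' prints ([\u0000-\u001F]+)
    End Sub
End Module
```

Running the same helper over the full concatenated pattern shows the actual Unicode code point that each Chr() call produced, which is what the Regex parser sees.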

I have slaved over this issue for hours and hours and I can only guess that
one of the items must be escaped with a "\" or something, but I cannot
figure it out. I have already been all over the MSDN help topics for the
Regex Class.

Help?

--
Peace & happy computing,

Mike Labosh, MCSD
"After very careful consideration, I have come
to the conclusion that this new system SUCKS"
-- General Barringer, from WARGAMES
Nov 21 '05 #1
7 Replies


I think you're having problems with all those &'s between the brackets.
Maybe?

"([Chr(0) & "-" & Chr(31) ]+)|"

Nov 21 '05 #2

> I think you're having problems with all those &'s between the brackets.
> Maybe?
>
> "([Chr(0) & "-" & Chr(31) ]+)|"

No. It's not the concatenation. The dumb thing *used to work*, but they
want me to change a couple of the character ranges. So all I have done is
changed a couple of the character codes passed to the Chr() function. Now
it's b0rken.

What I am currently attempting is to create a Regex from each single line of
my OP so I can find which one is causing the issue, then perhaps I can
determine a workaround.
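
The fragment-by-fragment check described here could be sketched roughly as follows. The FragmentCheck module name is hypothetical, and the result for fragments that use Chr() values above 127 will depend on the machine's ANSI code page, so no particular output is claimed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FragmentCheck
    Sub Main()
        ' The eleven fragments from the original post, without the "|" separators.
        Dim fragments() As String = { _
            "([" & Chr(0) & "-" & Chr(31) & "]+)", _
            "([" & Chr(33) & "-" & Chr(38) & "]+)", _
            "([" & Chr(40) & "-" & Chr(44) & "]+)", _
            "([" & Chr(47) & "-" & Chr(47) & "]+)", _
            "([" & Chr(58) & "-" & Chr(64) & "]+)", _
            "([" & Chr(91) & "-" & Chr(96) & "]+)", _
            "([" & Chr(123) & "-" & Chr(127) & "]+)", _
            "([" & Chr(152) & "]+)", _
            "([" & Chr(155) & "-" & Chr(159) & "]+)", _
            "([" & Chr(166) & "-" & Chr(224) & "]+)", _
            "([" & Chr(226) & "-" & Chr(255) & "]+)"}

        For i As Integer = 0 To fragments.Length - 1
            Try
                ' The constructor parses the pattern and throws if it is invalid.
                Dim probe As New Regex(fragments(i))
                Console.WriteLine("Fragment {0}: OK", i)
            Catch ex As ArgumentException
                Console.WriteLine("Fragment {0}: {1}", i, ex.Message)
            End Try
        Next
    End Sub
End Module
```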
--
Peace & happy computing,

Mike Labosh, MCSD
"Musha ring dum a doo dum a da!" -- James Hetfield
Nov 21 '05 #3

jg
Make sure the new character codes do not have a special meaning in regex. If
they do, put the escape prefix ("\") before the " & Chr...

Sorry, I don't know the details, but I am sure you can look it up in MSDN
under regex.

Nov 21 '05 #4

> Make sure the new character codes do not have a special meaning in regex.
> If they do, put the escape prefix before the " & Chr...

That's what I'm trying to do. Each line of the expression seems to work by
itself. So I am now trying varying combinations.

> Sorry, I don't know the details, but I am sure you can look it up in MSDN
> under regex


heh. If you saw all the MSDN printouts on my desk, you would hurt me for
killing trees :)

--
Peace & happy computing,

Mike Labosh, MCSD
"Musha ring dum a doo dum a da!" -- James Hetfield
Nov 21 '05 #5

It barks at me until I remove this line:
"([" & Chr(155) & "-" & Chr(159) & "]+)|" & _

I'm not sure why.

Nov 21 '05 #6

parsing "([>-Y]+)|" - [x-y] range in reverse order.
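
A likely reading of this message (an interpretation, not something the thread itself confirms): Chr() converts 155 and 159 through the system ANSI code page instead of treating them as Unicode code points. In Windows-1252, 155 is U+203A and 159 is U+0178, so the range runs from a higher code point to a lower one. ChrW() takes the Unicode code point directly. The sketch below just prints both conversions; the module name is hypothetical and the Chr() column depends on the code page of the machine it runs on.

```vbnet
Imports System

Module ChrVersusChrW
    Sub Main()
        ' Compare the code point produced by Chr() (ANSI code page lookup)
        ' with the one produced by ChrW() (direct Unicode code point).
        ' Assumption: on a Windows-1252 system the Chr() values would be
        ' U+203A for 155 and U+0178 for 159, which is a descending range.
        For Each code As Integer In New Integer() {155, 159}
            Console.WriteLine("Chr({0}) -> U+{1:X4}   ChrW({0}) -> U+{2:X4}", _
                code, AscW(Chr(code)), AscW(ChrW(code)))
        Next
    End Sub
End Module
```

If the Chr() column does come out descending, that would also explain why the fragment passes when tested with different codes and fails only for this pair.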

"Chris Burgess" <cb******@converse.com> wrote in message
news:%2****************@TK2MSFTNGP10.phx.gbl...
It barks at me until I remove this line:
"([" & Chr(155) & "-" & Chr(159) & "]+)|" & _

I'm not sure why.

Nov 21 '05 #7

Mike,
Rather than literally using Chr(0), Chr(31), Chr(33), ..., I would recommend
the RegEx Character Escape sequences.

http://msdn.microsoft.com/library/de...terescapes.asp

Something like:

' With ASCII character escapes
Dim _dropContactGarbage As New Regex( _
"([\x00-\x1F]+)|" & _
"([\x21-\x26]+)|" & _
"([\x28-\x2C]+)|" & _
...

Of course you may have problems with Chr(128) & above, as Chr() treats those
values as ANSI code-page characters, while Regex works on Unicode. As you know
ASCII is 7 bit (0 to 127); in RegEx, \x takes exactly 2 hex digits & \u takes
exactly 4 (\u0000).
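
Carrying the 4-digit \u escapes through the whole list of ranges from the original post might look like the sketch below. Several things here are assumptions rather than anything stated in the thread: the values 152 and 155 to 159 are translated as Windows-1252 characters (U+02DC, U+203A, U+0153, U+017E, U+0178; 157 is unassigned in Windows-1252 and is left out), the ranges 166-224 and 226-255 are taken as the identical Latin-1 code points, and the eleven alternated groups are merged into one character class, which behaves the same when every match is replaced with an empty string. The EscapedPattern module name is hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapedPattern
    ' One character class built from \u escapes instead of Chr() calls.
    ' Every range is written low-to-high; single characters are listed alone.
    Private ReadOnly _dropContactGarbage As New Regex( _
        "[\u0000-\u001F\u0021-\u0026\u0028-\u002C\u002F" & _
        "\u003A-\u0040\u005B-\u0060\u007B-\u007F" & _
        "\u02DC\u203A\u0153\u017E\u0178" & _
        "\u00A6-\u00E0\u00E2-\u00FF]+")

    Sub Main()
        ' "#" (U+0023) and "!" (U+0021) fall in the second range; the space is kept.
        Console.WriteLine(_dropContactGarbage.Replace("Mike #Labosh!", ""))   ' Mike Labosh
    End Sub
End Module
```

Because the pattern text now contains only printable ASCII, it can also be written to the console and read back without any surprises.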

' with Unicode character escapes
Dim _dropContactGarbage As New Regex( _
"([\u0000-\u001F]+)|" & _
"([\u0021-\u0026]+)|" & _
"([\u0028-\u002C]+)|" & _
...

It might be "easier" if you used the predefined character classes (\s \w
\W \S ...) instead:

http://msdn.microsoft.com/library/de...terclasses.asp

Something like:
Dim _dropContactGarbage As New Regex("\W")

Which says match any nonword character...
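
One caveat worth checking before switching to \W: it also removes characters that the original ranges deliberately kept, such as the space (32), apostrophe (39), hyphen (45) and period (46). The sketch below compares plain \W with a negated class that keeps those characters. The sample name, the second pattern and the NonWordDemo module are illustrative assumptions, not something proposed in the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NonWordDemo
    Sub Main()
        Dim contact As String = "O'Brien-Smith, Jr."

        ' \W strips everything that is not a word character, including
        ' the apostrophe, hyphen, comma, space and period.
        Console.WriteLine(Regex.Replace(contact, "\W", ""))            ' OBrienSmithJr

        ' Hypothetical alternative: drop anything that is not a word
        ' character, whitespace, apostrophe, period or hyphen.
        Console.WriteLine(Regex.Replace(contact, "[^\w\s'.\-]", ""))   ' O'Brien-Smith Jr.
    End Sub
End Module
```

Note that \w in .NET is Unicode-aware and includes the underscore, so accented letters and "_" survive both patterns, whereas the original ranges removed "_" (95) and many of the characters above 165.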

Expresso & RegEx Workbench both have wizards of varying degrees to help you
build your expression, plus they allow you to test your expressions, also
the analyzer/interpreter in each is rather handy.

Expresso:
http://www.ultrapico.com/Expresso.htm

RegEx Workbench:
http://www.gotdotnet.com/Community/U...-4ee2729d7322A

tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/de...geElements.asp

Hope this helps
Jay

Nov 21 '05 #8
