regex pro

here's the deal...csv, tick-encapsulated data. trying to use regexes to
validate records. here's an example row:

'AD,'BF','132465','06/09/2004','','BNSF','A','TYPE','1278','','BR','2999',''
,'LX','','01','09','1','','','','','','','','','CUSTOM JOB CODE TEST'

record type is in the 8th column ('1278'). using regex b/c there are a
myriad of types that cause other data w/n the record (or related records) to
be in/valid. i'm having problems getting a match on the generalization of
the first 7 columns:

something like this:

(?=(?:(?<!',').*(?!=',')',')){7}(?:'1278(?=','))

(?=(?:(?<!',').*(?!=',')',')){7} represents the first 7 columns.

if someone can help me generalize that pattern, i'd appreciate it very much.

tia,

steve
Nov 20 '05
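
A minimal sketch of one way to say "seven ticked columns, then '1278'", assuming fields never contain a tick and the record type really is the eighth field. Two things in the posted pattern look suspect: (?!=',') is a negative lookahead for the literal text =',' (the lookahead operator is (?!...), with no equals sign), and .* can run across field boundaries. The module name, the pattern, and both sample rows below are made up for illustration and are not from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RecordTypeCheck
    ' Hypothetical pattern: seven tick-quoted fields, each followed by a comma,
    ' then '1278' as the eighth field (followed by a comma or end of row).
    Private ReadOnly Type1278 As New Regex("^(?:'[^']*',){7}'1278'(?=,|$)")

    Sub Main()
        ' Made-up rows: '1278' is the 8th field in the first, the 7th in the second.
        Dim hit As String = "'AD','BF','132465','06/09/2004','','BNSF','A','1278','','BR'"
        Dim miss As String = "'AD','BF','132465','06/09/2004','','BNSF','1278','A','','BR'"

        Console.WriteLine(Type1278.IsMatch(hit))   ' True
        Console.WriteLine(Type1278.IsMatch(miss))  ' False
    End Sub
End Module
```

Using [^']* instead of .* keeps each repetition inside a single field, so {7} counts columns rather than characters. If a field can contain an escaped tick, this sketch would need a different field expression.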
Hi Terry,

Again, exactly the same.

> I don't discard it. I simply don't like it. Every time I look at a complex
> regExp, it reminds me of the old days when people used to cram code in so
> tight because of memory constraints that it was unreadable, or of die hard C
> programmers who like to write code which is impossible to understand.
>
> That's my point. If you don't agree with me, then that's fine.

With the addition that I sometimes think it is used as a way of
obfuscating the code for others.

However, the last sentence from you (OHM) goes for me as well.

Cor
Nov 20 '05 #11
If you have ever processed lots of text, as in screen scraping, you will end
up using regex. I agree that it is somewhat hard to see what it does, but
once you learn it, like everything, you will wonder why you did not use it
before. But I had to buy a book, since resources on the web are sparse and
MS documentation is pretty much non-existent.

Lloyd Sheen
Nov 20 '05 #12
Hi Lloyd,

I never say never. It is not my first choice, but I am not saying
never use it.
When you have to make complex text changes in one pass through
unorganized documents, I believe there are not many alternatives. However, I
often see it used here for simple changes.

Cor
Nov 20 '05 #13
| With the addition that I sometimes think that it is used as a kind of
| obfuscating the code for others.

set the regex options to ignore white space and then indent and add comments
w/n the regex. what you have in that case is a programming language. regex
is like a sql statement...but for text. consider what it would take to
program the following:

testString:

'abcdefg', 'lkasjdflk', 'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg',
'lkasjdflk', 'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg', 'lkasjdflk',
'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg', 'lkasjdflk', 'kslthhtjkehsljt',
'.zx,mv.zmx'

your job:

find all ticked strings, then replace the ones that repeat with only one
instance.

well, w/ regex, it is as simple as:

' match an item, any items after it, and the same item again; keep one copy
dim regex as new regex("('[^']*?', )(?:'[^']*?', )*(\1)")
testString = regex.replace(testString, "$1")

if the pattern looks complex, ignore whitespace and add your comments and/or
indenting or whatever. pretty simple.

but again, to each their own.
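
A rough sketch of the "ignore whitespace and add comments" idea, assuming the .NET option meant here is RegexOptions.IgnorePatternWhitespace. With that option a literal space in the pattern is ignored, so this version uses the no-space form of the pattern; the module name and the sample string are made up. One thing worth checking against real rows: a match runs from an item through its last repeat, so anything sitting between the two copies is dropped as well. That suits cyclic data like the test string, but it is not a general de-duplication.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CommentedPattern
    Sub Main()
        ' Each pattern line carries its own "#" comment. IgnorePatternWhitespace
        ' drops the layout whitespace and the comments when the pattern is parsed.
        Dim pattern As String = String.Join(Environment.NewLine, New String() { _
            "( '[^']*?' , )      # group 1: a ticked item plus its comma", _
            "(?: '[^']*?' , )*   # any number of items in between", _
            "\1                  # the same text as group 1, repeated"})

        Dim dedupe As New Regex(pattern, RegexOptions.IgnorePatternWhitespace)

        ' Made-up input with no spaces after the commas.
        Dim sample As String = "'abcdefg','lkasjdflk','abcdefg','lkasjdflk','x'"
        Console.WriteLine(dedupe.Replace(sample, "$1"))
        ' Prints: 'abcdefg','lkasjdflk','x'
    End Sub
End Module
```

To keep the ", " separator from the post above under this option, the space would have to be written as an escaped space or as a character class such as [ ].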
Nov 20 '05 #14
Hi Steve,

> so, both C and regex are not your friends? ;^)

Although I wrote that there are circumstances in which I would not rule out
regex forever, I get the idea that you are trying (without saying it directly,
only with that single ;^) to take a shot at the knowledge of OHM and me.

Try this:

' Test data: the same ticked items repeated, no spaces after the commas.
Dim teststring1 As String = _
    "'abcdefg','lkasjdflk','kslthhtjkehsljt','.zx,mv.zmx','abcdefg'," & _
    "'lkasjdflk','kslthhtjkehsljt','.zx,mv.zmx','abcdefg','lkasjdflk'," & _
    "'kslthhtjkehsljt','.zx,mv.zmx','lkasjdflk','abcdefg','lkasjdflk','kslthhtjkehsljt'," & _
    "'.zx,mv.zmx'"

' Regex version.
Dim start As Integer = Environment.TickCount
Dim teststring2 As String = ""
For i As Integer = 0 To 10000
    ' The Regex is constructed on every pass, so pattern parsing is timed too.
    Dim regex As New System.Text.RegularExpressions.Regex("('[^']*?',)(?:'[^']*?',)*(\1)")
    teststring2 = regex.Replace(teststring1, "$1")
Next
Console.Write(teststring2 & "time: " & _
    (Environment.TickCount - start).ToString & vbCrLf)

' Split/StringBuilder version.
start = Environment.TickCount
Dim teststring3 As String = ""
For i As Integer = 0 To 10000
    Dim sb As New System.Text.StringBuilder
    ' Splitting on the tick-comma-tick separator leaves the outer ticks on the
    ' first and last items, so strip them.
    Dim sp As String() = Split(teststring1, "','")
    sp(0) = sp(0).Substring(1)
    sp(sp.Length - 1) = sp(sp.Length - 1).Substring(0, _
        sp(sp.Length - 1).Length - 1)
    For Each da1 As String In sp
        Dim da2 As String = "'" & da1 & "'"
        ' Append each ticked item only the first time it is seen.
        If sb.ToString.IndexOf(da2) = -1 Then
            sb.Append(da2)
            sb.Append(",")
        End If
    Next
    ' Drop the trailing comma.
    teststring3 = sb.ToString.Substring(0, sb.ToString.Length - 1)
Next
Console.Write(teststring3 & "time: " & _
    (Environment.TickCount - start).ToString & vbCrLf)

You will see that the second one, without the regex, is 4 times faster and
gives the same result.
As for style, we can discuss whether the second one also reaches a kind of
obfuscated style; for me, though, it is the same as with the Regex sample.
Cor
Nov 20 '05 #15
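
A small timing sketch, offered as a variation rather than a verdict on the numbers above. The loop in the post builds a new Regex on every pass, so part of the measured time is pattern construction. The version below builds it once and times only Replace. It assumes a framework that has System.Diagnostics.Stopwatch (.NET 2.0 or later); the module name and the shortened sample string are made up, and no timing result is claimed here.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module TimingSketch
    Sub Main()
        ' Made-up, shortened sample; substitute the real test string to compare.
        Dim sample As String = "'abcdefg','lkasjdflk','abcdefg','lkasjdflk','x'"
        Dim output As String = ""

        ' Build the Regex once, outside the timed loop.
        Dim dedupe As New Regex("('[^']*?',)(?:'[^']*?',)*(\1)")

        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 0 To 10000
            output = dedupe.Replace(sample, "$1")
        Next
        watch.Stop()

        Console.WriteLine(output)
        Console.WriteLine("regex, built once: " & watch.ElapsedMilliseconds.ToString() & " ms")
    End Sub
End Module
```

Timing the split version the same way, with the same input and loop count, gives a like-for-like comparison.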
'C' was a great language as far as I am concerned. I used it for several
years quite happily, so I would credit myself with a reasonable
understanding of it.

However, I personally know two programmers who used to write code in a way
which was designed to demonstrate (to the knowledgeable eye) their
expertise in understanding the nuances of the compiler. I've seen a piece of
code which printf'ed a poem with no obvious source. All of it looked like
garbage, not unlike a RegExp.

--

OHM ( Terry Burns )
. . . One-Handed-Man . . .
"steve" <a@b.com> wrote in message
news:10*************@corp.supernews.com...
this just hit me as a funny logical consequence...

|die hard C
| programmers who like to write code which is impossible to understand.

punctuated differently:

"die hard C programmers who like to write code, which is impossible to
understand."

there are only two logical assumptions...only die hard C programmers like to
write code that is impossible to understand...or...ockham's razor - any code
can be made very hard to understand; few, if any, programmers of any
language "like" to write hard to understand code; understanding is inference
from experience; therefore, it is simplest to say that the language of C is
impossible for you to understand (since understanding is wholly an
individual endeavor). and, as we all know, this simplest answer or solution
is more oft' than not, the correct one.

so, both C and regex are not your friends? ;^)

just playing w/ you ohm.

cheers.

Nov 20 '05 #16
i was kidding w/ ohm.

ahhhh...i wonder what the difference w/b if the data itself contained no
identifying marker by which you could perform a nifty split operation? that
would require you to rewrite your entire function...i'd just have to change
the pattern in my example and still be left w/ two lines of code to
maintain. the regex pattern itself is no longer considered obfuscation if,
like i said, one were to place comments w/n it (just as you would w/ any
language).

but i digress...i can see my humor was lost to offense. apologies s/b
applied where they are needed to both you and ohm.

later,

steve
Nov 20 '05 #17
No apology needed, I was smirking while both reading and replying. RegExp
does have its place as you say.

Cheers

OHM ( Terry Burns )
. . . One-Handed-Man . . .
Nov 20 '05 #18
