here's the deal...csv, tick-encapsulated data. i'm trying to use regexes to
validate records. here's an example row:
'AD','BF','132465','06/09/2004','','BNSF','A','TYPE','1278','','BR','2999',''
,'LX','','01','09','1','','','','','','','','','CUSTOM JOB CODE TEST'
record type is in the 8th column ('1278'). i'm using regex because there are a
myriad of types that cause other data within the record (or related records) to
be valid or invalid. i'm having problems getting a match on the generalization of
the first 7 columns:
something like this:
(?=(?:(?<!',').*(?!=',')',')){7}(?:'1278(?=','))
(?=(?:(?<!',').*(?!=',')',')){7} represents the first 7 columns.
if someone can help me generalize that pattern, i'd appreciate it very much.
tia,
steve
Nov 20 '05
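One possible way to generalize the leading columns is to describe a single field as `'[^']*',` and repeat it a fixed number of times. The sketch below is in Python purely for illustration (the thread is about VB.NET), but the pattern sticks to syntax that .NET's Regex also accepts. It assumes that no field contains a tick and that there is no whitespace around the commas; the helper name `record_type_pattern` is made up for this example. In the sample row as posted, eight fields come before '1278', so the count is a parameter rather than a hard-coded 7.

```python
import re

def record_type_pattern(columns_before: int, record_type: str) -> re.Pattern:
    """Build a pattern matching a row whose field after `columns_before`
    tick-quoted fields equals `record_type`.

    Assumptions: fields never contain a tick, and there is no
    whitespace around the separating commas.
    """
    field = r"'[^']*',"          # one quoted field plus its trailing comma
    return re.compile(
        r"^(?:%s){%d}'%s'(?=,|$)" % (field, columns_before, re.escape(record_type))
    )

row = ("'AD','BF','132465','06/09/2004','','BNSF','A','TYPE','1278',"
       "'','BR','2999',''")

# Eight fields precede '1278' in this row, so a count of 8 matches
# and a count of 7 does not.
matches_8 = bool(record_type_pattern(8, "1278").match(row))
matches_7 = bool(record_type_pattern(7, "1278").match(row))
print(matches_8, matches_7)
```

Because each field is matched by `[^']*` rather than `.*`, the repetition cannot run across field boundaries, which is what makes counting columns reliable.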
Hi Terry,
Again, exactly the same: I don't discard it. I simply don't like it. Every time
I look at a complex RegExp, it reminds me of the old days, when people used to
cram code in so tight because of memory constraints that it was unreadable, or
of die hard C programmers who like to write code which is impossible to understand.
That's my point. If you don't agree with me, then that's fine.
With the addition that I sometimes think that it is used as a kind of
obfuscating the code for others.
However, the last sentence from you (OHM) holds for me as well.
Cor
If you have ever processed lots of text, such as screen scraping, you will end
up using regex. I agree that it is somewhat hard to see what it does, but
once you learn it, like everything, you will wonder why you did not use it
before. But I had to buy a book, since resources on the web are sparse and
MS documentation is pretty much nonexistent.
Lloyd Sheen
Hi Lloyd,
I never say never. It is not my first choice, but I am not saying never
use it.
When you have to make complex text changes in one pass through
unorganized documents, I believe there are not many alternatives; however, I
often see it used here for simple changes.
Cor
| With the addition that I sometimes think that it is used as a kind of
| obfuscating the code for others.
set the regex options to ignore whitespace, then indent and add comments
within the regex. what you have in that case is a programming language. regex
is like a sql statement...but for text. consider what it would take to
program the following:
testString:
'abcdefg', 'lkasjdflk', 'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg',
'lkasjdflk', 'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg', 'lkasjdflk',
'kslthhtjkehsljt', '.zx,mv.zmx', 'abcdefg', 'lkasjdflk', 'kslthhtjkehsljt',
'.zx,mv.zmx'
your job:
find all ticked strings, then replace the ones that repeat with only one
instance.
well, w/ regex, it is as simple as:
Dim regex As New Regex("('[^']*?', )(?:'[^']*?', )*(\1)")
testString = regex.Replace(testString, "$1")
if the pattern looks complex, ignore whitespace and add your comments and/or
indenting or whatever. pretty simple.
but again, to each their own.
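To make the "ignore whitespace and add comments" suggestion concrete, here is a rough sketch of the same one-line pattern spread out and annotated. It is written in Python, where `re.VERBOSE` plays the role of .NET's `RegexOptions.IgnorePatternWhitespace`; the names `REPEATED_ITEM`, `items` and `test_string` are just for this illustration. Since whitespace in the pattern is ignored in this mode, the literal space after the comma is written as `[ ]`.

```python
import re

# The thread's pattern, laid out one piece per line with comments.
REPEATED_ITEM = re.compile(r"""
    ( '[^']*?' , [ ] )        # group 1: a ticked string plus its separator
    (?: '[^']*?' , [ ] )*     # any number of other ticked strings in between
    ( \1 )                    # the same text as group 1, seen again
""", re.VERBOSE)

items = ["'abcdefg'", "'lkasjdflk'", "'kslthhtjkehsljt'", "'.zx,mv.zmx'"]
test_string = ", ".join(items * 4)      # the four items, repeated four times

# Replace each "item ... same item" run with a single copy of the item.
result = REPEATED_ITEM.sub(r"\1", test_string)
print(result)
```

On this particular test string the result is the four items once each, in their original order.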
Hi Steve,
> so, both C and regex are not your friends? ;^)
Although I wrote that there are circumstances in which I would not rule out
things like regex forever, I get the idea that you are trying (without saying
it directly, but with that single ;^) to take a swipe at the knowledge of OHM and me.
Try this
Dim teststring1 As String = _
    "'abcdefg','lkasjdflk','kslthhtjkehsljt','.zx,mv.zmx','abcdefg'," & _
    "'lkasjdflk','kslthhtjkehsljt','.zx,mv.zmx','abcdefg','lkasjdflk'," & _
    "'kslthhtjkehsljt','.zx,mv.zmx','lkasjdflk','abcdefg','lkasjdflk'," & _
    "'kslthhtjkehsljt'," & _
    "'.zx,mv.zmx'"
' Version 1: regex. The Regex object is built inside the loop,
' so its construction is included in the timing.
Dim start As Integer = Environment.TickCount
Dim teststring2 As String = ""
For i As Integer = 0 To 10000
    Dim regex As New System.Text.RegularExpressions.Regex("('[^']*?',)(?:'[^']*?',)*(\1)")
    teststring2 = regex.Replace(teststring1, "$1")
Next
Console.Write(teststring2 & "time: " & _
    (Environment.TickCount - start).ToString & vbCrLf)
' Version 2: split on the separator and keep only items not seen before.
start = Environment.TickCount
Dim teststring3 As String = ""
For i As Integer = 0 To 10000
    Dim sb As New System.Text.StringBuilder
    Dim sp As String() = Split(teststring1, "','")
    ' Strip the leading tick of the first item and the trailing tick of the last.
    sp(0) = sp(0).Substring(1)
    sp(sp.Length - 1) = sp(sp.Length - 1).Substring(0, _
        sp(sp.Length - 1).Length - 1)
    For Each da1 As String In sp
        Dim da2 As String = "'" & da1 & "'"
        If sb.ToString.IndexOf(da2) = -1 Then
            sb.Append(da2)
            sb.Append(",")
        End If
    Next
    ' Drop the trailing comma.
    teststring3 = sb.ToString.Substring(0, sb.ToString.Length - 1)
Next
Console.Write(teststring3 & "time: " & _
    (Environment.TickCount - start).ToString & vbCrLf)
You will see that the second one, without the regex, is 4 times faster and
gives the same result.
As for style and code, we can discuss whether the second one also reaches a kind
of obfuscating style; however, for me that is the same as with the Regex sample.
Cor
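One thing worth checking when comparing the two versions is whether they agree on inputs other than the test string. The following Python sketch (not the VB.NET code from the thread; `dedupe_regex` and `dedupe_split` are illustrative names, and a `set` stands in for the `IndexOf` check on the StringBuilder) suggests they agree when the list repeats in a regular cycle, but not in general: the regex replaces everything from an item up to a later repeat of it with one copy, so items in between that never repeat are dropped as well.

```python
import re

REPEATED = re.compile(r"('[^']*?',)(?:'[^']*?',)*(\1)")

def dedupe_regex(text: str) -> str:
    """The thread's regex: collapse 'x', ... 'x', into a single 'x',."""
    return REPEATED.sub(r"\1", text)

def dedupe_split(text: str) -> str:
    """Split on the separator and keep the first occurrence of each item."""
    seen = set()
    kept = []
    # Drop the outer ticks first, as the Substring calls do in the thread.
    for item in text[1:-1].split("','"):
        if item not in seen:
            seen.add(item)
            kept.append("'" + item + "'")
    return ",".join(kept)

periodic = "'a','b','c','a','b','c'"
mixed = "'a','b','c','a','d'"

print(dedupe_regex(periodic), dedupe_split(periodic))   # both give 'a','b','c'
print(dedupe_regex(mixed), dedupe_split(mixed))         # 'a','d' versus 'a','b','c','d'
```

So a timing comparison on the periodic test string measures two routines that are only equivalent for that kind of input.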
'C' was a great language as far as I am concerned. I used it for several
years quite happily, so I would credit myself with a reasonable
understanding of it.
However, I personally know two programmers who used to write code in a way
which was designed to demonstrate (to the knowledgeable eye) their
expertise in understanding the nuances of the compiler. I've seen a piece of
code which printf'ed a poem with no obvious source. All of it looked like
garbage, not unlike a RegExp.
--
OHM ( Terry Burns )
. . . One-Handed-Man . . .
"steve" <a@b.com> wrote in message
news:10*************@corp.supernews.com...
| this just hit me as a funny logical consequence...
| > die hard C programmers who like to write code which is impossible to understand.
| punctuated differently:
| "die hard C programmers who like to write code, which is impossible to understand."
| there are only two logical assumptions...only die hard C programmers like to
| write code that is impossible to understand...or...ockham's razor - any
| code can be made very hard to understand; few, if any, programmers of any
| language "like" to write hard to understand code; understanding is
| inference from experience; therefore, it is simplest to say that the language of C
| is impossible for you to understand (since understanding is wholly an
| individual endeavor). and, as we all know, this simplest answer or
| solution is, more oft' than not, the correct one.
| so, both C and regex are not your friends? ;^)
| just playing w/ you ohm.
| cheers.
i was kidding w/ ohm.
ahhhh...i wonder what the difference would be if the data itself contained no
identifying marker by which you could perform a nifty split operation? that
would require you to rewrite your entire function...i'd just have to change
the pattern in my example and still be left w/ two lines of code to
maintain. the regex pattern itself is no longer obfuscation if,
like i said, one places comments within it (just as you would w/ any
language).
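A small sketch of the "only the pattern changes" idea, again in Python rather than VB.NET: the item pattern is passed in as a parameter, so data with a different shape (or with no separator at all) needs a different pattern but no new code. The function name `unique_items` and the bracketed sample data are hypothetical, chosen only to illustrate the point.

```python
import re

def unique_items(text: str, item_pattern: str) -> list:
    """Return the items matched by `item_pattern`, first occurrences only."""
    seen = set()
    unique = []
    for item in re.findall(item_pattern, text):
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Ticked, comma-separated data, as in the thread.
ticked = unique_items("'a','b','a','c'", r"'[^']*'")

# Hypothetical data with no separator to split on: only the pattern changes.
bracketed = unique_items("[a][b][a][c]", r"\[[^\]]*\]")

print(ticked, bracketed)
```

The trade-off is the one argued over in this thread: the matching logic lives in a compact pattern string, which is easy to swap but needs comments to stay readable.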
but i digress...i can see my humor was lost to offense. apologies should be
applied where they are needed, to both you and ohm.
later,
steve
No apology needed, I was smirking while both reading and replying. RegExp
does have its place, as you say.
Cheers
OHM ( Terry Burns )
. . . One-Handed-Man . . .