How can I match a hyphen in this regular expression?

Hi all,

I am searching through directories trying to find the prefix to a
number of files. Unfortunately the files don't have a standard naming
convention yet.

So some of them appear as:
THH307A.Monitoring Public Health Issues.doc ' A single period
THH307A- Monitoring Public Health Issues.doc ' A hyphen, space
THH307A Monitoring Public Health Issues.doc ' A single space
THH307 A.Monitoring Public Health Issues.doc ' A mess

At the moment I can match filenames that have a period or space after
the prefix but can't work out how to also match a hyphen.

This is my regex at the moment:
Dim PrefixRegex As Regex = New Regex("(?<prefix>[^\.| ]+)[\.| ](?<unitName>.+)")

Can someone help me match the hyphen, and maybe even the messy ones
where there could be a space near the end of the unit code?

Many thanks,

Peter.

Mar 7 '07 #1
Peter wrote:
<snip>
So some of them appear as:
THH307A.Monitoring Public Health Issues.doc ' A single period
THH307A- Monitoring Public Health Issues.doc ' A hyphen, space
THH307A Monitoring Public Health Issues.doc ' A single space
THH307 A.Monitoring Public Health Issues.doc ' A mess

At the moment I can match filenames that have a period or space after
the prefix but can't work out how to also match a hyphen.

This is my regex at the moment:
Dim PrefixRegex As Regex = New Regex("(?<prefix>[^\.| ]+)[\.| ](?<unitName>.+)")
<snip>

Maybe "(?<prefix>[^\.| ]+)[\.| ]+(?<unitName>.+)" will do. However,
the last example (with an embedded space in the prefix) will be more
challenging...

HTH.

Regards.

Branco.

Mar 7 '07 #2
Hey Branco,

Thanks for replying. I tried what you suggested, but it appears to do
the same thing as my original regular expression. I am trying to
extract the prefix without the hyphen, but with your regex the hyphen
remains attached to the prefix.

I am using this small console app to test the regex:

Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' My regex:
        ' Dim PrefixRegex As Regex = New Regex("(?<prefix>[^\.| ]+)[\.| ](?<unitName>.+)")

        ' Branco's regex:
        Dim PrefixRegex As Regex = New Regex("(?<prefix>[^\.| ]+)[\.| ]+(?<unitName>.+)")
        Dim filename As String = "THH307A Monitoring Public Health Issues.doc"
        Dim filename2 As String = "THH307A.Monitoring Public Health Issues.doc"
        Dim filename3 As String = "THH307A- Monitoring Public Health Issues.doc "

        Dim M As Match = PrefixRegex.Match(filename)
        Dim M2 As Match = PrefixRegex.Match(filename2)
        Dim M3 As Match = PrefixRegex.Match(filename3)

        ' Print both named groups for each file, or report a failed match.
        If M.Success Then
            System.Console.WriteLine("Prefix: " & M.Groups("prefix").Value)
            System.Console.WriteLine("Unit Name: " & M.Groups("unitName").Value)
        Else
            System.Console.WriteLine(filename & " is not a valid filename")
        End If

        If M2.Success Then
            System.Console.WriteLine("Prefix: " & M2.Groups("prefix").Value)
            System.Console.WriteLine("Unit Name: " & M2.Groups("unitName").Value)
        Else
            System.Console.WriteLine(filename2 & " is not a valid filename")
        End If

        If M3.Success Then
            System.Console.WriteLine("Prefix: " & M3.Groups("prefix").Value)
            System.Console.WriteLine("Unit Name: " & M3.Groups("unitName").Value)
        Else
            System.Console.WriteLine(filename3 & " is not a valid filename")
        End If

        System.Console.WriteLine()
        System.Console.WriteLine("Press Enter to Continue...")
        System.Console.ReadLine()
    End Sub
End Module

Output:
-----------
Prefix: THH307A
Unit Name: Monitoring Public Health Issues.doc
Prefix: THH307A
Unit Name: Monitoring Public Health Issues.doc
Prefix: THH307A-
Unit Name: Monitoring Public Health Issues.doc

Press Enter to Continue...

----------------------------------

Do you have any other ideas?

Thanks again,

Peter.

Mar 7 '07 #3
Peter wrote:
<snip>
Thanks for replying. I tried what you suggested but it appears to do
the same thing as my original regular expression. I am trying to
extract the prefix without the hyphen. But unfortunately using your
above mentioned regex the hyphen remains attached to the prefix.
<snip>

Sorry, I really can't recall what I originally understood from your
first post (a real busy day on this side of the country)...

The thing with specifying a literal hyphen in a charclass is that it must be
the first or last element of the class (or be escaped as \-); anywhere else
it is read as a range operator. Therefore, the regex will probably be
like this:

"(?<prefix>[^\.| -]+)[\.| -]+(?<unitName>.+)"

HTH.
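
For anyone who wants to try this, here is a minimal console sketch of the hyphen fix. Two things in it are not from the thread and should be read as assumptions: the character class is simplified to `[. -]` (inside square brackets a period needs no backslash, and `|` is a literal pipe rather than an "or"), and the second pattern guesses that a unit code is letters, then digits, then an optional letter that may be preceded by one stray space. The module and variable names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HyphenSketch
    Sub Main()
        ' "." and a trailing "-" are literals inside [...], so no escapes are needed.
        Dim simple As New Regex("^(?<prefix>[^. -]+)[. -]+(?<unitName>.+)$")

        ' Assumed unit-code shape (not stated in the thread): letters, digits,
        ' then an optional letter that may be preceded by one stray space.
        Dim messy As New Regex("^(?<prefix>[A-Za-z]+\d+ ?[A-Za-z]?)[. -]+(?<unitName>.+)$")

        Dim fileNames() As String = { _
            "THH307A.Monitoring Public Health Issues.doc", _
            "THH307A- Monitoring Public Health Issues.doc", _
            "THH307A Monitoring Public Health Issues.doc", _
            "THH307 A.Monitoring Public Health Issues.doc"}

        For Each fileName As String In fileNames
            ' Try the stricter pattern first, then fall back to the generic one.
            Dim m As Match = messy.Match(fileName)
            If Not m.Success Then m = simple.Match(fileName)

            If m.Success Then
                ' Remove the stray space so "THH307 A" becomes "THH307A".
                Console.WriteLine("Prefix: " & m.Groups("prefix").Value.Replace(" ", ""))
                Console.WriteLine("Unit Name: " & m.Groups("unitName").Value)
            Else
                Console.WriteLine(fileName & " is not a valid filename")
            End If
        Next
    End Sub
End Module
```

With these four sample names the prefix should come out as THH307A each time. On its own, the `simple` pattern handles the first three names but stops at the embedded space in the fourth, giving THH307 as the prefix, which is why the stricter pattern is tried first.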

Regards,

Branco.

Mar 8 '07 #4
What about approaching it logically like this (to help catch "the mess" case):

1) Strip off the file extension and the preceding period.
2) Now you're left with just the file name.
3) Reverse the string and find the first occurrence (in reality the last
occurrence, since we reversed the string) of a non-alphanumeric character
that's not a space. Take everything after this character.
4) If the result of #3 is an empty string (the file name contains no
special characters to split on), go back to the original file name and take
everything to the left of the first space. PREFIX FOUND.
5) Otherwise, reverse the result of #3 to put it back in the normal order.
PREFIX FOUND.
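
The steps above could be sketched in VB.NET roughly as follows. This is only an illustration: `ExtractPrefix` is a hypothetical name, and instead of literally reversing the string it scans from the end, which lands on the same character. It also inherits the assumption built into step 3, namely that the unit name itself contains no punctuation; a hyphen or period in the title would be picked up as the separator.

```vbnet
Imports System
Imports System.IO

Module PrefixSketch
    ' Hypothetical helper; the name and signature are not from the thread.
    Function ExtractPrefix(ByVal fileName As String) As String
        ' Steps 1-2: drop the extension and its period.
        Dim name As String = Path.GetFileNameWithoutExtension(fileName)

        ' Step 3: find the last character that is neither alphanumeric nor a space.
        Dim lastSpecial As Integer = -1
        For i As Integer = name.Length - 1 To 0 Step -1
            If Not Char.IsLetterOrDigit(name(i)) AndAlso name(i) <> " "c Then
                lastSpecial = i
                Exit For
            End If
        Next

        ' Step 5: something precedes the special character, so that is the prefix.
        If lastSpecial > 0 Then Return name.Substring(0, lastSpecial)

        ' Step 4: nothing usable was found, so take everything left of the first space.
        Dim firstSpace As Integer = name.IndexOf(" "c)
        If firstSpace >= 0 Then Return name.Substring(0, firstSpace)
        Return name
    End Function

    Sub Main()
        Console.WriteLine(ExtractPrefix("THH307A.Monitoring Public Health Issues.doc"))   ' THH307A
        Console.WriteLine(ExtractPrefix("THH307A- Monitoring Public Health Issues.doc"))  ' THH307A
        Console.WriteLine(ExtractPrefix("THH307A Monitoring Public Health Issues.doc"))   ' THH307A
        Console.WriteLine(ExtractPrefix("THH307 A.Monitoring Public Health Issues.doc"))  ' THH307 A
    End Sub
End Module
```

Note that the "mess" case keeps its embedded space; removing it with `Replace(" ", "")` afterwards would be a separate normalisation step.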

One approach I've used when trying to parse strings in crazy formats is to
apply whatever rules you've got so far to your list of strings. Make two
groups: those that you were able to parse correctly and those you weren't.
Look at the unparseable group to see what rules you need to add to recognize
more strings. Keep adding rules to shrink the unparseable
group. When you're done, you'll be left with a small group of strings that
you might have to parse manually if a parsing rule can't be created.
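
A rough sketch of that workflow, assuming the rules are kept as an ordered list of regular expressions. The single rule shown and the sample file names (including the deliberately unparseable "Readme") are made up for illustration; the idea is to rerun the program after each new rule and watch the second group shrink.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RuleSketch
    Sub Main()
        ' Rules are tried in order; append new ones as the unparsed group reveals new formats.
        Dim rules As New List(Of Regex)
        rules.Add(New Regex("^(?<prefix>[^. -]+)[. -]+(?<unitName>.+)$"))

        ' Hypothetical input; in practice this would come from the directory search.
        Dim fileNames() As String = { _
            "THH307A.Monitoring Public Health Issues.doc", _
            "THH307A- Monitoring Public Health Issues.doc", _
            "Readme"}

        Dim parsed As New List(Of String)
        Dim unparsed As New List(Of String)

        For Each fileName As String In fileNames
            Dim matched As Boolean = False
            For Each rule As Regex In rules
                Dim m As Match = rule.Match(fileName)
                If m.Success Then
                    parsed.Add(m.Groups("prefix").Value & " <- " & fileName)
                    matched = True
                    Exit For
                End If
            Next
            If Not matched Then unparsed.Add(fileName)
        Next

        Console.WriteLine("Parsed: " & parsed.Count & ", unparsed: " & unparsed.Count)
        ' Review these by eye to decide which rule to add next.
        For Each fileName As String In unparsed
            Console.WriteLine("  needs a rule: " & fileName)
        Next
    End Sub
End Module
```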

PJ Simon
Mar 8 '07 #5
