Help with regular expression?

I'm hopeless at regular expressions (I just don't use them often
enough to gain/maintain knowledge), but I need one now and am looking
for help. I need to parse through a document to find a URL, and then
reconstruct another URL based on it. For example, I need to scan a
web page looking for something like <a
href="some_dir/list_20050815100225.csv">. I don't know in advance
what the date/time in the file name will be. I need to take the
result of that and construct a URL out of it so that I can automate
the download of this file on a regular basis. The replace can be done
by replacing "<token>" in
"http://www.whatever.com/some_dir/list_<token>" with the result from
above. However, I would like the directory information included in
the search result so that I don't have to hard-code it (i.e. I'd
rather look for a URL with "list_<datetime>.csv" in it).

I have a regular expression that comes close:
"href=""some_dir/list_(?:(?<1>[^""]*)""|(?<1>\S+))". I got that by
tweaking the example at
http://msdn.microsoft.com/library/de...ateformats.asp.
If I can't find a cleaner sample, that will have to do. However,
there are two minor problems with this expression: 1) I would rather
be returning the complete URL in the href (to make it easier to
capture variable subdirectories, for example), and 2) it would require
a two-step process (the match followed by the replace). Is it
possible to have a single regular expression do both? That would
simplify configuration of my program, since the intent is that none of
this be hard-coded.

Any help would be appreciated.

Thanks!
Brad.

P.S. If there's a better place to post this kind of question, I'd
love to hear about it. I was tempted to cross-post, but.... :-)
Aug 15 '05 #1
This regex will work for identifying URLs:

(http|https|mailto):([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])+#*([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])

"Bradley Plett" wrote:
> I need to parse through a document to find a URL, and then
> reconstruct another URL based on it.
Aug 15 '05 #2
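A quick way to check the pattern from reply #2 against the sample from the question is sketched below. The class and method names (UrlPatternCheck, Matches) are made up for illustration and are not from the thread. Because the pattern requires a scheme such as http: at the front, it should match an absolute URL but not the relative href in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPatternCheck
{
    // The pattern from reply #2, unchanged, as a verbatim string.
    public const string Pattern =
        @"(http|https|mailto):([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])+#*([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])";

    // True when the input contains something the pattern accepts as a URL.
    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        // Absolute URL: starts with a scheme, so the pattern applies.
        Console.WriteLine(Matches("http://www.whatever.com/some_dir/list_20050815100225.csv"));
        // Relative href from the question: no scheme and no colon anywhere.
        Console.WriteLine(Matches("<a href=\"some_dir/list_20050815100225.csv\">"));
    }
}
```

This prints True and then False, which is consistent with the follow-up that the MSDN-derived href pattern is closer to what is needed here.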
The example that I cited is actually closer to what I need, but
thanks!

Brad.

On Mon, 15 Aug 2005 13:23:03 -0700, Paul O
<Pa***@discussions.microsoft.com> wrote:
> This Regex string will work for identifying URLS:
>
> (http|https|mailto):([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])+#*([a-zA-Z0-9$_.+!*(),;/?:@&~=%-])
Aug 15 '05 #3
I'll put this in C#.

Regex regex = new Regex(
    "href=\\\"(?'url'some_dir\\/list_[^\\\"]*)\\\"",
    RegexOptions.IgnoreCase | RegexOptions.Singleline |
    RegexOptions.ExplicitCapture);
string form = "<a href=\"some_dir/list_20050815100225.csv\">";
Match match = regex.Match(form);

if (match.Success)
{
    // Prepend the site root to the captured relative path.
    Console.WriteLine("success: " + "http://www.whatever.com/" +
        match.Groups["url"].Value);
}
else
{
    Console.WriteLine("failed.");
}

and it gets this result:

success: http://www.whatever.com/some_dir/lis...0815100225.csv
Bruce Dunwiddie
www.csvreader.com
Aug 15 '05 #4
Yes, if I tweak the regular expression you provided just slightly (by
replacing "'url'some_dir" with "'url'[^\\\"]*"), that works well and
includes the directory information even if it changes. Now it would
be nice if I could include the ["http://www.whatever.com/" +
match.Groups["url"].Value] in the same regular expression, but that
may be asking too much! :-)

Thanks!
Brad.

On 15 Aug 2005 14:16:20 -0700, "shriop" <sh****@hotmail.com> wrote:
> I'll put this in C#.
Aug 15 '05 #5
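The tweak described in reply #5 could look roughly like the sketch below, assuming the same site root as in the thread. ListLinkFinder, FindListUrl, and the "archive/2005" directory are hypothetical names chosen for illustration. The pattern is written as a verbatim string, so only the quotes need doubling instead of the backslash escapes used earlier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ListLinkFinder
{
    // The earlier pattern with the hard-coded "some_dir" replaced by [^"]*,
    // so any directory in front of "/list_" is captured along with the file name.
    private static readonly Regex HrefPattern = new Regex(
        @"href=""(?<url>[^""]*/list_[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns baseUrl plus the captured relative path, or null when no link is found.
    public static string FindListUrl(string html, string baseUrl)
    {
        Match match = HrefPattern.Match(html);
        return match.Success ? baseUrl + match.Groups["url"].Value : null;
    }

    public static void Main()
    {
        // "archive/2005" is a made-up directory, not one from the thread.
        string html = "<a href=\"archive/2005/list_20050815100225.csv\">";
        Console.WriteLine(FindListUrl(html, "http://www.whatever.com/"));
    }
}
```

Like the pattern it is based on, this still expects a "/" directly before "list_", so a link with no directory part would not match.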
Hi Bradley,

As far as I know, a regular expression can only match within a string;
it cannot concatenate strings. So I think you have to do the string
operations in the C# code. HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Aug 16 '05 #6
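One possible middle ground, sketched here as an assumption and not as something proposed in the thread: the matching itself cannot concatenate, but .NET's Match.Result method expands a replacement pattern such as ${url} against a successful match. That would let both the search pattern and the URL template live in configuration as two strings. ConfiguredUrlBuilder and Build are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfiguredUrlBuilder
{
    // Returns the template expanded against the first match, or null if nothing matches.
    // Match.Result substitutes ${url} with the text captured by the group named "url".
    public static string Build(string html, string searchPattern, string template)
    {
        Match match = Regex.Match(html, searchPattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Result(template) : null;
    }

    public static void Main()
    {
        // Both strings could be read from a configuration file instead of being hard-coded.
        string searchPattern = @"href=""(?<url>[^""]*/list_[^""]*)""";
        string template = "http://www.whatever.com/${url}";

        string html = "<a href=\"some_dir/list_20050815100225.csv\">";
        Console.WriteLine(Build(html, searchPattern, template));
    }
}
```

The string building still happens in the framework and not in the regular expression, so this agrees with the reply above; it only moves the "prefix plus capture" step out of the C# source and into a configurable template.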
