
repeated calls to strrchr... to find second to last occurrence

P: n/a
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!

Nov 14 '05 #1
10 Replies


P: n/a
Sean Berry <se********@cox.net> scribbled the following:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!


strrchr() returns a pointer to the last match, or NULL if there was
no match. So, if it found a match, you need to investigate the part
of the string that comes before the last match.
First check if the pointer is the same as your original string
pointer. If it is, strrchr() found a match at the exact start of your
string, and there can't possibly be anything before it. So in that
case, exit: there isn't a second-to-last match.
Otherwise, change the character in the position strrchr() reported to
'\0', chopping off the string from the match onwards. Then call
strrchr() again. If it found a match, that's your second-to-last
match. Otherwise exit: there isn't a second-to-last match.
If your string isn't modifiable, copy it into a modifiable string.
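
The steps above could be sketched roughly like this. It is only an illustration: the helper name second_to_last is made up here, and it always works on a private copy so that the caller's string is never modified.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a pointer into s
   at the second-to-last occurrence of ch, or NULL if there are fewer than two
   occurrences or the temporary copy cannot be allocated. */
const char *second_to_last(const char *s, int ch)
{
    const char *last;
    const char *result = NULL;
    char *copy, *prev;
    size_t len;

    assert(s != NULL);
    last = strrchr(s, ch);
    if (last == NULL || last == s)
        return NULL;              /* no match, or nothing before the match */

    len = (size_t)(last - s);     /* length of the part before the last match */
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';             /* chop the string at the last match */

    prev = strrchr(copy, ch);     /* last match in the chopped copy */
    if (prev != NULL)
        result = s + (prev - copy);
    free(copy);
    return result;
}
```

For the URL in the question and '.', this points at ".com/path/to/file.txt".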

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"We're women. We've got double standards to live up to."
- Ally McBeal
Nov 14 '05 #2

P: n/a

On Thu, 24 Jun 2004, Sean Berry wrote:

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt


Don't use 'strrchr'. As you have discovered, that won't work.
The solution is not to hack around the problem, but rather to solve
it a different way. What part of the string do you want to extract?
Answer: the part following "mydomin.com". In general, the part which
comes immediately after the domain name, which is a sequence of
alphanumerics and dots, which itself follows the string "http://".
So look for "http://", followed by a sequence of alphanumerics and
dots, followed by a slash; and then extract everything from the slash
onwards.

BTW, unnecessary casts are evil.

char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);
while (!strchr("/", *cd))
++cd;
strcpy(short_database, cd);

(Note the use of 'strchr("/",...)' instead of '(... == '/')'. This
is an idiom that I've found very useful; it catches the end-of-string
null character as well as the slash for which we're really looking.
Be sure to preserve this behavior in your code; you don't want to
segfault if the user enters "http://www.google.com"!)
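
To see the idiom in isolation, here is a small sketch. The helper name skip_to_slash_or_end is invented for this example; it relies only on the fact that strchr treats the terminating null character as part of the string it searches.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: advance p to the first '/' or to the terminating
   null character, whichever comes first. strchr("/", c) is non-NULL both
   for c == '/' and for c == '\0', because the terminator counts as part
   of the searched string. */
const char *skip_to_slash_or_end(const char *p)
{
    assert(p != NULL);
    while (!strchr("/", *p))
        ++p;
    return p;
}
```

Called on "www.google.com" it stops at the terminator instead of running off the end of the string.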

HTH,
-Arthur
Nov 14 '05 #3

P: n/a
On Thu, 24 Jun 2004, Sean Berry wrote:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt
It seems like you are asking the wrong question. If you want to extract
everything after the domain name then looking for the second last '.'
character will not always work. What if I had the URL:

http://some.domain.com/path.with/a.p...in/it/file.txt

Don't you want everything after, and including, the third '/'?
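
A rough sketch of "find the third '/'" using repeated strchr calls follows. The function name nth_occurrence is an invention for this example, and it assumes ch is not the null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the n-th occurrence (counting
   from 1) of ch in s, or NULL if s contains fewer than n occurrences.
   Assumes ch != '\0' and n > 0. */
const char *nth_occurrence(const char *s, int ch, int n)
{
    const char *p = s;

    assert(s != NULL && ch != '\0' && n > 0);
    /* Each strchr call finds the next match; step past it until the
       n-th one has been reached. */
    while ((p = strchr(p, ch)) != NULL && --n > 0)
        ++p;
    return p;
}
```

With n set to 3 and ch set to '/', the result for a well-formed "http://host/path" URL is the start of the path, no matter how many dots the path contains.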
I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!


Your C code doesn't seem to be the problem. You might want to pop over to
comp.programming and validate your algorithm before you attempt to
implement it.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vi************@whitehouse.gov
Nov 14 '05 #4

P: n/a
Arthur J. O'Dwyer <aj*@nospam.andrew.cmu.edu> wrote:
On Thu, 24 Jun 2004, Sean Berry wrote:

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt
Don't use 'strrchr'. As you have discovered, that won't work.
The solution is not to hack around the problem, but rather to solve
it a different way. What part of the string do you want to extract?
Answer: the part following "mydomin.com". In general, the part which
comes immediately after the domain name, which is a sequence of
alphanumerics and dots, which itself follows the string "http://".
So look for "http://", followed by a sequence of alphanumerics and
dots, followed by a slash; and then extract everything from the slash
onwards.

BTW, unnecessary casts are evil.

char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)
Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);
I would think that the "-1" part does not look right here.
while (!strchr("/", *cd))
Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

while ( *cd && *cd != '/' ) ++cd;
if ( ! *cd )
do_error("URL without a path!");
strcpy(short_database, cd);


Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Nov 14 '05 #5

P: n/a
Je***********@physik.fu-berlin.de wrote:
Arthur J. O'Dwyer <aj*@nospam.andrew.cmu.edu> wrote:
On Thu, 24 Jun 2004, Sean Berry wrote:

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt
<snip>
char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)
Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?


Because otherwise strncmp would return non-zero for each and every
valid URL; hint: sizeof "http://" evaluates to 8.
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);


I would think that the "-1" part does not look right here.


But it is correct. See above.
while (!strchr("/", *cd))


Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:


And that's exactly what the strchr does; as Arthur already pointed
out, strchr considers the terminating null character to be part of
the string.
while ( *cd && *cd != '/' )
++cd;
That's an equivalent solution.

FWIW, I'd let my code additionally check if the sequence between
"http://" and the next '/' or '\0' only consists of alphanumeric
characters plus dash plus dot.
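
Such a check might look something like the sketch below. The name host_chars_ok is made up, and rejecting an empty host is an extra assumption of this example, not something stated above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: host points just past "http://". Returns 1 if the
   characters up to the next '/' or '\0' are all alphanumerics, '-' or '.',
   and there is at least one of them; returns 0 otherwise. */
int host_chars_ok(const char *host)
{
    const char *p;

    assert(host != NULL);
    for (p = host; *p != '\0' && *p != '/'; ++p) {
        /* Cast to unsigned char: isalnum is undefined for negative values. */
        if (!isalnum((unsigned char)*p) && *p != '-' && *p != '.')
            return 0;
    }
    return p != host;   /* assumption of this sketch: an empty host is invalid */
}
```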

<snip>
strcpy(short_database, cd);


Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );


Now, that's good advice, provided the obvious error is fixed:
terminating the string looks like a Good Idea to me.

Regards
--
Irrwahn Grausewitz (ir*******@freenet.de)
welcome to clc: http://www.ungerhu.com/jxh/clc.welcome.txt
clc faq-list : http://www.faqs.org/faqs/C-faq/faq/
clc OT guide : http://benpfaff.org/writings/clc/off-topic.html
Nov 14 '05 #6

P: n/a
Irrwahn Grausewitz <ir*******@freenet.de> wrote:
Je***********@physik.fu-berlin.de wrote:
Arthur J. O'Dwyer <aj*@nospam.andrew.cmu.edu> wrote:
On Thu, 24 Jun 2004, Sean Berry wrote:

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

<snip>
char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)


Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?

Because otherwise strncmp would return non-zero for each and every
valid URL; hint: sizeof "http://" evaluates to 8.
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);


I would think that the "-1" part does not look right here.

But it is correct. See above.
Grrr.... Perhaps hitting myself on my head will help me to remember

sizeof != strlen
while (!strchr("/", *cd))


Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

And that's exactly what the strchr does; as Arthur already pointed
out, strchr considers the terminating null character to be part of
the string.
Ah, that was too clever a solution for me;-(
while ( *cd && *cd != '/' )
++cd;

That's an equivalent solution.

FWIW, I'd let my code additionally check if the sequence between
"http://" and the next '/' or '\0' only consists of alphanumeric
characters plus dash plus dot.

<snip>
strcpy(short_database, cd);


Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Now, that's good advice, provided the obvious error is fixed:
terminating the string looks like a Good Idea to me.


Yes, another off by 1 error. Mustn't have been a good day...

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Nov 14 '05 #7

P: n/a
In <pVECc.55$_h.50@fed1read07> "Sean Berry" <se********@cox.net> writes:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt


This is an ideal job for sscanf: its pattern matching capabilities are
just enough for your needs. Assuming that the string starts with
"http://" (otherwise use strstr to locate it first):

char str[] = "http://this.is.mydomain.com/path/to/file.txt";
char path[sizeof str];
int rc = sscanf(str, "http://%*[^/]%[^\n]", path);
if (rc == 1) puts(path);

If there are some other characters that you don't want to allow into the
path, include them in the last conversion specification.
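
As an illustration of that last remark, the sketch below stops the path at '?' or '#' by adding them to the final scanset. The wrapper name extract_path and the choice of excluded characters are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: copies the path of an "http://" URL into path,
   stopping at '?', '#' or a newline. The caller must provide room for
   at least strlen(url) + 1 characters. Returns 1 if a path was stored,
   0 otherwise (wrong prefix, empty host, or no path at all). */
int extract_path(const char *url, char *path)
{
    assert(url != NULL && path != NULL);
    /* %*[^/] skips the host without storing it; the second scanset
       lists the characters that must not become part of the path. */
    return sscanf(url, "http://%*[^/]%[^?#\n]", path) == 1;
}
```

Comparing the sscanf result against 1 avoids depending on whether a particular implementation reports 0 or EOF when the URL has no path.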

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #8

P: n/a
Sean Berry wrote:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!


Here's one way to do it.

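char short_database[] = "http://this.is.mydomin.com/path/to/file.txt";
char* reader = short_database, * writer = short_database;
int slash_count = 3;
while (*reader) {
    /* After the third slash, copy characters down to the front. */
    if (slash_count <= 0) *writer++ = *reader++;
    /* Before that, just skip characters and count slashes. */
    else if (*reader++ == '/') slash_count--;
}
*writer = '\0'; /* terminate the shortened string */

will put "path/to/file.txt" (without the leading slash) into short_database.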
Nov 14 '05 #9

P: n/a
Je***********@physik.fu-berlin.de wrote:
Arthur J. O'Dwyer <aj*@nospam.andrew.cmu.edu> wrote:
On Thu, 24 Jun 2004, Sean Berry wrote:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

Don't use 'strrchr'. As you have discovered, that won't work.
The solution is not to hack around the problem, but rather to solve
it a different way. What part of the string do you want to extract?
Answer: the part following "mydomin.com". In general, the part which
comes immediately after the domain name, which is a sequence of
alphanumerics and dots, which itself follows the string "http://".
So look for "http://", followed by a sequence of alphanumerics and
dots, followed by a slash; and then extract everything from the slash
onwards.


BTW, unnecessary casts are evil.


char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)

Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?


I don't see why. sizeof "http://" is 8 (7 real characters + the null
character) but we only want to use seven characters.

I normally prefer to use strlen() in this circumstance. It makes the
code more obvious, since (as we just demonstrated) not everyone is
familiar with the semantics of sizeof with string literals, and gcc will
optimize calls to strlen() with string literals so there is no
performance win using sizeof. YMMV.
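
A minimal sketch of the strlen() variant, for comparison. The macro HTTP_PREFIX and the function after_http_prefix are names chosen just for this example.

```c
#include <assert.h>
#include <string.h>

#define HTTP_PREFIX "http://"   /* name chosen for this sketch */

/* Hypothetical helper: returns a pointer just past a leading "http://",
   or NULL if s does not start with that prefix. */
const char *after_http_prefix(const char *s)
{
    size_t n = strlen(HTTP_PREFIX);   /* 7; sizeof HTTP_PREFIX would be 8 */

    assert(s != NULL);
    if (strncmp(s, HTTP_PREFIX, n) != 0)
        return NULL;
    return s + n;
}
```

The one-character difference between the two counts is the terminating null that sizeof includes and strlen() does not.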

do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);

I would think that the "-1" part does not look right here.


See above.
while (!strchr("/", *cd))

Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

while ( *cd && *cd != '/' )
++cd;

if ( ! *cd )
do_error("URL without a path!");

strcpy(short_database, cd);

Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Regards, Jens


Both good points, but your memmove() fails to copy the null character.
Add one to the strlen().
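
In other words, something along these lines. The helper name shift_to_front is made up for this sketch, and from is assumed to point into the same string as buf.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: moves the tail of a string, starting at from, to
   the front of buf. from must point into the string held in buf. The
   "+ 1" copies the terminating null character as well, and memmove is
   used because source and destination overlap. */
void shift_to_front(char *buf, const char *from)
{
    assert(buf != NULL && from != NULL);
    memmove(buf, from, strlen(from) + 1);
}
```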

-Peter
Nov 14 '05 #10

P: n/a
kal
"Sean Berry" <se********@cox.net> wrote in message news:<pVECc.55$_h.50@fed1read07>...
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt


As has been pointed out by others, this approach has some
drawbacks.

If faced with a similar problem, I would first look around
for functions that manipulate URL-like strings. The more
general the function, the better.

For example, a function that extracts the "path" part even if one
or more of the other parts are missing. And maybe another
function to check if a URL string is well formed.
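
Just to make the idea concrete, a function of that kind might be sketched as below. This is not a real URL parser and not taken from any library: url_path is a made-up name, and the only "missing parts" it copes with are the scheme and the path.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not a full URL parser: returns a pointer to the
   path part of url (starting at its '/'), or to the terminating null
   character if there is no path. A leading "scheme://" is optional. */
const char *url_path(const char *url)
{
    const char *p;

    assert(url != NULL);
    p = url + strcspn(url, "/");        /* first '/' or end of string */
    if (p > url && p[-1] == ':' && p[0] == '/' && p[1] == '/') {
        p += 2;                          /* skip the "//" after the scheme */
        p += strcspn(p, "/");            /* skip the host */
    }
    return p;
}
```

It gives "/path/to/file.txt" for the URL in the question with or without the "http://" part, and an empty string for "http://www.google.com".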

--

"There is no problem, however complicated, which when looked
at the right way did not become still more complicated."
Nov 14 '05 #11
