Generating a site map

P: 14
Hey there everyone,
Sorry if this isn't the right place for this post.
I am building a script to generate a site map for me. For the most part it works fine, but a few errors are popping up.
Google isn't liking several of my URLs. For instance, I have a URL that ends in "trews.html". This line looks fine in the site map, but when I copy the URL into the browser I get a 404 error, as if the page doesn't exist. If I erase the URL and type it out manually, it works fine. When I look at the sites I recently visited, the browser lists "t%08rews.html" as one of them. It seems that Perl is interpreting some kind of character that isn't there? I'm not sure. How could this be corrected?
Nov 30 '08 #1
11 Replies


eWish
Expert 100+
P: 971
Without seeing some code, we can't tell what your problem is. Also, you might post some of the expected output.

--Kevin
Nov 30 '08 #2

P: 14
Well, I have one program that generates all of my HTML pages as well as a text file containing all of the URLs being created. The problem seems to stem from there, since my sitemap program uses that text file to create its site map. The trouble is that I can't see the problem: in the text file the name looks just fine as "trews.html", but if I copy it into the browser, the browser can't find the page unless I retype it. I can't really post the code as it is very long; I'm just wondering if anyone has seen something like this before and could put me on the right track.
Nov 30 '08 #3

P: 14
When I copy the URL into Word, it appears normal as "trews.html", but when I copy it from Word into the browser, it appears as "t rews.html".
Nov 30 '08 #4

eWish
Expert 100+
P: 971
If you look up URL encoding, you'll see that %08 is the backspace character (ASCII 0x08).

So I guess that your editor is adding some characters that don't need to be there. If you view and save your file using Notepad or another text editor (not Word or another word processor), do you still have the same problem?
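One way to check whether a stray character really is in the text file is to print each line with its non-printable bytes made visible. The following is only a minimal sketch: the helper name show_hidden and the sample string are made up for illustration, and the sample assumes the hidden character is a backspace, as the %08 in the browser history suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# with a visible \xNN escape so hidden characters show up.
sub show_hidden {
    my ($text) = @_;
    (my $visible = $text) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $visible;
}

# Sample string with a backspace (0x08) hidden after the "t".
my $url = "t\x08rews.html";
print show_hidden($url), "\n";   # prints t\x08rews.html
```

Running each line of the URL list through something like this should show exactly where the extra character sits, even when the editor displays the line as "trews.html".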

--Kevin
Nov 30 '08 #5

KevinADC
Expert 2.5K+
P: 4,059
Make sure your text file is plain text. Re-save the file as txt or ASCII instead of doc or another word-processing format.
Nov 30 '08 #6

P: 14
The file is .txt with UTF-8 encoding.
Dec 1 '08 #7

P: 14
I have also tried opening the file with TextWrangler and using the "convert to ASCII" option, but no luck.
Dec 1 '08 #8

P: 14
So I guess the question now is: how could I use Perl to remove a backspace?
Dec 1 '08 #9

KevinADC
Expert 2.5K+
P: 4,059
It's quite odd to ever see a backspace in a text file, but I guess it's possible. You can try this. \b inside a character class is a backspace:

$str =~ s/[\b]//g;
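
If other control characters might be hiding in the list as well, a slightly broader variant is possible. This is just a sketch under assumptions: the helper name clean_url and the sample data are invented for the example, and it removes every ASCII control character (which includes the backspace and the trailing newline), not only \b.

```perl
use strict;
use warnings;

# Hypothetical helper: strip all ASCII control characters (0x00-0x1F and
# 0x7F). This covers backspace (0x08) and also removes newlines, so no
# separate chomp is needed.
sub clean_url {
    my ($url) = @_;
    $url =~ s/[\x00-\x1F\x7F]//g;
    return $url;
}

# Sample lines as they might be read from the URL list file.
my @raw   = ("t\x08rews.html\n", "index.html\n");
my @clean = map { clean_url($_) } @raw;

print "$_\n" for @clean;   # prints trews.html, then index.html
```

Applying the cleanup where the URL list is written, rather than where the site map is built, would keep the stray character out of both the text file and the sitemap.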
Dec 1 '08 #10

P: 14
That did the trick for me. Thanks so much. It was the weirdest thing ever: the URL looked fine when I copied it, but if you tried to erase it with backspace, it took two presses on the spot where the extra character was to remove it.
Dec 1 '08 #11

KevinADC
Expert 2.5K+
P: 4,059
You're welcome
Dec 1 '08 #12
