Translating Foreign HTML Code

P: 23
I'm working on a script that grabs a web page from a foreign site, searches it for specific information, and then grabs the pages linked from the original page. Once I had it working, I tried it out on the foreign site. However, the information I got back was nonsensical. My guess is that the code returned by the web page is written in that foreign language, but when I look at the same page via [view]->[page source], it looks like normal HTML code.

Does anyone know what is happening here or how to fix it?

Here is my script:
use strict;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $page = $mech->get('http://russia.ru');

print $page->content;

Here is what the page looks like when I view the source code from my browser:



Here is what html code is returned after the script is run:



I hope I've provided adequate information. Thank you for all of your help.
Nov 21 '07 #1
7 Replies


eWish
Expert 100+
P: 971
There is nothing wrong with the code you posted. Have you tried to check other sites? The Russian site you are trying to view is mostly flash content. That is likely your problem.

--Kevin
Nov 21 '07 #2

P: 23
Thanks for your input!

I'm developing this script for web sites from many different countries. The first one I tried it on was another Russian page. I simply provided russia.ru as an example. I want to be able to search the retrieved content for different strings, but I don't know how I can do that if the content isn't normal HTML.
Nov 21 '07 #3

KevinADC
Expert 2.5K+
P: 4,059
HTML is only written in English as far as I know.
Nov 21 '07 #4

P: 23
I think I misdiagnosed the problem. I now believe that what I'm getting back from these web pages is raw PHP content, because the forums (English or foreign) I tried were all written in PHP.

This is what I get back:


By alnoir

It doesn't seem to be formatted or even recognizable code, but that's what it is. Does this look familiar to anyone? Can Perl interpret this so that information can be extracted?

Thank you everyone.
Nov 24 '07 #5

KevinADC
Expert 2.5K+
P: 4,059
Looks like some kind of binary code. Could be flash or something similar. I have no idea if perl can translate that into something useful.
Nov 24 '07 #6

numberwhun
Expert Mod 2.5K+
P: 3,503
Quote (KevinADC): "Looks like some kind of binary code. Could be flash or something similar. I have no idea if perl can translate that into something useful."
Agreed! There is no way to get the raw PHP code. One nice thing about PHP is that you cannot just "get" the raw code: the server runs it and sends only the resulting HTML output to the user.

I also agree that it looks more like binary output than anything else, and I don't think there is any way for Perl to help you with that. If you look at eWish's earlier post, I think he stated it correctly: because this page seems to be Flash driven, you are getting binary data instead of the HTML. Why don't you pick a site that is not Flash based and try to grab it?

Regards,

Jeff
Nov 25 '07 #7
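
A note on the "binary" output discussed in the posts above: one plausible cause, which nobody in the thread confirmed, is that the server sent a gzip-compressed body and the script printed it without decompressing it. As far as I know, WWW::Mechanize asks for gzip by default when the compression modules are installed, and calling content on the HTTP::Response object that get returns hands back the body exactly as received. The sketch below reproduces that situation offline with a hand-built response; the sample HTML and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body standing in for what a server might send.
my $html = '<html><body><a href="/news">News</a></body></html>';

# Compress it the way a server honoring "Accept-Encoding: gzip" would.
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# Build the response by hand so the example needs no network access.
my $response = HTTP::Response->new(200, 'OK');
$response->header('Content-Type'     => 'text/html; charset=UTF-8');
$response->header('Content-Encoding' => 'gzip');
$response->content($compressed);

# content() returns the body as it arrived: still compressed.
my $raw = $response->content;

# decoded_content() undoes the Content-Encoding and the charset.
my $decoded = $response->decoded_content;

print "raw body equals the HTML:     ", ($raw     eq $html ? "yes" : "no"), "\n";
print "decoded body equals the HTML: ", ($decoded eq $html ? "yes" : "no"), "\n";
```

If this is what happened, the equivalent change in the original script would be printing $mech->content or $page->decoded_content instead of $page->content. That is a guess at the fix, not something tested against the site in question.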

P: 23
With help from a friend, I finally got the script to work. Instead of using WWW::Mechanize, I tried using LWP and it worked great. Thank you to all the people who took time to help me with the problems I was encountering. I appreciate the help.
Nov 28 '07 #8
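
The working LWP version was never posted. A minimal sketch of what fetching a page and searching it with LWP::UserAgent could look like is below. The helper names fetch_html and extract_links, the 30-second timeout, and the href regex are all assumptions for illustration, not the poster's code.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a URL and return its body as decoded text (hypothetical helper).
sub fetch_html {
    my ($url) = @_;
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);

    die "Request failed: ", $response->status_line, "\n"
        unless $response->is_success;

    # decoded_content() handles any Content-Encoding and the charset.
    my $html = $response->decoded_content;
    die "Could not decode the response body\n" unless defined $html;
    return $html;
}

# Return every quoted href value found in the HTML (hypothetical helper).
# A regex is a rough tool for this; it is only here to keep the sketch short.
sub extract_links {
    my ($html) = @_;
    my @links = $html =~ /href\s*=\s*["']([^"']+)["']/gi;
    return @links;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_links(fetch_html($ARGV[0]));
}
```

Saved as, say, fetch.pl, this would be run as: perl fetch.pl http://russia.ru. For anything beyond a quick check, a real HTML parser such as HTML::LinkExtor is likely a sturdier choice than the regex.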
