Translating Foreign HTML Code

alnoir
23
I'm working on a script that fetches a web page from a foreign site, searches it for specific information, and then fetches the pages linked from the original page. Once I had it working, I tried it out on the foreign site. However, the output I got back was nonsensical. My guess is that the code returned for the page is written in that foreign language, but when I open [View] -> [Page Source] for the same page in my browser, it looks like normal HTML.

Does anyone know what is happening here or how to fix it?

Here is my script:
    use strict;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    my $page = $mech->get('http://russia.ru');

    print $page->content;

Here is what the page looks like when I view the source code from my browser:



Here is what html code is returned after the script is run:



I hope I've provided adequate information. Thank you for all of your help.
Nov 21 '07 #1
eWish
971 Expert 512MB
There is nothing wrong with the code you posted. Have you tried other sites? The Russian site you are trying to view is mostly Flash content, which is likely your problem.

--Kevin
Nov 21 '07 #2
alnoir
23
Thanks for your input!

I'm developing this script for web sites from many different countries. The first one I tried it on was another Russian page; I simply provided russia.ru as an example. I want to be able to search the retrieved content for different strings, but I don't know how I can do that if the content isn't normal HTML.
Nov 21 '07 #3
KevinADC
4,059 Expert 2GB
HTML is only written in English as far as I know.
Nov 21 '07 #4
alnoir
23
I think I misdiagnosed the problem. I now believe that what I'm getting back from these web pages is raw PHP content, because the forums (English or foreign) I tried were all written in PHP.

This is what I get back:



It doesn't seem to be formatted or even recognizable code, but that's what comes back. Does this look familiar to anyone? Can Perl interpret this so that information can be extracted?

Thank you everyone.
Nov 24 '07 #5
KevinADC
4,059 Expert 2GB
Looks like some kind of binary code. Could be Flash or something similar. I have no idea if Perl can translate that into something useful.
Nov 24 '07 #6
numberwhun
3,509 Expert Mod 2GB
Agreed! There is no way to get the raw PHP code. One nice thing about PHP is that you cannot just "get" the raw code: the server executes it and sends only the resulting HTML output to the user.

I also agree that this looks more like binary output than anything else, and I don't think there is any way for Perl to help you with that. As eWish said in his earlier post, the page seems to be Flash driven, which would explain why you are getting binary data instead of HTML. Why don't you pick a site that is not Flash based and try to grab it?

Regards,

Jeff
Nov 25 '07 #7
alnoir
23
With help from a friend, I finally got the script to work. Instead of WWW::Mechanize, the module I had been using, I tried LWP and it worked great. Thank you to everyone who took the time to help me with the problems I was encountering. I appreciate the help.
Nov 28 '07 #8
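
The thread does not record which LWP call fixed the script, so what follows is an assumption rather than the poster's actual solution. One plausible cause of the "binary" output is a compressed response body: WWW::Mechanize can ask servers for gzip-compressed pages when zlib support is installed, and content() on the returned HTTP::Response hands back the bytes exactly as they were transferred, while decoded_content() undoes the compression and the character set. A minimal offline sketch of that difference, using a made-up response (the sample text and headers are invented for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page: "Privet" ("hello") in Cyrillic inside a little HTML.
my $html  = "<html><body><p>\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}</p></body></html>";
my $bytes = encode('UTF-8', $html);

# Compress the UTF-8 bytes the way a server would for Content-Encoding: gzip.
gzip(\$bytes => \my $gzipped) or die "gzip failed: $GzipError";

# Build the response by hand instead of fetching it, so no network is needed.
my $response = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# content() returns the body as transferred: still gzip-compressed here.
my $raw = $response->content;

# decoded_content() reverses Content-Encoding and decodes the charset.
my $text = $response->decoded_content;

binmode(STDOUT, ':encoding(UTF-8)');
printf "raw body starts with gzip magic bytes: %s\n",
    (substr($raw, 0, 2) eq "\x1f\x8b" ? 'yes' : 'no');
print "decoded body: $text\n";
```

Against a live site the same method applies to whatever get() returns, for example print $mech->get($url)->decoded_content; ($url being whichever page is fetched). The WWW::Mechanize documentation also describes $mech->content as ordinarily equivalent to $mech->response()->decoded_content(), so calling content on the $mech object rather than on the response may be enough.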
