Bytes | Software Development & Data Engineering Community


Translating Foreign HTML Code

23 New Member
I'm working on a script that fetches a web page from a foreign site, searches it for specific information, and then fetches the pages linked from the original page. Once I had it working, I tried it on the foreign site, but the information I got back was nonsensical. My guess is that the code returned from the page is written in that foreign language, yet when I use [View] -> [Page Source] on the same page in my browser, it looks like normal HTML.

Does anyone know what is happening here or how to fix it?

Here is my script:
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    my $page = $mech->get('http://russia.ru');   # returns an HTTP::Response object

    print $page->content;   # raw response body, exactly as received from the server

Here is what the page looks like when I view the source code from my browser:



Here is the HTML code returned when the script runs:



I hope I've provided adequate information. Thank you for all of your help.
Nov 21 '07 #1
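The script above only prints one page. As a sketch of the full workflow described in the question (fetch a page, search it, then fetch the pages it links to), here is one way it might look with WWW::Mechanize. `crawl`, `find_matches`, and the command-line usage are hypothetical names, and the sketch assumes a reasonably recent WWW::Mechanize, where `$mech->content` returns the decoded page text rather than the raw response bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every substring of $text matched by $pattern
# (a qr// object). If the pattern has capture groups, the captures are returned.
sub find_matches {
    my ($text, $pattern) = @_;
    my @found = ($text =~ /$pattern/g);
    return @found;
}

# Hypothetical driver: search the start page, then every http(s) page it links to.
sub crawl {
    my ($start_url, $pattern) = @_;
    require WWW::Mechanize;    # loaded at run time so find_matches() works without it

    binmode STDOUT, ':encoding(UTF-8)';    # decoded page text may contain non-ASCII
    my $re   = qr/$pattern/;
    my $mech = WWW::Mechanize->new(autocheck => 1);    # die on HTTP errors
    $mech->get($start_url);

    # $mech->content is the decoded page text (assumption noted above)
    print "$start_url: $_\n" for find_matches($mech->content, $re);

    # Collect absolute link URLs first, because each get() replaces the current page
    my @urls = map { $_->url_abs } $mech->links;
    for my $url (@urls) {
        next unless $url =~ m{^https?://};
        my $ok = eval { $mech->get($url); 1 };
        unless ($ok) { warn "skipping $url: $@"; next; }
        next unless $mech->is_html;    # skip images, PDFs and other non-HTML links
        print "$url: $_\n" for find_matches($mech->content, $re);
    }
}

crawl(@ARGV) if @ARGV == 2;
```

Run it as, for example, `perl crawl.pl http://example.com/ 'search text'`. Collecting the links before following them keeps the loop independent of whichever page Mechanize currently has loaded.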
eWish
971 Recognized Expert Contributor
There is nothing wrong with the code you posted. Have you tried other sites? The Russian site you are trying to view is mostly Flash content, which is likely your problem.

--Kevin
Nov 21 '07 #2
alnoir
23 New Member
Thanks for your input!

I'm developing this script for web sites from many different countries. The first one I tried it on was another Russian page; I simply provided russia.ru as an example. I want to be able to search the retrieved content for different strings, but I don't know how to do that if the content isn't normal HTML.
Nov 21 '07 #3
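Searching for strings is most reliable on decoded text rather than raw bytes: if a page is stored in, say, windows-1251, a search term held as Perl characters will not match the undecoded bytes. A minimal sketch with the core Encode module follows; `terms_found` is a hypothetical helper, and the charset argument has to come from the page's own declaration (or a guess).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode the raw page bytes from $charset into Perl
# characters, then return the subset of @terms that occur in the text.
# @terms must themselves be character strings (e.g. literals under `use utf8`).
sub terms_found {
    my ($bytes, $charset, @terms) = @_;
    my $text = decode($charset, $bytes);    # bytes -> characters
    return grep { index($text, $_) >= 0 } @terms;
}
```

If the search terms are typed directly into a UTF-8 source file, add `use utf8;` so Perl reads them as characters; otherwise they are byte strings and comparisons against decoded text fail silently.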
KevinADC
4,059 Recognized Expert Specialist
As far as I know, HTML itself is only written in English.
Nov 21 '07 #4
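To qualify that: the tag and attribute names are ASCII, but the text between the tags can be in any language, stored in whatever character encoding the page declares. Russian sites have commonly used encodings such as windows-1251 or KOI8-R, which look like garbage when read as a different encoding. A rough sketch for finding the declared charset is below; `declared_charset` is a hypothetical name, and the regex is a heuristic, not an HTML parser (it could also match the word "charset=" in ordinary page text).

```perl
use strict;
use warnings;

# Hypothetical helper: return the declared charset (lower-cased), checking the
# HTTP Content-Type header value first and then the HTML markup, e.g.
#   <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
#   <meta charset="utf-8">
# Returns undef when nothing is declared.
sub declared_charset {
    my ($content_type, $html) = @_;
    for my $source ($content_type, $html) {
        next unless defined $source;
        return lc $1 if $source =~ /charset\s*=\s*["']?([\w.:-]+)/i;
    }
    return;
}
```

The result can be passed to `Encode::decode`, which also accepts names such as `windows-1251` and `koi8-r`.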
alnoir
23 New Member
I think I misdiagnosed the problem. I now believe that what I'm getting back from these web pages is raw PHP content, because the forums I tried (English or foreign) were all written in PHP.

This is what I get back:



It isn't formatted and doesn't even look like recognizable code, but that's what comes back. Does this look familiar to anyone? Can Perl interpret it so that information can be extracted?

Thank you everyone.
Nov 24 '07 #5
KevinADC
4,059 Recognized Expert Specialist
It looks like some kind of binary data, possibly Flash or something similar. I have no idea whether Perl can translate that into something useful.
Nov 24 '07 #6
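One common cause of binary-looking output, offered as an assumption to check rather than a diagnosis of this thread, is a compressed body: if a server answers with `Content-Encoding: gzip`, then `HTTP::Response->content` returns the compressed bytes, while `decoded_content` returns the uncompressed page. The sketch below shows the underlying idea with core modules only; `maybe_gunzip` is a hypothetical helper.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: if $bytes starts with the gzip magic number
# (0x1f 0x8b), return the decompressed data; otherwise return $bytes unchanged.
sub maybe_gunzip {
    my ($bytes) = @_;
    return $bytes unless substr($bytes, 0, 2) eq "\x1f\x8b";
    my $plain;
    gunzip(\$bytes => \$plain) or die "gunzip failed: $GunzipError\n";
    return $plain;
}
```

With LWP or WWW::Mechanize there is normally no need to do this by hand; calling `decoded_content` on the response handles both the compression and the character set.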
numberwhun
3,509 Recognized Expert Moderator Specialist
Agreed! There is no way to get the raw PHP code. That is one nice thing about PHP: you cannot just "get" the raw code, because the server executes it and sends only the resulting HTML output to the user.

I also agree that it looks more like binary output than anything else, and I don't think Perl can help you with that. I think eWish had it right in his earlier post: because this page seems to be Flash-driven, you are getting binary data instead of the HTML. Why don't you pick a site that isn't Flash-based and try to grab it?

Regards,

Jeff
Nov 25 '07 #7
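Rather than guessing between Flash, compressed HTML, or something else, the first bytes of the response body can be checked against well-known signatures: gzip data starts with `0x1f 0x8b`, and SWF (Flash) files start with `FWS`, `CWS`, or `ZWS`. The classifier below is a hypothetical, rough sketch; the response's `Content-Type` and `Content-Encoding` headers are the more reliable source when they are present.

```perl
use strict;
use warnings;

# Hypothetical, heuristic classifier for a raw response body.
sub sniff_body {
    my ($bytes) = @_;
    my $head = substr($bytes, 0, 3);
    return 'gzip'  if substr($head, 0, 2) eq "\x1f\x8b";
    return 'flash' if $head =~ /\A[FCZ]WS\z/;    # SWF: uncompressed, zlib, LZMA
    return 'html'  if $bytes =~ /<(?:!doctype\s+html|html|head|body)\b/i;
    return 'unknown';
}
```

Calling `sniff_body($page->content)` in the original script would show whether the "binary" output is a Flash file or simply a compressed HTML page.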
alnoir
23 New Member
With help from a friend, I finally got the script to work. Instead of the module I had been using, I tried LWP, and it worked great. Thank you to everyone who took the time to help me with the problems I was encountering; I appreciate it.
Nov 28 '07 #8
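The LWP code itself wasn't posted, so the following is only a hedged sketch of what an LWP-based fetch might look like, not the poster's actual script. `fetch_text` is a hypothetical helper that takes any user-agent object with a `get` method (such as an `LWP::UserAgent`) and returns `decoded_content`, which undoes any `Content-Encoding` like gzip and, for text types, converts the declared charset into Perl characters.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch $url with $ua and return the body as decoded text.
sub fetch_text {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

if (@ARGV) {
    require LWP::UserAgent;    # loaded at run time so fetch_text() can be tested alone
    my $ua = LWP::UserAgent->new(timeout => 30);
    binmode STDOUT, ':encoding(UTF-8)';    # decoded text may contain non-ASCII
    my $text = fetch_text($ua, $ARGV[0]);
    print $text;
}
```

Passing the user agent in as an argument keeps the fetching logic separate from the choice of client, so the same helper could be handed a WWW::Mechanize object, which is also a subclass of LWP::UserAgent.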
