I'm working on a script that grabs a web page from a foreign site, searches it for specific information, and then fetches the pages linked from the original page. Once I had it working, I tried it out on the foreign site. However, the information I got back was nonsensical. I'm guessing the code I get back from the web page is written in that foreign language, but when I use [View] -> [Page Source] in my browser on the same page, it looks like normal HTML.
Does anyone know what is happening here or how to fix it?
Here is my script:
use strict;
use warnings;
use WWW::Mechanize;

# Create the browser object and fetch the page.
my $mech = WWW::Mechanize->new();
my $page = $mech->get('http://russia.ru');

# Print the body of the HTTP response.
print $page->content;
Here is what the page looks like when I view the source code from my browser:
Here is the HTML code that is returned after the script runs:
I hope I've provided adequate information. Thank you for all of your help.
eWish
There is nothing wrong with the code you posted. Have you tried other sites? The Russian site you are trying to view is mostly Flash content. That is likely your problem.
--Kevin
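Rather than guessing whether the page is Flash or something else, a quick check is to print the response headers that decide how the body has to be interpreted. This is a hypothetical sketch (the helper name `describe_response` and the command-line usage are illustrative, not from this thread). It uses WWW::Mechanize as in the original script, with `autocheck => 0` so an HTTP error is reported instead of dying.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: summarize the headers that control how the body must be decoded.
sub describe_response {
    my ($res) = @_;
    return join "\n",
        'Status:           ' . $res->status_line,
        'Content-Type:     ' . ($res->header('Content-Type')     // '(none)'),
        'Content-Encoding: ' . ($res->header('Content-Encoding') // '(none)');
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 0);
    my $res  = $mech->get($ARGV[0]);
    print describe_response($res), "\n";
}
```

Run it as `perl check_headers.pl http://russia.ru` (the file name is arbitrary). A Content-Type such as `application/x-shockwave-flash` would support the Flash theory, while a `Content-Encoding: gzip` line would mean the body arrived compressed.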
Thanks for your input!
I'm developing this script for websites from many different countries. The first one I tried it on was another Russian page; I simply provided russia.ru as an example. I want to be able to search the retrieved content for different strings, but I don't know how I can do that if the content isn't normal HTML.
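Once the body has been decoded into Perl characters, searching it works the same way for any language. A minimal sketch, assuming the page is UTF-8 (the helper `count_matches` and the sample text are made up for illustration): it decodes raw bytes with the core Encode module, then runs a case-insensitive regex search, which is only reliable for Cyrillic on decoded characters, not on raw bytes.

```perl
use strict;
use warnings;
use utf8;                          # this source file contains Cyrillic literals
use Encode qw(encode decode);

# Hypothetical helper: count case-insensitive occurrences of $needle in $text.
# Both arguments should be decoded character strings, not raw bytes.
sub count_matches {
    my ($text, $needle) = @_;
    my $count = () = $text =~ /\Q$needle\E/gi;
    return $count;
}

# Simulate what arrives over the network: UTF-8 encoded bytes.
my $bytes = encode('UTF-8', '<p>Новости: главные новости дня</p>');

# Decode the bytes into characters before searching.
my $text = decode('UTF-8', $bytes);

my $hits = count_matches($text, 'новости');
print "Found $hits match(es)\n";
```

Here both "Новости" and "новости" are counted, because `/i` case-folding applies to the decoded characters.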
KevinADC
HTML is only written in English, as far as I know.
I think I misdiagnosed the problem. I now believe that what I'm getting back from these web pages is raw PHP content, because the forums I tried (English or foreign) were all written in PHP.
This is what I get back:
By alnoir
It doesn't seem to be formatted or even recognizable code, but that's what it is. Does this look familiar to anyone? Can Perl interpret this so that information can be extracted?
Thank you everyone.
KevinADC
Looks like some kind of binary code. Could be Flash or something similar. I have no idea if Perl can translate that into something useful.
numberwhun
Agreed! There is no way to get the raw PHP code. One nice thing about PHP is that you cannot just "get" the raw code: it runs on the server, and only the resulting HTML output is sent to the user.
I also agree that it looks more like binary output than anything else, and I don't think there is any way for Perl to help you with that. If you look at eWish's earlier post, I think he stated it correctly: because this page seems to be Flash-driven, you are getting binary data instead of HTML. Why don't you pick a site that is not Flash-based and try to grab it?
Regards,
Jeff
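One other possible explanation for output that "looks like binary" (an assumption; the thread never confirms it) is that the server sent the page compressed, for example with `Content-Encoding: gzip`. The object returned by `$mech->get` is an HTTP::Response; its `content` method returns the body exactly as received, while `decoded_content` undoes the content encoding and the charset. The offline sketch below builds a fake compressed response (the sample HTML is made up) to show the difference.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Made-up page body, compressed the way a server might send it.
my $html = '<html><body><p>Hello</p></body></html>';
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# Build the response locally, so no network access is needed.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=utf-8',
        'Content-Encoding' => 'gzip',
    ],
    $compressed,
);

# content() is the body exactly as received: gzip data, which looks like binary junk.
my $raw = $res->content;

# decoded_content() removes the gzip encoding and decodes the charset.
my $text = $res->decoded_content;

printf "raw starts with gzip magic bytes: %s\n",
    substr($raw, 0, 2) eq "\x1f\x8b" ? 'yes' : 'no';
print "decoded: $text\n";
```

If this were the cause, the corresponding change to the original script would be printing `$page->decoded_content` instead of `$page->content`.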
With help from a friend, I finally got the script to work. Instead of the module I had been using, I tried LWP, and it worked great. Thank you to everyone who took the time to help me with the problems I was encountering. I appreciate the help.
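The thread doesn't show the LWP code that ended up working, so the following is only a hypothetical sketch of a minimal LWP::UserAgent fetch (the helper name `fetch_page` is made up). It returns `decoded_content`, so compression and the page's character set are handled before the text is searched or printed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch a URL and return its body as decoded characters.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    # decoded_content() undoes gzip/deflate and decodes the page's charset.
    return $res->decoded_content;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    # Encode the decoded characters as UTF-8 on output (avoids "Wide character" warnings).
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_page($ARGV[0]);
}
```

Run it as `perl fetch_page.pl http://russia.ru`; with no argument it only defines `fetch_page`. The returned string can be searched with ordinary regexes.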