How do I parse this page?

I am trying to parse
http://www.ebay.com without success.

I view the source, and I see a lot of ?/td>. This page is unsavable.

It displays perfectly in IE, but once the source is saved/viewed, it no longer
displays correctly in IE.

When I use Lynx to view it, it is formatted perfectly.

My question is: how does eBay allow any browser to display the content correctly
without allowing View Source or Save As?
Jul 23 '05 #1
nntp wrote:
> I am trying to parse
> http://www.ebay.com without success.

In Perl, try
http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm

> I view the source, and I see a lot of ?/td>. This page is unsavable.
> It displays perfectly in IE, but once the source is saved/viewed, it no
> longer displays correctly in IE.

Maybe it uses CSS, or needs images to provide formatting hints.

> When I use Lynx to view it, it is formatted perfectly.
>
> My question is: how does eBay allow any browser to display the content correctly
> without allowing View Source or Save As?

Please don't clutter Perl newsgroups with web server questions.

gtoomey
Jul 23 '05 #2
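
For readers not working in Perl, the event-driven style that HTML::Parser uses can be sketched with Python's standard html.parser module. This is only an illustration, not code from the thread: the class name CellTextCollector and the helper extract_cells are made up for the example, and it assumes every <td> is closed explicitly, which real-world pages do not always do.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collect the text of every top-level <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []    # finished cell texts, in document order
        self._depth = 0    # number of <td> elements currently open
        self._buffer = []  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "td" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Outermost cell closed: store its text. Text from any
                # nested cells has been merged into the same buffer.
                self.cells.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_cells(html_text):
    """Return the text of each <td> cell found in html_text."""
    parser = CellTextCollector()
    parser.feed(html_text)
    parser.close()
    return parser.cells
```

Calling extract_cells on a fetched page returns a plain list of strings, which is usually easier to work with than the raw markup.
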
> > I am trying to parse
> > http://www.ebay.com without success.
>
> In Perl, try
> http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm
>
> > I view the source, and I see a lot of ?/td>. This page is unsavable.
> >
> > It displays perfectly in IE, but once the source is saved/viewed, it no
> > longer displays correctly in IE.
>
> Maybe it uses CSS, or needs images to provide formatting hints.

Have you looked at the source code of www.ebay.com?
I don't know what you mean by "uses images to provide formatting hints".
Jul 23 '05 #3
nntp wrote:
> I don't know what you mean by "uses images to provide formatting hints".

Transparent GIFs, perhaps?

Jul 23 '05 #4
[F'ups set to a.w.w.]

nntp wrote:
> http://www.ebay.com
> I view the source, and I see a lot of ?/td>. This page is unsavable.
> It displays perfectly in IE, but once the source is saved/viewed, it no longer
> displays correctly in IE. My question is: how does eBay allow any browser to
> display the content correctly without allowing View Source or Save As?


IE doesn't simply show you the source when you hit the "view source"
button. Oh no. That would be too easy. It does all kinds of weird crap
first and then shows you some modified source code. I'm guessing that some
of that weird crap screws up some of the characters.

Look at the source code in a different browser and it displays fine.

Not that you should try to emulate any of that code. It's pants.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Jul 23 '05 #5
"nntp" <nn**@rogers.com> wrote in message
news:_d********************@rogers.com...
> I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.
>
> It displays perfectly in IE, but once the source is saved/viewed, it no
> longer displays correctly in IE.
>
> When I use Lynx to view it, it is formatted perfectly.
>
> My question is: how does eBay allow any browser to display the content correctly
> without allowing View Source or Save As?


I don't have a copy of Lynx, so I can't duplicate your problem, but...
Opera saves the file with images and IE displays it just fine from the saved
files.

Ebay.com (index.html) uses an external CSS stylesheet. It also uses a
sizeable number of external javascript files and 68 images to make up the
page I looked at.

George

Jul 23 '05 #6
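
The point about the external stylesheet, scripts, and images suggests why a saved copy of the HTML alone may render badly: the files it refers to are missing. A rough Python sketch for listing those references follows. It is a stand-in for whatever tool you prefer, and the names ResourceLister and list_resources are invented for this example.

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """List external stylesheets, scripts, and images referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = {"stylesheet": [], "script": [], "image": []}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lowercased
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attributes.get("href"):
            self.resources["stylesheet"].append(attributes["href"])
        elif tag == "script" and attributes.get("src"):
            # Inline scripts have no src attribute and are skipped.
            self.resources["script"].append(attributes["src"])
        elif tag == "img" and attributes.get("src"):
            self.resources["image"].append(attributes["src"])


def list_resources(html_text):
    """Return a dict of external resource URLs grouped by kind."""
    lister = ResourceLister()
    lister.feed(html_text)
    lister.close()
    return lister.resources
```

Every URL it reports would have to be saved alongside the page, with paths adjusted, for an offline copy to look like the original.
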

Quoth "nntp" <nn**@rogers.com>:
> I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.
>
> It displays perfectly in IE, but once the source is saved/viewed, it no longer
> displays correctly in IE.
>
> When I use Lynx to view it, it is formatted perfectly.
>
> My question is: how does eBay allow any browser to display the content correctly
> without allowing View Source or Save As?

They can't. You've probably got character-set issues. Use LWP to retrieve the
page.

Ben

--
I must not fear. Fear is the mind-killer. I will face my fear and
I will let it pass through me. When the fear is gone there will be
nothing. Only I will remain.
be*@morrow.me.uk Frank Herbert, 'Dune'
Jul 23 '05 #7
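
Stray "?" characters are a common symptom of the character-set problem mentioned above: bytes decoded with the wrong charset often show up as a replacement character. The sketch below shows one way to choose a charset before decoding a fetched page. It is in Python rather than Perl/LWP purely for illustration. The function names pick_charset and decode_page, the 1024-byte scan window, and the latin-1 fallback are all assumptions made for the example.

```python
import re


def pick_charset(content_type, body):
    """Choose a charset: Content-Type header first, then a <meta> tag, then latin-1."""
    match = re.search(r"charset=\"?([\w-]+)", content_type or "", re.I)
    if match:
        return match.group(1).lower()
    # Assumption: a charset declaration, if present, sits in the first 1024 bytes.
    match = re.search(rb"charset=[\"']?([\w-]+)", body[:1024], re.I)
    if match:
        return match.group(1).decode("ascii").lower()
    return "latin-1"  # assumed fallback; every byte value decodes under latin-1


def decode_page(content_type, body):
    """Decode raw page bytes, marking undecodable bytes with U+FFFD."""
    charset = pick_charset(content_type, body)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The declared charset name is unknown to Python; use the fallback.
        return body.decode("latin-1")
```

If the decoded text contains U+FFFD, the declared charset and the actual bytes probably disagree, which is worth checking before blaming the parser.
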
"nntp" <nn**@rogers.com> wrote in
news:_d********************@rogers.com:
> I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.

That ain't true. If you have any questions on parsing HTML using
HTML::Parser, please post them here. Otherwise, this is waaay off-topic.

Sinan
Jul 23 '05 #8
JRS: In article <Xn****************************@132.236.56.8>, dated
Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
Unur <1u**@llenroc.ude.invalid> posted :
> "nntp" <nn**@rogers.com> wrote in
> news:_d********************@rogers.com:
>
> > I am trying to parse
> > http://www.ebay.com without success.
> >
> > I view the source, and I see a lot of ?/td>. This page is unsavable.
>
> That ain't true. If you have any questions on parsing HTML using
> HTML::Parser, please post them here. Otherwise, this is waaay off-topic.


Please take greater, or at least better, thought before using a word
such as "here".

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Jul 23 '05 #9
Dr John Stockton <sp**@merlyn.demon.co.uk> wrote:
> JRS: In article <Xn****************************@132.236.56.8>, dated
> Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
> Unur <1u**@llenroc.ude.invalid> posted :
>
> > "nntp" <nn**@rogers.com> wrote in
> > news:_d********************@rogers.com:
> >
> > > I am trying to parse
> > > http://www.ebay.com without success.
> > >
> > > I view the source, and I see a lot of ?/td>. This page is unsavable.
> >
> > That ain't true. If you have any questions on parsing HTML using
> > HTML::Parser, please post them here. Otherwise, this is waaay off-topic.
>
> Please take greater, or at least better, thought before using a word
> such as "here".

Please take greater, or at least better, notice of the Newsgroups
header before determining which "where" is "here".

:-)
--
Tad McClellan SGML consulting
ta***@augustmail.com Perl programming
Fort Worth, Texas
Jul 23 '05 #10
