Bytes | Software Development & Data Engineering Community

HTML Dom Parser

So I was looking for a way to parse images (<img> tags) from a given
URL. I happened to stumble upon a nice little piece of code called
"PHP Simple HTML DOM Parser", found here:
http://simplehtmldom.sourceforge.net/

On the first page, I made a form where you can enter a URL, and then
the script tries to fetch the images.
It was indeed what I needed; a simple script like this did part of
the job:

<?php
include('simple_html_dom.php'); // This is the script, download it on their SourceForge page.

// Create DOM from URL or file
$dom = file_get_dom('http://www.google.com'); // In my code, this was replaced by a URL variable.

// Find all <img> tags and print their src attribute
foreach ($dom->find('img') as $element) {
    echo $element->src . "<br />";
}

// Release the parser's memory
$dom->clear();
unset($dom);
?>

I understood the code, but I'm still a newbie in PHP. What I still
want to do is:

*Be able to specify that it only fetches .jpeg files, for example.
*Only allow images that are bigger than certain dimensions.
*For now it only gives me the URL (relative or absolute, depending on
the HTML of the source). I also want it to display the parsed
images.

This is mainly for educational purposes, as the best way to learn PHP
is to keep writing small applications with it. So if anyone can point
me in the right direction, that would be great. And if you know of
another script with the same functionality, please share it; I like
learning different ways to achieve something.

Thanks!
Jun 2 '08 #1
On Mon, 19 May 2008 08:30:41 +0200, GoodMan <sh**********@gmail.com> wrote:
> So I was looking for a way to be able to parse images (<img>) from a
> given url. It so happened that I stumbled upon a nice little piece of
> code called "PHP Simple HTML DOM Parser" found here :
> http://simplehtmldom.sourceforge.net/
Is it faster than PHP5's native DOM? (Don't mix up Dom & DOM in the
manual, though...)
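For comparison, here is a minimal sketch of the same <img> extraction using PHP's built-in DOM extension (DOMDocument) instead of simple_html_dom.php. It makes no claim about which one is faster; it parses an inline HTML string so it runs offline, and the function name extract_img_srcs() is hypothetical.

```php
<?php
// Sketch: list every <img> src with PHP's built-in DOM extension.
// extract_img_srcs() is a hypothetical helper name. For a live page you
// would first fetch the markup, e.g. with file_get_contents($url)
// (assumption: allow_url_fopen is enabled).

function extract_img_srcs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns '' when the attribute is missing.
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<html><body><img src="a.jpg"><p><img src="/img/b.png"></p></body></html>';
foreach (extract_img_srcs($html) as $src) {
    echo htmlspecialchars($src) . "<br />\n";
}
```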
> I understood the code, but I'm still a newbie in PHP. What I still
> want to do is:
>
> *Be able to specify that it only fetches .jpeg files for example.
preg_match() the src attribute you found, or use DOM & XPath with a more
sophisticated XPath query.
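A minimal sketch of both suggestions, assuming a JPEG is recognised only by a .jpg/.jpeg extension in the URL path (the server's actual Content-Type is not checked). The function names is_jpeg_src() and jpeg_srcs_via_xpath() are hypothetical.

```php
<?php
// Sketch: keep only JPEG images, two ways. Assumption: the file
// extension in the URL path tells us the type.

// 1) Filter src strings with preg_match(). parse_url() strips a query
//    string such as "photo.jpg?w=200" before the extension is tested.
function is_jpeg_src(string $src): bool
{
    $path = parse_url($src, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URL or no path component
    }
    return preg_match('/\.jpe?g$/i', $path) === 1;
}

// 2) Let an XPath 1.0 query do the filtering. XPath 1.0 has no
//    ends-with(), so substring() compares the trailing characters.
//    Unlike version 1 this is case-sensitive and ignores query strings.
function jpeg_srcs_via_xpath(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = "//img[substring(@src, string-length(@src) - 3) = '.jpg'"
           . " or substring(@src, string-length(@src) - 4) = '.jpeg']";
    $srcs = [];
    foreach ($xpath->query($query) as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$srcs = ['a.jpg', 'b.PNG', 'c.JPEG', 'd.jpg?w=200'];
print_r(array_values(array_filter($srcs, 'is_jpeg_src')));
```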
> *Only allow images that are bigger than a certain dimensions.
getimagesize(). Keep in mind the relative URLs in the page: build a
proper absolute URL string for them first.
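A sketch of that approach under simplifying assumptions: resolve_url() and is_big_enough() are hypothetical helpers, the resolver covers only the common kinds of relative reference (not all of RFC 3986, and not <base href> tags), and calling getimagesize() on a remote URL requires allow_url_fopen and downloads each image.

```php
<?php
// Sketch: make a src absolute, then check its pixel size.
// resolve_url() is a simplified, hypothetical helper: it handles
// absolute, protocol-relative ("//host/x"), root-relative ("/x") and
// path-relative ("x", "./x", "../x") references only.

function resolve_url(string $base, string $src): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src; // already absolute (http:, https:, data:, ...)
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $host = $b['host'] ?? '';
    if (isset($b['port'])) {
        $host .= ':' . $b['port'];
    }
    if (strncmp($src, '//', 2) === 0) {
        return $scheme . ':' . $src; // protocol-relative
    }
    if ($src !== '' && $src[0] === '/') {
        $path = $src; // root-relative
    } else {
        // Path-relative: start from the directory of the base page.
        $basePath = $b['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $src;
    }
    // Collapse "." and ".." segments; ".." never climbs above the root.
    $out = [];
    foreach (explode('/', ltrim($path, '/')) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    return $scheme . '://' . $host . '/' . implode('/', $out);
}

// Hypothetical size filter. getimagesize() returns false (with a
// warning, silenced here) when it cannot read the image.
function is_big_enough(string $url, int $minWidth, int $minHeight): bool
{
    $info = @getimagesize($url);
    if ($info === false) {
        return false;
    }
    [$width, $height] = $info; // indexes 0 and 1 are width and height
    return $width >= $minWidth && $height >= $minHeight;
}

echo resolve_url('http://example.com/dir/page.html', '../img/x.jpg'), "\n";
// prints http://example.com/img/x.jpg
```

With network access, a typical filter would be `if (is_big_enough(resolve_url($pageUrl, $src), 100, 100)) { ... }`, where $pageUrl is the URL the form submitted.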
> *For now it only gives me the URL (relative or absolute, depending on
> the html of the source). What I also want is that it displays the
> images parsed.
Then output HTML, with img tags with the proper src attributes.
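A minimal sketch of that last step, assuming the absolute URLs have already been collected into an array; render_images() and the sample URLs are hypothetical. Escaping with htmlspecialchars() keeps a URL containing & or quotes from breaking the generated markup.

```php
<?php
// Sketch: render collected image URLs as <img> tags instead of text.
// render_images() and $imageUrls are hypothetical sample names/data.

function render_images(array $imageUrls): string
{
    $html = '';
    foreach ($imageUrls as $url) {
        // Escape &, <, > and both quote types for use inside src="...".
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= '<img src="' . $safe . '" alt="" /><br />' . "\n";
    }
    return $html;
}

$imageUrls = [
    'http://example.com/img/a.jpg',
    'http://example.com/img/b.jpg?w=200&h=100',
];
echo render_images($imageUrls);
```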
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #2
