Using preg_match_all to retrieve HTML data

Hey guys, I am trying to make a forum signature generator and I don't have access to the databases, so I was trying to scrape the info from profile pages. This is the HTML I am trying to parse:

    <span title='0.31% of total forum posts'>133 (2.02 per day)</span>
I want the "133" (total posts). This is what I have, though I guess it is completely wrong:

    "<span title=\'(+.?)% of total forum posts\'>(+.?) ((+.?) per day)</span>"
Any help would be AWESOME!
Apr 16 '10 #1

I would try something more like this:
    '#<span[^>]*>(\d+)[^<]*</span>#i'
This should only get the "133" in your example (that's the only number you needed, right?).
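A quick way to check that claim is to run the pattern against the sample span with preg_match. This is only a minimal sketch: the variable names are made up for illustration, and the sample string is the one from the question.

```php
<?php
// Sample markup taken from the question above.
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// The pattern from this post: group 1 captures the leading run of digits.
$pattern = '#<span[^>]*>(\d+)[^<]*</span>#i';

$postCount = null; // hypothetical variable name
if (preg_match($pattern, $html, $m)) {
    // $m[0] is the whole matched span, $m[1] is the first capture group.
    $postCount = (int) $m[1];
}

echo $postCount, "\n"; // prints 133
```

preg_match stops at the first match, which is enough when the page holds a single span of interest.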
Apr 16 '10 #2
Yes, it is. Do you mind explaining how you set that up because I'd really love to learn. :D
Apr 17 '10 #3
  1. In PHP's preg_* functions, every pattern must be enclosed in delimiter characters. I chose # because it won't be used in the expression itself.
    (The / character is a very popular choice, but here it would need escaping inside </span>.)
        ##
  2. Then I add the basics. In this case, you are looking for a <span>, so I start with that.
        #<span></span>#
  3. The <span> you are looking for includes attributes, all of which are irrelevant to what we are searching for. So I add a character class that matches everything except the > that would close the span tag, [^>], and I tell it to match any number of characters from that class by adding an asterisk (*) to it.
        #<span[^>]*></span>#
  4. We are looking for a number at the beginning of the span's value, so I add a class that matches only digits. This would normally be [0-9], but because it is such a frequently used class, there is a shorthand for it: \d. We are searching for one or more digits, so we use the + operator on the class.
        #<span[^>]*>\d+</span>#
  5. Because we want to be able to retrieve the number, we create a group around it by enclosing it in parentheses. Groups are useful in many ways, but in this case its purpose is to make PHP capture its contents and add them to the output array.
        #<span[^>]*>(\d+)</span>#
  6. And finally we only need to account for the rest of the span's value, so we add another class to it that searches for anything but the opening < of the closing </span>. Like before, to make it match any number of the char class, we add an asterisk.
        #<span[^>]*>(\d+)[^<]*</span>#
  7. Because HTML can be either upper or lower case, I added an i to the expression, after the closing # delimiter. This makes it case-insensitive.
        #<span[^>]*>(\d+)[^<]*</span>#i
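To see what each step buys you, the intermediate patterns can be run against the sample span from the question. The following is just a sketch: the array layout and variable names are my own, and the upper-case test string is an assumption made to exercise the i modifier.

```php
<?php
// Sample markup from the question.
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// Patterns as they stand after steps 2 through 7.
$steps = [
    2 => '#<span></span>#',
    3 => '#<span[^>]*></span>#',
    4 => '#<span[^>]*>\d+</span>#',
    5 => '#<span[^>]*>(\d+)</span>#',
    6 => '#<span[^>]*>(\d+)[^<]*</span>#',
    7 => '#<span[^>]*>(\d+)[^<]*</span>#i',
];

$captured = []; // hypothetical name: step number => captured digits, or null
foreach ($steps as $step => $pattern) {
    $captured[$step] = preg_match($pattern, $html, $m) ? ($m[1] ?? null) : null;
    echo "Step $step: ", $captured[$step] ?? 'no match', "\n";
}

// Upper-case tags only match once the i modifier is present.
$upper    = strtoupper($html);
$withoutI = preg_match($steps[6], $upper); // 0
$withI    = preg_match($steps[7], $upper); // 1
```

Steps 2 through 5 report no match on this input because nothing yet accounts for the " (2.02 per day)" text after the digits. Only from step 6 onward does the pattern match and capture 133.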

To get all the results inside PHP, you would execute the expression using the preg_match_all function.
    <?php
    $str = "... lots of HTML from your source ...";
    // The i modifier makes the match case-insensitive (<SPAN> as well as <span>).
    $regexp = '#<span[^>]*>(\d+)[^<]*</span>#i';

    // $matches[0] holds the full matches; $matches[1] holds the captured numbers.
    if (preg_match_all($regexp, $str, $matches)) {
        foreach ($matches[1] as $i => $number) {
            echo "Match #$i = $number\n<br>";
        }
    }
    else {
        echo "No matches.";
    }
    ?>
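One thing to watch: the pattern matches every span whose content starts with digits, not just the post count. The sketch below shows what preg_match_all returns for a page fragment with several spans, and one possible way to narrow the pattern. The extra spans and the narrowed pattern are assumptions of mine; whether the title text is stable on the real profile pages is something you would need to check.

```php
<?php
// Hypothetical page fragment: two spans start with digits, one does not.
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>"
      . "<span class='rank'>Member</span>"
      . "<SPAN title='joined'>2009 onward</SPAN>";

$pattern = '#<span[^>]*>(\d+)[^<]*</span>#i';

// Default ordering (PREG_PATTERN_ORDER):
//   $matches[0] = every full match, $matches[1] = every group-1 capture.
$count = preg_match_all($pattern, $html, $matches);
print_r($matches[1]); // the captures are '133' and '2009'

// Narrower variant (an assumption): also require the title text from the question.
$narrow = "#<span[^>]*title='[^']*of total forum posts'[^>]*>(\\d+)[^<]*</span>#i";
preg_match_all($narrow, $html, $narrowMatches);
print_r($narrowMatches[1]); // only '133' remains
```

If you would rather get one array per match, preg_match_all also accepts PREG_SET_ORDER as its fourth argument.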
Check out regular-expressions.info if you are interested in learning more about regular expressions. They're tricky to learn, but well worth it.

Hope that made sense :)
Apr 17 '10 #4
Thank you VERY much. The best explanation of this stuff I have found on the internet.
Apr 17 '10 #5
