Using preg_match_all to retrieve HTML data

Hey guys, I am trying to make a forum signature generator and I don't have access to the databases, so I was trying to rip info from profile pages. This is the code I am trying to get:

  <span title='0.31% of total forum posts'>133 (2.02 per day)</span>
I want the "133", total posts... this is what I have, I guess it is completely wrong though:

  "<span title=\'(+.?)% of total forum posts\'>(+.?) ((+.?) per day)</span>"
Any help would be AWESOME!
Apr 16 '10 #1

I would try something more like this:
  '#<span[^>]*>(\d+)[^<]*</span>#i'
This should only get the "133" in your example (that's the only number you needed, right?).
Apr 16 '10 #2
Yes, it is. Would you mind explaining how you set that up? I'd really love to learn. :D
Apr 17 '10 #3
  1. In PHP's preg functions, every regular expression must be enclosed in a pair of delimiter characters. I chose # because it isn't used in the expression itself.
    (The / char is a very popular choice, but here the / in </span> would then have to be escaped.)
    ##
  2. Then I add the basics. In this case, you are looking for a <span>, so I start with that.
    #<span></span>#
  3. The <span> you are looking for includes attributes, all of which are irrelevant to what we are searching for. So I add a character class that matches any character except the > that closes the opening tag, [^>], and I tell it to match any number of those characters (including none) by adding an asterisk (*) to it.
    #<span[^>]*></span>#
  4. We are looking for a number at the beginning of the span's value, so I add a class that matches only digits. This would normally be [0-9], but because it is such a frequently used class, there is a shorthand for it: \d. We are searching for one or more digits, so we use the + operator on the class.
    #<span[^>]*>\d+</span>#
  5. Because we want to be able to retrieve the number, we create a group around it by enclosing it in parentheses. Groups are useful in many ways, but in this case its purpose is to make PHP capture its contents and add them to the output array.
    #<span[^>]*>(\d+)</span>#
  6. And finally we only need to account for the rest of the span's value, so we add another class to it that searches for anything but the opening < of the closing </span>. Like before, to make it match any number of the char class, we add an asterisk.
    #<span[^>]*>(\d+)[^<]*</span>#
  7. Because HTML tag names can be written in either upper or lower case, I added an i to the expression, after the closing # delimiter. This modifier makes the whole pattern case-insensitive.
    #<span[^>]*>(\d+)[^<]*</span>#i
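
If you want to see why each step is needed, you can run the pattern from every stage against the sample line from the first post. This is only a rough sketch: the variable names ($sample, $steps, $results) are made up for illustration, and step 1 is left out because an empty pattern matches anything.

```php
<?php
// Sample line taken from the first post.
$sample = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// The pattern as it looks after each step of the walkthrough (steps 2 to 7).
$steps = [
    2 => '#<span></span>#',
    3 => '#<span[^>]*></span>#',
    4 => '#<span[^>]*>\d+</span>#',
    5 => '#<span[^>]*>(\d+)</span>#',
    6 => '#<span[^>]*>(\d+)[^<]*</span>#',
    7 => '#<span[^>]*>(\d+)[^<]*</span>#i',
];

$results = [];
foreach ($steps as $step => $pattern) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $matched = preg_match($pattern, $sample, $m) === 1;

    // $m[1] only exists once the pattern contains a capture group.
    $results[$step] = $matched ? ($m[1] ?? null) : null;

    echo "Step $step: ", $results[$step] ?? 'no capture', "\n";
}
```

Steps 2 to 5 capture nothing on this input, because the span has an attribute and there is text after the number. Only from step 6 on is the whole span accounted for, and the group then captures 133. Step 7 matters when the tags are written as <SPAN> ... </SPAN>.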

To get all the results inside PHP, you would execute the expression using the preg_match_all function.
  <?php
  $str = "... lots of HTML from your source ...";
  // Capture the leading number of every <span>; the i flag ignores tag case.
  $regexp = '#<span[^>]*>(\d+)[^<]*</span>#i';

  // $matches[0] holds the full matches, $matches[1] the captured numbers.
  if (preg_match_all($regexp, $str, $matches)) {
      foreach ($matches[1] as $i => $number) {
          echo "Match #$i = $number\n<br>";
      }
  } else {
      echo "No matches.";
  }
  ?>
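
One thing to watch for: a real profile page will probably contain other <span> elements whose text starts with a digit, and the general pattern captures those too. A possible way to narrow it down, assuming the title attribute always ends in "of total forum posts" as in the sample from the first post, is to put that text into the pattern. The function name extract_post_counts and the $html string below are made up for illustration; treat this as a sketch, not a tested scraper.

```php
<?php
// Hypothetical helper: returns the post counts found in $html as strings.
function extract_post_counts(string $html): array
{
    // Same structure as the general pattern, but the opening tag must carry
    // a title attribute ending in "of total forum posts" (an assumption
    // based on the single sample line in this thread).
    $regexp = '#<span[^>]*title=[\'"][^\'"]*of total forum posts[\'"][^>]*>(\d+)[^<]*</span>#i';

    // With the default flags, $matches[1] lists the contents of group 1.
    preg_match_all($regexp, $html, $matches);
    return $matches[1];
}

// Made-up page fragment: one unrelated span plus the span we want.
$html = "<p><span class='rank'>5 stars</span></p>\n"
      . "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// The general pattern picks up both spans ...
preg_match_all('#<span[^>]*>(\d+)[^<]*</span>#i', $html, $all);
echo implode(', ', $all[1]), "\n";

// ... while the narrowed one only returns the post count.
echo implode(', ', extract_post_counts($html)), "\n";
```

If the forum ever changes the wording of that title attribute, the narrowed pattern will silently stop matching, so it is worth keeping the "No matches." branch in your own code.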
Check out regular-expressions.info if you are interested in learning more about regular expressions. They're tricky to learn, but well worth it.

Hope that made sense :)
Apr 17 '10 #4
Thank you VERY much. The best explanation of this stuff I have found on the internet.
Apr 17 '10 #5
