Bytes IT Community

Using preg_match_all to retrieve HTML data

Hey guys, I am trying to make a forum signature generator. I don't have access to the databases, so I am trying to pull the info from the profile pages instead. This is the HTML I am trying to extract the data from:

    <span title='0.31% of total forum posts'>133 (2.02 per day)</span>
I want the "133" (the total post count). This is what I have, though I guess it is completely wrong:

    "<span title=\'(+.?)% of total forum posts\'>(+.?) ((+.?) per day)</span>"
Any help would be AWESOME!
Apr 16 '10 #1
4 Replies


Atli
Hey.

I would try something more like this:
    '#<span[^>]*>(\d+)[^<]*</span>#i'
This should capture only the "133" in your example (that's the only number you needed, right?).
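A quick way to check the pattern against the single line from your first post is preg_match, which stops at the first match. This is a minimal sketch that assumes the HTML is already in a string; the variable names are just for illustration.

```php
<?php
// Sample taken from the question; a real profile page would contain much more HTML.
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";
$pattern = '#<span[^>]*>(\d+)[^<]*</span>#i';

// preg_match returns 1 on a match; $m[0] holds the whole span, $m[1] the captured digits.
if (preg_match($pattern, $html, $m)) {
    $totalPosts = (int) $m[1];
    echo $totalPosts, "\n";
}
```

The cast to int is optional; preg_match always returns captured groups as strings.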
Apr 16 '10 #2

Yes, it is. Do you mind explaining how you set that up because I'd really love to learn. :D
Apr 17 '10 #3

Atli
Sure.
  1. In PHP's preg_* functions, every regular expression has to be enclosed in a pair of delimiter characters of your choosing. I chose # because it isn't used anywhere in the expression itself.
    (The / character is also a very popular choice, but here it would have to be escaped as \/ in the closing </span> tag.)
      ##
  2. Then I add the basics. In this case, you are looking for a <span>, so I start with that.
      #<span></span>#
  3. The <span> you are looking for includes attributes, none of which matter for what we are searching for. So I add a negated character class, [^>], which matches any character except the > that closes the opening span tag, and I add an asterisk (*) so it matches any number of those characters.
      #<span[^>]*></span>#
  4. We are looking for a number at the beginning of the span's value, so I add a class that matches only digits. This would normally be written as [0-9], but because it is used so often there is a shorthand for it: \d. We want one or more digits, so we apply the + quantifier to the class.
      #<span[^>]*>\d+</span>#
  5. Because we want to retrieve the number, we create a group around it by enclosing it in parentheses. Groups are useful in many ways, but here the purpose is to make PHP capture the group's contents and add them to the output array.
      #<span[^>]*>(\d+)</span>#
  6. Finally, we need to account for the rest of the span's value (the " (2.02 per day)" part), so we add another negated class, [^<], that matches anything except the < that starts the closing </span> tag. As before, we add an asterisk so it matches any number of those characters.
      #<span[^>]*>(\d+)[^<]*</span>#
  7. Because HTML tag names can be written in either upper or lower case, I added the i modifier after the closing # delimiter. This makes the match case-insensitive.
      #<span[^>]*>(\d+)[^<]*</span>#i
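To see why each step matters, you can run the intermediate patterns from the list against the sample span from the question. This small sketch also tests an upper-case copy of the span (made with strtoupper) to show what the i modifier adds. Step 1 is left out because the empty pattern ## matches any string.

```php
<?php
// The pattern after each step of the walkthrough, keyed by step number.
$steps = [
    2 => '#<span></span>#',
    3 => '#<span[^>]*></span>#',
    4 => '#<span[^>]*>\d+</span>#',
    5 => '#<span[^>]*>(\d+)</span>#',
    6 => '#<span[^>]*>(\d+)[^<]*</span>#',
    7 => '#<span[^>]*>(\d+)[^<]*</span>#i',
];

$lower = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";
$upper = strtoupper($lower); // same span with upper-case tag names

$results = [];
foreach ($steps as $step => $pattern) {
    // preg_match returns 1 for a match and 0 for no match.
    $results[$step] = [preg_match($pattern, $lower), preg_match($pattern, $upper)];
    printf("step %d: lower=%d upper=%d\n", $step, ...$results[$step]);
}
```

Only steps 6 and 7 match the sample, because the " (2.02 per day)" text sits between the number and </span>. Only step 7 also matches the upper-case copy.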

To get all the results inside PHP, you would execute the expression using the preg_match_all function.
    <?php
    $str = "... lots of HTML from your source ...";
    // Same pattern as in step 7, including the i (case-insensitive) modifier.
    $regexp = '#<span[^>]*>(\d+)[^<]*</span>#i';

    // $matches[0] holds the full matches, $matches[1] the captured numbers.
    if (preg_match_all($regexp, $str, $matches)) {
        for ($i = 0; $i < count($matches[0]); $i++) {
            echo "Match #$i = {$matches[1][$i]}\n<br>";
        }
    }
    else {
        echo "No matches.";
    }
    ?>
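One thing to keep in mind: on a full profile page, this pattern matches every <span> whose text starts with a number, not just the post count. If the title text on your forum is always rendered the same way (an assumption; check the actual page source), you could anchor the pattern on it. In this sketch, $page and its "Reputation" span are made up purely for illustration.

```php
<?php
// Hypothetical page fragment: two spans start with numbers, only one is the post count.
$page = "<span title='Reputation'>42 points</span>\n"
      . "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

$generic = '#<span[^>]*>(\d+)[^<]*</span>#i';
// Anchored on the title text; assumes the forum always renders it this way.
$postsOnly = "#<span title='[^']*% of total forum posts'>(\\d+)[^<]*</span>#i";

preg_match_all($generic, $page, $all);
preg_match($postsOnly, $page, $one);

print_r($all[1]);   // both captured numbers
echo $one[1], "\n"; // only the post count
```

The trade-off is that the anchored pattern breaks if the forum ever changes that title text, while the generic one keeps working but may pick up unrelated numbers.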
Check out regular-expressions.info if you are interested in learning more about regular expressions. They are tricky to learn, but well worth it.

Hope that made sense :)
Apr 17 '10 #4

Thank you VERY much. The best explanation of this stuff I have found on the internet.
Apr 17 '10 #5
