Bytes | Software Development & Data Engineering Community

Using preg_match_all to retrieve HTML data

Hey guys, I am trying to make a forum signature generator. I don't have access to the database, so I was trying to scrape the info from profile pages. This is the HTML I am trying to parse:

  <span title='0.31% of total forum posts'>133 (2.02 per day)</span>
I want the "133" (the total post count). This is what I have so far, though I suspect it is completely wrong:

  "<span title=\'(+.?)% of total forum posts\'>(+.?) ((+.?) per day)</span>"
Any help would be AWESOME!
Apr 16 '10 #1
5,058 Expert 4TB

I would try something more like this:
  '#<span[^>]*>(\d+)[^<]*</span>#i'
This should only get the "133" in your example (that's the only number you needed, right?).
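As a quick sanity check, here is a minimal sketch that runs that pattern against the sample line from the question using preg_match. The $html value is only the snippet quoted above, not a full profile page, and the variable names are just for illustration.

```php
<?php
// The sample markup from the question (a real profile page would contain much more HTML).
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// Capture group 1 holds the digits at the start of the span's text.
$pattern = '#<span[^>]*>(\d+)[^<]*</span>#i';

if (preg_match($pattern, $html, $m)) {
    // $m[0] is the whole match, $m[1] is the captured number as a string.
    $totalPosts = (int) $m[1];
    echo "Total posts: $totalPosts\n"; // prints "Total posts: 133"
} else {
    echo "No match.\n";
}
```

preg_match stops at the first match, which is enough when you only need one number from the page.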
Apr 16 '10 #2
Yes, it is. Do you mind explaining how you set that up because I'd really love to learn. :D
Apr 17 '10 #3
5,058 Expert 4TB
  1. All regular expressions used with PHP's preg_ functions must be enclosed in a pair of delimiter characters. I chose # because it doesn't appear anywhere in the expression itself.
    (The / character is the most popular choice, but here every / in </span> would then have to be escaped as \/.)
    ##
  2. Then I add the basics. In this case, you are looking for a <span>, so I start with that.
    #<span></span>#
  3. The <span> you are looking for includes attributes, none of which matter for what we are searching for. So I add a character class that matches everything except the > that closes the opening tag, [^>], and tell it to match any number of those characters by adding an asterisk (*) to it.
    #<span[^>]*></span>#
  4. We are looking for a number at the beginning of the span's value, so I add a class that matches only digits. This would normally be written [0-9], but because it is used so often there is a shorthand for it: \d. We want one or more digits, so we apply the + quantifier to the class.
    #<span[^>]*>\d+</span>#
  5. Because we want to be able to retrieve the number, we create a group around it by enclosing it in parentheses. Groups are useful in many ways, but here their purpose is to make PHP capture the group's contents and add them to the output array.
    #<span[^>]*>(\d+)</span>#
  6. Finally, we need to account for the rest of the span's value, so we add another character class that matches anything except the < that opens the closing </span> tag. As before, we add an asterisk so it matches any number of those characters.
    #<span[^>]*>(\d+)[^<]*</span>#
  7. Because HTML tag names are case-insensitive (<SPAN> and <span> mean the same thing), I added an i after the closing # delimiter. This modifier makes the whole expression case-insensitive.
    #<span[^>]*>(\d+)[^<]*</span>#i
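To see what each step contributes, here is a minimal sketch that runs every intermediate pattern from steps 2 to 7 against the sample line from the question. The intermediate patterns are only scaffolding: steps 2 to 5 do not match the sample at all, because nothing consumes the " (2.02 per day)" text until step 6 adds [^<]*. The upper-case <SPAN> string at the end is an invented example to show what the i modifier changes.

```php
<?php
// Sample markup from the question.
$html = "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// The intermediate patterns from steps 2-7, in order.
$steps = [
    '#<span></span>#',                  // step 2: bare tags
    '#<span[^>]*></span>#',             // step 3: allow attributes
    '#<span[^>]*>\d+</span>#',          // step 4: require leading digits
    '#<span[^>]*>(\d+)</span>#',        // step 5: capture the digits
    '#<span[^>]*>(\d+)[^<]*</span>#',   // step 6: allow text after the digits
    '#<span[^>]*>(\d+)[^<]*</span>#i',  // step 7: case-insensitive
];

$results = [];
foreach ($steps as $pattern) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $matched = preg_match($pattern, $html);
    $results[] = $matched;
    printf("%-36s %s\n", $pattern, $matched ? 'match' : 'no match');
}

// Invented upper-case markup: only the pattern with the i modifier accepts it.
$upper = '<SPAN>133 (2.02 per day)</SPAN>';
$withoutI = preg_match('#<span[^>]*>(\d+)[^<]*</span>#', $upper);  // 0
$withI    = preg_match('#<span[^>]*>(\d+)[^<]*</span>#i', $upper); // 1
echo "Without i: $withoutI, with i: $withI\n";
```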

To get all the results inside PHP, you would execute the expression using the preg_match_all function.
  <?php
  // Replace this with the HTML of the profile page you are scraping.
  $str = "... lots of HTML from your source ...";
  // The pattern built above, including the case-insensitive i modifier.
  $regexp = '#<span[^>]*>(\d+)[^<]*</span>#i';

  // preg_match_all returns the number of matches; $matches[1] holds the captured digits.
  if (preg_match_all($regexp, $str, $matches)) {
      foreach ($matches[1] as $i => $number) {
          echo "Match #$i = $number\n<br>";
      }
  }
  else {
      echo "No matches.";
  }
  ?>
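One caveat with the generic pattern: it matches any <span> whose text starts with a number, so a real profile page may yield more numbers than you want. Below is a minimal sketch of a stricter alternative that anchors on the title attribute and also captures the per-day rate. It assumes the markup looks exactly like the snippet in the question; the extra "rep" span is invented purely for illustration, and a different page layout would need a different pattern.

```php
<?php
// Hypothetical page fragment: the post-count span plus an unrelated span that
// also starts with a number (invented here for illustration).
$html = "<span class='rep'>5058 points</span>\n"
      . "<span title='0.31% of total forum posts'>133 (2.02 per day)</span>";

// Generic pattern from the answer: matches any span whose text starts with digits.
$generic = '#<span[^>]*>(\d+)[^<]*</span>#i';

// Stricter pattern: requires the "of total forum posts" title and captures
// the total (group 1) and the per-day rate (group 2).
$strict = '#<span\s+title=[\'"][\d.]+% of total forum posts[\'"]\s*>(\d+)\s*\(([\d.]+) per day\)</span>#i';

$genericCount = preg_match_all($generic, $html, $all);
echo "Generic pattern found $genericCount numbers: " . implode(', ', $all[1]) . "\n";

if (preg_match($strict, $html, $m)) {
    $posts  = (int) $m[1];
    $perDay = (float) $m[2];
    echo "Total posts: $posts, per day: $perDay\n";
}
```

The trade-off is robustness versus precision: the generic pattern survives small markup changes, while the stricter one ignores unrelated spans but breaks if the forum changes the title text.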
Check out regular-expressions.info if you are interested in learning more about regular expressions. They're tricky to learn, but well worth it.

Hope that made sense :)
Apr 17 '10 #4
Thank you VERY much. The best explanation of this stuff I have found on the internet.
Apr 17 '10 #5


Similar topics

by: Han | last post by:
Determining the pattern below has got my stumped. I have a page of HTML and need to find all occurrences of the following pattern: score=9999999999&amp; The number shown can be 5-10 characters...
by: Han | last post by:
I'm wondering if someone can explain why the following works with preg_match_all, but not preg_match: $html = "product=3456789&amp;" preg_match_all ("|product=(\d{5,10})&amp;|i", $html, $out); $out...
by: Han | last post by:
I know this is possible (because preg can do almost anything!), but can't get a handle on the syntax. I have an HTML string: <font size="3"><a...
by: marco | last post by:
Hello, I'm putting together a php webpage which is parsing my (.html) bookmarks list. I want to give them a new lay-out with php and CSS. My question is: How can I make a function that counts...
by: David Rasmussen | last post by:
Some sites seem to be session driven in the sense that if I visit the homepage and do a few clicks I can navigate anywhere I want, but if I paste the current location into a new browser window...
by: Grasshopper | last post by:
Hi, I am automating Access reports to PDF using PDF Writer 6.0. I've created a DTS package to run the reports and schedule a job to run this DTS package. If I PC Anywhere into the server on...
by: bob.herbst | last post by:
I have been trying to use HTML_Table from PEAR to write a PHP script that will access a database and retrieve my data into an HTML table that can be sorted by column. Currently I am using the...
by: crescent_au | last post by:
Hi all, I've been trying unsuccessfully to get the text from html page. Html tag that I'm interested in looks like this: <a class=link...
by: PaulB | last post by:
Hello, as a newbie I'm requesting some help in understanding the regular expression below preg_match_all("|<tr(.*)</tr>|U",$table,$rows); Would anybody please just run through...
by: lifeisgreat20009 | last post by:
I am a newbie to Struts and JSP...I have been working on the code below for 5 hours now..I googled a lot but couldn't get much help so finally I am here.. Hoping of getting my problem solved. Please...
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
