
Regex Script Help

I think I posted this some time ago, but I can't find the original thread to rehash. I have some questions about extracting data from a particular webpage. What makes this unusual, and why I need to ask, is that there is a lot of whitespace inside the tags, and the data doesn't sit inside its own closed tag (e.g. <span>data</span>). It looks like this:

    <td width="55%"><div class="value">
        &pound;6.99 <font size="3"> </font></div>

    </td>
The "&pound;6.99" is what I want to extract and use. For comparison, this code works perfectly on a different website:

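    $url = 'http://www.cheapsmells.com/viewProduct.php?id=3462';
    $html = file_get_contents($url);

    // Grab the first decimal number inside <div class='productOurPrice'>.
    // Without the s modifier, . does not match newlines, so this pattern only
    // works when the whole div sits on a single line.
    preg_match('/<div class=\'productOurPrice\'?>(.+?)(\d+\.\d+)(.+?)?<\/div>/', $html, $match);
    $out = $match[2];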
The page with the whitespace-heavy markup shown first is http://www.directcosmetics.com/results/products.cfm?ctype=ME&range=Hummer&code=34744. How can I adjust the regex above to extract just "6.99" and nothing more? Is that even possible, given that the price isn't enclosed in its own pair of tags?

Any help you can shed my way would be greatly appreciated.

Cheers ;D
Feb 6 '08 #1
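A minimal sketch of one way to adjust the regex, assuming the markup is exactly as in the first snippet. Two things differ from the working pattern: the class attribute uses double quotes and a different name ("value"), and the price sits on its own line. The pattern below uses `\s*` to absorb the newline and indentation, and the `s` modifier so `.` can cross line breaks up to the closing `</div>`. The function name `extract_value_price` and the inlined sample string are illustrative, not from the original post.

```php
<?php
// Hypothetical helper: return the first decimal number inside <div class="value">,
// or null if the pattern does not match.
function extract_value_price(string $html): ?string
{
    // \s*            skips the newline and indentation after the opening div
    // (?:&pound;)?   optionally consumes the pound entity before the number
    // (\d+\.\d+)     captures the price itself, e.g. 6.99
    // .*?<\/div>     with the s modifier, lazily spans any lines up to </div>
    $pattern = '/<div class="value">\s*(?:&pound;)?(\d+\.\d+).*?<\/div>/s';
    if (preg_match($pattern, $html, $match)) {
        return $match[1];
    }
    return null;
}

// Sample markup copied from the page (indentation shortened).
$html = '<td width="55%"><div class="value">
    &pound;6.99 <font size="3"> </font></div>

</td>';

echo extract_value_price($html), "\n"; // prints 6.99
```

On the live page, `$html` would come from `file_get_contents($url)` as in the original code; since the site's markup may change, treat the pattern as a starting point rather than a fixed answer.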


pbmods
Ew. That's really unfortunate. They've got a doctype, they're using jQuery... it's like they're really trying... and then they go and dump a useless (and obsolete) font tag and an unexpected </div>.

Probably the new guy who transferred from a .NET project. You don't have to write any code for .NET, you know. It's all drag-and-drop (:

Until the abstraction leaks, of course (http://www.joelonsoftware.com/articl...tractions.html).

Anyway, on to your problem.

It looks like they've changed their markup. Here's the new HTML:
    <tr>
        <td width="45%" class="price">OUR<br />PRICE</td>
        <td width="55%"><div class="value">
            &pound;6.99 <font size="3"> </font></div>

        </td>
    </tr>
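Because the markup keeps shifting, a sketch that swaps the regex for PHP's DOM extension (`DOMDocument` plus `DOMXPath`) may hold up better against whitespace and stray tags. This assumes the price cell still directly follows the `<td class="price">` label cell in the same `<tr>`, as in the HTML above; the function name `extract_our_price` and the XPath query are illustrative assumptions, not something from the thread.

```php
<?php
// Hypothetical helper: locate the price through the DOM instead of a regex.
function extract_our_price(string $html): ?string
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The div.value inside the first <td> that follows the "OUR PRICE" label cell.
    $nodes = $xpath->query('//td[@class="price"]/following-sibling::td[1]//div[@class="value"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent decodes &pound; and includes the surrounding whitespace,
    // so pick out just the decimal number.
    if (preg_match('/\d+\.\d+/', $nodes->item(0)->textContent, $m)) {
        return $m[0];
    }
    return null;
}

// Sample markup based on the snippet above, wrapped in a table so it parses as a row.
$html = '<table><tr>
    <td width="45%" class="price">OUR<br />PRICE</td>
    <td width="55%"><div class="value">
        &pound;6.99 <font size="3"> </font></div>
    </td>
</tr></table>';

echo extract_our_price($html), "\n"; // prints 6.99
```

The trade-off is that the XPath query depends on the table structure rather than on exact whitespace, so it survives reindentation and extra tags like the `<font>` element, but would still need updating if the site renames the `price` or `value` classes.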
Jun 27 '08 #2
