Parsing pricerunner.com results via regular expression.

ojsimon
59
Hi, I have been trying to write a regular expression in PHP that will get the price from any product page at pricerunner.com. If you could suggest a regular expression, I would be very grateful.
Thanks
May 18 '07 #1
pbmods
5,821 Expert 4TB
Changed forum title to better match contents.

Heya, ojsimon. Welcome to TSDN!

First thing to do is to write the script to connect to PriceRunner and grab the results page. Then all you need to do is examine the results, locate the element that contains the price and program a [regex] search for it.

Once you get to that point, let us know if you have any further problems.
May 18 '07 #2
ojsimon
59
I have been trying to write a regular expression to do this. I have done the other things; this is where my problem lies.
May 19 '07 #3
pbmods
5,821 Expert 4TB
I have been trying to write a regular expression to do this. I have done the other things; this is where my problem lies.
Post a snippet of the pricerunner.com stream that your script needs to parse so we can see how the regular expression needs to be structured.
May 19 '07 #4
ojsimon
59
Compare prices : Nokia N93
Talk time: 5, standby time: 240, Camera: Yes, Integrated, 180 gram, WAP, GPRS, MP3 More product info
Price range:
£405.38 - £409.99

From here I need the price range.
Thanks
May 20 '07 #5
pbmods
5,821 Expert 4TB
Compare prices : Nokia N93
Talk time: 5, standby time: 240, Camera: Yes, Integrated, 180 gram, WAP, GPRS, MP3 More product info
Price range:
£405.38 - £409.99
Is pricerunner sending your script plain text like that, or are you receiving HTML or an RSS feed?
May 20 '07 #6
ojsimon
59
HTML, and I want it to work for all PriceRunner product pages.
Thanks
May 20 '07 #7
pbmods
5,821 Expert 4TB
HTML, and I want it to work for all PriceRunner product pages.
Thanks
All you have to do is find the HTML tags that contain the data you need, then create capturing groups (backreferences) to capture the values you need.

So for example, if your data were located here:
    <div>Price Range:</div>£405.38 - £409.99

you could use this pattern:

    /(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/

Run that through preg_match, and your match array will be:
    Array
    (
        [0] => £405.38 - £409.99
        [1] => 405.38
        [2] => 409.99
    )

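For anyone unsure how the pattern goes into preg_match, here is a minimal sketch. It assumes the page really contains the `<div>Price Range:</div>` markup from the example (the live PriceRunner markup may differ) and that both the script and the page are UTF-8, which is why the `u` modifier is added. The variable names `$low` and `$high` are just illustrative.

```php
<?php
// Sample input copied from the example above; a real page would come from file_get_contents().
$html = '<div>Price Range:</div>£405.38 - £409.99';

// The lookbehind anchors the match right after the label;
// the two parenthesized groups capture the low and high prices.
$pattern = '/(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/u';

$low = null;
$high = null;

// preg_match() returns 1 when the pattern matches and fills $match.
if (preg_match($pattern, $html, $match) === 1) {
    $low  = (float) $match[1]; // first capture group
    $high = (float) $match[2]; // second capture group
    echo "Low: {$match[1]}, high: {$match[2]}\n";
} else {
    echo "No price range found\n";
}
```

Checking the return value before reading `$match` avoids undefined-index notices when the markup changes and the pattern stops matching.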
May 20 '07 #8
Hi There

I'm from PriceRunner.

You can just access our API and get everything back in XML format. That would be much easier for you and we would not have the load on our server :)

Send me a mail and I will ensure that you get going.

Best
Martin Andersen
GM, PriceRunner.com
May 24 '07 #9
ojsimon
59
Sorry for such a late reply, but how do I use

    /(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/

in order to get the price? I don't understand how you put this into preg_match and preg_replace. What I am doing at the moment is a simple PHP get-source command; is that OK?
Thanks
Jul 4 '07 #10
pbmods
5,821 Expert 4TB
Heya, ojsimon.

Sorry for such a late reply, but how do I use ... in order to get the price? I don't understand how you put this into preg_match and preg_replace. What I am doing at the moment is a simple PHP get-source command; is that OK?
The regex uses a lookbehind to require (but not include in the match) the block that precedes the data you want; the two parenthesized groups then capture the prices.

But as PriceRunnerUS mentioned, there is an API for retrieving the info you're looking for.
Jul 4 '07 #11
ojsimon
59
I cannot find an API for the UK version of PriceRunner, sorry. Could you please show me how to put the pattern into preg_match and preg_replace? I still do not understand this despite quite a lot of research.
Thanks
Jul 5 '07 #12
pbmods
5,821 Expert 4TB
Heya, ojsimon.

It looks like to get access to their API, you must first become a partner:
http://www.pricerunner.com/partner/partner.html

Not sure if that means that you have to give them money. I sent a PM to PriceRunnerUS and asked him to provide more details. We'll see what happens.

The search results page looks a little tricky to parse, but it looks like every price is listed like this:

    <span class="listprice">£184.99</span>

So you need to grab the '184.99' inside of that SPAN. To do that, you can call preg_match_all() with a lookbehind:

    // urlencode() keeps spaces and special characters in the search term from breaking the URL
    $html = file_get_contents('http://pricerunner.co.uk/search?q=' . urlencode($searchOrWhateverYouCalledIt));
    preg_match_all('/(?<=<span class="listprice">£)\d+\.\d{2}/', $html, $matches);

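To see what preg_match_all() leaves in `$matches`, here is a small sketch that runs the same pattern against a hard-coded string instead of the live site. The sample HTML is a made-up stand-in built around the `listprice` span shown above, and the `u` modifier assumes UTF-8 input; the real page may be structured or encoded differently.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$html = '<ul>'
      . '<li><span class="listprice">£184.99</span></li>'
      . '<li><span class="listprice">£179.00</span></li>'
      . '</ul>';

// preg_match_all() returns the number of matches (or false on error).
$count = preg_match_all('/(?<=<span class="listprice">£)\d+\.\d{2}/u', $html, $matches);

// The pattern has no capture groups, so $matches[0] holds every full match.
$prices = array_map('floatval', $matches[0]);

foreach ($prices as $price) {
    printf("%.2f\n", $price);
}
```

Because the lookbehind is not part of the match, each entry in `$matches[0]` is just the number, without the span tag or the pound sign.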
Jul 5 '07 #13
ojsimon
59
Thanks for all your help. I tried the code you suggested and it returned a blank page. I tried to echo $matches and $html but neither worked. As I am an absolute beginner with PHP, I have no idea what to do. Could you please explain? Thanks again for all your help.
Olie
Jul 5 '07 #14
ojsimon
59
Sorry, how do I use preg_match and preg_replace? Could you tell me any sites where I can learn how to use them to fulfill my previous request?
Thanks
Jul 11 '07 #15
Sorry, how do I use preg_match and preg_replace? Could you tell me any sites where I can learn how to use them to fulfill my previous request?
Thanks

Here's a simple example on how to use those functions:

$pattern = "/[^a-zA-Z0-9]/";
$replacement = "_";
$replaced_name = preg_replace($pattern, $replacement, $original_name);

This example shows the pattern you are searching for (in this case, any character that is not a letter or a digit) and replaces each match with an underscore. It takes an original name variable (e.g. "Tac k#y") and, after it goes through preg_replace, returns "Tac_k_y".
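A runnable sketch of the same idea, with the "Tac k#y" input filled in. The second half is an extra illustration that is not from the thread: it shows how a replacement string can refer back to a capture group with `$1`, using the price range quoted earlier and assuming UTF-8 text.

```php
<?php
// Replace every character that is not a letter or digit with an underscore.
$original_name = 'Tac k#y';
$replaced_name = preg_replace('/[^a-zA-Z0-9]/', '_', $original_name);
echo $replaced_name, "\n";

// Replacement strings can reuse captured text: $1 is the first group.
// Here each "£<amount>" becomes "<amount> GBP".
$range = '£405.38 - £409.99';
$plain = preg_replace('/£(\d+\.\d{2})/u', '$1 GBP', $range);
echo $plain, "\n";
```

Single quotes around the replacement matter here: inside double quotes PHP would try to interpolate `$1` itself before preg_replace ever sees it.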

Hope that helps
Jul 11 '07 #16
pbmods
5,821 Expert 4TB
Heya, Olie.

A blank page means that your script is probably generating errors.

Check out this article.
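The linked article is not reproduced here, so as a rough sketch of one common way to make hidden errors visible (not necessarily what the article recommends), you can put these two lines at the very top of the script while developing:

```php
<?php
// Report every error, warning, and notice...
error_reporting(E_ALL);

// ...and print them in the page instead of only logging them.
// Intended for development; turn this off on a public site.
ini_set('display_errors', '1');
```

Note that a syntax error in the same file stops the script before these lines run, so that kind of error still has to be found in the server's error log or by enabling display_errors in php.ini.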
Jul 11 '07 #17
ojsimon
59
Heya, Olie.

A blank page means that your script is probably generating errors.

Check out this article.

Sorry for the very late reply,
but with echo $matches the script returns 'Array', and with echo $html it returns the whole page.
How can I fix this?

[PHP]<?php
$html = file_get_contents('http://pricerunner.co.uk/search?q=ipod');
preg_match_all('/(?<=<span class="listprice">£)\d+\.\d{2}/', $html, $matches);

echo $html;
?>[/PHP]
Thanks
Jun 27 '08 #18
pbmods
5,821 Expert 4TB
Try
    print_r($matches);
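The reason echo prints only 'Array' is that `$matches` is an array of arrays, and echo does not expand arrays. A short sketch of two ways to look inside it follows; the `$matches` value here is hard-coded and hypothetical, shaped like what preg_match_all() fills in for a pattern without capture groups.

```php
<?php
// Hypothetical result: index 0 holds the list of full matches.
$matches = [['184.99', '179.00']];

// print_r() with true as the second argument returns the dump as a string
// instead of printing it, which is handy for logging.
$dump = print_r($matches, true);
echo $dump;

// To print just the prices on one line, join the entries of $matches[0].
$line = implode(', ', $matches[0]);
echo $line, "\n";
```

In a browser, wrapping the print_r() output in `<pre>` tags keeps the indentation readable.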
Jun 27 '08 #19
