Bytes | Software Development & Data Engineering Community

Match regular expressions in a web page, not the entire source

In order to check availability, I want to visit many different pages
from a database and try to match the regular expressions 'out of
stock', 'unavailable', 'sold out', etc. If one is found, the product
will be flagged as unavailable.

I tried using fopen() and preg_match(), but the problem is that if the
phrase appears in JavaScript or in comments, the item gets wrongly
flagged. Is there a way of taking what appears on the user's screen,
after the page is parsed, and looking at it as a string?

I feel I might be missing a concept here. Can anyone help?
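For illustration, here is a minimal PHP sketch of the naive check described above, under a few assumptions: the page source is already in a string (in practice it might come from fopen()/fread() or file_get_contents()), and both the sample HTML and the name looksUnavailableRaw() are made up. It shows why matching the raw source misfires: the phrase inside the script block matches even though the visible text says the item is in stock.

```php
<?php
// Naive availability check: search the raw HTML source for the stock phrases.
// The sample page below is made up for illustration.
function looksUnavailableRaw(string $html): bool
{
    // Case-insensitive match anywhere in the source, including inside
    // <script> blocks and <!-- comments -->, which is the problem.
    return preg_match('/out of stock|unavailable|sold out/i', $html) === 1;
}

$html = <<<'HTML'
<html><body>
<script>
if (productPrice[index] == 'SOLD OUT') { alert("Cannot be purchased"); }
</script>
<p>In stock: ships within 2 days.</p>
</body></html>
HTML;

var_dump(looksUnavailableRaw($html)); // bool(true): a false positive caused by the script
```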

Jan 17 '06 #1
d
The answer is right in front of you ;) Remove the tags from the source
code (first using regular expressions to remove script chunks, then using
strip_tags to clean out the rest), and you have your document ready to be
examined.
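A minimal sketch of that recipe in PHP, assuming the page is already in a string. visibleText() and looksUnavailable() are hypothetical names, and the phrase list is the one from the original question. strip_tags() on its own keeps the text that sits between script tags, which is why the script (and style) blocks are removed with a regular expression first.

```php
<?php
// Sketch of "strip the tags, then match": remove script/style blocks with a
// regex first, then let strip_tags() remove the remaining markup.
function visibleText(string $html): string
{
    // <script>/<style> blocks, including their contents. /i ignores tag case,
    // /s lets . cross newlines, .*? stops at the first matching closing tag.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // HTML comments (strip_tags() drops these too; removing them here is explicit).
    $html = preg_replace('#<!--.*?-->#s', ' ', $html);
    // Remaining tags go; the text between them stays.
    $text = strip_tags($html);
    // Decode entities such as &nbsp; and &amp; into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Treat non-breaking spaces as ordinary spaces, then collapse whitespace runs.
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

function looksUnavailable(string $html): bool
{
    // Same phrases as in the question, matched as whole words, case-insensitively.
    return preg_match('/\b(out of stock|unavailable|sold out)\b/i', visibleText($html)) === 1;
}

$inStock = '<script>if (p == "SOLD OUT") alert("x");</script><!-- sold out --><p>In stock</p>';
$soldOut = '<p><span>SOLD OUT&nbsp;</span></p>';

var_dump(looksUnavailable($inStock)); // bool(false)
var_dump(looksUnavailable($soldOut)); // bool(true)
```

The script block is replaced with a space rather than an empty string so that words on either side of it do not run together.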

dave
Jan 17 '06 #2
If I understand correctly, this won't tell me the result of scripts
executed on the page, such as this:

if (index != -1) {
    if (productPrice[index] == 'SOLD OUT') {
        alert("This Product cannot be purchased at this time.");
        result = false;
    }
}

This code is present on all the product pages, whether the item is in
stock or not, and I need it to run to decide whether or not the words
'SOLD OUT' are going to appear. I am wondering whether eval() might
provide the answer. I am experimenting with it, with little success so
far.

Jan 17 '06 #3
d
Eval is NEVER the answer. Ugh. Seriously, if you think eval is the answer,
you're doing something horribly wrong.

Nothing will tell you the output of the scripts on the page except the
browser they run in (which isn't possible in your case, as PHP is not a
JavaScript-enabled web browser). If you want to see the output, follow the
input and make your own deductions from that :) Could you show me the
complete block of JavaScript? Maybe I can help you.

"Winners don't do eval" :-P

dave
Jan 17 '06 #4
Here is an example of one of the pages for a sold-out item:

http://www.subsidesports.com/uk/stor...ist.jsp?id=402,

and here is an in-stock one:

http://www.subsidesports.com/uk/stor...ist.jsp?id=402,

Looking through, it seems that the JavaScript array element
productPrice[0] contains the information I need, so I suppose the
sensible thing would be to look at this alone to decide whether an item
is in stock. I think it is null if the product is available. So my
question now becomes:

how can I test the value of this variable using PHP on the retrieved
document? Surely not by looking for the regex productPrice[0] = 'SOLD OUT'?

thanks
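One way to test that value without running any JavaScript is to read the literal assigned to it in the source. A sketch, assuming the page contains an assignment shaped like productPrice[0] = 'SOLD OUT'; (the real pages' markup isn't shown in the thread, and productPriceZero() is a hypothetical helper). If the value is computed at run time rather than written as a quoted literal, this returns null and tells you nothing.

```php
<?php
// Hypothetical helper: read the literal assigned to productPrice[0] in the
// page's JavaScript, without executing any of it. Assumes the source holds an
// assignment shaped like  productPrice[0] = 'SOLD OUT';  (either quote style).
function productPriceZero(string $html): ?string
{
    // The first group captures the opening quote; \1 requires the same closing quote.
    $pattern = '/productPrice\[0\]\s*=\s*([\'"])(.*?)\1/';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2]; // the text between the quotes, possibly ""
    }
    return null; // no quoted assignment found (e.g. "= null;" or no line at all)
}

$page = "<script>\nvar productPrice = new Array();\nproductPrice[0] = 'SOLD OUT';\n</script>";
$value = productPriceZero($page);
$isSoldOut = $value !== null && strcasecmp(trim($value), 'SOLD OUT') === 0;

var_dump($value);     // string(8) "SOLD OUT"
var_dump($isSoldOut); // bool(true)
```

Because the pattern requires a single = followed by a quote, a comparison such as productPrice[0] == 'SOLD OUT' elsewhere in the script is not mistaken for the assignment.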

Jan 17 '06 #5
d
Why don't you check whether this text is in the document:

<span style="font-size:20px; color:#000000;" >SOLD OUT&nbsp;</span>

Surely that'll tell you whether it's sold out or not, regardless of
JavaScript. You don't even have to strip any tags before checking for it... :)
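A sketch of that check in PHP. strpos() does a plain substring search, so no regular expression is needed; isSoldOut() is a hypothetical name, and the marker is the span quoted in the reply. An exact match stops working if the site changes even one space in that tag, so a looser regex variant, isSoldOutLoose() (an assumption, not something from the thread), is included for comparison.

```php
<?php
// The marker from the reply, copied verbatim from the page source.
const SOLD_OUT_MARKER = '<span style="font-size:20px; color:#000000;" >SOLD OUT&nbsp;</span>';

function isSoldOut(string $html): bool
{
    // Plain substring search; strpos() returns false when the marker is absent,
    // so compare with !== to treat a match at offset 0 as found.
    return strpos($html, SOLD_OUT_MARKER) !== false;
}

// Looser variant (an assumption, not from the thread): "SOLD OUT" as the whole
// text of any element, ignoring case, surrounding spaces and a trailing &nbsp;.
function isSoldOutLoose(string $html): bool
{
    return preg_match('/>\s*SOLD OUT(?:&nbsp;|\s)*</i', $html) === 1;
}

$page = '<td><span style="font-size:20px; color:#000000;" >SOLD OUT&nbsp;</span></td>';
var_dump(isSoldOut($page));      // bool(true)
var_dump(isSoldOutLoose($page)); // bool(true)
```

Neither version is fooled by the 'SOLD OUT' string in the script, since there it sits between quotes rather than between tags.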
Jan 17 '06 #6
Yes, that's it. Thanks for your help.

I am starting to think that there is no way of taking a URL and turning
it into a string that represents what the browser would output to the
screen (without writing a browser itself). Or else there is an area of
PHP functions that deals with this that I am unaware of.

Thanks again.

Ruari

Jan 17 '06 #7
d
Exactly, that's what a browser is for :)

Any time!

Jan 17 '06 #8


Similar topics

by: Venkat
Hi, I am using match function of string to find if a character is there in a string. The function Match is working fine with all the other characters except when the searching character is "+". Here is the piece of code i am using var line1 = "Hell+O";

by: Tom Deco
Hi, I'm trying to use a regular expression to match a string containing a # (basically i'm looking for #include ...) I don't seem to manage to write a regular expression that matches this. My (probably to naive) approach is: p = re.compile(r'\b#include\b) I also tried p = re.compile(r'\b\#include\b) in a futile attempt to use a backslash as escape character before the #

by: Christian Staffe
Hi, I would like to check for a partial match between an input string and a regular expression using the Regex class in .NET. By partial match, I mean that the input string could not yet be complete but I want to know if a match is possible so far. For instance I want to design a text box to enter a date and validate the correctness of the date as the user types character. If the user enters 1953/12/23 it will match my regex of course...

by: Steve Kirsch
I need a simple function that can match the number of beginning and ending parenthesis in an expression. Here's a sample expression: ( ( "john" ) and ( "jane" ) and ( "joe" ) ) Does .NET have something built-in that can accomplish this, or do I have to write my own parser? I don't want to reinvent the wheel if possible.

by: likong
Hi, Any idea about how to write a regular expression that matches a substring xxx as long as the string does NOT contain substring yyy? Thanks. Kong

by: hendedav
Gang, I have been working on this for a few hours and am frustrated beyond all extent. I have tried to research this on the web as well with no success. I am trying to match certain contents within a wrapper div. So for example if the inside of the wrapper div was the following: <div id="wrapper"> <a href="#">a great link that contain text and symbols</a>

by: konrad Krupa
I'm not expert in Pattern Matching and it would take me a while to come up with the syntax for what I'm trying to do. I hope there are some experts that can help me. I'm trying to match /d/d/d/s/d/d in any text. There could be spaces in front or after the pattern (the nnn nn could be without spaces also) but it shouldn't pick it up in case like this 1234 56768

by: cmk128
Hi PHP's regular expression look like doesn't support .*? syntax. So i cannot match the shortest match. For exmaple: $str="a1b a3b"; $str1=ereg_replace("a.*b", "peter", $str1); will produce "peter", but i want "peter peter", so how to? thanks from Peter (cmk128@hotmail.com)

by: mikko.n
I have recently been experimenting with GNU C library regular expression functions and noticed a problem with pattern matching. It seems to recognize only the first match but ignoring the rest of them. An example: mikko.c: ----- #include <stdio.h> #include <regex.h>
