Regexp problem with `('

I have the following text

<title>Goods Item 146 (174459989) - OurWebSite</title>

from which I need to extract
`Goods Item 146 '

Can anyone help with regexp?
Thank you for help
L.

Mar 22 '07 #1
Johny wrote:
> I have the following text
>
> <title>Goods Item 146 (174459989) - OurWebSite</title>
>
> from which I need to extract
> `Goods Item 146 '
>
> Can anyone help with regexp?

Sure: the documentation is here:
http://docs.python.org/lib/module-re.html

And there's a nice tutorial here:
http://www.amk.ca/python/howto/regex/

Read all this, try to solve your problem, and come back with what you've
done so far if you need more help.

> Thank you for help

You're welcome.
Mar 22 '07 #2
On Thu, Mar 22, 2007 at 01:26:22AM -0700, Johny wrote:
> I have the following text
>
> <title>Goods Item 146 (174459989) - OurWebSite</title>
>
> from which I need to extract
> `Goods Item 146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

(Goods\s+Item\s+146\s+)

--
Zeng Nan

MY BLOG: http://zengnan.blogspot.com
Public Key: http://pgp.mit.edu/ | www.keyserver.net

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
In Lexington, Kentucky, it's illegal to carry an ice cream cone in your
pocket.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~

Mar 22 '07 #3
Zeng Nan wrote:
> On Thu, Mar 22, 2007 at 01:26:22AM -0700, Johny wrote:
>> I have the following text
>>
>> <title>Goods Item 146 (174459989) - OurWebSite</title>
>>
>> from which I need to extract
>> `Goods Item 146 '
>>
>> Can anyone help with regexp?
>> Thank you for help
>> L.
>
> (Goods\s+Item\s+146\s+)

[snigger]

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Mar 22 '07 #4
On Mar 22, 3:26 am, "Johny" <pyt...@hope.cz> wrote:
> I have the following text
>
> <title>Goods Item 146 (174459989) - OurWebSite</title>
>
> from which I need to extract
> `Goods Item 146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

Here's the immediate answer to your question.
import re
src = "<title>Goods Item 146 (174459989) - OurWebSite</title>"
pattern = r"<title>(.*)\("
re.search(pattern,src).groups()[0]
I post it this way so that you can relate the re to your specific
question, and then maybe apply this to whatever else you are scraping
from this web page.
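
As a hedged variation on the pattern above: the greedy `(.*)` runs to the last "(" in the string and keeps the trailing space, so a title with two parenthesised groups would capture too much. The sketch below uses a lazy group and trims whitespace inside the pattern. The names `TITLE_ITEM` and `extract_item` are made up for illustration and are not part of the original answer.

```python
import re

# Lazy ".*?" stops at the first "(" instead of the last one; the "\s*" on
# either side keeps surrounding whitespace out of the captured group.
TITLE_ITEM = re.compile(r"<title>\s*(.*?)\s*\(")


def extract_item(src):
    """Return the text between <title> and the first "(", or None."""
    match = TITLE_ITEM.search(src)
    return match.group(1) if match else None


src = "<title>Goods Item 146 (174459989) - OurWebSite</title>"
print(extract_item(src))  # Goods Item 146
```

Returning None on a failed match avoids the AttributeError that `re.search(...).groups()` raises when the pattern does not match.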

Please don't follow up with a post asking how to extract "45","Rubber
chicken" from "<tr><td>45</td><td>Rubber chicken</td></tr>". At this
point, you should try a little experimentation on your own.

-- Paul

Mar 22 '07 #5
Johny wrote:
> I have the following text
>
> <title>Goods Item 146 (174459989) - OurWebSite</title>
>
> from which I need to extract
> `Goods Item 146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

In general, parsing HTML with regular expressions is a bad idea.
Usually, you use something like BeautifulSoup to parse the HTML,
extract the desired field, like the contents of "<title>", then
work on that.

If you try to do this line by line with regular expressions,
it will fail when the line breaks aren't where you expect. If
you try to do a whole document with regular expressions, other
material such as content in comments can be misrecognized.

Try something like this:

import re
import BeautifulSoup

# Regular expression to extract group before "(NNNNN)"
kreextractitem = re.compile(r'^(.*)\(\d+\)')
pagetree = BeautifulSoup.BeautifulSoup(stringcontaininghtml)
titleitem = pagetree.find({'title':True, 'TITLE':True})
if titleitem :
    # Join all text nodes found under the title tag
    titletext = " ".join(titleitem.findAll(text=True, recursive=True))
    # Text of TITLE item is now in "titletext" as a string.
    groups = kreextractitem.search(titletext)
    if groups :
        goodsitem = groups.group(1).strip()
        # "goodsitem" now contains everything before "(NNNN)"

This approach will work no matter where the line breaks are in the original
HTML.
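
For readers without BeautifulSoup installed, the same parse-first, match-second idea can be sketched with Python 3's standard `html.parser` module. This swaps in a different parser from the one used above and is only an illustration: `TitleCollector`, `ITEM_BEFORE_ID`, and `goods_item` are hypothetical names, and the sketch assumes the page has a single title element.

```python
import re
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text inside <title>; tag names arrive lowercased."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text inside HTML comments never reaches this method.
        if self.in_title:
            self.parts.append(data)


# Everything before the first "(NNNN)" group of digits.
ITEM_BEFORE_ID = re.compile(r"^(.*?)\(\d+\)")


def goods_item(html):
    """Return the title text before "(NNNN)", or None if absent."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    # Collapse line breaks and runs of spaces into single spaces.
    title = " ".join("".join(parser.parts).split())
    match = ITEM_BEFORE_ID.search(title)
    return match.group(1).strip() if match else None


page = """<html><head>
<!-- <title>Old Item 1 (111) - OurWebSite</title> -->
<TITLE>Goods Item
146 (174459989) - OurWebSite</TITLE>
</head></html>"""
print(goods_item(page))
```

The sample page puts a line break inside the title and a stale title inside a comment, the two cases described above where a line-by-line or whole-document regular expression tends to go wrong.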

John Nagle
Mar 22 '07 #6
