
How can I exclude a word by using re?

In re, the "^" character can exclude a single character, but I want
to exclude a whole word now. For example, I have the string "hi, how are
you. hello" and I want to extract everything before the word "hello".
I can't use ".*[^hello]" because "^" only excludes a single character: "h",
"e", "l" or "o". Will somebody tell me how to do it? Thanks.
Aug 14 '05 #1
re.findall('(.*)hello|(.*)', 'hi, how are you. hello')
re.findall('(.*)hello|(.*)', 'hi, how are you. ello')
Take a look at the outputs of these.
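
For reference, this is roughly what the two calls return. It is a minimal sketch assuming Python 3.7 or later; the variable names are only for illustration. Because the pattern has two groups, findall returns one tuple per match, and the empty match at the very end of the string shows up as ('', '').

```python
import re

pattern = r'(.*)hello|(.*)'

# First alternative matches: group 1 holds the text before "hello".
with_hello = re.findall(pattern, 'hi, how are you. hello')

# No "hello" present: the second alternative captures the whole string.
without_hello = re.findall(pattern, 'hi, how are you. ello')

print(with_hello)     # [('hi, how are you. ', ''), ('', '')]
print(without_hello)  # [('', 'hi, how are you. ello'), ('', '')]
```

So the caller has to check which group is non-empty to tell the two cases apart.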

Aug 14 '05 #2
could ildg wrote:
In re, the "^" character can exclude a single character, but I want
to exclude a whole word now. For example, I have the string "hi, how are
you. hello" and I want to extract everything before the word "hello".
I can't use ".*[^hello]" because "^" only excludes a single character: "h",
"e", "l" or "o". Will somebody tell me how to do it? Thanks.


import re

def demonstrate(regex, text):
    pattern = re.compile(regex)
    match = pattern.search(text)

    print("  " + text)
    if match:
        print("  Matched '%s'" % match.group(0))
        print("  Captured '%s'" % match.group(1))
    else:
        print("  Did not match")

# Option 1: Match it all, but capture only the part before "hello". The
# (.*?) matches as few characters as possible, so that this pattern would
# end before the first hello in "hello hello".

pattern = r"(.*?)hello"
print("Option 1: " + pattern)
demonstrate(pattern, "hi, how are you. hello")

# Option 2: Don't even match the "hello", but make sure it's there.
# The first of these calls will match, but the second will not. The
# (?=...) construct is a feature called a "lookahead assertion".

pattern = r"(.*)(?=hello)"
print("\nOption 2: " + pattern)
demonstrate(pattern, "hi, how are you. hello")
demonstrate(pattern, "hi, how are you. ")
Aug 14 '05 #3
Thank you.
But what should I do if there is more than one "hello" and I only want
to extract what comes before the first one? For example, if the raw
string is "hi, how are you? hello I'm fine, thank you hello. that's it
hello", how do I extract everything before the first "hello"?

On 14 Aug 2005 08:02:16 -0700, Christoph Rackwitz
<ch****************@gmail.com> wrote:
re.findall('(.*)hello|(.*)', 'hi, how are you. hello')
re.findall('(.*)hello|(.*)', 'hi, how are you. ello')
take a look at the outputs of these.

--
http://mail.python.org/mailman/listinfo/python-list

Aug 14 '05 #4
could ildg wrote:
Thank you.
But what should I do if there is more than one "hello" and I only want
to extract what comes before the first one?
Read The Fine Manual ?-)

For example, the raw
string is "hi, how are you? hello I'm fine, thank you hello. that's it
hello", I want to extract all the stuff before the first hello?


re.findall(r'^(.*)hello', your_string_full_of_hellos)
Aug 14 '05 #5
could ildg wrote:
But what should I do if there is more than one "hello" and I only want
to extract what comes before the first one? For example, the raw
string is "hi, how are you? hello I'm fine, thank you hello. that's it
hello", I want to extract all the stuff before the first hello?


The simplest solution is to use str.split():

>>> helo = "hi, how are you? HELLO I'm fine, thank you hello. that's it"
>>> helo.split("hello", 1)[0]
"hi, how are you? HELLO I'm fine, thank you "

But regular expressions offer a similar feature:

>>> import re
>>> re.compile("hello", re.IGNORECASE).split(helo, 1)[0]
'hi, how are you? '

Peter

Aug 15 '05 #6
Bruno Desthuilliers wrote:
could ildg wrote:
Thank you.
But what should I do if there is more than one "hello" and I only want
to extract what comes before the first one?

Read The Fine Manual ?-)

For example, the raw
string is "hi, how are you? hello I'm fine, thank you hello. that's it
hello", I want to extract all the stuff before the first hello?

re.findall(r'^(.*)hello', your_string_full_of_hellos)


Nice try, but it needs a little refinement to do what the OP asked for:
>>> import re
>>> h = "hi g'day hello hello hello"
>>> re.findall(r'^(.*)hello', h)
["hi g'day hello hello "]
>>> re.findall(r'^(.*?)hello', h)
["hi g'day "]
>>> re.findall(r'^(.*?)hello', h)[0]
"hi g'day "
Aug 15 '05 #7
could ildg wrote:
In re, the "^" character can exclude a single character, but I want
to exclude a whole word now. For example, I have the string "hi, how are
you. hello" and I want to extract everything before the word "hello".
I can't use ".*[^hello]" because "^" only excludes a single character: "h",
"e", "l" or "o". Will somebody tell me how to do it? Thanks.


(1) Why must you use re? It's often a good idea to use string methods
where they can do the job you want.
(2) What do you want to have happen if "hello" is not in the string?

Example:

C:\junk>type upto.py
def upto(strg, what):
    k = strg.find(what)
    if k > -1:
        return strg[:k]
    return None # or raise an exception

helo = "hi, how are you? HELLO I'm fine, thank you hello hello hello. that's it"

print(repr(upto(helo, "HELLO")))
print(repr(upto(helo, "hello")))
print(repr(upto(helo, "hi")))
print(repr(upto(helo, "goodbye")))
print(repr(upto("", "goodbye")))
print(repr(upto("", "")))

C:\junk>upto.py
'hi, how are you? '
"hi, how are you? HELLO I'm fine, thank you "
''
None
None
''
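
If a regular expression is wanted anyway, a rough regex counterpart of upto() might look like the sketch below. It assumes Python 3, and the name before_first is made up for this example. It combines the non-greedy (.*?) from earlier in the thread with re.escape, and returns None when the word is absent, which is one possible answer to question (2).

```python
import re

def before_first(word, text):
    """Return the part of text before the first occurrence of word, or None."""
    # re.escape keeps characters such as "." in word literal.
    # re.DOTALL lets "." cross newlines, so multi-line text also works.
    match = re.match(r'(.*?)' + re.escape(word), text, re.DOTALL)
    return match.group(1) if match else None

h = "hi g'day hello hello hello"
print(repr(before_first("hello", h)))    # "hi g'day "
print(repr(before_first("goodbye", h)))  # None
```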

HTH,
John
Aug 15 '05 #8
I want to use re because I want to extract something from an html page.
It would be very complicated without re. But while using re, I
found that I must exclude a whole word, "</td>", and of course there are
many "</td>" in this html.

My re is as below:
_____________________________________________
r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
_____________________________________________
There should be over 30 matches in the html, but r.finditer(html) finds
nothing because the last line of my re is wrong. I can't use
"(?P<name>.+)</td>" because there are many "</td>" in the html
and I just want the ".+" to match what comes before the first "</td>".
So I think that if there were some way to exclude a word, this would be
done. Assuming there were a "NOT(WORD)" that could do it, I would just
need to write the last line of the re as "(?P<name>(NOT(</td>))+)</td>".
But I still have no idea after thinking and trying for a very long time.

In other words, I want the "</td>" of "(?P<name>.+)</td>" to be
exactly the first "</td>" in this match. And there is more than one
match in this html, so this must be done by using re.

And I can't use any of your ideas because what I want to deal with is a
very complicated html, not just a single line of words.

I could post part of the html here, but it's rather long.
On 8/15/05, John Machin <sj******@lexicon.net> wrote:
(1) Why must you use re? It's often a good idea to use string methods
where they can do the job you want.
(2) What do you want to have happen if "hello" is not in the string?

Aug 16 '05 #9
could ildg said:
I want to use re because I want to extract something from an html page.
It would be very complicated without re. But while using re, I
found that I must exclude a whole word, "</td>", and of course there are
many "</td>" in this html.

Actually, for properly processing html, you shouldn't really be using
regular expressions, precisely because the problem is complicated:
regular expressions are too simple and can't properly model a language
like HTML, which is generated by a context-free grammar.

If that's only meaningless technical mumbo-jumbo to you, never mind;
the important point is you shouldn't really use an re. Trust me :)

What you want for a job like this is an HTML parser. There's one in the
standard library; if it doesn't suit, there are plenty of third-party
ones. I like Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/
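
As a rough illustration of the parser route, here is a minimal sketch using the standard library's html.parser module (Python 3; in Python 2 the module was called HTMLParser). The sample markup and the class name Mp3LinkParser are invented for the example, since the real page was never posted.

```python
from html.parser import HTMLParser

class Mp3LinkParser(HTMLParser):
    """Collect (url, link text) pairs for <a> tags whose href ends in .mp3."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (url, text) pairs
        self._url = None   # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".mp3"):
                self._url = href
                self._text = []

    def handle_data(self, data):
        if self._url is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._url is not None:
            self.links.append((self._url, "".join(self._text).strip()))
            self._url = None

# Hypothetical sample, loosely modelled on the markup described in the thread.
sample = (
    '<tr><td valign=top>1</td><td>'
    '<a href="songs/one.mp3" target=_blank>First song</a></td></tr>'
    '<tr><td valign=top>2</td><td>'
    '<a href="songs/two.mp3" target=_blank>Second song</a></td></tr>'
)

parser = Mp3LinkParser()
parser.feed(sample)
print(parser.links)
# [('songs/one.mp3', 'First song'), ('songs/two.mp3', 'Second song')]
```

The parser never has to find "the first </td>"; it only reacts to the tags it cares about.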

If you insist on using an re, well, I'm sure someone on this group will
figure out a solution to your issue that's as good as you're going to
get...
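
For what it's worth, the non-greedy quantifier from earlier in the thread is probably the fix being asked for: ".+?" stops at the first "</td>". The wished-for "NOT(WORD)" can also be spelled with a negative lookahead, "(?:(?!</td>).)+". Below is a minimal sketch in Python 3, where the ur'' prefix is no longer valid and plain raw strings are used instead. The sample row is invented, and the "( )" group of the original pattern is simplified to a literal space.

```python
import re

# Invented sample row; the real page was not posted in the thread.
html = (
    '<td valign=top>7</td><td class="x"> '
    '<a href="music/song.mp3" target=_blank>Song name</a></td>'
    '<td>something else</td>'
)

# Everything up to the name group, shared by both variants.
head = (r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
        r'<a href="(?P<url>[^<>]+\.mp3)" target=_blank>')

# Variant 1: non-greedy ".+?" stops at the first "</td>".
lazy = re.compile(head + r'(?P<name>.+?)</td>', re.IGNORECASE | re.DOTALL)

# Variant 2: repeat "any character that does not start </td>".
guarded = re.compile(head + r'(?P<name>(?:(?!</td>).)+)</td>',
                     re.IGNORECASE | re.DOTALL)

for pattern in (lazy, guarded):
    for m in pattern.finditer(html):
        print(m.group('number'), m.group('url'), m.group('name'))
```

On this sample both variants capture "Song name</a>", because the closing "</a>" sits before the first "</td>"; a greedy ".+" would instead run on to the last "</td>".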



Aug 16 '05 #10
