Regexp Neg. set of chars HowTo?

Hi!

I want to replace some sequences in an HTML document.
For example, a word that was split across a line break:
a-
b
should become "ab",

but:
xxx -
b
must stay unchanged, because it is not a split word.

I want to do the search and replace with re, but I don't know how to
negate the set ['\ \n\t'].

For now I build the full set without these chars (see below), but a
negated set would be better and shorter.

OK, I can use [^\s], but I want to know how to negate an arbitrary set
of chars.
sNorm1= '([^[\ \t\n]]{1})\-\<br\ \/\>\n' - this does not work.

Thanks for the help:
dd

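import re
import sys

# Explicit set: every byte value except space, CR, LF and tab.
sNorm1 = r'([%s]{1})-<br />\n'
c = list(range(0, 256))
c.remove(32)
c.remove(13)
c.remove(10)
c.remove(9)
# "\xNN" escapes keep characters such as ] ^ - literal inside the set
s = ["\\x%02x" % v for v in c]
sNorm1 = sNorm1 % ("".join(s))
print(sNorm1)

def Normalize(Text):
    rx = re.compile(sNorm1)
    def replacer(match):
        # keep the character before the hyphen; drop "-<br />" and the newline
        return match.group(1)
    return rx.sub(replacer, Text)

print(Normalize('a -<br />\nb'))
print(Normalize('a-<br />\nb'))
sys.exit()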

Dec 20 '06 #1
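
On the pattern that "is not working": character sets do not nest, so as far as I can tell '[^[\ \t\n]]' is read as the negated set '[^[\ \t\n]' (which also excludes a literal '[') followed by a literal ']'. A negated set is written by putting ^ as the first character inside a single pair of brackets. The following is only a minimal sketch in Python 3; the names broken, fixed and dehyphenate are made up for illustration, and \r is included only because the code above also removes byte 13.

```python
import re

# The attempted pattern: the set ends at the first unescaped "]", so this is
# "one char that is not '[', space, tab or newline", then a literal "]".
broken = re.compile(r'([^[\ \t\n]]{1})\-\<br\ \/\>\n')

# A negated set: "^" as the first character inside one pair of brackets.
fixed = re.compile(r'([^ \t\r\n])-<br />\n')

def dehyphenate(text):
    # Keep the character before the hyphen; drop "-<br />" and the newline.
    return fixed.sub(r'\1', text)

print(repr(dehyphenate('a-<br />\nb')))    # joined: 'ab'
print(repr(dehyphenate('a -<br />\nb')))   # unchanged
print(broken.search('a-<br />\nb'))        # None: no literal "]" before the "-"
print(broken.search('a]-<br />\nb'))       # matches, because of the literal "]"
```

The last two lines show why the original attempt never matched the test strings: it only matches when a ']' sits directly before the hyphen.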

On Dec 20, 7:40 am, durumdara <durumd...@gmail.com> wrote:
> [snip]

It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead. But
I've added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags, Literal, Word, alphas, Suppress

brTag, brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace()  # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
    + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the <BR> tag
collapse.setParseAction(lambda toks: toks[0] + toks[1])

print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a-<br/>\nb'))
print(collapse.transformString('a-<br>\nb'))
print(collapse.transformString('a- <BR clear=all>\nb'))

Dec 21 '06 #2
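
For comparison, a regexp can cover the same five test strings if the <br> tag is matched loosely. This is only a sketch in Python 3 and not Paul's method: the pattern and the names SPLIT_WORD and join_split_words are assumptions made for illustration, and unlike the pyparsing version it knows nothing about HTML beyond this one tag shape.

```python
import re

# A letter, a hyphen, optional spaces/tabs, a <br ...> tag in any case,
# optional whitespace (the newline), then a letter.
# [^>]* skips any attributes and the optional "/" inside the tag.
SPLIT_WORD = re.compile(r'([A-Za-z])-[ \t]*<br\b[^>]*>\s*([A-Za-z])',
                        re.IGNORECASE)

def join_split_words(text):
    # Glue the two letters together, dropping hyphen, tag and whitespace.
    return SPLIT_WORD.sub(r'\1\2', text)

for sample in ('a -<br />\nb', 'a-<br />\nb', 'a-<br/>\nb',
               'a-<br>\nb', 'a- <BR clear=all>\nb'):
    print(repr(join_split_words(sample)))
```

Because the pattern requires a letter directly before the hyphen, 'a -<br />\nb' is left alone while the other four strings collapse to 'ab'.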
Hi!

Thanks for this! I'll use that!

I also found a regexp way to solve my question:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I search for all text that is not lion and tion.

dd

Paul McGuire wrote:
> It looks like you are trying to de-hyphenate words that have been
> broken across line breaks.
>
> Well, this isn't a regexp solution, it uses pyparsing instead. But
> I've added a number of other test cases which may be problematic for an
> re.
>
> -- Paul
Dec 22 '06 #3
In <ma***************************************@python.org>, durumdara
wrote:
> I also found a regexp way to solve my question:
> import re
> testtext = " minion battalion nation dion sion wion alion"
> m = re.compile("[^t^l]ion")
> print(m.findall(testtext))
>
> I search for all text that is not lion and tion.

And ^ion. The first ^ in that character group "negates" that group; the
second is a literal ^, so I guess you meant "[^tl]ion".

Ciao,
Marc 'BlackJack' Rintsch
Dec 22 '06 #4
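
The difference between the two character groups can be checked directly. A small sketch in Python 3, assuming the test text from the post above with " ^ion" appended so that the literal ^ has something to act on; the names with_caret and without_caret are made up for illustration.

```python
import re

testtext = " minion battalion nation dion sion wion alion ^ion"

with_caret = re.compile("[^t^l]ion")     # excludes t, l and a literal ^
without_caret = re.compile("[^tl]ion")   # excludes only t and l

print(with_caret.findall(testtext))      # ['nion', 'dion', 'sion', 'wion']
print(without_caret.findall(testtext))   # ['nion', 'dion', 'sion', 'wion', '^ion']
```

On the original test text the two patterns return the same list, which is why the stray ^ went unnoticed.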
