Regexp: how to negate a set of chars?

durumdara

Hi!

I want to replace some sequences in HTML.
For example:
a-
b
should become: ab

but:
xxx -
b
must stay unchanged, because it is not a split word.

I want to search and replace with re, but I don't know how to negate this
set: ['\ \n\t'].

For now I use the full set without these chars, but a negated set would be
better and shorter.

OK, I can use [^\s], but I want to know how to negate a set of chars.
sNorm1= '([^[\ \t\n]]{1})\-\<br\ \/\>\n' - this does not work.

Thanks for the help:
dd

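import re
import sys

# Workaround: list every character code except space, CR, LF and tab.
sNorm1 = r'([%s]{1})\-\<br\ \/\>\n'
c = list(range(0, 256))
c.remove(32)
c.remove(13)
c.remove(10)
c.remove(9)
# Write each remaining code as a \xNN escape for use inside the class.
s = ["\\x%02x" % v for v in c]
sNorm1 = sNorm1 % ("".join(s))
print(sNorm1)

def Normalize(Text):
    rx = re.compile(sNorm1)
    def replacer(match):
        # Keep only the character captured before the hyphen.
        return match.group(1)
    return rx.sub(replacer, Text)

print(Normalize('a -<br />\nb'))
print(Normalize('a-<br />\nb'))
sys.exit()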
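
A negated set is written with ^ as the first character inside a single pair of brackets. Nesting brackets, as in [^[\ \t\n]], does not work: the inner [ is treated as a member of the set and the first ] closes it, so the second ] becomes a literal character that must follow. The sketch below is one possible form, assuming the input contains "-<br />" followed by a newline as in the pattern above; the names HYPHEN_BREAK and normalize are hypothetical.

```python
import re

# [^ \t\r\n] matches any single character that is NOT a space, tab, CR or LF.
# The trailing "-<br />\n" is assumed from the pattern in the post above.
HYPHEN_BREAK = re.compile(r'([^ \t\r\n])-<br />\n')

def normalize(text):
    # Keep the character before the hyphen; drop the hyphen, tag and newline.
    return HYPHEN_BREAK.sub(r'\1', text)

print(repr(normalize('a-<br />\nb')))    # hyphen directly after a letter: joined
print(repr(normalize('a -<br />\nb')))   # hyphen after a space: left unchanged
```

This replaces the generated list of 252 escapes with a seven-character class and behaves the same way for the two test strings.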

Dec 20 '06 #1


Paul McGuire

On Dec 20, 7:40 am, durumdara <durumd...@gmail.com> wrote:

I want to search and replace with re, but I don't know how to negate this
set: ['\ \n\t'].

It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead. But
I've added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags, Literal, Word, alphas, Suppress

brTag, brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace()  # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
    + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the <BR> tag
collapse.setParseAction(lambda toks: toks[0] + toks[1])

print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a- \nb'))
print(collapse.transformString('a- \nb'))
print(collapse.transformString('a- \nb'))
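
For comparison, a regex-only version of the same idea might look like the sketch below. It is not Paul's code: the name dehyphenate and the set of tag spellings it accepts (<br>, <BR>, <br/>, <br />) are assumptions, and unlike makeHTMLTags it does not cope with attributes inside the tag.

```python
import re

# Letters, a hyphen with nothing between it and the word, any plain <br>
# spelling, optional whitespace, then the rest of the word.
DEHYPHENATE = re.compile(r'([A-Za-z]+)-<br\s*/?>\s*([A-Za-z]+)', re.IGNORECASE)

def dehyphenate(html):
    # Join the two word halves and drop the hyphen, tag and line break.
    return DEHYPHENATE.sub(r'\1\2', html)

for sample in ('a -<br />\nb', 'a-<br />\nb', 'a-<BR>\nb', 'a-<br/>\nb'):
    print(repr(dehyphenate(sample)))
```

Only the first sample is left alone, because a space separates the word from the hyphen there.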

Dec 21 '06 #2

durumdara

Hi!

Thanks for this! I'll use that!

I found a regexp solution to my question too:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I search for every "ion" match that is not "lion" or "tion".

dd

Dec 22 '06 #3

Marc 'BlackJack' Rintsch

In <ma***************************************@python.org>, durumdara
wrote:

I found a regexp solution to my question too:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I search for every "ion" match that is not "lion" or "tion".

And ^ion. The first ^ in that character group "negates" that group, the
second is a literal ^, so I guess you meant "[^tl]ion".
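
A small check of that difference, reusing the test text from the post (the extra "^ion" input is added here only for illustration):

```python
import re

testtext = " minion battalion nation dion sion wion alion"

# "[^t^l]" excludes 't', 'l' AND a literal '^'; "[^tl]" excludes only 't' and 'l'.
with_caret = re.compile(r"[^t^l]ion")
without_caret = re.compile(r"[^tl]ion")

# On the original test text both patterns find the same matches.
print(with_caret.findall(testtext))
print(without_caret.findall(testtext))

# They differ only when a '^' directly precedes "ion".
print(with_caret.findall("^ion"))
print(without_caret.findall("^ion"))
```

The first two calls both print ['nion', 'dion', 'sion', 'wion']; the last two print [] and ['^ion'].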

Ciao,
Marc 'BlackJack' Rintsch

Dec 22 '06 #4
