Regular expression for Unix and DOS line endings wanted

Hello, I need a regular expression that trims trailing whitespace.

It works with Unix line endings,
but not with Windows (DOS) CRLFs:

import re
retrailingwhitespace = re.compile('(?<=\S)[ \t]+$', re.MULTILINE)

1) Windows
>>> r = "erewr \r\nafjdskl "
>>> newtext, n = retrailingwhitespace.subn('', r)
>>> n
1
>>> newtext
'erewr \r\nafjdskl'

2) Unix
>>> r = "erewr \nafjdskl "
>>> newtext, n = retrailingwhitespace.subn('', r)
>>> n
2
>>> newtext
'erewr\nafjdskl'

Who can help me with a regular expression that works for both cases?

Thank you in advance!
--
Franz Steinhaeusler
Feb 23 '06 #1
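
A minimal sketch of one pattern that covers both cases, assuming Python 3 and nothing beyond the standard `re` module. In MULTILINE mode `$` matches just before a `\n`, so in a CRLF file the `\r` sits between the trailing blanks and the spot where `$` can match, which is why the Windows line is left alone. Allowing an optional `\r` inside a lookahead lets the blanks be removed while the line ending itself is not consumed. The names `TRAILING_BLANKS` and `strip_trailing_blanks` are made up for illustration.

```python
import re

# (?<=\S)   same lookbehind as the original pattern: strip only after a non-blank
# [ \t]+    the trailing spaces/tabs to remove
# (?=\r?$)  must be followed by an optional \r and then the end of the line
TRAILING_BLANKS = re.compile(r'(?<=\S)[ \t]+(?=\r?$)', re.MULTILINE)

def strip_trailing_blanks(text):
    """Return (new_text, number_of_substitutions), like subn()."""
    return TRAILING_BLANKS.subn('', text)

print(strip_trailing_blanks("erewr \r\nafjdskl "))  # ('erewr\r\nafjdskl', 2)
print(strip_trailing_blanks("erewr \nafjdskl "))    # ('erewr\nafjdskl', 2)
```

Because of the lookbehind, lines that consist only of blanks are left as they are; dropping `(?<=\S)` would strip those too.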
Franz Steinhaeusler wrote:
> Hello, I need a regularexpression, which trims trailing whitespaces.
> [...]
> Who can help me (regular expression, which works for both cases).

Why not use the string methods strip, rstrip and lstrip?
Feb 23 '06 #2

Franz Steinhaeusler wrote:
> Hello, I need a regularexpression, which trims trailing whitespaces.
> [...]
> Who can help me (regular expression, which works for both cases).


universal newlines:
http://www.python.org/doc/2.3.3/whatsnew/node7.html
http://mail.python.org/pipermail/pyt...ry/324410.html

Feb 23 '06 #3
On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
<mf********@gatwick.westerngeco.slb.com> wrote:
> why not use string methods strip, rstrip and lstrip

Because this removes only the spaces at the very end of the string:

>>> r
'erewr \r\nafjdskl '
>>> r.rstrip()
'erewr \r\nafjdskl'

I want:
'erewr\r\nafjdskl'

or, for Unix line endings:
'erewr\nafjdskl'

--
Franz Steinhaeusler
Feb 23 '06 #4

gene tani wrote:
> Franz Steinhaeusler wrote:
>> Who can help me (regular expression, which works for both cases).
>
> universal newlines:
> http://www.python.org/doc/2.3.3/whatsnew/node7.html
> http://mail.python.org/pipermail/pyt...ry/324410.html

If multiple end-of-line markers are present (\r, \r\n and/or \n), use
the file's newlines attribute to see what they are. I think the thread
linked above touched on that. Otherwise newlines (or os.linesep)
should tell you what the end of line is in that file.

http://docs.python.org/lib/bltin-file-objects.html

Feb 23 '06 #5
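
A small sketch of the `newlines` attribute mentioned above, assuming Python 3, where universal newlines is the default behaviour of text-mode `open()` (the linked pages describe the older Python 2.3 `'U'` mode). The temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Write a small CRLF file in binary mode so the bytes are exactly as given.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b"erewr \r\nafjdskl \r\n")

# Text mode with the default newline=None enables universal newlines:
# \r\n, \r and \n in the file are all handed back to the program as '\n'.
with open(path, 'r', encoding='ascii') as f:
    text = f.read()
    seen = f.newlines   # the ending(s) actually encountered while reading

os.remove(path)

print(repr(text))   # 'erewr \nafjdskl \n'
print(repr(seen))   # '\r\n'
```

If the file mixes several styles, `newlines` is a tuple of the endings seen, which gives a cheap way to detect mixed files before rewriting them.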
On 23 Feb 2006 06:44:36 -0800, "gene tani" <ge*******@gmail.com> wrote:
> if multiple end-of line markers are present (\r, \r\n and or \n), use
> the file's newlines attribute to see what they are. [...]
>
> http://docs.python.org/lib/bltin-file-objects.html

Thank you for the info.

I need it for a file whose line endings I don't know.

I wrote this script for DrPython
(using the styled text control and wxPython). It works,
but I'm looking for a shorter way:

===================================================================
#drscript
#RemoveTrailingWhitespaces

import re
import string

eol = DrDocument.GetEndOfLineCharacter()
regex = re.compile('\s+' + eol, re.MULTILINE)

relewin = re.compile('\r\n', re.M)
releunix = re.compile('[^\r]\n', re.M)
relemac = re.compile('\r[^\n]', re.M)

text = DrDocument.GetText()

#check line endings
win = unix = mac = 0
if relewin.search(text):
    win = 1
if releunix.search(text):
    unix = 1
if relemac.search(text):
    mac = 1
mixed = win + unix + mac

#correct the lineendings before
if mixed > 1:
    wx.MessageDialog(DrFrame, "Line endings mixed",
                     "Remove trailing Whitespace",
                     wx.ICON_EXCLAMATION).ShowModal()

#ok to remove
else:
    lines = text.split(eol)
    new_lines = []
    nr_lines = 0
    nr_clines = 0
    first_cline = -1
    for line in lines:
        nr_lines += 1
        result = regex.search(line + eol)
        if result != None:
            end = result.start()
            nr_clines += 1
            if first_cline == -1:
                first_cline = nr_lines
            new_lines.append(line[:end])
        else:
            new_lines.append(line)

    #file has trailing whitespaces
    if nr_clines > 0:
        d = wx.MessageDialog(DrFrame,
            "%d of %d lines have trailing whitespaces (First:%d)\nCorrect?"
            % (nr_clines, nr_lines, first_cline),
            "Remove trailing Whitespace",
            wx.OK | wx.CANCEL | wx.ICON_QUESTION)
        answer = d.ShowModal()
        d.Destroy()
        if (answer == wx.ID_OK):
            newtext = string.join(new_lines, eol)
            #save current line
            curline = DrDocument.GetCurrentLine()
            DrDocument.SetText(newtext)
            #jump to saved current line
            DrDocument.GotoLine(curline)

    #no need to change the file
    else:
        wx.MessageDialog(DrFrame, "File ok!", "Remove trailing Whitespace",
                         wx.ICON_EXCLAMATION).ShowModal()

===================================================================
--
Franz Steinhaeusler
Feb 23 '06 #6
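
The line-ending check in the script can be sketched without wxPython, assuming Python 3. One thing worth noting about the posted patterns: `[^\r]\n` needs some character in front of the `\n`, so it misses a `\n` at the very start of the text, and `\r[^\n]` misses a `\r` that is the last character. Lookarounds avoid both cases. The function name `detect_eols` is hypothetical.

```python
import re

def detect_eols(text):
    """Return the set of line-ending styles that occur in text."""
    found = set()
    if re.search(r'\r\n', text):        # Windows / DOS
        found.add('\r\n')
    if re.search(r'(?<!\r)\n', text):   # Unix: \n not preceded by \r
        found.add('\n')
    if re.search(r'\r(?!\n)', text):    # classic Mac: \r not followed by \n
        found.add('\r')
    return found

print(detect_eols("erewr \r\nafjdskl "))     # {'\r\n'}
print(len(detect_eols("a\r\nb\nc")) > 1)     # True: mixed line endings
```

A result with more than one element corresponds to the "Line endings mixed" branch of the script.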
On Thu, 23 Feb 2006 15:59:54 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> I need it for a file, whose line endings I don't know.
>
> I wrote for DrPython this script:
> (using styled text control and wxPython) and this works,
> but I'm looking for a shorter way:

Ah, sorry, let me try to make this clearer:

(DrDocument is an instance of a styled text control)

import re
import string

def GetEndOfLineCharacter():
    emode = DrDocument.GetEOLMode()
    if emode == wx.stc.STC_EOL_CR:
        return '\r'
    elif emode == wx.stc.STC_EOL_CRLF:
        return '\r\n'
    return '\n'

text = DrDocument.GetText()

eol = GetEndOfLineCharacter()
regex = re.compile('\s+' + eol, re.MULTILINE)

lines = text.split(eol)

new_lines = []
for line in lines:
    result = regex.search(line + eol)
    if result != None:
        end = result.start()
        new_lines.append(line[:end])
    else:
        new_lines.append(line)
newtext = string.join(new_lines, eol)
DrDocument.SetText(newtext)

--
Franz Steinhaeusler
Feb 23 '06 #7
Franz Steinhaeusler wrote:
> On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
> <mf********@gatwick.westerngeco.slb.com> wrote:
>> why not use string methods strip, rstrip and lstrip
>
> because this removes only the last spaces,
> >>> r
> 'erewr \r\nafjdskl '
> >>> r.rstrip()
> 'erewr \r\nafjdskl'
>
> I want:
> 'erewr\r\nafjdskl'
>
> or for unix line endings
> 'erewr\nafjdskl'

How about one of these variations:

print 'erewr \r\nafjdskl '.replace(" ", "")
print 'erewr \r\nafjdskl '.strip(" \t")

Feb 23 '06 #8
How about r"\s+[\n\r]+|\s+$" ?

Franz Steinhaeusler wrote:
Hello, I need a regularexpression, which trims trailing whitespaces.

While with unix line endings, it works;
but not with Window (Dos) CRLF's:

import re
retrailingwhitespace = re.compile('(?<=\S)[ \t]+$', re.MULTILINE)

1) Windows
r="erewr \r\nafjdskl "
newtext, n = retrailingwhitespace.subn('', r)
n
1
newtext
'erewr \r\nafjdskl'

2) Unix
r="erewr \nafjdskl "
newtext, n = retrailingwhitespace.subn('', r)
n
2
newtext


'erewr\nafjdskl'
Who can help me (regular expression, which works for both cases).

Thank you in advance!

Feb 24 '06 #9
On Thu, 23 Feb 2006 14:46:20 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> Hello, I need a regularexpression, which trims trailing whitespaces.
>
> While with unix line endings, it works;
> but not with Window (Dos) CRLF's:

Thank you all for the replies,
but I still don't have a solution.

Of course it is possible with more lines,
but it would be nice to have a one-liner.

--
Franz Steinhaeusler
Feb 24 '06 #10
Franz Steinhaeusler wrote:
> Thank you all for the replies.
> But I still don't have a solution.
>
> Of course with more lines it is possible,
> but it would be fine to have a "oneliner".

Then I clearly don't understand your problem. It seems we gave
you several ways of skinning your cat, but none of them worked?
I find that hard to believe. Perhaps you can restate your problem
or show us your multi-line solution (so that we might learn
from it).
Martin
Feb 24 '06 #11
On Thu, 23 Feb 2006 15:13:01 +0100, Franz Steinhaeusler wrote:
>> why not use string methods strip, rstrip and lstrip
>
> because this removes only the last spaces,
> >>> r
> 'erewr \r\nafjdskl '
> >>> r.rstrip()
> 'erewr \r\nafjdskl'
>
> I want:
> 'erewr\r\nafjdskl'
>
> or for unix line endings
> 'erewr\nafjdskl'

# Untested
def whitespace_cleaner(s):
    """Clean whitespace from string s, returning new string.

    Strips all trailing whitespace from the end of the string, including
    linebreaks. Removes whitespace except for linebreaks from everywhere
    in the string. Internal linebreaks are converted to whatever is
    appropriate for the current platform.
    """

    from os import linesep
    from string import whitespace
    s = s.rstrip()
    for c in whitespace:
        if c in '\r\n':
            continue
        s = s.replace(c, '')
    if linesep == '\n': # Unix, Linux, Mac OS X, etc.
        # the order of the replacements is important
        s = s.replace('\r\n', '\n').replace('\r', '\n')
    elif linesep == '\r': # classic Macintosh
        s = s.replace('\r\n', '\r').replace('\n', '\r')
    elif linesep == '\r\n': # Windows
        s = s.replace('\r\n', '\r').replace('\n', '\r')
        s = s.replace('\r', '\r\n')
    else: # weird platforms?
        print "Unknown line separator, skipping."
    return s

--
Steven.

Feb 24 '06 #12
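
The remark that "the order of the replacements is important" can be shown with a short sketch, assuming Python 3. `\r\n` has to be collapsed first; replacing lone `\r` first would turn every `\r\n` into `\n\n` and double the line count. The helper name `normalize_eol` and its `eol` parameter are made up for illustration and only cover the line-ending part of the function above, not the whitespace removal.

```python
def normalize_eol(text, eol='\n'):
    """Convert \\r\\n, \\r and \\n line endings in text to a single style."""
    # Step 1: collapse everything to '\n', handling '\r\n' before lone '\r'.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # Step 2: expand to the requested ending if it is not already '\n'.
    return text if eol == '\n' else text.replace('\n', eol)

print(repr(normalize_eol("a\r\nb\rc\n")))            # 'a\nb\nc\n'
print(repr(normalize_eol("a\r\nb\rc\n", '\r\n')))    # 'a\r\nb\r\nc\r\n'
```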
On Thu, 23 Feb 2006 17:41:47 +0000, Martin Franklin
<mf********@gatwick.westerngeco.slb.com> wrote:
> [...]
> how about one of these variations
>
> print 'erewr \r\nafjdskl '.replace(" ", "")
> print 'erewr \r\nafjdskl '.strip(" \t")


Version 1:
>>> w = 'erewr \r\nafjdskl '.replace(" ", "")
>>> w
'erewr\r\nafjdskl'
>>> w = 'erewr \nafjdskl '.replace(" ", "")
>>> w
'erewr\nafjdskl'
>>> w = 'word1 word2 \nafjdskl '.replace(" ", "")
>>> w
'word1word2\nafjdskl'

It replaces all spaces, not only the trailing whitespace.

Version 2:
>>> w = 'erewr \r\nafjdskl '.strip(" \t")
>>> w
'erewr \r\nafjdskl'
>>> w = 'erewr \nafjdskl '.strip(" \t")
>>> w
'erewr \nafjdskl'
>>> w = 'word1 word2 \nafjdskl '.strip(" \t")
>>> w
'word1 word2 \nafjdskl'


I found a solution (not the most beautiful, but good enough
for my purpose).
Given: the file has no mixed line endings, so it is either
a DOS or a Unix file (Mac line endings are not handled).

swin = "erewr \r\nafjdskl "
sunix = "erewr \nafjdskl "

Are DOS line endings present (at least one '\r' included)?
r is the contents of a file:

helpchar = ''
if r.find('\r') != -1:
    helpchar = '\r'
retrailingwhitespacelf = re.compile('(?<=\S)[ \t'+helpchar+']+$',
                                    re.MULTILINE)
newtext, n = retrailingwhitespacelf.subn(helpchar, r)
if n > 1:
    r = newtext
--
Franz Steinhaeusler
Feb 24 '06 #13
On Fri, 24 Feb 2006 12:36:01 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> if n > 1:


if n > 0: (of course) :-)

--
Franz Steinhaeusler
Feb 24 '06 #14
Franz Steinhaeusler wrote:

Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".


re.sub(r"\s+[\n\r]+", lambda x: x.expand("\g<0>"). \
       lstrip(" \t\f\v"), text).rstrip()

...where "text" is the unsplit block of text with mysterious line endings.

But I think your code is a lot easier to read. :)
Feb 24 '06 #15
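
For anyone trying to follow the one-liner above, here is a sketch of the same idea spelled out, assuming Python 3. `m.group(0)` is used in place of `x.expand("\g<0>")`; both give the whole matched text. The function name `strip_trailing_oneliner` is invented for the example.

```python
import re

def strip_trailing_oneliner(text):
    # \s+[\n\r]+ matches a run of whitespace that ends in line-break chars.
    # The replacement is the same run with its leading blanks removed,
    # so only the line break itself survives.
    cleaned = re.sub(r"\s+[\n\r]+",
                     lambda m: m.group(0).lstrip(" \t\f\v"),
                     text)
    # The final rstrip() handles blanks after the last line. It also
    # removes a trailing line break at the very end of the text.
    return cleaned.rstrip()

print(repr(strip_trailing_oneliner("erewr \r\nafjdskl ")))  # 'erewr\r\nafjdskl'
print(repr(strip_trailing_oneliner("erewr \nafjdskl ")))    # 'erewr\nafjdskl'
```

Two side effects to be aware of: the closing `rstrip()` drops a final newline if the text ends with one, and because `\s+` also matches line breaks, blanks on empty lines in the middle of a run of line breaks are not removed.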
Franz Steinhaeusler <fr*****************@gmx.at> wrote:
On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
why not use string methods strip, rstrip and lstrip

because this removes only the last spaces,
[given r = 'erewr \r\nafjdskl ']
I want:
'erewr\r\nafjdskl'


os.linesep.join(l.rstrip() for l in r.split(os.linesep))

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Feb 24 '06 #16
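
One caveat with the `os.linesep` version: `os.linesep` is the separator of the platform the script runs on, not necessarily that of the file. On Unix, for example, a CRLF text is split on `\n`, each `\r` is stripped along with the blanks, and the result comes back with `\n` endings. A minimal sketch that keeps whatever ending each line already has, assuming Python 3 and using `str.splitlines(True)` to retain the endings (the name `rstrip_lines` is hypothetical):

```python
def rstrip_lines(text):
    """Strip trailing spaces/tabs from each line, keeping each line's own ending."""
    cleaned = []
    for line in text.splitlines(True):      # True keeps the line endings
        body = line.rstrip('\r\n')          # the line without its ending
        ending = line[len(body):]           # '\r\n', '\n', '\r' or ''
        cleaned.append(body.rstrip(' \t') + ending)
    return ''.join(cleaned)

print(repr(rstrip_lines("erewr \r\nafjdskl ")))   # 'erewr\r\nafjdskl'
print(repr(rstrip_lines("erewr \nafjdskl ")))     # 'erewr\nafjdskl'
```

This also works on files with mixed line endings, since no separator has to be chosen up front. Note that `splitlines` treats a few other characters (such as form feed) as line boundaries too; they are passed through unchanged here.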
On Fri, 24 Feb 2006 07:21:04 -0500, John Zenger
<jo*********@yahoo.com> wrote:
Franz Steinhaeusler wrote:

Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".


re.sub(r"\s+[\n\r]+", lambda x: x.expand("\g<0>"). \
lstrip(" \t\f\v"),text).rstrip()

...where "text" is the unsplit block of text with mysterious line-endings.

But I think your code is a lot easier to read. :)


Hello John,

perfect, thank you,

but as you said, this is somehow not so easy to grasp.
(At least for me). :-)

--
Franz Steinhaeusler
Feb 24 '06 #17
On 24 Feb 2006 14:12:05 +0000 (GMT), Sion Arrowsmith
<si***@chiark.greenend.org.uk> wrote:
Franz Steinhaeusler <fr*****************@gmx.at> wrote:
On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
why not use string methods strip, rstrip and lstrip

because this removes only the last spaces,
[given r = 'erewr \r\nafjdskl ']
I want:
'erewr\r\nafjdskl'


os.linesep.join(l.rstrip() for l in r.split(os.linesep))


Hello Sion,

Thank you, I like your solution the most!
(It is clean and one doesn't have to use re's.)
--
Franz Steinhaeusler
Feb 27 '06 #18
