Hello, I need a regular expression that trims trailing whitespace.
It works with Unix line endings,
but not with Windows (DOS) CRLFs:

import re
retrailingwhitespace = re.compile(r'(?<=\S)[ \t]+$', re.MULTILINE)

1) Windows
>>> r = "erewr \r\nafjdskl "
>>> newtext, n = retrailingwhitespace.subn('', r)
>>> n
1
>>> newtext
'erewr \r\nafjdskl'

2) Unix
>>> r = "erewr \nafjdskl "
>>> newtext, n = retrailingwhitespace.subn('', r)
>>> n
2
>>> newtext
'erewr\nafjdskl'

Can anyone help me with a regular expression that works for both cases?
Thank you in advance!
--
Franz Steinhaeusler
Franz Steinhaeusler wrote:
> Hello, I need a regular expression that trims trailing whitespace.
> [...]
> Can anyone help me with a regular expression that works for both cases?

Why not use the string methods strip, rstrip and lstrip?
Franz Steinhaeusler wrote:
> Can anyone help me with a regular expression that works for both cases?

universal newlines:
http://www.python.org/doc/2.3.3/whatsnew/node7.html
http://mail.python.org/pipermail/pyt...ry/324410.html
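
In current Python 3, universal newline handling is reached through the `newline` argument of the built-in `open()`, and text file objects carry a `newlines` attribute recording which line endings were seen. The following is only a sketch under that assumption; the temporary file and its contents are made up for illustration.

```python
import os
import tempfile

# Write a small CRLF file in binary mode so the bytes are exactly as given.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"erewr \r\nafjdskl \r\n")

# newline=None (the default) enables universal newlines: '\r', '\n' and
# '\r\n' are all translated to '\n' while reading.
with open(path, "r", newline=None) as f:
    text = f.read()
    seen = f.newlines  # the line ending(s) encountered while reading

os.remove(path)

# With every line ending normalised to '\n', a plain split/rstrip is enough.
cleaned = "\n".join(line.rstrip() for line in text.split("\n"))

print(repr(text))
print(repr(seen))
print(repr(cleaned))
```

The text is normalised to '\n' in memory, so the value of `seen` would be needed if the original line ending has to be restored when writing the file back.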
On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
<mf********@gatwick.westerngeco.slb.com> wrote:
> why not use string methods strip, rstrip and lstrip

Because this removes only the whitespace at the end of the whole string:

>>> r
'erewr \r\nafjdskl '
>>> r.rstrip()
'erewr \r\nafjdskl'

I want:
'erewr\r\nafjdskl'
or, for Unix line endings:
'erewr\nafjdskl'
--
Franz Steinhaeusler
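
A note on why the original pattern treats the two cases differently: with re.MULTILINE, `$` matches just before a '\n' (or at the very end of the string), so in a CRLF text the '\r' sits between the trailing blanks and the position where `$` can match. One possible way around this, sketched here for Python 3 and assuming only '\n' and '\r\n' line endings occur, is to allow an optional '\r' inside a lookahead so that it is not consumed. The names `TRAILING_WS` and `strip_trailing` are made up for illustration.

```python
import re

# Same lookbehind as the original pattern, but an optional '\r' may sit
# between the trailing blanks and the end of the line. The lookahead does
# not consume the '\r', so the line ending survives the substitution.
TRAILING_WS = re.compile(r'(?<=\S)[ \t]+(?=\r?$)', re.MULTILINE)

def strip_trailing(text):
    """Return (new_text, number_of_substitutions)."""
    return TRAILING_WS.subn('', text)

print(strip_trailing("erewr \r\nafjdskl "))
print(strip_trailing("erewr \nafjdskl "))
```

As with the original pattern, the `(?<=\S)` lookbehind means that lines consisting only of whitespace are left alone, and bare '\r' (classic Mac) line endings are not handled.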
On 23 Feb 2006 06:44:36 -0800, "gene tani" <ge*******@gmail.com> wrote:
> Franz Steinhaeusler wrote:
>> Can anyone help me with a regular expression that works for both cases?
> universal newlines:
> http://www.python.org/doc/2.3.3/whatsnew/node7.html
> http://mail.python.org/pipermail/pyt...ry/324410.html
> If multiple end-of-line markers are present (\r, \r\n and/or \n), use the file's newlines attribute to see what they are. I think the thread linked above touched on that. Otherwise newlines (or os.linesep) should tell you what end of line is in that file.
> http://docs.python.org/lib/bltin-file-objects.html

Thank you for the information.
I need it for a file whose line endings I don't know.
I wrote this script for DrPython
(using the styled text control and wxPython). It works,
but I'm looking for a shorter way:
===================================================================
#drscript
#RemoveTrailingWhitespaces

import re

eol = DrDocument.GetEndOfLineCharacter()
regex = re.compile(r'\s+' + eol, re.MULTILINE)

relewin = re.compile('\r\n', re.M)
releunix = re.compile('[^\r]\n', re.M)
relemac = re.compile('\r[^\n]', re.M)

text = DrDocument.GetText()

#check line endings
win = unix = mac = 0
if relewin.search(text):
    win = 1
if releunix.search(text):
    unix = 1
if relemac.search(text):
    mac = 1
mixed = win + unix + mac

#correct the lineendings before
if mixed > 1:
    wx.MessageDialog(DrFrame, "Line endings mixed",
        "Remove trailing Whitespace", wx.ICON_EXCLAMATION).ShowModal()

#ok to remove
else:
    lines = text.split(eol)
    new_lines = []
    nr_lines = 0
    nr_clines = 0
    first_cline = -1
    for line in lines:
        nr_lines += 1
        result = regex.search(line + eol)
        if result is not None:
            end = result.start()
            nr_clines += 1
            if first_cline == -1:
                first_cline = nr_lines
            new_lines.append(line[:end])
        else:
            new_lines.append(line)

    #file has trailing whitespaces
    if nr_clines > 0:
        d = wx.MessageDialog(DrFrame,
            "%d of %d lines have trailing whitespaces (First:%d)\nCorrect?"
            % (nr_clines, nr_lines, first_cline),
            "Remove trailing Whitespace",
            wx.OK | wx.CANCEL | wx.ICON_QUESTION)
        answer = d.ShowModal()
        d.Destroy()
        if answer == wx.ID_OK:
            newtext = eol.join(new_lines)
            #save current line
            curline = DrDocument.GetCurrentLine()
            DrDocument.SetText(newtext)
            #jump to saved current line
            DrDocument.GotoLine(curline)

    #no need to change the file
    else:
        wx.MessageDialog(DrFrame, "File ok!", "Remove trailing Whitespace",
            wx.ICON_EXCLAMATION).ShowModal()
===================================================================
--
Franz Steinhaeusler
On Thu, 23 Feb 2006 15:59:54 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> I need it for a file whose line endings I don't know.
> I wrote this script for DrPython [...] but I'm looking for a shorter way:

Ah, sorry, let me try to make this clearer
(DrDocument is an instance of a styled text control):

import re

def GetEndOfLineCharacter():
    emode = DrDocument.GetEOLMode()
    if emode == wx.stc.STC_EOL_CR:
        return '\r'
    elif emode == wx.stc.STC_EOL_CRLF:
        return '\r\n'
    return '\n'

text = DrDocument.GetText()

eol = GetEndOfLineCharacter()
regex = re.compile(r'\s+' + eol, re.MULTILINE)

lines = text.split(eol)

new_lines = []
for line in lines:
    result = regex.search(line + eol)
    if result is not None:
        end = result.start()
        new_lines.append(line[:end])
    else:
        new_lines.append(line)

newtext = eol.join(new_lines)
DrDocument.SetText(newtext)
--
Franz Steinhaeusler
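
For a shorter way that does not need to know the line ending in advance, one option is `str.splitlines` with `keepends=True`, which leaves each line's own terminator attached. This is a Python 3 sketch rather than DrPython code: `strip_trailing_whitespace` is a hypothetical helper, and the `DrDocument.GetText()` / `SetText()` calls from the script above would still be needed around it.

```python
def strip_trailing_whitespace(text):
    """Strip blanks and tabs from the end of each line, keeping line endings."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")    # the line without its terminator
        ending = line[len(body):]     # '', '\r', '\n' or '\r\n'
        out.append(body.rstrip(" \t") + ending)
    return "".join(out)

print(repr(strip_trailing_whitespace("erewr \r\nafjdskl ")))
print(repr(strip_trailing_whitespace("erewr \nafjdskl ")))
```

Because every line keeps its own terminator, mixed line endings pass through unchanged. Note that `str.splitlines` also treats a few other characters (form feed, vertical tab and similar) as line boundaries; those lines are simply passed through as they are.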
Franz Steinhaeusler wrote:
> because this removes only the last spaces, [...]
> I want: 'erewr\r\nafjdskl'
> or for unix line endings 'erewr\nafjdskl'

How about one of these variations?

print('erewr \r\nafjdskl '.replace(" ", ""))
print('erewr \r\nafjdskl '.strip(" \t"))
How about r"\s+[\n\r]+|\s+$" ?
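
One thing to check with this pattern is what the match gets replaced with: the first alternative also matches the line ending itself, so substituting an empty string joins the lines. Below is a small sketch of that behaviour, plus a hypothetical variant (`TRAILING`, not from the thread) that captures the line ending and puts it back.

```python
import re

text = "erewr \r\nafjdskl "

# The suggested pattern consumes the '\r\n' together with the blanks.
joined = re.sub(r"\s+[\n\r]+|\s+$", "", text)

# Variant: match only blanks and tabs, capture whatever ends the line
# (possibly nothing, at the end of the string) and re-insert it.
TRAILING = re.compile(r"[ \t]+([\r\n]+|$)")
kept = TRAILING.sub(r"\1", text)

print(repr(joined))
print(repr(kept))
```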
Franz Steinhaeusler wrote:
> Hello, I need a regular expression that trims trailing whitespace.
> [...]
On Thu, 23 Feb 2006 14:46:20 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> Hello, I need a regular expression that trims trailing whitespace.
> It works with Unix line endings, but not with Windows (DOS) CRLFs:

Thank you all for the replies.
But I still don't have a solution.
Of course it is possible with more lines,
but it would be nice to have a "one-liner".
--
Franz Steinhaeusler
Franz Steinhaeusler wrote:
> Thank you all for the replies. But I still don't have a solution.
> Of course it is possible with more lines, but it would be nice to have a "one-liner".

Then I clearly don't understand your problem... it seems we gave
you several ways of skinning your cat... but none of them 'worked'?
I find that hard to believe... perhaps you can restate your problem
or show us your more-than-one-line solution... (so that we might learn
from it)
Martin
On Thu, 23 Feb 2006 15:13:01 +0100, Franz Steinhaeusler wrote:
>> why not use string methods strip, rstrip and lstrip
> because this removes only the last spaces, [...]
> I want: 'erewr\r\nafjdskl'
> or for unix line endings 'erewr\nafjdskl'

# Untested
def whitespace_cleaner(s):
    """Clean whitespace from string s, returning new string.

    Strips all trailing whitespace from the end of the string, including
    linebreaks. Removes whitespace except for linebreaks from everywhere
    in the string. Internal linebreaks are converted to whatever is
    appropriate for the current platform.
    """
    from os import linesep
    from string import whitespace
    s = s.rstrip()
    for c in whitespace:
        if c in '\r\n':
            continue
        s = s.replace(c, '')
    if linesep == '\n': # Unix, Linux, Mac OS X, etc.
        # the order of the replacements is important
        s = s.replace('\r\n', '\n').replace('\r', '\n')
    elif linesep == '\r': # classic Macintosh
        s = s.replace('\r\n', '\r').replace('\n', '\r')
    elif linesep == '\r\n': # Windows
        s = s.replace('\r\n', '\r').replace('\n', '\r')
        s = s.replace('\r', '\r\n')
    else: # weird platforms?
        print("Unknown line separator, skipping.")
    return s
--
Steven.
On Thu, 23 Feb 2006 17:41:47 +0000, Martin Franklin
<mf********@gatwick.westerngeco.slb.com> wrote:
> how about one of these variations
> print('erewr \r\nafjdskl '.replace(" ", ""))
> print('erewr \r\nafjdskl '.strip(" \t"))

Version 1:
>>> w = 'erewr \r\nafjdskl '.replace(" ", "")
>>> w
'erewr\r\nafjdskl'
>>> w = 'erewr \nafjdskl '.replace(" ", "")
>>> w
'erewr\nafjdskl'
>>> w = 'word1 word2 \nafjdskl '.replace(" ", "")
>>> w
'word1word2\nafjdskl'

It replaces all spaces, not only the trailing whitespace.

Version 2:
>>> w = 'erewr \r\nafjdskl '.strip(" \t")
>>> w
'erewr \r\nafjdskl'
>>> w = 'erewr \nafjdskl '.strip(" \t")
>>> w
'erewr \nafjdskl'
>>> w = 'word1 word2 \nafjdskl '.strip(" \t")
>>> w
'word1 word2 \nafjdskl'

I found a solution (not the most beautiful, but good enough
for my purpose).
Given: the file has no mixed line endings, so it is either
a DOS or a Unix file (Mac line endings are not handled).

swin = "erewr \r\nafjdskl "
sunix = "erewr \nafjdskl "

DOS line endings (at least one '\r' included)?
r is the contents of a file:

helpchar = ''
if r.find('\r') != -1:
    helpchar = '\r'
retrailingwhitespacelf = re.compile(r'(?<=\S)[ \t' + helpchar + r']+$',
                                    re.MULTILINE)
# use the pattern compiled just above (with the optional '\r' in the class)
newtext, n = retrailingwhitespacelf.subn(helpchar, r)
if n > 1:
    r = newtext
--
Franz Steinhaeusler
On Fri, 24 Feb 2006 12:36:01 +0100, Franz Steinhaeusler
<fr*****************@gmx.at> wrote:
> if n > 1:

if n > 0: (of course) :-)
--
Franz Steinhaeusler
Franz Steinhaeusler wrote:
> Thank you all for the replies. But I still don't have a solution.
> Of course it is possible with more lines, but it would be nice to have a "one-liner".

re.sub(r"\s+[\n\r]+", lambda x: x.expand(r"\g<0>"). \
    lstrip(" \t\f\v"), text).rstrip()

...where "text" is the unsplit block of text with mysterious line endings.
But I think your code is a lot easier to read. :)
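
For readers who find the one-liner hard to follow, here is the same idea unpacked into named steps, as a sketch for Python 3. `x.expand(r"\g<0>")` simply returns the whole match, so `match.group(0)` is used instead; the names `WS_THEN_EOL`, `_drop_leading_blanks` and `strip_line_ends` are made up for illustration.

```python
import re

# One or more whitespace characters followed by at least one '\r' or '\n'.
# Because \s also matches '\r' and '\n', a match covers the trailing blanks
# together with the line ending that follows them.
WS_THEN_EOL = re.compile(r"\s+[\n\r]+")

def _drop_leading_blanks(match):
    # Remove blanks, tabs, form feeds and vertical tabs from the front of
    # the match; the '\r' / '\n' characters that follow are kept.
    return match.group(0).lstrip(" \t\f\v")

def strip_line_ends(text):
    # The final rstrip() handles the last line, which has no line ending.
    # It also removes any newline at the very end of the text.
    return WS_THEN_EOL.sub(_drop_leading_blanks, text).rstrip()

print(repr(strip_line_ends("erewr \r\nafjdskl ")))
print(repr(strip_line_ends("erewr \nafjdskl ")))
```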
Franz Steinhaeusler <fr*****************@gmx.at> wrote:
> On Thu, 23 Feb 2006 13:54:50 +0000, Martin Franklin
>> why not use string methods strip, rstrip and lstrip
> because this removes only the last spaces,
> [given r = 'erewr \r\nafjdskl ']
> I want: 'erewr\r\nafjdskl'

os.linesep.join(l.rstrip() for l in r.split(os.linesep))
--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
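
Note that this one-liner relies on `os.linesep` matching the line ending actually used in the text. A sketch of the same split/join idea with the separator passed in explicitly (the helper name `rstrip_lines` is hypothetical) makes that dependence visible:

```python
def rstrip_lines(text, eol):
    """Strip trailing whitespace from each line of text, split on eol."""
    return eol.join(line.rstrip() for line in text.split(eol))

crlf_text = "erewr \r\nafjdskl "

# Separator matches the text: the line endings are preserved.
print(repr(rstrip_lines(crlf_text, "\r\n")))

# Separator does not match (for example os.linesep on Unix with a DOS file):
# the '\r' stays on each line, rstrip() removes it, and the result comes
# back with '\n' line endings instead.
print(repr(rstrip_lines(crlf_text, "\n")))
```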
On Fri, 24 Feb 2006 07:21:04 -0500, John Zenger
<jo*********@yahoo.com> wrote:
> re.sub(r"\s+[\n\r]+", lambda x: x.expand(r"\g<0>"). \
>     lstrip(" \t\f\v"), text).rstrip()
>
> ...where "text" is the unsplit block of text with mysterious line endings.
> But I think your code is a lot easier to read. :)

Hello John,
perfect, thank you,
but as you said, this is somehow not so easy to grasp.
(At least for me). :-)
--
Franz Steinhaeusler
On 24 Feb 2006 14:12:05 +0000 (GMT), Sion Arrowsmith
<si***@chiark.greenend.org.uk> wrote:
> os.linesep.join(l.rstrip() for l in r.split(os.linesep))

Hello Sion,
thank you, I like your solution the most!
(It is clean and one doesn't have to use re's.)
--
Franz Steinhaeusler