
Insert characters into string based on re ?

I am attempting to reformat a string, inserting newlines before certain
phrases. For example, in formatting SQL, I want to start a new line at
each JOIN condition. Noting that strings are immutable, I thought it
best to split the string at the key points, then join with '\n'.

Regexps seem the best way to identify the points in the string
('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
to identify multiple locations in the string. However, the re.split
method returns the list without the split phrases, and re.findall does
not seem useful for this operation.

Suggestions?

Oct 12 '06 #1
7 Replies


Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
>
> Regexps seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases

Not without some minor effort on your part :-)
See below.

> and re.findall does
> not seem useful for this operation.
>
> Suggestions?

Read the fine manual:
"""
split( pattern, string[, maxsplit = 0])

Split string by the occurrences of pattern. If capturing parentheses
are used in pattern, then the text of all groups in the pattern are
also returned as part of the resulting list. If maxsplit is nonzero, at
most maxsplit splits occur, and the remainder of the string is returned
as the final element of the list. (Incompatibility note: in the
original Python 1.5 release, maxsplit was ignored. This has been fixed
in later releases.)
>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

# Now see what happens when you use capturing parentheses:
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
"""

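Applied to the original question, the capturing-group behaviour quoted above could look roughly like the sketch below. The function name break_before_joins and the exact pattern are assumptions made for illustration only; real SQL has more JOIN variants than this covers, and the pattern knows nothing about string literals.

```python
import re

# One capturing group around the whole JOIN phrase, so re.split keeps it.
# The pattern is an illustrative assumption, not a complete SQL grammar.
JOIN_RE = re.compile(r'(\b(?:LEFT|RIGHT|INNER)(?:\s+OUTER)?\s+JOIN\b)',
                     re.IGNORECASE)

def break_before_joins(sql):
    """Return sql with a newline inserted before each matched JOIN phrase."""
    # With one group, the result alternates: text, phrase, text, phrase, ...
    parts = JOIN_RE.split(sql)
    lines = [parts[0].rstrip()]
    for i in range(1, len(parts), 2):
        # Glue each phrase back onto the text that followed it.
        lines.append(parts[i] + parts[i + 1].rstrip())
    return '\n'.join(lines)

print(break_before_joins(
    'select * from a LEFT OUTER JOIN b on a.id = b.id '
    'left join c on c.id = a.id'))
```

Note that rstrip() also drops the whitespace that preceded each phrase, so the new line does not end in a stray blank.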
HTH,
John

Oct 12 '06 #2

Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
>
> Regexps seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases, and re.findall does
> not seem useful for this operation.
>
> Suggestions?

I think that re.sub is a more appropriate method than split and
join.

Trivial example (non-SQL):
>>> import re
>>> addnlre = re.compile(r'LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN', re.DOTALL | re.IGNORECASE).sub
>>> addnlre(lambda x: x.group() + '\n', '... LEFT JOIN x RIGHT OUTER join y')
'... LEFT JOIN\n x RIGHT OUTER join\n y'
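Since the original request was for the newline before the phrase, a variant of the same re.sub idea could use a backreference in the replacement string instead of a lambda. This is only a sketch: the name newline_before_joins and the tightened pattern are assumptions of mine, not part of the example above.

```python
import re

# \s* swallows the blank in front of the phrase so the previous line does
# not end in trailing whitespace; group 1 is the JOIN phrase itself.
JOIN_RE = re.compile(r'\s*\b((?:LEFT|RIGHT)(?:\s+OUTER)?\s+JOIN)\b',
                     re.IGNORECASE)

def newline_before_joins(text):
    # In a replacement string, \n is a newline and \1 is the first group.
    return JOIN_RE.sub(r'\n\1', text)

print(newline_before_joins('... LEFT JOIN x RIGHT OUTER join y'))
```

The original spelling and case of each phrase are preserved because the replacement reuses the matched text rather than a fixed string.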

Oct 13 '06 #3

ha***********@informa.com wrote:
> Matt wrote:
>> I am attempting to reformat a string, inserting newlines before
>> certain phrases. For example, in formatting SQL, I want to start a
>> new line at each JOIN condition. Noting that strings are immutable, I
>> thought it best to split the string at the key points, then join
>> with '\n'.
>
> I think that re.sub is a more appropriate method rather than split and
> join
>
> trivial example (non SQL):
> >>> addnlre = re.compile(r'LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN', re.DOTALL | re.IGNORECASE).sub
> >>> addnlre(lambda x: x.group() + '\n', '... LEFT JOIN x RIGHT OUTER join y')
> '... LEFT JOIN\n x RIGHT OUTER join\n y'

Quite apart from the original requirement being to insert newlines before
rather than after the phrase, I wouldn't have said re.sub was appropriate.
>>> addnlre(lambda x: x.group() + '\n',
... "select * from whatever where action in ['user left site', 'user joined site']")
"select * from whatever where action in ['user left site', 'user join\ned site']"

or with the newline before the pattern:
>>> addnlre(lambda x: '\n'+x.group(),
... "select * from whatever where action in ['user left site', 'user joined site']")
"select * from whatever where action in ['user \nleft site', 'user joined site']"

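The false positive above comes from the lazy '.*?' together with the missing word boundaries. The sketch below compares the loose pattern with a tighter one; the names LOOSE and TIGHT and the tighter pattern are my own assumptions for illustration. The tighter pattern avoids this particular accident, but it still cannot tell SQL keywords from text inside a string literal.

```python
import re

# The pattern from the earlier reply: any text may sit between LEFT and JOIN.
LOOSE = re.compile(r'LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN',
                   re.DOTALL | re.IGNORECASE)

# Assumed tighter alternative: whole words only, optional OUTER in between.
TIGHT = re.compile(r'\b(?:LEFT|RIGHT)(?:\s+OUTER)?\s+JOIN\b', re.IGNORECASE)

sql = ("select * from whatever where action in "
       "['user left site', 'user joined site']")

loose_hits = [m.group() for m in LOOSE.finditer(sql)]
tight_hits = [m.group() for m in TIGHT.finditer(sql)]

print(loose_hits)   # the span running from 'left' to the 'join' in 'joined'
print(tight_hits)   # nothing matched
print(TIGHT.findall('a LEFT OUTER JOIN b'))
```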
Oct 13 '06 #4

Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
>
> Regexps seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases, and re.findall does
> not seem useful for this operation.
>
> Suggestions?

Matt,

You may want to try this solution:
>>> import SE
>>> Formatter = SE.SE (' "~(?i)(left|inner|right|outer).*join~=\n=" ')
# Details explained below the dotted line
>>> print Formatter ('select id, people.* from ids left outer join people where ...\nSELECT name, job from people INNER JOIN jobs WHERE ....;\n')
select id, people.* from ids
left outer join people where ...
SELECT name, job from people
INNER JOIN jobs where ...;

You may add other substitutions as required one by one, interactively
tweaking each one until it does what it is supposed to do:
>>> Formatter = SE.SE ('''
"~(?i)(left|inner|right|outer).*join~=\n =" # Add an indentation
"where=\n where" "WHERE=\n WHERE" # Add a newline also before 'where'
";\n=;\n\n" # Add an extra line feed
"\n=;\n\n" # And add any missing semicolon
# etc.
''')
>>> print Formatter ('select id, people.* from ids left outer join people where ...\nSELECT name, job from people INNER JOIN jobs WHERE ....;\n')
select id, people.* from ids
left outer join people
where ...;

SELECT name, job from people
INNER JOIN jobs
WHERE ...;
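For readers without SE installed, the "add substitutions one by one" workflow can be approximated with the standard re module. The sketch below is not SE's implementation and makes no claim about how SE works internally; the names RULES and apply_rules are hypothetical, and unlike the definitions above the rules here run as separate sequential passes.

```python
import re

# Each rule is (compiled pattern, replacement). Add rules one at a time and
# re-run on sample input until the output looks right.
RULES = [
    # Newline plus two-space indent before a JOIN phrase.
    (re.compile(r'\s*\b((?:LEFT|RIGHT|INNER)(?:\s+OUTER)?\s+JOIN)\b',
                re.IGNORECASE), r'\n  \1'),
    # Same treatment for WHERE, keeping its original case.
    (re.compile(r'\s*\b(WHERE)\b', re.IGNORECASE), r'\n  \1'),
]

def apply_rules(sql, rules=RULES):
    # Apply every rule in order; later rules see earlier rules' output.
    for pattern, replacement in rules:
        sql = pattern.sub(replacement, sql)
    return sql

print(apply_rules(
    'select id, people.* from ids left outer join people where ...'))
```

Because the passes are sequential, rule order matters: a later rule can match text that an earlier rule inserted.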
http://cheeseshop.python.org/pypi?:a...SE&version=2.3
Frederic
----------------------------------------------------------------------------------------------------------------------

The anatomy of a replacement definition
>>> Formatter = SE.SE (' "~(?i)(left|inner|right|outer).*join~=\n=" ')

target=substitute    (first '=')
=                    (each following '=' stands for matched target)
~ ~                  (contain regular expression)
" "                  (contain definition containing white space)

Oct 14 '06 #5

Frederic Rentsch wrote:
> Matt wrote:
>> I am attempting to reformat a string, inserting newlines before certain
>> phrases. For example, in formatting SQL, I want to start a new line at
>> each JOIN condition. Noting that strings are immutable, I thought it
>> best to split the string at the key points, then join with '\n'.
>>
>> Regexps seem the best way to identify the points in the string
>> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
>> to identify multiple locations in the string. However, the re.split
>> method returns the list without the split phrases, and re.findall does
>> not seem useful for this operation.
>>
>> Suggestions?
>
> Matt,
>
> You may want to try this solution:
> >>> import SE
> .... snip
>
> http://cheeseshop.python.org/pypi?:a...SE&version=2.3

For reasons unknown, the new download for SE is on the old page:
http://cheeseshop.python.org/pypi/SE/2.2%20beta.

Frederic
Oct 15 '06 #6

Hi,
initially I had the same idea, before I started writing a SQL formatter.
I was sure that coding a few "change" commands in a script would
reformat my SQL statements. But step by step I realized that SQL
statements cannot be formatted safely with regular expressions alone.
Why not? Because there is a risk that you change, e.g., values in
literals, and that changes the result of the query!
Example:

--Select pieces where status like "Join with master piece"

Inserting line breaks before joins using a "change" command would
change the SQL statement into

--Select pieces where status like "\nJoin with master piece"

The new select statement no longer works the same way as the
original one.
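One common way to reduce this particular risk, short of writing a full parser, is to let the regular expression match quoted literals as well and copy them through unchanged. The sketch below is my own illustration, not the formatter described in this post: the names TOKEN_RE and format_joins are hypothetical, it assumes quotes are escaped by doubling, and it ignores comments and everything else a real SQL tokenizer has to handle.

```python
import re

SINGLE = r"'(?:[^']|'')*'"      # 'text' with '' as an escaped quote
DOUBLE = r'"(?:[^"]|"")*"'      # "text" with "" as an escaped quote
JOIN = r'\s*\b((?:(?:LEFT|RIGHT|INNER)(?:\s+OUTER)?\s+)?JOIN)\b'

# Literals come first in the alternation, so at an opening quote the whole
# literal is consumed before the JOIN alternative can look inside it.
TOKEN_RE = re.compile('(%s|%s)|%s' % (SINGLE, DOUBLE, JOIN), re.IGNORECASE)

def _replace(match):
    if match.group(1) is not None:
        return match.group(1)        # quoted literal: leave untouched
    return '\n' + match.group(2)     # JOIN phrase: start a new line

def format_joins(sql):
    return TOKEN_RE.sub(_replace, sql)

print(format_joins('Select pieces where status like "Join with master piece"'))
print(format_joins("select * from a left join b where note = 'inner join'"))
```

With the example above, the literal "Join with master piece" passes through unchanged, while a JOIN outside any literal still gets its line break.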

In the meantime, the "script" has about 80 pages of code .....

Regards
GuidoMarcel

Oct 19 '06 #7


You can test it here: http://www.sqlinform.com

Oct 27 '06 #8
