Bytes IT Community

Text Suffix to Prefix Conversion

Dear all,

I'm a postgraduate student in Hong Kong, studying English language. I
would like to ask for your help with some plain-text manipulation.

I have already used a software tagger to add part-of-speech (POS) tags
in angle brackets right after every word in my file, as an attribute.
How can I change each tag from a suffix into a prefix?

Original Sentence: An apple for you.
Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.

My file contains several hundred thousand words, so manual editing is
not possible.

All suggestions are welcome!

EMC ROY
19/04/2007

Apr 19 '07 #1
3 Replies


EMC ROY wrote:
> Original Sentence: An apple for you.
> Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
> Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.

>>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>>> import re
>>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
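
For a whole file, the same substitution could be applied one line at a time. What follows is only a minimal sketch in Python 3 syntax, assuming every token is immediately followed by its tag and tokens are separated by whitespace. The names suffix_to_prefix and convert_lines are made up for illustration, and StringIO objects stand in for the real input and output files.

```python
import io
import re

# word, its <TAG>, and any trailing whitespace
TAGGED = r'(\S+)(<[^>]+>)(\s*)'

def suffix_to_prefix(line):
    """Move each <TAG> from after its word to before it."""
    return re.sub(TAGGED, r'\2\1\3', line)

def convert_lines(src, dst):
    """Copy src to dst line by line, converting every tagged token."""
    for line in src:
        dst.write(suffix_to_prefix(line))

# StringIO objects stand in for files opened with open(...)
src = io.StringIO('An<AT0> apple<NN1> for<PRP> you<PNP> .<.>\n')
dst = io.StringIO()
convert_lines(src, dst)
print(dst.getvalue(), end='')
```

Reading line by line keeps memory use small, which should matter for a file of several hundred thousand words.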
Apr 19 '07 #2

On Apr 18, 11:08 pm, Steven Bethard <steven.beth...@gmail.com> wrote:
> EMC ROY wrote:
> > Original Sentence: An apple for you.
> > Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
> > Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
>
> >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> >>> import re
> >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
> '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

If you end up calling re.sub() repeatedly, e.g. for each line in your
file, then you should "compile" the regular expression so that Python
doesn't have to recompile it for every call:

import re

text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
re.sub(myR, r'\2\1\3', text)

Unfortunately, I must be doing something wrong because I can't get
that code to work. When I run it, I get the error:

Traceback (most recent call last):
  File "2pythontest.py", line 3, in ?
    myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 225, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_compile.py", line 496, in compile
    p = sre_parse.parse(p, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 668, in parse
    p = _parse_sub(source, pattern, 0)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 308, in _parse_sub
    itemsappend(_parse(source, state))
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 396, in _parse
    if state.flags & SRE_FLAG_VERBOSE:
TypeError: unsupported operand type(s) for &: 'str' and 'int'

Yet, these two examples work without error:

------
import re

text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
#myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
print re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)

myR = re.compile(r'(hello)')
text = "hello world"
print re.sub(myR, r"\1XXX", text)

---------output:
<AT0>An <NN1>apple <PRP>for <PNP>you <.>.
helloXXX world

Can anyone help?


Apr 19 '07 #3

7stud wrote:
> On Apr 18, 11:08 pm, Steven Bethard <steven.beth...@gmail.com> wrote:
> > EMC ROY wrote:
> > > Original Sentence: An apple for you.
> > > Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
> > > Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
> >
> > >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> > >>> import re
> > >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
> > '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
>
> If you end up calling re.sub() repeatedly, e.g. for each line in your
> file, then you should "compile" the regular expression so that Python
> doesn't have to recompile it for every call:
>
> import re
>
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')

re.compile() doesn't accept a replacement pattern:

"""
Help on function compile in module re:

compile(pattern, flags=0)
Compile a regular expression pattern, returning a pattern object.
"""

> re.sub(myR, r'\2\1\3', text)
> Can anyone help?

You can precompile the regular expression like this:

>>> import re
>>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>>> r = re.compile(r'(\S+)(<[^>]+>)(\s*)')
>>> r.sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

or even

>>> sub = re.compile(r'(\S+)(<[^>]+>)(\s*)').sub
>>> sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
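
Spelled out as a script, the division of labour looks roughly like this: re.compile() takes only the pattern, plus an optional flags value such as re.IGNORECASE, and the replacement string goes to the compiled object's sub() method. A small sketch in Python 3 syntax; the sample string is made up for illustration.

```python
import re

# The optional second argument of re.compile() is a flags value,
# not a replacement string.
pattern = re.compile(r'(hello)', re.IGNORECASE)

# The replacement is passed to sub(); \1 refers to the first group.
result = pattern.sub(r'\1XXX', 'Hello world')
print(result)
```

Passing a replacement string in the flags position is what leads to the TypeError in the traceback quoted earlier in the thread.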

Note that the gain in efficiency is smaller than you might think, since
re.sub() and the other re functions look up already-compiled regexps in
a cache.
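
One way to check this on your own machine is to time both forms. The following is only a rough sketch using timeit, in Python 3 syntax; the function names and the repeat count are arbitrary choices, and the timings will vary from system to system.

```python
import re
import timeit

text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
pattern = r'(\S+)(<[^>]+>)(\s*)'
compiled = re.compile(pattern)

def with_module_function():
    # re.sub() finds the already-compiled pattern in the module's cache
    return re.sub(pattern, r'\2\1\3', text)

def with_compiled_pattern():
    # calling sub() on the compiled object skips that cache lookup
    return compiled.sub(r'\2\1\3', text)

# Both forms give the same result; only the per-call overhead differs.
assert with_module_function() == with_compiled_pattern()

for func in (with_module_function, with_compiled_pattern):
    seconds = timeit.timeit(func, number=10000)
    print(func.__name__, seconds)
```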

Peter
Apr 19 '07 #4
