
searching through a string and pulling characters

This is similar to my last post, but a little different. Here is what I would
like to do.

Let's say I have a text file. The contents look like this, only with A
LOT more of the same thing.

() A registry mark given by underwriters (as at Lloyd's) to ships in
first-class condition. Inferior grades are indicated by A 2 and A 3.
() The first three letters of the alphabet, used for the whole alphabet.
() In church or chapel style; -- said of compositions sung in the old church
style, without instrumental accompaniment; as, a mass a capella, i. e., a
mass purely vocal.
() Astride; with a part on each side; -- used specif. in designating the
position of an army with the wings separated by some line of demarcation, as
a river or road.

Now, I am talking thousands of these. Here is what I need to do. I will
have a number, and I want to go through this text file, which is laid out
just like the example. The trick is this: those "()"s are what I need to
match, so if the number is 245 I need to find the 245th () and then get all
the text from after it until the next (). If you have an idea about the best
way to do this I would love your help. If you made it all the way through,
thanks! ;)
--
View this message in context: http://www.nabble.com/searching-thro...p19039594.html
Sent from the Python - python-list mailing list archive at Nabble.com.

Aug 18 '08 #1
12 Replies


On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
> Now, I am talking 1000's of these. I need to do something like this. I will
> have a number, and what I want to do is go through this text file, just like
> the example. The trick is this, those "()'s" are what I need to match, so if
> the number is 245 I need to find the 245th () and then get the all the text
> from after it until the next (). If you have an idea about the best way to
> do this I would love your help. If you made it all the way through thanks!
> ;)

findall comes to mind:
>>a="""(string1)
.... (string2)
.... (string3)
.... (string4)
.... (string5)
.... (string6)"""
>>import re
pat = re.compile("(\(.*?\))")
and now let's say you want to get fourth element:
>>pat.findall(a)[3]
'(string4)'

To save some memory use finditer (as long as you don't have to search
for too many of these):
>>> for i, m in enumerate(pat.finditer(a)):
...     if i == 2:
...         print(m.group())
...
(string3)
>>>
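
A small variation on the finditer idea, offered as a sketch rather than anything from the post itself: itertools.islice can skip straight to the nth match, so there is no counter to test and the loop stops as soon as the wanted match is found. The helper name nth_match is hypothetical, and n counts from 0 as in the examples above.

```python
import itertools
import re


def nth_match(pattern, text, n):
    """Return the nth match (counting from 0) of pattern in text, or None."""
    # finditer produces matches lazily; islice discards the first n of them
    # and next() takes the following one, so no list of matches is built.
    matches = re.finditer(pattern, text)
    return next(itertools.islice(matches, n, None), None)


a = """(string1)
(string2)
(string3)
(string4)
(string5)
(string6)"""

match = nth_match(r"\(.*?\)", a, 2)
if match is not None:
    print(match.group())  # prints (string3)
```

Because the matches are consumed lazily, asking for a small n on a huge file only scans as far as that match.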

--
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/
Aug 18 '08 #2

On Mon, 18 Aug 2008 21:43:43 +0000 (UTC), Wojtek Walczak wrote:
> On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
>> Now, I am talking 1000's of these. I need to do something like this. I will
>> have a number, and what I want to do is go through this text file, just like
>> the example. The trick is this, those "()'s" are what I need to match, so if
>> the number is 245 I need to find the 245th () and then get the all the text
>> from after it until the next (). If you have an idea about the best way to
>> do this I would love your help. If you made it all the way through thanks!
>> ;)
>
> findall comes to mind:

....forget it, I misread your post :)

--
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/
Aug 18 '08 #3

On Aug 19, 6:40 am, Alexnb <alexnbr...@gmail.com> wrote:
> This is similar to my last post,
Oh, goodie goodie goodie, I love guessing games!
> but a little different. Here is what I would
> like to do.
>
> Lets say I have a text file. The contents look like this, only there is A
> LOT of the same thing.
>
> () A registry mark given by underwriters (as at Lloyd's) to ships in
> first-class condition. Inferior grades are indicated by A 2 and A 3.
> () The first three letters of the alphabet, used for the whole alphabet.
> () In church or chapel style; -- said of compositions sung in the old church
> style, without instrumental accompaniment; as, a mass a capella, i. e., a
> mass purely vocal.
> () Astride; with a part on each side; -- used specif. in designating the
> position of an army with the wings separated by some line of demarcation, as
> a river or road.

This looks like the "values" part of an abbreviation/acronym
dictionary ... what has happened to the "keys" part (A1, ABC, AC, ?
astride?, ...)

Does "()" appear always at the start of a line (perhaps preceded by
some whitespace), or can it appear in the middle of a line?

Are you sure about "A 2" and "A 3"? I would have expected "A2" and
"A3". In other words, is the above an exact copy of some input or have
you re-typed it?

"()" is a strange way of delimiting things ...

OK, here's my guess: You have acquired a database with two tables.
Table K maps e.g. "ABC" to 2. Table V maps 2 to "The first three
letters of the alphabet, used for the whole alphabet." You have used
some utility or done "select '() ' + column2 from V.
>
> Now, I am talking 1000's of these. I need to do something like this. I will
> have a number, and what I want to do is go through this text file, just like
> the example. The trick is this, those "()'s" are what I need to match, so if
> the number is 245 I need to find the 245th () and then get the all the text
> from after it until the next (). If you have an idea about the best way to
> do this I would love your help.

The best way to do this is to write a small simple Python script. I
suggest that you try this, and if you have difficulties, post your
attempt here together with a lucid description of the perceived
problem.

However searching through a large file (how many Mb?) looking for the
nth occurrence of "()" doesn't sound like a good idea after about the
10th time you do it. Perhaps it might be worth the extra effort to
process the text file once and insert the results in a (say) SQLite
data base so that later you can do "select column2 from V where
column1 = 245".

A really silly question: You say "I will have a number" (e.g. 245);
what is the source or provenance of this ordinal? A random number
generator? Inscription on a ticket passed through a wicket? "select
column2 from K where column1 = 'A1'"? IOW, perhaps you may need to
consider the larger problem.

Cheers,
John
Aug 18 '08 #4


Okay, well the point of this program is to steal from the OS X built-in
dictionary. While most of the files are hidden, this one is not.
The "()" you saw actually looks like this: () only the []'s are <'s
and >'s, but the forum doesn't take kindly to HTML.

What you saw is exactly how it will always be (by that I mean
the "A 2", "A 3" thing).

The number is based on the word(s) the user types into my program: it
finds the position of that word in the list of words, then searches the
definitions document and goes to the nth definition. It probably won't
work, but that is the idea.
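
If the word list and the definitions file really are in the same order, a sketch like the following avoids hunting for the nth "()" on every lookup: pair the two lists once into a dict, then look words up directly, several at a time if needed. The helper names, the sample headwords, and the assumption that both lists line up one-to-one are all hypothetical.

```python
def build_lookup(words, definitions):
    """Pair word number n with definition number n in a dict."""
    # Hypothetical assumption: both lists are in the same order and the same length.
    if len(words) != len(definitions):
        raise ValueError("word list and definition list differ in length")
    return dict(zip(words, definitions))


def define_all(table, queries):
    """Look up several words in one go; unknown words map to None."""
    return {word: table.get(word) for word in queries}


words = ["A1", "ABC"]  # hypothetical headwords
definitions = [
    "A registry mark given by underwriters (as at Lloyd's) to ships in "
    "first-class condition.",
    "The first three letters of the alphabet, used for the whole alphabet.",
]

table = build_lookup(words, definitions)
for word, meaning in define_all(table, ["ABC", "zebra"]).items():
    print(word, "->", meaning)
```

A dict lookup takes roughly constant time on average, whereas words.index(word) scans the list from the start on every call.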

Also, on a side note, does anyone know a very simple dictionary site other
than dictionary.com or yourdictionary.com? Or a free dictionary that I can
download to have an offline reference?


Aug 18 '08 #5

On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
> The number is based on the word(s) they type into my program, and then it
> fetches the number that word is in the list of words and then will search
> the definitions document and go to the nth def. It probably won't work, but
> that is the Idea.

Consider (1) an existing (free) dictionary application (2) using a
database, if you feel you must write your own application.

> Also, on a side-note, does anyone know a very simple dictionary site, that
> isn't dictionary.com or yourdictionary.com. Or, a free dictionary that I can
> download to have an offline reference?

What happened when you did:

Aug 18 '08 #6

On Mon, 18 Aug 2008 13:40:13 -0700, Alexnb wrote:
> Lets say I have a text file. The contents look like this, only there is
> A LOT of the same thing.
>
> () A registry mark given by underwriters (as at Lloyd's) to ships in
> first-class condition. Inferior grades are indicated by A 2 and A 3.
> () The first three letters of the alphabet, used for the whole alphabet.
> () In church or chapel style; -- said of compositions sung in the old
> church style, without instrumental accompaniment; as, a mass a capella,
> i. e., a mass purely vocal.
> () Astride; with a part on each side; -- used specif. in designating the
> position of an army with the wings separated by some line of
> demarcation, as a river or road.
>
> Now, I am talking 1000's of these. I need to do something like this. I
> will have a number, and what I want to do is go through this text file,
> just like the example. The trick is this, those "()'s" are what I need
> to match, so if the number is 245 I need to find the 245th () and then
> get the all the text from after it until the next (). If you have an
> idea about the best way to do this I would love your help. If you made
> it all the way through thanks! ;)

If I take your description of the problem literally, then the solution is:

text = "() A registry mark given ..." # lots and lots of text
blocks = text.split( "()" ) # use a literal "()" as a delimiter
answer = blocks[n] # whichever number you want, starting counting at 0
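Taking the same literal reading, a one-off lookup does not need to split the whole text: repeated str.find calls can locate the nth "()" and the one after it, and only that slice is copied. This is a hypothetical helper (nth_definition), counting from 1 so that n = 245 means the 245th "()", and it returns None when there are fewer than n delimiters.

```python
def nth_definition(text, n, delim="()"):
    """Return the text between the nth delimiter (counting from 1) and the next."""
    if n < 1:
        raise ValueError("n counts from 1")
    pos = -len(delim)  # so the first search starts at index 0
    for _ in range(n):
        pos = text.find(delim, pos + len(delim))
        if pos == -1:
            return None  # fewer than n delimiters in the text
    start = pos + len(delim)
    end = text.find(delim, start)
    if end == -1:  # the last definition runs to the end of the text
        end = len(text)
    return text[start:end].strip()


sample = (
    "() A registry mark given by underwriters (as at Lloyd's) to ships in\n"
    "first-class condition. Inferior grades are indicated by A 2 and A 3.\n"
    "() The first three letters of the alphabet, used for the whole alphabet.\n"
)
print(nth_definition(sample, 2))
```

Note that "(as at Lloyd's)" is not mistaken for a delimiter, because only the literal two-character sequence "()" is searched for.
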
I suspect that the problem is more complicated than you are saying. I
guess that in your actual data, the brackets () probably have something
inside them. It looks like you are quoting definitions from a dictionary.

Alex, a word of advice for you: we really don't like playing guessing
games. If you get a reputation for describing your problem inaccurately,
incompletely or cryptically, you will find fewer and fewer people willing
to answer your questions. I recommend that you spend a few minutes now
reading this page and save yourself a lot of grief later:

http://www.catb.org/~esr/faqs/smart-questions.html

Now, back to your problem. If my guess is right, and the brackets
actually have text inside them, then my simple solution above will not
work. You will need a more complicated solution using a regular
expression or a parser. That solution will depend on whether or not you
can get nested brackets "(ab (123 (fee fi fum) 456) cd ef)" or arbitrary
single brackets without the matching pair.

Your question also sounds suspiciously like homework. I don't do people's
homework, but here's something to get you started. It's not a solution,
but it can be used as the first step towards a solution.

text = "() A registry mark given ..." # lots and lots of text
level = 0
blocks = []
for c in text: # process text one character at a time
if c == '(':
print "Found an opening bracket"
level += 1 # one deeper in brackets
elif c == ')':
level -= 1
if level < 0:
print "Found a close bracket without matching open bracket"
else:
print "Found a closing bracket"
else: # any other character
# here's where you do the real work
if level == 0:
print "Not inside a bracket"
blocks.append(c)
else:
print "Inside a bracket"
if level 0:
print "Missing close bracket"
text_minus_bracketed_words = ''.join(blocks)
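
For comparison, a sketch of the same depth-counting idea wrapped in a function (strip_bracketed is a hypothetical name): it collects the depth-0 characters without printing per character, and raises ValueError on unbalanced brackets instead of printing a warning. Unlike the loop above, it stops at the first stray close bracket.

```python
def strip_bracketed(text):
    """Remove (possibly nested) bracketed text, keeping only depth-0 characters."""
    depth = 0
    kept = []
    for c in text:
        if c == "(":
            depth += 1  # one level deeper
        elif c == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("close bracket without matching open bracket")
        elif depth == 0:
            kept.append(c)  # outside every bracket: keep it
    if depth > 0:
        raise ValueError("missing close bracket")
    return "".join(kept)


print(strip_bracketed("(ab (123 (fee fi fum) 456) cd ef) kept (x) text"))
```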

--
Steven
Aug 18 '08 #7

On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
> The number is based on the word(s) they type into my program, and then it
> fetches the number that word is in the list of words and then will search
> the definitions document and go to the nth def. It probably won't work, but
> that is the Idea.

Consider (1) an existing (free) dictionary application (2) using a
database, if you feel you must write your own application.

> Also, on a side-note, does anyone know a very simple dictionary site, that
> isn't dictionary.com or yourdictionary.com. Or, a free dictionary that I can
> download to have an offline reference?

There's this thing called google (http://www.google.com). It's an
example of a "web search engine". If you type (for example) "free
dictionary download" (without the quotes!) into the text box and then
click on the "Google Search" button, it will come back with a list of
web pages where those words appear (e.g. http://www.dicts.info/dictionaries.php)

HTH,
John
Aug 18 '08 #8


If by "What happened when you did:" you mean dictionary.com and
yourdictionary.com? Nothing, they work but screen scraping isn't medicore at
best. They both work fine (yourdictionary is better for screen scraping)
but. I want maybe an offline soloution. But the whole reason for the program
is that I can type in 20 words at one time, get them defined and formatted
and then save all from my app. So far, all is good, I just need an offline
soloution, or one from a database. You say a free dictionary program. But
how can I get definitions from another program w/o opening it? Anyway,
Ideas?

Aug 18 '08 #9

On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
> Okay, well the point of this program is to steal from the OS X built-in
> dictionary.

Ah, not homework, but copyright infringement.

> Also, on a side-note, does anyone know a very simple dictionary site,
> that isn't dictionary.com or yourdictionary.com. Or, a free dictionary
> that I can download to have an offline reference?

http://en.wiktionary.org/wiki/Wiktionary:Main_Page

Googling on "free dictionary OS X" comes up with 417,000 hits. I'm pretty
sure at least some of them will be relevant to what you want.

--
Steven
Aug 18 '08 #10

On 19 Aug, 01:11, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
>> Okay, well the point of this program is to steal from the OS X built-in
>> dictionary.
>
> Ah, not homework, but copyright infringement.

It depends what the inquirer is doing and what they mean by "steal".
Given the propaganda around "unauthorised" usage of content that is
pervasive these days ("you don't own that DVD: you just have our
temporary and conditional permission to watch it, pirate!"), the
inquirer may have been led to believe that just reading from a file on
their own system rather than using the nominated application is
somehow to "steal" from that file, even though it is content which has
presumably been obtained legitimately, even paid for in the case of OS
X.

Even if the end-user licence agreement were to attempt to wash away
any "fair use" (or just common sense) rights to using the content in
the way described by the inquirer - recalling that OS X is an Apple
product, so such games wouldn't be beneath that particular vendor - I
can't see how it does much good to dignify such antics with
unqualified cries of "copyright infringement". Indeed, for those not
acquainted with copyright and licensing, it probably just serves to
reinforce the dishonest message that they have to pay over and over
for content they already have and not to question what it is they're
paying for.

Paul
Aug 19 '08 #11

On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
> Also, on a side-note, does anyone know a very simple dictionary site, that
> isn't dictionary.com or yourdictionary.com.

This one is my favourite: http://www.lingro.com/
--
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/
Aug 19 '08 #12

On Aug 19, 6:11 am, Wojtek Walczak <gmin...@bzt.bzt> wrote:
> On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
>> Also, on a side-note, does anyone know a very simple dictionary site, that
>> isn't dictionary.com or yourdictionary.com.
>
> This one is my favourite: http://www.lingro.com/
>
> --
> Regards,
> Wojtek Walczak, http://tosh.pl/gminick/

That's hot!
Aug 20 '08 #13
