searching through a string and pulling characters


This is similar to my last post, but a little different. Here is what I would
like to do.

Let's say I have a text file. The contents look like this, only there is A
LOT of the same thing.

() A registry mark given by underwriters (as at Lloyd's) to ships in
first-class condition. Inferior grades are indicated by A 2 and A 3.
() The first three letters of the alphabet, used for the whole alphabet.
() In church or chapel style; -- said of compositions sung in the old church
style, without instrumental accompaniment; as, a mass a capella, i. e., a
mass purely vocal.
() Astride; with a part on each side; -- used specif. in designating the
position of an army with the wings separated by some line of demarcation, as
a river or road.

Now, I am talking thousands of these. Here is what I need to do. I will
have a number, and I want to go through this text file, which looks just
like the example. The trick is this: those "()"s are what I need to match,
so if the number is 245 I need to find the 245th () and then get all the
text from after it until the next (). If you have an idea about the best
way to do this I would love your help. If you made it all the way through,
thanks! ;)
--
View this message in context: http://www.nabble.com/searching-thro...p19039594.html
Sent from the Python - python-list mailing list archive at Nabble.com.

Aug 18 '08 #1
On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
> Now, I am talking thousands of these. Here is what I need to do. I will
> have a number, and I want to go through this text file, which looks just
> like the example. The trick is this: those "()"s are what I need to match,
> so if the number is 245 I need to find the 245th () and then get all the
> text from after it until the next (). If you have an idea about the best
> way to do this I would love your help. If you made it all the way through,
> thanks! ;)

findall comes to mind:
>>> a = """(string1)
... (string2)
... (string3)
... (string4)
... (string5)
... (string6)"""
>>> import re
>>> pat = re.compile(r"(\(.*?\))")

and now let's say you want to get the fourth element:

>>> pat.findall(a)[3]
'(string4)'

To save some memory, use finditer (as long as you don't have to search
for too many of these):

>>> for i in enumerate(pat.finditer(a)):
...     if i[0] == 2:
...         print(i[1].group())
...
(string3)
>>>
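As a hedged variation on the finditer loop above: `itertools.islice` can jump straight to the n-th match without the manual `enumerate` counter. The loop above never breaks, so it keeps scanning after printing; `islice` stops as soon as it has taken one match. `nth_match` is a hypothetical helper name, and like the session above it assumes the entries are bracketed text such as `(string3)`.

```python
import itertools
import re

# Same pattern as in the session above, as a raw string so "\(" reaches re intact.
pat = re.compile(r"(\(.*?\))")

a = """(string1)
(string2)
(string3)
(string4)
(string5)
(string6)"""


def nth_match(pattern, text, n):
    """Return the text of the n-th match (counting from 0), or None if there are fewer.

    islice() skips the first n matches produced by finditer() and stops after
    taking one, so the remaining text is never scanned.
    """
    match = next(itertools.islice(pattern.finditer(text), n, None), None)
    return match.group() if match is not None else None


print(nth_match(pat, a, 2))   # (string3)
print(nth_match(pat, a, 10))  # None
```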

--
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/
Aug 18 '08 #2
On Mon, 18 Aug 2008 21:43:43 +0000 (UTC), Wojtek Walczak wrote:
> On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
>> Now, I am talking thousands of these. Here is what I need to do. I will
>> have a number, and I want to go through this text file, which looks just
>> like the example. The trick is this: those "()"s are what I need to match,
>> so if the number is 245 I need to find the 245th () and then get all the
>> text from after it until the next (). If you have an idea about the best
>> way to do this I would love your help. If you made it all the way through,
>> thanks! ;)
>
> findall comes to mind:

...forget it, I misread your post :)

--
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/
Aug 18 '08 #3
On Aug 19, 6:40 am, Alexnb <alexnbr...@gmail.com> wrote:
> This is similar to my last post,

Oh, goodie goodie goodie, I love guessing games!

> but a little different. Here is what I would
> like to do.
>
> Let's say I have a text file. The contents look like this, only there is A
> LOT of the same thing.
>
> () A registry mark given by underwriters (as at Lloyd's) to ships in
> first-class condition. Inferior grades are indicated by A 2 and A 3.
> () The first three letters of the alphabet, used for the whole alphabet.
> () In church or chapel style; -- said of compositions sung in the old church
> style, without instrumental accompaniment; as, a mass a capella, i. e., a
> mass purely vocal.
> () Astride; with a part on each side; -- used specif. in designating the
> position of an army with the wings separated by some line of demarcation, as
> a river or road.

This looks like the "values" part of an abbreviation/acronym
dictionary ... what has happened to the "keys" part (A1, ABC, AC, ?
astride?, ...)

Does "()" appear always at the start of a line (perhaps preceded by
some whitespace), or can it appear in the middle of a line?

Are you sure about "A 2" and "A 3"? I would have expected "A2" and
"A3". In other words, is the above an exact copy of some input or have
you re-typed it?

"()" is a strange way of delimiting things ...

OK, here's my guess: You have acquired a database with two tables.
Table K maps e.g. "ABC" to 2. Table V maps 2 to "The first three
letters of the alphabet, used for the whole alphabet." You have used
some utility or done "select '() ' + column2 from V".

> Now, I am talking thousands of these. Here is what I need to do. I will
> have a number, and I want to go through this text file, which looks just
> like the example. The trick is this: those "()"s are what I need to match,
> so if the number is 245 I need to find the 245th () and then get all the
> text from after it until the next (). If you have an idea about the best
> way to do this I would love your help.

The best way to do this is to write a small simple Python script. I
suggest that you try this, and if you have difficulties, post your
attempt here together with a lucid description of the perceived
problem.

However, searching through a large file (how many MB?) looking for the
nth occurrence of "()" doesn't sound like a good idea after about the
10th time you do it. Perhaps it might be worth the extra effort to
process the text file once and insert the results in a (say) SQLite
database so that later you can do "select column2 from V where
column1 = 245".
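A minimal sketch of the one-off import John suggests, using Python's built-in `sqlite3` module. The table and column names `v`, `column1` and `column2` simply mirror his hypothetical `select` statement, and the short `definitions` list stands in for entries already split out of the file; none of it comes from the original poster's data.

```python
import sqlite3

# Assumption: the entries have already been pulled out of the text file in
# file order (for instance with text.split("()")[1:]). These three strings
# are abbreviated from the sample quoted in this thread.
definitions = [
    "A registry mark given by underwriters (as at Lloyd's) to ships in first-class condition.",
    "The first three letters of the alphabet, used for the whole alphabet.",
    "In church or chapel style.",
]

# ":memory:" keeps the example self-contained; pass a file name instead so the
# one-off import survives between runs of the program.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (column1 INTEGER PRIMARY KEY, column2 TEXT)")
# Number entries from 1, so column1 = 245 is the 245th "()" entry.
conn.executemany(
    "INSERT INTO v (column1, column2) VALUES (?, ?)",
    enumerate(definitions, start=1),
)
conn.commit()


def lookup(n):
    """Return entry number n, or None if there is no such entry."""
    row = conn.execute("SELECT column2 FROM v WHERE column1 = ?", (n,)).fetchone()
    return row[0] if row is not None else None


print(lookup(2))    # The first three letters of the alphabet, used for the whole alphabet.
print(lookup(245))  # None (only three sample entries here)
```

After the import, each lookup is a single indexed query instead of a fresh scan of the whole file.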

A really silly question: You say "I will have a number" (e.g. 245);
what is the source or provenance of this ordinal? A random number
generator? Inscription on a ticket passed through a wicket? "select
column2 from K where column1 = 'A1'"? IOW, perhaps you may need to
consider the larger problem.

Cheers,
John
Aug 18 '08 #4

Okay, well the point of this program is to steal from the OS X built-in
dictionary. While most of the files are hidden, this one is not.
The "()" you saw actually looks like this: () only the []'s are <'s
and >'s, but the forum doesn't take kindly to HTML.

What you saw is exactly how it will always be (by that I mean the
"A 2" / "A 3" thing).

The number is based on the word(s) they type into my program; then it
fetches the position of that word in the list of words, searches the
definitions document, and goes to the nth definition. It probably won't
work, but that is the idea.
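If the word list and the definitions document really are in the same order (an assumption; the thread never confirms it), the "find the word's position, then find the nth definition" steps can collapse into one dictionary built at start-up. `build_index` and `define_all` are hypothetical names, and the keys are taken from John's guess at the "keys" part, not from the actual OS X files.

```python
# Assumption: words[i] is defined by definitions[i]. The keys are hypothetical,
# borrowed from John's guess earlier in the thread.
words = ["A1", "ABC", "AC"]
definitions = [
    "A registry mark given by underwriters (as at Lloyd's) to ships in first-class condition.",
    "The first three letters of the alphabet, used for the whole alphabet.",
    "In church or chapel style.",
]


def build_index(words, definitions):
    """Pair every word with its definition once, instead of re-scanning per lookup."""
    if len(words) != len(definitions):
        raise ValueError("word list and definitions are out of step")
    return dict(zip(words, definitions))


def define_all(query, index):
    """Look up several words at once; unknown words map to None."""
    return {word: index.get(word) for word in query}


index = build_index(words, definitions)
print(define_all(["ABC", "XYZ"], index))
```

`define_all` also fits the later goal of defining 20 words in one go: each lookup is a dictionary access rather than a pass over the file.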

Also, on a side note, does anyone know a very simple dictionary site that
isn't dictionary.com or yourdictionary.com? Or a free dictionary that I can
download to have an offline reference?

--
http://mail.python.org/mailman/listinfo/python-list


Aug 18 '08 #5
On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
> The number is based on the word(s) they type into my program; then it
> fetches the position of that word in the list of words, searches the
> definitions document, and goes to the nth definition. It probably won't
> work, but that is the idea.

Consider (1) an existing (free) dictionary application or (2) using a
database, if you feel you must write your own application.

> Also, on a side note, does anyone know a very simple dictionary site that
> isn't dictionary.com or yourdictionary.com? Or a free dictionary that I can
> download to have an offline reference?

What happened when you did:

Aug 18 '08 #6
On Mon, 18 Aug 2008 13:40:13 -0700, Alexnb wrote:
> Let's say I have a text file. The contents look like this, only there is
> A LOT of the same thing.
>
> () A registry mark given by underwriters (as at Lloyd's) to ships in
> first-class condition. Inferior grades are indicated by A 2 and A 3.
> () The first three letters of the alphabet, used for the whole alphabet.
> () In church or chapel style; -- said of compositions sung in the old
> church style, without instrumental accompaniment; as, a mass a capella,
> i. e., a mass purely vocal.
> () Astride; with a part on each side; -- used specif. in designating the
> position of an army with the wings separated by some line of
> demarcation, as a river or road.
>
> Now, I am talking thousands of these. Here is what I need to do. I
> will have a number, and I want to go through this text file, which looks
> just like the example. The trick is this: those "()"s are what I need
> to match, so if the number is 245 I need to find the 245th () and then
> get all the text from after it until the next (). If you have an
> idea about the best way to do this I would love your help. If you made
> it all the way through, thanks! ;)

If I take your description of the problem literally, then the solution is:

text = "() A registry mark given ..."  # lots and lots of text
blocks = text.split("()")  # use a literal "()" as a delimiter
n = 245  # whichever entry you want
# blocks[0] is whatever comes before the first "()" (empty here),
# so blocks[n] is the text after the nth "()"
answer = blocks[n]

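Steven's literal reading, wrapped as a small function so the indexing is explicit. It assumes the delimiter really is the two characters `()` (so `(as at Lloyd's)` is left alone) and that the whole file fits comfortably in memory; `nth_definition` is a hypothetical name.

```python
text = """() A registry mark given by underwriters (as at Lloyd's) to ships in
first-class condition. Inferior grades are indicated by A 2 and A 3.
() The first three letters of the alphabet, used for the whole alphabet.
() In church or chapel style; -- said of compositions sung in the old church
style, without instrumental accompaniment; as, a mass a capella, i. e., a
mass purely vocal."""


def nth_definition(text, n):
    """Return the text after the n-th literal "()" (counting from 1), up to the next "()".

    text.split("()") puts whatever comes before the first "()" (an empty
    string here) at index 0, so the n-th entry is blocks[n].
    """
    blocks = text.split("()")
    if not 1 <= n < len(blocks):
        raise IndexError("asked for entry %d, but there are only %d" % (n, len(blocks) - 1))
    # Collapse the line breaks and surrounding spaces inside the entry.
    return " ".join(blocks[n].split())


print(nth_definition(text, 2))  # The first three letters of the alphabet, used for the whole alphabet.
```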
I suspect that the problem is more complicated than you are saying. I
guess that in your actual data, the brackets () probably have something
inside them. It looks like you are quoting definitions from a dictionary.

Alex, a word of advice for you: we really don't like playing guessing
games. If you get a reputation for describing your problem inaccurately,
incompletely or cryptically, you will find fewer and fewer people willing
to answer your questions. I recommend that you spend a few minutes now
reading this page and save yourself a lot of grief later:

http://www.catb.org/~esr/faqs/smart-questions.html

Now, back to your problem. If my guess is right, and the brackets
actually have text inside them, then my simple solution above will not
work. You will need a more complicated solution using a regular
expression or a parser. That solution will depend on whether or not you
can get nested brackets "(ab (123 (fee fi fum) 456) cd ef)" or arbitrary
single brackets without the matching pair.
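If the brackets do hold text, a regular expression can still work when the labels never nest and, as John asked about, always start a line. The sketch below assumes exactly that, with made-up labels such as `(A1)`; anchoring with `^` and `re.MULTILINE` is what keeps `(as at Lloyd's)` in the middle of a line from being treated as a new entry. `LABEL` and `split_entries` are hypothetical names.

```python
import re

# Assumption (hypothetical format): each entry begins with a bracketed label
# at the start of a line, e.g. "(A1)", and labels never nest.
LABEL = re.compile(r"^[ \t]*\([^()]*\)", re.MULTILINE)


def split_entries(text):
    """Return the text of each entry, dropping anything before the first label."""
    parts = LABEL.split(text)  # no capturing group, so the labels themselves are dropped
    return [" ".join(p.split()) for p in parts[1:]]


sample = """(A1) A registry mark given by underwriters (as at Lloyd's) to ships in
first-class condition. Inferior grades are indicated by A 2 and A 3.
(ABC) The first three letters of the alphabet, used for the whole alphabet."""

entries = split_entries(sample)
print(len(entries))  # 2
print(entries[1])    # The first three letters of the alphabet, used for the whole alphabet.
```

A definition that happens to wrap so that a line begins with "(" would still be split wrongly, which is where a character-by-character approach like the one Steven outlines next becomes necessary.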

Your question also sounds suspiciously like homework. I don't do people's
homework, but here's something to get you started. It's not a solution,
but it can be used as the first step towards a solution.

text = "() A registry mark given ..."  # lots and lots of text
level = 0
blocks = []
for c in text:  # process text one character at a time
    if c == '(':
        print("Found an opening bracket")
        level += 1  # one deeper in brackets
    elif c == ')':
        level -= 1
        if level < 0:
            print("Found a close bracket without matching open bracket")
            level = 0  # resynchronise so the following text is not treated as bracketed
        else:
            print("Found a closing bracket")
    else:  # any other character
        # here's where you do the real work
        if level == 0:
            print("Not inside a bracket")
            blocks.append(c)
        else:
            print("Inside a bracket")
if level > 0:
    print("Missing close bracket")
text_minus_bracketed_words = ''.join(blocks)
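One way to carry Steven's depth counter a step further, offered as a sketch rather than his solution: instead of printing, record where each top-level bracket group starts and ends, so nested groups such as `(ab (123 (fee fi fum) 456) cd ef)` come back as one span and unbalanced brackets raise an error. `top_level_groups` and `split_on_groups` are hypothetical names; deciding which groups mark the start of a definition is still up to the data format.

```python
def top_level_groups(text):
    """Yield (start, end) slice bounds of each top-level bracket group in text.

    Uses the same depth counter as the loop above; a nested group is
    reported once, as part of its outermost span.
    """
    level = 0
    start = None
    for i, c in enumerate(text):
        if c == "(":
            if level == 0:
                start = i  # remember where the outermost group opened
            level += 1
        elif c == ")":
            if level == 0:
                raise ValueError("unmatched ')' at index %d" % i)
            level -= 1
            if level == 0:
                yield start, i + 1
    if level > 0:
        raise ValueError("missing ')' for '(' at index %d" % start)


def split_on_groups(text):
    """Return the pieces of text that lie between top-level bracket groups."""
    pieces = []
    prev = 0
    for start, end in top_level_groups(text):
        pieces.append(text[prev:start])
        prev = end
    pieces.append(text[prev:])
    return pieces


print(split_on_groups("x (a (b) c) y (z)"))  # ['x ', ' y ', '']
```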

--
Steven
Aug 18 '08 #7
On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
> The number is based on the word(s) they type into my program; then it
> fetches the position of that word in the list of words, searches the
> definitions document, and goes to the nth definition. It probably won't
> work, but that is the idea.

Consider (1) an existing (free) dictionary application or (2) using a
database, if you feel you must write your own application.

> Also, on a side note, does anyone know a very simple dictionary site that
> isn't dictionary.com or yourdictionary.com? Or a free dictionary that I can
> download to have an offline reference?

There's this thing called google (http://www.google.com). It's an
example of a "web search engine". If you type (for example) "free
dictionary download" (without the quotes!) into the text box and then
click on the "Google Search" button, it will come back with a list of
web pages where those words appear (e.g. http://www.dicts.info/dictionaries.php)

HTH,
John
Aug 18 '08 #8

If by "What happened when you did:" you mean dictionary.com and
yourdictionary.com: nothing, they work, but screen scraping is mediocre at
best. They both work fine (yourdictionary is better for screen scraping),
but I would like an offline solution. The whole reason for the program is
that I can type in 20 words at one time, get them defined and formatted,
and then save them all from my app. So far all is good; I just need an
offline solution, or one from a database. You say a free dictionary
program, but how can I get definitions from another program without
opening it? Anyway, ideas?

Aug 18 '08 #9
On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
> Okay, well the point of this program is to steal from the OS X built-in
> dictionary.

Ah, not homework, but copyright infringement.

> Also, on a side note, does anyone know a very simple dictionary site
> that isn't dictionary.com or yourdictionary.com? Or a free dictionary
> that I can download to have an offline reference?

http://en.wiktionary.org/wiki/Wiktionary:Main_Page

Googling on "free dictionary OS X" comes up with 417,000 hits. I'm pretty
sure at least some of them will be relevant to what you want.

--
Steven
Aug 18 '08 #10
On 19 Aug, 01:11, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
>> Okay, well the point of this program is to steal from the OS X built-in
>> dictionary.
>
> Ah, not homework, but copyright infringement.

It depends what the inquirer is doing and what they mean by "steal".
Given the propaganda around "unauthorised" usage of content that is
pervasive these days ("you don't own that DVD: you just have our
temporary and conditional permission to watch it, pirate!"), the
inquirer may have been led to believe that just reading from a file on
their own system rather than using the nominated application is
somehow to "steal" from that file, even though it is content which has
presumably been obtained legitimately, even paid for in the case of OS
X.

Even if the end-user licence agreement were to attempt to wash away
any "fair use" (or just common sense) rights to using the content in
the way described by the inquirer - recalling that OS X is an Apple
product, so such games wouldn't be beneath that particular vendor - I
can't see how it does much good to dignify such antics with
unqualified cries of "copyright infringement". Indeed, for those not
acquainted with copyright and licensing, it probably just serves to
reinforce the dishonest message that they have to pay over and over
for content they already have and not to question what it is they're
paying for.

Paul
Aug 19 '08 #11
On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
> Also, on a side note, does anyone know a very simple dictionary site that
> isn't dictionary.com or yourdictionary.com?

This one is my favourite: http://www.lingro.com/
--
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/
Aug 19 '08 #12
On Aug 19, 6:11 am, Wojtek Walczak <gmin...@bzt.bzt> wrote:
> On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
>> Also, on a side note, does anyone know a very simple dictionary site that
>> isn't dictionary.com or yourdictionary.com?
>
> This one is my favourite: http://www.lingro.com/

That's hot!
Aug 20 '08 #13
