I need to find all occurrences of the same word in a text.
What would be the best way to do that?
I used string.find, but it does not work properly for whole words.
Suppose I want to find the number 324 in the text
'45 324 45324'
There is only one occurrence of the word 324, but string.find() finds two
occurrences (the one inside 45324 too).
Must I use a regex?
Thanks for your help.
L.
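To make the symptom concrete, here is a minimal sketch (not part of the original post) of why substring search over-counts: `str.count` and `str.find` look for the characters `324` anywhere in the string, including inside `45324`. The helper name `find_all_substrings` is hypothetical.

```python
def find_all_substrings(text, needle):
    """Return every character offset where needle occurs as a substring."""
    offsets = []
    start = text.find(needle)
    while start != -1:
        offsets.append(start)
        # Resume one character later so overlapping hits are seen too.
        start = text.find(needle, start + 1)
    return offsets


text = '45 324 45324'
print(text.count('324'))                 # prints 2: substring count, not word count
print(find_all_substrings(text, '324'))  # prints [3, 9]: the second hit is inside '45324'
```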
On Sat, Feb 10, 2007 at 05:29:23AM -0800, Johny wrote:
> Suppose I want to find the number 324 in the text
> '45 324 45324'
> There is only one occurrence of the word 324, but string.find() finds two
> occurrences (the one inside 45324 too).
>>> '45 324 45324'.split().count('324')
1
ciao
marco
--
reply to `python -c "print 'm********@itsu ig.ocram'[::-1]"`
On Feb 10, 2:42 pm, Marco Giusti <marco.giu...@gmail.com> wrote:
> On Sat, Feb 10, 2007 at 05:29:23AM -0800, Johny wrote:
> > Suppose I want to find the number 324 in the text
> > '45 324 45324'
> > There is only one occurrence of the word 324, but string.find() finds two
> > occurrences (the one inside 45324 too).
> >>> '45 324 45324'.split().count('324')
> 1
Marco,
Thank you for your help.
It works perfectly, but I forgot to say that I also need to find the
position of each occurrence of the word. Is that possible?
Thanks.
L.
Johny wrote:
> > >>> '45 324 45324'.split().count('324')
> > 1
> Marco,
> Thank you for your help.
> It works perfectly, but I forgot to say that I also need to find the
> position of each occurrence of the word. Is that possible?
>>> [i for i, e in enumerate('45 324 45324'.split()) if e == '324']
[1]
--
Under construction
On Sat, Feb 10, 2007 at 06:00:05AM -0800, Johny wrote:
> Marco,
> Thank you for your help.
> It works perfectly, but I forgot to say that I also need to find the
> position of each occurrence of the word. Is that possible?
>>> li = '45 324 45324'.split()
>>> li.index('324')
1
Play with count and index, and take a look at the help for both.
ciao
marco
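One way to combine `count` and `index`, as a rough sketch rather than Marco's own code: `list.index` accepts an optional start position, so each search can resume just past the previous hit. The function name `word_indexes` is made up for illustration, and the results are token indexes (word number within the text), not character offsets.

```python
def word_indexes(text, word):
    """Return the index of every whitespace-separated token equal to word."""
    tokens = text.split()
    found = []
    start = 0
    # tokens.count(word) says how many times index() will succeed.
    for _ in range(tokens.count(word)):
        i = tokens.index(word, start)  # search from just past the previous hit
        found.append(i)
        start = i + 1
    return found


print(word_indexes('45 324 45324 324', '324'))  # prints [1, 3]
```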
* Johny (10 Feb 2007 05:29:23 -0800)
> Suppose I want to find the number 324 in the text
> '45 324 45324'
> There is only one occurrence of the word 324, but string.find() finds two
> occurrences (the one inside 45324 too).
> Must I use a regex?
There are two approaches. One is the "solve once and forget" approach,
where you code around this particular problem. Marco showed you one
solution for this.
The other approach is to realise that your problem is a specific
case of two general problems: partitioning a sequence by a separator
and partitioning a sequence into equivalence classes. The bonus of this
approach is that a /lot/ of problems can be solved
with one of these utilities or a combination of them.
>>> a = '45 324 45324'
>>> quotient_set(part(a, [' ', ' '], 'sep'), ident)
{'324': ['324'], '45': ['45'], '45324': ['45324']}
The latter approach is much more flexible. Just imagine your problem
changes to a string that's separated by newlines (instead of spaces)
and you want to find words that start with the same character (instead
of words that are identical).
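The utilities `part`, `quotient_set` and `ident` are not shown in the thread, so the following is only a guess at what such helpers might look like, written as a hypothetical sketch. The signature of `part` here is an assumption and is simpler than the three-argument call in the session above.

```python
import re
from collections import defaultdict


def part(text, separators):
    """Split text on any of the separator strings, dropping empty pieces."""
    pattern = '|'.join(re.escape(sep) for sep in separators)
    return [piece for piece in re.split(pattern, text) if piece]


def quotient_set(items, key):
    """Group items into equivalence classes: equal key(item) means same class."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


def ident(x):
    """Identity key: two words are equivalent only if they are equal."""
    return x


a = '45 324 45324'
print(quotient_set(part(a, [' ']), ident))

# The variation described above: newline-separated words,
# grouped by their first character.
b = 'apple\navocado\nbanana\nblueberry\ncherry'
print(quotient_set(part(b, ['\n']), lambda word: word[0]))
```

The length of each class gives the count for that word, and swapping the key function changes the grouping criterion without touching the splitting code.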
Thorsten
"Johny" <py****@hope.cz> on 10 Feb 2007 05:29:23 -0800 didst step
forth and proclaim thus:
> I need to find all occurrences of the same word in a text.
> What would be the best way to do that?
I make no claims of this being the best approach:
====================
def findOccurances(a_string, word):
    """
    Given a string and a word, returns a 2-tuple:
    [0] = count  [1] = list of indexes where word occurs
    """
    import re
    count = 0
    indexes = []
    start = 0  # offset for successive passes
    # \b matches only at word boundaries, so '324' is not found inside '45324'.
    # re.escape keeps any regex metacharacters in word literal.
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    while True:
        match = pattern.search(a_string)
        if not match:
            break
        count += 1
        # match.start() is relative to the current slice; add the offset back.
        indexes.append(match.start() + start)
        start += match.end()
        a_string = a_string[match.end():]
    return (count, indexes)
====================
Seems to work for me. No guarantees.
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown
On 2007-02-10, Johny <py****@hope.cz> wrote:
> Suppose I want to find the number 324 in the text
> '45 324 45324'
> There is only one occurrence of the word 324, but string.find() finds two
> occurrences (the one inside 45324 too).
> Must I use a regex?
The first thing to do is to answer the question: What is a word?
The second thing to do is to design some code that can find
words in strings.
The last thing to do is to search those actual words for the word
you're looking for.
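Those three steps could be sketched as follows. The definition of a word used here (a maximal run of letters, digits, or underscores, i.e. the regex class `\w`) is an assumption made for the sake of the example, and the names `WORD`, `find_words` and `positions_of` are hypothetical.

```python
import re

# Step 1: decide what a word is. Assumption: a maximal run of \w characters.
WORD = re.compile(r'\w+')


def find_words(text):
    """Step 2: yield (offset, word) for every word in text."""
    for match in WORD.finditer(text):
        yield match.start(), match.group()


def positions_of(text, target):
    """Step 3: return character offsets where a whole word equals target."""
    return [offset for offset, word in find_words(text) if word == target]


print(positions_of('45 324 45324', '324'))      # prints [3]
print(positions_of('324, 45324; 324.', '324'))  # prints [0, 12]
```

Changing the answer to step 1 only means changing the `WORD` pattern; the other two steps stay the same.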
--
Neil Cerutti
In order to find all the words in a text, you need to tokenize it first.
The rest is a matter of calling the count method on the list of
tokenized words. For tokenization look here: http://nltk.sourceforge.net/lite/doc/en/words.html
A word of warning: depending on exactly what you need to do, the
seemingly trivial task of tokenizing a text can become quite complex.
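A small illustration of that warning, using only the standard library rather than NLTK (the sample sentence and the token pattern are assumptions chosen for the example): once punctuation is involved, a plain `split()` no longer yields clean words, and even a simple regex tokenizer makes questionable choices.

```python
import re
from collections import Counter

text = 'Order 324, then 45324; finally 324.'

# Naive tokenization: punctuation stays glued to the tokens.
naive_tokens = text.split()
print(naive_tokens.count('324'))  # prints 0: the tokens are '324,' and '324.'

# Slightly better: treat runs of ASCII letters and digits as tokens.
tokens = re.findall(r'[A-Za-z0-9]+', text)
counts = Counter(tokens)
print(counts['324'])              # prints 2

# The same pattern splits contractions, which may or may not be wanted.
print(re.findall(r'[A-Za-z0-9]+', "don't"))  # prints ['don', 't']
```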
Enjoy,
Maël
Neil Cerutti wrote:
> The first thing to do is to answer the question: What is a word?
> The second thing to do is to design some code that can find
> words in strings.
> The last thing to do is to search those actual words for the word
> you're looking for.
On Feb 11, 5:13 am, Samuel Karl Peterson
<skpeter...@nospam.please.ucdavis.edu> wrote:
> I make no claims of this being the best approach:
> [findOccurances function snipped]
> Seems to work for me. No guarantees.
More concisely:
import re
target_string = '45 324 45324'  # sample text from the original question
pattern = re.compile(r'\b324\b')
indices = [match.start() for match in pattern.finditer(target_string)]
print("Indices:", indices)
print("Count:", len(indices))
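One caveat worth sketching: `\b` marks a transition between a word character and a non-word character, so it behaves unexpectedly when the search term itself starts or ends with punctuation. If "word" means "whitespace-delimited token", lookarounds are one possible alternative. The function name `whole_token_offsets` is hypothetical, and this is a sketch under that assumption rather than a replacement for the code above.

```python
import re


def whole_token_offsets(text, word):
    """Return offsets where word appears delimited by whitespace or string ends."""
    # (?<!\S) : not preceded by a non-space character
    # (?!\S)  : not followed by a non-space character
    pattern = re.compile(r'(?<!\S)%s(?!\S)' % re.escape(word))
    return [match.start() for match in pattern.finditer(text)]


print(whole_token_offsets('45 324 45324', '324'))     # prints [3]
print(whole_token_offsets('c c++ c++17 c++', 'c++'))  # prints [2, 12]
```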
--
Cheers,
Steven