Suggestion: str.itersplit()

From my searches here, there is no equivalent to Java's
StringTokenizer in Python, which seems like a real shame to me.

However, str.split() works just as well, except for the fact that it
creates it all at one go. I suggest an itersplit be introduced for
lazy evaluation, if you don't want to take up resources, and it could
be used just like Java's StringTokenizer.

Comments?

Apr 21 '07 #1
In <11*********************@b58g2000hsg.googlegroups.com>, Dustan wrote:
> From my searches here, there is no equivalent to Java's
> StringTokenizer in Python, which seems like a real shame to me.
>
> However, str.split() works just as well, except for the fact that it
> creates it all at one go. I suggest an itersplit be introduced for
> lazy evaluation, if you don't want to take up resources, and it could
> be used just like Java's StringTokenizer.
>
> Comments?

Does it really make such a difference?

Ciao,
Marc 'BlackJack' Rintsch
Apr 21 '07 #2
Marc 'BlackJack' Rintsch wrote:
> In <11*********************@b58g2000hsg.googlegroups.com>, Dustan wrote:
>> From my searches here, there is no equivalent to Java's
>> StringTokenizer in Python, which seems like a real shame to me.
>>
>> However, str.split() works just as well, except for the fact that it
>> creates it all at one go. I suggest an itersplit be introduced for
>> lazy evaluation, if you don't want to take up resources, and it could
>> be used just like Java's StringTokenizer.
>>
>> Comments?
>
> Does it really make such a difference?

It would if you were dealing with enormous blocks of text at once, say
from a database.
--
Michael Hoffman
Apr 21 '07 #3
On Apr 21, 8:58 am, Dustan <DustanGro...@gmail.com> wrote:
> From my searches here, there is no equivalent to Java's
> StringTokenizer in Python, which seems like a real shame to me.
>
> However, str.split() works just as well, except for the fact that it
> creates it all at one go. I suggest an itersplit be introduced for
> lazy evaluation, if you don't want to take up resources, and it could
> be used just like Java's StringTokenizer.
>
> Comments?

That would be good, because then you could iterate over strings the
same way that you iterate over files:

for line in string.itersplit("\n"):
    ## for block ##
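
Something close to this can be had with a file-like wrapper around the string. A rough sketch, assuming Python 3's io.StringIO; iter_lines is a made-up name, and unlike split("\n") it does not yield a trailing empty string when the text ends with a newline. Note that StringIO keeps its own copy of the text, so this only avoids building the list of lines, not the copy:

```python
import io

def iter_lines(text):
    """Yield the lines of text one at a time, without the trailing newline."""
    # A StringIO object iterates line by line, just like an open file.
    for line in io.StringIO(text):
        yield line.rstrip("\n")

block = "Hello world.\nThis is a comment.\nWith a few more lines."
for line in iter_lines(block):
    print(line)
```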
Apr 21 '07 #4
On Apr 21, 7:58 am, Dustan <DustanGro...@gmail.com> wrote:
> From my searches here, there is no equivalent to Java's
> StringTokenizer in Python, which seems like a real shame to me.
>
> However, str.split() works just as well, except for the fact that it
> creates it all at one go. I suggest an itersplit be introduced for
> lazy evaluation, if you don't want to take up resources, and it could
> be used just like Java's StringTokenizer.
>
> Comments?

If anybody could tell me how to get my hands on the Python source
code, I might even be able to come up with an example of how it could
be implemented. I have no idea how to unzip that tgz or tar.bz2 file
on a Windows machine, though (and that's not from lack of trying).

Apr 21 '07 #5
On Apr 21, 5:58 am, Dustan <DustanGro...@gmail.com> wrote:
> From my searches here, there is no equivalent to Java's
> StringTokenizer in Python, which seems like a real shame to me.
>
> However, str.split() works just as well, except for the fact that it
> creates it all at one go. I suggest an itersplit be introduced for
> lazy evaluation, if you don't want to take up resources, and it could
> be used just like Java's StringTokenizer.
>
> Comments?


If your delimiter is a non-empty string, you
can use an iterator like:

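def it(S, sub):
    """Lazily yield the pieces of S separated by the non-empty string sub."""
    start = 0
    sublen = len(sub)
    while True:
        idx = S.find(sub, start)
        if idx == -1:
            # No more delimiters: yield the tail and stop.
            yield S[start:]
            return  # a bare return ends the generator cleanly
        else:
            yield S[start:idx]
            start = idx + sublen  # skip past the delimiter

target_string = 'abcabcabc'
for subs in it(target_string, 'b'):
    print(subs)
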
For something more complex,
you may be able to use
re.finditer.
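
A minimal sketch of the re.finditer idea, assuming Python 3 and a pattern that never matches the empty string. The name iter_split is hypothetical; it is not part of the re module or of str:

```python
import re

def iter_split(text, pattern):
    """Lazily yield the pieces of text between matches of pattern.

    Hypothetical helper; assumes pattern never matches an empty string.
    """
    start = 0
    for match in re.finditer(pattern, text):
        yield text[start:match.start()]
        start = match.end()
    yield text[start:]  # tail after the last match


for piece in iter_split("one, two;three", r"[,;]\s*"):
    print(piece)
```

For patterns without capture groups, this should produce the same pieces as re.split, only one at a time.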

--
Hope this helps,
Steven

Apr 21 '07 #6
Dustan <Du**********@gmail.com> writes:
> If anybody could tell me how to get my hands on the Python source
> code, I might even be able to come up with an example of how it could
> be implemented. I have no idea how to unzip that tgz or tar.bz2 file
> on a Windows machine, though (and that's not from lack of trying).

You can try WinZip. Last time I had to use a Windows machine it was
able to untar + gunzip some files perfectly fine (as we are able to
unzip and unrar on *nix...).

--
Jorge Godoy <jg****@gmail.com>
Apr 21 '07 #7
Dustan <Du**********@gmail.com> wrote:
> On Apr 21, 7:58 am, Dustan <DustanGro...@gmail.com> wrote:
>> From my searches here, there is no equivalent to Java's
>> StringTokenizer in Python, which seems like a real shame to me.
>>
>> However, str.split() works just as well, except for the fact that it
>> creates it all at one go. I suggest an itersplit be introduced for
>> lazy evaluation, if you don't want to take up resources, and it could
>> be used just like Java's StringTokenizer.
>>
>> Comments?
>
> If anybody could tell me how to get my hands on the Python source
> code, I might even be able to come up with an example of how it could
> be implemented. I have no idea how to unzip that tgz or tar.bz2 file
> on a Windows machine, though (and that's not from lack of trying).

Top search hit for "windows tar" is
<http://gnuwin32.sourceforge.net/packages/tar.htm>, but its contents
suggest using <http://gnuwin32.sourceforge.net/packages/bsdtar.htm>
instead (it has "the ability to directly create and manipulate .tar,
.tar.gz, tar.bz2, .zip, .gz and .bz2 archives, understands the most-used
options of GNU Tar, and is also much faster; for most purposes it is to
be preferred to GNU Tar", to quote).
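
If Python itself is already installed, its standard tarfile module may be another route. This is a sketch only, assuming Python 3: extract_source and the paths in the usage comment are hypothetical, and extractall should only be pointed at archives you trust:

```python
import os
import tarfile

def extract_source(archive_path, dest_dir):
    """Unpack a .tgz, .tar.gz or .tar.bz2 archive into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bzip2, ...) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest_dir)

# Hypothetical usage; substitute the file you actually downloaded:
# extract_source("python-source.tgz", "python-src")
```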
Alex
Apr 21 '07 #8
On Apr 21, 4:54 pm, attn.steven....@gmail.com wrote:
> On Apr 21, 5:58 am, Dustan <DustanGro...@gmail.com> wrote:
>> From my searches here, there is no equivalent to Java's
>> StringTokenizer in Python, which seems like a real shame to me.
>>
>> However, str.split() works just as well, except for the fact that it
>> creates it all at one go. I suggest an itersplit be introduced for
>> lazy evaluation, if you don't want to take up resources, and it could
>> be used just like Java's StringTokenizer.
>>
>> Comments?
>
> If your delimiter is a non-empty string, you
> can use an iterator like:
>
> def it(S, sub):
>     start = 0
>     sublen = len(sub)
>     while True:
>         idx = S.find(sub, start)
>         if idx == -1:
>             yield S[start:]
>             return
>         else:
>             yield S[start:idx]
>             start = idx + sublen
>
> target_string = 'abcabcabc'
> for subs in it(target_string, 'b'):
>     print(subs)

Thanks.

Well, now I know it can be implemented in a reasonably efficient
manner in pure Python (i.e. without having side-effect strings that
aren't of any use, as with concatenation). That's what I was mainly
concerned about.

I feel that it could be a builtin function (seriously, the world
wouldn't end if it was, and nor would Python), but this'll work.
That's my last word on the subject.

> For something more complex,
> you may be able to use
> re.finditer.
>
> --
> Hope this helps,
> Steven
Apr 21 '07 #9
On Apr 21, 4:18 pm, Dustan <DustanGro...@gmail.com> wrote:
> On Apr 21, 7:58 am, Dustan <DustanGro...@gmail.com> wrote:
>> From my searches here, there is no equivalent to Java's
>> StringTokenizer in Python, which seems like a real shame to me.
>>
>> However, str.split() works just as well, except for the fact that it
>> creates it all at one go. I suggest an itersplit be introduced for
>> lazy evaluation, if you don't want to take up resources, and it could
>> be used just like Java's StringTokenizer.
>>
>> Comments?
>
> If anybody could tell me how to get my hands on the Python source
> code, I might even be able to come up with an example of how it could
> be implemented. I have no idea how to unzip that tgz or tar.bz2 file
> on a Windows machine, though (and that's not from lack of trying).

Thanks to both Jorge Godoy and Alex Martelli for their responses; I
went with WinZip. After spending about 10 minutes looking at this
stuff, I can easily conclude that having the code and understanding
the code are two very different things (and yes, I do have some
experience in C and C++). But that's a matter to tackle another day.

Apr 21 '07 #10
subscriber123 wrote:
> On Apr 21, 8:58 am, Dustan <DustanGro...@gmail.com> wrote:
>> From my searches here, there is no equivalent to Java's
>> StringTokenizer in Python, which seems like a real shame to me.
>>
>> However, str.split() works just as well, except for the fact that it
>> creates it all at one go. I suggest an itersplit be introduced for
>> lazy evaluation, if you don't want to take up resources, and it could
>> be used just like Java's StringTokenizer.
>>
>> Comments?
>
> That would be good, because then you could iterate over strings the
> same way that you iterate over files:
>
> for line in string.itersplit("\n"):
>     ## for block ##

>>> block = """Hello world.
... This is a comment.
... With a few more lines."""
>>> for line in block.split("\n"):
...     print(line)
...
Hello world.
This is a comment.
With a few more lines.
>>> for line in block.splitlines():  # could even use this one here
...     print(line)
...
Hello world.
This is a comment.
With a few more lines.

Iterators would just speed up the whole thing and be more Pythonic
(since development is heading straight in the direction of converting
anything and everything into iterators).
Apr 22 '07 #11
