generic tokenizer

I remember Python having a generic tokenizer in the library. All I want
is to set a list of token separators and then read tokens out of a
stream; the token separators should be returned as themselves.

Is there anything like this?

cheers, Angus.
Jul 18 '05 #1
Hello Angus,
> I remember Python having a generic tokenizer in the library. All I want
> is to set a list of token separators and then read tokens out of a
> stream; the token separators should be returned as themselves.
>
> Is there anything like this?

There are external lex/yacc-like packages (such as PLY,
http://systems.cs.uchicago.edu/ply/). But maybe you're thinking of shlex
(http://www.python.org/dev/doc/devel/...ule-shlex.html).
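
For reference, a minimal sketch of what the shlex route could look like in current Python 3. With its default settings, shlex treats letters, digits and underscores as word characters and returns any other non-whitespace character as a one-character token, so ',' and ';' come back as themselves. Whitespace is dropped rather than returned, and quotes and '#' get special treatment, so this only fits if that is acceptable. The sample input is made up for illustration.

```python
import io
import shlex

# Any file-like object works; StringIO stands in for a real stream here.
stream = io.StringIO("a,b, c;d; e")

# Default (non-POSIX) lexer: characters that are neither word characters
# nor whitespace are returned as single-character tokens.
lexer = shlex.shlex(stream)

# A shlex object is iterable and stops at end of input.
tokens = list(lexer)
print(tokens)
```

Note that the separator set is not passed as a list here; it is implied by whatever is not in lexer.wordchars or lexer.whitespace.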

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zo ran.com>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
Jul 18 '05 #2
Angus Mackay <ye**@right.com> wrote:
> I remember Python having a generic tokenizer in the library. All I want
> is to set a list of token separators and then read tokens out of a
> stream; the token separators should be returned as themselves.
>
> Is there anything like this?

Not as such in the standard library: the functions in module tokenize
do not let you 'set a list of token separators'. If what you're
tokenizing can fit in a string in memory, module re can help:
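>>> import re
>>> # The capturing group makes split() return the separators too.
>>> x = re.compile(r'(\s+|,|;)')
>>> for w in x.split('a,b, c;d; e'): print(repr(w), '+', end=' ')
...
'a' + ',' + 'b' + ',' + '' + ' ' + 'c' + ';' + 'd' + ';' + '' + ' ' + 'e' +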
Note that you get empty-string items when two separators abut.
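
If the empty strings are unwanted, they can be filtered out after the split. The following is only a sketch of that idea: the helper name split_keep_separators and the choice to take the separators as a list of literal strings (escaped with re.escape) are assumptions made for illustration, not something from the library.

```python
import re

def split_keep_separators(text, separators):
    """Split text on literal separators, keeping each separator as a token."""
    # Longest first, so that a separator like "==" wins over "=".
    ordered = sorted(separators, key=len, reverse=True)
    # Escape each separator so characters such as "|" or "." are literal.
    pattern = re.compile("(" + "|".join(re.escape(s) for s in ordered) + ")")
    # Drop the empty strings that appear when two separators abut.
    return [piece for piece in pattern.split(text) if piece]

result = split_keep_separators("a,b, c;d; e", [",", ";", " "])
print(result)
```

Unlike the \s+ pattern above, this treats every single space as its own separator token.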

If the limitations of re.split (stuff must fit in memory, etc.) are a
problem, then the lex-like solutions I see somebody else suggested may
be more appropriate for your needs.
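
One possible middle ground, sketched here under the assumption that no separator ever needs to match across a line break: apply the same split to one line at a time, so only the current line is held in memory. The generator name iter_tokens is hypothetical.

```python
import io
import re

def iter_tokens(stream, pattern):
    """Yield tokens and separators from a text stream, one line at a time."""
    for line in stream:
        # pattern must contain a capturing group so separators are kept.
        for piece in pattern.split(line):
            if piece:  # skip the empty strings between adjacent separators
                yield piece

separators = re.compile(r'(\s+|,|;)')
stream = io.StringIO("a,b, c;d; e\nf")
tokens = list(iter_tokens(stream, separators))
print(tokens)
```

A run of whitespace that continues onto the next line comes out as two separate tokens with this approach, which is the price of not reading the whole input at once.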
Alex
Jul 18 '05 #3
