473,386 Members | 1,864 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Tokenizer inconsistency wrt to new lines in comments

The tokenize.generate_tokens function seems to handle in a context-
sensitive manner the new line after a comment:
>>from StringIO import StringIO
from tokenize import generate_tokens

text = '''
.... # hello world
.... x = (
.... # hello world
.... )
.... '''
>>>
for t in generate_tokens(StringIO(text).readline):
.... print repr(t[1])
....
'\n'
'# hello world\n'
'x'
'='
'('
'\n'
'# hello world'
'\n'
')'
'\n'
''

Is there a reason that the newline is included in the first comment
but not in the second, or is it a bug ?

George
Apr 4 '08 #1
1 1224
On 4 Apr., 18:22, George Sakkis <george.sak...@gmail.comwrote:
The tokenize.generate_tokens function seems to handle in a context-
sensitive manner the new line after a comment:
>from StringIO import StringIO
from tokenize import generate_tokens
>text = '''

... # hello world
... x = (
... # hello world
... )
... '''
>for t in generate_tokens(StringIO(text).readline):

... print repr(t[1])
...
'\n'
'# hello world\n'
'x'
'='
'('
'\n'
'# hello world'
'\n'
')'
'\n'
''

Is there a reason that the newline is included in the first comment
but not in the second, or is it a bug ?

George
I guess it's just an artifact of handling line continuations within
expressions where a different rule is applied. For compilation
purposes both the newlines within expressions as well as the comments
are irrelevant. There are even two different token namely NEWLINE and
NL which are produced for newlines. NL and COMMENT will be ignored.
NEWLINE is relevant for the parser.

If it was a bug it has to violate a functional requirement. I can't
see which one.

Kay
Apr 4 '08 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Maarten van Reeuwijk | last post by:
Hi group, I need to parse various text files in python. I was wondering if there was a general purpose tokenizer available. I know about split(), but this (otherwise very handy method does not...
8
by: Matthew Thorley | last post by:
Greetings, perhaps someone can explain this. I get to different styles of formatting for xmla and xmlb when I do the following: from elementtree import ElementTree as et xmla =...
5
by: Knackeback | last post by:
task: - read/parse CSV file code snippet: string key,line; typedef tokenizer<char_separator<char> > tokenizer; tokenizer tok(string(""), sep); while ( getline(f, line) ){ ++lineNo;...
4
by: Java Guy | last post by:
This must be a classical topic -- C++ stgring tokenizer. I just switched from C to C++ ( in Unix ). It turns out that there is no existing C++ string tokenizer. Searching on the Web, I found...
17
by: Javier Gomez | last post by:
How to quickly delete commented code lines from modules, forms and reports.?? Any idea ? thanks
19
by: Alex Vinokur | last post by:
Is there any tool to count C-program lines except comments? Thanks, ===================================== Alex Vinokur mailto:alexvn@connect.to http://mathforum.org/library/view/10978.html...
82
by: Edward Elliott | last post by:
This is just anecdotal, but I still find it interesting. Take it for what it's worth. I'm interested in hearing others' perspectives, just please don't turn this into a pissing contest. I'm in...
18
by: Robbie Hatley | last post by:
A couple of days ago I dedecided to force myself to really learn exactly what "strtok" does, and how to use it. I figured I'd just look it up in some book and that would be that. I figured...
1
by: Karl Kobata | last post by:
Hi Fredrik, This is exactly what I need. Thank you. I would like to do one additional function. I am not using the tokenizer to parse python code. It happens to work very well for my...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.