Problem with tokenize module and indents

Tim
I ran into a problem with a script I was playing with to check code
indents and need some direction. The problem seems to appear when the
editor's tab size is set to 4 and space and tab indents are mixed on
consecutive lines. It works fine when the editor's tab size is 8,
regardless of whether the indents are mixed.

Below are the layouts of the 3 test files, the sample code, and the
output I get.
Any help on how to detect this correctly would be appreciated.

# nano -T4 tabspacing_4.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 4 spaces << this gets reported as a dedent when there is no change in indent level
self.msg = msg #indent is 2 tabs

#nano -T8 tabspacing_8A.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 8 spaces << no indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

#nano -T8 tabspacing_8B.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 1 tab << no indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

My script

#!/usr/bin/env python

import tokenize
from sys import argv

indent_lvl = 0
line_number = 0
lines = file(argv[1]).readlines()
done = False

def parse():

    def feed():

        global line_number, lines

        if line_number < len(lines):
            txt = lines[line_number]
            line_number += 1
        else:
            txt = ''

        return txt

    def indents(type, token, start, end, line):

        global indent_lvl, done

        if type == tokenize.DEDENT:
            indent_lvl -= 1
        elif type == tokenize.INDENT:
            indent_lvl += 1
        elif type == tokenize.ENDMARKER:
            done = True
            return
        else:
            return

        print "token=%s, line_number=%i, indent_lvl=%i" % (
            tokenize.tok_name[type], start[0], indent_lvl), line.strip()

    while not done:
        tokenize.tokenize(feed, indents)

parse()

$ ./sample.py tabspacing_4.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=DEDENT, line_number=4, indent_lvl=0 def __init__(self, msg): #indent is 4 spaces <-- PROBLEM HERE
token=INDENT, line_number=5, indent_lvl=1 self.msg = msg #indent is 2 tabs
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8A.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg #indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8B.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg #indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

Aug 23 '06 #1
Tim wrote:
[snip]

Well, the simple answer is "Don't mix tabs and spaces." But if that's
unhelpful ;-) , check out the tabnanny script (now in the standard
library) and also the expandtabs() method of strings.

http://docs.python.org/lib/module-tabnanny.html

Peace,
~Simon
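
To make the expandtabs() suggestion concrete, here is a minimal sketch. It is written for Python 3, not the Python 2 of the original script. It assumes the layout of tabspacing_4.py can be rebuilt from the indent comments in the post (the forum stripped the real whitespace), and indent_width and ambiguous_lines are hypothetical helper names, not part of any library. The underlying point: the tokenizer measures a tab as advancing to the next multiple of 8 columns no matter what the editor shows, so "1 tab" (column 8) followed by "4 spaces" (column 4) is a dedent to it, even though both look like column 4 in an editor set to a tab size of 4.

```python
# Reconstruction of tabspacing_4.py from the indent comments in the post.
# This is an assumption: the original whitespace was lost in the forum copy.
SOURCE = (
    "class Test:\n"
    '\t"""triple quote"""\n'          # indent is 1 tab
    "    def __init__(self, msg):\n"  # indent is 4 spaces
    "\t\tself.msg = msg\n"            # indent is 2 tabs
)


def indent_width(line, tabsize):
    """Width of the leading whitespace once tabs are expanded to tabsize."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())


def ambiguous_lines(source, tabsizes=(4, 8)):
    """Return 1-based numbers of lines whose indent change depends on tab size."""
    flagged = []
    previous = None  # indent widths of the previous non-blank line
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        widths = [indent_width(line, size) for size in tabsizes]
        if previous is not None:
            # -1 = dedent, 0 = same level, 1 = indent; one entry per tab size
            directions = {(w > p) - (w < p) for w, p in zip(widths, previous)}
            if len(directions) > 1:
                flagged.append(number)
        previous = widths
    return flagged


for size in (4, 8):
    print(size, [indent_width(line, size) for line in SOURCE.splitlines()])
# 4 [0, 4, 4, 8]
# 8 [0, 8, 4, 16]

print(ambiguous_lines(SOURCE))  # [3]
```

Line 3 of this reconstruction is the def line (line 4 in Tim's file, which has an extra comment line at the top): it sits at the same level as the docstring with a tab size of 4, but is a dedent with a tab size of 8, which matches the DEDENT in the output above. For whole files, running `python -m tabnanny yourfile.py` should report this kind of ambiguity without any custom code.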

Aug 23 '06 #2
