Problem with tokenize module and indents

Tim

I ran into a problem with a script I was playing with to check code
indents, and I need some direction. The problem seems to show up when the
editor's tab size is set to 4 and tab and space indents are mixed on
consecutive lines. It works fine when the editor's tab size is 8,
regardless of whether the indents are mixed.

Below are the layouts of the 3 test files, the sample code, and the
output I get.
Any help on how to detect this correctly would be appreciated.

# nano -T4 tabspacing_4.py
class Test:
    """triple quote""" #indent is 1 tab
    def __init__(self, msg): #indent is 4 spaces << this gets reported as a dedent when there is no change in indent level
        self.msg = msg #indent is 2 tabs

# nano -T8 tabspacing_8A.py
class Test:
        """triple quote""" #indent is 1 tab
        def __init__(self, msg): #indent is 8 spaces << no indent change reported
            self.msg = msg #indent is 1 tab + 4 spaces

# nano -T8 tabspacing_8B.py
class Test:
        """triple quote""" #indent is 1 tab
        def __init__(self, msg): #indent is 1 tab << no indent change reported
            self.msg = msg #indent is 1 tab + 4 spaces

My script

#!/usr/bin/env python

import tokenize
from sys import argv

indent_lvl = 0
line_number = 0
lines = file(argv[1]).readlines()
done = False

def parse():

    def feed():

        global line_number, lines

        if line_number < len(lines):
            txt = lines[line_number]
            line_number += 1
        else:
            txt = ''

        return txt

    def indents(type, token, start, end, line):

        global indent_lvl, done

        if type == tokenize.DEDENT:
            indent_lvl -= 1
        elif type == tokenize.INDENT:
            indent_lvl += 1
        elif type == tokenize.ENDMARKER:
            done = True
            return
        else:
            return

        print "token=%s, line_number=%i, indent_lvl=%i" % \
            (tokenize.tok_name[type], start[0], indent_lvl), line.strip()

    while not done:
        tokenize.tokenize(feed, indents)

parse()

$ ./sample.py tabspacing_4.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=DEDENT, line_number=4, indent_lvl=0 def __init__(self, msg): #indent is 4 spaces <-- PROBLEM HERE
token=INDENT, line_number=5, indent_lvl=1 self.msg = msg #indent is 2 tabs
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8A.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg #indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8B.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote""" #indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg #indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0
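
For anyone trying this on Python 3, here is a rough port of the script, offered as a sketch rather than a drop-in replacement. `file()` and the callback form `tokenize.tokenize(readline, tokeneater)` are Python 2 only, so this version iterates over `tokenize.generate_tokens` instead. The helper name `indent_events` and the sample source are made up for illustration. It is only shown on consistently indented input: Python 3 itself rejects ambiguous tab/space mixes when compiling, and how the tokenize module handles such files may differ between versions.

```python
import io
import tokenize


def indent_events(source):
    """Return (token_name, line_number, indent_level) for each INDENT/DEDENT."""
    events = []
    level = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            level += 1
        elif tok.type == tokenize.DEDENT:
            level -= 1
        else:
            continue  # only indentation changes are of interest here
        events.append((tokenize.tok_name[tok.type], tok.start[0], level))
    return events


# Hypothetical sample, indented with spaces only.
sample = "class Test:\n    def method(self):\n        return 1\n"
for event in indent_events(sample):
    print("token=%s, line_number=%i, indent_lvl=%i" % event)
```

To run it against a file, read the file's text and pass it in as `source`.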

Aug 23 '06 #1

Simon Forman

Well, the simple answer is "Don't mix tabs and spaces." But if that's
unhelpful ;-) , check out the tabnanny script (now in the standard
library) and also the expandtabs() method of strings.

http://docs.python.org/lib/module-tabnanny.html
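
To see why the tabspacing_4.py case reports a DEDENT, it may help to measure the indents the way the tokenizer does. The tokenize module treats a tab as advancing to the next multiple of 8 columns, whatever the editor was set to, so one tab counts as 8 columns while 4 spaces count as 4. The sketch below uses `expandtabs()` for that. The helper names `indent_width` and `ambiguous_lines` are made up for illustration, the file contents are reconstructed from the indent comments in the post, and the check is far cruder than tabnanny (it does not skip comment lines, continuation lines, or multi-line strings).

```python
def indent_width(line, tabsize=8):
    """Column where the first non-whitespace character of `line` starts."""
    stripped = line.lstrip(" \t")
    leading = line[:len(line) - len(stripped)]
    return len(leading.expandtabs(tabsize))


def ambiguous_lines(lines, tabsizes=(8, 4)):
    """1-based numbers of lines whose indent compares differently with the
    previous non-blank line depending on the tab size assumed."""
    flagged = []
    prev = None
    for number, line in enumerate(lines, 1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        if prev is not None:
            relations = set()
            for size in tabsizes:
                a, b = indent_width(prev, size), indent_width(line, size)
                relations.add((b > a) - (b < a))  # -1 dedent, 0 same, 1 indent
            if len(relations) > 1:
                flagged.append(number)
        prev = line
    return flagged


# Reconstruction of tabspacing_4.py (assumed from the comments in the post).
tabspacing_4 = [
    "# nano -T4 tabspacing_4.py\n",
    "class Test:\n",
    '\t"""triple quote"""\n',
    "    def __init__(self, msg):\n",
    "\t\tself.msg = msg\n",
]
print([indent_width(l) for l in tabspacing_4])     # [0, 0, 8, 4, 16]
print([indent_width(l, 4) for l in tabspacing_4])  # [0, 0, 4, 4, 8]
print(ambiguous_lines(tabspacing_4))               # [4]
```

With 8-column tab stops, line 4 sits at column 4 after a line at column 8, which is the DEDENT reported above. With the editor's 4-column tabs both lines appear at column 4. For real files, `python -m tabnanny yourfile.py` does a much more thorough version of this check.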

Peace,
~Simon

Aug 23 '06 #2
