472,374 Members | 1,517 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,374 software developers and data experts.

Problem with tokenize module and indents

Tim
I ran into a problem with a script i was playing with to check code
indents and need some direction. It seems to depend on if tabsize is
set to 4 in editor and spaces and tabs indents are mixed on consecutive
lines. Works fine when editors tabsize was 8 regardless if indents are
mixed.

Below are how the 3 test files are laid out, the sample code and output
I get.
Any help on how to detect this correctly would be appreciated.
# nano -T4 tabspacing_4.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 4 spaces <<
this gets reported as a dedent when there is no change in indent level
self.msg = msg #indent is 2 tabs

#nano -T8 tabspacing_8A.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 8 spaces << no
indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

#nano -T8 tabspacing_8B.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 1 tab <<
no indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

My script

#!/usr/bin/env python

import tokenize
from sys import argv

indent_lvl = 0
line_number = 0
lines = file(argv[1]).readlines()
done = False

def parse():

def feed():

global line_number, lines

if line_number < len(lines):
txt = lines[line_number]
line_number += 1
else:
txt = ''

return txt

def indents(type, token, start, end, line):

global indent_lvl, done

if type == tokenize.DEDENT:
indent_lvl -= 1
elif type == tokenize.INDENT:
indent_lvl += 1
elif type == tokenize.ENDMARKER:
done = True
return
else:
return

print "token=%s, line_number=%i, indent_lvl=%i" %
(tokenize.tok_name[type], start[0], indent_lvl), line.strip()

while not done:
tokenize.tokenize(feed, indents)

parse()
$ ./sample.py tabspacing_4.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=DEDENT, line_number=4, indent_lvl=0 def __init__(self, msg):
#indent is 4 spaces <-- PROBLEM HERE
token=INDENT, line_number=5, indent_lvl=1 self.msg = msg
#indent is 2 tabs
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8A.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg
#indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8B.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg
#indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

Aug 23 '06 #1
1 2187
Tim wrote:
I ran into a problem with a script i was playing with to check code
indents and need some direction. It seems to depend on if tabsize is
set to 4 in editor and spaces and tabs indents are mixed on consecutive
lines. Works fine when editors tabsize was 8 regardless if indents are
mixed.

Below are how the 3 test files are laid out, the sample code and output
I get.
Any help on how to detect this correctly would be appreciated.
# nano -T4 tabspacing_4.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 4 spaces <<
this gets reported as a dedent when there is no change in indent level
self.msg = msg #indent is 2 tabs

#nano -T8 tabspacing_8A.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 8 spaces << no
indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

#nano -T8 tabspacing_8B.py
class Test:
"""triple quote""" #indent is 1 tab
def __init__(self, msg): #indent is 1 tab <<
no indent change reported
self.msg = msg #indent is 1 tab + 4 spaces

My script

#!/usr/bin/env python

import tokenize
from sys import argv

indent_lvl = 0
line_number = 0
lines = file(argv[1]).readlines()
done = False

def parse():

def feed():

global line_number, lines

if line_number < len(lines):
txt = lines[line_number]
line_number += 1
else:
txt = ''

return txt

def indents(type, token, start, end, line):

global indent_lvl, done

if type == tokenize.DEDENT:
indent_lvl -= 1
elif type == tokenize.INDENT:
indent_lvl += 1
elif type == tokenize.ENDMARKER:
done = True
return
else:
return

print "token=%s, line_number=%i, indent_lvl=%i" %
(tokenize.tok_name[type], start[0], indent_lvl), line.strip()

while not done:
tokenize.tokenize(feed, indents)

parse()
$ ./sample.py tabspacing_4.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=DEDENT, line_number=4, indent_lvl=0 def __init__(self, msg):
#indent is 4 spaces <-- PROBLEM HERE
token=INDENT, line_number=5, indent_lvl=1 self.msg = msg
#indent is 2 tabs
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8A.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg
#indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0

$ ./sample.py tabspacing_8B.py
token=INDENT, line_number=3, indent_lvl=1 """triple quote"""
#indent is 1 tab
token=INDENT, line_number=5, indent_lvl=2 self.msg = msg
#indent is 1 tab + 4 spaces
token=DEDENT, line_number=8, indent_lvl=1
token=DEDENT, line_number=8, indent_lvl=0
Well, the simple answer is "Don't mix tabs and spaces." But if that's
unhelpful ;-) , check out the tabnanny script (now in the standard
library) and also the expandtabs() method of strings.

http://docs.python.org/lib/module-tabnanny.html

Peace,
~Simon

Aug 23 '06 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

21
by: Christian Seberino | last post by:
Linux kernel style guide, Guido's C style guide and (I believe) old K&R style recommends 8 SPACES for indent. I finally got convinced of wisdom of 8 space indentation. Guido also likes 8 space...
16
by: qwweeeit | last post by:
In analysing a very big application (pysol) made of almost 100 sources, I had the need to remove comments. Removing the comments which take all the line is straightforward... Instead for the...
1
by: kristian.hermansen | last post by:
keherman@ibmlnx20:/tmp$ cat helloworld.py #!/usr/bin/env python import pygtk pygtk.require('2.0')
2
by: glomde | last post by:
Hi I would like to extend python so that you could create hiercical tree structures (XML, HTML etc) easier and that resulting code would be more readable. The syntax i would like is something...
9
by: JAF | last post by:
I need help with the following format: 1. Paragragh goes here and text wraps or indents several spaces on second and subsequent lines. 2. Paragragh goes here and text wraps or indents...
11
by: ...:::JA:::... | last post by:
Hello, After my program read and translate this code: koristi os,sys; ispisi 'bok kaj ima'; into the: import os,sys;
1
by: vedrandekovic | last post by:
Hi, I have one more question about installation. After installation my program that uses tokenize module,when I run myprogram.exe (vgsveki.exe): Traceback (most recent call last): File...
1
by: Nicolas M | last post by:
Hi, i've got a problem : i want to iterate over a list of string created via tokenize(), but i also want to fetch an attribute value of a node that as an attribute with the value of my current...
3
by: Rafe | last post by:
Hi, I think I have discovered two bugs with the inspect module and I would like to know if anyone can spot any traps in my workaround. I needed a function which takes a function or method and...
0
by: Naresh1 | last post by:
What is WebLogic Admin Training? WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge required to effectively administer and manage Oracle...
0
by: antdb | last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine In the overall architecture, a new "hyper-convergence" concept was proposed, which integrated multiple engines and...
1
by: Matthew3360 | last post by:
Hi, I have been trying to connect to a local host using php curl. But I am finding it hard to do this. I am doing the curl get request from my web server and have made sure to enable curl. I get a...
0
Oralloy
by: Oralloy | last post by:
Hello Folks, I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA. My problem (spelled failure) is with the synthesis of my design into a bitstream, not the C++...
0
by: Rahul1995seven | last post by:
Introduction: In the realm of programming languages, Python has emerged as a powerhouse. With its simplicity, versatility, and robustness, Python has gained popularity among beginners and experts...
2
by: Ricardo de Mila | last post by:
Dear people, good afternoon... I have a form in msAccess with lots of controls and a specific routine must be triggered if the mouse_down event happens in any control. Than I need to discover what...
1
by: Johno34 | last post by:
I have this click event on my form. It speaks to a Datasheet Subform Private Sub Command260_Click() Dim r As DAO.Recordset Set r = Form_frmABCD.Form.RecordsetClone r.MoveFirst Do If...
1
by: ezappsrUS | last post by:
Hi, I wonder if someone knows where I am going wrong below. I have a continuous form and two labels where only one would be visible depending on the checkbox being checked or not. Below is the...
0
by: jack2019x | last post by:
hello, Is there code or static lib for hook swapchain present? I wanna hook dxgi swapchain present for dx11 and dx9.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.