Hex editor display - can this be more pythonic?

Hi:

I'm building a hex line editor as a first real Python programming exercise.

Yesterday I posted about how to print the hex bytes of a string. There
are two decent options:

ln = '\x00\x01\xFF 456\x0889abcde~'
import sys
for c in ln:
    sys.stdout.write( '%.2X ' % ord(c) )

or this:

sys.stdout.write( ' '.join( ['%.2X' % ord(c) for c in ln] ) + ' ' )

Either of these produces the desired output:

00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E

I find the former more readable and simpler. The latter however has a
slight advantage in not putting a space at the end unless I really want
it. But which is more pythonic?

The next step consists of printing out the ASCII printable characters.
I have devised the following silliness:

printable = '1!2@3#4$5%6^7&8*9(0)aAbBcCdDeEfFgGhHiIjJkKlLmMnNoO pPqQrRsStTuUvVwWxXyYzZ\
`~-_=+\\|[{]};:\'",<.>/?'
for c in ln:
    if c in printable: sys.stdout.write(c)
    else: sys.stdout.write('.')

print

Which when following the list comprehension based code above, produces
the desired output:

00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E ... 456.89abcde~

I had considered using the .translate() method of strings, however this
would require a larger translation table than my printable string. I
was also using the .find() method of the printable string before
realizing I could use 'in' here as well.

I'd like to display the non-printable characters differently, since they
can't be distinguished from genuine period '.' characters. Thus, I may
use ANSI escape sequences like:

for c in ln:
    if c in printable: sys.stdout.write(c)
    else:
        sys.stdout.write('\x1B[31m.')
        sys.stdout.write('\x1B[0m')

print

I'm also toying with the idea of showing hex bytes together with their
ASCII representations, since I've often found it a chore to figure out
which hex byte to change if I wanted to edit a certain ASCII char.
Thus, I might display data something like this:

00(\0) 01() FF() 20( ) 34(4) 35(5) 36(6) 08(\b) 38(8) 39(9) 61(a) 62(b)
63(c) 64(d) 65(e) 7E(~)

Where printing chars are shown in parentheses, characters with Python
escape sequences will be shown as their escapes in parens, while
non-printing chars with no escapes will be shown with nothing in parens.

Or perhaps a two-line output with offset addresses under the data. So
many possibilities!

Thanks for input!

--
_____________________
Christopher R. Carlen
cr***@bogus-remove-me.sbcglobal.net
SuSE 9.1 Linux 2.6.5

Jul 29 '07 #1

Marc 'BlackJack' Rintsch

On Sun, 29 Jul 2007 12:24:56 -0700, CC wrote:

ln = '\x00\x01\xFF 456\x0889abcde~'
import sys
for c in ln:
    sys.stdout.write( '%.2X ' % ord(c) )

or this:

sys.stdout.write( ' '.join( ['%.2X' % ord(c) for c in ln] ) + ' ' )

Either of these produces the desired output:

00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E

I find the former more readable and simpler. The latter however has a
slight advantage in not putting a space at the end unless I really want
it. But which is more pythonic?

I would use the second with fewer spaces, a longer name for `ln` and in
recent Python versions with a generator expression instead of the list
comprehension:

sys.stdout.write(' '.join('%02X' % ord(c) for c in line))
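
For readers on Python 3, where iterating over a `bytes` object yields integers and `ord()` is not needed, the generator-expression version might look like the sketch below. The function name `hex_line` is made up for illustration and does not come from the thread.

```python
def hex_line(data: bytes) -> str:
    """Return the bytes of `data` as two-digit uppercase hex, space-separated."""
    # Iterating over bytes in Python 3 yields ints, so no ord() call is needed.
    # join() puts a space between items only, so there is no trailing space.
    return ' '.join('%02X' % byte for byte in data)


line = b'\x00\x01\xFF 456\x0889abcde~'
print(hex_line(line))
```

With the sample data this should print the same sixteen hex pairs as the original loop, without the trailing space.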

The next step consists of printing out the ASCII printable characters.
I have devised the following silliness:

printable = '1!2@3#4$5%6^7&8*9(0)aAbBcCdDeEfFgGhHiIjJkKlLmMnNoO pPqQrRsStTuUvVwWxXyYzZ\
`~-_=+\\|[{]};:\'",<.>/?'

I'd use `string.printable` and remove the "invisible" characters like '\n'
or '\t'.
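
A minimal sketch of that suggestion, written for Python 3: start from `string.printable`, drop everything in `string.whitespace`, and add the plain space back. Keeping the space is an assumption made here so that 0x20 shows as a blank; the names `VISIBLE` and `ascii_column` are invented for the example.

```python
import string

# Characters that occupy exactly one visible column: everything in
# string.printable except whitespace, plus the plain space (an assumption
# made here so that 0x20 is shown as a blank rather than a dot).
VISIBLE = (set(string.printable) - set(string.whitespace)) | {' '}


def ascii_column(text: str) -> str:
    """Replace every character outside VISIBLE with a dot."""
    return ''.join(c if c in VISIBLE else '.' for c in text)


print(ascii_column('\x00\x01\xFF 456\x0889abcde~'))
```

Using a set rather than a string also makes each `in` test a hash lookup instead of a scan.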

for c in ln:
    if c in printable: sys.stdout.write(c)
    else: sys.stdout.write('.')

print

Which when following the list comprehension based code above, produces
the desired output:

00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E ... 456.89abcde~

I had considered using the .translate() method of strings, however this
would require a larger translation table than my printable string.

The translation table can be created once and should be faster.
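
As a rough illustration of the table approach in Python 3, the sketch below builds a 256-entry table once and reuses it through `bytes.translate()`. The range 0x20 to 0x7E is used as the definition of "printable" here, which is an assumption; `TABLE` and `dotted` are names chosen for the example.

```python
# Build the 256-entry table once, at import time: printable ASCII
# (0x20 through 0x7E) maps to itself, everything else maps to '.'.
DOT = ord('.')
TABLE = bytes(b if 0x20 <= b <= 0x7E else DOT for b in range(256))


def dotted(data: bytes) -> bytes:
    """Return a copy of `data` with non-printable bytes replaced by b'.'."""
    return data.translate(TABLE)


print(dotted(b'\x00\x01\xFF 456\x0889abcde~').decode('ascii'))
```

The table costs 256 bytes, but each byte of input then becomes a single indexed lookup.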

I'd like to display the non-printable characters differently, since they
can't be distinguished from genuine period '.' characters. Thus, I may
use ANSI escape sequences like:

for c in ln:
    if c in printable: sys.stdout.write(c)
    else:
        sys.stdout.write('\x1B[31m.')
        sys.stdout.write('\x1B[0m')

print

`re.sub()` might be an option here.
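
One way this could look in Python 3 is sketched below. It treats anything outside 0x20 to 0x7E as non-printable, which is an assumption, and the names `NON_PRINTABLE` and `highlight` are hypothetical. The substitution returns a new string for display, so the data being edited is left untouched.

```python
import re

RED = '\x1B[31m'    # ANSI: switch foreground colour to red
RESET = '\x1B[0m'   # ANSI: restore default attributes

# Any single character outside the printable ASCII range 0x20-0x7E.
NON_PRINTABLE = re.compile(r'[^\x20-\x7E]')


def highlight(text: str) -> str:
    """Return a display copy of `text` with each non-printable char as a red dot."""
    return NON_PRINTABLE.sub(RED + '.' + RESET, text)


print(highlight('\x00\x01\xFF 456\x0889abcde~'))
```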

I'm also toying with the idea of showing hex bytes together with their
ASCII representations, since I've often found it a chore to figure out
which hex byte to change if I wanted to edit a certain ASCII char. Thus,
I might display data something like this:

00(\0) 01() FF() 20( ) 34(4) 35(5) 36(6) 08(\b) 38(8) 39(9) 61(a) 62(b)
63(c) 64(d) 65(e) 7E(~)

Where printing chars are shown in parentheses, characters with Python
escape sequences will be shown as their escapes in parens, while
non-printing chars with no escapes will be shown with nothing in parens.

For escaping:

In [90]: '\n'.encode('string-escape')
Out[90]: '\\n'
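
The `string-escape` codec shown above is a Python 2 feature. As a sketch of the annotated display in Python 3, the example below uses a small hand-written table of escapes instead; the table contents and the names `ESCAPES`, `label` and `annotated` are assumptions for illustration, not anything from the standard library.

```python
# Hand-picked escapes (an assumption for this sketch; extend as needed).
ESCAPES = {
    0x00: '\\0', 0x07: '\\a', 0x08: '\\b', 0x09: '\\t',
    0x0A: '\\n', 0x0B: '\\v', 0x0C: '\\f', 0x0D: '\\r',
}


def label(byte: int) -> str:
    """Return the text shown in parentheses for one byte value."""
    if 0x20 <= byte <= 0x7E:
        return chr(byte)              # printable ASCII shows as itself
    return ESCAPES.get(byte, '')      # the escape if one exists, else nothing


def annotated(data: bytes) -> str:
    """Format each byte as HEX(label), separated by single spaces."""
    return ' '.join('%02X(%s)' % (b, label(b)) for b in data)


print(annotated(b'\x00\x01\xFF 456\x0889abcde~'))
```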

Ciao,
Marc 'BlackJack' Rintsch

Jul 29 '07 #2

Marc 'BlackJack' Rintsch wrote:

On Sun, 29 Jul 2007 12:24:56 -0700, CC wrote:
>>The next step consists of printing out the ASCII printable characters.
I have devised the following silliness:

printable = '1!2@3#4$5%6^7&8*9(0)aAbBcCdDeEfFgGhHiIjJkKlLmMnNoO pPqQrRsStTuUvVwWxXyYzZ\
`~-_=+\\|[{]};:\'",<.>/?'

I'd use `string.printable` and remove the "invisible" characters like '\n'
or '\t'.

What is `string.printable` ? There is no printable method to strings,
though I had hoped there would be. I don't yet know how to make one.

>> for c in ln:
>>     if c in printable: sys.stdout.write(c)
>>     else: sys.stdout.write('.')

The translation table can be created once and should be faster.

I suppose the way I'm doing it requires a search through `printable` for
each c, right? Whereas the translation would just be a lookup
operation? If so then perhaps the translation would be better.

>>I'd like to display the non-printable characters differently, since they
can't be distinguished from genuine period '.' characters. Thus, I may
use ANSI escape sequences like:

for c in ln:
    if c in printable: sys.stdout.write(c)
    else:
        sys.stdout.write('\x1B[31m.')
        sys.stdout.write('\x1B[0m')

print

`re.sub()` might be an option here.

Yeah, that is an interesting option. Since I don't wish to modify the
block of data unless the user specifically edits it, I might prefer
the simple display operation.

For escaping:

In [90]: '\n'.encode('string-escape')
Out[90]: '\\n'

Hmm, I see there's an encoder that can do my hex display too.

Thanks for the input!

--
_____________________
Christopher R. Carlen
cr***@bogus-remove-me.sbcglobal.net
SuSE 9.1 Linux 2.6.5

Jul 30 '07 #3

Dennis Lee Bieber wrote:

On Sun, 29 Jul 2007 12:24:56 -0700, CC <cr***@BOGUS.sbcglobal.net>
declaimed the following in comp.lang.python:
>> for c in ln:
>>     if c in printable: sys.stdout.write(c)
>>     else:
>>         sys.stdout.write('\x1B[31m.')
>>         sys.stdout.write('\x1B[0m')

Be aware that this does require having a terminal that understands
the escape sequences (which, to my understanding, means unusable on a
WinXP console window)

Yeah, with this I'm not that concerned about Windows. Though, can WinXP
still load the ansi.sys driver?
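
One possible way to cope with terminals that lack ANSI support is to fall back to plain dots when colour is switched off. The sketch below (Python 3) guesses from `sys.stdout.isatty()`, which is only a heuristic: a tty is not guaranteed to understand escape sequences, so the `use_color` parameter lets the caller override the guess. The function name `render` is made up for the example.

```python
import sys

RED, RESET = '\x1B[31m', '\x1B[0m'


def render(text, use_color=None):
    """Show non-printable chars as dots, in red only when colour is enabled."""
    if use_color is None:
        # Heuristic only: a tty is not guaranteed to understand ANSI codes.
        use_color = sys.stdout.isatty()
    dot = RED + '.' + RESET if use_color else '.'
    # Printable ASCII is taken to be the range from space through tilde.
    return ''.join(c if ' ' <= c <= '~' else dot for c in text)


print(render('\x00\x01\xFF 456\x0889abcde~', use_color=False))
```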

>>Thus, I might display data something like this:

00(\0) 01() FF() 20( ) 34(4) 35(5) 36(6) 08(\b) 38(8) 39(9) 61(a) 62(b)
63(c) 64(d) 65(e) 7E(~)

UGH!

:-D Lovely isn't it?

If the original "hex bytes dotted ASCII" side by side isn't
workable, I'd suggest going double line...

00  01  FF  20  34  35  36  08  38  39  61  62  63  64  65  7E
nul soh xFF sp  4   5   6   bs  8   9   a   b   c   d   e   ~

Yeah, something like that is probably nicer.

Use the standard "name" for the control codes (though I shortened
"space" to "sp"), and maybe just duplicate the hex for non-named,
non-printable codes (mostly those in the x80-xFF range, unless you are
NOT using ASCII but something like ISO-Latin-1).

I've got a lot to learn about this encoding business.

Allowing for the names means using a field width of four. Using a
line width of 16 data bytes makes for an edit window width of 64, and
you could fit a hex offset at the left of each line to indicate what
part of the file is being worked.
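
A rough Python 3 sketch of that layout follows: fields four columns wide, 16 data bytes per row, and a hex offset at the left. The control-code names are the standard ASCII abbreviations, lowercased as in the example above; showing 0x7F as `del`, the eight-digit offset, and the names `name_of` and `dump` are assumptions made for the illustration.

```python
# Standard ASCII control-code names for 0x00-0x1F, lowercased.
CONTROL_NAMES = (
    'nul soh stx etx eot enq ack bel bs ht lf vt ff cr so si '
    'dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us'
).split()

FIELD = 4          # column width per byte
PER_LINE = 16      # data bytes per row, giving 64 columns of data


def name_of(byte: int) -> str:
    """Return the short label shown under one byte value."""
    if byte < 0x20:
        return CONTROL_NAMES[byte]
    if byte == 0x20:
        return 'sp'
    if byte == 0x7F:
        return 'del'               # assumption: treat DEL like a control code
    if byte < 0x7F:
        return chr(byte)
    return 'x%02X' % byte          # no standard name: repeat the hex


def dump(data: bytes):
    """Yield (hex_row, name_row) string pairs, one pair per 16 bytes."""
    for offset in range(0, len(data), PER_LINE):
        chunk = data[offset:offset + PER_LINE]
        hex_row = ''.join(('%02X' % b).ljust(FIELD) for b in chunk)
        name_row = ''.join(name_of(b).ljust(FIELD) for b in chunk)
        # The offset column is 8 hex digits plus 2 spaces = 10 characters.
        yield '%08X  %s' % (offset, hex_row.rstrip()), ' ' * 10 + name_row.rstrip()


for hex_row, name_row in dump(b'\x00\x01\xFF 456\x0889abcde~'):
    print(hex_row)
    print(name_row)
```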

Right.

Thanks for the reply!

--
_____________________
Christopher R. Carlen
cr***@bogus-remove-me.sbcglobal.net
SuSE 9.1 Linux 2.6.5

Jul 30 '07 #4

Marc 'BlackJack' Rintsch

On Sun, 29 Jul 2007 18:27:25 -0700, CC wrote:

Marc 'BlackJack' Rintsch wrote:
>I'd use `string.printable` and remove the "invisible" characters like '\n'
or '\t'.

What is `string.printable` ? There is no printable method to strings,
though I had hoped there would be. I don't yet know how to make one.

In [8]: import string

In [9]: string.printable
Out[9]: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

>>> for c in ln:
>>>     if c in printable: sys.stdout.write(c)
>>>     else: sys.stdout.write('.')

>The translation table can be created once and should be faster.

I suppose the way I'm doing it requires a search through `printable` for
each c, right? Whereas the translation would just be a lookup
operation?

Correct. And it is written in C.

Ciao,
Marc 'BlackJack' Rintsch

Jul 30 '07 #5

Neil Cerutti

On 2007-07-30, Dennis Lee Bieber <wl*****@ix.netcom.com> wrote:

On Sun, 29 Jul 2007 18:30:22 -0700, CC <cr***@BOGUS.sbcglobal.net>
declaimed the following in comp.lang.python:

>> Yeah, with this I'm not that concerned about Windows. Though, can WinXP
>> still load the ansi.sys driver?

I'm actually not sure...

I think if one uses the 16-bit command parser it is available, but
not the 32-bit parser...

command.com vs cmd.exe

Yes. You can load the ansi.sys driver in command.com on Windows
2000 and XP, and it will work with simple batch files. But it
doesn't work with Python, for reasons I don't know enough about
Windows console programs to understand.

--
Neil Cerutti
The audience is asked to remain seated until the end of the recession.
--Church Bulletin Blooper

Jul 30 '07 #6
