Bytes | Software Development & Data Engineering Community

Working with fixed format text db's

Many of the file formats I have to work with are so-called
fixed-format records, where every line in the file is a record,
and every field in a record takes up a specific amount of space.

For example, one of my older Python programs contains the
following to create a fixed-format text record for a batch of new
students:

new = file("new.dat", "w")
if not new:
    print "Error. Could not open file new.dat for writing."
    raw_input("Press Return To Exit.")
    sys.exit(1)

for s in freshmen:
    new.write(s.ssn.ljust(9))
    new.write(s.id.ljust(10))
    new.write(s.last[:16].ljust(16))
    new.write(s.first[:11].ljust(11))
    new.write(' '.ljust(10)) # Phone Number
    new.write(' '.ljust(1254)) # Empty 'filler' space.
    new.write('2813  ')
    new.write(s.major.ljust(5))

    # Etc...

Luckily, the output format has not changed yet, so issues with
maintaining the above haven't arisen.

However, I'd like something better.

Is there already a good module for working with fixed format
records available? I couldn't find one.

If not, please suggest how I might improve the above code.

--
Neil Cerutti
When "yearn" was sung, the performers sounded like they were in a state of
yearning. --Music Lit Essay
Jun 8 '07 #1
Neil Cerutti <ho*****@yahoo.com> wrote:
> Luckily, the output format has not changed yet, so issues with
> maintaining the above haven't arisen.

The problem surely is that when you want to change the format you have to do
so in all files (and what about the backups then?) and all programs
simultaneously.

Maintaining the code is the least of your problems, I'd say.

You could change the data layout so that eg each field was terminated by a
marker character, then read/write delimited values. But unless you also
review all the other parts of your programs, you need to be sure that you
don't have any other code anywhere that implicitly relies on a particular
field being a known fixed length.
> However, I'd like something better.

What precisely do you want to achieve?
--
Jeremy C B Nicoll - my opinions are my own.
Jun 8 '07 #2
On 2007-06-08, Jeremy C B Nicoll <je****@omba.demon.co.uk> wrote:
> Neil Cerutti <ho*****@yahoo.com> wrote:
>> Luckily, the output format has not changed yet, so issues with
>> maintaining the above haven't arisen.
>
> The problem surely is that when you want to change the format
> you have to do so in all files (and what about the backups
> then?) and all programs simultaneously.

I don't have control of the format, unfortunately. It's an import
file format for a commercial database application.

> Maintaining the code is the least of your problems, I'd
> say.
>
> You could change the data layout so that eg each field was
> terminated by a marker character, then read/write delimited
> values. But unless you also review all the other parts of your
> programs, you need to be sure that you don't have any other
> code anywhere that implicitly relies on a particular field
> being a known fixed length.
>
>> However, I'd like something better.
>
> What precisely do you want to achieve?

I was hoping for a module that provides a way for me to specify a
fixed file format, along with some sort of interface for writing
and reading files that are in said format.

It is not actually *hard* to do this with ad-hoc code, but then
the program is indecipherable without a hardcopy of the spec in
hand. And also, as you say, if the spec ever does change, the
hand-written batch of ljust, rjust and slice will be somewhat of
a pain to reconfigure.

But the biggest weakness, to me, is that the specification is not in
the code, or read and used by the code, and I think it should be.

If nothing exists already I guess I'll roll my own. But I'd like
to be lazier, and virtually all published modules are better than
what I'll write for myself. ;)
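For reference, a minimal sketch of the kind of module described here, keeping the layout in the code as data. It is written for Python 3; `Field`, `FixedFormat` and the sample values are hypothetical names for illustration, not an existing library, and only the first four widths are taken from the freshmen example.

```python
# Hypothetical helper, not an existing library: a record layout declared
# once as a list of (name, width) fields and used for writing and reading.
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    name: str
    width: int


class FixedFormat:
    def __init__(self, fields: List[Field]):
        self.fields = fields
        self.length = sum(f.width for f in fields)

    def format(self, values: Dict[str, str]) -> str:
        # Truncate, then left-justify every value; missing fields become blanks.
        return "".join(
            str(values.get(f.name, ""))[:f.width].ljust(f.width)
            for f in self.fields
        )

    def parse(self, line: str) -> Dict[str, str]:
        # Walk the line field by field and strip the trailing space padding.
        result, pos = {}, 0
        for f in self.fields:
            result[f.name] = line[pos:pos + f.width].rstrip()
            pos += f.width
        return result


# First four fields of the freshmen layout from the original post.
STUDENT = FixedFormat([
    Field("ssn", 9),
    Field("id", 10),
    Field("last", 16),
    Field("first", 11),
])

line = STUDENT.format({"ssn": "123456789", "id": "42",
                       "last": "Cerutti", "first": "Neil"})
print(repr(line))
print(STUDENT.parse(line))
```

Because the widths live in one list, a spec change means editing that list rather than a run of `ljust` calls, and the same list drives both writing (`format`) and reading (`parse`).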

The underlying problem, of course, is the archaic flat-file
format with fixed-width data fields. Even the Department of
Education has moved on to XML for most of its data files, which
are much simpler for me to parse.

--
Neil Cerutti
Jun 8 '07 #3
In <sl********************@FIAD06.norwich.edu>, Neil Cerutti wrote:
> new = file("new.dat", "w")
> if not new:
>     print "Error. Could not open file new.dat for writing."
>     raw_input("Press Return To Exit.")
>     sys.exit(1)

Hey, Python is not C. File objects should *always* be "true". An error
is handled via exceptions.
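A minimal Python 3 sketch of that point, assuming the goal is still to print a message and exit: the failure surfaces as an exception from `open()`, so it is caught rather than tested with `if`. The helper name `open_for_writing` is hypothetical.

```python
import sys


def open_for_writing(path):
    """Open path for writing, or report the failure and exit.

    open() never returns a "false" file object: it either returns a
    usable file or raises OSError (IOError in Python 2), so the error
    check belongs in an except clause rather than an if.
    """
    try:
        return open(path, "w")
    except OSError as exc:
        print("Error. Could not open file %s for writing: %s" % (path, exc))
        sys.exit(1)
```

Used as `with open_for_writing("new.dat") as new:`, the file is also closed automatically when the block ends.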

Ciao,
Marc 'BlackJack' Rintsch
Jun 8 '07 #4
On 2007-06-08, Marc 'BlackJack' Rintsch <bj****@gmx.net> wrote:
> In <sl********************@FIAD06.norwich.edu>, Neil Cerutti wrote:
>> new = file("new.dat", "w")
>> if not new:
>>     print "Error. Could not open file new.dat for writing."
>>     raw_input("Press Return To Exit.")
>>     sys.exit(1)
>
> Hey, Python is not C. File objects should *always* be "true".
> An error is handled via exceptions.

Thanks. Update in progress.

--
Neil Cerutti
The doctors X-rayed my head and found nothing. --Dizzy Dean
Jun 8 '07 #5
Neil Cerutti wrote:
> The underlying problem, of course, is the archaic flat-file
> format with fixed-width data fields. Even the Department of
> Education has moved on to XML for most of its data files,

:(

I'm writing a small app, and was wondering the best way to store data.
Currently the fields are separated by spaces. I was toying with the idea
of using sqlite, yaml or json, but I think I've settled on CSV. Dull,
but it's easy to parse for humans and computers.
Jun 8 '07 #6
Neil Cerutti <ho*****@yahoo.com> writes:
> I was hoping for a module that provides a way for me to specify a
> fixed file format, along with some sort of interface for writing and
> reading files that are in said format.

Isn't that done by the 'struct' module
<URL:http://www.python.org/doc/lib/module-struct>?

>>> records = [
...     "Foo     13 Bar     ",
...     "Spam    23 Eggs    ",
...     "Guido   666Robot   ",
... ]
>>> record_format = "8s3s8s"
>>> for record in [struct.unpack(record_format, r) for r in records]:
...     print record
...
('Foo     ', '13 ', 'Bar     ')
('Spam    ', '23 ', 'Eggs    ')
('Guido   ', '666', 'Robot   ')
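That session is Python 2, where `str` is already a byte string. Under Python 3, `struct` only accepts bytes, so a sketch of the same idea encodes each line first and decodes the fields afterwards. The ASCII encoding is an assumption about the data, and a line whose encoded length differs from `struct.calcsize(RECORD_FORMAT)` raises `struct.error`.

```python
import struct

# "8s3s8s": an 8-byte, a 3-byte and an 8-byte string field (19 bytes total).
RECORD_FORMAT = "8s3s8s"

records = [
    "Foo     13 Bar     ",
    "Spam    23 Eggs    ",
    "Guido   666Robot   ",
]

# Encode each line to bytes, unpack it, then decode and strip each field.
parsed = [
    [f.decode("ascii").rstrip()
     for f in struct.unpack(RECORD_FORMAT, r.encode("ascii"))]
    for r in records
]
for row in parsed:
    print(row)
```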

--
\ "Buy not what you want, but what you need; what you do not need |
`\ is expensive at a penny." -- Cato, 234-149 BC, Relique |
_o__) |
Ben Finney
Jun 8 '07 #7
Neil Cerutti <ho*****@yahoo.com> wrote:
> On 2007-06-08, Jeremy C B Nicoll <je****@omba.demon.co.uk> wrote:
>> Neil Cerutti <ho*****@yahoo.com> wrote:
>>> Luckily, the output format has not changed yet, so issues with
>>> maintaining the above haven't arisen.
>> The problem surely is that when you want to change the format
>> you have to do so in all files (and what about the backups
>> then?) and all programs simultaneously.
>
> I don't have control of the format, unfortunately. It's an import
> file format for a commercial database application.

You're saying your program merely has to read data files created by that
database app? It's not that you have a whole suite of programs that create
and read these files, nor that you have years' worth of old files that would
need their format converted if the programs were changed?

> It is not actually *hard* to do this with ad-hoc code, but then
> the program is indecipherable without a hardcopy of the spec in
> hand. And also, as you say, if the spec ever does change, the
> hand-written batch of ljust, rjust and slice will be somewhat of
> a pain to reconfigure.
You could presumably define a list (of some sort, might be the wrong
terminology) that defines the 'name', type, length, justification and
padding of each field, and then make the explicit code you showed loop
through that list and do what's needed field by field.

There's a risk that abstracting the definitions will make the code less
clear to anyone else; at least it's clear what the current stuff does.
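Jeremy's suggestion could look roughly like this: each field's name, width, justification and padding are data, and one loop does the writing. This is a Python 3 sketch; `FieldSpec`, `build_record` and `FRESHMEN_LAYOUT` are hypothetical names, while the widths and the `2813` constant come from the original freshmen code.

```python
from types import SimpleNamespace
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    name: Optional[str]  # attribute to read from the record, or None for a constant
    width: int
    align: str = "<"     # "<" left-justify, ">" right-justify
    fill: str = " "      # padding character
    constant: str = ""   # value written when name is None


# The freshmen layout from the original post, expressed as data.
FRESHMEN_LAYOUT = [
    FieldSpec("ssn", 9),
    FieldSpec("id", 10),
    FieldSpec("last", 16),
    FieldSpec("first", 11),
    FieldSpec(None, 10),                  # phone number, left blank
    FieldSpec(None, 1254),                # filler
    FieldSpec(None, 6, constant="2813"),
    FieldSpec("major", 5),
]


def build_record(obj, layout=FRESHMEN_LAYOUT):
    parts = []
    for spec in layout:
        value = spec.constant if spec.name is None else str(getattr(obj, spec.name))
        value = value[:spec.width]  # truncate so a field can never overflow
        pad = value.rjust if spec.align == ">" else value.ljust
        parts.append(pad(spec.width, spec.fill))
    return "".join(parts)


student = SimpleNamespace(ssn="123456789", id="42", last="Cerutti",
                          first="Neil", major="ENG")
record = build_record(student)
print(len(record))
```

With this design, switching a field to right-justified, zero-filled output is a change to one `FieldSpec` entry rather than an edit to the loop, which goes some way toward the clarity concern.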
> But the biggest weakness, to me, is that the specification is not in
> the code, or read and used by the code, and I think it should be.
It'd be better if you could read the data layout spec from some file
produced by the database system. No chance perhaps of having the dat files
include some sort of dummy first record that contains the necessary info in
a form that you could interpret?
--
Jeremy C B Nicoll - my opinions are my own.
Jun 9 '07 #8
On Jun 9, 7:55 am, Jeremy C B Nicoll <jer...@omba.demon.co.uk> wrote:
> It'd be better if you could read the data layout spec from some file
> produced by the database system. No chance perhaps of having the dat files
> include some sort of dummy first record that contains the necessary info in
> a form that you could interpret?

The OP is *WRITING* not reading.


Jun 9 '07 #9
Neil Cerutti wrote:
> The underlying problem, of course, is the archaic flat-file
> format with fixed-width data fields. Even the Department of
> Education has moved on to XML for most of its data files, which
> are much simpler for me to parse.

XML easier to parse than fixed position file. Wow!

Very likely this file is created by a COBOL program, because this is
what COBOL loves.

01 my-record.
    05 ssn          pic 9(9).
    05 id           pic 9(10).
    05 last-name    pic x(16).
    05 first-name   pic x(11).
    05 phone-nbr    pic 9(10).
    05 filler       pic x(1254).
    05 filler       pic x(6) value '2813'.
    05 major        pic x(5).

write my-record

Haha. I'm just amused that new languages make simpler some things that
were hard in older languages, but in turn make more difficult things
that were simple!
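If a layout like Frank's were actually available from whatever produces the file (the thread does not say it is; his copybook is a guess), it could be read by the code, as Neil wants. A rough Python 3 sketch that only understands the simple `05 name pic x(n)` / `pic 9(n)` lines in this sample; `copybook_fields` is a hypothetical helper, and real copybooks (COMP fields, OCCURS, REDEFINES, signed or decimal pictures) need much more.

```python
import re

# Frank's sample record description, copied as text.
COPYBOOK = """\
01 my-record.
    05 ssn          pic 9(9).
    05 id           pic 9(10).
    05 last-name    pic x(16).
    05 first-name   pic x(11).
    05 phone-nbr    pic 9(10).
    05 filler       pic x(1254).
    05 filler       pic x(6) value '2813'.
    05 major        pic x(5).
"""

# Matches only "05 <name> pic x(<n>)" or "05 <name> pic 9(<n>)" lines.
FIELD_RE = re.compile(r"^\s*05\s+([\w-]+)\s+pic\s+[x9]\((\d+)\)",
                      re.IGNORECASE | re.MULTILINE)


def copybook_fields(text):
    """Return (name, width) pairs in record order."""
    return [(name, int(width)) for name, width in FIELD_RE.findall(text)]


fields = copybook_fields(COPYBOOK)
record_length = sum(width for _, width in fields)
print(fields)
print(record_length)  # 1321
```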

Frank
COBOL expert/Python newbie
Jun 9 '07 #10
On Jun 8, 6:18 pm, Ben Finney <bignose+hates-s...@benfinney.id.au>
wrote:
> Neil Cerutti <horp...@yahoo.com> writes:
>> I was hoping for a module that provides a way for me to specify a
>> fixed file format, along with some sort of interface for writing and
>> reading files that are in said format.
>
> Isn't that done by the 'struct' module
> <URL:http://www.python.org/doc/lib/module-struct>?
> >>> records = [
> ...     "Foo     13 Bar     ",
> ...     "Spam    23 Eggs    ",
> ...     "Guido   666Robot   ",
> ... ]
> >>> record_format = "8s3s8s"
> >>> for record in [struct.unpack(record_format, r) for r in records]:
> ...     print record
> ...
> ('Foo     ', '13 ', 'Bar     ')
> ('Spam    ', '23 ', 'Eggs    ')
> ('Guido   ', '666', 'Robot   ')

But when you pack a struct, the padding is null bytes,
not spaces.
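A quick Python 3 check of that point: `struct.pack` fills short `s` fields with NUL bytes. Padding each value with spaces before packing is one workaround, although at that point `struct` is doing little more than concatenating the fields.

```python
import struct

# Short values are NUL-padded by struct.pack ("8s" pads with b"\x00").
packed = struct.pack("8s3s8s", b"Foo", b"13", b"Bar")
print(packed)

# Space-padding each value first yields a space-filled record instead.
# (struct.pack also truncates values longer than their field.)
widths = (8, 3, 8)
values = (b"Foo", b"13", b"Bar")
spaced = struct.pack("8s3s8s", *(v.ljust(w) for v, w in zip(values, widths)))
print(spaced)
```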


Jun 9 '07 #11
On Jun 8, 5:50 pm, Neil Cerutti <horp...@yahoo.com> wrote:
> Many of the file formats I have to work with are so-called
> fixed-format records, where every line in the file is a record,
> and every field in a record takes up a specific amount of space.
>
> For example, one of my older Python programs contains the
> following to create a fixed-format text record for a batch of new
> students:
>
> new = file("new.dat", "w")
> if not new:
>     print "Error. Could not open file new.dat for writing."
>     raw_input("Press Return To Exit.")
>     sys.exit(1)
>
> for s in freshmen:
>     new.write(s.ssn.ljust(9))
>     new.write(s.id.ljust(10))
>     new.write(s.last[:16].ljust(16))
>     new.write(s.first[:11].ljust(11))
>     new.write(' '.ljust(10)) # Phone Number
>     new.write(' '.ljust(1254)) # Empty 'filler' space.
>     new.write('2813  ')
>     new.write(s.major.ljust(5))

I have to do this occasionally, and also find it cumbersome.

I toyed with the idea of posting a feature request for a new 'fixed
length' string formatting operator, with optional parameters for left/
right-justified and space/zero-filled.

We already have '%-12s' to space fill for a length of 12, but it is
not truly fixed-length: if the value is longer than 12 characters,
you need it to be truncated, and this construction will not do that.

Assume we have a new flag '!n', which defaults to left-justified and
space-filled, but allows an optional 'r' and '0' to override the
defaults.

Then the above example could be written as

format = '%!9s%!10s%!16s%!11s%!10s%!1254s%!6s%!5s'
for s in freshmen:
    new.write (format %
        (s.ssn,s.id,s.last,s.first,
        ' ',' ','2813',s.major))

I never felt strongly enough about it to propose it, but I thought I
would mention it.
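No `!` flag exists, but later Pythons (str.format arrived in 2.6) can express the same options through the format-spec mini-language: a fill character, `<` or `>` alignment, a width, and a precision that caps the length of a string. A minimal Python 3 sketch; `fixed_field` is a hypothetical helper name.

```python
def fixed_field(value, width, align="<", fill=" "):
    """Return value as exactly `width` characters.

    Builds a format spec such as " <16.16" or "0>9.9": fill character,
    alignment ("<" left, ">" right), minimum width, and a precision that,
    for strings, is the maximum length, so over-long values are truncated.
    """
    spec = "{0}{1}{2}.{2}".format(fill, align, width)
    return format(str(value), spec)


print(repr(fixed_field("Cerutti", 16)))           # left-justified, space-filled
print(repr(fixed_field("42", 9, ">", "0")))       # right-justified, zero-filled
print(repr(fixed_field("Featherstonehaugh", 5)))  # truncated to 5 characters
```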

Frank Millman

Jun 9 '07 #12
On Jun 9, 5:48 am, Mark Carter <m...@privacy.net> wrote:
> Neil Cerutti wrote:
>> The underlying problem, of course, is the archaic flat-file
>> format with fixed-width data fields. Even the Department of
>> Education has moved on to XML for most of its data files,
>
> :(
>
> I'm writing a small app, and was wondering the best way to store data.
> Currently the fields are separated by spaces. I was toying with the idea
> of using sqlite, yaml or json, but I think I've settled on CSV. Dull,
> but it's easy to parse for humans and computers.

Yup, humans find that parsing stuff like the following is quite easy:

"Jack ""The Ripper"" Jones","""Eltsac Ruo"", 123 Smith St",,Paris TX 12345
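For a program, as opposed to a human reader, the doubled quotes and the embedded comma are handled by the standard `csv` module. A minimal Python 3 sketch that parses John's line and checks that writing it back and re-reading it gives the same fields:

```python
import csv
import io

# John's sample line as a one-line CSV record.
SAMPLE = '"Jack ""The Ripper"" Jones","""Eltsac Ruo"", 123 Smith St",,Paris TX 12345\n'

row = next(csv.reader(io.StringIO(SAMPLE)))
for field in row:
    print(repr(field))

# Write the parsed fields back out, then read them in again.
buf = io.StringIO()
csv.writer(buf).writerow(row)
round_trip = next(csv.reader(io.StringIO(buf.getvalue())))
```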

Cheers,
John

Jun 9 '07 #13
Frank Millman <fr***@chagford.com> writes:
> On Jun 8, 5:50 pm, Neil Cerutti <horp...@yahoo.com> wrote:
>> Many of the file formats I have to work with are so-called
>> fixed-format records, where every line in the file is a record,
>> and every field in a record takes up a specific amount of space.
>
> [ ... ]
>
> We already have '%-12s' to space fill for a length of 12, but it is
> not truly fixed-length: if the value is longer than 12 characters,
> you need it to be truncated, and this construction will not do that.

In this case, we can use '%-12.12s'.
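Applied to Frank's hypothetical layout, this idiom needs no new operator: each `%-N.Ns` left-justifies, pads to N characters and truncates at N. A Python 3 sketch (the format string itself also works with Python 2's `%`); `freshman_line` and the sample values are illustrative.

```python
# "%-9.9s": "-" left-justifies, 9 is the minimum width, and ".9" (the
# precision) is the maximum number of characters taken from the string.
FRESHMAN_FORMAT = "%-9.9s%-10.10s%-16.16s%-11.11s%-10.10s%-1254.1254s%-6.6s%-5.5s"


def freshman_line(ssn, student_id, last, first, major):
    # Phone number and filler are written as blanks; "2813" is a constant.
    return FRESHMAN_FORMAT % (ssn, student_id, last, first, "", "", "2813", major)


line = freshman_line("123456789", "42", "Featherstonehaugh-Smith", "Neil", "ENG")
print(len(line))
print(repr("%-12.12s" % "A very long surname"))
```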

--
Lloyd Zusman
lj*@asfast.com
God bless you.

Jun 9 '07 #14
