Bytes | Software Development & Data Engineering Community

Reading Data from Fixed-Width Text Files

code green
1,726 Expert 1GB
I am receiving orders from a web-site as a text file of fixed-width format
  THYNMD........08000000......00000022.....000011
The data sections are padded out with spaces (shown as dots in the example) and appear on single lines hundreds of characters long.
There are only about three lines altogether, separated by CR.
The widths are precisely defined in the spec:
  Field    From    To    Length
  Name1    500     539   40
  Name2    540     579   40
I hope to produce a class that extracts the data into associative arrays for import into MySQL,
so I imagine extensive use of string and/or array functions.
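For reference, the kind of slicing involved can be sketched like this (a minimal illustration in Python; PHP's substr() and rtrim() map directly, and the two fields and offsets are just the ones from the spec excerpt above):

```python
# Field spec: (name, from, to), 1-based inclusive offsets as in the spec sheet.
SPEC = [("Name1", 500, 539), ("Name2", 540, 579)]

def parse_record(line: str) -> dict:
    """Slice one fixed-width record into an associative array (dict),
    trimming the space padding on the right of each field."""
    record = {}
    for name, start, end in SPEC:
        # Convert 1-based inclusive offsets to a 0-based half-open slice.
        record[name] = line[start - 1:end].rstrip()
    return record
```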

Has anybody worked with this format who may have functions to help me get started, or advice on pitfalls to avoid?
Thanks in advance
Mar 17 '11 #1

✓ answered by Stewart Ross

Stewart Ross
2,545 Expert Mod 2GB
Although my experience does not relate to web-based work in PHP, I worked with fixed-width text files when submitting funding returns to a student funding council, using VBA and Access to process them. Other databases and programming languages would have been equally useful.

Although I used Access tables to store the data and queries to process it, it is just as possible to use in-memory arrays processed by bespoke code instead. I did as much as I could in database queries, to make what was going on as transparent as possible (for maintenance purposes). If all of the processing is done in code it is more difficult for those who follow on from us to maintain, as code is a step further removed from the experience of non-programmers and (at least initially) harder to understand than an SQL query may be.

The funding returns were in two fixed-width text files, one representing course information (known as FES 1) and the other student data (FES 2), including the number of fundable units (student units of measure, as they were called) we were claiming for each student. To process these files I read them into an Access table containing a single column called [Lines], each row holding one line of text from the file.

An anonymised example of the text lines from the FES 2 table is shown below. The file has an internal structure of header rows followed by detail rows, repeated for each programme:

  IN130121FES51605530013
  NAMEA               DW00000011/1          020091355241019881270XX22 1XXINLM  22101708         08  110920060000000018.00000.00000.00014082006250520072410200606303420.00000.000
  NAMEBBBB            JM00000015/1          010050006270519881270XX51 9XXINLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302120.00000.000
  NAMECC              JW00000000/1          030066561071119891270XX2 4XX INLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302720.00000.000
  NAMEDDDDD           M 00000000/1          030049071290719891270XX49 9XXINLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302620.00000.000
As with your example there was a detailed specification for each component field of the file, and for the header/detail structure. The field start and end character numbers and field names were stored in a table. Some of the fields listed are shown below:

  Field Name   P1  P2
  01 Surname    1  20
  02 Initials  21  22
  03 SQA       23  42
  04 ID        23  42
  05 DOB       52  59
  06 Gend      60  60
  07 Home      61  63
  08 PCode     64  71
  09 Cmps      72  77
In my case I used an unjoined (cartesian) Access crosstab query between the student data text file and the specification table to extract the field structure, for checking purposes. The query which extracts the fields using the unjoined crosstab, and a sample of the resulting decoded data, are shown below:

  TRANSFORM First(ExtractText([Lines],[p1],[p2])) AS [Text]
  SELECT   [FES 2 Text File].Lines, Len([Lines]) AS L
  FROM     [FES 2 Text File], [FES 2 Table Fields]
  GROUP BY [FES 2 Text File].Lines, Len([Lines])
  PIVOT    [FES 2 Table Fields].[Field Name];
(The First function in the TRANSFORM statement is just a dummy here to allow the crosstab to take place; Access will not pivot a value unless some kind of aggregate function is applied to the data.)

  01 Surname 02 Initials  03 SQA      04 ID       05 DOB    06 Gend  07 Home  08 PCode  09 Cmps  10 Src of Fin  11 Ethnic  12 MOA  13 Cat  14 Schl  15 Dis All  16 Dis  17 Outcm  18 QED  19 H  20 A  21 SCE1  22 SCE2  23 Units  24 Other  25 SUMs  26 ELS  27 SUM Eq  28 Start  29 End   30 25%   31 Enr no  32 Trf SUMs  33 Trf ELS
  NAME1111   HC           00000000/4  00000000/4  13101989  1        300      XX30 8XX  INLM     06             11         08      10       0121052 0           07      00                0     0     0        0        0                   010.00   000.00  000.00     23042007  25052007 15052007 000000     10.00        000.00
  NAME2      BR           00000000/2  00000000/2  22071987  1        270      XX15 9XX  INLM     05             10         06      08       0109102 0           06      00                0     0     0        0        0                   007.50   000.00  000.00     11092006  30032007 31102006 000000     07.50        000.00
  NAME3333   RE           00000000/1  00000000/1  04061990  2        270      XX6 7XX   INAAC    42             10         07      03       0118092 0           06      00                0     0     0        0        0                   006.00   000.00  000.00     21082006  23042007 21102006 000000     06.00        000.00
  NAME4444   K            00000000/2  00000000/2  11051985  2        270      XX2 4XX   INMM     10             10         08      11       0109102 0           06      00                0     0     0        0        0                   001.00   000.00  000.00     11092006  22062007 15112006 000000     01.00        000.00
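Outside Access, the equivalent of this crosstab can be done in memory: read each line, slice it by the positions table, and keep the line length as a check. A rough sketch (in Python for brevity; the three fields shown are taken from the spec table above):

```python
# Positions from the spec table: (field name, P1, P2), 1-based inclusive.
FIELDS = [("01 Surname", 1, 20), ("02 Initials", 21, 22), ("05 DOB", 52, 59)]

def decode_file(lines):
    """Return one dict per line: the sliced fields plus the raw line length
    (the equivalent of the Len([Lines]) AS L column in the query)."""
    rows = []
    for line in lines:
        row = {name: line[p1 - 1:p2].rstrip() for name, p1, p2 in FIELDS}
        row["L"] = len(line)
        rows.append(row)
    return rows
```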

The VBA functions used to extract the data from the text string were very simple, and should be relatively easy to adapt to other languages:

  Public Function ExtractField(source, P1 As Integer, P2 As Integer) As String
  ' Returns a sub-string of the source string, trimmed to remove
  ' trailing spaces. Leading spaces remain, to assist in identifying
  ' field starts.
  ' Note: here P2 is the field LENGTH; ExtractText below takes an end position.

  Dim Result As String, l As Integer
  Result = ""
  If Not IsNull(source) Then
      l = Len(source)
      ' Only extract when the whole field fits within the line
      If (P1 <= l) And (l - P1 - P2 + 1 >= 0) Then
          Result = RTrim(Mid$(source, P1, P2))
      End If
  End If
  ExtractField = Result
  End Function

  Public Function ExtractText(source, ByVal P1 As Integer, ByVal P2 As Integer) As String
  ' In:   source string
  '       P1 = start position in source string
  '       P2 = end position in source string
  ' Out:  substring of source starting at P1 and ending at P2
  ' Uses ExtractField() to get the data from the source string
  '
    ExtractText = ExtractField(source, P1, P2 - P1 + 1)
  End Function
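For what it's worth, a rough port of the same pair to Python (a sketch only; as in the VBA, P2 is a length in extract_field and an end position in extract_text, and a field that does not fit the line yields an empty string):

```python
def extract_field(source, p1: int, p2: int) -> str:
    """Return p2 characters of source starting at 1-based position p1,
    right-trimmed; empty string if source is missing or the field
    does not fit within the line (same bounds check as the VBA)."""
    if source is None:
        return ""
    l = len(source)
    if p1 <= l and l - p1 - p2 + 1 >= 0:
        return source[p1 - 1:p1 - 1 + p2].rstrip()
    return ""

def extract_text(source, p1: int, p2: int) -> str:
    """Substring of source from 1-based start p1 to end p2 inclusive."""
    return extract_field(source, p1, p2 - p1 + 1)
```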
My main task was actually not so much decoding the text file back into its constituent fields as generating data in the fixed format in the first place. This was done by recordset processing of individual fields from the final queries, setting up fixed-width, fixed-format values to match the spec.

The downside of all the string extraction is that processing can be very slow. In my case the records were one to a text line, so processing more than 8000 lines, each with around 32 individually extracted fields, was relatively slow.

In your case the record structure appears to run across rows, with multiple records in a single row (hence only three rows in total in your text file). If that is the case you may have to pre-process the text lines to split out the individual records into an array or table before you begin extracting the fields themselves.
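That pre-processing pass is a one-liner in most languages. A sketch (the record length itself would come from your spec; the value used in the illustration is arbitrary):

```python
def split_records(line: str, record_len: int) -> list:
    """Split one long text line into consecutive fixed-length record
    strings; a trailing partial record, if any, is kept as-is."""
    return [line[i:i + record_len] for i in range(0, len(line), record_len)]
```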

I hope this brief summary of doing similar things outside PHP helps. I wish you every success in doing the work in PHP, or indeed in any other scripting language you may choose for the processing.

Regards

Stewart
Mar 19 '11 #2
code green
1,726 Expert 1GB
Informative and useful.
Thanks for the reply.
Mar 21 '11 #3
