Although my experience does not relate to web-based work in PHP I worked with fixed-width text files when submitting funding returns to a student funding council. I used VBA and Access to process them. Other databases and programming languages would have been equally as useful.
Although I used Access tables to store the data then queries to process them it is just as possible to use arrays in memory processed using bespoke code instead. I did as much as I could in database queries, to make it as transparent as possible what was going on (for maintenance purposes). If all of the processing is done in code it is more difficult for other people who might follow on from ourselves to maintain, as code is another step removed from the experience of non-programmers and more difficult to understand (at least initially) than an SQL query may be.
The funding returns were in two fixed-width text files, one representing course information (known as FES 1) and the other student data (FES 2), including the number of fundable units (student units of measure as they were called) we were claiming for each student. To process these files I read them into an Access table containing a single column called [Lines] that contained the text read from the text file to the end-of-line in each row.
An anonymised example of the text lines from the FES 2 table is shown below. The file has an internal structure of header rows followed by detail rows, repeated for each programme:
- IN130121FES51605530013
-
NAMEA DW00000011/1 020091355241019881270XX22 1XXINLM 22101708 08 110920060000000018.00000.00000.00014082006250520072410200606303420.00000.000
-
NAMEBBBB JM00000015/1 010050006270519881270XX51 9XXINLM 22101708 01 110920060000000018.00000.00000.00014082006250520072410200606302120.00000.000
-
NAMECC JW00000000/1 030066561071119891270XX2 4XX INLM 22101708 01 110920060000000018.00000.00000.00014082006250520072410200606302720.00000.000
-
NAMEDDDDD M 00000000/1 030049071290719891270XX49 9XXINLM 22101708 01 110920060000000018.00000.00000.00014082006250520072410200606302620.00000.000
As with your example there was a detailed specification for each component field of the file, and for the header/detail structure. The field start and end character numbers and field names were stored in a table. Some of the fields listed are shown below:
- Field Name P1 P2
-
01 Surname 1 20
-
02 Initials 21 22
-
03 SQA 23 42
-
04 ID 23 42
-
05 DOB 52 59
-
06 Gend 60 60
-
07 Home 61 63
-
08 PCode 64 71
-
09 Cmps 72 77
In my case I used an unjoined Access Crosstab query to extract the field structure from the student data text file and the specification file for checking purposes. The simple query which extracts the fields using the unjoined crosstab and a sample of the consequent decoded data is shown below:
- TRANSFORM First(ExtractText([Lines],[p1],[p2])) AS [Text]
-
SELECT [FES 2 Text File].Lines, Len([Lines]) AS L
-
FROM [FES 2 Text File], [FES 2 Table Fields]
-
GROUP BY [FES 2 Text File].Lines, Len([Lines])
-
PIVOT [FES 2 Table Fields].[Field Name];
(The use of function First in the TRANSFORM statement is just a dummy in this case to allow the crosstab to take place, as Access will not pivot a value unless some kind of aggregate function is used on the data.)
- 01 Surname 02 Initials 03 SQA 04 ID 05 DOB 06 Gend 07 Home 08 PCode 09 Cmps 10 Src of Fin 11 Ethnic 12 MOA 13 Cat 14 Schl 15 Dis All 16 Dis 17 Outcm 18 QED 19 H 20 A 21 SCE1 22 SCE2 23 Units 24 Other 25 SUMs 26 ELS 27 SUM Eq 28 Start 29 End 30 25% 31 Enr no 32 Trf SUMs 33 Trf ELS
-
NAME1111 HC 00000000/4 00000000/4 13101989 1 300 XX30 8XX INLM 06 11 08 10 0121052 0 07 00 0 0 0 0 0 010.00 000.00 000.00 23042007 25052007 15052007 000000 10.00 000.00
-
NAME2 BR 00000000/2 00000000/2 22071987 1 270 XX15 9XX INLM 05 10 06 08 0109102 0 06 00 0 0 0 0 0 007.50 000.00 000.00 11092006 30032007 31102006 000000 07.50 000.00
-
NAME3333 RE 00000000/1 00000000/1 04061990 2 270 XX6 7XX INAAC 42 10 07 03 0118092 0 06 00 0 0 0 0 0 006.00 000.00 000.00 21082006 23042007 21102006 000000 06.00 000.00
-
NAME4444 K 00000000/2 00000000/2 11051985 2 270 XX2 4XX INMM 10 10 08 11 0109102 0 06 00 0 0 0 0 0 001.00 000.00 000.00 11092006 22062007 15112006 000000 01.00 000.00
VBA Functions to extract the data from the text string were very simple, and should be adaptable to other languages relatively easily:
- Public Function ExtractField(source, P1 As Integer, P2 As Integer) As String
-
' Returns a sub-string of the source string, trimmed to remove
-
' trailing spaces. Leading spaces remain to assist in identifying field
-
' starts
-
-
Dim Result As String, l As Integer
-
Result = ""
-
If Not IsNull(source) Then
-
l = Len(source)
-
If (P1 <= l) And (l - P1 - P2 + 1 >= 0) Then
-
Result = RTrim(Mid$(source, P1, P2))
-
End If
-
End If
-
ExtractField = Result
-
End Function
-
-
Public Function ExtractText(source, ByVal P1 As Integer, ByVal P2 As Integer) As String
-
' In: source string
-
' p1 = start position in source string
-
' p2 = end position in source string
-
' Out: substring of source starting at p1 and ending at p2
-
' Uses extractfield() to get the data from the source string
-
'
-
ExtractText = ExtractField(source, P1, P2 - P1 + 1)
-
End Function
My main task was actually not so much the decoding of the text file back into its constituent fields but to generate data in the fixed format. This was done by recordset processing of individual fields from final queries, setting up fixed-width and fixed-format values as appropriate to match the spec.
The downside to all the string extracts is that processing can be very slow. In my case the records were one to a text line, so processing more than 8000 lines each containing around 32 individually-extracted fields was relatively slow to accomplish.
In your case you have a record structure which appears to go across rows with multiple records in a single row (hence only three rows in total in your text file). If that is the case you may have to pre-process the text lines to extract the individual records into an array or table, say, before you begin extracting the fields as such.
Hope this brief summary of experience doing similar things outside of PHP itself helps. I wish you all success in doing the work in PHP, or indeed for developing in any other scripting language you may want to use to do the processing for you.
Regards
Stewart