Validating plain text input files

We have a system that receives plain text files from numerous external
sources (databases and others). Our system receives the information and
then processes it. Each line in the input file is a record. Our system
recognizes parts of each record by character position
(characters 1-6 = date, 7-9 = age, etc.), parses the record according to the
configuration it reads, and processes the data.
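A minimal Python sketch of the configuration-driven parsing described above, assuming the configuration boils down to a list of (field name, first column, last column) entries with 1-based inclusive positions. The date and age positions come from the post; the `name` field and the function name `parse_record` are hypothetical.

```python
# Hypothetical configuration: (field name, first column, last column),
# 1-based and inclusive. "date" and "age" follow the post; "name" is invented.
LAYOUT = [
    ("date", 1, 6),
    ("age", 7, 9),
    ("name", 10, 29),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of whitespace-stripped fields."""
    text = line.rstrip("\r\n")
    # Columns 1..N map to the Python slice [start - 1:end].
    # A line shorter than the layout yields empty strings, not an error,
    # which is one way bad input slips through unnoticed.
    return {name: text[start - 1:end].strip() for name, start, end in layout}


print(parse_record("050720042John Smith"))
# {'date': '050720', 'age': '042', 'name': 'John Smith'}
```

The comment about short lines is the crux of the problem in this thread: plain slicing never complains, so validation has to be added somewhere.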

Quite often the quality of the received data is not "good" and we
have problems. While it would be preferable to perform the detailed
validation inside the application, I'm keen to investigate alternatives
for a number of reasons.

This may sound like a bad idea, but would it make any sense whatsoever
to parse the file into XML format and then use native modules in
languages such as Perl to validate the XML file against the XML schema for
the input file? I know it would be much better to force the input
files to be XML, but that may be a bridge too far for now.
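The first half of that idea, turning each fixed-width line into XML, takes only a few lines with Python's standard `xml.etree.ElementTree`. This sketch assumes the same hypothetical layout as the post's example (date in columns 1-6, age in 7-9), and the function name `lines_to_xml` is invented. It records the source line number as an attribute so that schema errors can be traced back to the input. The standard library cannot validate against an XML Schema; a third-party package such as lxml (its `XMLSchema` class) would be needed for the second half.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: (field name, first column, last column), 1-based inclusive.
LAYOUT = [("date", 1, 6), ("age", 7, 9)]


def lines_to_xml(lines, layout=LAYOUT):
    """Wrap each fixed-width line in a <record> with one child element per field."""
    root = ET.Element("records")
    for lineno, line in enumerate(lines, start=1):
        # Keep the input line number so validation errors can point back to it.
        rec = ET.SubElement(root, "record", line=str(lineno))
        text = line.rstrip("\r\n")
        for name, start, end in layout:
            ET.SubElement(rec, name).text = text[start - 1:end].strip()
    return root


root = lines_to_xml(["050720042", "0507x0abc"])
print(ET.tostring(root, encoding="unicode"))
```

An XSD for this output might, for example, restrict `date` to six digits and `age` to a non-negative integer, in which case the second record would be rejected.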

I'm open to any suggestions. This idea came to me when I was running
around the park this evening and may be partly due to my dehydration
at the time!
Jul 20 '05 #1

"championsleeper" <st*****@yahoo.co.uk> wrote in message
news:10**************************@posting.google.com...

> This may sound like a bad idea, but would it make any sense whatsoever
> to parse the file into XML format and then use native modules in
> languages such as Perl to validate the XML file against the XML schema for
> the input file? I know it would be much better to force the input
> files to be XML, but that may be a bridge too far for now.

You might be interested in checking out my open source project
http://servingxml.sourceforge.net/, which supports this idea. Check out the
"countries" and "hot 1" examples in the Examples link. This software
supports input streams of flat file records that may have different formats,
represented by record types. The record type is used as the document
element, and each field is represented as an element.
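The record-type mapping described above can be illustrated in plain Python. This is not ServingXML's API or configuration format; it is a hypothetical sketch that assumes column 1 of each line holds a one-character type code (`H` and `D` here are invented) which selects both the field layout and the XML element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical record types keyed by a one-character code in column 1.
# Each maps to (element name, layout of (field, first col, last col), 1-based inclusive).
RECORD_TYPES = {
    "H": ("header", [("batch_id", 2, 7), ("date", 8, 13)]),
    "D": ("person", [("date", 2, 7), ("age", 8, 10)]),
}


def record_to_element(line):
    """Convert one line to an element named after its record type.

    Raises ValueError for an unknown type code, so bad input is reported
    rather than silently skipped.
    """
    text = line.rstrip("\r\n")
    code = text[:1]
    if code not in RECORD_TYPES:
        raise ValueError(f"unknown record type {code!r}")
    tag, layout = RECORD_TYPES[code]
    elem = ET.Element(tag)
    # Each field becomes a child element, as described in the reply.
    for name, start, end in layout:
        ET.SubElement(elem, name).text = text[start - 1:end].strip()
    return elem


for line in ["HB00001050720", "D050720042"]:
    print(ET.tostring(record_to_element(line), encoding="unicode"))
# <header><batch_id>B00001</batch_id><date>050720</date></header>
# <person><date>050720</date><age>042</age></person>
```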

Regards,
Daniel Parker
http://servingxml.sourceforge.net/
Jul 20 '05 #2
championsleeper wrote:
> ... each line in the input file is a record...
>
> This may sound like a bad idea, but would it make any sense whatsoever
> to parse the file into XML format and then use native modules in
> languages such as Perl to validate the XML file against the XML schema for
> the input file? I know it would be much better to force the input
> files to be XML, but that may be a bridge too far for now.


The input parsing can also be coded in Perl, so there is no need for an
intermediate XML representation. Besides that, if the input is
line-oriented, then AWK could be a better tool (simpler than Perl).
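As a rough illustration of validating the fields directly, with no XML step, here is the same idea in Python rather than Perl or AWK. The rules (six digits for the date, three digits for the age) and the `validate_lines` name are assumptions for the example; each failing field is reported with its line number so bad records can be traced back to the source file.

```python
import re

# Hypothetical rules: (field, first col, last col, pattern), 1-based inclusive.
# The date rule only checks for six digits; a stricter check could parse it.
RULES = [
    ("date", 1, 6, re.compile(r"\d{6}")),
    ("age", 7, 9, re.compile(r"\d{3}")),
]


def validate_lines(lines, rules=RULES):
    """Yield (line number, field, offending value) for each failed check."""
    min_len = max(end for _, _, end, _ in rules)
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # A truncated record is reported once instead of per field.
        if len(text) < min_len:
            yield lineno, "<record>", f"too short ({len(text)} < {min_len})"
            continue
        for name, start, end, pattern in rules:
            value = text[start - 1:end]
            if not pattern.fullmatch(value):
                yield lineno, name, value


for err in validate_lines(["050720042", "0507x0abc", "0507"]):
    print(err)
# (2, 'date', '0507x0')
# (2, 'age', 'abc')
# (3, '<record>', 'too short (4 < 9)')
```

In AWK the equivalent would be a `substr()` call and a regular-expression match per field, which is why a line-oriented tool suits this kind of input.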

Post your question and sample data on news:comp.lang.awk and you will
probably have useful suggestions and even sample code.
--
To reply by e-mail, please remove the extra dot
in the given address: m.collado -> mcollado

Jul 20 '05 #3
