Parsing a generic data file

Hi, maybe this is off-topic, but perhaps you can help. I'm looking for ideas
on how to parse a data file.

I don't know XML, but I know it represents data in text format.

I have a structured data file of the general form shown below. I don't have
any definition of the data. It looks hierarchical: token/data pairs
delimited by braces and square brackets.

I would like to parse this into some sort of data object(s) in C++ so
that I can gain programmatic access to the variables.

My app is C++, so the solution must be C++ too. It must also be very
lightweight and *very* fast, as I must decode multiple pages in real time.

Would adapting an XML parser to do this be a possible solution?

Any pointers/ideas/references/code snippets/observations appreciated.

TIA

Basic example showing the data structure (whitespace and carriage returns added
by me for clarity).

{

"teacher":{
"name":
"Mr Borat",
"age":
"35",
"Nationality":
"Kazakhstan"},
"Class":{
"Semester":
"Summer",
"Room":
null,
"Subject":
"Politics",
"Notes":
"We're happy, you happy?"},

"Students":
[
{
"Smith":
[{"First Name":"Mary","sex":"Female"}],
"Brown":
[{"First Name":"John","sex":"Male"}],
"Jackson":
[{"First Name":"Jackie","sex":"Female"}]
}
],
"Grades":
[
{
"Test":
[{"grade":"A","points":68},{"grade":"B","points":25},{"grade":"C","points":15}],
"Test":
[{"grade":"C","points":2},{"grade":"B","points":29},{"grade":"A","points":55}],
"Test":
[{"grade":"C","points":2},{"grade":"A","points":72},{"grade":"A","points":65}]
}
]

}




Dec 14 '07 #1
"Jasper" <no*******@dontmail.com> wrote in message
news:2p*********************@pipex.net...
> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
> ideas on how to parse a data file.

Looks like JSON to me; search for a JSON library.
JSON (JavaScript Object Notation) is a text format that represents objects
using JavaScript literal syntax. It is commonly used for passing data to
clients that run JavaScript.
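
To make the suggestion concrete, here is a minimal sketch in JavaScript, the language JSON comes from. It assumes a modern engine with the built-in JSON.parse and uses a trimmed copy of the sample data; the variable names are illustrative only. A C++ program would need a third-party JSON library for the same job, since the C++ standard library has no JSON support.

```javascript
// A trimmed copy of the sample data from the original post.
const text = `{
  "teacher": {"name": "Mr Borat", "age": "35", "Nationality": "Kazakhstan"},
  "Class": {"Semester": "Summer", "Room": null, "Subject": "Politics"}
}`;

// JSON.parse turns the text into nested objects that can be walked by name.
const data = JSON.parse(text);

const teacherName = data.teacher.name;    // a string
const room = data.Class.Room;             // JSON null becomes JavaScript null
const ageType = typeof data.teacher.age;  // the source quotes the age, so it stays a string

console.log(teacherName, room, ageType);
```

Note that the quoted "35" is not converted to a number; whoever consumes the data has to decide whether to convert it.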

--

Joe Fawcett (MVP - XML)
http://joe.fawcett.name

Dec 14 '07 #2

"msnews.microsoft.com" <jo********@newsgroup.nospam> wrote in message
news:uC**************@TK2MSFTNGP03.phx.gbl...
> "Jasper" <no*******@dontmail.com> wrote in message
> news:2p*********************@pipex.net...
>> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
>> ideas on how to parse a data file.
>
> Looks like JSON to me, search for a JSON library.
> JSON is a way of representing objects using string literals that is used
> for passing information to clients that use JavaScript.

Does it? Makes sense if that's true. I was sure it fit some sort of "web
format" but I didn't know which.
I presume there must be some sort of C++ code available to parse it out.

I'll take a look.

Thanks
Dec 14 '07 #3

"Jasper" <no*******@dontmail.com> wrote in message
news:2p*********************@pipex.net...
> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
> ideas on how to parse a data file.

Can't you create arrays of C++ structs or classes to hold this data? As for
parsing it, if you don't want to write your own parser, there must be plenty
of libraries out there you could use out of the box, no?
Efficiency will vary, but I can't see why any decent commercial product, if
not your own code, would not be *very* fast.

I guess I'm not seeing why you would use XML or XML tools as an intermediate
step when the data is not coming at you in XML and you've given no
indication that you need to output it as XML for other processes to consume.

Dec 14 '07 #4
"Jasper" <no*******@dontmail.com> wrote in message
news:Or******************************@pipex.net...
>
> "msnews.microsoft.com" <jo********@newsgroup.nospam> wrote in message
> news:uC**************@TK2MSFTNGP03.phx.gbl...
>> "Jasper" <no*******@dontmail.com> wrote in message
>> news:2p*********************@pipex.net...
>>> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
>>> ideas on how to parse a data file.
>>
>> Looks like JSON to me, search for a JSON library.
>> JSON is a way of representing objects using string literals that is used
>> for passing information to clients that use JavaScript.
>
> Does it? Makes sense if that's true. I was sure it fit some sort of "web
> format" but I didn't know which.
> I presume there must be some sort of C++ code available to parse it out.

It is JSON. You would use the JavaScript eval function to parse it. The
returned object would then have a hierarchy you could pull data from, e.g.:

var o = eval("(" + text + ")"); // text holds the JSON shown above
var x = o.Class.Subject

x == "Politics" // will be true

However, the structure is somewhat suspect.

The Students array contains only one object, on which all the students are
placed. Each student has their last name as the attribute ID for their
entry (what happens if the class is attended by more than one Smith?).
Each entry is in turn an array containing only one object.

The Grades array suffers from the same problem: inappropriate use of
{ } causes the array to contain only one object, and in this case the same
identifier "Test" is used multiple times, so it is redefined each time and
ends up containing only the last entry.
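
Both problems can be seen with a small sketch. It uses the built-in JSON.parse of a modern JavaScript engine rather than eval, and a trimmed fragment of the original data that keeps the same nesting and the repeated "Test" name; the variable names are illustrative only.

```javascript
// Fragment of the original data, keeping its nesting and its repeated "Test" name.
const text = `{
  "Students": [ { "Smith": [ {"First Name": "Mary", "sex": "Female"} ] } ],
  "Grades": [ {
    "Test": [ {"grade": "A", "points": 68} ],
    "Test": [ {"grade": "C", "points": 2} ]
  } ]
}`;

const data = JSON.parse(text);

// The one-element arrays force an extra [0] at two levels just to reach a student.
const mary = data.Students[0].Smith[0]["First Name"];

// Only one "Test" member survives: JSON.parse keeps the last value given for a repeated name.
const testKeys = Object.keys(data.Grades[0]);
const survivingGrade = data.Grades[0].Test[0].grade;

console.log(mary, testKeys, survivingGrade);
```

The first "Test" array is silently discarded, so a consumer has no way to tell that data was lost.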

Here is a cleaner version (although I'm not entirely happy with the
identifiers "Last Name" and "First Name" containing a space, it is legal):

{

"teacher":{
"name": "Mr Borat",
"age": 35,
"Nationality": "Kazakhstan"
},
"Class":{
"Semester": "Summer",
"Room": null,
"Subject": "Politics",
"Notes": "We're happy, you happy?"
},

"Students":
[
{"Last Name":"Smith",
"First Name":"Mary","sex":"Female"},
{"Last Name":"Brown",
"First Name":"John","sex":"Male"},
{"Last Name":"Jackson",
"First Name":"Jackie","sex":"Female"}
],
"Grades":
[
{"Test":"Name of a Test",
"Points": {"A":68,"B":25,"C":15}},
{"Test":"Name of a different test",
"Points": {"A":55,"B":29,"C":2}},
{"Test": "Name of yet another test",
"Points": {"A":72,"B":65,"C":2}}
]

}
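
As a rough illustration of why the flat layout is easier to work with, the sketch below looks students up by last name. The helper firstNamesFor and the second Smith are hypothetical, added only to show that two students sharing a surname cause no trouble when each student is its own record.

```javascript
// Students as a flat array of records, following the cleaner layout.
// The second Smith is hypothetical, added to show that a shared surname is harmless here.
const students = [
  { "Last Name": "Smith", "First Name": "Mary", "sex": "Female" },
  { "Last Name": "Brown", "First Name": "John", "sex": "Male" },
  { "Last Name": "Smith", "First Name": "Alan", "sex": "Male" }
];

// Hypothetical helper: first names of every student with the given last name.
function firstNamesFor(lastName) {
  return students
    .filter((s) => s["Last Name"] === lastName)
    .map((s) => s["First Name"]);
}

// Grades follow the same idea: one record per test, with named point thresholds.
const grades = [
  { "Test": "Name of a Test", "Points": { "A": 68, "B": 25, "C": 15 } }
];

const smiths = firstNamesFor("Smith");
const pointsForA = grades[0].Points.A;

console.log(smiths, pointsForA);
```

Names containing a space, such as "Last Name", have to be read with bracket notation; dot notation only works for names like Points.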
--
Anthony Jones - MVP ASP/ASP.NET
Dec 14 '07 #5

<dn********@gmail.com> wrote in message
news:da**********************************@e23g2000prf.googlegroups.com...
>> The question arises as to whether the output XML should represent the
>> data that would be available in the set of generated objects had the
>> JSON been eval'd?
>>
>> Perhaps the Grades section should look like this:-
>>
>> <Grades>
>>   <Test>
>>     <grade>C</grade>
>>     <points>2</points>
>>   </Test>
>>   <Test>
>>     <grade>A</grade>
>>     <points>72</points>
>>   </Test>
>>   <Test>
>>     <grade>A</grade>
>>     <points>65</points>
>>   </Test>
>> </Grades>
>>
>> since only this data would appear in an eval of the JSON?
>
> The answer is clearly: No.

Oh, I thought the raison d'être behind JSON was that a data structure could
be serialised to a string that could be passed to JavaScript and
re-assembled easily by using the eval function.

> It is the definition of JSON (and the convertors from XML to JSON use
> this) that a sequence of repeating xml elements with the same name are
> represented as an ARRAY in JSON.

Is there a spec? Where does it say that?

> We don't care what a JScript interpreter would do with the data, but
> we must implement a truthful and lossless conversion. Not producing
> all <test /> and <grade /> elements results in data loss.

Agreed. I'm willing to be shown wrong on this, but if you're right then JSON
is bust and pointless.
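
The convention claimed above, where repeated sibling elements with the same name are collected into an array, could be sketched as follows. This is only an illustration of that convention, not something taken from a JSON specification or from any particular converter; childrenToObject and the [name, value] pair input are hypothetical stand-ins for whatever an XML parser would report.

```javascript
// Hypothetical helper: fold a list of [name, value] child pairs into one object,
// promoting a name to an array as soon as it occurs more than once.
function childrenToObject(children) {
  const result = {};
  for (const [name, value] of children) {
    if (!Object.prototype.hasOwnProperty.call(result, name)) {
      result[name] = value;                  // first occurrence: store the value directly
    } else if (Array.isArray(result[name])) {
      result[name].push(value);              // third and later occurrences: append
    } else {
      result[name] = [result[name], value];  // second occurrence: promote to an array
    }
  }
  return result;
}

// Three <Test> children, as in the Grades example.
const grades = childrenToObject([
  ["Test", { grade: "C", points: 2 }],
  ["Test", { grade: "A", points: 72 }],
  ["Test", { grade: "A", points: 65 }]
]);

// A name that occurs once stays a plain value.
const single = childrenToObject([["Room", "12B"]]);

console.log(JSON.stringify(grades), JSON.stringify(single));
```

No data is lost this way, but the shape of the result depends on how many times a name occurs, so a consumer has to handle both a plain value and an array. The sketch also cannot tell a promoted array apart from a value that was an array to begin with.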

--
Anthony Jones - MVP ASP/ASP.NET
Dec 22 '07 #6
<dn********@gmail.com> wrote in message
news:f2**********************************@a35g2000prf.googlegroups.com...
> I also think that a more appropriate JSON representation than:
>
> "Grades":
> [
> {
> "Test":
> [{"grade":"A","points":68},{"grade":"B","points":25},
> {"grade":"C","points":15}],
> "Test":
> [{"grade":"C","points":2},{"grade":"B","points":29},
> {"grade":"A","points":55}],
> "Test":
> [{"grade":"C","points":2},{"grade":"A","points":72},
> {"grade":"A","points":65}]
> }
> ]
>
> should have been:
>
> "Grades":
>
> {
> "Test":
> [
> {"grade":"A","points":68,"grade":"B","points":
> 25,"grade":"C","points":15},
>
> {"grade":"C","points":2,"grade":"B","points":29,
> "grade":"A","points":55},
>
> {"grade":"C","points":2,"grade":"A","points":72,
> "grade":"A","points":65}
> ]
> }

We're just guessing at the intent, but that appears to be an object called
Grades that contains just one member, an array called Test, containing what
appear to be the grades required to pass each test. That seems a little
convoluted, and how is each test identified? By ordinal position?

> Also, instead of:
>
> "Students":
> [
> {
> "Smith":
> [{"First Name":"Mary","sex":"Female"}],
> "Brown":
> [{"1First Name":"John","sex":"Male"}],
> "Jackson":
> [{"2First Name":"Jackie","sex":"Female"}]
> }
> ],
>
> it is better to have just:
>
> "Students":
> {
> "Smith":
> {"First Name":"Mary","sex":"Female"},
> "Brown":
> {"1First Name":"John","sex":"Male"},
> "Jackson":
> {"2First Name":"Jackie","sex":"Female"}
> }
> ,

And if you have two students with the last name Smith? Smith magically
becomes an array?

> Maybe the original data was produced by a faulty XML --> JSON
> convertor.

It's difficult to make sense of what appears to be faulty both as JSON and as
a logical structure.

--
Anthony Jones - MVP ASP/ASP.NET
Dec 22 '07 #7
