Bytes | Software Development & Data Engineering Community

Efficiently Parsing Data

Hi,

I have multiple data files which need parsing in realtime so high
performance is *crucial*.

I don't have a format definition, but from what I can see there is a
hierarchy of data.
Each data field is named thus: <"name":> (the <> are mine).
The data can be quoted text, unquoted text, or a composite hierarchical field.
Each name/data pair is terminated by a comma unless it is the last in the
group.

A comma can also appear within a quoted text data field.

The hierarchical tokens are open and close braces <{}> and open and
close square brackets <[]>.

That's all there is to it :)
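The rules above are enough for a single-pass tokenizer: braces, brackets, colons and commas are one-character tokens, quoted text runs to the next unescaped quote (so a comma inside it is just data), and anything else is unquoted text up to the next delimiter. A minimal C++ sketch; `Tok`, `Token`, `isDelim` and `tokenize` are hypothetical names, and it assumes that a backslash inside quoted text escapes the next character, as in JSON, keeping the escape undecoded:

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Token kinds for the format described above (hypothetical names).
enum class Tok { LBrace, RBrace, LBracket, RBracket, Colon, Comma, String, Bare };

struct Token {
    Tok kind;
    std::string text;  // contents for String (without quotes) and Bare tokens
};

// True for characters that end an unquoted word.
bool isDelim(char c) {
    switch (c) {
    case '{': case '}': case '[': case ']': case ':': case ',': case '"':
        return true;
    default:
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
}

// Single pass over the input. A comma inside quotes belongs to the string;
// a backslash inside quotes skips the next character (escapes are kept as-is).
std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        char c = in[i];
        switch (c) {
        case '{': out.push_back({Tok::LBrace, ""});   ++i; break;
        case '}': out.push_back({Tok::RBrace, ""});   ++i; break;
        case '[': out.push_back({Tok::LBracket, ""}); ++i; break;
        case ']': out.push_back({Tok::RBracket, ""}); ++i; break;
        case ':': out.push_back({Tok::Colon, ""});    ++i; break;
        case ',': out.push_back({Tok::Comma, ""});    ++i; break;
        case '"': {
            std::size_t start = ++i;
            while (i < in.size() && in[i] != '"') i += (in[i] == '\\') ? 2 : 1;
            if (i >= in.size()) throw std::runtime_error("unterminated string");
            out.push_back({Tok::String, in.substr(start, i - start)});
            ++i;  // consume the closing quote
            break;
        }
        default:
            if (std::isspace(static_cast<unsigned char>(c))) { ++i; break; }
            {
                std::size_t start = i;  // unquoted text such as null, A or 68
                while (i < in.size() && !isDelim(in[i])) ++i;
                out.push_back({Tok::Bare, in.substr(start, i - start)});
            }
        }
    }
    return out;
}
```

Each token copies its text into a `std::string`; if profiling shows those allocations matter, storing offsets into the input buffer instead is a common refinement.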

The data describes, say, a school class, so we have a rigid set of data
groups.
E.g. we have data describing the teacher, data describing the class taken, and
a repeating group describing each kid and their grades.

So it would be nice to be able to parse this data out into appropriate
structures.

Below is a snippet of dummy data (in reality there is much more). I have
added the spacing and carriage returns for clarity. The real data has no
whitespace. There may be a variable number of parameters (I think), so it
would be useful to be able to identify and potentially store each variable's
name with its data value.
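Because the groups are fixed (teacher, class, students) while the fields inside a group may vary, one option is to collect each group's name/value pairs first and then copy the names you recognise into a typed structure, ignoring the rest. A sketch under that assumption; `Teacher`, `Fields` and `toTeacher` are hypothetical names, and the field names are taken from the sample data below:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Typed view of the "teacher" group (hypothetical).
struct Teacher {
    std::string name;
    int age = 0;
    std::string nationality;
};

// Name/value pairs as they were found in one group, in input order.
using Fields = std::vector<std::pair<std::string, std::string>>;

// Copies the recognised fields into a Teacher; unknown names are skipped,
// so extra or missing parameters do not break the mapping.
Teacher toTeacher(const Fields& fields) {
    Teacher t;
    for (const auto& f : fields) {
        if (f.first == "name") t.name = f.second;
        else if (f.first == "age") t.age = std::stoi(f.second);  // throws on non-numeric text
        else if (f.first == "Nationality") t.nationality = f.second;
    }
    return t;
}
```

Keeping the raw `Fields` around as well covers the wish to store each variable name alongside its value.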

Anyone got any ideas/code snips/references on the best, fastest (at run
time) way to go about it? A tight, pure C++ solution (with or without the
STL) would be needed.

Thanks in advance for any help
{
  "teacher":{
    "name":"Mr Borat",
    "age":"35",
    "Nationality":"Kazakhstan"},
  "Class":{
    "Semester":"Summer",
    "Room":null,
    "Subject":"Politics",
    "Notes":"We're happy, you happy?"},

  "Students":
  [
    {
      "Smith":
        [{"First Name":"Mary","sex":"Female"}],
      "Brown":
        [{"First Name":"John","sex":"Male"}],
      "Jackson":
        [{"First Name":"Jackie","sex":"Female"}]
    }
  ],
  "Grades":
  [
    {
      "Test":
        [{"grade":A,"points":68},{"grade":B,"points":25},{"grade":C,"points":15}],
      "Test":
        [{"grade":C,"points":2},{"grade":B,"points":29},{"grade":A,"points":55}],
      "Test":
        [{"grade":C,"points":2},{"grade":A,"points":72},{"grade":A,"points":65}]
    }
  ]

}



Dec 14 '07 #1
Jasper wrote:
> I have multiple data files which need parsing in realtime so high
> performance is *crucial*.
>
> I don't have a format definition, but from what I can see there is a
> hierarchy of data.

You better come up with a definition, otherwise you're programming
without a spec. Even if you are reverse-engineering, you need to
begin by writing a specification. A good spec gets you halfway
to the solution.

Once you have the definition, you can write a flex/yacc grammar for
it, and then you generate the code that handles that file. Simple
as that.
[..]
> Anyone got any ideas/code snips/references on the best, fastest (at
> run time) way to go about it? A tight, pure C++ solution (with or
> without the STL) would be needed.

I can only say, good luck with your homework!

V
Dec 14 '07 #2
> You better come up with a definition, otherwise you're programming
> without a spec. Even if you are reverse-engineering, you need to
> begin by writing a specification. A good spec gets you halfway
> to the solution.

The only thing that defines the data, as far as I can see, is its
hierarchical structure as given by the braces and square brackets.
The token names and data values are irrelevant to this.
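If the structure really is defined only by the braces and square brackets, a first sanity check is that they nest correctly, ignoring any that appear inside quoted text. A minimal stack-based sketch; `nestingDepth` is a hypothetical name, and it assumes JSON-style backslash escapes inside quoted text:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the maximum nesting depth of {} and [] in `in`, or -1 if they are
// unbalanced or mismatched (e.g. "{]"). Brackets inside quoted text are
// ignored; a backslash inside quotes skips the next character.
int nestingDepth(const std::string& in) {
    std::vector<char> open;  // stack of brackets that are still open
    int maxDepth = 0;
    bool inString = false;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (inString) {
            if (c == '\\') ++i;                 // skip the escaped character
            else if (c == '"') inString = false;
            continue;
        }
        switch (c) {
        case '"':
            inString = true;
            break;
        case '{': case '[':
            open.push_back(c);
            if (static_cast<int>(open.size()) > maxDepth) maxDepth = static_cast<int>(open.size());
            break;
        case '}': case ']':
            if (open.empty() || open.back() != (c == '}' ? '{' : '[')) return -1;
            open.pop_back();
            break;
        default:
            break;
        }
    }
    return (open.empty() && !inString) ? maxDepth : -1;
}
```

This checks shape only; it says nothing about whether names and values are in the right places.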
> Once you have the definition, you can write a flex/yacc grammar for
> it, and then you generate the code that handles that file. Simple
> as that.

I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar with
it, but it seems a bit of overkill for what I want.
I wondered if I could adapt a lightweight XML parser, but all I need is a
sort of DOM based on the "{}" and "[]"
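A "DOM" for this format needs only a handful of node kinds: null, text, object ({}) and array ([]). A sketch of such a node type, assuming C++17 (which allows a `std::vector` of a still-incomplete type); `Node` and `find` are hypothetical names. Object members are kept in a vector rather than a `std::map` because the sample repeats the name "Test" within one group, and a map would keep only one of them:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A minimal DOM node for the {}/[] format (hypothetical design).
struct Node {
    enum Kind { Null, Text, Object, Array } kind = Null;
    std::string text;                                   // used when kind == Text
    std::vector<std::pair<std::string, Node>> members;  // used when kind == Object
    std::vector<Node> items;                            // used when kind == Array

    // Returns the first member with the given name, or nullptr.
    const Node* find(const std::string& name) const {
        for (const auto& m : members)
            if (m.first == name) return &m.second;
        return nullptr;
    }
};
```

A linear `find` is fine for groups of a few fields; repeated names can be reached by iterating `members` directly.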

> [..]
>> Anyone got any ideas/code snips/references on the best, fastest (at
>> run time) way to go about it? A tight, pure C++ solution (with or
>> without the STL) would be needed.
>
> I can only say, good luck with your homework!
Homework? What's that supposed to mean?

Dec 14 '07 #3
Jasper wrote:
>> Once you have the definition, you can write a flex/yacc grammar for
>> it, and then you generate the code that handles that file. Simple
>> as that.
>
> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar with
> it, but it seems a bit of overkill for what I want.
> I wondered if I could adapt a lightweight XML parser, but all I need is a
> sort of DOM based on the "{}" and "[]"

Maybe this can help you:
http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html

>> [..]
>>> Anyone got any ideas/code snips/references on the best, fastest (at
>>> run time) way to go about it? A tight, pure C++ solution (with or
>>> without the STL) would be needed.
>> I can only say, good luck with your homework!
>
> Homework? What's that supposed to mean?

Here is a definition:
http://en.wikipedia.org/wiki/Homework
Dec 14 '07 #4

"anon" <an**@no.no> wrote in message
news:fj**********@el-srv04-CHE.srvnet.eastlink.de...
> Jasper wrote:
>>> Once you have the definition, you can write a flex/yacc grammar for
>>> it, and then you generate the code that handles that file. Simple
>>> as that.
>>
>> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar
>> with it, but it seems a bit of overkill for what I want.
>> I wondered if I could adapt a lightweight XML parser, but all I need is
>> a sort of DOM based on the "{}" and "[]"
>
> Maybe this can help you:
> http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html

Thanks, I'll take a look. I assume you know about the tool and XML; I
don't.
Will I have to rewrite the parser to handle the brackets, or are they part of
the XML spec (in some way)?
(I'm just asking for a "quickstart".)

>> Homework? What's that supposed to mean?
>
> Here is a definition:
> http://en.wikipedia.org/wiki/Homework

Oh, that's what it is. Thanks.
Dec 14 '07 #5
Jasper wrote:
> "anon" <an**@no.no> wrote in message
> news:fj**********@el-srv04-CHE.srvnet.eastlink.de...
>> Jasper wrote:
>>>> Once you have the definition, you can write a flex/yacc grammar for
>>>> it, and then you generate the code that handles that file. Simple
>>>> as that.
>>> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar
>>> with it, but it seems a bit of overkill for what I want.
>>> I wondered if I could adapt a lightweight XML parser, but all I need is
>>> a sort of DOM based on the "{}" and "[]"
>> Maybe this can help you:
>> http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
>
> Thanks, I'll take a look. I assume you know about the tool and XML; I
> don't.

There is a tutorial explaining the XML format and the library.

> Will I have to rewrite the parser to handle the brackets, or are they part of
> the XML spec (in some way)?
> (I'm just asking for a "quickstart".)

I don't know about brackets. Maybe
Dec 14 '07 #6
Hi Jasper,

You might want to have a look at boost::serialization:

http://boost.org/libs/serialization/doc/index.html

It is a bit tricky to get to grips with at first.
But once understood, it is straightforward to use.

rgds!

Frank
Dec 14 '07 #7
Jasper wrote:
> I wondered if I could adapt a lightweight XML parser, but all I need is a

Wrong turn. I know, today it's XML for any conceivable I/O situation, all
hyped up so that people hardly know that software can actually exist
that doesn't use XML, but still. You said you want something fast (and
probably simple).

> sort of DOM based on the "{}" and "[]"

You can easily write a simple recursive-descent parser for that. A
recursive-descent parser is just another name for the intuitive solution.
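A sketch of that intuitive solution: one C++ function per grammar rule, each consuming its own piece of input and calling the others for nested {} and [] groups. It assumes the grammar in the comment (inferred from the sample, not from a specification), JSON-style backslash escapes kept undecoded, and that the unquoted word `null` means "no value"; `Node` and `Parser` are hypothetical names. It works directly on characters, so no separate tokenizer pass is needed:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree type. Objects keep members in input order and allow
// repeated names (the sample repeats "Test").
struct Node {
    enum Kind { Null, Text, Object, Array } kind = Null;
    std::string text;                                   // Text value
    std::vector<std::pair<std::string, Node>> members;  // Object members
    std::vector<Node> items;                            // Array elements
};

// One member function per rule of this inferred grammar:
//   value  := object | array | string | word
//   object := '{' [ string ':' value { ',' string ':' value } ] '}'
//   array  := '[' [ value { ',' value } ] ']'
class Parser {
public:
    explicit Parser(std::string s) : s_(std::move(s)) {}

    Node parse() {
        Node n = value();
        skipWs();
        if (pos_ != s_.size()) fail("trailing characters");
        return n;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    [[noreturn]] void fail(const std::string& what) const {
        throw std::runtime_error(what + " at offset " + std::to_string(pos_));
    }
    void skipWs() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Next non-whitespace character, or '\0' at end of input.
    char peek() { skipWs(); return pos_ < s_.size() ? s_[pos_] : '\0'; }
    // Consumes c if it is the next non-whitespace character.
    bool consume(char c) { if (peek() != c) return false; ++pos_; return true; }
    void expect(char c) {
        if (!consume(c)) fail(std::string("expected '") + c + "'");
    }
    static bool endsWord(char c) {
        return std::strchr("{}[]:,\"", c) != nullptr || std::isspace(static_cast<unsigned char>(c));
    }

    Node value() {
        Node n;
        switch (peek()) {
        case '{': return object();
        case '[': return array();
        case '"': n.kind = Node::Text; n.text = quoted(); return n;
        default: {
            std::size_t start = pos_;  // unquoted word such as null, A or 68
            while (pos_ < s_.size() && !endsWord(s_[pos_])) ++pos_;
            if (pos_ == start) fail("expected a value");
            n.text = s_.substr(start, pos_ - start);
            n.kind = (n.text == "null") ? Node::Null : Node::Text;
            return n;
        }
        }
    }

    // Raw characters between the quotes; backslash escapes are skipped, not decoded.
    std::string quoted() {
        expect('"');
        std::size_t start = pos_;
        while (pos_ < s_.size() && s_[pos_] != '"') pos_ += (s_[pos_] == '\\') ? 2 : 1;
        if (pos_ >= s_.size()) fail("unterminated string");
        ++pos_;  // closing quote
        return s_.substr(start, pos_ - 1 - start);
    }

    Node object() {
        Node n; n.kind = Node::Object;
        expect('{');
        if (consume('}')) return n;  // empty object
        do {
            std::string name = quoted();
            expect(':');
            n.members.emplace_back(std::move(name), value());
        } while (consume(','));
        expect('}');
        return n;
    }

    Node array() {
        Node n; n.kind = Node::Array;
        expect('[');
        if (consume(']')) return n;  // empty array
        do n.items.push_back(value()); while (consume(','));
        expect(']');
        return n;
    }
};
```

Recursion depth equals the nesting depth of the data, which is small for records like the sample; for untrusted input, a depth limit would be a sensible addition.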
Dec 14 '07 #8

"Matthias Buelow" <mk*@incubus.de> wrote in message
news:5s*************@mid.dfncis.de...
> Jasper wrote:
>> I wondered if I could adapt a lightweight XML parser, but all I need is a
>
> Wrong turn. I know, today it's XML for any conceivable I/O situation, all
> hyped up so that people hardly know that software can actually exist
> that doesn't use XML, but still. You said you want something fast (and
> probably simple).

Actually, I have just discovered that the format of the data is JSON.
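Knowing that the data is JSON pins down details the thread could only guess at; for instance, quoted text may contain backslash escapes (`\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX`, per the JSON specification) that must be decoded before the text is used. A minimal decoder sketch for the characters between a string's quotes; `unescapeJson` and `appendUtf8` are hypothetical names, the output is UTF-8, and `\u` surrogate pairs (characters outside the Basic Multilingual Plane) are deliberately not combined:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Appends code point cp (assumed to be <= 0xFFFF) to out as UTF-8.
void appendUtf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes JSON escapes in the text between a string's quotes.
// Limitation: \uD800-\uDFFF surrogate pairs are not combined.
std::string unescapeJson(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '\\') { out += raw[i]; continue; }
        if (++i >= raw.size()) throw std::runtime_error("dangling backslash");
        switch (raw[i]) {
        case '"':  out += '"';  break;
        case '\\': out += '\\'; break;
        case '/':  out += '/';  break;
        case 'b':  out += '\b'; break;
        case 'f':  out += '\f'; break;
        case 'n':  out += '\n'; break;
        case 'r':  out += '\r'; break;
        case 't':  out += '\t'; break;
        case 'u': {
            if (i + 4 >= raw.size()) throw std::runtime_error("short \\u escape");
            unsigned cp = 0;
            for (std::size_t k = 1; k <= 4; ++k) {  // exactly four hex digits
                char h = raw[i + k];
                cp <<= 4;
                if (h >= '0' && h <= '9') cp |= static_cast<unsigned>(h - '0');
                else if (h >= 'a' && h <= 'f') cp |= static_cast<unsigned>(h - 'a' + 10);
                else if (h >= 'A' && h <= 'F') cp |= static_cast<unsigned>(h - 'A' + 10);
                else throw std::runtime_error("bad hex digit in \\u escape");
            }
            appendUtf8(out, cp);
            i += 4;
            break;
        }
        default: throw std::runtime_error("unknown escape");
        }
    }
    return out;
}
```

Decoding only when a value is actually read, rather than for every string during parsing, keeps this cost out of the hot path.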
Dec 14 '07 #9
On Dec 14, 6:34 am, "Jasper" <notaro...@dontmail.com> wrote:
> "Matthias Buelow" <m...@incubus.de> wrote in message
> news:5s*************@mid.dfncis.de...
>> Jasper wrote:
>>> I wondered if I could adapt a lightweight XML parser, but all I need is a
>> Wrong turn. I know, today it's XML for any conceivable I/O situation, all
>> hyped up so that people hardly know that software can actually exist
>> that doesn't use XML, but still. You said you want something fast (and
>> probably simple).
>
> Actually, I have just discovered that the format of the data is JSON.

As someone already mentioned, figuring out the format gets you well on
your way to a solution. Google "JSON C++" and you will see that there
are existing solutions written to parse that format. So now the next
step is to determine whether that work is sufficient for your task.
Dec 15 '07 #10
