C++ based fast parser for delimited records

We need to process a very large amount of delimited variable length
ASCII data in files as large as 3-4 gigs. We need a high performance
parser for this and, as always, we have no money to buy one. We are OK
with building one as long as that can be done quickly enough, and I was
wondering if Boost has a panacea for us. Can anyone help with their
ideas / experience?

I am also very open to any suggestions outside Boost. Any outline on
how to build such a parser would be very welcome. If some comparative
performance figures can be mentioned, it would be of tremendous help.
Any fast C++ library would be of help.

We develop a market analytics tool on HP-UX and Linux on 32/64 bits.

Cheers,
Andy

Aug 30 '05 #1
<ga***********@yahoo.com> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...
: We need to process a very large amount of delimited variable length
: ASCII data in files as large as 3-4 gigs. We need a high performance
: parser for this and as always, we have no money to buy one. We are ok
: with building one as long as that can be done quick enough and I was
: wondering if Boost has a panacea for us. Can anyone help with their
: ideas / experience.
:
: I am also very open to any suggestions outside Boost. Any outline on
: how to build such a parser would be very welcome. If some comparative
: performance figures can be mentioned, it would be of tremendous help.
: Any fast C++ library would be of help.

The parser itself may not be the performance-limiting factor as much
as the technique you use for i/o.

In similar circumstances, I usually use memory-mapping (mmap on POSIX
systems, or MapViewOfFile on Windows) to bring the file (or large
segments of it) into memory. The OS page caching is typically much
more efficient than any file i/o API.

For parsing, I tend to rely on the tried and true flex tool
(http://www.gnu.org/software/flex/). Flex-generated code is very
likely to be faster than boost::spirit (but I have no data).
A hand-coded parser might be fastest if the structure of the
records is simple enough.
Maybe you can just delimit lines and use sscanf?

: We develop a market analytics tool on HP-UX and Linux on 32/64 bits.

Wishing you success - Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Aug 30 '05 #2
ga***********@yahoo.com wrote:
> We need to process a very large amount of delimited
> variable length ASCII data in files as large as 3-4
> gigs. We need a high performance parser for this and as
> always, we have no money to buy one. We are ok with
> building one as long as that can be done quick enough
> and I was wondering if Boost has a panacea for us. Can
> anyone help with their ideas / experience.


Use Boost.Spirit:
http://www.boost.org/libs/spirit/index.html

Marc

Aug 30 '05 #3

ga***********@yahoo.com wrote:
> I am also very open to any suggestions outside Boost. Any outline on
> how to build such a parser would be very welcome. If some comparative
> performance figures can be mentioned, it would be of tremendous help.
> Any fast C++ library would be of help.


http://sourceforge.net/projects/re2c

Aug 30 '05 #4
