Efficiency of parsing a string in C vs C++

I have a comma-delimited file to parse. My plan is to go line by line and split each line into fields. Efficiency is the top priority here, so I figure I can either use the old-school C approach or use std::string. The way I see it, I could do this (assuming infile is a valid std::ifstream):

    // Needs <fstream>, <cctype> and <cstring>
    char buffer [4096];
    while (!infile.eof () && !infile.bad ())
        {
        // Get the next line in the file
        infile.getline (buffer, sizeof buffer);

        // If the fail bit is set before end of file, the line did not fit
        // in the buffer. We skip such lines, but the rest of the line is
        // still on the stream. So clear the error and continue reading
        // until the end of the line, discarding it all.
        if (infile.fail ())
            {
            while (infile.fail () && !infile.eof () && !infile.bad ())
                {
                infile.clear ();
                infile.getline (buffer, sizeof buffer);
                }
            continue;
            }

        // If the string is empty or all whitespace skip it.
        char *str = buffer;
        while (isspace ((unsigned char) *str))
            str++;

        if (*str == '\0')
            continue;

        char *tokstr = strtok (str, ",");
        while (tokstr)
            {
            // Do something with the data
            tokstr = strtok (NULL, ",");
            }
        }

or

    // Needs <fstream>, <string> and <cctype>, plus using namespace std
    string buffer;
    while (getline (infile, buffer))
        {
        // Don't worry about the fail bit for long lines: string is dynamic
        // and can hold a line of any length.

        // If the string is empty or all whitespace skip it.
        string::iterator stritr = buffer.begin ();
        while (stritr != buffer.end () && isspace ((unsigned char) *stritr))
            stritr++;

        if (stritr == buffer.end ())
            continue;
        if (stritr != buffer.begin ())
            buffer.erase (buffer.begin (), stritr);

        string::size_type startpos = 0;
        while (startpos != string::npos)
            {
            string::size_type endpos = buffer.find (',', startpos);

            // substr takes a start position and a length, not an end position
            string valuestr = (endpos == string::npos)
                ? buffer.substr (startpos)
                : buffer.substr (startpos, endpos - startpos);
            // Do something with the data

            startpos = (endpos == string::npos) ? endpos : endpos + 1;
            }
        }

The string version is a little nicer on memory since there is no real cap on the line size, whereas the first approach caps it at 4095 characters and needs a bit more logic to deal with longer lines (not to mention that it skips them instead of importing them). However, I have a strong feeling the first one is much faster. Any thoughts on which is faster? Any optimizations you see?
Jul 18 '07 #1
weaknessforcats
9,208 Expert Mod 8TB
The STL templates are faster than the C functions.

P. J. Plauger, who wrote the Dinkumware implementation of the STL templates, says they were optimized for speed and that if you think you can do this faster, then think three times.

Operations on STL containers with millions of strings are much faster than C code.
Jul 18 '07 #2
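Which approach wins depends on the compiler, the library, and the data, so it is worth measuring on your own input. Below is a minimal timing sketch, not a rigorous benchmark. The names count_fields and time_parser_ms are hypothetical helpers invented for this illustration, and count_fields merely stands in for whichever parser is being tested.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser under test:
// counts the comma-separated fields in one line.
std::size_t count_fields(const std::string& line) {
    std::size_t fields = 1;
    std::string::size_type pos = line.find(',');
    while (pos != std::string::npos) {
        ++fields;
        pos = line.find(',', pos + 1);
    }
    return fields;
}

// Runs `parse` over every line `repeats` times and returns the elapsed
// wall-clock time in milliseconds. Results are added to `checksum` so the
// compiler cannot discard the work as unused.
template <typename Parser>
double time_parser_ms(Parser parse,
                      const std::vector<std::string>& lines,
                      int repeats,
                      std::size_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (const std::string& line : lines) {
            checksum += parse(line);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Load a representative sample of the real file into the vector, build with optimizations on, and pass each candidate parser to time_parser_ms in turn. Timing in-memory lines isolates the parsing cost from disk I/O, which may well dominate in practice.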
Cool. My speed concern was with this line:

    valuestr = buffer.substr (startpos, endpos - startpos);

simply because it allocates and frees temporary memory on every call. But the other speed enhancements probably outweigh it.
Jul 18 '07 #3
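One possible way to reduce the per-field temporary is to declare the field string once, outside the loop, and refill it with assign. The sketch below assumes that the standard library reuses the string's existing capacity when the new field fits, which implementations typically do but which is not guaranteed. FieldStats and scan_fields are hypothetical names standing in for "do something with the data".

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for "do something with the data":
// tallies how many fields were seen and their combined length.
struct FieldStats {
    std::size_t count;
    std::size_t total_chars;
};

FieldStats scan_fields(const std::string& line) {
    FieldStats stats = {0, 0};
    std::string valuestr;  // declared once so its storage can be reused
    std::string::size_type startpos = 0;

    while (startpos != std::string::npos) {
        const std::string::size_type endpos = line.find(',', startpos);
        const std::string::size_type len =
            (endpos == std::string::npos) ? std::string::npos
                                          : endpos - startpos;

        // assign(str, pos, len) copies the field into valuestr, so no new
        // std::string object is constructed for each field.
        valuestr.assign(line, startpos, len);

        ++stats.count;
        stats.total_chars += valuestr.size();

        startpos = (endpos == std::string::npos) ? endpos : endpos + 1;
    }
    return stats;
}
```

For example, scan_fields("a,bb,,ccc") sees four fields, one of them empty. Unlike strtok, this loop keeps empty fields, which usually matters for CSV data where column position is significant.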
ravenspoint
111 100+
Perhaps you can avoid the copy - it depends on what "do something with the data" involves.

Let's suppose you want to convert it to an integer:

    // replace the comma with a null terminator
    // (endpos must be a valid index here, so handle the last field separately)
    buffer[endpos] = '\0';

    // convert in place, no temporary string needed
    int i = atoi (&buffer[startpos]);

All very tricky, of course, and of dubious value. Are you SURE this is the bottleneck? Have you run a profiler?

James
Jul 19 '07 #4
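A variation on the in-place idea above, offered as a sketch: std::strtol stops at the first character that cannot be part of a number, so the comma does not need to be overwritten and the line can stay const. The function name sum_int_fields is hypothetical, and summing is only a placeholder for whatever is done with each value. Overflow checking via errno is left out for brevity.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: sums the integer fields of one comma-separated line
// without copying any field. Fields with no leading number are ignored;
// a field such as "12abc" contributes 12.
long sum_int_fields(const std::string& line) {
    long total = 0;
    const char* p = line.c_str();

    while (*p != '\0') {
        char* end = 0;
        const long value = std::strtol(p, &end, 10);  // stops at ',' or '\0'
        if (end != p) {
            total += value;  // at least one digit was converted
        }

        // Advance to the start of the next field.
        p = end;
        while (*p != '\0' && *p != ',') {
            ++p;
        }
        if (*p == ',') {
            ++p;
        }
    }
    return total;
}
```

For instance, sum_int_fields("10,20,,x,30") returns 60: the empty field and the non-numeric field are both skipped.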
