My Explode function(s) are too slow.

Hi,

I need the PHP equivalent of explode in one of my apps.
I read a very big file and "explode" each line to fill a structure.

Reading the file (19 MB) takes about 10 seconds (I guess I will also
need to streamline the way I read each line). But when I 'explode'
each line, the process takes about 140 seconds.
This is what I have tried so far...

//-----------------------function A--------------------------
std::vector<std::string> explode(
    const std::string s,
    const std::string separator
)
{
    const int iPit = separator.length();
    std::vector<std::string> ret;
    int iPos = s.find(separator, 0);
    int iStart = 0;

    while(iPos>-1)
    {
        if(iPos!=0){
            ret.push_back(s.substr(iStart,iPos-iStart));
            iStart = (iPos+iPit);
        }
        iPos = s.find(separator, iStart);
    } // end while

    // add the last item if need be.
    if(iStart != s.length()){
        ret.push_back(s.substr(iStart));
    }
    return ret;
}

//-----------------------function B--------------------------
std::vector<std::string> explode(
    const char* s,
    const char separator
)
{
    std::vector<std::string> ret;

    char seps[] = {separator};
    char *token = strtok( (char*)s, seps );
    while( token != NULL )
    {
        ret.push_back( token );
        token = strtok( NULL, seps );
    }
    return ret;
}
//--------------------------------------------------------------------

Function B is slightly faster than function A.

How could I speed up my Explode?

Many thanks

FFMG

Jun 8 '06 #1
I'd first try to

* Read the complete file into a buffer in one or a very few large
gulps -- that typically improves the reading by at least one
order of magnitude.

* Analyze whether an /explicit representation/ of the complete token
set is really required, or whether you can just proceed by handing
one at a time up to calling code or down to code that you call.

* If explicit representation is required, and performance really
suffers, I'd first try the obvious step of checking whether compiler
options could fix the performance; second, whether a rewrite to a
"get" function (returning the result via a reference argument
rather than as the function result) would fix it; third, I'd consider
things such as a vector of StringSpan objects, each such object
containing just a pointer to the start and end of a substring.
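
The points above could be sketched roughly as follows, assuming a C++17 compiler. Here std::string_view stands in for the StringSpan idea (a pointer plus a length, no copy), and the helper names read_file and for_each_token are made up for illustration. Treat it as a starting point to measure, not as a finished implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: read a whole file into one buffer with a single read().
// Returns an empty string if the file cannot be opened.
std::string read_file(const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return std::string();
    const std::streamoff size = in.tellg();   // opened at the end, so this is the size
    if (size <= 0) return std::string();
    std::string buffer(static_cast<std::size_t>(size), '\0');
    in.seekg(0);
    in.read(buffer.data(), static_cast<std::streamsize>(size));
    return buffer;
}

// Hypothetical helper: hand each token to the caller, one at a time, as a
// view into 'text'. Nothing is copied or allocated here. Empty fields are
// kept, so "a,,b" yields "a", "" and "b".
template <typename Callback>
void for_each_token(std::string_view text, char separator, Callback&& on_token)
{
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(separator, start);
        if (end == std::string_view::npos) end = text.size();
        on_token(text.substr(start, end - start));
        start = end + 1;
    }
}
```

A caller would typically run for_each_token once with '\n' over the whole buffer and once more per line with the field separator, filling its structure inside the callback. The views are only valid while the buffer they point into is alive and unmodified.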

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jun 8 '06 #2
FFMG wrote:
Hi,

I need the php equivalent of explode in one of my app.
I read a very big file and "explode" each line to fill a structure.
You are aware that not everyone knows PHP? What does "explode each line"
mean?
The reading of the file, 19Mb, (I will also need to streamline the way
I read each line I guess), takes about 10 seconds. But when I 'explode'
each line the process takes about 140 seconds.
This is what I have tried so far...

//-----------------------function A--------------------------
std::vector<std::string> explode(
const std::string s,
const std::string separator
You should pass those strings by reference.
)
{
const int iPit = separator.length();
std::vector<std::string> ret;
int iPos = s.find(separator, 0);
int iStart = 0;
The return type of std::string::find and std::string::length is
std::string::size_type, not int.
while(iPos>-1)
find() returns std::string::npos if nothing is found.
{
if(iPos!=0){
ret.push_back(s.substr(iStart,iPos-iStart));
iStart = (iPos+iPit);
}
iPos = s.find(separator, iStart);
} // end while

// add the last item if need be.
if(iStart != s.length()){
ret.push_back(s.substr(iStart));
}
return ret;
}


//-----------------------function B--------------------------
std::vector<std::string> explode(
const char* s,
const char separator
)
{
std::vector<std::string> ret;

char seps[] = {separator};
char *token = strtok( (char*)s, seps );
while( token != NULL )
{
ret.push_back( token );
token = strtok( NULL, seps );
}
return ret;
}
//--------------------------------------------------------------------

Function B is slightly faster than function A.


The second one just uses a single char as separator, while the first one
uses a whole string. If a single char is enough, you could implement A as:

#include <sstream>
#include <string>
#include <vector>

// Split s on a single-character separator using std::getline.
std::vector<std::string> explode(const std::string& s, const char separator)
{
    std::vector<std::string> ret;
    std::stringstream stream(s);
    std::string element;
    while (std::getline(stream, element, separator))
        ret.push_back(element);
    return ret;
}

Jun 8 '06 #3
FFMG wrote, On 8.6.2006 10:57:
Hi,

I need the php equivalent of explode in one of my app.
I read a very big file and "explode" each line to fill a structure.

The reading of the file, 19Mb, (I will also need to streamline the way
I read each line I guess), takes about 10 seconds. But when I 'explode'
each line the process takes about 140 seconds.
This is what I have tried so far...

//-----------------------function A--------------------------
std::vector<std::string> explode(
const std::string s,
const std::string separator
Pass the two arguments by reference instead of by value.
)
{
const int iPit = separator.length();
std::vector<std::string> ret;
int iPos = s.find(separator, 0);
Better use std::string::size_type.
int iStart = 0;

while(iPos>-1)
Don't use -1; better use std::string::npos.
{
if(iPos!=0){
ret.push_back(s.substr(iStart,iPos-iStart));
iStart = (iPos+iPit);
}
iPos = s.find(separator, iStart);
} // end while

// add the last item if need be.
if(iStart != s.length()){
ret.push_back(s.substr(iStart));
}
return ret;
}

//-----------------------function B--------------------------
std::vector<std::string> explode(
const char* s,
const char separator
)
{
std::vector<std::string> ret;

char seps[] = {separator};
char *token = strtok( (char*)s, seps );
while( token != NULL )
{
ret.push_back( token );
token = strtok( NULL, seps );
}
return ret;
}
//--------------------------------------------------------------------

Function B is slightly faster than function A.
Probably because it does not copy the whole string s on each call.

How could I speed up my Explode?

Many thanks

FFMG
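
With the remarks above applied (arguments by const reference, std::string::size_type instead of int, std::string::npos instead of -1), function A might look like the sketch below. The name explode_by_ref is invented here to keep it apart from the original. Two further changes are additions of this sketch rather than part of the review: the start index is advanced on every pass, because in the original a string that begins with the separator never moves iStart and the loop does not terminate, and an empty separator is treated as "no split", which is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function A with the review remarks applied. Like the original, it skips a
// separator at the very start of the string and drops an empty last item.
std::vector<std::string> explode_by_ref(const std::string& s,
                                        const std::string& separator)
{
    std::vector<std::string> ret;
    if (separator.empty())   // assumption: an empty separator means "no split"
    {
        if (!s.empty()) ret.push_back(s);
        return ret;
    }

    const std::string::size_type sepLen = separator.length();
    std::string::size_type start = 0;
    std::string::size_type pos = s.find(separator, start);

    while (pos != std::string::npos)
    {
        if (pos != 0)
            ret.push_back(s.substr(start, pos - start));
        start = pos + sepLen;   // always advance past the separator
        pos = s.find(separator, start);
    }

    // add the last item if need be.
    if (start != s.length())
        ret.push_back(s.substr(start));
    return ret;
}
```

This removes the two string copies per call, but every token is still a separate std::string allocation, so it is unlikely to close the whole gap on its own.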


--
Vaclav Haisman
Jun 8 '06 #4
FFMG wrote:
Hi,

I need the php equivalent of explode in one of my app.
I read a very big file and "explode" each line to fill a structure.

The reading of the file, 19Mb, (I will also need to streamline the way
I read each line I guess), takes about 10 seconds. But when I 'explode'
each line the process takes about 140 seconds.
The problem is memory allocation overhead.
How could I speed up my Explode?


Something like:

void explode(
    std::vector<char const*>& ret,
    std::string& s,
    char const separator
)
{
    ret.reserve(s.size() / 10u); // rough guess at the token count
    std::string::size_type const iPit = 1;
    std::string::size_type iPos = s.find(separator, 0);
    std::string::size_type iStart = 0;

    while(iPos != std::string::npos)
    {
        if(iPos!=0){
            s[iPos] = '\0'; //null it out
            ret.push_back(&s[iStart]);
        }
        // always advance, so a leading separator cannot loop forever
        iStart = (iPos+iPit);
        iPos = s.find(separator, iStart);
    } // end while

    // add the last item if need be.
    if(iStart != s.length()){
        ret.push_back(&s[iStart]);
    }
}

ret will only be valid for as long as the passed s is not modified, and
note that s is modified by the call.

Tom
Jun 8 '06 #5
On 8 Jun 2006 01:57:55 -0700, "FFMG" <sp********@myoddweb.com> wrote:
I need the php equivalent of explode in one of my app.
I read a very big file and "explode" each line to fill a structure.
I guess 'explode' means tokenize.
The reading of the file, 19Mb, (I will also need to streamline the way
I read each line I guess), takes about 10 seconds. But when I 'explode'
each line the process takes about 140 seconds.
This is what I have tried so far...


Others have already pointed out that you copy objects unnecessarily
instead of using references (only old-fashioned 'modern' C++
programmers copy everything by value). One performance inhibitor for
large data is also std::vector. You must reserve (with
vector.reserve()) enough space for the vector (estimate!) otherwise it
reallocates often and performs many unnecessary copies.
Some tokenize implementations are discussed in:
http://groups.google.com/group/comp....4daafacd01ce26
, see esp. John Potter's solution and the following.
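
One possible way to follow the reserve() advice without guessing is to count the separators first, which gives the exact number of items for a single-character separator. The sketch below assumes C++11 and uses a made-up name, explode_into; it fills a vector owned by the caller, so the vector's storage is reused from one line to the next. It is not taken from the linked discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "get"-style splitter: the caller owns the vector, so its
// capacity carries over between calls. Empty fields are kept.
void explode_into(const std::string& s, char separator,
                  std::vector<std::string>& out)
{
    out.clear();   // removes the old items but keeps the allocated capacity

    // One cheap pass over the line gives the exact number of items.
    out.reserve(static_cast<std::size_t>(
                    std::count(s.begin(), s.end(), separator)) + 1);

    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(separator, start)) != std::string::npos)
    {
        out.emplace_back(s, start, pos - start);   // substring built in place
        start = pos + 1;
    }
    out.emplace_back(s, start, std::string::npos); // last item, possibly empty
}
```

Declaring the vector once outside the read loop and passing it to every call means that, after the first few lines, the vector itself rarely needs to reallocate.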

Best wishes,
Roland Pibinger
Jun 8 '06 #6
