473,395 Members | 1,666 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

stringstream str(), buffer overflows

While writing some code to demonstrate different local search
strategies, I found something kind of unusual. I suspect it's a bug in
my understanding rather than a bug in GCC, and I'm hoping someone here
can help me out.

The task is simple: solve the game Boggle using a pure hill-climbing
strategy, returning a tab-separated list of words as a char*. (I'm
interfacing with Python, so the char* is necessary; that seems to be
the easiest way to get Python strings back from C++ code.) When
reading the data out I wind up overrunning the end of the allocated
space, despite the fact that at first blush it appears I'm doing things
right.

I'm retyping the offending code here. For clarity I'm leaving off the
std:: prefix from some calls, but these should be obvious. The
following is a method, not a function; it's declared inline in the
header file.

=====
char* words() const
{
stringstream ss;
ostream_iterator<string> oi(ss, "\t");
copy(_wordset->begin(), _wordset->end(), oi);
// The stringstream has a trailing '\t' at the end of it
// which we're not going to copy. This will turn into
// a trailing '\0'.
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}
=====

On OS X, this code works as expected. On Win32 and Debian, SIGSEGV (or
its Windows equivalent) is caught, with the offending line being the
call to copy. Replacing the last two lines of the method with

string t = ss.str().substr(0, ss.str().size() - 1);
copy(t.begin(), t.end(), rv);
return rv;

.... makes everything work just fine, though.

Can anyone give me a clear, concise description of my error in
understanding? Or is this a bug in GCC?

Feb 9 '06 #1
12 8985
Robert J. Hansen wrote:
stringstream ss; [...] char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}


That's a good one :-) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy. In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;
return rc;
}

...
return to_c_string(ss.str());
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Feb 9 '06 #2
> Note that the 'str()' method of the string stream returns
the string *by value*


Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference. In light of this, your remarks
about unnecessary construction of string objects is well-taken,
although it's not a performance hit in this problem (the code executes
in sub-second time).

Once more, thanks for the reply!

Feb 9 '06 #3
On Thu, 09 Feb 2006 16:53:37 +0100, Dietmar Kuehl
<di***********@yahoo.com> wrote:
Robert J. Hansen wrote:
stringstream ss;[...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}


That's a good one :-) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.


The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.
In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;
Don't know what the above line does ...
return rc;
}

...
return to_c_string(ss.str());


Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.

Best regards,
Roland Pibinger
Feb 9 '06 #4
On 9 Feb 2006 09:33:56 -0800, "Robert J. Hansen"
<ci********@gmail.com> wrote:
Note that the 'str()' method of the string stream returns
the string *by value*


Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference.


You are right and the interface is wrong. There really should be a
'reference-iterface' instead of a 'value-interface'.

Best wishes,
Roland Pibinger
Feb 9 '06 #5
Roland Pibinger wrote:
On Thu, 09 Feb 2006 16:53:37 +0100, Dietmar Kuehl
<di***********@yahoo.com> wrote:
Robert J. Hansen wrote:
stringstream ss;[...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}


That's a good one :-) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.


The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.


Whether the mixture of code is problematic or not depends on the
user's requirements. The problem he is encountering, however, is
not related to this mixture but to the use of iterators to different
temporaries.
char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;


Don't know what the above line does ...


Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;

Sorry for that.
Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.


I generally agree on this. However, as was explained in the original
article, the author considered this to be mandatory.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Feb 10 '06 #6
> the author considered this to be mandatory

I'd phrase it as "regrettably mandatory", myself. I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.

On the other hand, I haven't taken a look at Boost::Python yet. Maybe
I should.

Feb 10 '06 #7
On Fri, 10 Feb 2006 11:32:27 +0100, Dietmar Kuehl
<di***********@yahoo.com> wrote:
char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;


Don't know what the above line does ...


Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;


Nitpicking ... you need to check for str.size() > 0.
Feb 10 '06 #8
On 10 Feb 2006 08:07:06 -0800, "Robert J. Hansen"
<ci********@gmail.com> wrote:
the author considered this to be mandatory
I'd phrase it as "regrettably mandatory", myself.


When (on which occasion) do you delete the returned char[]?
I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.


At least, we have a common, non-proprietary, widely understood
'inter-language'.

Best wishes,
Roland Pibinger
Feb 10 '06 #9
> When ... do you delete the returned char[]?

This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it. Or, rather,
that's my understanding of what SWIG is doing; it wouldn't be
impossible for me to be mistaken.

Feb 10 '06 #10
On 10 Feb 2006 15:05:29 -0800, "Robert J. Hansen"
<ci********@gmail.com> wrote:
When ... do you delete the returned char[]?
This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it.


That would be a surprise to me. How can SWIG know in which way the
char[] has been created? With new[], malloc, a custom allocator, as
static, ...?
Or, rather, that's my understanding of what SWIG is doing; it
wouldn't be impossible for me to be mistaken.


Maybe a SWIG expert can comment ...
Feb 11 '06 #11
> How can SWIG know in which way the char[]
has been created?


Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.

Feb 11 '06 #12
Robert J. Hansen wrote:
How can SWIG know in which way the char[]
has been created?


Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.


Which one though? Here are the built-in candidates but these are not
the only ones which might exist in a program:

delete p;
delete[] p;
operator delete(p);
operator delete[](p);

My guess would be that it actually uses 'malloc(p)' in which case
the memory is better allocated differently...
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Feb 11 '06 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Ellarco | last post by:
Im sorry for asking a question that is surely in the archives somewhere, but I have been unable to locate it. Its about string memory management. I need to dynamically construct a C-style string...
2
by: Woodster | last post by:
I am using std::stringstream to format a string. How can I clear the stringstream variable I am using to "re use" the same variable? Eg: Using std::string std::string buffer; buffer =...
4
by: Dylan | last post by:
Hi again, In the following program I expected step 3 to assign the values 1 and 2 to iVal1 and iVal2 yet it appears ss.seekg(0, std::ios::beg) does not move the read position back to the...
5
by: Marcin Kalicinski | last post by:
Is there a vectorstream class that implements the functionality similar to std::stringstream but with std::vector, not std::string? cheers, Marcin
9
by: Àî°× | last post by:
hi all: I want erase a stringstream' content for input new data to it. example: std::stringstream stm; stm<<"this is a string"; std::cout<<stm.str(); // here print:this is a string
10
by: vigacmoe | last post by:
I was trying to cast some strings to integers with stringstream, then this strange problem poped up. Here is my test code: stringstream conv; string from; int to; from = "1"; conv << from;
2
by: akitoto | last post by:
Hi there, I am using the std::stringstream to create a byte array in memory. But it is not working properly. Can anyone help me? Code: #include <vector> #include <sstream> #include...
9
by: martinezfive | last post by:
Hi, I feel no doubt some documentation contains my answer, so bare with me. Given the following: #inclde <stdio.h> #include <sstream> void f() { std::stringstream a("Hello World!\n");
4
by: dominolog | last post by:
Hello I want to convert a char* buffer to a std::string containing a hex description of it. I use a std::stringstream in the following manner: std::tstring ToHex( const char* buffer, size_t...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.