Destroying STL Strings

RP
I have a requirement to sanitise a particular std::string, myString, during
destruction. To achieve this we use the replace function as follows:

myString.replace(0, 10, "UUUUUUUUUU");

According to Josuttis in his "The C++ Standard Library: A Tutorial and
Reference", this should replace, at most, 10 characters of *this, starting
with the index 0, with all 'U's.
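For reference, a small sketch of what that call does to strings of different lengths, assuming a conforming std::string. The helper names overwrite_first_ten and overwrite_all are made up for illustration. A string shorter than 10 characters grows to 10, and a longer one keeps its tail, so a length-aware call may be closer to the intent. Neither variant says anything about whether the implementation reuses the same buffer.

```cpp
#include <string>

// Mirrors the call from the question: replace at most 10 characters,
// starting at index 0, with ten 'U' characters.
// (Hypothetical helper name, for illustration only.)
std::string overwrite_first_ten(std::string s) {
    s.replace(0, 10, "UUUUUUUUUU");
    return s;
}

// Length-aware variant: replace all s.size() characters with
// s.size() copies of 'U', so the length stays the same.
// (Hypothetical helper name, for illustration only.)
std::string overwrite_all(std::string s) {
    s.replace(0, s.size(), s.size(), 'U');
    return s;
}
```

Both helpers take the string by value so the caller's copy is untouched; real sanitising code would of course work on the original object.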

However, I have been informed that this is unsatisfactory because
std::string in C++ does not behave like a simple char* string in C. When the
value of the string changes, the old string data is not overwritten; instead
the memory is simply deallocated and a new chunk of appropriate length is
allocated on the heap. The std::string.replace function is no exception and
will leave the string unsanitised.

Can anyone shed any light on this as there appears to be a contradiction.

TIA

RP.


Apr 5 '06 #1

"RP" <RP@RP.co.uk> wrote in message
news:zC*******************@fe2.news.blueyonder.co.uk...
I have a requirement to sanitise a particular std::string, myString, during
destruction. To achieve this we use the replace function as follows:

myString.replace(0, 10, "UUUUUUUUUU");

According to Josuttis in his "The C++ Standard Library: A Tutorial and
Reference", this should replace, at most, 10 characters of *this, starting
with the index 0, with all 'U's.

However, I have been informed that this is unsatisfactory because
std::string in C++ does not behave like a simple char* string in C. When
the value of the string changes, the old string data is not overwritten;
instead the memory is simply deallocated and a new chunk of appropriate
length is allocated on the heap. The std::string.replace function is no
exception and will leave the string unsanitised.

Can anyone shed any light on this as there appears to be a contradiction.


I don't see anywhere that says replace has to be implemented in-place, and
can't therefore re-allocate. But (if you have to) why not just loop over
the string, using myString[i] = 'U';?
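A minimal sketch of that loop, with overwrite_in_place as a made-up helper name. It only touches whatever buffer the string currently holds; it cannot reach earlier copies, and an older copy-on-write implementation might still make a fresh copy on the first non-const operator[] call.

```cpp
#include <cstddef>
#include <string>

// Overwrite each character through operator[].
// The length never changes, so the loop itself never asks for a resize.
// (Hypothetical helper name, for illustration only.)
void overwrite_in_place(std::string& s, char fill = 'U') {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = fill;
    }
}
```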

-Howard


Apr 5 '06 #2
In article <zC*******************@fe2.news.blueyonder.co.uk>,
"RP" <RP@RP.co.uk> wrote:
I have a requirement to sanitise a particular std::string, myString, during
destruction. To achieve this we use the replace function as follows:

myString.replace(0, 10, "UUUUUUUUUU");

According to Josuttis in his "The C++ Standard Library: A Tutorial and
Reference", this should replace, at most, 10 characters of *this, starting
with the index 0, with all 'U's.

However, I have been informed that this is unsatisfactory because
std::string in C++ does not behave like a simple char* string in C. When the
value of the string changes, the old string data is not overwritten; instead
the memory is simply deallocated and a new chunk of appropriate length is
allocated on the heap. The std::string.replace function is no exception and
will leave the string unsanitised.

Can anyone shed any light on this as there appears to be a contradiction.


AFAIK, there are no guarantees either way. I think to satisfy such a
requirement, you will have to roll your own. That's not surprising BTW,
of all the standard classes, string seems to be the one that is least
satisfactory in real projects.

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
Apr 5 '06 #3
RP wrote:
myString.replace(0, 10, "UUUUUUUUUU");


Sutter and ... Andrei cover this in their /C++ Coding Standards/ book. To
grab control of the controlled character array within a string, use &s[0].
So this will work:

memset(&s[0], 'U', s.size());

Note that size() returns the number of elements, not bytes. And note that
'\xff' will stamp out all bits.
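To make the elements-versus-bytes point concrete, here is a sketch that works for any character type, assuming the string's storage is contiguous (required since C++11). The names byte_count and stamp_bytes are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Number of bytes occupied by the characters of s (terminator excluded).
// (Hypothetical helper name, for illustration only.)
template <typename CharT>
std::size_t byte_count(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

// Set every byte of the character data to `value`.
// Assumes contiguous storage, so &s[0] points at the whole array.
// (Hypothetical helper name, for illustration only.)
template <typename CharT>
void stamp_bytes(std::basic_string<CharT>& s, unsigned char value) {
    if (!s.empty()) {
        std::memset(&s[0], value, byte_count(s));
    }
}
```

For a std::string the two counts agree; for a std::wstring the byte count is size() times sizeof(wchar_t), which is why passing size() alone to memset would leave part of the buffer untouched.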

However, the string might reallocate in memory at any time the implementors
feel like, including just before the [] call.

Now if you really want to sanitize your string, put your computer inside an
RF cage so I can't use remote masers to peek at your bus and sniff every
packet out. (That's how the Dept of Homeland Security keeps tabs on
Democrats and other terrorists.)

The contents of your string will exist in several places in memory, and in
the CPU cache, and the odds are good that some of these places will still
contain the contents, as garbage, after you sanitize the last version of
the controlled array inside that string.

If you have a protected mode operating system, then your memory is as secure
from normal programs as your OS's binary image is secure. Other programs
cannot get to your memory because the CPU's hardware protects it.

But if the virtual memory system swaps your string to disk, then a scanning
tunneling electron microscope might find it, in only a few months of labor.

If your boss is telling your clients your program is "secure", and you have
not yet studied the full envelope of security, and if your team thinks that
writing on strings will secure them, then something fishy is going on.

--
Phlip
http://www.greencheese.org/ZeekLand <-- NOT a blog!!!
Apr 5 '06 #4
Phlip wrote:
RP wrote:
myString.replace(0, 10, "UUUUUUUUUU");


Sutter and ... Andrei cover this in their /C++ Coding Standards/ book. To
grab control of the controlled character array within a string, use &s[0].
So this will work:

memset(&s[0], 'U', s.size());


I think, the standard makes no guarantee that std::string objects are stored
contiguously. The specifications for std::string, however, suggest a
contiguous implementation. Thus, the above line is likely to work on most
implementations, but it is not blessed by the standard.

[snip]
Best

Kai-Uwe Bux

Apr 5 '06 #5

RP wrote:

However, I have been informed that this is unsatisfactory because
std::string in C++ does not behave like a simple char* string in C. When the
value of the string changes, the old string data is not overwritten; instead
the memory is simply deallocated and a new chunk of appropriate length is
allocated on the heap. The std::string.replace function is no exception and
will leave the string unsanitised.

Can anyone shed any light on this as there appears to be a contradiction.

TIA

RP.

You have described the behaviour of strings in C#, they are immutable
and all changes are done by deallocating and reallocating. The same
may also apply to C++/CLI aka Managed C++ under Visual Studio. Since
standard C++ strings are mutable the same should not apply to
std::string though compiler writers can do strange things on occasions.

HTH

rossum

Apr 6 '06 #6
In article <78*****************@newssvr33.news.prodigy.com>,
Phlip <ph*******@gmail.com> wrote:
However, the string might reallocate in memory at any time the implementors
feel like, including just before the [] call.


I must admit I've never done it myself, but this sounds like a place where
you might want to write your own allocator. Have the deallocator
over-write the data.
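A rough sketch of that idea for C++11 and later, where a minimal allocator only needs value_type, allocate, deallocate and the comparison operators. The names wiping_allocator and wiped_string are invented here, and this is an illustration rather than a vetted security measure.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Allocator that zeroes a block before returning it to std::allocator.
// (Hypothetical name; a sketch, not a hardened implementation.)
template <typename T>
struct wiping_allocator {
    using value_type = T;

    wiping_allocator() = default;

    // Converting constructor so containers can rebind the allocator.
    template <typename U>
    wiping_allocator(const wiping_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return std::allocator<T>().allocate(n);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        // Overwrite the block first. An optimiser may be allowed to drop
        // this store, so treat it as best effort only.
        std::memset(static_cast<void*>(p), 0, n * sizeof(T));
        std::allocator<T>().deallocate(p, n);
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const wiping_allocator<T>&, const wiping_allocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const wiping_allocator<T>&, const wiping_allocator<U>&) noexcept {
    return false;
}

// String type whose heap buffers go through wiping_allocator.
using wiped_string =
    std::basic_string<char, std::char_traits<char>, wiping_allocator<char>>;
```

One thing to watch: many implementations keep short strings inside the string object itself, and in that case the allocator is never called for them, so nothing gets wiped.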
Apr 6 '06 #7
In article <78*****************@newssvr33.news.prodigy.com>,
Phlip <ph*******@gmail.com> wrote:
RP wrote:
myString.replace(0, 10, "UUUUUUUUUU");
Sutter and ... Andrei cover this in their /C++ Coding Standards/ book. To
grab control of the controlled character array within a string, use &s[0].
So this will work:

memset(&s[0], 'U', s.size());


I don't think that is guaranteed to work at any level. (a) there is no
guarantee that the memory in the string is contiguous. It may be
implemented like a deque for example or even a series of shared memory
blocks in which case something like the above might change other strings.

However, the string might reallocate in memory at any time the implementors
feel like, including just before the [] call.

Now if you really want to sanitize your string, put your computer inside an
RF cage so I can't use remote masers to peek at your bus and sniff every
packet out. (That's how the Dept of Homeland Security keeps tabs on
Democrats and other terrorists.)

The contents of your string will exist in several places in memory, and in
the CPU cache, and the odds are good that some of these places will still
contain the contents, as garbage, after you sanitize the last version of
the controlled array inside that string.

If you have a protected mode operating system, then your memory is as secure
from normal programs as your OS's binary image is secure. Other programs
cannot get to your memory because the CPU's hardware protects it.

But if the virtual memory system swaps your string to disk, then a scanning
tunneling electron microscope might find it, in only a few months of labor.

If your boss is telling your clients your program is "secure", and you have
not yet studied the full envelope of security, and if your team thinks that
writing on strings will secure them, then something fishy is going on.


I can't help but wonder what the test would look like that ensures that
the string was actually sanitized... If the OP can come up with a test,
then maybe we can come up with a way to pass it. Until then...

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
Apr 6 '06 #8
Roy Smith wrote:
In article <78*****************@newssvr33.news.prodigy.com>,
Phlip <ph*******@gmail.com> wrote:
However, the string might reallocate in memory at any time the
implementors feel like, including just before the [] call.


I must admit I've never done it myself, but this sounds like a place where
you might want to write your own allocator. Have the deallocator
over-write the data.


Maybe you are onto something. I was playing with this idea:

#include <memory>
#include <cstring>

template < typename T, typename Alloc = std::allocator<T> >
struct destructive_allocator : public Alloc {

    // Zero the block before handing it back to the underlying allocator.
    void deallocate ( typename Alloc::pointer ptr,
                      typename Alloc::size_type length ) {
        unsigned char * uc_ptr = reinterpret_cast< unsigned char * >( ptr );
        std::memset( uc_ptr, 0, sizeof(T)*length );
        Alloc::deallocate( ptr, length );
    }

    // Make sure rebound allocators are destructive as well.
    template < typename S >
    struct rebind {
        typedef destructive_allocator< S,
            typename Alloc::template rebind<S>::other > other;
    };

}; // struct destructive_allocator

#include <string>

typedef std::basic_string< char, std::char_traits< char >,
                           destructive_allocator< char > > my_string;

#include <iostream>

int main ( void ) {
    my_string str ( "hello world!" );
    std::cout << str << '\n';
}

However, there are catches:

a) I think, the deallocator has undefined behavior -- that might not be a
problem since the "expected" undefined behavior could be exactly what the
OP needs.

b) I also think, the compiler is entitled to eliminate the call to
std::memset under the as-if rule: modification of memory that is about to
be reclaimed is very likely not observable.
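Regarding catch (b), one commonly used workaround is to write through a volatile-qualified pointer, since accesses through volatile glvalues count as observable behaviour. The sketch below assumes contiguous string storage (C++11 and later); wipe_bytes and wipe_string are made-up names. Some platforms also offer dedicated calls for this, such as explicit_bzero or SecureZeroMemory, which may be preferable where available.

```cpp
#include <cstddef>
#include <string>

// Zero n bytes starting at p, one volatile store per byte, so the
// compiler is expected to keep every write.
// (Hypothetical helper name, for illustration only.)
void wipe_bytes(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        bytes[i] = 0;
    }
}

// Zero the characters currently held by s; the length is unchanged.
// This cannot reach older buffers the string has already released.
// (Hypothetical helper name, for illustration only.)
void wipe_string(std::string& s) {
    if (!s.empty()) {
        wipe_bytes(&s[0], s.size());
    }
}
```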
Best

Kai-Uwe Bux
Apr 6 '06 #9
Kai-Uwe Bux wrote:

I think, the standard makes no guarantee that std::string objects are stored
contiguously.


As of yesterday's Library Working Group vote, it does. Subject to
approval by the full Standards Committee tomorrow.
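For readers coming to this later: the contiguity requirement did end up in C++11. A quick way to see what it promises is the check below, which should hold for every std::string on a C++11 or later implementation; is_contiguous is just an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Returns true if element i of s is located at &s[0] + i for every i,
// which is what contiguous storage means.
// (Hypothetical helper name, for illustration only.)
bool is_contiguous(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != &s[0] + i) {
            return false;
        }
    }
    return true;
}
```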

--

Pete Becker
Roundhouse Consulting, Ltd.
Apr 6 '06 #10
Pete Becker wrote:
Kai-Uwe Bux wrote:

I think, the standard makes no guarantee that std::string objects are
stored contiguously.


As of yesterday's Library Working Group vote, it does. Subject to
approval by the full Standards Committee tomorrow.


Cool. That's a valuable piece of information.

BTW: are those decisions of the LWG / Standards Committee published
somewhere (if so, a pointer would be highly appreciated) or do mortals like
me have to wait until C++0X comes out?
Thanks

Kai-Uwe Bux
Apr 6 '06 #11
Kai-Uwe Bux wrote:

BTW: are those decisions of the LWG / Standards Committee published
somewhere (if so, a pointer would be highly appreciated) or do mortals like
me have to wait until C++0X comes out?


The day-by-day decisions aren't publicly available, but if you check the
LWG issues lists you can see which ones have been marked DR, which means
we've agreed on a fix, or WP, which means the fix is in the Working
Draft (formerly Working Paper, hence WP). Don't take the difference too
seriously: often things that are marked DR don't get changed to WP when
they've gone into the draft. After each meeting there's a new Working
Draft, and they're available in the papers section on the WG21 web site.
That's www.open-std.org/jtc1/sc22/wg21.

--

Pete Becker
Roundhouse Consulting, Ltd.
Apr 6 '06 #12
* Pete Becker:
Kai-Uwe Bux wrote:

BTW: are those decisions of the LWG / Standards Committee published
somewhere (if so, a pointer would be highly appreciated) or do mortals like
me have to wait until C++0X comes out?


The day-by-day decisions aren't publicly available, but if you check the
LWG issues lists you can see which ones have been marked DR, which means
we've agreed on a fix, or WP, which means the fix is in the Working
Draft (formerly Working Paper, hence WP). Don't take the difference too
seriously: often things that are marked DR don't get changed to WP when
they've gone into the draft. After each meeting there's a new Working
Draft, and they're available in the papers section on the WG21 web site.
That's www.open-std.org/jtc1/sc22/wg21.


Presumably this issue, about contiguous strings, is the issue numbered
530, "Must elements of a string be contiguous?", in <url:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html>.

That's not marked as being decided in any way.

Am I looking in the wrong place?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Apr 8 '06 #13
Alf P. Steinbach wrote:

Presumably this issue, about contiguous strings, is the issue numbered
530, "Must elements of a string be contiguous?", in <url:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html>.

That's not marked as being decided in any way.

Am I looking in the wrong place?


Yes, in the four-dimensional space-time continuum. The active issues
list dated Feb. 24, 2006 does not contain changes made after it was written.

--

Pete Becker
Roundhouse Consulting, Ltd.
Apr 8 '06 #14
