back_inserter & basic_string

Mark A. Gibbs

I have a string conversion function that looks something like this
(apologies, but I cannot post the actual code):

wchar_t char_to_wchar(char);

wstring to_wstring(const string& s)
{
    wstring result(s.length(), '\0');

    transform(s.begin(), s.end(), result.begin(), ptr_fun(char_to_wchar));

    return result;
}

Now, I wonder if I might gain some improvement if instead of filling the
string with null characters then overwriting them I use:

wstring result;
result.reserve(s.length());
transform(s.begin(), s.end(), back_inserter(result),
          ptr_fun(char_to_wchar));

The only problem is that back_insert_iterator apparently uses
push_back(), which is fine for most containers, but not for strings. So
I was wondering whether specializing
back_insert_iterator<basic_string<Char> > to use operator+= instead of
push_back was a good idea.

If not, I can always just write a custom string_back_insert_iterator,
but would it be a good idea then to write a specialization of
back_inserter<basic_string<Char> > that returns the custom iterator?

On a completely unrelated topic, something else occurred to me as I was
reviewing the code (which led to my investigation of back_inserter). Is
the use of length() instead of size() potentially dangerous? Would the
original code be safer if I either replaced s.length() with s.size() or
replaced the call to s.end() with advance(s.begin(), s.length())?

Mark

Jul 22 '05 #1


Jonathan Turkanis

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:hp***************@twister01.bloor.is.net.cable.rogers.com...

> I have a string conversion function that looks something like this
> (apologies, but I cannot post the actual code):
>
> wchar_t char_to_wchar(char);
>
> wstring to_wstring(const string& s)
> {
>     wstring result(s.length(), '\0');
>
>     transform(s.begin(), s.end(), result.begin(), ptr_fun(char_to_wchar));
>     return result;
> }

You're calling char_to_wchar once for each character. This can be
expensive. If you call this function a lot, it's better to make a table
that maps narrow characters to wide characters. Then you can do all
the conversion in one loop, without any function calls.

You can initialize the table the first time through:

#include <climits>
#include <locale>
#include <string>

void init_conversion_table(wchar_t* table)
{
    using namespace std;
    const ctype<wchar_t>& ct =
        use_facet< std::ctype<wchar_t> >(locale::classic());
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
        table[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
}

std::wstring to_wstring(const std::string& s)
{
    using namespace std;
    wchar_t table[CHAR_MAX - CHAR_MIN + 1];
    static bool first = true;
    if (first) {
        first = false;
        init_conversion_table(table);
    }
    size_t len = s.size();
    wstring result(len, '\0');
    for (size_t i = 0; i < len; ++i)
        result[i] = table[static_cast<int>(s[i]) - CHAR_MIN];
    return result;
}

(Loosely adapted from "How to do case-insensitive string comparison"
by Matt Austern.)

The multithreaded case is more complex (and platform dependent).

> Now, I wonder if I might gain some improvement if instead of filling
> the string with null characters then overwriting them I use:
>
> wstring result;
> result.reserve(s.length());
> transform(s.begin(), s.end(), back_inserter(result),
>           ptr_fun(char_to_wchar));
>
> The only problem is that back_insert_iterator apparently uses
> push_back(), which is fine for most containers, but not for strings.

basic_string has push_back. What's wrong with it?

> So I was wondering whether specializing
> back_insert_iterator<basic_string<Char> > to use operator+= instead of
> push_back was a good idea.
>
> If not, I can always just write a custom string_back_insert_iterator,
> but would it be a good idea then to write a specialization of
> back_inserter<basic_string<Char> > that returns the custom iterator?

Neither is necessary.
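
To illustrate why neither is needed, here is a minimal sketch of the reserve-then-append variant using the stock back_inserter. The thread never shows char_to_wchar, so the version below is a hypothetical stand-in, and to_wstring_appending is an illustrative name. A plain function pointer can be passed to transform directly, so ptr_fun is left out.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical stand-in for the poster's char_to_wchar (not shown in the
// thread): widens through unsigned char so negative chars stay in range.
wchar_t char_to_wchar(char c)
{
    return static_cast<wchar_t>(static_cast<unsigned char>(c));
}

std::wstring to_wstring_appending(const std::string& s)
{
    std::wstring result;
    result.reserve(s.size());  // reserve capacity up front, no null fill
    // back_inserter appends through result.push_back(), which
    // basic_string provides.
    std::transform(s.begin(), s.end(), std::back_inserter(result),
                   char_to_wchar);
    return result;
}
```

Calling to_wstring_appending("abc") yields L"abc". Whether this beats filling and overwriting would have to be measured on the implementation in question.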

> On a completely unrelated topic, something else occurred to me as I was
> reviewing the code (which led to my investigation of back_inserter). Is
> the use of length() instead of size() potentially dangerous? Would the
> original code be safer if I either replaced s.length() with s.size() or
> replaced the call to s.end() with advance(s.begin(), s.length())?

size and length are synonymous.

Jonathan

Jul 22 '05 #2

Jonathan Turkanis

"Jonathan Turkanis" <te******@kangaroologic.com> wrote in message
news:c1*************@ID-216073.news.uni-berlin.de...

>     using namespace std;
>     wchar_t table[CHAR_MAX - CHAR_MIN + 1];
>     static bool first = true;

table should be static too.
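
For reference, a sketch of how the function might look with that correction folded in. The name to_wstring_table is only illustrative, and the initialization is inlined to keep the example short. Like the original, it assumes single-threaded use.

```cpp
#include <climits>
#include <cstddef>
#include <locale>
#include <string>

std::wstring to_wstring_table(const std::string& s)
{
    // Both the table and the flag are static, so the table filled on the
    // first call is still valid on every later call.
    static wchar_t table[CHAR_MAX - CHAR_MIN + 1];
    static bool first = true;
    if (first) {
        const std::ctype<wchar_t>& ct =
            std::use_facet< std::ctype<wchar_t> >(std::locale::classic());
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
            table[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
        first = false;
    }

    std::wstring result(s.size(), L'\0');
    for (std::size_t i = 0; i < s.size(); ++i)
        result[i] = table[static_cast<int>(s[i]) - CHAR_MIN];
    return result;
}
```

With a non-static table, the second call would read an uninitialized array, because the flag would already be false.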

Jonathan

Jul 22 '05 #3

Mark A. Gibbs

Jonathan Turkanis wrote:

> You're calling char_to_wchar once for each character. This can be
> expensive. If you call this function a lot, it's better to make a table
> which maps narrow characters to wide characters. Then you can do all

Actually, that's a damned good idea, and I guess the same table could be
used with a little reworking for the reverse (wchar_to_char). I'll work
with that solution instead.

>> The only problem is that back_insert_iterator apparently uses
>> push_back(), which is fine for most containers, but not for strings.
>
> basic_string has push_back. What's wrong with it?

Really? Maybe it's just missing from the Rogue Wave implementation, or
even just from the documentation in Borland C++ Builder 5.

>> replaced the call to s.end() with advance(s.begin(), s.length())?
>
> size and length are synonymous.

Actually, I recall reading that they are not (though I can't find the
reference now; I think it was in Kalev's ISO C++ programmer's handbook).
From what I recall, size and length are related in a similar way to
c_str and data, in that c_str must return a pointer to a null-terminated
character sequence while all data has to do is return a pointer to the
internal data (which may or may not be null-terminated). size returns
the amount of character data in the string while length returns
strlen(c_str()), so it's possible for size to be bigger than length, but
not the opposite.

At any rate, if they are synonymous, why bother to have them both?
Wouldn't it be wiser to advocate using size all the time instead of
length so that other containers could just as easily be swapped in?

But I've heard so much contradictory information on the STL - especially
in regards to international support and allocators - that I honestly
don't know what is really true anymore.

Mark

Jul 22 '05 #4

Jonathan Turkanis

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:Fq*************@news01.bloor.is.net.cable.rogers.com...

> Jonathan Turkanis wrote:
>
>> You're calling char_to_wchar once for each character. This can be
>> expensive. If you call this function a lot, it's better to make a table
>> which maps narrow characters to wide characters. Then you can do all
>
> Actually, that's a damned good idea,

Thanks.

> and I guess the same table could be used with a little reworking for
> the reverse (wchar_to_char). I'll work with that solution instead.

You'll need to modify it a bit. If you try applying the same
technique literally, you'll get a really huge table, most entries of
which contain a narrow character representing an invalid conversion.

>>> The only problem is that back_insert_iterator apparently uses
>>> push_back(), which is fine for most containers, but not for strings.
>>
>> basic_string has push_back. What's wrong with it?
>
> Really? Maybe it's just missing from the Rogue Wave implementation, or
> even just from the documentation in Borland C++ Builder 5.

>>> replaced the call to s.end() with advance(s.begin(), s.length())?
>>
>> size and length are synonymous.
>
> Actually, I recall reading that they are not (though I can't find the
> reference now; I think it was in Kalev's ISO C++ programmer's handbook).
> From what I recall, size and length are related in a similar way to
> c_str and data, in that c_str must return a pointer to a null-terminated
> character sequence while all data has to do is return a pointer to the
> internal data (which may or may not be null-terminated). size returns
> the amount of character data in the string while length returns
> strlen(c_str()), so it's possible for size to be bigger than length, but
> not the opposite.

length is specified in the standard this way:

Returns: size().

size() was added for consistency with the other containers. length()
seems more natural, to many, when dealing with strings. I always use
size().
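
A small check makes the point concrete. This is only a sketch (the function name is made up for illustration): even for a string with an embedded null, length() and size() agree, and it is strlen on c_str() that stops early.

```cpp
#include <cassert>
#include <cstring>
#include <string>

void size_and_length_agree()
{
    // Five characters, including an embedded null in the middle.
    const std::string s("ab\0cd", 5);

    assert(s.size() == 5);
    assert(s.length() == s.size());       // the two are always equal
    assert(std::strlen(s.c_str()) == 2);  // strlen stops at the first null
}
```

So the original code's use of s.length() alongside s.end() is safe; both describe the same five characters here.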

Jonathan


Jul 22 '05 #5

Mark A. Gibbs

Jonathan Turkanis wrote:

> "Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
> news:Fq*************@news01.bloor.is.net.cable.rogers.com...
>
>> Actually, that's a damned good idea,
>
> Thanks.

Welcome.

> You'll need to modify it a bit. If you try applying the same
> technique literally, you'll get a really huge table, most entries of
> which contain a narrow character representing an invalid conversion.

Yes, I was thinking of something like:

namespace {

const int TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;

const wchar_t* get_table() {
    static bool first = true;
    static wchar_t table[TABLE_SIZE];

    if (first) {
        // Init the table as you described
        first = false;
    }

    return table;
}

}

wstring to_wstring(const string& s) {
    const wchar_t* table = get_table();
    size_t size = s.size();
    wstring result;

    result.reserve(size);

    for (size_t i = 0; i < size; ++i) {
        result += table[static_cast<int>(s[i]) - CHAR_MIN];
    }

    return result;
}

string to_string(const wstring& s) {
    const wchar_t* table = get_table();
    const wchar_t* end = table + TABLE_SIZE;
    const char def_char = '?';
    size_t size = s.size();
    string result;

    result.reserve(size);

    for (size_t i = 0; i < size; ++i) {
        const wchar_t* p = find(table, end, s[i]);
        result += (p == end) ? def_char :
            static_cast<char>(distance(table, p) + CHAR_MIN);
    }

    return result;
}

You mentioned that there might be multithreading issues, but I think
that if I were to write get_table as:

const wchar_t* get_table()
{
    static bool first = true;
    static wchar_t table[TABLE_SIZE];

    if (first)
    {
        platform_specific_mutex_or_critical_section_object x;

        if (first)
        {
            // Init the table as you described
            first = false;
        }

        // x releases on destruction
    }

    return table;
}

Then those problems go away.

> length is specified in the standard this way:
>
> Returns: size().
>
> size() was added for consistency with the other containers. length()
> seems more natural, to many, when dealing with strings. I always use
> size().

Ah, thanks for clearing that up. I would probably have been using size
myself all along if I knew that.

I'm curious to know if there is some kind of definitive resource out
there on the ISO standard besides the standard itself. I've looked at a
copy of the standard and it's (understandably) wordy and a little obtuse
in parts (to me at least). And of course, I don't have a copy at home.
There's a lot of stuff out there that's either outdated or just plain
wrong (a fresh example is Borland's online help).

I mean, I looked through the comp.lang.c++ FAQ's suggested references
(Item 36.4) and:

1.) Claims basic_string has no push_back
2.) Dead link
3.) Doesn't even mention basic_string and appears to have been last
modified in 1996.
4.) Has a disclaimer in 3cm high, bold, red letters that the reference
is way out of date.
5.) Agrees with your assertion of the existence of
basic_string::push_back, but in the same breath talks about rope<T,
Allocator>. (To be fair, it explicitly states that rope is an extension,
although how can I be *sure* that basic_string::push_back is not?)

Is there anything out there online that is accurate and yet does not
have a bunch of non-standard extension stuff thrown in?

Mark

Jul 22 '05 #6

Jonathan Turkanis

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:UW******************@twister01.bloor.is.net.cable.rogers.com...

> Jonathan Turkanis wrote:
>
>> "Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
>> news:Fq*************@news01.bloor.is.net.cable.rogers.com...
>>
>>> Actually, that's a damned good idea,
>>
>> Thanks.
>
> Welcome.
>
>> You'll need to modify it a bit. If you try applying the same
>> technique literally, you'll get a really huge table, most entries of
>> which contain a narrow character representing an invalid conversion.
>
> Yes, I was thinking of something like:
>
> namespace {
>
> const int TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;
>
> const wchar_t* get_table() {
>     static bool first = true;
>     static wchar_t table[TABLE_SIZE];
>
>     if (first) {
>         // Init the table as you described
>         first = false;
>     }
>
>     return table;
> }
>
> }
>
> <snip>
>
> string to_string(const wstring& s) {
>     const wchar_t* table = get_table();
>     const wchar_t* end = table + TABLE_SIZE;
>     const char def_char = '?';
>     size_t size = s.size();
>     string result;
>
>     result.reserve(size);
>
>     for (size_t i = 0; i < size; ++i) {
>         const wchar_t* p = find(table, end, s[i]);
>         result += (p == end) ? def_char :
>             static_cast<char>(distance(table, p) + CHAR_MIN);
>     }
>
>     return result;
> }

I'd use a map instead. The linear search through the table could be
rather slow. But the map might be even slower than just calling the
conversion function for each character. You'd have to compare them.
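
A rough sketch of the map-based reverse lookup might look like the following. The names narrow_map and to_string_map are made up for illustration, the '?' fallback follows the quoted code, and no timing claims are intended. It assumes single-threaded first use.

```cpp
#include <climits>
#include <cstddef>
#include <locale>
#include <map>
#include <string>
#include <utility>

// Builds the wide -> narrow lookup once by widening every char value.
const std::map<wchar_t, char>& narrow_map()
{
    static std::map<wchar_t, char> m;
    static bool first = true;
    if (first) {
        const std::ctype<wchar_t>& ct =
            std::use_facet< std::ctype<wchar_t> >(std::locale::classic());
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
            const char c = static_cast<char>(i);
            // insert() keeps the first mapping if two chars widen alike
            m.insert(std::make_pair(ct.widen(c), c));
        }
        first = false;
    }
    return m;
}

std::string to_string_map(const std::wstring& s)
{
    const std::map<wchar_t, char>& m = narrow_map();
    std::string result;
    result.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Logarithmic lookup instead of a linear scan of the table
        const std::map<wchar_t, char>::const_iterator it = m.find(s[i]);
        result += (it == m.end()) ? '?' : it->second;
    }
    return result;
}
```

Each lookup costs O(log n) comparisons over at most CHAR_MAX - CHAR_MIN + 1 entries, versus a linear scan with find.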

> You mentioned that there might be multithreading issues, but I think
> that if I were to write get_table as:
>
> const wchar_t* get_table()
> {
>     static bool first = true;
>     static wchar_t table[TABLE_SIZE];
>
>     if (first)
>     {
>         platform_specific_mutex_or_critical_section_object x;
>
>         if (first)
>         {
>             // Init the table as you described
>             first = false;
>         }
>
>         // x releases on destruction
>     }
>
>     return table;
> }

That's the way to do it.
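
For readers with a C++11 or later compiler (not available when this thread was written), a portable alternative is to let the language do the locking: initialization of a function-local static is guaranteed to run exactly once even under concurrent calls. The sketch below assumes that guarantee and reuses the TABLE_SIZE and get_table names from the quoted code; the lambda-based fill is just one way to write it.

```cpp
#include <array>
#include <climits>
#include <locale>

constexpr int TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;

const wchar_t* get_table()
{
    // Since C++11, this initializer runs exactly once, and other threads
    // calling get_table() concurrently wait until it has finished.
    static const std::array<wchar_t, TABLE_SIZE> table = [] {
        std::array<wchar_t, TABLE_SIZE> t{};
        const auto& ct =
            std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
            t[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
        return t;
    }();
    return table.data();
}
```

This removes both the explicit flag and the platform-specific lock object. The hand-written version reads first outside the lock, which the C++11 memory model treats as a data race unless the flag is atomic.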

> Then those problems go away.
>
>> length is specified in the standard this way:
>>
>> Returns: size().
>>
>> size() was added for consistency with the other containers. length()
>> seems more natural, to many, when dealing with strings. I always use
>> size().
>
> Ah, thanks for clearing that up. I would probably have been using size
> myself all along if I knew that.
>
> I'm curious to know if there is some kind of definitive resource out
> there on the ISO standard besides the standard itself. I've looked at a
> copy of the standard and it's (understandably) wordy and a little obtuse
> in parts (to me at least). And of course, I don't have a copy at home.
> There's a lot of stuff out there that's either outdated or just plain
> wrong (a fresh example is Borland's online help).

I think Nicolai Josuttis's The C++ Standard Library is the best
library reference. It doesn't cover the other aspects of the standard.

I'd also get a copy of the standard (2nd edition). It's only $18. Most
of it, particularly the library specification, is surprisingly easy to
read. Parts are really obscure, though.

Jonathan

Jul 22 '05 #7

Mark A. Gibbs

Jonathan Turkanis wrote:

> I'd also get a copy of the standard (2nd edition). It's only $18. Most
> of it, particularly the library specification, is surprisingly easy to
> read. Parts are really obscure, though.

Wow, I had heard that it was CDN$300. I just read the relevant section
in the FAQ and I see where my confusion came from (ISO vs. ANSI). $18
(~CDN$25-30) is definitely worth the price of admission. Thanks for the
tip.

Mark

Jul 22 '05 #8

Jonathan Turkanis

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:9d******************@twister01.bloor.is.net.cable.rogers.com...

> Jonathan Turkanis wrote:
>
>> I'd also get a copy of the standard (2nd edition). It's only $18. Most
>> of it, particularly the library specification, is surprisingly easy to
>> read. Parts are really obscure, though.
>
> Wow, I had heard that it was CDN$300. I just read the relevant section
> in the FAQ and I see where my confusion came from (ISO vs. ANSI). $18
> (~CDN$25-30) is definitely worth the price of admission. Thanks for the
> tip.

Glad to help. Last I heard, the version available at the ANSI store
had formatting problems, but this version was clean:

http://www.techstreet.com/cgi-bin/de...uct_id=1143945

Jonathan

Jul 22 '05 #9
