back_inserter & basic_string

I have a string conversion function that looks something like this
(apologies, but I cannot post the actual code):

wchar_t char_to_wchar(char);

wstring to_wstring(const string& s)
{
    wstring result(s.length(), '\0');

    transform(s.begin(), s.end(), result.begin(), ptr_fun(char_to_wchar));

    return result;
}

Now, I wonder if I might gain some improvement if instead of filling the
string with null characters then overwriting them I use:

wstring result;
result.reserve(s.length());
transform(s.begin(), s.end(), back_inserter(result),
ptr_fun(char_to_wchar));

The only problem is that back_insert_iterator apparently uses
push_back(), which is fine for most containers, but not for strings. So
I was wondering whether specializing
back_insert_iterator<basic_string<Char> > to use operator+= instead of
push_back was a good idea.

If not, I can always just write a custom string_back_insert_iterator,
but would it be a good idea then to write a specialization of
back_inserter<basic_string<Char> > that returns the custom iterator?

On a completely unrelated topic, something else occurred to me as I was
reviewing the code (which led to my investigation of back_inserter). Is
the use of length() instead of size() potentially dangerous? Would the
original code be safer if I either replaced s.length() with s.size() or
replaced the call to s.end() with advance(s.begin(), s.length())?

Mark

Jul 22 '05 #1

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:hp***************@twister01.bloor.is.net.cabl e.rogers.com...
I have a string conversion function that looks something like this
(apologies, but I cannot post the actual code):

wchar_t char_to_wchar(char);

wstring to_wstring(const string& s)
{
wstring result(s.length(), '\0');

transform(s.begin(), s.end(), result.begin(), ptr_fun(char_to_wchar));
return result;
}
You're calling char_to_wchar once for each character. This can be
expensive. If you call this function a lot, it's better to make a table
which maps narrow characters to wide characters. Then you can do all
the conversion in one loop, without any function calls.

You can initialize the table the first time through:

#include <climits>
#include <locale>
#include <string>

void init_conversion_table(wchar_t* table)
{
    using namespace std;
    const ctype<wchar_t>& ct =
        use_facet< std::ctype<wchar_t> >(locale::classic());
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
        table[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
}

std::wstring to_wstring(const std::string& s)
{
    using namespace std;
    wchar_t table[CHAR_MAX - CHAR_MIN + 1];
    static bool first = true;
    if (first) {
        first = false;
        init_conversion_table(table);
    }
    size_t len = s.size();
    wstring result(len, '\0');
    for (size_t i = 0; i < len; ++i)
        result[i] = table[static_cast<int>(s[i]) - CHAR_MIN];
    return result;
}

(Loosely adapted from "How to do case-insensitive string comparison"
by Matt Austern.)

The multithreaded case is more complex (and platform dependent).

Now, I wonder if I might gain some improvement if instead of filling the string with null characters then overwriting them I use:

wstring result;
result.reserve(s.length());
transform(s.begin(), s.end(), back_inserter(result),
ptr_fun(char_to_wchar));

The only problem is that back_insert_iterator apparently uses
push_back(), which is fine for most containers, but not for strings.
basic_string has push_back. What's wrong with it?
So
I was wondering whether specializing
back_insert_iterator<basic_string<Char> > to use operator+= instead of push_back was a good idea.

If not, I can always just write a custom string_back_insert_iterator, but would it be a good idea then to write a specialization of
back_inserter<basic_string<Char> > that returns the custom iterator?
Neither is necessary.
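
For readers following along, here is a minimal sketch of the back_inserter approach working unmodified on a wide string. The function name widen_by_appending is made up for this illustration, and char_to_wchar is only a stand-in for the poster's unpublished function (assumed here to be a plain widening cast). A plain function pointer is passed instead of ptr_fun, which later standards removed.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Stand-in for the poster's char_to_wchar (assumption: a simple widening cast).
wchar_t char_to_wchar(char c)
{
    return static_cast<wchar_t>(static_cast<unsigned char>(c));
}

// Hypothetical name; appends converted characters instead of overwriting nulls.
std::wstring widen_by_appending(const std::string& s)
{
    std::wstring result;
    result.reserve(s.size());  // capacity is requested up front

    // back_inserter calls result.push_back(), which basic_string provides,
    // so no specialization or custom iterator is needed.
    std::transform(s.begin(), s.end(), std::back_inserter(result), char_to_wchar);

    return result;
}
```

Whether this is actually faster than pre-filling the string and overwriting it would have to be measured on the implementation in question.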

On a completely unrelated topic, something else occurred to me as I was reviewing the code (which led to my investigation of back_inserter). Is the use of length() instead of size() potentially dangerous? Would the original code be safer if I either replaced s.length() with s.size() or replaced the call to s.end() with advance(s.begin(), s.length())?


size and length are synonymous.

Jonathan
Jul 22 '05 #2

"Jonathan Turkanis" <te******@kangaroologic.com> wrote in message
news:c1*************@ID-216073.news.uni-berlin.de...

using namespace std;
wchar_t table[CHAR_MAX - CHAR_MIN + 1];
static bool first = true;


table should be static too.
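
A sketch of what the function might look like with that correction applied, so the table survives between calls. The names widen_with_table and kTableSize are made up for this illustration, and like the version above it is not safe to call from several threads at once.

```cpp
#include <climits>
#include <cstddef>
#include <locale>
#include <string>

namespace {

const std::size_t kTableSize = CHAR_MAX - CHAR_MIN + 1;  // hypothetical name

void init_conversion_table(wchar_t* table)
{
    const std::ctype<wchar_t>& ct =
        std::use_facet< std::ctype<wchar_t> >(std::locale::classic());
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
        table[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
}

}  // namespace

// Hypothetical name, to avoid confusion with the versions in the thread.
std::wstring widen_with_table(const std::string& s)
{
    static wchar_t table[kTableSize];  // static: filled once, kept between calls
    static bool first = true;          // single-threaded use assumed
    if (first) {
        init_conversion_table(table);
        first = false;
    }

    std::wstring result(s.size(), L'\0');
    for (std::size_t i = 0; i < s.size(); ++i)
        result[i] = table[static_cast<int>(s[i]) - CHAR_MIN];
    return result;
}
```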

Jonathan
Jul 22 '05 #3

Jonathan Turkanis wrote:

You're calling char_to_wchar once for each character. This can be
expensive. If you call this function a lot, it's better to make a table
which maps narrow characters to wide characters. Then you can do all


Actually, that's a damned good idea, and I guess the same table could be
used with a little reworking for the reverse (wchar_to_char). I'll work
with that solution instead.
The only problem is that back_insert_iterator apparently uses
push_back(), which is fine for most containers, but not for strings.

basic_string has push_back. What's wrong with it?


Really? Maybe it's just missing from the Rogue Wave implementation - or
even just the documentation in Borland C++ Builder 5.
replaced the call to s.end() with advance(s.begin(), s.length())?

size and length are synonymous.


Actually, I recall reading that they are not (though I can't find the
reference now, I think it was in Kalev's ISO C++ programmer's handbook).
From what I recall, size and length are related in a similar way to
c_str and data, in that c_str must return a pointer to a null-terminated
character sequence while all data has to do is return a pointer to the
internal data (which may or may not be null-terminated). size returns
the amount of character data in the string while length returns
strlen(c_str()), so it's possible for size to be bigger than length, but
not the opposite.

At any rate, if they are synonymous, why bother to have them both?
Wouldn't it be wiser to advocate using size all the time instead of
length so that other containers could just as easily be swapped in?

But I've heard so much contradictory information on the STL - especially
in regards to international support and allocators - that I honestly
don't know what is really true anymore.

Mark

Jul 22 '05 #4

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:Fq*************@news01.bloor.is.net.cable.rog ers.com...

Jonathan Turkanis wrote:

You're calling char_to_wchar once for each character. This can be
expensive. If you call this function a lot, it's better to make a table
which maps narrow characters to wide characters. Then you can do all

Actually, that's a damned good idea,
Thanks.
and I guess the same table could be
used with a little reworking for the reverse (wchar_to_char). I'll work with that solution instead.
You'll need to modify it a bit. If you try applying the same
technique literally, you'll get a really huge table, most entries of
which contain a narrow character representing an invalid conversion.
The only problem is that back_insert_iterator apparently uses
push_back(), which is fine for most containers, but not for
strings.

basic_string has push_back. What's wrong with it?
Really? Maybe it's just missing from the Rogue Wave implementation -

or even just the documentation in Borland C++ Builder 5.
replaced the call to s.end() with advance(s.begin(), s.length())?
size and length are synonymous.


Actually, I recall reading that they are not (though I can't find

the reference now, I think it was in Kalev's ISO C++ programmer's handbook). From what I recall, size and length are related in a similar way to
c_str and data, in that c_str must return a pointer to a null-terminated character sequence while all data has to do is return a pointer to the internal data (which may or may not be null-terminated). size returns the amount of character data in the string while length returns
strlen(c_str()), so it's possible for size to be bigger than length, but not the opposite.
length is specified in the standard this way:

Returns: size().

size() was added for consistency with the other containers. length()
seems more natural, to many, when dealing with strings. I always use
size().
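
A small sketch that checks this, using a string with an embedded null character, the case where the earlier recollection would predict a difference. The helper names are made up for this illustration. Only strlen(c_str()) stops at the embedded null; size() and length() both report the full character count.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: builds the five-character string a, b, NUL, c, d.
std::string with_embedded_null()
{
    return std::string("ab\0cd", 5);
}

// Hypothetical helper: true when size() and length() agree for s.
bool size_equals_length(const std::string& s)
{
    return s.size() == s.length();
}

// Hypothetical helper: what strlen sees, which stops at the first NUL.
std::size_t c_string_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```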

Jonathan


Jul 22 '05 #5


Jonathan Turkanis wrote:
"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:Fq*************@news01.bloor.is.net.cable.rog ers.com...
Actually, that's a damned good idea,
Thanks.


Welcome.
You'll need to modify it a bit. If you try applying the same
technique literally, you'll get a really huge table, most entries of
which contain a narrow character representing an invalid conversion.
Yes, I was thinking of something like:

namespace {

    const int TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;

    const wchar_t* get_table() {
        static bool first = true;
        static wchar_t table[TABLE_SIZE];

        if (first) {
            // Init the table as you described
            first = false;
        }

        return table;
    }

}

wstring to_wstring(const string& s) {
    const wchar_t* table = get_table();
    size_t size = s.size();
    wstring result;

    result.reserve(size);

    for(size_t i = 0; i < size; ++i) {
        result += table[static_cast<int>(s[i]) - CHAR_MIN];
    }

    return result;
}

string to_string(const wstring& s) {
    const wchar_t* table = get_table();
    const wchar_t* end = table + TABLE_SIZE;
    const char def_char = '?';
    size_t size = s.size();
    string result;

    result.reserve(size);

    for(size_t i = 0; i < size; ++i) {
        const wchar_t* p = find(table, end, s[i]);
        result += (p == end) ? def_char :
            static_cast<char>(distance(table, p) + CHAR_MIN);
    }

    return result;
}

You mentioned that there might be multithreading issues, but I think
that if I were to write get_table as:

const wchar_t* get_table()
{
    static bool first = true;
    static wchar_t table[TABLE_SIZE];

    if (first)
    {
        platform_specific_mutex_or_critical_section_object x;

        if (first)
        {
            // Init the table as you described
            first = false;
        }

        // x releases on destruction
    }

    return table;
}

Then those problems go away.
length is specified in the standard this way:

Returns: size().

size() was added for consistency with the other containers. length()
seems more natural, to many, when dealing with strings. I always use
size().


Ah, thanks for clearing that up. I would probably have been using size
myself all along if I knew that.

I'm curious to know if there is some kind of definitive resource out
there on the ISO standard besides the standard itself. I've looked at a
copy of the standard and it's (understandably) wordy and a little obtuse
in parts (to me at least). And of course, I don't have a copy at home.
There's a lot of stuff out there that's either outdated or just plain
wrong (a fresh example is Borland's online help).

I mean, I looked through the comp.lang.c++ FAQ's suggested references
(Item 36.4) and:

1.) Claims basic_string has no push_back
2.) Dead link
3.) Doesn't even mention basic_string and appears to have been last
modified in 1996.
4.) Has a disclaimer in 3cm high, bold, red letters that the reference
is way out of date.
5.) Agrees with your assertion of the existence of
basic_string::push_back, but in the same breath talks about rope<T,
Allocator>. (To be fair, it explicitly states that rope is an extension,
although how can I be *sure* that basic_string::push_back is not?)

Is there anything out there online that is accurate and yet does not
have a bunch of non-standard extension stuff thrown in?

Mark

Jul 22 '05 #6

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:UW******************@twister01.bloor.is.net.c able.rogers.com...


Jonathan Turkanis wrote:
"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:Fq*************@news01.bloor.is.net.cable.rog ers.com...
Actually, that's a damned good idea,
Thanks.


Welcome.
You'll need to modify it a bit. If you try applying the same
technique literally, you'll get a really huge table, most entries of which contain a narrow character representing an invalid conversion.
Yes, I was thinking of something like:

namespace {

const int TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;

const wchar_t* get_table() {
static bool first = true;
static wchar_t table[TABLE_SIZE];

if (first) {
// Init the table as you described
first = false;
}

return table;
}

}

<snip>
string to_string(const wstring& s) {
const wchar_t* table = get_table();
const wchar_t* end = table + TABLE_SIZE;
const char def_char = '?';
size_t size = s.size();
string result;

result.reserve(size);

for(size_t i = 0; i < size; ++i) {
const wchar_t* p = find(table, end, s[i]);
result += (p == end) ? def_char :
static_cast<char>(distance(table, p) + CHAR_MIN);
}

return result;
}
I'd use a map instead. The linear search through the table could be
rather slow. But the map might be even slower than just calling the
conversion function for each character. You'd have to compare them.
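
A rough sketch of the map-based reverse lookup, for comparison with the linear search. The names narrow_map and narrow_with_map are made up for this illustration, the map is built by widening every char value in the classic locale (the same assumption as the table above), and the lambda requires C++11. It says nothing about which approach is faster; that still has to be measured.

```cpp
#include <climits>
#include <cstddef>
#include <locale>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: builds the wide-to-narrow lookup once, on first use.
const std::map<wchar_t, char>& narrow_map()
{
    static const std::map<wchar_t, char> m = [] {
        std::map<wchar_t, char> tmp;
        const std::ctype<wchar_t>& ct =
            std::use_facet< std::ctype<wchar_t> >(std::locale::classic());
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
            const char c = static_cast<char>(i);
            // insert() keeps the first mapping if two chars widen to the same value
            tmp.insert(std::make_pair(ct.widen(c), c));
        }
        return tmp;
    }();
    return m;
}

// Hypothetical name; def_char replaces wide characters with no narrow equivalent.
std::string narrow_with_map(const std::wstring& s, char def_char = '?')
{
    const std::map<wchar_t, char>& m = narrow_map();
    std::string result;
    result.reserve(s.size());

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::map<wchar_t, char>::const_iterator it = m.find(s[i]);  // O(log n) lookup
        result += (it == m.end()) ? def_char : it->second;
    }
    return result;
}
```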

You mentioned that there might be multithreading issues, but I think
that if I were to write get_table as:

const wchar_t* get_table()
{
static bool first = true;
static wchar_t table[TABLE_SIZE];

if (first)
{
platform_specific_mutex_or_critical_section_object x;

if (first)
{
// Init the table as you described
first = false;
}

// x releases on destruction
}

return table;
}
That's the way to do it.
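
A caveat for later readers: the pattern above is double-checked locking. The first test of `first` happens outside the lock, so one thread can read it while another writes it, and the pattern is not reliable without additional synchronization. The lock object would also need to guard a mutex shared by all threads rather than being a fresh local mutex. From C++11 onward, which postdates this thread, the language guarantees that a function-local static is initialized exactly once even under concurrent calls, and std::call_once is another option. A minimal sketch under that assumption follows; the WideTable type and the sketch namespace are made up for this illustration.

```cpp
#include <climits>
#include <cstddef>
#include <locale>

namespace sketch {

const std::size_t TABLE_SIZE = CHAR_MAX - CHAR_MIN + 1;

// Hypothetical helper type: the constructor fills the table.
struct WideTable {
    wchar_t data[TABLE_SIZE];

    WideTable()
    {
        const std::ctype<wchar_t>& ct =
            std::use_facet< std::ctype<wchar_t> >(std::locale::classic());
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
            data[i - CHAR_MIN] = ct.widen(static_cast<char>(i));
    }
};

const wchar_t* get_table()
{
    // C++11 and later: this initialization runs once, even if several
    // threads call get_table() at the same time. No explicit lock is needed.
    static const WideTable table;
    return table.data;
}

}  // namespace sketch
```

On a pre-C++11 compiler this guarantee does not exist, and a platform lock taken before the first check would be the conservative choice.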

Then those problems go away.
length is specified in the standard this way:

Returns: size().

size() was added for consistency with the other containers.
length() seems more natural, to many, when dealing with strings. I always use size().


Ah, thanks for clearing that up. I would probably have been using

size myself all along if I knew that.

I'm curious to know if there is some kind of definitive resource out
there on the ISO standard besides the standard itself. I've looked at a copy of the standard and it's (understandably) wordy and a little obtuse in parts (to me at least). And of course, I don't have a copy at home. There's a lot of stuff out there that's either outdated or just plain wrong (a fresh example is Borland's online help).


I think Nicolai Josuttis's The C++ Standard Library is the best
library reference. It doesn't cover the other aspects of the standard.

I'd also get a copy of the standard (2nd edition). It's only $18. Most
of it, particularly the library specification, is surprisingly easy to
read. Parts are really obscure, though.

Jonathan
Jul 22 '05 #7

Jonathan Turkanis wrote:
I'd also get a copy of the standard (2nd edition). It's only $18. Most
of it, particularly the library specification, is surprisingly easy to
read. Parts are really obscure, though.


Wow, I had heard that it was CDN$300. I just read the relevant section
in the FAQ and I see where my confusion came from (ISO vs. ANSI). $18
(~CDN$25-30) is definitely worth the price for admission. Thanks for the
tip.

Mark

Jul 22 '05 #8

"Mark A. Gibbs" <gi*******@xrogersx.com> wrote in message
news:9d******************@twister01.bloor.is.net.c able.rogers.com...

Jonathan Turkanis wrote:
I'd also get a copy of the standard (2nd edition). It's only $18. Most of it, particularly the library specification, is surprisingly easy to read. Parts are really obscure, though.
Wow, I had heard that it was CDN$300. I just read the relevant

section in the FAQ and I see where my confusion came from (ISO vs. ANSI). $18 (~CDN$25-30) is definitely worth the price for admission. Thanks for the tip.


Glad to help. Last I heard, the version available at the ansi store
had formatting problems, but this version was clean:

http://www.techstreet.com/cgi-bin/de...uct_id=1143945

Jonathan
Jul 22 '05 #9
