By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,647 Members | 1,178 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,647 IT Pros & Developers. It's quick & easy.

Getter returning strings by value -- good or bad?

P: n/a
Hi,

I was wondering why library implementors often
make getter functions return strings by value
(copies). For example, in boost::filesystem the
leaf() function returns an std::string by value.
So does Gnome::Vfs::FileInfo::get_name().

Isn't that unnecessary overhead? I could as well
return by reference to const-string and avoid
copying the string, right? That's what I always
do, and it works quite well.
Okay, for strings it's maybe not that of a
performance hit, but with more complex objects I'd
reckon it's a bad idea to return copies.

I'm asking because I have this scenario:
I have a class called File, which defines an
interface for accessing information about a file.
Its state is mostly defined by a
Gnome::Vfs::FileInfo object.
Many functions in the File interface are just
wrappers; they delegate calls to the encapsulated
FileInfo object.

One such example is a function get_name(), which
returns the name of a file. It's implemented like
this:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

I previously wanted to implement it like this:

const Glib::ustring& File::get_name() const {
return m_finfo_ptr->get_name();
}

That however issues a warning, because I'm
referencing a temporary (FileInfo::get_name()
returns a copy of the name-string, that's actually
why this question came up).

But back to the first version:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

How many copies are created here?
FileInfo::get_name() creates a copy, and I'm then
returning a copy of that copy, right? That's terrible!

Please, someone shed some light on this.

Regards,
Matthias
Jul 23 '05 #1
Share this Question
Share on Google+
13 Replies


P: n/a
Matthias Kaeppler wrote:
I was wondering why library implementors often make getter functions
return strings by value (copies). For example, in boost::filesystem the
leaf() function returns an std::string by value. So does
Gnome::Vfs::FileInfo::get_name().

Isn't that unnecessary overhead? I could as well return by reference to
const-string and avoid copying the string, right? [...]


What would that be a reference to, and how is it going to fare in a multi-
threaded program? Returning a string by value is perfectly fine from
a member function called for a temporary, but returning a reference is
most likely isn't...

Compilers can optimize but they can't fix broken code for you. Keep that
in mind :-)

V
Jul 23 '05 #2

P: n/a
Victor Bazarov wrote:
What would that be a reference to, and how is it going to fare in a multi-
threaded program? Returning a string by value is perfectly fine from
a member function called for a temporary, but returning a reference is
most likely isn't...
Well, it would be a reference to an object which
already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};

I don't see why this is a problem in multithreaded
environments. The reference returned is a
reference to const, so you can't write it anyway;
you don't need a lock on this object and several
threads may read it at will without breaking
integrity.
Compilers can optimize but they can't fix broken code for you. Keep that
in mind :-)
I don't see where the code above is broken.

V

Jul 23 '05 #3

P: n/a
Matthias Kaeppler wrote:
Victor Bazarov wrote:
What would that be a reference to, and how is it going to fare in a
multi-
threaded program? Returning a string by value is perfectly fine from
a member function called for a temporary, but returning a reference is
most likely isn't...

Well, it would be a reference to an object which already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};

I don't see why this is a problem in multithreaded environments. The
reference returned is a reference to const, so you can't write it
anyway; you don't need a lock on this object and several threads may
read it at will without breaking integrity.


Yes, but if one of the threads wants to write to it, the readers would
have to wait, no? But if they decided to use the reference without
locking it, what's going to happen if somebody changes the contents of
that value in some writer thread?
Compilers can optimize but they can't fix broken code for you. Keep that
in mind :-)

I don't see where the code above is broken.


The code above doesn't do anything hence it can't be broken. Now, imagine
somebody writes

std::string const& sr = someclass().somememberreturningaconstref();

a temporary is gone and 'sr' has become invalid as soon as it was defined.
It's more difficult if you return a string by value.

V
Jul 23 '05 #4

P: n/a
Matthias Kaeppler wrote:
I was wondering why library implementors often
make getter functions return strings by value
(copies).
Sometimes because they assume a reference-counted string class.
For example, in boost::filesystem the
leaf() function returns an std::string by value.
Boost is 'off topic'. They just imitate the inefficient STL-style where
everything is 'by value'.
So does Gnome::Vfs::FileInfo::get_name().
Isn't that unnecessary overhead?
Yes, if the string is duplicated. No, if the string is ref-counted.
I could as well
return by reference to const-string and avoid
copying the string, right?
Yes, if it's possible (i.e. if you have an object for which you can
return a refernce).
That's what I always
do, and it works quite well.
Okay, for strings it's maybe not that of a
performance hit, but with more complex objects I'd
reckon it's a bad idea to return copies.

I'm asking because I have this scenario:
I have a class called File, which defines an
interface for accessing information about a file.
Its state is mostly defined by a
Gnome::Vfs::FileInfo object.
Many functions in the File interface are just
wrappers; they delegate calls to the encapsulated
FileInfo object.

One such example is a function get_name(), which
returns the name of a file. It's implemented like
this:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

I previously wanted to implement it like this:

const Glib::ustring& File::get_name() const {
return m_finfo_ptr->get_name();
}

That however issues a warning, because I'm
referencing a temporary (FileInfo::get_name()
returns a copy of the name-string, that's actually
why this question came up).

But back to the first version:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

How many copies are created here?
FileInfo::get_name() creates a copy, and I'm then
returning a copy of that copy, right? That's terrible!


You are right. But there is a huge difference between what is described
in tutorials/books and what is used in the real world. Do you know if
Glib::ustring is ref-counted? If not then the implementation is
probably 'suboptimal'.

Jul 23 '05 #5

P: n/a
Ian
Matthias Kaeppler wrote:

Well, it would be a reference to an object which already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more efficient
to return by value.
I don't see why this is a problem in multithreaded environments. The
reference returned is a reference to const, so you can't write it
anyway; you don't need a lock on this object and several threads may
read it at will without breaking integrity.

In this trivial case, no. But in situations where the string is
constructed in the function, maybe using a member variable, you hit
problems.
Compilers can optimize but they can't fix broken code for you. Keep that
in mind :-)

I don't see where the code above is broken.

As a trivial case that can get inlined away, it isn't!

Ian
Jul 23 '05 #6

P: n/a
Panjandrum wrote:
Matthias Kaeppler wrote:
I was wondering why library implementors often
make getter functions return strings by value
(copies).

Sometimes because they assume a reference-counted string class.


That's a good point, I didn't think about that. Actually I don't know if
it's ref-counted. I'll look into that.
How many copies are created here?
FileInfo::get_name() creates a copy, and I'm then
returning a copy of that copy, right? That's terrible!

You are right. But there is a huge difference between what is described
in tutorials/books and what is used in the real world. Do you know if
Glib::ustring is ref-counted? If not then the implementation is
probably 'suboptimal'.


But the compiler will optimize away the second copy, right? (assuming
the worst case, that the string is NOT ref-counted)?
I mean, it doesn't lokk like if I had a choice here.

--
Matthias Kaeppler
Jul 23 '05 #7

P: n/a
Victor Bazarov wrote:

Yes, but if one of the threads wants to write to it, the readers would
have to wait, no? But if they decided to use the reference without
locking it, what's going to happen if somebody changes the contents of
that value in some writer thread?

Okay, you have a point here.

But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.
The code above doesn't do anything hence it can't be broken. Now, imagine
somebody writes

std::string const& sr = someclass().somememberreturningaconstref();

a temporary is gone and 'sr' has become invalid as soon as it was defined.
It's more difficult if you return a string by value.


Not sure what you want to tell me here. Why would sr be invalid? sr is
initialized with the string returned by the member function which
returns a reference to it. After that line, working with sr is just like
working with the "real" string, just that you have read-only access.
Nothing is invalid here.

--
Matthias Kaeppler
Jul 23 '05 #8

P: n/a
Ian wrote:
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more efficient
to return by value.
Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^
As a trivial case that can get inlined away, it isn't!


Hm. What would happen if the code can't get inlined?

--
Matthias Kaeppler
Jul 23 '05 #9

P: n/a
Ian
Matthias Kaeppler wrote:
Ian wrote:
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more
efficient to return by value.

Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^

Consider a reference counted string, this could have a size of 8 bytes
(two pointers). Eight bytes is 64bits which is a single register on a
number of machines. Compilers often reserve one or more registers for
function returns. Some architectures use register wheels, so the sting
could be in a single out register.
As a trivial case that can get inlined away, it isn't!

Hm. What would happen if the code can't get inlined?

OK, inlined away and returns a constant - the last bit being the
important bit.

Ian
Jul 23 '05 #10

P: n/a
Ian
Matthias Kaeppler wrote:
Victor Bazarov wrote:

Yes, but if one of the threads wants to write to it, the readers would
have to wait, no? But if they decided to use the reference without
locking it, what's going to happen if somebody changes the contents of
that value in some writer thread?


Okay, you have a point here.

But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.

That's where copy-on-write comes in. The same representation will be
passed around until something changes it. At this time, the
representation will be cloned and then modified.
The code above doesn't do anything hence it can't be broken. Now,
imagine
somebody writes

std::string const& sr = someclass().somememberreturningaconstref();

a temporary is gone and 'sr' has become invalid as soon as it was
defined.
It's more difficult if you return a string by value.

Not sure what you want to tell me here. Why would sr be invalid? sr is
initialized with the string returned by the member function which
returns a reference to it. After that line, working with sr is just like
working with the "real" string, just that you have read-only access.
Nothing is invalid here.

I think Victor was thinking of a member returning a local string variable.

Ian
Jul 23 '05 #11

P: n/a
Matthias Kaeppler wrote:
But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.


Whether the string class is ref counted or not is irrelevant to the
client, and if it is ref counted then it's the classes responsibility
to ensure that the references are not messed up. As the client you
should read the documentation to see what guarantees the class makes
about thread safety and so forth.

Jul 23 '05 #12

P: n/a
Matthias Kaeppler wrote:
Panjandrum wrote:
Do you know if
Glib::ustring is ref-counted? If not then the implementation is
probably 'suboptimal'.
But the compiler will optimize away the second copy, right?


Maybe, maybe not. Still one needless heap allocation.
(assuming
the worst case, that the string is NOT ref-counted)?
I mean, it doesn't lokk like if I had a choice here.


It also depends on how often you copy the string. I wouldn't care for a
few dozens superfluous copies (that's the abstraction penalty of C++).
OTOH, things are probably different if you want to create and sort a
large vector<Glib::ustring>.

Jul 23 '05 #13

P: n/a

Matthias Kaeppler wrote:
Ian wrote:
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more efficient
to return by value.


Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^
As a trivial case that can get inlined away, it isn't!


Hm. What would happen if the code can't get inlined?

--
Matthias Kaeppler


Inlined or not, the getter function can be optimized with the Return
Value Optimization or the Named Return Value Optimization (RVO and
NRVO, repectively), to eliminate the temporary and the copy, and
construct the object returned, in place at the calling site.

In fact, returning by value can often be more efficient than the usual
alternative of having the client pass a reference to an object to use
to hold the result. The drawback with this approach:
void getString( std::string& outResult) const
is the inconvenient syntax that forces the client to declare a
variable, even in the case that the result is to be passed onto another
function. One workaround is for the function to return a reference to
the supplied object:
std::string& getString( std::string& outResult) const
and while an improvement in some ways, it makes the overall interface
more complicated than needed.

But once the RVO optimization is applied, neither of those two routines
is as efficient as the most natural declaration:
std::string getString() const;
Returning by value need not be inefficient, as many still think.

An class object should certainly never return a non-const reference to
one of its own data members. Doing so essentially allows clients to
change its state behind its back, and pretty much renders useless any
sort of encapsulation the class was meant to provide. One may as well
make the data member a global variable at that point.

Greg

Jul 23 '05 #14

This discussion thread is closed

Replies have been disabled for this discussion.