By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,581 Members | 1,975 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,581 IT Pros & Developers. It's quick & easy.

An interview question by MS

P: n/a
In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

Nov 6 '07 #1
Share this Question
Share on Google+
13 Replies


P: n/a
On Tuesday 06 Nov 2007 1:50 pm Lambda <st*********@gmail.comwrote in
article <11**********************@v23g2000prn.googlegroups .com>:
In an interview, I was asked:

Define a function to split a string with some token.
The Standard library strtok does this, but if you're going to reinvent
the functionality you can do better.
I said let the function return a string array.
This is one possibility.
He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.
There is no "the correct" answer. Barring obviously wrong methods, each
design may be appropriate to a certain situation. strtok() design is
one possibility, though as I noted, you can improve upon it if you are
writing from scratch.

In general, other than modifying the original string like strtok(),
there are two methods. One is for the splitting function to allocate
memory itself and return the token. In this case the caller is usually
left with the responsibility of freeing the memory, but it allows for
efficient use of memory(?). The other method is to have the caller
provide a buffer to store the token into. This allows for memory to be
reused easily. Both methods can also be combined.

There are also other less favourable options like using static and
global arrays etc., usually bad design, particularly for a library
function.
How should i handle such situation?
Did they ask you to write the function then and there during the
interview?

Usually the best general answer is to discuss the pros and cons of
various methods. It shows that you are aware of the subject which is
actually a problem not just confined to strtok().
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
You should ask the original developers whoever they were.
Why not return a string array directly?
Maybe this is the correct answer??
As I said there is no one correct answer, though there are many wrong
answers.

Nov 6 '07 #2

P: n/a
In article <11**********************@v23g2000prn.googlegroups .com>,
Lambda <st*********@gmail.comwrote:
>He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.
That's a plausible way to do it. But since you are going to have to
dynamically allocate the array, you could allocate a small size
initially and call realloc() to grow it when needed. In fact, you
could allocate 1 entry initially and call realloc() every time,
relying on the library to have a reasonable realloc() implementation.
>I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??
It has the advantage of not allocating memory, which may be useful.
It has the substantial disadvantage that it keeps hidden state - the
current position in the string - so you can't interleave calls to
strtok() with different strings. Some implementations have a version
strtok_r() with an extra argument to avoid this.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Nov 6 '07 #3

P: n/a
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.

Nov 6 '07 #4

P: n/a
On Tuesday 06 Nov 2007 3:32 pm ro***********@yahoo.com
<ro***********@yahoo.comwrote in article
<11**********************@d55g2000hsg.googlegroups .com>:
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
>In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??


I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.
MS is over C++. They would probably want a C#/CLI/.NET managed class
implemented in it's own Sandbox. :)

Nov 6 '07 #5

P: n/a
On Tue, 06 Nov 2007 08:20:51 +0000, Lambda wrote:
In an interview, I was asked:

Define a function to split a string with some token.
<snip>
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??
Note that the separators can vary between calls to
strtok on the same base string. They could for example depend
on previous tokens (for example some strings might contain
numerical fields separated by spaces, others (eg times) might
contain numerical fields separated by ':'). This would be
awkward to arrange with a splitter that processed the whole
string in a oner.

Nov 6 '07 #6

P: n/a
On Nov 6, 6:02 pm, "robertwess...@yahoo.com" <robertwess...@yahoo.com>
wrote:
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
In an interview, I was asked:
Define a function to split a string with some token.
I said let the function return a string array.
He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.
I think this is maybe not the correct answer.
How should i handle such situation?
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.
Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.

This question is very simple with Java or C++, and I believe it's
simple with C#.
I can use List in Java or vector<stringin C++.

When I asked him may i use that languages, he said that languages hide
the details
And insist on implementing with C :(

I'm interviewed by MS only once, maybe this is not a typical question.
>From my experience, MS like to ask detail questions.
I must write the code in paper in a short time, after i discuss with
him on
what the function interface will be, there is little time for me to
complete the program. :(
Nov 6 '07 #7

P: n/a
Lambda wrote:
In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Noone seems to have comment on this so far, so here goes:

As Richard Heathfield said elsethread, it is impossible to return a
genuine array, though you can return a pointer pointing to the first
element of an array (for further information, see chapter 6 of the
comp.lang.c FAQ.)

Such a pointer to the first element of an array of strings (each of
which is really a pointer to the first element of an array of chars)
would have type char **.

When you return that pointer, it has lost all information about how big
the array was. There are two main methods for communicating the size of
an array pointed to by a pointer: 1) have some 'sentinel' value within
the array which represents the last element, and 2) communicate the size
of the array through some other channel, perhaps by returning a struct
containing the pointer and a size_t size element.

Examples of method 1) include:
* Null-terminated strings. The '\0' character is the sentinel.
* The GLib function g_strdupv():
gchar **g_strdupv (gchar **str_array);
gchar is some character type; g_strdupv copies a NULL-terminated array
of strings of gchar. The NULL-pointer is the sentinel.

Examples of method 2) include:
* The qsort() standard library function.
* The printf() standard library function. Here the length of the
variable-length argument list is communicated through the format string.
(An argument list isn't an array, but this is the same idea.)

Each method has advantages and disadvantages.

--
Philip Potter pgp <atdoc.ic.ac.uk
Nov 6 '07 #8

P: n/a
[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.
I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
How to tell the *caller* the array size bears thinking about. One way is to
wrap the whole thing in a struct, and either have your function accept a
pointer to such a struct or, perhaps better, allocate the memory for it
yourself and return a pointer to it.
I would say that the best solution would be to write both functions
(with the second delegating to the first) and allow the caller to
invoke whichever function is more convenient.

--
C. Benson Manica | I appreciate all corrections, polite or otherwise.
cbmanica(at)gmail.com |
----------------------| I do not currently read any posts posted through
sdf.lonestar.org | Google groups, due to rampant unchecked spam.
Nov 6 '07 #9

P: n/a
Christopher Benson-Manica said:
[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.

I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
A two-pass algorithm lacks elegance (although of course we might do it in
two passes anyway, for any of various reasons). I would certainly be
swayed in the direction of single-pass if I had reason to think that the
inputs were likely to be colossal (or volatile).

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 6 '07 #10

P: n/a
On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<I7*********************@bt.com>:
Christopher Benson-Manica said:
>[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
>>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.

I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?

A two-pass algorithm lacks elegance (although of course we might do it
in two passes anyway, for any of various reasons). I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).
If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to produce
meaningful results.

Nov 6 '07 #11

P: n/a
santosh said:
On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<snip>
>I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).

If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to produce
meaningful results.
Well, evidently 'volatile' was a poor word choice, given the existence of
the C keyword of that name! To clarify: I'm thinking, for example, of data
coming in over stdin or a network connection, where you can't simply "play
back" the data whenever you like. (Also, it was a pretty lousy
parenthetical comment for another reason: it widens the discussion away
from mere strings, without saying so.)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 6 '07 #12

P: n/a
On Wednesday 07 Nov 2007 12:28 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<I7*********************@bt.com>:
santosh said:
>On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article

<snip>
>>I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).

If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to
produce meaningful results.

Well, evidently 'volatile' was a poor word choice, given the existence
of the C keyword of that name! To clarify: I'm thinking, for example,
of data coming in over stdin or a network connection, where you can't
simply "play back" the data whenever you like. (Also, it was a pretty
lousy parenthetical comment for another reason: it widens the
discussion away from mere strings, without saying so.)
OK. That makes sense. Yes, a two pass algorithm is not applicable unless
you manage to buffer the data.

Nov 6 '07 #13

P: n/a
In article <fg**********@chessie.cirr.com>,
Christopher Benson-Manica <at***@faeroes.freeshell.orgwrote:
>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.
>I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
Not necessarily. You only call realloc() at most once per token,
rather than once per character. And most implementations don't
allocate the exact size requested, so most calls to realloc() are
likely to be no-ops with a bit of overhead, even if you call realloc()
for each token.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Nov 6 '07 #14

This discussion thread is closed

Replies have been disabled for this discussion.