473,378 Members | 1,441 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

An interview question by MS

In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

Nov 6 '07 #1
13 1865
On Tuesday 06 Nov 2007 1:50 pm Lambda <st*********@gmail.comwrote in
article <11**********************@v23g2000prn.googlegroups .com>:
In an interview, I was asked:

Define a function to split a string with some token.
The Standard library strtok does this, but if you're going to reinvent
the functionality you can do better.
I said let the function return a string array.
This is one possibility.
He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.
There is no "the correct" answer. Barring obviously wrong methods, each
design may be appropriate to a certain situation. strtok() design is
one possibility, though as I noted, you can improve upon it if you are
writing from scratch.

In general, other than modifying the original string like strtok(),
there are two methods. One is for the splitting function to allocate
memory itself and return the token. In this case the caller is usually
left with the responsibility of freeing the memory, but it allows for
efficient use of memory(?). The other method is to have the caller
provide a buffer to store the token into. This allows for memory to be
reused easily. Both methods can also be combined.

There are also other less favourable options like using static and
global arrays etc., usually bad design, particularly for a library
function.
How should i handle such situation?
Did they ask you to write the function then and there during the
interview?

Usually the best general answer is to discuss the pros and cons of
various methods. It shows that you are aware of the subject which is
actually a problem not just confined to strtok().
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
You should ask the original developers whoever they were.
Why not return a string array directly?
Maybe this is the correct answer??
As I said there is no one correct answer, though there are many wrong
answers.

Nov 6 '07 #2
In article <11**********************@v23g2000prn.googlegroups .com>,
Lambda <st*********@gmail.comwrote:
>He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.
That's a plausible way to do it. But since you are going to have to
dynamically allocate the array, you could allocate a small size
initially and call realloc() to grow it when needed. In fact, you
could allocate 1 entry initially and call realloc() every time,
relying on the library to have a reasonable realloc() implementation.
>I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??
It has the advantage of not allocating memory, which may be useful.
It has the substantial disadvantage that it keeps hidden state - the
current position in the string - so you can't interleave calls to
strtok() with different strings. Some implementations have a version
strtok_r() with an extra argument to avoid this.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Nov 6 '07 #3
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.

Nov 6 '07 #4
On Tuesday 06 Nov 2007 3:32 pm ro***********@yahoo.com
<ro***********@yahoo.comwrote in article
<11**********************@d55g2000hsg.googlegroups .com>:
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
>In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.

I think this is maybe not the correct answer.

How should i handle such situation?

I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??


I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.
MS is over C++. They would probably want a C#/CLI/.NET managed class
implemented in it's own Sandbox. :)

Nov 6 '07 #5
On Tue, 06 Nov 2007 08:20:51 +0000, Lambda wrote:
In an interview, I was asked:

Define a function to split a string with some token.
<snip>
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.

Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??
Note that the separators can vary between calls to
strtok on the same base string. They could for example depend
on previous tokens (for example some strings might contain
numerical fields separated by spaces, others (eg times) might
contain numerical fields separated by ':'). This would be
awkward to arrange with a splitter that processed the whole
string in a oner.

Nov 6 '07 #6
On Nov 6, 6:02 pm, "robertwess...@yahoo.com" <robertwess...@yahoo.com>
wrote:
On Nov 6, 2:20 am, Lambda <stephenh...@gmail.comwrote:
In an interview, I was asked:
Define a function to split a string with some token.
I said let the function return a string array.
He asked me how do i know the array size.
Yes, I have to traverse the string to find how large the array is for
the first time.
And traverse the string when actually doing the split for the second
time.
I think this is maybe not the correct answer.
How should i handle such situation?
I find there is a 'strtok(s, ct)' function in the standard library.
The first cll in a sequence has a non-NULL s.
Each subsequent call, indicated by a NULL value of s.
strtok returns NULL when no further token is found.
Why indeed this function defined this way?
Why not return a string array directly?
Maybe this is the correct answer??

I find it highly improbably that an MS interview question was intended
to elicit a C response. Surely the desired response was to create and
return a C++ container object in some fashion.

This question is very simple with Java or C++, and I believe it's
simple with C#.
I can use List in Java or vector<stringin C++.

When I asked him may i use that languages, he said that languages hide
the details
And insist on implementing with C :(

I'm interviewed by MS only once, maybe this is not a typical question.
>From my experience, MS like to ask detail questions.
I must write the code in paper in a short time, after i discuss with
him on
what the function interface will be, there is little time for me to
complete the program. :(
Nov 6 '07 #7
Lambda wrote:
In an interview, I was asked:

Define a function to split a string with some token.

I said let the function return a string array.

He asked me how do i know the array size.
Noone seems to have comment on this so far, so here goes:

As Richard Heathfield said elsethread, it is impossible to return a
genuine array, though you can return a pointer pointing to the first
element of an array (for further information, see chapter 6 of the
comp.lang.c FAQ.)

Such a pointer to the first element of an array of strings (each of
which is really a pointer to the first element of an array of chars)
would have type char **.

When you return that pointer, it has lost all information about how big
the array was. There are two main methods for communicating the size of
an array pointed to by a pointer: 1) have some 'sentinel' value within
the array which represents the last element, and 2) communicate the size
of the array through some other channel, perhaps by returning a struct
containing the pointer and a size_t size element.

Examples of method 1) include:
* Null-terminated strings. The '\0' character is the sentinel.
* The GLib function g_strdupv():
gchar **g_strdupv (gchar **str_array);
gchar is some character type; g_strdupv copies a NULL-terminated array
of strings of gchar. The NULL-pointer is the sentinel.

Examples of method 2) include:
* The qsort() standard library function.
* The printf() standard library function. Here the length of the
variable-length argument list is communicated through the format string.
(An argument list isn't an array, but this is the same idea.)

Each method has advantages and disadvantages.

--
Philip Potter pgp <atdoc.ic.ac.uk
Nov 6 '07 #8
[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.
I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
How to tell the *caller* the array size bears thinking about. One way is to
wrap the whole thing in a struct, and either have your function accept a
pointer to such a struct or, perhaps better, allocate the memory for it
yourself and return a pointer to it.
I would say that the best solution would be to write both functions
(with the second delegating to the first) and allow the caller to
invoke whichever function is more convenient.

--
C. Benson Manica | I appreciate all corrections, polite or otherwise.
cbmanica(at)gmail.com |
----------------------| I do not currently read any posts posted through
sdf.lonestar.org | Google groups, due to rampant unchecked spam.
Nov 6 '07 #9
Christopher Benson-Manica said:
[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.

I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
A two-pass algorithm lacks elegance (although of course we might do it in
two passes anyway, for any of various reasons). I would certainly be
swayed in the direction of single-pass if I had reason to think that the
inputs were likely to be colossal (or volatile).

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 6 '07 #10
On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<I7*********************@bt.com>:
Christopher Benson-Manica said:
>[comp.lang.c] Richard Heathfield <rj*@see.sig.invalidwrote:
>>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.

I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?

A two-pass algorithm lacks elegance (although of course we might do it
in two passes anyway, for any of various reasons). I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).
If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to produce
meaningful results.

Nov 6 '07 #11
santosh said:
On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<snip>
>I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).

If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to produce
meaningful results.
Well, evidently 'volatile' was a poor word choice, given the existence of
the C keyword of that name! To clarify: I'm thinking, for example, of data
coming in over stdin or a network connection, where you can't simply "play
back" the data whenever you like. (Also, it was a pretty lousy
parenthetical comment for another reason: it widens the discussion away
from mere strings, without saying so.)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 6 '07 #12
On Wednesday 07 Nov 2007 12:28 am Richard Heathfield
<rj*@see.sig.invalidwrote in article
<I7*********************@bt.com>:
santosh said:
>On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
<rj*@see.sig.invalidwrote in article

<snip>
>>I would certainly
be swayed in the direction of single-pass if I had reason to think
that the inputs were likely to be colossal (or volatile).

If the input is volatile, (ignoring for the moment the likelyhood of
this), I think even a one-pass algorithm won't be guaranteed to
produce meaningful results.

Well, evidently 'volatile' was a poor word choice, given the existence
of the C keyword of that name! To clarify: I'm thinking, for example,
of data coming in over stdin or a network connection, where you can't
simply "play back" the data whenever you like. (Also, it was a pretty
lousy parenthetical comment for another reason: it widens the
discussion away from mere strings, without saying so.)
OK. That makes sense. Yes, a two pass algorithm is not applicable unless
you manage to buffer the data.

Nov 6 '07 #13
In article <fg**********@chessie.cirr.com>,
Christopher Benson-Manica <at***@faeroes.freeshell.orgwrote:
>No, you don't need to do this in two passes. You can build the array
dynamically, extending it as and when required.
>I would think, though, that the work of traversing the string twice
would be less than the probable extra calls to realloc() - would you
agree?
Not necessarily. You only call realloc() at most once per token,
rather than once per character. And most implementations don't
allocate the exact size requested, so most calls to realloc() are
likely to be no-ops with a bit of overhead, even if you call realloc()
for each token.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Nov 6 '07 #14

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: softwareengineer2006 | last post by:
All Interview Questions And Answers 10000 Interview Questions And Answers(C,C++,JAVA,DOTNET,Oracle,SAP) I have listed over 10000 interview questions asked in interview/placement test papers for...
0
by: Jobs | last post by:
Download the JAVA , .NET and SQL Server interview sheet and rate yourself. This will help you judge yourself are you really worth of attending interviews. If you own a company best way to judge if...
2
by: Jobs | last post by:
Download the JAVA , .NET and SQL Server interview with answers Download the JAVA , .NET and SQL Server interview sheet and rate yourself. This will help you judge yourself are you really worth of...
0
by: connectrajesh | last post by:
INTERVIEWINFO.NET http://www.interviewinfo.net FREE WEB SITE AND SERVICE FOR JOB SEEKERS /FRESH GRADUATES NO ADVERTISEMENT
2
by: freepdfforjobs | last post by:
Full eBook with 4000 C#, JAVA,.NET and SQL Server Interview questions http://www.questpond.com/SampleInterviewQuestionBook.zip Download the JAVA , .NET and SQL Server interview sheet and rate...
0
by: freesoftwarepdfs | last post by:
Ultimate list of Interview question website.....Do not miss it http://www.questpond.com http://msdotnetsupport.blogspot.com/2007/01/net-interview-questions-by-dutt-part-2.html...
0
by: ramu | last post by:
C# Interview Questions and Answers8 http://allinterviewsbooks.blogspot.com/2008/07/c-interview-questions-and-answers8.html C# Interview Questions and Answers7...
0
by: reema | last post by:
EJB Interview Questions http://interviewdoor.com/technical/EJB-Interview-Questions.htm CSS Interview Questions http://interviewdoor.com/technical/CSS-Interview-Questions.htm C Interview Questions...
0
by: reema | last post by:
EJB Interview Questions http://interviewdoor.com/technical/EJB-Interview-Questions.htm CSS Interview Questions http://interviewdoor.com/technical/CSS-Interview-Questions.htm C Interview Questions...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.