473,396 Members | 1,877 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

tough interview question

Hi there,
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a

minimal size
of 2
f.e
ABCFABHYIFAB
sunstrings are:
"AB"
"FAB"

Mar 3 '06 #1
9 17798
denis wrote:
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a
minimal size of 2


I would agree that this is a though interview question! Did they
really wanted a correct and (provably) efficient algorithm? ... or
did they want to see how you get about it? I think the following
would do the trick:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>

int main()
{
typedef std::string::const_iterator iterator;
std::string s("ABCFABHYIFAB");
std::set<std::string> found;

if (2 < s.size())
for (iterator oit = s.begin() + 1, oe = s.end(); oit != oe; ++oit)
for (iterator iit = s.begin(); iit != oit; ++iit)
{
iterator tmp = std::mismatch(oit, oe, iit).second;;
if (tmp - iit > 1)
found.insert(std::string(iit, tmp));
}

std::copy(found.begin(), found.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}

.... but I'm not sure that it is the most efficient version.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 3 '06 #2
dc
what abt quickly sorting the strings & then go abt....

Mar 3 '06 #3
dc wrote:
what abt quickly sorting the strings & then go abt....


My algorithm is at worst an O(n^3) algorithm. With your algorithm
you would create O(n^2) strings, with a size of up to O(n), i.e.
comparing each string is not a constant but an O(n) operation. I
think this effectively means that sorting the strings takes
O(n^3 ln n) operations. A more thorough analysis might, however,
reduce these numbers somewhat because the majority number of
strings is smaller than O(n), I wouldn't expect this to fit into
an interview, however, unless it is for a rather scientific
position. It is definitely important to know about asymptotic
performance and choose proper algorithms but it is also important
to get the job done in a reasonable amount of time which also
adds to the selection criteria.

I think that both algorithms are in the same order with respect
to implementation complexity, maybe yours is slightly simpler.
Thus, I would expect both to be a reasonable answer, although it
is hard to tell what the interviewer actually wanted to see.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 3 '06 #4

denis wrote:
Hi there,
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a

minimal size
of 2
f.e
ABCFABHYIFAB
sunstrings are:
"AB"
"FAB"


They might have been intrested if you are aware of string search
algorithms. http://en.wikipedia.org/wiki/String_search

Mar 3 '06 #5
Here my version of algorithm. Experimentally I found that number of
searches it performs is:
[Length of string]+[Number of unique substring] - [Some Number]. Where
[Some Number] = 3 or 4 (may be some other). Don't know why. Any ideas?

void foo(std::string& s)
{
std::set<string> sSet;
int findCount = 0;
for (int firstElement = 0; firstElement < s.size(); firstElement++)
{
for (int stringLen = 2; firstElement+stringLen < s.size();
stringLen++)
{
std::string t = s.substr(firstElement,stringLen);
if (sSet.find(t)!=sSet.end())
continue;

findCount++;
if (s.find(t,firstElement+stringLen) != std::string::npos)
sSet.insert(t);
else
break;
}
}
std::copy(sSet.begin(), sSet.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
cout << "Search count " << findCount << endl;
cout << "Calc search count either " << s.length() + sSet.size() - 3
<< endl;
cout << " or " << s.length() + sSet.size() - 4
<< endl;
cout << "==============" << endl;
}

int main( void )
{
std::string s = "ABCFABHYIFAB";
foo(s);

return 0;
}

Mar 3 '06 #6
Hello,

denis wrote:
Hi there,
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with
a

minimal size
of 2
f.e
ABCFABHYIFAB
sunstrings are:
"AB"
"FAB"


I wonder, what purpose this interview question should have served. If
you need people with expertise in string algorithms, then it should be
clear from the CV, whether this can be expected or not.

I wonder what the connection of the question is with C++. It is a purely
algorithmic question. Since I had to deal with a similar problem a few
years ago, and implemented it in C++ I can at least give you a few
ideas.

One basic idea is to construct a suffix tree (a generalization of tries,
similar to digital trees), which is possible in time O(N), given
constant alphabet size. Look for authors McCreight or Ukkonen which
have written papers on it. They have been cited a lot, since a lot of
work has been done on this data structure in the last 20 years.

The suffix tree has at most 2 nodes per suffix, one leaf node per
suffix, and there is an inner node per maximal prefix which is a prefix
of at least two suffixes. This is a very rough explanation of what can
be explained better on a few pages with pictures. The proof that this
tree can be constructed in time O(N) is non-trivial, and needs some
extra pointers in the tree. But it should be easier to understand that
the suffix tree has size O(N) and can be traversed in time O(N). The
inner nodes which are are at least two characters down from the root
correspond to substrings of minimal size 2 with repeated occurrences.
So there is a solution in time O(N) for the overall problem of finding
the repeated substrings, i.e. algorithmically optimal. The constant
involved is not overly high, the suffix tree might pay out for N in the
thousands already, and especially if you can use the same suffix tree
many times for different questions.

I don't know of any implementation generic enough that it could be just
plugged into the C++ standard library. The system I implemented the
suffix tree for used a string class of its own with rather different
requirements than std::string. A suffix tree with the O(N) size can
only be implemented as a kind of add-on index of a string or set of
strings. So this does not fit at all anything in the C++ standard
library. It is neither a container of it's own, nor an algorithm, it's
a chimaera, and it is fat in memory consumption, up to about 10
pointers per character, if you add all usual bells and whistles. There
are more recent refinements of suffix trees, e.g. suffix arrays, suffix
cactus and suffix whatever, which try to reduce the memory consumption.

Another method that might help getting near to O(N) is
hashing/fingerprinting as proposed by Karp and Rabin. Due to hasing
O(N) worst case will not be maintainable, but, as usual with hashing,
the average will be nice. The constant and the memory consumption will
be a lot less compared to suffix trees.

Can any curriculum be expected to treat suffix trees or Rabin-Karp
matching? I doubt that there is any message other than the trivial one
from the fact that somebody has not heard about it. An interviewer will
always be able to make you either reinvent simple and inefficient
solutions or give up, since there are many simply stated problems with
non-trivial solutions.

Bernd Strieder

Mar 3 '06 #7
On 2006-03-03, Dietmar Kuehl <di***********@yahoo.com> wrote:
denis wrote:
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a
minimal size of 2
I would agree that this is a though interview question! Did they
really wanted a correct and (provably) efficient algorithm? ... or
did they want to see how you get about it? I think the following
would do the trick:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>

int main()
{
typedef std::string::const_iterator iterator;
std::string s("ABCFABHYIFAB");
std::set<std::string> found;

if (2 < s.size())
for (iterator oit = s.begin() + 1, oe = s.end(); oit != oe; ++oit)


You can save (N-1) fruitless calls to mismatch by bumping one off the
end. Since "B" is not a sequence, it needn't be checked against
"A", "B", "C", etc...:

for (iterator oit = s.begin() +1, oe = s.end() -1; oit < oe; ++oit)
for (iterator iit = s.begin(); iit != oit; ++iit)
{
iterator tmp = std::mismatch(oit, oe, iit).second;;
if (tmp - iit > 1)
found.insert(std::string(iit, tmp));
}

std::copy(found.begin(), found.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}

... but I'm not sure that it is the most efficient version.


This may be a problem:

"ebebe" => "be" "ebe"

It's hard to tell from the problem descrition.

--
Neil Cerutti
Mar 3 '06 #8

Dietmar Kuehl schrieb:
dc wrote:
what abt quickly sorting the strings & then go abt....
My algorithm is at worst an O(n^3) algorithm. With your algorithm
you would create O(n^2) strings,


you dont need n^2 strings for the same (buggy, eg "fabfab") results

#include <iostream>
#include <vector>
#include <algorithm>
#include <string.h>

class substring {
public:
char *s;
substring(char *ss) : s(ss) { }
bool operator <(const substring &second) const {
return strcmp(s, second.s) < 0;
}
int match(const substring &second) {
int i = 0;
while (s[i] != 0 && s[i] == second.s[i]) i++;
return i;
}
};

int main()
{
char *s = "ABCFABHYIFAB";

std::vector<substring> strings;

for (int i = 0; i < strlen(s); i++)
strings.push_back(substring(s+i));

std::sort(strings.begin(), strings.end());

int match, lmatch = 0;
std::vector<substring>::iterator it;
for (it = strings.begin(); it != strings.end()-1; it++)
{
if (((match = (*it).match(*(it+1))) > lmatch) && (match 1)) std::cout << std::string((*it).s, match) <<
std::endl;
lmatch = match;
}

return 0;
}
with a size of up to O(n), i.e.
comparing each string is not a constant but an O(n) operation. I
think this effectively means that sorting the strings takes
O(n^3 ln n) operations. A more thorough analysis might, however,
reduce these numbers somewhat because the majority number of
strings is smaller than O(n), I wouldn't expect this to fit into
an interview, however, unless it is for a rather scientific
position. It is definitely important to know about asymptotic
performance and choose proper algorithms but it is also important
to get the job done in a reasonable amount of time which also
adds to the selection criteria.

I think that both algorithms are in the same order with respect
to implementation complexity, maybe yours is slightly simpler.
Thus, I would expect both to be a reasonable answer, although it
is hard to tell what the interviewer actually wanted to see.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence


Mar 3 '06 #9
denis wrote:
Hi there,
I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a

eg ABCFABHYIFAB
sunstrings are:
"AB"
"FAB"


You can create an array 'p' of char*s into the given string and
initialise it so that:
p[0] = "ABCFABHYIFAB"
p[1] = "BCFABHYIFAB"
p[2] = "CFABHYIFAB"
p[3] = "FABHYIFAB"
:
p[9] = "FAB"
p[10] = "AB"
p[11] = "B"
Now sort this array. You can then scan the sorted array looking for
adjacent pairs whose first 2 chars are the same, eg (p[0],p[10]) and
(p[3],p[9]) above. This trick appeared in Dr Dobbs some time back.

Mar 3 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

54
by: Spammay Blockay | last post by:
I've been tasked with doing technical interviews at my company, and I have generally ask a range of OO, Java, and "good programming technique" concepts. However, one of my favorite exercises I...
10
by: Gopal Krish | last post by:
I was asked this question in an interview. How can you display the contents of an ASP page (from another web server) in a aspx page, ie, first half of the screen will display contents from asp...
3
by: denis_browne | last post by:
Hi there, I got a tough interview questions lately, and I would like to hear your opinion: An array of N chars is given Write an efficient algorithm to find all the repeating substring with a ...
0
by: Jobs | last post by:
Download the JAVA , .NET and SQL Server interview sheet and rate yourself. This will help you judge yourself are you really worth of attending interviews. If you own a company best way to judge if...
2
by: Jobs | last post by:
Download the JAVA , .NET and SQL Server interview with answers Download the JAVA , .NET and SQL Server interview sheet and rate yourself. This will help you judge yourself are you really worth of...
0
by: connectrajesh | last post by:
INTERVIEWINFO.NET http://www.interviewinfo.net FREE WEB SITE AND SERVICE FOR JOB SEEKERS /FRESH GRADUATES NO ADVERTISEMENT
2
by: freepdfforjobs | last post by:
Full eBook with 4000 C#, JAVA,.NET and SQL Server Interview questions http://www.questpond.com/SampleInterviewQuestionBook.zip Download the JAVA , .NET and SQL Server interview sheet and rate...
0
by: freesoftwarepdfs | last post by:
Ultimate list of Interview question website.....Do not miss it http://www.questpond.com http://msdotnetsupport.blogspot.com/2007/01/net-interview-questions-by-dutt-part-2.html...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.