I think I'm gonna cry. (or newby problems with simple string function)

Hello!

I've programmed in c a bit, but nothing very complicated. I've just
come back to it after a long sojourn in the lands of functional
programming and am completely stumped on a very simple function I'm
trying to write. I'm writing a function that takes a string, and
returns an array of strings which are the result of splitting the input
on whitespace and parentheses (but the parentheses should also be
included in the array as strings).

an example:

explode("foo bar baz") -> ["foo", "bar", "baz"]
explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]

Now I thought I had it. But what I've got now causes a bus error. So
I'm going to post all of the code (sorry) and maybe wiser minds than me
can work it out. Please remember I'm a noob so I would prefer things as
unobfuscated as possible, and I know how bad the style of my code is
also, I'm trying to make it work first.

Thanks in advance, here's the code:

#include <stdio.h>

char* extract(char* str, int len) {
char* out = (char*)malloc(len + 1);
out = memcpy(out, str, len);
out[len] = '\0';
return out;
}

char istax(int ch) {
int out = (ch=='(') | (ch==')');
return out;
}

char** explode(char* str) {
int nt = counttokens(str);
if(!nt) {
return 0;
}

char** ret = (char**)malloc(nt);

int i = 0;
int len = strlen(str);
char ch;
int start = 0;
int mode = 0;
int t = 0;
for (i = 0; i < len; i++) {
ch = str[i];
if (mode == 0) {
if(!isspace(ch)) {
mode = 1;
start = 0;
}
} else {
if(istax(ch)) {
ret[t] = extract(str + start, (i + 1) - start);
t++;
} else if(isspace(ch)) {
mode = 0;
ret[t] = extract(str + start, (i + 1) - start);
t++;
}
}
}
return ret;
}

int counttokens(char* str) {
char ch;
char intoken = 0;
int tokens = 0;
while(ch = str[0]) {
if(ch == '(') {
tokens++;
intoken = 0;
} else if(ch == ')') {
tokens++;
intoken = 0;
} else if(ch != ' ') {
if(!intoken) {
intoken = 1;
tokens++;
}
} else {
intoken = 0;
}
str++;
}
return tokens;
}
thanks again,

robbie

Nov 15 '05 #1
ro************@gmail.com wrote:
char* extract(char* str, int len) {
char* out = (char*)malloc(len + 1);
out = memcpy(out, str, len);
out[len] = '\0';
return out;
}

char** explode(char* str) {
int nt = counttokens(str);
if(!nt) {
return 0;
}

char** ret = (char**)malloc(nt);

int i = 0;
int len = strlen(str);
char ch;
int start = 0;
int mode = 0;
int t = 0;
for (i = 0; i < len; i++) {
ch = str[i];
if (mode == 0) {
if(!isspace(ch)) {
mode = 1;
start = 0;
}
} else {
if(istax(ch)) {
ret[t] = extract(str + start, (i + 1) - start);
t++;
} else if(isspace(ch)) {
mode = 0;
ret[t] = extract(str + start, (i + 1) - start);
t++;
}
}
}


you aren't reassigning start nor using t in a meaningful way. pick
one. :)

Nov 15 '05 #2

tedu wrote:
ro************@gmail.com wrote:
int i = 0;
int len = strlen(str);
char ch;
int start = 0;
int mode = 0;
int t = 0;
for (i = 0; i < len; i++) {
ch = str[i];
if (mode == 0) {
if(!isspace(ch)) {
mode = 1;
start = 0;
}
} else {
if(istax(ch)) {
ret[t] = extract(str + start, (i + 1) - start);
t++;
} else if(isspace(ch)) {
mode = 0;
ret[t] = extract(str + start, (i + 1) - start);
t++;
}
}
}


you aren't reassigning start nor using t in a meaningful way.


er, sorry, t is ok. i still think you want to be doing something more
with start.
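
A minimal sketch of the bookkeeping being pointed at here, assuming tokens are separated by whitespace only (parentheses are left out to keep it short). The name next_word and its parameters are made up for illustration and are not from the posted code. The key line records start = i when a token begins, and the length is computed as i - start so the delimiter itself is not copied.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: finds the next whitespace-delimited token at or
   after position *pos. On success it stores the token's offset and
   length, advances *pos past the token, and returns 1. It returns 0
   when no token is left. */
int next_word(const char *str, size_t *pos, size_t *start, size_t *len)
{
    size_t i = *pos;

    /* skip whitespace in front of the token */
    while (str[i] != '\0' && isspace((unsigned char)str[i]))
        i++;

    if (str[i] == '\0') {
        *pos = i;
        return 0;
    }

    /* remember where this token begins (the posted loop stores 0 here) */
    *start = i;

    /* scan to the end of the token */
    while (str[i] != '\0' && !isspace((unsigned char)str[i]))
        i++;

    *len = i - *start;   /* excludes the delimiter at index i */
    *pos = i;
    return 1;
}
```

Calling it in a loop and passing str + start and len to something like extract() would yield one string per token.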

Nov 15 '05 #3
Dan
When is start anything other than 0?

Nov 15 '05 #4
"Dan" <gi*****@aol.com> writes:
When is start anything other than 0?


I have no clue what you're talking about. You need to provide some
context when you post a followup; not everyone has easy access to the
parent article.

A search for "google" "followup" in this very newsgroup currently gets
1100 hits (and now it's going to be 1101).

Dan, maybe you can help us out. We've been telling Google users for
months how and why to post properly using the broken groups.google.com
interface, but it's just not working. Do you have any advice on how
we can get the word out so this stops happening?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 15 '05 #5
On 29 Sep 2005 15:45:11 -0700, ro************@gmail.com wrote in
comp.lang.c:
Hello!

I've programmed in c a bit, but nothing very complicated. I've just
come back to it after a long sojourn in the lands of functional
programming and am completely stumped on a very simple function I'm
trying to write. I'm writing a function that takes a string, and
returns an array of strings which are the result of splitting the input
on whitespace and parentheses (but the parentheses should also be
included in the array as strings).

an example:

explode("foo bar baz") -> ["foo", "bar", "baz"]
explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]
You haven't shown us any code that calls this function. Do you
actually call it with string literals, and does it attempt to modify
them? Modifying string literals is undefined behavior.
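
If a function ever does need to write into its input, one common way around the string-literal problem is to work on a copy. The sketch below is only an illustration; copy_string is a made-up name, and the posted explode() does not appear to write to its argument.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, writable copy of s,
   or NULL if the allocation fails. The caller frees the result. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

The copy can be modified freely, whereas writing through a pointer to the literal itself is undefined behavior.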
Now I thought I had it. But what I've got now causes a bus error. So
I'm going to post all of the code (sorry) and maybe wiser minds than me
can work it out. Please remember I'm a noob so I would prefer things as
unobfuscated as possible, and I know how bad the style of my code is
also, I'm trying to make it work first.

Thanks in advance, here's the code:

#include <stdio.h>
You haven't included <stdlib.h>, so you don't have a prototype for
malloc() in scope. Calling malloc() without a prototype produces
undefined behavior.

You haven't included <string.h>, so you don't have a prototype for
memcpy() in scope. Calling memcpy() without a prototype produces
undefined behavior.

You haven't included <ctype.h>, so you don't have a prototype for
isspace() in scope.
char* extract(char* str, int len) {
Since the sizeof operator yields a value of type size_t, and malloc()
accepts a single argument of type size_t, why are you using int? It
may not cause a problem in this case, but ultimately you are asking
for a signed/unsigned clash.
char* out = (char*)malloc(len + 1);
No, casting the value returned by malloc() is wrong. You probably did
this to shut up a compiler diagnostic, caused by your failure to
include <stdlib.h> and have a prototype in scope.
out = memcpy(out, str, len);
out[len] = '\0';
return out;
}

char istax(int ch) {
int out = (ch=='(') | (ch==')');
return out;
}

char** explode(char* str) {
int nt = counttokens(str);
if(!nt) {
return 0;
}

char** ret = (char**)malloc(nt);


Whatever you are using, it is not a conforming C compiler, not
conforming to any version of the C language standard. Or you are not
using it that way.

Versions of the C standard prior to 1999 would not allow the
declaration above, because it comes after executable statements in the
current block. And versions of the C standard from and after 1999
will not allow a call to a function without at least a declaration in
scope, and you have none for malloc() or memcpy().

Examine your compiler's documentation to determine how to invoke it as
a conforming C compiler, or ask your question in a compiler-specific
group.

On the other hand, if you are compiling this code with a C++ compiler,
ask in comp.lang.c++.

[snip]

Fix the problems that I've pointed out and then, if you are compiling
this code with a conforming C compiler and still have problems, post
again.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 15 '05 #6
ro************@gmail.com wrote:
...I'm writing a function that takes a string, and returns an array
of strings which are the result of splitting the input on whitespace
and parentheses (but the parentheses should also be included in the
array as strings).

an example:

explode("foo bar baz") -> ["foo", "bar", "baz"]
explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]
<snip>
#include <stdio.h>
Avoid using unprototyped functions...

#include <ctype.h>
#include <stdlib.h>
#include <string.h>
char* extract(char* str, int len) {
String lengths and object sizes in general are better measured with
size_t than int.
char* out = (char*)malloc(len + 1);
You should check the return value of malloc.
out = memcpy(out, str, len);
out[len] = '\0';
return out;
}

char istax(int ch) {
int out = (ch=='(') | (ch==')');
Look up the difference between | and ||.
return out;
}
This isn't really worth a function.
char** explode(char* str) {
int nt = counttokens(str);
This design is somewhat poor. You have a separate function to
count the number of tokens, yet you use duplicate code to
extract the tokens. If the specifications change, then you
need to maintain two separate pieces of code synchronously.
if(!nt) {
return 0;
}

char** ret = (char**)malloc(nt);
C90 won't let you mix declarations and statements.

int i = 0;
int len = strlen(str);
char ch;
int start = 0;
int mode = 0;
int t = 0;

<snip>

You seem to have more indexing variables than you can handle.

Here's one way that I might do this. The 'work' function does
the counting and the allocation. I just scan through the string
in question (s), and use another pointer t to mark the beginning
of an 'identifier' token. Since t can be null, it serves as a
'mode' flag.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *span_dup(const char *s, const char *t)
{
size_t n = t ? t - s : strlen(s);
char *m = malloc(n + 1);
if (m) { memcpy(m, s, n); m[n] = 0; }
return m;
}

size_t work(char **a, const char *s)
{
const char *t = 0;
size_t n = 0;

for (; *s; s++)
{
if (*s == ' ' || *s == '(' || *s == ')')
{
/* add any prior scanned identifier */
if (t) { n++; if (a) *a++ = span_dup(t, s); t = 0; }

/* add a ( or ) token */
if (*s != ' ') { n++; if (a) *a++ = span_dup(s, s+1); }
}
else if (!t)
t = s; /* start new identifier token */
}

/* add any last (outstanding) identifier token */
if (t) { n++; if (a) *a++ = span_dup(t, s); }

return n;
}

char **explode(const char *s)
{
size_t n = work(0, s);
char **m = malloc((n + 1) * sizeof *m);
if (m) { work(m, s); m[n] = 0; }
return m;
}

int main(void)
{
char **s, **m = explode("Hello (World)");
if (m == 0) return 0;
for (s = m; *s; s++) printf("<%s>\n", *s);
return 0;
}
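
The example main() above never releases what explode() allocated. A possible cleanup routine, sketched under the assumption that the array is NULL-terminated and every element came from malloc (free_tokens is a hypothetical name, not part of the code above):

```c
#include <stdlib.h>

/* Hypothetical helper: frees a NULL-terminated array of malloc'd
   strings, then the array itself. Passing NULL is allowed. */
void free_tokens(char **tokens)
{
    char **p;

    if (tokens == NULL)
        return;

    for (p = tokens; *p != NULL; p++)
        free(*p);

    free(tokens);
}
```

One caveat: if span_dup() fails partway through, the array holds a null pointer in the middle, and this loop (like the printing loop in main) stops there, so later strings would leak. Handling that case would need the token count as well.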

--
Peter

Nov 15 '05 #7
Thanks for all the responses.
Assigning 0 to start in the loop was very stupid and due to late
night brain fever; it is supposed to be start = i in the loop. Yes, I
wanted ||, not |, although doesn't | do the same thing in this case?
Anyway, I've fixed these two, and I'm still getting the same error.
Yes I know the duplication is ugly. I wanted to get a naive
implementation working and then factor it out, but the 'naive' version
is turning out harder than I thought.
The reason istax is a separate function is because the definition of
token delimiters is likely to change, so I wanted it to be in one place
(and yes, I know I don't use it in counttokens. counttokens came first
and I forgot to factor it out)
As to the many non-standard c things I've done, thanks for pointing
them out. I've been spoilt by a very forgiving compiler (gcc,
presumably doing c++ on the side) which just isn't shouting at me about
it. I'm going to fix all these mistakes, and repost the code.
Thanks peter for the working code. I'll probably end up using that as
it's much neater than what I've been doing, but I'd like to get mine
working so I understand what's wrong with it.
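
On the | versus || question: each == comparison yields 0 or 1, so for istax the two operators give the same value. The difference is in evaluation: | always evaluates both operands, while || skips the right operand once the left one is nonzero. A small sketch that makes this visible (probe and the two counting functions are made-up names for illustration):

```c
/* Counts how many times probe() has been evaluated. */
static int calls = 0;

/* Hypothetical helper with a visible side effect; always returns 1. */
static int probe(void)
{
    calls++;
    return 1;
}

/* Bitwise OR: both operands are always evaluated, so this returns 2. */
int calls_with_bitwise_or(void)
{
    int r;

    calls = 0;
    r = probe() | probe();
    (void)r;
    return calls;
}

/* Logical OR: the right operand is skipped because the left one is
   already nonzero, so this returns 1. */
int calls_with_logical_or(void)
{
    int r;

    calls = 0;
    r = probe() || probe();
    (void)r;
    return calls;
}

/* The predicate from the thread written with ||; it returns 1 for a
   parenthesis and 0 otherwise, the same values the | version gives. */
int is_paren(int ch)
{
    return ch == '(' || ch == ')';
}
```

For operands other than 0 and 1 the values differ too: 2 | 4 is 6, while 2 || 4 is 1.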

Nov 15 '05 #8

<ro************@gmail.com> wrote in message
news:11**********************@g47g2000cwa.googlegroups.com...
I've been spoilt by a very forgiving compiler (gcc,
presumably doing c++ on the side) which just isn't shouting at me about
it. I'm going to fix all these mistakes, and repost the code.
Thanks peter for the working code. I'll probably end up using that as
it's much neater than what I've been doing, but I'd like to get mine
working so I understand what's wrong with it.


You can tell gcc which dialect of C to accept when compiling:

-ansi

-std=c99

Here's a link:
http://gcc.gnu.org/onlinedocs/gcc-4....ialect-Options
Nov 15 '05 #9
ro************@gmail.com wrote:
char** explode(char* str) {
int nt = counttokens(str);
if(!nt) {
return 0;
}

char** ret = (char**)malloc(nt);


As well as the problems everyone else has pointed out, this one
is quite likely to cause a crash. I think you want to allocate
'nt' number of pointers. But actually you allocate 'nt' bytes.

You could have avoided this problem by using 'the CLC form'
of malloc:

char **ret = malloc( nt * sizeof *ret );
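
Spelled out as a complete function, the idiom might look like the sketch below. alloc_token_array is a hypothetical name, and the extra slot for a terminating null pointer is an assumption of mine, not something the posted code does.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for nt string pointers plus one
   terminating null pointer. Every slot starts out as NULL. Returns
   NULL if the allocation fails. */
char **alloc_token_array(size_t nt)
{
    /* sizeof *ret is the size of one char *, so this requests
       (nt + 1) pointers rather than (nt + 1) bytes */
    char **ret = malloc((nt + 1) * sizeof *ret);
    size_t i;

    if (ret == NULL)
        return NULL;

    for (i = 0; i <= nt; i++)
        ret[i] = NULL;

    return ret;
}
```

Because the size is taken from *ret, the expression stays correct even if the element type of ret changes later.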

Nov 15 '05 #10
