ne******@gmail.com writes:

> I have a file stored in memory using mmap() and I'd like to parse to
> read line by line.
> Also, there are several threads that read this buffer so I think
> strtok(p, "\n") wouldn't be a good choice. I'd like to hear from you
> guys what would be a good implementation in this case.

strtok() is rarely a good choice for anything.
strtok() has at least these problems:
* It merges adjacent delimiters. If you use a comma as
your delimiter, then "a,,b,c" is three tokens, not
four. This is often the wrong thing to do. In fact,
it is only the right thing to do, in my experience,
when the delimiter set is limited to white space.
* The identity of the delimiter is lost, because it is
changed to a null terminator.
* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.
* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost. This isn't a
problem for small programs but it is easy to lose track
of such things in hierarchies of nested functions in
large programs. In other words, strtok() breaks
encapsulation.
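
To make the first and third points concrete, here is a small sketch
that counts the pieces of "a,,b,c" both ways. The function names
count_strtok_tokens() and count_fields() are made up for this
illustration; only strtok() and strcspn() come from the standard
library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the tokens that strtok() finds in S.
   strtok() overwrites delimiters in S as it goes. */
size_t
count_strtok_tokens (char *s, const char *delimiters)
{
  size_t n = 0;
  char *token;

  assert (s != NULL && delimiters != NULL);
  for (token = strtok (s, delimiters); token != NULL;
       token = strtok (NULL, delimiters))
    n++;
  return n;
}

/* Counts the fields in S, treating every delimiter as a field
   boundary, so that adjacent delimiters enclose an empty field.
   Does not modify S. */
size_t
count_fields (const char *s, const char *delimiters)
{
  size_t n = 1;

  assert (s != NULL && delimiters != NULL);
  for (s += strcspn (s, delimiters); *s != '\0';
       s += strcspn (s, delimiters))
    {
      n++;
      s++;                      /* Step over the delimiter. */
    }
  return n;
}
```

Given `char s[] = "a,,b,c";`, count_fields (s, ",") should report
four fields, while count_strtok_tokens (s, ",") reports three and
leaves only "a" visible at the start of s.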
Instead, use some substitute, e.g. strtok_r(). Here is an
implementation of strtok_r(). It may be SUSv3 compliant, but I
do not know for sure. If you use it, you should probably rename
it, because names that begin with `str' followed by a lowercase
letter are reserved by the C standard for future library use:
#include <assert.h>
#include <string.h>

/* Breaks a string into tokens separated by DELIMITERS.  The
   first time this function is called, S should be the string to
   tokenize, and in subsequent calls it must be a null pointer.
   SAVE_PTR is the address of a `char *' variable used to keep
   track of the tokenizer's position.  The return value each time
   is the next token in the string, or a null pointer if no
   tokens remain.

   This function treats multiple adjacent delimiters as a single
   delimiter.  The returned tokens will never be length 0.
   DELIMITERS may change from one call to the next within a
   single string.

   strtok_r() modifies the string S, changing delimiters to null
   bytes.  Thus, S must be a modifiable string.  String literals,
   in particular, are *not* modifiable in C, even though for
   backward compatibility they are not `const'.

   Example usage:

   char s[] = " String to tokenize. ";
   char *token, *save_ptr;

   for (token = strtok_r (s, " ", &save_ptr); token != NULL;
        token = strtok_r (NULL, " ", &save_ptr))
     printf ("'%s'\n", token);

   outputs:

     'String'
     'to'
     'tokenize.'
*/
char *
strtok_r (char *s, const char *delimiters, char **save_ptr)
{
  char *token;

  assert (delimiters != NULL);
  assert (save_ptr != NULL);

  /* If S is nonnull, start from it.
     If S is null, start from saved position. */
  if (s == NULL)
    s = *save_ptr;
  assert (s != NULL);

  /* Skip any DELIMITERS at our current position. */
  while (strchr (delimiters, *s) != NULL)
    {
      /* strchr() will always return nonnull if we're searching
         for a null byte, because every string contains a null
         byte (at the end). */
      if (*s == '\0')
        {
          *save_ptr = s;
          return NULL;
        }

      s++;
    }

  /* Skip any non-DELIMITERS up to the end of the string. */
  token = s;
  while (strchr (delimiters, *s) == NULL)
    s++;
  if (*s != '\0')
    {
      *s = '\0';
      *save_ptr = s + 1;
    }
  else
    *save_ptr = s;
  return token;
}
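
For the buffer in the original question, keep in mind that strtok_r()
still writes null bytes into the string it scans, which does not suit
a mapping that several threads read at once (or one that is mapped
read-only), and that the contents of an mmap()'d file need not end in
a null byte. One possible alternative is to search for new-lines with
memchr() and hand back a pointer and a length instead of a
null-terminated token. The following is only a sketch under those
assumptions: next_line() and its parameters are hypothetical names,
and BUF and LEN stand for whatever address and size the mapping has.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Finds the next line in the LEN-byte buffer at BUF, starting at
   offset *POS.  If a line is found, stores its start in *LINE and
   its length, excluding any new-line, in *LINE_LEN, advances *POS
   past the line, and returns 1.  Returns 0 when no bytes remain.

   The buffer is only read, never written, and all of the state
   lives in the caller's *POS, so several threads may scan the
   same buffer as long as each one has its own position. */
int
next_line (const char *buf, size_t len, size_t *pos,
           const char **line, size_t *line_len)
{
  const char *start, *nl;

  assert (buf != NULL && pos != NULL);
  assert (line != NULL && line_len != NULL);
  assert (*pos <= len);

  if (*pos == len)
    return 0;

  start = buf + *pos;
  nl = memchr (start, '\n', len - *pos);
  if (nl != NULL)
    {
      *line_len = (size_t) (nl - start);
      *pos += *line_len + 1;
    }
  else
    {
      /* The final line has no terminating new-line. */
      *line_len = len - *pos;
      *pos = len;
    }
  *line = start;
  return 1;
}
```

A caller would start with a position of 0 and loop while next_line()
returns 1. Because the lines are not null-terminated, print them with
something like printf ("%.*s\n", (int) line_len, line) rather than
plain "%s". Empty lines come back with a length of 0 instead of being
merged away.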
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. \
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}