Trouble using string functions

Hi all,

I tried posting this through a free news server, but it
still has not appeared in Google, so if it turns up again
I apologize.

I hope someone can help me with this, or at least help me
find some information that will help me. If I were not at my
wit's end already, I wouldn't even ask. I'm used to doing all
of my programming in Windows, but now I have a task to
accomplish in UNIX/Linux using good old gcc.

Basically, what I have to do is parse a JavaScript file that
will ALWAYS have the following format:

************************************************
<!--
document.writeln('<TABLE BORDER="0" CELLPADDING="2">');
document.writeln(' <TR>');
....(many more lines all starting with "document.writeln('" )
document.writeln(' </TR>');
document.writeln('</TABLE>');
// -->
************************************************
It is auto-generated by another web site, and my job is to
get it to plain HTML format so that it can be linked to in an
email. What we want is for the end result to be:

************************************************

<TABLE BORDER="0" CELLPADDING="2">
<TR>
....(many more lines no longer with the "document.writeln('" )
</TR>
</TABLE>

************************************************

I have a good bit of it done, but I am getting stuck. Here is
the source I have so far in main:
{
    int i, nc;
    nc = 0;
    for(int j = 1; j < 5; j++)
        i = getchar();
    while (i != EOF)
    {
        nc = nc + 1;
        i = getchar();
        if(i == '\n')
        {
            printf("%c", i);
            for(int j = 1; j < 19; j++)
                i = getchar();
        }
        else
        {
            printf("%c", i);
        }
    }
}

I am using redirection, i.e.: "prog<infile>outfile" to do this.

This code gets me close to what I need, but I still have the
remaining " '); " on the end of each string, which will not
do for obvious reasons. I tried reading into a string using
strcat(), which might do if I could just get it to work right,
but for some reason my program will not run, no matter what I try
with strcat(). It will compile without any errors though.
I'm just not accustomed to using a language that does not
have an inherent "string" type. Everything is done with the
"char" type, and with the way some functions only want
pointers, etc, etc, well it's got me a bit confused and I've
spent WAY too much time on this already.
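
One possible cause, offered only as a guess since the failing strcat() code is not shown: strcat() appends to a string that must already exist, so the destination has to be a real array holding a terminated string (even an empty one) with room for the result. A minimal sketch, where safe_append() is a hypothetical helper name and not part of any library:

```c
#include <string.h>

/* Hypothetical helper: append src to the string already stored in dst,
 * where dst has room for cap bytes in total (terminator included).
 * Returns 1 on success, 0 if the result would not fit. */
int safe_append(char *dst, size_t cap, const char *src)
{
    /* dst must already be a terminated string, or strlen() is undefined. */
    if (strlen(dst) + strlen(src) + 1 > cap)
        return 0;
    strcat(dst, src);
    return 1;
}
```

A call would look like `char buf[64] = ""; safe_append(buf, sizeof buf, "<TR>");`. Declaring only `char *buf;` and handing that to strcat() compiles cleanly but has undefined behavior, which would match a program that builds yet will not run.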

Can anyone help a fellow out?
Nov 14 '05 #1

"Gary Morris" <gm*******@carolina.rr.com> wrote in message
news:1b**************************@posting.google.com...
Basically, what I have to do is parse a JavaScript file that
will ALWAYS
Never say "always" :-)

Instead of using your approach of depending upon an exact
number and position of characters on each line, the below
extracts all characters between the first occurring pair
of delimiter characters (') on each line. I.e. lines
with less than two delimiters will be skipped, and characters
(if any) past the second delimiter will be skipped.
#include <stdio.h>
#include <string.h>

#define LINE_LEN 128 /* adjust to your needs */

int main(void)
{
    char line[LINE_LEN] = {0};
    char delim = '\'';
    char *p1 = 0;
    char *p2 = 0;

    while(fgets(line, sizeof line, stdin))
        if((p1 = strchr(line, delim)) != NULL)     /* opening quote */
            if((p2 = strchr(++p1, delim)) != NULL) /* closing quote */
            {
                *p2 = 0;    /* cut the line at the closing quote */
                puts(p1);   /* print the text between the quotes */
            }

    return 0;
}
Input:

<!--
document.writeln('<TABLE BORDER="0" CELLPADDING="2">');
document.writeln(' <TR>');
....(many more lines all starting with "document.writeln('" )
document.writeln(' </TR>');
document.writeln('</TABLE>');
// -->
Output:

<TABLE BORDER="0" CELLPADDING="2">
<TR>
</TR>
</TABLE>
This code has not been thoroughly tested. I'll let you do that. :-)

HTH,
-Mike
Nov 14 '05 #2
mk******@mkwahler.net says...
#define LINE_LEN 128 /* adjust to your needs */


This cannot be "adjusted" to the OP's needs. The OP did not say that the
automatically generated output had any line length limit, and given that it
*is* autogenerated, I rather doubt that it will obey any such trivially short
line length.
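
For comparison, the line-length limit can also be lifted in standard C alone. What follows is a minimal sketch under that assumption; read_line() is a hypothetical helper, not part of either poster's code, and it grows its buffer with realloc() whenever fgets() fills it without reaching a newline:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end of input or on allocation failure. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: drop newline */
            return buf;
        }
        if (len + 1 < cap)
            return buf;                 /* last line had no newline */

        /* Buffer is full and the line may continue: double it. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0)
        return buf;                     /* input ended on a full buffer */
    free(buf);
    return NULL;
}
```

A loop such as `while ((line = read_line(stdin)) != NULL) { /* ... */ free(line); }` could then stand in for the fixed-size fgets() buffer. On POSIX systems, getline() provides a similar service.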

Using the better string library (http://bstring.sf.net/), this problem, and the
brittleness of your solution (only searching for ') is trivially solved:

-------------------------------------------------------------------------------
#include <stdio.h>
#include "bstrlib.h"

int parseLines (bstring src) {
    struct tagbstring token0 = bsStatic ("document.writeln('");
    struct tagbstring token1 = bsStatic ("');");
    struct tagbstring t, u;
    int i, j;

    /* Reference to where 1st token might match in src string */
    blk2tbstr (t, src->data, token0.slen);
    for (i=0; i < src->slen - token0.slen; i++) {

        /* Does the 1st token match exactly? */
        if (biseq (&t, &token0)) {

            /* Reference to where 2nd token might match */
            blk2tbstr (u, t.data, token1.slen);
            for (j = i; j < src->slen - token1.slen; j++) {

                /* Does the 2nd token match exactly? */
                if (biseq (&u, &token1)) {

                    /* Construct middle string */
                    bstring b = blk2bstr (t.data + token0.slen,
                                          j - i - token0.slen);

                    /* Output the '\0' terminated buffer */
                    puts ((char *) b->data);
                    bdestroy (b);
                    break;
                }

                /* Shift 2nd token scan downward */
                u.data++;
            }
        }

        /* Shift 1st token scan downward */
        t.data++;
    }
    return 0;
}

int main (int argc, char * argv[]) {
    FILE * fp;

    if (argc < 2) {
        printf ("%s [inputfile]\n", argv[0]);
        return -__LINE__;
    }

    if (NULL != (fp = fopen (argv[1], "rb"))) {
        /* Just read the whole file into a bstring */
        bstring src = bread ((bNread) fread, fp);
        int ret = parseLines (src);
        fclose (fp);
        bdestroy (src);
        return ret;
    }
    return -__LINE__;
}
-------------------------------------------------------------------------------

Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #3
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<uq*******************@newsread1.news.pas.earthlink.net>...
Basically, what I have to do is parse a JavaScript file that
will ALWAYS
Never say "always" :-)


I only say that because another computer generates the script. Why
they would ever change it I can't imagine!
The code has been thoroughly tested with several of these scripts
and it works perfectly every time so far! Thanks a bunch Mike. I'll
bet this took you all of 5 minutes at the most to cook up, whereas
I had spent quite a few hours trying all manner of different things.
Now I wish I had actually USED that C++ compiler that I got in the
mid 1990's.
Nov 14 '05 #4
Oops, I spoke too soon! I just tried running this on the latest
version, and wouldn't you know that one of the lines had a ' in
it. Being JavaScript, it is escaped with the backslash like:

another\'s

Given this, it should be a fairly easy matter to check for the
escape character and ignore the next character. Fairly simple
for someone else, that is, but I am certainly going to try now
that I've got something that actually (almost) works like I
need it to.
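
A rough sketch of that idea, for anyone following along: scan for the closing quote by hand, skip whatever character follows a backslash, and then drop the backslash from \' and \\ pairs. All three helper names below are hypothetical, and other JavaScript escapes such as \n are deliberately left untouched:

```c
#include <string.h>

/* Hypothetical helper: s points just past the opening quote. Returns a
 * pointer to the first unescaped quote character, or NULL if none. */
char *find_closing_quote(char *s, char quote)
{
    for (; *s != '\0'; s++) {
        if (*s == '\\' && s[1] != '\0')
            s++;                /* skip the escaped character */
        else if (*s == quote)
            return s;
    }
    return NULL;
}

/* Hypothetical helper: turn \' and \\ into ' and \ in place. */
void unescape_quotes(char *s, char quote)
{
    char *out = s;

    for (; *s != '\0'; s++) {
        if (*s == '\\' && (s[1] == quote || s[1] == '\\'))
            s++;                /* drop the backslash, keep next char */
        *out++ = *s;
    }
    *out = '\0';
}

/* Hypothetical helper: return the text between the first quote and the
 * next unescaped quote, unescaped in place, or NULL if the line has no
 * such pair. Modifies line. */
char *extract_quoted(char *line, char quote)
{
    char *start = strchr(line, quote);
    char *end;

    if (start == NULL)
        return NULL;
    start++;
    end = find_closing_quote(start, quote);
    if (end == NULL)
        return NULL;
    *end = '\0';
    unescape_quotes(start, quote);
    return start;
}
```

In a line-reading loop like the one posted earlier, the two strchr() calls would collapse into `if ((p1 = extract_quoted(line, delim)) != NULL) puts(p1);`.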
Nov 14 '05 #5
Hint for a 'quick-n-dirty' fix:

The function 'strchr()' has a counterpart which starts
searching from the end of a string instead of from the
beginning: 'strrchr()'.
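
A minimal sketch of that hint, assuming a hypothetical helper named between_outer_quotes(): the first quote is found with strchr() and the last one with strrchr(), so any quotes in between stay in the text.

```c
#include <string.h>

/* Hypothetical helper: return the text between the first and the last
 * quote character on the line, or NULL if there are fewer than two.
 * Modifies line by writing a terminator over the last quote. */
char *between_outer_quotes(char *line, char quote)
{
    char *first = strchr(line, quote);   /* search from the start */
    char *last = strrchr(line, quote);   /* search from the end */

    if (first == NULL || first == last)
        return NULL;
    *last = '\0';
    return first + 1;
}
```

The escaping backslash is still present in the result (another\'s), so it would need to be removed separately, and the approach assumes no quote character appears after the closing ');.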

-Mike
Nov 14 '05 #6
gm*******@carolina.rr.com (Gary Morris) wrote:
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<uq*******************@newsread1.news.pas.earthlink.net>...
"Gary Morris" <gm*******@carolina.rr.com> wrote in message
news:1b**************************@posting.google.com...
Basically, what I have to do is parse a JavaScript file that
will ALWAYS


Never say "always" :-)


I only say that because another computer generates the script. Why
they would ever change it I can't imagine!
have the following format:


Hohum. Beware the snark. I've said the same thing before. There was this
other computer, which was supposed to generate data, and that was sent
to me to process. So I wrote a program to process it. Should be a simple
job - after all, it was all computer-generated data, and what reason
could they possibly have for changing the format?

No prizes for guessing what happened two months later.

Richard
Nov 14 '05 #7
