Initializing an array comprised of very long strings

I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:

scripts.h:1: warning: string length '4918' is greater than the length
'4095' ISO C99 compilers are required to support

Luckily gcc supports (much) longer strings so the warnings are just
warnings. However this makes me wonder if there isn't some clever
preprocessor trick that is standards compliant to get past this limit?
For instance, could several shorter strings be combined somehow into a
single longer string within the header file, or must the longer
string be constructed at run time to safely avoid this warning?
That is, is this string length limit for any const char * no matter
how it is put together, or is it just a limitation that applies to
statements like:

astring = "......(many characters)...";

where the limitation is on the right side of the expression?

Thanks,

David Mathog
May 4 '07 #1
David Mathog wrote:
> I'm looking at a program which stores perl scripts in an array. Each
> script is stored as a single entry in that array, and the whole set of
> them live in a single header file (escaped out the wazoo to get the
> perl code intact through the C preprocessor.) The issue is that
> many of these strings are quite long, which causes gcc to throw
> these sorts of warnings:
>
> scripts.h:1: warning: string length '4918' is greater than the length
> '4095' ISO C99 compilers are required to support
>
> Luckily gcc supports (much) longer strings so the warnings are just
> warnings. However this makes me wonder if there isn't some clever
> preprocessor trick that is standards compliant to get past this limit?
> For instance, could several shorter strings be combined somehow into a
> single longer string within the header file, or must the longer
> string be constructed at run time to safely avoid this warning?

One possibility:

char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
'e', ..., '\0' };

Another possibility:

char string_array[][100] = { "first 100 characters",
                             "second 100 characters", "..." };
char *string = (char *) string_array;

Both suggestions are hard to maintain. Constructing strings at run
time is probably a better idea.

Or you could ignore or disable the warning.

May 4 '07 #2
David Mathog <ma****@caltech.edu> writes:
> I'm looking at a program which stores perl scripts in an array. Each
> script is stored as a single entry in that array, and the whole set of
> them live in a single header file (escaped out the wazoo to get the
> perl code intact through the C preprocessor.) The issue is that
> many of these strings are quite long, which causes gcc to throw
> these sorts of warnings:
>
> scripts.h:1: warning: string length '4918' is greater than the length
> '4095' ISO C99 compilers are required to support
>
> Luckily gcc supports (much) longer strings so the warnings are just
> warnings. However this makes me wonder if there isn't some clever
> preprocessor trick that is standards compliant to get past this limit?
[...]

The limit is on the length of a string *literal*, not of a string.
Specifically (C99 5.2.4.1, Translation limits):

-- 4095 characters in a character string literal or wide string
literal (after concatenation)

-- 65535 bytes in an object (in a hosted environment only)
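
The "(after concatenation)" wording matters for the original question: adjacent string literals are merged into one literal during translation, so splitting a long literal into adjacent pieces in the header does not get around the limit. A tiny sketch of the merging (the names joined and joined_size are made up for illustration):

```c
#include <stddef.h>

/* Three adjacent literals; the compiler merges them into ONE string
   literal, and the 4095-character minimum limit applies to the merged
   result, not to each piece. */
static const char joined[] = "first piece, " "second piece, " "third piece";

/* Size of the merged literal, including the terminating '\0'. */
size_t joined_size(void)
{
    return sizeof joined;
}
```

Here the pieces are 13, 14 and 11 characters long, so the array holds 38 characters plus the terminator.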

As long as you don't hit the 65535-byte limit, you can build up the
string at runtime from a set of compile-time string literals. This is
likely to be wasteful of space, since you'll have two copies of all
the data. Some cleverness will also be required to avoid scanning the
data multiple times; for example, a simple series of strcat() calls:

char the_whole_thing[BIG_ENOUGH];
the_whole_thing[0] = '\0';
strcat(the_whole_thing, s[0]);
strcat(the_whole_thing, s[1]);
strcat(the_whole_thing, s[2]);
/* ... */

will re-scan the_whole_thing each time to determine where to start
appending.
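
One way to avoid the re-scanning, as a rough sketch only: remember how many bytes have been written so far and copy each piece with memcpy(). The function name join_pieces and its parameters are assumptions made up for this example, not anything from the program being discussed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenates n strings into dst, which has room
   for cap bytes. Returns the length of the result, or (size_t)-1 if it
   does not fit (dst is then left as an empty string when cap > 0). */
size_t join_pieces(char *dst, size_t cap, const char *const *pieces, size_t n)
{
    size_t used = 0;   /* bytes written so far; no re-scan needed */
    size_t i;

    if (cap == 0)
        return (size_t)-1;

    for (i = 0; i < n; i++) {
        size_t len = strlen(pieces[i]);   /* each piece is scanned once */

        if (len >= cap - used) {          /* keep room for the final '\0' */
            dst[0] = '\0';
            return (size_t)-1;
        }
        memcpy(dst + used, pieces[i], len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

With the array s from the strcat() example, a call would look like join_pieces(the_whole_thing, sizeof the_whole_thing, s, number_of_pieces), where number_of_pieces is whatever count the caller has.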

You might be better off just ignoring the warning, assuming you're not
concerned about the possibility of a compiler that actually imposes a
fixed limit on the size of a string literal.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 4 '07 #3
Harald van Dijk <tr*****@gmail.com> writes:
> David Mathog wrote:
>> I'm looking at a program which stores perl scripts in an array. Each
>> script is stored as a single entry in that array, and the whole set of
>> them live in a single header file (escaped out the wazoo to get the
>> perl code intact through the C preprocessor.) The issue is that
>> many of these strings are quite long, which causes gcc to throw
>> these sorts of warnings:
>>
>> scripts.h:1: warning: string length '4918' is greater than the length
>> '4095' ISO C99 compilers are required to support
> [snip]
> One possibility:
>
> char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
> 'e', ..., '\0' };
>
> Another possibility:
>
> char string_array[][100] = { "first 100 characters",
>                              "second 100 characters", "..." };
> char *string = (char *) string_array;

Interesting. That takes advantage of the fact that a string literal
in an initializer doesn't have a trailing '\0' if it's *exactly* the
declared size. A simpler example is:

char s[3] = "foo";

Of course, if you accidentally make any of the literals too short, the
compiler will silently insert a '\0' for you. I wouldn't try that
kind of thing unless I had written a program to generate the C source
code for me.
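
To make the exact-fit behaviour concrete, here is a scaled-down sketch using 4-byte rows instead of 100-byte ones. The names chunks, chunks_as_string and chunks_size are invented for the example. Note that this is valid C but not valid C++, and some compilers may warn about the unterminated rows.

```c
#include <stddef.h>

/* Each row is exactly 4 bytes. "abcd" and "efgh" fill their rows
   completely, so no '\0' is stored for them. "ij" is shorter, so the
   rest of its row is zero-filled, which terminates the whole thing. */
static char chunks[][4] = { "abcd", "efgh", "ij" };

/* View all rows as one string, using the cast shown in the quoted post. */
const char *chunks_as_string(void)
{
    return (const char *) chunks;
}

/* Total size of the array in bytes: 3 rows of 4 bytes each. */
size_t chunks_size(void)
{
    return sizeof chunks;
}
```

If "efgh" were accidentally shortened to "efg", the string seen through chunks_as_string() would silently end at "abcdefg", which is the maintenance hazard described above.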

> Both suggestions are hard to maintain. Constructing strings at run
> time is probably a better idea.
>
> Or you could ignore or disable the warning.

Or (I forgot to mention this in my earlier followup) you could read
the data from a file. You (the OP) may have a good reason not to want
to do that, or you probably wouldn't be asking how to do it directly
in your program.
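
For completeness, reading a script from a file could look roughly like the sketch below. The helper name slurp_file is made up, the 4096-byte starting size is arbitrary, and the error handling is deliberately minimal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at path into a malloc'd,
   '\0'-terminated buffer. Returns NULL on any failure. The caller is
   responsible for calling free() on the result. */
char *slurp_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    size_t used = 0, cap = 0;

    if (fp == NULL)
        return NULL;

    for (;;) {
        size_t got;

        if (cap - used < 2) {   /* need room for 1 data byte plus '\0' */
            size_t newcap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, newcap);

            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        got = fread(buf + used, 1, cap - used - 1, fp);
        used += got;
        if (got == 0)           /* end of file or read error */
            break;
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[used] = '\0';
    return buf;
}
```

The obvious cost is that the program then depends on the script files being present at run time.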

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 4 '07 #4
Keith Thompson wrote:
> Harald van Dijk <tr*****@gmail.com> writes:
>> David Mathog wrote:
>>> I'm looking at a program which stores perl scripts in an array. Each
>>> script is stored as a single entry in that array, and the whole set of
>>> them live in a single header file (escaped out the wazoo to get the
>>> perl code intact through the C preprocessor.) The issue is that
>>> many of these strings are quite long, which causes gcc to throw
>>> these sorts of warnings:
>> Another possibility:
>>
>> char string_array[][100] = { "first 100 characters",
>>                              "second 100 characters", "..." };
>> char *string = (char *) string_array;
>
> Interesting. That takes advantage of the fact that a string literal
> in an initializer doesn't have a trailing '\0' if it's *exactly* the
> declared size. A simpler example is:
>
> char s[3] = "foo";
>
> Of course, if you accidentally make any of the literals too short, the
> compiler will silently insert a '\0' for you. I wouldn't try that
> kind of thing unless I had written a program to generate the C source
> code for me.

The include file is generated by a script or program, unfortunately
one I don't yet have access to. In any case, the actual format is
currently like this (there are more than 2 scripts, but this illustrates
the point):

char *PerlScriptFile[]={"script1...","script2..."};

where the scripts are all sorts of different lengths, and of course the
whole thing is awash in backslash escape characters, lines are all 52
characters long (ending in \ EOL, so effectively 50 characters per
line), and it goes on for several thousand lines. Anyway, if I'm
following this correctly, then doing something like this:

char script1[4500]="script1...";
char script2[7654]="script2...";
char *PerlScriptFile[]={script1,script2};

would eliminate the warnings, so long as the number of characters
used exactly matches the number of characters within the double quotes.

(I think I would have had the program copy from a file or files as well,
instead of doing it this way, but I believe the program's author did
this so that his program could generate these scripts without having to
look around for the source scripts.)

Thanks,

David Mathog
May 7 '07 #5
David Mathog wrote:
> Keith Thompson wrote:
>> Harald van Dijk <tr*****@gmail.com> writes:
>>> David Mathog wrote:
>>>> I'm looking at a program which stores perl scripts in an array. Each
>>>> script is stored as a single entry in that array, and the whole set of
>>>> them live in a single header file (escaped out the wazoo to get the
>>>> perl code intact through the C preprocessor.) The issue is that
>>>> many of these strings are quite long, which causes gcc to throw
>>>> these sorts of warnings:
>>> Another possibility:
>>>
>>> char string_array[][100] = { "first 100 characters",
>>>                              "second 100 characters", "..." };
>>> char *string = (char *) string_array;
>>
>> Interesting. That takes advantage of the fact that a string literal
>> in an initializer doesn't have a trailing '\0' if it's *exactly* the
>> declared size. A simpler example is:
>>
>> char s[3] = "foo";
>>
>> Of course, if you accidentally make any of the literals too short, the
>> compiler will silently insert a '\0' for you. I wouldn't try that
>> kind of thing unless I had written a program to generate the C source
>> code for me.
>
> The include file is generated by a script or program, unfortunately
> one I don't yet have access to. In any case, the actual format is
> currently like this (there are more than 2 scripts, but this illustrates
> the point):
>
> char *PerlScriptFile[]={"script1...","script2..."};
>
> where the scripts are all sorts of different lengths, and of course the
> whole thing is awash in backslash escape characters, lines are all 52
> characters long (ending in \ EOL, so effectively 50 characters per
> line), and it goes on for several thousand lines. Anyway, if I'm
> following this correctly, then doing something like this:
>
> char script1[4500]="script1...";
> char script2[7654]="script2...";
> char *PerlScriptFile[]={script1,script2};
>
> would eliminate the warnings, so long as the number of characters
> used exactly matches the number of characters within the double quotes.

Such a solution is error-prone and not easy to maintain, IMHO. How
about folding the lines in each array, something like this?

$ cat a.c
#include <stdio.h>

const char *script1[] = {
    "line 1",
    "line 2",
    "line 3",
    "line 4",
};

const char *script2[] = {
    "line 1",
    "line 2",
    "line 3",
    "line 4",
};

struct {
    size_t nlines;
    const char **code;
} scripts[] = {
    { sizeof script1 / sizeof *script1, script1 },
    { sizeof script2 / sizeof *script2, script2 },
};

int main(void)
{
    size_t i, j, nscripts = sizeof scripts / sizeof *scripts;

    for(i = 0; i < nscripts; i++) {
        for(j = 0; j < scripts[i].nlines; j++) {
            printf("%s\n", scripts[i].code[j]);
        }
    }

    return 0;
}
$ gcc -ansi -pedantic -W -Wall -o a a.c

$ ./a
line 1
line 2
line 3
line 4
line 1
line 2
line 3
line 4

The line lengths can now vary and you can have as many lines per
script(array) as you like. You may need to write a tiny script that
reformats the original code, but that's doable. ;-)

Bjørn
[snip]
May 7 '07 #6
