Bytes | Software Development & Data Engineering Community
Using sscanf() to parse a buffer string containing multiple fixed-length sub-strings

5
I used the %Width[^]s format specifier, in which "Width" specifies the maximum number of characters to be read for the value of the associated variable. Either it does not work properly or I am using it incorrectly.


char inbuf[128] = "\0";            //input string just read from infile
char obs_sta[32] = "\0";           //name of observation station
char sky_wx[16] = "\0";            //sky and weather conditions
char tmp[8] = "\0";                //dry bulb temperature (°F)
char dp[8] = "\0";                 //dew point temperature (°F)
char rh[8] = "\0";                 //relative humidity (%)
char wind[16] = "\0";              //wind speed/direction/gust_speed
char pres[16] = "\0";              //barometric pressure (in Hg)
char *rise_fall = '\0';            //pressure rising (R) or falling (F)
                                   //indicator
char remarks[16] = "\0";

//read a record from text file
fgets (inbuf, sizeof(inbuf), fp1);

//Five examples of fixed-length fields record layout:
//CITY           SKY/WX    TMP DP  RH WIND      PRES   REMARKS
//ELLINGTON FLD  PTSUNNY   90  75  62 SW10G18   29.94F HAZE    HX 100
//*ZAPATA        SUNNY     99  61  28 SE13G20   29.74F HX 100
//FORT STOCKTON  SUNNY    102 -17   1 SW24G30   29.78F HX 93
//GUYMON         NOT AVBL
//SANTA FE       SUNNY     81   7   6 VRB6G23   30.02F SMOKE   HX 75
//field lengths are as in the sscanf format statement below
//The CITY field is 15 characters long, the SKY/WX field is 8 characters long;
//and the REMARKS field contains those remaining characters to the newline.
//Each of these fields can contain a string with embedded spaces.
//inbuf correctly contains the entire record

//parse the record into its components (all treated as non-numeric values)
sscanf(inbuf, "%15[^]s%8[^]s%4s%4s%4s%10s%6s%c%[^]s",
       obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);

//print a parsed line
printf("full line:   [%15s][%8s][%4s][%4s][%4s][%10s][%6s][%c][%s]\n",
        obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);

//Five examples of parsed output:
//[ELLINGTON FLD  ][        ][    ][    ][    ][          ][      ][ ][]
//[*ZAPATA        ][        ][    ][    ][    ][          ][      ][ ][]
//[FORT STOCKTON  ][        ][    ][    ][    ][          ][      ][ ][]
//[GUYMON         ][        ][    ][    ][    ][          ][      ][ ][]
//[SANTA FE       ][        ][    ][    ][    ][          ][      ][ ][]
May 30 '13 #1
Oralloy
985 Expert 512MB
IgorXX,

Try changing your format a little bit, I think it's trying to find a really screwball set of strings, which is not what you want....
sscanf(inbuf, "%15[^]%8[^]%4s%4s%4s%10s%6s%c%[^]",
       obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
It looked like you had extraneous "s" characters after the "scanset" elements. These "s" characters would have to be explicitly matched, and they aren't in your input.

Luck!
Oralloy
May 31 '13 #2
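
A note on the scanset syntax discussed above: in standard C, a "]" that comes immediately after "[^" is treated as a member of the set, so "%15[^]" does not end at that bracket and the scanset runs on to the next "]" in the format string. A scanset that stops only at the end of the line is normally written "%15[^\n]". The following is a minimal sketch under that assumption, not code that either poster ran; the helper name parse_city_sky is made up for illustration, and only the first two fields are read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): reads the CITY and SKY/WX
   fields of one record. obs_sta must hold at least 16 bytes and sky_wx
   at least 9. Returns the number of fields sscanf() assigned. */
int parse_city_sky(const char *line, char *obs_sta, char *sky_wx)
{
    assert(line != NULL && obs_sta != NULL && sky_wx != NULL);

    /* %15[^\n] stores at most 15 characters that are not a newline,
       embedded spaces included, and then appends a NUL terminator.
       No trailing 's' is needed after the closing bracket. */
    return sscanf(line, "%15[^\n]%8[^\n]", obs_sta, sky_wx);
}
```

Because a scanset does not skip leading whitespace, the padding spaces of each column stay in the output, which is what keeps the fixed-width columns aligned from one field to the next.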
IgorXX
5
Oralloy,

Thanks for your help. I must really be missing something. Have worked the format string down to

"%15[^]%8[^]%8[^]%4[^]%4s%6s%16s%8c%[^]"

with the following results:

[ELLINGTON FLD ][PTSUNNY ][90][75][62][ SW10G18][][ ][]
[*ZAPATA ][SUNNY ][99][61][28][ SE13G20][][ ][]
[FORT STOCKTON ][SUNNY ][102][-17][1][ SW24G30][][ ][]
[GUYMON ][NOT AVBL][81][7][6][ VRB6G23][][ ][]
[SANTA FE ][SUNNY ][81][7][6][ VRB6G23][][ ][]

but can advance no further. The 3rd, 6th, 7th, and 8th "Width" specifiers are totally screwball. The 3rd one was rigged to get "NOT" and "AVBL" to be treated as one string; the rest followed.
P.S. These data were clipboarded into Notepad from the NOAA NWS hourly weather roundup for a particular state. E.g., http://www.nws.noaa.gov/view/prodsBy...rodtype=hourly for Wyoming.

IgorXX
May 31 '13 #3
Oralloy
985 Expert 512MB
Hey IgorXX,

All of your "%s" formats are still variable-width, blank terminated. I expect that this is having catastrophic consequences on your parsing.

Let's try one of the following format options, instead:
  1. //INPUT EXAMPLE for WIDTH CHECKING
  2. //....1..../....2..../....3..../....4..../....5..../....6..../....7..../
  3. //....15........|....8..|.4.|.4.|.4.|...10....|...6.||.................
  4. //ELLINGTON FLD  PTSUNNY   90  75  62 SW10G18   29.94F HAZE    HX 100
  5.  
  6. //Using your format methodology
  7. // NUL terminates all output strings.
  8. // Last string is of variable length to end of input string.
  9. // Added error check
  10. int count = sscanf(inbuf, "%15[^]%8[^]%4[^]%4[^]%4[^]%10[^]%6[^]%c%[^]", 
  11.                    obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
  12. if (9 != count)
  13.   printf("ERROR: failed to parse input correctly, count = %d\n", count);
  14.  
  15. //Alternately you might try this format
  16. // Does NOT insert NUL byte at end of each output string, except for remarks.
  17. // Last string is of variable length to end of input string.
  18. // Added error check
  19. count = sscanf(inbuf, "%15c%8c%4c%4c%4c%10c%6c%1c%[^]",
  20.        obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
  21. if (9 != count)
  22.   printf("ERROR: failed to parse input correctly, count = %d\n", count);
Observe that I also included error checking. The return value of sscanf can be very illuminating when there are problems.

Good Luck!
Oralloy
Jun 1 '13 #4
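
To illustrate the two points made in the post above (the "%Nc" conversions write no NUL byte, and the sscanf() return value tells how many fields were assigned), here is a small sketch. It assumes the column layout given in the thread, with TMP, DP and RH as three 4-character columns that follow the 15-character CITY and 8-character SKY/WX fields. The helper name parse_tmp_dp_rh and the explicit length check are additions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): reads the TMP, DP and RH
   columns, which start at 0-based offset 23 of a record.
   Each destination must hold at least 5 bytes.
   Returns the sscanf() count; 3 means every field was assigned. */
int parse_tmp_dp_rh(const char *line, char *tmp, char *dp, char *rh)
{
    assert(line != NULL);
    if (strlen(line) < 35)      /* too short to hold all three columns */
        return 0;

    int count = sscanf(line + 23, "%4c%4c%4c", tmp, dp, rh);
    if (count == 3) {
        /* %c never writes a terminator, so add one after each field. */
        tmp[4] = '\0';
        dp[4] = '\0';
        rh[4] = '\0';
    }
    return count;
}
```

The length check is there because implementations differ in how they count a "%4c" that runs out of input early, so a short record such as the GUYMON line is rejected before sscanf() is called.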
IgorXX
5
Oralloy

Thanks again. At your suggestion, I used error checking. The output lines below are from a debug display using printf("[%s][%s][%s][%s][%s][%s][%s][%c][%s]\n", ...). The brackets clearly show what is assigned to each variable.

Your suggested format string "%15c%8c%4c%4c%4c%10c%6c%1c%[^]" works better but gives:
[ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][ ][]
[*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][ ][]
[FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][ ][]
[SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][ ][]
Note: I have eliminated the GUYMON observation station line because it is a distraction to the immediate problem. Still unable to pick up the single character (field #8) and the remarks field (#9).
Tried the format string "%15c%8c%4c%4c%4c%10c%6c%2c%[^]" and oddly got:
[ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][ ][HAZE]
[*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][ ][HX]
[FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][ ][HX]
[SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][ ][SMOKE]
The field #8 Width specifier matters not for that field. No Width specifier for field #9 changes things. It is clear to me that the library implementation of the format string in sscanf() does not act as [I think] intended. Other than my initial ignorance of the proper use of "%[^]", I doubt I would have had problems with BSD or AT&T implementations. Maybe I should tilt at the fread() windmill.... :)

Alas I rue having access to a UNIX machine where nawk or sed, egrep and cut would make short work of the entire file. On the modern variation on the original proverb below, attributed to the British playwright Ben Jonson in his 1598 play, "Every Man in His Humour", first performed by William Shakespeare: "Curiosity killed the cat, satisfaction brought it back." Well, curiosity was framed; ignorance killed the cat. And the same is true for programmers and software engineers.

IgorXX
Jun 3 '13 #5
Oralloy
985 Expert 512MB
IgorXX,

Use not thyne ancient tales of cats on me, for I shall hear them and laugh.

This all harkens back to the days of FORTRAN card-based input, with fixed length fields in the input records. Were I truly evil, I would suggest that we do the I/O in FORTRAN, but that would be a bit over the top, don't you think?

I/O is the pits in any project. I just hate it when something is way overboard like this is.

From the looks of it, the last group of formats that I suggested may be off by a character or two. You know your input field widths, so you can double check that.

Did you try the format that I suggested in line 10 of my previous post?

  6. //Using your format methodology
  7. // NUL terminates all output strings.
  8. // Last string is of variable length to end of input string.
  9. // Added error check
  10. int count = sscanf(inbuf, "%15[^]%8[^]%4[^]%4[^]%4[^]%10[^]%6[^]%c%[^]",
  11.                    obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
  12. if (9 != count)
  13.   printf("ERROR: failed to parse input correctly, count = %d\n", count);
Jun 3 '13 #6
IgorXX
5
Oralloy,

I will have to wait until maybe Wednesday before working on this any more as Microsoft has been trying to clear up my machine (Trojans and a trashed registry file among other things). Their server was up and down today so they couldn't finish. Now must "wait til the morrow". The mention of FORTRAN brings back memories to an old, toothless programmer. It was the first language I learned, even before proper English. Still have the old MS PowerStation Development System for Windows and MS-DOS ver 1.0 from 1993 (supporting Fortran77). Haven't tried to start it up, so don't know if it would even run on a more-recent version of Windows.

Not sure if I should pursue the sscanf() any further as I feel I have exhausted reasonable format specifiers (string, character, and width), including many variations on your suggestion. Sometimes in the blissful glow of ignorance, a meat axe is the only recourse. Stir-fry kitten anyone?

IgorXX
Jun 3 '13 #7
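
For readers who would rather set sscanf() aside at this point, fixed-width records can also be cut up by offset. This is only a sketch of that alternative, not something proposed in the thread: copy_field is a hypothetical helper, and the offsets assume the widths used in the format strings above (15, 8, 4, 4, 4, 10, 6, 1, then the remainder).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copies the field of `width`
   characters that starts at 0-based `offset` in `line` into `dst` and
   NUL-terminates it. The line ending is ignored, and a field that runs
   past the end of the line is truncated. dst must hold at least
   width + 1 bytes. Returns the number of characters copied. */
size_t copy_field(const char *line, size_t offset, size_t width, char *dst)
{
    assert(line != NULL && dst != NULL);

    size_t len = strcspn(line, "\r\n");   /* length without the line ending */
    size_t n = 0;

    if (offset < len) {
        n = len - offset;
        if (n > width)
            n = width;
        memcpy(dst, line + offset, n);
    }
    dst[n] = '\0';
    return n;
}
```

Under the assumed layout the fields start at offsets 0 (CITY), 15 (SKY/WX), 23 (TMP), 27 (DP), 31 (RH), 35 (WIND), 45 (PRES), 51 (rise/fall) and 52 (REMARKS). A short record such as the GUYMON line simply yields empty strings for the columns it lacks.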
IgorXX
5
Oralloy

Notes on my last reply:
1. Much earlier in our conversation, I had incorrectly defined rise_fall as
char *rise_fall = '\0';
It should have been
char rise_fall = '\0';
2. My last sentence beginning with "Alas I rue having" should have read "Alas I rue not having".

Having been abruptly awoken at 2 AM, with drool running down the side of my mouth, by a Bronco "Chicken Gristle Grinder" infomercial on TV, I remembered that a char is like an unsigned int (and all numeric values): when used in the scanf() family of functions, it requires its address, i.e., &rise_fall, not the value stored in rise_fall. The name of a char array, on the other hand, already converts to the address of the array's first element, so the "address of" operator '&' is not used. Now, if that isn't the cow's "Mooo".

C is a beguiling, rigid and unforgiving sort, always allowing one to have their way with her. And if one strays too far from her, there will be consequences. Consequences, indeed.

char rise_fall = '\0'; //pressure rising (R) or falling (F) indicator

count = sscanf(inbuf, "%15c%8c%4c%4c%4c%10c%6c%c%[^]",
obs_sta, sky_wx, tmp, dp, rh, wind, pres, &rise_fall, remarks);
[ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][F][]
ERROR: failed to parse input correctly, count = 8
[FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][F][]
ERROR: failed to parse input correctly, count = 8
[*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][F][]
ERROR: failed to parse input correctly, count = 8
[SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][F][]
ERROR: failed to parse input correctly, count = 8

Now how to capture that last field....

IgorXX
Jun 4 '13 #8
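
One possible way to pick up the remaining REMARKS field, offered as a sketch and not as a fix confirmed anywhere in the thread: the trailing "%[^]" has no closing bracket of its own, because a "]" right after "[^" counts as a member of the set. Writing the last conversion as "%15[^\n]" gives it a well-formed scanset and also limits it to the 16-byte remarks buffer. The struct wx_record and the function parse_record below are hypothetical names; the buffer sizes are the ones declared in the first post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; buffer sizes follow the declarations
   in the first post of the thread. */
struct wx_record {
    char obs_sta[32];
    char sky_wx[16];
    char tmp[8];
    char dp[8];
    char rh[8];
    char wind[16];
    char pres[16];
    char rise_fall;
    char remarks[16];
};

/* Parses one complete fixed-width record. Returns the sscanf() count;
   9 means every field, remarks included, was assigned. */
int parse_record(const char *line, struct wx_record *rec)
{
    assert(line != NULL && rec != NULL);

    /* %Nc writes no terminator, so zero the whole record first. */
    memset(rec, 0, sizeof *rec);

    return sscanf(line, "%15c%8c%4c%4c%4c%10c%6c%c%15[^\n]",
                  rec->obs_sta, rec->sky_wx, rec->tmp, rec->dp, rec->rh,
                  rec->wind, rec->pres, &rec->rise_fall, rec->remarks);
}
```

With the ELLINGTON FLD record from the first post this should return 9 and leave " HAZE    HX 100" in remarks. A record that ends right after the rise/fall indicator would return 8, since a scanset conversion fails when it matches no characters, so the caller still needs the count check.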
