Bytes | Software Development & Data Engineering Community

parsing from file

Hello,

I have a file that I have opened for reading and this file contains lines
with several different types of constraint information.
For example, here are a few lines:
length(0) = 10 Duration of task 0 is 10.

needs(16,1) Operation 16 uses resource 1.

before(49,9) Operation 49 must be before operation 9.

release(17) = 0 Operation 17 can start at or after time 0.

due(0) = 149 Operation 0 must be done no later than time 149.

The part before the parentheses is the constraint_type (a string); then I
have either one or two parameters (both integers) inside the parentheses, and
then possibly (for due, release, and length) an integer value.

I am wondering what the best way to parse this input would be, given that I
don't know what type of constraint I will encounter when I read in the line.
Thanks!

~Darius
Nov 14 '05 #1
Darius Fatakia wrote:

[snip]


If you can change the file format, parsing would be simplified by
using a single format, such as:

<constraint> '(' <integer> [',' <integer>] ')'

Then you could read the initial string up to the '(', check it
against a list of valid values, and either flush the line with an
error message or read the appropriate parameters. The '=' chars
in your list seem totally unnecessary, and the simple parenthesis-
delimited parameters make it easy to flush the (assumed) comment
portion of the line.

Then you would have:

length(0,10)
release(17,0)
due(0,149)

At any rate, I would build anything around getc() and a few tests.
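A minimal sketch of that getc()-based approach, assuming the file has already been converted to the proposed uniform format (for example length(0,10)) and that everything after the ')' is a comment. The names struct constraint, read_constraint() and NAME_MAX_LEN are hypothetical, chosen for illustration, and the integer reader does no overflow checking.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 31  /* hypothetical limit on the constraint name */

/* One line of the proposed format: name(p1) or name(p1,p2) */
struct constraint {
    char name[NAME_MAX_LEN + 1];
    long p[2];
    int nparams;
};

/* The constraint types mentioned in the original post */
static const char *const valid_names[] = {
    "length", "needs", "before", "release", "due"
};

/* Discard characters up to and including the next '\n' */
static void flush_line(FILE *fp)
{
    int ch;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        continue;
}

/* Give up on this line; ch is the last character read, and if it was
   already '\n' or EOF there is nothing left to flush */
static int reject(FILE *fp, int ch)
{
    if (ch != '\n' && ch != EOF)
        flush_line(fp);
    return 0;
}

/* Read an optionally signed decimal integer (no overflow check) and
   leave the first character after it in *next; 1 if any digit was read */
static int read_long(FILE *fp, long *value, int *next)
{
    int ch = getc(fp), neg = 0, digits = 0;
    long v = 0;

    if (ch == '-' || ch == '+') {
        neg = (ch == '-');
        ch = getc(fp);
    }
    for (; ch != EOF && isdigit(ch); ch = getc(fp), digits++)
        v = v * 10 + (ch - '0');
    *value = neg ? -v : v;
    *next = ch;
    return digits > 0;
}

/* 1: constraint read; 0: line rejected and skipped; EOF: end of input */
int read_constraint(FILE *fp, struct constraint *c)
{
    size_t len = 0, i;
    int known = 0, ch = getc(fp);

    if (ch == EOF)
        return EOF;
    /* the name, up to the '(' */
    for (; ch != EOF && ch != '(' && ch != '\n'; ch = getc(fp))
        if (len < NAME_MAX_LEN)
            c->name[len++] = (char)ch;
    c->name[len] = '\0';
    if (ch != '(')
        return 0;                       /* blank or malformed line */

    /* check it against the list of valid names */
    for (i = 0; i < sizeof valid_names / sizeof valid_names[0]; i++)
        if (strcmp(c->name, valid_names[i]) == 0)
            known = 1;
    if (!known) {
        fprintf(stderr, "unknown constraint '%s', line skipped\n", c->name);
        return reject(fp, ch);
    }

    /* one or two integer parameters, then ')' */
    if (!read_long(fp, &c->p[0], &ch))
        return reject(fp, ch);
    c->nparams = 1;
    if (ch == ',') {
        if (!read_long(fp, &c->p[1], &ch))
            return reject(fp, ch);
        c->nparams = 2;
    }
    if (ch != ')')
        return reject(fp, ch);

    flush_line(fp);                     /* the rest is a comment */
    return 1;
}
```

A caller would loop with while ((r = read_constraint(fp, &c)) != EOF) and act on each record where r == 1; a rejected line has already been flushed, so parsing resumes cleanly on the next line.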

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - Bush.
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.
Nov 14 '05 #2
Darius Fatakia wrote:
[snip]


Here is my recommendation:

1. Read the entire line into a buffer.
2. Extract the constraint type.
3. Execute a function for the constraint type. Pass the string
   and optionally the position (just after the opening parenthesis).
   This function will take care of parsing the rest of the parameters
   for the constraint type.
Since "switch" statements don't work with strings, I recommend
using a table of <constraint_name, function_pointer>.
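The steps above can be sketched as a table-driven dispatcher. This is one possible shape, assuming the per-type handlers use sscanf() on the text just after the '('; struct constraint, the handler names and dispatch_line() are hypothetical, and fgets() would supply each line as in step 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result record filled in by the handlers */
struct constraint {
    const char *type;
    long a, b, value;
    int has_b, has_value;
};

/* Each handler parses the text just after the '(' */
typedef int (*handler_fn)(const char *args, struct constraint *out);

/* length(N) = V, release(N) = V, due(N) = V */
static int one_param_with_value(const char *args, struct constraint *out)
{
    out->has_b = 0;
    out->has_value = 1;
    return sscanf(args, "%ld) =%ld", &out->a, &out->value) == 2;
}

/* needs(N,M), before(N,M); %n is only reached if the ')' matched */
static int two_params(const char *args, struct constraint *out)
{
    int n = 0;
    out->has_b = 1;
    out->has_value = 0;
    return sscanf(args, "%ld,%ld)%n", &out->a, &out->b, &n) == 2 && n > 0;
}

/* The <constraint_name, function_pointer> table */
static const struct {
    const char *name;
    handler_fn handler;
} handlers[] = {
    { "length",  one_param_with_value },
    { "release", one_param_with_value },
    { "due",     one_param_with_value },
    { "needs",   two_params },
    { "before",  two_params },
};

/* 1 on success; 0 for no '(', an unknown type, or bad parameters */
int dispatch_line(const char *line, struct constraint *out)
{
    const char *paren = strchr(line, '(');
    size_t len, i;

    if (paren == NULL)
        return 0;
    len = (size_t)(paren - line);
    for (i = 0; i < sizeof handlers / sizeof handlers[0]; i++) {
        if (strlen(handlers[i].name) == len &&
            strncmp(line, handlers[i].name, len) == 0) {
            out->type = handlers[i].name;
            return handlers[i].handler(paren + 1, out);
        }
    }
    return 0;
}
```

Adding a new constraint type then means writing one handler and adding one table row; the lookup loop itself does not change.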
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library

Nov 14 '05 #3

"Darius Fatakia" <da************@yahoo.com> wrote in message
news:c8**********@news.Stanford.EDU...
Hi,

[snip]


If your lines strictly follow a format such as
constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags mark
optional parts of the line), I would process the file line by line (with
fgets()) and use fscanf() with the corresponding format specifier, the
latter being built according to whether the ',' and/or '=' characters have
been found, using the strchr() function.
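A minimal sketch of this first approach, applied with sscanf() to a line already read by fgets(). The function name parse_constraint() and the 31-character limit on the type name are assumptions made for illustration. It also assumes that a ',' only counts when it is inside the parentheses and that '=' is the first non-blank character after the ')', so punctuation in a trailing comment is not misread.

```c
#include <stdio.h>
#include <string.h>

#define TYPE_LEN 32   /* hypothetical limit: 31 chars + '\0' */

/* Parses one line such as "needs(16,1)" or "due(0) = 149".
   Stores the type name in type[] and the integers, in order, in v[].
   Returns how many integers were stored (1 to 3), or -1 on failure. */
int parse_constraint(const char *line, char type[TYPE_LEN], long v[3])
{
    const char *left = strchr(line, '(');
    const char *right = left ? strchr(left, ')') : NULL;
    const char *comma, *after;
    const char *fmt;
    int has_comma, has_equal, expected;

    if (right == NULL)
        return -1;

    /* strchr() tells us which optional parts are present */
    comma = strchr(left, ',');
    has_comma = comma != NULL && comma < right;
    after = right + 1 + strspn(right + 1, " \t");
    has_equal = (*after == '=');

    /* choose the matching format; expected counts the type name
       plus each integer conversion */
    if (has_comma && has_equal) {
        fmt = "%31[^(](%ld,%ld) =%ld";
        expected = 4;
    } else if (has_comma) {
        fmt = "%31[^(](%ld,%ld)";
        expected = 3;
    } else if (has_equal) {
        fmt = "%31[^(](%ld) =%ld";
        expected = 3;
    } else {
        fmt = "%31[^(](%ld)";
        expected = 2;
    }

    /* unused trailing arguments are evaluated but ignored by sscanf() */
    if (sscanf(line, fmt, type, &v[0], &v[1], &v[2]) != expected)
        return -1;
    return expected - 1;
}
```

Checking sscanf()'s return value against the expected count is what rejects a line whose text does not match the chosen format.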

Another way is to use strchr() and strtol(), e.g.:

/* Ugly example, not modularized, not fully safe, but it's able to parse
   according to your specs */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#define LINE_SIZE 256   /* long enough for a constraint plus its comment */

int main(int argc, char *argv[])
{
    FILE *fp;
    char linebuffer[LINE_SIZE];
    char constraint_type[LINE_SIZE];
    char *p_left, *p_right, *comma, *equal;

    if (argc < 2)
    {
        fprintf(stderr, "Usage : %s <file_to_parse>\n", argv[0]);
        return EXIT_FAILURE;
    }

    fp = fopen(argv[1], "r");
    if (fp == NULL)
    {
        fprintf(stderr, "Unable to open : %s\n", argv[1]);
        return EXIT_FAILURE;
    }

    while (fgets(linebuffer, sizeof linebuffer, fp))
    {
        /* Using INT_MIN as dummy value for the optional fields */
        int a, b = INT_MIN, c = INT_MIN;
        size_t len;

        /* Skip blank lines and lines without a "(...)" group */
        p_left = strchr(linebuffer, '(');
        p_right = p_left ? strchr(p_left, ')') : NULL;
        if (p_right == NULL)
            continue;

        /* constraint_type is everything before the '(' */
        len = (size_t)(p_left - linebuffer);
        memcpy(constraint_type, linebuffer, len);
        constraint_type[len] = '\0';

        a = (int)strtol(p_left + 1, NULL, 10);

        /* A second parameter only counts if its ',' is inside the parentheses */
        comma = strchr(p_left, ',');
        if (comma != NULL && comma < p_right)
            b = (int)strtol(comma + 1, NULL, 10);

        /* An assignment only counts if '=' is the first non-blank after ')' */
        equal = p_right + 1 + strspn(p_right + 1, " \t");
        if (*equal == '=')
            c = (int)strtol(equal + 1, NULL, 10);

        if (c != INT_MIN)
        {
            if (b != INT_MIN)
                printf("%s => parameters: %d,%d ; assignment: %d\n",
                       constraint_type, a, b, c);
            else
                printf("%s => parameter: %d ; assignment: %d\n",
                       constraint_type, a, c);
        }
        else
        {
            if (b != INT_MIN)
                printf("%s => parameters: %d,%d\n", constraint_type, a, b);
            else
                printf("%s => parameter: %d\n", constraint_type, a);
        }
    }

    fclose(fp);
    return EXIT_SUCCESS;
}

Given this text file:
length(0) = 10
needs(16,1)
before(49,9)
release(17) = 0
due(0) = 149

The program outputs:
length => parameter: 0 ; assignment: 10
needs => parameters: 16,1
before => parameters: 49,9
release => parameter: 17 ; assignment: 0
due => parameter: 0 ; assignment: 149
Regis
Nov 14 '05 #4
> If your lines strictly follow a format such as
> constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags mark
> optional parts of the line), I would process the file line by line (with
> fgets()) and use fscanf() with the corresponding format specifier, the
>
> [...]

I meant sscanf(), not fscanf().
Nov 14 '05 #5
Darius Fatakia wrote:
[snip]

A rule of thumb for dealing with files is as follows:

1. Copy all file contents to memory.
2. Close the file.
3. Process the file contents from the data saved in step 1.

This would give a big performance boost.

For example:
while (!feof(fp) ) {
fscanf( fp, "%s", buff);
}
--
Karthik.
Humans please 'removeme_' for my real email.
Nov 14 '05 #6

"Darius Fatakia" <da************@yahoo.com> wrote in message news:c8**********@news.Stanford.EDU...
[snip]


The format of the file needs to be pretty uniform in order to use
the following method:

F:\Vijay\C> type scanf.c
#include <stdio.h>
#include <stdlib.h>

int
main ( void )
{
    int i, j, k, l, n;

    n = scanf ( "length(%d) = %d duration of task %d is %d", &i, &j, &k, &l );
    if ( n == 4 )
        printf ( "n = %d\ni = %d\nj = %d\nk = %d\nl = %d\n", n, i, j, k, l );
    return EXIT_SUCCESS;
}

F:\Vijay\C> gcc scanf.c
F:\Vijay\C> a.exe
length(0) = 10 duration of task 0 is 10
n = 4
i = 0
j = 10
k = 0
l = 10
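When scanf() reads straight from the stream, a line that fails to match leaves its unread characters in the input, so the next call starts in the middle of that line. A sketch of the same literal-text format applied line by line with fgets() and sscanf() avoids that. The name read_lengths() and the 256-byte line buffer are hypothetical; the format spells "Duration" with a capital D as in the original post's sample, because literal characters in a scanf() format are matched case-sensitively. The i == k && j == l test is an added cross-check that the comment repeats the same numbers.

```c
#include <stdio.h>

/* Reads lines with fgets() and applies the literal-text format to each
   line with sscanf(); a line that does not match is simply skipped.
   Returns how many "length" lines were stored (at most max). */
int read_lengths(FILE *fp, long task[], long duration[], int max)
{
    char line[256];
    int count = 0;

    while (count < max && fgets(line, sizeof line, fp)) {
        long i, j, k, l;

        if (sscanf(line, "length(%ld) = %ld Duration of task %ld is %ld",
                   &i, &j, &k, &l) == 4 && i == k && j == l) {
            task[count] = i;
            duration[count] = j;
            count++;
        }
    }
    return count;
}
```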

Z.
Nov 14 '05 #7
"Karthik" <re*******************@yahoo.com> wrote:
> A rule of thumb for dealing with files is as follows:
>
> 1. Copy all file contents to memory.
> 2. Close the file.
> 3. Process the file contents from the data saved in step 1.

I would only suggest that approach if the algorithm requires moving back and
forth across the whole file's data. Even in that case, for particularly
large files where that approach is not viable, you may be better off using
fseek() or something similar.

> This would give a big performance boost.

I don't see how it would give a big performance boost. It might make your
program require much more memory than is necessary.

> For example:
>
> while (!feof(fp) ) {
> fscanf( fp, "%s", buff);
> }


This is a terrible example. Seeing while(!feof(fp)) should flag problems
immediately. A while loop should depend on the success or failure of the
actual file reading function, not on a secondary feof test. feof() only
becomes true after a read has already failed, so this often causes
off-by-one errors in the number of times it loops.

scanf or fscanf with a plain "%s" is just as bad as the gets function: it has
no way to prevent writing past the end of the buffer given. You must
always specify a maximum field width with the %s specifier. In addition,
your loop never checks the value returned by fscanf, and it just keeps
overwriting the same buffer with each (whitespace-delimited) string read,
without storing those strings separately in memory.
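A sketch of the corrected word-reading loop: the field width in "%63s" is one less than the buffer size (leaving room for the terminating '\0'), and the loop is controlled by fscanf()'s return value rather than by feof(). The function name count_words() and the 64-byte buffer are illustrative assumptions; a word longer than 63 characters is read in more than one piece.

```c
#include <stdio.h>

/* Counts whitespace-delimited words in fp. fscanf() returns 1 each
   time a word is stored and EOF once the input is exhausted, so the
   loop stops exactly after the last successful read. */
long count_words(FILE *fp)
{
    char word[64];
    long n = 0;

    while (fscanf(fp, "%63s", word) == 1)
        n++;
    return n;
}
```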

In this case I'd parse one line at a time:

while (fgets(buff, sizeof buff, fp))
{
    /* work on the current line in buff */
}
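Expanding that loop slightly, a sketch that also copes with lines longer than the buffer: fgets() stores the '\n' only when the whole line fit, so its absence (before end of file) means the rest of the line is still in the stream and should be skipped rather than treated as a new line. The deliberately small 32-byte buffer and the name scan_lines() are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUF 32   /* deliberately small, to show the long-line case */

/* Reads fp one line at a time. Returns the number of lines read and
   stores in *too_long how many were cut off because they did not fit. */
long scan_lines(FILE *fp, long *too_long)
{
    char buff[LINE_BUF];
    long lines = 0;

    *too_long = 0;
    while (fgets(buff, sizeof buff, fp)) {
        char *nl = strchr(buff, '\n');

        if (nl != NULL) {
            *nl = '\0';                 /* whole line fit: drop the '\n' */
        } else if (!feof(fp)) {
            /* no '\n' yet: the line may continue in the stream */
            int ch = getc(fp);
            if (ch != '\n' && ch != EOF) {
                ++*too_long;            /* really too long: skip the rest */
                while ((ch = getc(fp)) != EOF && ch != '\n')
                    continue;
            }
        }
        lines++;
        /* work on the current line in buff */
    }
    return lines;
}
```

Here feof() is only consulted after fgets() has returned, which is the usage the advice above allows.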

--
Simon.
Nov 14 '05 #8
Karthik wrote:
> Darius Fatakia wrote:
> [snip]
> A rule of thumb for dealing with files is as follows:
>
> 1. Copy all file contents to memory.
> 2. Close the file.
> 3. Process the file contents from the data saved in step 1.
>
> This would give a big performance boost.
>
> For example:
> while (!feof(fp) ) {
> fscanf( fp, "%s", buff);
> }


Yes, this would give a performance boost, but
many applications cannot fit an entire data file into
memory. A trade-off is to read the data file in
large "chunks", where a chunk is large enough
to reduce the I/O overhead time (such as starting
and stopping a hard drive). Small buffer sizes may
not provide any performance benefit because of buffering
by the operating system and perhaps by the I/O device.
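A sketch of that chunked trade-off, reading with fread() into a fixed buffer so that only one chunk is in memory at a time. The 64 KiB CHUNK_SIZE is an assumed value, not a recommendation, and the task (counting '\n' characters) is chosen because it does not care where chunk boundaries fall; a real line parser would have to carry a partial line over from one chunk to the next.

```c
#include <stdio.h>

#define CHUNK_SIZE 65536   /* assumed chunk size; tune for the platform */

/* Reads fp in fixed-size chunks and counts newline characters. */
long count_newlines_chunked(FILE *fp)
{
    static char chunk[CHUNK_SIZE];   /* static: keeps 64 KiB off the stack */
    long lines = 0;
    size_t got, i;

    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0)
        for (i = 0; i < got; i++)
            if (chunk[i] == '\n')
                lines++;
    return lines;
}
```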

--
Thomas Matthews


Nov 14 '05 #9
