Noob File IO question

sore eyes

Hi
I just downloaded the free Watcom compiler and am having a little
trouble with file I/O: http://www.openwatcom.org/index.php/Download
I downloaded the following example and commented out the command-line
argument handling so that I could debug more easily. The example is a
simple file copy, and it works, but I would like to customize it to
automate some redundant changes in some large files.

Here's the problem: when I debug this and watch the input buffer
buf[], I don't see a stream of characters; instead I see a sequence
of integers. I suppose these are probably the characters, but every
other integer in the array is a 0. I am guessing the file uses 16-bit
characters and my program thinks they are 8-bit. I tried defining
UNICODE = 1 in my project settings, but that didn't work.

My program will have to be able to recognize sequences of characters
read from the file. Can someone tell me what I should do to
recognize the characters in the buffer?

------------------------------------------------------------------------------------
/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines. */

#define MAX_LINE_LEN 1000 /* maximal line length supported. */

/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])
{
char* file_path_from; /* path to source file. */
char* file_path_to; /* path to target file. */
FILE* f_from; /* stream of source file. */
FILE* f_to; /* stream of target file. */
char buf[MAX_LINE_LEN+1]; /* input buffer. */

/* read command line arguments */
/*
if (argc != 3 || !argv[1] || !argv[2]) {
fprintf(stderr, "Usage: %s <source file path> <target file path>\n",
argv[0]);
exit(1);
}
file_path_from = argv[1];
file_path_to = argv[2];
*/
file_path_from = "newcode.html";
file_path_to = "filecopy.out";

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");
exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {
fprintf(stderr, "Cannot open target file: ");
perror("");
exit(1);
}

/* copy source file to target file, line by line. */
while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
if (fputs(buf, f_to) == EOF) { /* error writing data */
fprintf(stderr, "Error writing to target file: ");
perror("");
exit(1);
}
}
if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF */
fprintf(stderr, "Error reading from source file: ");
perror("");
exit(1);
}

/* close source and target file streams. */
if (fclose(f_from) == EOF) {
fprintf(stderr, "Error when closing source file: ");
perror("");
}
if (fclose(f_to) == EOF) {
fprintf(stderr, "Error when closing target file: ");
perror("");
}
}

Apr 4 '07 #1

santosh

sore eyes wrote:

Hi
I just downloaded the free Watcom compiler and am having a little
trouble with file I/O: http://www.openwatcom.org/index.php/Download

I downloaded the following example and commented out the command-line
argument handling so that I could debug more easily. The example is a
simple file copy, and it works, but I would like to customize it to
automate some redundant changes in some large files.

Here's the problem: when I debug this and watch the input buffer
buf[], I don't see a stream of characters; instead I see a sequence
of integers. I suppose these are probably the characters, but every
other integer in the array is a 0. I am guessing the file uses 16-bit
characters and my program thinks they are 8-bit. I tried defining
UNICODE = 1 in my project settings, but that didn't work.

My program will have to be able to recognize sequences of characters
read from the file. Can someone tell me what I should do to
recognize the characters in the buffer?

------------------------------------------------------------------------------------
/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines.*/

You'll also need stdlib.h for exit.

#define MAX_LINE_LEN 1000 /* maximal line length supported. */

If you did a character by character copying this wouldn't be a
restriction.

/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])

Return type of main should be an int.

{
char* file_path_from; /* path to source file. */
char* file_path_to; /* path to target file. */
FILE* f_from; /* stream of source file. */
FILE* f_to; /* stream of target file. */
char buf[MAX_LINE_LEN+1]; /* input buffer. */

/* read command line arguments */
/*
if (argc != 3 || !argv[1] || !argv[2]) {

This is a faulty test. Checking argc alone is sufficient. The strings
concerned, i.e. argv[1] and argv[2], should be checked separately.

fprintf(stderr, "Usage: %s <source file path> <target file path>\n",
argv[0]);
exit(1);

Any value other than 0, EXIT_SUCCESS or EXIT_FAILURE is not a fully
portable return code. stdlib.h declares exit as well as the two
EXIT_xx macros.

}
file_path_from = argv[1];
file_path_to = argv[2];
*/
file_path_from = "newcode.html";
file_path_to = "filecopy.out";

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");

fopen is guaranteed by the Standard to set errno to any sensible value
after failure. Also you're not including errno.h.

exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {
fprintf(stderr, "Cannot open target file: ");
perror("");
exit(1);
}

/* copy source file to target file, line by line. */
while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
if (fputs(buf, f_to) == EOF) { /* error writing data */
fprintf(stderr, "Error writing to target file: ");
perror("");

Neither is fputs guaranteed by the Standard to set errno upon error.

exit(1);
}
}
if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF */
fprintf(stderr, "Error reading from source file: ");
perror("");

You should also call perror immediately after the failing function.
Otherwise intervening functions, like fprintf here, may themselves
alter errno and you might get spurious messages.

exit(1);
}

/* close source and target file streams. */
if (fclose(f_from) == EOF) {
fprintf(stderr, "Error when closing source file: ");
perror("");
}
if (fclose(f_to) == EOF) {
fprintf(stderr, "Error when closing target file: ");
perror("");
}
}
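
Editor's sketch: pulling the review comments above together, a copy routine along these lines would avoid the line-length limit and check the right stream after each fopen. This is a minimal sketch, not the original example's code: the name copy_file is hypothetical, and opening both files in binary mode ("rb"/"wb") is an assumption made so that every byte, including the zero bytes seen in the debugger, passes through unchanged. As noted above, the Standard does not promise that these calls set errno, so the perror text may be uninformative on some systems.

```c
#include <stdio.h>

/* Hypothetical helper: copy the file at `from` to `to`, byte by byte.
 * Returns 0 on success, -1 on any failure. */
int copy_file(const char *from, const char *to)
{
    FILE *in;
    FILE *out;
    int c;
    int status = 0;

    in = fopen(from, "rb");        /* binary mode: no newline translation */
    if (in == NULL) {
        perror(from);              /* report right after the failing call */
        return -1;
    }

    out = fopen(to, "wb");
    if (out == NULL) {             /* check the target stream, not the source */
        perror(to);
        fclose(in);
        return -1;
    }

    /* Character-by-character copy: no fixed line buffer is needed. */
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            perror(to);
            status = -1;
            break;
        }
    }

    /* getc returns EOF for both end-of-file and read errors. */
    if (status == 0 && ferror(in)) {
        perror(from);
        status = -1;
    }

    if (fclose(in) == EOF) {
        perror(from);
        status = -1;
    }
    if (fclose(out) == EOF) {
        perror(to);
        status = -1;
    }
    return status;
}
```

A caller would typically wrap this in an int main that returns EXIT_SUCCESS or EXIT_FAILURE (from stdlib.h) depending on the result, e.g. copy_file("newcode.html", "filecopy.out").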

Apr 4 '07 #2

santosh

sore eyes wrote:

<snip>

/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines. */

#define MAX_LINE_LEN 1000 /* maximal line length supported. */

/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])
{

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");
exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {

You should check f_to here.

<snip rest>

Apr 4 '07 #3

santosh

santosh wrote:

sore eyes wrote:

file_path_from = "newcode.html";
file_path_to = "filecopy.out";

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");

fopen is guaranteed by the Standard to set errno to any sensible value
after failure. Also you're not including errno.h.

I meant to write:

fopen is *not* guaranteed by the Standard to set errno to any sensible
value after failure. Also you're not including errno.h.

<snip>

Apr 4 '07 #4

sore eyes

On 4 Apr 2007 03:33:34 -0700, "santosh" <sa*********@gmail.com> wrote:

As I mentioned in my first message, this is a downloaded example that
does indeed perform the intended function of copying a file. While I
do appreciate you pointing out all the flaws in the code, I wish you
would have attempted to address the problem that motivated me to
post the message.

As it is copying, I can watch the input buffer during debugging and
instead of seeing

buf[0] = 'r'
buf[1] = 'a'
buf[2] = 'n'
buf[3] = 'd'
buf[4] = 'o'
buf[5] = 'm'

I see a stream that looks typically like
buf[0] = 0
buf[1] = 71
buf[2] = 0
buf[3] = 76
buf[4] = 0
buf[5] = 79
buf[6] = 0
buf[7] = 84

As I mentioned before, I suspect that the file I opened has 16-bit
characters but the compiler/debugger is assuming 8-bit characters.
Do you agree that this is what is probably happening?

The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Anyone know what I can do about this?

Apr 4 '07 #5

Ian Malone

sore eyes wrote:

On 4 Apr 2007 03:33:34 -0700, "santosh" <sa*********@gmail.com> wrote:

As it is copying, I can watch the input buffer during debugging and
instead of seeing

buf[0] = 'r'
buf[1] = 'a'
buf[2] = 'n'
buf[3] = 'd'
buf[4] = 'o'
buf[5] = 'm'

I see a stream that looks typically like
buf[0] = 0
buf[1] = 71
buf[2] = 0
buf[3] = 76
buf[4] = 0
buf[5] = 79
buf[6] = 0
buf[7] = 84

As I mentioned before, I suspect that the file I opened has 16-bit
characters but the compiler/debugger is assuming 8-bit characters.
Do you agree that this is what is probably happening?

Okay, that does look like a 16 bit character encoding,
though I haven't checked that the numbers translate to
an existing encoding.

The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Anyone know what I can do about this?

UNICODE=1 isn't going to do anything by itself, C
just reads a file into the buffer char by char.
If the encoding doesn't match the execution character
set then you'll have to deal with that somehow, which
involves determining the input encoding (if this is
html then a UTF-16 encoded variant of UCS would be a
good guess, the html spec defines ways to work it out).
C doesn't really deal with this[1], you may want to
see what international character support is available
on your platform, or just write support for the most
common variants (UTF-8 and UTF-16 UCS, but this will
make text processing more difficult).
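
Editor's sketch: one common first step in determining the input encoding is to look for a byte order mark (BOM) at the very start of the file. The code below is a minimal sketch under stated assumptions: the names text_encoding and detect_bom are hypothetical, the file is assumed to have been opened in binary mode with its first few bytes read into a buffer, and only the UTF-8 and UTF-16 marks are checked. A file with no BOM yields ENC_UNKNOWN, in which case other hints (such as the html declarations mentioned above) would still be needed.

```c
#include <stddef.h>

/* Hypothetical result codes for the sketch below. */
enum text_encoding {
    ENC_UNKNOWN,    /* no recognised byte order mark */
    ENC_UTF8_BOM,   /* EF BB BF */
    ENC_UTF16_LE,   /* FF FE */
    ENC_UTF16_BE    /* FE FF */
};

/* Inspect the first n bytes of a file for a byte order mark.
 * buf must point to bytes read in binary mode from offset 0. */
enum text_encoding detect_bom(const unsigned char *buf, size_t n)
{
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}
```

Note that this deliberately ignores UTF-32, whose little-endian mark also begins with FF FE; a fuller version would check the longer marks first.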

From your original message it sounds like you want
a copy utility which does something clever on
encountering certain files. Practically speaking
it needs to be able to determine the file type (by
looking at the name, the contents or being told),
and understand enough of the format to make its
changes (in the case of XML et al. knowing the
spec and therefore how to work out the encoding
would be part of this).

[1] There is wchar_t, but whether it will do what you
want depends on the platform.

--
imalone

Apr 4 '07 #6

santosh

sore eyes wrote:

On 4 Apr 2007 03:33:34 -0700, "santosh" <sa*********@gmail.com> wrote:

As I mentioned in my first message, this is a downloaded example that
does indeed perform the intended function of copying a file. While I
do appreciate you pointing out all the flaws in the code, I wish you
would have attempted to address the problem that motivated me to
post the message.

As it is copying, I can watch the input buffer during debugging and
instead of seeing

buf[0] = 'r'
buf[1] = 'a'
buf[2] = 'n'
buf[3] = 'd'
buf[4] = 'o'
buf[5] = 'm'

I see a stream that looks typically like
buf[0] = 0
buf[1] = 71
buf[2] = 0
buf[3] = 76
buf[4] = 0
buf[5] = 79
buf[6] = 0
buf[7] = 84

As I mentioned before, I suspect that the file I opened has 16-bit
characters but the compiler/debugger is assuming 8-bit characters.
Do you agree that this is what is probably happening?

There's no point in agreeing since what's happening is system specific
and impossible to tell without further details. Did you check the
file's encoding to see if it's actually 16 bit?

The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Anyone know what I can do about this?

If the file is pure text and is under your control, try converting
it to UTF-8. Otherwise you may have to change the locale for your C
program and use the wide-character functions. What exactly needs to
be done is very much dependent on what your implementation actually
supports as well as the capabilities of the underlying system. Maybe
these links will help:

<http://evanjones.ca/unicode-in-c.html>
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
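
Editor's sketch: if converting the file to UTF-8 has to happen inside the program rather than in an editor, the conversion can be written in portable C without relying on locale support. The following is a minimal sketch, assuming the input is UTF-16 with any BOM already stripped and the byte order already known; the names utf8_encode, read_unit and utf16_to_utf8 are hypothetical. The thread never establishes the byte order of newcode.html, so the big_endian flag is left to the caller.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8; out needs room for 4 bytes.
 * Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
static unsigned long read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? (((unsigned long)p[0] << 8) | p[1])
                      : (((unsigned long)p[1] << 8) | p[0]);
}

/* Convert n bytes of BOM-less UTF-16 to UTF-8.
 * Returns the number of bytes written to out, or (size_t)-1 if the input
 * is malformed or fewer than 4 bytes of output space remain before a
 * character is written (a deliberately conservative check). */
size_t utf16_to_utf8(const unsigned char *in, size_t n, int big_endian,
                     unsigned char *out, size_t cap)
{
    size_t i = 0;
    size_t o = 0;

    if (n % 2 != 0)
        return (size_t)-1;                 /* odd byte count */

    while (i < n) {
        unsigned long u = read_unit(in + i, big_endian);
        i += 2;

        if (u >= 0xD800 && u <= 0xDBFF) {  /* high surrogate: need a pair */
            unsigned long lo;
            if (n - i < 2)
                return (size_t)-1;
            lo = read_unit(in + i, big_endian);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;             /* stray low surrogate */
        }

        if (cap - o < 4)
            return (size_t)-1;
        o += utf8_encode(u, out + o);
    }
    return o;
}
```

For plain ASCII html the result is one byte per character, which is the form the original line-based copy program expects.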

Apr 4 '07 #7

sore eyes

Thanks for the help Santosh and Ian

I've been able to work around the problem for the time being by:
1) copying the html file from my editor to the Windows clipboard
2) pasting the file into Notepad
3) saving the file as a txt file
4) renaming the txt file to my original html file name.
Apparently that sequence translates the characters into the 8-bit
format that my program needs. I would prefer knowing the correct way
to handle the 16-bit characters, but at least this will allow me to
get working again on the program's logic. I did try replacing char
with wchar, but the Watcom compiler didn't recognize that type.
Thanks again for the assistance.

Apr 4 '07 #8

Barry Schwarz

On Wed, 04 Apr 2007 00:45:13 -0500, sore eyes
<are_you_kidding@target_for_Spammers.com> wrote:

Hi
I just downloaded the free Watcom compiler and am having a little
trouble with file I/O: http://www.openwatcom.org/index.php/Download
I downloaded the following example and commented out the command-line
argument handling so that I could debug more easily. The example is a
simple file copy, and it works, but I would like to customize it to
automate some redundant changes in some large files.

Here's the problem: when I debug this and watch the input buffer
buf[], I don't see a stream of characters; instead I see a sequence
of integers. I suppose these are probably the characters, but every
other integer in the array is a 0. I am guessing the file uses 16-bit
characters and my program thinks they are 8-bit. I tried defining
UNICODE = 1 in my project settings, but that didn't work.

How was newcode.html built? Have you looked at it with a hex editor
to see what it really contains?

Have you tried to create a simple text file whose contents you know
and test your program on that?
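
Editor's sketch: where no hex editor is at hand, a few lines of C can show what the file really contains. This is a minimal sketch with a hypothetical function name, hex_dump; it assumes the input stream was opened in binary mode ("rb") so that no bytes are translated on the way in.

```c
#include <stdio.h>

/* Write up to max_bytes bytes from `in` to `out` as hexadecimal,
 * 16 bytes per line, each line prefixed with its offset.
 * Returns the number of bytes dumped. */
size_t hex_dump(FILE *in, FILE *out, size_t max_bytes)
{
    size_t count = 0;
    int c;

    while (count < max_bytes && (c = getc(in)) != EOF) {
        if (count % 16 == 0) {
            /* Start a new line (except before the first) with the offset. */
            fprintf(out, "%s%08lx ", count ? "\n" : "", (unsigned long)count);
        }
        fprintf(out, " %02x", (unsigned)c);
        count++;
    }
    if (count > 0)
        fputc('\n', out);
    return count;
}
```

Calling hex_dump(f, stdout, 64) on the first bytes of newcode.html would show directly whether every other byte is 00 and whether the file starts with a byte order mark such as ff fe or fe ff.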
Remove del for email

Apr 5 '07 #9
