hi,
I decided to extract the text from some powerpoint files. The results have
thrown up some questions.
When I use the 'char *valid' character array (in the program below) to
choose the characters to write in the new file... the result is totally
different to when I use the line with isalpha() and isdigit().
Yes .. There are more valid characters in the valid array but this is not
the problem .. Using it, I see extra spaces in the new file and it is more
difficult to read (in notepad there appears to be a space between each
character .. in wordpad there are boxes between characters).. why?
Anyone care to investigate and enlighten me? .. the code is below; all you
need to do is comment and uncomment lines to see the differences I am talking
about.
To use the program (with MS Windows) all you need to do is drag the file you
want to process onto the .exe file
cheers
cw
the program:
############
#include<stdio.h>
#include<stdlib.h> /* exit(), system() */
#include<string.h> /* strchr() */
#include<ctype.h>
void writeFile(FILE *infile,FILE *outfile);
int main(int argc, char *argv[])
{
    FILE *outfile = NULL; //the file to write to
    FILE *infile = NULL; //the file to read
    if(argc < 2) /* no file was dropped onto the .exe */
    {
        printf("no input file given - fatal error - goodbye");
        getchar();
        exit(1);
    }
    if(((infile=fopen(argv[1],"rb"))==NULL)||((outfile=fopen("new.txt","wb"))== NULL))
    {
        printf("error opening file - fatal error - goodbye");
        getchar();
        exit(1);
    }
    writeFile(infile,outfile);
    fclose(infile);
    fclose(outfile);
    fflush(stdout);
    system("pause");
    return 0;
}
void writeFile(FILE *infile,FILE *outfile)
{
    /* adjacent string literals are joined by the compiler */
    char *valid =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
        "\n.;:<>?/|\\!\"£$%^&*()_-=+,#~[]{}";
    int byte;
    while(1)
    {
        byte = fgetc(infile);/*read one byte*/
        if(feof(infile)){break;}/*break from while at end of file*/
        /*if(strchr(valid,byte))*/
        if((isalpha(byte))||(isdigit(byte))||(byte==' ')||(byte == '\n'))
        {
            fputc(byte,outfile);
        }
        else
        { }
    }
}
############
"code_wrong" <ta*@tac.ouch.co.uk> wrote:
<snip> When I use the 'char *valid' character array (in the program below) to choose the characters to write in the new file... the result is totally different to when I use the line with isalpha() and isdigit().
Yes .. There are more valid characters in the valid array but this is not the problem .. Using it, I see extra spaces in the new file and it is more difficult to read (in notepad there appears to be a space between each character .. in wordpad there are boxes between characters).. why?
<snip>
void writeFile(FILE *infile,FILE *outfile)
{
char *valid =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 \n.;:<>?/|\\!\"£$%^&*()_-=+,#~[]{}";
You'd be better off declaring the array static, but that's not the
problem.
int byte;
while(1)
{
byte = fgetc(infile);/*read one byte*/
if(feof(infile)){break;}/*break from while at end of file*/
/*if(strchr(valid,byte))*/
I've only skimmed over your code, and won't comment on style flaws, but
the line above (the one giving you trouble if uncommented, right?) does
not check for 0 bytes. In the strchr function, the terminating null
character is considered to be part of the string, so strchr(valid,0)
returns a pointer to that terminator rather than NULL, and every zero
byte in the input gets written to the output. You want something
like:
if( byte && strchr(valid,byte))
{
fputc(byte,outfile);
}
else
{ }
}
}
Best regards
--
Irrwahn Grausewitz (ir*******@freenet.de)
welcome to clc : http://www.ungerhu.com/jxh/clc.welcome.txt
clc faq-list : http://www.faqs.org/faqs/C-faq/faq/
clc frequent answers: http://benpfaff.org/writings/clc
"Irrwahn Grausewitz" <ir*******@freenet.de> wrote in message
news:e4********************************@4ax.com...
snip
I've only skimmed over your code, and won't comment style flaws, but above line (the one giving you troubles, if uncommented, right?) does not check for 0 bytes. In the strchr function, the terminating null character is considered to be part of the string. You want something like:
if( byte && strchr(valid,byte))
snip
Thanks, you have identified the line of code that was producing the
boxes/spaces in the output file. .... this one: if(strchr(valid,byte)) ...
So I guess the program reads a null character in the file and writes it to
the output file ...
wonder why there are so many null characters in the powerpoint file (every
second character) ....interesting
cheers
cw
"code_wrong" <ta*@tac.ouch.co.uk> wrote in message
news:43**********@mk-nntp-2.news.uk.tiscali.com...
snip
Thanks, you have identified the line of code that was producing the boxes/spaces in the output file. .... this one: if(strchr(valid,byte)) ... So I guess the program reads a null character in the file and writes it to the output file ...
wonder why there are so many null characters in the powerpoint file (every second character) ....interesting
Well, it's a 'binary' file (as opposed to 'plain text'), in which embedded
zero characters are common. Your remark about 'every second character'
makes me guess that perhaps (at least part of) the data might be stored
as multibyte or 'wide' characters (e.g. Unicode). You might want to look
into that possibility.
-Mike