I need to parse a file of about 2000 lines, and I'm being told that
reading the file as ASCII would be slower, so I need to resort to
binary and read it in large chunks. Can anyone please explain what
this is all about?
broli said:
> I need to parse a file of about 2000 lines, and I'm being told that
> reading the file as ASCII would be slower, so I need to resort to
> binary and read it in large chunks. Can anyone please explain what
> this is all about?
Someone's pulling your leg. 2000 lines of text is nothing. Just write the
program so that it's clear, correct, and easy to understand. Then, if and
only if it's too slow (and you should define the "fast enough"/"too slow"
boundary before you start writing the program), it's time to think about
how it might be made faster.
--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
broli said:
<snip>
> But then I was told that "normally we don't read scientific data in
> ascii for accuracy and speed concerns", which made me wonder what was
> so wrong?
The statement!
> I could parse 2000 lines in hardly any time and there was no problem
> with ascii either.
Right. Someone's pulling your leg, or is overly concerned with efficiency
at the expense of development time and clarity. That isn't to say that
efficiency isn't important. But let's just pretend, for the sake of
argument, that you write it /both/ ways, and then you measure. You
discover that the "binary" technique takes 0.025 seconds to process the
2000 data groups, whereas the "text" version takes 0.075 seconds - three
times slower! Surely this is a triumph for binary!
Yeah, right, but who cares? You press ENTER, and then it takes you 0.1
seconds to look up at the screen, and everything's finished, no matter
which one you ran.
Write it clear, simple, and correct. Then worry about speed if and only if
you have to.
In article <4e**********************************@s19g2000prg.googlegroups.com>,
broli <Br*****@gmail.com> wrote:
> I need to parse a file of about 2000 lines, and I'm being told that
> reading the file as ASCII would be slower, so I need to resort to
> binary and read it in large chunks. Can anyone please explain what
> this is all about?
Reading in large chunks is unrelated to whether it's binary or
ascii. Perhaps they meant that character-at-a-time reading with
getchar() is slow, which it is on some systems. You can perfectly
well use fread() on text files.
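A minimal sketch of that last point, assuming the whole file fits in a caller-supplied buffer. The function name slurp_text and the buffer handling are illustrative choices, not something taken from the thread:

```c
#include <assert.h>
#include <stdio.h>

/* Read a text file with a single fread() call.
   At most cap - 1 bytes are stored in buf, followed by a '\0'.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long slurp_text(const char *path, char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL && cap > 0);

    fp = fopen(path, "r");          /* text mode, not "rb" */
    if (fp == NULL)
        return -1;

    n = fread(buf, 1, cap - 1, fp); /* one large chunk */
    buf[n] = '\0';                  /* make it usable as a C string */
    fclose(fp);
    return (long)n;
}
```

Called as slurp_text("data.txt", buf, sizeof buf) (the file name is hypothetical), this leaves the text in buf ready to be parsed in memory. The parsing still has to happen; only the I/O pattern changes.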
-- Richard
--
:wq
Chris Dollin said:
> Richard Heathfield wrote:
<snip>
>> Someone's pulling your leg. 2000 lines of text is nothing. Just write
>> the program so that it's clear, correct, and easy to understand. Then,
>> if and only if it's too slow (and you should define the "fast
>> enough"/"too slow" boundary before you start writing the program),
>> it's time to think about how it might be made faster.
> I agree that speed is unlikely to be a factor -- but accuracy may be.
Possibly, but that comes under correctness, not performance.
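Whether text costs any accuracy is easy to check. A minimal sketch, assuming IEEE 754 doubles and a C library whose printf family and strtod() round correctly (true of current mainstream implementations, but an assumption all the same); the helper name survives_text is made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Print x with 17 significant digits, parse it back with strtod(),
   and report whether the value survived the trip through text. */
int survives_text(double x)
{
    char text[64];

    /* 17 significant digits are enough to identify any IEEE 754 double */
    snprintf(text, sizeof text, "%.17g", x);
    return strtod(text, NULL) == x;
}
```

With fewer digits (plain "%g" prints six) the value written to the file is rounded, and that is where a text format can lose accuracy.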
<snip>
> After all, if they want to read those 2000 lines 1000 times per second
> ...
...and that is covered by "fast enough/too slow". Again, I would emphasise
that the first priority is to make the program *clear* (because it's
easier to make a clear program correct than to make a correct program
clear). The second priority (and a sine qua non, obviously) is to make the
program *correct*. When and only when it works, it's time to worry about
speed. (This obviously does *not* mean that one should intentionally adopt
gross algorithmic inefficiencies.)
Richard Heathfield,
There are many modules in my software package and this is just one of
them. The software also involves a huge number of calculations,
searching, memory allocation, etc., but the thing is that I have to
parallelize the code to run on different machines anyway. Even if
speed is an issue, I doubt that reading a file in ascii or "binary"
would make a huge impact overall.
broli said:
<snip>
> But when I use fgets() then wouldn't I get a string
> of characters (also many tabs, null character etc)?
Yes.
> Wouldn't it be a difficult task to convert an array of characters
> into double type floating numbers again?
I don't see that you have any choice. If what you've described is correct,
the numbers are already in text form. Converting is easy enough, though,
using strtod.
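As a rough illustration of how little work that conversion is, here is a sketch that walks one line (as filled in by fgets()) and converts each whitespace-separated field with strtod(). The function name parse_doubles and the fixed-size output array are assumptions made for the example:

```c
#include <assert.h>
#include <stdlib.h>

/* Convert the whitespace-separated numbers in line to doubles.
   Stores at most max values in out and returns how many were converted. */
size_t parse_doubles(const char *line, double *out, size_t max)
{
    size_t count = 0;

    assert(line != NULL && out != NULL);

    while (count < max) {
        char *end;
        double value = strtod(line, &end);

        if (end == line)    /* nothing converted: end of line or a non-number */
            break;
        out[count++] = value;
        line = end;         /* continue after the number just read */
    }
    return count;
}
```

strtod() skips the tabs and the trailing newline by itself, and the end pointer shows where each number stopped, so no separate tokenising step is needed.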
> I think using fread will make it very fast (considering that it allows
> you to read as many bytes of data at a time as you want) but once again
> I'm not very adept at file handling, just at the beginning stages.
It's very likely that the input stream is buffered, so it won't actually
make much, if any, difference.
ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
> In article <4e**********************************@s19g2000prg.googlegroups.com>,
> broli <Br*****@gmail.com> wrote:
>> I need to parse a file of about 2000 lines, and I'm being told that
>> reading the file as ASCII would be slower, so I need to resort to
>> binary and read it in large chunks. Can anyone please explain what
>> this is all about?
> Reading in large chunks is unrelated to whether it's binary or
> ascii.
I would question that statement. Reading in binary will be a LOT faster,
if it's the same platform, for reading in the same NUMBER of readings.
> Perhaps they meant that character-at-a-time reading with
> getchar() is slow, which it is on some systems. You can perfectly
> well use fread() on text files.
The text file will be larger. There is a need to parse the ascii text
into the destination formats.
It will be slower in the great majority of cases.
> -- Richard
Richard wrote:
> ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
>> In article <4e**********************************@s19g2000prg.googlegroups.com>,
>> broli <Br*****@gmail.com> wrote:
>>> I need to parse a file of about 2000 lines, and I'm being told that
>>> reading the file as ASCII would be slower, so I need to resort to
>>> binary and read it in large chunks. Can anyone please explain what
>>> this is all about?
>> Reading in large chunks is unrelated to whether it's binary or ascii.
> I would question that statement. Reading in binary will be a LOT faster,
> if it's the same platform, for reading in the same NUMBER of readings.
>> Perhaps they meant that character-at-a-time reading with getchar() is
>> slow, which it is on some systems. You can perfectly well use fread()
>> on text files.
> The text file will be larger. There is a need to parse the ascii text
> into the destination formats.
> It will be slower in the great majority of cases.
Quick test, one file, 2000 lines, each line with two floats (1.12345
and 7.890), about 28 KB total.
One single big-enough fread:
real 0m0.002s
user 0m0.000s
sys 0m0.001s
Repeat fscanf( ... "%lf %lf" ... ) until EOF:
real 0m0.004s
user 0m0.002s
sys 0m0.002s
Yes, in this test it's twice as slow. The data file is probably
cached (it's been read several other times already as I /cough/
debugged my code). It includes program start-up time (I just did
`time ./a.out` to get the numbers) so the actual reading time will
be less.
Myself, I wouldn't count that as "LOTS faster" for binary data,
but doubtless there are applications where it is so counted;
I don't think the OP's case is one of them, and it does look as
though he's reading a text file anyway.
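For anyone who wants to repeat the measurement, the fscanf() variant can be as small as the sketch below. It is a reconstruction under assumptions, not the code used for the timings above: the function name count_pairs is invented, and it only counts the pairs rather than storing them.

```c
#include <assert.h>
#include <stdio.h>

/* Read "x y" pairs with fscanf() until the input runs out.
   Returns the number of complete pairs, or -1 if the file cannot be opened. */
long count_pairs(const char *path)
{
    FILE *fp;
    double x, y;
    long pairs = 0;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* fscanf returns the number of fields converted; anything but 2 ends the loop */
    while (fscanf(fp, "%lf %lf", &x, &y) == 2)
        pairs++;

    fclose(fp);
    return pairs;
}
```

Timing a whole program run, as was done for the figures quoted, includes start-up cost, so for a file this small the numbers say as much about process launch as about the reading itself.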
--
"Creation began." - James Blish, /A Clash of Cymbals/
Hewlett-Packard Limited, registered no: 690597 England
Registered office: Cain Road, Bracknell, Berks RG12 1HN
In article <fr**********@registered.motzarella.org>,
Richard <de***@gmail.com> wrote:
>> Reading in large chunks is unrelated to whether it's binary or ascii.
> I would question that statement. Reading in binary will be a LOT faster,
> if it's the same platform, for reading in the same NUMBER of readings.
I didn't say that whether it's binary is unrelated to *speed*.
I meant: there are two separate issues; whether you read it in large
chunks, and whether it's binary. You can read each of text or binary
in small or large chunks. Each of these choices will separately affect
the speed.
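To make the distinction concrete, the sketch below is the binary, large-chunk combination, with no parsing step at all. It assumes the file is written and read on the same platform (same double representation and byte order), and the names save_doubles and load_doubles are invented for the example:

```c
#include <assert.h>
#include <stdio.h>

/* Write n doubles to path in the machine's native binary format.
   Returns 0 on success, -1 on failure. */
int save_doubles(const char *path, const double *v, size_t n)
{
    FILE *fp;
    size_t written;

    assert(path != NULL && v != NULL);

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    written = fwrite(v, sizeof *v, n, fp);
    if (fclose(fp) != 0 || written != n)
        return -1;
    return 0;
}

/* Read up to max doubles from path with one fread() call.
   Returns the number of values read (0 if the file cannot be opened). */
size_t load_doubles(const char *path, double *v, size_t max)
{
    FILE *fp;
    size_t got;

    assert(path != NULL && v != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    got = fread(v, sizeof *v, max, fp);
    fclose(fp);
    return got;
}
```

The text, large-chunk combination is the same fread() call on a file opened in text mode, followed by parsing the buffer in memory.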
-- Richard