Efficiently reading a string from a specific point in a file

Hi,

I'm writing a program which creates an index of text files. For each
file it processes, the program records the start and end positions (as
returned by tellg()) of sections of interest, and then some time later
uses these positions to read the interesting sections from the file.

When reading the sections, I'm currently using get() to read characters
from the file one by one and concatenating them to what has already been
read. However, I guess this will be fairly inefficient if the text to
extract is long.

Is there a more efficient way to do this, perhaps using an existing
library function? I'd imagine that this question has been asked before,
but when googling for answers I could only find solutions for reading
entire files completely; I can't do that because the files are too large
to store in memory.

My code is below; any advice would be gratefully received!

#include <iostream>
#include <string>
#include <fstream>

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end) {

    in.seekg(start);

    std::string s;

    while (in.tellg() != end) {
        s += in.get(); // Not very efficient?
    }

    return s;
}

int main(void) {

    std::ifstream in("test_file", std::ios_base::binary);

    // Hard-coded positions below; these would normally be returned
    // from tellg()
    std::cout << "\"" << get_string(in, 10, 19) << "\"" << std::endl;

    return 0;
}

May 11 '07 #1
On 11 May, 14:11, random guy <r...@mail.com> wrote:
> [...]

You can do something like this:

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end)
{
    char* s = new char[end - start + 1];
    in.get(s, end - start + 1);
    return std::string(s);
}

Note that by default get() stops reading at \n; if you don't want
that behaviour, you need to provide a third argument, which is the
delimiting character. \0 should work if you never want it to stop
reading. I'm not sure what will happen if it reaches EOF.

--
Erik Wikström

May 11 '07 #2
On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
> ...
> std::string get_string(std::ifstream &in,
>                        std::ifstream::pos_type start,
>                        std::ifstream::pos_type end)
> {
>     char* s = new char[end - start + 1];

no corresponding delete[] ...

use std::vector<char> s(end - start + 1);

>     in.get(s, end - start + 1);
>     return std::string(s);
> }
>
> [...]

May 11 '07 #3
In message <11*********************@q75g2000hsh.googlegroups.com>,
Gianni Mariani <gi*******@mariani.ws> writes
> On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
> [...]
>> Notice that by default get() stops reading at \n, if you don't want
>> that behaviour you need to provide a third argument which is a
>> delimiting character, \0 should work if you never want it to stop
>> reading.

If you know exactly how many characters you want to read, use in.read().

>> I'm not sure what will happen if it reaches EOF.
--
Richard Herring
May 11 '07 #4
On 2007-05-11 17:28, Richard Herring wrote:
> [...]
> If you know exactly how many characters you want to read, use in.read().

No, read() is for unformatted (binary) data; get() should be used for text.

--
Erik Wikström
May 11 '07 #5
On May 11, 7:09 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2007-05-11 17:28, Richard Herring wrote:
> [...]
>> If you know exactly how many characters you want to read, use in.read().
> No, read() is for unformatted (binary) data; get() should be used for text.

What makes you say that? read() works perfectly well for text.

Note, however, that there is not necessarily a relationship
between the number of characters and the difference end -
start, converted to an integral type. It will probably work
under Unix, but will certainly result in too many characters
under Windows, and on some systems, it may result in nothing
even remotely usable.

Also, of course, on a lot of systems, you can't necessarily
allocate a buffer this big anyway.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 11 '07 #6
On 2007-05-11 21:56, James Kanze wrote:
> [...]
> What makes you say that? read() works perfectly well for text.
>
> Note, however, that there is not necessarily a relationship
> between the number of characters and the difference end -
> start, converted to an integral type. [...]

Well, you can of course use whichever one you like, but with get() you
get the null-character at the end of the array for free, which you don't
with read().

--
Erik Wikström
May 11 '07 #7
On May 11, 10:54 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2007-05-11 21:56, James Kanze wrote:
> [...]
> Well, you can of course use whichever one you like, but with get() you
> get the null-character at the end of the array for free, which you don't
> with read().

He's using it to construct a string, so he doesn't need the null
character.

FWIW: the next version of the standard will allow reading the
string "in place". Something like:

    std::string result ;
    result.resize( size ) ;
    if ( ! in.read( &result[ 0 ], result.size() ) ) {
        // Short read (e.g. EOF): keep only what was actually read.
        result.resize( in.gcount() ) ;
    }

This will also work with all current implementations, and since
it will be standard in the future, the probability of an
implementation changing so that it won't work is pretty small.

The real problem in his code, of course, was the arithmetic on
streampos, which isn't guaranteed to give anything usable except
for positioning in a file. (In particular, under most
systems---Unix is the only exception I know of---the difference
between two streampos will *not* result in the number of char
that can be read between those two positions. Under Windows,
the number will typically be somewhat larger, and on other
systems, who knows.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 12 '07 #8
