Efficiently reading a string from a specific point in a file

random guy

Hi,

I'm writing a program which creates an index of text files. For each file it
processes, the program records the start and end positions (as returned by
tellg()) of sections of interest, and then some time later uses these positions
to read the interesting sections from the file.

When reading the sections, I'm currently using get() to read characters from the
file one by one and concatenating them to what has already been read. However, I
guess this will be fairly inefficient if the text to extract is long.

Is there a more efficient way to do this, perhaps using an existing library
function? I'd imagine that this question has been asked before, but when
googling for answers I could only find solutions for reading entire files
completely; I can't do that because the files are too large to store in memory.

My code is below; any advice would be gratefully received!

#include <iostream>
#include <string>
#include <fstream>

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end) {

    in.seekg(start);

    std::string s;

    while (in.tellg() != end) {
        s += in.get(); // Not very efficient?
    }

    return s;
}

int main(void) {

    std::ifstream in("test_file", std::ios_base::binary);

    // Hard-coded positions below; these would normally be returned from tellg()
    std::cout << "\"" << get_string(in, 10, 19) << "\"" << std::endl;

    return 0;
}

May 11 '07 #1

Erik Wikström

On 11 May, 14:11, random guy <r...@mail.com> wrote:

[...]

You can do something like this:

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end)
{
    char* s = new char[end - start + 1];
    in.get(s, end - start + 1);
    return std::string(s);
}

Notice that by default get() stops reading at \n; if you don't want
that behaviour, you need to provide a third argument, which is the
delimiting character. \0 should work if you never want it to stop
reading. I'm not sure what will happen if it reaches EOF.

--
Erik Wikström

May 11 '07 #2

Gianni Mariani

On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
...

> std::string get_string(std::ifstream &in,
>                        std::ifstream::pos_type start,
>                        std::ifstream::pos_type end)
> {
>     char* s = new char[end - start + 1];

no corresponding delete[] ...

use std::vector<char> s(end - start + 1);

>     in.get(s, end - start + 1);
>     return std::string(s);
> }

[...]

May 11 '07 #3

Richard Herring

In message <11*********************@q75g2000hsh.googlegroups.com>,
Gianni Mariani <gi*******@mariani.ws> writes

> On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:

[...]

>> Notice that by default get() stops reading at \n, if you don't want
>> that behaviour you need to provide a third argument which is a
>> delimiting character, \0 should work if you never want it to stop
>> reading.

If you know exactly how many characters you want to read, use in.read().

--
Richard Herring

May 11 '07 #4

Erik Wikström

On 2007-05-11 17:28, Richard Herring wrote:

[...]

> If you know exactly how many characters you want to read, use in.read().

No, read() is for unformatted data (binary); get() should be used for text.

--
Erik Wikström

May 11 '07 #5

James Kanze

On May 11, 7:09 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:

> On 2007-05-11 17:28, Richard Herring wrote:

[...]

>> If you know exactly how many characters you want to read, use in.read().

> No, read() is for unformatted data (binary); get() should be used for text.

What makes you say that? read() works perfectly well for text.

Note, however, that there is not necessarily a relationship
between the number of characters, and the difference end -
start, converted to an integral type. It will probably work
under Unix, but will certainly result in too many characters
under Windows, and on some systems, it may result in nothing
even remotely usable.

Also, of course, on a lot of systems, you can't necessarily
allocate a buffer this big anyway.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 11 '07 #6

Erik Wikström

On 2007-05-11 21:56, James Kanze wrote:

[...]

> What makes you say that? read() works perfectly well for text.

[...]

Well, you can of course use whichever one you like, but with get() you
get the null-character at the end of the array for free, which you don't
with read().

--
Erik Wikström

May 11 '07 #7

James Kanze

On May 11, 10:54 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:

> On 2007-05-11 21:56, James Kanze wrote:

[...]

> Well, you can of course use whichever one you like, but with get() you
> get the null-character at the end of the array for free, which you don't
> with read().

He's using it to construct a string, so he doesn't need the null
character.

FWIW: the next version of the standard will allow reading the
string "in place". Something like:

std::string result ;
result.resize( size ) ;
if ( ! in.get( &result[ 0 ], result.size() ) ) {
    result.resize( in.gcount() ) ;
}

This will also work with all current implementations, and since
it will be standard in the future, the probability of an
implementation changing so that it won't work is pretty small.

The real problem in his code, of course, was the arithmetic on
streampos, which isn't guaranteed to give anything usable for
other than positioning in a file. (In particular, under most
systems---Unix is the only exception I know of---the difference
between two streampos will *not* result in the number of char
that can be read between those two positions. Under Windows,
the number will typically be somewhat larger, and on other
systems, who knows.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 12 '07 #8
