Efficiently reading a string from a specific point in a file

Hi,

I'm writing a program which creates an index of text files. For each
file it processes, the program records the start and end positions (as
returned by tellg()) of sections of interest, and then some time later
uses these positions to read the interesting sections from the file.

When reading the sections, I'm currently using get() to read characters
from the file one by one and concatenating them to what has already
been read. However, I guess this will be fairly inefficient if the text
to extract is long.

Is there a more efficient way to do this, perhaps using an existing
library function? I'd imagine that this question has been asked before,
but when googling for answers I could only find solutions for reading
entire files completely; I can't do that because the files are too
large to store in memory.

My code is below; any advice would be gratefully received!

#include <iostream>
#include <string>
#include <fstream>

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end) {

    in.seekg(start);

    std::string s;

    while (in.tellg() != end) {
        s += in.get(); // Not very efficient?
    }

    return s;
}

int main(void) {

    std::ifstream in("test_file", std::ios_base::binary);

    // Hard-coded positions below; these would normally be returned from tellg()
    std::cout << "\"" << get_string(in, 10, 19) << "\"" << std::endl;

    return 0;
}

May 11 '07 #1
On 11 May, 14:11, random guy <r...@mail.com> wrote:
> Hi,
>
> I'm writing a program which creates an index of text files. For each
> file it processes, the program records the start and end positions (as
> returned by tellg()) of sections of interest, and then some time later
> uses these positions to read the interesting sections from the file.
>
> When reading the sections, I'm currently using get() to read characters
> from the file one by one and concatenating them to what has already
> been read. However, I guess this will be fairly inefficient if the text
> to extract is long.
>
> Is there a more efficient way to do this, perhaps using an existing
> library function? I'd imagine that this question has been asked before,
> but when googling for answers I could only find solutions for reading
> entire files completely; I can't do that because the files are too
> large to store in memory.
>
> My code is below; any advice would be gratefully received!
>
> #include <iostream>
> #include <string>
> #include <fstream>
>
> std::string get_string(std::ifstream &in,
>                        std::ifstream::pos_type start,
>                        std::ifstream::pos_type end) {
>
>     in.seekg(start);
>
>     std::string s;
>
>     while (in.tellg() != end) {
>         s += in.get(); // Not very efficient?
>     }
>
>     return s;
>
> }

You can do something like this:

std::string get_string(std::ifstream &in,
                       std::ifstream::pos_type start,
                       std::ifstream::pos_type end)
{
    char* s = new char[end - start + 1];
    in.get(s, end - start + 1);
    return std::string(s);
}

Notice that by default get() stops reading at \n; if you don't want
that behaviour, you need to provide a third argument, which is the
delimiting character. \0 should work if you never want it to stop
reading. I'm not sure what will happen if it reaches EOF.

--
Erik Wikström

May 11 '07 #2
On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
> ....
> std::string get_string(std::ifstream &in,
>                        std::ifstream::pos_type start,
>                        std::ifstream::pos_type end)
> {
>     char* s = new char[end - start + 1];

no corresponding delete[] ...

use std::vector<char> s(end - start + 1);

>     in.get(s, end - start + 1);
>     return std::string(s);
>
> }
>
> Notice that by default get() stops reading at \n; if you don't want
> that behaviour, you need to provide a third argument, which is the
> delimiting character. \0 should work if you never want it to stop
> reading. I'm not sure what will happen if it reaches EOF.
>
> --
> Erik Wikström

May 11 '07 #3
In message <11*********************@q75g2000hsh.googlegroups.com>,
Gianni Mariani <gi*******@mariani.ws> writes
>On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
>> ...
>> std::string get_string(std::ifstream &in,
>>                        std::ifstream::pos_type start,
>>                        std::ifstream::pos_type end)
>> {
>>     char* s = new char[end - start + 1];
>
>no corresponding delete[] ...
>
>use std::vector<char> s(end - start + 1);
>
>>     in.get(s, end - start + 1);
>>     return std::string(s);
>>
>> }
>>
>> Notice that by default get() stops reading at \n; if you don't want
>> that behaviour, you need to provide a third argument, which is the
>> delimiting character. \0 should work if you never want it to stop
>> reading.

If you know exactly how many characters you want to read, use in.read().

>> I'm not sure what will happen if it reaches EOF.

--
Richard Herring
May 11 '07 #4
On 2007-05-11 17:28, Richard Herring wrote:
> In message <11*********************@q75g2000hsh.googlegroups.com>,
> Gianni Mariani <gi*******@mariani.ws> writes
>> On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
>>> ...
>>> std::string get_string(std::ifstream &in,
>>>                        std::ifstream::pos_type start,
>>>                        std::ifstream::pos_type end)
>>> {
>>>     char* s = new char[end - start + 1];
>>
>> no corresponding delete[] ...
>>
>> use std::vector<char> s(end - start + 1);
>>
>>>     in.get(s, end - start + 1);
>>>     return std::string(s);
>>>
>>> }
>>>
>>> Notice that by default get() stops reading at \n; if you don't want
>>> that behaviour, you need to provide a third argument, which is the
>>> delimiting character. \0 should work if you never want it to stop
>>> reading.
>
> If you know exactly how many characters you want to read, use in.read().

No, read() is for unformatted data (binary); get() should be used for text.

--
Erik Wikström
May 11 '07 #5
On May 11, 7:09 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2007-05-11 17:28, Richard Herring wrote:
>> In message <1178892634.840378.29...@q75g2000hsh.googlegroups.com>,
>> Gianni Mariani <gi3nos...@mariani.ws> writes
>>> On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
>>>> ...
>>>> std::string get_string(std::ifstream &in,
>>>>                        std::ifstream::pos_type start,
>>>>                        std::ifstream::pos_type end)
>>>> {
>>>>     char* s = new char[end - start + 1];
>>> no corresponding delete[] ...
>>> use std::vector<char> s(end - start + 1);
>>>>     in.get(s, end - start + 1);
>>>>     return std::string(s);
>>>> }
>>>> Notice that by default get() stops reading at \n; if you don't want
>>>> that behaviour, you need to provide a third argument, which is the
>>>> delimiting character. \0 should work if you never want it to stop
>>>> reading.
>> If you know exactly how many characters you want to read, use in.read().
> No, read() is for unformatted data (binary); get() should be used for text.

What makes you say that? read() works perfectly well for text.

Note, however, that there is not necessarily a relationship
between the number of characters, and the difference end -
start, converted to an integral type. It will probably work
under Unix, but will certainly result in too many characters
under Windows, and on some systems, it may result in nothing
even remotely usable.

Also, of course, on a lot of systems, you can't necessarily
allocate a buffer this big anyway.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 11 '07 #6
On 2007-05-11 21:56, James Kanze wrote:
> On May 11, 7:09 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
>> On 2007-05-11 17:28, Richard Herring wrote:
>>> In message <1178892634.840378.29...@q75g2000hsh.googlegroups.com>,
>>> Gianni Mariani <gi3nos...@mariani.ws> writes
>>>> On May 11, 11:01 pm, Erik Wikström <eri...@student.chalmers.se> wrote:
>>>>> ...
>>>>> std::string get_string(std::ifstream &in,
>>>>>                        std::ifstream::pos_type start,
>>>>>                        std::ifstream::pos_type end)
>>>>> {
>>>>>     char* s = new char[end - start + 1];
>>>> no corresponding delete[] ...
>>>> use std::vector<char> s(end - start + 1);
>>>>>     in.get(s, end - start + 1);
>>>>>     return std::string(s);
>>>>> }
>>>>> Notice that by default get() stops reading at \n; if you don't want
>>>>> that behaviour, you need to provide a third argument, which is the
>>>>> delimiting character. \0 should work if you never want it to stop
>>>>> reading.
>>> If you know exactly how many characters you want to read, use in.read().
>> No, read() is for unformatted data (binary); get() should be used for text.
>
> What makes you say that? read() works perfectly well for text.
>
> Note, however, that there is not necessarily a relationship
> between the number of characters, and the difference end -
> start, converted to an integral type. It will probably work
> under Unix, but will certainly result in too many characters
> under Windows, and on some systems, it may result in nothing
> even remotely usable.

Well, you can of course use whichever one you like, but with get() you
get the null-character at the end of the array for free, which you don't
with read().

--
Erik Wikström
May 11 '07 #7
On May 11, 10:54 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2007-05-11 21:56, James Kanze wrote:
> [...]
> Well, you can of course use whichever one you like, but with get() you
> get the null-character at the end of the array for free, which you don't
> with read().

He's using it to construct a string, so he doesn't need the null
character.

FWIW: the next version of the standard will allow reading the
string "in place". Something like:

    std::string result ;
    result.resize( size ) ;
    // read(), not get(): get() would stop at '\n' and store a trailing '\0'
    if ( ! in.read( &result[ 0 ], result.size() ) ) {
        result.resize( in.gcount() ) ;
    }

This will also work with all current implementations, and since
it will be standard in the future, the probability of an
implementation changing so that it won't work is pretty small.

The real problem in his code, of course, was the arithmetic on
streampos, which isn't guaranteed to give anything usable for
other than positioning in a file. (In particular, under most
systems---Unix is the only exception I know of---the difference
between two streampos will *not* result in the number of char
that can be read between those two positions. Under Windows,
the number will typically be somewhat larger, and on other
systems, who knows.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 12 '07 #8
