How to read tsv file?

BCC

Hi,

I have a tab separated value table like this:
header1 header2 header3
13.455 55.3 A string
4.55 5.66 Another string

I want to load this guy into a vector of vectors, since I do not know how
long it may be. I think I have to have a vector of vectors of strings, and
then extract the doubles later(?):
std::vector<std::vector<std::string> > m_data_vec;

I started off with this skeletal function, but I'm not sure how to parse the
line for tabs and newlines, and stuff the elements into the vector. Is it
better to read in the whole line then parse it? Can I parse it on the fly?
How?

void MyClass::ReadTSV(const char* filename)
{
using namespace std;

ifstream infile(filename);
if (!infile) {
cout << "unable to load file" << endl;
}

// Now what?
}

Thanks,
Bryan

Jul 22 '05 #1


Victor Bazarov

"BCC" <a@b.c> wrote...

> Is it better to read in the whole line then parse it?

Oh, so much better...

> Can I parse it on the fly?

I don't know. Can you?

> // Now what?

If you know how many fields to expect, you could use getline( ... , '\t') N-1
times and then getline( ... , '\n') and then again and again. (Use getline
rather than get here: get leaves the delimiter in the stream, so the next
call would stop at it immediately.)

Easier still to read one character at a time and watch for '\t' and '\n'. But
I would still do the "get the whole line and then parse it" thing.

Jul 22 '05 #2

Sharad Kala

"BCC" <a@b.c> wrote in message
news:p1****************@newssvr29.news.prodigy.com ...


Maybe this gives you the basic idea.
I haven't tested it. Also, there are no checks for errors etc.

<UNTESTED CODE>

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void ReadTSV(const char* filename)
{
    ifstream infile(filename);
    if (!infile) {
        cout << "unable to load file" << endl;
        return;
    }
    string str;

    vector<vector<string> > vvStr;
    while (getline(infile, str))
    {
        vector<string> vStr;  // fresh row for every line
        string::size_type pos1 = 0, pos2;
        // Search from pos1 so that each tab is found only once.
        while ((pos2 = str.find('\t', pos1)) != string::npos)
        {
            // The second argument of substr is a length, not a position.
            vStr.push_back(str.substr(pos1, pos2 - pos1));
            pos1 = pos2 + 1;  // skip past the tab
        }
        vStr.push_back(str.substr(pos1));  // last field on the line
        vvStr.push_back(vStr);
    }
}

</UNTESTED CODE>

Best wishes,
Sharad

Jul 22 '05 #3

Jonathan Turkanis

"BCC" <a@b.c> wrote in message
news:p1****************@newssvr29.news.prodigy.com ...


Here's some code I wrote some time ago for splitting sequences of
characters and adding them to lists. I have used it a lot with Visual
C++. I don't guarantee its portability or efficiency, but it looks
generally okay.

Usage:

struct is_tab {
bool operator()(char c) const { return c == '\t'; }
};

// Split s using tab as a separator character,
// adding segments to the end of a vector.
string s;
vector<string> vec;
Utility::split(s.begin(), s.end(), back_inserter(vec), is_tab(), false);

Here you could use any input iterators for the first and second
arguments; in particular, you should be able to use
istreambuf_iterators. (istream_iterator<char> skips whitespace, tabs
included, unless noskipws is set on the stream.)

Jonathan
---------------------
//
// File name: split.h
//
// Descriptions: Contains template functions for splitting a string into
// a list.
//
// Author: Jonathan Turkanis
//
// Copyright: Jonathan Turkanis, July 29, 2002. See Readme.txt for
// license information.
//

#ifndef UT_SPLIT_H_INCLUDED
#define UT_SPLIT_H_INCLUDED

#include <iterator>
#include <locale>
#include <string>
#include <boost/bind.hpp>
#include <boost/ref.hpp>

namespace Utility {

//
// Function name: split.
//
// Description: Splits the given string into components.
//
// Template parameters:
// InIt - An input iterator type with any value type Elem.
// OutIt - An output iterator type with value type equal to
// std::basic_string<Elem>.
// Pred - A predicate with argument type Elem.
// Parameters:
// first - The beginning of the input sequence.
// last - The end of the input sequence.
// dest - Receives the terms in the generated list.
// sep - Determines where to split the input sequence.
// coalesce - true if sequences of consecutive elements satisfying sep
// should be treated as one. Defaults to true.
//
template<class InIt, class OutIt, class Pred>
void split(InIt first, InIt last, OutIt dest, Pred sep, bool coalesce
= true);

//
// Function name: split_by_whitespace.
//
// Description: Splits the given string into components at whitespace.
//
// Template parameters:
// InIt - An input iterator type with any value type Elem.
// OutIt - An output iterator type with value type equal to
// std::basic_string<Elem>.
// Parameters:
// first - The beginning of the input sequence.
// last - The end of the input sequence.
// dest - Receives the terms in the generated list.
//
template<class InIt, class OutIt>
void split_by_whitespace(InIt first, InIt last, OutIt dest)
{
using namespace std;
typedef typename iterator_traits<InIt>::value_type char_type;
locale loc;
split(first, last, dest, boost::bind(isspace<char_type>, _1,
boost::ref(loc)));
}

template<class InIt, class OutIt, class Pred>
void split(InIt first, InIt last, OutIt dest, Pred sep, bool coalesce)
{
using namespace std;
typedef typename iterator_traits<InIt>::value_type char_type;
typedef basic_string<char_type> string_type;

bool prev = true; // True if prev char was a separator.
string_type term;
while (first != last) {
char_type c = *first++;
bool is_sep = sep(c);
if (is_sep && (!coalesce || coalesce && !prev)) {
*dest++ = term;
term.clear();
}
if (!is_sep)
term += c;
prev = is_sep;
}
if (!term.empty() && !coalesce || coalesce && !prev)
*dest++ = term;
}
}

#endif // #ifndef UT_SPLIT_H_INCLUDED

Jul 22 '05 #4

Sharad Kala

"Sharad Kala" <no*****************@yahoo.com> wrote in message
news:bv************@ID-221354.news.uni-berlin.de...

vStr.push_back(str.substr(pos1, pos2));

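Oops, the second parameter of substr is a length, so it should be pos2 - pos1 I guess (and find() has to start searching at pos1, with pos1 moved to pos2 + 1 after each field).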

Jul 22 '05 #5

Jon Bell

In article <p1****************@newssvr29.news.prodigy.com>, BCC <a@b.c> wrote:

Hi,

I have a tab separated value table like this:
header1 header2 header3
13.455 55.3 A string
4.55 5.66 Another string

I want to load this guy into a vector of vectors,

Use getline() to read one line at a time, then use a stringstream to split
the line into tokens. Note you can specify some other line terminator
than '\n', for getline().

std::vector<std::vector<std::string> > m_data_vec;
std::string line;
while (std::getline (infile, line))
{
std::istringstream linestream (line);
std::string token;
std::vector<std::string> row;
while (std::getline (linestream, token, '\t'))
{
row.push_back (token);
}
m_data_vec.push_back (row);
}
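
On the "extract the doubles later" part of the question: once the fields are stored as strings, each numeric column can be converted with another istringstream. The helper below is only a sketch; to_double is a made-up name, and rejecting fields with trailing non-whitespace characters is an assumption about what is wanted.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true and stores the value in out only if
// the whole field (ignoring surrounding whitespace) parses as a double.
bool to_double(const std::string& field, double& out)
{
    std::istringstream iss(field);
    iss >> out;
    if (iss.fail())
        return false;          // no number at the start of the field
    char extra;
    return !(iss >> extra);    // succeed only if nothing else follows
}
```

For the sample table this accepts "13.455" and rejects "A string", so a header row can be detected the same way.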

Actually, your example is easy to parse without a stringstream, if you
use a struct to represent a line, with appropriate member data types:

struct data_rec
{
double foo, bar;
std::string baz;
};

std::vector<data_rec> m_data_vec;
data_rec linedata;
while ((infile >> linedata.foo >> linedata.bar)
&& std::getline (infile, linedata.baz))
{
m_data_vec.push_back (linedata);
}

--
Jon Bell <jt*******@presby.edu> Presbyterian College
Dept. of Physics and Computer Science Clinton, South Carolina USA

Jul 22 '05 #6

Jon Bell

In article <p1****************@newssvr29.news.prodigy.com>, BCC <a@b.c> wrote:

Hi,

I have a tab separated value table like this:
header1 header2 header3
13.455 55.3 A string
4.55 5.66 Another string

I want to load this guy into a vector of vectors,

Use getline() to read one line at a time, then use a stringstream to split
the line into tokens. Note you can specify some other line terminator
than '\n', for getline().

std::vector<std::vector<std::string> > m_data_vec;
std::string line;
while (std::getline (infile, line))
{
std::istringstream linestream (line);
std::string token;
std::vector<std::string> row;
while (std::getline (linestream, token, '\t'))
{
row.push_back (token);
}
m_data_vec.push_back (row);
}

Actually, your example is easy to parse without a stringstream, if you
use a struct to represent a line, with appropriate member data types:

struct data_rec
{
double foo, bar;
std::string baz;
};

std::vector<data_rec> m_data_vec;
data_rec linedata;
while ((infile >> linedata.foo >> linedata.bar)
&& std::getline (infile, linedata.baz))
{
m_data_vec.push_back (linedata);
}

--
Jon Bell <jt*******@presby.edu> Presbyterian College
Dept. of Physics and Computer Science Clinton, South Carolina USA

Jul 22 '05 #7

Chris Theis

"Sharad Kala" <no*****************@yahoo.com> wrote in message
news:bv************@ID-221354.news.uni-berlin.de...

vStr.push_back(str.substr(pos1, pos2));

There is even an easier way to obtain the vStr vector using stringstreams:
template <class T>
std::vector<T> StringToVector( const std::string& Str )
{
std::istringstream iss( Str );
return std::vector<T>( std::istream_iterator<T>(iss),
std::istream_iterator<T>() );
}

[OT]
Using VC++ 6.0 this solution has to be altered a little bit using copy and a
back_inserter 'cause the appropriate ctor of vector is not yet available in
that compiler version.
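
A sketch of what the copy and back_inserter variant might look like is given here. The name StringToVectorCopy is made up for this example, and it has not been checked against VC++ 6.0 itself.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Same idea as StringToVector, but fills the vector with std::copy
// instead of relying on the iterator-range constructor.
template <class T>
std::vector<T> StringToVectorCopy(const std::string& str)
{
    std::istringstream iss(str);
    std::vector<T> result;
    std::copy(std::istream_iterator<T>(iss),
              std::istream_iterator<T>(),
              std::back_inserter(result));
    return result;
}
```

As with the original, T's operator>> decides where one element ends, so for std::string any whitespace separates elements.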

Regards
Chris

Jul 22 '05 #8

Sharad Kala

"Chris Theis" <Ch*************@nospam.cern.ch> wrote in message
news:bv**********@sunnews.cern.ch...

There is even an easier way to obtain the vStr vector using stringstreams:
template <class T>
std::vector<T> StringToVector( const std::string& Str )
{
std::istringstream iss( Str );
return std::vector<T>( std::istream_iterator<T>(iss),
std::istream_iterator<T>() );
}

How do you take care of the '\t' in the string?

Jul 22 '05 #9

David Harmon

On Fri, 30 Jan 2004 16:20:30 +0530 in comp.lang.c++, "Sharad Kala"
<no*****************@yahoo.com> was alleged to have written:

template <class T>
std::vector<T> StringToVector( const std::string& Str )
{
std::istringstream iss( Str );
return std::vector<T>( std::istream_iterator<T>(iss),
std::istream_iterator<T>() );
}

How do you take care of the '\t' in the string?

istream_iterator<T> uses T's operator>> which in turn recognizes any
kind of whitespace as a delimiter.
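
That has a consequence for the sample data: a field such as "A string" contains a space, so whitespace-based extraction breaks it into two tokens. The sketch below puts the two behaviours side by side; split_on_whitespace and split_on_tabs are names invented for this comparison.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits wherever operator>> for std::string would, i.e. at any whitespace.
std::vector<std::string> split_on_whitespace(const std::string& line)
{
    std::istringstream iss(line);
    std::istream_iterator<std::string> first(iss), last;
    return std::vector<std::string>(first, last);
}

// Splits at tabs only, so spaces inside a field are preserved.
// A trailing empty field (line ending in a tab) is not reported.
std::vector<std::string> split_on_tabs(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(iss, field, '\t'))
        fields.push_back(field);
    return fields;
}
```

For the line "13.455<TAB>55.3<TAB>A string" the first function yields four tokens and the second yields three fields.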

Jul 22 '05 #10

Chris Theis

"Sharad Kala" <no*****************@yahoo.com> wrote in message
news:bv************@ID-221354.news.uni-berlin.de...
[SNIP]

There is even an easier way to obtain the vStr vector using stringstreams:

template <class T>
std::vector<T> StringToVector( const std::string& Str )
{
std::istringstream iss( Str );
return std::vector<T>( std::istream_iterator<T>(iss),
std::istream_iterator<T>() );
}

How do you take care of the '\t' in the string?

This should be done by the istream_iterators (at least in the Dinkumware
implementation used under VC++). However, I did not yet try it under another
compiler like g++.

Cheers
Chris

Jul 22 '05 #11
