Including large amounts of data in C++ binary

bcomeara

I am writing a program which needs to include a large amount of data.
Basically, the data are p values for different possible outcomes from
trials with different number of observations (the p values are
necessarily based on slow simulations rather than on a standard
function, so I estimated them once and want the program to include
this information). Currently, I have this stored as a vector of
vectors of varying sizes (first vector is indexed by number of
observations for the trial; for each number of observations, there is
a vector containing a p value for different numbers of successes, with
these vectors getting longer as the number of observations (and
therefore possible successes) increases). I created a class containing
this vector of vectors; my program, on starting, creates an object of
this class. However, the file containing just this class is ~50,000
lines long and 10 MB in size, and takes a great deal of time to
compile, especially with optimization turned on. Is there a better way
of building large amounts of data into C++ programs? I could just
include a separate datafile, and have the program call it upon
starting, but then that would require having the program know where
the file is, even when I distribute it. In case this helps, I am
already using the GNU Scientific Library in the program, so using any
functions there is an easy option. My apologies if this question has
an obvious, standard solution I should already know about.

Excerpt from class file (CDFvectorholder) containing vector of
vectors:

vector<vector<double> > CDFvectorholder::Initialize() {
    vector<vector<double> > CDFvectorcontents;
    vector<double> contentsofrow;
    contentsofrow.push_back(0.33298);
    contentsofrow.push_back(1);
    CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=3
    contentsofrow.clear();
    contentsofrow.push_back(0.07352);
    contentsofrow.push_back(0.14733);
    contentsofrow.push_back(0.33393);
    contentsofrow.push_back(0.78019);
    contentsofrow.push_back(1);
    CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=4
    contentsofrow.clear();
    contentsofrow.push_back(0.01209);
    contentsofrow.push_back(0.03292);
    contentsofrow.push_back(0.04202);
    contentsofrow.push_back(0.0767);
    contentsofrow.push_back(0.13314);
    contentsofrow.push_back(0.23417);
    contentsofrow.push_back(0.40921);
    contentsofrow.push_back(0.58934);
    contentsofrow.push_back(0.82239);
    contentsofrow.push_back(0.98537);
    contentsofrow.push_back(1);
    CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=5
    //ETC
    return CDFvectorcontents;
}

and the main program file, initializing the vector of vectors:

vector<vector<double> > CDFvector;
CDFvectorholder bob;
CDFvector=bob.Initialize();

and using it:

double cdfundermodel=CDFvector[integerB][integerA];

Thank you,
Brian O'Meara

Apr 9 '07 #1


Jim Langston

Data does not belong in code. The data should go in a separate file.
Normally this data file would be in the same directory as the executable.

If you really think the user will lose the data file, you can do the trick
of adding it to the end of the executable (if your OS allows it).
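
A minimal sketch of the separate-file approach, assuming a hypothetical text format with one row of whitespace-separated p values per line. The names loadTable and lookup are illustrative and not from the original program, and locating the executable's directory is platform-specific and left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a ragged table: one row per line, values separated by whitespace.
// Blank lines are skipped. (Assumed file format, not from the thread.)
std::vector<std::vector<double> > loadTable(std::istream& in) {
    std::vector<std::vector<double> > table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> row;
        double value;
        while (fields >> value) {
            row.push_back(value);
        }
        if (!row.empty()) {
            table.push_back(row);
        }
    }
    return table;
}

// Checked access, equivalent to table[row][col] in the original program.
double lookup(const std::vector<std::vector<double> >& table,
              std::size_t row, std::size_t col) {
    assert(row < table.size() && col < table[row].size());
    return table[row][col];
}
```

The program would open the file once at start-up (for example with a std::ifstream on a path of your choosing) and pass the stream to loadTable. Because parsing happens at run time, the 10 MB table no longer has to go through the compiler.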

Apr 9 '07 #2

Victor Bazarov

bc******@ucdavis.edu wrote:

> I am writing a program which needs to include a large amount of data.
> Basically, the data are p values for different possible outcomes from
> trials with different number of observations (the p values are
> necessarily based on slow simulations rather than on a standard
> function, so I estimated them once and want the program to include
> this information).

I sincerely hope that the data reside in a separate, include-able
source file, which is generated by some other program somehow, instead
of being typed in by a human reading some other print-out or protocol
of some experiment...

> Currently, I have this stored as a vector of
> vectors of varying sizes (first vector is indexed by number of
> observations for the trial; for each number of observations, there is
> a vector containing a p value for different numbers of successes, with
> these vectors getting longer as the number of observations (and
> therefore possible successes) increases). I created a class containing
> this vector of vectors; my program, on starting, creates an object of
> this class. However, the file containing just this class is ~50,000
> lines long and 10 MB in size, and takes a great deal of time to
> compile, especially with optimization turned on. Is there a better way
> of building large amounts of data into C++ programs?

Something like

------------------- experiments.cpp (generated)
#include <vector>

namespace DATA {
double data_000[5] = { 0.0, 1., 2.2, 3.33, 4.444 };
double data_001[7] = { 0.0, 1.1, 2.222, 3.3333, 4.44444, 5.55, 6.66 };
....
double data_042[3] = { 1.1, 2.22, 3.333 };

// One std::vector per row, each built from the plain array above.
std::vector<double> data[] = {
    std::vector<double>(data_000,
        data_000 + sizeof(data_000) / sizeof(double)),
    std::vector<double>(data_001,
        data_001 + sizeof(data_001) / sizeof(double)),
    ...
    std::vector<double>(data_042,
        data_042 + sizeof(data_042) / sizeof(double)),
};
} // namespace DATA

------------------- my_vectors.cpp
#include "experiments.cpp"

// DATA::data is a plain array, so use pointer arithmetic
// rather than begin()/end().
std::vector<std::vector<double> >
    CDFvectorcontents(DATA::data,
        DATA::data + sizeof(DATA::data) / sizeof(DATA::data[0]));

-----------------------------------

?
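
A rough sketch of the "generated by some other program" step, assuming the simulation results are already available in memory as a vector of rows. The function name writeArrays, the data_NNN naming, and the 10-digit output precision are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Emits one plain C-style array definition per row, wrapped in namespace DATA.
// Text is built in a local stream so the caller's stream formatting is untouched.
void writeArrays(const std::vector<std::vector<double> >& rows, std::ostream& out) {
    std::ostringstream text;
    text << "namespace DATA {\n";
    for (std::size_t r = 0; r < rows.size(); ++r) {
        assert(!rows[r].empty());  // zero-length arrays are not valid C++
        // Zero-padded row index, e.g. data_000, data_001, ...
        text << "double data_" << std::setw(3) << std::setfill('0') << r
             << "[" << rows[r].size() << "] = { ";
        for (std::size_t c = 0; c < rows[r].size(); ++c) {
            if (c != 0) {
                text << ", ";
            }
            // 10 significant digits is an assumption; raise it if the
            // table needs more precision.
            text << std::setprecision(10) << rows[r][c];
        }
        text << " };\n";
    }
    text << "} // namespace DATA\n";
    out << text.str();
}
```

Running this once after the simulations finish, with the output redirected to a file such as experiments.cpp, means the data file is never edited by hand.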

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Apr 9 '07 #3

Gianni Mariani

If it is truly a "large" amount of data (say, more than 4 MB compiled), then
you can consider using a container that can be statically initialized.

For example:

// in header
struct datatype
{
double coeffs1[5];
double coeffs2[100];
double coeffs3[20];
};

extern datatype data;

// in data file

datatype data = {
{ 0.2, 0.4, 0.6 },
{ 1.1, 1.2 },
{ 0.1, 0.2 }
};

You could write a wrapper class that "looks" like a const std::vector
that wraps either a std::vector or a regular array so that you don't
need to make copies of the data you have.
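
One possible shape for such a wrapper, as a minimal sketch: a read-only view that stores a pointer and a length and exposes the subset of the const std::vector interface used for lookups. The class name ConstDoubleView and the array kCoeffs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Read-only view over contiguous doubles. It never owns or copies the data,
// so the underlying array or vector must outlive the view.
class ConstDoubleView {
public:
    typedef const double* const_iterator;

    // View over a plain array (for example a statically initialized one).
    ConstDoubleView(const double* first, std::size_t count)
        : first_(first), count_(count) {}

    // View over a std::vector; valid only while v is alive and not resized.
    explicit ConstDoubleView(const std::vector<double>& v)
        : first_(v.empty() ? 0 : &v[0]), count_(v.size()) {}

    std::size_t size() const { return count_; }
    bool empty() const { return count_ == 0; }

    const double& operator[](std::size_t i) const {
        assert(i < count_);
        return first_[i];
    }

    const_iterator begin() const { return first_; }
    const_iterator end() const { return first_ + count_; }

private:
    const double* first_;
    std::size_t count_;
};

// Example static data to wrap (illustrative values).
static const double kCoeffs[] = { 0.2, 0.4, 0.6 };
```

Code that only reads the table can take a ConstDoubleView and work the same way whether the values live in a static array or in a std::vector filled at run time.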

Apr 9 '07 #4

James Kanze

On Apr 9, 11:17 pm, "bcome...@ucdavis.edu" <omeara.br...@gmail.com>
wrote:

> I am writing a program which needs to include a large amount of data.
> Basically, the data are p values for different possible outcomes from
> trials with different number of observations (the p values are
> necessarily based on slow simulations rather than on a standard
> function, so I estimated them once and want the program to include
> this information). Currently, I have this stored as a vector of
> vectors of varying sizes (first vector is indexed by number of
> observations for the trial; for each number of observations, there is
> a vector containing a p value for different numbers of successes, with
> these vectors getting longer as the number of observations (and
> therefore possible successes) increases). I created a class containing
> this vector of vectors; my program, on starting, creates an object of
> this class. However, the file containing just this class is ~50,000
> lines long and 10 MB in size, and takes a great deal of time to
> compile, especially with optimization turned on.

If it's just data, optimization should make no difference.

> Is there a better way
> of building large amounts of data into C++ programs? I could just
> include a separate datafile, and have the program call it upon
> starting, but then that would require having the program know where
> the file is, even when I distribute it. In case this helps, I am
> already using the GNU Scientific Library in the program, so using any
> functions there is an easy option. My apologies if this question has
> an obvious, standard solution I should already know about.
>
> Excerpt from class file (CDFvectorholder) containing vector of
> vectors:
>
> vector<vector<double> > CDFvectorholder::Initialize() {
>     vector<vector<double> > CDFvectorcontents;
>     vector<double> contentsofrow;
>     contentsofrow.push_back(0.33298);
>     contentsofrow.push_back(1);
>     CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=3
>     contentsofrow.clear();
>     contentsofrow.push_back(0.07352);
>     contentsofrow.push_back(0.14733);
>     contentsofrow.push_back(0.33393);
>     contentsofrow.push_back(0.78019);
>     contentsofrow.push_back(1);
>     CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=4
>     contentsofrow.clear();
>     contentsofrow.push_back(0.01209);
>     contentsofrow.push_back(0.03292);
>     contentsofrow.push_back(0.04202);
>     contentsofrow.push_back(0.0767);
>     contentsofrow.push_back(0.13314);
>     contentsofrow.push_back(0.23417);
>     contentsofrow.push_back(0.40921);
>     contentsofrow.push_back(0.58934);
>     contentsofrow.push_back(0.82239);
>     contentsofrow.push_back(0.98537);
>     contentsofrow.push_back(1);
>     CDFvectorcontents.push_back(contentsofrow); //comparison where ntax=5
>     //ETC
>     return CDFvectorcontents;
> }

And this is called at program start-up? Start-up isn't going to
be very fast.

> and the main program file, initializing the vector of vectors:
>
> vector<vector<double> > CDFvector;
> CDFvectorholder bob;
> CDFvector=bob.Initialize();
>
> and using it:
>
> double cdfundermodel=CDFvector[integerB][integerA];

I'd say that this is one case where I'd use C-style arrays and static
initialization. It will still take some time to compile, but
nowhere near as much as if you call a function on a templated
class for each element. And start-up time will be effectively
zero.

If you do need some of the additional features of std::vector,
then you can still use the static, C-style array to initialize
it, e.g.
std::vector<double>( startAddress, endAddress ) ;
(Whatever code generates the C-style array can also be used to
generate the startAddress and endAddress variables.)
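
A minimal sketch of the static-array approach, assuming the ragged table is flattened into one array plus a table of row start offsets. The layout and all names here (cdf, kValues, kRowStart, lookup, rowAsVector) are illustrative choices; the values are the first two rows from the original post's excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace cdf {

// All rows stored back to back in one statically initialized array.
static const double kValues[] = {
    0.33298, 1,                               // ntax=3
    0.07352, 0.14733, 0.33393, 0.78019, 1     // ntax=4
};

// kRowStart[r] is the index in kValues where row r begins;
// the final entry is the total number of values.
static const std::size_t kRowStart[] = { 0, 2, 7 };

static const std::size_t kRowCount =
    sizeof(kRowStart) / sizeof(kRowStart[0]) - 1;

inline std::size_t rowSize(std::size_t row) {
    assert(row < kRowCount);
    return kRowStart[row + 1] - kRowStart[row];
}

// Replaces CDFvector[row][col] without building any vectors at start-up.
inline double lookup(std::size_t row, std::size_t col) {
    assert(col < rowSize(row));
    return kValues[kRowStart[row] + col];
}

// Builds a std::vector for one row from its start and end addresses,
// for code that needs the full vector interface.
inline std::vector<double> rowAsVector(std::size_t row) {
    assert(row < kRowCount);
    return std::vector<double>(kValues + kRowStart[row],
                               kValues + kRowStart[row + 1]);
}

} // namespace cdf
```

With this layout, cdf::lookup(integerB, integerA) would stand in for CDFvector[integerB][integerA], and the compiler only sees two aggregate initializers rather than one push_back call per value.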

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 10 '07 #5
