Aggregate C function accumulating a text array

Hello,
I am about to write a set of C functions to be used in an aggregate
function in which the final function performs a calculation on an array
of accumulated text data types stored in a text[] array. I need to use
the text type because this function will be used on DNA sequences which
can be very large. My questions are the following. What is the most
efficient way to accumulate a text array while being efficient with
memory? I see construct_array() used in accumulation functions but I am
worried that I might end up making a copy of a potentially very large
text array each time my accumulation function is called.

The general flow is

User defined aggregate function
SELECT pb_distance_k2p(sequence) WHERE family_id = 10;

uses accumulation function

distance_accum(PG_FUNCTION_ARGS);

and uses a final function

calculate_distance_k2p(PG_FUNCTION_ARGS)

which needs to deconstruct_array() to get the text array and loop
through the array to do some pairwise comparisons of the text and return
a multidimensional array

Am I thinking about this correctly? Are there any potential pitfalls in
the proposed strategy? I greatly appreciate your feedback.

- Joel


Nov 23 '05 #1
Joel Dudley wrote:
> I am about to write a set of C functions to be used in an aggregate
> function in which the final function performs a calculation on an array
> of accumulated text data types stored in a text[] array. I need to use
> the text type because this function will be used on DNA sequences which
> can be very large. My questions are the following. What is the most
> efficient way to accumulate a text array while being efficient with
> memory? I see construct_array() used in accumulation functions but I am
> worried that I might end up making a copy of a potentially very large
> text array each time my accumulation function is called.

True, but the intermediate results should be released after each row, I
think. You might try it with some real data before assuming a
performance problem.

If it is a problem, take a look at how contrib/intagg works. It
basically just passes a pointer from call to call. You could do
something similar for the text data type.
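
A minimal sketch of that pointer-passing idea, written as plain C outside the PostgreSQL API (no palloc, no fmgr macros, no text datums). The names SeqAccum, seq_accum_add and seq_accum_free are hypothetical and are not taken from contrib/intagg. The point it illustrates: the state is handed back and forth by pointer, each sequence is copied once, and only the small pointer table is ever reallocated, with its capacity doubling so that growth is amortized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical aggregate state: a growable table of pointers to sequences. */
typedef struct SeqAccum {
    char  **items;     /* owned copies of the accumulated sequences */
    size_t  count;     /* number of sequences stored so far */
    size_t  capacity;  /* allocated slots in items */
} SeqAccum;

/*
 * Transition step. Takes the previous state (NULL on the first row),
 * appends one sequence and returns the same state pointer.
 * Returns NULL on allocation failure; an existing state is left untouched.
 */
SeqAccum *seq_accum_add(SeqAccum *state, const char *seq)
{
    int    created = 0;
    size_t len;
    char  *copy;

    assert(seq != NULL);

    if (state == NULL) {
        state = calloc(1, sizeof *state);
        if (state == NULL)
            return NULL;
        created = 1;
    }

    /* Grow only the pointer table, never the sequence data itself. */
    if (state->count == state->capacity) {
        size_t new_cap = state->capacity ? state->capacity * 2 : 8;
        char **grown = realloc(state->items, new_cap * sizeof *grown);
        if (grown == NULL)
            goto fail;
        state->items = grown;
        state->capacity = new_cap;
    }

    /* Each sequence is copied exactly once, when it arrives. */
    len = strlen(seq);
    copy = malloc(len + 1);
    if (copy == NULL)
        goto fail;
    memcpy(copy, seq, len + 1);
    state->items[state->count++] = copy;
    return state;

fail:
    if (created) {
        free(state->items);
        free(state);
    }
    return NULL;
}

/* Releases the state and every sequence it owns. */
void seq_accum_free(SeqAccum *state)
{
    size_t i;

    if (state == NULL)
        return;
    for (i = 0; i < state->count; i++)
        free(state->items[i]);
    free(state->items);
    free(state);
}
```

Inside the backend the same shape would have to be built with palloc in a memory context that survives from row to row, which this sketch does not attempt to show.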

> The general flow is
>
> User defined aggregate function
> SELECT pb_distance_k2p(sequence) WHERE family_id = 10;
>
> uses accumulation function
>
> distance_accum(PG_FUNCTION_ARGS);
>
> and uses a final function
>
> calculate_distance_k2p(PG_FUNCTION_ARGS)
>
> which needs to deconstruct_array() to get the text array and loop
> through the array to do some pairwise comparisons of the text and return
> a multidimensional array

Makes sense to me. BTW, take a look at PL/R
http://www.joeconway.com/plr/

It would allow you to write your final function in R, which has many
extensions related to bioinformatics -- see:
http://www.bioconductor.org/
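
If the final function stays in C, the pairwise loop from the flow quoted above might look roughly like the sketch below. It is plain C with no PostgreSQL calls, so the deconstruct_array() step and the construction of the result array are left out. The function names are hypothetical, and the distance used is a simple stand-in (the proportion of differing sites over the shared length), not the actual k2p calculation, which the thread does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Stand-in distance (NOT k2p): the fraction of positions at which
 * a and b differ, compared over the length of the shorter string.
 * Returns 0.0 when either string is empty.
 */
double p_distance(const char *a, const char *b)
{
    size_t sites = 0;
    size_t diffs = 0;

    assert(a != NULL && b != NULL);

    while (a[sites] != '\0' && b[sites] != '\0') {
        if (a[sites] != b[sites])
            diffs++;
        sites++;
    }
    return sites ? (double) diffs / (double) sites : 0.0;
}

/*
 * Builds a symmetric n x n distance matrix in row-major order, so the
 * entry for the pair (i, j) lives at m[i * n + j]. Each unordered pair
 * is compared once and mirrored. The diagonal stays 0.0.
 * Returns NULL on allocation failure; the caller frees the result.
 */
double *pairwise_matrix(const char *const *seqs, size_t n)
{
    size_t  i, j;
    double *m = calloc(n ? n * n : 1, sizeof *m);

    if (m == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            double d = p_distance(seqs[i], seqs[j]);
            m[i * n + j] = d;
            m[j * n + i] = d;
        }
    }
    return m;
}
```

Comparing each pair once and mirroring the result halves the number of string scans, which matters when the sequences are long. The flat row-major buffer is also close to what a two-dimensional result array would need as its element data.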

HTH,

Joe


Nov 23 '05 #2