Aggregate C function accumulating a text array

Hello,
I am about to write a set of C functions to be used in an aggregate
function in which the final function performs a calculation on an array
of accumulated text data types stored in a text[] array. I need to use
the text type because this function will be used on DNA sequences which
can be very large. My questions are the following. What is the most
efficient way to accumulate a text array while being efficient with
memory? I see construct_array() used in accumulation functions but I am
worried that I might end up making a copy of a potentially very large
text array each time my accumulation function is called.

The general flow is

User defined aggregate function
SELECT pb_distance_k2p(sequence) WHERE family_id = 10;

uses accumulation function

distance_accum(PG_FUNCTION_ARGS);

and uses a final function

calculate_distance_k2p(PG_FUNCTION_ARGS)

which needs to deconstruct_array() to get the text array and loop
through the array to do some pairwise comparisons of the text and return
a multidimensional array
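
As a rough illustration of the final-function step described above, here is a standalone sketch in plain C. It does not use the PostgreSQL API: in a real final function the strings would come from deconstruct_array() and the result would be built as a PostgreSQL array. The names mismatch_fraction and pairwise_matrix are hypothetical, and the comparison is only a placeholder (fraction of differing positions) standing in for the actual distance calculation. It assumes the sequences are aligned and of equal length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder comparison (hypothetical): fraction of positions at which
 * two equal-length sequences differ. The real distance calculation
 * would replace this function. */
double mismatch_fraction(const char *a, const char *b, size_t len)
{
    size_t diff = 0;

    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            diff++;
    }
    return len ? (double) diff / (double) len : 0.0;
}

/* Fill an n x n row-major matrix with pairwise results.
 * 'out' must have room for n * n doubles. Each pair is compared once
 * and the value is mirrored, since the comparison is symmetric. */
void pairwise_matrix(const char *const *seqs, size_t n, double *out)
{
    size_t len;

    if (n == 0)
        return;

    /* Assumption: all sequences are aligned to the same length. */
    len = strlen(seqs[0]);
    for (size_t i = 1; i < n; i++)
        assert(strlen(seqs[i]) == len);

    for (size_t i = 0; i < n; i++) {
        out[i * n + i] = 0.0;
        for (size_t j = i + 1; j < n; j++) {
            double d = mismatch_fraction(seqs[i], seqs[j], len);

            out[i * n + j] = d;
            out[j * n + i] = d;
        }
    }
}
```

Only n * (n - 1) / 2 comparisons are made, which matters when each sequence is large.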

Am I thinking about this correctly? Are there any potential pitfalls in
the proposed strategy? I greatly appreciate your feedback.

- Joel


Nov 23 '05 #1
Joel Dudley wrote:
> I am about to write a set of C functions to be used in an aggregate
> function in which the final function performs a calculation on an array
> of accumulated text data types stored in a text[] array. I need to use
> the text type because this function will be used on DNA sequences which
> can be very large. My questions are the following. What is the most
> efficient way to accumulate a text array while being efficient with
> memory? I see construct_array() used in accumulation functions but I am
> worried that I might end up making a copy of a potentially very large
> text array each time my accumulation function is called.

True, but the intermediate results should be released after each row, I
think. You might try it with some real data before assuming a
performance problem.

If it is a problem, take a look at how contrib/intagg works. It
basically just passes a pointer from call to call. You could do
something similar for the text data type.
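
To make the pointer-passing idea concrete, here is a minimal standalone sketch in plain C. It is not the contrib/intagg code and does not use the PostgreSQL API; SeqAccum, seq_accum_add and seq_accum_free are hypothetical names. The state holds only pointers to the strings, so each call appends one pointer and never copies the sequence text. In a real aggregate the state would also have to be allocated in a memory context that survives across calls, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state object: stores borrowed pointers to caller-owned
 * strings, never the text itself. */
typedef struct {
    const char **items;
    size_t count;
    size_t capacity;
} SeqAccum;

/* Release the state (but not the strings, which the caller owns). */
void seq_accum_free(SeqAccum *state)
{
    if (state != NULL) {
        free(state->items);
        free(state);
    }
}

/* Append one pointer. Pass state == NULL on the first call.
 * Returns the same state pointer on every later call, or NULL if an
 * allocation fails (in which case the state has been freed). */
SeqAccum *seq_accum_add(SeqAccum *state, const char *seq)
{
    assert(seq != NULL);

    if (state == NULL) {
        state = calloc(1, sizeof *state);
        if (state == NULL)
            return NULL;
    }

    if (state->count == state->capacity) {
        /* Double the pointer array so that appends are amortized O(1);
         * only pointers are moved, not the sequence data. */
        size_t new_cap = state->capacity ? state->capacity * 2 : 4;
        const char **grown = realloc(state->items, new_cap * sizeof *grown);

        if (grown == NULL) {
            seq_accum_free(state);
            return NULL;
        }
        state->items = grown;
        state->capacity = new_cap;
    }

    state->items[state->count++] = seq;
    return state;
}
```

A caller would start with a NULL state, pass the returned pointer back in on each row, and hand the finished state to the final function.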
> The general flow is
>
> User defined aggregate function
> SELECT pb_distance_k2p(sequence) WHERE family_id = 10;
>
> uses accumulation function
>
> distance_accum(PG_FUNCTION_ARGS);
>
> and uses a final function
>
> calculate_distance_k2p(PG_FUNCTION_ARGS)
>
> which needs to deconstruct_array() to get the text array and loop
> through the array to do some pairwise comparisons of the text and return
> a multidimensional array


Makes sense to me. BTW, take a look at PL/R:
http://www.joeconway.com/plr/

It would allow you to write your final function in R, which has many
extensions related to bioinformatics -- see:
http://www.bioconductor.org/

HTH,

Joe


Nov 23 '05 #2
