Today, I learnt how to create an aggregate function in PostgreSQL function that wraps the functionality provided by other PostgreSQL aggregate functions. In this attempt, I created a cv() function that calculates the coefficient of variation. Effectively, cv(x) is equivalent to stddev(x) / avg(x).

Table to be used for verifying the function

susam@nifty:~/math$ createdb statistics
susam@nifty:~/math$ cat perf.sql
CREATE TABLE performance
(
    name VARCHAR,
    duration DOUBLE PRECISION
);

INSERT INTO performance VALUES ('RAND', 101.0);
INSERT INTO performance VALUES ('ZERO', 157.0);
INSERT INTO performance VALUES ('NONE', 209.0);
INSERT INTO performance VALUES ('TEST', 176.0);
INSERT INTO performance VALUES ('UNIT', 197.0);
INSERT INTO performance VALUES ('LOAD', 193.0);
INSERT INTO performance VALUES ('FREE', 198.0);
susam@nifty:~/math$ psql statistics
psql (8.4.3)
Type "help" for help.
statistics=# \i perf.sql
DROP TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
statistics=# select * from performance;
 name | duration
------+----------
 RAND |      101
 ZERO |      157
 NONE |      209
 TEST |      176
 UNIT |      197
 LOAD |      193
 FREE |      198
(7 rows)

statistics=#

Extracting details required to create the function

statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='stddev(double precision)'::regprocedure;
  aggtransfn  |     aggfinalfn     |    aggtranstype    | agginitval
--------------+--------------------+--------------------+------------
 float8_accum | float8_stddev_samp | double precision[] | {0,0,0}
(1 row)

statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='avg(double precision)'::regprocedure;
  aggtransfn  | aggfinalfn |    aggtranstype    | agginitval
--------------+------------+--------------------+------------
 float8_accum | float8_avg | double precision[] | {0,0,0}
(1 row)

statistics=#

The function

susam@nifty:~/math$ cat cv.sql
CREATE OR REPLACE FUNCTION finalcv(double precision[])
RETURNS double precision
AS $$
    SELECT float8_stddev_samp($1) / float8_avg($1);
$$ LANGUAGE SQL;

CREATE AGGREGATE cv(double precision)
(
    sfunc = float8_accum,
    stype = double precision[],
    finalfunc = finalcv,
    initcond = '{0, 0, 0}'
);

susam@nifty:~/math$

Usage and verification:

susam@nifty:~/math$ psql statistics
psql (8.4.3)
Type "help" for help.

statistics=# select stddev(duration), avg(duration) from performance;
      stddev      |       avg
------------------+------------------
 37.1682147873178 | 175.857142857143
(1 row)

statistics=# select stddev(duration) / avg(duration) as cv from performance;
        cv
-------------------
 0.211354592616754
(1 row)

statistics=# \i cv.sql
CREATE FUNCTION
CREATE AGGREGATE
statistics=# select cv(duration) from performance;
        cv
-------------------
 0.211354592616754
(1 row)

statistics=#

Bessel's correction

I checked whether Bessel's correction was used in the stddev() function of PostgreSQL. Yes, it was used.

susam@nifty:~$ octave -q
octave:1> std([101, 157, 209, 176, 197, 193, 198], 0)
ans =  37.168
octave:2> std([101, 157, 209, 176, 197, 193, 198], 1)
ans =  34.411
octave:3>

The std() function in MATLAB and GNU Octave applies Bessel's correction when invoked with the second argument as 0.

2 comments

Philipp Rautenberg said:

What exactly is float8_accum doing?

Susam Pal said:

float8_accum is the state transition function which maintains an internal state which changes for every new float8 input value. I don't know how float8_accum is defined internally but I believe the internal state would be comprised of at least the sum of all the input values encountered so far and the number of them.

Post a comment

RSS