## Coefficient of variation function in PostgreSQL

Today, I learnt how to create an aggregate function in PostgreSQL function that wraps the functionality provided by other PostgreSQL aggregate functions. In this attempt, I created a cv() function that calculates the coefficient of variation. Effectively, cv(x) is equivalent to stddev(x) / avg(x).

### Table to be used for verifying the function

susam@nifty:~/math$createdb statistics susam@nifty:~/math$ cat perf.sql
CREATE TABLE performance
(
name VARCHAR,
duration DOUBLE PRECISION
);

INSERT INTO performance VALUES ('RAND', 101.0);
INSERT INTO performance VALUES ('ZERO', 157.0);
INSERT INTO performance VALUES ('NONE', 209.0);
INSERT INTO performance VALUES ('TEST', 176.0);
INSERT INTO performance VALUES ('UNIT', 197.0);
INSERT INTO performance VALUES ('LOAD', 193.0);
INSERT INTO performance VALUES ('FREE', 198.0);
susam@nifty:~/math$psql statistics psql (8.4.3) Type "help" for help. statistics=# \i perf.sql DROP TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 statistics=# select * from performance; name | duration ------+---------- RAND | 101 ZERO | 157 NONE | 209 TEST | 176 UNIT | 197 LOAD | 193 FREE | 198 (7 rows) statistics=#  ### Extracting details required to create the function statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval statistics-# FROM pg_aggregate statistics-# WHERE aggfnoid='stddev(double precision)'::regprocedure; aggtransfn | aggfinalfn | aggtranstype | agginitval --------------+--------------------+--------------------+------------ float8_accum | float8_stddev_samp | double precision[] | {0,0,0} (1 row) statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval statistics-# FROM pg_aggregate statistics-# WHERE aggfnoid='avg(double precision)'::regprocedure; aggtransfn | aggfinalfn | aggtranstype | agginitval --------------+------------+--------------------+------------ float8_accum | float8_avg | double precision[] | {0,0,0} (1 row) statistics=#  ### The function susam@nifty:~/math$ cat cv.sql
CREATE OR REPLACE FUNCTION finalcv(double precision[])
RETURNS double precision
AS $$SELECT float8_stddev_samp(1) / float8_avg(1);$$ LANGUAGE SQL;

CREATE AGGREGATE cv(double precision)
(
sfunc = float8_accum,
stype = double precision[],
finalfunc = finalcv,
initcond = '{0, 0, 0}'
);

susam@nifty:~/math$ ### Usage and verification: susam@nifty:~/math$ psql statistics
psql (8.4.3)
Type "help" for help.

statistics=# select stddev(duration), avg(duration) from performance;
stddev      |       avg
------------------+------------------
37.1682147873178 | 175.857142857143
(1 row)

statistics=# select stddev(duration) / avg(duration) as cv from performance;
cv
-------------------
0.211354592616754
(1 row)

statistics=# \i cv.sql
CREATE FUNCTION
CREATE AGGREGATE
statistics=# select cv(duration) from performance;
cv
-------------------
0.211354592616754
(1 row)

statistics=#


### Bessel's correction

I checked whether Bessel's correction was used in the stddev() function of PostgreSQL. Yes, it was used.

susam@nifty:~\$ octave -q
octave:1> std([101, 157, 209, 176, 197, 193, 198], 0)
ans =  37.168
octave:2> std([101, 157, 209, 176, 197, 193, 198], 1)
ans =  34.411
octave:3>


The std() function in MATLAB and GNU Octave applies Bessel's correction when invoked with the second argument as 0.

What exactly is float8_accum doing?
float8_accum is the state transition function which maintains an internal state which changes for every new float8 input value. I don't know how float8_accum is defined internally but I believe the internal state would be comprised of at least the sum of all the input values encountered so far and the number of them.