Coefficient of Variation Function in PostgreSQL
By Susam Pal on 13 May 2010
Today I learnt how to create an aggregate function in PostgreSQL
function that wraps the functionality provided by other aggregate
functions in PosgreSQL. In this experiment, I created
a cv() function that calculates the coefficient of
variation. The function cv(x) is equivalent
to stddev(x) / avg(x) where x
represents the list of data points.
Example Data Table
$ cat perf.sql
CREATE TABLE performance
(
name VARCHAR,
duration DOUBLE PRECISION
);
INSERT INTO performance VALUES ('RAND', 101.0);
INSERT INTO performance VALUES ('ZERO', 157.0);
INSERT INTO performance VALUES ('NONE', 209.0);
INSERT INTO performance VALUES ('TEST', 176.0);
INSERT INTO performance VALUES ('UNIT', 197.0);
INSERT INTO performance VALUES ('LOAD', 193.0);
INSERT INTO performance VALUES ('FREE', 198.0);
$ psql statistics
psql (8.4.3)
Type "help" for help.
statistics=# \i perf.sql
DROP TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
statistics=# select * from performance;
name | duration
------+----------
RAND | 101
ZERO | 157
NONE | 209
TEST | 176
UNIT | 197
LOAD | 193
FREE | 198
(7 rows)
statistics=#
Useful Details to Create Our Function
statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='stddev(double precision)'::regprocedure;
aggtransfn | aggfinalfn | aggtranstype | agginitval
--------------+--------------------+--------------------+------------
float8_accum | float8_stddev_samp | double precision[] | {0,0,0}
(1 row)
statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='avg(double precision)'::regprocedure;
aggtransfn | aggfinalfn | aggtranstype | agginitval
--------------+------------+--------------------+------------
float8_accum | float8_avg | double precision[] | {0,0,0}
(1 row)
statistics=#
Function Definition
$ cat cv.sql
CREATE OR REPLACE FUNCTION finalcv(double precision[])
RETURNS double precision
AS $$
SELECT float8_stddev_samp($1) / float8_avg($1);
$$ LANGUAGE SQL;
CREATE AGGREGATE cv(double precision)
(
sfunc = float8_accum,
stype = double precision[],
finalfunc = finalcv,
initcond = '{0, 0, 0}'
);
Usage
$ psql statistics
psql (8.4.3)
Type "help" for help.
statistics=# select stddev(duration), avg(duration) from performance;
stddev | avg
------------------+------------------
37.1682147873178 | 175.857142857143
(1 row)
statistics=# select stddev(duration) / avg(duration) as cv from performance;
cv
-------------------
0.211354592616754
(1 row)
statistics=# \i cv.sql
CREATE FUNCTION
CREATE AGGREGATE
statistics=# select cv(duration) from performance;
cv
-------------------
0.211354592616754
(1 row)
statistics=#
Bessel's Correction
Checked whether
Bessel's
correction was used in the stddev() function of
PostgreSQL. Yes, it was used.
$ octave -q octave:1> std([101, 157, 209, 176, 197, 193, 198], 0) ans = 37.168 octave:2> std([101, 157, 209, 176, 197, 193, 198], 1) ans = 34.411 octave:3>
The std() function in MATLAB and GNU Octave applies
Bessel's correction when invoked with the second argument
as 0.