Coefficient of Variation Function in PostgreSQL

By Susam Pal on 13 May 2010

Today I learnt how to create an aggregate function in PostgreSQL function that wraps the functionality provided by other aggregate functions in PosgreSQL. In this experiment, I created a cv() function that calculates the coefficient of variation. The function cv(x) is equivalent to stddev(x) / avg(x) where x represents the list of data points.

Example Data Table

$ cat perf.sql
CREATE TABLE performance
(
    name VARCHAR,
    duration DOUBLE PRECISION
);

INSERT INTO performance VALUES ('RAND', 101.0);
INSERT INTO performance VALUES ('ZERO', 157.0);
INSERT INTO performance VALUES ('NONE', 209.0);
INSERT INTO performance VALUES ('TEST', 176.0);
INSERT INTO performance VALUES ('UNIT', 197.0);
INSERT INTO performance VALUES ('LOAD', 193.0);
INSERT INTO performance VALUES ('FREE', 198.0);
$ psql statistics
psql (8.4.3)
Type "help" for help.
statistics=# \i perf.sql
DROP TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
statistics=# select * from performance;
 name | duration
------+----------
 RAND |      101
 ZERO |      157
 NONE |      209
 TEST |      176
 UNIT |      197
 LOAD |      193
 FREE |      198
(7 rows)

statistics=#

Useful Details to Create Our Function

statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='stddev(double precision)'::regprocedure;
  aggtransfn  |     aggfinalfn     |    aggtranstype    | agginitval
--------------+--------------------+--------------------+------------
 float8_accum | float8_stddev_samp | double precision[] | {0,0,0}
(1 row)

statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval
statistics-# FROM pg_aggregate
statistics-# WHERE aggfnoid='avg(double precision)'::regprocedure;
  aggtransfn  | aggfinalfn |    aggtranstype    | agginitval
--------------+------------+--------------------+------------
 float8_accum | float8_avg | double precision[] | {0,0,0}
(1 row)

statistics=#

Function Definition

$ cat cv.sql
CREATE OR REPLACE FUNCTION finalcv(double precision[])
RETURNS double precision
AS $$
    SELECT float8_stddev_samp($1) / float8_avg($1);
$$ LANGUAGE SQL;

CREATE AGGREGATE cv(double precision)
(
    sfunc = float8_accum,
    stype = double precision[],
    finalfunc = finalcv,
    initcond = '{0, 0, 0}'
);

Usage

$ psql statistics
psql (8.4.3)
Type "help" for help.

statistics=# select stddev(duration), avg(duration) from performance;
      stddev      |       avg
------------------+------------------
 37.1682147873178 | 175.857142857143
(1 row)

statistics=# select stddev(duration) / avg(duration) as cv from performance;
        cv
-------------------
 0.211354592616754
(1 row)

statistics=# \i cv.sql
CREATE FUNCTION
CREATE AGGREGATE
statistics=# select cv(duration) from performance;
        cv
-------------------
 0.211354592616754
(1 row)

statistics=#

Bessel's Correction

Checked whether Bessel's correction was used in the stddev() function of PostgreSQL. Yes, it was used.

$ octave -q
octave:1> std([101, 157, 209, 176, 197, 193, 198], 0)
ans =  37.168
octave:2> std([101, 157, 209, 176, 197, 193, 198], 1)
ans =  34.411
octave:3>

The std() function in MATLAB and GNU Octave applies Bessel's correction when invoked with the second argument as 0.

Comments | #sql | #mathematics | #technology