SQL/MX Data Mining Guide
Preparing the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
2-10
Rankings
1000000 1998-12-01 5134.22
1000000 1999-01-01 4572.19
... ... ...
--- 186 row(s) selected.
In this query, the ROWS SINCE INCLUSIVE sequence function is used to limit the
moving average window to records for the current customer. The third argument of
MOVINGAVG is RUNNINGCOUNT(*), which ensures MOVINGAVG does not include
rows before the beginning row.
In practice, similar queries can be used to compute several metrics at the same time,
and the results, which conceptually are new columns in the Account History table, can
be realized in an auxiliary table. This auxiliary table can then be referenced when
computing the mining view.
By using sequence functions, you eliminate the dependency on the number and
location of moving averages computed for each customer. Even if customers have
different numbers of history records, sequence functions allow the computation of a
metric for each customer.
Rankings
Simple rankings provide good predictors for many modeling tasks. An example is the
rank of a customer’s average account balance relative to all other customers. This
query computes the absolute rank of the average account balance for each customer:
SELECT cid, RUNNINGCOUNT(*), avg_bal
FROM
(SELECT account, AVG(balance)
FROM acct_history
GROUP BY account)
AS t(cid, avg_bal)
SEQUENCE BY avg_bal DESC;
CID (EXPR) AVG_BAL
---------- -------------------- ---------------------
4300000 1 6203.33
2900000 2 5184.04
4098124 3 4920.02
2300000 4 4610.28
1000000 5 4067.44
1234567 6 2807.20
... ... ...
--- 14 row(s) selected.
In practice, the results of this type of query are realized in an auxiliary table that can be
thought of as an extension to the Customers table. Percentiles and quartiles can also
be computed easily with similar queries.
See the Sequence Fnctions entry in the SQL/MX Reference Manual for more
information.