SQL/MX Data Mining Guide
Preparing the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
2-9
Deriving Attributes
Deriving Attributes
In the preceding Example of Aligning Data on page 2-7, the derived attributes in the
Close_Temp table are Close_Month and Cust_Left. These attributes are critical for the
task of building a model that will predict at any point in time, based on such things as
current account status, account activity, and customer demographics, whether a credit
card customer will leave three months in the future.
To produce good models, the source mining data typically needs to be supplemented
with appropriate derived attributes. Typical derived attributes include computing ratios
between key quantities, mapping postal codes to average demographics, computing
metrics, and computing rankings, percentiles, or quartiles.
Moving Metrics
Moving metrics measure a dynamic behavior in terms of rates of events or trends for a
state of condition. In the data mining environment, moving metrics are good predictors
for many modeling tasks involving historical or time series data. For example, the
moving average of an account balance produces attributes that could be included in
the mining view for each customer.
SQL/MX supports a number of sequence functions that you can use to simplify queries
and to execute queries more efficiently.
Example Using MOVINGAVG and ROWS SINCE
This query uses the sequence functions MOVINGAVG and ROWS SINCE:
SELECT account, year_month, MOVINGAVG (balance,
ROWS SINCE INCLUSIVE (account <> OFFSET (account,1)) +1,
RUNNINGCOUNT(*))
FROM acct_history
SEQUENCE BY account, year_month;
ACCOUNT YEAR_MONTH (EXPR)
---------- ---------- ---------------------
1000000 1998-07-01 3678.67
1000000 1998-08-01 5229.33
1000000 1998-09-01 4253.15
1000000 1998-10-01 5189.86
1000000 1998-11-01 5221.06
4300000 1999-06-01 N
4400000 1999-07-01 Y
4500000 1998-09-01 Y
4600000 1999-12-01 Y
4700000 1999-06-01 N
Account Number Close Month Cust_Left