Sizing post #3: Understanding percentiles

Hello, here we are now on post #3 on my favorite topic: “sizing”.

So far, we have learned how to standardize data coming from ASH so we can use it for sizing (see "Sizing post #1: How to standardize ASH data for sizing").

In our previous post, "Sizing post #2: The not so good mean, the bad median, and the ugly minimum and maximum", we talked about the average (mean), median, minimum, and maximum, and we realized that those statistics are not the way to determine the number of CPUs required for our database workload, because CPU utilization can spike drastically and we may not have enough processing power. Nobody wants their system to come to a halt, right?

Here we are going to explain percentiles and see how we can use them to come up with the number of CPUs required to support our example workload.

A short and good explanation is in the article "How to Calculate Percentiles in Statistics" by Deborah J. Rumsey.

Basically the percentile is … well … instead of writing the definition here, let's just try to come up with it ourselves with an example.

Let's start with a data set, which can be our own cpu_data column values.

2, 3, 1, 5, 10, 25, 2, 1, 9, 3, 2, 2, 2, 1, 2, 3, 5, 2, 2, 2

We have 20 values in our data set.

We order them ascending and we get this list:

1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 5, 9, 10, 25

Let's decide that we want to work with the 90th percentile.

What does that mean? It means that we want to find the value in the list that has 10% of the values above it (to its right) and 90% of the values below it (to its left).

If I have 20 values and that is 100% of my values, then 90% of the values is 18 values. If I assign positional numbers to my list, position 18 holds the value 9.

Value:    1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 5, 9, 10, 25
Position: 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18  19  20

Because the position I need falls between positions 18 and 19, I take the average of the values at positions 18 and 19, which are 9 and 10: ( 9 + 10 ) / 2 = 9.5

By formula:

To get the position:
Value # = ( Percentile / 100 ) * (n + 1) = ( 90 / 100 ) * ( 21 ) = 18.9

There is no position 18.9, thus I just take the average of the values at positions 18 and 19: ( 9 + 10 ) / 2 = 9.5
Note that the method I am using takes a simple average, but I could also interpolate between the two values to be more accurate.

For this data set the values represent CPUs, therefore the decimal does not apply and I would round up to 10.

With this data set we conclude that:

The 90th percentile is 9.5, which we round up to 10
 90% of the values are below 10
 10% of the values are above 10

To conclude our little section explaining percentiles, here is the definition of a percentile:

Percentile (in our case the 9.5): the value below which a given percentage (in our case 90%) of the data falls.
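
As a sanity check, we can feed the same 20 values to Oracle. A minimal sketch (note that Oracle's PERCENTILE_CONT interpolates with position = 1 + p * (n - 1) instead of the (n + 1) convention used above, so it returns 9.1 here rather than 9.5; both are accepted conventions and the difference fades away on large data sets):

SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY COLUMN_VALUE) pct_90
FROM   TABLE(sys.odcinumberlist(2,3,1,5,10,25,2,1,9,3,2,2,2,1,2,3,5,2,2,2));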

This data set is too small to be practical for deciding the CPU requirement. When we calculate percentiles for a real data set, like our example with 69,120 values, we will have a better approximation of the number of CPUs we need for our workload.

For our convenience we use the Oracle function PERCENTILE_DISC(0.9) to get the 90th percentile.

We are running this against our data set of 69,120 data points

SELECT PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY cpu_data) "Percentile_Disc"
FROM   cpu_standard;

Percentile_Disc
---------------
              5

SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY cpu_data) "Percentile_Cont"
FROM   cpu_standard;

Percentile_Cont
---------------
              5

In this case we got the same value of 5, however, there is a difference between PERCENTILE_DISC (discrete) and PERCENTILE_CONT (continuous). The discrete function returns a value that exists in the data set and the continuous function returns an interpolated value that is not necessarily in the data set.
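
A tiny illustration of that difference (a sketch on made-up values, not the post's data): on the four values 1, 2, 3, 4 the continuous median interpolates to 2.5 while the discrete median returns an actual member of the set, 2.

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY n) pct_cont  -- 2.5 (interpolated)
      ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY n) pct_disc  -- 2   (existing value)
FROM  (SELECT COLUMN_VALUE n FROM TABLE(sys.odcinumberlist(1,2,3,4)));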

How do we interpret our result of 5?

The 90th percentile is 5
90% of the values are below 5
10% of the values are above 5

For the 97th percentile:

SELECT ROUND(PERCENTILE_CONT(0.97) WITHIN GROUP (ORDER BY cpu_data)) "Percentile_Cont"
FROM   cpu_standard;

Percentile_Cont
---------------
              9

The 97th percentile is 9
97% of the values are below 9
 3% of the values are above 9

I am using PERCENTILE_CONT(0.97) instead of PERCENTILE_DISC(0.97) because I want a more accurate value even if that value does not exist in the data set; I then round the result with the ROUND function.

The interpretation in terms of CPU utilization

We can say that for my whole data set of 69,120 data points the 97th percentile is 9, and also that 3% of the values in my data set are above 9. In this case, if I were to move my database workload to a system with 9 CPUs, then 3% of the time I would be using more than those 9 CPUs and there would be waits. The problem here is that we are using the whole data set and we do not know the details of the distribution of the CPU utilization, meaning that we do not know whether that 3% above 9 CPUs is condensed into just a few hours or spread across the 8 days of collected data.

Why is it important to know how the utilization is distributed?

We have 8 days of data, which in hours is (8 * 24) = 192 hours; 3% of 192 hours is 5.76, or about 6 hours.
Imagine that the 3% above 9 CPUs of utilization is condensed into only 6 hours instead of being spread across the 8 days. The system would come to a halt when that first hour of CPU utilization starts using more than 9 CPUs, which is the total capacity of our imaginary new system.

Using the percentiles for the whole data set is not a good option for determining the CPUs required for our database workload. However, we can still use percentiles, can't we? What if we get percentiles by the hour and somehow overcome the problem of not knowing the distribution of the CPU utilization?

Let’s see…

How can we get the 97th percentile for each hour?

SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.97) WITHIN GROUP (ORDER BY cpu_data)) "97th PERCENTILE"
FROM   cpu_standard
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 1;

THE_HOUR      97th PERCENTILE
------------- ---------------
2016-09-06T19               6
2016-09-06T20               5
2016-09-06T21               4
2016-09-06T22               5
2016-09-06T23               4
...
...
2016-09-14T06               6
2016-09-14T07               9
2016-09-14T08              10
2016-09-14T09              14
2016-09-14T10              12
2016-09-14T11               6
2016-09-14T12               6
2016-09-14T13               3
2016-09-14T14              10
2016-09-14T15               7
2016-09-14T16               2
2016-09-14T17               3
2016-09-14T18               0

192 rows selected.

Let's materialize these results in a table and create a couple of charts.

CREATE TABLE cpu_standard_hour_perc
AS SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.97) WITHIN GROUP (ORDER BY cpu_data)) "97th PERCENTILE"
FROM   cpu_standard
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 1;

DESC cpu_standard_hour_perc

Name                  Null?    Type
--------------------- -------- ----------------------
THE_HOUR                       VARCHAR2(13)
97th PERCENTILE                NUMBER

[Chart: percentile97th]
The chart above shows time on the horizontal axis, labeled every 3 hours; the chart below is a close-up showing every hour for a specific section of the chart above.
[Chart: percentile97th-closeup]
How do we interpret these results?

We got one row per hour. We have 192 hours in our data set and we have calculated the 97th percentile for each hour.

Let's grab the first row shown in the output:

THE_HOUR      97th PERCENTILE
------------- ---------------
2016-09-06T19               6

During the hour from 7pm to 8pm on 06-SEP-2016:

The 97th percentile is 6
97% of the values are below 6
 3% of the values are above 6

Here is where we get into more detail on what is going on within the hour. Remember that our samples are taken every 10 seconds, which means that in one hour (3,600 seconds) we have 360 samples spaced 10 seconds apart. If 3% of the values in one hour are above 6, that means I had CPU utilization above 6 for 1.8 minutes. If those samples were contiguous, which would be the worst case, I would have the system waiting for 1.8 minutes.
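
We can check that count directly against the standardized table. A small sketch (the table and column names are the ones built in post #1; the count should come out to roughly 3% of the 360 samples in that hour):

SELECT COUNT(*)            samples_above_6
      ,COUNT(*) * 10 / 60  minutes_above_6   -- each sample represents 10 seconds
FROM   cpu_standard
WHERE  TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24') = '2016-09-06T19'
AND    cpu_data > 6;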

Now that I have one 97th percentile value for each hour, I can find the maximum 97th percentile value across all hours in the data set.

SELECT * FROM
(
SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.97) WITHIN GROUP (ORDER BY cpu_data)) "97th PERCENTILE"
FROM   cpu_standard
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;

THE_HOUR      97th PERCENTILE
------------- ---------------
2016-09-07T10              15
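
Since we already materialized the hourly percentiles into cpu_standard_hour_perc, the same busiest hour can be read from that table instead of recomputing it (a sketch):

SELECT * FROM
(
SELECT the_hour, "97th PERCENTILE"
FROM   cpu_standard_hour_perc
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;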

In terms of the 97th percentile, the result from this last query is the worst hour, or in other words the busiest hour, in which:

The 97th percentile is 15
97% of the values are below 15
 3% of the values are above 15

In order to fulfill this requirement we would need 15 CPUs, and we would still have up to 1.8 minutes of waits during this particular hour.

It is a good time to mention here that when we size, we do not size to 100% of capacity. In this case, if we need 15 CPUs, the system would need to leave extra room for unexpected spikes in utilization. In order to protect the system with a cushion of, let's say, 30%, we would need 21.4 CPUs,

because if 15 CPUs is 70% of capacity, then (15 * 100) / 70 = 21.4 CPUs.
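
The same cushion calculation as a quick query (a sketch; 0.70 is the assumed target utilization):

SELECT ROUND(15 / 0.70, 1) cpus_with_30pct_headroom
FROM   dual;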

In this case we have come up with 15 as the CPU requirement for our database workload, because by using the 97th percentile for each hour and taking the busiest hour, we are accepting waits of no more than 1.8 minutes within any hour.

What we are really doing here is cropping the spikes in CPU utilization. If we remember, we had a maximum of 27 CPUs:

SELECT * FROM 
(
SELECT the_hour, cpu_data
FROM cpu_standard_hourly
ORDER BY 2 DESC
)
WHERE rownum <=1;

THE_HOUR         CPU_DATA
------------- -----------
2016-09-14T13          27

and we are saying that 15 CPUs would be enough. This is because the 27 was a single sample of 27 active sessions; the CPU utilization dropped suddenly afterwards and there were no long waits. We don't really need 27 CPUs: if the system is asked for 27 CPUs and does not have them, it will queue the tasks, and there could be waits depending on how long the CPU resource is required, which in this case was no longer than the 10-second sample interval.

We have shown that using percentiles is a good method to estimate the CPUs required and that it can help us identify the busiest hours in our database workload.

If we were willing to have longer waits, we could use the 95th percentile instead of the 97th. With the 95th percentile we get a CPU value below which 95% of the values fall and above which 5% are. This means that we can have waits for 5% of the hour; 5% of each hour is 3 minutes.

We can build a table to show how long the waits could be according to each percentile value:

97th percentile -> longest wait is 1.8 min.
95th percentile -> longest wait is  3  min.
90th percentile -> longest wait is  6  min.
80th percentile -> longest wait is 12  min.
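
These numbers come from the fact that the worst-case wait within an hour at percentile p is (1 - p) * 60 minutes. A small sketch that derives the table above:

SELECT p * 100            pctl
      ,(1 - p) * 60       longest_wait_min
FROM  (SELECT COLUMN_VALUE p FROM TABLE(sys.odcinumberlist(0.97, 0.95, 0.90, 0.80)));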

The question is: how do we pick the percentile that is appropriate for our database workload?

Answer: It depends on how much pain (waits) we are willing to tolerate…

Stay tuned for the next blog post on sizing matters: “Sizing post #4: How much pain are we willing to tolerate?”

Author: Jorge Barba

Sizing post #2: The not so good mean, the bad median, and the ugly minimum and maximum

This is the second blog entry for a series of posts related to the topic of “sizing.”

This time we are going to chart the CPU usage and calculate the average (mean), median, minimum, and maximum from the ASH data we standardized in the previous post, "Sizing post #1: How to standardize ASH data for sizing."

In our previous post we ended up with a table called CPU_STANDARD with two columns, SAMPLE_TIME and CPU_DATA. SAMPLE_TIME contains equidistant times spaced 10 seconds apart, and the CPU_DATA column contains the number of active sessions for each sample time.

DESC cpu_standard

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SAMPLE_TIME                                        DATE
 CPU_DATA                                           NUMBER

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS'), cpu_data 
FROM cpu_standard
ORDER BY 1;

SAMPLE_TI    CPU_DATA
--------- -----------
...
...
2016-09-14T16:48:50           1
2016-09-14T16:49:00           1
2016-09-14T16:49:10           4
2016-09-14T16:49:20           1
2016-09-14T16:49:30          10
2016-09-14T16:49:40           1
2016-09-14T16:49:50           3
2016-09-14T16:50:00           1
2016-09-14T16:50:10           1
...
...

69120 rows selected.

The data in table CPU_STANDARD was collected from a Workload Repository with a retention period of only 8 days.

We can figure out the number of days by two methods.

1. Number of rows.

There are 69,120 rows in the CPU_STANDARD table. Each row represents a 10 second period because we have one sample every 10 seconds. There are 3,600 seconds in one hour which means there are 360 samples (one sample every 10 seconds) in each hour. Divide 69,120 by 360 to get the number of hours 69,120 / 360 = 192. Divide 192 by 24 hours to get the number of days. 192 / 24 = 8. The retention period was only 8 days.
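
The same arithmetic as a query against the table (a sketch):

SELECT COUNT(*) / 360 / 24 days_of_data   -- samples / samples-per-hour / hours-per-day
FROM   cpu_standard;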

2. Min and Max sample time.

SELECT MIN(sample_time) min_st, MAX(sample_time) max_st 
FROM cpu_standard;

MIN_ST    MAX_ST
--------- ---------
06-SEP-16 14-SEP-16

Our cpu data is from September 6th to September 14th which is a period of 8 days.

I recommend having a 45-day retention period, or at least 30 days, to make sure we capture the activity that cycles every month, for example a monthly backup or the month-end closing accounting processes. However, for the purposes of this blog post we can work with the 8 days of CPU data.

Visualization

How about we create a chart of our CPU data to visualize how much CPU our database is consuming? How can we chart 69,120 data points? Trying to chart 69,120 points in Excel may not work, and most likely we will get an "Application not responding" message. In fact, a chart with so many data points may not be as useful as we expect.

In order to get an idea of the workload of our database we can generate a chart, but instead of using every 10-second sample we can generate one data point for every 360 samples (one data point per hour).

How do we summarize 360 data points into just one? We could take the average (mean), the median, or the maximum value.

Let's use the maximum value of each hour to represent that hour in our chart. The following query truncates the date column to the hour and gets the maximum value of each hour:

SELECT TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data)
FROM cpu_standard
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 1;

...
...
2016-09-14T12             8
2016-09-14T13            27
2016-09-14T14            15
2016-09-14T15            12
2016-09-14T16            10
2016-09-14T17             7
2016-09-14T18             1

192 rows selected.

Now we have a more manageable data set for our chart. We can chart 192 data points easily in Excel.

We are going to first create a table with the hourly data

CREATE TABLE cpu_standard_hourly
AS SELECT TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data) CPU_DATA
FROM cpu_standard
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 1;

Once we have our table with only 192 data points, we export the data from SQL Developer to Excel format. If you haven't exported table data in SQL Developer, you will be surprised how easy it is; see the documentation: Database Express Edition 2 Day DBA 11.2, chapter 10, Exporting and Importing Metadata and Data.

When we open the exported data in Excel we have two columns of data, the column titled THE_HOUR and the CPU_DATA column. This is the data we are going to use for our chart. One very simple example of how to create a line chart is shown on this page: MS Excel 2016: How to Create a Line Chart.

From the following chart we can observe the minimum utilization at 1 and the maximum utilization at 27.

[Chart: hourly]

We can also get those values with queries

SELECT MIN(cpu_data) MIN_CPU, MAX(cpu_data) MAX_CPU
FROM cpu_standard_hourly;

    MIN_CPU     MAX_CPU
----------- -----------
          1          27

SELECT * FROM 
(
SELECT the_hour, cpu_data
FROM cpu_standard_hourly
ORDER BY 2
)
WHERE rownum <=1;

THE_HOUR         CPU_DATA
------------- -----------
2016-09-14T18           1

SELECT * FROM 
(
SELECT the_hour, cpu_data
FROM cpu_standard_hourly
ORDER BY 2 DESC
)
WHERE rownum <=1;

THE_HOUR         CPU_DATA
------------- -----------
2016-09-14T13          27

Remember that 27 is the maximum utilization within that hour, but the CPU was not necessarily at 27 active sessions the whole hour; it is just that the maximum sampled value reached 27. Because the hourly data hides the detail of what happens within the hour, we can chart the 10-second samples from the CPU_STANDARD table, not for the whole 8-day retention period but only for smaller time periods of interest, for example one hour.

Let's chart the 360 points of that one hour, "2016-09-14T13", which reached the maximum of 27 active sessions. The following query gets the data points:

SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') sample_time, cpu_data
FROM cpu_standard
WHERE TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24') = '2016-09-14T13'
ORDER BY 1;

...
...
2016-09-14T13           1
2016-09-14T13           1
2016-09-14T13           1
2016-09-14T13           1
2016-09-14T13           1
2016-09-14T13           1
2016-09-14T13           1

360 rows selected.

Let's create a table to hold exactly that data:

CREATE TABLE cpu_standard_14T13
AS SELECT sample_time, cpu_data
FROM cpu_standard
WHERE TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24') = '2016-09-14T13'
ORDER BY 1;

Exporting to Excel and charting, we get:

[Chart: cpu14t13-1]

The spike of 27 active sessions happened for only one sample; the rest are 1s and 2s.

Scrolling to the right we can see other periods of time

[Chart: cpu14t13-2]

There are some 3s, 4s, 6s, and 8s, and the majority of the samples are 1s.

So far, we have created charts for data points representing the maximum sample of every hour and we also created a chart with 360 data points for one single hour to observe the details of what happens regarding cpu utilization within the hour.

Let's get some other statistics.

The average (mean) and the median utilization of the database:

SELECT AVG(cpu_data)
FROM cpu_standard;

AVG(CPU_DATA)
-------------
  2.452748843

SELECT MEDIAN(cpu_data)
FROM cpu_standard;

MEDIAN(CPU_DATA)
----------------
               2

The question is: how can we use this information to determine the number of CPUs that I really need to support the load of my database, if it sometimes uses 27 and sometimes uses 1, with an average of 2.45 and a median of 2?

Using the average of 2.45 or the median of 2 CPUs as the requirement for this database is not realistic, because the database will go above 2 CPUs and generate a lot of waits at those specific times. Using the maximum of 27 is not realistic either, because I would be wasting resources: those 27 were reached only once, meaning it was only for a very short period of time, in a logon-storm fashion, and the utilization went back down really quickly.

Well, the answer is: we don't know yet. The not-so-good mean and the bad median with the ugly minimum and maximum don't give us the answer yet, but at least we have some knowledge about how our database behaves and how many CPUs are active at specific times. We can say we have a profile of our workload. This profile can vary from database to database in the organization depending on what applications are connecting to the database; in fact, the entity that defines the workload is the application or set of applications making use of the database.

Stay tuned for the next blog post on sizing matters.

Author: Jorge Barba

 

Sizing post #1: How to standardize ASH data for sizing

This is a first blog entry for a series of posts related to the topic of “sizing.”

The standardization of ASH data for sizing consists of aggregating the number of sessions on CPU, adding the 0's (zeroes) to this aggregated data, and filling in with sample times to obtain equidistant sample times 10 seconds apart. The original idea for standardization was not mine; it was my friend Mauro Pagano's, who allowed me to use his idea for my blog.

When we want to know the CPU footprint of our database we can do some research using the data from AWR. The sample data stored in AWR that comes from ASH can be used to measure the CPU footprint of a database among several other measures.

The CPU data from ASH is sampled only when there is activity, meaning that periods of time without activity do not get recorded in ASH. This causes a problem when we want a time series with equidistant points to visualize or analyze the CPU usage of a database.

This article shows one of several different ways to standardize the aggregation of ASH data into equidistant 10-second sample times. This includes adding the times where there is no activity, or in other words the 0 (zero) values for the CPU data.

Since AWR is our source of data, we need to make sure it is configured appropriately for our purposes.

We want to make sure the snapshot interval is 60 minutes and the retention period is 45 days in addition to verifying that the STATISTICS_LEVEL parameter is set to TYPICAL or ALL.

The following command shows the value for the parameter statistics_level

SQL> show parameter statistics_level

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
statistics_level                     string      TYPICAL

If the statistics_level parameter is not set to TYPICAL or ALL we can change it using the following command:

ALTER SYSTEM SET statistics_level = TYPICAL scope=both;

The following query shows the value for snapshot_interval and retention period

SELECT EXTRACT(DAY    FROM snap_interval)*24*60+
       EXTRACT(HOUR   FROM snap_interval)   *60+
       EXTRACT(MINUTE FROM snap_interval)           INTERVAL_IN_MINUTES,
       (EXTRACT(DAY    FROM retention)*24*60+
       EXTRACT(HOUR   FROM retention)*60+
       EXTRACT(MINUTE FROM retention))/(24*60)      RETENTION_IN_DAYS
FROM dba_hist_wr_control;

INTERVAL_IN_MINUTES RETENTION_IN_DAYS
------------------- -----------------
                 60                 8

If the snapshot interval is not 60 minutes or the retention is not 45 days, we can change them using the following command:

BEGIN
  DBMS_WORKLOAD_REPOSITORY.modify_snapshot_settings(
    retention => 64800,       -- Minutes (= 45 Days). Current value retained if NULL.
    interval  => 60);         -- Minutes. Current value retained if NULL.
END;
/

SELECT EXTRACT(DAY    FROM snap_interval)*24*60+
       EXTRACT(HOUR   FROM snap_interval)   *60+
       EXTRACT(MINUTE FROM snap_interval)           INTERVAL_IN_MINUTES,
       (EXTRACT(DAY    FROM retention)*24*60+
       EXTRACT(HOUR   FROM retention)*60+
       EXTRACT(MINUTE FROM retention))/(24*60)      RETENTION_IN_DAYS
FROM dba_hist_wr_control;

INTERVAL_IN_MINUTES RETENTION_IN_DAYS
------------------- -----------------
                 60                45

V$ACTIVE_SESSION_HISTORY displays sampled session activity in the database. It contains snapshots of active database sessions taken once a second. A database session is considered active if it was on the CPU or was waiting for an event that didn’t belong to the Idle wait class. This view contains one row for each active session per sample and returns the latest session sample rows first.

DBA_HIST_ACTIVE_SESS_HISTORY view displays the history of the contents of the in-memory active session history of recent system activity. This view contains snapshots of V$ACTIVE_SESSION_HISTORY.

The following query shows the sample time and the number of samples at that specific sample time. This is interpreted as 3 sessions active at time 2016-12-20T16:00:53, 5 sessions active at time 2016-12-20T16:11:34, and so on.

SELECT   TO_CHAR(sample_time, 'YYYY-MM-DD"T"HH24:MI:SS') SAMPLE_TIME
        ,COUNT(*)  NUMBER_OF_SAMPLES
FROM     dba_hist_active_sess_history
WHERE    session_state='ON CPU'
GROUP BY sample_time
ORDER BY 1;

SAMPLE_TIME         NUMBER_OF_SAMPLES
------------------- -----------------
...
...
2016-12-20T16:00:53           3
2016-12-20T16:11:34           5
2016-12-20T16:11:44           1

8411 rows selected.

We want to extract only the rows related to the sessions 'ON CPU' and discard the sessions on wait events, so we create a table and insert the data from dba_hist_active_sess_history. Because the sample times are not equidistant, meaning the samples can happen now, then 9 seconds later, then 11 seconds later, we are going to call the table NON_STANDARD_TIME.

DROP TABLE non_standard_time;
CREATE TABLE non_standard_time
(sample_time  DATE,
 cpu_data     NUMBER);

DESC non_standard_time
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
SAMPLE_TIME                                        DATE
CPU_DATA                                           NUMBER

INSERT INTO non_standard_time
SELECT   sample_time, COUNT(*) cpu_data
FROM     dba_hist_active_sess_history
WHERE    session_state = 'ON CPU'
GROUP BY sample_time
ORDER BY 1;

8414 rows created.

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
      ,cpu_data 
FROM   non_standard_time
ORDER BY 1;

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T15:33:20           1
2016-12-20T15:43:41           1
2016-12-20T16:00:53           3
2016-12-20T16:11:34           5
2016-12-20T16:11:44           1
2016-12-20T16:51:08           1
2016-12-20T16:56:08           1
2016-12-20T16:56:18           1

8414 rows selected.

We need to standardize the data. The data right now is not equidistant and does not include the time where the samples are 0, meaning the ASH data does not get recorded when there is no activity. In order for us to have an equidistant sample time and include the times when there is no activity we standardize the data as follows:

The following query will show the minimum and maximum times in AWR view dba_hist_active_sess_history

COL min_time format a30
COL max_time format a30
SELECT MIN(to_char(sample_time,'YYYY-MM-DD"T"HH24:MI:SS')) min_time
      ,MAX(to_char(sample_time,'YYYY-MM-DD"T"HH24:MI:SS')) max_time
FROM   dba_hist_active_sess_history
WHERE  session_state = 'ON CPU';

MIN_TIME                       MAX_TIME
------------------------------ ------------------------------
2016-11-23T01:16:19            2016-12-20T16:56:18

This is the same information from our non_standard_time table

SELECT MIN(to_char(sample_time,'YYYY-MM-DD"T"HH24:MI:SS')) min_time
      ,MAX(to_char(sample_time,'YYYY-MM-DD"T"HH24:MI:SS')) max_time
FROM   non_standard_time;

MIN_TIME                       MAX_TIME
------------------------------ ------------------------------
2016-11-23T01:16:19            2016-12-20T16:56:18

Start Time: 2016-11-23T01:16:19 -> 2016-11-23T01:16:10

Table should have:

2016-11-23T01:16:10
2016-11-23T01:16:20
2016-11-23T01:16:30
2016-11-23T01:16:40
...
...

Until around
2016-12-20T16:56:10

1. Create a new table with the sample time starting at 00 seconds and a 10-second interval.
2. The sample time should not have gaps in this new table, meaning all 10-second intervals should be there.
3. Fill in this new table with the ASH data.
4. Fill in with 0s when there is no row for a specific 10-second interval.

DROP TABLE cpu_standard;
CREATE TABLE cpu_standard
(sample_time DATE,
 cpu_data    NUMBER);

The following PL/SQL block fills the table with 0s for cpu_data column and sample_time at 10s intervals

DECLARE
 sample_10s DATE := TO_DATE('2016-11-23T01:16:10','YYYY-MM-DD"T"HH24:MI:SS');
BEGIN
LOOP
 INSERT INTO cpu_standard
 VALUES (sample_10s, 0);
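 -- advance 10 seconds: 1 / (24 * 60 * 6) of a day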
 sample_10s := sample_10s + ( 1 / ( 24 * 60 * 6 ) );
 IF (sample_10s > TO_DATE('2016-12-20T16:56:10', 'YYYY-MM-DD"T"HH24:MI:SS'))
 THEN EXIT;
 END IF;
END LOOP;
END;
/

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
 ,cpu_data
FROM cpu_standard
ORDER BY 1;

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T16:55:40           0
2016-12-20T16:55:50           0
2016-12-20T16:56:00           0
2016-12-20T16:56:10           0

238921 rows selected.

The table at this point has only predefined dates with 10 seconds distance between each row and all the values are 0.

We need to bring in the data from AWR into this table

The rule we are going to follow is to truncate the last digit of the seconds; for example, for a sample time at 09 seconds we take that value for our 00 second.

seconds 
00 -> 00
01 -> 00
02 -> 00
...
08 -> 00
09 -> 00
--------
10 -> 10
11 -> 10
12 -> 10
...
18 -> 10
19 -> 10
--------
50 -> 50
51 -> 50
...
58 -> 50
59 -> 50
--------

We are simply replacing the last digit with a 0.
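
An equivalent way to express the same rounding (a sketch): subtract the remainder of the seconds modulo 10, expressed as a fraction of a day (86,400 seconds).

SELECT TO_CHAR(sample_time
               - MOD(TO_NUMBER(TO_CHAR(sample_time,'SS')), 10) / 86400
              ,'YYYY-MM-DD"T"HH24:MI:SS') sample_time_std
FROM   non_standard_time
ORDER BY 1;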

Before we update the table cpu_standard, let's verify that the data treatment we are doing is correct:

SELECT *
FROM 
(
SELECT TO_CHAR( TRUNC( sample_time, 'mi' ) + 
      ( TRUNC( ( TO_NUMBER( TO_CHAR( sample_time, 'ss' ) ) / 10 ) ) * ( 1 / ( 24 * 60 * 6 ) ) ),
      'YYYY-MM-DD"T"HH24:MI:SS') sample_time 
FROM  non_standard_time
ORDER BY sample_time DESC
)
WHERE rownum<=4
ORDER BY 1;

SAMPLE_TIME
-------------------
2016-12-20T16:11:40
2016-12-20T16:51:00
2016-12-20T16:56:00
2016-12-20T16:56:10

Compared to the data in non_standard_time we can observe that the last digit has been converted to 0.

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T16:11:44           1
2016-12-20T16:51:08           1
2016-12-20T16:56:08           1
2016-12-20T16:56:18           1

8414 rows selected.

Updating the cpu_standard table to bring the non-zero values into their respective 10-second samples.

Up to this point we have two tables:

CPU_STANDARD

SQL> desc cpu_standard
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SAMPLE_TIME                                        DATE
 CPU_DATA                                           NUMBER

NON_STANDARD_TIME has values in sample time that are not equidistant

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
      ,cpu_data 
FROM   non_standard_time
ORDER BY 1;

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T15:33:20           1
2016-12-20T15:43:41           1
2016-12-20T16:00:53           3
2016-12-20T16:11:34           5
2016-12-20T16:11:44           1
2016-12-20T16:51:08           1
2016-12-20T16:56:08           1
2016-12-20T16:56:18           1

8414 rows selected.

CPU_STANDARD table has the data with sample time equidistant 10 seconds apart on each row and all values as 0s

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
      ,cpu_data
FROM   cpu_standard
ORDER BY 1;

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T16:55:40           0
2016-12-20T16:55:50           0
2016-12-20T16:56:00           0
2016-12-20T16:56:10           0

238921 rows selected.

The next step would be to merge the data in a way that CPU_STANDARD has the CPU_DATA from the NON_STANDARD_TIME table.

There may be several ways to do it; let's just do it explicitly, step by step, to better understand what is happening.

We are going to add a new column to the NON_STANDARD_TIME table to hold the standard time

ALTER TABLE non_standard_time
ADD (sample_time_std  DATE);

We update that newly added column to have the corresponding standardized date value:

UPDATE non_standard_time
SET    sample_time_std =
 TRUNC( sample_time, 'mi' ) + 
      ( TRUNC( ( TO_NUMBER( TO_CHAR( sample_time, 'ss' ) ) / 10 ) ) * ( 1 / ( 24 * 60 * 6 ) ) );

8414 rows updated.

COMMIT;

The table NON_STANDARD_TIME now has a column with the standardized sample time.

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
      ,cpu_data
      ,TO_CHAR(sample_time_std,'YYYY-MM-DD"T"HH24:MI:SS') sample_time_std
FROM   non_standard_time
ORDER BY 1;

SAMPLE_TIME            CPU_DATA SAMPLE_TIME_STD
------------------- ----------- -------------------
...
...
2016-12-20T16:11:44           1 2016-12-20T16:11:40
2016-12-20T16:51:08           1 2016-12-20T16:51:00
2016-12-20T16:56:08           1 2016-12-20T16:56:00
2016-12-20T16:56:18           1 2016-12-20T16:56:10

8414 rows selected.

We now have the same standardized sample times in both tables, NON_STANDARD_TIME and CPU_STANDARD.

We can join the two tables to bring the cpu_data values into CPU_STANDARD

UPDATE cpu_standard
SET    cpu_data  = ( SELECT MAX(cpu_data) 
                     FROM non_standard_time ns
                     WHERE  ns.sample_time_std = cpu_standard.sample_time)
WHERE EXISTS (
  SELECT 'X'
  FROM non_standard_time ns
  WHERE cpu_standard.sample_time = ns.sample_time_std);

8414 rows updated.

COMMIT;

SELECT TO_CHAR(sample_time,'YYYY-MM-DD"T"HH24:MI:SS') sample_time
      ,cpu_data
FROM   cpu_standard
ORDER BY 1;

SAMPLE_TIME            CPU_DATA
------------------- -----------
...
...
2016-12-20T16:55:40           0
2016-12-20T16:55:50           0
2016-12-20T16:56:00           1
2016-12-20T16:56:10           1

238921 rows selected.
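
For reference, one of the "several ways" mentioned earlier (a sketch): once NON_STANDARD_TIME has the sample_time_std column, the final correlated UPDATE can also be written as a single MERGE.

MERGE INTO cpu_standard cs
USING ( SELECT sample_time_std, MAX(cpu_data) cpu_data
        FROM   non_standard_time
        GROUP BY sample_time_std ) ns
ON ( cs.sample_time = ns.sample_time_std )
WHEN MATCHED THEN UPDATE SET cs.cpu_data = ns.cpu_data;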

The table cpu_standard can be used to graph the cpu activity of the database and to further analyze the data for sizing purposes.

Author: Jorge Barba