eDB360 new features (March 2017)

Carlos Sierra's Tools and Tips

As many of you know, eDB360 is a free tool that provides a 360-degree view of an Oracle database without any installation. A new version is available like once per month, but occasionally a large number of enhancements are implemented at once. This new release v1708 (March 25, 2017) includes several new features requested recently by some users of the tool, thus the need to blog about what is new:

  1. Reducing the scope of eDB360 is now possible without having to generate a custom configuration file. Prior to this version, if a user wanted to generate output for let’s say AWR reports only (section 7a), the tool needed a custom.sql file with line DEF edb360_sections = ‘7a’;. Then we would pass to edb360.sql as 2nd execution parameter the name of this custom configuration file (too cumbersome!). Starting on v1708, we can directly pass to edb360.sql the section that we desire…

View original post 740 more words

Sizing post #5: How to map our existing processing power to new hardware.

In our last blog post “Sizing post #4: How much pain are we willing to tolerate?” we talked about how much pain we can tolerate so we can decide on which percentile we want to size our database workload. It is not a size fits all because it depends on the database workload profile, some databases would be fine with 90th percentile, some others would need 97th, 98th or even 99th percentile.

Once we have decided the number of CPUs required for our database workload we know its footprint and we know more of how the resources are being used. Using this information we can move our database to another hardware and be confident that the target hardware is going to be able to support our database workload.

When we move from a system to another system with different processor or even same processor but different vendor the processing power may not be the same. When we move to a different vendor but the same processor the difference may not be relevant and we could map 1 to 1, meaning, if we know our CPU required is 32, then a target system from different vendor but same processor most likely will be fine with also 32 CPUs, however, if we move to a system with different processor, we most likely need a way to map the number of CPUs required to the new hardware as the processing power may differ greatly.

There could be that the processing power of the new server is more powerful than the existing server. In this case we would need to map the processing power. For example if the new server is twice as powerful then we map 2 to 1. In other words if we had a server where my CPU required is 32, then in the new server I would need 16 CPUs.

On the other hand if the new server has half the processing power of the existing server we map 1 to 2 requiring now 64 CPUs for my imaginary example database workload.

Question is: How to map processing power between existing hardware and new hardware?

Answer: One way to map the existing processing power to a new server processing power is using the RESULT value from SPEC.

“The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops benchmark suites and also reviews and publishes submitted results from our member organizations and other benchmark licensees.”

Reference: https://www.spec.org

SPEC has several benchmarks to test the servers. We are using the results from the benchmark called SPEC CPU2006 CINT2006.

“Designed to provide performance measurements that can be used to compare compute-intensive workloads on different computer systems, SPEC CPU2006 contains two benchmark suites: CINT2006 for measuring and comparing compute-intensive integer performance, and CFP2006 for measuring and comparing compute-intensive floating point performance.”

Lets use an example to explain the mapping.

For our example we are using a BL460c Gen9 server as our existing hardware and a DL360 Gen9 as our new hardware.

Source (existing) server: BL460c Gen9
Target (new)      server: DL360  Gen9

SYSTEM                                                       CORES THREADS       CPUs RESULT       RP_CORE      RP_CPU
------------------------------------------------------------ ----- --------- -------- -------- ----------- -----------
ProLiant DL360 Gen9 (2.40 GHz, Intel Xeon E5-2640 v3)        16    2               32 749             46.8        23.4
ProLiant BL460c Gen9 (2.30 GHz, Intel Xeon E5-2699 v3)       18    2               36 678             37.7        18.8

Assume we identified we required 24 CPUs for our database workload when running our database on a BL460c Gen9 server with the Intel Xeon E5-2699 v3 processor, which has a “result per cpu thread” of 18.8.

How many CPU threads are needed in the new DL360 Gen9 with Intel Xeon E5-2640 v3 to cover for those 24 CPUs?

Because the “result per cpu thread” is higher than the one in the existing hardware we can start thinking that we would need less CPUs in the new hardware.

Lets start by answering how much more powerful is the new hardware compared to the existing hardware?

We can answer that by dividing the RP_CPU of the new hardware by the RP_CPU of the existing hardware

  23.4 / 18.8 = 1.24

The new hardware is 1.24 times more powerful than the existing hardware.

In order to get the ratio we divide the RP_CPU of the old hardware by the RP_CPU of the new hardware

  18.8 / 23.4 = 0.8034

Then we multiply the number of required CPUs (24) times the ratio

  24 * (18.8 / 23.4) = 19.28

There is our answer. We need 19.28 CPUs in the new DL 360 server to cover for the 24 CPUs in the existing BL460 server.

Obviously we can’t have fractions of a CPU so we just say 20 CPUs.

24 CPUs in a BL460c    ~    20 CPUs in a DL360

When doing this type of conversion make sure you have the correct precessor because the servers BL460c and DL360 could be configured with several choices and each would make a difference in the “result per cpu thread” of the server.

The value RESULT which is the number that comes out of the benchmark from SPEC is a comparative value among servers. We can divide that to get “result per cpu thread” and be able to come up with an equivalent number of CPUs from one server to another.

Remember that we have been working with required CPUs which is the number of CPUs we found out from looking at and analyzing the standardized ASH/AWR data and that value is representing the database workload, however, we still need to account for headroom for unexpected spikes and potential growth. For example if we required 20 CPUs in our target server DL360 and we want a 30% headroom for unexpected spikes and growth, this means our 20 CPUs represent 70%, in order to get a total we calculate like this

 Total num CPUs =  ( 20 * 100) / 70 = 28.5 ~ 29 CPUs

For our example we would need 29 CPUs

When doing a sizing for the cloud we are going to need the specifications of the server that is in the cloud. If we know our required number of CPUs for our database workload but we don’t know the server model and processor of the hardware used in the cloud it would not be possible to use the SPEC data to map between processing power from source to target servers.

Lets assume that the cloud is using a DL360 Gen9 with Intel Xeon E5-2640 v3 processor. In this case we would look for a service that provides at least the 29 CPUs that we need.

Stay tuned for more on sizing…

Author: Jorge Barba

Sizing post #4: How much pain are we willing to tolerate?

We have gone through 3 posts already and have learned how to standardize ASH data for sizing, reviewed some basic statistics like mean, median, maximum and minimum and the use of percentiles to calculate the CPU requirement for a single instance database.

In this post we want to show a way to figure out what is the appropriate percentile that we can use to size our database workload.

We have been using a single instance Oracle Database for our examples and we are going to continue to use single instance database workload to keep things simple for now. The example we are going to use is a different database workload with a lot less active sessions to make it even simpler to understand.

Table: CPU_STANDARD_EX2 (stands for example 2)

Lets grab some statistics on this database workload.

SELECT MIN(sample_time) START_TIME, MAX(sample_time) END_TIME, ROUND(MAX(sample_time) - MIN(sample_time)) Retention
FROM cpu_standard_ex2;

START_TIM END_TIME    RETENTION
--------- --------- -----------
04-AUG-16 19-AUG-16          15

We had our retention period set to 15 days thus we have data collected for 15 days starting Aug 4th and ending Aug 19th.

SELECT MIN(cpu_data) MINIMUM, MAX(cpu_data) MAXIMUM
FROM   cpu_standard_ex2;

    MINIMUM     MAXIMUM
----------- -----------
          0           9

This database has a minimum CPU utilization of 0 and a maximum of 9 active sessions.

SELECT AVG(cpu_data) AVERAGE, MEDIAN(cpu_data) MEDIAN
FROM   cpu_standard_ex2;

    AVERAGE      MEDIAN
----------- -----------
.3778628907           0

The average is not even 1 active session and the median is 0. The median is the 50th percentile and we can state that the most popular value in this database workload is 0.

To have a visualization we can summarize the data into hourly data points with the maximum value per hour.

SELECT   TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data) CPU_DATA
FROM     cpu_standard_ex2
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 1;

THE_HOUR         CPU_DATA
------------- -----------
2016-08-04T06           1
2016-08-04T07           1
2016-08-04T08           1
2016-08-04T09           1
...
...
2016-08-18T21           2
2016-08-18T22           3
2016-08-18T23           2
2016-08-19T00           2
2016-08-19T01           2
2016-08-19T02           1
2016-08-19T03           1
2016-08-19T04           1

359 rows selected.

We create a table with the hourly cpu data to create a line chart in excel as explained in Sizing post #2: The not so good mean, the bad median, and the ugly minimum and maximum.

CREATE TABLE cpu_standard_ex2_hourly AS
SELECT   TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data) CPU_DATA
FROM     cpu_standard_ex2
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 1;

Here is the chart for our example 2 database workload:
hourly_ex2
Because we are summarizing with the maximum, most of the data points will not be 0 but at least 1, however, remember that it does not mean that the whole hour was 1 active session. Lets take a look at some detail on a specific hour.

CREATE TABLE cpu_standard_ex2_aug_16 AS
SELECT   TO_CHAR(sample_time, 'YYYY-MM-DD"T"HH24:MI:SS') THE_HOUR, cpu_data
FROM     cpu_standard_ex2
WHERE    TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') = '2016-08-16T05' 
ORDER BY 1;

example2_10s
See how the majority of our data points are 0 with some at 1.

Lets get the hour where we had our maximum of 9.

SELECT * FROM
(
SELECT   TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data) CPU_DATA
FROM     cpu_standard_ex2
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <=1;

THE_HOUR         CPU_DATA
------------- -----------
2016-08-06T05           9

Generating table and chart:

CREATE TABLE cpu_standard_ex2_aug_06 AS
SELECT   TO_CHAR(sample_time, 'YYYY-MM-DD"T"HH24:MI:SS') THE_HOUR, cpu_data
FROM     cpu_standard_ex2
WHERE    TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') = '2016-08-06T05' 
ORDER BY 1;

example2_spike
There is a spike in CPU utilization up to 9 active sessions on 2016-08-06 at 05:31:20. This is a one 10 second sample that had 9 active sessions, around that time the cpu utilization was 1 and 2.

It won’t be realistic to size with 9 because the workload is usually 0, 1 or 2 and it would be too much resource to size to the maximum of 9.

If we remember from our previous post

97th percentile -> longest wait is 1.8 min.
95th percentile -> longest wait is  3  min.
90th percentile -> longest wait is  6  min.
80th percentile -> longest wait is 12  min.

If we use our 95th percentile because we are willing to tolerate a maximum of 3 min wait, the calculation will be:

SELECT * FROM
(
SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY cpu_data)) "95th PERCENTILE"
FROM   cpu_standard_ex2
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;

THE_HOUR      95th PERCENTILE
------------- ---------------
2016-08-04T23               2

A 90th percentile would result in 1.

SELECT * FROM
(
SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY cpu_data)) "90th PERCENTILE"
FROM   cpu_standard_ex2
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;

THE_HOUR      90th PERCENTILE
------------- ---------------
2016-08-04T22               1

In fact using 97th percentile we also get 2 CPUs, same as with 95th percentile. If we get the results for various percentile values we can build a table like the following:

100th percentile ->          no wait         -> 9 CPUs
 99th percentile -> longest wait is  36 sec. -> 4 CPUs
 97th percentile -> longest wait is 1.8 min. -> 2 CPUs
 95th percenitle -> longest wait is  3  min. -> 2 CPUs
 90th percentile -> longest wait is  6  min. -> 1 CPU
 80th percentile -> longest wait is 12  min. -> 1 CPU

Here is where we come back to our question of what percentile to use.

It depends on how much pain (wait) we are willing to tolerate.

If we choose 4 CPUs for our system our potentially longest wait on our worst hour would be 36 seconds; if we are willing to tolerate a wait of 1.8 min in our worst hour we can choose the 2 CPUs. For this particular example we observe that going from 36 seconds to 1.8 min. will save 2 CPUs out of 4 CPUs, meaning I would have to double the amount of resources to cover for the extra coverage.

Lets say we pick the 2 CPUs and now we want to know how many times the CPU utilization went above those 2 CPUs. We can build a table to hold the number of samples above a specific number of CPUs. We need a procedure to populate such a table and a query to show the result:

CREATE TABLE samples_above
(cpu_required      NUMBER,
 end_date          DATE,
 samples_above     NUMBER
);

CREATE OR REPLACE PROCEDURE get_samples_above
(
 p_cpu_req          IN  samples_above.cpu_required%TYPE
) IS
BEGIN
   INSERT INTO samples_above
   SELECT p_cpu_req, TRUNC(sample_time, 'HH24'), COUNT(cpu_data) samples_above
   FROM cpu_standard_ex2
   WHERE cpu_data > p_cpu_req
   GROUP BY TRUNC(sample_time, 'HH24');
   DBMS_OUTPUT.PUT_LINE(' '); 
   DBMS_OUTPUT.PUT_LINE('get_samples_above: CPU req ' || p_cpu_req);
   COMMIT;
END;
/

TRUNCATE TABLE samples_above;

SET SERVEROUTPUT ON
EXEC get_samples_above(2)

SELECT   cpu_required, TO_CHAR(end_date, 'YYYY-MM-DD"T"HH24') end_date_hour, samples_above
FROM     samples_above
ORDER BY cpu_required, samples_above DESC;

CPU_REQUIRED END_DATE_HOUR SAMPLES_ABOVE
------------ ------------- -------------
           2 2016-08-16T09             6
           2 2016-08-17T22             4
           2 2016-08-16T23             3
           2 2016-08-08T22             3
           2 2016-08-07T00             3
           2 2016-08-12T22             2
           2 2016-08-18T22             2
           2 2016-08-15T00             2
           2 2016-08-13T17             1
           2 2016-08-14T00             1
           2 2016-08-10T23             1
           2 2016-08-10T22             1
           2 2016-08-14T17             1
           2 2016-08-09T22             1
           2 2016-08-07T11             1
           2 2016-08-06T06             1
           2 2016-08-06T16             1
           2 2016-08-06T05             1
           2 2016-08-06T14             1
           2 2016-08-05T22             1
           2 2016-08-12T23             1
           2 2016-08-08T01             1
           2 2016-08-08T07             1
           2 2016-08-06T09             1
           2 2016-08-06T02             1
           2 2016-08-15T22             1
           2 2016-08-07T09             1
           2 2016-08-13T18             1

28 rows selected.

How do we interpret the output?

The busiest hour is 2016-08-16T09 with 6 samples above 2. we are referring to standardized ASH/AWR samples, the ones that are 10 seconds equidistant. There are 360 samples in one hour. If we have 6 samples above 2 CPUs and we assume that those samples are continuous, our longest wait on that hour would be 60 seconds.

If the samples above are not continuous then the waits would be less time. Lets find out if our samples above were continuous or spread within the hour.

CREATE TABLE how_long
(cpu_required    NUMBER,
 end_date        DATE,
 how_long_sec    NUMBER,
 how_long_min    NUMBER,
 max_sample      NUMBER
);

CREATE OR REPLACE PROCEDURE get_how_long
(
 p_cpu_req          IN  samples_above.cpu_required%TYPE
) IS
  CURSOR c_10s_t IS
    SELECT sample_time, cpu_data 
    FROM cpu_standard_ex2
    ORDER BY sample_time; 
  v_end_date            cpu_standard_ex2.sample_time%TYPE;  -- to fetch date
  v_store_end_date      cpu_standard_ex2.sample_time%TYPE;  -- to store the final list of dates where utilization went above #CPUs
  v_on_cpu              cpu_standard_ex2.cpu_data%TYPE;     -- to fetch on_cpu
  v_count               NUMBER := 0;  -- to count how many samples the CPU utilization stayed above #CPUs
  v_max_sample          NUMBER := 0;  -- to hold the max sample in each entry of org6_how_long_t
BEGIN
  OPEN c_10s_t;
  LOOP
    FETCH c_10s_t into v_end_date, v_on_cpu;
    EXIT WHEN c_10s_t%NOTFOUND;
    IF v_on_cpu > p_cpu_req THEN
      v_store_end_date := v_end_date;
      v_count      := 1;
      v_max_sample := v_on_cpu;  
      LOOP
        FETCH c_10s_t into v_end_date, v_on_cpu;
        EXIT WHEN c_10s_t%NOTFOUND;
        IF v_on_cpu > p_cpu_req THEN
          v_count := v_count + 1;
          IF v_on_cpu > v_max_sample THEN
            v_max_sample := v_on_cpu;
          END IF;
        ELSE
          INSERT INTO how_long VALUES(p_cpu_req, v_store_end_date, v_count*10, (v_count*10)/60, v_max_sample);
          EXIT;
        END IF;
      END LOOP;
    END IF;
  END LOOP;
  CLOSE c_10s_t; 
  DBMS_OUTPUT.PUT_LINE(' ');
  DBMS_OUTPUT.PUT_LINE('get_how_long: CPU req ' || p_cpu_req);
  DBMS_OUTPUT.PUT_LINE('get_how_long: Data in mn_how_long_t ');
  COMMIT;
END;
/

TRUNCATE TABLE how_long;
EXEC get_how_long(2);

SELECT cpu_required, to_char(end_date, 'YYYY-MM-DD"T"HH24:MI:SS') "TIME", how_long_sec "SECONDS", round(how_long_min, 2) "MINUTES", max_sample "MAX_SAMPLE" 
FROM how_long
ORDER BY cpu_required, seconds DESC, max_sample DESC;

CPU_REQUIRED TIME                    SECONDS     MINUTES  MAX_SAMPLE
------------ ------------------- ----------- ----------- -----------
           2 2016-08-16T09:47:40          20         .33           5
           2 2016-08-17T22:00:30          20         .33           4
           2 2016-08-12T22:00:40          20         .33           4
           2 2016-08-06T05:31:20          10         .17           9
           2 2016-08-16T09:49:30          10         .17           5
           2 2016-08-16T09:50:30          10         .17           4
           2 2016-08-16T09:48:10          10         .17           4
           2 2016-08-16T09:51:20          10         .17           4
           2 2016-08-07T00:58:10          10         .17           3
           2 2016-08-07T09:16:50          10         .17           3
...
...

Each row in the table tells us the time the samples were above the number of CPUs and how long it took to go down again below the number of CPUs.

Our assumption was not true, the samples above 2 were not continuous, otherwise, we would have seen 60 seconds at the top but we see 20 seconds instead. At most we had 2 continuous samples.

We got the number of samples above 2 CPUs and how long it takes up above those 2 CPUs.

We can get the same information for 1 CPU:

TRUNCATE TABLE samples_above;

SET SERVEROUTPUT ON
EXEC get_samples_above(1)

SELECT   cpu_required, TO_CHAR(end_date, 'YYYY-MM-DD"T"HH24') end_date_hour, samples_above
FROM     samples_above
ORDER BY cpu_required, samples_above DESC;

CPU_REQUIRED END_DATE_HOUR SAMPLES_ABOVE
------------ ------------- -------------
           1 2016-08-09T22            35
           1 2016-08-17T23            26
           1 2016-08-12T22            25
           1 2016-08-06T06            24
           1 2016-08-05T23            23
           1 2016-08-08T22            23
           1 2016-08-08T01            22
           1 2016-08-05T01            22
...
...
           1 2016-08-12T03             1
           1 2016-08-18T06             1
           1 2016-08-14T04             1
           1 2016-08-06T02             1

154 rows selected.

The first row shows that there are 35 samples above 1 CPU. There are 360 samples in one hour, thus 35 samples represent 9.7%, very close to 10% which is 6 minutes.

Tying back to our percentiles table:

100th percentile ->          no wait         -> 9 CPUs
 99th percentile -> longest wait is  36 sec. -> 4 CPUs
 97th percentile -> longest wait is 1.8 min. -> 2 CPUs
 95th percenitle -> longest wait is  3  min. -> 2 CPUs
 90th percentile -> longest wait is  6  min. -> 1 CPU   <<<<<<  80th percentile -> longest wait is 12  min. -> 1 CPU

The 1 CPU is when we use 90th percentile, and potentially our longest wait could be 6 min.

Having the system not able to provide enough CPU resource for 6 minutes seems too long (we are not saying that a session would have to wait 6 min before it gets connected or have a response back, just that we are requesting more CPUs than CPUs available), therefore, we would not say that 1 CPU is enough to support this database workload. We can investigate if the samples above 1 are continuous.

TRUNCATE TABLE how_long;
EXEC get_how_long(1);

SELECT * FROM
(
SELECT cpu_required, to_char(end_date, 'YYYY-MM-DD"T"HH24:MI:SS') "TIME", how_long_sec "SECONDS", round(how_long_min, 2) "MINUTES", max_sample "MAX_SAMPLE" 
FROM how_long
ORDER BY cpu_required, seconds DESC, max_sample DESC
)
WHERE ROWNUM <=10;

CPU_REQUIRED TIME                    SECONDS     MINUTES  MAX_SAMPLE
------------ ------------------- ----------- ----------- -----------
           1 2016-08-09T22:00:30         180           3           3
           1 2016-08-17T22:00:10         100        1.67           4
           1 2016-08-17T23:03:00          80        1.33           2
           1 2016-08-13T06:02:00          70        1.17           2
           1 2016-08-14T06:08:50          60           1           2
           1 2016-08-17T23:33:50          60           1           2
           1 2016-08-05T01:20:40          60           1           2
           1 2016-08-12T22:00:30          50         .83           4
           1 2016-08-06T16:00:50          50         .83           3
           1 2016-08-05T01:52:00          50         .83           2

10 rows selected.

The 35 samples above are not continuous, the maximum number of continuous CPU utilization above 1 is 18 samples (180 seconds).

In real life obviously we do not have a physical server with 1 or 2 CPUs to run and Oracle Database instance but the example is to understand that the percentile used depends on how much pain (wait) we are willing to tolerate.

If we had a VMWare cluster where we have Virtual Machines we could certainly accommodate this example database workload. Because we want to leave some headroom for unexpected spikes in CPU utilization we could start assigning 3 vCPUs to this database workload or even 4 to account also for background processes for VM management. The advantage when using VMs is that the resources can be reallocated.

Les try another example.

The data for the next example has been placed in table CPU_STANDARD_EX3

Lets grab some statistics on this database workload.

SELECT MIN(sample_time) START_TIME, MAX(sample_time) END_TIME, ROUND(MAX(sample_time) - MIN(sample_time)) Retention
FROM cpu_standard_ex3;

START_TIM END_TIME    RETENTION
--------- --------- -----------
03-AUG-16 18-AUG-16          15

We had our retention period set to 15 days thus we have data collected for 15 days starting Aug 3th and ending Aug 18th.

SELECT MIN(cpu_data) MINIMUM, MAX(cpu_data) MAXIMUM
FROM cpu_standard_ex3;

    MINIMUM     MAXIMUM
----------- -----------
          0          21

This database has a minimum CPU utilization of 0 and a maximum of 21 active sessions.

SELECT AVG(cpu_data) AVERAGE, MEDIAN(cpu_data) MEDIAN
FROM cpu_standard_ex3;

    AVERAGE      MEDIAN
----------- -----------
.3324691358           0

CREATE TABLE cpu_standard_ex3_hourly
AS SELECT TO_CHAR(TRUNC(sample_time,'HH24'), 'YYYY-MM-DD"T"HH24') THE_HOUR, MAX(cpu_data) CPU_DATA
FROM cpu_standard_ex3
GROUP BY TRUNC(sample_time,'HH24')
ORDER BY 1;

Using the hourly data to create a line chart:
example3_hourly
In order to find out the 95th percentile:

SELECT * FROM
(
SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY cpu_data)) "95th PERCENTILE"
FROM   cpu_standard_ex3
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;

THE_HOUR      95th PERCENTILE
------------- ---------------
2016-08-14T07               4

Using that query we can build the following table:

100th percentile ->          no wait        -> 21 CPUs
 99th percentile -> longest wait is  36 sec. -> 7 CPUs
 97th percentile -> longest wait is 1.8 min. -> 5 CPUs
 95th percenitle -> longest wait is  3  min. -> 4 CPUs
 90th percentile -> longest wait is  6  min. -> 2 CPUs
 80th percentile -> longest wait is 12  min. -> 2 CPUs

Lets pick the 95th percentile of 4 and get the samples above 4 CPUs.

Obtaining the samples above:

CREATE OR REPLACE PROCEDURE get_samples_above
(
 p_cpu_req          IN  samples_above.cpu_required%TYPE
) IS
BEGIN
   INSERT INTO samples_above
   SELECT p_cpu_req, TRUNC(sample_time, 'HH24'), COUNT(cpu_data) samples_above
   FROM cpu_standard_ex3
   WHERE cpu_data > p_cpu_req
   GROUP BY TRUNC(sample_time, 'HH24');
   DBMS_OUTPUT.PUT_LINE(' '); 
   DBMS_OUTPUT.PUT_LINE('get_samples_above: CPU req ' || p_cpu_req);
   COMMIT;
END;
/

TRUNCATE TABLE samples_above;

SET SERVEROUTPUT ON
EXEC get_samples_above(4)

SELECT   cpu_required, TO_CHAR(end_date, 'YYYY-MM-DD"T"HH24') end_date_hour, samples_above
FROM     samples_above
ORDER BY cpu_required, samples_above DESC;

CPU_REQUIRED END_DATE_HOUR SAMPLES_ABOVE
------------ ------------- -------------
           4 2016-08-13T07            12
           4 2016-08-06T07             8
           4 2016-08-09T13             6
           4 2016-08-07T07             5
           4 2016-08-10T07             5
           4 2016-08-08T19             5
           4 2016-08-14T07             5
           4 2016-08-16T07             5
           4 2016-08-05T07             5
           4 2016-08-17T07             5
           4 2016-08-10T05             4
           4 2016-08-11T01             4
...
...

How about 90th percentile?

TRUNCATE TABLE samples_above;

SET SERVEROUTPUT ON
EXEC get_samples_above(2)

SELECT   cpu_required, TO_CHAR(end_date, 'YYYY-MM-DD"T"HH24') end_date_hour, samples_above
FROM     samples_above
ORDER BY cpu_required, samples_above DESC;

CPU_REQUIRED END_DATE_HOUR SAMPLES_ABOVE
------------ ------------- -------------
           2 2016-08-14T07            33
           2 2016-08-06T07            31
           2 2016-08-11T01            27
           2 2016-08-07T07            27
           2 2016-08-08T19            25
           2 2016-08-12T01            25
           2 2016-08-13T07            25
           2 2016-08-05T07            24
           2 2016-08-09T01            23
           2 2016-08-08T07            22
...
...

It seems too busy, lets try 3 CPUs

TRUNCATE TABLE samples_above;

SET SERVEROUTPUT ON
EXEC get_samples_above(3)

SELECT   cpu_required, TO_CHAR(end_date, 'YYYY-MM-DD"T"HH24') end_date_hour, samples_above
FROM     samples_above
ORDER BY cpu_required, samples_above DESC;

CPU_REQUIRED END_DATE_HOUR SAMPLES_ABOVE
------------ ------------- -------------
           3 2016-08-14T07            20
           3 2016-08-05T07            17
           3 2016-08-13T07            16
           3 2016-08-06T07            15
           3 2016-08-07T07            15
           3 2016-08-08T07            14
           3 2016-08-10T07            12
           3 2016-08-04T05            11
           3 2016-08-11T01            10
           3 2016-08-12T05            10
           3 2016-08-17T07            10
...
...

The 3 CPUs correspond to the 93rd percentile.

SELECT * FROM
(
SELECT TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24') THE_HOUR
      ,ROUND(PERCENTILE_CONT(0.93) WITHIN GROUP (ORDER BY cpu_data)) "93th PERCENTILE"
FROM   cpu_standard_ex3
GROUP BY TO_CHAR(TRUNC(sample_time,'HH24'),'YYYY-MM-DD"T"HH24')
ORDER BY 2 DESC
)
WHERE ROWNUM <= 1;

THE_HOUR      93th PERCENTILE
------------- ---------------
2016-08-06T07               3

Lets see what we got here. The busiest hour has 20 samples above 3 CPUs, if all 20 samples are continuous the longest wait would be 200 seconds ~ 3 min. and 20 seconds.

In order to see the “how long” results we run the following:

CREATE OR REPLACE PROCEDURE get_how_long
(
 p_cpu_req          IN  samples_above.cpu_required%TYPE
) IS
  CURSOR c_10s_t IS
    SELECT sample_time, cpu_data 
    FROM cpu_standard_ex3
    ORDER BY sample_time; 
  v_end_date            cpu_standard_ex3.sample_time%TYPE;  -- to fetch date
  v_store_end_date      cpu_standard_ex3.sample_time%TYPE;  -- to store the final list of dates where utilization went above #CPUs
  v_on_cpu              cpu_standard_ex3.cpu_data%TYPE;     -- to fetch on_cpu
  v_count               NUMBER := 0;  -- to count how many samples the CPU utilization stayed above #CPUs
  v_max_sample          NUMBER := 0;  -- to hold the max sample in each entry of org6_how_long_t
BEGIN
  OPEN c_10s_t;
  LOOP
    FETCH c_10s_t into v_end_date, v_on_cpu;
    EXIT WHEN c_10s_t%NOTFOUND;
    IF v_on_cpu > p_cpu_req THEN
      v_store_end_date := v_end_date;
      v_count      := 1;
      v_max_sample := v_on_cpu;  
      LOOP
        FETCH c_10s_t into v_end_date, v_on_cpu;
        EXIT WHEN c_10s_t%NOTFOUND;
        IF v_on_cpu > p_cpu_req THEN
          v_count := v_count + 1;
          IF v_on_cpu > v_max_sample THEN
            v_max_sample := v_on_cpu;
          END IF;
        ELSE
          INSERT INTO how_long VALUES(p_cpu_req, v_store_end_date, v_count*10, (v_count*10)/60, v_max_sample);
          EXIT;
        END IF;
      END LOOP;
    END IF;
  END LOOP;
  CLOSE c_10s_t; 
  DBMS_OUTPUT.PUT_LINE(' ');
  DBMS_OUTPUT.PUT_LINE('get_how_long: CPU req ' || p_cpu_req);
  DBMS_OUTPUT.PUT_LINE('get_how_long: Data in mn_how_long_t ');
  COMMIT;
END;
/

TRUNCATE TABLE how_long;
EXEC get_how_long(3);

SELECT cpu_required, to_char(end_date, 'YYYY-MM-DD"T"HH24:MI:SS') "TIME", how_long_sec "SECONDS", round(how_long_min, 2) "MINUTES", max_sample "MAX_SAMPLE" 
FROM how_long
ORDER BY cpu_required, seconds DESC, max_sample DESC;

CPU_REQUIRED TIME                    SECONDS     MINUTES  MAX_SAMPLE
------------ ------------------- ----------- ----------- -----------
           3 2016-08-05T07:31:40          50         .83           7
           3 2016-08-07T07:31:40          40         .67           9
           3 2016-08-16T00:41:00          40         .67           7
           3 2016-08-11T05:46:30          40         .67           5
           3 2016-08-13T07:32:10          30          .5           7
           3 2016-08-16T03:31:50          30          .5           7
           3 2016-08-11T07:31:50          30          .5           7
           3 2016-08-10T07:31:50          30          .5           7
           3 2016-08-17T00:41:10          30          .5           6
           3 2016-08-13T07:37:20          30          .5           6
           3 2016-08-16T07:35:50          30          .5           5
           3 2016-08-14T07:44:10          30          .5           5
           3 2016-08-06T01:02:40          30          .5           5
           3 2016-08-04T05:33:10          30          .5           4
...
...
           3 2016-08-18T07:32:20          10         .17           4
           3 2016-08-18T07:35:20          10         .17           4
           3 2016-08-18T07:38:00          10         .17           4
           3 2016-08-18T07:42:50          10         .17           4

574 rows selected.

How about with 4 CPUs.

TRUNCATE TABLE how_long;
EXEC get_how_long(4);

SELECT cpu_required, to_char(end_date, 'YYYY-MM-DD"T"HH24:MI:SS') "TIME", how_long_sec "SECONDS", round(how_long_min, 2) "MINUTES", max_sample "MAX_SAMPLE" 
FROM how_long
ORDER BY cpu_required, seconds DESC, max_sample DESC;

CPU_REQUIRED TIME                    SECONDS     MINUTES  MAX_SAMPLE
------------ ------------------- ----------- ----------- -----------
           4 2016-08-13T07:32:10          30          .5           7
           4 2016-08-07T07:31:40          20         .33           9
           4 2016-08-13T07:32:50          20         .33           9
           4 2016-08-06T07:32:40          20         .33           9
           4 2016-08-10T05:46:40          20         .33           7
           4 2016-08-17T00:41:10          20         .33           6
           4 2016-08-08T07:31:40          20         .33           6
           4 2016-08-12T03:44:50          20         .33           6
           4 2016-08-13T07:37:30          20         .33           6
           4 2016-08-16T07:36:00          20         .33           5
           4 2016-08-14T05:46:50          20         .33           5
           4 2016-08-10T07:37:20          20         .33           5
           4 2016-08-13T05:33:00          20         .33           5
...
...
           4 2016-08-11T05:46:50          10         .17           5
           4 2016-08-11T07:31:50          10         .17           5
           4 2016-08-12T00:41:30          10         .17           5
           4 2016-08-12T05:32:10          10         .17           5

306 rows selected.

The 93rd percentile would be a good option, because the longest wait is not more than 50 seconds, however, there are many times when the system goes up above 3 CPUs (574 times), even with 4 CPUs the system goes up above 4 CPUs 306 times.

We could use the 4 CPUs as the number of CPUs for this database. If we wanted this system to have headroom of 50% we would add 4 more CPUs for a total of 8 CPUs. If we wanted to have this as the starting value of a VM for this database workload we could start with 5 (4 + 1 for VM overhead) and monitor to see if more are needed.

As a side note:
I am using this space to recommend the use of Oracle Resource Manager to control the spikes in CPU utilization.

“High database loads destabilize the server. Many operating system vulnerabilities and bugs are exposed at excessive CPU loads. In addition, excessive CPU loads can starve critical applications and monitoring tools running on the server.”

“One key to a healthy database is maintaining a healthy CPU load. Excessive CPU load can destabilize the server and expose operating system bugs. Excessive CPU load can also prevent critical Oracle background processes from running in a timely manner, resulting in failures such as database instance evictions on a RAC database. At such times, your response time may be so poor that you may be unable to debug the source of the excessive CPU load, for example by identifying and fixing a faulty SQL plan.
Using Oracle Database Resource Manager, you can ensure that your database’s CPU load is always healthy, thus avoiding all of these problems. Resource Manager manages your CPU load so that all CPUs are fully utilized yet there is no thrashing since no (or very few) Oracle processes are waiting in the operating system run queue. Using a database feature called Instance Caging, you can configure Resource Manager to manage the CPU load to even lower levels.”

Reference: Using Oracle Database Resource Manager

In this blog post we have learned how to choose a percentile to decide how many CPUs are required for our database workload and review how many times the CPU utilization goes above our number of CPUs and for how long that utilization stays above our specified number of CPUs.

With this information we are able to decide how many CPUs we want to allocate for our database.

It is important to size correctly because if we undersusbscribe we risk the system to come to a halt and if we oversubscribe we are spending more money than needed on hardware and licenses.

If you are migrating to a new system and need to know the number of CPUs in the target system which is different from the actual system you need to translate the processing power of the actual server to the new server. If you want to learn how to do that stay tuned for the next blog post: “Sizing post #5: How to map our existing processing power to new hardware.”

Author: Jorge Barba