Using High-Speed Data Loading and Rolling Window Operations with Partitioning
In this tutorial, you learn how to use Oracle Database for high-speed data loading and leverage Oracle Partitioning for a rolling window operation.
Approximately 2 hours
This tutorial covers loading data with external tables, parallel execution, table compression, and performing a rolling window operation with Oracle Partitioning.
Most of the time, online transaction processing (OLTP) source systems feeding a data warehouse are not directly connected to the data warehousing system for extracting new data. Commonly, the OLTP systems send data feeds in the form of external files. This data must be loaded into the data warehouse, preferably in parallel, thus leveraging the existing resources.
For example, due to the business needs and disk space constraints of the sample company used in this tutorial (MyCompany), only the data from the last three years is relevant for analysis. This means that with the insertion of new data, disk space has to be freed, either by purging the old data or by leveraging Oracle Database table compression. The maintenance of this so-called rolling window operation is performed by using Oracle Partitioning.
Before starting this tutorial, you should:
1. Install Oracle Database 11g.
2. Create a directory named wkdir. Download and unzip etl.zip into the wkdir directory.
To load external files into their data warehouse, MyCompany uses the Oracle Database external table feature, which allows external data such as flat files to be exposed within the database just like regular database tables. External tables can be accessed by SQL, so that external files can be queried directly and in parallel using the full power of SQL, PL/SQL, and Java. External tables are often used in the extraction, transformation, and loading (ETL) process to combine data-transformation (through SQL) with data-loading in a single step. External tables are a very powerful feature with many possible applications in ETL and other database environments where flat files are processed. External tables are an alternative to using SQL*Loader.
Parallel execution dramatically reduces response time for data-intensive operations on large databases typically used with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of OLTP and hybrid systems. Simply expressed, parallelism is the idea of breaking down a task so that instead of one process doing all of the work in a query, many processes do parts of the work at the same time. For example, parallel execution can be used when four processes handle four different quarters in a year instead of one process handling all four quarters by itself.
A very important task in the back office of a data warehouse is to keep the data synchronized with the various changes that are taking place in the OLTP (source) systems. In addition, the life span of the data from an analysis perspective is very often limited, so that older data must be purged from the target system while new data is loaded; this operation is often called a rolling window operation. Ideally, this operation should be done as fast as possible without any implication for the concurrent online access of the data warehousing system.
Before starting the tasks for this OBE, you need to implement some changes to the existing Sales History (SH) schema. You need to create additional objects in the SH schema. In addition, you need to grant additional system privileges to the SH user. The SQL file for making these changes is modifySH_11g.sql. Perform the following steps:
1. Open a terminal window and change your working directory to /home/oracle/wkdir by executing the following command from your terminal session:

cd /home/oracle/wkdir

(Note: This tutorial assumes you have a /home/oracle/wkdir folder. If you do not, create one and unzip the contents of etl.zip into it.)

2. Start a SQL*Plus session and log in as the SH user with a password of SH. Execute the modifySH_11g.sql script in your SQL*Plus session as follows:

@modifySH_11g.sql
In this section of the tutorial, you load data into the data warehouse using external tables.
To create and use external tables, perform the following steps:
1. Create the necessary directory objects.
2. Create the external table.
3. Select from the external table.
4. Provide transparent high-speed parallel access of external tables.
5. Review Oracle's parallel insert capabilities.
6. Perform the parallel INSERT.
1. Create the Necessary Directory Objects
Before you create the external table, you need to create a directory object in the database that points to the directory on the file system where the data files will reside. Optionally, you can separate the location for the log, bad and discard files from the location of the data files. To create the directory, perform the following steps:
In a SQL*Plus session logged on as the SH user, execute the create_directory.sql script or copy the following SQL statements into your SQL*Plus session:

DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
CREATE DIRECTORY data_dir AS '/home/oracle/wkdir';
CREATE DIRECTORY log_dir AS '/home/oracle/wkdir';

The scripts are set up for a Linux system and assume that the files were extracted into /home/oracle/wkdir. Note that for security reasons, symbolic links are not supported as DIRECTORY objects within the database.
2. Create the External Table
When you create an external table, the following are defined:
1. The metadata information for the table representation inside the database
2. The access parameter definition that describes HOW to extract the data from the external file
After the creation of this meta information, the external data can be accessed from within the database without the necessity of an initial load.
To create the external table, perform the following steps:
In a SQL*Plus session logged in as the SH user, execute the create_external_table.sql script or copy the commands below.

DROP TABLE sales_delta_XT;

CREATE TABLE sales_delta_XT
( PROD_ID        NUMBER,
  CUST_ID        NUMBER,
  TIME_ID        DATE,
  CHANNEL_ID     CHAR(2),
  PROMO_ID       NUMBER,
  QUANTITY_SOLD  NUMBER(3),
  AMOUNT_SOLD    NUMBER(10,2)
)
ORGANIZATION external
( TYPE oracle_loader
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    BADFILE log_dir:'sh_sales.bad'
    LOGFILE log_dir:'sh_sales.log_xt'
    FIELDS TERMINATED BY "|" LDRTRIM
    ( prod_id, cust_id,
      time_id CHAR(11) DATE_FORMAT DATE MASK "DD-MON-YYYY",
      channel_id, promo_id, quantity_sold, amount_sold
    )
  )
  location ( 'salesDec01.dat' )
)
REJECT LIMIT UNLIMITED NOPARALLEL;

You can view information about external tables through the following data dictionary views:
- [USER | ALL | DBA]_EXTERNAL_TABLES
- [ALL | DBA]_DIRECTORIES
- [USER | ALL | DBA]_EXTERNAL_LOCATIONS
3. Select From the External Table
You can now access the data in the external file without any further action, as shown with the following SQL command:
In a SQL*Plus session logged in as the SH user, execute the following queries or the select_et.sql file:

SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;

If you copied the files correctly, the maximum TIME_ID is the last day of December 2001.
4. Provide Transparent High-Speed Parallel Access of External Tables
Unlike SQL*Loader, access to external tables can be done in parallel, independent of the number of external files. SQL*Loader can operate only on a per-file basis, which means that you have to split large source files manually if you want to parallelize the load. With external tables, the degree of parallelism is controlled in exactly the same way as for a normal table. In this case, the external table was defined NOPARALLEL by default. The following section shows you how to control the degree of parallelism at the statement level by using a hint.
1. The parallel_select_from_ET.sql script contains the SQL statements for the next three steps. In a SQL*Plus session logged on as the SH user, execute the following query or the parallel_select_from_ET.sql file to see the current parallel session statistics:

SELECT * FROM v$pq_sesstat
WHERE statistic IN ('Queries Parallelized', 'Allocation Height');
2. Execute the same query you used before to access the external table, this time with a parallel degree of 4 controlled with a hint. You can use the command below or the parallel_select_from_ET_2.sql script.

SELECT /*+ parallel(a,4) */ COUNT(*) FROM sales_delta_XT a;

You are selecting from the external table in parallel, although the external table points to only one input source file. Alternatively, you could change the PARALLEL property of the external table with an ALTER TABLE command as follows:

ALTER TABLE sales_delta_XT PARALLEL 4;
3. View the session statistics again to see the differences. Execute the command below or the parallel_select_from_ET.sql script. Note that the parallel session statistics have changed: the display shows that the last query was parallelized, as well as the degree of parallelism.

SELECT * FROM v$pq_sesstat
WHERE statistic IN ('Queries Parallelized', 'Allocation Height');
5. Review Oracle's Parallel Insert Capabilities
Oracle Database provides unlimited parallel direct path INSERT capabilities within each partition. The execution plan can be used to determine whether or not the INSERT will be done in parallel. Alternatively, you can check the execution plan of an operation in the SQL cache without the necessity of an EXPLAIN PLAN command at all.
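For example, after a statement has actually been run (no EXPLAIN PLAN needed), its plan can be pulled straight out of the cursor cache with DBMS_XPLAN.DISPLAY_CURSOR. This is not part of the tutorial scripts; it is a minimal sketch, assuming the session has the necessary privileges on the V$ views, and the two NULL arguments simply mean "the last statement executed in this session":

set linesize 140
set pagesize 40
-- NULL sql_id and NULL child number refer to the last statement executed in this session
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'BASIC'));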
Examine the following serial plan. Because none of the objects are defined in parallel, you automatically have serial execution unless you either change the default degree of parallelism of one of the objects or use a hint.
1. To show the execution plan for SERIAL INSERT behavior, execute show_serial_exec_plan.sql or copy the following SQL statements into your SQL*Plus session:

EXPLAIN PLAN FOR
INSERT /*+ APPEND */ INTO sales
( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD )
SELECT PROD_ID, CUST_ID, TIME_ID,
       case CHANNEL_ID when 'S' then 3
                       when 'T' then 9
                       when 'C' then 5
                       when 'I' then 4
                       when 'P' then 2
                       else 99 end,
       PROMO_ID,
       sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
FROM   SALES_DELTA_XT
GROUP BY 1, prod_id, time_id, cust_id, channel_id, promo_id;

set linesize 140
set pagesize 40
SELECT * FROM TABLE(dbms_xplan.display);
2. To show the PARALLEL INSERT execution plan, execute the commands below or the show_parallel_exec_plan.sql script, logged in as the SH user. A parallel DML command must always be the first statement of a transaction. Furthermore, a parallel DML operation cannot execute against a table with certain enabled referential integrity (primary key/foreign key) constraints. Therefore, you have to disable the constraints prior to the parallel DML operation:

ALTER TABLE sales DISABLE CONSTRAINT sales_product_fk;
COMMIT;
ALTER SESSION ENABLE PARALLEL DML;

EXPLAIN PLAN FOR
INSERT /*+ APPEND PARALLEL(SALES,4) */ INTO sales
( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD )
SELECT /*+ parallel (sales_delta_XT,4) */
       PROD_ID, CUST_ID, TIME_ID,
       case CHANNEL_ID when 'S' then 3
                       when 'T' then 9
                       when 'C' then 5
                       when 'I' then 4
                       when 'P' then 2
                       else 99 end,
       PROMO_ID,
       sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
FROM   SALES_DELTA_XT
GROUP BY 1, prod_id, time_id, cust_id, channel_id, promo_id;

set linesize 140
set pagesize 40
SELECT * FROM TABLE(dbms_xplan.display);
In this step of the tutorial, you execute the parallel INSERT discussed previously. Note that you not only SELECT the data from the external table but also perform an aggregation as part of the SELECT, prior to the insertion. You are combining a transformation with the actual loading process. This operation cannot be accomplished with the SQL*Loader utility alone.
1. Execute the following SQL statements or the parallel_insert_file.sql file to perform a parallel INSERT. Set timing ON.

ALTER SESSION ENABLE PARALLEL DML;
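The INSERT statement itself is not reproduced above. A sketch of what parallel_insert_file.sql presumably executes, based on the statement whose plan you just examined (APPEND and PARALLEL hints, CASE transformation, and aggregation) without the EXPLAIN PLAN FOR prefix; treat it as illustrative rather than the exact script contents:

INSERT /*+ APPEND PARALLEL(SALES,4) */ INTO sales
( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD )
SELECT /*+ parallel (sales_delta_XT,4) */
       PROD_ID, CUST_ID, TIME_ID,
       case CHANNEL_ID when 'S' then 3
                       when 'T' then 9
                       when 'C' then 5
                       when 'I' then 4
                       when 'P' then 2
                       else 99 end,
       PROMO_ID,
       sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
FROM   sales_delta_XT
GROUP BY 1, prod_id, time_id, cust_id, channel_id, promo_id;
-- the constraint disabled in the previous step is assumed to still be disabled
COMMIT;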
Record the execution time of this statement and compare it to the total amount of time you need with SQL*Loader and a subsequent insertion. Note that you do not see the full benefit of parallelizing the external table access and combining the transformation with the loading, because you are accessing a very small amount of data in parallel on a single CPU machine using one disk.
2. Perform a ROLLBACK to return the data to its previous state. (In the next example, you insert the same data by using SQL*Loader.)

ROLLBACK;
3. After issuing a ROLLBACK, you need to re-enable the constraints. Execute the commands below or the enable_cons.sql script.
ALTER TABLE sales
MODIFY CONSTRAINT sales_product_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_customer_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_time_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_channel_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_promo_fk ENABLE NOVALIDATE;
The external table method you previously performed is the preferred method of data loading and transformation. To demonstrate the benefit of using external tables, you can compare the tasks required to load and transform the data by using SQL*Loader.
To load and transform data by using SQL*Loader, perform the following steps:
1. Create a staging table.
2. Load the data into the staging table with SQL*Loader.
3. Load the staging table into the target database.
4. Drop the staging table.
You need a staging table to load the data into so that you can transform it within the database in a second step.
In a SQL*Plus session connected as the SH user, execute the commands below or the create_stage.sql script to create a staging table:

CREATE TABLE sales_dec01 AS ...
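The remainder of the CREATE TABLE statement is not shown above. A plausible sketch, assuming the staging table simply mirrors the column layout of the delta feed (the same columns as the external table) and is created empty:

CREATE TABLE sales_dec01 AS
SELECT prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold
FROM   sales_delta_xt
WHERE  1 = 0;   -- no rows are copied; only the structure is created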
Note: The scripts are set up for a Linux system and assume that the files were extracted into the /home/oracle/wkdir directory.
Load the data file from the sales_dec01.ctl file into the staging table by performing the following steps:
1. Execute the following commands from the OS command line:

cd /home/oracle/wkdir
sqlldr sh/sh control=sales_dec01.ctl direct=true

Note: You may need to specify your database alias when connecting with SQL*Loader. In that case, start SQL*Loader with:

sqlldr sh/sh@<database alias> control=sales_dec01.ctl direct=true
2. Note that you cannot parallelize this task. Check the SQL*Loader log file, sales_dec01.log, and record the execution time for the loading process. You can open sales_dec01.log with any editor; the file is located in the /home/oracle/wkdir directory.

Unlike with an external table, space is consumed in the database to make the data accessible from within the database. The space consumed by the staging table grows linearly with the amount of data to be loaded for further transformation. Also note that it is not possible to parallelize the load with SQL*Loader without having several external files. You can use the SKIP option to run several SQL*Loader processes against the same file; however, this forces every SQL*Loader process to scan the whole external file, which is detrimental to overall performance.

Information about the space usage of an object can be accessed through the following data dictionary views:
- [USER | ALL | DBA]_SEGMENTS
- [USER | ALL | DBA]_EXTENTS
After loading the external data and making it accessible in the database, you can perform your transformation.
Log in to SQL*Plus as the SH user. Transform and insert the staged data into the SALES fact table by executing the commands below or the load_stage_table.sql script.

set timing on
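The transformation statements themselves are not reproduced above. A sketch of what load_stage_table.sql presumably runs, reusing the CASE transformation and aggregation from the earlier external-table INSERT but reading from the SALES_DEC01 staging table (illustrative, not the exact script):

INSERT /*+ APPEND */ INTO sales
( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD )
SELECT PROD_ID, CUST_ID, TIME_ID,
       case CHANNEL_ID when 'S' then 3 when 'T' then 9 when 'C' then 5
                       when 'I' then 4 when 'P' then 2 else 99 end,
       PROMO_ID, sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
FROM   sales_dec01
GROUP BY 1, prod_id, time_id, cust_id, channel_id, promo_id;
COMMIT;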
You can now drop or truncate the staging table to free its consumed space.
Use the command below or the drop_sales_dec01.sql script to drop the staging table:

DROP TABLE sales_dec01;
Using external tables for this simple loading and transformation process enables you to combine loading and transformation, which simplifies and speeds up the process. Furthermore, staging of the data in the database is not necessary with external tables. The larger the volume of external data, the more you save in staging space and processing time by using external tables instead of SQL*Loader.
After successfully loading the December data into the Q4 partition of your SALES fact table, this partition will see little or no further DML activity. This makes the partition an optimal candidate for Oracle's table compression functionality. Data stored in relational databases keeps growing as businesses require more information. A big portion of the cost of keeping large amounts of data is the cost of the disk systems and of the resources used to manage that data. Oracle Database offers a unique way to deal with this cost by compressing the data stored in relational tables, with virtually no negative impact on query time against that data, resulting in substantial cost savings.
Commercially available relational database systems have not heavily utilized compression techniques on data stored in relational tables. One reason is that the trade-off between time and space for compression is not always attractive for relational databases. A typical compression technique may offer space savings, but only at a cost of much increased query time against the data. Furthermore, many of the standard techniques do not even guarantee that data size does not increase after compression.
Oracle Database offers a unique compression technique that is very useful for large data warehouses. It is unique in many ways. Its reduction of disk space can be significantly higher than standard compression algorithms because it is optimized for relational data. It has virtually no negative impact on the performance of queries against compressed data; in fact, it may have a significant positive impact on queries accessing large amounts of data, as well as on data management operations such as backup and recovery. It ensures that compressed data is never larger than uncompressed data.
To measure the benefits of table compression, ensure that the most recent partition does not have compression enabled for it. In addition, determine how large it is.
1. Execute the part_before_compression.sql script or copy the following SQL statements into your SQL*Plus session:

COLUMN partition_name FORMAT a50
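The remaining queries of the script are not shown above. A sketch of the kind of checks it presumably performs, listing the COMPRESSION attribute of the SALES partitions and the allocated size of the SALES_Q4_2001 partition (the exact queries are an assumption):

SELECT partition_name, compression
FROM   user_tab_partitions
WHERE  table_name = 'SALES'
ORDER  BY partition_position;

SELECT partition_name, ROUND(bytes/1024/1024,2) AS size_mb
FROM   user_segments
WHERE  segment_name = 'SALES'
AND    partition_name = 'SALES_Q4_2001';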
2. Now compress the partition and transparently maintain all existing indexes. All local and global indexes are maintained as part of this SQL statement. (The functionality of online index maintenance for partition maintenance operations is discussed later in this tutorial.) Note that compressing a partition is not an in-place compression: a new compressed segment is created and the old uncompressed segment is removed at the end of the operation. In a SQL*Plus session logged in as the SH user, execute the compress_salesQ4_2001.sql script or the following SQL statement:

ALTER TABLE sales MOVE PARTITION sales_q4_2001 COMPRESS UPDATE INDEXES;
3. Identify how much space is allocated to the new compressed partition and compare it to the size of the uncompressed partition by executing the following commands or the part_after_compression.sql script:

SELECT partition_name, compression
FROM   user_tab_partitions
WHERE  table_name = 'SALES'
ORDER  BY partition_position;

Typically, the compression ratio with real-world data will be higher than the one experienced with the Sales History schema. The data in the SALES fact table is artificially generated and does not show the typical "natural sorting" that you find in any data warehouse environment where the data was cleansed, consolidated, or even aggregated prior to its INSERT.
Many data warehouses maintain a rolling window of data. For example, the data warehouse stores the data from the most recent 12 months of sales. Just as a new partition can be added to the SALES table, an old partition can be quickly (and independently) removed from the SALES table. Partitioning provides the ideal framework for those operations. The two benefits (reduced resources utilization and minimal end-user impact) are just as pertinent to removing a partition as they are to adding a partition.
To perform the rolling window operation:
1. Create and load a stand-alone table with the new data.
2. Add the new data to the fact table.
3. Delete old data from the fact table.
To perform the rolling window operation, you need to create and load a stand-alone table with the new data by performing the following steps. Note that you will use the external table you defined previously; however, you will point the external table to a different external file.
1.1 Point the external table to the new data file.
1.2 Create a stand-alone (staging) table.
1.3 Load this table.
1.4 Create indexes on the table.
1.5 Create the constraints.
In this section, you use the external table you already defined. However, this time you use a different external file, sales_Q1_data. So, you have to modify the location attribute of the external table to point to the new data file.
1. First, check the number of rows in the current external table by executing the select_et.sql script file or the following SQL statements:

SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;

Because the file contains all sales transactions for December 2001, MAX(time_id) shows the last day of December 2001. Both the number of rows and the MAX(time_id) will be different after changing the external file at the OS level.
2. Change the LOCATION attribute by executing the command below or the alter_loc_attrib.sql script:

ALTER TABLE sales_delta_xt location ( 'salesQ1.dat' );

To check the new data, execute the commands below or the select_et.sql script:

SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;

The number of rows and the maximum TIME_ID have changed. If your external table file is correct, the maximum TIME_ID is the last day of March 2002.
In this step, you create an empty table for the new sales Q1 data. This table will be added to the already existing partitioned SALES table later.
Execute the commands below or the create_stage_table.sql script to create the table:

DROP TABLE sales_delta;
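The CREATE TABLE statement of create_stage_table.sql is not shown above. Because SALES_DELTA will later be exchanged with a partition of SALES, it must have exactly the same column structure; a minimal sketch, assuming an empty copy of SALES is sufficient:

CREATE TABLE sales_delta
NOLOGGING
AS SELECT * FROM sales WHERE 1 = 0;   -- structure only, no rows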
To load this table, you perform the following steps:
1. In a SQL*Plus session logged on as the SH user, execute the commands below or the load_stage_table2.sql script to load the table:

INSERT /*+ APPEND */ INTO sales_delta ...
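The rest of the INSERT is not reproduced above. A sketch of what load_stage_table2.sql presumably executes, loading the Q1-2002 rows from the external table with the same CASE transformation and aggregation used earlier (illustrative only):

INSERT /*+ APPEND */ INTO sales_delta
( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD )
SELECT PROD_ID, CUST_ID, TIME_ID,
       case CHANNEL_ID when 'S' then 3 when 'T' then 9 when 'C' then 5
                       when 'I' then 4 when 'P' then 2 else 99 end,
       PROMO_ID, sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
FROM   sales_delta_xt
GROUP BY 1, prod_id, time_id, cust_id, channel_id, promo_id;
COMMIT;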
2. After loading the SALES_DELTA table, gather statistics for the newly created table. In a SQL*Plus session logged in as the SH user, execute the command below or the gather_stat_stage_table.sql script:

exec dbms_stats.gather_table_stats('SH','sales_delta',estimate_percent=>20);
Because you are going to exchange this stand-alone table with an empty partition of the SALES table at a later point in time, you have to build exactly the same index structure as the existing SALES table to keep the local index structures of this particular table in a usable state after the exchange.
1. Before creating any bitmap indexes, you need to modify the newly created table into a compressed table without actually compressing any data. This operation is necessary to create bitmap indexes that are valid to be exchanged into a partitioned table that already contains compressed partitions. Execute the commands below or the alter_sales_delta.sql script to modify the table.

ALTER TABLE sales_delta COMPRESS;
ALTER TABLE sales_delta NOCOMPRESS;
2. In a SQL*Plus session logged on as the SH user, execute the commands below or the create_static_bitmap_index.sql script to create the bitmap indexes on the SALES_DELTA table.

CREATE BITMAP INDEX sales_prod_local_bix    ON sales_delta (prod_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_cust_local_bix    ON sales_delta (cust_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_time_local_bix    ON sales_delta (time_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_channel_local_bix ON sales_delta (channel_id) NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_promo_local_bix   ON sales_delta (promo_id)   NOLOGGING COMPUTE STATISTICS;

Note that the statistics for these indexes are created as part of the index creation.
In a SQL*Plus session logged on as the SH user, execute the commands below or the create_constraints.sql script to modify and create the constraints:

ALTER TABLE channels MODIFY CONSTRAINT CHANNELS_PK RELY;
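Only the first statement of create_constraints.sql is shown above. A sketch of the general pattern it presumably follows, marking the dimension primary keys RELY and adding matching foreign key constraints to SALES_DELTA in RELY DISABLE NOVALIDATE mode (the constraint names and the exact set of statements are assumptions):

ALTER TABLE channels MODIFY CONSTRAINT channels_pk RELY;

ALTER TABLE sales_delta ADD CONSTRAINT sales_delta_channel_fk
  FOREIGN KEY (channel_id) REFERENCES channels (channel_id)
  RELY DISABLE NOVALIDATE;
-- ...and similarly for prod_id, cust_id, time_id, and promo_id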
The next task in performing a rolling window operation is to add the newly loaded and indexed data to the fact table. To do this, you perform the following steps:
You need to create a new, empty partition. You can either create the new partition with a distinct upper boundary or use the keyword MAXVALUE. The latter option ensures that records violating the potential upper boundary condition are not rejected and the INSERT operation succeeds.
In this business scenario, you issue a SPLIT PARTITION after the loading operation to identify any potential violations. All records violating the upper boundary will be "separated" into an extra partition.
To create the new, empty partition, perform the following step:
In a SQL*Plus session logged on as the SH user, execute the commands below or the create_partition_for_sales_etl.sql script to add a partition to the SALES table.

COLUMN partition_name FORMAT a20
select partition_name, high_value
from   user_tab_partitions
where  table_name='SALES'
order  by partition_position;

ALTER TABLE sales ADD PARTITION sales_q1_2002 VALUES LESS THAN (MAXVALUE);

SELECT COUNT(*) FROM sales PARTITION (sales_q1_2002);
You are now ready to add the newly loaded and indexed data to the real SALES fact table by performing a PARTITION EXCHANGE command. Note that this is only a DDL command, which does not touch the actual data at all. To do this, you perform the following step:
In a SQL*Plus session logged on as the SH user, execute the command below or the exchange_partition_wo_gim.sql script to ALTER the SALES table and exchange the partition:

ALTER TABLE sales EXCHANGE PARTITION sales_q1_2002 ...
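The end of the statement is not shown above. Based on the full EXCHANGE syntax used later in this tutorial, the command presumably reads:

ALTER TABLE sales EXCHANGE PARTITION sales_q1_2002
WITH TABLE sales_delta
INCLUDING INDEXES;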
Now you can select from the newly added and exchanged partition.
Note that the more data you have to add to your partitioned fact table, the more time you are saving with this metadata-only operation.
Note that you need a logical partitioning method such as range partitioning; hash partitioning cannot be used for this very common rolling window operation.
All indexes of the SALES table are maintained and usable.
1. In a SQL*Plus session logged on as the SH user, execute the following queries or the select_count.sql script. They show the number of rows in the exchanged partition and in the stand-alone table (which is now empty).

SELECT COUNT(*) ...
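The queries are truncated above; a sketch of what select_count.sql presumably contains:

SELECT COUNT(*) FROM sales PARTITION (sales_q1_2002);
SELECT COUNT(*) FROM sales_delta;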
2. Note that all local indexes of the SALES table are valid. Execute the command below or the show_sales_idx_status.sql script to view the status of the indexes:

SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status) status, count(*) num_of_part
FROM   user_ind_partitions uip, user_indexes ui
WHERE  ui.index_name=uip.index_name(+)
AND    ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);

You can also use the WITHOUT VALIDATION clause as part of the PARTITION EXCHANGE command. This causes the Oracle Database server to suppress the validity checking of the table that is going to be exchanged. Otherwise, the Oracle Database server guarantees that all values of the partition key fall within the partition boundaries.
As mentioned previously, you decided to load the data into a partition with no fixed upper boundary to avoid any potential errors. To identify any potential violation, you split the most recent partition, thus creating two partitions, one with a fixed upper boundary.
Oracle Database uses an enhanced fast-split operation that detects whether one of the two partitions resulting from a SPLIT operation will be empty. If this is the case, the Oracle Database server does not create two new segments: it creates a segment only for the new empty partition and uses the existing segment as the new partition containing all the data.
This optimization is completely transparent. It improves the run time of a SPLIT operation, saves system resources, and does not require any index maintenance.
1. In a SQL*Plus session logged on as the SH user, execute the commands below or the fast_split_sales.sql script to split the partition and view the index status.

ALTER TABLE sales SPLIT PARTITION sales_q1_2002 ...

SELECT COUNT(*) FROM sales PARTITION (sales_beyond_q1_2002);

SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status)
FROM   user_ind_partitions uip, user_indexes ui
WHERE  ui.index_name=uip.index_name(+)
AND    ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);

ALTER TABLE sales DROP PARTITION sales_beyond_q1_2002;
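The SPLIT statement is truncated above. A sketch, assuming the quarter is split at its regular upper boundary so that SALES_BEYOND_Q1_2002 receives any rows above it (the partition names come from the queries that follow; the boundary date is an assumption):

ALTER TABLE sales SPLIT PARTITION sales_q1_2002
AT (TO_DATE('01-APR-2002','DD-MON-YYYY'))
INTO (PARTITION sales_q1_2002, PARTITION sales_beyond_q1_2002);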
2. Note that all local indexes of the SALES table are still valid. Execute the show_sales_idx_status.sql script to view the status of the local indexes on the SALES table.

@show_sales_idx_status.sql
The next task to perform in a Rolling Window Operation is to delete the old data from the fact table. You want to analyze only the most recent data of the last three years. Because you added Q1-2002, you have to delete the data of Q1-1998.
Without Range Partitioning, you have to perform a DML operation against the table. With partitioning, you can leverage the PARTITION EXCHANGE command again to remove the data from the fact table. Similar to the adding of new data, hash partitioning does not help you in this case.
Note that you are not deleting the data. Instead you are exchanging (logically replacing) the partition containing this data from the SALES fact table with an empty stand-alone table with the same logical structure. You can then archive this data or drop the exchanged partition, depending on your business needs.
You need to create an empty table in which to store the old 1998 data.
In a SQL*Plus session logged on as the SH user, execute the commands below or the create_empty_sat.sql script to create an empty table that will hold the old 1998 data:

DROP TABLE sales_old_q1_1998;
CREATE TABLE sales_old_q1_1998 NOLOGGING COMPRESS ...
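The CREATE TABLE statement is truncated above. A minimal sketch, assuming the table is created as an empty, compressed copy of SALES (it only needs the matching structure for the exchange):

CREATE TABLE sales_old_q1_1998
NOLOGGING COMPRESS
AS SELECT * FROM sales WHERE 1 = 0;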
Now create the local indexes.
In a SQL*Plus session logged on as the SH user, execute the commands below or the create_ndx.sql script to create the local indexes.

CREATE BITMAP INDEX sales_prod_old_bix ...
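Only the first index name is visible above. A sketch of the full set, following the same pattern as the indexes created earlier on SALES_DELTA (index names beyond the first are assumptions):

CREATE BITMAP INDEX sales_prod_old_bix    ON sales_old_q1_1998 (prod_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_cust_old_bix    ON sales_old_q1_1998 (cust_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_time_old_bix    ON sales_old_q1_1998 (time_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_channel_old_bix ON sales_old_q1_1998 (channel_id) NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_promo_old_bix   ON sales_old_q1_1998 (promo_id)   NOLOGGING COMPUTE STATISTICS;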
Now create the constraints.
In a SQL*Plus session logged on as the SH user, execute the commands below or the create_constraints_old.sql script to modify and create the constraints. The remaining statements (not shown) presumably follow the same pattern as create_constraints.sql above.

ALTER TABLE channels MODIFY CONSTRAINT CHANNELS_PK RELY;
Before you perform the exchange, view the 1998 Q1 data that will be aged out of the partition.
In a SQL*Plus session logged on as the SH user, execute the command below or the show_partition.sql script to view the data that will be aged out of the partition: SELECT COUNT(*) FROM sales PARTITION (sales_q1_1998);
Exchange the empty table with the existing Q1-1998 partition. To do this, perform the following step:
In a SQL*Plus session logged on as the SH user, execute the command below or the exchange_old_partition.sql script to exchange the partition:

ALTER TABLE sales EXCHANGE PARTITION sales_q1_1998
WITH TABLE sales_old_q1_1998
INCLUDING INDEXES;

Note that you could have used a DROP PARTITION statement instead. The SALES_OLD_Q1_1998 table now stores all the data of the first quarter of 1998. You could drop this table to remove the data entirely from the system.
After you perform the exchange, view the data in the partition.
1. In a SQL*Plus session logged on as the SH user, execute the commands below or the count_sales.sql script to view the data in the partition:

SELECT COUNT(*) FROM sales PARTITION (sales_q1_1998);
SELECT COUNT(*) FROM sales_old_q1_1998;

Unlike before the EXCHANGE command, the stand-alone table now stores thousands of rows, whereas the equivalent partition of SALES is empty.
2. Local indexes are not affected by the exchange. Execute the command below or the show_sales_idx_status.sql script to view the index information.

SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status) status, count(*) num_of_part
FROM   user_ind_partitions uip, user_indexes ui
WHERE  ui.index_name=uip.index_name(+)
AND    ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);
To learn about the local index maintenance enhancements introduced in Oracle Database 10g, you will split the most recent quarter partition into monthly partitions with online local index maintenance. You will also use the global index maintenance feature (introduced in Oracle9i).
1. Utilize the Oracle Database 10g enhancements for local index maintenance.
2. Use global index maintenance.
1. Enhancements for Local Index Maintenance
Beginning with Oracle Database 10g, all partition maintenance operations can be executed without any impact on the table's availability. Local index maintenance allows you to keep the local indexes of a partitioned table up-to-date as part of any atomic partition maintenance operation.
Oracle extended the SQL syntax for partition maintenance operations to control the physical attributes, such as index placement, for all affected local index structures.
Steps:
1.1 Split the most recent partition by using the default placement rules.
1.2 Split a partition by using the extended SQL syntax for local index maintenance.
1.3 Clean up.
Examine this scenario: After successfully loading the data for the first quarter of 2002, you recognize that due to changing business requirements the query pattern has changed. Instead of being focused mostly on quarterly analysis, many business users have started to rely on monthly reporting and analysis. To address this changed business requirement and optimize query performance, you can leverage Oracle Partitioning and split the most recent quarter into monthly partitions.
The online availability for local index maintenance will not be demonstrated in this example. Online availability is demonstrated for global index maintenance and works in exactly the same manner for local indexes.
Perform the following steps to split the most recent quarterly partition:
1. In a SQL*Plus session logged on as the SH user, execute the following SQL statement to split off the most recent month (March 2002) from the quarterly partition, including local index maintenance. You can execute the command below or the split1_10g.sql script to accomplish this task.

ALTER TABLE sales SPLIT PARTITION sales_q1_2002
AT (TO_DATE('01-MAR-2002','DD-MON-YYYY'))
INTO (PARTITION sales_1_2_2002 TABLESPACE example,
      PARTITION sales_MAR_2002 TABLESPACE example NOCOMPRESS)
UPDATE INDEXES;
2. You can see that the new index partitions are co-located with the table partitions and that the index partition naming is inherited from the table partition naming. Execute the commands below or the see_split.sql script to view the partition information.

COL segment_name format a25
COL partition_name format a25
COL tablespace_name format a25

SELECT segment_name, partition_name, tablespace_name
FROM   user_segments
WHERE  segment_type='INDEX PARTITION'
AND    segment_name IN (SELECT index_name FROM user_indexes WHERE table_name='SALES');
Split the remainder of the former quarter partition into a January and February partition. For demonstration purposes, create one of the new partitions in the SYSAUX tablespace and name some of the indexes explicitly.
1. In a SQL*Plus session logged on as the SH user, execute the following SQL statement to split the remainder partition, including local index maintenance. You can execute the command below or the split2_10g.sql script.

ALTER TABLE sales SPLIT PARTITION sales_1_2_2002 ...
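The statement is truncated above. A sketch that matches the scenario described, with the January partition placed in SYSAUX and some index partitions named and placed explicitly; the boundary date, the index chosen, and names such as jan_02 and feb_02 are assumptions based on the cleanup step that follows, so treat this as illustrative only:

ALTER TABLE sales SPLIT PARTITION sales_1_2_2002
AT (TO_DATE('01-FEB-2002','DD-MON-YYYY'))
INTO (PARTITION sales_JAN_2002 TABLESPACE sysaux,
      PARTITION sales_FEB_2002 TABLESPACE example NOCOMPRESS)
UPDATE INDEXES
(sales_time_bix (PARTITION jan_02 TABLESPACE example,
                 PARTITION feb_02 TABLESPACE sysaux));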
2. You can see that the new index partitions are co-located with the table partitions and that the index partition naming is inherited from the table partition. Execute the command below or the see_split2.sql script to view the partition and segment information:

SELECT segment_name, partition_name, tablespace_name
FROM   user_segments
WHERE  segment_type='INDEX PARTITION'
AND    segment_name IN (SELECT index_name FROM user_indexes WHERE table_name='SALES');
Perform clean up operations by moving the partition out of the SYSAUX tablespace and into the EXAMPLE tablespace. Use the standard naming conventions.
In a SQL*Plus session logged on as the SH user, execute the commands below or the cleanup_split_10g.sql script to move the partition and update the indexes.

ALTER TABLE sales MOVE PARTITION sales_JAN_2002 TABLESPACE example COMPRESS;

ALTER INDEX sales_time_bix REBUILD PARTITION feb_02 TABLESPACE example;

SELECT segment_name, partition_name, tablespace_name
FROM   user_segments
WHERE  segment_type='INDEX PARTITION'
AND    segment_name IN (SELECT index_name FROM user_indexes WHERE table_name='SALES')
AND    tablespace_name <> 'EXAMPLE';
Global Index Maintenance enables you to keep global indexes of a partitioned table up-to-date as part of any atomic partition maintenance operation. This keeps global indexes from being unusable and does not affect their usage when the maintenance operation takes place.
Steps:
Exchange the March data into the partitioned table in the presence of a global index. First, you have to build the necessary infrastructure:
In a SQL*Plus session logged on as the SH user, execute the commands below or the prep4_global_index.sql script to prepare for global index maintenance.

CREATE TABLE sales_mar_2002_temp ...

ALTER TABLE sales TRUNCATE PARTITION sales_MAR_2002;

SELECT COUNT(*) FROM sales PARTITION (sales_MAR_2002);

ALTER TABLE sales_mar_2002_temp COMPRESS;
ALTER TABLE sales_mar_2002_temp NOCOMPRESS;

CREATE BITMAP INDEX sales_prod_mar_2002_bix    ON sales_mar_2002_temp (prod_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_cust_mar_2002_bix    ON sales_mar_2002_temp (cust_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_time_mar_2002_bix    ON sales_mar_2002_temp (time_id)    NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_channel_mar_2002_bix ON sales_mar_2002_temp (channel_id) NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_promo_mar_2002_bix   ON sales_mar_2002_temp (promo_id)   NOLOGGING COMPUTE STATISTICS;
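The CREATE TABLE at the top of this script is truncated. A plausible sketch, assuming the March rows are copied into the stand-alone table before the partition is truncated:

CREATE TABLE sales_mar_2002_temp
NOLOGGING
AS SELECT * FROM sales PARTITION (sales_MAR_2002);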
To demonstrate the global index maintenance functionality, you first need to create a global index. To do this, you perform the following steps:
1. In a SQL*Plus session logged on as the SH user, execute the commands below or the create_global_index.sql script to create a concatenated unique index on the SALES table:

CREATE UNIQUE INDEX sales_pk
ON sales (prod_id, cust_id, promo_id, channel_id, time_id)
NOLOGGING COMPUTE STATISTICS;

This may take up to a minute.
2. Build a constraint leveraging this index by executing the command below or the add_sales_pk.sql script:

ALTER TABLE sales ADD CONSTRAINT sales_pk
PRIMARY KEY (prod_id, cust_id, promo_id, channel_id, time_id)
USING INDEX;
3. Note that if a constraint is defined using the global index, the same constraint must be defined for the table to be exchanged as well. Execute the command below or the add_salestemp_pk.sql script to accomplish this task.

ALTER TABLE sales_mar_2002_temp ...
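The statement is truncated above; it presumably adds the matching primary key to the stand-alone table (the constraint name is an assumption):

ALTER TABLE sales_mar_2002_temp
ADD CONSTRAINT sales_mar_2002_temp_pk
PRIMARY KEY (prod_id, cust_id, promo_id, channel_id, time_id)
USING INDEX;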
To demonstrate the impact of a partition maintenance operation on concurrent online access, you need two sessions and therefore two windows. Please read the following section carefully before proceeding.
In Window One, perform the following steps:
In a SQL*Plus session logged on as the SH user, execute the commands below or the use_global_index.sql script to create an explain plan and view the information.

EXPLAIN PLAN FOR
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;

set linesize 140
SELECT * FROM TABLE(dbms_xplan.display);

After verifying the plan and confirming that the global index is used, execute the following statement or the run_select.sql file repeatedly. You can use the SQL*Plus "r" or "/" command to rerun the last executed statement.

SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;

While you are executing the query, perform the steps below in Window Two. You will see that there is no impact on concurrent query access using a global index while the partition maintenance operation takes place; the query will not fail. Note that the query result changes as soon as the partition exchange command succeeds. The Oracle Database server guarantees read consistency in this situation and provides the most efficient partition table and index maintenance operations without restricting online usage.
In Window Two, perform the following steps:
1. In a SQL*Plus session logged on as the SH user, execute the following command or the exchange_partition_w_gim.sql script.

ALTER TABLE sales EXCHANGE PARTITION sales_mar_2002 ...

Although it is a DDL command, it might take some time because the global indexes are maintained as part of the atomic PARTITION EXCHANGE command.
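The statement is truncated above. Because this exchange must maintain the global index, it presumably uses the UPDATE GLOBAL INDEXES clause, along the lines of:

ALTER TABLE sales EXCHANGE PARTITION sales_mar_2002
WITH TABLE sales_mar_2002_temp
INCLUDING INDEXES
UPDATE GLOBAL INDEXES;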
2. You will see that all indexes are still valid after the partition maintenance operation. Execute the command below or the show_sales_idx_status.sql script.

SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status) status, count(*) num_of_part
FROM   user_ind_partitions uip, user_indexes ui
WHERE  ui.index_name=uip.index_name(+)
AND    ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);
3. View the information in the exchanged partition and the stand-alone table. Execute the commands below or the count_mar_sales.sql script.

SELECT COUNT(*) FROM sales PARTITION (sales_mar_2002);
SELECT COUNT(*) FROM sales_mar_2002_temp;

Thousands of rows were added to the partitioned table with this command, and the stand-alone table is now empty.
The new sales data for March 2002 has been exchanged back into the partitioned table. The global index was maintained as part of the PARTITION EXCHANGE command, without affecting online usage.
Next, investigate the behavior without global index maintenance.
To demonstrate this functionality, you will need two windows. Please read the following section carefully before proceeding.
In Window One, perform the following steps:
In a SQL*Plus session logged on as the SH user, execute the following commands or the use_global_index.sql script.

explain plan for
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;

set linesize 140
SELECT * FROM table(dbms_xplan.display);

SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;

Perform the steps in Window Two below and then execute the query above again to see the difference. It will fail as soon as the partition maintenance command is processed.
In Window Two, perform the following steps:
1. In a SQL*Plus session logged on as the SH user, execute the following command or the exchange_partition_wo_gim2.sql script.

ALTER TABLE sales EXCHANGE PARTITION sales_mar_2002
WITH TABLE sales_mar_2002_temp ...
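The final clause of the statement is not shown. Without global index maintenance it presumably ends with INCLUDING INDEXES only, for example:

ALTER TABLE sales EXCHANGE PARTITION sales_mar_2002
WITH TABLE sales_mar_2002_temp
INCLUDING INDEXES;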
2. The global index is now marked unusable. Execute the following command or the show_sales_idx_status.sql script to view this information.

SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status) status, count(*) num_of_part
FROM   user_ind_partitions uip, user_indexes ui
WHERE  ui.index_name=uip.index_name(+)
AND    ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);
To clean up your environment, perform the following step:
In a SQL*Plus session logged on as the SH user, execute the following statements or the cleanup_mod1.sql script to clean up the OBE-specific modifications.
ALTER TABLE sales EXCHANGE PARTITION sales_q1_1998 WITH TABLE sales_old_q1_1998 INCLUDING INDEXES;
ALTER TABLE sales DROP PARTITION sales_jan_2002;
ALTER TABLE sales DROP PARTITION sales_feb_2002;
ALTER TABLE sales DROP PARTITION sales_mar_2002;
DROP TABLE sales_mar_2002_temp;
DROP TABLE sales_delta;
DROP TABLE sales_old_q1_1998;

set serveroutput on
exec dw_handsOn.cleanup_modules
SELECT * FROM TABLE(dw_handsOn.verify_env);
In this tutorial, you learned how to:
Load data using external tables
Compare the usage of SQL*Loader to external tables
Perform table compression to save disk space
Perform a rolling window operation using Oracle Partitioning