确定可能的查询计划更改
Oracle 文档、若干文章以及我阅读过的一些论坛帖子几乎都提到了有关加密列的数据后现有查询执行计划可能发生更改的概要信息以及某些具体信息。一般来说,相对于没有索引的列,在加密具有索引的列时,您必须注意执行 SQL 语句所发生的情况。当 Oracle 加密具有索引的列时,Oracle 还将加密相应的索引值。如果您花点时间考虑这个问题,就会清楚地发现以具有索引的数据为目标的相等谓词应该继续利用索引,但由于该索引值存储在索引中的方式,加密值的随机性质使得加密索引的范围扫描成本过高。列表 4 演示了这些详细描述的基本情况。
SQL> CONNECT app_001
Enter password:
Connected.
SQL> -- Create a plan table
SQL> @?/rdbms/admin/utlxplan.sql;
Table created.
SQL> -- Disable encryption of the credit card column
SQL> ALTER TABLE app_001.transactions
2 MODIFY (credit_card DECRYPT);
Table altered.
SQL> -- Ensure schema stats are current
SQL> EXEC dbms_stats.gather_schema_stats('app_001');
PL/SQL procedure successfully completed.
SQL> -- Display some representative data
SQL> COL credit_card FOR 9999999999999999;
SQL> SELECT * FROM app_001.transactions
2 WHERE rownum <5;
TRANS_ID CREDIT_CARD
---------- -----------------
389 3469681098409570
390 3441050723354352
391 3485598407754404
392 3485458104610650
SQL> -- Enable tracing and explain plan output
SQL> SET AUTOTRACE ON EXPLAIN;
SQL> -- Demonstrate an equality predicate targeting the
SQL> -- encrypted column
SQL> SELECT * FROM app_001.transactions
2 WHERE credit_card = 3485458104610650;
TRANS_ID CREDIT_CARD
---------- -----------------
392 3485458104610650
Execution Plan
----------------------------------------------------------
Plan hash value: 32329967
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TRANSACTIONS | 1 | 14 | 5 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRANSACTIONS_NDX1 | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CREDIT_CARD"=3485458104610650)
SQL> -- Demonstrate a range predicate targeting the
SQL> -- encrypted column
SQL> SELECT * FROM app_001.transactions
2 WHERE credit_card BETWEEN 3499990000000000 AND 3499999999999999;
TRANS_ID CREDIT_CARD
---------- -----------------
4629 3499990987277941
18597 3499993250694089
13364 3499996558049599
79326 3499996616476145
60420 3499997873591732
24392 3499998608513414
97433 3499999831086288
72183 3499999977925392
8 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 32329967
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 42 | 6 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TRANSACTIONS | 3 | 42 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRANSACTIONS_NDX1 | 3 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CREDIT_CARD">=3499990000000000 AND "CREDIT_CARD"<=3499999999999999)
SQL> -- Disable tracing and explain plan output
SQL> SET AUTOTRACE OFF;
SQL> -- Encrypt the column (and indexes)
SQL> ALTER TABLE app_001.transactions
2 MODIFY (credit_card ENCRYPT NO SALT);
Table altered.
SQL> -- Ensure schema stats are current
SQL> EXEC dbms_stats.gather_schema_stats('app_001');
PL/SQL procedure successfully completed.
SQL> -- Enable tracing and explain plan output
SQL> SET AUTOTRACE ON EXPLAIN;
SQL> -- Rerun previous queries, compare execution plans
SQL> SELECT * FROM app_001.transactions
2 WHERE credit_card = 3485458104610650;
TRANS_ID CREDIT_CARD
---------- -----------------
392 3485458104610650
Execution Plan
----------------------------------------------------------
Plan hash value: 32329967
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TRANSACTIONS | 1 | 14 | 5 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRANSACTIONS_NDX1 | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CREDIT_CARD"=3485458104610650)
SQL> SELECT * FROM app_001.transactions
2 WHERE credit_card BETWEEN 3499990000000000 AND 3499999999999999;
TRANS_ID CREDIT_CARD
---------- -----------------
60420 3499997873591732
4629 3499990987277941
18597 3499993250694089
13364 3499996558049599
24392 3499998608513414
79326 3499996616476145
72183 3499999977925392
97433 3499999831086288
8 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1321366336
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1250 | 17500 | 914 (2)| 00:00:11 |
|* 1 | TABLE ACCESS FULL| TRANSACTIONS | 1250 | 17500 | 914 (2)| 00:00:11 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(INTERNAL_FUNCTION("CREDIT_CARD")>=3499990000000000 AND
INTERNAL_FUNCTION("CREDIT_CARD")<=3499999999999999)
SQL> -- Disable tracing and explain plan output
SQL> SET AUTOTRACE OFF;
列表 4 仔细识别引用加密列数据的 SQL 语句,然后比较加密前后这些语句的执行计划,以查找是否存在任何更改。
我还想知道加密的开销是否会改变计划成本以及优化程序的选择,即使在文档建议不要这样做的情况下。为了确保我确切知道进行生产时关键任务应用程序所发生的情况,我在沙箱环境中进行了一些额外的工作。首先,我从各种自动负载信息库 (AWR) 快照中收集了一个使用频繁的 SQL 语句(CPU、Gets 和 I/O)列表。然后,我比较了加密列前后每个 SQL 语句的查询执行计划。我的研究转向对多个基于相等的条件使用一个谓词的复杂查询,其中一个条件是以将要进行加密的列为目标。让我吃惊的是,在对列进行加密之后,该查询的执行计划发生了改变。遗憾的是,我无法在我的测试实验室中为本文复制这些结果,我仍然无法完全确定查询计划发生改变的原因。但我之所以在此处提到这种情况是想指出,在对生产系统进行更改之前,最好在测试环境中研究生产应用程序密钥查询的执行计划。如果我假设没有任何使用频繁的查询会发生改变,那么我们将对生产系统进行更改并且不得不勉强拼凑一个解决方案。
此处的教训是,在进行更改之前,您应该始终对这些事项进行测试,无论您在文档和其他来源中阅读了什么内容都是如此。
结论
使用 Oracle 的 TDE 特性加密新表与没有任何数据的表中的列或者现有表中的新列非常简单,原因是不存在任何需要担心的相关性。相反,加密现有列数据需要仔细研究并在您的沙箱环境中进行测试,然后才能在实际生产系统中实施您的计划,因为加密可能会影响很多相关性。
Steve Bobrowski自 Oracle 数据库版本 5 开始一直使用该软件,曾就职于 Oracle;他还是 The Database Domain (dbdomain.com) 的创始人以及五本 Oracle 出版社出版的书籍的作者(包括《Oracle 数据库 10g 快捷版上机操作》系列)。最近,Steve 担任着几个大型公司(如 Computer Sciences Corporation、BEA Systems 和 Salesforce.com)的 SaaS 顾问和 SaaS 首席技术官。