Sunday, July 24, 2016

Oracle SQL Query Writing and Performance Tuning

The following is a list of some tips which can be used as guideline to write and review custom SQL queries. This is by no means an exhaustive list to get the best tuning results but can serve as a ready reference to avoid the common pitfalls while working with Oracle SQL:
1. Do not use the set operator UNION if the objective can be achieved through an UNION ALL. UNION incurs an extra sort operation which can be avoided.
2. Select ONLY those columns in a query which are required. Extra columns which are not actually used, incur more I/O on the database and increase network traffic.
3. Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows your queries down.
4. If it is required to use a composite index, try to use the “Leading” column in the “WHERE” clause. Though Index skip scan is possible, it incurs extra cost in creating virtual indexes and may not be always possible depending on the cardinality of the leading columns.
5. There should not be any Cartesian product in the query unless there is a definite requirement to do so. I know this is a silly point but we all have done this mistake at one point Description: 🙂
6. Wherever multiple tables are used, always refer to a column by either using an alias or using the fully qualified name. Do not leave the guess work for Oracle.
7. SQL statements should be formatted consistently (e.g the keywords should be in CAPS only) to aid readability. Now, this is not a performance tip really. However, it’s important and part of the practices.
8. If possible use bind variables instead of constant/literal values in the predicate filter conditions to reduce repeated parsing of the same statement.
9. Use meaningful aliases for tables/views
10. When writing sub-queries make use of the EXISTS operator where possible as Oracle knows that once a match has been found it can stop and avoid a full table scan (it does a SEMI JOIN).
11. If the selective predicate is in the sub query, then use IN.
12. If the selective predicate is in the parent query, then use EXISTS.
13. Do not modify indexed columns with functions such as RTRIM, TO_CHAR, UPPER, TRUNC as this will prevent the optimizer from identifying the index. If possible perform the modification on the constant side of the condition. If the indexed column is usually accessed through a function (e.g NVL), consider creating a function based index.
14. Try to use an index if less than 5% of the data needs to be accessed from a data set. The exception is a small table (a few hundred rows) which is usually best accessed through a FULL table scan irrespective of the percentage of data required.
15. Use equi-joins whenever possible, they improve SQL efficiency
16. Avoid the following kinds of complex expressions:
    • NVL (col1,-999) = ….
    • TO_DATE(), TO_NUMBER(), and so on
These expressions prevent the optimizer from assigning valid cardinality or selectivity estimates and can in turn affect the overall plan and the join method
17. It is always better to write separate SQL statements for different tasks, but if you must use one SQL statement, then you can make a very complex statement slightly less complex by using the UNION ALL operator
18. Joins to complex views are not recommended, particularly joins from one complex view to another. Often this results in the entire view being instantiated, and then the query is run against the view data
19. Querying from a view requires all tables from the view to be accessed for the data to be returned. If that is not required, then do not use the view. Instead, use the base table(s), or if necessary, define a new view.
20. While querying on a partitioned table try to use the partition key in the “WHERE” clause if possible. This will ensure partition pruning.
21. Consider using the PARALLEL hint (only when additional resources can be allocated) while accessing large data sets.
22. Avoid doing an ORDER BY on a large data set especially if the response time is important.
23. Consider changing the OPTIMIZER MODE to FIRST_ROWS(n) if the response time is important. The default is ALL_ROWS which gives better throughput.
24. Use CASE statements instead of DECODE (especially where nested DECODEs are involved) because they increase the readability of the query immensely.
25. Do not use HINTS unless the performance gains clear.
26. Check if the statistics for the objects used in the query are up to date. If not, use the DBMS_STATS package to collect the same.
27. It is always good to understand the data both functionally and it’s diversity and volume in order to tune the query. Selectivity (predicate) and Cardinality (skew) factors have a big impact on query plan. Use of Statistics and Histograms can drive the query towards a better plan.
28. Read explain plan and try to make largest restriction (filter) as the driving site for the query, followed by the next largest, this will minimize the time spent on I/O and execution in subsequent phases of the plan.
29. If Query requires quick response rather than good throughput is the objective, try to avoid sorts (group by, order by, etc.). For good throughput, optimizer mode should be set to ALL ROWS.
30. Queries tend to perform worse as they age due to volume increase, structural changes in the database and application, upgrades etc. Use Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM) to better understand change in execution plan and throughput of top queries over a period of time.
31. SQL Tuning Advisor and SQL Access Advisor can be used for system advice on tuning specific SQL and their join and access paths, however, advice generated by these tools may not be always applicable (point 28).
32. SQL Access paths for joins are an component determining query execution time. Hash Joins are preferable when 2 large tables need to be joined. Nested loops make work better when a large table is joined with a small table.
Disclaimer: Points listed above are only pointers and may not work under every circumstance. This check list can be used as a reference while fixing performance problems in the Oracle Database.
Suggested further readings
  1. Materialized Views
  2. Advanced Replication
  3. Change Data Capture (Asynchronous)
  4. Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM).
  5. Partitioning strategies.
Now it’s turn if have any more tips which you have used then do add them in comment section… Your feedback is very valuable and it would be useful for other viewers too.


No comments:

Post a Comment