How to Use Self Join and WITH Clause in Oracle
The Oracle WITH clause is one of the most commonly used techniques to simplify the SQL source code and improve performance.
Join the DZone community and get the full member experience.
Join For FreeThe Oracle WITH clause is one of the most commonly used techniques to simplify the SQL source code and improve performance. In Oracle SQL, the 'WITH' clause also known as a Common Table Expression (CTE) is a powerful tool which is also used to enhance the code readability.
WITH is commonly used to define temporary named result sets, also referred to as subqueries or CTEs as defined earlier. These temporary named sets can be referenced multiple times within the main SELECT SQL query. The CTEs are like virtual tables and are very helpful in organizing and modularizing the SQL code.
Understanding the WITH Clause
Syntax
The usage of the WITH clause is very simple. Create a namespace with the AS operator followed by the SELECT query and you can add as many SELECT queries as you want followed by a comma (,). It's a good practice to use meaningful terms for namespaces in order to distinguish in the main SELECT.
In terms of internal execution of the WITH clause, Oracle will internally execute the namespaces individually and cache the results in the memory which will then be utilized by the main SELECT SQL. It mimics a materialized view with intermediate results and reduces redundant calculations. This suggests that Oracle optimizes SQL queries with CTEs by storing the results of the subqueries temporarily, allowing for faster retrieval and processing in subsequent parts of the query.
WITH
cte_name1 as (SELECT * FROM Table1),
cte_name2 as (SELECT * FROM Table2),
...
SELECT ...
FROM cte_name1, cte_name2
WHERE ...;
Use Case
In this use case, I am going to talk specifically about how you can effectively utilize inner joins alongside, using a WITH clause, which can tremendously help in performance tuning the process. Let's take a look at the dataset first and the problem statement before we delve deep into the solution.
The scenario is of an e-commerce retail chain for whom the bulk product sales price data needs to be loaded for a particular e-store location. Imagine that a product can have several price lines meant for regular prices, promotional and BOGO offer prices. In this case, the user is trying to create multiple promotional price lines and is unaware of the possible mistakes he/she could commit. Through this process, we will detect duplicate data that is functionally redundant and prevent the creation of poor data quality in the pricing system. By doing so, we will avoid the interface program failures in the Pricing repository staging layer, which acts as a bridge between the pricing computation engine and the pricing repository accessed by the e-commerce platform.
TABLE: e_promotions
Price_LINE
|
UPC_code
|
Description
|
Price
|
Start_DT
|
End_dt
|
Row_num
|
flag
|
---|---|---|---|---|---|---|---|
10001
|
049000093322
|
Coca-Cola 12 OZ
|
$6.86
|
01/01/2024
|
09/30/2024
|
1
|
0
|
10001
|
049000093322
|
Coca-Cola 12 OZ
|
$5.86
|
01/31/2024
|
03/30/2024
|
2
|
0
|
10001
|
049000028201
|
Fanta Pineapple Soda, 20 OZ
|
$2.89
|
01/01/2024
|
09/30/2024
|
3
|
0
|
10001
|
054000150296
|
Scott 1000
|
$1.19
|
01/01/2024
|
09/30/2024
|
4
|
0
|
PS: This a sample data, but in the real world, there could be thousands and millions of price lines being updated to mark down or mark up the prices on a weekly basis.
The table above captures the UPC codes and the respective items within the price line 10001. The issue with this data set is that the back office user is trying to create a duplicate line as part of the same price line through an upload process and the user does not know the duplicate data he/she may be creating.
The intent here is to catch the duplicate record and reject both entries 1 and 2 so that the user can decide which one among the two needs to go in the pricing system to be reflected on the website.
Using the code below would simplify error detection and also optimize the store proc solution for better performance.
WITH price_lines as
(SELECT rowid, price_line, UPC, start_dt, end_dt
FROM e_promotions
WHERE price_line = 10001
AND flag = 0)
SELECT MIN(a.rowid) as row_id, a.price_line, a.UPC, a.start_dt, a.end_dt
FROM price_lines a, price_lines b
WHERE a.price_line = b.price_line
AND a.flag = b.flag
AND a.UPC = b.UPC
AND a.rowid <> b.rowid
AND (a.start_dt BETWEEN b.start_dt AND b.end_dt OR
a.end_dt BETWEEN b.start_dt AND b.end_dt OR
b.start_dt BETWEEN a.start_dt AND a.end_dt OR
b.end_dt BETWEEN a.start_dt AND a.end_dt)
GROUP BY a.price_line, a.UPC, a.start_dt, a.end_dt;
With the code above we did two things in parallel:
- Queried the table once for the dataset we need to process using the WITH clause
- Added the inner join to detect duplicates without having to query the table for the 2nd time, hence optimizing the performance of the store proc
This is one of the many use cases I have used in the past that gave me significant performance gain in my PLSQL and SQL coding.
Have fun and post your comments if you have any questions!
Opinions expressed by DZone contributors are their own.
Comments