Using Teradata SQL Analyzer
Learn about Teradata SQL Analyzer, a utility framework to discover processing flows in SQL scripts, find inter-table dependencies, and convert them to ANSI-SQL standards.
Teradata is a robust RDBMS product that lets enterprises host their data at terabyte scale. It was designed around massively parallel processing (MPP) and a shared-nothing architecture, which coordinate the processing of large datasets across multiple processors. This enables rapid execution of complex queries against large data warehouses.
Due to recent technological trends in big data and the cloud, an increasing number of companies have been rethinking how they manage their data warehouses. Many have moved toward cloud solutions to avoid vendor lock-in and reduce operational costs. During migration, however, a common challenge is rewriting existing SQL scripts written in Teradata's proprietary dialect of ANSI-SQL. These scripts often implement complex business logic and can take a long time to convert to ANSI-SQL.
Teradata SQL Analyzer is a utility framework to discover processing flows embedded in SQL scripts, find inter-table dependencies, and convert them to ANSI-SQL standards. To use the framework, the first step is to add its JAR file to your classpath and then import com.myuan.transsql.* and com.myuan.transsql.query.BaseCommand in your calling classes. BaseCommand is the superclass of all classes representing SQL statements, including CreateCommand, SelectCommand, UpdateCommand, and DeleteCommand.
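In a Scala calling class, those imports would look roughly like this (note that Scala 2 uses _ rather than * for a wildcard import):
// Wildcard import of the framework package, plus the command superclass.
import com.myuan.transsql._
import com.myuan.transsql.query.BaseCommand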
The entry point of the framework is a factory class, QueryParser, whose getCommand(s: String) method takes a String and creates one command object. After getting the command object, you invoke its parse(s: String) method, which executes the framework code to build a tree structure and populate the values of its nodes from the SQL statement.
val comm: BaseCommand = QueryParser.getCommand("CRUD SQL statement")
comm.parse("CRUD SQL statement")
Now, you can use the BaseCommand object to show the structure and table dependencies within the SQL statement. The BaseCommand class has three useful methods (a short usage sketch follows the list):
showDependency(): Returns a String that depicts the processing flows and key table names in the SQL.
showStats(): Returns a String in this format: CRUD action | all tables | number of JOINs | number of MINUSes | number of UNIONs.
toSQL(): Returns a String containing a new SQL statement that conforms to ANSI-SQL standards.
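Putting the pieces together, a minimal sketch of exercising all three methods might look like the following; the SQL literal is only a placeholder, and I am assuming QueryParser lives in the com.myuan.transsql package:
import com.myuan.transsql._
import com.myuan.transsql.query.BaseCommand

object AnalyzerDemo {
  def main(args: Array[String]): Unit = {
    // Any Teradata CRUD statement can be supplied; a short SELECT is used as a placeholder.
    val sql = "sel id_num, family_id FROM table1 WHERE birth_dt >= date - 60"

    // Build the command object and populate its tree from the statement.
    val comm: BaseCommand = QueryParser.getCommand(sql)
    comm.parse(sql)

    println(comm.showDependency()) // processing flow and key table names
    println(comm.showStats())      // CRUD action | tables | JOINs | MINUSes | UNIONs
    println(comm.toSQL())          // ANSI-SQL rewrite of the statement
  }
}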
To illustrate the capability of the framework methods, one sample input is listed below:
CREATE multiset volatile TABLE table3, no logs, AS (
  sel id_num AS new_id, family_id, birth_dt, wake_tm, col1
  FROM table1
  WHERE rel_type (NOT cs) LIKE 'Matt%'
    AND birth_dt >= date - 60
    AND (friend1, friend2) IN (
      sel friend1, friend2
      FROM ${schema2}.table4
      WHERE home_type = 'house'
      GROUP BY 1, 2
    )
    AND upper(trim(id_num))(casespecific) = lower(trim(id_num))(casespecific)
  qualify rank() OVER (partition BY id_num ORDER BY birth_dt ASC, wake_tm ASC) = 1
  UNION
  sel id_num AS new_id, family_id, birth_dt, wake_tm, col1
  FROM table10
  WHERE name (NOT cs) LIKE 'Jason'
    AND birth_dt >= date - 60
    AND upper(trim(id_num))(casespecific) = lower(trim(id_num))(casespecific)
  qualify rank() OVER (partition BY id_num ORDER BY birth_dt ASC, wake_tm ASC) = 1
) WITH data PRIMARY INDEX(new_id) ON COMMIT preserve rows
The output of the showDependency() method is:
CREATE TABLE [TABLE3]
SELECT 5 columns
FROM table1
WHERE
SELECT 2 columns
FROM ${SCHEMA2}.table4
UNION
SELECT 5 columns
FROM table10
The showStats() method returns:
CREATE|TABLE3,TABLE1,${SCHEMA2}.TABLE4,TABLE10|0|0|1
The toSQL() method generates a new SQL query:
CREATE TABLE TABLE3 AS (
SELECT * FROM (
SELECT ID_NUM AS NEW_ID, FAMILY_ID, BIRTH_DT, WAKE_TM, COL1, RANK () OVER ( PARTITION BY ID_NUM ORDER BY BIRTH_DT ASC , WAKE_TM ASC ) AS QUALIFIER_ELEMENT_131
FROM TABLE1
WHERE REL_TYPE ILIKE 'Matt%' AND BIRTH_DT >= DATEADD ( DAY, -60, CURRENT_DATE() ) AND ( FRIEND1,FRIEND2 ) IN (
SELECT FRIEND1, FRIEND2
FROM ${SCHEMA2}.TABLE4
WHERE HOME_TYPE='house'
GROUP BY 1,2
) AND UPPER ( TRIM ( ID_NUM ) ) = LOWER ( TRIM ( ID_NUM ) ) )
WHERE QUALIFIER_ELEMENT_131 = 1
UNION
SELECT * FROM (
SELECT ID_NUM AS NEW_ID, FAMILY_ID, BIRTH_DT, WAKE_TM, COL1, RANK () OVER ( PARTITION BY ID_NUM ORDER BY BIRTH_DT ASC , WAKE_TM ASC ) AS QUALIFIER_ELEMENT_1737
FROM TABLE10
WHERE NAME ILIKE 'Jason' AND BIRTH_DT >= DATEADD ( DAY, -60, CURRENT_DATE() ) AND UPPER ( TRIM ( ID_NUM ) ) = LOWER ( TRIM ( ID_NUM ) ) )
WHERE QUALIFIER_ELEMENT_1737 = 1
);
In many cases, the first step of rebuilding data warehouses in the cloud is to migrate existing tables and their data as-is. Most companies using Teradata have accumulated thousands of tables in their Teradata schemas, so a practical approach is to migrate tables in waves. It then becomes both important and difficult to decide which tables are used most and have the greatest impact. The framework can help in this analysis by generating usage numbers: collect all of the SQL statements executed in a single day from the Teradata query log table, loop through and parse each one, and then add the results to the com.myuan.transsql.TableStats class. The code snippet looks like:
for (s <- SQLList) {
  if (s != null) {
    // Parse the statement and accumulate its table references.
    val c: BaseCommand = QueryParser.getCommand(s)
    c.parse(s)
    if (c.getStats() != null)
      TableStats.addTo(c.getStats().getTables())
  }
}
// Print per-table usage counts collected across all statements.
TableStats.print()
The print() method shows two perspectives: the number of times each table is referenced in the whole list, and a sorted list based on the number of occurrences. The output from the sample SQL above looks like:
${SCHEMA2}.table4 --> 1
table1 --> 2
table3 --> 1
----------------------------------------------------------------------
2--> TABLE1
1--> ${SCHEMA2}.TABLE4|TABLE3
The performance of the framework depends on the number and complexity of the input SQL statements. During testing, my input was 1.9 million SQL statements in a 500 MB file. The test harness parsed all of the SQL statements and created four output files by calling three methods on BaseCommand and one method on TableStats. With an 8 GB heap, the total execution time was around 2.5 minutes.
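A rough sketch of such a test harness is shown below. The file names, the one-statement-per-line assumption, and the use of PrintWriter are mine; the analyzer calls are the ones described above:
import java.io.PrintWriter
import scala.io.Source
import com.myuan.transsql._
import com.myuan.transsql.query.BaseCommand

object BatchHarness {
  def main(args: Array[String]): Unit = {
    // Hypothetical input file with one SQL statement per line (an assumption of this sketch).
    val statements = Source.fromFile("daily_query_log.sql").getLines().filter(_.trim.nonEmpty)

    val depOut   = new PrintWriter("dependencies.txt")
    val statsOut = new PrintWriter("stats.txt")
    val ansiOut  = new PrintWriter("ansi_sql.txt")

    for (s <- statements) {
      val c: BaseCommand = QueryParser.getCommand(s)
      c.parse(s)
      depOut.println(c.showDependency())
      statsOut.println(c.showStats())
      ansiOut.println(c.toSQL())
      if (c.getStats() != null)
        TableStats.addTo(c.getStats().getTables())
    }

    depOut.close(); statsOut.close(); ansiOut.close()
    // The fourth output: aggregated table-usage counts.
    TableStats.print()
  }
}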