3 Approaches to Storing Application Parameters and Metadata
In this article, we explore three approaches to storing application parameters and metadata.
I recently developed a metadata-driven ingestion engine that supports various patterns. For example, ingesting a table from MySQL and storing it in S3 is one pattern. The application's functionality is determined by the metadata and parameters used while executing a pattern: all of the functionality for a pattern lives in the pattern's source code, and the actual path of execution is determined by the metadata and parameters passed at execution time.
To take advantage of such metadata/parameter-based execution, we have to store the metadata in a database, a configuration file, or an input file.
To store the metadata in a database, we have a few options.
Approach 1
The first approach is to create one column for each attribute of each pattern in the metadata table. The problem with this approach is the number of columns it requires. Because we want one table to support multiple patterns, the table has to include attributes that are not common to all patterns; in other words, it must carry the union of the columns needed by every pattern. As a result, any given row has many columns holding null values.
| id | pattern-type | host | port | username | password | sid | database | table | s3accesskey | s3secretkey | s3path | azureaccountname | azuresecretkey | blobpath |
|----|--------------|------|------|----------|----------|-----|----------|-------|-------------|-------------|--------|------------------|----------------|----------|
| 1 | mysql | 127.0.0.1 | 3306 | admin | admin | | payroll | employee | | | | | | |
| 2 | oracle | 127.0.0.1 | 1521 | admin | admin | payroll | | employee | | | | | | |
| 3 | s3 | 127.0.0.1 | | | | | | | 123d34546g | sxw3567dfs5 | /payroll/employee.csv | | | |
| 4 | azureblob | 127.0.0.1 | | | | | | | | | | 123swerwerwc3454 | sdffd324343ds2 | /payroll/employee.csv |
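On the application side, this union-of-columns design forces almost every field to be nullable. Here is a minimal sketch in Scala of what a row ends up looking like (the case class and field names are illustrative, not taken from the engine itself):

```scala
// Illustrative only: one row of the wide metadata table. Because the table
// is the union of every pattern's attributes, most fields are nullable and
// have to be modelled as Option values.
case class PatternRow(
  id: Long,
  patternType: String,
  host: String,
  port: Option[Int],
  username: Option[String],
  password: Option[String],
  sid: Option[String],              // used only by the Oracle pattern
  database: Option[String],         // used only by the MySQL pattern
  table: Option[String],
  s3AccessKey: Option[String],      // used only by the S3 pattern
  s3SecretKey: Option[String],
  s3Path: Option[String],
  azureAccountName: Option[String], // used only by the Azure Blob pattern
  azureSecretKey: Option[String],
  blobPath: Option[String]
)
```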
Approach 2
Let's suppose that we store the parameters and their values as name-value pairs in a table. In this arrangement, the table has four columns: one to store the parameter name, one to store the parameter's value, and two more (a row id and a parameter-set id, pid) to distinguish parameters that have the same name across different pattern instances.
| id | pid | name | value |
|----|-----|------|-------|
| 1 | 1 | pattern | mysql |
| 2 | 1 | host | 127.0.0.1 |
| 3 | 1 | port | 3306 |
| 4 | 1 | username | admin |
| 5 | 1 | password | admin |
| 6 | 1 | database | payroll |
| 7 | 1 | table | employee |
| 8 | 2 | pattern | oracle |
| 9 | 2 | host | 127.0.0.1 |
| 10 | 2 | port | 1521 |
| 11 | 2 | username | admin |
| 12 | 2 | password | admin |
| 13 | 2 | sid | payroll |
| 14 | 2 | table | employee |
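To use these rows, the application has to pivot them back into one parameter set per pattern instance. A minimal sketch in Scala, assuming the rows have already been fetched (for example, over JDBC) as (pid, name, value) tuples:

```scala
// Hypothetical (pid, name, value) rows as they might come back from the table.
val rows: Seq[(Int, String, String)] = Seq(
  (1, "pattern", "mysql"),  (1, "host", "127.0.0.1"), (1, "port", "3306"),
  (2, "pattern", "oracle"), (2, "host", "127.0.0.1"), (2, "port", "1521"),
  (2, "sid", "payroll"),    (2, "table", "employee")
)

// Group by pid, then collapse each group into a name -> value map.
val paramsByPid: Map[Int, Map[String, String]] =
  rows.groupBy(_._1).map { case (pid, group) =>
    pid -> group.map { case (_, name, value) => name -> value }.toMap
  }

// paramsByPid(2)("sid") == "payroll"
```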
Approach 3
The third approach is to define a single column that stores all of the parameters as one string. With this approach, we face the issue of parsing: the column holds one long string in the database, so we have to read it as a string and then parse it.
Even within this approach there are multiple possibilities. As the first option, we can store all of the parameter values using a comma as a separator, essentially CSV format. But this option still has the problem of null values, and when we compare parameters across rows, it is very hard to tell which position holds which parameter unless we keep the column order handy as a reference.
id,pattern-type,host,port,username,password,sid,database,table,s3accesskey,s3secretkey,s3path,azureaccountname,azuresecretkey,blobpath
1,mysql,127.0.0.1,3306,admin,admin,,payroll,employee,,,,,,
2,oracle,127.0.0.1,1521,admin,admin,payroll,,employee,,,,,,
3,s3,127.0.0.1,,,,,,,123d34546g,sxw3567dfs5,/payroll/employee.csv,,,
4,azureblob,127.0.0.1,,,,,,,,,,123swerwerwc3454,sdffd324343ds2,/payroll/employee.csv
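A short Scala sketch of why this positional option is fragile (hypothetical code, shown only to illustrate the ordering problem):

```scala
val line = "1,mysql,127.0.0.1,3306,admin,admin,,payroll,employee,,,,,,"

// Split with limit -1 so trailing empty fields are preserved.
val fields = line.split(",", -1)

val host     = fields(2) // correct only if we remember that position 2 is host
val database = fields(7) // an off-by-one here silently reads the wrong column
```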
As the second option, let's suppose we use name-value pairs instead. In this option, we have to parse the string twice: first to extract each name-value pair as a token, and then again on each token to separate the parameter name from its value.
id:1,pattern-type:mysql,host:127.0.0.1,port:3306,username:admin,password:admin,,database:payroll,table:employee,,,,,,
id:2,pattern-type:oracle,host:127.0.0.1,port:1521,username:admin,password:admin,sid:payroll,,table:employee,,,,,,
id:3,pattern-type:s3,host:127.0.0.1,,,,,,,s3accesskey:123d34546g,s3secretkey:sxw3567dfs5,s3path:/payroll/employee.csv,,,
id:4,pattern-type:azureblob,host:127.0.0.1,,,,,,,,,,azureaccountname:123swerwerwc3454,azuresecretkey:sdffd324343ds2,blobpath:/payroll/employee.csv
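The two-level parse this option requires might look like the following in Scala (a sketch, assuming names never contain ':' while values such as paths may):

```scala
val line = "id:3,pattern-type:s3,host:127.0.0.1,s3path:/payroll/employee.csv"

// First split the row on commas, then split each token on the first ':'
// only, so values that themselves contain ':' are not cut in half.
val params: Map[String, String] =
  line.split(",")
    .filter(_.nonEmpty) // skip the placeholder empty slots
    .map { token =>
      token.split(":", 2) match {
        case Array(name, value) => name -> value
        case Array(name)        => name -> "" // token with no value
      }
    }
    .toMap

// params("s3path") == "/payroll/employee.csv"
```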
A similar but more structured approach is to format the parameter string as an XML string — but the tag names end up taking more space than the actual data.
<parameters><id>1</id><pattern-type>mysql</pattern-type><host>127.0.0.1</host><port>3306</port><username>admin</username><password>admin</password><database>payroll</database><table>employee</table></parameters>
<parameters><id>3</id><pattern-type>s3</pattern-type><host>127.0.0.1</host><s3accesskey>123d34546g</s3accesskey><s3secretkey>sxw3567dfs5</s3secretkey><s3path>/payroll/employee.csv</s3path></parameters>
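Parsing the XML variant is straightforward in Scala. Here is a sketch using the scala-xml library (my choice for illustration, not necessarily what the engine used):

```scala
import scala.xml.XML

val row =
  "<parameters><id>3</id><pattern-type>s3</pattern-type><host>127.0.0.1</host>" +
  "<s3accesskey>123d34546g</s3accesskey><s3path>/payroll/employee.csv</s3path></parameters>"

val xml = XML.loadString(row)

val host   = (xml \ "host").text   // "127.0.0.1"
val s3path = (xml \ "s3path").text // "/payroll/employee.csv"
val sid    = (xml \ "sid").text    // "" -- absent tags simply yield empty text
```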
Given its popularity and flexibility, I decided to store the parameters as a JSON string. This arrangement worked best for me because I was processing the parameters in Scala, which has good support for JSON parsing.
{ "id": 1, "pattern-type": "msql, "host": "127.0.0.1", "port": 3306, "username": "admin", "password": "admin", "database": "payroll", "table": "employee" }
{ "id": 2, "pattern-type": "oracle, "host": "127.0.0.1", "port": 1521, "username": "admin", "password": "admin", "sid": "payroll", "table": "employee" }
{ "id": 3, "pattern-type": "s3", "host": "127.0.0.1", "s3accesskey": "123d34546g", "s3secretkey": "sxw3567dfs5", "s3path": "/payroll/employee.csv" }
{ "id": 4, "pattern-type": "azureblob", "host": "127.0.0.1", "azureaccountname": "123swerwerwc3454", "azuresecretkey" : "sdffd324343ds2", "blobpath": "/payroll/employee.csv" }
By using the JSON format, I was able to overcome many of the earlier problems: having to define multiple columns and the corresponding null values, missing name-value pairs, missing values, and so on.