Validating EDI Data in Java
This article reveals how the EDI validator notifies an application about validation events, and dives into EDI standards and implementations.
One of the most common requirements when dealing with EDI data is the need to validate messages. Last time, we looked at reading EDI data in Java, which included a basic example of validating an X12 acknowledgment message. In that article, a sample schema (a set of validation rules) was given along with some basic Java code using the StAEDI library to set up the validator. This time, let's dig a little deeper into how the validator notifies an application about validation events and also discuss the differences between EDI standards and implementations, and how to validate both.
EDI Event Streams with Errors
While reading through an EDI message using the EDIStreamReader in StAEDI, the various structures found in the data are reported to an application as events. When a schema has been provided and the EDI data does not match the schema in some way, the invalid segments and elements are also reported as events as they are found in the data.
Let's look at a contrived schema to understand how the validation rules are declared. In this example, the transaction structure declares that any EDI message conforming to the schema must begin with a segment SAA, contain up to five occurrences of a loop starting with segment S11, and finally, have a segment SZZ. Note that the schema does not mention the message header and trailer segments (ST/SE for X12 or UNH/UNT for EDIFACT). Those segments are handled separately by the parser and should not be in the message schema.
For the purposes of demonstrating the structure of the transaction's schema, all segments are composed of the same two element types. The elements themselves have length requirements that any conforming transaction must meet.
<schema xmlns="http://xlate.io/EDISchema/v3">
  <transaction>
    <sequence>
      <segment type="SAA" minOccurs="1" />
      <loop code="L0000" maxOccurs="5">
        <sequence>
          <segment type="S11" />
          <segment type="S12" maxOccurs="5" />
        </sequence>
      </loop>
      <segment type="SZZ" minOccurs="1" />
    </sequence>
  </transaction>
  <elementType name="E001" base="string" minLength="2" />
  <elementType name="E002" base="decimal" maxLength="9" />
  <segmentType name="S11">
    <sequence>
      <element type="E001" minOccurs="1" />
      <element type="E002" />
    </sequence>
  </segmentType>
  <segmentType name="S12">
    <sequence>
      <element type="E001" minOccurs="1" />
    </sequence>
  </segmentType>
  <segmentType name="SAA">
    <sequence>
      <element type="E001" minOccurs="1" />
    </sequence>
  </segmentType>
  <segmentType name="SZZ">
    <sequence>
      <element type="E001" minOccurs="1" />
    </sequence>
  </segmentType>
</schema>
A sample message using this schema might look like the following. This example shows multiple occurrences of the schema's L0000 loop (starting with segment S11). The first occurrence contains two S12 segments whereas the second contains none; S12 segments are optional (minOccurs is 0 by default).
SAA*11~
S11*X1*2.5~
S12*01~
S12*2~
S11*X2*5.25~
SZZ*99~
There is something wrong here, however. Did you notice that the second occurrence of S12 contains an element that is too short? The one-character value 2 does not meet the minimum length of two characters that the schema requires for element type E001. Now let's take a look at how a Java program would receive the events from this simple message. The following code snippet skips over the envelope segments and jumps straight to the segments from the example message.
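import java.io.FileInputStream;
import java.io.InputStream;
import io.xlate.edi.schema.Schema;
import io.xlate.edi.schema.SchemaFactory;
import io.xlate.edi.stream.EDIInputFactory;
import io.xlate.edi.stream.EDIStreamEvent;
import io.xlate.edi.stream.EDIStreamReader;

EDIInputFactory factory = EDIInputFactory.newFactory();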
InputStream stream = new FileInputStream("my_edi_file.txt");
EDIStreamReader reader = factory.createEDIStreamReader(stream);
EDIStreamEvent event = reader.next(); // START_INTERCHANGE
// (...) Skipping forward to the start of the transaction
if (event == EDIStreamEvent.START_TRANSACTION) {
    // When the START_TRANSACTION event is received the schema may be configured.
    SchemaFactory schemaFactory = SchemaFactory.newFactory();
    // my_edi_schema.xml contains the example schema shown earlier in this article
    InputStream fileStream = new FileInputStream("my_edi_schema.xml");
    Schema txSchema = schemaFactory.createSchema(fileStream);
    reader.setTransactionSchema(txSchema);
}
// (...) Skip past the transaction header segment
// Segment: SAA*11~
event = reader.next(); // START_SEGMENT
reader.getText(); // "SAA"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "11"
event = reader.next(); // END_SEGMENT
// Segment: S11*X1*2.5~
event = reader.next(); // START_LOOP
reader.getReferenceCode(); // "L0000"
event = reader.next(); // START_SEGMENT
reader.getText(); // "S11"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "X1"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "2.5"
event = reader.next(); // END_SEGMENT
// Segment: S12*01~
event = reader.next(); // START_SEGMENT
reader.getText(); // "S12"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "01"
event = reader.next(); // END_SEGMENT
// Segment: S12*2~
event = reader.next(); // START_SEGMENT
reader.getText(); // "S12"
event = reader.next(); // ELEMENT_DATA_ERROR
reader.getErrorType(); // DATA_ELEMENT_TOO_SHORT *****
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "2"
event = reader.next(); // END_SEGMENT
event = reader.next(); // END_LOOP
// Segment: S11*X2*5.25~
event = reader.next(); // START_LOOP
reader.getReferenceCode(); // "L0000"
event = reader.next(); // START_SEGMENT
reader.getText(); // "S11"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "X2"
event = reader.next(); // ELEMENT_DATA
reader.getText(); // "5.25"
event = reader.next(); // END_SEGMENT
event = reader.next(); // END_LOOP
// (...) Remainder of events omitted
As can be seen in this example, the EDI stream is received by an application as a series of events. The events that are only available when a schema has been configured are those dealing with loop boundaries and errors. In this case, we can see that the S11 segments initiate a START_LOOP event and also that the data error that was noted earlier (where the segment S12*2~ contained an element shorter than the requirement) resulted in the ELEMENT_DATA_ERROR event coupled with the DATA_ELEMENT_TOO_SHORT error type.
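To make the too-short error concrete, the following stand-alone sketch applies the same kind of minimum-length rule to the sample message using only the Java standard library. It is not how StAEDI implements validation. The class MinLengthSketch, its method findTooShort, and the TAG[position]=value report format are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration of a minimum-length rule. This is NOT StAEDI code. */
class MinLengthSketch {

    /**
     * Splits the message on '~' (segment terminator) and '*' (element separator),
     * then reports every element at the given 1-based position that is shorter
     * than minLength. Each report uses the made-up format TAG[position]=value.
     */
    static List<String> findTooShort(String message, int position, int minLength) {
        List<String> errors = new ArrayList<>();
        for (String segment : message.split("~")) {
            String[] parts = segment.trim().split("\\*");
            // parts[0] is the segment tag, so element N sits at index N
            if (parts.length > position && parts[position].length() < minLength) {
                errors.add(parts[0] + "[" + position + "]=" + parts[position]);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String message = "SAA*11~S11*X1*2.5~S12*01~S12*2~S11*X2*5.25~SZZ*99~";
        // Element 1 of every segment in the example schema is E001 (minLength 2)
        findTooShort(message, 1, 2).forEach(System.out::println);
    }
}
```

Running main prints S12[1]=2, which is the same element the walkthrough above shows being reported with DATA_ELEMENT_TOO_SHORT.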
Standards Versus Implementations
The schema XML above is an example of a standard schema. Standard schemas are the rules published by standards bodies such as ANSI (X12) or the UN (EDIFACT). In the example, the transaction element is used in the XML to identify the standard message structure. Additionally, all of the "type" elements are used to identify the standard segment, composite element, and simple element structure and requirements.
Most EDI exchanges go beyond the standards, however. Business partners often define how the standard must be structured for their particular industry or use case. This is when an implementation schema becomes useful. Implementations allow for further refinement of the rules and also allow loops and segments to carry different data. Implementations of loops and segments may include some or all of the components defined by the standard, but must always adhere to the standard.
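One way to picture "must always adhere to the standard" is as a range check: the occurrence counts an implementation permits have to fall inside the counts the standard permits. The sketch below only illustrates that idea and is not part of StAEDI. The class OccurrenceRangeSketch and its method isWithin are hypothetical names, and the numbers assume a segment that the standard allows zero to five times.

```java
/** Stand-alone illustration of "an implementation may narrow, never widen". NOT StAEDI code. */
class OccurrenceRangeSketch {
    final int min;
    final int max;

    OccurrenceRangeSketch(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("invalid range " + min + ".." + max);
        }
        this.min = min;
        this.max = max;
    }

    /** True when every count allowed by this range is also allowed by the standard's range. */
    boolean isWithin(OccurrenceRangeSketch standard) {
        return min >= standard.min && max <= standard.max;
    }

    public static void main(String[] args) {
        OccurrenceRangeSketch standard = new OccurrenceRangeSketch(0, 5);
        // Narrowing an optional segment to at most two occurrences adheres to the standard
        System.out.println(new OccurrenceRangeSketch(0, 2).isWithin(standard)); // true
        // Allowing six occurrences would exceed what the standard permits
        System.out.println(new OccurrenceRangeSketch(0, 6).isWithin(standard)); // false
    }
}
```

Under this model an implementation may also make an optional segment required (for example 1..2 inside 0..5), since every message that satisfies the stricter rule still satisfies the standard.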
We can now extend the schema from the earlier example. Below, only the transaction and implementation elements are shown along with their sub-elements. All of the types used in the standard example above (the elementType and segmentType XML elements) are implied.
<transaction>
  <sequence>
    <segment type="SAA" minOccurs="1"/>
    <loop code="L0000" maxOccurs="5">
      <sequence>
        <segment type="S11"/>
        <segment type="S12" maxOccurs="5"/>
      </sequence>
    </loop>
    <segment type="SZZ" minOccurs="1"/>
  </sequence>
</transaction>
<implementation>
  <sequence>
    <!-- Standard requires at least 1 SAA, implementation permits at most 1. -->
    <segment type="SAA" maxOccurs="1">
      <sequence>
        <element position="1">
          <enumeration>
            <!-- Implementation requires SAA01 to be 'ZZ'. -->
            <value>ZZ</value>
          </enumeration>
        </element>
      </sequence>
    </segment>
    <!-- Occurrence of standard loop L0000 identified as "0000A".
         The discriminator element (1.0, element 1, component N/A)
         provides the values that differentiate 0000A from other
         loops of the same type.
    -->
    <loop type="L0000" code="0000A" discriminator="1.0">
      <sequence>
        <segment type="S11">
          <sequence>
            <element position="1">
              <enumeration>
                <!-- Implementation requires S1101 to be 'X1'. -->
                <value>X1</value>
              </enumeration>
            </element>
            <!-- The second element is omitted, therefore not
                 used by this implementation of the loop. -->
          </sequence>
        </segment>
        <segment type="S12" maxOccurs="2" />
      </sequence>
    </loop>
    <!-- Additional occurrence of standard loop L0000 identified
         as "0000B". The position of the discriminator element
         (1.0, element 1, component N/A) must be the same as other
         implementations of L0000.
    -->
    <loop type="L0000" code="0000B" discriminator="1.0">
      <sequence>
        <segment type="S11">
          <sequence>
            <element position="1">
              <enumeration>
                <value>QQ</value>
              </enumeration>
            </element>
            <!-- Second element used as-is from the standard S11 segment. -->
            <element position="2" />
          </sequence>
        </segment>
        <!-- Segment S12 is not used by loop 0000B -->
      </sequence>
    </loop>
    <!-- Segment SZZ is used as-is from the standard. -->
    <segment type="SZZ" />
  </sequence>
</implementation>
In this implementation example, we can see several things.
- The implementation restricts the number of occurrences of the SAA segment to one.
- There are two "types" of the L0000 loop, 0000A and 0000B. Each type is identified by the value of the first element of the first segment, S11. When element S1101 is X1, the rules for loop 0000A apply and when S1101 is QQ, the rules for loop 0000B apply. The value of the discriminator attribute on the loop indicates which element of the loop's first segment contains the enumerated values used to identify that instance of the standard loop.
- Note the differences between the two occurrences of the loop. In the "A" type, element S1102 is forbidden (not used) and the occurrences of segment S12 are limited to two (three fewer than allowed in the standard). In the "B" type, S1102 is allowed, but the S12 segment is omitted and therefore not allowed.
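The discriminator idea can be sketched as a simple lookup from the value of S1101 to the implementation loop whose rules apply. The following is a stand-alone illustration using only the Java standard library, not StAEDI's actual selection logic. The class DiscriminatorSketch and its method selectLoop are hypothetical, and only the values X1 and QQ and the loop codes 0000A and 0000B come from the example schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Stand-alone illustration of choosing an implementation loop by discriminator. NOT StAEDI code. */
class DiscriminatorSketch {

    // Value of the discriminator element (S1101) mapped to the implementation loop code
    private static final Map<String, String> LOOPS_BY_S1101 = new LinkedHashMap<>();
    static {
        LOOPS_BY_S1101.put("X1", "0000A");
        LOOPS_BY_S1101.put("QQ", "0000B");
    }

    /**
     * Takes one segment without its terminator, e.g. "S11*X1*2.5", and returns
     * the code of the matching implementation loop, or empty when the segment is
     * not an S11 or its first element matches no enumerated value.
     */
    static Optional<String> selectLoop(String segment) {
        String[] parts = segment.split("\\*");
        if (parts.length < 2 || !"S11".equals(parts[0])) {
            return Optional.empty();
        }
        return Optional.ofNullable(LOOPS_BY_S1101.get(parts[1]));
    }

    public static void main(String[] args) {
        System.out.println(selectLoop("S11*X1*2.5"));  // Optional[0000A]
        System.out.println(selectLoop("S11*QQ*7"));    // Optional[0000B]
        System.out.println(selectLoop("S11*X2*5.25")); // Optional.empty
    }
}
```

Note that the second S11 segment from the earlier sample message (S11*X2*5.25~) matches neither loop in this sketch, because X2 is not one of the enumerated discriminator values.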
Conclusion
Validation of EDI data can be a complicated task, especially when developers need to check both the standard rules and the rules specific to a particular industry or trading partner. Using the validation features in the StAEDI Java library, this task becomes a little bit simpler.
Are you writing custom code to process EDI data in Java? Have you tried StAEDI and encountered an issue? Give your feedback in the comments, or open an issue on the StAEDI GitHub repository.