Extract Data from a Multilevel XML File

Problem description & analysis

We have an XML file invoice.xml. Below is part of the data:

    <LIST_G_AMOUNT_DUE>

        <G_AMOUNT_DUE>

            <SORT_TRX_SEQUENCE>1</SORT_TRX_SEQUENCE>

            <SORT_TRX_DATE>28-JUL-14</SORT_TRX_DATE>

            <SORT_INVOICE_NUMBER>111820</SORT_INVOICE_NUMBER>

            <SORT_DUE_DATE>27-AUG-14</SORT_DUE_DATE>

            <PS_SEQUENCE>1493092</PS_SEQUENCE>

            <TRX_SEQUENCE>1712368</TRX_SEQUENCE>

            <RECEIPT_CURRENCY_CODE></RECEIPT_CURRENCY_CODE>

            <AMOUNT_APPLIED_FROM></AMOUNT_APPLIED_FROM>

            <LIST_G_LINE_CLUSTER>

                <G_LINE_CLUSTER>

                    <INVOICE_NUMBER>111820</INVOICE_NUMBER>

                    <TRX_DATE>28-JUL-14</TRX_DATE>

                    <TRANSACTION>Invoice</TRANSACTION>

                    <DUE_DATE>27-AUG-14</DUE_DATE>

                    <REFERENCE>SAMPLE SALE </REFERENCE>

                    <BILL_TO_LOCATION>WASHINGTON</BILL_TO_LOCATION>

                    <LINE_CUSTOMER_ID>4382</LINE_CUSTOMER_ID>

                    <GENERAL_SEQUENCE>9648082</GENERAL_SEQUENCE>

                    <AMOUNT_DUE></AMOUNT_DUE>

                    <TRX_AMOUNT>64.4</TRX_AMOUNT>

                    <CD_TRX_AMOUNT>        64.40 </CD_TRX_AMOUNT>

                    <DUMMY_REFERENCE>SAMPLE SALE </DUMMY_REFERENCE>

                    <C_BILL_TO_LOC>0</C_BILL_TO_LOC>

                </G_LINE_CLUSTER>

                <G_LINE_CLUSTER>

                    <INVOICE_NUMBER>111820</INVOICE_NUMBER>

                    <TRX_DATE>19-OCT-15</TRX_DATE>

We are trying to extract data from the XML file. The expected result is as follows:

SORT_INVOICE_NUMBER

TRANSACTION

SORT_DUE_DATE

TRX_DATE

TRX_AMOUNT

111820

Invoice

27-AUG-14

28-JUL-14

64.4

111820

Payment

27-AUG-14

19-OCT-15

-64.4

1100585

Invoice

30-JUL-15

30-JUN-15

69.4

1100585

Payment

30-JUL-15

05-AUG-15

-16.73

1100585

Payment

30-JUL-15

09-SEP-15

-52.2

1100585

Payment

30-JUL-15

19-OCT-15

-0.47

1101491

Invoice

05-AUG-15

06-JUL-15

69.4

1101491

Payment

05-AUG-15

19-OCT-15

-69.4

 

Solution

Write the following script p1.dfx in esProc:

A

1

=file("invoice.xml").read()

2

=xml(A1,"G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE")

3

=A2.news(if(ifa(LIST_G_LINE_CLUSTER.G_LINE_CLUSTER),LIST_G_LINE_CLUSTER.G_LINE_CLUSTER,[LIST_G_LINE_CLUSTER.G_LINE_CLUSTER]);SORT_INVOICE_NUMBER,~.TRANSACTION,SORT_DUE_DATE,~.TRX_DATE,~.TRX_AMOUNT)

 

Explanation:

A1   Read the XML file as a string.

A2   Parse A1’s XML string, get data at the level of G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE, and return it as a table sequence.

A3   Extend the table sequence by a specific column to get the targe result. Note that LIST_G_LINE_CLUSTER.G_LINE_CLUSTER could be a record sequence, which should first be checked and then converted into a sequence if it is a record sequence.

Refer to How to Call an SPL Script in BIRT to learn about the method of integrating the SPL script with BIRT.

Q & A Collection

https://www.eclipse.org/forums/index.php/t/1076017/