Topic: An Adaptive and Efficient XML Parser Tool for Domain Specific Languages

Authors: W. Jai Singh, S. Nithya Bala
International Journal of Scientific & Engineering Research, IJSER - Volume 2, Issue 4, April-2011
ISSN 2229-5518

Abstract— XML (eXtensible Markup Language) is a standard, universal language for representing information. XML has become integral to many critical enterprise technologies because it enables data interoperability between applications on different platforms. Every application that processes information from XML documents needs an XML parser, which reads an XML document and provides an interface through which the user can access its content and structure. However, the processing of XML documents has a reputation for poor performance. A number of optimizations have been developed to address this problem from different perspectives, none of which has been entirely satisfactory. Hence, in this paper we develop a Fast Parser tool for domain specific languages. The parser can be run in a user-friendly manner without imposing constraints on the user.

Index Terms—XML, Parser Tool, Document Object Model, SAX, XML Document, Document Validation.   

XML (eXtensible Markup Language) is now widely adopted within (networked) applications. Due to its flexibility and efficiency in the transmission of data, XML has become the emerging standard for data transfer and data exchange across applications and the Internet [1]. XML has potential as a back-end solution as well as a standard for redesigning databases and other content. XML has become integral to many critical enterprise technologies because it enables data interoperability between applications on different platforms. With various conversion tools now emerging in the marketplace, XML can be used as a bridge between different applications. It is standards-based and sets forth a design for structuring future content [1], [2].
XML delivers key advantages in interoperability due to its flexibility, expressiveness and platform neutrality. As a result, XML processing has become a performance-critical aspect of the next generation of business computing infrastructure. Tomorrow’s computers will have more cores rather than exponentially faster clock speeds, and software will increasingly have to rely on parallelism to take advantage of this trend.
Every application that processes information from XML documents needs an XML parser, which reads an XML document and provides an interface through which the user can access its content and structure. An XML parser simplifies the process of manipulating XML documents. There are two main challenges for generic XML parsers. The first is that the code size of the parser is restricted by the available memory. The second is that the parser must adapt at run time, because applications differ in the subset of XML syntax they depend on.

Several efforts have been made to improve parsing and validation performance through grammar-based parser generation, which leverages XML schema languages such as DTD (Document Type Definition) and XML Schema at compile time.
A parser can expose the components of an XML document through Application Programming Interfaces (APIs) in two ways: a stream-based approach such as SAX (also known as an event-based parser) and a tree-based approach such as DOM. DOM (Document Object Model) and SAX (Simple API for XML) are the two most widely used XML parsing models, neither of which has been entirely satisfactory [2], [3], [5], [7]. A brief description of each follows.
DOM is a tree-based interface that models an XML document as a tree of nodes. The main advantage of this parsing method is that it supports random access to the document. DOM parsers create a node object for each node, so the tree precisely models all the structure and content information [2], [3], [5]. DOM is an easy way to work with XML. However, DOM parsers consume a great deal of time and memory, which makes them impractical for large XML documents. Moreover, no actual work can be done until the XML document has been parsed completely, which introduces a significant delay that is unacceptable in enterprise applications.
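To illustrate the tree-based model, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample document and variable names are hypothetical and do not come from the paper. The point is that the whole document is parsed into memory first, after which any node can be reached directly.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>Parsing</title></book>"
    "</library>"
)

# The entire document is parsed into an in-memory tree before any access.
doc = parseString(XML)

# Random access: go straight to the second <book> node in the tree.
books = doc.getElementsByTagName("book")
second = books[1]
second_id = second.getAttribute("id")
second_title = second.getElementsByTagName("title")[0].firstChild.data

print(second_id, second_title)
```

The cost of this convenience is that every node is held in memory at once, which is the drawback described above for large documents.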
SAX is an event-based parsing model that reads an XML document from beginning to end. Each time it encounters a syntactic construct, it generates an event and notifies the application [2], [5], [7]. It does not preserve the structure and content information in memory, which saves a large amount of memory. Unfortunately, SAX parsers are forward-only and lack random access, which limits the range of uses they suit. SAX is memory efficient, but writing an application on top of a SAX parser is complex.
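The event-based model can be sketched with Python's standard xml.sax module. The handler class name and the sample document are hypothetical. Note that the application has to track its own state (here, whether it is inside a title element), which is the source of the complexity mentioned above.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML = (
    b"<library>"
    b"<book id='b1'><title>XML Basics</title></book>"
    b"<book id='b2'><title>Parsing</title></book>"
    b"</library>"
)

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self._buf = []

    def startElement(self, name, attrs):
        # Called once for each start tag, in document order.
        if name == "title":
            self.in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is buffered.
        if self.in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self.in_title = False

handler = TitleCollector()
xml.sax.parseString(XML, handler)
print(handler.titles)
```

Nothing is retained except what the handler chooses to store, so memory use stays small, but there is no way to go back to an earlier element.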
Several methods have been proposed to address the performance bottleneck of XML parsing from different viewpoints. Typical software solutions include pull-based parsing [9], lazy parsing [10] and schema-specific parsing [4].
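As a rough illustration of the pull-based style, the sketch below uses XMLPullParser from Python's standard xml.etree.ElementTree module. This is not the approach of reference [9]; it only shows the general idea that the application feeds data and asks for events when it is ready, rather than being called back. The chunk boundaries and sample document are hypothetical.

```python
from xml.etree.ElementTree import XMLPullParser

# Hypothetical input arriving in two chunks, e.g. from a network stream.
chunks = [
    "<library><book id='b1'><title>XML Basics</title>",
    "</book></library>",
]

parser = XMLPullParser(events=("start", "end"))
events = []

for chunk in chunks:
    parser.feed(chunk)
    # The application pulls whatever events are available so far.
    for event, element in parser.read_events():
        events.append((event, element.tag))

parser.close()
# Pull any events that only became available once the input ended.
for event, element in parser.read_events():
    events.append((event, element.tag))

print(events)
```

How many events are available after each individual chunk depends on the underlying parser's buffering, so only the overall order of events should be relied on.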
Su Cheng Haw and G. S. V. Radha Krishna Rao presented “Comparative Study and Benchmarking on XML Parser”, in which they compare the Xerces and .NET parsers on performance, memory usage and other criteria [2]. Giuseppe Psaila developed “Loosely coupling Java algorithms and XML Parsers”, a study of the problem of coupling Java algorithms with XML parsers. Su-Cheng Haw and Chien-Sing Lee presented “Fast Native XML Storage and Query Retrieval”, in which they propose the INLAB2 architecture, which comprises five main components: XML parser, XML encoder, XML indexer, data manager and query processor [10]. Fadi El-Hassan and Dan Ionescu presented “An efficient Hardware based XML parsing techniques”, in which they argue that hardware-based solutions are an obvious choice for parsing XML very efficiently.
Existing XML parsers spend a large amount of time tokenizing the input. To overcome these drawbacks, we have developed a new Fast Parser tool for domain specific languages. Through careful analysis of the operations required for parsing and validation, we chose to store element information in a hash table, which speeds up access when searching for an element. Moreover, we use regular expressions to search for tags and attributes, which speeds up reading of the XML content.

2 Fast XML Parser
To parse an XML document in software, the processing sequence starts by loading the XML document, then reads its characters in sequence, extracts elements and attributes, validates the document, writes the parsed information and finally reads the resulting parsed data. Our initial approach separates out the process of reading the XML document and stores the contents in a hash table using regular expressions. Fig.1 shows the architecture of the fast XML parser tool.

Fig.1: XML Parser Architecture (figure available in the full paper)

The Fast XML Parser Tool contains four modules: loading an XML document into an application, reading an XML document, writing an XML document into the application and knowledge based search of an XML document. The modules are described as follows:

Load an XML document: Before an XML document can be accessed and manipulated, it must be loaded into an XML parser. The parser reads the XML document and converts it into a meaningful format. The job of the parser is to make sure that the document meets the defined structure and constraints. The validation rule for any particular sequence of character data is determined by the type definition of its enclosing element. Fig.2 shows the sample XML document.

Step 1: Read XML File
Step 2: Search for a start tag using regular expression ‘<…>’
Step 3: If a start tag is found then
Step 4: Add attributes to hash table
Step 5: Search for the end tag using regular expression ‘</…>’
Step 6: Add element to hash table
Step 7: End if
Step 8: Repeat steps 2 to 7 until End Of File
Step 9: Verify and validate each element in hash table
Step 10: End XML Parser
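The steps above can be sketched roughly as follows. This is not the authors' implementation: the paper elides its regular expressions, so the two patterns here are assumptions, as are the function name parse_to_table and the layout of the hash table (a Python dict keyed by element name). The simple tag-matching check stands in for the validation of Step 9.

```python
import re

# Assumed patterns; the source does not give its actual regular expressions.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*?)(/?)>")
ATTR_RE = re.compile(r"""([A-Za-z_][\w.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def parse_to_table(text):
    """Scan tags with regular expressions and store elements in a dict."""
    table = {}   # hash table: element name -> list of records
    stack = []   # start tags still waiting for their end tag

    for match in TAG_RE.finditer(text):                      # Steps 2 and 8
        closing, name, raw_attrs, self_closing = match.groups()
        if not closing:                                      # Step 3
            attrs = {}
            for a in ATTR_RE.finditer(raw_attrs):            # Step 4
                value = a.group(2) if a.group(2) is not None else a.group(3)
                attrs[a.group(1)] = value
            record = {"attrs": attrs, "text": ""}
            if self_closing:
                table.setdefault(name, []).append(record)
            else:
                stack.append((name, record, match.end()))
        else:                                                # Step 5
            if not stack or stack[-1][0] != name:
                raise ValueError("mismatched end tag </%s>" % name)
            _, record, start = stack.pop()
            inner = text[start:match.start()]
            # Keep character data only for elements without child elements.
            if "<" not in inner:
                record["text"] = inner.strip()
            table.setdefault(name, []).append(record)        # Step 6

    if stack:                                                # Step 9 (simplified)
        raise ValueError("missing end tag for <%s>" % stack[-1][0])
    return table

# Hypothetical sample document, used only for illustration.
sample = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    '<book id="b2"><title>Parsing</title></book>'
    "</library>"
)
table = parse_to_table(sample)
print(table["book"][1]["attrs"], [r["text"] for r in table["title"]])
```

A sketch like this ignores comments, CDATA sections and namespaces, so it only suits a restricted, domain-specific subset of XML.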

Reading an XML document: This module reads an XML file that has already been loaded. It provides the createXMLReader function, which returns an implementation of the XMLReader interface. It reads the root element first, then the sub elements and their corresponding data. Finally, it builds the XML document and passes it to the user application.

Writing an XML document: This module reads the XML document, separates its content and writes each part (root element, sub elements, attributes and data) to the corresponding hash table. It provides the CreateXMLWriter function to return an implementation of the XMLWriter function.

Knowledge based search: This module searches for a particular element or piece of content in an XML document using our fast XML Parser tool. It first looks up the content in the storage unit using the root element as the hash key. If the key is present, it proceeds to the sub element and its corresponding data and displays them as output. If the key is not present, it terminates the search.
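The lookup described above might look like the following sketch. The storage layout (a nested dict from root element to sub element to data), the function name search and the sample values are all hypothetical, since the paper does not specify them.

```python
# Hypothetical storage unit: root element -> sub element -> list of data values.
store = {
    "library": {
        "title": ["XML Basics", "Parsing"],
    }
}

def search(store, root, sub):
    """Look up data by root element first, then by sub element."""
    sub_elements = store.get(root)   # hash lookup on the root element
    if sub_elements is None:
        return None                  # root not found: terminate the search
    return sub_elements.get(sub)     # data for the sub element, or None

found = search(store, "library", "title")
missing = search(store, "catalog", "title")
print(found, missing)
```

Because each level is a hash lookup, the search does not need to rescan the document text.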
