
Java DocumentBuilder: xml parsing is very slow?

March 28, 2011 12 Comments

I’ve been messing around with some code to find certain links in an XHTML page in Java. I started with XPath over a page source parsed by the out-of-the-box javax.xml.parsers.DocumentBuilder, but it was painfully slow. The most interesting part is that the slow step was not the XPath evaluation but the XHTML parsing.

The page was only 12 kB, yet it took around 2 minutes to parse! That was simply unusable (that’s why the regex from the previous post was born). Once the document was parsed, the XPath was evaluated in no time. The cause is that the XML parser by default loads everything it might need to validate a document while parsing it, which includes trying to download the external DTDs the document references. All was fixed by disabling validation and DTD loading. So here it is if you need it:


```
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
// Standard JAXP switches: no namespace processing, no validation.
fac.setNamespaceAware(false);
fac.setValidating(false);
fac.setFeature("http://xml.org/sax/features/namespaces", false);
fac.setFeature("http://xml.org/sax/features/validation", false);
// Xerces-specific features: do not load (or download) the DTD at all.
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = fac.newDocumentBuilder();
```

Now use this builder to parse XML documents with no validation (and in no time).
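For illustration, here is a minimal sketch of how such a builder might be combined with XPath to pull links out of an XHTML string. The class name `LinkFinder`, the method names, and the `//a/@href` expression are assumptions made for this example, not the original code, and the Xerces feature URIs assume the parser bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the names here are illustrative only.
public class LinkFinder {

    // Same configuration as in the post: no validation, no DTD loading.
    static DocumentBuilder newNonValidatingBuilder() throws Exception {
        DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
        fac.setNamespaceAware(false);
        fac.setValidating(false);
        fac.setFeature("http://xml.org/sax/features/namespaces", false);
        fac.setFeature("http://xml.org/sax/features/validation", false);
        fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return fac.newDocumentBuilder();
    }

    // Parses the XHTML text and returns the href value of every <a> element.
    static List<String> findLinks(String xhtml) throws Exception {
        Document doc = newNonValidatingBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList hrefs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < hrefs.getLength(); i++) {
            links.add(hrefs.item(i).getNodeValue());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<a href=\"/a.html\">A</a></body></html>";
        System.out.println(findLinks(page));
    }
}
```

Because the builder is not namespace aware, the plain `//a` step matches the XHTML elements without any namespace prefix handling. Pages that rely on entities declared only in the DTD (such as `&nbsp;`) may parse differently once the DTD is no longer loaded.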

Filed under CodeProject, Java, Programming

12 Responses to Java DocumentBuilder: xml parsing is very slow?

Fender says:

May 5, 2011 at 12:57

thanks for your article. I had the same problem and it solved it!

Doesn't work says:

May 24, 2011 at 03:14

Applied your six line change to our painfully slow Android app and… BAM! Instant load times. My initial thought was “you win teh Internets for today” but then after more digging around I discovered that it didn’t download the content and therefore processed absolutely nothing. Processing nothing of course equates to a super-fast but completely useless application. Removing the ‘apache.org’ lines restored the data but also dropped load times back to what they pretty much were. It seems to be slightly faster but doesn’t make enough of a difference to matter. I’m pretty sure a lot of it has to do with the network being slow even for a paltry couple of KB of data.

- Marek Piechut says:
  
  May 24, 2011 at 11:08
  Hi. I was using it in a desktop app (NetBeans platform, to be more precise) and it worked just fine. But I downloaded all the content before parsing it and then worked on the local data:
```
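// Read the whole stream into memory first, so parsing works on local data only.
ByteArrayOutputStream os = new ByteArrayOutputStream();
FileUtil.copy(inputStream, os);
// Note: new String(byte[]) decodes with the platform default charset.
xml = new String(os.toByteArray());
xml = xml.trim();
CharArrayReader reader = new CharArrayReader(xml.toCharArray());
Document document = builder.parse(new InputSource(reader));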
```
  Check this out, maybe it will solve your performance issues.
Anonymous says:

August 5, 2011 at 15:41

Really great article, thanks a lot!

Anonymous says:

September 16, 2011 at 12:39

Didn’t work for me

javax.xml.parsers.ParserConfigurationException: http://apache.org/xml/features/nonvalidating/load-dtd-grammar
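
One possible way around an exception like this, assuming it comes from a parser that does not recognise the Xerces-specific feature URIs: set each feature on a best-effort basis and additionally install an `EntityResolver` that answers every external entity request with empty content, so no DTD is downloaded either way. The class name `TolerantBuilder` and its `create()` method are hypothetical names for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical helper: the names here are illustrative only.
public class TolerantBuilder {

    // Features that only some parser implementations understand.
    static final String[] OPTIONAL_FEATURES = {
        "http://xml.org/sax/features/namespaces",
        "http://xml.org/sax/features/validation",
        "http://apache.org/xml/features/nonvalidating/load-dtd-grammar",
        "http://apache.org/xml/features/nonvalidating/load-external-dtd"
    };

    static DocumentBuilder create() throws ParserConfigurationException {
        DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
        fac.setNamespaceAware(false);
        fac.setValidating(false);
        for (String feature : OPTIONAL_FEATURES) {
            try {
                fac.setFeature(feature, false);
            } catch (ParserConfigurationException e) {
                // This parser does not support the feature; carry on without it.
            }
        }
        DocumentBuilder builder = fac.newDocumentBuilder();
        // Fallback: resolve any external entity (including the DTD) to an
        // empty document instead of fetching it over the network.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }
}
```

The lambda needs Java 8 or later; on older runtimes an anonymous `EntityResolver` class does the same job.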

Anonymous says:

June 14, 2012 at 18:49

It did its job. Thanks for the concise and straight article

Dow says:

June 20, 2012 at 06:25

This also does the job : http://stackoverflow.com/questions/5431646/is-there-any-way-improve-the-performance-of-flyingsaucer/6957341#6957341

Pingback: JavaPins
Laura says:

October 22, 2012 at 17:10

Have you tried Expresso XML Parser? It’s a high performance parser that has a graphical interface. You log onto a website and parse your XML file online so you can create parsing rules without writing code. Then you link up to your existing project using their client code. At the moment they have client code in Java and JavaScript. They have a free developer version at http://www.sxml.com.au

Anonymous says:

June 21, 2013 at 16:11

great! thanks a lot

Anonymous says:

June 22, 2013 at 18:56

really great! thanks a lot

Anonymous says:

October 29, 2013 at 18:33

Thanks a lot! I solved my problem


Development world stories
