Java DocumentBuilder: xml parsing is very slow?

I’ve been messing around with some code to find certain links in an XHTML page in Java. I started with XPath and the page source parsed by the out-of-the-box javax.xml.parsers.DocumentBuilder, but it was painfully slow. Most interestingly, the slow part was not the XPath evaluation but the XHTML parsing.

The page was only 12 kB, yet it took around 2 minutes to parse! It was simply unusable (that’s why the regex from my previous post was born). Once the document was parsed, the XPath itself was evaluated in no time. The cause: even though DocumentBuilderFactory is non-validating by default, the parser still tries to load the external DTD referenced in the page’s DOCTYPE, which for XHTML means downloading the DTD (plus the entity files it references) from w3.org every time a document is parsed. Everything was fixed by switching off validation and DTD loading. So here it is if you need it:

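import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// setFeature() and newDocumentBuilder() throw ParserConfigurationException
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
fac.setNamespaceAware(false);
fac.setValidating(false);
fac.setFeature("http://xml.org/sax/features/namespaces", false);
fac.setFeature("http://xml.org/sax/features/validation", false);
// Xerces-specific (accepted by the JDK's bundled parser): don't load or fetch the DTD
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = fac.newDocumentBuilder();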

Now use this builder to parse XML documents without validation (and in no time).
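The apache.org feature names only work with Xerces-based parsers such as the one bundled with the JDK; other implementations may reject them with a ParserConfigurationException. A more portable sketch (not the code from this post) keeps the Xerces feature where it is accepted and, as a fallback, installs a standard SAX EntityResolver that answers every external DTD request with an empty document, so nothing is fetched over the network. The class name NoDtdParsing, the titleOf helper and the sample page are hypothetical, for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: a sketch, not a drop-in library.
public class NoDtdParsing {

    // Builds a non-validating DocumentBuilder that never downloads external DTDs.
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
        fac.setNamespaceAware(false);
        fac.setValidating(false);
        try {
            // Xerces-specific feature; non-Xerces parsers may reject it
            fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException e) {
            // Feature not supported here: rely on the EntityResolver below instead
        }
        DocumentBuilder builder = fac.newDocumentBuilder();
        // Answer every external entity/DTD request with an empty document,
        // so nothing is fetched even if the feature above was not applied.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder;
    }

    // Hypothetical example: parse an XHTML string and return its <title> text.
    static String titleOf(String xhtml) throws Exception {
        Document doc = newBuilder().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String page = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Hello</title></head>"
                + "<body><a href=\"/next\">next</a></body></html>";
        System.out.println(titleOf(page)); // prints "Hello"
    }
}
```

The resolver relies only on the standard org.xml.sax.EntityResolver interface, so it does not depend on vendor-specific feature names; the Xerces feature is kept simply because, where it is supported, the DTD is skipped before the resolver is even consulted.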


12 Responses to Java DocumentBuilder: xml parsing is very slow?

  1. Fender says:

    Thanks for your article. I had the same problem and this solved it!

  2. Doesn't work says:

    Applied your six-line change to our painfully slow Android app and… BAM! Instant load times. My initial thought was “you win teh Internets for today”, but after more digging around I discovered that it didn’t download the content and therefore processed absolutely nothing. Processing nothing of course equates to a super-fast but completely useless application. Removing the ‘apache.org’ lines restored the data but also brought load times back to pretty much what they were. It seems slightly faster, but not by enough to matter. I’m pretty sure a lot of it has to do with the network being slow, even for a paltry couple of KB of data.

    • Marek Piechut says:

      Hi. I was using it in a desktop app (the NetBeans Platform, to be more precise) and it worked just fine. But I downloaded all the content before parsing it and then worked on local data:

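                  // FileUtil is org.openide.filesystems.FileUtil from the NetBeans Platform
                  ByteArrayOutputStream os = new ByteArrayOutputStream();
                  FileUtil.copy(inputStream, os);
                  // decode explicitly (assumes UTF-8); new String(byte[]) uses the platform default charset
                  xml = new String(os.toByteArray(), StandardCharsets.UTF_8);
                  // whitespace before the <?xml ...?> declaration is a parse error
                  xml = xml.trim();
                  Document document = builder.parse(new InputSource(new StringReader(xml)));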

      Give it a try; maybe it will solve your performance issues.

  3. Anonymous says:

    Really great article, thanks a lot!

  4. Anonymous says:

    Didn’t work for me

    javax.xml.parsers.ParserConfigurationException: http://apache.org/xml/features/nonvalidating/load-dtd-grammar

  5. Anonymous says:

    It did its job. Thanks for the concise and straightforward article.

  6. Pingback: JavaPins

  7. Laura says:

    Have you tried Expresso XML Parser? It’s a high-performance parser that has a graphical interface. You log onto a website and parse your XML file online, so you can create parsing rules without writing code. Then you link up to your existing project using their client code. At the moment they have client code in Java and JavaScript. They have a free developer version at http://www.sxml.com.au

  8. Anonymous says:

    great! thanks a lot

  9. Anonymous says:

    really great! thanks a lot

  10. Anonymous says:

    Thanks a lot! I solved my problem
