PHP: How to handle <![CDATA[ with SimpleXMLElement?

I noticed that when I use SimpleXMLElement on a document that contains CDATA sections, the content always comes back as NULL. How do I fix this?

Also, sorry for spamming about XML here. I have been trying to get an XML-based script to work for several hours now...

<content><![CDATA[Hello, world!]]></content>

I tried the first Google result for "SimpleXMLElement cdata", but that didn't work.

Angelo

Posted 2010-06-03T23:48:39.177

Reputation: 462

How are you trying to access the node value? And, is SimpleXML a requirement? – allnightgrocery – 2010-06-03T23:58:17.950

I tried every other function I could find on the web (xml2array and so on), and SimpleXML seems to be the only one that gives GOOD results, except that CDATA doesn't work. – Angelo – 2010-06-04T00:02:49.407

We do a lot of XML parsing at work using DOMDocument (http://www.php.net/manual/en/class.domdocument.php). It handles CDATA just fine. Give that a shot, or post a little more code so we can see how you're working with SimpleXML.

– allnightgrocery – 2010-06-04T00:51:43.250
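For comparison with the DOMDocument suggestion in this comment, here is a minimal sketch (using the question's sample markup) that reads the CDATA text through DOM. DOMNode::textContent concatenates the text of all descendant text and CDATA nodes, so no extra parser option should be needed:

```php
<?php
// Parse the question's sample document with DOM instead of SimpleXML.
$doc = new DOMDocument();
$doc->loadXML('<content><![CDATA[Hello, world!]]></content>');

// textContent includes the CDATA section's text.
$text = $doc->documentElement->textContent;
echo $text, PHP_EOL; // Hello, world!
```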

Answers

You're probably not accessing it correctly. You can output it directly or cast it to a string. (In this example the cast is superfluous, since echo does it automatically anyway.)

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
);
echo (string) $content;

// or with parent element:

$foo = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);
echo (string) $foo->content;
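A self-contained version of the parent-element case, with the result captured in a variable. As the comments below point out, a print_r() or var_dump() of $foo may make <content> look empty, but the string cast still returns the CDATA text:

```php
<?php
// Sample markup from the question, wrapped in a parent element.
$foo = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);

// Debug dumps may not list the CDATA text, but the string cast
// reads the element's full text content, CDATA included.
$text = (string) $foo->content;
echo $text, PHP_EOL; // Hello, world!
```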

You might have better luck with LIBXML_NOCDATA:

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
    , null
    , LIBXML_NOCDATA
);
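To make the difference between the two approaches concrete, here is a sketch (sample markup from the question) that parses the same string with and without LIBXML_NOCDATA. Proper string access gives the same result either way; only a conversion that goes through the object's property list, such as json_encode() here, misses the CDATA text in the default parse:

```php
<?php
$raw = '<foo><content><![CDATA[Hello, world!]]></content></foo>';

// Default parse: the CDATA section stays a CDATA node.
$withCdata = simplexml_load_string($raw);

// LIBXML_NOCDATA: libxml merges the CDATA section into an ordinary text node.
$merged = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOCDATA);

// Accessed as strings, both documents give the same text.
echo (string) $withCdata->content, PHP_EOL;
echo (string) $merged->content, PHP_EOL;

// Property-based conversions (json_encode, (array) casts, print_r)
// only pick up the text in the LIBXML_NOCDATA version.
echo json_encode($withCdata), PHP_EOL;
echo json_encode($merged), PHP_EOL;
```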

Josh Davis

Reputation: 22 398

LIBXML_NOCDATA should be the second parameter, not the third! Otherwise it works fine, +1. – Pavel S. – 2013-08-20T13:33:45.280
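Whether LIBXML_NOCDATA belongs in the second or third position depends on the API. The SimpleXMLElement constructor takes the libxml options as its second argument, while simplexml_load_string() and simplexml_load_file() take a class name second and the options third, which is the form the answer uses. A short sketch calling both:

```php
<?php
$data = '<content><![CDATA[Hello, world!]]></content>';

// SimpleXMLElement constructor: libxml options are the SECOND argument.
$fromConstructor = new SimpleXMLElement($data, LIBXML_NOCDATA);

// simplexml_load_string(): the class name comes second,
// so libxml options are the THIRD argument.
$fromFunction = simplexml_load_string($data, SimpleXMLElement::class, LIBXML_NOCDATA);

echo (string) $fromConstructor, PHP_EOL; // Hello, world!
echo (string) $fromFunction, PHP_EOL;    // Hello, world!
```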

No, PHP skips CDATA completely for some reason.

Any other ideas? – Angelo – 2010-06-04T00:24:08.810

Then it's a bug. Upgrade PHP/libxml until it works (I've never had any problems with CDATA and SimpleXML). You may want to try your luck with LIBXML_NOCDATA otherwise. – Josh Davis – 2010-06-04T01:56:08.683

Right on. Without the LIBXML_NOCDATA, the XML comes in as false - regardless of how it's done. I was able to prove that out with both creation methods...

$x = new SimpleXMLElement('<content><![CDATA[Hello, world!]]></content>', LIBXML_NOCDATA);

$y = simplexml_load_string('<content><![CDATA[Hello, world!]]></content>', "SimpleXMLElement", LIBXML_NOCDATA);

print_r($y);

Without that option, they're both null.

Just wanted to back up your assertion. – allnightgrocery – 2010-06-04T02:09:57.897

I know this is an old answer, but I would like to stress that the first part of this answer is correct. When you print the result with print_r, you are indeed not accessing it correctly. Write the code you actually want, probably with echo or a (string) cast, and you will find the content is fine. Do not use LIBXML_NOCDATA; it is irrelevant. – IMSoP – 2014-05-05T01:26:52.483

While debugging an application, var_dump'ing a SimpleXMLElement that contains CDATA sections doesn't show the nodes' content. But var_dump'ing this did the job: simplexml_load_string($simplexml->asXML(), null, LIBXML_NOCDATA) – Gras Double – 2014-05-06T00:17:36.363
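The trick in this comment can be wrapped in a small debugging helper. debugCopy() is a hypothetical name, not part of SimpleXML; it serializes the element with asXML() and re-parses it with LIBXML_NOCDATA purely so that print_r()/var_dump() show the text. The original object already holds the CDATA content, so the copy is only meant for inspection:

```php
<?php
// Hypothetical debugging helper: re-parse with LIBXML_NOCDATA so that
// dumps show CDATA content. Use the original object for real work.
function debugCopy(SimpleXMLElement $sxe): SimpleXMLElement
{
    return simplexml_load_string($sxe->asXML(), SimpleXMLElement::class, LIBXML_NOCDATA);
}

$doc = simplexml_load_string('<foo><content><![CDATA[Hello, world!]]></content></foo>');

// The dump of the copy lists <content> with its text.
print_r(debugCopy($doc));
```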

@IMSoP Adding LIBXML_NOCDATA (and changing nothing else) works, so I'm not so sure it is irrelevant. – rand – 2015-02-06T10:55:15.650

@SimonePalazzo Adding LIBXML_NOCDATA fixes print_r and var_dump output, yes. It does not fix any code you should actually be using in production, because whenever you actually try to use that string, you'll find that the CDATA was there all along. – IMSoP – 2015-02-06T11:03:19.487

@IMSoP Well, then it's not working for me :) I'm using simplexml_load_string + convert object to array + edit array + convert back to xml, and without LIBXML_NOCDATA it does not work, i.e. the corresponding field is empty (don't know if null or empty string). – rand – 2015-02-06T11:20:40.053

@SimonePalazzo Your mistake is converting the SimpleXML object to an array; that's not what SimpleXML is designed for. You should be using foreach, ->element, ['attribute'], etc. on the SimpleXML object itself. See: http://php.net/manual/en/simplexml.examples-basic.php Alternatively, you should be using a different parser that produces an array better suited to your needs, or the DOM interface, which has better editing functions.

– IMSoP – 2015-02-06T12:03:09.420
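A minimal sketch of the approach described here, reading values by iterating the SimpleXML object itself rather than converting it to an array. The <items>/<item>/<title> markup is made up for illustration:

```php
<?php
// Made-up sample document whose titles are CDATA sections.
$xml = simplexml_load_string(
    '<items>'
    . '<item><title><![CDATA[First & best]]></title></item>'
    . '<item><title><![CDATA[Second]]></title></item>'
    . '</items>'
);

$titles = [];
// Iterate the SimpleXML object directly instead of converting it to an array.
foreach ($xml->item as $item) {
    // The string cast returns the CDATA text; LIBXML_NOCDATA is not needed.
    $titles[] = (string) $item->title;
}

print_r($titles);
```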

@IMSoP I see... but why does LIBXML_NOCDATA help then? – rand – 2015-02-06T16:54:33.013

@SimonePalazzo XML consists of various different "nodes", e.g. <anElement>a text node <aChildElement /> <![CDATA[a cdata node]]> another text node</anElement>. The CDATA and text nodes are different types, and SimpleXML tracks this so you can get back the XML you put in. When you squeeze a SimpleXML object into an array, it throws away a lot of information: CDATA nodes, comments, any element not in the current namespace (e.g. <someNSPrefix:someElement />), the position of the child element in the text, etc. LIBXML_NOCDATA converts CDATA nodes into text nodes, but doesn't fix the rest. – IMSoP – 2015-02-07T15:54:18.340
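The node-type distinction can be observed through DOM. The sketch below uses a hypothetical helper, childNodeKinds(), to list the child node types of the example element from this comment, once with the default options and once with LIBXML_NOCDATA; in the second case no CDATA node should remain:

```php
<?php
// Hypothetical helper: name the type of each child node of the root element.
function childNodeKinds(string $xml, int $options = 0): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml, $options);

    $kinds = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node->nodeType === XML_CDATA_SECTION_NODE) {
            $kinds[] = 'cdata';
        } elseif ($node->nodeType === XML_TEXT_NODE) {
            $kinds[] = 'text';
        } else {
            $kinds[] = 'other';
        }
    }
    return $kinds;
}

$xml = '<anElement>a text node <![CDATA[a cdata node]]> another text node</anElement>';

print_r(childNodeKinds($xml));                 // text, cdata, text
print_r(childNodeKinds($xml, LIBXML_NOCDATA)); // no cdata entry left
```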

For full reference: Predefined Constants

– Marcio Mazzucato – 2017-06-08T16:01:43.073

LIBXML_NOCDATA can be passed in the optional third parameter ($options) of the simplexml_load_file() function. It makes libxml merge CDATA sections into ordinary text nodes, so the returned XML object shows their content as plain strings.

$xml = simplexml_load_file($this->filename, 'SimpleXMLElement', LIBXML_NOCDATA);
echo "<pre>";
print_r($xml);
echo "</pre>";
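A self-contained variant of this answer's code, assuming a throwaway file written to the system temp directory instead of $this->filename (the file contents are made up):

```php
<?php
// Write a small sample document to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'cdata');
file_put_contents($path, '<feed><title><![CDATA[Hello, world!]]></title></feed>');

// Same call shape as the answer: class name second, libxml options third.
$xml = simplexml_load_file($path, 'SimpleXMLElement', LIBXML_NOCDATA);
unlink($path);

echo "<pre>";
print_r($xml);
echo "</pre>";
```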


Fix CDATA in SimpleXML

Pradip Kharbuja

Reputation: 1 849

LIBXML_NOCDATA is what made this work for me. PHP 5.3.5 – Mike_K – 2017-04-21T19:27:57.640

Your answer is the one that explains what LIBXML_NOCDATA means, thanks! – Marcio Mazzucato – 2017-06-08T15:38:46.100

This did the trick for me:

echo trim($entry->title);
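This works because trim() expects a string, so PHP converts the SimpleXMLElement to its text content, just as a (string) cast would, and then strips surrounding whitespace. A sketch with invented sample markup showing the difference between the plain cast and trim():

```php
<?php
// Invented sample entry whose CDATA title has surrounding spaces.
$entry = simplexml_load_string('<entry><title><![CDATA[  Hello, world!  ]]></title></entry>');

// Plain cast: the full CDATA text, whitespace included.
$raw = (string) $entry->title;

// trim(): the element is converted to a string, then trimmed.
$clean = trim($entry->title);

var_dump($raw, $clean);
```

Note that trim() also removes any leading or trailing whitespace the CDATA section contains on purpose. In a file with declare(strict_types=1), the explicit form trim((string) $entry->title) is the safer choice.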

breez

Reputation: 1 109

Perfect if you need to keep the cdata (without LIBXML_NOCDATA) – maztch – 2013-05-10T17:46:38.230

This works perfectly for me.

$content = simplexml_load_string(
    $raw_xml
    , null
    , LIBXML_NOCDATA
);
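When the result is unexpectedly empty, it can help to rule out a parse failure first: simplexml_load_string() returns false if the document is not well-formed. A sketch that adds that check, with $raw_xml filled with sample markup for illustration and libxml errors collected through libxml_use_internal_errors():

```php
<?php
// Sample input standing in for the real document.
$raw_xml = '<content><![CDATA[Hello, world!]]></content>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$content = simplexml_load_string(
    $raw_xml
    , 'SimpleXMLElement'
    , LIBXML_NOCDATA
);

if ($content === false) {
    // Parsing failed: report what libxml complained about.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();
} else {
    echo (string) $content, PHP_EOL; // Hello, world!
}
```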

vijayrana

Reputation: 418