Investigate using xmlEncodeEntitiesReentrant() and deprecating crm_xml_escape()
Closed (Released)Public
Actions

Assigned To

Authored By

	nrwahl2
	Jan 30 2024, 12:09 AM

Description

Or alternatively, get rid of uses of xmlEncodeEntitiesReentrant(). The main point here is that we probably ought to stick with one or the other unless there's a compelling reason to use both. Using libxml2 would be nice if it works.

We currently use xmlEncodeEntitiesReentrant() to escape special characters in LSB and heartbeat stonith metadata. It seems to serve the same purpose as crm_xml_escape(), though the implementations of course differ. In particular, they take different approaches to handling Unicode characters.

Also, crm_xml_escape() escapes more characters. The following are escaped by crm_xml_escape() but not by xmlEncodeEntitiesReentrant():

'\'' (via ampersand)
'\t' (via expansion to spaces)
'\n' (via backslash)
'\r' (escaped via backslash in crm_xml_escape() but via hex in xmlEncodeEntitiesReentrant())
'"': A comment in xmlEncodeEntitiesInternal() says this is escaped, but I don't see it being escaped in the code.
crm_xml_escape() uses octal while xmlEncodeEntitiesReentrant() uses hex for out-of-range characters.

Here's the implementation. xmlEncodeEntitiesReentrant() is a wrapper for xmlEncodeEntitiesInternal().
https://gitlab.gnome.org/GNOME/libxml2/-/blob/master/entities.c#L533

Despite these differences, it's reasonable to think that xmlEncodeEntitiesReentrant() works and produces a valid representation. libxml2 uses it in a couple of places internally, so one would expect something else to break if it didn't work correctly... not guaranteed of course.

Within Pacemaker:

2012-06-05: 4963384: xmlEncodeEntitiesReentrant() was used
2013-01-13: 2723858: crm_xml_escape() was added

There's no comment describing any limitation with xmlEncodeEntitiesReentrant(), so it's unclear whether beekhof was simply unaware of xmlEncodeEntitiesReentrant() or what.

To complicate things further, there's another libxml2 function xmlEncodeSpecialChars() that seems to work similarly. I've posted on the libxml2 Discourse asking about the intended use cases for each:

https://discourse.gnome.org/t/intended-use-of-xmlencodeentitiesreentrant-vs-xmlencodespecialchars/19252

References:

Edit: I'm waiting to hear back from the libxml maintainers on Discourse. However, from looking at the source code and reading more of the XML spec, I can't imagine how the library functions are sufficient... they should be fine for text but not for attribute values, for example.

Another case where "rolling our own" might have actually been the right move.