Refactor: libcrmcommon: Drop utf8_bytes()
39b6baf4ab89

Description

A lot of the complexity of utf8_bytes() was for dealing with the fact
that the C standard doesn't fix the size of a byte: CHAR_BIT must be at
least 8 but may be larger. UTF-8 code units are 8 bits wide.
utf8_bytes() counted the 8-bit units in a UTF-8 character and then
converted that count to a number of C bytes.

The previous commit requires an 8-bit char at build time. Now we can use
g_utf8_next_char() to get a pointer to the next UTF-8 character in a
string. Determining the number of bytes to skip is implemented more
efficiently there (by indexing into an array), and this avoids
reinventing the wheel and adding clutter.
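
As a rough illustration of the table-lookup idea, here is a minimal
standalone sketch. It is not GLib's implementation and not Pacemaker
code: the sketch_* names are hypothetical, the table is indexed by the
top four bits of the lead byte (an assumption made for brevity), and
invalid lead bytes are simply given a length so that the loop always
advances.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: char is exactly 8 bits, matching the build-time check */
_Static_assert(CHAR_BIT == 8, "8-bit char required");

/* Hypothetical helper: byte length of the UTF-8 sequence whose lead
 * byte is b, found by indexing a table with the byte's top four bits.
 * Continuation bytes (10xxxxxx) are not valid lead bytes; they map to
 * 1 here so that a caller always makes progress.
 */
static size_t
sketch_utf8_len(unsigned char b)
{
    static const unsigned char len_by_high_nibble[16] = {
        1, 1, 1, 1, 1, 1, 1, 1, /* 0xxxxxxx: ASCII */
        1, 1, 1, 1,             /* 10xxxxxx: continuation byte */
        2, 2,                   /* 110xxxxx: two-byte sequence */
        3,                      /* 1110xxxx: three-byte sequence */
        4                       /* 1111xxxx: treated as four bytes */
    };

    return len_by_high_nibble[b >> 4];
}

/* Hypothetical helper: pointer to the next character in s. Unlike a
 * pure table lookup, this stops at a NUL so that a truncated sequence
 * cannot carry the pointer past the end of the string.
 */
static const char *
sketch_utf8_next(const char *s)
{
    size_t len = sketch_utf8_len((unsigned char) *s);

    while ((len > 0) && (*s != '\0')) {
        s++;
        len--;
    }
    return s;
}

/* Hypothetical helper: number of characters in a NUL-terminated string */
static size_t
sketch_utf8_count(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        s = sketch_utf8_next(s);
        n++;
    }
    return n;
}
```

With an 8-bit char, the lookup result is already a count of C bytes, so
no further conversion step is needed.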

Note: the GLib docs recommend calling g_utf8_validate() on the string
before calling g_utf8_next_char(). However, all GLib functions assume
that strings are encoded as UTF-8 [1]. I suspect most of Pacemaker and
parts of libxml2 would fall apart if we received non-UTF-8 strings. Our
escape-XML functions don't seem like the place to start validating the
encoding.

[1] https://docs.gtk.org/glib/programming.html#utf-8-and-string-encoding

Closes T801

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>