[FFmpeg-devel] [PATCH v4 1/4] avutil/{avstring, bprint}: add XML escaping from ffprobe to avutil

Nicolas George george at nsup.org
Mon Jan 25 14:47:30 EET 2021


Jan Ekström (12021-01-25):
> For now I kept them separate since one is just moving Stefano's code,
> and another adds new functionality that you requested.

Stefano's code was good enough for ffprobe, because it had to perform a
very precise task in a very constrained context, but as is, it is not
good enough for lavu, it needs to be adapted.

> I really have no bigger bone with either naming, I just utilized:
> 1. How the XML spec calls the thing ("AttValue" thus becoming
> ATT_VALUE, "CharData" becoming CHAR_DATA). I slightly would prefer to
> keep this due to this making it easy to point towards what is actually
> meant in the XML spec.

CharData is only used a few times in the abstract syntax of the spec, it
will have meaning only for people who are very familiar with it.

> 2. Single and double quotes are the wording that I am more acquainted
> with, but I have no hard opinion so if you want _QUOT/_APOS, sure.

I am fine with "double_quotes" and "single_quotes" or "dquot" and
"squot" too.

> > Outside attributes, " and ' do not need to be quoted.
> 
> I think this is where I'm slightly going out of my comfort zone. This
> is already existing code, and I just moved it out of ffprobe (where it
> was already successfully merged and existed for a while). Additionally
> the expression for char_data does at least reference single quotes. I
> do see the paragraph at the end of 2.4 that notes

Let us take a few steps back.

The spec says:

- < needs to be always escaped;

- ' needs to be escaped when between ';

- " needs to be escaped when between ".

Common sens and good practice say:

- characters that do not need to be escaped should not be escaped.

(With XML, we make an exception for that last rule for >.)

Therefore, we need three modes:

- one mode to use between ';

- one mode to use between ";

- one mode to use outside.

> > To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "'", and the double-quote character (") as """."
> 
> Of course, which seems specific to attribute values.

> But at this point I'm 50/50 if it makes even sense for me to continue
> this struggle or if this is an unclear message from you to me that I
> should just utilize libxml2, which is already utilized in the DASH
> demuxer. Mostly because there I at least can relatively surely know
> that (probably) not only escaping, but also removal of invalid UTF-8
> might be handled by it (which we currently do not do either with JSON
> or XML output from ffprobe as far as I know). That way also I don't
> have to attempt to be a nice person and attempt to re-utilize existing
> code from ffprobe, nor do I have to touch ffprobe.

libxml2 has a terrible track records in terms of stability, including
security. We should strive to use it less, not more.

For parsers, we currently have no choice, because parsing XML is tricky.
I still intend to finish the limited XML parser that I started
implementing a few years ago, though.

On the other hand, writing good XML is quite easy, and that is what we
should be doing. Your code is almost ready, it just needs a few tweaks.

Regards,

-- 
  Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20210125/0bed61a9/attachment.sig>


More information about the ffmpeg-devel mailing list