[FFmpeg-devel] [PATCH 1/4] avcodec/strdec: factor out HTML parsing code

Clément Bœsch u at pkh.me
Sat Aug 15 20:08:13 CEST 2015


On Sat, Aug 08, 2015 at 12:52:04PM -0700, Yayoi Ukai wrote:
[...]
> >> -    while (dst->len >= 2 && !strncmp(&dst->str[dst->len - 2], "\\N", 2))
> >> -        dst->len -= 2;
> >> -    dst->str[dst->len] = 0;
> >> -    rstrip_spaces_buf(dst);
> >
> > why did you completely remove this chunk?
> 
> It appeared to me that it didn't do anything even in original code.
> But I can put it back.
> (It didn't make any difference in fate test or other simple test
> whether I commented it out or not.)
> 

This commit is supposed to be factoring out the code, not do any
functional change. If you want to do such changes it belongs in a separate
commit (but you probably don't want to because it's likely wrong).

Anyway, I had to have a look to the diff again to check if you hadn't
added more unwanted changes. And unfortunately, you did. So first of all,
this following diff I extracted belongs in a separated commit (typically
squashed in the one where you make SAMI use it, or eventually just
before).

Here is a review of that part assuming it is extracted/moved:

diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
index a138955..b2f2273 100644
--- a/libavcodec/htmlsubtitles.c
+++ b/libavcodec/htmlsubtitles.c
@@ -71,6 +71,21 @@ void ff_htmlmarkup_to_ass(AVCodecContext *avctx, AVBPrint *dst, const char *in)
                 end = 1;
                 break;
             }
+
+            /* check if it is end of the paragraph or not*/
+            in++;
+            count = 1;
+            while(*in == ' ') {
+                in++;
+                count++;
+            }
+            if (*in == '\0' || *in == '\n'){
+                in = in - count;
+                break;
+            }
+            in = in - count;
+
+            /*if not the end of the paragraph, add line break */

Why? What are you trying to do here that isn't already handled by the code
already?

There is no concept of "paragraph" in SubRip markup, so it probably
doesn't belong here, assuming it's necessary.

             rstrip_spaces_buf(dst);
             av_bprintf(dst, "\\N");
             line_start = 1;
@@ -90,6 +105,15 @@ void ff_htmlmarkup_to_ass(AVCodecContext *avctx, AVBPrint *dst, const char *in)
                 av_bprint_chars(dst, *in, 1);
             break;
         case '<':
+            if (!av_strncasecmp(in, "<BR", 3)){

here and below, check your style

+                av_bprintf(dst, "\\N");
+                len = 3;
+                while (in[len] != '>' && (av_isspace(in[len]) || in[len] == '/')){

if in[len] is a space or a '/', it's obviously different than '>', so the
condition is redundant.

+                        len++;

wrong indent

+                }

+                in += len + 1;

this +1 is very dangerous, there is a risk of overread.

+            }
+
             tag_close = in[1] == '/';
             len = 0;
             if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) {

[...]

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150815/31c56557/attachment.sig>


More information about the ffmpeg-devel mailing list