[FFmpeg-devel] [PATCH 2/4] avcodec/samidec: use ff_htmlmarkup_to_ass()

Yayoi Ukai yayoi.ukai at gmail.com
Sun Aug 9 06:24:03 CEST 2015


On Sat, Aug 8, 2015 at 8:23 AM, Clément Bœsch <u at pkh.me> wrote:
> On Fri, Aug 07, 2015 at 11:03:29PM -0700, Yayoi wrote:
>> ---
>>  libavcodec/samidec.c    | 59 +++++++++++++++++++++++++++++--------------------
>>  tests/ref/fate/sub-sami | 18 +++++++--------
>>  2 files changed, 44 insertions(+), 33 deletions(-)
>>
>> diff --git a/libavcodec/samidec.c b/libavcodec/samidec.c
>> index 47850e2..41826a7 100644
>> --- a/libavcodec/samidec.c
>> +++ b/libavcodec/samidec.c
>> @@ -27,6 +27,7 @@
>>  #include "ass.h"
>>  #include "libavutil/avstring.h"
>>  #include "libavutil/bprint.h"
>> +#include "htmlsubtitles.h"
>>
>>  typedef struct {
>>      AVBPrint source;
>> @@ -41,11 +42,12 @@ static int sami_paragraph_to_ass(AVCodecContext *avctx, const char *src)
>>      char *tag = NULL;
>>      char *dupsrc = av_strdup(src);
>>      char *p = dupsrc;
>> +    char *pcopy = NULL;
>> +    int index = 0;
>> +    int second_paragraph = 0;
>>
>> -    av_bprint_clear(&sami->content);
>>      for (;;) {
>>          char *saveptr = NULL;
>> -        int prev_chr_is_space = 0;
>>          AVBPrint *dst = &sami->content;
>>
>>          /* parse & extract paragraph tag */
>> @@ -77,37 +79,46 @@ static int sami_paragraph_to_ass(AVCodecContext *avctx, const char *src)
>>              goto end;
>>          }
>>
>
>> -        /* extract the text, stripping most of the tags */
>> +        /* check for the second paragrph */
>
> Why change the comment? What does "check" mean here? What is the "second
> paragraph"?


I answer it below with the other questions you have because you are
basically asking the same things.
And I will document better too.

>
>> +        pcopy = av_strdup(p);
>>          while (*p) {
>>              if (*p == '<') {
>> -                if (!av_strncasecmp(p, "<P", 2) && (p[2] == '>' || av_isspace(p[2])))
>> +                if (!av_strncasecmp(p, "<p", 2) && (p[2] == '>' || av_isspace(p[2]))) {
>> +                    second_paragraph = 1;
>>                      break;
>> -                if (!av_strncasecmp(p, "<BR", 3))
>> -                    av_bprintf(dst, "\\N");
>> -                p++;
>> -                while (*p && *p != '>')
>> -                    p++;
>> -                if (!*p)
>> -                    break;
>> -                if (*p == '>')
>> -                    p++;
>> -                continue;
>> +                }
>>              }
>> -            if (!av_isspace(*p))
>> -                av_bprint_chars(dst, *p, 1);
>> -            else if (!prev_chr_is_space)
>> -                av_bprint_chars(dst, ' ', 1);
>> -            prev_chr_is_space = av_isspace(*p);
>>              p++;
>> +            index++;
>> +        }
>> +        p = p - index;
>> +        if (second_paragraph) {
>> +            p[index] = 0;
>>          }
>> -    }
>>
>> -    av_bprint_clear(&sami->full);
>> -    if (sami->source.len)
>> -        av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str);
>> -    av_bprintf(&sami->full, "%s", sami->content.str);
>> +        ff_htmlmarkup_to_ass(avctx, dst, p);
>> +
>> +        /* add the source if there are any. */
>> +        av_bprint_clear(&sami->full);
>> +        if (sami->source.len) {
>> +            av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str);
>> +            av_bprintf(&sami->full, "%s", sami->content.str);
>> +            if (second_paragraph) {
>> +                second_paragraph = 0;
>> +                p = pcopy;
>> +                p += index;
>> +                index = 0;
>> +                continue;
>> +            }
>> +        } else {
>> +            av_bprintf(&sami->full, "%s", sami->content.str);
>> +        }
>> +        av_bprint_clear(&sami->content);
>> +
>> +    }
>>
>
> This looks clumsy at best: you are finalizing the subtitle event inside
> the paragraph loop when it should be outside. It also seems there is a
> duplicating "second paragraph" logic even though the loop is supposed to
> handle one paragraph at a time.

I know. It is clumsy.. I explain it below as well.

>
> If you are uncomfortable with the current logic or believe it's badly
> designed for the logic you're trying to mangle, feel free to rewrite it;
> it's better than trying to inhibit the old behaviour with hacks.

It's not that your code was badly designed to begin with. I feel like
it is the nature
of this format.

So let me explain what was going on your old code and what I had changed.
Here is the example: (This one is from current fate sample.)

<SYNC Start=73000>
      <P Class=ENUSCC ID=Source>End of:
      <P Class=ENUSCC>President John F. Kennedy Speech

So this is one subtitle event and there are two paragraphs. One
paragraph has a new source and
the other paragraph has its content.
Since it is decided that this code uses html
parser(ff_htmlmarkup_to_ass) and use it to parse the contents of the
paragraph. Each paragraph content needs to go to to html parser,
regardless whether it is source or subtitle content.

So I need to tell html_parser, ff_htmlmarkup_to_ass that where the
destination is.. (whether it is source or content).
I can't no longer specify the destination which I was able to do in
your old code. I mean maybe I can change the function header and add
additional flag to tell our parser where to put in the dst but I don't
want to because it may require to change srtdec and I do not want to
do that..

Anyways, so that's the basic idea. I honestly think it is good that
subtitle event handle is also in loop because one loop iteration per
one subtitle event and subtitle event does not guarantee that it
contains only one paragraph to begin with. It was a possibility either
adding additional loop or having goto to honor your original logic but
after I noticed that subtitle event contains more than one paragraph,
I gave up on the idea to keep your logic. The comment says "check
second paragraph" but it should be able to handle multiple paragraphs
more than 2 in one subtitle event. (I definitely tested while I was
working on this part.. I can modify sample or attach test file if that
makes you feel happier. And I make a better comment then.)

So I want to leave it as it is if you do not see major flaws or you do
not have very very strong opinion about how it needs to be modified.
Well, given the information, if you have some ideas, you can tell me
too. Besides, I can't cross out the possibility if you or me or
someone wants to add more enhancement (maybe one day you decided to
add support for img tag? who knows..) , and it may need to change code
structure again. And as it is now, it is still enhancement that works.
;-).

Well, please let me know if you still have concerns and questions etc.
I will fix the rest of comments and misplaced things on you mentioned
in other mails and I will send the updated patches either today or
tomorrow. (or when it is done but hopefully definitely in a week.)

Thank you!
Yayoi

>
> [...]
>
> --
> Clément B.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>


More information about the ffmpeg-devel mailing list