[FFmpeg-devel] [PATCH] delogo filter: new "uglarm" interpolation mode added

Wed Jan 2 23:08:55 EET 2019

Hi,

just an info: code didn't compile, don't know why I didn't see this... 
New patch is in work...

Regards, Uwe

Am 01.01.19 um 22:14 schrieb Uwe Freese:
> Hello,
>
> here's a new version of the patch.
>
> Thanks for the infos. I used the raw output of a small test video 
> (where delogo is applied in both modes) before and after the changes 
> to make sure the output is bytewise identical (the changes don't 
> change the output).
>
>
> In general I want to say that it seems to me that *some* of the points 
> go now in a more philosophical direction.
>
> I prefer code easy to understand and assume that compilers optimize 
> much. Yes, there may be many possibilities to probably "optimize" 
> something, but I tend to believe that many times this is not needed 
> nor helpful, because it doesn't really optimize and complicates the code.
>
> Additionally, when I don't have complete and functioning code to 
> replace my code (the same what's expected from me), I'm not willing to 
> spend many hours to guess what could be meant.
>
> I hope that not so many additional patches are needed anymore. I 
> understand that coding styles, syntax etc. shall be corrected, but 
> hope that we don't have to discuss too much alternative ways of 
> implementing basically the same.
>
>
>
> Changes since the last version of the patch (mail from 29.12. 21:38), 
> according the comments since then:
>
> - change include order at the beginning of the file
> - change loop format (linebreaks)
> - change loop borders
> - change indentation (line 175)
> - moved init code (create tables) to config_input function
>
> - used av_malloc_array instead of av_malloc. Avoid the cast. Use 
> variable in sizeof(...).
> - copyright 2018, 2019 - happy new year BTW ;-)
>
>
> Comments and remaining open points from the mails:
>
>>> +        for (y = logo_y1 + 1; y < logo_y2; y++) {
>>> +            for (x = logo_x1 + 1,
>>> +                xdst = dst + logo_x1 + 1,
>>> +                xsrc = src + logo_x1 + 1; x < logo_x2; x++, xdst++, 
>>> xsrc++) {
>> I think a line break after the semicolons would make this easier to
>> understand.
>
> Hm. I used the same format as in the original code.
>
> Nonetheless, I changed the format now, because I changed the loop 
> borders as requested and the loops are now different anyway.
>
>>> +                for (bx = 0; bx < logo_w; bx++) {
>>> +                    interpd += topleft[bx] *
>>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (y - 
>>> logo_y1) * (logo_w - 1)];
>>> +                    interpd += botleft[bx] *
>>> +                        uglarmtable[abs(bx - (x - logo_x1)) + 
>>> (logo_h - (y - logo_y1) - 1) * (logo_w - 1)];
>>> +                }
>> This can be optimized, and since it is the inner loop of the filter, I
>> think it is worth it. You can declare a pointer that will stay the same
>> for the whole y loop:
>>
>>     double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
>>
>> and then use it in that loop:
>>
>>     interpd += ... * xweight[abs(bx - (x - logo_x1))];
>>
>> It avoids a lot of multiplications.
>
> I'm not sure about this point. First, I would absolutely assume from 
> the compiler that it optimizes expressions like "(logo_w - 1)" or a 
> fixed offset to access the array in the loop and that they are only 
> calculated once.
>
> Secondly, I don't exactly understand how *xweight in your example 
> should be used (and would mean smaller or easier code).
>
> @All: What is the opinion about changing this loop regarding use of an 
> additional xweight pointer?
>
>>
>>> +
>>> +                for (by = 1; by < logo_h - 1; by++) {
>>> +                    interpd += topleft[by * src_linesize] *
>>> +                        uglarmtable[(x - logo_x1) + abs(by - (y - 
>>> logo_y1)) * (logo_w - 1)];
>>> +                    interpd += topleft[by * src_linesize + (logo_w 
>>> - 1)] *
>>> +                        uglarmtable[logo_w - (x - logo_x1) - 1 + 
>>> abs(by - (y - logo_y1)) * (logo_w - 1)];
>>> +                }
>> This one is more tricky to optimize, because of the abs() within the
>> multiplication, but it can be done by splitting the loop in two:
>>
>>     for (by = 1; by < y; by++) {
>>         interpd += ... * yweight[x - logo_x1];
>>         yweight -= logo_w_minus_one;
>>     }
>>     for (; by < logo_h_minus_one; by++) {
>>         interpd += ... * yweight[x - logo_x1];
>>         yweight += logo_w_minus_one;
>>     }
>>     av_assert2(yweight == the_correct_value);
>>
>> In fact, I think even the x loop would be a little more readable if it
>> was split in two like that.
> Sorry, I don't understand. Please give me a complete example that 
> replaces the previous for loop.
>>> +
>>> +                interp = (uint64_t)(interpd /
>>> +                    uglarmweightsum[(x - logo_x1) - 1 + (y - 
>>> logo_y1 - 1) * (logo_w - 2)]);
>> The cast to uint64_t seems suspicious. Can you explain?
>
> Every pixel value of the inner logo area is an integer. Only for the 
> calculation of the weighted average, all values use floats. At the 
> end, it is rounded (truncated) to an int.
>
> Should work - and did work for 17 years...
>
>>
>>> +                *xdst = interp;
>>> +            }
>>> +
>>> +            dst += dst_linesize;
>>> +            src += src_linesize;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * Calculate the lookup tables to be used in UGLARM interpolation 
>>> mode.
>>> + *
>>> + * @param *uglarmtable      Pointer to table containing weights for 
>>> each possible
>>> + *                          diagonal distance between a border 
>>> pixel and an inner
>>> + *                          logo pixel.
>>> + * @param *uglarmweightsum  Pointer to a table containing the 
>>> weight sum to divide
>>> + *                          by for each pixel within the logo area.
>>> + * @param sar               The sar to take into account when 
>>> calculating lookup
>>> + *                          tables.
>>> + * @param logo_w            width of the logo
>>> + * @param logo_h            height of the logo
>>> + * @param exponent          exponent used in uglarm interpolation
>>> + */
>>> +static void calc_uglarm_tables(double *uglarmtable, double 
>>> *uglarmweightsum,
>>> +                               AVRational sar, int logo_w, int logo_h,
>>> +                               float exponent)
>>> +{
>>> +    double aspect = (double)sar.num / sar.den;
>> Tiny optimization:
>>
>>     double aspect2 = aspect * aspect;
>>
>> for use later.
>
> I wouldn't consider this an optimization. First, I assume the compiler 
> to only calculate "aspect * aspect" once in that loop (as well as "y * 
> y" which doesn't change in the inner loop). Second, the code with 
> using the additional variable makes the code more complex and third, 
> this loop is only calculated once at startup.
>
> @All: What are your opinions?
>
>>
>>> +                double d = pow(sqrt(x * x * aspect * aspect + y * 
>>> y), exponent);
>>     pow(x * x * aspect2 + y * y, exponent / 2);
>
> Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't 
> do this normally.
> Reasons: Again, I think the compiler would optimize it. At least the 
> compiled ffmpeg binaries are exactly the same size... And the original 
> code explains better the calculation: Calculate the distance 
> (*Pythagorean theorem, c = sqrt(a² + b²), and then calculate the 
> weighted distance value with the power. Again, this is only run once 
> at startup
> *
>
> *@All: Other opinions?
> *
>
>>
>>> +#define MAX_PLANES 10
>> You could use FF_ARRAY_ELEMS(AVPixFmtDescriptor.comp), and the
>> consistency would be guaranteed. Well, that syntax is not valid, but it
>> can be expressed:
>>
>> #define MAX_PLANES FF_ARRAY_ELEMS(((AVPixFmtDescriptor *)0)->comp)
>>
>> But I have a better suggestion:
>>
>> #define MAX_SUB 2
>>
>>     double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
>>
>> and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
>> table will be computed only once for Y and A and once for U and V.
>>
>> The assert is still needed with that solution, though.
>
> I don't understand this. Please provide a complete example, or it 
> stays as it is. - and could of course be optimized later.
>
>>> +    if (s->mode == MODE_UGLARM) {
>> No need to test, we can free anyway.
>
> But why should the for loop run in xy mode and check "planes count" 
> times to free the memory? The code without the "if" also looks to me 
> more like this is not checked by mistake.
>
> @All: Opinions?
>
>
>> [Carl Eugen] While there, please don't use sizeof(type) but 
>> sizeof(variable),
> same below.
>
> Hope that the code is correct like this?
>
> s->uglarmtable[plane] =
>                 av_malloc_array((logo_w - 1) * (logo_h - 1), 
> sizeof(s->uglarmtable[plane]));
>
> Regards,
> Uwe
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel