[FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()

Michael Niedermayer michael at niedermayer.cc
Sun Jan 20 22:14:59 EET 2019


On Sat, Jan 19, 2019 at 12:28:25AM +0100, Marton Balint wrote:
> 
> 
> On Thu, 17 Jan 2019, Michael Niedermayer wrote:
> 
> >On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote:
> >>
> >>
> >>On Tue, 15 Jan 2019, Michael Niedermayer wrote:
> >>
> >>>On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
> >>>>
> >>>>
> >>>>On Fri, 28 Dec 2018, Michael Niedermayer wrote:
> >>>>
> >>>>>On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
> >>>>>>
> >>>>>>
> >>>>>>On Wed, 26 Dec 2018, Paul B Mahol wrote:
> >>>>>>
> >>>>>>>On 12/26/18, Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>>>>>>>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> >>>>>>>>>On 12/25/18, Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>>>>>>>>>Fixes: Timeout
> >>>>>>>>>>Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>>>>>Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
> >>>>>>>>>>After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms
> >>>>>>>>>>
> >>>>>>>>>>Found-by: continuous fuzzing process
> >>>>>>>>>>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>>>>>>>>>Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> >>>>>>>>>>---
> >>>>>>>>>>libavutil/imgutils.c | 6 ++++++
> >>>>>>>>>>1 file changed, 6 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> >>>>>>>>>>index 4938a7ef67..cc38f1e878 100644
> >>>>>>>>>>--- a/libavutil/imgutils.c
> >>>>>>>>>>+++ b/libavutil/imgutils.c
> >>>>>>>>>>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
> >>>>>>>>>>       }
> >>>>>>>>>>   } else if (clear_size == 4) {
> >>>>>>>>>>       uint32_t val = AV_RN32(clear);
> >>>>>>>>>>+        uint64_t val8 = val * 0x100000001ULL;
> >>>>>>>>>>+        for (; dst_size >= 32; dst_size -= 32) {
> >>>>>>>>>>+            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> >>>>>>>>>>+            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> >>>>>>>>>>+            dst += 32;
> >>>>>>>>>>+        }
> >>>>>>>>>>       for (; dst_size >= 4; dst_size -= 4) {
> >>>>>>>>>>           AV_WN32(dst, val);
> >>>>>>>>>>           dst += 4;
> >>>>>>>>>>--
> >>>>>>>>>>2.20.1
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>NAK, implement special memset function instead.
> >>>>>>>>
> >>>>>>>>I can move the added loop into a separate function, if that's what you
> >>>>>>>>suggest?
> >>>>>>>
> >>>>>>>No, don't do that.
> >>>>>>>
> >>>>>>>>All the code is already in a "special" memset though, this is
> >>>>>>>>memset_bytes()
> >>>>>>>>
> >>>>>>>
> >>>>>>>I guess the function is less useful if it's static. So any duplication
> >>>>>>>should be avoided in the codebase.
> >>>>>>
> >>>>>>Doesn't av_memcpy_backptr do almost exactly what is needed here? That can
> >>>>>>also be optimized further if needed.
> >>>>>
> >>>>>av_memcpy_backptr() copies data with overlap; it's more like a recursive
> >>>>>memmove().
> >>>>
> >>>>So? As far as I see the memset_bytes function in imgutils.c can be replaced
> >>>>with this:
> >>>>
> >>>>   if (clear_size > dst_size)
> >>>>       clear_size = dst_size;
> >>>>   memcpy(dst, clear, clear_size);
> >>>>   av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
> >>>>
> >>>>I am not against an av_memset_bytes API addition, but I believe it should
> >>>>share code with av_memcpy_backptr to avoid duplication.
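
To make the proposal above concrete, here is a minimal, self-contained
sketch of the idea: write the pattern once, then let a forward
overlapping copy repeat it over the rest of the buffer. copy_backptr()
and fill_pattern() are hypothetical names for illustration only;
copy_backptr() is a naive byte-wise stand-in for av_memcpy_backptr(),
not the libavutil implementation, and the zero-size guard is an added
assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for av_memcpy_backptr(): copy cnt bytes from
 * (dst - back) to dst one byte at a time. Because the source overlaps
 * the destination, the bytes just written get copied again, so the
 * initial pattern repeats. */
static void copy_backptr(uint8_t *dst, size_t back, size_t cnt)
{
    const uint8_t *src = dst - back;

    while (cnt--)
        *dst++ = *src++;
}

/* Fill dst with repetitions of the clear_size-byte pattern in clear.
 * A trailing partial repetition is simply truncated. */
static void fill_pattern(uint8_t *dst, size_t dst_size,
                         const uint8_t *clear, size_t clear_size)
{
    if (!dst_size || !clear_size)   /* assumption: nothing to do */
        return;
    if (clear_size > dst_size)
        clear_size = dst_size;
    memcpy(dst, clear, clear_size);                 /* seed the pattern */
    copy_backptr(dst + clear_size, clear_size,      /* replicate it     */
                 dst_size - clear_size);
}
```

The byte-wise loop is what makes the overlap safe; a plain memcpy()
over overlapping regions would be undefined behaviour. Any speedup
would have to come from optimizing the copy helper itself.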
> >>>
> >>>I've implemented this; it does not seem to be really faster in the testcase
> >>
> >>I guess it is not faster because you have not applied your original
> >>optimization to fill32 in libavutil/mem.c. Could you compare speed after
> >>optimizing that the same way your original patch did it with imgutils
> >>memset_bytes?
> >
> >sure, that makes it faster:
> 
> Thanks, both patches LGTM.

will apply

thanks
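
For reference, the optimization discussed in this thread (widening a
4-byte pattern to 64 bits and storing 32 bytes per iteration) can be
sketched without any FFmpeg headers roughly as follows. fill32_wide()
is a hypothetical name; memcpy() stands in for the AV_RN32 / AV_WN32 /
AV_WN64 unaligned access macros, and the handling of a trailing 1 to 3
bytes is an assumption, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill dst with a repeating 4-byte pattern. */
static void fill32_wide(uint8_t *dst, size_t dst_size, const uint8_t clear[4])
{
    uint32_t val;
    uint64_t val8;

    memcpy(&val, clear, 4);          /* native-endian read of the pattern */
    /* val is promoted to 64 bits before the multiply, so val8 holds val
     * in both its low and high 32-bit halves on any endianness. */
    val8 = val * 0x100000001ULL;

    /* Bulk loop: four 64-bit stores, i.e. 32 bytes per iteration. */
    for (; dst_size >= 32; dst_size -= 32) {
        memcpy(dst,      &val8, 8);
        memcpy(dst +  8, &val8, 8);
        memcpy(dst + 16, &val8, 8);
        memcpy(dst + 24, &val8, 8);
        dst += 32;
    }
    /* Remaining whole 4-byte units. */
    for (; dst_size >= 4; dst_size -= 4) {
        memcpy(dst, &val, 4);
        dst += 4;
    }
    /* Assumed tail handling: a truncated copy of the pattern. */
    memcpy(dst, clear, dst_size);
}
```

Compilers typically turn the fixed-size memcpy() calls into single
unaligned stores, so the bulk loop should issue a quarter as many
stores as the plain 4-byte loop.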

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Observe your enemies, for they first find out your faults. -- Antisthenes