[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

flow gg hlefthleft at gmail.com
Sat Feb 24 03:07:36 EET 2024


 .ifc \len,4
-        vsetivli        zero, 5, e8, mf2, ta, ma
+        vsetivli        zero, 5, e8, m1, ta, ma
 .elseif \len == 8
         vsetivli        zero, 9, e8, m1, ta, ma
 .else
@@ -112,9 +112,9 @@ endfunc
         vslide1down.vx  v2, \dst, t5

 .ifc \len,4
-        vsetivli        zero, 4, e8, mf4, ta, ma
+        vsetivli        zero, 4, e8, m1, ta, ma
 .elseif \len == 8
-        vsetivli        zero, 8, e8, mf2, ta, ma
+        vsetivli        zero, 8, e8, m1, ta, ma
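
For context on the hunk above: per the RISC-V vector specification, the largest vl that a vsetivli can grant is VLMAX = VLEN * LMUL / SEW. The sketch below only illustrates that formula. The `vlmax` helper is a hypothetical name, and VLEN = 128 bits is an assumed hardware width that is not stated anywhere in this thread.

```python
from fractions import Fraction

def vlmax(vlen_bits, sew_bits, lmul):
    """VLMAX = VLEN * LMUL / SEW, rounded down (RISC-V V definition)."""
    return int(Fraction(vlen_bits) * Fraction(lmul) / sew_bits)

VLEN = 128  # assumption: a 128-bit vector register, not taken from the thread

# LMUL settings that appear in the patch, all with SEW = 8 bits.
for name, lmul in (("mf4", Fraction(1, 4)), ("mf2", Fraction(1, 2)),
                   ("m1", 1), ("m2", 2)):
    print(f"e8, {name}: VLMAX = {vlmax(VLEN, 8, lmul)}")
```

Under that assumption, every vl requested in the patch (4, 5, 8, 9, 16, 17) fits within VLMAX for both the fractional setting and the m1 setting, so the two variants differ in register grouping and speed rather than in the number of elements processed.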

What are the benefits of not using fractional multipliers here? Making this
change would result in a slowdown of roughly 8%-22% in the benchmarks below.

                            mf2/mf4     m1
vp8_put_bilin4_h_rvv_i32:     158.7  193.7
vp8_put_bilin4_hv_rvv_i32:    255.7  302.7
vp8_put_bilin8_h_rvv_i32:     318.7  358.7
vp8_put_bilin8_hv_rvv_i32:    528.7  569.7
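
The relative slowdown per function can be recomputed from the table above. This is a minimal sketch: the numbers are copied verbatim from the table, and `slowdown_percent` is a hypothetical helper name.

```python
# Timings copied from the table above: (fractional LMUL, m1).
timings = {
    "vp8_put_bilin4_h_rvv_i32": (158.7, 193.7),
    "vp8_put_bilin4_hv_rvv_i32": (255.7, 302.7),
    "vp8_put_bilin8_h_rvv_i32": (318.7, 358.7),
    "vp8_put_bilin8_hv_rvv_i32": (528.7, 569.7),
}

def slowdown_percent(baseline, changed):
    """Percentage by which `changed` is slower than `baseline`."""
    return (changed / baseline - 1.0) * 100.0

for name, (fractional, m1) in timings.items():
    print(f"{name}: {slowdown_percent(fractional, m1):.1f}% slower with m1")
```

The result ranges from about 8% for the 8-wide hv case to about 22% for the 4-wide h case.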

On Sat, Feb 24, 2024 at 01:18, Rémi Denis-Courmont <remi at remlab.net> wrote:

> Hi,
>
> +
> +.macro bilin_h_load dst len
> +.ifc \len,4
> +        vsetivli        zero, 5, e8, mf2, ta, ma
>
> Don't use fractional multipliers if you don't mix element widths.
>
> +.elseif \len == 8
> +        vsetivli        zero, 9, e8, m1, ta, ma
> +.else
> +        vsetivli        zero, 17, e8, m2, ta, ma
> +.endif
> +
> +        vle8.v          \dst, (a2)
> +        vslide1down.vx  v2, \dst, t5
> +
>
> +.ifc \len,4
> +        vsetivli        zero, 4, e8, mf4, ta, ma
>
> Same as above.
>
> +.elseif \len == 8
> +        vsetivli        zero, 8, e8, mf2, ta, ma
>
> Also.
>
> +.else
> +        vsetivli        zero, 16, e8, m1, ta, ma
> +.endif
>
> +        vwmulu.vx       v28, \dst, t1
> +        vwmaccu.vx      v28, a5, v2
> +        vwaddu.wx       v24, v28, t4
> +        vnsra.wi        \dst, v24, 3
> +.endm
> +
> +.macro put_vp8_bilin_h len
> +        li              t1, 8
> +        li              t4, 4
> +        li              t5, 1
> +        sub             t1, t1, a5
> +1:
> +        addi            a4, a4, -1
> +        bilin_h_load    v0, \len
> +        vse8.v          v0, (a0)
> +        add             a2, a2, a3
> +        add             a0, a0, a1
> +        bnez            a4, 1b
> +
> +        ret
> +.endm
> +
> +func ff_put_vp8_bilin16_h_rvv, zve32x
> +        put_vp8_bilin_h 16
> +endfunc
> +
> +func ff_put_vp8_bilin8_h_rvv, zve32x
> +        put_vp8_bilin_h 8
> +endfunc
> +
> +func ff_put_vp8_bilin4_h_rvv, zve32x
> +        put_vp8_bilin_h 4
> +endfunc
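
As a cross-check on the quoted macro, here is a scalar Python model of what bilin_h_load appears to compute for one row: a widening multiply by (8 - a5), a widening multiply-accumulate of the next pixel by a5, an add of 4 for rounding, and a narrowing shift right by 3. The names `put_bilin_h_row` and `mx` (standing for the value held in a5) are illustrative and not taken from the patch, and the 0..7 range for `mx` is an assumption.

```python
def put_bilin_h_row(src, width, mx):
    """Scalar model of one output row of the horizontal bilinear filter.

    Computes (src[x] * (8 - mx) + src[x + 1] * mx + 4) >> 3 for each x.
    `src` must hold width + 1 pixels, matching the vl of 5, 9 or 17
    that the macro uses for the load.
    """
    assert 0 <= mx <= 7, "assumed range of the fractional offset"
    assert len(src) >= width + 1, "one extra pixel is read on the right"
    return [(src[x] * (8 - mx) + src[x + 1] * mx + 4) >> 3
            for x in range(width)]

row = [10, 20, 30, 40, 50]
print(put_bilin_h_row(row, 4, 4))  # halfway between neighbours
print(put_bilin_h_row(row, 4, 0))  # mx = 0 leaves the pixels unchanged
```

A model like this can serve as a reference when comparing the fractional and m1 variants, since both should produce identical pixels and differ only in speed.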
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/

