[FFmpeg-devel] [PATCH] lavc/lpc: R-V V apply_welch_window

Rémi Denis-Courmont remi at remlab.net
Mon Dec 11 11:50:53 EET 2023



On 11 December 2023 11:11:28 GMT+02:00, Anton Khirnov <anton at khirnov.net> wrote:
>Quoting Rémi Denis-Courmont (2023-12-08 18:46:51)
>> +#if __riscv_xlen >= 64
>> +func ff_lpc_apply_welch_window_rvv, zve64d
>> +        vsetvli t0, zero, e64, m8, ta, ma
>> +        vid.v   v0
>> +        addi    t2, a1, -1
>> +        vfcvt.f.xu.v v0, v0
>> +        li      t3, 2
>> +        fcvt.d.l ft2, t2
>> +        srai    t1, a1, 1
>> +        fcvt.d.l ft3, t3
>> +        li      t4, 1
>> +        fdiv.d  ft0, ft3, ft2    # ft0 = c = 2. / (len - 1)
>> +        fcvt.d.l fa1, t4         # fa1 = 1.
>> +        fsub.d  ft1, ft0, fa1
>> +        vfrsub.vf v0, v0, ft1    # v0[i] = c - i - 1.
>> +1:
>> +        vsetvli t0, t1, e64, m8, ta, ma
>> +        vfmul.vv v16, v0, v0  # no fused multiply-add as v0 is reused
>> +        sub     t1, t1, t0
>> +        vle32.v v8, (a0)
>> +        fcvt.d.l ft2, t0
>> +        vfrsub.vf v16, v16, fa1  # v16 = 1. - w * w
>> +        sh2add  a0, t0, a0
>> +        vsetvli zero, zero, e32, m4, ta, ma
>> +        vfwcvt.f.x.v v24, v8
>> +        vsetvli zero, zero, e64, m8, ta, ma
>> +        vfsub.vf v0, v0, ft2     # v0 -= vl
>> +        vfmul.vv v8, v24, v16
>> +        vse64.v v8, (a2)
>> +        sh3add  a2, t0, a2
>> +        bnez    t1, 1b
>> +
>> +        andi    t1, a1, 1
>> +        beqz    t1, 2f
>> +
>> +        sd      zero, (a2)
>> +        addi    a0, a0, 4
>> +        addi    a2, a2, 8
>> +2:
>> +        vsetvli t0, zero, e64, m8, ta, ma
>> +        vid.v   v0
>> +        srai    t1, a1, 1
>> +        vfcvt.f.xu.v v0, v0
>> +        fcvt.d.l ft1, t1
>> +        fsub.d  ft1, ft0, ft1    # ft1 = c - (len / 2)
>> +        vfadd.vf v0, v0, ft1     # v0[i] = c - (len / 2) + i
>> +3:
>> +        vsetvli t0, t1, e64, m8, ta, ma
>> +        vfmul.vv v16, v0, v0
>> +        sub     t1, t1, t0
>> +        vle32.v v8, (a0)
>> +        fcvt.d.l ft2, t0
>> +        vfrsub.vf v16, v16, fa1  # v16 = 1. - w * w
>> +        sh2add  a0, t0, a0
>> +        vsetvli zero, zero, e32, m4, ta, ma
>> +        vfwcvt.f.x.v v24, v8
>> +        vsetvli zero, zero, e64, m8, ta, ma
>> +        vfadd.vf v0, v0, ft2     # v0 += vl
>> +        vfmul.vv v8, v24, v16
>> +        vse64.v v8, (a2)
>> +        sh3add  a2, t0, a2
>> +        bnez    t1, 3b
>
>I think it'd look a lot less like base64 < /dev/random if you vertically
>aligned the first operands.

They are aligned, to the 17th column. The problem is that quite a few vector mnemonics are longer than 7 characters.
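For readers following along, the computation the routine vectorizes can be sketched in scalar form as below. This is reconstructed purely from the comments in the patch above (w = c - i - 1 for the first half, a zeroed middle sample for odd lengths, w = c - (len/2) + i for the second half), not from the authoritative C reference in lavc/lpc.c, so treat it as a reading of the assembly rather than the definitive algorithm:

```python
def apply_welch_window(data, out):
    # data: int32 samples, out: float64 destination
    # (mirrors the vle32 load + vfwcvt widen + vse64 store in the patch)
    n = len(data)
    n2 = n >> 1
    c = 2.0 / (n - 1)              # ft0 = c = 2. / (len - 1)

    for i in range(n2):            # first loop (label 1:)
        w = c - i - 1.0            # v0[i] = c - i - 1.
        out[i] = data[i] * (1.0 - w * w)   # v16 = 1. - w * w

    if n & 1:                      # odd length: middle output is zeroed
        out[n2] = 0.0              # sd zero, (a2)

    base = n2 + (n & 1)
    for i in range(n2):            # second loop (label 3:)
        w = c - n2 + i             # v0[i] = c - (len / 2) + i
        out[base + i] = data[base + i] * (1.0 - w * w)
```

Note the weights are symmetric: the first loop walks w down from c - 1 to c - n2 while the second walks it back up, so the i-th coefficient from each end of the buffer is the same, which is why the assembly only ever processes len/2 elements per loop.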

>
>-- 
>Anton Khirnov

