[FFmpeg-devel] [PATCH] Faster CABAC H.264 residual decoding
Sun Apr 27 13:38:47 CEST 2008
On Sunday 27 April 2008, Måns Rullgård wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > On Sunday 27 April 2008, Måns Rullgård wrote:
> >> matthieu castet <castet.matthieu at free.fr> writes:
> >> > Jason Garrett-Glaser wrote:
> >> >> On the advice of #ffmpeg-devel I have made a version with uint8_t
> >> >> arrays instead of int.
> >> >
> >> > Don't forget that some CPUs (ARM, for example) don't have native 8-bit
> >> > operations. Everything is done in 32 bits, and 8-bit behavior is
> >> > emulated with extra operations.
> >> ARM has byte load and store instructions. All ALU operations are
> >> 32-bit, except for certain multiplies. I doubt this is a problem
> >> here.
> >> The only recent CPU I know of that lacks byte load/store is the first
> >> generation of the Alpha.
> > Probably he just wanted to say that reading bytes has higher latency
> > (+1 cycle extra) than reading ints on at least some ARM cores (ARM9).
> Where do you find this information? The ARM926 data sheet only
> mentions the 1-cycle penalty for shifted offsets.
In DDI0222B_9EJS_r1p2.pdf, section "8.12.1 Interlocks":
"Unaligned word loads, load byte (LDRB), and load halfword (LDRH) instructions
use the byte rotate unit in the Write stage of the pipeline. This introduces a
two-cycle load-use interlock, that can affect the two instructions immediately
following the load instruction."
> > On the other hand, indexing bytes in an array does not require a shifted
> > offset (which may also introduce some kind of penalty).
> A left shift by 2 has no penalty on ARMv6.
Yes, I'm well aware of it. Sorry for nitpicking, but you probably meant
ARM11? There may be other microarchitectures compatible with the ARMv6 ISA
(Cortex cores are coming). I used the words "some" and "may" on purpose,
because interlock behaviour differs between cores.
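
To make the trade-off concrete, here is a minimal C sketch of the two table
layouts being compared. The table names, contents and function names are
hypothetical and are not taken from the patch. The instruction forms in the
comments are what a compiler would typically emit for ARM, and the actual
cycle cost depends on the core, as discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup tables: same values, different element width. */
static const uint8_t table_u8[4]  = { 3, 1, 4, 1 };
static const int     table_int[4] = { 3, 1, 4, 1 };

/*
 * Byte table: each lookup is typically a byte load with an unshifted
 * register offset, e.g.  ldrb r3, [r1, r2]
 * On ARM9E-S this load goes through the byte rotate unit, so an
 * instruction that uses r3 right after the load may interlock.
 */
static int sum_u8(const unsigned *idx, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += table_u8[idx[i]];
    return sum;
}

/*
 * Word table: each lookup is typically a word load with the index
 * scaled by 4, e.g.  ldr r3, [r1, r2, lsl #2]
 * The shifted offset is what may cost an extra cycle on some cores.
 */
static int sum_int(const unsigned *idx, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += table_int[idx[i]];
    return sum;
}
```

Both functions return the same result for the same index array. Only the
load instruction in the inner loop differs, so which one is faster has to be
measured on the core in question.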