[FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c

Reimar Döffinger Reimar.Doeffinger
Mon Aug 27 22:02:09 CEST 2007


Hello,
On Mon, Aug 27, 2007 at 09:06:45PM +0200, Michael Niedermayer wrote:
> On Mon, Aug 27, 2007 at 03:07:51PM +0200, Reimar Döffinger wrote:
[...]
> > Well, I do not have too much time, but still here is an attempt to
> > partially fix some things and add some helpful comments.
> > I think inner_add_yblock_bw_8_obmc_16_bh_even_sse2 works now, though
> > I only get a black-and-white image...
> 
> If you get black and white, then it likely does not work. What happens if
> you call the C code instead?

Well, obviously it does not work right, and it decodes correctly with
the C code. But it is still progress, and I have no idea why it doesn't
work...

> About the comments: well, I do not like the way the code looks at all; it's
> totally unreadable. I'd rather see the code implemented with less macro
> obfuscation than have comments explaining what the macros do and which
> registers they need their inputs in.

No disagreement here, I just don't feel able to rewrite it from scratch
right now.

> >               "mov %0, %%"REG_d"              \n\t"
> > -             "movdqa (%%"REG_D"), %%xmm0     \n\t"
> > -             "movdqa %%xmm1, %%xmm2          \n\t"
> > +             "movdqu (%%"REG_D"), %%xmm0     \n\t"
> 
> why an unaligned read?

The honest answer: because it crashes with the aligned one (movdqa
requires a 16-byte-aligned address). That seems to be because src_x is
never divisible by 8; typical values are:
src_x: 4
src_x: 12
src_x: 20
src_x: 28
src_x: 36
src_x: 44
src_x: 52
src_x: 60

> > -             "punpckhwd %%xmm7, %%xmm1       \n\t"
> > -             "punpcklwd %%xmm7, %%xmm2       \n\t"
> > -             "paddd %%xmm2, %%xmm0           \n\t"
> > -             "movdqa 16(%%"REG_D"), %%xmm2   \n\t"
> > -             "paddd %%xmm1, %%xmm2           \n\t"
> > +             "paddd %%xmm1, %%xmm0           \n\t"
> 
> No, these are 16-bit, not 32-bit.

Obviously.

> Also, there's a shift by 4 missing here.

I don't think so; there is a "psraw $4, %%xmm0               \n\t"
further down. And I know the code is an unreadable mess. I'll try to
reimplement it at some point if no one else does...

Greetings,
Reimar Döffinger
