[FFmpeg-devel] [PATCH] h264: hl_decode_mb_internal optimisation

Paul Kendall paul
Thu Jul 31 08:10:37 CEST 2008


On Thursday 31 July 2008 01:09:38 Michael Niedermayer wrote:
> On Mon, Jul 28, 2008 at 05:51:47PM +1200, Paul Kendall wrote:
> > In the hl_decode_mb_internal function, the access pattern for h->mb is
> > sequential, so we can simplify the code to get the value into a local and
> > use the *mb++ pattern.
> >
> > Also, we can simplify the copy destination logic by calculating values at
> > various parts of the loop rather than for each store action.
>
> PCM macroblocks should occur VERY rarely, thus I don't think this will have
> any effect on speed, not even if the code were 10 times faster.
> If not speed, then it's just readability, and I think the old and new
> variants are rather similar readability-wise.
>
> [...]

I see you have committed a change for PCM stuff in revision 14476.
I think you need to revisit it! The memcpy in the first part of the commit
cannot work, as h->mb is of type DCTELEM[] (short) and desty is uint8_t!

The same goes for the final section. Also, the other sections are not copying
the data in the same order as was there previously! I copied the code there
to a small C file which then printed the indexes from the calculations, and it
is certainly not sequential!

This is the order I got from printing the index calculations from the loops.

// Luma IPCM level macroblock indexes
0,   1,   2,   3,  16,  17,  18,  19,  64,  65,  66,  67,  80,  81,  82,  83,
4,   5,   6,   7,  20,  21,  22,  23,  68,  69,  70,  71,  84,  85,  86,  87,
8,   9,  10,  11,  24,  25,  26,  27,  72,  73,  74,  75,  88,  89,  90,  91,
12,  13,  14,  15,  28,  29,  30,  31,  76,  77,  78,  79,  92,  93,  94,  95,
32,  33,  34,  35,  48,  49,  50,  51,  96,  97,  98,  99, 112, 113, 114, 115,
36,  37,  38,  39,  52,  53,  54,  55, 100, 101, 102, 103, 116, 117, 118, 119,
40,  41,  42,  43,  56,  57,  58,  59, 104, 105, 106, 107, 120, 121, 122, 123,
44,  45,  46,  47,  60,  61,  62,  63, 108, 109, 110, 111, 124, 125, 126, 127,
128, 129, 130, 131, 144, 145, 146, 147, 192, 193, 194, 195, 208, 209, 210, 211,
132, 133, 134, 135, 148, 149, 150, 151, 196, 197, 198, 199, 212, 213, 214, 215,
136, 137, 138, 139, 152, 153, 154, 155, 200, 201, 202, 203, 216, 217, 218, 219,
140, 141, 142, 143, 156, 157, 158, 159, 204, 205, 206, 207, 220, 221, 222, 223,
160, 161, 162, 163, 176, 177, 178, 179, 224, 225, 226, 227, 240, 241, 242, 243,
164, 165, 166, 167, 180, 181, 182, 183, 228, 229, 230, 231, 244, 245, 246, 247,
168, 169, 170, 171, 184, 185, 186, 187, 232, 233, 234, 235, 248, 249, 250, 251,
172, 173, 174, 175, 188, 189, 190, 191, 236, 237, 238, 239, 252, 253, 254, 255,
// Chroma U IPCM level macroblock indexes
256, 257, 258, 259, 272, 273, 274, 275,
260, 261, 262, 263, 276, 277, 278, 279,
264, 265, 266, 267, 280, 281, 282, 283,
268, 269, 270, 271, 284, 285, 286, 287,
288, 289, 290, 291, 304, 305, 306, 307,
292, 293, 294, 295, 308, 309, 310, 311,
296, 297, 298, 299, 312, 313, 314, 315,
300, 301, 302, 303, 316, 317, 318, 319,
// Chroma V IPCM level macroblock indexes
320, 321, 322, 323, 336, 337, 338, 339,
324, 325, 326, 327, 340, 341, 342, 343,
328, 329, 330, 331, 344, 345, 346, 347,
332, 333, 334, 335, 348, 349, 350, 351,
352, 353, 354, 355, 368, 369, 370, 371,
356, 357, 358, 359, 372, 373, 374, 375,
360, 361, 362, 363, 376, 377, 378, 379,
364, 365, 366, 367, 380, 381, 382, 383

I have a patch in progress for these which I can send through if you like, but 
as you said before, these blocks are very rare.

Cheers,
Paul



