Why shuffle bytes vpshufb in avx512 needs to compute index index[5:0] := b[i+3:i] + (j & 0x30), I don't understand the function of j & 0x30 - intel

Why does the vpshufb byte shuffle in avx512 need to compute index as index[5:0] := b[i+3:i] + (j & 0x30)?
The intrinsics guide pseudocode for _mm512_shuffle_epi8(a,b) (no masking) is:
FOR j := 0 to 63
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[5:0] := b[i+3:i] + (j & 0x30)
dst[i+7:i] := a[index*8+7:index*8]
FI
ENDFOR
dst[MAX:512] := 0
I don't understand the function of j & 0x30 and what's the meaning of it.
Because vpshufb in avx2 doesn't have (j & 0x30), it computes index by index[3:0] := b[i+3:i] and index[3:0] := b[128+i+3:128+i]. Docs for _mm256_shuffle_epi8(a,b):
FOR j := 0 to 15
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[3:0] := b[i+3:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
IF b[128+i+7] == 1
dst[128+i+7:128+i] := 0
ELSE
index[3:0] := b[128+i+3:128+i]
dst[128+i+7:128+i] := a[128+index*8+7:128+index*8]
FI
ENDFOR
dst[MAX:256] := 0

It's 4 separate 16-element shuffles using 4-bit indices, not like vpermb. It's adding j&0x30 to make a 6-bit index into the right 128-bit lane of the source vector, since j loops over all 64 destination bytes. j & 0xf0 clears the low nibble, the offset-within-lane bits, giving you the byte offset of the start of the lane. Since j only actually goes up to 64, they chose to do j & 0x30.
The 256-bit pseudocode unrolls the loop, doing the low byte of the low then high lane, etc.
Intel's asm manual (https://felixcloutier.com/x86/pshufb) is similar, also using a 0x10 or 0x30 mask for a j that loops over output bytes. It's weird that the intrinsics guide uses that bit-slicing notation when they don't have to; the index is just an internal temporary, and not part of a larger value; the asm manual just uses index := src.byte[ j ];.
(KL, VL) = (16, 128), (32, 256), (64, 512)
jmask := (KL-1) & ~0xF
// 0x00, 0x10, 0x30 depending on the VL
FOR j = 0 TO KL-1
// dest
IF kl[ i ] or no_masking
index := src.byte[ j ];
IF index & 0x80
Dest.byte[ j ] := 0;
ELSE
index := (index & 0xF) + (j & jmask);
// 16-element in-lane lookup
Dest.byte[ j ] := src.byte[ index ];
ELSE if zeroing
Dest.byte[ j ] := 0;
DEST[MAXVL-1:VL] := 0;
That's weird, it seems like both inputs are called src! vpshufb dst, src1, src2 uses the 2nd input vector as the index, the first as the table. Wouldn't be the first or last time there's a mistake in Intel's manual :/
Anyway, the advantage of the asm manual is that the Description section explains what it does in plain English, which helps a lot in thinking through the pseudocode.

Related

How to bruteforce a lossy AND routine?

Im wondering whether there are any standard approaches to reversing AND routines by brute force.
For example I have the following transformation:
MOV(eax, 0x5b3e0be0) <- Here we move 0x5b3e0be0 to EDX.
MOV(edx, eax) # Here we copy 0x5b3e0be0 to EAX as well.
SHL(edx, 0x7) # Bitshift 0x5b3e0be0 with 0x7 which results in 0x9f05f000
AND(edx, 0x9d2c5680) # AND 0x9f05f000 with 0x9d2c5680 which results in 0x9d045000
XOR(edx, eax) # XOR 0x9d045000 with original value 0x5b3e0be0 which results in 0xc63a5be0
My question is how to brute force and reverse this routine (i.e. transform 0xc63a5be0 back into 0x5b3e0be0)
One idea i had (which didn't work) was this using PeachPy implementation:
#Input values
MOV(esi, 0xffffffff) < Initial value to AND with, which will be decreased by 1 in a loop.
MOV(cl, 0x1) < Initial value to SHR with which will be increased by 1 until 0x1f.
MOV(eax, 0xc63a5be0) < Target result which I'm looking to get using the below loop.
MOV(edx, 0x5b3e0be0) < Input value which will be transformed.
sub_esi = peachpy.x86_64.Label()
with loop:
#End the loop if ESI = 0x0
TEST(esi, esi)
JZ(loop.end)
#Test the routine and check if it matches end result.
MOV(ebx, eax)
SHR(ebx, cl)
TEST(ebx, ebx)
JZ(sub_esi)
AND(ebx, esi)
XOR(ebx, eax)
CMP(ebx, edx)
JZ(loop.end)
#Add to the CL register which is used for SHR.
#Also check if we've reached the last potential value of CL which is 0x1f
ADD(cl, 0x1)
CMP(cl, 0x1f)
JNZ(loop.begin)
#Decrement ESI by 1, reset CL and restart routine.
peachpy.x86_64.LABEL(sub_esi)
SUB(esi, 0x1)
MOV(cl, 0x1)
JMP(loop.begin)
#The ESI result here will either be 0x0 or a valid value to AND with and get the necessary result.
RETURN(esi)
Maybe an article or a book you can recommend specific to this?
It's not lossy, the final operation is an XOR.
The whole routine can be modeled in C as
#define K 0x9d2c5680
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
Now, if we have two bits x and y and the operation x XOR y, when y is zero the result is x.
So given two numbers n1 and n2 and considering their XOR, the bits or n1 that pairs with a zero in n2 would make it to the result unchanged (the others will be flipped).
So in considering num ^ ( (num << 7) & K) we can identify num with n1 and (num << 7) & K with n2.
Since n2 is an AND, we can tell that it must have at least the same zero bits that K has.
This means that each bit of num that corresponds to a zero bit in the constant K will make it unchanged into the result.
Thus, by extracting those bits from the result we already have a partial inverse function:
/*hash & ~K extracts the bits of hash that pair with a zero bit in K*/
partial_num = hash & ~K
Technically, the factor num << 7 would also introduce other zeros in the result of the AND. We know for sure that the lowest 7 bits must be zero.
However K already has the lowest 7 bits zero, so we cannot exploit this information.
So we will just use K here, but if its value were different you'd need to consider the AND (which, in practice, means to zero the lower 7 bits of K).
This leaves us with 13 bits unknown (the ones corresponding to the bits that are set in K).
If we forget about the AND for a moment, we would have x ^ (x << 7) meaning that
hi = numi for i from 0 to 6 inclusive
hi = numi ^ numi-7 for i from 7 to 31 inclusive
(The first line is due to the fact that the lower 7 bits of the right-hand are zero)
From this, starting from h7 and going up, we can retrive num7 as h7 ^ num0 = h7 ^ h0.
From bit 7 onward, the equality doesn't work and we need to use numk (for the suitable k) but luckily we already have computed its value in a previous step (that's why we start from lower to higher).
What the AND does to this is just restricting the values the index i runs in, specifically only to the bits that are set in K.
So to fill in the thirteen remaining bits one have to do:
part_num7 = h7 ^ part_num0
part_num9 = h9 ^ part_num2
part_num12 = h12 ^ part_num5
...
part_num31 = h31 ^ part_num24
Note that we exploited that fact that part_num0..6 = h0..6.
Here's a C program that inverts the function:
#include <stdio.h>
#include <stdint.h>
#define BIT(i, hash, result) ( (((result >> i) ^ (hash >> (i+7))) & 0x1) << (i+7) )
#define K 0x9d2c5680
uint32_t base_candidate(uint32_t hash)
{
uint32_t result = hash & ~K;
result |= BIT(0, hash, result);
result |= BIT(2, hash, result);
result |= BIT(3, hash, result);
result |= BIT(5, hash, result);
result |= BIT(7, hash, result);
result |= BIT(11, hash, result);
result |= BIT(12, hash, result);
result |= BIT(14, hash, result);
result |= BIT(17, hash, result);
result |= BIT(19, hash, result);
result |= BIT(20, hash, result);
result |= BIT(21, hash, result);
result |= BIT(24, hash, result);
return result;
}
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
int main()
{
uint32_t tester = 0x5b3e0be0;
uint32_t candidate = base_candidate(hash(tester));
printf("candidate: %x, tester %x\n", candidate, tester);
return 0;
}
Since the original question was how to "bruteforce" instead of solve here's something that I eventually came up with which works just as well. Obviously its prone to errors depending on input (might be more than 1 result).
from peachpy import *
from peachpy.x86_64 import *
input = 0xc63a5be0
x = Argument(uint32_t)
with Function("DotProduct", (x,), uint32_t) as asm_function:
LOAD.ARGUMENT(edx, x) # EDX = 1b6fb67c
MOV(esi, 0xffffffff)
with Loop() as loop:
TEST(esi,esi)
JZ(loop.end)
MOV(eax, esi)
SHL(eax, 0x7)
AND(eax, 0x9d2c5680)
XOR(eax, esi)
CMP(eax, edx)
JZ(loop.end)
SUB(esi, 0x1)
JMP(loop.begin)
RETURN(esi)
#Read Assembler Return
abi = peachpy.x86_64.abi.detect()
encoded_function = asm_function.finalize(abi).encode()
python_function = encoded_function.load()
print(hex(python_function(input)))

Is this a bug with UTF conversion in GNAT Ada

I am trying to convert from UTF 16 to UTF 8; this is a test program:
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Conversions;
use Ada.Text_IO;
use Ada.Strings.Utf_Encoding.Conversions;
use Ada.Strings.UTF_Encoding;
procedure Main is
Str_8: UTF_8_String := "𝄞";
Str_16: UTF_16_Wide_String := Convert(Str_8);
Str_8_New: UTF_8_String := Convert(Str_16);
begin
if Str_8 = Str_8_New then
Put_Line("OK");
else
Put_Line("Bug");
end if;
end Main;
With latest GNAT community it prints "Bug". Is this a bug in the implementation of UTF conversion functions or am I doing something wrong here?
Edit: For reference, this issue has been accepted as Bug 95953 / Bug 95959.
As shown here, #DeeDee has identified a bug in the implementation of Convert for UTF_16 to UTF_8. The problem arises in byte three of the four byte value for code points in the range U+10000 to U+10FFFF, shown here. The source documents the relevant bit fields:
-- Codes in the range 16#10000# - 16#10FFFF#
-- UTF-16: 110110zzzzyyyyyy 110111yyxxxxxxxx
-- UTF-8: 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
-- Note: zzzzz in the output is input zzzz + 1
Byte three is constructed as follows:
Result (Len + 3) :=
Character'Val
(2#10_000000# or Shift_Left (yyyyyyyy and 2#1111#, 4)
or Shift_Right (xxxxxxxx, 6));
While the low four bits of yyyyyyyy are used to construct byte three, the value only needs to be shifted two places left to make room for the top two bits of xxxxxxxx. The correct formulation should be this:
Result (Len + 3) :=
Character'Val
(2#10_000000# or Shift_Left (yyyyyyyy and 2#1111#, 2)
or Shift_Right (xxxxxxxx, 6));
For reference, the complete example below recapitulates the original implementation, with enough additions to study the problem in isolation. The output shows the code point, the expected binary representation of the UTF-8 encoding, the conversion to UTF-16, the incorrect UTF-8 conversion, and the correct UTF-8 conversion.
Codepoint: 16#1D11E#
UTF-8: 4: 2#11110000# 2#10011101# 2#10000100# 2#10011110#
UTF-16: 2: 2#1101100000110100# 2#1101110100011110#
UTF-8: 4: 2#11110000# 2#10011101# 2#10010000# 2#10011110#
UTF-8: 4: 2#11110000# 2#10011101# 2#10000100# 2#10011110#
OK
Code:
-- https://stackoverflow.com/q/62564638/230513
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Conversions;
use Ada.Strings.UTF_Encoding.Conversions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
use Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Interfaces; use Interfaces;
with Unchecked_Conversion;
procedure UTFTest is
-- http://www.fileformat.info/info/unicode/char/1d11e/index.htm
Clef : constant Wide_Wide_String :=
(1 => Wide_Wide_Character'Val (16#1D11E#));
Str_8 : constant UTF_8_String := Encode (Clef);
Str_16 : constant UTF_16_Wide_String := Convert (Str_8);
Str_8_New : constant UTF_8_String := Convert (Str_16);
My_Str_8 : UTF_8_String := Convert (Str_16);
function To_Unsigned_16 is new Unchecked_Conversion (Wide_Character,
Interfaces.Unsigned_16);
procedure Raise_Encoding_Error (Index : Natural) is
Val : constant String := Index'Img;
begin
raise Encoding_Error
with "bad input at Item (" & Val (Val'First + 1 .. Val'Last) & ')';
end Raise_Encoding_Error;
function My_Convert (Item : UTF_16_Wide_String;
Output_BOM : Boolean := False) return UTF_8_String
is
Result : UTF_8_String (1 .. 3 * Item'Length + 3);
-- Worst case is 3 output codes for each input code + BOM space
Len : Natural;
-- Number of result codes stored
Iptr : Natural;
-- Pointer to next input character
C1, C2 : Unsigned_16;
zzzzz : Unsigned_16;
yyyyyyyy : Unsigned_16;
xxxxxxxx : Unsigned_16;
-- Components of double length case
begin
Iptr := Item'First;
-- Skip BOM at start of input
if Item'Length > 0 and then Item (Iptr) = BOM_16 (1) then
Iptr := Iptr + 1;
end if;
-- Generate output BOM if required
if Output_BOM then
Result (1 .. 3) := BOM_8;
Len := 3;
else
Len := 0;
end if;
-- Loop through input
while Iptr <= Item'Last loop
C1 := To_Unsigned_16 (Item (Iptr));
Iptr := Iptr + 1;
-- Codes in the range 16#0000# - 16#007F#
-- UTF-16: 000000000xxxxxxx
-- UTF-8: 0xxxxxxx
if C1 <= 16#007F# then
Result (Len + 1) := Character'Val (C1);
Len := Len + 1;
-- Codes in the range 16#80# - 16#7FF#
-- UTF-16: 00000yyyxxxxxxxx
-- UTF-8: 110yyyxx 10xxxxxx
elsif C1 <= 16#07FF# then
Result (Len + 1) :=
Character'Val (2#110_00000# or Shift_Right (C1, 6));
Result (Len + 2) :=
Character'Val (2#10_000000# or (C1 and 2#00_111111#));
Len := Len + 2;
-- Codes in the range 16#800# - 16#D7FF# or 16#E000# - 16#FFFF#
-- UTF-16: yyyyyyyyxxxxxxxx
-- UTF-8: 1110yyyy 10yyyyxx 10xxxxxx
elsif C1 <= 16#D7FF# or else C1 >= 16#E000# then
Result (Len + 1) :=
Character'Val (2#1110_0000# or Shift_Right (C1, 12));
Result (Len + 2) :=
Character'Val
(2#10_000000# or (Shift_Right (C1, 6) and 2#00_111111#));
Result (Len + 3) :=
Character'Val (2#10_000000# or (C1 and 2#00_111111#));
Len := Len + 3;
-- Codes in the range 16#10000# - 16#10FFFF#
-- UTF-16: 110110zzzzyyyyyy 110111yyxxxxxxxx
-- UTF-8: 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
-- Note: zzzzz in the output is input zzzz + 1
elsif C1 <= 2#110110_11_11111111# then
if Iptr > Item'Last then
Raise_Encoding_Error (Iptr - 1);
else
C2 := To_Unsigned_16 (Item (Iptr));
Iptr := Iptr + 1;
end if;
if (C2 and 2#111111_00_00000000#) /= 2#110111_00_00000000# then
Raise_Encoding_Error (Iptr - 1);
end if;
zzzzz := (Shift_Right (C1, 6) and 2#1111#) + 1;
yyyyyyyy :=
((Shift_Left (C1, 2) and 2#111111_00#) or
(Shift_Right (C2, 8) and 2#000000_11#));
xxxxxxxx := C2 and 2#11111111#;
Result (Len + 1) :=
Character'Val (2#11110_000# or (Shift_Right (zzzzz, 2)));
Result (Len + 2) :=
Character'Val
(2#10_000000# or Shift_Left (zzzzz and 2#11#, 4) or
Shift_Right (yyyyyyyy, 4));
Result (Len + 3) :=
Character'Val
(2#10_000000# or Shift_Left (yyyyyyyy and 2#1111#, 2) or
Shift_Right (xxxxxxxx, 6));
Result (Len + 4) :=
Character'Val (2#10_000000# or (xxxxxxxx and 2#00_111111#));
Len := Len + 4;
-- Error if input in 16#DC00# - 16#DFFF# (2nd surrogate with no 1st)
else
Raise_Encoding_Error (Iptr - 2);
end if;
end loop;
return Result (1 .. Len);
end My_Convert;
procedure Show (S : String) is
begin
Put(" UTF-8: ");
Put (S'Length, 1);
Put (":");
for C of S loop
Put (Character'Pos (C), 12, 2);
end loop;
New_Line;
end Show;
procedure Show (S : Wide_String) is
begin
Put("UTF-16: ");
Put (S'Length, 1);
Put (":");
for C of S loop
Put (Wide_Character'Pos (C), 20, 2);
end loop;
New_Line;
end Show;
begin
Put ("Codepoint:");
Put (Wide_Wide_Character'Pos (Clef (1)), 10, 16);
New_Line;
Show (Str_8);
Show (Str_16);
Show (Str_8_New);
My_Str_8 := My_Convert (Str_16);
Show (My_Str_8);
if Str_8 = My_Str_8 then
Put_Line ("OK");
else
Put_Line ("Bug");
end if;
end UTFTest;
See also Bug 95953 / Bug 95959.
There's a mismatch between the 3rd byte of Str_8 and Str_8_New which causes the round-trip to fail. This seems a bug.
main.adb
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding.Conversions;
use Ada.Strings.UTF_Encoding;
use Ada.Strings.UTF_Encoding.Conversions;
procedure Main is
-- UTF8 encoded Clef (U+1D11E)
-- (e.g.) https://unicode-table.com/en/1D11E/
Str_8 : constant UTF_8_String :=
Character'Val (16#F0#) &
Character'Val (16#9D#) &
Character'Val (16#84#) &
Character'Val (16#9E#);
Str_16 : constant UTF_16_Wide_String := Convert (Str_8);
Str_8_New : constant UTF_8_String := Convert (Str_16);
begin
for I in Str_8'Range loop
Put (Character'Pos (Str_8 (I)), 7, 16);
end loop;
New_Line (2);
for I in Str_16'Range loop
Put (Wide_Character'Pos (Str_16 (I)), 9, 16);
end loop;
New_Line (2);
for I in Str_8_New'Range loop
Put (Character'Pos (Str_8_New (I)), 7, 16);
end loop;
New_Line (2);
end Main;
output
$ ./main
16#F0# 16#9D# 16#84# 16#9E#
16#D834# 16#DD1E#
16#F0# 16#9D# 16#90# 16#9E#

Pascal Quicksort - counting the total number of recursions

I am stuck on my quicksort program. I need to count the total number of times the sorting function calls itself.
procedure quick (first, last, counter: integer);
var i, k, x : integer;
begin
i := first;
k := last;
x := a[(i+k) div 2];
counter := counter + 1;
while i<=k do begin
while a[i] < x do
i:= i+1;
while a[k] > x do
k:= k-1;
if i<=k then begin
prohod(i,k);
i:=i+1;
k:=k-1;
end;
end;
if first<k then quick(first,k, counter);
if i<last then quick(i,last, counter);
P:= P + counter;
end;
I tried this, where P is global variable and counter is recursive variable, first called as 1 ( quick(1, n, 1) ) . Sadly did not work. I also set P:= 0; right before I called the sorting (quick) procedure (I am not quite sure if it is correct way to approach the problem but it's all I could come up with).
Any ideas how to make this correctly and / or why my counter is not working?
This looks overly complex. If you just remove the counter, initialize P with the value -1 (instead of 0) and replace P := P + counter with P := P + 1, it should work.

How to interpret a binary integer as ternary (base 3)?

My CPU register contains a binary integer 0101, equal to the decimal number 5:
0101 ( 4 + 1 = 5 )
I want the register to contain instead the binary integer equal to decimal 10, as if the original binary number 0101 were ternary (base 3) and every digit happens to be either 0 or 1:
0101 ( 9 + 1 = 10 )
How can i do this on a contemporary CPU or GPU with 1. the fewest memory reads and 2. the fewest hardware instructions?
Use an accumulator. C-ish Pseudocode:
var accumulator = 0
foreach digit in string
accumulator = accumulator * 3 + (digit - '0')
return accumulator
To speed up the multiply by 3, you might use ((accumulator << 1) + accumulator), but a good compiler will be able to do that for you.
If a large percentage of your numbers are within a relatively small range, you can also pregenerate a lookup table to make the transformation from base2 to base3 instantaneous (using the base2 value as the index). You can also use the lookup table to accelerate lookup of the first N digits, so you only pay for the conversion of the remaining digits.
This C program will do it:
#include <stdio.h>
main()
{
int binary = 5000; //Example
int ternary = 0;
int po3 = 1;
do
{
ternary += (binary & 1) * po3;
po3 *= 3;
}
while (binary >>= 1 != 0);
printf("%d\n",ternary);
}
The loop compiles into this machine code on my 32-bit Intel machine:
do
{
ternary += (binary & 1) * po3;
0041BB33 mov eax,dword ptr [binary]
0041BB36 and eax,1
0041BB39 imul eax,dword ptr [po3]
0041BB3D add eax,dword ptr [ternary]
0041BB40 mov dword ptr [ternary],eax
po3 *= 3;
0041BB43 mov eax,dword ptr [po3]
0041BB46 imul eax,eax,3
0041BB49 mov dword ptr [po3],eax
}
while (binary >>= 1 != 0);
0041BB4C mov eax,dword ptr [binary]
0041BB4F sar eax,1
0041BB51 mov dword ptr [binary],eax
0041BB54 jne main+33h (41BB33h)
For the example value (decimal 5000 = binary 1001110001000), the ternary value it produces is 559899.

Generate PCR from PTS

I am trying to create PCR from PTS as follows.
S64 nPcr = nPts * 9 / 100;
pTsBuf[4] = 7 + nStuffyingBytes;
pTsBuf[5] = 0x10; /* flags */
pTsBuf[6] = ( nPcr >> 25 )&0xff;
pTsBuf[7] = ( nPcr >> 17 )&0xff;
pTsBuf[8] = ( nPcr >> 9 )&0xff;
pTsBuf[9] = ( nPcr >> 1 )&0xff;
pTsBuf[10]= ( nPcr << 7 )&0x80;
pTsBuf[11]= 0;
But the problem is VLC is playing only first frame and not playing any other frames.
and I am getting the warning "early picture skipped".
Could any one help me in converting from PTS to PCR..
First, the PCR has 33+9 bits, the PTS 33 bits. The 33 bit-portion (called PCR_base) runs at 90kHz, as does the PTS. The remaining 9 bits are called PCR_ext and run at 27MHz.
Thus, this is how you could calculate the PCR:
S64 nPcr = (S64)nPts << 9;
Note that there should be a time-offset between the PTSs of the multiplexed streams and the PCR, it's usually in the range of a few hundred ms, depending on the stream.
The respective decoder needs some time to decode the data and get it ready for presentation at the time given by the respective PTS, that's why the PTSs are always "ahead" of the PCR. ISO-13818 and some DVB specs give specifics about buffering and (de)multiplexing.
About your bitshifting I'm not sure, this is a code snippet of mine. The comment may help in shifting the bits to the right place, R stands for reserved.
data[4] = 7;
data[5] = 1 << 4; // PCR_flag
// pcr has 33+9=42 bits
// 4 3 2 1 0
// 76543210 98765432 10987654 32109876 54321098 76543210
// xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xRRRRRRx xxxxxxxx
// 10987654 32109876 54321098 76543210 9 8 76543210
// 4 3 2 1 0
// b6 b7 b8 b9 b10 b11
data[ 6] = (pcr >> 34) & 0xff;
data[ 7] = (pcr >> 26) & 0xff;
data[ 8] = (pcr >> 18) & 0xff;
data[ 9] = (pcr >> 10) & 0xff;
data[10] = 0x7e | ((pcr & (1 << 9)) >> 2) | ((pcr & (1 << 8)) >> 8);
data[11] = pcr & 0xff;
The answer of #schieferstapel is correct. I am only adding one more note here which refers to an exception.
There are times when B frames arrives after (who's PTS is less than) P frames. so PTS can be non-linear if every picture is stamped with PTS value. Whereas, PCR must be incrementally linear.
So in the above situation, you must try to either omit B frames or make relevant calculation when putting PCR values. Also, if this is hardware playouts, it is advisable that PCR should be slightly ahead (lesser by 400 ms or so) than PTS of corresponding I frames.
the PCR contains 33(PCR_Base)+6(PCR_const)+9(PCR_Ext) number of bits and also it states that the first 33 bits are based on a 90 kHz clock while the last 9 are based on a 27 MHz clock.PCR_const = 0x3F PCR_Ext=0 PCR_Base=pts/dts
Below code is easy understand.
PCR_Ext = 0;
PCR_Const = 0x3F;
int64_t pcrv = PCR_Ext & 0x1ff;
pcrv |= (PCR_Const << 9) & 0x7E00;
pcrv |= (PCR_Base << 15) & 0xFFFFFFFF8000LL;
pp = (char*)&pcrv;
data[ 6] = pp[5];
data[ 7] = pp[4];
data[ 8] = pp[3];
data[ 9] = pp[2];
data[10] = pp[1];
data[11] = pp[0];

Resources