MXU

MXU is the name for the XBurst SIMD instructions. SIMD means Single Instruction Multiple Data and is often used to speed up audio/video processing. Examples of SIMD instruction sets for other CPUs are MMX, SSE and AltiVec.

Instruction Naming

The initial letter indicates the number of elements in the vector(s) operated upon: S(ingle) for 1, D(ual) for 2, Q(uad) for 4. The letter is followed by a number, which denotes the length of the input elements in bits. The number is followed by the name of the operation that will be performed.
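
The naming scheme can be illustrated with a small Python sketch that splits a mnemonic into element count, element width and operation. The helper name parse_mnemonic is hypothetical, and the sketch only covers names that follow the scheme above (the JZ4770 LX* loads, for example, do not).

```python
import re

# Element count implied by the initial letter of an MXU mnemonic.
ELEMENT_COUNT = {"S": 1, "D": 2, "Q": 4}

def parse_mnemonic(name):
    """Split e.g. 'Q16ADD' into (element count, element bits, operation)."""
    m = re.fullmatch(r"([SDQ])(\d+)([A-Z][A-Z0-9]*)", name.upper())
    if m is None:
        raise ValueError("not an S/D/Q-style MXU mnemonic: " + name)
    prefix, bits, operation = m.groups()
    return ELEMENT_COUNT[prefix], int(bits), operation
```

For example, parse_mnemonic("Q16ADD") yields four 16-bit elements and the operation ADD.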

Register Naming

There is a dedicated register set for the MXU operations. It contains seventeen 32-bit registers, which will be referred to as xr0..xr16. Registers xr0..xr15 are used in computations; xr16 is a control register. MXU register xr0 always has the value 0; writes to it have no effect.

The main MIPS registers will be referred to as r0..r31.

Enabling MXU

Before the MXU can be used, it must be enabled. This is done by setting bit 0 (the lowest bit) of xr16 to 1.

Load and Store Instructions

S32I2M

S32I2M xr, r

Assigns the value of main register r to MXU register xr.

S32M2I

S32M2I xr, r

Assigns the value of MXU register xr to main register r.

S32LDD

S32LDD xr, p, o

Loads the contents of the memory at p + o (pointer + offset) into MXU register xr.

S32LDDV

S32LDDV xr, p, o, s

Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr.

S32LDI

S32LDI xr, p, o

Loads the contents of the memory at p + o (pointer + offset) into MXU register xr. After that, p is incremented by o.

S32LDIV

S32LDIV xr, p, o, s

Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr. After that, p is incremented by o * 2^s.

S32STD

S32STD xr, p, o

Stores the contents of MXU register xr into the memory at p + o (pointer + offset).

S32STDV

S32STDV xr, p, o, s

Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset).

S32SDI

S32SDI xr, p, o

Stores the contents of MXU register xr into the memory at p + o (pointer + offset). After that, p is incremented by o.

S32SDIV

S32SDIV xr, p, o, s

Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset). After that, p is incremented by o * 2^s.
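
The address arithmetic of the four load variants can be sketched in Python as follows; the store variants compute their addresses the same way. This is a behavioural model only: the function names are hypothetical, memory is modelled as a bytes object, and little-endian word order is an assumption that is not stated on this page.

```python
import struct

def s32ldd(mem, p, o):
    """S32LDD: return the 32-bit word at p + o; p itself is unchanged."""
    return struct.unpack_from("<I", mem, p + o)[0]  # "<I": little-endian (assumed)

def s32lddv(mem, p, o, s):
    """S32LDDV: like S32LDD, with the offset scaled by 2^s."""
    return s32ldd(mem, p, o << s)

def s32ldi(mem, p, o):
    """S32LDI: load from p + o and also return the updated pointer p + o."""
    return s32ldd(mem, p, o), p + o

def s32ldiv(mem, p, o, s):
    """S32LDIV: like S32LDI, with the offset scaled by 2^s."""
    return s32ldi(mem, p, o << s)
```

Note that, as described above, the updating variants load from the already offset address, so afterwards p points at the word that was just loaded.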

Addition and Subtraction Instructions

D32ADD, Q16ADD

D32ADD xra, xrb, xrc, xrd, addsub

Q16ADD xra, xrb, xrc, xrd, addsub, swizzle

Performs addition and/or subtraction on vectors xrb and xrc and writes the results to vectors xra and xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra := xrb + xrc; xrd := xrb + xrc
addsub = AS: xra := xrb + xrc; xrd := xrb - xrc
addsub = SA: xra := xrb - xrc; xrd := xrb + xrc
addsub = SS: xra := xrb - xrc; xrd := xrb - xrc

When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.
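
As an illustration of how addsub and swizzle combine, here is a minimal Python model of Q16ADD. The function names are hypothetical, and the sketch assumes that results wrap around at 16 bits (no saturation), which this page does not state.

```python
def halves(x):
    """Split a 32-bit register value into its (high, low) 16-bit halves."""
    return (x >> 16) & 0xFFFF, x & 0xFFFF

def swizzle16(xrb, swizzle):
    """Apply a WW/XW/HW/LW swizzle to the halves of xrb."""
    h, l = halves(xrb)
    return {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[swizzle]

def q16add(xrb, xrc, addsub, swizzle="WW"):
    """Return (xra, xrd) for Q16ADD; wrap-around at 16 bits is assumed."""
    b = swizzle16(xrb, swizzle)
    c = halves(xrc)  # xrc is always used as-is

    def combine(op):
        out = [(x + y if op == "A" else x - y) & 0xFFFF for x, y in zip(b, c)]
        return (out[0] << 16) | out[1]

    # First letter of addsub selects the operation for xra, second for xrd.
    return combine(addsub[0]), combine(addsub[1])
```

With addsub = AS, a single instruction produces both the sums (in xra) and the differences (in xrd) of the same inputs.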

D32ACC, Q16ACC

D32ACC xra, xrb, xrc, xrd, addsub

Q16ACC xra, xrb, xrc, xrd, addsub, swizzle

Performs addition and/or subtraction on vectors xrb and xrc and adds the results to vectors xra and xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra += xrb + xrc; xrd += xrb + xrc
addsub = AS: xra += xrb + xrc; xrd += xrb - xrc
addsub = SA: xra += xrb - xrc; xrd += xrb + xrc
addsub = SS: xra += xrb - xrc; xrd += xrb - xrc

When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.

Q8ADD

Q8ADD xra, xrb, xrc, addsub

Adds or subtracts the four 8-bit values in the vectors xrb and xrc. The four 8-bit results are stored in the vector xra.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra.h := xrb.h + xrc.h; xra.l := xrb.l + xrc.l
addsub = AS: xra.h := xrb.h + xrc.h; xra.l := xrb.l - xrc.l
addsub = SA: xra.h := xrb.h - xrc.h; xra.l := xrb.l + xrc.l
addsub = SS: xra.h := xrb.h - xrc.h; xra.l := xrb.l - xrc.l

Q8ADDE

Q8ADDE xra, xrb, xrc, xrd, addsub

Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are stored in the vectors xra and xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra := xrb.h + xrc.h; xrd := xrb.l + xrc.l
addsub = AS: xra := xrb.h + xrc.h; xrd := xrb.l - xrc.l
addsub = SA: xra := xrb.h - xrc.h; xrd := xrb.l + xrc.l
addsub = SS: xra := xrb.h - xrc.h; xrd := xrb.l - xrc.l
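
The widening behaviour can be sketched in Python as below. The function name is hypothetical, and the sketch assumes that ".h" refers to the two upper bytes of a register, that ".l" refers to the two lower bytes, and that each result keeps the position of its source byte within that half. Negative differences are assumed to wrap to 16-bit two's complement.

```python
def q8adde(xrb, xrc, addsub):
    """Return (xra, xrd): byte sums or differences widened to 16 bits."""
    def widen(op, hi_byte, lo_byte):
        out = 0
        for i in (hi_byte, lo_byte):
            b = (xrb >> (8 * i)) & 0xFF
            c = (xrc >> (8 * i)) & 0xFF
            r = (b + c if op == "A" else b - c) & 0xFFFF
            out = (out << 16) | r
        return out

    # Bytes 3 and 2 feed xra, bytes 1 and 0 feed xrd.
    return widen(addsub[0], 3, 2), widen(addsub[1], 1, 0)
```

Unlike Q8ADD, where 0xFF + 0x01 does not fit in the 8-bit result, the 16-bit results here hold the full sum 0x0100.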

Q8ACCE

Q8ACCE xra, xrb, xrc, xrd, addsub

Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are added to the vectors xra and xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra += xrb.h + xrc.h; xrd += xrb.l + xrc.l
addsub = AS: xra += xrb.h + xrc.h; xrd += xrb.l - xrc.l
addsub = SA: xra += xrb.h - xrc.h; xrd += xrb.l + xrc.l
addsub = SS: xra += xrb.h - xrc.h; xrd += xrb.l - xrc.l

D16AVG, Q8AVG

D16AVG xra, xrb, xrc

Q8AVG xra, xrb, xrc

Computes the average, rounded down, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.

D16AVGR, Q8AVGR

D16AVGR xra, xrb, xrc

Q8AVGR xra, xrb, xrc

Computes the average, rounded up, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.
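
The difference between the two rounding modes is easiest to see in code. The following Python sketch models Q8AVG and Q8AVGR; the function name and the round_up parameter are hypothetical.

```python
def q8avg(xrb, xrc, round_up=False):
    """Model Q8AVG (round_up=False) or Q8AVGR (round_up=True)."""
    out = 0
    for shift in (24, 16, 8, 0):
        b = (xrb >> shift) & 0xFF
        c = (xrc >> shift) & 0xFF
        # The 9-bit intermediate sum is kept, so the average cannot overflow.
        out |= ((b + c + (1 if round_up else 0)) >> 1) << shift
    return out
```

The two variants differ only when the sum of a pair of elements is odd.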

Q8SAD

Q8SAD xra, xrb, xrc, xrd

Computes the absolute differences of the four unsigned 8-bit values in vectors xrb and xrc. The sum of these 4 differences is assigned to the full register xra and added to the full register xrd.
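
A minimal Python model of Q8SAD, under the assumption that the accumulation into xrd wraps around at 32 bits (the function name is hypothetical):

```python
def q8sad(xrb, xrc, xrd):
    """Return the new (xra, xrd) after Q8SAD."""
    total = sum(
        abs(((xrb >> shift) & 0xFF) - ((xrc >> shift) & 0xFF))
        for shift in (24, 16, 8, 0)
    )
    # xra receives the sum itself, xrd accumulates it.
    return total, (xrd + total) & 0xFFFFFFFF
```

Because xrd accumulates, calling this once per word of two pixel blocks gives the sum of absolute differences for the whole block.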

Multiply Instructions

D16MUL, Q8MUL

D16MUL xra, xrb, xrc, xrd, swizzle

Q8MUL xra, xrb, xrc, xrd

Multiplies the signed values in vector xrb by the signed values in vector xrc and assigns the results to vectors xra and xrd.

When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.

D16MAC, Q8MAC

D16MAC xra, xrb, xrc, xrd, addsub, swizzle

Q8MAC xra, xrb, xrc, xrd, addsub

Multiplies the signed values in vector xrb by the signed values in vector xrc and adds the results to, or subtracts them from, vectors xra and xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra += xrb.h * xrc.h; xrd += xrb.l * xrc.l
addsub = AS: xra += xrb.h * xrc.h; xrd -= xrb.l * xrc.l
addsub = SA: xra -= xrb.h * xrc.h; xrd += xrb.l * xrc.l
addsub = SS: xra -= xrb.h * xrc.h; xrd -= xrb.l * xrc.l

When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.
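
The multiply-accumulate behaviour of D16MAC can be sketched in Python as follows. The function names are hypothetical, and wrap-around of the accumulators at 32 bits is an assumption.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def d16mac(xra, xrb, xrc, xrd, addsub, swizzle="WW"):
    """Return the new (xra, xrd) after D16MAC."""
    bh, bl = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (bh, bl), "XW": (bl, bh),
              "HW": (bh, bh), "LW": (bl, bl)}[swizzle]
    prod_h = bh * s16(xrc >> 16)  # xrc is always used as-is
    prod_l = bl * s16(xrc)
    xra = xra + prod_h if addsub[0] == "A" else xra - prod_h
    xrd = xrd + prod_l if addsub[1] == "A" else xrd - prod_l
    return xra & 0xFFFFFFFF, xrd & 0xFFFFFFFF
```

Each 16-bit by 16-bit product is accumulated at full 32-bit width, one accumulator per element.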

D16MADL, Q8MADL

D16MADL xra, xrb, xrc, xrd, addsub, swizzle

Q8MADL xra, xrb, xrc, xrd, addsub

Multiplies the signed values in vector xrb by the signed values in vector xrc. The results of the multiplication are added to or subtracted from the values in vector xra, and the final result is written to vector xrd.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xrd.h := xra.h + xrb.h * xrc.h; xrd.l := xra.l + xrb.l * xrc.l
addsub = AS: xrd.h := xra.h + xrb.h * xrc.h; xrd.l := xra.l - xrb.l * xrc.l
addsub = SA: xrd.h := xra.h - xrb.h * xrc.h; xrd.l := xra.l + xrb.l * xrc.l
addsub = SS: xrd.h := xra.h - xrb.h * xrc.h; xrd.l := xra.l - xrb.l * xrc.l

When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.

D16MULF

D16MULF xra, xrb, xrc, swizzle

Multiplies the signed values in vector xrb by the signed values in vector xrc. The highest 16 bits of the results of the multiplication are written to vector xra. Note that the result of multiplying two 16-bit signed numbers is a 31-bit signed number (bit 30 being the sign bit), so vector xra will contain bits 30..15 of the two multiplication results, not bits 31..16.

It is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.
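
Taking bits 30..15 of each product amounts to a fixed-point multiplication of two numbers with 15 fractional bits. The Python sketch below models that reading of D16MULF; the function names are hypothetical, and the sketch truncates without rounding or saturation, since this page does not say whether the hardware does either.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def d16mulf(xrb, xrc, swizzle="WW"):
    """Return xra: bits 30..15 of the two signed 16x16 products."""
    bh, bl = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (bh, bl), "XW": (bl, bh),
              "HW": (bh, bh), "LW": (bl, bl)}[swizzle]
    hi = ((bh * s16(xrc >> 16)) >> 15) & 0xFFFF
    lo = ((bl * s16(xrc)) >> 15) & 0xFFFF
    return (hi << 16) | lo
```

For example, 0x4000 represents 0.5 in this format, and 0x4000 times 0x4000 gives 0x2000, which represents 0.25.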

D16MACF

D16MACF xra, xrb, xrc, xrd, addsub, swizzle

Multiplies the signed values in vector xrb by the signed values in vector xrc. The two products are doubled, giving two 32-bit signed numbers, which are then added to or subtracted from xra and xrd respectively. The upper 16 bits of the two results, rounded up, are written to the two halves of vector xra.

Whether the values are added or subtracted is controlled by addsub, as shown in the following table:

addsub = AA: xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16)
addsub = AS: xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16)
addsub = SA: xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16)
addsub = SS: xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16)

It is possible to swizzle the values read from vector xrb as follows:

swizzle = WW: xrb.hl (as-is)
swizzle = XW: xrb.lh (exchanged)
swizzle = HW: xrb.hh (clone high)
swizzle = LW: xrb.ll (clone low)

The values read from vector xrc are always used as-is.

S16MAD

S16MAD xra, xrb, xrc, xrd, addsub, select

Multiplies a 16-bit signed value from vector xrb with a 16-bit signed value from vector xrc. The result is added to or subtracted from xra and the final result is written to xrd.

Whether the multiplication result is added or subtracted is controlled by addsub, as shown in the following table:

addsub = A: xrd := xra + x * y
addsub = S: xrd := xra - x * y

Which parts of xrb and xrc are used is controlled by select, as shown in the following table:

select = HH: x := xrb.h; y := xrc.h
select = HL: x := xrb.h; y := xrc.l
select = LH: x := xrb.l; y := xrc.h
select = LL: x := xrb.l; y := xrc.l
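
The two tables above combine as in the following Python sketch of S16MAD. The function names are hypothetical, and the result is assumed to wrap around at 32 bits.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def s16mad(xra, xrb, xrc, addsub, select):
    """Return xrd for S16MAD; addsub is 'A' or 'S', select is e.g. 'HL'."""
    x = s16(xrb >> 16) if select[0] == "H" else s16(xrb)
    y = s16(xrc >> 16) if select[1] == "H" else s16(xrc)
    product = x * y
    result = xra + product if addsub == "A" else xra - product
    return result & 0xFFFFFFFF
```

The first letter of select picks the half of xrb, the second the half of xrc.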

Other Math

S32MAX, D16MAX, Q8MAX

S32MAX xra, xrb, xrc

D16MAX xra, xrb, xrc

Q8MAX xra, xrb, xrc

Takes the element-wise maximum of the signed values of vector xrb and vector xrc and assigns the results to vector xra.

S32MIN, D16MIN, Q8MIN

S32MIN xra, xrb, xrc

D16MIN xra, xrb, xrc

Q8MIN xra, xrb, xrc

Takes the element-wise minimum of the signed values of vector xrb and vector xrc and assigns the results to vector xra.

Q16SAT

Q16SAT xra, xrb, xrc

Saturate: The values in xrb and xrc are taken as four 16-bit signed integers and clamped to the range [0..255]. The four results are written to xra as bytes, ordered from high to low: upper half of xrb, lower half of xrb, upper half of xrc, lower half of xrc.
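
The clamping and packing order can be sketched in Python like this (function names are hypothetical):

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def q16sat(xrb, xrc):
    """Return xra: four signed 16-bit inputs clamped to 0..255 and packed."""
    out = 0
    # Order from the highest byte of xra to the lowest.
    for value in (xrb >> 16, xrb, xrc >> 16, xrc):
        out = (out << 8) | min(max(s16(value), 0), 255)
    return out
```

This is the kind of conversion needed when 16-bit intermediate results have to be written back as 8-bit samples.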

S32CPS, D16CPS

S32CPS xra, xrb, xrc

D16CPS xra, xrb, xrc

Copy Sign: For each signed value in vector xrc: if it is non-negative, the corresponding value from vector xrb is assigned, unmodified, to vector xra. Otherwise, the corresponding value from vector xrb is negated and assigned to vector xra.
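
A Python sketch of D16CPS following the description above (the function name is hypothetical; negation is assumed to be ordinary two's complement, so the most negative value stays unchanged):

```python
def d16cps(xrb, xrc):
    """Return xra: each half of xrb, negated where the matching half of xrc is negative."""
    out = 0
    for shift in (16, 0):
        b = (xrb >> shift) & 0xFFFF
        if (xrc >> shift) & 0x8000:   # sign bit of the xrc element
            b = -b & 0xFFFF           # two's complement negation (assumed)
        out |= b << shift
    return out
```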

Q8ABD

Q8ABD xra, xrb, xrc

Absolute difference: Computes the absolute value of the difference of the unsigned values in vector xrb and vector xrc and assigns the result to vector xra.

Q8SLT

Q8SLT xra, xrb, xrc

Set on Less Than: Compares the signed values in vector xrb and vector xrc. If the value from xrb is less than the value from xrc, 1 is assigned to the corresponding position in vector xra, otherwise 0 is assigned.

This is a vectorized version of the MIPS instruction SLT.
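
In Python, the per-byte comparison can be modelled as follows (function names are hypothetical):

```python
def s8(x):
    """Interpret the low 8 bits of x as a signed value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x

def q8slt(xrb, xrc):
    """Return xra: 1 in each byte where xrb < xrc (signed), else 0."""
    out = 0
    for shift in (24, 16, 8, 0):
        if s8(xrb >> shift) < s8(xrc >> shift):
            out |= 1 << shift
    return out
```

Because the comparison is signed, the byte 0xFF (-1) compares as less than 0x00.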

Shift and Shuffle Instructions

D32SLL

D32SLL xra, xrb, xrc, xrd, S

Shift Logical Left: The value of xrb is shifted S bits to the left and the result is assigned to xra. Also, the value of xrc is shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..31].

D32SLLV

D32SLLV xra, xrb, rs

Shift Logical Left: The value of xra is shifted S bits to the left and the result is assigned to xra. Also, the value of xrb is shifted S bits to the left and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.

D32SLR

D32SLR xra, xrb, xrc, xrd, S

Shift Logical Right: The unsigned value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].

D32SLRV

D32SLRV xra, xrb, rs

Shift Logical Right: The unsigned value of xra is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrb is shifted S bits to the right and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.

D32SAR

D32SAR xra, xrb, xrc, xrd, S

Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].

D32SARV

D32SARV xra, xrb, rs

Shift Arithmetic Right: The signed value of xra is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrb is shifted S bits to the right and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.

D32SARL

D32SARL xra, xrb, xrc, S

Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is a constant in the range [0..31].
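
The shift-and-pack behaviour of D32SARL can be sketched in Python as follows (function names are hypothetical):

```python
def s32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def d32sarl(xrb, xrc, s):
    """Return xra: low 16 bits of (xrb >> s) and of (xrc >> s), packed."""
    if not 0 <= s <= 31:
        raise ValueError("shift amount must be in 0..31")
    hi = (s32(xrb) >> s) & 0xFFFF   # Python's >> on a negative int is arithmetic
    lo = (s32(xrc) >> s) & 0xFFFF
    return (hi << 16) | lo
```

This turns two 32-bit intermediate results into one register of two 16-bit values in a single step.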

D32SARW

D32SARW xra, xrb, xrc, rs

Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.

Q16SLL

Q16SLL xra, xrb, xrc, xrd, S

Shift Logical Left: The values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrc are shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..15].

Q16SLLV

Q16SLLV xra, xrb, rs

Shift Logical Left: The values of the upper and lower halves of xra are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.

Q16SLR

Q16SLR xra, xrb, xrc, xrd, S

Shift Logical Right: The unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].

Q16SLRV

Q16SLRV xra, xrb, rs

Shift Logical Right: The unsigned values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.

Q16SAR

Q16SAR xra, xrb, xrc, xrd, S

Shift Arithmetic Right: The signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].

Q16SARV

Q16SARV xra, xrb, rs

Shift Arithmetic Right: The signed values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.

S32ALN

S32ALN xra, xrb, xrc, s

Takes the 64-bit value xrb:xrc (xrb in the upper half), shifts it s bytes to the left (s in the range 0..4) and assigns the highest 32 bits of the result to xra. Can be used to realign values that are not aligned in memory.
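
A minimal Python model of S32ALN, reading xrb:xrc as a 64-bit value with xrb in the upper half (the function name is hypothetical):

```python
def s32aln(xrb, xrc, s):
    """Return xra: the top 32 bits of (xrb:xrc) shifted left by s bytes."""
    if not 0 <= s <= 4:
        raise ValueError("s must be in 0..4")
    combined = ((xrb & 0xFFFFFFFF) << 32) | (xrc & 0xFFFFFFFF)
    return ((combined << (8 * s)) >> 32) & 0xFFFFFFFF
```

With s = 0 the result is xrb, with s = 4 it is xrc, and the values in between select a 4-byte window straddling the two registers.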

S32SFL

S32SFL xra, xrb, xrc, xrd, ptn

Shuffles (swizzles) the bytes of xrb and xrc as indicated in the table below and writes the result into xra and xrd.

          xrb            xrc
Input     b3 b2 b1 b0    c3 c2 c1 c0

          xra            xrd
ptn=0     b3 c3 b2 c2    b1 c1 b0 c0
ptn=1     b3 b1 c3 c1    b2 b0 c2 c0
ptn=2     b3 c3 b1 c1    b2 c2 b0 c0
ptn=3     b3 b2 c3 c2    b1 b0 c1 c0
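
The shuffle table can be transcribed directly into a Python model. The names PATTERNS and s32sfl are hypothetical; b0 and c0 denote the lowest bytes of xrb and xrc.

```python
# For each ptn: byte order of xra and of xrd, from highest byte to lowest.
PATTERNS = {
    0: ("b3 c3 b2 c2", "b1 c1 b0 c0"),
    1: ("b3 b1 c3 c1", "b2 b0 c2 c0"),
    2: ("b3 c3 b1 c1", "b2 c2 b0 c0"),
    3: ("b3 b2 c3 c2", "b1 b0 c1 c0"),
}

def s32sfl(xrb, xrc, ptn):
    """Return (xra, xrd) for S32SFL with the given pattern number."""
    source = {"b": xrb, "c": xrc}

    def build(spec):
        out = 0
        for name in spec.split():
            byte = (source[name[0]] >> (8 * int(name[1]))) & 0xFF
            out = (out << 8) | byte
        return out

    xra_spec, xrd_spec = PATTERNS[ptn]
    return build(xra_spec), build(xrd_spec)
```

Pattern 0 interleaves the bytes of the two inputs, while pattern 3 regroups them as 16-bit halves.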

New instructions in JZ4770

The JZ4770 has quite a few additional MXU instructions. Ingenic writes 3 or 7 to register xr16 to activate these. This may imply that there are two levels of extension between JZ4740 and JZ4770.

Load and store instructions

  • LXB rb, rc, strd2
  • LXBU rb, rc, strd2
  • LXH rb, rc, strd2
  • LXHU rb, rc, strd2
  • LXW rb, rc, strd2
  • S16LDD xra, rb, s10, optn2
  • S16LDI xra, rb, s10, optn2
  • S16SDI xra, rb, s10, optn2
  • S16STD xra, rb, s10, optn2
  • S32LDDR xra, rb, s12
  • S32LDDVR xra, rb, rc, strd2
  • S32LDIR xra, rb, s12
  • S32LDIVR xra, rb, rc, strd2
  • S32SDIR xra, rb, s12
  • S32SDIVR xra, rb, rc, strd2
  • S32STDR xra, rb, s12
  • S32STDVR xra, rb, rc, strd2
  • S8LDD xra, rb, s8, optn3
  • S8LDI xra, rb, s8, optn3
  • S8SDI xra, rb, s8, optn3
  • S8STD xra, rb, s8, optn3

Other math

  • D16MOVN xra, xrb, xrc
  • D16MOVZ xra, xrb, xrc
  • D16SLT xra, xrb, xrc
  • Q16SCOP xra, xrb, xrc, xrd
  • Q8MOVN xra, xrb, xrc
  • Q8MOVZ xra, xrb, xrc
  • Q8SLTU xra, xrb, xrc
  • S32ABS xra, xrb
  • S32ALNI xra, xrb, xrc, optn3
  • S32EXTRV xra, xrd, rs, rt
  • S32EXTR xra, xrd, rs, bits5
  • S32LUI xra, s8, optn2
  • S32MOVN xra, xrb, xrc
  • S32MOVZ xra, xrb, xrc
  • S32SLT xra, xrb, xrc

Addition and subtraction instructions

  • D16ASUM xra, xrb, xrc, xrd
  • D32ACCM xra, xrb, xrc, xrd
  • D32ADDC xra, xrb, xrc
  • D32ASUM xra, xrb, xrc, xrd
  • Q16ACCM xra, xrb, xrc, xrd
  • Q16ASUM xra, xrb, xrc, xrd
  • S32MSUBU xra, xrd, rs, rt
  • S32MSUB xra, xrd, rs, rt

Multiply instructions

  • D16MACE xra, xrb, xrc, xrd
  • D16MULE xra, xrb, xrc, xrd
  • D8SUMC xra, xrb, xrc
  • D8SUM xra, xrb, xrc
  • Q8MULSU xra, xrb, xrc, xrd
  • Q8MACSU xra, xrb, xrc, xrd
  • S32MADDU xra, xrd, rs, rt
  • S32MADD xra, xrd, rs, rt
  • S32MULU xra, xrd, rs, rt
  • S32MUL xra, xrd, rs, rt

Bitwise instructions

  • S32AND xra, xrb, xrc
  • S32NOR xra, xrb, xrc
  • S32OR xra, xrb, xrc
  • S32XOR xra, xrb, xrc