DynamoRIO / dynamorio / Merge requests / !5489

i#5483 add support for avx512 bf16 instructions

Merged. prasun3 requested to merge i5483-add-support-for-avx512-bf16-instructions into master on May 10, 2022.

Added support for AVX512 bfloat16 instructions

These are the three bfloat16 instructions.

VCVTNE2PS2BF16—Convert Two Packed Single Data to One Packed BF16 Data

EVEX.128.F2.0F38.W0 72 /r VCVTNE2PS2BF16 xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F2.0F38.W0 72 /r VCVTNE2PS2BF16 ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F2.0F38.W0 72 /r VCVTNE2PS2BF16 zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2       Operand 3
A       Full    ModRM:reg (w)   EVEX.vvvv (r)   ModRM:r/m (r)

VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data

EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst
EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst
EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2
A       Full    ModRM:reg (w)   ModRM:r/m (r)
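
As an illustration of what the "NE" (round to nearest even) conversion computes per element, here is a minimal sketch in C. It follows my reading of the conversion pseudocode in the Intel SDM (denormal inputs become signed zero, NaNs are quieted, everything else is rounded to nearest even by adding a bias before truncation). The function names are hypothetical and this is not code from this merge request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts one float32 bit pattern to a bfloat16 bit
 * pattern. This is a sketch based on the SDM pseudocode as I understand it,
 * not a verified model of the hardware.
 */
static uint16_t
fp32_bits_to_bf16(uint32_t x)
{
    uint32_t exp = (x >> 23) & 0xff;
    uint32_t frac = x & 0x7fffff;
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000);

    if (exp == 0) /* Zero or denormal input: result is a signed zero. */
        return sign;
    if (exp == 0xff) {
        uint16_t hi = (uint16_t)(x >> 16);
        /* Infinity is truncated as is; NaN gets its quiet bit forced on. */
        return frac != 0 ? (uint16_t)(hi | 0x0040) : hi;
    }
    /* Round to nearest even: bias is 0x7fff plus the lowest kept bit. */
    uint32_t lsb = (x >> 16) & 1;
    return (uint16_t)((x + 0x7fffu + lsb) >> 16);
}

/* Small self-check showing the results for a few bit patterns. */
static void
fp32_bits_to_bf16_selfcheck(void)
{
    assert(fp32_bits_to_bf16(0x3f800000u) == 0x3f80); /* 1.0 */
    assert(fp32_bits_to_bf16(0x3f808000u) == 0x3f80); /* tie, even kept */
    assert(fp32_bits_to_bf16(0x3f818000u) == 0x3f82); /* tie, rounds up */
    assert(fp32_bits_to_bf16(0x00000001u) == 0x0000); /* denormal */
    assert(fp32_bits_to_bf16(0x7f800001u) == 0x7fc0); /* NaN quieted */
}
```

The sketch takes bit patterns rather than float values so that the rounding cases can be written down exactly.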

VDPBF16PS—Dot Product of BF16 Pairs Accumulated into Packed Single Precision

EVEX.128.F3.0F38.W0 52 /r VDPBF16PS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F3.0F38.W0 52 /r VDPBF16PS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F3.0F38.W0 52 /r VDPBF16PS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2       Operand 3
A       Full    ModRM:reg (w)   EVEX.vvvv (r)   ModRM:r/m (r)
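
For reference, the arithmetic behind VDPBF16PS can be sketched in plain C as below. This is a simplified model under stated assumptions: it ignores masking, zeroing, broadcast, and the instruction's special handling of denormals and rounding, and the function names are hypothetical. Each float32 accumulator lane receives the products of two adjacent BF16 pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widens a bfloat16 bit pattern to float: BF16 is the upper 16 bits of
 * an IEEE float32, so the lower 16 bits are filled with zeros.
 */
static float
bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical reference model: acc has n float lanes, a and b each have
 * 2*n BF16 elements. Lane i accumulates a[2i+1]*b[2i+1] and a[2i]*b[2i].
 */
static void
dpbf16ps_ref(float *acc, const uint16_t *a, const uint16_t *b, size_t n)
{
    assert(acc != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        acc[i] += bf16_to_float(a[2 * i + 1]) * bf16_to_float(b[2 * i + 1]);
        acc[i] += bf16_to_float(a[2 * i]) * bf16_to_float(b[2 * i]);
    }
}
```

With a = {1.0, 2.0}, b = {3.0, 4.0} as BF16 and an accumulator of 1.0, the single lane becomes 1 + 2*4 + 1*3 = 12.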

List of places to update

From https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/opcode_api.h#L53

 * When adding new instructions, be sure to update all of these places:
 *   1) decode_table op_instr array
 *   2) decode_table decoding table entries
 *   3) OP_ enum (here) via x86opnums.pl
 *   4) update OP_LAST at end of enum here
 *   5) decode_fast tables if necessary (they are conservative)
 *   6) instr_create macros
 *   7) suite/tests/api/ir* tests
 *   8) add binutils tests in third_party/binutils/test_decenc

Step 1: update op_instr array

Added entries to op_instr. They point directly to evex_Wb_extensions because these instructions have only an EVEX encoding.

Step 2: add decode_table entries

  • Updated the third_byte_38 table to point to prefix_extensions, since these instructions share opcodes and differ only in prefix.
    • VCVTNEPS2BF16 and VCVTNE2PS2BF16 have three-byte opcodes starting with 0f 38, so the decoder looks at third_byte_38[third_byte_38_index[opcode]].
    • Because both use the same opcode (72) and differ only in the prefix (f3/f2), the third_byte_38 entry has to point to prefix_extensions, which in turn points to the appropriate EVEX_Wb entries.
    • VDPBF16PS has the same opcode (52) as the VNNI instruction vpdpwssd; the two differ only in the prefix (F3/66). That entry now points to prefix_extensions instead of e_vex_extensions, which orphans the e_vex_extensions entry (e_vex ext 151). Should that entry be removed?
  • Added entries in prefix_extensions that point to the appropriate vex/evex entries.
  • Added leaf entries in evex_Wb_extensions.

Updated opcodes for invalid entries in e_vex ext 151 and 152 for consistency.
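
The prefix-based dispatch described in this step can be illustrated with a small standalone sketch. It is not the DynamoRIO decoder and does not use its tables; the enum and function names are hypothetical. It only shows why the opcode byte alone is not enough: the EVEX pp field (the low two bits of the third prefix byte: 1 = 66, 2 = F3, 3 = F2) has to be consulted to tell the instructions apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef enum {
    INS_UNKNOWN,
    INS_VCVTNE2PS2BF16,
    INS_VCVTNEPS2BF16,
    INS_VDPBF16PS,
    INS_VPDPWSSD
} ins_kind_t;

/* Classifies an EVEX-encoded instruction in the 0F 38 map with W=0 by
 * opcode byte and pp field. Byte layout assumed: 62 P0 P1 P2 opcode ...
 */
static ins_kind_t
classify_evex_0f38(const uint8_t *b, size_t len)
{
    assert(b != NULL);
    if (len < 5 || b[0] != 0x62) /* 0x62 is the EVEX escape byte. */
        return INS_UNKNOWN;
    unsigned map = b[1] & 0x7; /* Opcode map: 2 selects 0F 38. */
    unsigned w = b[2] >> 7;    /* EVEX.W */
    unsigned pp = b[2] & 0x3;  /* 0 = none, 1 = 66, 2 = F3, 3 = F2 */
    if (map != 2 || w != 0)
        return INS_UNKNOWN;
    switch (b[4]) {
    case 0x72: /* Same opcode, distinguished by prefix. */
        if (pp == 3)
            return INS_VCVTNE2PS2BF16;
        if (pp == 2)
            return INS_VCVTNEPS2BF16;
        break;
    case 0x52: /* Shared between BF16 (F3) and VNNI (66). */
        if (pp == 2)
            return INS_VDPBF16PS;
        if (pp == 1)
            return INS_VPDPWSSD;
        break;
    }
    return INS_UNKNOWN;
}
```

For example, the hand-derived bytes 62 f2 6f 08 72 cb (pp = 3) classify as VCVTNE2PS2BF16, while changing the third byte to 7e (pp = 2) gives VCVTNEPS2BF16.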

Step 3: add OP_ enums

Done

Step 4: update OP_LAST

Not needed since OP_LAST already points to the last enum.

Step 5: decode_fast tables if necessary

Not done

Step 6: instr_create macros

Added 1dst_3src macros for VCVTNE2PS2BF16 and VDPBF16PS, since they write to operand 1 and read from the mask register, operand 2, and operand 3.

Added a 1dst_2src macro for VCVTNEPS2BF16, since it writes to operand 1 and reads from the mask register and operand 2. The destination size is set explicitly because this instruction writes to "half" the destination.
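
One way to see why the VCVTNEPS2BF16 destination size differs from the source size is to count bytes: each 4-byte float32 source element produces one 2-byte BF16 result. The helper below is a hypothetical sketch of that arithmetic, not the macro added in this change.

```c
#include <assert.h>

/* Hypothetical helper: number of destination bytes written by
 * VCVTNEPS2BF16 for a given source vector length in bits.
 */
static unsigned
vcvtneps2bf16_dst_bytes(unsigned vl_bits)
{
    assert(vl_bits == 128 || vl_bits == 256 || vl_bits == 512);
    unsigned elems = vl_bits / 32; /* float32 source elements */
    return elems * 2;              /* one 2-byte BF16 per element */
}
```

This gives 8 bytes for an xmm source (the low half of the xmm destination), 16 bytes for a ymm source (a full xmm), and 32 bytes for a zmm source (a full ymm), which matches the register operands listed in the encodings above.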

Step 7: suite/tests/api/ir tests

Added tests in ir_x86_3args_avx512_evex_mask.h and ir_x86_4args_avx512_evex_mask_C.h.

The VCVTNEPS2BF16 test is currently commented out because the destination size needs to be set explicitly.

Step 8: binutils tests

Added binutils tests that build the instructions with the instr_create_.. APIs, encode them, and match the result against the expected opcode bytes. The tests go in this direction, rather than decoding bytes and comparing disassembly, because we don't produce disassembly that matches binutils disassembly exactly.

These currently have two workarounds:

  • set dest size explicitly
  • set zeroing prefix explicitly
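
The encode-then-compare-bytes approach can be sketched without DynamoRIO as follows. The encoder below is a hypothetical standalone helper, not the instr_create_.. API; it handles only the register-register form in the 0F 38 map with W=0, no masking, no zeroing, and no broadcast. The expected bytes used in the example were worked out by hand from the EVEX layout and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder: pp is the EVEX prefix field (1 = 66, 2 = F3,
 * 3 = F2), ll is the vector length field (0 = 128, 1 = 256, 2 = 512),
 * and dst/src1/src2 are vector register numbers 0..31.
 */
static size_t
encode_evex_0f38_rrr(uint8_t out[6], unsigned pp, unsigned ll, uint8_t opcode,
                     unsigned dst, unsigned src1, unsigned src2)
{
    assert(pp < 4 && ll < 3 && dst < 32 && src1 < 32 && src2 < 32);
    unsigned r = (dst >> 3) & 1;   /* ModRM.reg bit 3 */
    unsigned r2 = (dst >> 4) & 1;  /* ModRM.reg bit 4 (R') */
    unsigned b = (src2 >> 3) & 1;  /* ModRM.rm bit 3 */
    unsigned x = (src2 >> 4) & 1;  /* ModRM.rm bit 4 for register operands */
    unsigned v2 = (src1 >> 4) & 1; /* vvvv bit 4 (V') */

    out[0] = 0x62; /* EVEX escape */
    /* P0: inverted R, X, B, R', then the opcode map (2 = 0F 38). */
    out[1] = (uint8_t)((!r << 7) | (!x << 6) | (!b << 5) | (!r2 << 4) | 0x2);
    /* P1: W=0, inverted vvvv, a fixed 1 bit, then pp. */
    out[2] = (uint8_t)(((~src1 & 0xf) << 3) | 0x4 | pp);
    /* P2: z=0, L'L, b=0, inverted V', aaa=0 (no mask). */
    out[3] = (uint8_t)((ll << 5) | (!v2 << 3));
    out[4] = opcode;
    /* ModRM: mod=11 (register), reg=dst, rm=src2. */
    out[5] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (src2 & 7));
    return 6;
}
```

Under these assumptions, encoding VCVTNE2PS2BF16 with zmm30 as destination and zmm29, zmm28 as sources (pp = 3, ll = 2, opcode 0x72) yields 62 02 17 40 72 f4, and a test would compare that buffer byte for byte against the expected sequence.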
Source branch: i5483-add-support-for-avx512-bf16-instructions