81 changes: 71 additions & 10 deletions doc/contributing/ffi-fast-api-internals.md
@@ -41,10 +41,9 @@ The implementation is split across these files:
 * `src/ffi/types.{h,cc}` parses public FFI signatures and implements
   `IsFastCallEligible()`, which rejects signatures that the current Fast API
   trampolines cannot represent.
-* `src/ffi/platforms/arm64.cc` and `src/ffi/platforms/x64.cc` contain the
-  platform trampoline generators. These files follow the contract exposed by
-  `node_ffi_create_fast_trampoline()` and release code with
-  `node_ffi_free_fast_trampoline()`.
+* `src/ffi/platforms/*.cc` contain the platform trampoline generators. These
+  files follow the contract exposed by `node_ffi_create_fast_trampoline()` and
+  release code with `node_ffi_free_fast_trampoline()`.
 * `src/node_ffi.cc` decides whether a function gets a Fast API callable,
   SharedBuffer callable, or generic callable, and attaches hidden metadata used
   by JavaScript wrappers.
@@ -88,8 +87,7 @@ true only on supported architectures when `IsJitMemorySupported()` succeeds.
 `IsJitMemorySupported()` runs a one-time self-test:

 * Map one writable anonymous page.
-* Write a minimal return instruction (`0xD65F03C0` on AArch64, `0xC3` on
-  x86\_64).
+* Write a minimal return instruction for the current architecture.
 * Flush the instruction cache where required.
 * Try to transition the page to read/execute with `mprotect(PROT_READ |
   PROT_EXEC)`.
@@ -99,8 +97,8 @@ The probe deliberately does not execute the generated instruction. Executing a
 freshly written capability probe could terminate the process on systems that
 block generated code. The real trampoline emitter performs the same writable to
 executable transition when creating a callable trampoline and falls back when it
-is rejected. Windows currently returns false because the branch does not yet
-have a Win64 trampoline emitter or `VirtualAlloc`-based JIT memory support.
+is rejected. Windows uses `VirtualAlloc`, `VirtualProtect`, and
+`FlushInstructionCache` for the same probe.

 ## Signature Eligibility

@@ -110,8 +108,8 @@ keeps unsupported cases out of the trampoline emitters and lets

 Eligibility requires:

-* A supported platform emitter: AArch64 or x86\_64 SysV. Win64 is currently
-  ineligible.
+* A supported platform emitter: AArch64, x86\_64 SysV, Win64 x64, PPC64LE
+  ELFv2, LoongArch64, RISC-V 64, or s390x.
 * A return type that is numeric, pointer, or `void`.
 * Argument types that are numeric or pointer. `void` cannot be an argument.
 * No `function` typed argument or return value.
@@ -141,6 +139,56 @@ x86\_64 SysV eligibility mirrors `src/ffi/platforms/x64.cc`:
   incoming GP count is capped at 5 and buffer-shaped arguments cannot coexist
   with FP arguments.

+Win64 x64 eligibility mirrors the conservative Windows emitter in
+`src/ffi/platforms/x64.cc`:
+
+* The JavaScript receiver occupies the first positional register slot.
+* Public arguments are shifted from positions 1..3 into positions 0..2.
+* Each argument uses the integer or XMM register of its positional Win64 slot
+  (`RCX`/`XMM0`, `RDX`/`XMM1`, `R8`/`XMM2`, `R9`/`XMM3`).
+* Only scalar register-only signatures with at most three public arguments are
+  currently eligible.
+* Buffer-shaped arguments and stack-passed arguments fall back.
+
+PPC64LE eligibility mirrors `src/ffi/platforms/ppc64.cc`:
+
+* `r3` is occupied by V8's receiver, so user GP arguments arrive in `r4..r10`.
+* FP arguments use FPRs and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `ctr`, with the target address in `r12` for ELFv2 global entry.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, narrow returns, and PPC64BE
+  platforms fall back. AIX/PPC64BE is intentionally a non-target for the current
+  Fast FFI trampoline work because its ABI and linkage need a separate design.
+
+LoongArch64 eligibility mirrors `src/ffi/platforms/loong64.cc`:
+
+* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`.
+* FP arguments use `fa0..fa7` and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `jirl`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
+RISC-V 64 eligibility mirrors `src/ffi/platforms/riscv64.cc`:
+
+* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`.
+* FP arguments use `fa0..fa7` and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `jalr`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
+s390x eligibility mirrors `src/ffi/platforms/s390x.cc`:
+
+* `r2` is occupied by V8's receiver, so user GP arguments arrive in `r3..r6`.
+* FP arguments use `f0`, `f2`, `f4`, and `f6` and are not shifted by the receiver
+  slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `br`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
 The native trampoline generator still repeats its own register checks. The
 eligibility function is the early, centralized rejection point; the generator
 checks are a defense against direct or future callers.
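The register-counting rules above can be modeled compactly. The sketch below is a hypothetical illustration, not code from `src/ffi/types.cc`. `ArgKind`, `Abi`, and `IsRegisterOnlyEligible()` are invented names. It covers only Win64 x64, LoongArch64, RISC-V 64, and s390x, leaving out PPC64LE and the AArch64 and x86\_64 SysV rules, and it ignores return types. The register budgets come straight from the lists above. After V8's receiver takes the first GP register, seven GP registers remain on LoongArch64 and RISC-V 64 and four remain on s390x. FP arguments get eight, eight, and four registers respectively.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types; these are not the names used by src/ffi/types.cc.
enum class ArgKind { kInt, kPointer, kFloat, kDouble, kBuffer };
enum class Abi { kWin64, kLoong64, kRiscv64, kS390x };

namespace {

struct RegisterBudget {
  std::size_t user_gp;  // GP argument registers left after V8's receiver.
  std::size_t fp;       // FP argument registers (not shifted by the receiver).
};

// Budgets for the ABIs whose GP and FP registers are counted separately.
RegisterBudget BudgetFor(Abi abi) {
  switch (abi) {
    case Abi::kLoong64:  // a1..a7 and fa0..fa7
    case Abi::kRiscv64:  // a1..a7 and fa0..fa7
      return {7, 8};
    case Abi::kS390x:  // r3..r6 and f0, f2, f4, f6
      return {4, 4};
    case Abi::kWin64:  // Positional slots; handled separately.
      break;
  }
  return {0, 0};
}

bool IsFloating(ArgKind kind) {
  return kind == ArgKind::kFloat || kind == ArgKind::kDouble;
}

}  // namespace

// Returns true when every public argument still fits in a register after the
// receiver shift, i.e. a register-only trampoline could handle the signature.
bool IsRegisterOnlyEligible(Abi abi, const std::vector<ArgKind>& args) {
  for (ArgKind kind : args) {
    if (kind == ArgKind::kBuffer) return false;  // Buffer-shaped: fall back.
  }
  if (abi == Abi::kWin64) {
    // The receiver takes slot 0 and public arguments move from slots 1..3 to
    // 0..2, so at most three public arguments fit, whatever their types.
    return args.size() <= 3;
  }
  const RegisterBudget budget = BudgetFor(abi);
  std::size_t gp = 0;
  std::size_t fp = 0;
  for (ArgKind kind : args) {
    if (IsFloating(kind)) {
      ++fp;
    } else {
      ++gp;
    }
  }
  // Anything beyond the budget would be stack-passed, which falls back.
  return gp <= budget.user_gp && fp <= budget.fp;
}
```

Keeping one counter per register class is what makes FP arguments "not shifted by the receiver slot": only the GP budget loses a register to V8's receiver.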
@@ -395,9 +443,22 @@ Important limits are:
 * No stack arguments in the current AArch64 trampoline.
 * At most one stack-loaded scalar GP argument in the current x86\_64 SysV
   trampoline.
+* No stack arguments or buffer-shaped arguments in the current Win64 x64
+  trampoline.
+* No stack arguments, buffer-shaped arguments, or narrow returns in the current
+  PPC64LE trampoline.
+* No stack arguments, buffer-shaped arguments, or narrow returns in the current
+  LoongArch64, RISC-V 64, and s390x trampolines.
 * No mixed buffer-shaped and FP arguments.
 * No `function` argument or return type in the Fast API path.

+Linux x86 and armv7 are experimental Node.js platforms, but the current Fast FFI
+trampoline model remains 64-bit only. They continue to use SharedBuffer or
+generic libffi fallback paths. Linux s390x is a Tier 2 Node.js platform, but
+bundled FFI is not currently enabled for that target; if built with
+`--shared-ffi`, scalar register-only Fast API FFI can use the s390x emitter. AIX
+PPC64BE is intentionally not covered by this implementation.
+
 These are optimization boundaries, not public FFI signature boundaries. User
 code can still call supported public FFI signatures through fallback paths.

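The self-test steps described above can be tried in isolation with a small POSIX-only sketch. Those steps are: map a writable page, write a return instruction, flush the instruction cache, then attempt the read/execute transition without ever running the probe. This is an illustration under stated assumptions rather than the code in `src/ffi/jit_memory.cc`:

* It models only x86\_64 and AArch64.
* It uses `sysconf(_SC_PAGESIZE)` instead of `getpagesize()`.
* It flushes with the GCC/Clang builtin `__builtin___clear_cache`.
* It skips any Apple-specific handling.
* `IsJitMemorySupportedSketch()` is a hypothetical name.

Mirroring `IsJitMemorySupported()`, the result is computed once with `std::call_once`.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

namespace {

// Writes a return instruction into a fresh read/write page and checks whether
// the page may become read/execute. The instruction is never executed.
bool ProbeWritableToExecutable() {
#if defined(__x86_64__) || defined(__aarch64__)
#if defined(__x86_64__)
  const std::uint8_t code[] = {0xC3};  // RET
#else
  std::uint8_t code[4];
  const std::uint32_t ret_x30 = 0xD65F03C0;  // RET (X30)
  std::memcpy(code, &ret_x30, sizeof(code));  // Native byte order.
#endif
  const std::size_t page_size =
      static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* page = mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) return false;

  std::memcpy(page, code, sizeof(code));
  char* begin = static_cast<char*>(page);
  // Make the new bytes visible to instruction fetch (a no-op on x86_64).
  __builtin___clear_cache(begin, begin + sizeof(code));

  // The writable-to-executable transition is the capability being probed.
  const bool ok = mprotect(page, page_size, PROT_READ | PROT_EXEC) == 0;
  munmap(page, page_size);
  return ok;
#else
  return false;  // No return instruction modeled for this architecture.
#endif
}

}  // namespace

bool IsJitMemorySupportedSketch() {
  // Run the probe once; concurrent callers wait for the final result.
  static std::once_flag once;
  static bool supported = false;
  std::call_once(once, [] { supported = ProbeWritableToExecutable(); });
  return supported;
}
```

Some hardened systems forbid making anonymous memory executable, for example SELinux policies that deny `execmem`. There `mprotect()` fails and the sketch returns `false`, which is the situation in which the Fast API path is skipped in favor of the fallback paths.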
4 changes: 4 additions & 0 deletions node.gyp
@@ -486,6 +486,10 @@
 'src/node_ffi.cc',
 'src/node_ffi.h',
 'src/ffi/platforms/arm64.cc',
+'src/ffi/platforms/loong64.cc',
+'src/ffi/platforms/ppc64.cc',
+'src/ffi/platforms/riscv64.cc',
+'src/ffi/platforms/s390x.cc',
 'src/ffi/platforms/x64.cc',
 'src/ffi/data.cc',
 'src/ffi/data.h',
5 changes: 4 additions & 1 deletion src/ffi/fast.cc
@@ -222,7 +222,10 @@ FastFFIMetadata::~FastFFIMetadata() {

 bool IsFastCallSupported() {
   // Fast call requires both a platform stub emitter and working JIT memory.
-#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__)
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) || \
+    defined(_M_X64) || defined(__powerpc64__) || defined(__ppc64__) || \
+    defined(__PPC64__) || defined(__loongarch64) || \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
   return IsJitMemorySupported();
 #else
   return false;
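The same architecture list now appears in several places. It is in `IsFastCallSupported()`, it is negated in `SelfTest()`, and a reduced form appears twice more in `SelfTest()` to choose between `std::memcpy` and a single-byte store. One way to keep these copies in sync would be to compute the predicate once. The sketch below is only a suggestion. `NODE_FFI_HAS_FAST_EMITTER` is a hypothetical macro that this change does not define, and `IsJitMemorySupported()` is stubbed so that the example compiles on its own.

```cpp
// Hypothetical shared-header content; NODE_FFI_HAS_FAST_EMITTER is not
// defined by this change.
#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) || \
    defined(_M_X64) || defined(__powerpc64__) || defined(__ppc64__) || \
    defined(__PPC64__) || defined(__loongarch64) || \
    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
#define NODE_FFI_HAS_FAST_EMITTER 1
#else
#define NODE_FFI_HAS_FAST_EMITTER 0
#endif

// Stand-in for node::ffi::IsJitMemorySupported() so the sketch compiles alone.
bool IsJitMemorySupported() { return true; }

bool IsFastCallSupported() {
  // Fast call requires both a platform stub emitter and working JIT memory.
#if NODE_FFI_HAS_FAST_EMITTER
  return IsJitMemorySupported();
#else
  return false;
#endif
}
```

`SelfTest()` could then test `!NODE_FFI_HAS_FAST_EMITTER` instead of repeating the negated list. The two narrower lists that select `std::memcpy` would still need their own predicate, because x86 uses a one-byte store.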
74 changes: 56 additions & 18 deletions src/ffi/jit_memory.cc
@@ -2,42 +2,85 @@

 #include "ffi/jit_memory.h"

-#if !defined(_WIN32)
-
-#include <sys/mman.h>
-#include <unistd.h>
-
 #include <cstdint>
 #include <cstring>
 #include <mutex>

+#if defined(_WIN32)
+#include <windows.h>
+#else
+#include <sys/mman.h>
+#include <unistd.h>
+
 #if defined(__APPLE__)
 #include <libkern/OSCacheControl.h>
 #endif

-#endif  // !defined(_WIN32)
+#endif  // defined(_WIN32)

 namespace node::ffi {

 namespace {

-#if !defined(_WIN32)
-
 bool SelfTest() {
-#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__)
+#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) && \
+    !defined(_M_X64) && !defined(__powerpc64__) && !defined(__ppc64__) && \
+    !defined(__PPC64__) && !defined(__loongarch64) && \
+    !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__)
   // No stub emitter for this platform; nothing to test.
   return false;
 #else
 #if defined(__aarch64__) || defined(_M_ARM64)
   // AArch64 BR LR: 0xD65F03C0
   constexpr uint32_t kInstruction = 0xD65F03C0;
   constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__)
+  // PPC64 BLR: 0x4E800020
+  constexpr uint32_t kInstruction = 0x4E800020;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__loongarch64)
+  // LoongArch64 JIRL zero, ra, 0
+  constexpr uint32_t kInstruction = 0x4C000020;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__riscv) && __riscv_xlen == 64
+  // RISC-V JALR zero, ra, 0
+  constexpr uint32_t kInstruction = 0x00008067;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__s390x__)
+  // s390x BR r14
+  constexpr uint16_t kInstruction = 0x07fe;
+  constexpr size_t kInstructionSize = sizeof(uint16_t);
 #else
   // x86_64 RET: 0xC3
   constexpr uint8_t kInstruction = 0xC3;
   constexpr size_t kInstructionSize = sizeof(uint8_t);
 #endif

+#if defined(_WIN32)
+  void* page = VirtualAlloc(
+      nullptr, kInstructionSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+  if (page == nullptr) {
+    return false;
+  }
+
+  uint8_t* code = static_cast<uint8_t*>(page);
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || \
+    defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) || \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
+  std::memcpy(code, &kInstruction, kInstructionSize);
+#else
+  code[0] = kInstruction;
+#endif
+
+  FlushInstructionCache(GetCurrentProcess(), page, kInstructionSize);
+
+  DWORD old_protect;
+  const bool ok =
+      VirtualProtect(page, kInstructionSize, PAGE_EXECUTE_READ, &old_protect) !=
+      0;
+  VirtualFree(page, 0, MEM_RELEASE);
+  return ok;
+#else
   const size_t page_size = static_cast<size_t>(getpagesize());
   void* page = mmap(nullptr,
                     page_size,
@@ -50,7 +93,9 @@ bool SelfTest() {
   }

   uint8_t* code = static_cast<uint8_t*>(page);
-#if defined(__aarch64__) || defined(_M_ARM64)
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || \
+    defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) || \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
   std::memcpy(code, &kInstruction, kInstructionSize);
 #elif defined(__x86_64__)
   code[0] = kInstruction;
@@ -84,25 +129,18 @@ bool SelfTest() {
   munmap(page, page_size);
   return ok;
 #endif
+#endif
 }

-#endif  // !defined(_WIN32)
-
 }  // namespace

 bool IsJitMemorySupported() {
-#if defined(_WIN32)
-  // Windows stub emitter and VirtualAlloc-based JIT memory support not yet
-  // implemented. Return false so the fast-call path falls back to libffi.
-  return false;
-#else
   // Run the self-test exactly once and publish only the final result, so
   // concurrent callers never observe a provisional value.
   static std::once_flag once;
   static bool supported = false;
   std::call_once(once, [] { supported = SelfTest(); });
   return supported;
-#endif
 }

 }  // namespace node::ffi
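The `kInstruction` constants in `SelfTest()` can be cross-checked at compile time by assembling each return instruction from its instruction fields. This is a hypothetical sketch with invented helper names. x86\_64 is omitted because its `RET` is the single opcode byte `0xC3`. The check also shows that `0xD65F03C0` encodes AArch64 `RET X30`, not `BR X30` (`0xD61F03C0`). The value is therefore correct for a minimal return, even though the `// AArch64 BR LR` comment names the other instruction.

```cpp
#include <cstdint>

// Hypothetical helpers that assemble each return instruction from its fields.

// AArch64 RET Xn and BR Xn differ only in the opc field (bits 24..21).
constexpr std::uint32_t A64Ret(std::uint32_t rn) {
  return 0xD65F0000u | (rn << 5);
}
constexpr std::uint32_t A64Br(std::uint32_t rn) {
  return 0xD61F0000u | (rn << 5);
}

// Power BCLR BO,BI (XL-form: primary opcode 19, extended opcode 16, BH=0,
// LK=0). BLR is BCLR with BO=20 ("branch always") and BI=0.
constexpr std::uint32_t PpcBclr(std::uint32_t bo, std::uint32_t bi) {
  return (19u << 26) | (bo << 21) | (bi << 16) | (16u << 1);
}

// LoongArch JIRL rd, rj, offs16: opcode 0b010011 in bits 31..26.
constexpr std::uint32_t LaJirl(std::uint32_t rd, std::uint32_t rj,
                               std::uint32_t offs16) {
  return (0x13u << 26) | ((offs16 & 0xFFFFu) << 10) | (rj << 5) | rd;
}

// RISC-V JALR rd, imm12(rs1): I-type with opcode 0x67 and funct3 = 0.
constexpr std::uint32_t RvJalr(std::uint32_t rd, std::uint32_t rs1,
                               std::uint32_t imm12) {
  return ((imm12 & 0xFFFu) << 20) | (rs1 << 15) | (rd << 7) | 0x67u;
}

// s390x BCR M1,R2 (RR-form, opcode 0x07); BR Rn is BCR 15,Rn.
constexpr std::uint16_t S390Bcr(std::uint16_t m1, std::uint16_t r2) {
  return static_cast<std::uint16_t>((0x07u << 8) | (m1 << 4) | r2);
}

// Return-address registers: AArch64 x30, LoongArch ra = r1, RISC-V ra = x1,
// s390x r14.
static_assert(A64Ret(30) == 0xD65F03C0u, "AArch64 RET X30");
static_assert(A64Br(30) == 0xD61F03C0u, "AArch64 BR X30 is a different word");
static_assert(PpcBclr(20, 0) == 0x4E800020u, "PPC64 BLR");
static_assert(LaJirl(0, 1, 0) == 0x4C000020u, "LoongArch64 JIRL zero, ra, 0");
static_assert(RvJalr(0, 1, 0) == 0x00008067u, "RISC-V JALR zero, ra, 0");
static_assert(S390Bcr(15, 14) == 0x07FEu, "s390x BR r14");
```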