diff --git a/doc/contributing/ffi-fast-api-internals.md b/doc/contributing/ffi-fast-api-internals.md index b0d318b3a1bee8..280fa44251199a 100644 --- a/doc/contributing/ffi-fast-api-internals.md +++ b/doc/contributing/ffi-fast-api-internals.md @@ -41,10 +41,9 @@ The implementation is split across these files: * `src/ffi/types.{h,cc}` parses public FFI signatures and implements `IsFastCallEligible()`, which rejects signatures that the current Fast API trampolines cannot represent. -* `src/ffi/platforms/arm64.cc` and `src/ffi/platforms/x64.cc` contain the - platform trampoline generators. These files follow the contract exposed by - `node_ffi_create_fast_trampoline()` and release code with - `node_ffi_free_fast_trampoline()`. +* `src/ffi/platforms/*.cc` contain the platform trampoline generators. These + files follow the contract exposed by `node_ffi_create_fast_trampoline()` and + release code with `node_ffi_free_fast_trampoline()`. * `src/node_ffi.cc` decides whether a function gets a Fast API callable, SharedBuffer callable, or generic callable, and attaches hidden metadata used by JavaScript wrappers. @@ -88,8 +87,7 @@ true only on supported architectures when `IsJitMemorySupported()` succeeds. `IsJitMemorySupported()` runs a one-time self-test: * Map one writable anonymous page. -* Write a minimal return instruction (`0xD65F03C0` on AArch64, `0xC3` on - x86\_64). +* Write a minimal return instruction for the current architecture. * Flush the instruction cache where required. * Try to transition the page to read/execute with `mprotect(PROT_READ | PROT_EXEC)`. @@ -99,8 +97,8 @@ The probe deliberately does not execute the generated instruction. Executing a freshly written capability probe could terminate the process on systems that block generated code. The real trampoline emitter performs the same writable to executable transition when creating a callable trampoline and falls back when it -is rejected. 
Windows currently returns false because the branch does not yet -have a Win64 trampoline emitter or `VirtualAlloc`-based JIT memory support. +is rejected. Windows uses `VirtualAlloc`, `VirtualProtect`, and +`FlushInstructionCache` for the same probe. ## Signature Eligibility @@ -110,8 +108,8 @@ keeps unsupported cases out of the trampoline emitters and lets Eligibility requires: -* A supported platform emitter: AArch64 or x86\_64 SysV. Win64 is currently - ineligible. +* A supported platform emitter: AArch64, x86\_64 SysV, Win64 x64, PPC64LE + ELFv2, LoongArch64, RISC-V 64, or s390x. * A return type that is numeric, pointer, or `void`. * Argument types that are numeric or pointer. `void` cannot be an argument. * No `function` typed argument or return value. @@ -141,6 +139,56 @@ x86\_64 SysV eligibility mirrors `src/ffi/platforms/x64.cc`: incoming GP count is capped at 5 and buffer-shaped arguments cannot coexist with FP arguments. +Win64 x64 eligibility mirrors the conservative Windows emitter in +`src/ffi/platforms/x64.cc`: + +* The JavaScript receiver occupies the first positional register slot. +* Public arguments are shifted from positions 1..3 into positions 0..2. +* Integer and FP arguments are handled according to their positional Win64 + register slots. +* Only scalar register-only signatures with at most three public arguments are + currently eligible. +* Buffer-shaped arguments and stack-passed arguments fall back. + +PPC64LE eligibility mirrors `src/ffi/platforms/ppc64.cc`: + +* `r3` is occupied by V8's receiver, so user GP arguments arrive in `r4..r10`. +* FP arguments use FPRs and are not shifted by the receiver slot. +* The generated trampoline shifts only GP registers and tail-branches to the + target through `ctr`, with the target address in `r12` for ELFv2 global entry. +* Only scalar register-only signatures are currently eligible. +* Buffer-shaped arguments, stack-passed arguments, narrow returns, and PPC64BE + platforms fall back. 
AIX/PPC64BE is intentionally a non-target for the current + Fast FFI trampoline work because its ABI/linkage shape needs separate design. + +LoongArch64 eligibility mirrors `src/ffi/platforms/loong64.cc`: + +* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`. +* FP arguments use `fa0..fa7` and are not shifted by the receiver slot. +* The generated trampoline shifts only GP registers and tail-branches to the + target through `jirl`. +* Only scalar register-only signatures are currently eligible. +* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back. + +RISC-V 64 eligibility mirrors `src/ffi/platforms/riscv64.cc`: + +* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`. +* FP arguments use `fa0..fa7` and are not shifted by the receiver slot. +* The generated trampoline shifts only GP registers and tail-branches to the + target through `jalr`. +* Only scalar register-only signatures are currently eligible. +* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back. + +s390x eligibility mirrors `src/ffi/platforms/s390x.cc`: + +* `r2` is occupied by V8's receiver, so user GP arguments arrive in `r3..r6`. +* FP arguments use `f0`, `f2`, `f4`, and `f6` and are not shifted by the receiver + slot. +* The generated trampoline shifts only GP registers and tail-branches to the + target through `br`. +* Only scalar register-only signatures are currently eligible. +* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back. + The native trampoline generator still repeats its own register checks. The eligibility function is the early, centralized rejection point; the generator checks are a defense against direct or future callers. @@ -395,9 +443,22 @@ Important limits are: * No stack arguments in the current AArch64 trampoline. * At most one stack-loaded scalar GP argument in the current x86\_64 SysV trampoline. 
+* No stack arguments or buffer-shaped arguments in the current Win64 x64 + trampoline. +* No stack arguments, buffer-shaped arguments, or narrow returns in the current + PPC64LE trampoline. +* No stack arguments, buffer-shaped arguments, or narrow returns in the current + LoongArch64, RISC-V 64, and s390x trampolines. * No mixed buffer-shaped and FP arguments. * No `function` argument or return type in the Fast API path. +Linux x86 and armv7 are experimental Node.js platforms, but the current Fast FFI +trampoline model remains 64-bit only. They continue to use SharedBuffer or +generic libffi fallback paths. Linux s390x is a Tier 2 Node.js platform, but +bundled FFI is not currently enabled for that target; if built with +`--shared-ffi`, scalar register-only Fast API FFI can use the s390x emitter. AIX +PPC64BE is intentionally not covered by this implementation. + These are optimization boundaries, not public FFI signature boundaries. User code can still call supported public FFI signatures through fallback paths. diff --git a/node.gyp b/node.gyp index 03c85224a3e7cb..706489560db9d5 100644 --- a/node.gyp +++ b/node.gyp @@ -486,6 +486,10 @@ 'src/node_ffi.cc', 'src/node_ffi.h', 'src/ffi/platforms/arm64.cc', + 'src/ffi/platforms/loong64.cc', + 'src/ffi/platforms/ppc64.cc', + 'src/ffi/platforms/riscv64.cc', + 'src/ffi/platforms/s390x.cc', 'src/ffi/platforms/x64.cc', 'src/ffi/data.cc', 'src/ffi/data.h', diff --git a/src/ffi/fast.cc b/src/ffi/fast.cc index 7e8d182a7bdc87..13c039c0da3f17 100644 --- a/src/ffi/fast.cc +++ b/src/ffi/fast.cc @@ -222,7 +222,10 @@ FastFFIMetadata::~FastFFIMetadata() { bool IsFastCallSupported() { // Fast call requires both a platform stub emitter and working JIT memory. 
-#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) +#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) || \ + defined(_M_X64) || defined(__powerpc64__) || defined(__ppc64__) || \ + defined(__PPC64__) || defined(__loongarch64) || \ + (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__) return IsJitMemorySupported(); #else return false; diff --git a/src/ffi/jit_memory.cc b/src/ffi/jit_memory.cc index 0c4de68305c772..023b55757691e7 100644 --- a/src/ffi/jit_memory.cc +++ b/src/ffi/jit_memory.cc @@ -2,29 +2,31 @@ #include "ffi/jit_memory.h" -#if !defined(_WIN32) - -#include -#include - #include #include #include +#if defined(_WIN32) +#include +#else +#include +#include + #if defined(__APPLE__) #include #endif -#endif // !defined(_WIN32) +#endif // defined(_WIN32) namespace node::ffi { namespace { -#if !defined(_WIN32) - bool SelfTest() { -#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) +#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) && \ + !defined(_M_X64) && !defined(__powerpc64__) && !defined(__ppc64__) && \ + !defined(__PPC64__) && !defined(__loongarch64) && \ + !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__) // No stub emitter for this platform; nothing to test. 
return false; #else @@ -32,12 +34,53 @@ bool SelfTest() { // AArch64 BR LR: 0xD65F03C0 constexpr uint32_t kInstruction = 0xD65F03C0; constexpr size_t kInstructionSize = sizeof(uint32_t); +#elif defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__) + // PPC64 BLR: 0x4E800020 + constexpr uint32_t kInstruction = 0x4E800020; + constexpr size_t kInstructionSize = sizeof(uint32_t); +#elif defined(__loongarch64) + // LoongArch64 JIRL zero, ra, 0 + constexpr uint32_t kInstruction = 0x4C000020; + constexpr size_t kInstructionSize = sizeof(uint32_t); +#elif defined(__riscv) && __riscv_xlen == 64 + // RISC-V JALR zero, ra, 0 + constexpr uint32_t kInstruction = 0x00008067; + constexpr size_t kInstructionSize = sizeof(uint32_t); +#elif defined(__s390x__) + // s390x BR r14 + constexpr uint16_t kInstruction = 0x07fe; + constexpr size_t kInstructionSize = sizeof(uint16_t); #else // x86_64 RET: 0xC3 constexpr uint8_t kInstruction = 0xC3; constexpr size_t kInstructionSize = sizeof(uint8_t); #endif +#if defined(_WIN32) + void* page = VirtualAlloc( + nullptr, kInstructionSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); + if (page == nullptr) { + return false; + } + + uint8_t* code = static_cast(page); +#if defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || \ + defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) || \ + (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__) + std::memcpy(code, &kInstruction, kInstructionSize); +#else + code[0] = kInstruction; +#endif + + FlushInstructionCache(GetCurrentProcess(), page, kInstructionSize); + + DWORD old_protect; + const bool ok = + VirtualProtect(page, kInstructionSize, PAGE_EXECUTE_READ, &old_protect) != + 0; + VirtualFree(page, 0, MEM_RELEASE); + return ok; +#else const size_t page_size = static_cast(getpagesize()); void* page = mmap(nullptr, page_size, @@ -50,7 +93,9 @@ bool SelfTest() { } uint8_t* code = static_cast(page); -#if defined(__aarch64__) || defined(_M_ARM64) +#if 
defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || \ + defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) || \ + (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__) std::memcpy(code, &kInstruction, kInstructionSize); #elif defined(__x86_64__) code[0] = kInstruction; @@ -84,25 +129,18 @@ bool SelfTest() { munmap(page, page_size); return ok; #endif +#endif } -#endif // !defined(_WIN32) - } // namespace bool IsJitMemorySupported() { -#if defined(_WIN32) - // Windows stub emitter and VirtualAlloc-based JIT memory support not yet - // implemented. Return false so the fast-call path falls back to libffi. - return false; -#else // Run the self-test exactly once and publish only the final result, so // concurrent callers never observe a provisional value. static std::once_flag once; static bool supported = false; std::call_once(once, [] { supported = SelfTest(); }); return supported; -#endif } } // namespace node::ffi diff --git a/src/ffi/platforms/arm64.cc b/src/ffi/platforms/arm64.cc index b0a2261074c16a..ccb7a8cf3d04ff 100644 --- a/src/ffi/platforms/arm64.cc +++ b/src/ffi/platforms/arm64.cc @@ -2,10 +2,14 @@ #include "ffi/fast.h" -#if (defined(__aarch64__) || defined(_M_ARM64)) && !defined(_WIN32) +#if defined(__aarch64__) || defined(_M_ARM64) +#if defined(_WIN32) +#include +#else #include #include +#endif #include @@ -163,6 +167,50 @@ unsigned Align16(unsigned value) { return (value + 15) & ~15; } +void* AllocateCode(size_t code_size) { +#if defined(_WIN32) + return VirtualAlloc( + nullptr, code_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); +#else + void* code = mmap(nullptr, + code_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, + -1, + 0); + return code == MAP_FAILED ? 
nullptr : code; +#endif +} + +void FreeCode(void* code, size_t code_size) { +#if defined(_WIN32) + VirtualFree(code, 0, MEM_RELEASE); +#else + munmap(code, code_size); +#endif +} + +void FlushCode(void* code, size_t written) { +#if defined(_WIN32) + FlushInstructionCache(GetCurrentProcess(), code, written); +#elif defined(__APPLE__) + // Make the just-written instructions visible to the CPU's instruction cache. + sys_icache_invalidate(code, written); +#else + __builtin___clear_cache(static_cast(code), + static_cast(code) + written); +#endif +} + +bool ProtectCode(void* code, size_t code_size) { +#if defined(_WIN32) + DWORD old_protect; + return VirtualProtect(code, code_size, PAGE_EXECUTE_READ, &old_protect) != 0; +#else + return mprotect(code, code_size, PROT_READ | PROT_EXEC) == 0; +#endif +} + } // namespace extern "C" bool node_ffi_create_fast_trampoline( @@ -218,13 +266,8 @@ extern "C" bool node_ffi_create_fast_trampoline( // Generate into writable anonymous memory first; the page is made executable // only after the instruction stream is complete and the instruction cache is // synchronized. - void* code = mmap(nullptr, - code_size, - PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANON, - -1, - 0); - if (code == MAP_FAILED) { + void* code = AllocateCode(code_size); + if (code == nullptr) { return false; } @@ -340,18 +383,12 @@ extern "C" bool node_ffi_create_fast_trampoline( const size_t written = reinterpret_cast(cursor) - static_cast(code); -#if defined(__APPLE__) - // Make the just-written instructions visible to the CPU's instruction cache. - sys_icache_invalidate(code, written); -#else - __builtin___clear_cache(static_cast(code), - static_cast(code) + written); -#endif + FlushCode(code, written); // Enforce W^X after code generation: the trampoline is executable but no // longer writable once published through FastFFITrampoline. 
- if (mprotect(code, code_size, PROT_READ | PROT_EXEC) != 0) { - munmap(code, code_size); + if (!ProtectCode(code, code_size)) { + FreeCode(code, code_size); return false; } @@ -367,12 +404,15 @@ extern "C" void node_ffi_free_fast_trampoline( if (trampoline == nullptr || trampoline->code == nullptr) { return; } - munmap(trampoline->code, trampoline->size); + FreeCode(trampoline->code, trampoline->size); trampoline->code = nullptr; trampoline->size = 0; } -#elif !defined(__x86_64__) || defined(_WIN32) +#elif !defined(__x86_64__) && !defined(_M_X64) && \ + !defined(__powerpc64__) && !defined(__ppc64__) && \ + !defined(__loongarch64) && \ + !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__) extern "C" bool node_ffi_create_fast_trampoline( void* target, @@ -390,6 +430,6 @@ extern "C" void node_ffi_free_fast_trampoline( // No code is allocated in the non-AArch64 stub. } -#endif // (defined(__aarch64__) || defined(_M_ARM64)) && !defined(_WIN32) +#endif // defined(__aarch64__) || defined(_M_ARM64) #endif // HAVE_FFI diff --git a/src/ffi/platforms/loong64.cc b/src/ffi/platforms/loong64.cc new file mode 100644 index 00000000000000..9b3bd5dfd90102 --- /dev/null +++ b/src/ffi/platforms/loong64.cc @@ -0,0 +1,159 @@ +#if HAVE_FFI + +#include "ffi/fast.h" + +#if defined(__loongarch64) + +#include +#include + +#include + +namespace { + +using node::ffi::FastFFIType; + +bool IsFloatType(FastFFIType type) { + return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64; +} + +bool IsNarrowType(FastFFIType type) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kInt8: + case FastFFIType::kUint8: + case FastFFIType::kInt16: + case FastFFIType::kUint16: + return true; + default: + return false; + } +} + +uint32_t Or(unsigned rd, unsigned rj, unsigned rk) { + return (0x2au << 15) | (rk << 10) | (rj << 5) | rd; +} + +uint32_t Pcaddu12i(unsigned rd, int imm20) { + return (0x0eu << 25) | ((static_cast(imm20) & 0xfffff) << 5) | rd; +} + +uint32_t 
LdD(unsigned rd, unsigned rj, int imm12) { + return (0xa3u << 22) | ((static_cast(imm12) & 0xfff) << 10) | + (rj << 5) | rd; +} + +uint32_t Jirl(unsigned rd, unsigned rj, int imm16) { + return (0x13u << 26) | ((static_cast(imm16) & 0xffff) << 10) | + (rj << 5) | rd; +} + +void Emit32(uint32_t** cursor, uint32_t value) { + *(*cursor)++ = value; +} + +void Emit64(uint32_t** cursor, uint64_t value) { + uint64_t* slot = reinterpret_cast(*cursor); + *slot = value; + *cursor += 2; +} + +void* AllocateCode(size_t code_size) { + void* code = mmap(nullptr, + code_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, + -1, + 0); + return code == MAP_FAILED ? nullptr : code; +} + +void FreeCode(void* code, size_t code_size) { + munmap(code, code_size); +} + +} // namespace + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + if (target == nullptr || out == nullptr || IsNarrowType(result)) { + return false; + } + + size_t gp_count = 0; + size_t fp_count = 0; + for (size_t i = 0; i < argc; i++) { + if (args[i] == FastFFIType::kBuffer) { + return false; + } + if (IsFloatType(args[i])) { + fp_count++; + } else { + gp_count++; + } + } + + // LoongArch64 passes integer arguments in a0..a7. V8's receiver occupies a0, + // so user GP arguments arrive in a1..a7 and are shifted down before the tail + // branch. FP arguments are already in fa0..fa7. 
+ if (gp_count > 7 || fp_count > 8) { + return false; + } + + constexpr size_t kCodeSize = 256; + void* code = AllocateCode(kCodeSize); + if (code == nullptr) { + return false; + } + + uint32_t* cursor = static_cast(code); + unsigned gp_index = 0; + for (size_t i = 0; i < argc; i++) { + if (IsFloatType(args[i])) { + continue; + } + const unsigned target_reg = 4 + gp_index; + const unsigned incoming_reg = target_reg + 1; + Emit32(&cursor, Or(target_reg, incoming_reg, 0)); + gp_index++; + } + + // Load the target address from a nearby literal into t0 and tail-branch. + Emit32(&cursor, Pcaddu12i(12, 0)); // pcaddu12i t0, 0 + Emit32(&cursor, LdD(12, 12, 16)); // ld.d t0, t0, literal + Emit32(&cursor, Jirl(0, 12, 0)); // jr t0 + Emit32(&cursor, Or(0, 0, 0)); // nop; align literal to 8 bytes + Emit64(&cursor, reinterpret_cast(target)); + + const size_t written = reinterpret_cast(cursor) - + static_cast(code); + __builtin___clear_cache(static_cast(code), + static_cast(code) + written); + + if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) { + FreeCode(code, kCodeSize); + return false; + } + + out->code = code; + out->size = kCodeSize; + return true; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) { + if (trampoline == nullptr || trampoline->code == nullptr) { + return; + } + FreeCode(trampoline->code, trampoline->size); + trampoline->code = nullptr; + trampoline->size = 0; +} + +#endif // defined(__loongarch64) + +#endif // HAVE_FFI diff --git a/src/ffi/platforms/ppc64.cc b/src/ffi/platforms/ppc64.cc new file mode 100644 index 00000000000000..4549b98186f5e7 --- /dev/null +++ b/src/ffi/platforms/ppc64.cc @@ -0,0 +1,189 @@ +#if HAVE_FFI + +#include "ffi/fast.h" + +#if defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__) +#if (defined(__LITTLE_ENDIAN__) || \ + (defined(__BYTE_ORDER__) && \ + __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)) && \ + !defined(_AIX) + +#include +#include + +#include + +namespace { 
+ +using node::ffi::FastFFIType; + +bool IsFloatType(FastFFIType type) { + return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64; +} + +bool IsNarrowType(FastFFIType type) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kInt8: + case FastFFIType::kUint8: + case FastFFIType::kInt16: + case FastFFIType::kUint16: + return true; + default: + return false; + } +} + +uint32_t Or(unsigned ra, unsigned rs, unsigned rb) { + return (31u << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444u << 1); +} + +uint32_t Mr(unsigned ra, unsigned rs) { + return Or(ra, rs, rs); +} + +uint32_t Bl(unsigned instruction_offset) { + return (18u << 26) | ((instruction_offset & 0x00ffffffu) << 2) | 1u; +} + +uint32_t Mfspr(unsigned rt, unsigned spr) { + return (31u << 26) | (rt << 21) | ((spr & 0x1f) << 16) | + ((spr >> 5) << 11) | (339u << 1); +} + +uint32_t Mtspr(unsigned spr, unsigned rs) { + return (31u << 26) | (rs << 21) | ((spr & 0x1f) << 16) | + ((spr >> 5) << 11) | (467u << 1); +} + +uint32_t Ld(unsigned rt, unsigned ra, unsigned offset) { + return (58u << 26) | (rt << 21) | (ra << 16) | (offset & 0xfffcu); +} + +void Emit32(uint32_t** cursor, uint32_t value) { + *(*cursor)++ = value; +} + +void Emit64(uint32_t** cursor, uint64_t value) { + uint64_t* slot = reinterpret_cast(*cursor); + *slot = value; + *cursor += 2; +} + +void* AllocateCode(size_t code_size) { + void* code = mmap(nullptr, + code_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, + -1, + 0); + return code == MAP_FAILED ? 
nullptr : code; +} + +void FreeCode(void* code, size_t code_size) { + munmap(code, code_size); +} + +} // namespace + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + if (target == nullptr || out == nullptr || IsNarrowType(result)) { + return false; + } + + size_t gp_count = 0; + size_t fp_count = 0; + for (size_t i = 0; i < argc; i++) { + if (args[i] == FastFFIType::kBuffer) { + return false; + } + if (IsFloatType(args[i])) { + fp_count++; + } else { + gp_count++; + } + } + + // ELFv2 PPC64LE passes integer arguments in r3..r10. V8's receiver occupies + // r3, so the scalar-only fast path keeps user GP arguments in r4..r10 and + // shifts them down before tail-branching to the native target. + if (gp_count > 7 || fp_count > 8) { + return false; + } + + constexpr size_t kCodeSize = 256; + void* code = AllocateCode(kCodeSize); + if (code == nullptr) { + return false; + } + + uint32_t* cursor = static_cast(code); + unsigned gp_index = 0; + for (size_t i = 0; i < argc; i++) { + if (IsFloatType(args[i])) { + continue; + } + const unsigned target_reg = 3 + gp_index; + const unsigned incoming_reg = target_reg + 1; + Emit32(&cursor, Mr(target_reg, incoming_reg)); + gp_index++; + } + + // Load the target address from the literal pool into r12, then branch through + // CTR. ELFv2 functions can use r12 to establish their TOC on global entry. 
+ Emit32(&cursor, Bl(1)); // bl .+4 + Emit32(&cursor, Mfspr(12, 8)); // mflr r12 + Emit32(&cursor, Ld(12, 12, 20)); // ld r12, literal-mflr(r12) + Emit32(&cursor, Mtspr(9, 12)); // mtctr r12 + Emit32(&cursor, 0x4e800420); // bctr + Emit32(&cursor, 0x60000000); // nop; align literal to 8 bytes + Emit64(&cursor, reinterpret_cast(target)); + + const size_t written = reinterpret_cast(cursor) - + static_cast(code); + __builtin___clear_cache(static_cast(code), + static_cast(code) + written); + + if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) { + FreeCode(code, kCodeSize); + return false; + } + + out->code = code; + out->size = kCodeSize; + return true; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) { + if (trampoline == nullptr || trampoline->code == nullptr) { + return; + } + FreeCode(trampoline->code, trampoline->size); + trampoline->code = nullptr; + trampoline->size = 0; +} + +#else + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + return false; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) {} + +#endif // PPC64LE non-AIX +#endif // defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__) + +#endif // HAVE_FFI diff --git a/src/ffi/platforms/riscv64.cc b/src/ffi/platforms/riscv64.cc new file mode 100644 index 00000000000000..306dd5bbce77c4 --- /dev/null +++ b/src/ffi/platforms/riscv64.cc @@ -0,0 +1,160 @@ +#if HAVE_FFI + +#include "ffi/fast.h" + +#if defined(__riscv) && __riscv_xlen == 64 + +#include +#include + +#include + +namespace { + +using node::ffi::FastFFIType; + +bool IsFloatType(FastFFIType type) { + return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64; +} + +bool IsNarrowType(FastFFIType type) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kInt8: + case 
FastFFIType::kUint8: + case FastFFIType::kInt16: + case FastFFIType::kUint16: + return true; + default: + return false; + } +} + +uint32_t Addi(unsigned rd, unsigned rs1, int imm) { + return ((static_cast(imm) & 0xfff) << 20) | (rs1 << 15) | + (rd << 7) | 0x13; +} + +uint32_t Ld(unsigned rd, unsigned rs1, int imm) { + return ((static_cast(imm) & 0xfff) << 20) | (rs1 << 15) | + (3u << 12) | (rd << 7) | 0x03; +} + +uint32_t Auipc(unsigned rd, int imm20) { + return (static_cast(imm20) << 12) | (rd << 7) | 0x17; +} + +uint32_t Jalr(unsigned rd, unsigned rs1, int imm) { + return ((static_cast(imm) & 0xfff) << 20) | (rs1 << 15) | + (rd << 7) | 0x67; +} + +void Emit32(uint32_t** cursor, uint32_t value) { + *(*cursor)++ = value; +} + +void Emit64(uint32_t** cursor, uint64_t value) { + uint64_t* slot = reinterpret_cast(*cursor); + *slot = value; + *cursor += 2; +} + +void* AllocateCode(size_t code_size) { + void* code = mmap(nullptr, + code_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, + -1, + 0); + return code == MAP_FAILED ? nullptr : code; +} + +void FreeCode(void* code, size_t code_size) { + munmap(code, code_size); +} + +} // namespace + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + if (target == nullptr || out == nullptr || IsNarrowType(result)) { + return false; + } + + size_t gp_count = 0; + size_t fp_count = 0; + for (size_t i = 0; i < argc; i++) { + if (args[i] == FastFFIType::kBuffer) { + return false; + } + if (IsFloatType(args[i])) { + fp_count++; + } else { + gp_count++; + } + } + + // RISC-V LP64D passes integer arguments in a0..a7. V8's receiver occupies + // a0, so user GP arguments arrive in a1..a7 and are shifted down before the + // tail branch. FP arguments are already in fa0..fa7. 
+ if (gp_count > 7 || fp_count > 8) { + return false; + } + + constexpr size_t kCodeSize = 256; + void* code = AllocateCode(kCodeSize); + if (code == nullptr) { + return false; + } + + uint32_t* cursor = static_cast(code); + unsigned gp_index = 0; + for (size_t i = 0; i < argc; i++) { + if (IsFloatType(args[i])) { + continue; + } + const unsigned target_reg = 10 + gp_index; + const unsigned incoming_reg = target_reg + 1; + Emit32(&cursor, Addi(target_reg, incoming_reg, 0)); + gp_index++; + } + + // Load the target address from a nearby literal into t0 and tail-branch. + Emit32(&cursor, Auipc(5, 0)); // auipc t0, 0 + Emit32(&cursor, Ld(5, 5, 16)); // ld t0, literal(t0) + Emit32(&cursor, Jalr(0, 5, 0)); // jr t0 + Emit32(&cursor, Addi(0, 0, 0)); // nop; align literal to 8 bytes + Emit64(&cursor, reinterpret_cast(target)); + + const size_t written = reinterpret_cast(cursor) - + static_cast(code); + __builtin___clear_cache(static_cast(code), + static_cast(code) + written); + + if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) { + FreeCode(code, kCodeSize); + return false; + } + + out->code = code; + out->size = kCodeSize; + return true; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) { + if (trampoline == nullptr || trampoline->code == nullptr) { + return; + } + FreeCode(trampoline->code, trampoline->size); + trampoline->code = nullptr; + trampoline->size = 0; +} + +#endif // defined(__riscv) && __riscv_xlen == 64 + +#endif // HAVE_FFI diff --git a/src/ffi/platforms/s390x.cc b/src/ffi/platforms/s390x.cc new file mode 100644 index 00000000000000..2d8a0c0bea7cb6 --- /dev/null +++ b/src/ffi/platforms/s390x.cc @@ -0,0 +1,168 @@ +#if HAVE_FFI + +#include "ffi/fast.h" + +#if defined(__s390x__) + +#include +#include + +#include + +namespace { + +using node::ffi::FastFFIType; + +bool IsFloatType(FastFFIType type) { + return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64; +} + +bool 
IsNarrowType(FastFFIType type) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kInt8: + case FastFFIType::kUint8: + case FastFFIType::kInt16: + case FastFFIType::kUint16: + return true; + default: + return false; + } +} + +uint32_t Lgr(unsigned r1, unsigned r2) { + return 0xb9040000u | (r1 << 4) | r2; +} + +uint16_t Br(unsigned r2) { + return 0x07f0u | r2; +} + +uint64_t Lgrl(unsigned r1, int imm) { + return 0xc40800000000ull | (static_cast(r1) << 36) | + (static_cast(imm)); +} + +void Emit16(uint8_t** cursor, uint16_t value) { + *(*cursor)++ = value >> 8; + *(*cursor)++ = value; +} + +void Emit32(uint8_t** cursor, uint32_t value) { + *(*cursor)++ = value >> 24; + *(*cursor)++ = value >> 16; + *(*cursor)++ = value >> 8; + *(*cursor)++ = value; +} + +void Emit48(uint8_t** cursor, uint64_t value) { + *(*cursor)++ = value >> 40; + *(*cursor)++ = value >> 32; + *(*cursor)++ = value >> 24; + *(*cursor)++ = value >> 16; + *(*cursor)++ = value >> 8; + *(*cursor)++ = value; +} + +void Emit64(uint8_t** cursor, uint64_t value) { + for (int shift = 56; shift >= 0; shift -= 8) { + *(*cursor)++ = value >> shift; + } +} + +void* AllocateCode(size_t code_size) { + void* code = mmap(nullptr, + code_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, + -1, + 0); + return code == MAP_FAILED ? nullptr : code; +} + +void FreeCode(void* code, size_t code_size) { + munmap(code, code_size); +} + +} // namespace + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + if (target == nullptr || out == nullptr || IsNarrowType(result)) { + return false; + } + + size_t gp_count = 0; + size_t fp_count = 0; + for (size_t i = 0; i < argc; i++) { + if (args[i] == FastFFIType::kBuffer) { + return false; + } + if (IsFloatType(args[i])) { + fp_count++; + } else { + gp_count++; + } + } + + // Linux s390x passes integer arguments in r2..r6. 
V8's receiver occupies r2, + // so user GP arguments arrive in r3..r6 and are shifted down before the tail + // branch. FP arguments are already in f0, f2, f4, and f6. + if (gp_count > 4 || fp_count > 4) { + return false; + } + + constexpr size_t kCodeSize = 256; + void* code = AllocateCode(kCodeSize); + if (code == nullptr) { + return false; + } + + uint8_t* cursor = static_cast(code); + unsigned gp_index = 0; + for (size_t i = 0; i < argc; i++) { + if (IsFloatType(args[i])) { + continue; + } + const unsigned target_reg = 2 + gp_index; + const unsigned incoming_reg = target_reg + 1; + Emit32(&cursor, Lgr(target_reg, incoming_reg)); + gp_index++; + } + + // Load the target address from the literal pool into r1 and tail-branch. + Emit48(&cursor, Lgrl(1, 4)); // lgrl r1, literal + Emit16(&cursor, Br(1)); // br r1 + Emit64(&cursor, reinterpret_cast(target)); + + const size_t written = cursor - static_cast(code); + __builtin___clear_cache(static_cast(code), + static_cast(code) + written); + + if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) { + FreeCode(code, kCodeSize); + return false; + } + + out->code = code; + out->size = kCodeSize; + return true; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) { + if (trampoline == nullptr || trampoline->code == nullptr) { + return; + } + FreeCode(trampoline->code, trampoline->size); + trampoline->code = nullptr; + trampoline->size = 0; +} + +#endif // defined(__s390x__) + +#endif // HAVE_FFI diff --git a/src/ffi/platforms/x64.cc b/src/ffi/platforms/x64.cc index cacf56bd773235..1073508d365b18 100644 --- a/src/ffi/platforms/x64.cc +++ b/src/ffi/platforms/x64.cc @@ -543,6 +543,230 @@ extern "C" void node_ffi_free_fast_trampoline( trampoline->size = 0; } -#endif // defined(__x86_64__) && !defined(_WIN32) +#elif defined(_M_X64) + +#include + +#include + +namespace { + +using node::ffi::FastFFIType; + +constexpr unsigned kRax = 0; +constexpr unsigned kRcx = 1; +constexpr 
unsigned kRdx = 2; +constexpr unsigned kRsp = 4; +constexpr unsigned kR8 = 8; +constexpr unsigned kR9 = 9; +constexpr unsigned kR11 = 11; + +constexpr unsigned kWin64GPRegisters[] = {kRcx, kRdx, kR8, kR9}; + +bool IsFloatType(FastFFIType type) { + return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64; +} + +bool IsBufferType(FastFFIType type) { + return type == FastFFIType::kBuffer; +} + +void Emit8(uint8_t** cursor, uint8_t value) { + *(*cursor)++ = value; +} + +void Emit32(uint8_t** cursor, uint32_t value) { + for (unsigned i = 0; i < 4; i++) { + Emit8(cursor, (value >> (i * 8)) & 0xff); + } +} + +void Emit64(uint8_t** cursor, uint64_t value) { + for (unsigned i = 0; i < 8; i++) { + Emit8(cursor, (value >> (i * 8)) & 0xff); + } +} + +void EmitRex(uint8_t** cursor, bool wide, unsigned reg, unsigned rm) { + Emit8(cursor, 0x40 | (wide ? 0x08 : 0) | ((reg >> 3) << 2) | (rm >> 3)); +} + +void EmitModRM(uint8_t** cursor, unsigned reg, unsigned rm) { + Emit8(cursor, 0xc0 | ((reg & 7) << 3) | (rm & 7)); +} + +void EmitMov(uint8_t** cursor, unsigned dst, unsigned src) { + EmitRex(cursor, true, src, dst); + Emit8(cursor, 0x89); + EmitModRM(cursor, src, dst); +} + +void EmitMovaps(uint8_t** cursor, unsigned dst, unsigned src) { + // movaps xmm_dst, xmm_src. Used only for register-to-register argument + // shuffles; it preserves the payload bits for both f32 and f64 arguments. 
+ EmitRex(cursor, false, dst, src); + Emit8(cursor, 0x0f); + Emit8(cursor, 0x28); + EmitModRM(cursor, dst, src); +} + +void EmitMovImm64(uint8_t** cursor, unsigned reg, uintptr_t value) { + EmitRex(cursor, true, 0, reg); + Emit8(cursor, 0xb8 | (reg & 7)); + Emit64(cursor, value); +} + +void EmitCall(uint8_t** cursor, unsigned reg) { + EmitRex(cursor, true, 0, reg); + Emit8(cursor, 0xff); + EmitModRM(cursor, 2, reg); +} + +void EmitJmp(uint8_t** cursor, unsigned reg) { + EmitRex(cursor, true, 0, reg); + Emit8(cursor, 0xff); + EmitModRM(cursor, 4, reg); +} + +void EmitSubRsp(uint8_t** cursor, unsigned value) { + EmitRex(cursor, true, 5, kRsp); + Emit8(cursor, 0x81); + Emit8(cursor, 0xec); + Emit32(cursor, value); +} + +void EmitAddRsp(uint8_t** cursor, unsigned value) { + EmitRex(cursor, true, 0, kRsp); + Emit8(cursor, 0x81); + Emit8(cursor, 0xc4); + Emit32(cursor, value); +} + +void EmitRet(uint8_t** cursor) { + Emit8(cursor, 0xc3); +} + +void EmitNarrowInstruction(uint8_t** cursor, uint8_t opcode, unsigned reg) { + EmitRex(cursor, false, reg, reg); + Emit8(cursor, 0x0f); + Emit8(cursor, opcode); + EmitModRM(cursor, reg, reg); +} + +bool EmitNarrowReturn(uint8_t** cursor, FastFFIType type, unsigned reg) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kUint8: + EmitNarrowInstruction(cursor, 0xb6, reg); + return true; + case FastFFIType::kInt8: + EmitNarrowInstruction(cursor, 0xbe, reg); + return true; + case FastFFIType::kUint16: + EmitNarrowInstruction(cursor, 0xb7, reg); + return true; + case FastFFIType::kInt16: + EmitNarrowInstruction(cursor, 0xbf, reg); + return true; + default: + return false; + } +} + +bool NeedsNarrow(FastFFIType type) { + switch (type) { + case FastFFIType::kBool: + case FastFFIType::kUint8: + case FastFFIType::kInt8: + case FastFFIType::kUint16: + case FastFFIType::kInt16: + return true; + default: + return false; + } +} + +void FreeCode(void* code, size_t code_size) { + VirtualFree(code, 0, MEM_RELEASE); +} + +} // 
namespace + +extern "C" bool node_ffi_create_fast_trampoline( + void* target, + const node::ffi::FastFFIType* args, + size_t argc, + node::ffi::FastFFIType result, + node::ffi::FastFFITrampoline* out) { + if (target == nullptr || out == nullptr || argc > 3) { + return false; + } + + for (size_t i = 0; i < argc; i++) { + if (IsBufferType(args[i])) { + return false; + } + } + + constexpr size_t kCodeSize = 512; + void* code = VirtualAlloc( + nullptr, kCodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); + if (code == nullptr) { + return false; + } + + uint8_t* cursor = static_cast<uint8_t*>(code); + const bool tail_call = !NeedsNarrow(result); + + // Win64 uses positional registers. The V8 receiver occupies position 0, so + // public arguments arrive in positions 1..3 and must be shifted down. + for (size_t i = 0; i < argc; i++) { + if (IsFloatType(args[i])) { + EmitMovaps(&cursor, static_cast<unsigned>(i), static_cast<unsigned>(i + 1)); + } else { + EmitMov(&cursor, kWin64GPRegisters[i], kWin64GPRegisters[i + 1]); + } + } + + EmitMovImm64(&cursor, kR11, reinterpret_cast<uintptr_t>(target)); + if (tail_call) { + // The caller already provided Win64 shadow space for the trampoline; after + // the receiver-slot shuffle, the target can reuse the same stack shape. + EmitJmp(&cursor, kR11); + } else { + // Reserve 32 bytes of shadow space plus 8 bytes for 16-byte stack alignment + // before making a nested call from inside the trampoline. 
+ EmitSubRsp(&cursor, 40); + EmitCall(&cursor, kR11); + EmitAddRsp(&cursor, 40); + EmitNarrowReturn(&cursor, result, kRax); + EmitRet(&cursor); + } + + const size_t written = cursor - static_cast<uint8_t*>(code); + FlushInstructionCache(GetCurrentProcess(), code, written); + + DWORD old_protect; + if (VirtualProtect(code, kCodeSize, PAGE_EXECUTE_READ, &old_protect) == 0) { + FreeCode(code, kCodeSize); + return false; + } + + out->code = code; + out->size = kCodeSize; + return true; +} + +extern "C" void node_ffi_free_fast_trampoline( + node::ffi::FastFFITrampoline* trampoline) { + if (trampoline == nullptr || trampoline->code == nullptr) { + return; + } + FreeCode(trampoline->code, trampoline->size); + trampoline->code = nullptr; + trampoline->size = 0; +} + +#endif // defined(_M_X64) #endif // HAVE_FFI diff --git a/src/ffi/types.cc b/src/ffi/types.cc index 336a8fa10053e4..9ba3cc4da448a3 100644 --- a/src/ffi/types.cc +++ b/src/ffi/types.cc @@ -236,10 +236,12 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) { if (out_reason == nullptr) out_reason = &dummy; // Check that a platform stub emitter exists for the current ABI. - // Stub emitters cover AArch64 (Linux/macOS/FreeBSD/Windows) and - // x86_64 (SysV: Linux/macOS/FreeBSD, Win64: Windows). Other platforms + // Emitters: AArch64, x86_64, PPC64LE, LoongArch64, RISC-V 64, s390x. Others // fall back to libffi. 
-#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) +#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) && \ + !defined(_M_X64) && !defined(__powerpc64__) && !defined(__ppc64__) && \ + !defined(__PPC64__) && !defined(__loongarch64) && \ + !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__) *out_reason = "no platform stub emitter"; return false; #endif @@ -324,16 +326,109 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) { *out_reason = "argument count exceeds AArch64 register limit"; return false; } -#elif defined(__x86_64__) -#if defined(_WIN32) - // No Win64 trampoline emitter exists (src/ffi/platforms implements only - // AArch64 and x86_64 SysV), so Win64 fast-call is never eligible. This is - // already short-circuited earlier by IsJitMemorySupported() returning false - // on Windows; rejecting here keeps eligibility self-consistent regardless of - // caller order. - *out_reason = "no Win64 fast-call trampoline emitter"; +#elif defined(_M_X64) + // Win64 x64 uses positional integer/FP registers. The current emitter handles + // only the register-only scalar subset: receiver plus up to three public + // arguments. Buffer-shaped arguments require FastApiCallbackOptions and a C++ + // helper call, which is left to fallback until the Win64 emitter grows stack + // and helper support. 
+ if (has_buffer_arg) { + *out_reason = "buffer args are not yet supported on Win64 x64"; + return false; + } + if (fn.args.size() > 3) { + *out_reason = "argument count exceeds Win64 x64 register-only limit"; + return false; + } + if (fp_count > 3 || gp_count > 3) { + *out_reason = "argument count exceeds Win64 x64 register-only limit"; + return false; + } +#elif defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__) +#if defined(_AIX) || \ + !(defined(__LITTLE_ENDIAN__) || \ + (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)) + *out_reason = "no PPC64BE fast-call trampoline emitter"; return false; #else + // PPC64LE ELFv2: r3 is occupied by V8's receiver, leaving r4..r10 for + // incoming user GP arguments. FP arguments use FPRs and are not shifted by + // the receiver slot. The first PPC64LE emitter is scalar-only and + // tail-branches to the target, so narrow return normalization and buffer + // helper calls fall back. + if (has_buffer_arg) { + *out_reason = "buffer args are not yet supported on PPC64LE"; + return false; + } + if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 || + fn.return_type == &ffi_type_sint16 || + fn.return_type == &ffi_type_uint16) { + *out_reason = "narrow returns are not yet supported on PPC64LE"; + return false; + } + if (gp_count > 7 || fp_count > 8) { + *out_reason = "argument count exceeds PPC64LE register limit"; + return false; + } +#endif +#elif defined(__loongarch64) + // LoongArch64: a0 is occupied by V8's receiver, leaving a1..a7 for incoming + // user GP arguments. FP arguments are already in fa0..fa7. The current + // emitter is scalar-only and tail-branches to the target, so narrow returns + // and buffer helper calls fall back. 
+ if (has_buffer_arg) { + *out_reason = "buffer args are not yet supported on LoongArch64"; + return false; + } + if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 || + fn.return_type == &ffi_type_sint16 || + fn.return_type == &ffi_type_uint16) { + *out_reason = "narrow returns are not yet supported on LoongArch64"; + return false; + } + if (gp_count > 7 || fp_count > 8) { + *out_reason = "argument count exceeds LoongArch64 register limit"; + return false; + } +#elif defined(__riscv) && __riscv_xlen == 64 + // RISC-V LP64D: a0 is occupied by V8's receiver, leaving a1..a7 for incoming + // user GP arguments. FP arguments are already in fa0..fa7. The current + // emitter is scalar-only and tail-branches to the target, so narrow returns + // and buffer helper calls fall back. + if (has_buffer_arg) { + *out_reason = "buffer args are not yet supported on RISC-V 64"; + return false; + } + if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 || + fn.return_type == &ffi_type_sint16 || + fn.return_type == &ffi_type_uint16) { + *out_reason = "narrow returns are not yet supported on RISC-V 64"; + return false; + } + if (gp_count > 7 || fp_count > 8) { + *out_reason = "argument count exceeds RISC-V 64 register limit"; + return false; + } +#elif defined(__s390x__) + // Linux s390x: r2 is occupied by V8's receiver, leaving r3..r6 for incoming + // user GP arguments. FP arguments are already in f0, f2, f4, and f6. The + // current emitter is scalar-only and tail-branches to the target, so narrow + // returns and buffer helper calls fall back. 
+ if (has_buffer_arg) { + *out_reason = "buffer args are not yet supported on s390x"; + return false; + } + if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 || + fn.return_type == &ffi_type_sint16 || + fn.return_type == &ffi_type_uint16) { + *out_reason = "narrow returns are not yet supported on s390x"; + return false; + } + if (gp_count > 4 || fp_count > 4) { + *out_reason = "argument count exceeds s390x register limit"; + return false; + } +#elif defined(__x86_64__) // x86_64 SysV: the V8 receiver occupies rdi, leaving rsi, rdx, rcx, r8, r9 // (5 incoming GP slots); scalar signatures can load one more user GP arg // from the caller stack, for an effective cap of 6 GP. FP args use @@ -352,7 +447,6 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) { *out_reason = "argument count exceeds x86_64 SysV register limit"; return false; } -#endif // _WIN32 #endif // __x86_64__ *out_reason = ""; diff --git a/src/ffi/types.h b/src/ffi/types.h index d49619a19b71cb..a68549bc629853 100644 --- a/src/ffi/types.h +++ b/src/ffi/types.h @@ -60,11 +60,13 @@ bool SignaturesMatch(const FFIFunction& fn, // Eligibility checks: every arg type and the return type are // numeric-or-pointer, no `function`-typed args/return, arg count // within V8 fast-call cap (8), and register-passed arg counts within -// per-ABI limits. Trampoline emitters currently exist only for AArch64 -// (≤ 7 GP + ≤ 8 FP) and x86_64 SysV (≤ 6 GP + ≤ 8 FP; buffer args cap GP at -// 5 and cannot coexist with FP args). Platforms without an emitter -// (including Win64) are reported ineligible so the caller falls back to -// libffi. +// per-ABI limits. 
Trampoline emitters currently exist for AArch64 +// (≤ 7 GP + ≤ 8 FP); x86_64 SysV (≤ 6 GP + ≤ 8 FP; buffer args cap GP at 5 and +// cannot coexist with FP args); Win64 x64 (≤ 3 register-only scalar args); +// PPC64LE ELFv2, LoongArch64, and RISC-V 64 (≤ 7 GP + ≤ 8 FP scalar args, no +// narrow returns); and s390x (≤ 4 GP + ≤ 4 FP scalar args, no narrow returns). +// Platforms without an emitter are reported ineligible so the caller falls back +// to libffi. bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason); // True if the FFI type can be read from / written to a raw byte buffer