diff --git a/doc/contributing/ffi-fast-api-internals.md b/doc/contributing/ffi-fast-api-internals.md
index b0d318b3a1bee8..280fa44251199a 100644
--- a/doc/contributing/ffi-fast-api-internals.md
+++ b/doc/contributing/ffi-fast-api-internals.md
@@ -41,10 +41,9 @@ The implementation is split across these files:
 * `src/ffi/types.{h,cc}` parses public FFI signatures and implements
   `IsFastCallEligible()`, which rejects signatures that the current Fast API
   trampolines cannot represent.
-* `src/ffi/platforms/arm64.cc` and `src/ffi/platforms/x64.cc` contain the
-  platform trampoline generators. These files follow the contract exposed by
-  `node_ffi_create_fast_trampoline()` and release code with
-  `node_ffi_free_fast_trampoline()`.
+* `src/ffi/platforms/*.cc` contain the platform trampoline generators. These
+  files follow the contract exposed by `node_ffi_create_fast_trampoline()` and
+  release code with `node_ffi_free_fast_trampoline()`.
 * `src/node_ffi.cc` decides whether a function gets a Fast API callable,
   SharedBuffer callable, or generic callable, and attaches hidden metadata used
   by JavaScript wrappers.
@@ -88,8 +87,7 @@ true only on supported architectures when `IsJitMemorySupported()` succeeds.
 `IsJitMemorySupported()` runs a one-time self-test:
 
 * Map one writable anonymous page.
-* Write a minimal return instruction (`0xD65F03C0` on AArch64, `0xC3` on
-  x86\_64).
+* Write a minimal return instruction for the current architecture.
 * Flush the instruction cache where required.
 * Try to transition the page to read/execute with `mprotect(PROT_READ |
   PROT_EXEC)`.
@@ -99,8 +97,8 @@ The probe deliberately does not execute the generated instruction. Executing a
 freshly written capability probe could terminate the process on systems that
 block generated code. The real trampoline emitter performs the same writable to
 executable transition when creating a callable trampoline and falls back when it
-is rejected. Windows currently returns false because the branch does not yet
-have a Win64 trampoline emitter or `VirtualAlloc`-based JIT memory support.
+is rejected. On Windows, the probe uses `VirtualAlloc`, `FlushInstructionCache`,
+and `VirtualProtect` with `PAGE_EXECUTE_READ` in place of `mmap` and `mprotect`.
 
 ## Signature Eligibility
 
@@ -110,8 +108,8 @@ keeps unsupported cases out of the trampoline emitters and lets
 
 Eligibility requires:
 
-* A supported platform emitter: AArch64 or x86\_64 SysV. Win64 is currently
-  ineligible.
+* A supported platform emitter: AArch64, x86\_64 SysV, Win64 x64, PPC64LE
+  ELFv2, LoongArch64, RISC-V 64, or s390x.
 * A return type that is numeric, pointer, or `void`.
 * Argument types that are numeric or pointer. `void` cannot be an argument.
 * No `function` typed argument or return value.
@@ -141,6 +139,56 @@ x86\_64 SysV eligibility mirrors `src/ffi/platforms/x64.cc`:
   incoming GP count is capped at 5 and buffer-shaped arguments cannot coexist
   with FP arguments.
 
+Win64 x64 eligibility mirrors the conservative Windows emitter in
+`src/ffi/platforms/x64.cc`:
+
+* The JavaScript receiver occupies the first positional register slot (`rcx`).
+* Public arguments are shifted from positions 1..3 into positions 0..2.
+* Integer and FP arguments both shift, because each Win64 position owns one
+  register slot: `rcx`/`xmm0`, `rdx`/`xmm1`, `r8`/`xmm2`, and `r9`/`xmm3`.
+* Only scalar register-only signatures with at most three public arguments are
+  currently eligible.
+* Buffer-shaped arguments and stack-passed arguments fall back.
+
+PPC64LE eligibility mirrors `src/ffi/platforms/ppc64.cc`:
+
+* `r3` is occupied by V8's receiver, so user GP arguments arrive in `r4..r10`.
+* FP arguments use FPRs and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `ctr`, with the target address in `r12` for ELFv2 global entry.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, narrow returns, and PPC64BE
+  platforms fall back. AIX/PPC64BE is intentionally a non-target for the current
+  Fast FFI trampoline work because its ABI/linkage shape needs separate design.
+
+LoongArch64 eligibility mirrors `src/ffi/platforms/loong64.cc`:
+
+* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`.
+* FP arguments use `fa0..fa7` and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `jirl`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
+RISC-V 64 eligibility mirrors `src/ffi/platforms/riscv64.cc`:
+
+* `a0` is occupied by V8's receiver, so user GP arguments arrive in `a1..a7`.
+* FP arguments use `fa0..fa7` and are not shifted by the receiver slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `jalr`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
+s390x eligibility mirrors `src/ffi/platforms/s390x.cc`:
+
+* `r2` is occupied by V8's receiver, so user GP arguments arrive in `r3..r6`.
+* FP arguments use `f0`, `f2`, `f4`, and `f6` and are not shifted by the receiver
+  slot.
+* The generated trampoline shifts only GP registers and tail-branches to the
+  target through `br`.
+* Only scalar register-only signatures are currently eligible.
+* Buffer-shaped arguments, stack-passed arguments, and narrow returns fall back.
+
 The native trampoline generator still repeats its own register checks. The
 eligibility function is the early, centralized rejection point; the generator
 checks are a defense against direct or future callers.
@@ -395,9 +443,22 @@ Important limits are:
 * No stack arguments in the current AArch64 trampoline.
 * At most one stack-loaded scalar GP argument in the current x86\_64 SysV
   trampoline.
+* No stack arguments or buffer-shaped arguments in the current Win64 x64
+  trampoline.
+* No stack arguments, buffer-shaped arguments, or narrow returns in the current
+  PPC64LE trampoline.
+* No stack arguments, buffer-shaped arguments, or narrow returns in the current
+  LoongArch64, RISC-V 64, and s390x trampolines.
 * No mixed buffer-shaped and FP arguments.
 * No `function` argument or return type in the Fast API path.
 
+Linux x86 and armv7 are experimental Node.js platforms, but the current Fast FFI
+trampoline model remains 64-bit only. They continue to use SharedBuffer or
+generic libffi fallback paths. Linux s390x is a Tier 2 Node.js platform, but
+bundled FFI is not currently enabled for that target; if built with
+`--shared-ffi`, scalar register-only Fast API FFI can use the s390x emitter. AIX
+PPC64BE is intentionally not covered by this implementation.
+
 These are optimization boundaries, not public FFI signature boundaries. User
 code can still call supported public FFI signatures through fallback paths.
 
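The eligibility rules documented above for PPC64LE, LoongArch64, RISC-V 64, and s390x share one shape: count GP and FP arguments; reject buffer-shaped arguments, anything that would spill to the stack, and narrow returns; then move each GP argument down one register so the receiver's slot is reused. A minimal C++ sketch of that planning step follows. `ArgKind`, `RegisterOnlyAbi`, `Move`, and `PlanGpShift()` are hypothetical names (the patch works directly on `node::ffi::FastFFIType`). The register budgets are read off the documentation above and, where visible, the emitters' own `gp_count`/`fp_count` checks. The positional Win64 scheme is deliberately not modeled.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified argument classes (hypothetical; the patch uses FastFFIType).
enum class ArgKind { kInteger, kPointer, kFloat, kBuffer };

// Argument registers left for user arguments once V8's receiver occupies the
// first GP argument register. FP registers are not shifted.
struct RegisterOnlyAbi {
  unsigned max_user_gp;
  unsigned max_fp;
};

constexpr RegisterOnlyAbi kPpc64le{7, 8};      // r4..r10; emitter caps FP at 8
constexpr RegisterOnlyAbi kLoongArch64{7, 8};  // a1..a7; fa0..fa7
constexpr RegisterOnlyAbi kRiscv64{7, 8};      // a1..a7; fa0..fa7
constexpr RegisterOnlyAbi kS390x{4, 4};        // r3..r6; f0, f2, f4, f6

// One register move, "GP slot first <- GP slot second", where slot 0 is the
// receiver's register (r3, a0, or r2).
using Move = std::pair<unsigned, unsigned>;

// Returns false when the call must fall back to SharedBuffer or libffi.
// Otherwise fills `moves` with the GP shifts a trampoline would emit.
bool PlanGpShift(const std::vector<ArgKind>& args, bool narrow_return,
                 const RegisterOnlyAbi& abi, std::vector<Move>* moves) {
  assert(moves != nullptr);
  if (narrow_return) return false;
  unsigned gp = 0;
  unsigned fp = 0;
  for (ArgKind kind : args) {
    if (kind == ArgKind::kBuffer) return false;  // buffer-shaped argument
    if (kind == ArgKind::kFloat) {
      ++fp;
    } else {
      ++gp;
    }
  }
  // Anything beyond the register budget would be passed on the stack.
  if (gp > abi.max_user_gp || fp > abi.max_fp) return false;

  moves->clear();
  // Ascending order is safe in place: move i overwrites slot i only after
  // move i - 1 has already read it.
  for (unsigned i = 0; i < gp; ++i) moves->emplace_back(i, i + 1);
  return true;
}
```

Emitting the moves in ascending order is what lets each emitter shift registers in place without a scratch register, which matches the ascending `gp_index` loops in `loong64.cc`, `ppc64.cc`, and `riscv64.cc`.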
diff --git a/node.gyp b/node.gyp
index 03c85224a3e7cb..706489560db9d5 100644
--- a/node.gyp
+++ b/node.gyp
@@ -486,6 +486,10 @@
       'src/node_ffi.cc',
       'src/node_ffi.h',
       'src/ffi/platforms/arm64.cc',
+      'src/ffi/platforms/loong64.cc',
+      'src/ffi/platforms/ppc64.cc',
+      'src/ffi/platforms/riscv64.cc',
+      'src/ffi/platforms/s390x.cc',
       'src/ffi/platforms/x64.cc',
       'src/ffi/data.cc',
       'src/ffi/data.h',
diff --git a/src/ffi/fast.cc b/src/ffi/fast.cc
index 7e8d182a7bdc87..13c039c0da3f17 100644
--- a/src/ffi/fast.cc
+++ b/src/ffi/fast.cc
@@ -222,7 +222,10 @@ FastFFIMetadata::~FastFFIMetadata() {
 
 bool IsFastCallSupported() {
   // Fast call requires both a platform stub emitter and working JIT memory.
-#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__)
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) ||        \
+    defined(_M_X64) || defined(__powerpc64__) || defined(__ppc64__) ||         \
+    defined(__PPC64__) || defined(__loongarch64) ||                            \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
   return IsJitMemorySupported();
 #else
   return false;
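The architecture list in `IsFastCallSupported()` is repeated in `SelfTest()`, and its negation guards the fallback stub in `arm64.cc`. The copies have to agree, or a platform can end up with two definitions of `node_ffi_create_fast_trampoline()` or with none. A minimal sketch of keeping that list in one place, assuming a hypothetical `NODE_FFI_HAS_FAST_EMITTER` macro in a shared header (no such macro exists in this patch):

```cpp
// Hypothetical shared header: the one place that lists architectures with a
// Fast FFI trampoline emitter. PPC64BE is included; ppc64.cc stubs it out.
#if defined(__aarch64__) || defined(_M_ARM64) || defined(__x86_64__) ||   \
    defined(_M_X64) || defined(__powerpc64__) || defined(__ppc64__) ||    \
    defined(__PPC64__) || defined(__loongarch64) ||                       \
    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
#define NODE_FFI_HAS_FAST_EMITTER 1
#else
#define NODE_FFI_HAS_FAST_EMITTER 0
#endif

// The same answer for ordinary (non-preprocessor) code.
constexpr bool kHasFastEmitter = NODE_FFI_HAS_FAST_EMITTER != 0;

// IsFastCallSupported() would then reduce to this shape.
constexpr bool FastCallSupportedSketch(bool jit_memory_supported) {
  return kHasFastEmitter && jit_memory_supported;
}
```

With such a header, the guards in `IsFastCallSupported()` and `SelfTest()` would become `#if NODE_FFI_HAS_FAST_EMITTER`, and the `arm64.cc` fallback branch would become `#elif !NODE_FFI_HAS_FAST_EMITTER`.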
diff --git a/src/ffi/jit_memory.cc b/src/ffi/jit_memory.cc
index 0c4de68305c772..023b55757691e7 100644
--- a/src/ffi/jit_memory.cc
+++ b/src/ffi/jit_memory.cc
@@ -2,29 +2,31 @@
 
 #include "ffi/jit_memory.h"
 
-#if !defined(_WIN32)
-
-#include <sys/mman.h>
-#include <unistd.h>
-
 #include <cstdint>
 #include <cstring>
 #include <mutex>
 
+#if defined(_WIN32)
+#include <windows.h>
+#else
+#include <sys/mman.h>
+#include <unistd.h>
+
 #if defined(__APPLE__)
 #include <libkern/OSCacheControl.h>
 #endif
 
-#endif  // !defined(_WIN32)
+#endif  // defined(_WIN32)
 
 namespace node::ffi {
 
 namespace {
 
-#if !defined(_WIN32)
-
 bool SelfTest() {
-#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__)
+#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) &&     \
+    !defined(_M_X64) && !defined(__powerpc64__) && !defined(__ppc64__) &&      \
+    !defined(__PPC64__) && !defined(__loongarch64) &&                          \
+    !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__)
   // No stub emitter for this platform; nothing to test.
   return false;
 #else
@@ -32,12 +34,53 @@ bool SelfTest() {
   // AArch64 BR LR: 0xD65F03C0
   constexpr uint32_t kInstruction = 0xD65F03C0;
   constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__)
+  // PPC64 BLR: 0x4E800020
+  constexpr uint32_t kInstruction = 0x4E800020;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__loongarch64)
+  // LoongArch64 JIRL zero, ra, 0
+  constexpr uint32_t kInstruction = 0x4C000020;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__riscv) && __riscv_xlen == 64
+  // RISC-V JALR zero, ra, 0
+  constexpr uint32_t kInstruction = 0x00008067;
+  constexpr size_t kInstructionSize = sizeof(uint32_t);
+#elif defined(__s390x__)
+  // s390x BR r14
+  constexpr uint16_t kInstruction = 0x07fe;
+  constexpr size_t kInstructionSize = sizeof(uint16_t);
 #else
   // x86_64 RET: 0xC3
   constexpr uint8_t kInstruction = 0xC3;
   constexpr size_t kInstructionSize = sizeof(uint8_t);
 #endif
 
+#if defined(_WIN32)
+  void* page = VirtualAlloc(
+      nullptr, kInstructionSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+  if (page == nullptr) {
+    return false;
+  }
+
+  uint8_t* code = static_cast<uint8_t*>(page);
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) ||     \
+    defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) ||      \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
+  std::memcpy(code, &kInstruction, kInstructionSize);
+#else
+  code[0] = kInstruction;
+#endif
+
+  FlushInstructionCache(GetCurrentProcess(), page, kInstructionSize);
+
+  DWORD old_protect;
+  const bool ok =
+      VirtualProtect(page, kInstructionSize, PAGE_EXECUTE_READ, &old_protect) !=
+      0;
+  VirtualFree(page, 0, MEM_RELEASE);
+  return ok;
+#else
   const size_t page_size = static_cast<size_t>(getpagesize());
   void* page = mmap(nullptr,
                     page_size,
@@ -50,7 +93,9 @@ bool SelfTest() {
   }
 
   uint8_t* code = static_cast<uint8_t*>(page);
-#if defined(__aarch64__) || defined(_M_ARM64)
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) ||     \
+    defined(__ppc64__) || defined(__PPC64__) || defined(__loongarch64) ||      \
+    (defined(__riscv) && __riscv_xlen == 64) || defined(__s390x__)
   std::memcpy(code, &kInstruction, kInstructionSize);
 #elif defined(__x86_64__)
   code[0] = kInstruction;
@@ -84,25 +129,18 @@ bool SelfTest() {
   munmap(page, page_size);
   return ok;
 #endif
+#endif
 }
 
-#endif  // !defined(_WIN32)
-
 }  // namespace
 
 bool IsJitMemorySupported() {
-#if defined(_WIN32)
-  // Windows stub emitter and VirtualAlloc-based JIT memory support not yet
-  // implemented. Return false so the fast-call path falls back to libffi.
-  return false;
-#else
   // Run the self-test exactly once and publish only the final result, so
   // concurrent callers never observe a provisional value.
   static std::once_flag once;
   static bool supported = false;
   std::call_once(once, [] { supported = SelfTest(); });
   return supported;
-#endif
 }
 
 }  // namespace node::ffi
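`SelfTest()` follows a map, write, flush, protect, unmap sequence. Below is a minimal POSIX-only sketch of that sequence, assuming Linux or macOS and using hypothetical `ProbeJitMemory()` and `ProbeResult` names. Like the real probe, it never executes the instruction it writes. The return encodings are the ones `SelfTest()` uses. The Windows path (`VirtualAlloc`, `FlushInstructionCache`, `VirtualProtect`) is not covered here.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct ProbeResult {
  bool mapped = false;      // an anonymous read/write page was obtained
  bool executable = false;  // the page could be switched to read/execute
};

ProbeResult ProbeJitMemory() {
  ProbeResult result;
  const long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0);
  const size_t size = static_cast<size_t>(page_size);

  void* page = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) return result;
  result.mapped = true;

  // Write one return instruction; the probe never executes it.
#if defined(__aarch64__)
  const uint32_t kReturn = 0xD65F03C0;  // ret
#elif defined(__riscv) && __riscv_xlen == 64
  const uint32_t kReturn = 0x00008067;  // jalr zero, ra, 0
#else
  const uint8_t kReturn = 0xC3;  // x86-64 ret; a placeholder byte elsewhere
#endif
  std::memcpy(page, &kReturn, sizeof(kReturn));

  // Make the bytes visible to instruction fetch (a no-op on x86-64).
  char* begin = static_cast<char*>(page);
  __builtin___clear_cache(begin, begin + sizeof(kReturn));

  // The W^X transition is the actual capability being tested.
  result.executable = mprotect(page, size, PROT_READ | PROT_EXEC) == 0;
  munmap(page, size);
  return result;
}
```

`executable` is the signal the fast path cares about. It can legitimately be false under SELinux `execmem` denials or similar hardening, which is exactly the case the libffi fallback covers.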
diff --git a/src/ffi/platforms/arm64.cc b/src/ffi/platforms/arm64.cc
index b0a2261074c16a..ccb7a8cf3d04ff 100644
--- a/src/ffi/platforms/arm64.cc
+++ b/src/ffi/platforms/arm64.cc
@@ -2,10 +2,14 @@
 
 #include "ffi/fast.h"
 
-#if (defined(__aarch64__) || defined(_M_ARM64)) && !defined(_WIN32)
+#if defined(__aarch64__) || defined(_M_ARM64)
 
+#if defined(_WIN32)
+#include <windows.h>
+#else
 #include <sys/mman.h>
 #include <unistd.h>
+#endif
 
 #include <stdint.h>
 
@@ -163,6 +167,50 @@ unsigned Align16(unsigned value) {
   return (value + 15) & ~15;
 }
 
+void* AllocateCode(size_t code_size) {
+#if defined(_WIN32)
+  return VirtualAlloc(
+      nullptr, code_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+#else
+  void* code = mmap(nullptr,
+                    code_size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANON,
+                    -1,
+                    0);
+  return code == MAP_FAILED ? nullptr : code;
+#endif
+}
+
+void FreeCode(void* code, size_t code_size) {
+#if defined(_WIN32)
+  VirtualFree(code, 0, MEM_RELEASE);
+#else
+  munmap(code, code_size);
+#endif
+}
+
+void FlushCode(void* code, size_t written) {
+#if defined(_WIN32)
+  FlushInstructionCache(GetCurrentProcess(), code, written);
+#elif defined(__APPLE__)
+  // Make the just-written instructions visible to the CPU's instruction cache.
+  sys_icache_invalidate(code, written);
+#else
+  __builtin___clear_cache(static_cast<char*>(code),
+                          static_cast<char*>(code) + written);
+#endif
+}
+
+bool ProtectCode(void* code, size_t code_size) {
+#if defined(_WIN32)
+  DWORD old_protect;
+  return VirtualProtect(code, code_size, PAGE_EXECUTE_READ, &old_protect) != 0;
+#else
+  return mprotect(code, code_size, PROT_READ | PROT_EXEC) == 0;
+#endif
+}
+
 }  // namespace
 
 extern "C" bool node_ffi_create_fast_trampoline(
@@ -218,13 +266,8 @@ extern "C" bool node_ffi_create_fast_trampoline(
   // Generate into writable anonymous memory first; the page is made executable
   // only after the instruction stream is complete and the instruction cache is
   // synchronized.
-  void* code = mmap(nullptr,
-                    code_size,
-                    PROT_READ | PROT_WRITE,
-                    MAP_PRIVATE | MAP_ANON,
-                    -1,
-                    0);
-  if (code == MAP_FAILED) {
+  void* code = AllocateCode(code_size);
+  if (code == nullptr) {
     return false;
   }
 
@@ -340,18 +383,12 @@ extern "C" bool node_ffi_create_fast_trampoline(
   const size_t written = reinterpret_cast<uint8_t*>(cursor) -
                          static_cast<uint8_t*>(code);
 
-#if defined(__APPLE__)
-  // Make the just-written instructions visible to the CPU's instruction cache.
-  sys_icache_invalidate(code, written);
-#else
-  __builtin___clear_cache(static_cast<char*>(code),
-                          static_cast<char*>(code) + written);
-#endif
+  FlushCode(code, written);
 
   // Enforce W^X after code generation: the trampoline is executable but no
   // longer writable once published through FastFFITrampoline.
-  if (mprotect(code, code_size, PROT_READ | PROT_EXEC) != 0) {
-    munmap(code, code_size);
+  if (!ProtectCode(code, code_size)) {
+    FreeCode(code, code_size);
     return false;
   }
 
@@ -367,12 +404,15 @@ extern "C" void node_ffi_free_fast_trampoline(
   if (trampoline == nullptr || trampoline->code == nullptr) {
     return;
   }
-  munmap(trampoline->code, trampoline->size);
+  FreeCode(trampoline->code, trampoline->size);
   trampoline->code = nullptr;
   trampoline->size = 0;
 }
 
-#elif !defined(__x86_64__) || defined(_WIN32)
+#elif !defined(__x86_64__) && !defined(_M_X64) && \
+    !defined(__powerpc64__) && !defined(__ppc64__) && \
+    !defined(__PPC64__) && !defined(__loongarch64) && \
+    !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__)
 
 extern "C" bool node_ffi_create_fast_trampoline(
     void* target,
@@ -390,6 +430,6 @@ extern "C" void node_ffi_free_fast_trampoline(
   // No code is allocated in the non-AArch64 stub.
 }
 
-#endif  // (defined(__aarch64__) || defined(_M_ARM64)) && !defined(_WIN32)
+#endif  // defined(__aarch64__) || defined(_M_ARM64)
 
 #endif  // HAVE_FFI
diff --git a/src/ffi/platforms/loong64.cc b/src/ffi/platforms/loong64.cc
new file mode 100644
index 00000000000000..9b3bd5dfd90102
--- /dev/null
+++ b/src/ffi/platforms/loong64.cc
@@ -0,0 +1,159 @@
+#if HAVE_FFI
+
+#include "ffi/fast.h"
+
+#if defined(__loongarch64)
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <stdint.h>
+
+namespace {
+
+using node::ffi::FastFFIType;
+
+bool IsFloatType(FastFFIType type) {
+  return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64;
+}
+
+bool IsNarrowType(FastFFIType type) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kInt8:
+    case FastFFIType::kUint8:
+    case FastFFIType::kInt16:
+    case FastFFIType::kUint16:
+      return true;
+    default:
+      return false;
+  }
+}
+
+uint32_t Or(unsigned rd, unsigned rj, unsigned rk) {
+  return (0x2au << 15) | (rk << 10) | (rj << 5) | rd;
+}
+
+uint32_t Pcaddu12i(unsigned rd, int imm20) {
+  return (0x0eu << 25) | ((static_cast<uint32_t>(imm20) & 0xfffff) << 5) | rd;
+}
+
+uint32_t LdD(unsigned rd, unsigned rj, int imm12) {
+  return (0xa3u << 22) | ((static_cast<uint32_t>(imm12) & 0xfff) << 10) |
+         (rj << 5) | rd;
+}
+
+uint32_t Jirl(unsigned rd, unsigned rj, int imm16) {
+  return (0x13u << 26) | ((static_cast<uint32_t>(imm16) & 0xffff) << 10) |
+         (rj << 5) | rd;
+}
+
+void Emit32(uint32_t** cursor, uint32_t value) {
+  *(*cursor)++ = value;
+}
+
+void Emit64(uint32_t** cursor, uint64_t value) {
+  uint64_t* slot = reinterpret_cast<uint64_t*>(*cursor);
+  *slot = value;
+  *cursor += 2;
+}
+
+void* AllocateCode(size_t code_size) {
+  void* code = mmap(nullptr,
+                    code_size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANON,
+                    -1,
+                    0);
+  return code == MAP_FAILED ? nullptr : code;
+}
+
+void FreeCode(void* code, size_t code_size) {
+  munmap(code, code_size);
+}
+
+}  // namespace
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  if (target == nullptr || out == nullptr || IsNarrowType(result)) {
+    return false;
+  }
+
+  size_t gp_count = 0;
+  size_t fp_count = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (args[i] == FastFFIType::kBuffer) {
+      return false;
+    }
+    if (IsFloatType(args[i])) {
+      fp_count++;
+    } else {
+      gp_count++;
+    }
+  }
+
+  // LoongArch64 passes integer arguments in a0..a7. V8's receiver occupies a0,
+  // so user GP arguments arrive in a1..a7 and are shifted down before the tail
+  // branch. FP arguments are already in fa0..fa7.
+  if (gp_count > 7 || fp_count > 8) {
+    return false;
+  }
+
+  constexpr size_t kCodeSize = 256;
+  void* code = AllocateCode(kCodeSize);
+  if (code == nullptr) {
+    return false;
+  }
+
+  uint32_t* cursor = static_cast<uint32_t*>(code);
+  unsigned gp_index = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (IsFloatType(args[i])) {
+      continue;
+    }
+    const unsigned target_reg = 4 + gp_index;
+    const unsigned incoming_reg = target_reg + 1;
+    Emit32(&cursor, Or(target_reg, incoming_reg, 0));
+    gp_index++;
+  }
+
+  // Load the target from an 8-byte-aligned literal into t0 and tail-branch.
+  if (!(reinterpret_cast<uintptr_t>(cursor) & 4)) Emit32(&cursor, Or(0, 0, 0));
+  Emit32(&cursor, Pcaddu12i(12, 0));    // pcaddu12i t0, 0
+  Emit32(&cursor, LdD(12, 12, 12));     // ld.d t0, t0, 12: the target literal
+  Emit32(&cursor, Jirl(0, 12, 0));      // jr t0
+  Emit64(&cursor, reinterpret_cast<uintptr_t>(target));
+
+  const size_t written = reinterpret_cast<uint8_t*>(cursor) -
+                         static_cast<uint8_t*>(code);
+  __builtin___clear_cache(static_cast<char*>(code),
+                          static_cast<char*>(code) + written);
+
+  if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) {
+    FreeCode(code, kCodeSize);
+    return false;
+  }
+
+  out->code = code;
+  out->size = kCodeSize;
+  return true;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {
+  if (trampoline == nullptr || trampoline->code == nullptr) {
+    return;
+  }
+  FreeCode(trampoline->code, trampoline->size);
+  trampoline->code = nullptr;
+  trampoline->size = 0;
+}
+
+#endif  // defined(__loongarch64)
+
+#endif  // HAVE_FFI
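The LoongArch64 encoders in `loong64.cc` are plain bit packing, so they can be checked at compile time. A sketch, assuming the helpers are copied into a test translation unit and marked `constexpr` (the patch defines them as ordinary functions). The expected words follow the LoongArch `or`, `pcaddu12i`, `ld.d`, and `jirl` formats, and the last assertion is the return word `SelfTest()` writes.

```cpp
#include <cstdint>

// constexpr copies of the loong64.cc helpers. Register numbers are raw:
// zero = r0, ra = r1, a0 = r4, a1 = r5, t0 = r12.
constexpr uint32_t Or(unsigned rd, unsigned rj, unsigned rk) {
  return (0x2au << 15) | (rk << 10) | (rj << 5) | rd;
}

constexpr uint32_t Pcaddu12i(unsigned rd, int imm20) {
  return (0x0eu << 25) | ((static_cast<uint32_t>(imm20) & 0xfffff) << 5) | rd;
}

constexpr uint32_t LdD(unsigned rd, unsigned rj, int imm12) {
  return (0xa3u << 22) | ((static_cast<uint32_t>(imm12) & 0xfff) << 10) |
         (rj << 5) | rd;
}

constexpr uint32_t Jirl(unsigned rd, unsigned rj, int imm16) {
  return (0x13u << 26) | ((static_cast<uint32_t>(imm16) & 0xffff) << 10) |
         (rj << 5) | rd;
}

// GP shift, literal load, tail branch, and the self-test return word.
static_assert(Or(4, 5, 0) == 0x001500a4u, "move a0, a1");
static_assert(Pcaddu12i(12, 0) == 0x1c00000cu, "pcaddu12i t0, 0");
static_assert(LdD(12, 12, 16) == 0x28c0418cu, "ld.d t0, t0, 16");
static_assert(Jirl(0, 12, 0) == 0x4c000180u, "jr t0");
static_assert(Jirl(0, 1, 0) == 0x4c000020u, "jr ra");
```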
diff --git a/src/ffi/platforms/ppc64.cc b/src/ffi/platforms/ppc64.cc
new file mode 100644
index 00000000000000..4549b98186f5e7
--- /dev/null
+++ b/src/ffi/platforms/ppc64.cc
@@ -0,0 +1,189 @@
+#if HAVE_FFI
+
+#include "ffi/fast.h"
+
+#if defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__)
+#if (defined(__LITTLE_ENDIAN__) ||                                      \
+     (defined(__BYTE_ORDER__) &&                                        \
+      __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)) &&                    \
+    !defined(_AIX)
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <stdint.h>
+
+namespace {
+
+using node::ffi::FastFFIType;
+
+bool IsFloatType(FastFFIType type) {
+  return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64;
+}
+
+bool IsNarrowType(FastFFIType type) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kInt8:
+    case FastFFIType::kUint8:
+    case FastFFIType::kInt16:
+    case FastFFIType::kUint16:
+      return true;
+    default:
+      return false;
+  }
+}
+
+uint32_t Or(unsigned ra, unsigned rs, unsigned rb) {
+  return (31u << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444u << 1);
+}
+
+uint32_t Mr(unsigned ra, unsigned rs) {
+  return Or(ra, rs, rs);
+}
+
+uint32_t Bl(unsigned instruction_offset) {
+  return (18u << 26) | ((instruction_offset & 0x00ffffffu) << 2) | 1u;
+}
+
+uint32_t Mfspr(unsigned rt, unsigned spr) {
+  return (31u << 26) | (rt << 21) | ((spr & 0x1f) << 16) |
+         ((spr >> 5) << 11) | (339u << 1);
+}
+
+uint32_t Mtspr(unsigned spr, unsigned rs) {
+  return (31u << 26) | (rs << 21) | ((spr & 0x1f) << 16) |
+         ((spr >> 5) << 11) | (467u << 1);
+}
+
+uint32_t Ld(unsigned rt, unsigned ra, unsigned offset) {
+  return (58u << 26) | (rt << 21) | (ra << 16) | (offset & 0xfffcu);
+}
+
+void Emit32(uint32_t** cursor, uint32_t value) {
+  *(*cursor)++ = value;
+}
+
+void Emit64(uint32_t** cursor, uint64_t value) {
+  uint64_t* slot = reinterpret_cast<uint64_t*>(*cursor);
+  *slot = value;
+  *cursor += 2;
+}
+
+void* AllocateCode(size_t code_size) {
+  void* code = mmap(nullptr,
+                    code_size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANON,
+                    -1,
+                    0);
+  return code == MAP_FAILED ? nullptr : code;
+}
+
+void FreeCode(void* code, size_t code_size) {
+  munmap(code, code_size);
+}
+
+}  // namespace
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  if (target == nullptr || out == nullptr || IsNarrowType(result)) {
+    return false;
+  }
+
+  size_t gp_count = 0;
+  size_t fp_count = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (args[i] == FastFFIType::kBuffer) {
+      return false;
+    }
+    if (IsFloatType(args[i])) {
+      fp_count++;
+    } else {
+      gp_count++;
+    }
+  }
+
+  // ELFv2 PPC64LE passes integer arguments in r3..r10. V8's receiver occupies
+  // r3, so the scalar-only fast path keeps user GP arguments in r4..r10 and
+  // shifts them down before tail-branching to the native target.
+  if (gp_count > 7 || fp_count > 8) {
+    return false;
+  }
+
+  constexpr size_t kCodeSize = 256;
+  void* code = AllocateCode(kCodeSize);
+  if (code == nullptr) {
+    return false;
+  }
+
+  uint32_t* cursor = static_cast<uint32_t*>(code);
+  unsigned gp_index = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (IsFloatType(args[i])) {
+      continue;
+    }
+    const unsigned target_reg = 3 + gp_index;
+    const unsigned incoming_reg = target_reg + 1;
+    Emit32(&cursor, Mr(target_reg, incoming_reg));
+    gp_index++;
+  }
+
+  // Keep the caller's LR in r0 across bl; ELFv2 global entry expects r12.
+  Emit32(&cursor, Mfspr(0, 8));        // mflr r0
+  Emit32(&cursor, Bl(1));              // bl .+4
+  Emit32(&cursor, Mfspr(12, 8));       // mflr r12
+  Emit32(&cursor, Mtspr(8, 0));        // mtlr r0
+  Emit32(&cursor, Ld(12, 12, 20));     // ld r12, literal-mflr(r12)
+  Emit32(&cursor, Mtspr(9, 12));       // mtctr r12
+  Emit32(&cursor, 0x4e800420);         // bctr
+  Emit64(&cursor, reinterpret_cast<uintptr_t>(target));
+
+  const size_t written = reinterpret_cast<uint8_t*>(cursor) -
+                         static_cast<uint8_t*>(code);
+  __builtin___clear_cache(static_cast<char*>(code),
+                          static_cast<char*>(code) + written);
+
+  if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) {
+    FreeCode(code, kCodeSize);
+    return false;
+  }
+
+  out->code = code;
+  out->size = kCodeSize;
+  return true;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {
+  if (trampoline == nullptr || trampoline->code == nullptr) {
+    return;
+  }
+  FreeCode(trampoline->code, trampoline->size);
+  trampoline->code = nullptr;
+  trampoline->size = 0;
+}
+
+#else
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  return false;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {}
+
+#endif  // PPC64LE non-AIX
+#endif  // defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__)
+
+#endif  // HAVE_FFI
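The PPC64 encoders can be checked the same way. This sketch copies them from `ppc64.cc` as `constexpr` functions (an assumption; the patch does not mark them so) and asserts the words for the GP shift and the tail sequence. `bl .+4` is used only to read the program counter. Because `bl` writes LR, a tail-called stub has to put the caller's return address back before `bctr`, for example by saving it with `mflr r0` beforehand and restoring it with `mtlr r0`. The last two assertions show those words.

```cpp
#include <cstdint>

// constexpr copies of the ppc64.cc helpers (raw register and SPR numbers).
constexpr uint32_t Or(unsigned ra, unsigned rs, unsigned rb) {
  return (31u << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444u << 1);
}

constexpr uint32_t Mr(unsigned ra, unsigned rs) { return Or(ra, rs, rs); }

constexpr uint32_t Bl(unsigned instruction_offset) {
  return (18u << 26) | ((instruction_offset & 0x00ffffffu) << 2) | 1u;
}

constexpr uint32_t Mfspr(unsigned rt, unsigned spr) {
  return (31u << 26) | (rt << 21) | ((spr & 0x1f) << 16) |
         ((spr >> 5) << 11) | (339u << 1);
}

constexpr uint32_t Mtspr(unsigned spr, unsigned rs) {
  return (31u << 26) | (rs << 21) | ((spr & 0x1f) << 16) |
         ((spr >> 5) << 11) | (467u << 1);
}

constexpr uint32_t Ld(unsigned rt, unsigned ra, unsigned offset) {
  return (58u << 26) | (rt << 21) | (ra << 16) | (offset & 0xfffcu);
}

constexpr unsigned kLr = 8;   // SPR number of the link register
constexpr unsigned kCtr = 9;  // SPR number of the count register

static_assert(Mr(3, 4) == 0x7c832378u, "mr r3, r4");
static_assert(Bl(1) == 0x48000005u, "bl .+4");
static_assert(Mfspr(12, kLr) == 0x7d8802a6u, "mflr r12");
static_assert(Ld(12, 12, 20) == 0xe98c0014u, "ld r12, 20(r12)");
static_assert(Mtspr(kCtr, 12) == 0x7d8903a6u, "mtctr r12");
static_assert(Mfspr(0, kLr) == 0x7c0802a6u, "mflr r0");
static_assert(Mtspr(kLr, 0) == 0x7c0803a6u, "mtlr r0");
```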
diff --git a/src/ffi/platforms/riscv64.cc b/src/ffi/platforms/riscv64.cc
new file mode 100644
index 00000000000000..306dd5bbce77c4
--- /dev/null
+++ b/src/ffi/platforms/riscv64.cc
@@ -0,0 +1,160 @@
+#if HAVE_FFI
+
+#include "ffi/fast.h"
+
+#if defined(__riscv) && __riscv_xlen == 64
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <stdint.h>
+
+namespace {
+
+using node::ffi::FastFFIType;
+
+bool IsFloatType(FastFFIType type) {
+  return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64;
+}
+
+bool IsNarrowType(FastFFIType type) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kInt8:
+    case FastFFIType::kUint8:
+    case FastFFIType::kInt16:
+    case FastFFIType::kUint16:
+      return true;
+    default:
+      return false;
+  }
+}
+
+uint32_t Addi(unsigned rd, unsigned rs1, int imm) {
+  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
+         (rd << 7) | 0x13;
+}
+
+uint32_t Ld(unsigned rd, unsigned rs1, int imm) {
+  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
+         (3u << 12) | (rd << 7) | 0x03;
+}
+
+uint32_t Auipc(unsigned rd, int imm20) {
+  return (static_cast<uint32_t>(imm20) << 12) | (rd << 7) | 0x17;
+}
+
+uint32_t Jalr(unsigned rd, unsigned rs1, int imm) {
+  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
+         (rd << 7) | 0x67;
+}
+
+void Emit32(uint32_t** cursor, uint32_t value) {
+  *(*cursor)++ = value;
+}
+
+void Emit64(uint32_t** cursor, uint64_t value) {
+  uint64_t* slot = reinterpret_cast<uint64_t*>(*cursor);
+  *slot = value;
+  *cursor += 2;
+}
+
+void* AllocateCode(size_t code_size) {
+  void* code = mmap(nullptr,
+                    code_size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANON,
+                    -1,
+                    0);
+  return code == MAP_FAILED ? nullptr : code;
+}
+
+void FreeCode(void* code, size_t code_size) {
+  munmap(code, code_size);
+}
+
+}  // namespace
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  if (target == nullptr || out == nullptr || IsNarrowType(result)) {
+    return false;
+  }
+
+  size_t gp_count = 0;
+  size_t fp_count = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (args[i] == FastFFIType::kBuffer) {
+      return false;
+    }
+    if (IsFloatType(args[i])) {
+      fp_count++;
+    } else {
+      gp_count++;
+    }
+  }
+
+  // RISC-V LP64D passes integer arguments in a0..a7. V8's receiver occupies
+  // a0, so user GP arguments arrive in a1..a7 and are shifted down before the
+  // tail branch. FP arguments are already in fa0..fa7.
+  if (gp_count > 7 || fp_count > 8) {
+    return false;
+  }
+
+  constexpr size_t kCodeSize = 256;
+  void* code = AllocateCode(kCodeSize);
+  if (code == nullptr) {
+    return false;
+  }
+
+  uint32_t* cursor = static_cast<uint32_t*>(code);
+  unsigned gp_index = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (IsFloatType(args[i])) {
+      continue;
+    }
+    const unsigned target_reg = 10 + gp_index;
+    const unsigned incoming_reg = target_reg + 1;
+    Emit32(&cursor, Addi(target_reg, incoming_reg, 0));
+    gp_index++;
+  }
+
+  // Load the target from an 8-byte-aligned literal into t0 and tail-branch.
+  if (!(reinterpret_cast<uintptr_t>(cursor) & 4)) Emit32(&cursor, 0x13);  // nop
+  Emit32(&cursor, Auipc(5, 0));         // auipc t0, 0
+  Emit32(&cursor, Ld(5, 5, 12));        // ld t0, 12(t0): the target literal
+  Emit32(&cursor, Jalr(0, 5, 0));       // jr t0
+  Emit64(&cursor, reinterpret_cast<uintptr_t>(target));
+
+  const size_t written = reinterpret_cast<uint8_t*>(cursor) -
+                         static_cast<uint8_t*>(code);
+  __builtin___clear_cache(static_cast<char*>(code),
+                          static_cast<char*>(code) + written);
+
+  if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) {
+    FreeCode(code, kCodeSize);
+    return false;
+  }
+
+  out->code = code;
+  out->size = kCodeSize;
+  return true;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {
+  if (trampoline == nullptr || trampoline->code == nullptr) {
+    return;
+  }
+  FreeCode(trampoline->code, trampoline->size);
+  trampoline->code = nullptr;
+  trampoline->size = 0;
+}
+
+#endif  // defined(__riscv) && __riscv_xlen == 64
+
+#endif  // HAVE_FFI
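The RISC-V encoders in `riscv64.cc` can also be verified at compile time. A sketch, assuming the helpers are copied as `constexpr` functions (the patch defines them as ordinary functions), with expected words following the RV64I I-type and U-type formats. The final assertion is the `ret` word that `SelfTest()` writes.

```cpp
#include <cstdint>

// constexpr copies of the riscv64.cc helpers.
// ABI names: zero = x0, ra = x1, t0 = x5, a0 = x10, a1 = x11.
constexpr uint32_t Addi(unsigned rd, unsigned rs1, int imm) {
  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
         (rd << 7) | 0x13;
}

constexpr uint32_t Ld(unsigned rd, unsigned rs1, int imm) {
  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
         (3u << 12) | (rd << 7) | 0x03;
}

constexpr uint32_t Auipc(unsigned rd, int imm20) {
  return (static_cast<uint32_t>(imm20) << 12) | (rd << 7) | 0x17;
}

constexpr uint32_t Jalr(unsigned rd, unsigned rs1, int imm) {
  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | (rs1 << 15) |
         (rd << 7) | 0x67;
}

static_assert(Addi(10, 11, 0) == 0x00058513u, "mv a0, a1");
static_assert(Auipc(5, 0) == 0x00000297u, "auipc t0, 0");
static_assert(Ld(5, 5, 16) == 0x0102b283u, "ld t0, 16(t0)");
static_assert(Jalr(0, 5, 0) == 0x00028067u, "jr t0");
static_assert(Addi(0, 0, 0) == 0x00000013u, "nop");
static_assert(Jalr(0, 1, 0) == 0x00008067u, "ret");
```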
diff --git a/src/ffi/platforms/s390x.cc b/src/ffi/platforms/s390x.cc
new file mode 100644
index 00000000000000..2d8a0c0bea7cb6
--- /dev/null
+++ b/src/ffi/platforms/s390x.cc
@@ -0,0 +1,168 @@
+#if HAVE_FFI
+
+#include "ffi/fast.h"
+
+#if defined(__s390x__)
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <stdint.h>
+
+namespace {
+
+using node::ffi::FastFFIType;
+
+bool IsFloatType(FastFFIType type) {
+  return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64;
+}
+
+bool IsNarrowType(FastFFIType type) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kInt8:
+    case FastFFIType::kUint8:
+    case FastFFIType::kInt16:
+    case FastFFIType::kUint16:
+      return true;
+    default:
+      return false;
+  }
+}
+
+uint32_t Lgr(unsigned r1, unsigned r2) {
+  return 0xb9040000u | (r1 << 4) | r2;
+}
+
+uint16_t Br(unsigned r2) {
+  return 0x07f0u | r2;
+}
+
+uint64_t Lgrl(unsigned r1, int imm) {
+  return 0xc40800000000ull | (static_cast<uint64_t>(r1) << 36) |
+         (static_cast<uint32_t>(imm));
+}
+
+void Emit16(uint8_t** cursor, uint16_t value) {
+  *(*cursor)++ = value >> 8;
+  *(*cursor)++ = value;
+}
+
+void Emit32(uint8_t** cursor, uint32_t value) {
+  *(*cursor)++ = value >> 24;
+  *(*cursor)++ = value >> 16;
+  *(*cursor)++ = value >> 8;
+  *(*cursor)++ = value;
+}
+
+void Emit48(uint8_t** cursor, uint64_t value) {
+  *(*cursor)++ = value >> 40;
+  *(*cursor)++ = value >> 32;
+  *(*cursor)++ = value >> 24;
+  *(*cursor)++ = value >> 16;
+  *(*cursor)++ = value >> 8;
+  *(*cursor)++ = value;
+}
+
+void Emit64(uint8_t** cursor, uint64_t value) {
+  for (int shift = 56; shift >= 0; shift -= 8) {
+    *(*cursor)++ = value >> shift;
+  }
+}
+
+void* AllocateCode(size_t code_size) {
+  void* code = mmap(nullptr,
+                    code_size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANON,
+                    -1,
+                    0);
+  return code == MAP_FAILED ? nullptr : code;
+}
+
+void FreeCode(void* code, size_t code_size) {
+  munmap(code, code_size);
+}
+
+}  // namespace
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  if (target == nullptr || out == nullptr || IsNarrowType(result)) {
+    return false;
+  }
+
+  size_t gp_count = 0;
+  size_t fp_count = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (args[i] == FastFFIType::kBuffer) {
+      return false;
+    }
+    if (IsFloatType(args[i])) {
+      fp_count++;
+    } else {
+      gp_count++;
+    }
+  }
+
+  // Linux s390x passes integer arguments in r2..r6. V8's receiver occupies r2,
+  // so user GP arguments arrive in r3..r6 and are shifted down before the tail
+  // branch. FP arguments are already in f0, f2, f4, and f6.
+  if (gp_count > 4 || fp_count > 4) {
+    return false;
+  }
+
+  constexpr size_t kCodeSize = 256;
+  void* code = AllocateCode(kCodeSize);
+  if (code == nullptr) {
+    return false;
+  }
+
+  uint8_t* cursor = static_cast<uint8_t*>(code);
+  unsigned gp_index = 0;
+  for (size_t i = 0; i < argc; i++) {
+    if (IsFloatType(args[i])) {
+      continue;
+    }
+    const unsigned target_reg = 2 + gp_index;
+    const unsigned incoming_reg = target_reg + 1;
+    Emit32(&cursor, Lgr(target_reg, incoming_reg));
+    gp_index++;
+  }
+
+  // Pad odd lgr counts so lgrl's literal (lgrl + 8) is 8-byte aligned.
+  if (gp_count % 2 != 0) Emit32(&cursor, 0x47000000);  // nop (bc 0,0)
+  Emit48(&cursor, Lgrl(1, 4));           // lgrl r1, literal (target)
+  Emit16(&cursor, Br(1));                // br r1 (tail-branch)
+  Emit64(&cursor, reinterpret_cast<uintptr_t>(target));
+  const size_t written = cursor - static_cast<uint8_t*>(code);
+  __builtin___clear_cache(static_cast<char*>(code),
+                          static_cast<char*>(code) + written);
+
+  if (mprotect(code, kCodeSize, PROT_READ | PROT_EXEC) != 0) {
+    FreeCode(code, kCodeSize);
+    return false;
+  }
+
+  out->code = code;
+  out->size = kCodeSize;
+  return true;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {
+  if (trampoline == nullptr || trampoline->code == nullptr) {
+    return;
+  }
+  FreeCode(trampoline->code, trampoline->size);
+  trampoline->code = nullptr;
+  trampoline->size = 0;
+}
+
+#endif  // defined(__s390x__)
+
+#endif  // HAVE_FFI
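On s390x the tail branch is `lgrl r1, .+8; br r1` followed by the target address. `LGRL`'s immediate counts halfwords from the instruction's own address, and z/Architecture requires its doubleword operand to sit on an 8-byte boundary. Each `lgr` shift is 4 bytes, so an odd number of shifts moves the literal off that boundary. This sketch uses hypothetical helper names and emits big-endian bytes like `s390x.cc`. It pads with a 4-byte `nop` (`bc 0,0`) and checks the literal's alignment for every shift count the emitter accepts.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Appends `bytes` bytes of `value`, most significant first (s390x is
// big-endian).
void EmitBE(std::vector<uint8_t>* out, uint64_t value, int bytes) {
  for (int shift = 8 * (bytes - 1); shift >= 0; shift -= 8) {
    out->push_back(static_cast<uint8_t>(value >> shift));
  }
}

// Hypothetical encoders using the same field layouts as s390x.cc.
constexpr uint32_t Lgr(unsigned r1, unsigned r2) {  // RRE: B904 00 r1 r2
  return 0xb9040000u | (r1 << 4) | r2;
}
constexpr uint16_t Br(unsigned r2) {  // bcr 15, r2
  return static_cast<uint16_t>(0x07f0u | r2);
}
constexpr uint64_t Lgrl(unsigned r1, int32_t halfwords) {  // RIL-b: C4 r1 8 i32
  return 0xc40800000000ull | (static_cast<uint64_t>(r1) << 36) |
         static_cast<uint32_t>(halfwords);
}
constexpr uint32_t kNop4 = 0x47000000u;  // bc 0,0: mask 0 never branches

static_assert(Lgr(2, 3) == 0xb9040023u, "lgr %r2,%r3");
static_assert(Br(1) == 0x07f1u, "br %r1");
static_assert(Lgrl(1, 4) == 0xc41800000004ull, "lgrl %r1,.+8");

// Emits `gp_shifts` receiver-slot shifts (r2 <- r3, r3 <- r4, ...), padding
// if needed, then lgrl/br and the target literal. `out` is assumed to start
// empty at a page-aligned address, like mmap'd trampoline memory. Returns the
// literal's byte offset.
size_t EmitTailBranch(std::vector<uint8_t>* out, unsigned gp_shifts,
                      uint64_t target) {
  assert(out != nullptr && out->empty());
  assert(gp_shifts <= 4);  // r3..r6 carry at most four user GP arguments
  for (unsigned i = 0; i < gp_shifts; i++) {
    EmitBE(out, Lgr(2 + i, 3 + i), 4);
  }
  // lgrl (6 bytes) + br (2 bytes) puts the literal at lgrl + 8, so lgrl
  // itself must start on an 8-byte boundary.
  if (out->size() % 8 != 0) {
    EmitBE(out, kNop4, 4);
  }
  EmitBE(out, Lgrl(1, 4), 6);  // offset counts halfwords: 4 * 2 = 8 bytes
  EmitBE(out, Br(1), 2);
  const size_t literal = out->size();
  EmitBE(out, target, 8);
  return literal;
}
```

With one shift, the sequence is `lgr` (4 bytes), `nop` (4), `lgrl` (6), `br` (2), so the literal starts at offset 16 instead of the misaligned 12.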
diff --git a/src/ffi/platforms/x64.cc b/src/ffi/platforms/x64.cc
index cacf56bd773235..1073508d365b18 100644
--- a/src/ffi/platforms/x64.cc
+++ b/src/ffi/platforms/x64.cc
@@ -543,6 +543,230 @@ extern "C" void node_ffi_free_fast_trampoline(
   trampoline->size = 0;
 }
 
-#endif  // defined(__x86_64__) && !defined(_WIN32)
+#elif defined(_M_X64)
+
+#include <windows.h>
+
+#include <stdint.h>
+
+namespace {
+
+using node::ffi::FastFFIType;
+
+constexpr unsigned kRax = 0;
+constexpr unsigned kRcx = 1;
+constexpr unsigned kRdx = 2;
+constexpr unsigned kRsp = 4;
+constexpr unsigned kR8 = 8;
+constexpr unsigned kR9 = 9;
+constexpr unsigned kR11 = 11;
+
+constexpr unsigned kWin64GPRegisters[] = {kRcx, kRdx, kR8, kR9};
+
+bool IsFloatType(FastFFIType type) {
+  return type == FastFFIType::kFloat32 || type == FastFFIType::kFloat64;
+}
+
+bool IsBufferType(FastFFIType type) {
+  return type == FastFFIType::kBuffer;
+}
+
+void Emit8(uint8_t** cursor, uint8_t value) {
+  *(*cursor)++ = value;
+}
+
+void Emit32(uint8_t** cursor, uint32_t value) {
+  for (unsigned i = 0; i < 4; i++) {
+    Emit8(cursor, (value >> (i * 8)) & 0xff);
+  }
+}
+
+void Emit64(uint8_t** cursor, uint64_t value) {
+  for (unsigned i = 0; i < 8; i++) {
+    Emit8(cursor, (value >> (i * 8)) & 0xff);
+  }
+}
+
+void EmitRex(uint8_t** cursor, bool wide, unsigned reg, unsigned rm) {
+  Emit8(cursor, 0x40 | (wide ? 0x08 : 0) | ((reg >> 3) << 2) | (rm >> 3));
+}
+
+void EmitModRM(uint8_t** cursor, unsigned reg, unsigned rm) {
+  Emit8(cursor, 0xc0 | ((reg & 7) << 3) | (rm & 7));
+}
+
+void EmitMov(uint8_t** cursor, unsigned dst, unsigned src) {
+  EmitRex(cursor, true, src, dst);
+  Emit8(cursor, 0x89);
+  EmitModRM(cursor, src, dst);
+}
+
+void EmitMovaps(uint8_t** cursor, unsigned dst, unsigned src) {
+  // movaps xmm_dst, xmm_src. Used only for register-to-register argument
+  // shuffles; it preserves the payload bits for both f32 and f64 arguments.
+  EmitRex(cursor, false, dst, src);
+  Emit8(cursor, 0x0f);
+  Emit8(cursor, 0x28);
+  EmitModRM(cursor, dst, src);
+}
+
+void EmitMovImm64(uint8_t** cursor, unsigned reg, uintptr_t value) {
+  EmitRex(cursor, true, 0, reg);
+  Emit8(cursor, 0xb8 | (reg & 7));
+  Emit64(cursor, value);
+}
+
+void EmitCall(uint8_t** cursor, unsigned reg) {
+  EmitRex(cursor, true, 0, reg);
+  Emit8(cursor, 0xff);
+  EmitModRM(cursor, 2, reg);
+}
+
+void EmitJmp(uint8_t** cursor, unsigned reg) {
+  EmitRex(cursor, true, 0, reg);
+  Emit8(cursor, 0xff);
+  EmitModRM(cursor, 4, reg);
+}
+
+void EmitSubRsp(uint8_t** cursor, unsigned value) {
+  EmitRex(cursor, true, 5, kRsp);
+  Emit8(cursor, 0x81);
+  Emit8(cursor, 0xec);
+  Emit32(cursor, value);
+}
+
+void EmitAddRsp(uint8_t** cursor, unsigned value) {
+  EmitRex(cursor, true, 0, kRsp);
+  Emit8(cursor, 0x81);
+  Emit8(cursor, 0xc4);
+  Emit32(cursor, value);
+}
+
+void EmitRet(uint8_t** cursor) {
+  Emit8(cursor, 0xc3);
+}
+
+void EmitNarrowInstruction(uint8_t** cursor, uint8_t opcode, unsigned reg) {
+  EmitRex(cursor, false, reg, reg);
+  Emit8(cursor, 0x0f);
+  Emit8(cursor, opcode);
+  EmitModRM(cursor, reg, reg);
+}
+
+bool EmitNarrowReturn(uint8_t** cursor, FastFFIType type, unsigned reg) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kUint8:
+      EmitNarrowInstruction(cursor, 0xb6, reg);
+      return true;
+    case FastFFIType::kInt8:
+      EmitNarrowInstruction(cursor, 0xbe, reg);
+      return true;
+    case FastFFIType::kUint16:
+      EmitNarrowInstruction(cursor, 0xb7, reg);
+      return true;
+    case FastFFIType::kInt16:
+      EmitNarrowInstruction(cursor, 0xbf, reg);
+      return true;
+    default:
+      return false;
+  }
+}
+
+bool NeedsNarrow(FastFFIType type) {
+  switch (type) {
+    case FastFFIType::kBool:
+    case FastFFIType::kUint8:
+    case FastFFIType::kInt8:
+    case FastFFIType::kUint16:
+    case FastFFIType::kInt16:
+      return true;
+    default:
+      return false;
+  }
+}
+
+void FreeCode(void* code, size_t code_size) {
+  VirtualFree(code, 0, MEM_RELEASE);
+}
+
+}  // namespace
+
+extern "C" bool node_ffi_create_fast_trampoline(
+    void* target,
+    const node::ffi::FastFFIType* args,
+    size_t argc,
+    node::ffi::FastFFIType result,
+    node::ffi::FastFFITrampoline* out) {
+  if (target == nullptr || out == nullptr || argc > 3) {
+    return false;
+  }
+
+  for (size_t i = 0; i < argc; i++) {
+    if (IsBufferType(args[i])) {
+      return false;
+    }
+  }
+
+  constexpr size_t kCodeSize = 512;
+  void* code = VirtualAlloc(
+      nullptr, kCodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+  if (code == nullptr) {
+    return false;
+  }
+
+  uint8_t* cursor = static_cast<uint8_t*>(code);
+  const bool tail_call = !NeedsNarrow(result);
+
+  // Win64 uses positional registers. The V8 receiver occupies position 0, so
+  // public arguments arrive in positions 1..3 and must be shifted down.
+  for (unsigned i = 0; i < argc; i++) {
+    if (IsFloatType(args[i])) {
+      EmitMovaps(&cursor, i, i + 1);
+    } else {
+      EmitMov(&cursor, kWin64GPRegisters[i], kWin64GPRegisters[i + 1]);
+    }
+  }
+
+  EmitMovImm64(&cursor, kR11, reinterpret_cast<uintptr_t>(target));
+  if (tail_call) {
+    // The caller already provided Win64 shadow space for the trampoline; after
+    // the receiver-slot shuffle, the target can reuse the same stack shape.
+    EmitJmp(&cursor, kR11);
+  } else {
+    // Reserve 32 bytes of shadow space plus 8 bytes for 16-byte stack alignment
+    // before making a nested call from inside the trampoline.
+    EmitSubRsp(&cursor, 40);
+    EmitCall(&cursor, kR11);
+    EmitAddRsp(&cursor, 40);
+    EmitNarrowReturn(&cursor, result, kRax);
+    EmitRet(&cursor);
+  }
+
+  const size_t written = cursor - static_cast<uint8_t*>(code);
+  FlushInstructionCache(GetCurrentProcess(), code, written);
+
+  DWORD old_protect;
+  if (VirtualProtect(code, kCodeSize, PAGE_EXECUTE_READ, &old_protect) == 0) {
+    FreeCode(code, kCodeSize);
+    return false;
+  }
+
+  out->code = code;
+  out->size = kCodeSize;
+  return true;
+}
+
+extern "C" void node_ffi_free_fast_trampoline(
+    node::ffi::FastFFITrampoline* trampoline) {
+  if (trampoline == nullptr || trampoline->code == nullptr) {
+    return;
+  }
+  FreeCode(trampoline->code, trampoline->size);
+  trampoline->code = nullptr;
+  trampoline->size = 0;
+}
+
+#endif  // defined(_M_X64)
 
 #endif  // HAVE_FFI
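The Win64 emitter shifts public arguments down one register position with REX-prefixed `mov r/m64, r64` (opcode `0x89`). This hypothetical, GP-only sketch re-encodes those moves from the x86-64 REX and ModRM rules so the bytes can be compared with assembler output. It also expresses the stack arithmetic behind the 40-byte adjustment as a `static_assert`. `EncodeMov64` and `EncodeGpShuffle` are illustrative names, not the file's helpers.

```cpp
#include <cassert>
#include <stdint.h>
#include <vector>

constexpr unsigned kRcx = 1, kRdx = 2, kR8 = 8, kR9 = 9;

// REX.W + 89 /r encodes `mov dst, src` for 64-bit registers. The source sits
// in ModRM.reg and the destination in ModRM.rm, so REX.R extends the source
// and REX.B extends the destination (needed for r8..r15).
std::vector<uint8_t> EncodeMov64(unsigned dst, unsigned src) {
  const uint8_t rex =
      static_cast<uint8_t>(0x48 | ((src >> 3) << 2) | (dst >> 3));
  const uint8_t modrm =
      static_cast<uint8_t>(0xc0 | ((src & 7) << 3) | (dst & 7));
  return {rex, 0x89, modrm};
}

// Win64 passes the first four arguments by position in rcx, rdx, r8, r9. The
// receiver holds position 0, so public GP argument i arrives in position i + 1
// and moves down to position i. Ascending order reads each source register
// before a later move overwrites it.
std::vector<uint8_t> EncodeGpShuffle(unsigned argc) {
  constexpr unsigned kPositions[] = {kRcx, kRdx, kR8, kR9};
  assert(argc <= 3);
  std::vector<uint8_t> code;
  for (unsigned i = 0; i < argc; i++) {
    const std::vector<uint8_t> mov =
        EncodeMov64(kPositions[i], kPositions[i + 1]);
    code.insert(code.end(), mov.begin(), mov.end());
  }
  return code;
}

// The call into the trampoline pushed an 8-byte return address, so rsp % 16
// is 8 on entry. Subtracting 32 bytes of shadow space plus 8 bytes of padding
// leaves rsp 16-byte aligned for the nested call.
constexpr unsigned kShadowSpace = 32;
constexpr unsigned kNestedFrame = kShadowSpace + 8;
static_assert((kNestedFrame - 8) % 16 == 0, "rsp must be 16-byte aligned");
```

For three GP arguments the shuffle is `mov rcx, rdx` (`48 89 d1`), `mov rdx, r8` (`4c 89 c2`), and `mov r8, r9` (`4d 89 c8`).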
diff --git a/src/ffi/types.cc b/src/ffi/types.cc
index 336a8fa10053e4..9ba3cc4da448a3 100644
--- a/src/ffi/types.cc
+++ b/src/ffi/types.cc
@@ -236,10 +236,12 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) {
   if (out_reason == nullptr) out_reason = &dummy;
 
     // Check that a platform stub emitter exists for the current ABI.
-    // Stub emitters cover AArch64 (Linux/macOS/FreeBSD/Windows) and
-    // x86_64 (SysV: Linux/macOS/FreeBSD, Win64: Windows). Other platforms
+    // Stub emitters exist only for targets in the #if below. Other platforms
     // fall back to libffi.
-#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__)
+#if !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__x86_64__) &&     \
+    !defined(_M_X64) && !defined(__powerpc64__) && !defined(__ppc64__) &&      \
+    !defined(__PPC64__) && !defined(__loongarch64) &&                          \
+    !(defined(__riscv) && __riscv_xlen == 64) && !defined(__s390x__)
   *out_reason = "no platform stub emitter";
   return false;
 #endif
@@ -324,16 +326,109 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) {
     *out_reason = "argument count exceeds AArch64 register limit";
     return false;
   }
-#elif defined(__x86_64__)
-#if defined(_WIN32)
-  // No Win64 trampoline emitter exists (src/ffi/platforms implements only
-  // AArch64 and x86_64 SysV), so Win64 fast-call is never eligible. This is
-  // already short-circuited earlier by IsJitMemorySupported() returning false
-  // on Windows; rejecting here keeps eligibility self-consistent regardless of
-  // caller order.
-  *out_reason = "no Win64 fast-call trampoline emitter";
+#elif defined(_M_X64)
+  // Win64 x64 uses positional integer/FP registers. The current emitter handles
+  // only the register-only scalar subset: receiver plus up to three public
+  // arguments. Buffer-shaped arguments need FastApiCallbackOptions and a C++
+  // helper call, so those signatures fall back until the Win64 emitter gains
+  // stack-argument and helper-call support.
+  if (has_buffer_arg) {
+    *out_reason = "buffer args are not yet supported on Win64 x64";
+    return false;
+  }
+  if (fn.args.size() > 3) {
+    *out_reason = "argument count exceeds Win64 x64 register-only limit";
+    return false;
+  }
+  if (fp_count > 3 || gp_count > 3) {
+    *out_reason = "argument count exceeds Win64 x64 register-only limit";
+    return false;
+  }
+#elif defined(__powerpc64__) || defined(__ppc64__) || defined(__PPC64__)
+#if defined(_AIX) ||                                                           \
+    !(defined(__LITTLE_ENDIAN__) ||                                            \
+      (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__))
+  *out_reason = "no PPC64BE fast-call trampoline emitter";
   return false;
 #else
+  // PPC64LE ELFv2: r3 is occupied by V8's receiver, leaving r4..r10 for
+  // incoming user GP arguments. FP arguments use FPRs and are not shifted by
+  // the receiver slot. The first PPC64LE emitter is scalar-only and
+  // tail-branches to the target, so narrow return normalization and buffer
+  // helper calls fall back.
+  if (has_buffer_arg) {
+    *out_reason = "buffer args are not yet supported on PPC64LE";
+    return false;
+  }
+  if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 ||
+      fn.return_type == &ffi_type_sint16 ||
+      fn.return_type == &ffi_type_uint16) {
+    *out_reason = "narrow returns are not yet supported on PPC64LE";
+    return false;
+  }
+  if (gp_count > 7 || fp_count > 8) {
+    *out_reason = "argument count exceeds PPC64LE register limit";
+    return false;
+  }
+#endif
+#elif defined(__loongarch64)
+  // LoongArch64: a0 is occupied by V8's receiver, leaving a1..a7 for incoming
+  // user GP arguments. FP arguments are already in fa0..fa7. The current
+  // emitter is scalar-only and tail-branches to the target, so narrow returns
+  // and buffer helper calls fall back.
+  if (has_buffer_arg) {
+    *out_reason = "buffer args are not yet supported on LoongArch64";
+    return false;
+  }
+  if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 ||
+      fn.return_type == &ffi_type_sint16 ||
+      fn.return_type == &ffi_type_uint16) {
+    *out_reason = "narrow returns are not yet supported on LoongArch64";
+    return false;
+  }
+  if (gp_count > 7 || fp_count > 8) {
+    *out_reason = "argument count exceeds LoongArch64 register limit";
+    return false;
+  }
+#elif defined(__riscv) && __riscv_xlen == 64
+  // RISC-V LP64D: a0 is occupied by V8's receiver, leaving a1..a7 for incoming
+  // user GP arguments. FP arguments are already in fa0..fa7. The current
+  // emitter is scalar-only and tail-branches to the target, so narrow returns
+  // and buffer helper calls fall back.
+  if (has_buffer_arg) {
+    *out_reason = "buffer args are not yet supported on RISC-V 64";
+    return false;
+  }
+  if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 ||
+      fn.return_type == &ffi_type_sint16 ||
+      fn.return_type == &ffi_type_uint16) {
+    *out_reason = "narrow returns are not yet supported on RISC-V 64";
+    return false;
+  }
+  if (gp_count > 7 || fp_count > 8) {
+    *out_reason = "argument count exceeds RISC-V 64 register limit";
+    return false;
+  }
+#elif defined(__s390x__)
+  // Linux s390x: r2 is occupied by V8's receiver, leaving r3..r6 for incoming
+  // user GP arguments. FP arguments are already in f0, f2, f4, and f6. The
+  // current emitter is scalar-only and tail-branches to the target, so narrow
+  // returns and buffer helper calls fall back.
+  if (has_buffer_arg) {
+    *out_reason = "buffer args are not yet supported on s390x";
+    return false;
+  }
+  if (fn.return_type == &ffi_type_sint8 || fn.return_type == &ffi_type_uint8 ||
+      fn.return_type == &ffi_type_sint16 ||
+      fn.return_type == &ffi_type_uint16) {
+    *out_reason = "narrow returns are not yet supported on s390x";
+    return false;
+  }
+  if (gp_count > 4 || fp_count > 4) {
+    *out_reason = "argument count exceeds s390x register limit";
+    return false;
+  }
+#elif defined(__x86_64__)
   // x86_64 SysV: the V8 receiver occupies rdi, leaving rsi, rdx, rcx, r8, r9
   // (5 incoming GP slots); scalar signatures can load one more user GP arg
   // from the caller stack, for an effective cap of 6 GP. FP args use
@@ -352,7 +447,6 @@ bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason) {
     *out_reason = "argument count exceeds x86_64 SysV register limit";
     return false;
   }
-#endif  // _WIN32
 #endif  // __x86_64__
 
   *out_reason = "";
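The per-ABI limits in `IsFastCallEligible()` come in two shapes. Win64 caps the total number of public arguments at three, while the other new emitters cap GP and FP counts separately. The difference follows from how each ABI assigns registers. The C++17 sketch below contrasts Win64's single position counter with RISC-V LP64D's independent counters and maps each public argument to the register it arrives in after V8's receiver. `ArgKind`, `IncomingWin64`, and `IncomingRiscv64` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <stddef.h>
#include <string>
#include <vector>

enum class ArgKind { kGp, kFp };

// Win64: one position counter shared by integer and FP arguments. Position p
// is rcx/rdx/r8/r9 for an integer or xmm<p> for a float. The receiver takes
// position 0, leaving three positions for public arguments.
std::optional<std::vector<std::string>> IncomingWin64(
    const std::vector<ArgKind>& args) {
  static const char* const kGpByPosition[] = {"rcx", "rdx", "r8", "r9"};
  if (args.size() > 3) return std::nullopt;
  std::vector<std::string> regs;
  for (size_t i = 0; i < args.size(); i++) {
    const size_t position = i + 1;
    regs.push_back(args[i] == ArgKind::kGp
                       ? std::string(kGpByPosition[position])
                       : "xmm" + std::to_string(position));
  }
  return regs;
}

// RISC-V LP64D: independent GP and FP counters. The receiver takes a0, so
// integers start at a1 while floats still start at fa0. Full LP64D would pass
// extra floats in GP registers; this sketch rejects them instead, mirroring
// the `gp_count > 7 || fp_count > 8` check.
std::optional<std::vector<std::string>> IncomingRiscv64(
    const std::vector<ArgKind>& args) {
  size_t next_gp = 1;
  size_t next_fp = 0;
  std::vector<std::string> regs;
  for (ArgKind kind : args) {
    if (kind == ArgKind::kGp) {
      if (next_gp > 7) return std::nullopt;  // a1..a7 exhausted
      regs.push_back("a" + std::to_string(next_gp++));
    } else {
      if (next_fp > 7) return std::nullopt;  // fa0..fa7 exhausted
      regs.push_back("fa" + std::to_string(next_fp++));
    }
  }
  return regs;
}
```

For a signature of `(int64, double, int64)`, Win64 delivers the arguments in `rdx`, `xmm2`, and `r9`, while RISC-V LP64D uses `a1`, `fa0`, and `a2`.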
diff --git a/src/ffi/types.h b/src/ffi/types.h
index d49619a19b71cb..a68549bc629853 100644
--- a/src/ffi/types.h
+++ b/src/ffi/types.h
@@ -60,11 +60,13 @@ bool SignaturesMatch(const FFIFunction& fn,
 // Eligibility checks: every arg type and the return type are
 // numeric-or-pointer, no `function`-typed args/return, arg count
 // within V8 fast-call cap (8), and register-passed arg counts within
-// per-ABI limits. Trampoline emitters currently exist only for AArch64
-// (≤ 7 GP + ≤ 8 FP) and x86_64 SysV (≤ 6 GP + ≤ 8 FP; buffer args cap GP at
-// 5 and cannot coexist with FP args). Platforms without an emitter
-// (including Win64) are reported ineligible so the caller falls back to
-// libffi.
+// per-ABI limits. Trampoline emitters currently exist for AArch64
+// (≤ 7 GP + ≤ 8 FP); x86_64 SysV (≤ 6 GP + ≤ 8 FP; buffer args cap GP at 5
+// and cannot coexist with FP args); Win64 x64 (≤ 3 register-only scalar
+// args); PPC64LE ELFv2, LoongArch64, and RISC-V 64 (≤ 7 GP + ≤ 8 FP scalar
+// args, no narrow returns); and s390x (≤ 4 GP + ≤ 4 FP scalar args, no
+// narrow returns). Platforms without an emitter are reported ineligible so
+// the caller falls back to libffi.
 bool IsFastCallEligible(const FFIFunction& fn, const char** out_reason);
 
 // True if the FFI type can be read from / written to a raw byte buffer