What Actually Changed in Game Porting Toolkit 4
We inspect GPTK internals as part of building Velocity. Here's what we found in GPTK 4 — no hype, just binary evidence and what each change means for game compatibility.
Key findings
- D3DMetal.framework: shader pipeline restructured; two-phase compilation replaces single-pass DXIL→MSL
- nvngx-on-metalfx: NGX shim that routes DLSS calls from games to MetalFX; new in GPTK 4
- DLSS Frame Generation: NVSDK_NGX_Feature_FrameGeneration callbacks wired to Metal 4 frame interpolator
- MTLFrameInterpolationDescriptor: Metal 4 first-class API for Neural Engine temporal frame synthesis
- MTLMLComputePipeline: new MSL headers exposing lower-level Neural Engine programming interface
Cyberpunk 2077 · M3 Max · nvngx-on-metalfx shim routing DLSS calls to MetalFX · Frame Gen doubling render rate via Metal 4
D3DMetal: the shader pipeline changed
D3DMetal.framework is the core of GPTK — it translates Direct3D API calls to Metal in real time. In GPTK 4, the shader compilation pipeline is structurally different from GPTK 3.
GPTK 3 compiled DXIL (DirectX Intermediate Language) to MSL (Metal Shading Language) at first encounter — on the hot path, during gameplay. When a game triggered a new shader permutation for the first time, you'd see a frame hitch while D3DMetal transpiled it. Heavy open-world titles were the worst case: hundreds of unique shader permutations scattered across load screens and new areas.
GPTK 4 splits this into two phases. A fast-path transpiler handles the hot path — producing a working MSL shader immediately — while a background optimizer processes the shader cache asynchronously to produce a refined version. The next time that permutation runs, it uses the optimized version. The initial stutter is substantially reduced; the second run is clean.
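The two-phase idea can be illustrated with a small toy model. This is only a sketch of the pattern described above, not D3DMetal's actual code: the class name, method names, and string "shaders" are all hypothetical stand-ins.

```python
import queue
import threading


class TwoPhaseShaderCache:
    """Toy model: fast first compile, background optimization, refined reuse."""

    def __init__(self):
        self._cache = {}                 # permutation key -> (quality, shader)
        self._lock = threading.Lock()
        self._pending = queue.Queue()    # permutations waiting for the optimizer
        worker = threading.Thread(target=self._optimize_loop, daemon=True)
        worker.start()

    def _fast_transpile(self, dxil):
        # Hypothetical stand-in for the quick pass: usable output, not tuned.
        return f"msl_fast({dxil})"

    def _optimize(self, dxil):
        # Hypothetical stand-in for the slower pass that refines the shader.
        return f"msl_optimized({dxil})"

    def get(self, key, dxil):
        """Return (quality, shader) without ever waiting on the optimizer."""
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            entry = ("fast", self._fast_transpile(dxil))
            self._cache[key] = entry
        self._pending.put((key, dxil))   # refine later, off the hot path
        return entry

    def _optimize_loop(self):
        while True:
            key, dxil = self._pending.get()
            refined = ("optimized", self._optimize(dxil))
            with self._lock:
                self._cache[key] = refined
            self._pending.task_done()

    def wait_for_optimizer(self):
        """Block until the background queue is drained (demo and test helper)."""
        self._pending.join()


cache = TwoPhaseShaderCache()
first = cache.get("water_ps_variant_3", "dxil_blob")   # first encounter
cache.wait_for_optimizer()
second = cache.get("water_ps_variant_3", "dxil_blob")  # later encounter
print(first[0], second[0])
```

The first lookup pays only for the quick pass, and the later lookup picks up whatever the background worker has finished by then. A real implementation would also need to persist the cache and handle eviction, which this sketch ignores.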
The second meaningful D3DMetal change is in unified memory handling. On Apple Silicon, the CPU and GPU share the same physical memory pool. GPTK 3 allocated certain resources in MTLStorageModePrivate (GPU-only) and then copied them to CPU-accessible memory unnecessarily, a round-trip that costs time and memory bandwidth. GPTK 4 allocates these resources as MTLStorageModeShared from the start, eliminating the copy. The improvement is most visible in CPU-heavy DX12 titles that do frequent buffer readbacks.
Root signature handling also improved. DX12 titles that use large descriptor tables — common in modern AAA engines using bindless resource models — hit a translation bottleneck in GPTK 3 when the argument buffer layout couldn't represent the full descriptor range efficiently. GPTK 4 restructures the argument buffer encoding for these cases, reducing per-draw overhead in bindless titles like Starfield and Cyberpunk 2077.
nvngx-on-metalfx: the DLSS shim
The most significant new component in GPTK 4 is nvngx-on-metalfx. This is a drop-in replacement for NVIDIA's NGX SDK that routes all DLSS calls from Windows games to Apple's MetalFX framework.
Every DLSS-enabled game loads nvngx.dll at startup to communicate with NVIDIA's upscaling and AI infrastructure. In GPTK 4, when the game requests this DLL, the shim is substituted transparently. The game's DLSS code path runs unchanged. The shim handles the translation.
The shim exports the full NGX DX12 function surface:
NVSDK_NGX_D3D12_Init
NVSDK_NGX_D3D12_Init_ProjectID
NVSDK_NGX_D3D12_CreateFeature
NVSDK_NGX_D3D12_EvaluateFeature
NVSDK_NGX_D3D12_ReleaseFeature
NVSDK_NGX_D3D12_DestroyFeature
NVSDK_NGX_D3D12_Shutdown
NVSDK_NGX_D3D12_Shutdown1
NVSDK_NGX_D3D12_GetCapabilityParameters
NVSDK_NGX_D3D12_GetScratchBufferSize
NVSDK_NGX_DLSS_GetOptimalSettingsCallback
NVSDK_NGX_DLSS_GetStatsCallback
This is the complete NGX SDK export surface for DX12 titles, the same interface NVIDIA publishes in their public NGX SDK headers. The shim implements feature creation for both NVSDK_NGX_Feature_SuperSampling (DLSS upscaling) and NVSDK_NGX_Feature_FrameGeneration (DLSS Frame Generation), routing each to a different MetalFX codepath.
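The per-feature routing can be pictured as a lookup from the requested feature to a backend. The sketch below is an illustration only: the feature names come from the text above, while the backend labels, the function name, and the exception type are hypothetical and do not reflect the shim's real internals.

```python
class UnsupportedFeature(Exception):
    """Raised when a requested feature has no backend in this toy model."""


# Feature names as quoted in the article; the backend labels on the right
# are made-up placeholders for the two MetalFX codepaths.
ROUTES = {
    "NVSDK_NGX_Feature_SuperSampling": "metalfx_temporal_upscaler",
    "NVSDK_NGX_Feature_FrameGeneration": "metal4_frame_interpolator",
}


def create_feature(feature_name):
    """Pick a backend for the requested feature, or fail loudly."""
    try:
        return ROUTES[feature_name]
    except KeyError:
        raise UnsupportedFeature(feature_name) from None


print(create_feature("NVSDK_NGX_Feature_SuperSampling"))
print(create_feature("NVSDK_NGX_Feature_FrameGeneration"))
```

Keeping the mapping in one table makes it obvious which features are handled and which are not, which matters when a game probes for a feature the shim does not implement.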
DLSS Frame Generation: what changed from GPTK 3
GPTK 3 implemented NGX only partially. NVSDK_NGX_Feature_SuperSampling worked: DLSS upscaling (Quality, Balanced, Performance modes) ran via MetalFX Temporal. But NVSDK_NGX_Feature_FrameGeneration — the DLSS 3.x Frame Generation API — was either absent or a stub. Games using DLSS 3 Frame Generation would either crash when initializing the feature, or fall back to rendering every frame.
GPTK 4 adds working Frame Generation callbacks. The implementation bridges the DLSS Frame Generation API to Metal 4's MTLFrameInterpolationDescriptor:
- The game submits rendered frames N and N−1 plus motion vectors via the DLSS Frame Generation evaluate callback
- D3DMetal extracts the motion vectors and depth buffer from the DX12 pipeline automatically
- Metal 4's Neural Engine generates a synthetic intermediate frame using temporal analysis
- The game's present queue receives the interpolated frame at the doubled cadence
The game doesn't need to know it's running on Apple hardware. From the game's perspective, it called the DLSS Frame Generation API and got a synthetic frame back. The implementation detail — Neural Engine instead of NVIDIA hardware — is opaque.
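The effect on presentation order can be sketched in a few lines. This is a deliberately simplified model: frames are short lists of numbers, and the synthetic frame is a plain average of its neighbours, which is a stand-in for the motion-vector-driven synthesis described above and not how the real interpolator works. All names are hypothetical.

```python
def interpolate(prev_frame, next_frame):
    # Placeholder synthesis: per-pixel average of the two rendered frames.
    # The real path is described as using motion vectors and depth instead.
    return [(a + b) / 2 for a, b in zip(prev_frame, next_frame)]


def present_sequence(rendered):
    """Return frames in presentation order with one generated frame
    inserted between each pair of consecutive rendered frames."""
    presented = []
    for i, frame in enumerate(rendered):
        if i > 0:
            presented.append(("generated", interpolate(rendered[i - 1], frame)))
        presented.append(("rendered", frame))
    return presented


rendered_frames = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
for kind, frame in present_sequence(rendered_frames):
    print(kind, frame)
```

Three rendered frames become five presented frames, so the presented rate approaches twice the rendered rate as the sequence grows. Note that the generated frame can only be produced once the newer rendered frame exists, which is why interpolation of this kind adds latency.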
One important constraint: Frame Generation requires DLSS 3.x or later. FSR 3 Frame Generation (AMD's implementation) uses a different API and is not covered by this shim. The same applies to XeSS Frame Generation (Intel). GPTK 4 is DLSS-first for frame interpolation.
Metal 4 Frame Interpolator
Metal 4 introduces MTLFrameInterpolationDescriptor as a first-class framework API. This is what the DLSS Frame Generation shim routes to, but it's also available independently for games that call MetalFX directly.
The descriptor accepts:
- Two source textures (rendered frames N and N−1)
- A motion vector texture (in screen-space pixel units)
- An optional depth texture (improves synthesis at occlusion boundaries)
- Display timing information (target frame interval in nanoseconds)
The Neural Engine processes these inputs and returns a synthesized intermediate frame. The synthesis uses temporal optical flow — the same class of algorithm NVIDIA uses for DLSS Frame Generation — but implemented on Apple's ML hardware rather than NVIDIA's Optical Flow Accelerator.
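To make the input set concrete, here is a plain-data model of the four inputs listed above. It is not the Metal API: the class and field names are hypothetical, and the rule that both source frames share dimensions is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Texture:
    # Minimal stand-in for a GPU texture; only the size matters here.
    width: int
    height: int
    label: str


@dataclass(frozen=True)
class FrameInterpolationInputs:
    # Hypothetical field names mirroring the list in the article.
    previous_frame: Texture              # rendered frame N-1
    current_frame: Texture               # rendered frame N
    motion_vectors: Texture              # screen-space pixel units
    target_frame_interval_ns: int        # display timing
    depth: Optional[Texture] = None      # optional, helps at occlusion edges

    def validate(self):
        prev_size = (self.previous_frame.width, self.previous_frame.height)
        cur_size = (self.current_frame.width, self.current_frame.height)
        if prev_size != cur_size:
            # Assumption for this sketch, not a documented requirement.
            raise ValueError("source frames must have matching dimensions")
        if self.target_frame_interval_ns <= 0:
            raise ValueError("target frame interval must be positive")


def frame_interval_ns(target_hz):
    """Convert a presentation rate in Hz to a frame interval in nanoseconds."""
    return round(1_000_000_000 / target_hz)


inputs = FrameInterpolationInputs(
    previous_frame=Texture(2560, 1440, "frame N-1"),
    current_frame=Texture(2560, 1440, "frame N"),
    motion_vectors=Texture(2560, 1440, "motion"),
    target_frame_interval_ns=frame_interval_ns(120),
)
inputs.validate()
print(inputs.target_frame_interval_ns)
```

Leaving depth optional with a default of None mirrors the description above, where depth improves results but is not required.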
This is distinct from MetalFX Temporal upscaling in a key way. MetalFX Temporal accumulates data across frames to improve the quality of a single frame — higher resolution from lower-resolution inputs. Metal 4 Frame Interpolation creates an entirely new frame that was never rendered. The two can run simultaneously: render at 67% resolution, upscale with MetalFX Temporal to native, then interpolate to double the frame rate.
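A quick back-of-the-envelope calculation shows why the combination is attractive. The sketch assumes that "67% resolution" means 67% along each axis (so the pixel count scales with the square), that one generated frame is inserted per rendered frame, and it uses a 4K target and a 45 fps render rate purely as example numbers.

```python
def shaded_pixel_fraction(render_scale):
    # Scaling both axes by render_scale scales the pixel count by its square.
    return render_scale ** 2


def presented_fps(rendered_fps, generated_per_rendered=1):
    # Each rendered frame is followed by this many generated frames.
    return rendered_fps * (1 + generated_per_rendered)


native = (3840, 2160)          # example output resolution
scale = 0.67                   # per-axis render scale (assumed meaning)
render_size = (round(native[0] * scale), round(native[1] * scale))

print(render_size)                         # internal render resolution
print(shaded_pixel_fraction(scale))        # share of native pixels shaded
print(presented_fps(45))                   # presented rate for 45 rendered fps
```

Under these assumptions the GPU shades roughly 45% of the native pixel count per rendered frame while the display receives twice as many frames as were rendered. Actual gains depend on the cost of the upscale and interpolation passes, which this arithmetic leaves out.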
ML Pipeline headers
GPTK 4 ships new Metal Shading Language headers for a lower-level Neural Engine programming interface. These are separate from Core ML and the Foundation Models framework — they're MSL types and pipeline objects designed for authors writing Metal shaders that need to dispatch Neural Engine inference inline with GPU work.
Key additions in the headers:
- MTLMLComputePipeline: a new pipeline type for ML workloads, distinct from the existing MTLComputePipelineState
- MTLMLComputeCommandEncoder: for encoding Neural Engine dispatch commands within a Metal command buffer, interleaved with GPU commands
- metal::ml_tensor_t: a tensor type usable in MSL shaders, enabling on-device inference from within a shader function
The most likely near-term use in the GPTK 4 context is shader denoising: running the denoiser on the Neural Engine from within a Metal shader, rather than dispatching through a separate Core ML model pass. This would reduce the pipeline complexity and latency of the existing MetalFX denoising path.
What this means for game compatibility
DLSS upscaling: Works for all titles using DLSS 2.x, 3.x, and 3.5. Routed through nvngx-on-metalfx to MetalFX Temporal. This covers the majority of AAA titles released in the last three years.
DLSS Frame Generation: Works for titles using DLSS 3.x Frame Generation. Routed to Metal 4 Frame Interpolator. Requires macOS Tahoe 26.
DLSS 4 (Transformer model): Not yet present in the shim. Games that specifically require DLSS 4's transformer-based model will fall back to DLSS 3 behavior — the NGX feature negotiation handles this transparently.
FSR and XeSS: Not covered by these changes. AMD and Intel upscaling APIs remain unaffected — they go through D3DMetal's compute shader path as before.
Anti-cheat: GPTK 4 includes improvements to the Mach exception handler that services ring-0 verification checks from EasyAntiCheat and BattlEye. The mechanism is unchanged from GPTK 3 — the exception handler catches privileged instruction faults and returns plausible hardware state — but GPTK 4 handles more of the specific instruction patterns these drivers use in their current versions. More protected titles should now reach the main menu.
Research performed by the Velocity team at Skyfire Works. Velocity is a Mac gaming launcher that sits above the GPTK translation layer, handling compatibility, optimization, and performance intelligence that Apple's tools don't provide.
