NVIDIA Boosts Path Tracing Performance 3x with Advanced ReSTIR Algorithms for Next-Gen Gaming

NVIDIA has unveiled a set of enhancements to its ReSTIR path tracing algorithm that it says boost Path Tracing performance by 2-3x, paving the way for the future of gaming graphics.

Ray Tracing: A Trendsetter for Path Tracing Advances by NVIDIA

Path Tracing is increasingly being adopted by PC games to deliver the visual fidelity associated with next-gen experiences. NVIDIA, a frontrunner in graphics technology, is leading the charge in bringing Path Tracing to the PC platform. However, as in the early days of Ray Tracing, Path Tracing currently demands high-end hardware. Even the powerful RTX 5090 struggles to deliver playable frame rates, reaching only 30-40 FPS in many titles and relying heavily on DLSS upscaling and frame generation.

Ray Tracing began its journey on PC and has progressively become more efficient on modern hardware. Consoles have adopted Ray Tracing as well, though mostly in quality modes that still fall short of 60 FPS in most scenarios.

A comparison image showing 'Original ReSTIR PT (37.1 ms) FLIP: 0.321' on the left and 'ReSTIR PT Enhanced (12.6 ms) FLIP: 0.263' on the right, highlighting the faster rendering time and reduced visual noise (FLIP measures image error, so lower is better).
Image Source: NVIDIA

In a research paper titled “ReSTIR PT Enhanced: Algorithmic Advances for Faster and More Robust ReSTIR Path Tracing,” NVIDIA outlines a suite of ReSTIR algorithm improvements designed to raise Path Tracing performance. These innovations can deliver a 2-3x speedup while reducing the visual artifacts common in current Path Tracing and Ray Tracing output.
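
ReSTIR is built on resampled importance sampling (RIS): many cheap candidate samples are streamed through a small weighted "reservoir" that keeps just one of them. As a rough illustration, here is a minimal Python sketch of that textbook building block on a toy 1D integral. It is not NVIDIA's code; the names `Reservoir`, `resample`, and `contribution_weight`, and the use of plain floats in place of light paths, are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Reservoir:
    """Streaming weighted reservoir that keeps a single selected sample."""
    sample: Optional[float] = None  # currently kept candidate (a path in a real renderer)
    w_sum: float = 0.0              # running sum of resampling weights
    m: int = 0                      # number of candidates streamed so far

    def update(self, candidate: float, weight: float, rng: random.Random) -> None:
        # Keep the new candidate with probability weight / w_sum; over the whole
        # stream, each candidate is kept with probability proportional to its weight.
        self.w_sum += weight
        self.m += 1
        if weight > 0 and rng.random() < weight / self.w_sum:
            self.sample = candidate


def resample(candidates: Sequence[float],
             source_pdf: Callable[[float], float],
             target: Callable[[float], float],
             rng: random.Random) -> Reservoir:
    """RIS: stream candidates drawn from source_pdf, weighted by target / source_pdf."""
    r = Reservoir()
    for x in candidates:
        r.update(x, target(x) / source_pdf(x), rng)
    return r


def contribution_weight(r: Reservoir, target: Callable[[float], float]) -> float:
    """Unbiased contribution weight W = w_sum / (M * target(sample))."""
    if r.sample is None or target(r.sample) == 0:
        return 0.0
    return r.w_sum / (r.m * target(r.sample))


# Toy usage: estimate the integral of f(x) = x^2 over [0, 1) (exact value 1/3).
rng = random.Random(42)
xs = [rng.random() for _ in range(32)]  # candidates from a uniform source pdf
res = resample(xs, lambda x: 1.0, lambda x: x * x, rng)
estimate = res.sample ** 2 * contribution_weight(res, lambda x: x * x)
print(f"kept {res.sample:.3f}, estimate {estimate:.3f}")
```

Because the target here equals the integrand, the estimate collapses to w_sum / M, the plain Monte Carlo average; in a renderer the target is a cheaper approximation of the true path contribution, which is where resampling pays off.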

A collage compares 'Original ReSTIR PT' with 'ReSTIR PT Enhanced' across three scenes, 'Watercolor', 'Zero Day', and 'Crown', showing rendering times and FLIP values, with the enhanced version rendering faster and with better image quality.
Image Source: NVIDIA

NVIDIA’s enhanced Path Tracing algorithms are approaching what the company calls “Production Ready,” cutting the cost of spatial reuse in half. The advancements also improve overall performance and quality by unifying direct and global illumination and by reducing color noise and disocclusion noise. The advancements featured in the algorithm include:

  • Lower shift mapping costs for spatial reuse, achieved through selective neighbor choice.
  • Dynamic ray footprint thresholds that adapt to different scenes and materials.
  • Fewer correlation artifacts, thanks to sample duplication maps.
  • Additional optimizations that improve stability and performance by reducing color and disocclusion noise.
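
Spatial reuse works by merging a pixel's reservoir with those of its neighbors. The Python sketch below shows a simplified merge under loudly stated assumptions: all pixels share one target function and sample domain, so the shift mapping that the paper optimizes is skipped entirely, and the `accept` callback is a hypothetical stand-in for selective neighbor choice rather than the paper's actual criterion.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reservoir:
    sample: Optional[float] = None  # kept sample (a path in a real renderer)
    w_sum: float = 0.0              # running sum of resampling weights
    m: int = 0                      # number of candidates this reservoir represents
    W: float = 0.0                  # unbiased contribution weight of the kept sample


def combine(reservoirs: List[Reservoir],
            target: Callable[[float], float],
            accept: Callable[[Reservoir], bool],
            rng: random.Random) -> Reservoir:
    """Merge reservoirs via weighted reservoir sampling.

    Simplifications (assumptions, not the paper's method):
      * every reservoir shares one target function and sample domain, so no
        shift mapping or Jacobian is needed and 1/M normalization adds no bias;
      * `accept` is a hypothetical neighbor filter; rejected reservoirs are
        skipped before any (in reality costly) shift mapping work.
    """
    out = Reservoir()
    for q in reservoirs:
        if q.sample is None or not accept(q):
            continue  # empty or rejected neighbor: no work spent on it
        weight = target(q.sample) * q.W * q.m
        out.w_sum += weight
        out.m += q.m
        if weight > 0 and rng.random() < weight / out.w_sum:
            out.sample = q.sample
    if out.sample is not None and target(out.sample) > 0:
        out.W = out.w_sum / (out.m * target(out.sample))
    return out


def square(x: float) -> float:
    return x * x


# Hypothetical center pixel plus two neighbors (one of them empty).
center = Reservoir(sample=0.5, w_sum=1.0, m=2, W=2.0)
neighbors = [Reservoir(), Reservoir(sample=1.0, w_sum=3.0, m=3, W=1.0)]
merged = combine([center] + neighbors, square, lambda q: q.m >= 1, random.Random(5))
print(merged)
```

In a real path tracer each accepted neighbor's path must first be shifted into the current pixel before its weight can be evaluated, which is why rejecting unpromising neighbors up front can save work.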

A table titled 'Frame and pass costs (in milliseconds) averaged over four scenes' shows that the '+Unify DI & GI (Section 6.1)' configuration achieves the lowest total frame cost, at 13.04 milliseconds.
Image Source: NVIDIA

According to the paper, Table 1 shows the performance of the new techniques, with each row adding one feature or optimization on top of a baseline built from Lin et al.’s [2022] public source code. The cost-reduction techniques alone provide an average 2.74× speedup across the four test scenes, which were chosen to span a range of geometry and material complexity. Per-scene results are provided in the paper’s supplemental material.

To provide further insight into the effect of the low-level GPU optimizations, the researchers profiled the Opera House scene using Nsight Graphics. The profiler data indicate that the optimizations in Sections 6.2.1–6.2.3 reduce thread divergence and improve GPU computation efficiency. Specifically:

  • SM warp occupancy increases from 22.4% → 31.1%
  • Active threads per warp increase from 15.3 → 19.9
  • Warp latency decreases from 347k → 241k cycles
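
To put the profiler numbers in relative terms, the short Python sketch below computes the percentage change for each metric and expresses active threads per warp as a share of a 32-thread NVIDIA warp. Treating that metric as an average over 32 lanes is an assumption about how Nsight defines it, and the helper names are illustrative.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def pct_change(before: float, after: float) -> float:
    """Relative change in percent (negative means a decrease)."""
    return 100.0 * (after - before) / before


def lane_utilization(active_threads: float) -> float:
    """Average share of a warp's 32 lanes that are active (assumed interpretation)."""
    return active_threads / WARP_SIZE


# Opera House profiling numbers quoted above (Nsight Graphics).
baseline = {"SM warp occupancy (%)": 22.4, "active threads/warp": 15.3, "warp latency (k cycles)": 347}
optimized = {"SM warp occupancy (%)": 31.1, "active threads/warp": 19.9, "warp latency (k cycles)": 241}

for metric, before in baseline.items():
    print(f"{metric}: {pct_change(before, optimized[metric]):+.1f}%")
print(f"lane utilization: {lane_utilization(15.3):.1%} -> {lane_utilization(19.9):.1%}")
```

Run as written, it reports roughly +38.8% occupancy, +30.1% active threads, and -30.5% warp latency, with lane utilization rising from about 47.8% to 62.2%.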

All of this occurs without changing sampler behavior. Applying Russian roulette (Section 6.2.4) further improves these metrics to:

  • 34.9% occupancy
  • 20.6 active threads per warp
  • 82k cycles latency
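
Russian roulette is a standard path tracing technique: a path is randomly terminated, and surviving paths are reweighted by 1/p so the estimate stays unbiased. The Python sketch below shows the generic technique only; the survival heuristic (throughput clamped to a minimum probability) and the function name `russian_roulette` are assumptions, and the paper’s Section 6.2.4 may differ.

```python
import random
from typing import Tuple


def russian_roulette(throughput: float, rng: random.Random,
                     min_survival: float = 0.05) -> Tuple[bool, float]:
    """Randomly terminate a path; reweight survivors so the estimate stays unbiased.

    The survival probability (throughput clamped to [min_survival, 1]) is an
    assumed heuristic, not necessarily the one used in the paper.
    """
    p = min(1.0, max(min_survival, throughput))
    if rng.random() >= p:
        return False, 0.0            # path terminated: contributes nothing further
    return True, throughput / p      # survivor compensates for terminated paths


# Expected value check: E[new throughput] = p * (throughput / p) = throughput.
rng = random.Random(123)
trials = 100_000
mean = sum(russian_roulette(0.3, rng)[1] for _ in range(trials)) / trials
print(f"mean reweighted throughput: {mean:.3f} (input 0.3)")
```

Because low-throughput paths are often cut short, fewer bounces are traced overall, at the price of some extra variance from the reweighted survivors.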

Because each ReSTIR pass requires two sets of reservoirs to support temporal reuse, NVIDIA’s changes reduce per-pixel storage from 2 × (88 + 16) bytes in the baseline implementation (which uses 16-byte reservoirs for ReSTIR DI) to 2 × 64 bytes. At a 1920×1080 render resolution, this lowers memory consumption from 431 MB to 265 MB.
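
The memory figures above follow directly from the per-pixel reservoir sizes. This small Python check reproduces them, assuming the article’s MB values are decimal megabytes (10^6 bytes); the helper name is illustrative.

```python
WIDTH, HEIGHT = 1920, 1080
RESERVOIR_SETS = 2  # two reservoir sets per pixel to support temporal reuse


def reservoir_memory_mb(bytes_per_set: int) -> float:
    """Total reservoir memory in decimal megabytes (1 MB = 10**6 bytes)."""
    return RESERVOIR_SETS * bytes_per_set * WIDTH * HEIGHT / 1e6


baseline = reservoir_memory_mb(88 + 16)  # baseline: 88 bytes + 16-byte ReSTIR DI reservoir
enhanced = reservoir_memory_mb(64)       # enhanced: 64 bytes per set
print(f"{baseline:.0f} MB -> {enhanced:.0f} MB")
```

It prints `431 MB -> 265 MB`, matching the numbers above.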

GPU Optimization Results Compared to Lin et al. [2022]

  • Baseline (Lin et al. [2022]): 22.4% SM warp occupancy, 15.3 active threads per warp, 347k cycles warp latency, 1.0× speedup. Public source code baseline.
  • Low-level GPU optimizations (Sec. 6.2.1–6.2.3): 31.1% SM warp occupancy, 19.9 active threads per warp, 241k cycles warp latency, 2.74× speedup (average across four scenes). Reduced thread divergence, improved efficiency.
  • + Russian roulette (Sec. 6.2.4): 34.9% SM warp occupancy, 20.6 active threads per warp, 82k cycles warp latency. Further efficiency gains.
  • + New thresholds (Sec. 4, 5, 6): Scene-independent reconnection criteria that improve shift mapping quality.
  • All improvements (decorrelation, noise reduction): 2.30× speedup. Adds 19% cost compared with the fastest version, but is still faster than the baseline.
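
The 19% figure in the last row can be cross-checked from the two speedups, since frame time scales with 1/speedup. The Python sketch below does that and also computes the speedup shown in the first comparison image; treating the averaged 2.74× and 2.30× figures as if they described a single frame is a simplifying assumption, and the helper names are illustrative.

```python
def speedup(baseline_ms: float, new_ms: float) -> float:
    """How many times faster the new frame time is than the baseline."""
    return baseline_ms / new_ms


def extra_cost(fast_speedup: float, slow_speedup: float) -> float:
    """Fractional frame-time increase of the slower variant over the faster one.

    Frame time is proportional to 1 / speedup, so the ratio of frame times
    is fast_speedup / slow_speedup.
    """
    return fast_speedup / slow_speedup - 1.0


print(f"full vs fastest: +{extra_cost(2.74, 2.30):.0%} frame time")  # 2.74x vs 2.30x from the table
print(f"first comparison image: {speedup(37.1, 12.6):.2f}x faster")   # 37.1 ms -> 12.6 ms
```

The ratio 2.74 / 2.30 ≈ 1.19 matches the 19% overhead listed for the full configuration, and the first comparison image corresponds to a speedup of about 2.94×.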

NVIDIA’s advancements promise a significant leap in Path Tracing capabilities, building on the gains already delivered by the RTX 40 and RTX 50 GPU series. Looking ahead, NVIDIA intends to incorporate Neural Rendering techniques and AI algorithms to further improve performance on its gaming hardware and push next-gen visuals even further.
