Intel Launches Texture Set Neural Compression SDK: Achieve Textures Up to 18x Smaller

During GDC 2026, Marissa Dubois, a graphics engineer at Intel, presented Intel’s approach to neural texture compression, which resembles NVIDIA’s NTC. The talk followed up on the R&D prototype Intel demonstrated at GDC 2025 and revealed that the technology has since become a fully productized standalone software development kit (SDK).

Dubbed Texture Set Neural Compression (TSNC), the method is a different way of storing the textures used in games. Conventional GPU block compression formats, BC1 through BC7, apply fixed algorithms. They are fast and universally supported, but they leave substantial compression potential unused. TSNC instead relies on machine learning: a small neural network is trained with stochastic gradient descent to encode and decode one specific texture set. The result is a compact latent-space representation that a small multi-layer perceptron reconstructs at runtime, recovering the original texture data, including the diffuse, normal, roughness, metallic, ambient occlusion, and emissive maps.

The image is a diagram titled 'Neural Compression 101' detailing the process of compressing input data through an 'Encoder' into 'Latent space values' and decompressing it with a 'Decoder' to produce 'Output data', with information about discovering model weights for the encoder and decoder networks.
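The decoder side of this pipeline can be illustrated with a minimal sketch of a multi-layer perceptron forward pass in plain Python. It is not Intel’s implementation: the layer sizes, the ReLU activation, and the helper names (`dense`, `decode_texel`, `placeholder_layer`) are assumptions made for illustration, and the random weights stand in for values that training with stochastic gradient descent would produce.

```python
import random

def dense(weights, bias, v):
    """Fully connected layer: out[i] = bias[i] + sum_j weights[i][j] * v[j]."""
    return [b + sum(w * x for w, x in zip(row, v)) for row, b in zip(weights, bias)]

def decode_texel(latent, layers):
    """Push one latent vector through the MLP, with ReLU on every hidden layer."""
    v = list(latent)
    for index, (weights, bias) in enumerate(layers):
        v = dense(weights, bias, v)
        if index < len(layers) - 1:          # no activation on the output layer
            v = [max(0.0, x) for x in v]
    return v

def placeholder_layer(rng, n_in, n_out):
    """Random weights standing in for values that training would produce."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, HIDDEN_DIM, OUTPUT_CHANNELS = 12, 16, 12

rng = random.Random(0)
layers = [
    placeholder_layer(rng, LATENT_DIM, HIDDEN_DIM),
    placeholder_layer(rng, HIDDEN_DIM, OUTPUT_CHANNELS),
]
latent = [rng.random() for _ in range(LATENT_DIM)]
texel = decode_texel(latent, layers)
print(len(texel), "decoded channel values for one texel")
```

The point of the sketch is the shape of the work: decoding one texel costs a handful of small matrix-vector products, which is why matrix hardware matters for runtime performance.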

The key observation behind TSNC is that a texture set, meaning all the PBR maps for one material, often contains overlapping data across its channels. TSNC exploits this redundancy, which standard block compression cannot do because it compresses each texture independently.

A comparison chart titled 'Feature Pyramid Comparisons' displays various texture maps and latent space variants for a pumpkin model with an Intel logo present.

The Two Tiers of Feature Pyramids

At the core of TSNC’s compression method is the feature pyramid, a set of four BC1-encoded latent-space textures stored at different resolutions. Intel offers two variants, each with a different trade-off between quality and compression efficiency:

  • Variant A uses two full-resolution latent images and two at half resolution. For a 4K input, that means two 4K and two 2K latent images, giving roughly 9.5x compression: the texture set shrinks from 256 MB to approximately 26.8 MB. The perceptual quality loss, measured with NVIDIA’s FLIP analysis tool, is around 5%, with minor impact on normal maps.
  • Variant B is more aggressive, reducing latent images to one half, one quarter, and one eighth of the initial resolution and achieving over 17x compression. This variant comes with noticeable quality degradation: BC1 artifacts become visible in normal maps and in the ambient occlusion and roughness channels. The perceptual error measured by FLIP is 6–7%, which Intel concedes is “enough to be noticeable to a viewer.” Consequently, Variant B is best suited to distant or secondary materials where detail preservation is less critical.

A chart titled 'TSNC Variant A Compression Ratio' indicates that TSNC achieves higher compression ratios (9.53 to 9.59x) compared to BCx (4.79 to 4.80x) across resolutions of 1k, 2k, and 4k.

A slide titled 'Compression Ratios' compares different compression formats, showing that TSNC achieves higher compression ratios of 17.85x to 18.05x compared to 4.79x to 4.80x for BCx, with a chart illustrating the data.
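The ratios above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below rests on several assumptions that the article does not state: the uncompressed set has 12 one-byte channels, every texture carries a full mip chain, network weights and metadata are ignored, and Variant B keeps one full-resolution latent alongside the half, quarter, and eighth resolution ones (the text lists only three resolutions for four latent images). BC1’s storage cost of 8 bytes per 4x4 block is standard; the function names are hypothetical.

```python
def mip_sizes(size):
    """Edge lengths of a full mip chain, from `size` down to 1 (power-of-two input)."""
    sizes = []
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes

def raw_bytes(size, channels):
    """Uncompressed size at one byte per channel, including all mips."""
    return sum(s * s * channels for s in mip_sizes(size))

def bc1_bytes(size):
    """BC1 stores each 4x4 block in 8 bytes; mips below 4x4 still occupy one block."""
    return sum(max(1, s // 4) ** 2 * 8 for s in mip_sizes(size))

BASE = 4096
CHANNELS = 12  # assumption: diffuse 3 + normal 3 + roughness/metallic/AO 3 + emissive 3

# Latent resolutions expressed as divisors of the base resolution.
VARIANT_A = [1, 1, 2, 2]   # two full, two half (as described in the article)
VARIANT_B = [1, 2, 4, 8]   # assumption: one full-resolution latent plus half/quarter/eighth

def pyramid_bytes(divisors, base=BASE):
    """Total BC1 storage for a feature pyramid with the given resolution divisors."""
    return sum(bc1_bytes(base // d) for d in divisors)

raw = raw_bytes(BASE, CHANNELS)
MIB = 1024 * 1024
for name, divisors in (("A", VARIANT_A), ("B", VARIANT_B)):
    size = pyramid_bytes(divisors)
    print(f"Variant {name}: {size / MIB:.1f} MiB, ratio {raw / size:.2f}x")
```

Under these assumptions the estimate comes to about 26.7 MiB and 9.6x for Variant A, and about 14.2 MiB and 18.1x for Variant B. Both land close to the quoted figures; the small gap presumably reflects the decoder weights and metadata that this estimate leaves out.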

Since the technology debuted as a PyTorch-based research prototype, Intel has rewritten the TSNC compressor from scratch using Slang compute shaders. The new architecture lets developers use the same decompression code across platforms, including Unreal Engine, custom engines, and CPU-based decompression.

On the GPU side, TSNC now supports Microsoft’s DirectX 12 Cooperative Vectors API, which lets it use the XMX matrix cores in Intel’s A-series and B-series GPUs for hardware-accelerated matrix inference. On systems without XMX support, the framework falls back to a standard FMA (fused multiply-add) path that works on both Intel and non-Intel architectures.

During her presentation, Dubois outlined four deployment strategies for the TSNC technology, each offering a different balance between memory utilization and disk space efficiency:

  • At install time: Compressed files are downloaded and then decompressed locally as part of installation. This saves distribution bandwidth, but the textures are stored uncompressed on the user’s drive.
  • At load time: Textures remain compressed on disk and are decompressed into VRAM during the game’s loading phase. This keeps the installation small, though the textures occupy VRAM in decompressed form.
  • At stream time: In conjunction with texture streaming, textures are decompressed on demand, balancing storage and memory efficiency at the cost of some runtime inference load.
  • At sample time: Textures remain permanently compressed in VRAM and are decoded per pixel within the shader, maximizing VRAM savings while incurring a constant inference cost.

Developers need to choose among these strategies based on their specific requirements and the engine they use.
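One way to keep the four options straight is to write them down as data. The sketch below is a reading of the descriptions above, not an SDK interface: the `DeploymentMode` class, its field names, and the flag values are hypothetical, and "compressed" here means the TSNC representation (the article does not say which format decoded textures are stored in).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentMode:
    name: str
    compressed_on_disk: bool   # does the installed game keep TSNC data on disk?
    compressed_in_vram: bool   # does VRAM hold TSNC latents rather than decoded textures?
    decode_point: str

# All four modes ship compressed data to the user; they differ in what happens afterwards.
MODES = [
    DeploymentMode("install time", False, False, "once, during installation"),
    DeploymentMode("load time", True, False, "while the game loads"),
    DeploymentMode("stream time", True, False, "on demand, alongside texture streaming"),
    DeploymentMode("sample time", True, True, "per pixel, inside the shader"),
]

def modes_where(flag):
    """Names of the modes for which the given boolean field is true."""
    return [mode.name for mode in MODES if getattr(mode, flag)]

print("Saves disk space:", modes_where("compressed_on_disk"))
print("Saves VRAM:", modes_where("compressed_in_vram"))
```

Read this way, each step down the list moves the decode later and keeps the data compressed for longer, in exchange for more inference work at runtime.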

A presentation slide titled 'Inference Time Estimates On Pantherlake B390 Built-In Graphics' features a bar chart indicating 'Avg. Nanoseconds Per Pixel (Lower is Better)' with the LinAlg algorithm achieving approximately 3.4x per-pixel speedup over FMA.

Intel’s benchmarks from a Panther Lake laptop utilizing B390 integrated graphics during a full 1080p compute shader workload yielded the following results:

  • FMA path: 0.661 nanoseconds per pixel
  • XMX linear algebra path: 0.194 nanoseconds per pixel

This is a 3.4x speedup attributable to hardware-accelerated matrix computation. Such results on integrated graphics suggest that per-pixel, sample-time deployment could be more feasible than previously anticipated, and overheads on discrete GPUs can be expected to be lower still.

Intel anticipates rolling out an Alpha version of the Texture Set Neural Compression SDK later this year, followed by beta testing and a public release, although the exact timelines remain unconfirmed.
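To put the per-pixel benchmark figures in frame-level terms, the following sketch converts them to milliseconds for a 1080p frame. It assumes exactly one decode per pixel per frame and uses a 60 fps budget purely as a reference point; neither assumption comes from Intel’s presentation, and real shaders may decode more or fewer samples.

```python
PIXELS_1080P = 1920 * 1080

def frame_cost_ms(ns_per_pixel, pixels=PIXELS_1080P):
    """Total decode time in milliseconds for `pixels` decodes."""
    return ns_per_pixel * pixels / 1e6

FMA_NS, XMX_NS = 0.661, 0.194  # nanoseconds per pixel, as quoted in the benchmark

fma_ms = frame_cost_ms(FMA_NS)
xmx_ms = frame_cost_ms(XMX_NS)
speedup = FMA_NS / XMX_NS
budget_ms = 1000 / 60  # frame budget at a hypothetical 60 fps target

print(f"FMA: {fma_ms:.2f} ms ({fma_ms / budget_ms:.1%} of a 60 fps frame)")
print(f"XMX: {xmx_ms:.2f} ms ({xmx_ms / budget_ms:.1%} of a 60 fps frame)")
print(f"Speedup: {speedup:.2f}x")
```

Under these assumptions the FMA path costs about 1.37 ms per frame and the XMX path about 0.40 ms, which is roughly 8% versus 2.4% of a 60 fps frame budget.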
