Assessing Learned Models for Phase-only Hologram Compression

People

Zicong Peng1

Yicheng Zhan1

Josef Spjut2

Kaan Akşit1

1University College London, 2NVIDIA

SIGGRAPH 2025 Poster

Resources

Manuscript | Poster | Supplementary | Code

Bibtex
@inproceedings{peng2025assessing,
  author = {Zicong Peng and Yicheng Zhan and Josef Spjut and Kaan Ak{\c{s}}it},
  title = {Assessing Learned Models for Phase-only Hologram Compression},
  booktitle = {SIGGRAPH 2025 Posters (SA Posters '25)},
  year = {2025},
  location = {Vancouver, BC, Canada},
  publisher = {ACM},
  address = {New York, NY, USA},
  pages = {2},
  doi = {10.1145/3721250.3742993},
  url = {https://doi.org/10.1145/3721250.3742993},
  month = {August 10--14}
}

Video

Abstract

We evaluate the performance of four common learned models utilizing Implicit Neural Representation (INR) and Variational Autoencoder (VAE) structures for compressing phase-only holograms in holographic displays. The evaluated models include a vanilla MLP, SIREN, and FilmSIREN, with TAESD as the representative VAE model. Our experiments reveal that a pretrained image VAE, TAESD, with 2.2M parameters struggles with phase-only hologram compression, revealing the need for task-specific adaptations. Among the INRs, SIREN with 4.9k parameters achieves \(40\%\) compression with high quality in the reconstructed 3D images (PSNR = 34.54 dB). These results emphasize the effectiveness of INRs and identify the limitations of pretrained image compression VAEs for the hologram compression task.

(Teaser figure)

Proposed Method

Our assessments use double-phase encoded phase-only holograms, \(P \in \mathbb{R}^{3 \times 512 \times 512}\), with one single-color hologram per color primary. These \(P\)s are calculated for three wavelengths, \(\{473, 515, 639\}\) nm, and a fixed pixel pitch of \(3.74\,\mu\text{m}\) (Jasper Display JD7714). We adopt an off-the-shelf TAESD trained for the image compression task. Specifically, TAESD with 2.2M parameters encodes \(P\) into a \(\text{bottleneck} \in \mathbb{R}^{16 \times 64 \times 64}\) and later decodes it back to the original resolution of \(3 \times 512 \times 512\). Our teaser (above) shows that the pretrained TAESD fails, requiring dedicated training to generalize. Comparing feature sizes yields only a 92% reduction (excluding the TAESD parameters). We therefore explore INR-based models to see whether the feature size can be reduced further, accepting longer training times since INRs are typically overfitted to a single sample at a time.
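For reference, the 92% figure follows directly from the tensor shapes above; the short snippet below only reproduces that arithmetic and is illustrative rather than part of the released code.

```python
# Illustrative arithmetic only: reproduces the ~92% feature-size reduction
# quoted above from the shapes reported in the paper.
hologram_values = 3 * 512 * 512    # P in R^{3 x 512 x 512}
bottleneck_values = 16 * 64 * 64   # TAESD bottleneck in R^{16 x 64 x 64}

reduction = 1.0 - bottleneck_values / hologram_values
print(f"feature-size reduction: {reduction:.1%}")  # ~91.7%, i.e., ~92%
# Note: this excludes the ~2.2M TAESD parameters that must also be stored or shared.
```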


In our study, we compare three foundational INR architectures (vanillaMLP, SIREN, and FilmSIREN), targeting \(\sim 40\%\) feature reduction as a starting point to strike a balance between reconstructed image quality and compression ratio. \(P\)s are split into patches (e.g., \(3 \times 64 \times 64\)), a separate model is trained for each patch (initialized from prior weights), and their outputs are combined for full reconstruction. The experiments detailed in the next section utilize ten different holograms (this small but diverse initial set is intended to demonstrate comparative trends among methods; large-scale validation is left to subsequent work) and turn them into patches following the choices listed in the table. All INRs use Adam (lr=0.0001) with StepLR (gamma=0.5 every 5000 epochs), trained for 5000 epochs.
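To make the per-patch fitting loop concrete, here is a minimal sketch, assuming PyTorch, that overfits a small SIREN to one \(3 \times 64 \times 64\) phase patch with the optimizer settings stated above. It is not the authors' released code; the layer widths, depth, and \(\omega_0\) are illustrative choices.

```python
# Minimal per-patch SIREN fitting sketch (illustrative, not the released code).
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_f, out_f)
        # SIREN initialization (Sitzmann et al. 2020)
        with torch.no_grad():
            bound = 1.0 / in_f if is_first else (6.0 / in_f) ** 0.5 / w0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class SIREN(nn.Module):
    def __init__(self, in_f=2, hidden=32, out_f=3, depth=3):
        super().__init__()
        layers = [SineLayer(in_f, hidden, is_first=True)]
        layers += [SineLayer(hidden, hidden) for _ in range(depth - 1)]
        layers += [nn.Linear(hidden, out_f)]
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)

def fit_patch(patch, epochs=5000, lr=1e-4):
    """Overfit one SIREN to a single 3x64x64 phase patch."""
    c, h, w = patch.shape
    # Normalized pixel coordinates in [-1, 1] as the INR input.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    target = patch.permute(1, 2, 0).reshape(-1, c)

    model = SIREN(out_f=c)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5000, gamma=0.5)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), target)
        loss.backward()
        opt.step()
        sched.step()
    return model
```

In the full pipeline, one such model would be fit per patch (warm-started from the previous patch's weights) and the predicted patches tiled back into the \(3 \times 512 \times 512\) hologram.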

Conclusions

SIREN and FilmSIREN provide strong compression, outperforming vanillaMLP, with SIREN showing the best consistency. In our current experiments (table below), SIREN achieves the highest fidelity with a PSNR of 42.29 dB and an SSIM of 0.99 at a \(3 \times 64 \times 64\) patch size, while its 3D reconstruction quality (PSNR = 34.54 dB, SSIM = 0.96, LPIPS = 0.10) marginally outperforms FilmSIREN's (PSNR = 33.27 dB, SSIM = 0.94, LPIPS = 0.15). Additionally, under identical training schedules, both SIREN and FilmSIREN frequently satisfied the early stopping criterion near 2000 epochs. This consistency implies a relatively smooth optimization process, suggesting that these models converge effectively without compromising image quality, a favorable property for the hologram compression task. The computational cost (\(T\) hours per hologram) is justified by SIREN's reconstruction quality.

(Results table)

These observations suggest that specialized INR architectures warrant further investigation for the hologram compression task, potentially opening new solutions for efficient 3D scene representation in holographic displays. Achieving robust compression remains an open challenge; our study guides future work on efficient 3D holographic rendering and storage.

Relevant research works

Here are relevant research works from the authors:

Outreach

We host a Slack group with more than 250 members. This Slack group focuses on the topics of rendering, perception, displays, and cameras. The group is open to the public, and you can become a member by following this link.

Contact Us

Please reach us through email to provide your feedback and comments.