PaperHub
7.2 / 10
Spotlight · 4 reviewers
Ratings: 3, 4, 4, 4 (min 3, max 4, std 0.4)
ICML 2025

LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression

OpenReview · PDF
Submitted: 2025-01-24 · Updated: 2025-08-16

Abstract

Keywords
Implicit neural representation, source coding, overfitted image compression, lottery codec hypothesis, low-complexity image codec.

Reviews & Discussion

Review
Rating: 3

This paper investigates the lottery ticket hypothesis for implicit-representation-based image compression.

It proposes to overfit a binary mask and modulation vectors to the source image, and then leverages a randomly initialized neural network to generate the reconstruction.

The proposed LotteryCodec achieves state-of-the-art performance among overfitted image codecs designed for single-image compression at a reduced computational cost.

Additionally, LotteryCodec can adjust its decoding complexity by varying the mask ratio, providing flexible solutions for diverse computational and performance needs.

Questions for Authors

This is a good paper in terms of ideas.

There are some concerns regarding experiments and metrics, listed as (1)–(5) in the previous sections. I am ready to increase the score if my concerns are resolved.

Claims and Evidence

Yes. The manuscript is well written and logical.

Methods and Evaluation Criteria

(1) Only bpp–PSNR is compared. It is suggested to add MS-SSIM as an evaluation metric.

(2) MACs/pixel is not a reliable metric for the real running complexity of neural networks. For neural applications, IO may account for most of the latency. The decoding pipeline of the proposed method is more complex than that of the original INRs, as shown in Figure 4. It would be better to compare latency against the baseline INR codec C3 on a BD-rate vs. decoding-latency curve.

(3) Encoding latency should be compared with the baseline C3.

Theoretical Claims

No theoretical claims.

Experimental Design and Analyses

The experiment is well designed, and the analysis is clear and thorough.

Supplementary Material

I did not read the source code provided in the supplementary material.

(4) More visual comparisons could be included in the appendix.

Relation to Prior Literature

Related to many lottery ticket papers, which are already properly discussed in the manuscript.

Missing Essential References

Missing important citation: Choi, Hee Min, et al. "Is Overfitting Necessary for Implicit Video Representation?" ICML 2023.

(5) This previous ICML paper investigates the same topic, the lottery ticket hypothesis for implicit representations. The differences should be properly discussed.

Other Strengths and Weaknesses

None.

Other Comments or Suggestions

L178: "networt" (typo for "network").

Author Response

We appreciate the reviewer's valuable comments and thank the reviewer for highlighting Choi's ICML 2023 paper. Our detailed responses to each comment are as follows:

  • (1). Following the suggestion, we have conducted additional MS-SSIM experiments on the Kodak dataset. The results, presented in Table 4.1, demonstrate that our method consistently outperforms VTM, achieving a BD-rate of -43.81% and closely matching ELIC's performance. We note that previous overfitted codecs focus only on PSNR, and direct MS-SSIM optimization is unstable for those baselines (as reported in C3 and its extension work; see Fig. 6 and line 3 in [1]).

    [1] Ballé, Jona, et al. "Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion."

Table 4.1 MS-SSIM / bpp performance on Kodak dataset

| Model | MS-SSIM₁ / bpp₁ | MS-SSIM₂ / bpp₂ | MS-SSIM₃ / bpp₃ | MS-SSIM₄ / bpp₄ | BD-rate vs. VTM (%) |
|---|---|---|---|---|---|
| VTM | 13.10 / 0.212 | 14.31 / 0.287 | 16.75 / 0.492 | 18.56 / 0.704 | 0 |
| LotteryCodec | 13.85 / 0.153 | 16.78 / 0.275 | 19.46 / 0.473 | 22.70 / 0.853 | -43.81 |
| ELIC | 12.22 / 0.091 | 15.86 / 0.215 | 18.83 / 0.394 | 21.67 / 0.667 | -44.60 |
| MLIC+ | 14.91 / 0.148 | 16.53 / 0.214 | 18.20 / 0.307 | 19.77 / 0.425 | -52.75 |
  • (2-3). We have compared the encoding (NVIDIA L40S) and decoding latency of LotteryCodec, and its BD-rate, against other alternatives (see Table 4.2, with structured pruning at a masking ratio of 0.8). Additional latency results across resolutions are reported in Table 2.1 (Reviewer bWTB), and a coding example is given in Table 3.2 (Reviewer itee). Given the fast decoding speed of overfitted codecs, we evaluate all overfitted codecs on an Intel Xeon CPU. Overall, our method achieves faster decoding with slightly higher encoding time compared with other overfitted codecs. Additional analysis is provided in our response to Reviewer itee. We note that real-world latency is affected by many uncontrollable factors in a lab setting and can be significantly reduced through various optimization techniques, making fair coding-speed comparisons difficult. For example, [2] reported that Cool-chic achieves a 100 ms latency using a C API for binary arithmetic coding, while our implementation is slower due to the lack of such optimization. Nonetheless, we recognize the importance of real-world latency and provide these evaluations for a practical perspective. To ensure fairness, all reported results are based on the same unoptimized decoding implementations, with no method using C API optimizations. We expect similar speedups across all methods with these techniques.

    [2] Blard, Théophile, et al. "Overfitted image coding at reduced complexity." 2024.

Table 4.2 Coding time for Kodak images

| Models | Encoding time | Decoding time | BD-rate |
|---|---|---|---|
| *Traditional codec* | CPU (s) | CPU (ms) | - |
| VTM | 85.53 | 352.52 | 0 |
| *AE-based codec* | GPU (ms) | GPU (ms) | - |
| EVC (S/M/L) | 20.23 / 32.21 / 51.35 | 18.82 / 23.73 / 32.56 | 3.3% / -0.8% / -1.9% |
| MLIC+ | 205.60 | 271.31 | -13.19% |
| *Overfitted codec* | GPU (s / 1k steps) | CPU (ms) | - |
| LotteryCodec (d=8/16/24) | 13.86 / 14.64 / 14.92 | 261.3 / 267.5 / 278.3 | -3.64% |
| C3 (d=12/18/24) | 13.10 / 13.98 / 14.32 | 272.1 / 284.6 / 295.0 | +3.24% |
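For reference, the BD-rate figures above follow the standard Bjøntegaard method. A minimal illustrative implementation (a sketch of the standard cubic-fit procedure, not our exact evaluation script) is:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average % bitrate change at equal quality.

    Inputs are the rate-distortion points of each codec; a negative
    result means the test codec needs fewer bits than the anchor.
    """
    # Fit cubic polynomials of log-rate as a function of PSNR.
    poly_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    poly_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR range.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(poly_a), np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```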
  • (4). We have conducted extensive additional ablation studies in the rebuttal (as shown in the tables here) and will include visualizations of these results to support our analysis in the revised manuscript, covering (a) the impact of each component (Table 3.1, response to Reviewer itee), (b) training latency vs. performance (Table 3.2, response to Reviewer itee), and (c) a visual comparison for the MS-SSIM results (Table 4.1 here). For clarity, we provide the corresponding numerical results in tabular form in the rebuttal above.
  • (5). We will add a discussion of Choi et al.'s ICML 2023 paper in the revised manuscript. Here, we highlight the key differences between their approach and ours: while both studies leverage the Lottery Ticket Hypothesis (LTH) for INRs, Choi et al. apply LTH to video representation using image-wise encoding with multiple supermask overlays and unpruned biases, boosting representation at the cost of increased complexity and bit rate. In contrast, our LotteryCodec adopts a pixel-wise model and focuses on the low-complexity image compression problem. We introduce mechanisms such as Fourier initialization and rewind modulation to enhance rate-distortion performance, distinguishing our approach from Choi's. Although Choi's method is a novel contribution to video representation, it still falls short of state-of-the-art compression techniques. We will also add a paragraph discussing the potential extension of our work to video compression (see our response to Reviewer bWTB for more details).
  • (6). We have proofread the manuscript again and corrected the typos.
Reviewer Comment

Thanks for the reply; I have raised my score. Please include these important new results in the manuscript.

Review
Rating: 4

This paper introduces LotteryCodec, a novel, low-complexity image compression scheme based on overfitting. LotteryCodec effectively overfits a binary mask of an over-parameterized, randomly initialized network to an image, achieving high-performance compression. To enhance its performance, techniques such as Fourier initialization and rewind modulation are proposed. Extensive experimental results demonstrate LotteryCodec's high compression ratio and low decoding complexity.

Update after rebuttal

As discussed below, I maintain my positive score.

Questions for Authors

I have no additional questions.

Claims and Evidence

I have no concern on this part.

Methods and Evaluation Criteria

I have no concern on this part.

Theoretical Claims

I have no concern on this part.

Experimental Design and Analyses

  1. While LotteryCodec achieves low decoding MACs, MACs alone do not fully represent decoding complexity. In practice, factors such as peak memory usage and, especially, decoding speed play crucial roles in determining complexity. A comparison with other schemes (e.g., C3, EVC, and ELIC) on these factors would provide a more comprehensive understanding.

  2. Beyond decoding complexity, encoding complexity also requires clarification. Compared to other overfitted image codecs (e.g., C3 or COOL-CHIC), does LotteryCodec require more or less encoding time?

Supplementary Material

I reviewed the full appendix.

Relation to Prior Literature

I have no concern on this part.

Missing Essential References

I have no concern on this part.

Other Strengths and Weaknesses

Strengths:

  1. The idea of overfitting a binary mask of an over-parameterized, randomly initialized network to an image is novel, introducing a new paradigm for overfitted image compression.

  2. Experimental results effectively validate the LotteryCodec hypothesis.

  3. The compression performance is excellent for a low-complexity overfitted image codec.

Weaknesses:

  1. The discussion on complexity is insufficient, as noted in 'Experimental Designs and Analyses'.

  2. While experiments support the LotteryCodec hypothesis, the paper lacks a qualitative analysis explaining why LotteryCodec is superior to previous overfitted codecs like C3.

  3. Compared to C3, LotteryCodec differs in both the synthesis network and ModNet. An ablation study on removing ModNet could help clarify the impact of each modification.

Other Comments or Suggestions

I have no other comments.

Author Response

We appreciate the reviewer's valuable comments. Our responses to the reviewer's main concerns are as follows:

  • Encoding/decoding complexity. Practical encoding/decoding time and peak memory usage across images of various resolutions are reported in Table 2.1 (see our response to Reviewer bWTB), with additional coding-speed results reported in Table 4.2 (response to Reviewer ynP1) and Table 3.2 (this response). All of these results are based on unoptimized research code and current hardware, and can be significantly improved with proper engineering optimization (e.g., a C API, optimized wavefront decoding). Overall, our method has a slightly longer encoding time than other overfitted codecs due to the additional gradient-based mask-learning process, but it offers greater flexibility and faster decoding. Notably, the lottery codec hypothesis (LCH) enables potential parallel encoding by re-parameterizing distinct network optimizations into batch-wise mask learning, highlighting its scalability for efficient large-scale image encoding.

  • Qualitative analysis. In addition to experimental evidence, we provide a rough analysis of the LCH to explain why it is likely to hold (see our response to Reviewer bWTB). Based on the LCH, we can intuitively justify why the proposed LotteryCodec outperforms previous overfitted codecs in terms of rate-distortion performance. The rate formulations for overfitted codecs and our LotteryCodec are given in Eqs. (2) and (5), respectively. They show that the rate of overfitted codecs depends on $\{\hat{z}, \hat{\psi}, \hat{W}\}$, while that of our method is determined by $\{\hat{z}, \hat{\psi}, \tau, \hat{\theta}\}$. According to the LCH, to achieve the same level of distortion, we can find a pair $(\hat{z}, \tau)$ such that the bit cost for $\hat{z}$ and $\hat{\psi}$ equals that of overfitted codecs. While each quantized parameter in $\hat{W}$ typically requires over 13 bits, our binary mask $\tau$ uses just 1 bit per entry. Despite its higher dimensionality, $\tau$ contributes significantly less to the total rate. Moreover, since $\hat{\theta}$ is lightweight, the combined rate of $\tau$ and $\hat{\theta}$ remains lower than that of $\hat{W}$, resulting in a lower compression rate and improved RD performance.
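To make this asymmetry concrete, consider the following back-of-the-envelope calculation (all sizes below are hypothetical, chosen only for illustration, and are not our configuration):

```python
# Hypothetical sizes for illustration only (not the paper's configuration).
BITS_PER_WEIGHT = 13        # typical cost of one quantized weight in W-hat
BITS_PER_MASK_ENTRY = 1     # one binary supermask entry in tau

n_weights = 20_000          # parameters of a trained synthesis network
n_mask = 4 * n_weights      # 4x over-parameterized random network
n_theta = 2_000             # lightweight ModNet parameters (theta-hat)

rate_overfitted = n_weights * BITS_PER_WEIGHT
rate_lottery = n_mask * BITS_PER_MASK_ENTRY + n_theta * BITS_PER_WEIGHT
print(rate_overfitted, rate_lottery)  # 260000 vs. 106000 bits
```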

  • Ablation study. We conducted additional ablation studies to clarify the impact of each component in our design. As shown in Table 3.1 below, removing the Supermask network and using only the modulation network increases BD-rate by +12.45%, highlighting the importance of the random network. Removing ModNet and directly feeding z{z} into the random network (with different overparameterization configurations) results in a performance drop of up to +14.99% due to high overparameterization costs. Additional ablation studies and visualizations of other components are provided in Table 4 of Appendix C in our original paper.

Table 3.1 BD-rate change due to removal of individual components from LotteryCodec

| LotteryCodec | w/o SuperMask | Random network w/o ModNet: (4,32) / (4,48) / (4,64) |
|---|---|---|
| 0 | +12.45% | +13.02% / +11.98% / +14.99% |

Table 3.2 Encoding cost for a 2K image as an example

(Size: 1292 × 1945, “davide-ragusa-716” in CLIC2020; optimal PSNR: 37.18 at bpp 0.196; $d = 24$, ratio = 0.2; peak memory: 5.64 GB. 10–20k steps can yield decent performance.)

| Training Steps | Training Time (s) | bpp | PSNR (dB) |
|---|---|---|---|
| 5k | 678 | 0.24 | 36.51 |
| 10k | 1347 | 0.22 | 36.92 |
| 20k | 2685 | 0.21 | 37.02 |
| 30k | 4026 | 0.20 | 37.10 |
| 50k | 6733 | 0.199 | 37.14 |
Reviewer Comment

I thank the authors for the rebuttal. My concerns have been well addressed. Please ensure these results are included in the camera-ready version.

Review
Rating: 4

The paper presents LotteryCodec, a new method for single-image compression that builds on the idea that large, randomly initialized neural networks contain subnetworks capable of matching the performance of fully trained networks. Concretely, instead of training and transmitting all synthesis network parameters for each image, LotteryCodec transmits only a binary mask (to identify a subnetwork inside a frozen, random network) and a small latent representation. This approach encodes the image’s statistics primarily into the network’s structure (the mask) rather than its weights.

A key contribution is the “lottery codec hypothesis”, which posits that for any standard, overfitted compression model, there exists a subnetwork within a sufficiently large, randomly initialized neural network that can reconstruct the image to a similar distortion at the same or lower bit-rate. The authors reinforce this concept with a “rewind modulation” mechanism that merges a learned latent representation with hierarchical modulations at multiple layers, helping the subnetwork capture image details more effectively.

Questions for Authors

  1. In line 215–219 (right column), the paper states that the loss function (4) omits the rate terms for ψ, θ, and τ because their lightweight architectures contribute negligibly to the overall bit rate. Could you provide empirical evidence—such as detailed bit consumption measurements for each of these components—to support this claim?

  2. Regarding the binary mask, is its overhead significant, and how is the mask data compressed in practice?

  3. For network architecture, why is the maximum network width set to 128? What would be the impact of increasing the width further (e.g., to 256 or 512)? For instance, if a 50% mask ratio with a (4,128) network achieves the best performance, what outcome would you expect if a 25% mask ratio is used with a (4,256) network?

  4. Could you provide more formal insights or theoretical bounds to support the lottery codec hypothesis beyond the empirical results?

  5. What are the computational costs or encoding times for the per-image optimization process, and how do these compare with existing overfitted codecs and autoencoder-based codecs? Clarification on encoding speed is important for assessing the method's real-world practicality.

  6. How does the proposed approach scale to ultra-high-resolution images (e.g., 4K or 8K), particularly in terms of memory usage and the complexity of subnetwork search?

Claims and Evidence

Overall, the main claims in the paper are supported by consistent empirical evidence and ablation studies. However, as with many works extending the “lottery ticket” idea, there is no fully rigorous proof of the underlying “lottery codec hypothesis”. While the authors reference existing theory on the strong lottery ticket hypothesis, the paper itself relies on empirical demonstrations rather than a formal proof.

Methods and Evaluation Criteria

Yes. The paper targets the domain of single-image compression, a space where it is standard practice to evaluate models on well-established datasets like Kodak and CLIC. The authors adopt widely used, transparent metrics—PSNR for measuring distortion and BD-rate to compare rate–distortion trade-offs. They also report decoding complexity in terms of multiply-accumulate (MAC) operations per pixel, which directly addresses deployment feasibility on resource-constrained hardware.

Theoretical Claims

There is no detailed derivation or proof in the submission.

Experimental Design and Analyses

The experiments in the paper are generally designed and analyzed in a manner consistent with standard practices in neural image compression, and they align with expectations for single-image compression research. Here are the main observations regarding experimental soundness:

  • They use popular datasets (Kodak and CLIC2020) and standard metrics (PSNR, BD-rate, MACs/pixel) for evaluation.
  • They compare LotteryCodec with traditional codecs (e.g., VTM, HEVC) and overfitted approaches (e.g., C3, COOL-CHIC).
  • They vary the rate–distortion parameter and mask ratios to explore how performance scales with different network depths and widths.
  • They perform ablations on initialization, modulation methods, and architecture to highlight each component’s contribution.

Supplementary Material

I reviewed the supplementary material. It contains the source code of the proposed method.

Relation to Prior Literature

The paper builds on the lottery ticket hypothesis (Frankle & Carbin, 2019) by applying it to image compression. It leverages ideas from overfitted codecs (e.g., COIN, COOL-CHIC, C3) to encode images with minimal parameters. The work also incorporates insights on untrained subnetworks (Ramanujan et al., 2020) and uses Fourier initialization to mitigate low-frequency bias in MLPs. Overall, it integrates established concepts from compression, deep learning, and network pruning to reduce decoding complexity while maintaining high performance.

Missing Essential References

No essential references are omitted.

Other Strengths and Weaknesses

Strengths

  • Leverages the lottery ticket hypothesis to utilize untrained subnetworks for image compression.
  • Achieves state-of-the-art rate–distortion performance while drastically lowering the number of operations at decoding time.
  • Adjustable mask ratios allow the method to balance compression performance with computational cost.

Weaknesses

  • Lacks a formal theoretical proof or detailed bounds, relying mainly on empirical evidence.
  • The per-image optimization required for encoding may result in high encoding times, which is not fully addressed.
  • Searching for an optimal subnetwork in a highly over-parameterized network might become challenging for ultra-high-resolution images.
  • The method is demonstrated for single-image compression, with limited discussion on extending it to video or other signals.

Other Comments or Suggestions

  • Consider adding a dedicated paragraph (or section) discussing the limitations of per-image optimization speed, especially in practical scenarios.
  • Including a brief discussion on potential extensions to video or multi-view compression could help contextualize broader applications.
Author Response

We thank the reviewer for valuable comments. We first respond to the reviewer's main concerns:

  • W1: Proof of the Lottery Codec Hypothesis (LCH). Although a rigorous bound supporting the LCH is not available, we can provide a rough validation based on existing proofs of the Strong Lottery Ticket Hypothesis (SLTH). Suppose a codec $g_{W}(z)$ is overfitted to an image $S$ with distortion $\sigma$. According to the SLTH, for any $\epsilon > 0$, there exists a subnetwork within a sufficiently overparameterized network $g_{W'}$, defined by a supermask $\tau$, such that $d(g_{W}(z), g_{W' \odot \tau}(z)) \le \epsilon$ (assuming $d$ is a distortion evaluated pixel by pixel). By the triangle inequality, reconstructing the image $S$ using $g_{W' \odot \tau}(z)$ then yields a distortion of at most $\sigma + \epsilon$. We can further decrease the distortion by optimizing the latent vector over the set of $z'$ satisfying $H(z') = H(z)$, along with the supermask $\tau$. Since $\epsilon$ can be made arbitrarily small, it is highly likely that we can find a pair $(\tau', z')$ such that $d(S, g_{W' \odot \tau'}(z')) \le \sigma$.
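In compact form (assuming $d$ satisfies the triangle inequality):

```latex
d\bigl(S,\, g_{W' \odot \tau}(z)\bigr)
  \le d\bigl(S,\, g_{W}(z)\bigr) + d\bigl(g_{W}(z),\, g_{W' \odot \tau}(z)\bigr)
  \le \sigma + \epsilon .
```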
  • W2: We will add the following discussion about the encoding time: "LotteryCodec's low and flexible decoding cost is particularly beneficial in multi-user streaming scenarios, where encoding can be done once and offline to support many users decoding the same content. While high encoding complexity remains a key bottleneck for all overfitted codecs, including ours, potential acceleration strategies include meta-learning, mixed-precision training, and neural architecture search. Notably, LotteryCodec also enables parallel encoding for overfitted codecs by reparameterizing distinct network learning processes into a batch of mask-learning processes."
  • W3: To address the reviewer's concern about ultra-high-resolution images, we provide Table 2.1 detailing the training and inference cost across various resolutions (with mask ratio $0.8$ and an ARM model with $d = 16$). An additional example of 2K image encoding is shown in Table 3.2 (response to Reviewer itee).
  • W4: We will add the following paragraph to discuss the potential extension to video compression: "LotteryCodec can be extended as a flexible alternative for video coding. By sharing modulation across adjacent groups of frames (GoF) and applying distinct/weighted masks, it can additionally encode temporal information into the network structure, potentially yielding a lower bit cost. Moreover, video coding enables adaptive mask ratio selection across GoF, offering greater flexibility in both computational complexity and rate control."

Table 2.1: Encoding time for different images. OM means out of memory ($> 32$ GB).

| Input Resolution | GPU Encoding (s/1k steps): LotteryCodec vs. C3 | CPU Decoding (ms): LotteryCodec vs. C3 | Encoding Peak Memory (GB): LotteryCodec vs. C3 vs. MLIC+ |
|---|---|---|---|
| 512 × 512 | 10.71 vs. 10.43 | 232.46 vs. 228.43 | 0.56 vs. 0.31 vs. 1.98 |
| 1024 × 1024 | 56.81 vs. 38.54 | 565.22 vs. 576.51 | 2.15 vs. 1.24 vs. 3.61 |
| 1536 × 1536 | 136.81 vs. 84.79 | 984.01 vs. 1086.92 | 4.82 vs. 2.78 vs. 9.15 |
| 2048 × 2048 | 257.93 vs. 155.02 | 1595.86 vs. 1807.35 | 8.53 vs. 4.95 vs. 24.37 |
| 2560 × 2560 | 407.68 vs. 237.45 | 3003.24 vs. 3269.02 | 13.36 vs. 7.72 vs. OM |
| 3840 × 2160 | 446.09 vs. 301.56 | 4014.21 vs. 4216.11 | 16.89 vs. 9.84 vs. OM |

Responses to questions:

  • Q1&2: We refer the reviewer to Fig. 13 in Appendix E, which details the cost of each component. In the high-rate regime, the total cost of $\psi$, $\theta$, and $\tau$ accounts for less than 5%. The binary mask $\tau$ is compressed via range coding (range-coder on PyPI) with a static distribution.
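As a rough illustration of the static-model coding cost (hypothetical numbers; a range coder approaches this ideal code length, though our exact coder setup may differ):

```python
import numpy as np

def ideal_mask_bits(mask, p_keep):
    """Ideal code length (bits) of a binary mask under a static
    Bernoulli(p_keep) model -- the rate a range coder approaches."""
    m = np.asarray(mask, dtype=float)
    return float(-(m * np.log2(p_keep) + (1 - m) * np.log2(1 - p_keep)).sum())

rng = np.random.default_rng(0)
mask = rng.random(100_000) < 0.2               # hypothetical 20% keep ratio
print(ideal_mask_bits(mask, 0.2) / mask.size)  # ~0.72 bits per entry
```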
  • Q3: To validate the lottery codec hypothesis, the (4,128) setting suffices. While wider networks (e.g., (4,256)) can reduce distortion without increasing the bit cost for $z$ (and hence can validate the hypothesis), they also raise the bit cost for the mask $\tau$ and introduce greater training overhead, which often reduces overall compression efficiency. (An example with different configurations can be seen in the last column of Table 3.1, response to Reviewer itee.) This motivates our design of the modulation mechanism.
  • Q4. See our response to W1.
  • Q5&6. We present the coding cost of various schemes in Table 4.2 of our response to Reviewer ynP1, and report resolution-dependent coding costs in Table 2.1 above. The proposed method scales to ultra-high-resolution images, albeit with increased coding time. Note that significant speedups can be achieved through engineering and hardware optimizations; for example, we can accelerate the method via the ONNX/DeepSparse libraries to reduce the decoding time to 20–80 ms on a CPU. Additional techniques, such as symmetric/separable kernels, filter-based upsampling, and wavefront decoding, can further enhance the speed of overfitted codecs.
Reviewer Comment

Thank you for the response and the additional results. This is a solid paper, and I will raise my rating to 4.

Review
Rating: 4

The paper introduces the Lottery Codec hypothesis based on the Lottery Ticket hypothesis and implements an image codec, LotteryCodec, which achieves strong performance and outperforms the best INR-based image codec while maintaining low complexity.

Questions for Authors

  • Does the reported MACs/pixel include the masked parameters?
  • In Figures 7a and 7b, is the setting always < 2K MACs/pixel?
  • Both the proposed model and C3 use a set of adaptive settings. In the main experiments (e.g., Figure 7), does the proposed model always use a network and entropy model size that is not larger than C3?

Claims and Evidence

Some claims are not clearly elaborated: For example, in line 260, why can LotteryCodec achieve a lower overall rate compared to overfitted codecs?

Methods and Evaluation Criteria

  • The proposed methods and evaluation criteria are reasonable. The implementation based on the Lottery hypothesis is simple yet effective. It is lightweight while remaining comparable to SOTA models. The evaluation datasets follow common practice in the field.
  • More ablation studies would be helpful. For example, the use of latent variables as modulation vectors differs significantly from C3, where many techniques originate. The authors should compare the performance of the proposed model with modulation removed to clarify whether the performance gain comes from modulation or the lottery ticket-based masking network.
  • The actual encoding/decoding time is not provided. The real runtime of the model is an important factor for the practical application of the proposed codec.

Theoretical Claims

The work is more of an application rather than a theoretical contribution, with few theoretical claims. The only issue is that the reason why LotteryCodec outperforms overfitted codecs is not well elaborated (line 260).

Experimental Design and Analyses

The experimental designs and analyses are valid. For example, different hyperparameters (mask ratios) are used for validating the hypothesis.

Supplementary Material

I have reviewed all parts of the supplementary material.

Relation to Prior Literature

The work is mainly related to the neural compression literature, but also to the INR literature. It proposes a new SOTA INR-based image codec.

Missing Essential References

The lottery ticket hypothesis has been used for video representation/compression, which is highly related to INR-based image compression: Choi, Hee Min, et al. "Is Overfitting Necessary for Implicit Video Representation?"

The proposed model is also highly similar to: Mehta, Ishit, et al. "Modulated Periodic Activations for Generalizable Local Functional Representations."

Other Strengths and Weaknesses

Overall, I find the work novel and of high quality. The lottery ticket-based codec is lightweight and achieves SOTA performance. The experiments provide a thorough evaluation of the method.

Other Comments or Suggestions

Author Response

We thank the reviewer for the valuable comments and for recommending two interesting papers. We will add (a) proper discussions of both papers, and (b) the suggested ablation studies and tables to the revised manuscript.

For a discussion of Choi et al. (2023), please refer to our response to Reviewer [ynP1]. Regarding Mehta et al. (2021), while their dual-MLP framework also uses a modulation and synthesis network, it targets multi-instance representation rather than compression. Key differences include: (1) our synthesis network is based on the Lottery Ticket Hypothesis; and (2) our ModNet introduces rewind modulation to the synthesis network via concatenation for greater flexibility. Additional ablations over different modulation methods can be seen in Table 4 and Fig. 12 of our original paper.
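For intuition, a minimal sketch of a supermask layer in this spirit (frozen random weights, trainable mask scores binarized with a straight-through estimator, following the edge-popup idea of Ramanujan et al.; a simplified illustration, not our released implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupermaskLinear(nn.Module):
    """Linear layer with frozen random weights and a learned binary mask.

    Only the real-valued scores are trained; the forward pass keeps the
    top `keep_ratio` fraction of weights, with straight-through gradients.
    """
    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False)
        self.scores = nn.Parameter(torch.randn(out_features, in_features))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        n = self.scores.numel()
        k = int(n * self.keep_ratio)
        # Threshold below which scores are masked out (keep top-k).
        threshold = self.scores.flatten().kthvalue(n - k).values
        hard = (self.scores > threshold).float()
        # Straight-through: binary mask forward, identity gradient backward.
        mask = hard + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)
```

In LotteryCodec terms, transmitting such a layer then amounts to sending the binary mask (plus a shared random seed for the frozen weights) rather than the weights themselves.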

Responses to the remaining comments:

  • Modifications for clarification: "As shown in Eqs. (2) and (5), the rate of overfitted codecs depends on $\{\hat{z}, \hat{\psi}, \hat{W}\}$, while the rate of our method is determined by $\{\hat{z}, \hat{\psi}, \tau, \hat{\theta}\}$. According to the Lottery Codec Hypothesis (LCH), our bit cost for $\hat{z}$ and $\hat{\psi}$ matches that of standard overfitted codecs. While each quantized parameter in $\hat{W}$ typically requires over 13 bits, our binary mask $\tau$ uses just 1 bit per entry. Despite its higher dimensionality, $\tau$ contributes significantly less rate. Moreover, since $\hat{\theta}$ is lightweight, the combined rate of $\tau$ and $\hat{\theta}$ remains lower than that of $\hat{W}$, resulting in improved compression efficiency."

  • Ablation study. We conducted additional ablation studies to assess the contribution of each component. See Table 3.1 and its discussion in our response to Reviewer [itee].

  • Coding speed. We report coding speed for different baselines and resolutions in Table 4.2 (response to Reviewer [ynP1]) and Table 2.1 (response to Reviewer [bWTB]).

  • Detailed BD-rate results are provided and will be included in the manuscript:

Table 1.1. Detailed BD-rate data points

| Dataset | LotteryCodec | C3 | MLIC+ | CST | COOL-CHIC v2 |
|---|---|---|---|---|---|
| Kodak | -3.64% | +3.24% | -13.19% | 3.78% | 31.65% |
| CLIC2020 | -5.89% | -2.85% | -12.56% | 11.70% | 29.30% |
  • We will cite the C3 source in the revised manuscript. We would also like to clarify that using either a re-implemented version or the original C3 code does not affect the validation of the LCH. The goal of the experiment (Fig. 6) is to demonstrate that, in an overfitted codec setting, the synthesis network can be replaced by a subnetwork of a randomly initialized network while maintaining comparable distortion performance. To ensure a fair comparison, only the synthesis network is replaced, with all other components kept identical to the target overfitted codec structure (see Fig. 8).

Answers to questions:

  • Q1. The current figure reports the theoretical minimum complexity, excluding the effect of masked parameters (similar evaluations can be seen in [1-2]). This theoretical lower bound can be approached using sparsity-aware implementations (cuSparse/DeepSparse) on compatible hardware. We adopt this metric to estimate decoding complexity because both practical MACs and run-time for unstructured sparse networks are heavily influenced by engineering factors. Following the reviewer's suggestion, we have decided to also report coding time with a simple structured pruning strategy (see Table 2.1 in our response to Reviewer [bWTB]), showing our decoding efficiency, especially on high-resolution images. Additionally, we provide both theoretical upper and lower bounds on complexity (Table 1.2; a minimal sketch of the bound computation follows the table), where the upper bound includes all operations without any pruning. The practical complexity lies between these bounds, depending on the implementation. Note that, compared to the C3 baseline, even an unpruned LotteryCodec achieves better BD results (-0.1% vs. 3.24%) with comparable complexity (2822 vs. 2626 MACs/pixel). We will revise the figure using a dashed region to clearly illustrate this range, and clarify it in Fig. 1 and Fig. 14 as well. By presenting both the theoretical complexity and the measured run-times, we aim to offer a comprehensive evaluation of our flexible decoding complexity.

    [1]. Han, Song, et al. "Learning both weights and connections for efficient neural network"

    [2]. Han, Song, et al. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

Table 1.2 Flexible BD-rate vs. MACs/pixel region over Kodak

| BD-rate | -3.64% | -1.8% | -0.1% |
|---|---|---|---|
| Lower bound, optimal (MACs/pixel) | 3083 | 2513 | 2022 |
| Upper bound, non-pruned (MACs/pixel) | 3732 | 3112 | 2822 |
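A minimal sketch of how such MACs/pixel bounds can be computed for a masked MLP (the layer widths below are hypothetical, not our exact synthesis architecture):

```python
# Hypothetical per-pixel MACs bounds for a masked MLP (illustration only).
layers = [(32, 64), (64, 64), (64, 64), (64, 3)]  # (in, out) widths per pixel
keep_ratio = 0.5                                  # fraction of unmasked weights

upper = sum(i * o for i, o in layers)   # non-pruned: every weight fires
lower = int(upper * keep_ratio)         # optimal: only unmasked weights fire

print(f"upper bound: {upper} MACs/pixel")  # 10432
print(f"lower bound: {lower} MACs/pixel")  # 5216
```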
  • Q2. Experiments in Figs. 7(a)–(b) do not impose such constraints; model architectures follow their original papers.
  • Q3. Our network is roughly half the size of C3, and our entropy model uses $d = \{8, 16, 24, 32\}$ vs. C3's $d = \{12, 18, 24\}$. (See Table 1 in our paper for more details.)
Final Decision

This manuscript proposes a compression method using a lottery ticket scheme with per-scene optimization based on implicit neural representations. The algorithm improves modulation mechanisms, leading to better RD performance compared to VTM and baselines.

All the reviewers agreed on the novelty and soundness of this paper. Reviewer W9j1 found the algorithm designed to be lightweight while taking a SOTA position. Reviewers bWTB and itee acknowledged that the idea of this paper adequately applies the lottery ticket hypothesis to compression problems. Moreover, all the reviewers noted that the proposed algorithm lowers computational complexity. The remaining concerns about (1) the reliability of MACs/pixel as a complexity metric, (2) credit to previous research, and (3) clarity were well resolved during the rebuttal period, after which all reviewers were satisfied with the authors' responses and converged on acceptance of this paper to ICML.

Therefore, I recommend the acceptance of this paper for publication.