AuGQ: Augmented quantization granularity to overcome accuracy degradation for sub-byte quantized deep neural networks

Abstract

Deployment of neural networks on IoT devices unlocks the potential for various innovative applications, but the sheer size and computational cost of many deep learning (DL) networks have prevented their widespread adoption. Quantization mitigates this issue by reducing model precision, enabling deployment on resource-constrained edge devices. However, at extremely low bit-widths, such as 2-bit and 4-bit, the aggressive compression leads to significant accuracy degradation due to the reduced representational capacity of the neural network. A critical aspect of effective quantization is identifying the range of real values (FP32) that impact model accuracy. To address accuracy loss at sub-byte levels, we introduce Augmented Quantization (AuGQ), a novel granularity technique tailored for low bit-width quantization. AuGQ segments the range of real-valued (FP32) weight and activation distributions into small uniform intervals and applies affine quantization within each interval to enhance accuracy. We evaluated AuGQ using both post-training quantization (PTQ) and quantization-aware training (QAT) methods, achieving accuracy levels comparable to full-precision (32-bit) DL networks. Our findings demonstrate that AuGQ is agnostic to the training pipeline and to batch normalization folding, distinguishing it from conventional quantization techniques. Furthermore, when integrated into state-of-the-art PTQ algorithms, AuGQ requires only 64 training samples for fine-tuning, which is \(16\times\) fewer than traditional methods. This reduction facilitates high-accuracy quantization at sub-byte bit-widths, making it suitable for practical IoT deployments and enhancing computational efficiency on edge devices.
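To make the granularity idea concrete, below is a minimal NumPy sketch of per-interval affine quantization as described in the abstract: the FP32 range is split into uniform segments, and each segment receives its own affine scale and zero-point. The function `augmented_quantize` and its parameters are hypothetical illustrations based solely on the abstract, not the authors' released implementation.

```python
import numpy as np

def augmented_quantize(x, n_bits=4, n_segments=4):
    """Sketch of segmented affine quantization (hypothetical helper,
    based only on the abstract's description, not the paper's code).
    Splits [x.min(), x.max()] into uniform intervals and applies an
    independent affine quantize/dequantize step inside each one."""
    lo, hi = float(x.min()), float(x.max())
    edges = np.linspace(lo, hi, n_segments + 1)
    q_max = 2 ** n_bits - 1  # e.g., 3 at 2-bit, 15 at 4-bit
    out = np.empty_like(x)
    for i in range(n_segments):
        seg_lo, seg_hi = edges[i], edges[i + 1]
        # Right edge is inclusive only for the last segment.
        if i < n_segments - 1:
            mask = (x >= seg_lo) & (x < seg_hi)
        else:
            mask = (x >= seg_lo) & (x <= seg_hi)
        scale = (seg_hi - seg_lo) / q_max        # per-segment affine scale
        zero_point = np.round(-seg_lo / scale)   # per-segment zero-point
        q = np.clip(np.round(x[mask] / scale + zero_point), 0, q_max)
        out[mask] = ((q - zero_point) * scale).astype(x.dtype)  # dequantize
    return out

# Toy check: 2-bit quantization of a random weight tensor with 4 segments
# yields far more representable values than a single 2-bit grid over the
# full range, which is the source of the accuracy recovery.
w = np.random.randn(1000).astype(np.float32)
w_q = augmented_quantize(w, n_bits=2, n_segments=4)
print("max abs error:", np.abs(w - w_q).max())
```

Note that this only illustrates the accuracy side (quantize-dequantize); an actual deployment would also need to store each value's segment index, and the paper's PTQ/QAT integration and calibration details are not reproduced here.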

Data Availability

The data that support the findings of this study are available on request from the corresponding authors of references [2, 31, 33, 57].

Code availability

Researchers or interested parties are welcome to contact the corresponding author for further explanation; the Python code may also be provided upon request.

References

  1. Dutta L, Bharali S (2021) Tinyml meets iot: a comprehensive survey. Internet of Things 16:100461

  2. Gholami A, Kim S, Dong Z, Yao Z, Mahoney MW, Keutzer K (2021) A survey of quantization methods for efficient neural network inference. arXiv:2103.13630

  3. Ding X, Zhou X, Guo Y, Han J, Liu J et al (2019) Global sparse momentum sgd for pruning very deep neural networks. Advances in Neural Information Processing Systems 32

  4. Ji Y, Chen L (2023) Fedqnn: A computation–communication-efficient federated learning framework for iot with low-bitwidth neural network quantization. IEEE Internet Things J 10(3):2494–2507. https://doi.org/10.1109/JIOT.2022.3213650

  5. Ma T, Wang H, Li C (2023) Quantized distributed federated learning for industrial internet of things. IEEE Internet Things J 10(4):3027–3036. https://doi.org/10.1109/JIOT.2021.3139772

  6. Seo S, Lee J, Ko H, Pack S (2023) Situation-aware cluster and quantization level selection algorithm for fast federated learning. IEEE Internet Things J 10(15):13292–13302. https://doi.org/10.1109/JIOT.2023.3262582

  7. Wang Z, Liu X, Huang L, Chen Y, Zhang Y, Lin Z, Wang R (2022) Qsfm: model pruning based on quantified similarity between feature maps for ai on edge. IEEE Internet Things J 9(23):24506–24515. https://doi.org/10.1109/JIOT.2022.3190873

  8. Zawish M, Ashraf N, Ansari RI, Davy S (2023) Energy-aware ai-driven framework for edge-computing-based iot applications. IEEE Internet Things J 10(6):5013–5023. https://doi.org/10.1109/JIOT.2022.3219202

  9. Nagel M, Amjad RA, Van Baalen M, Louizos C, Blankevoort T (2020) Up or down? adaptive rounding for post-training quantization. In: International Conference on Machine Learning, pp 7197–7206. PMLR

  10. Jeon Y, Lee C, Cho E, Ro Y (2022) Mr.biq: post-training non-uniform quantization based on minimizing the reconstruction error. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12329–12338

  11. Choukroun Y, Kravchik E, Yang F, Kisilev P (2019) Low-bit quantization of neural networks for efficient inference. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp 3009–3018. IEEE

  12. Wang P, Chen Q, He X, Cheng J (2020) Towards accurate post-training network quantization via bit-split and stitching. In: International conference on machine learning, pp 9847–9856. PMLR

  13. Zhou S, Wu Y, Ni Z, Zhou X, Wen H, Zou Y (2016) Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160

  14. Zhang D, Yang J, Ye D, Hua G (2018) Lq-nets: Learned quantization for highly accurate and compact deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 365–382

  15. Kim D, Lee J, Ham B (2021) Distance-aware quantization. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5271–5280

  16. McKinstry JL, Esser SK, Appuswamy R, Bablani D, Arthur JV, Yildiz IB, Modha DS (2019) Discovering low-precision networks close to full-precision networks for efficient inference. In: 2019 Fifth workshop on energy efficient machine learning and cognitive computing-NeurIPS edition (EMC2-NIPS), pp 6–9. IEEE

  17. Wu B, Wang Y, Zhang P, Tian Y, Vajda P, Keutzer K (2018) Mixed precision quantization of convnets via differentiable neural architecture search. arXiv:1812.00090

  18. Jung S, Son C, Lee S, Son J, Han J-J, Kwak Y, Hwang SJ, Choi C (2019) Learning to quantize deep networks by optimizing quantization intervals with task loss. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4350–4359

  19. Krishnamoorthi R (2018) Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv:1806.08342

  20. Hinton G, Vinyals O, Dean J et al (2015) Distilling the knowledge in a neural network. arXiv:1503.02531

  21. Shen Z, He Z, Xue X (2019) Meal: multi-model ensemble via adversarial learning. Proceedings of the AAAI Conference on Artificial Intelligence 33:4886–4893

  22. Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV (2019) Mnasnet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2820–2828

  23. Wu B, Dai X, Zhang P, Wang Y, Sun F, Wu Y, Tian Y, Vajda P, Jia Y, Keutzer K (2019) Fbnet: hardware-aware efficient convnet design via differentiable neural architecture search. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10734–10742

  24. Ma N, Zhang X, Zheng H-T, Sun J (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 116–131

  25. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861

  26. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  27. Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856

  28. Choi J, Wang Z, Venkataramani S, Chuang PI-J, Srinivasan V, Gopalakrishnan K (2018) Pact: parameterized clipping activation for quantized neural networks. arXiv:1805.06085

  29. Esser SK, McKinstry JL, Bablani D, Appuswamy R, Modha DS (2019) Learned step size quantization. arXiv:1902.08153

  30. Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2704–2713

  31. Li Y, Gong R, Tan X, Yang Y, Hu P, Zhang Q, Yu F, Wang W, Gu S (2021) Brecq: pushing the limit of post-training quantization by block reconstruction. arXiv:2102.05426

  32. Nahshan Y, Chmiel B, Baskin C, Zheltonozhskii E, Banner R, Bronstein AM, Mendelson A (2021) Loss aware post-training quantization. Mach Learn 110(11):3245–3262

  33. Rusci M, Capotondi A, Benini L (2020) Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers. Proceedings of Machine Learning and Systems 2:326–335

  34. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  35. Radosavovic I, Kosaraju RP, Girshick R, He K, Dollár P (2020) Designing network design spaces. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10428–10436

  36. Fang J, Shafiee A, Abdel-Aziz H, Thorsley D, Georgiadis G, Hassoun J (2020) Near-lossless post-training quantization of deep neural networks via a piecewise linear approximation. arXiv:2002.00104

  37. Fang J, Shafiee A, Abdel-Aziz H, Thorsley D, Georgiadis G, Hassoun JH (2020) Post-training piecewise linear quantization for deep neural networks. In: European conference on computer vision, pp 69–86. Springer

  38. Garg S, Lou J, Jain A, Nahmias M (2021) Dynamic precision analog computing for neural networks. arXiv:2102.06365

  39. He X, Cheng J (2018) Learning compression from limited unlabeled data. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 752–769

  40. Meller E, Finkelstein A, Almog U, Grobman M (2019) Same, same but different: recovering neural network quantization error through weight factorization. In: International conference on machine learning, pp 4486–4495. PMLR

  41. Nagel M, Van Baalen M, Blankevoort T, Welling M (2019) Data-free quantization through weight equalization and bias correction. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1325–1334

  42. Shomron G, Gabbay F, Kurzum S, Weiser U (2021) Post-training sparsity-aware quantization. Adv Neural Inf Process Syst 34:17737–17748

  43. Zhao R, Hu Y, Dotzel J, De Sa C, Zhang Z (2019) Improving neural network quantization without retraining using outlier channel splitting. In: International conference on machine learning, pp 7543–7552. PMLR

  44. Cai Y, Yao Z, Dong Z, Gholami A, Mahoney MW, Keutzer K (2020) Zeroq: a novel zero shot quantization framework. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13169–13178

  45. Hubara I, Nahshan Y, Hanani Y, Banner R, Soudry D (2020) Improving post training neural quantization: layer-wise calibration and integer programming. arXiv:2006.10518

  46. Gong R, Liu X, Jiang S, Li T, Hu P, Lin J, Yu F, Yan J (2019) Differentiable soft quantization: bridging full-precision and low-bit neural networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4852–4861

  47. Lin D, Talathi S, Annapureddy S (2016) Fixed point quantization of deep convolutional networks. In: International conference on machine learning, pp 2849–2858. PMLR

  48. Choi J, Venkataramani S, Srinivasan VV, Gopalakrishnan K, Wang Z, Chuang P (2019) Accurate and efficient 2-bit quantized neural networks. Proceedings of Machine Learning and Systems 1:348–359

  49. Li Y, Dong X, Wang W (2019) Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. arXiv:1909.13144

  50. Miyashita D, Lee EH, Murmann B (2016) Convolutional neural networks using logarithmic data representation. arXiv:1603.01025

  51. Yamamoto K (2021) Learnable companding quantization for accurate low-bit neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5029–5038

  52. Zhou A, Yao A, Guo Y, Xu L, Chen Y (2017) Incremental network quantization: Towards lossless cnns with low-precision weights. arXiv:1702.03044

  53. Yamamoto K (2021) Learnable companding quantization for accurate low-bit neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5029–5038

  54. Liu Z, Cheng K-T, Huang D, Xing EP, Shen Z (2022) Nonuniform-to-uniform quantization: towards accurate quantization via generalized straight-through estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4942–4952

  55. Liu Z-G, Mattina M (2019) Learning low-precision neural networks without straight-through estimator (ste). arXiv:1903.01061

  56. Banner R, Nahshan Y, Soudry D (2019) Post training 4-bit quantization of convolutional networks for rapid-deployment. Advances in Neural Information Processing Systems 32

  57. Wei X, Gong R, Li Y, Liu X, Yu F (2022) Qdrop: randomly dropping quantization for extremely low-bit post-training quantization. arXiv:2203.05740

  58. Xu K, Li Z, Wang S, Zhang X (2024) Ptmq: post-training multi-bit quantization of neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 38:16193–16201

  59. Zhou S, Wu Y, Ni Z, Zhou X, Wen H, Zou Y (2016) Dorefa-net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160

  60. Xu K, Han L, Tian Y, Yang S, Zhang X (2023) Eq-net: elastic quantization neural networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1505–1514

  61. See J-C, Ng H-F, Tan H-K, Chang J-J, Mok K-M, Lee W-K, Lin C-Y (2023) Cryptensor: a resource-shared co-processor to accelerate convolutional neural network and polynomial convolution. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

  62. Metz D, Kumar V, Själander M (2023) Bisdu: a bit-serial dot-product unit for microcontrollers. ACM Transactions on Embedded Computing Systems 22(5):1–22

  63. AskariHemmat M, Dupuis T, Fournier Y, El Zarif N, Cavalcante M, Perotti M, Gürkaynak F, Benini L, Leduc-Primeau F, Savaria Y et al (2023) Quark: an integer risc-v vector processor for sub-byte quantized dnn inference. In: 2023 IEEE International Symposium on Circuits and Systems (ISCAS), pp 1–5. IEEE

  64. Lai L, Suda N, Chandra V (2018) Cmsis-nn: efficient neural network kernels for arm cortex-m cpus. arXiv:1801.06601

  65. Capotondi A, Rusci M, Fariselli M, Benini L (2020) Cmix-nn: mixed low-precision cnn library for memory-constrained edge devices. IEEE Trans Circuits Syst II Express Briefs 67(5):871–875

  66. Mujtaba A, Lee W-K, Hwang SO (2022) Low latency implementations of cnn for resource-constrained iot devices. IEEE Trans Circuits Syst II Express Briefs 69(12):5124–5128

  67. Ganji DC, Ashfaq S, Saboori E, Sah S, Mitra S, Askarihemmat M, Hoffman A, Hassanien A, Leonardon M (2023) Deepgemm: accelerated ultra low-precision inference on cpu architectures using lookup tables. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4655–4663

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) under Grant RS-2024-00340882 and by the Gachon University research fund under Grant GCU-202404140001.

Author information

Contributions

Ahmed Mujtaba was responsible for the design and execution of the overall investigation. Wai Kong Lee was responsible for the investigation related to quantization. Byoung Chul Ko, Hyung Jin Chang and Seong Oun Hwang were responsible for the data curation, supervision, writing, and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Seong Oun Hwang.

Ethics declarations

Competing interests

Not applicable

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mujtaba, A., Lee, W.K., Ko, B.C. et al. AuGQ: Augmented quantization granularity to overcome accuracy degradation for sub-byte quantized deep neural networks. Appl Intell 55, 589 (2025). https://doi.org/10.1007/s10489-025-06495-1
