Abstract
We present an optimized implementation of the post-quantum Supersingular Isogeny Key Encapsulation (SIKE) scheme for 32-bit ARMv7-A processors that support the NEON engine (i.e., SIMD instructions). Unlike previous SIKE implementations, our finite field arithmetic uses a redundant representation, which avoids carry propagation and the pipeline stalls it causes. We also adopt several state-of-the-art engineering techniques together with hand-crafted assembly for high performance. The optimized routines are ported to the Microsoft SIKE library, which is written in a non-redundant representation, and evaluated on 32-bit ARMv7-A processors such as the ARM Cortex-A5, A7, and A15. A full key exchange with SIKEp503 takes about 109 million cycles on an ARM Cortex-A15 processor (i.e., 54.5 ms @2.0 GHz), which is about 1.58× faster than the previous state-of-the-art work presented at CHES’18.
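To make the idea of a redundant representation concrete, the following is a minimal Python sketch, not the authors' NEON assembly. It assumes a radix of 2^28 with each limb held in a 32-bit lane; the abstract does not state the limb width, so `RADIX_BITS`, the function names, and the test operand are all illustrative choices. The spare bits in each lane let several additions accumulate limb by limb with no carry chain, and a single normalization pass restores canonical limbs afterwards.

```python
RADIX_BITS = 28                      # assumed limb width (not taken from the paper)
LIMB_MASK = (1 << RADIX_BITS) - 1    # largest canonical limb value
WORD_MASK = (1 << 32) - 1            # models one 32-bit SIMD lane


def to_limbs(x, n):
    """Split a non-negative integer into n canonical limbs, least significant first."""
    if x < 0 or x >> (RADIX_BITS * n):
        raise ValueError("value does not fit in n limbs")
    return [(x >> (RADIX_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine limbs into an integer; also valid for non-canonical (redundant) limbs."""
    return sum(v << (RADIX_BITS * i) for i, v in enumerate(limbs))


def lazy_add(a, b):
    """Add limb by limb with no carry between limbs, so every lane is independent."""
    out = [x + y for x, y in zip(a, b)]
    if any(v > WORD_MASK for v in out):
        raise OverflowError("a lane exceeded 32 bits; normalize earlier")
    return out


def normalize(limbs):
    """Run one sequential carry pass; the final carry is appended as an extra limb."""
    out, carry = [], 0
    for v in limbs:
        v += carry
        out.append(v & LIMB_MASK)
        carry = v >> RADIX_BITS
    out.append(carry)
    return out


# Three operands are accumulated before any carry handling takes place.
a = to_limbs(2**100 - 1, 4)
acc = lazy_add(lazy_add(a, a), a)    # limbs now exceed 2^28 but still fit in 32 bits
canonical = normalize(acc)           # carries are resolved once, at the end
```

In a canonical (non-redundant) 32-bit representation, each addition would have to feed a carry from one word into the next, which serializes the computation. Deferring that work to one `normalize` call is the property the abstract refers to as avoiding carry propagation.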
Index Terms
SIKE in 32-bit ARM Processors Based on Redundant Number System for NIST Level-II