Abstract
Driven by the artificial intelligence and computer vision industries, Graphics Processing Units (GPUs) are rapidly gaining extraordinary computing power. In particular, the NVIDIA Tegra K1/X1/X2 embedded GPU platforms, which are also treated as edge computing devices, are now widely used in embedded environments such as mobile phones, game consoles, and vehicle-mounted systems to support high-dimension display, auto-pilot, and so on. Meanwhile, with the rise of the Internet of Things (IoT), the demand for cryptographic operations for secure communication and authentication between edge computing nodes and IoT devices is also expanding. In this contribution, instead of the conventional implementations based on FPGAs, ASICs, and ARM CPUs, we provide an alternative solution for cryptographic implementation on embedded GPU devices. Targeting the new cipher suite added in TLS 1.3, we implement Edwards25519/448 and Curve25519/448 on an edge computing platform, the NVIDIA Tegra X2 embedded GPU, with various performance optimizations customized for the target platform, including a novel parallel method for register-limited embedded GPUs. With about 15 W of power consumption, our implementation provides 210k/31k ops/s of Curve25519/448 scalar multiplication, 834k/123k ops/s of fixed-point Edwards25519/448 scalar multiplication, and 150k/22k ops/s of unknown-point Edwards25519/448 scalar multiplication, which are respectively the primitives and main workloads of key agreement, signature generation, and signature verification in the TLS 1.3 protocol. Our implementations achieve an 8 to 26 times speedup over OpenSSL running on the powerful ARM CPU of the same platform, and outperform the state-of-the-art FPGA implementations by a wide margin with better power efficiency.
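For readers unfamiliar with the key-agreement primitive named above, the following is a minimal reference sketch of Curve25519 scalar multiplication (the X25519 function) in Python, following the Montgomery-ladder formulas given in RFC 7748. It is not the GPU implementation described in the abstract: the function names (`x25519`, `decode_scalar`, `decode_u`, `cswap`) are illustrative choices, it relies on Python's arbitrary-precision integers rather than limb-level arithmetic, and the branch-based swap is not constant-time, so it is suitable only for understanding the arithmetic.

```python
# Reference sketch of X25519 (RFC 7748). Illustrative only, not constant-time.
P = 2**255 - 19      # field prime of Curve25519
A24 = 121665         # (486662 - 2) / 4, the ladder constant from RFC 7748


def decode_scalar(k: bytes) -> int:
    """Clamp a 32-byte scalar as RFC 7748 requires and decode it little-endian."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    """Decode a 32-byte u-coordinate, masking the unused top bit."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def cswap(swap: int, a: int, b: int):
    """Conditional swap. A real implementation would do this without branching."""
    return (b, a) if swap else (a, b)


def x25519(k: bytes, u: bytes) -> bytes:
    """Compute the u-coordinate of k * (point with u-coordinate u)."""
    k_int = decode_scalar(k)
    x1 = decode_u(u)
    x2, z2 = 1, 0        # projective representation of the point at infinity
    x3, z3 = x1, 1       # projective representation of the input point
    swap = 0
    for t in reversed(range(255)):
        k_t = (k_int >> t) & 1
        swap ^= k_t
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = k_t
        # One combined differential addition and doubling step.
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    # Convert back to affine form: x2 / z2 via Fermat inversion.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


if __name__ == "__main__":
    base = (9).to_bytes(32, "little")     # standard base point u = 9
    alice_sk = bytes(range(32))           # toy secrets, for demonstration only
    bob_sk = bytes(range(32, 64))
    alice_pk = x25519(alice_sk, base)
    bob_pk = x25519(bob_sk, base)
    # Both sides derive the same shared secret.
    print(x25519(alice_sk, bob_pk) == x25519(bob_sk, alice_pk))
```

Each ladder iteration performs a fixed sequence of field multiplications and squarings regardless of the scalar bit, which is why many independent scalar multiplications can be batched across parallel threads; how the paper maps this work onto the Tegra X2 is not shown here.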
EC-ECC: Accelerating Elliptic Curve Cryptography for Edge Computing on Embedded GPU TX2