ZPrize competition 2022
Zero-knowledge cryptography is a groundbreaking new technology enabling privacy, interoperability, and scalability for Web 3.0 protocols and applications. Like the DARPA Grand Challenge, select teams will compete for monetary prizes spanning a range of categories. Over $7M in prize money has been committed, along with token grants from sponsoring projects.
All winning submissions will become open-source libraries for the benefit of all. We hope this foundation can support the next generation of decentralized protocols and applications to enable secure, interoperable, and scalable applications for the next-generation web.
With Gautam Botrel from Consensys, we participated in the inaugural edition of ZPrize (2022) in the open division: “ACCELERATING MSM ON MOBILE” as the gnark team. Our submission resulted in a 4.2x speedup over the baseline and won the first place. The prize was $259,147.
ACCELERATING MSM ON MOBILE
Proving on mobile devices is a transformative technology for applications including private transactions, self-sovereign identity, and scalable computation. Multi-scalar multiplication (MSM) is a core operation for producing zkSNARKs, and it is the primary bottleneck and barrier for deployment of proving on mobile. This prize will focus on minimizing latency of computing MSM in native mobile applications.
The goal was to write the fastest MSM over BLS12-377 curve on a mobile device (Samsung Galaxy A13 5G). The specification details can be found here.
Our submission was written in Go, and derived from the gnark-crypto project. On the target device, for a random BLS12-377 G1 MSM ($n = 2^{16}$), we measured:
- ~2309ms for the reference benchmark
- ~509ms for our submission
This is a -77.9% optimisation.
Code and techniques
The code can be found on github here under Apache-2 or MIT license.
We experimented several approaches; here is a description of the key findings for the final ones:
- It uses our in-house
gnark-crypto/bls12377
package, which out of the box performs very well compared to the zprize baseline. The MSM algorithm is described in this paper (to appear in TCHES, Issue 3) and the Go code is documented. We introduced a bls12-377 algorithmic optimization; the “bucket/pippenger” method now uses an optimized twisted edwards extended cordinate system, resulting in a significant performance improvement (~30% on some target). - We perform a static build targetting a 64bit arm linux architecture, which allows without a complicated build procress to run 64bit code on the target device. We copy the output in the armv7 (32bit) destination folder; in a production deployment, Java calling code must at runtime check for the actual CPU architecture and switch to a fallback if it’s 32bit (outside of the scope of the challenge). Note that while the submission spawn a process at each msm call, other ways may turn out more efficient (allocate the verifying key on the stack, communicate with the process with unix sockets, …).
- We hand tuned the field arithmetic for the Multiplication targetting the arm64 architecture. Our pure-go version performed better than the arm assembly one, and resulted in a ~20% speed up on some platforms compared to existing version in gnark-crypto.
- We implemented and optimized a dedicated Squaring algorithm (rather than calling the Multiplication as in gnark-crypto) following our previous work https://hackmd.io/@gnark/modular_multiplication (also described in the TCHES paper), which resulted in significant perf improvement on the target device. This is not used in the twisted edwards extended MSM, only in the parameterized Jacobian version which uses Affine points as input (branch: buckets/jacobian, performance: ~600ms for $2^16$).
- For the target (arm64) we add ~40lines of arm assembly for a small function (
fp.Butterfly(a, b) -> a = a + b; b = a - b
). The perf impact is ~5%, as it speeds up a bit the UnifiedMixedAdd point addition in the buckets (msm). The rest of the submission is compiled from pure Go code; - On this device, our GPU experimentations were not promising.
- We raised an issue to the Golang team. Once the fix is merged into the latest Golang compiler release, we might squeeze an extra 5-10% perf improvement.
Results
The results were annouced on 04-04-2023 in this blog post.
The main sponsor (Ocelot) authored a more detailed announcement for the specific Mobile division in this post.