Introducing Nova (ICLR’25), foundation models for binary/assembly code. We have also released fine-tuned models for binary code decompilation. Preprint: arxiv.org/pdf/2311.13721. This is our follow-up work on binary analysis after our CCS'24 distinguished paper (https://www.linkedin.com/posts/lintan_resym-harnessing-llms-to-recover-variable-activity-7231749452154159105-sEgj).
Highlights:
1. Nova is built with hierarchical attention specially designed for binary code, together with contrastive learning.
2. Nova is pre-trained on 3B binary and source code tokens.
3. Models: lt-asset/nova-6.7b and lt-asset/nova-6.7b-bcr
4. Smaller 1.3B models: lt-asset/nova-1.3b and lt-asset/nova-1.3b-bcr
Binaries are a form of code. Do not forget about binaries when you work on #LLM4Code.
Why binaries and binary models? Binary code plays an irreplaceable role in crucial tasks, including vulnerability detection, malware detection, binary recovery, and legacy software maintenance. For example, when identifying attacks and malware, security analysts often have access only to assembly, i.e., the human-readable representation of binary code, which is extremely difficult to understand. Combined with the increasing sophistication of cybercrime, which poses significant threats worldwide (e.g., cybercrime is predicted to cost the world $10.5 trillion annually by 2025 (Sausalito, 2020)), this puts effective binary analysis techniques in high demand.
#LLM4Code #LLM #BinaryAnalysis #Security
@jiang719 Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Xiangyu Zhang, @pbabkin