R2E is a framework that turns any GitHub repository into a programming agent environment. It does so through a novel equivalence test generation technique that combines program analysis with LLMs. These environments can be used to benchmark both static LLMs and dynamic programming agents that interact with an interpreter on real-world (potentially unseen) codebases. R2E environments can also be used to improve LLMs themselves, by fine-tuning models on execution traces from such real-world codebases.
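To make the idea of an equivalence test concrete, the following is a minimal sketch, not R2E's actual harness format. It assumes that an equivalence test checks a candidate implementation against the repository's original function on shared inputs, instead of asserting hard-coded expected outputs. All names here (`reference_slugify`, `candidate_slugify`, `EquivalenceTest`, `run_equivalence_suite`) are hypothetical and chosen only for illustration.

```python
import unittest


def reference_slugify(text: str) -> str:
    """Hypothetical stand-in for the original function found in a repository."""
    return "-".join(text.lower().split())


def candidate_slugify(text: str) -> str:
    """Hypothetical stand-in for an LLM- or agent-written replacement."""
    words = [w for w in text.lower().split(" ") if w]
    return "-".join(words)


class EquivalenceTest(unittest.TestCase):
    # Inputs are illustrative; a real harness would derive them from the codebase.
    INPUTS = ["Hello World", "  leading and trailing  ", "", "MiXeD   Case"]

    def test_candidate_matches_reference(self):
        # No expected outputs are written down: the reference is the oracle.
        for text in self.INPUTS:
            with self.subTest(text=text):
                self.assertEqual(candidate_slugify(text), reference_slugify(text))


def run_equivalence_suite() -> bool:
    """Run the equivalence test and report whether the candidate passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EquivalenceTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("candidate equivalent on sampled inputs:", run_equivalence_suite())
```

Because the reference implementation acts as the oracle, the test author only needs to supply inputs. The trade-off is that equivalence is established only on the sampled inputs: in this sketch the two functions would disagree on tab-separated text, which none of the listed inputs exercise.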
Strategy | In-File Val | In-File Cov | Out-File Val | Out-File Cov
---|---|---|---|---
Output Pred. | 35.43% | 87.59% | 30.68% | 82.54%
Equivalence | 52.37% | 88.18% | 35.01% | 79.65%
@inproceedings{
jain2024r2e,
title={R2E: Turning any Github Repository into a Programming Agent Environment},
author={Naman Jain and Manish Shetty and Tianjun Zhang and King Han and Koushik Sen and Ion Stoica},
booktitle={ICML 2024},
}