This is the official repository for Empirical Asset Pricing with Large Language Model Agents.
- First, clone the repository, then initialize and update its submodules:
git submodule init; git submodule update
(If you get a permission error, first add an SSH key to your GitHub account.)
- Install dependencies.
Create a new conda environment with Python >= 3.10.6 and install PyTorch (version >= 2.0.0). The repository has been tested with PyTorch 1.10.1; other versions are not guaranteed to work. Then install the remaining dependencies via:
pip install -r requirements.txt
- Download the news dataset.
Download the WSJ dataset and unzip it.
- Build the factor dataset from https://github.com/bkelly-lab/ReplicationCrisis using your CRSP credentials, and download the daily return data for free from https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
- Set up your configuration in config.yaml and enter your API keys there.
- Use analysis.py to produce the analysis report features.
- Use model.py to train the hybrid asset pricing model.
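For the dependency step, a quick sanity check can confirm that the active environment meets the stated minimums (Python >= 3.10.6, PyTorch >= 2.0.0) before running analysis.py or model.py. This is a minimal sketch, not part of the repository: `parse_version` and `check_environment` are hypothetical helpers, and the version parsing is a simplified assumption that only compares the first three numeric components. Pass a different `min_torch` if you target the tested 1.10.1 build.

```python
import re
import sys
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Convert e.g. '2.1.0+cu118' or '3.10' into a padded tuple like (2, 1, 0)."""
    core = text.split("+", 1)[0]  # drop local build tags such as '+cu118'
    numbers: list[int] = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits: '0rc1' -> 0
        if match is None:
            break
        numbers.append(int(match.group()))
    while len(numbers) < 3:
        numbers.append(0)  # pad so '2.1' compares like '2.1.0'
    return tuple(numbers)


def check_environment(min_python: str = "3.10.6", min_torch: str = "2.0.0") -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    current_python = sys.version_info[:3]
    if current_python < parse_version(min_python):
        problems.append(f"Python {'.'.join(map(str, current_python))} < {min_python}")
    try:
        torch_version = metadata.version("torch")  # reads installed package metadata
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < parse_version(min_torch):
            problems.append(f"PyTorch {torch_version} < {min_torch}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment meets the stated minimums.")
```

Reading the version through `importlib.metadata` rather than `import torch` keeps the check fast and avoids initializing CUDA just to inspect a version string.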
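The daily return files from the Ken French data library are plain CSVs wrapped in descriptive text. The sketch below shows one way to read such a file into decimal returns keyed by date; it is not the repository's loader. It assumes the layout commonly used by the daily factor files (a free-text preamble, a header row starting with a comma such as `,Mkt-RF,SMB,HML,RF`, `YYYYMMDD` rows with returns in percent, and a trailing copyright line), so check it against the file you actually download. `parse_french_daily` is a hypothetical name.

```python
import csv
import io
from datetime import date


def parse_french_daily(text: str) -> tuple[list[str], dict[date, list[float]]]:
    """Parse the first table of a Ken French daily CSV into decimal returns.

    Assumed layout: free-text preamble, a header row whose first cell is empty
    (e.g. ',Mkt-RF,SMB,HML,RF'), then YYYYMMDD rows with returns in percent.
    """
    columns: list[str] = []
    rows: dict[date, list[float]] = {}
    for record in csv.reader(io.StringIO(text)):
        fields = [field.strip() for field in record]
        if not columns:
            # Skip the preamble until the header row appears.
            if len(fields) > 1 and fields[0] == "":
                columns = fields[1:]
            continue
        first = fields[0] if fields else ""
        if len(first) == 8 and first.isdigit():
            day = date(int(first[:4]), int(first[4:6]), int(first[6:]))
            # Returns are reported in percent; divide by 100 for decimals.
            rows[day] = [float(value) / 100.0 for value in fields[1:]]
        elif rows:
            break  # first table is finished; later sections or footers are ignored
    return columns, rows
```

To use it, read the unzipped CSV as text (for example with `open(path, encoding="utf-8").read()`) and pass the string in. Stopping at the first non-data line after the table keeps files with several stacked sections from silently overwriting earlier rows.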
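After editing config.yaml, a small check can catch API keys or other settings that were left blank. This is a hypothetical helper, not part of the repository: it assumes config.yaml is a nested YAML mapping loaded with PyYAML's `yaml.safe_load`, makes no assumption about the actual key names, and simply reports every value that is `None` or a blank string.

```python
from typing import Any


def find_unset_values(config: Any, prefix: str = "") -> list[str]:
    """Return dotted paths of entries left empty (None or a blank string)."""
    missing: list[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            missing.extend(find_unset_values(value, path))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            missing.extend(find_unset_values(value, f"{prefix}[{index}]"))
    elif config is None or (isinstance(config, str) and not config.strip()):
        missing.append(prefix)
    return missing


def load_config(path: str = "config.yaml") -> dict:
    """Load the YAML config; an empty file yields an empty dict."""
    import yaml  # PyYAML, imported lazily so find_unset_values works without it

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}
```

For example, `find_unset_values(load_config())` returns an empty list when every entry is filled in, and otherwise lists paths such as a hypothetical `api.key` that still need a value.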
If you use this code in your research, please cite the following paper:
@inproceedings{cheng2025empiricalassetpricinglarge,
title={Empirical Asset Pricing with Large Language Model Agents},
author={Junyan Cheng and Peter Chin},
year={2025},
maintitle={The Thirteenth International Conference on Learning Representations (ICLR)},
booktitle={Advances in Financial AI Workshop},
}