# OpenAlphaDiffract

OpenAlphaDiffract is an open-source implementation of the AlphaDiffract research project. It provides a reproducible pipeline to:
- Build a dataset of crystal structures (CIF files) from the Materials Project
- Simulate powder diffraction patterns from those structures
- Train and evaluate models on the generated dataset

For ease of use, a Hugging Face (HF) endpoint is available: [TODO].

## Inference Quickstart

- [TODO] Hosted inference (probably on HF)
- [TODO] Minimal local install using the trainer container

## Dataset Pipeline Overview

1. Acquire CIFs (Downloader Container)
   - Uses the Materials Project API to fetch crystal structures as CIF files
   - Configurable via `configs/download.yaml`
   - Filters out structures whose conventional cell is not consistent across multiple angle tolerances. As of October 22, 2025, this removes ~4.4% of MP structures.

2. GSAS-II XRD Simulation (Simulator Container)
   - Generates synthetic powder diffraction patterns from the CIFs
   - Configurable via `configs/simulator.yaml` (e.g., instrument file, noise ranges, job parallelism)
   - Writes `.npy` files containing the simulated patterns and their metadata, ready to be consumed by the training system

3. TODO: Training

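The conventional-cell filter in step 1 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the downloader's actual code: it assumes a caller-supplied `conventional_cell(structure, angle_tolerance)` function returning lattice parameters (a, b, c, α, β, γ), which could, for example, be built on pymatgen's `SpacegroupAnalyzer(structure, angle_tolerance=tol).get_conventional_standard_structure()`. The tolerance sweep and comparison thresholds are hypothetical values chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (a, b, c, alpha, beta, gamma): lengths in angstroms, angles in degrees
LatticeParams = Tuple[float, float, float, float, float, float]
CellFn = Callable[[object, float], LatticeParams]

# Hypothetical tolerance sweep in degrees; not the downloader's actual values.
ANGLE_TOLERANCES: Tuple[float, ...] = (1.0, 3.0, 5.0)


def cells_match(p: LatticeParams, q: LatticeParams,
                length_rtol: float = 1e-3, angle_atol: float = 0.1) -> bool:
    """True if two conventional cells agree parameter by parameter."""
    lengths_ok = all(math.isclose(x, y, rel_tol=length_rtol) for x, y in zip(p[:3], q[:3]))
    angles_ok = all(abs(x - y) <= angle_atol for x, y in zip(p[3:], q[3:]))
    return lengths_ok and angles_ok


def is_consistent(structure: object, conventional_cell: CellFn,
                  angle_tolerances: Sequence[float] = ANGLE_TOLERANCES) -> bool:
    """Keep a structure only if every angle tolerance yields the same conventional cell."""
    cells = [conventional_cell(structure, tol) for tol in angle_tolerances]
    return all(cells_match(cells[0], other) for other in cells[1:])


def filter_structures(structures: Sequence[object], conventional_cell: CellFn,
                      angle_tolerances: Sequence[float] = ANGLE_TOLERANCES
                      ) -> Tuple[List[object], float]:
    """Return the consistent structures and the fraction that was filtered out."""
    kept = [s for s in structures if is_consistent(s, conventional_cell, angle_tolerances)]
    dropped = 1.0 - len(kept) / len(structures) if structures else 0.0
    return kept, dropped


if __name__ == "__main__":
    # Toy stand-in for a symmetry finder: "flaky" switches to a hexagonal
    # setting once the tolerance exceeds 3 degrees; "stable" never changes.
    def toy_cell(name: object, tol: float) -> LatticeParams:
        if name == "flaky" and tol > 3.0:
            return (5.0, 5.0, 7.1, 90.0, 90.0, 120.0)
        return (5.0, 5.0, 5.0, 90.0, 90.0, 90.0)

    print(filter_structures(["stable", "flaky"], toy_cell))  # (['stable'], 0.5)
```

Injecting `conventional_cell` keeps the consistency logic separate from whichever symmetry library performs the cell standardization.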
## Training from Scratch Quickstart

Prerequisites:
- Docker and Docker Compose
- A Materials Project API key

> [!WARNING]
> Building the dataset and training a model require significant disk space and compute:
> - Expect to use 1 TB or more of disk space in total to replicate the paper's 100-variation dataset.
> - We recommend running the simulation with ~100 parallel processes. For reference, on [XYZ] this should take [XYZ hours].
> - Training took [XYZ hours] on [XYZ hardware].

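The "100-variation dataset" above presumably means each structure is simulated many times with different settings. The sketch below shows one way such variations could be parameterized, assuming each variation draws its settings uniformly from configured ranges (the README only states that `configs/simulator.yaml` holds noise ranges). The parameter names `noise_scale` and `peak_broadening`, their ranges, and the `sample_variations` helper are hypothetical. Seeding from a CRC32 of the file name keeps draws reproducible across runs, which Python's built-in `hash()` would not, since string hashing is salted per process.

```python
import random
import zlib
from typing import Dict, List, Tuple

# Hypothetical parameter ranges; the real ones are set in configs/simulator.yaml.
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "noise_scale": (0.0, 0.05),
    "peak_broadening": (0.01, 0.1),
}


def sample_variations(cif_name: str, n_variations: int,
                      ranges: Dict[str, Tuple[float, float]] = PARAM_RANGES,
                      base_seed: int = 0) -> List[Dict[str, float]]:
    """Draw n_variations parameter sets for one CIF, uniformly within each range."""
    # Stable per-CIF seed: zlib.crc32 is deterministic across processes.
    rng = random.Random(base_seed ^ zlib.crc32(cif_name.encode("utf-8")))
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_variations)
    ]


if __name__ == "__main__":
    variations = sample_variations("example.cif", n_variations=100)
    print(len(variations), variations[0])
```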
Setup:
1. Copy the environment file and set your API key:
   - `cp .env.example .env`
   - Edit `.env` and set `MP_API_KEY`
   - Optionally, set `UID` and `GID` so the containers write files as your user.

2. Download CIFs:
   - `scripts/download.sh`
   - CIFs are written to `./data/raw_cif`

3. Simulate diffraction patterns:
   - `scripts/simulate.sh`
   - Patterns are written to `./data/dataset`
   - Errors (if any) go to `./data/error_logs`

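After steps 2 and 3, a quick sanity check is to count what landed in each output directory. This minimal sketch relies only on the paths listed above and on the `.cif`/`.npy` extensions; `summarize_outputs` is a hypothetical helper, and because the README does not specify how many `.npy` files each CIF produces, the counts are not expected to match one-to-one.

```python
from pathlib import Path
from typing import Dict


def summarize_outputs(data_dir: str = "./data") -> Dict[str, int]:
    """Count downloaded CIFs, simulated .npy files, and error-log files."""
    root = Path(data_dir)

    def count(subdir: str, pattern: str) -> int:
        path = root / subdir
        # A missing directory counts as zero (e.g. no errors have been logged).
        if not path.is_dir():
            return 0
        return sum(1 for p in path.rglob(pattern) if p.is_file())

    return {
        "cif_files": count("raw_cif", "*.cif"),
        "npy_files": count("dataset", "*.npy"),
        "error_logs": count("error_logs", "*"),
    }


if __name__ == "__main__":
    for name, n in summarize_outputs().items():
        print(f"{name}: {n}")
```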
Notes:
- You can pass extra CLI args to the simulator via `scripts/simulate.sh`, e.g. `--sims_per_file 1 --parallel_jobs 4`
- The default container commands and mounts are defined in `compose.yaml`

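The `--parallel_jobs` option suggests a fan-out of independent per-CIF simulations. The sketch below shows one common way to structure that with `concurrent.futures`, writing one error log per failed file so that a single bad CIF does not abort the run. It is not the simulator's actual code: `run_simulations` is a hypothetical helper, and `simulate_one` stands in for the real GSAS-II call.

```python
import traceback
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, List, Type


def run_simulations(cif_paths: Iterable[Path],
                    simulate_one: Callable[[Path], None],
                    error_dir: Path,
                    parallel_jobs: int = 4,
                    executor_cls: Type[Executor] = ProcessPoolExecutor) -> List[Path]:
    """Run simulate_one over all CIFs with parallel_jobs workers; return the CIFs that failed."""
    error_dir.mkdir(parents=True, exist_ok=True)
    failed: List[Path] = []
    # With ProcessPoolExecutor, simulate_one must be a picklable, module-level function.
    with executor_cls(max_workers=parallel_jobs) as pool:
        futures = {pool.submit(simulate_one, path): path for path in cif_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
            except Exception:
                # Record the traceback in its own log file and keep going.
                (error_dir / f"{path.stem}.log").write_text(traceback.format_exc())
                failed.append(path)
    return failed
```

Processes rather than threads suit CPU-bound simulation work. For a quick local test, passing `executor_cls=ThreadPoolExecutor` avoids the picklability requirement.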
## Project Structure

```
OpenAlphaDiffract/
├── configs/   - Pipeline configuration files
├── docker/    - Container definitions
├── scripts/   - User-facing scripts
└── src/       - Source code for pipeline components
    ├── downloader/
    └── simulator/
```

## Citation

We hope this code is helpful for your work! If you use our code or extend our work, please consider citing our paper: