Commit cceec9c

Merge pull request #97 from CompOmics/feature/api-flexibility
Improve API flexibility, input validation, and robustness
2 parents c6d0e9f + f5c7791 commit cceec9c

15 files changed

Lines changed: 382 additions & 228 deletions

README.md

Lines changed: 19 additions & 175 deletions
@@ -5,9 +5,9 @@
 [![Conda](https://img.shields.io/conda/vn/bioconda/deeplc?style=flat-square)](https://bioconda.github.io/recipes/deeplc/README.html)
 [![GitHub Workflow Status](https://flat.badgen.net/github/checks/compomics/deeplc/)](https://github.com/compomics/deeplc/actions/)
 [![License](https://flat.badgen.net/github/license/compomics/deeplc)](https://www.apache.org/licenses/LICENSE-2.0)
-[![Twitter](https://flat.badgen.net/twitter/follow/compomics?icon=twitter)](https://twitter.com/compomics)
 
-DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
+
+DeepLC: Retention time prediction for peptides carrying any modification.
 
 ---
 
@@ -22,21 +22,13 @@ DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
 - [Python module](#python-module)
 - [Input files](#input-files)
 - [Prediction models](#prediction-models)
-- [Q&A](#qa)
-
 ---
 
 ## Introduction
 
-DeepLC is a retention time predictor for (modified) peptides that employs Deep
-Learning. Its strength lies in the fact that it can accurately predict
-retention times for modified peptides, even if hasn't seen said modification
-during training.
+DeepLC is a retention time predictor for peptides. Its strength lies in the fact that it can accurately predict retention times for modified peptides, even if hasn't seen said modification during training.
 
-DeepLC can be used through the
-[web application](https://iomics.ugent.be/deeplc/),
-locally with a graphical user interface (GUI), or as a Python package. In the
-latter case, DeepLC can be used from the command line, or as a Python module.
+DeepLC can be used through the [web application](https://iomics.ugent.be/deeplc/) or as a Python package. In the latter case, DeepLC can be used from the command line, or as a Python module.
 
 ## Citation
 
@@ -53,29 +45,6 @@ If you use DeepLC for your research, please use the following citation:
 Just go to [iomics.ugent.be/deeplc](https://iomics.ugent.be/deeplc/) and get started!
 
 
-### Graphical user interface
-
-#### In an existing Python environment (cross-platform)
-
-1. In your terminal with Python (>=3.7) installed, run `pip install deeplc[gui]`
-2. Start the GUI with the command `deeplc-gui` or `python -m deeplc.gui`
-
-#### Standalone installer (Windows)
-
-[![Download GUI](https://flat.badgen.net/badge/download/GUI/blue)](https://github.com/compomics/DeepLC/releases/latest/)
-
-
-1. Download the DeepLC installer (`DeepLC-...-Windows-64bit.exe`) from the
-[latest release](https://github.com/compomics/DeepLC/releases/latest/)
-2. Execute the installer
-3. If Windows Smartscreen shows a popup window with "Windows protected your PC",
-click on "More info" and then on "Run anyway". You will have to trust us that
-DeepLC does not contain any viruses, or you can check the source code 😉
-4. Go through the installation steps
-5. Start DeepLC!
-
-![GUI screenshot](https://github.com/compomics/DeepLC/raw/master/img/gui-screenshot.png)
-
 
 ### Python package
 
@@ -180,23 +149,24 @@ For a more elaborate example, see
 
 ### Input files
 
-DeepLC expects comma-separated values (CSV) with the following columns:
+DeepLC accepts any PSM file format supported by
+[psm_utils](https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io.html),
+including MaxQuant msms.txt, Sage, MSAmanda, Percolator, and many more. The file
+format is automatically inferred from the file extension, or can be specified
+explicitly with the `--psm-filetype` option.
 
-- `seq`: unmodified peptide sequences
-- `modifications`: MS2PIP-style formatted modifications: Every modification is
-listed as `location|name`, separated by a pipe (`|`) between the location, the
-name, and other modifications. `location` is an integer counted starting at 1
-for the first AA. 0 is reserved for N-terminal modifications, -1 for
-C-terminal modifications. `name` has to correspond to a Unimod (PSI-MS) name.
-- `tr`: retention time (only required for calibration)
+At a minimum, a tab-separated file with a `peptidoform` and `spectrum_id` column
+is accepted. Peptidoforms must be in
+[ProForma 2.0](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771) notation.
+For calibration or fine-tuning, a `retention_time` column is also required.
 
 For example:
 
-```csv
-seq,modifications,tr
-AAGPSLSHTSGGTQSK,,12.1645
-AAINQKLIETGER,6|Acetyl,34.095
-AANDAGYFNDEMAPIEVKTK,12|Oxidation|18|Acetyl,37.3765
+```tsv
+spectrum_id peptidoform retention_time
+0 AAGPSLSHTSGGTQSK/2 12.16
+1 AAINQK[Acetyl]LIETGER/2 34.10
+2 AANDAGYFNDEM[Oxidation]APIEVK[Acetyl]TK/3 37.38
 ```
 
 See
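
Reviewer note on the input-format change: files written for the removed `seq`/`modifications`/`tr` CSV layout need converting to the new `spectrum_id`/`peptidoform`/`retention_time` layout. The following is a minimal sketch of such a conversion using only the Python standard library. It is not part of this commit and not DeepLC's or psm_utils' own converter; the function names `to_proforma` and `convert_old_csv` are hypothetical. It assumes the location rules stated in the removed README text (1-based positions, 0 for N-terminal, -1 for C-terminal) and does not append a charge suffix such as `/2`, because the old format carries no charge information.

```python
import csv
import io


def to_proforma(seq: str, modifications: str) -> str:
    """Turn a bare sequence plus a `location|name|...` string into ProForma-style text."""
    if not modifications:
        return seq
    fields = modifications.split("|")
    if len(fields) % 2:
        raise ValueError(f"Unpaired location/name in {modifications!r}")

    n_term, c_term = "", ""
    by_position: dict[int, list[str]] = {}
    for loc_text, name in zip(fields[0::2], fields[1::2]):
        loc = int(loc_text)
        if loc == 0:  # N-terminal modification
            n_term += f"[{name}]"
        elif loc == -1:  # C-terminal modification
            c_term += f"[{name}]"
        elif 1 <= loc <= len(seq):  # 1-based residue position
            by_position.setdefault(loc, []).append(name)
        else:
            raise ValueError(f"Location {loc} is outside sequence {seq!r}")

    body = "".join(
        aa + "".join(f"[{name}]" for name in by_position.get(i, []))
        for i, aa in enumerate(seq, start=1)
    )
    if n_term:
        body = f"{n_term}-{body}"
    if c_term:
        body = f"{body}-{c_term}"
    return body


def convert_old_csv(old_csv_text: str) -> str:
    """Rewrite old-style CSV text as tab-separated text with the new column names."""
    reader = csv.DictReader(io.StringIO(old_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["spectrum_id", "peptidoform", "retention_time"])
    for index, row in enumerate(reader):
        # The row index stands in for a spectrum identifier (an assumption).
        writer.writerow([index, to_proforma(row["seq"], row["modifications"]), row["tr"]])
    return out.getvalue()


OLD_CSV = (
    "seq,modifications,tr\n"
    "AAGPSLSHTSGGTQSK,,12.1645\n"
    "AAINQKLIETGER,6|Acetyl,34.095\n"
    "AANDAGYFNDEMAPIEVKTK,12|Oxidation|18|Acetyl,37.3765\n"
)
print(convert_old_csv(OLD_CSV))
```

The sample rows are the ones from the removed CSV example, and the resulting peptidoforms match those in the added TSV example apart from the charge suffix and the rounding of retention times.
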
@@ -237,130 +207,4 @@ The different parts refer to:
 
 ## Q&A
 
-**__Q: Is it required to indicate fixed modifications in the input file?__**
-
-Yes, even modifications like carbamidomethyl should be in the input file.
-
-**__Q: So DeepLC is able to predict the retention time for any modification?__**
-
-Yes, DeepLC can predict the retention time of any modification. However, if the
-modification is **very** different from the peptides the model has seen during
-training the accuracy might not be satisfactory for you. For example, if the model
-has never seen a phosphor atom before, the accuracy of the prediction is going to
-be low.
-
-**__Q: Installation fails. Why?__**
-
-Please make sure to install DeepLC in a path that does not contain spaces. Run
-the latest LTS version of Ubuntu or Windows 10. Make sure you have enough disk
-space available, surprisingly TensorFlow needs quite a bit of disk space. If
-you are still not able to install DeepLC, please feel free to contact us:
-
-Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be
-
-**__Q: I have a special usecase that is not supported. Can you help?__**
-
-Ofcourse, please feel free to contact us:
-
-Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be
-
-**__Q: DeepLC runs out of memory. What can I do?__**
-
-You can try to reduce the batch size. DeepLC should be able to run if the batch size is low
-enough, even on machines with only 4 GB of RAM.
-
-**__Q: I have a graphics card, but DeepLC is not using the GPU. Why?__**
-
-For now DeepLC defaults to the CPU instead of the GPU. Clearly, because you want
-to use the GPU, you are a power user :-). If you want to make the most of that expensive
-GPU, you need to change or remove the following line (at the top) in __deeplc.py__:
-
-```
-# Set to force CPU calculations
-os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
-```
-
-Also change the same line in the function __reset_keras()__:
-
-```
-# Set to force CPU calculations
-os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
-```
-
-Either remove the line or change to (where the number indicates the number of GPUs):
-
-```
-# Set to force CPU calculations
-os.environ['CUDA_VISIBLE_DEVICES'] = '1'
-```
-
-**__Q: What modification name should I use?__**
-
-The names from unimod are used. The PSI-MS name is used by default, but the Interim name
-is used as a fall-back if the PSI-MS name is not available. It should be fine as long as it is support by [proforma](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771) and [psm_utils](https://github.com/compomics/psm_utils).
-
-**__Q: I have a modification that is not in unimod. How can I add the modification?__**
-
-Unfortunately since the V3.0 this is not possible any more via the GUI or commandline. You will need to use [psm_utils](https://github.com/compomics/psm_utils), above a minimal example is shown where we convert an identification file into a psm_list which is accepted by DeepLC. Here the sequence can for example include just the composition in proforma format (e.g., SEQUEN[Formula:C12H20O2]CE).
-
-**__Q: Help, all my predictions are between [0,10]. Why?__**
-
-It is likely you did not use calibration. No problem, but the retention times for training
-purposes were normalized between [0,10]. This means that you probably need to adjust the
-retention time yourselve after analysis or use a calibration set as the input.
-
-
-**__Q: What does the option `dict_divider` do?__**
-
-This parameter defines the precision to use for fast-lookup of retention times
-for calibration. A value of 10 means a precision of 0.1 (and 100 a precision of
-0.01) between the calibration anchor points. This parameter does not influence
-the precision of the calibration, but setting it too high might mean that there
-is bad selection of the models between anchor points. A safe value is usually
-higher than 10.
-
-
-**__Q: What does the option `split_cal` do?__**
-
-The option `split_cal`, or split calibration, sets number of divisions of the
-chromatogram for piecewise linear calibration. If the value is set to 10 the
-chromatogram is split up into 10 equidistant parts. For each part the median
-value of the calibration peptides is selected. These are the anchor points.
-Between each anchor point a linear fit is made. This option has no effect when
-the pyGAM generalized additive models are used for calibration.
-
-
-**__Q: How does the ensemble part of DeepLC work?__**
-
-Models within the same directory are grouped if they overlap in their name. The overlap
-has to be in their full name, except for the last part of the name after a "_"-character.
-
-The following models will be grouped:
-
-```
-full_hc_dia_fixed_mods_a.hdf5
-full_hc_dia_fixed_mods_b.hdf5
-```
-
-None of the following models will not be grouped:
-
-```
-full_hc_dia_fixed_mods2_a.hdf5
-full_hc_dia_fixed_mods_b.hdf5
-full_hc_dia_fixed_mods_2_b.hdf5
-```
-
-**__Q: I would like to take the ensemble average of multiple models, even if they are trained on different datasets. How can I do this?__**
-
-Feel free to experiment! Models within the same directory are grouped if they overlap in
-their name. The overlap has to be in their full name, except for the last part of the
-name after a "_"-character.
-
-The following models will be grouped:
-
-```
-model_dataset1.hdf5
-model_dataset2.hdf5
-```
-
-So you just need to rename your models.
+See the [FAQ](https://deeplc.readthedocs.io/en/latest/faq.html) in the documentation.
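
Reviewer note on the removed ensemble Q&A: the grouping rule it describes (same full name except for the part after the last `_`) can be illustrated with a short sketch. This is one reading of the removed README text, not DeepLC's actual implementation, and the helper name `group_models` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_models(paths: list[str]) -> dict[str, list[str]]:
    """Group model files by their name up to (not including) the last underscore."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        stem = Path(path).stem  # file name without the extension
        prefix, separator, _suffix = stem.rpartition("_")
        # Names without any underscore form their own group.
        key = prefix if separator else stem
        groups[key].append(path)
    return dict(groups)


grouped = group_models(
    ["full_hc_dia_fixed_mods_a.hdf5", "full_hc_dia_fixed_mods_b.hdf5"]
)
not_grouped = group_models(
    [
        "full_hc_dia_fixed_mods2_a.hdf5",
        "full_hc_dia_fixed_mods_b.hdf5",
        "full_hc_dia_fixed_mods_2_b.hdf5",
    ]
)
print(grouped)
print(not_grouped)
```

Under this reading, the first pair shares the key `full_hc_dia_fixed_mods`, while the three files in the second list each end up under a different key, which agrees with the examples in the removed text.
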

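
Reviewer note on the removed `split_cal` Q&A: the piecewise linear calibration it outlines (split the chromatogram into equidistant parts, take a median per part as an anchor point, fit linearly between anchors) might look roughly like the sketch below. Several details are assumptions rather than statements from the source: the split is made over the range of predicted values, each anchor is the pair of medians (predicted, observed) within a part, and values outside the outermost anchors are extrapolated along the nearest segment. The function names are hypothetical and unrelated to the `calibrate` function exported in this commit.

```python
from bisect import bisect_right
from statistics import median


def fit_anchor_points(predicted, observed, split_cal):
    """Return (predicted, observed) median pairs for each non-empty equidistant part."""
    low, high = min(predicted), max(predicted)
    width = (high - low) / split_cal
    if width == 0:
        raise ValueError("Predicted values must span a non-zero range")
    parts = [[] for _ in range(split_cal)]
    for pred, obs in zip(predicted, observed):
        # Clamp so the maximum value falls into the last part.
        index = min(int((pred - low) / width), split_cal - 1)
        parts[index].append((pred, obs))
    return [
        (median(p for p, _ in members), median(o for _, o in members))
        for members in parts
        if members
    ]


def apply_piecewise_linear(value, anchors):
    """Map a predicted value onto the observed scale via the surrounding anchors."""
    if len(anchors) < 2:
        raise ValueError("At least two anchor points are needed")
    xs = [x for x, _ in anchors]
    # Pick the segment containing `value`; clamp to the first or last segment.
    i = min(max(bisect_right(xs, value), 1), len(xs) - 1)
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)


# Toy data (invented for illustration): observed = 2 * predicted + 1.
toy_predicted = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_observed = [2 * p + 1 for p in toy_predicted]
toy_anchors = fit_anchor_points(toy_predicted, toy_observed, split_cal=2)
print(toy_anchors, apply_piecewise_linear(4.5, toy_anchors))
```
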
deeplc/__init__.py

Lines changed: 9 additions & 1 deletion
@@ -2,10 +2,18 @@
 
 from importlib.metadata import version
 
-from deeplc.core import finetune, finetune_and_predict, predict, predict_and_calibrate, train
+from deeplc.core import (
+    calibrate,
+    finetune,
+    finetune_and_predict,
+    predict,
+    predict_and_calibrate,
+    train,
+)
 
 __version__: str = version("deeplc")
 __all__: list[str] = [
+    "calibrate",
     "predict",
     "predict_and_calibrate",
     "finetune_and_predict",
