dscript.commands
dscript.commands.embed
See Embedding for full usage details.
Generate new embeddings using pre-trained language model.
-
class dscript.commands.embed.EmbeddingArguments(cmd, device, outfile, seqs, func)[source]
Bases: NamedTuple
-
cmd: str Alias for field number 0
-
device: int Alias for field number 1
-
func: collections.abc.Callable[dscript.commands.embed.EmbeddingArguments, None] Alias for field number 4
-
outfile: str Alias for field number 2
-
seqs: str Alias for field number 3
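The "Alias for field number N" notes follow from the class being a NamedTuple: every field can be read by name or by its position in the constructor signature. A minimal sketch of that behaviour, using a hypothetical stand-in class `EmbeddingArgs` that mirrors the documented field order (it is not imported from dscript, and the file names and the `run` callback are illustrative only):

```python
from typing import Callable, NamedTuple


class EmbeddingArgs(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    cmd: str
    device: int
    outfile: str
    seqs: str
    func: Callable[["EmbeddingArgs"], None]


def run(args: "EmbeddingArgs") -> None:
    """Placeholder callback; the real command handler is not reproduced here."""


# File names below are illustrative, not taken from the documentation.
args = EmbeddingArgs(
    cmd="embed", device=0, outfile="out.h5", seqs="seqs.fasta", func=run
)

# Map each field name to the tuple position it aliases.
field_index = {name: i for i, name in enumerate(EmbeddingArgs._fields)}
```

Here `args.outfile` and `args[2]` refer to the same value, which is what "Alias for field number 2" expresses.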
-
dscript.commands.predict
See Prediction for full usage details.
Make new predictions with a pre-trained model using blocked, multi-GPU pairwise inference. One of --proteins and --pairs is required.
-
class dscript.commands.predict_block.BlockedPredictionArguments(cmd, protins, pairs, model, embeddings, foldseek_fasta, outfile, device, thresh, load_proc, blocks, func)[source]
Bases: NamedTuple
-
blocks: int | None Alias for field number 10
-
cmd: str Alias for field number 0
-
device: str | None Alias for field number 7
-
embeddings: str Alias for field number 4
-
foldseek_fasta: str | None Alias for field number 5
-
func: collections.abc.Callable[dscript.commands.predict_block.BlockedPredictionArguments, None] Alias for field number 11
-
load_proc: int | None Alias for field number 9
-
model: str | None Alias for field number 3
-
outfile: str | None Alias for field number 6
-
pairs: str | None Alias for field number 2
-
protins: str | None Alias for field number 1
-
thresh: float | None Alias for field number 8
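The requirement that one of --proteins and --pairs be given can be pictured as a simple pre-flight check. The sketch below is an illustration only: `check_prediction_inputs` is a hypothetical helper, not part of dscript, and it reads "one of" as "at least one", which is an assumption about the intended behaviour.

```python
from typing import Optional


def check_prediction_inputs(proteins: Optional[str], pairs: Optional[str]) -> None:
    """Hypothetical check: raise if neither input file was supplied.

    Assumes "one of" means "at least one"; whether supplying both is
    accepted by the real command is not stated in this reference.
    """
    if proteins is None and pairs is None:
        raise ValueError("at least one of --proteins or --pairs must be provided")
```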
-
Make new predictions with a pre-trained model using legacy (serial) inference. One of --seqs or --embeddings is required.
-
class dscript.commands.predict_serial.PredictionArguments(cmd, device, embeddings, outfile, seqs, model, thresh, load_proc, func)[source]
Bases: NamedTuple
-
cmd: str Alias for field number 0
-
device: int Alias for field number 1
-
embeddings: str | None Alias for field number 2
-
func: collections.abc.Callable[dscript.commands.predict_serial.PredictionArguments, None] Alias for field number 8
-
load_proc: int | None Alias for field number 7
-
model: str | None Alias for field number 5
-
outfile: str | None Alias for field number 3
-
seqs: str Alias for field number 4
-
thresh: float | None Alias for field number 6
-
dscript.commands.train
See Training for full usage details.
Train a new model.
-
class dscript.commands.train.TrainArguments(cmd, device, train, test, embedding, no_augment, input_dim, projection_dim, dropout, hidden_dim, kernel_width, no_w, no_sigmoid, do_pool, pool_width, num_epochs, batch_size, weight_decay, lr, interaction_weight, run_tt, glider_weight, glider_thresh, outfile, save_prefix, checkpoint, seed, func)[source]
Bases: NamedTuple
-
batch_size: int Alias for field number 16
-
checkpoint: str | None Alias for field number 25
-
cmd: str Alias for field number 0
-
device: int Alias for field number 1
-
do_pool: bool Alias for field number 13
-
dropout: float Alias for field number 8
-
embedding: str Alias for field number 4
-
func: collections.abc.Callable[dscript.commands.train.TrainArguments, None] Alias for field number 27
-
glider_thresh: float Alias for field number 22
-
glider_weight: float Alias for field number 21
-
hidden_dim: int Alias for field number 9
-
input_dim: int Alias for field number 6
-
interaction_weight: float Alias for field number 19
-
kernel_width: int Alias for field number 10
-
lr: float Alias for field number 18
-
no_augment: bool Alias for field number 5
-
no_sigmoid: bool Alias for field number 12
-
no_w: bool Alias for field number 11
-
num_epochs: int Alias for field number 15
-
outfile: str | None Alias for field number 23
-
pool_width: int Alias for field number 14
-
projection_dim: int Alias for field number 7
-
run_tt: bool Alias for field number 20
-
save_prefix: str | None Alias for field number 24
-
seed: int | None Alias for field number 26
-
test: str Alias for field number 3
-
train: str Alias for field number 2
-
weight_decay: float Alias for field number 17
-
-
dscript.commands.train.interaction_eval(model, test_iterator, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source] Evaluate test data set performance.
- Parameters
model (dscript.models.interaction.ModelInteraction) – Model to be evaluated
test_iterator (torch.utils.data.DataLoader) – Test data iterator
tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings
use_cuda (bool) – Whether to use GPU
- Returns
(Loss, number correct, mean square error, precision, recall, F1 Score, AUPR)
- Return type
(torch.Tensor, int, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor)
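For orientation, several of the returned quantities have standard textbook definitions. The sketch below computes number correct, mean square error, precision, recall, and F1 from plain Python lists. It is not the library's implementation: `binary_metrics` is a hypothetical helper, the 0.5 decision threshold is an assumption, and loss and AUPR are left out.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Hypothetical helper: standard binary classification metrics.

    labels: list of 0/1 ground-truth interaction labels
    probs:  list of predicted interaction probabilities
    threshold: assumed cut-off for calling a pair "interacting"
    """
    preds = [1 if p >= threshold else 0 for p in probs]

    tp = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 1)
    fp = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 0)

    correct = sum(1 for y, q in zip(labels, preds) if y == q)
    mse = sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)

    # Guard against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "correct": correct,
        "mse": mse,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```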
-
dscript.commands.train.interaction_grad(model, n0, n1, y, tensors, accuracy_weight=0.35, run_tt=False, glider_weight=0, glider_map=None, glider_mat=None, use_cuda=True, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source] Compute gradient and backpropagate loss for a batch.
- Parameters
model (dscript.models.interaction.ModelInteraction) – Model to be trained
n0 (list[str]) – First protein names
n1 (list[str]) – Second protein names
y (torch.Tensor) – Interaction labels
tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings
accuracy_weight (float) – Weight on the accuracy objective. The representation loss is weighted by \(1 - \text{accuracy_weight}\).
run_tt (bool) – Use GLIDE top-down supervision
glider_weight (float) – Weight on the GLIDE objective loss. Accuracy loss is \((\text{GLIDER_BCE}*\text{glider_weight}) + (\text{D-SCRIPT_BCE}*(1-\text{glider_weight}))\).
glider_map (dict[str, int]) – Map from protein identifier to index
glider_mat (np.ndarray) – Matrix with pairwise GLIDE scores
use_cuda (bool) – Whether to use GPU
- Returns
(Loss, number correct, mean square error, batch size)
- Return type
(torch.Tensor, int, torch.Tensor, int)
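The two weights described above combine the loss terms as convex mixtures. The following sketch works through that arithmetic with plain floats. `combined_loss` is a hypothetical helper, the input values are made up, and how each individual loss term is computed inside dscript is not shown here.

```python
def combined_loss(
    dscript_bce,
    glider_bce,
    representation_loss,
    accuracy_weight=0.35,
    glider_weight=0.0,
):
    """Hypothetical helper mirroring the weighting described in the parameters.

    accuracy loss = GLIDER_BCE * glider_weight + D-SCRIPT_BCE * (1 - glider_weight)
    total loss    = accuracy_weight * accuracy loss
                    + (1 - accuracy_weight) * representation loss
    """
    accuracy_loss = glider_bce * glider_weight + dscript_bce * (1.0 - glider_weight)
    return (
        accuracy_weight * accuracy_loss
        + (1.0 - accuracy_weight) * representation_loss
    )


# Illustrative values only.
without_glide = combined_loss(0.8, 0.4, 0.2)
with_glide = combined_loss(0.8, 0.4, 0.2, glider_weight=0.5)
```

With the default `glider_weight=0`, the GLIDE term drops out and the accuracy loss reduces to the D-SCRIPT binary cross-entropy alone.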
-
dscript.commands.train.predict_cmap_interaction(model, n0, n1, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source] Predict whether a list of protein pairs will interact, as well as their contact map.
- Parameters
model (dscript.models.interaction.ModelInteraction) – Model used for prediction
n0 (list[str]) – First protein names
n1 (list[str]) – Second protein names
tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings
use_cuda (bool) – Whether to use GPU
-
dscript.commands.train.predict_interaction(model, n0, n1, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source] Predict whether a list of protein pairs will interact.
- Parameters
model (dscript.models.interaction.ModelInteraction) – Model used for prediction
n0 (list[str]) – First protein names
n1 (list[str]) – Second protein names
tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings
use_cuda (bool) – Whether to use GPU
dscript.commands.evaluate
See Evaluation for full usage details.
Evaluate a trained model.
-
class dscript.commands.evaluate.EvaluateArguments(cmd, device, model, embedding, test, func)[source]
Bases: NamedTuple
-
cmd: str Alias for field number 0
-
device: int Alias for field number 1
-
embedding: str Alias for field number 3
-
func: collections.abc.Callable[dscript.commands.evaluate.EvaluateArguments, None] Alias for field number 5
-
model: str Alias for field number 2
-
test: str Alias for field number 4
-
-
dscript.commands.evaluate.get_foldseek_onehot(n0, size_n0, fold_record, fold_vocab)[source] fold_record is a dictionary of the form {ensembl_gene_name => foldseek_sequence}.
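As a rough picture of what a one-hot encoding of a Foldseek sequence looks like, the sketch below builds a `size` x `len(fold_vocab)` matrix as nested lists. Everything here is an assumption made for illustration: `onehot_3di` is a hypothetical helper rather than the library function, `fold_vocab` is assumed to map each 3Di character to a column index, and the real function's return type and its handling of proteins missing from `fold_record` are not documented above.

```python
def onehot_3di(name, size, fold_record, fold_vocab):
    """Hypothetical one-hot encoder for a Foldseek (3Di) sequence.

    name:        protein identifier, used as a key into fold_record
    size:        expected sequence length (number of rows)
    fold_record: dict mapping protein name to its Foldseek sequence
    fold_vocab:  assumed dict mapping each 3Di character to a column index
    """
    seq = fold_record[name]
    if len(seq) != size:
        raise ValueError(f"expected length {size}, got {len(seq)}")

    matrix = [[0] * len(fold_vocab) for _ in range(size)]
    for row, char in zip(matrix, seq):
        row[fold_vocab[char]] = 1  # exactly one 1 per residue
    return matrix


# Toy three-letter vocabulary; a real 3Di alphabet is larger.
toy_vocab = {"A": 0, "D": 1, "V": 2}
toy_record = {"P1": "DVA"}
encoded = onehot_3di("P1", 3, toy_record, toy_vocab)
```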
-
dscript.commands.evaluate.plot_eval_predictions(labels, predictions, path='figure')[source] Plot histogram of positive and negative predictions, precision-recall curve, and receiver operating characteristic curve.
- Parameters
labels (np.ndarray) – Labels
predictions (np.ndarray) – Predicted probabilities
path (str) – File prefix for plots to be saved to [default: figure]
dscript.commands.extract_3di
See Extract 3Di for full usage details.
-
class dscript.commands.extract_3di.Extract3DiArguments(cmd, pdb_directory, out_file, foldseek_path, func)[source]
Bases: NamedTuple
-
cmd: str Alias for field number 0
-
foldseek_path: str Alias for field number 3
-
func: collections.abc.Callable[dscript.commands.extract_3di.Extract3DiArguments, None] Alias for field number 4
-
out_file: str Alias for field number 2
-
pdb_directory: str Alias for field number 1
-