dscript.commands

dscript.commands.embed

See Embedding for full usage details.

Generate new embeddings using a pre-trained language model.

class dscript.commands.embed.EmbeddingArguments(cmd, device, outfile, seqs, func)[source]

Bases: NamedTuple

cmd: str

Alias for field number 0

device: int

Alias for field number 1

func: collections.abc.Callable[dscript.commands.embed.EmbeddingArguments, None]

Alias for field number 4

outfile: str

Alias for field number 2

seqs: str

Alias for field number 3
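
The "Alias for field number N" entries reflect that each argument container is a NamedTuple, so every field can be read either by name or by position. A minimal sketch of that behaviour follows. `EmbeddingArgs` and `run` are hypothetical stand-ins that only mirror the field order documented above; they are not imported from dscript, and the example values are made up.

```python
from collections.abc import Callable
from typing import NamedTuple


class EmbeddingArgs(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    cmd: str
    device: int
    outfile: str
    seqs: str
    func: Callable[..., None]


def run(args: EmbeddingArgs) -> None:
    # Placeholder handler; a real command would do its work here.
    print(f"{args.cmd}: {args.seqs} -> {args.outfile} (device {args.device})")


args = EmbeddingArgs("embed", 0, "out.h5", "seqs.fasta", run)

# Named and positional access return the same value.
same_value = args.outfile == args[2]

# NamedTuples are immutable; _replace returns a modified copy.
other = args._replace(device=1)

# The func field lets a caller dispatch without knowing the command.
other.func(other)
```

Because the containers are immutable, a modified configuration is always a new object, which keeps the arguments passed to `func` from changing under the caller.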

dscript.commands.predict

See Prediction for full usage details.

Make new predictions with a pre-trained model using blocked, multi-GPU pairwise inference. One of --proteins and --pairs is required.

class dscript.commands.predict_block.BlockedPredictionArguments(cmd, protins, pairs, model, embeddings, foldseek_fasta, outfile, device, thresh, load_proc, blocks, func)[source]

Bases: NamedTuple

blocks: int | None

Alias for field number 10

cmd: str

Alias for field number 0

device: str | None

Alias for field number 7

embeddings: str

Alias for field number 4

foldseek_fasta: str | None

Alias for field number 5

func: collections.abc.Callable[dscript.commands.predict_block.BlockedPredictionArguments, None]

Alias for field number 11

load_proc: int | None

Alias for field number 9

model: str | None

Alias for field number 3

outfile: str | None

Alias for field number 6

pairs: str | None

Alias for field number 2

protins: str | None

Alias for field number 1

thresh: float | None

Alias for field number 8
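
The "one of the two options is required" rule stated for blocked prediction can be enforced after parsing. The sketch below shows one way to do that with argparse; it is not the library's own parser. The program name, help strings and the `parse` helper are hypothetical, and the sketch only rejects the case where neither option is given, since the text does not say whether both may be combined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser with only the two options discussed above.
    parser = argparse.ArgumentParser(prog="predict-sketch")
    parser.add_argument("--proteins", help="hypothetical: file listing proteins")
    parser.add_argument("--pairs", help="hypothetical: file listing candidate pairs")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Both options default to None, matching the `str | None` field types.
    if args.proteins is None and args.pairs is None:
        # parser.error prints a usage message and exits with status 2.
        parser.error("one of --proteins and --pairs is required")
    return args


args = parse(["--pairs", "pairs.tsv"])
print(args.pairs, args.proteins)
```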

Make new predictions with a pre-trained model using legacy (serial) inference. One of --seqs or --embeddings is required.

class dscript.commands.predict_serial.PredictionArguments(cmd, device, embeddings, outfile, seqs, model, thresh, load_proc, func)[source]

Bases: NamedTuple

cmd: str

Alias for field number 0

device: int

Alias for field number 1

embeddings: str | None

Alias for field number 2

func: collections.abc.Callable[dscript.commands.predict_serial.PredictionArguments, None]

Alias for field number 8

load_proc: int | None

Alias for field number 7

model: str | None

Alias for field number 5

outfile: str | None

Alias for field number 3

seqs: str

Alias for field number 4

thresh: float | None

Alias for field number 6

dscript.commands.train

See Training for full usage details.

Train a new model.

class dscript.commands.train.TrainArguments(cmd, device, train, test, embedding, no_augment, input_dim, projection_dim, dropout, hidden_dim, kernel_width, no_w, no_sigmoid, do_pool, pool_width, num_epochs, batch_size, weight_decay, lr, interaction_weight, run_tt, glider_weight, glider_thresh, outfile, save_prefix, checkpoint, seed, func)[source]

Bases: NamedTuple

batch_size: int

Alias for field number 16

checkpoint: str | None

Alias for field number 25

cmd: str

Alias for field number 0

device: int

Alias for field number 1

do_pool: bool

Alias for field number 13

dropout: float

Alias for field number 8

embedding: str

Alias for field number 4

func: collections.abc.Callable[dscript.commands.train.TrainArguments, None]

Alias for field number 27

glider_thresh: float

Alias for field number 22

glider_weight: float

Alias for field number 21

hidden_dim: int

Alias for field number 9

input_dim: int

Alias for field number 6

interaction_weight: float

Alias for field number 19

kernel_width: int

Alias for field number 10

lr: float

Alias for field number 18

no_augment: bool

Alias for field number 5

no_sigmoid: bool

Alias for field number 12

no_w: bool

Alias for field number 11

num_epochs: int

Alias for field number 15

outfile: str | None

Alias for field number 23

pool_width: int

Alias for field number 14

projection_dim: int

Alias for field number 7

run_tt: bool

Alias for field number 20

save_prefix: str | None

Alias for field number 24

seed: int | None

Alias for field number 26

test: str

Alias for field number 3

train: str

Alias for field number 2

weight_decay: float

Alias for field number 17

dscript.commands.train.interaction_eval(model, test_iterator, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source]

Evaluate test data set performance.

Parameters
  • model (dscript.models.interaction.ModelInteraction) – Model to be evaluated

  • test_iterator (torch.utils.data.DataLoader) – Test data iterator

  • tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings

  • use_cuda (bool) – Whether to use GPU

Returns

(Loss, number correct, mean square error, precision, recall, F1 Score, AUPR)

Return type

(torch.Tensor, int, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor)
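
For readers unfamiliar with the metrics in the return tuple, the following plain-Python sketch shows how number correct, mean square error, precision, recall, F1 and AUPR relate to a list of labels and predicted probabilities. It illustrates the metric definitions only and is not the function's implementation: the 0.5 decision threshold, the use of average precision as the AUPR estimate, and the helper names are all assumptions, and the real function returns tensors rather than floats.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Return (n_correct, mse, precision, recall, f1) for 0/1 labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 1)
    fp = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 0)
    n_correct = sum(1 for y, q in zip(labels, preds) if y == q)
    mse = sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return n_correct, mse, precision, recall, f1


def average_precision(labels, probs):
    """Estimate the area under the precision-recall curve (ties ignored)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos if n_pos else 0.0


# Toy data, made up for illustration.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.6, 0.8, 0.3, 0.1]
n_correct, mse, precision, recall, f1 = binary_metrics(labels, probs)
aupr = average_precision(labels, probs)
print(n_correct, mse, precision, recall, f1, aupr)
```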

dscript.commands.train.interaction_grad(model, n0, n1, y, tensors, accuracy_weight=0.35, run_tt=False, glider_weight=0, glider_map=None, glider_mat=None, use_cuda=True, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source]

Compute gradient and backpropagate loss for a batch.

Parameters
  • model (dscript.models.interaction.ModelInteraction) – Model to be trained

  • n0 (list[str]) – First protein names

  • n1 (list[str]) – Second protein names

  • y (torch.Tensor) – Interaction labels

  • tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings

  • accuracy_weight (float) – Weight on the accuracy objective. The representation loss is weighted by \(1 - \text{accuracy_weight}\).

  • run_tt (bool) – Use GLIDE top-down supervision

  • glider_weight (float) – Weight on the GLIDE objective loss. Accuracy loss is \((\text{GLIDER_BCE}*\text{glider_weight}) + (\text{D-SCRIPT_BCE}*(1-\text{glider_weight}))\).

  • glider_map (dict[str, int]) – Map from protein identifier to index

  • glider_mat (np.ndarray) – Matrix with pairwise GLIDE scores

  • use_cuda (bool) – Whether to use GPU

Returns

(Loss, number correct, mean square error, batch size)

Return type

(torch.Tensor, int, torch.Tensor, int)
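
The weighting described for `accuracy_weight` and `glider_weight` can be written out as two convex combinations. The sketch below uses plain floats in place of tensors and follows the formulas in the parameter descriptions above. The function names and the example loss values are hypothetical, and how the individual loss terms are computed is not shown.

```python
def accuracy_loss(dscript_bce: float, glider_bce: float, glider_weight: float = 0.0) -> float:
    """Blend the GLIDE and D-SCRIPT binary cross-entropy terms."""
    return glider_bce * glider_weight + dscript_bce * (1 - glider_weight)


def total_loss(accuracy: float, representation: float, accuracy_weight: float = 0.35) -> float:
    """Blend the accuracy objective with the representation loss."""
    return accuracy_weight * accuracy + (1 - accuracy_weight) * representation


# Made-up loss values for illustration.
acc = accuracy_loss(dscript_bce=0.8, glider_bce=0.4, glider_weight=0.25)
loss = total_loss(acc, representation=0.2, accuracy_weight=0.35)
print(acc, loss)
```

With the default `glider_weight=0`, the accuracy loss reduces to the D-SCRIPT term alone.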

dscript.commands.train.predict_cmap_interaction(model, n0, n1, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source]

Predict whether a list of protein pairs will interact, as well as their contact map.

Parameters
  • model (dscript.models.interaction.ModelInteraction) – Model used for prediction

  • n0 (list[str]) – First protein names

  • n1 (list[str]) – Second protein names

  • tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings

  • use_cuda (bool) – Whether to use GPU

dscript.commands.train.predict_interaction(model, n0, n1, tensors, use_cuda, allow_foldseek=False, fold_record=None, fold_vocab=None, add_first=True)[source]

Predict whether a list of protein pairs will interact.

Parameters
  • model (dscript.models.interaction.ModelInteraction) – Model used for prediction

  • n0 (list[str]) – First protein names

  • n1 (list[str]) – Second protein names

  • tensors (dict[str, torch.Tensor]) – Dictionary of protein names to embeddings

  • use_cuda (bool) – Whether to use GPU

dscript.commands.train.train_model(args, output)[source]

dscript.commands.evaluate

See Evaluation for full usage details.

Evaluate a trained model.

class dscript.commands.evaluate.EvaluateArguments(cmd, device, model, embedding, test, func)[source]

Bases: NamedTuple

cmd: str

Alias for field number 0

device: int

Alias for field number 1

embedding: str

Alias for field number 3

func: collections.abc.Callable[dscript.commands.evaluate.EvaluateArguments, None]

Alias for field number 5

model: str

Alias for field number 2

test: str

Alias for field number 4

dscript.commands.evaluate.get_foldseek_onehot(n0, size_n0, fold_record, fold_vocab)[source]

fold_record is a dictionary mapping each Ensembl gene name to its Foldseek sequence: {ensembl_gene_name => foldseek_sequence}.
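
As an illustration of what a one-hot encoding of a Foldseek sequence looks like, here is a minimal sketch using nested lists. It is not the library's implementation. It assumes that `fold_vocab` maps each Foldseek character to a column index and that `size_n0` is the protein length; the function name, the toy vocabulary, the gene name, and the length check are all hypothetical.

```python
def foldseek_onehot_sketch(name, length, fold_record, fold_vocab):
    """Return a length x len(fold_vocab) one-hot matrix as nested lists."""
    seq = fold_record[name]
    if len(seq) != length:
        # Assumed behaviour for this sketch; the real handling may differ.
        raise ValueError(f"expected {length} residues, got {len(seq)}")
    matrix = [[0.0] * len(fold_vocab) for _ in range(length)]
    for pos, char in enumerate(seq):
        matrix[pos][fold_vocab[char]] = 1.0  # one active column per residue
    return matrix


# Toy inputs, made up for illustration.
fold_vocab = {"A": 0, "D": 1, "V": 2}
fold_record = {"GENE_TOY": "DAV"}
onehot = foldseek_onehot_sketch("GENE_TOY", 3, fold_record, fold_vocab)
print(onehot)
```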

dscript.commands.evaluate.plot_eval_predictions(labels, predictions, path='figure')[source]

Plot histogram of positive and negative predictions, precision-recall curve, and receiver operating characteristic curve.

Parameters
  • labels (np.ndarray) – Labels

  • predictions (np.ndarray) – Predicted probabilities

  • path (str) – File prefix for plots to be saved to [default: figure]
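
The receiver operating characteristic curve mentioned above is obtained by sweeping a decision threshold over the predicted probabilities. The following plain-Python sketch computes the curve's points and its area. It is independent of the plotting function and is not its implementation: the helper names and toy data are hypothetical, and it assumes both classes are present and that there are no tied predictions.

```python
def roc_points(labels, predictions):
    """Return (false positive rate, true positive rate) pairs."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lowering the threshold admits examples in order of decreasing score.
    order = sorted(range(len(predictions)), key=lambda i: predictions[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, made up for illustration.
labels = [1, 0, 1, 1, 0]
predictions = [0.9, 0.6, 0.8, 0.3, 0.1]
points = roc_points(labels, predictions)
area = auroc(points)
print(points, area)
```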

dscript.commands.extract_3di

See Extract 3Di for full usage details.

class dscript.commands.extract_3di.Extract3DiArguments(cmd, pdb_directory, out_file, foldseek_path, func)[source]

Bases: NamedTuple

cmd: str

Alias for field number 0

foldseek_path: str

Alias for field number 3

func: collections.abc.Callable[dscript.commands.extract_3di.Extract3DiArguments, None]

Alias for field number 4

out_file: str

Alias for field number 2

pdb_directory: str

Alias for field number 1

dscript.commands.extract_3di.add_args(parser)[source]
dscript.commands.extract_3di.main(args)[source]