API
dscript.alphabets
class dscript.alphabets.Alphabet(chars, encoding=None, mask=False, missing=255)
Bases: object
From Bepler & Berger.
- Parameters
chars (byte str) – List of characters in alphabet
encoding (np.ndarray) – Mapping of characters to numbers [default: None]
mask (bool) – Set encoding mask [default: False]
missing (int) – Number to use for a value outside the alphabet [default: 255]
decode(x)
Decode numeric encoding to byte string of this alphabet.
- Parameters
x (np.ndarray) – Numeric encoding
- Returns
Amino acid string
- Return type
byte str
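The round trip between byte strings and numeric codes can be illustrated with a minimal stand-in class. This is a sketch, not the dscript implementation: `SketchAlphabet` is a hypothetical name, the `encode` method and the 256-entry lookup table are assumptions, and only `decode`, `chars`, `encoding`, and `missing` correspond to names documented above.

```python
import numpy as np


class SketchAlphabet:
    """Illustrative stand-in for a byte alphabet (not the dscript class)."""

    def __init__(self, chars, encoding=None, missing=255):
        # Store the alphabet as an array of byte values.
        self.chars = np.frombuffer(chars, dtype=np.uint8)
        # Lookup table: byte value -> code; bytes outside the alphabet map to `missing`.
        self.encoding = np.full(256, missing, dtype=np.uint8)
        if encoding is None:
            # Assumption: with no explicit mapping, characters are numbered in order.
            encoding = np.arange(len(self.chars), dtype=np.uint8)
        self.encoding[self.chars] = encoding

    def encode(self, x):
        """Byte string -> array of numeric codes (assumed counterpart of decode)."""
        return self.encoding[np.frombuffer(x, dtype=np.uint8)]

    def decode(self, x):
        """Array of numeric codes -> byte string; codes must index into the alphabet."""
        return self.chars[x].tobytes()


alpha = SketchAlphabet(b"ACGT")
codes = alpha.encode(b"GATTACA")
text = alpha.decode(codes)
```

In this sketch, a byte that is not in the alphabet encodes to `missing` (255), which cannot be decoded again because it does not index a character.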
class dscript.alphabets.Uniprot21(mask=False)
Bases: dscript.alphabets.Alphabet
Uniprot 21 Amino Acid Encoding.
From Bepler & Berger.
dscript.fasta
dscript.language_model
dscript.language_model.embed_from_fasta(fastaPath, outputPath, device=0, verbose=False)
Embed sequences using pre-trained language model from Bepler & Berger.
- Parameters
fastaPath (str) – Input sequence file (.fasta format)
outputPath (str) – Output embedding file (.h5 format)
device (int) – Compute device to use for embeddings [default: 0]
verbose (bool) – Print embedding progress
dscript.language_model.lm_embed(sequence, use_cuda=False, verbose=True)
Embed a single sequence using pre-trained language model from Bepler & Berger.
- Parameters
sequence (str) – Input sequence to be embedded
use_cuda (bool) – Whether to generate embeddings using GPU device [default: False]
- Returns
Embedded sequence
- Return type
torch.Tensor
dscript.pretrained
dscript.pretrained.get_pretrained(version='human_v1', verbose=True)
Get pre-trained model object.
See the documentation for the most up-to-date list.
lm_v1 - Language model from Bepler & Berger.
human_v1 - Human trained model from D-SCRIPT manuscript.
Default: human_v1
- Parameters
version (str) – Version of pre-trained model to get
- Returns
Pre-trained model
- Return type
dscript.models.*
dscript.pretrained.get_state_dict(version='human_v1', verbose=True)
Download a pre-trained model if it does not already exist on the local device.
- Parameters
version (str) – Version of trained model to download [default: human_v1]
verbose (bool) – Print model download status on stdout [default: True]
- Returns
Path to state dictionary for pre-trained language model
- Return type
str
dscript.glider
dscript.glider.compute_cw_score(p, q, edgedict, ndict, params=None)
Computes the common weighted score between p and q.
- Parameters
p – A node of the graph
q – Another node in the graph
edgedict (dict) – A dictionary with key (p, q) and value w.
ndict (dict) – A dictionary with key p and the value a set {p1, p2, …}
params (None) – Should always be None here
- Returns
A real value representing the score
- Return type
float
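The scoring formula itself is not stated in this reference. As a rough illustration only, the sketch below assumes that a "common weighted" score sums, over every neighbor shared by p and q, the weights of the two edges that connect p and q to that neighbor. The function names `edge_weight` and `cw_score_sketch` are hypothetical, and the real `compute_cw_score` may differ.

```python
def edge_weight(edgedict, a, b):
    """Look up the weight of an undirected edge stored under either key order."""
    if (a, b) in edgedict:
        return edgedict[(a, b)]
    return edgedict[(b, a)]


def cw_score_sketch(p, q, edgedict, ndict):
    """Assumed definition: sum of edge weights to each neighbor common to p and q."""
    common = ndict[p] & ndict[q]
    return float(
        sum(edge_weight(edgedict, p, r) + edge_weight(edgedict, q, r) for r in common)
    )


# Small graph: a-c (0.5), b-c (0.25), a-d (1.0); a and b share only neighbor c.
edgedict = {("a", "c"): 0.5, ("b", "c"): 0.25, ("a", "d"): 1.0}
ndict = {"a": {"c", "d"}, "b": {"c"}, "c": {"a", "b"}, "d": {"a"}}
score = cw_score_sketch("a", "b", edgedict, ndict)
```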
dscript.glider.compute_cw_score_normalized(p, q, edgedict, ndict, params=None)
Computes the common weighted normalized score between p and q.
- Parameters
p – A node of the graph
q – Another node in the graph
edgedict (dict) – A dictionary with key (p, q) and value w.
ndict (dict) – A dictionary with key p and the value a set {p1, p2, …}
params (None) – Should always be None here
- Returns
A real value representing the score
- Return type
float
dscript.glider.create_edge_dict(edgelist)
Creates an edge dictionary with the edge (p, q) as the key, and weight w as the value.
- Parameters
edgelist (list) – A list with elements of form (p, q, w)
- Returns
A dictionary with key (p, q) and value w.
- Return type
dict
dscript.glider.create_neighborhood_dict(edgelist)
Create a dictionary with nodes as keys and the set of neighboring nodes as the value.
- Parameters
edgelist (list) – A list with elements of form (p, q, w)
- Returns
A dictionary with key p and, as its value, the set {p1, p2, p3, …} of neighbors of p
- Return type
dict
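The two dictionary shapes described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library code: the names `edge_dict_sketch` and `neighborhood_dict_sketch` are hypothetical, and the graph is assumed to be undirected, so each endpoint is recorded as a neighbor of the other.

```python
def edge_dict_sketch(edgelist):
    """Map each edge (p, q) to its weight w."""
    return {(p, q): w for p, q, w in edgelist}


def neighborhood_dict_sketch(edgelist):
    """Map each node to the set of its neighbors (assuming an undirected graph)."""
    ndict = {}
    for p, q, _w in edgelist:
        ndict.setdefault(p, set()).add(q)
        ndict.setdefault(q, set()).add(p)
    return ndict


edgelist = [("a", "b", 0.9), ("b", "c", 0.4)]
edges = edge_dict_sketch(edgelist)
neighbors = neighborhood_dict_sketch(edgelist)
```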
dscript.glider.densify(edgelist, dim=None, directed=False)
Given an adjacency list for the graph, computes the adjacency matrix.
- Parameters
edgelist (list) – Graph adjacency list
dim (int) – Number of nodes in the graph
directed (bool) – Whether the graph should be treated as directed
- Returns
Graph as an adjacency matrix
- Return type
np.ndarray
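Converting an edge list to a dense adjacency matrix might look like the following. This is a sketch under assumptions that the reference does not confirm: nodes are integer indices starting at 0, edges are (p, q, w) triples, and a missing `dim` is inferred as the largest index plus one. `densify_sketch` is a hypothetical name.

```python
import numpy as np


def densify_sketch(edgelist, dim=None, directed=False):
    """Build a dense adjacency matrix from (p, q, w) triples with integer nodes."""
    if dim is None:
        # Assumption: node ids are 0-based, so the count is the largest id plus one.
        dim = max(max(p, q) for p, q, _w in edgelist) + 1
    A = np.zeros((dim, dim))
    for p, q, w in edgelist:
        A[p, q] = w
        if not directed:
            # Mirror the entry so the matrix is symmetric for undirected graphs.
            A[q, p] = w
    return A


A_undirected = densify_sketch([(0, 1, 0.5), (1, 2, 2.0)])
A_directed = densify_sketch([(0, 1, 0.5), (1, 2, 2.0)], directed=True)
```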
dscript.glider.get_dim(edgelist)
Given an adjacency list for a graph, returns the number of nodes in the graph.
- Parameters
edgelist (list) – Graph adjacency list
- Returns
Number of nodes in the graph
- Return type
int
dscript.glider.glide_compute_map(pos_df, thres_p=0.9, params={})
Return glide_mat and glide_map.
- Parameters
pos_df (pd.DataFrame) – Dataframe of weighted edges
thres_p (float) – Threshold to treat an edge as positive
params (dict) – Parameters for GLIDE
- Returns
glide_matrix and corresponding glide_map
- Return type
tuple(np.ndarray, dict)
dscript.glider.glide_predict_links(edgelist, X, params={}, thres_p=0.9)
Predicts the most likely links in a graph given an embedding X of a graph. Returns a ranked list of (edges, distances) sorted from closest to furthest.
- Parameters
edgelist – A list with elements of type (p, q, wt)
X – An n × k embedding matrix
params – A dictionary with entries
{
    alpha => real number
    beta => real number
    delta => real number
    loc => String, can be cw for common weighted, l3 for l3 local scoring
    ### To enable ctypes, the following entries should be there ###
    ctypes_on => True  # This key should only be added if ctypes is on (don't add it otherwise)
    so_location => String location of the .so dynamic library
}
dscript.utils
dscript.utils.RBF(D, sigma=None)
Convert distance matrix into similarity matrix using Radial Basis Function (RBF) Kernel.
\(RBF(x,x') = \exp{\frac{-(x - x')^{2}}{2\sigma^{2}}}\)
- Parameters
D (np.ndarray) – Distance matrix
sigma (float) – Bandwidth of RBF Kernel [default: \(\sqrt{\text{max}(D)}\)]
- Returns
Similarity matrix
- Return type
np.ndarray
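The kernel above can be applied element-wise to a distance matrix, where each entry of D plays the role of the distance \(x - x'\). The following is a minimal NumPy sketch of that formula with the documented default bandwidth; `rbf_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def rbf_sketch(D, sigma=None):
    """Element-wise RBF kernel: exp(-D^2 / (2 * sigma^2))."""
    D = np.asarray(D, dtype=float)
    if sigma is None:
        # Documented default bandwidth: square root of the largest distance.
        sigma = np.sqrt(D.max())
    return np.exp(-np.square(D) / (2.0 * sigma ** 2))


D = np.array([[0.0, 2.0], [2.0, 0.0]])
S = rbf_sketch(D)
```

Zero distances map to a similarity of 1, and similarity decays toward 0 as distance grows relative to sigma.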
dscript.utils.augment_data(df)
For all pairs (A B), also add pairs (B A).
- Parameters
df (pd.DataFrame) – Data frame with 3 columns - pair1, pair2, label
- Returns
Augmented data frame
- Return type
pd.DataFrame
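One way to picture this augmentation is to swap the first two columns and append the result to the original frame. The sketch below is an assumption about the behavior rather than the library code: `augment_sketch` is a hypothetical name, columns are addressed by position, and the reversed pairs are appended after the originals.

```python
import pandas as pd


def augment_sketch(df):
    """Append a copy of df with its first two columns swapped, keeping the label."""
    cols = list(df.columns)
    # Rename the two pair columns to each other, then restore the column order.
    swapped = df.rename(columns={cols[0]: cols[1], cols[1]: cols[0]})[cols]
    return pd.concat([df, swapped], ignore_index=True)


pairs = pd.DataFrame({"pair1": ["A", "C"], "pair2": ["B", "D"], "label": [1, 0]})
augmented = augment_sketch(pairs)
```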
dscript.utils.get_local_or_download(destination: str, source: Optional[str] = None)
Return file path destination, and if it does not exist download from source.
- Parameters
destination (str) – Destination path for downloaded file
source (str) – URL to download file from
- Returns
Path of local file
- Return type
str
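The check-then-download pattern described here can be sketched with the standard library alone. This is an illustration, not the dscript implementation: `local_or_download_sketch` is a hypothetical name, and raising `FileNotFoundError` when the file is missing and no source is given is an assumption about how that case might be handled.

```python
import os
import urllib.request


def local_or_download_sketch(destination, source=None):
    """Return destination, downloading it from source first if it is missing."""
    if os.path.exists(destination):
        # File is already present locally; nothing to download.
        return destination
    if source is None:
        # Assumed behavior: fail loudly when there is nothing to download from.
        raise FileNotFoundError(f"{destination} not found and no source URL given")
    urllib.request.urlretrieve(source, destination)
    return destination
```

Calling the function a second time with the same destination returns immediately, since the file now exists locally.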
dscript.utils.load_hdf5_parallel(file_path, keys, n_jobs=-1)
Load keys from hdf5 file into memory.
- Parameters
file_path (str) – Path to hdf5 file
keys (list[str]) – List of keys to get
- Returns
Dictionary with keys and records in memory
- Return type
dict
dscript.utils.plot_eval_predictions(labels, predictions, path='figure')
Plot histogram of positive and negative predictions, precision-recall curve, and receiver operating characteristic curve.
- Parameters
labels (np.ndarray) – Labels
predictions (np.ndarray) – Predicted probabilities
path (str) – File prefix for plots to be saved to [default: figure]