deepmol.models package
Submodules
deepmol.models.base_models module
- create_dense_model(input_dim: int = 1024, n_hidden_layers: int = 1, layers_units: Optional[List[int]] = None, dropouts: Optional[List[float]] = None, activations: Optional[List[str]] = None, batch_normalization: Optional[List[bool]] = None, l1_l2: Optional[List[float]] = None, loss: str = 'binary_crossentropy', optimizer: str = 'adam', metrics: Optional[List[str]] = None)
Builds a dense neural network model.
- Parameters
input_dim (int) – Number of features.
n_hidden_layers (int) – Number of hidden layers.
layers_units (List[int]) – Number of units in each hidden layer.
dropouts (List[float]) – Dropout rate in each hidden layer.
activations (List[str]) – Activation function in each hidden layer.
batch_normalization (List[bool]) – Whether to use batch normalization in each hidden layer.
l1_l2 (List[float]) – L1 and L2 regularization in each hidden layer.
loss (str) – Loss function.
optimizer (str) – Optimizer.
metrics (List[str]) – Metrics to be evaluated by the model during training and testing.
- Returns
model – Dense neural network model.
- Return type
Sequential
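The per-layer arguments above are parallel lists. A minimal sketch of how such lists might be paired up, assuming that entry i of each list configures hidden layer i (this pairing rule and the helper name hidden_layer_specs are illustrative assumptions, not taken from the deepmol source):

```python
from typing import Dict, List


def hidden_layer_specs(n_hidden_layers: int,
                       layers_units: List[int],
                       dropouts: List[float],
                       activations: List[str],
                       batch_normalization: List[bool]) -> List[Dict]:
    """Hypothetical helper: build one config dict per hidden layer."""
    per_layer = {
        "units": layers_units,
        "dropout": dropouts,
        "activation": activations,
        "batch_normalization": batch_normalization,
    }
    # Every list must provide a value for each hidden layer.
    for name, values in per_layer.items():
        if len(values) < n_hidden_layers:
            raise ValueError(f"{name} needs at least {n_hidden_layers} entries")
    # Entry i of each list describes hidden layer i (assumption).
    return [{name: values[i] for name, values in per_layer.items()}
            for i in range(n_hidden_layers)]


specs = hidden_layer_specs(2, [512, 128], [0.5, 0.25],
                           ["relu", "relu"], [True, False])
```

Checking the list lengths up front gives a clearer error than a failure deep inside model construction.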
- make_cnn_model(input_dim: int = 1024, g_noise: float = 0.05, DENSE: int = 128, DROPOUT: float = 0.5, C1_K: int = 8, C1_S: int = 32, C2_K: int = 16, C2_S: int = 32, activation: str = 'relu', loss: str = 'binary_crossentropy', optimizer: str = 'adadelta', learning_rate: float = 0.01, metrics: Union[str, List[str]] = 'accuracy')
Builds a 1D convolutional neural network model.
- Parameters
input_dim (int) – Number of features.
g_noise (float) – Gaussian noise.
DENSE (int) – Number of units in the dense layer.
DROPOUT (float) – Dropout rate.
C1_K (int) – The dimensionality of the output space (i.e. the number of output filters in the convolution) of the first convolutional layer.
C1_S (int) – Kernel size specifying the length of the 1D convolution window of the first convolutional layer.
C2_K (int) – The dimensionality of the output space (i.e. the number of output filters in the convolution) of the second convolutional layer.
C2_S (int) – Kernel size specifying the length of the 1D convolution window of the second convolutional layer.
activation (str) – Activation function of the Conv1D and Dense layers.
loss (str) – Loss function.
optimizer (str) – Optimizer.
learning_rate (float) – Learning rate.
metrics (Union[str, List[str]]) – Metrics to be evaluated by the model during training and testing.
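To get a feel for what the filter counts (C1_K, C2_K) and kernel sizes (C1_S, C2_S) mean, the following sketch computes the size of two stacked 1D convolutions with the default values. It assumes a single-channel input, 'valid' padding, stride 1, a bias term per filter, and no pooling between the two layers. None of these assumptions is confirmed by the documentation above, so the numbers are illustrative only.

```python
def conv1d_output_length(input_length: int, kernel_size: int) -> int:
    # 'valid' padding with stride 1: the window must fit entirely in the input.
    return input_length - kernel_size + 1


def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    # Each filter has kernel_size * in_channels weights plus one bias.
    return filters * (kernel_size * in_channels + 1)


# Defaults from the signature: input_dim=1024, C1_K=8, C1_S=32, C2_K=16, C2_S=32.
len_after_conv1 = conv1d_output_length(1024, 32)
len_after_conv2 = conv1d_output_length(len_after_conv1, 32)
params_conv1 = conv1d_params(in_channels=1, filters=8, kernel_size=32)
params_conv2 = conv1d_params(in_channels=8, filters=16, kernel_size=32)
```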
- rf_model_builder(n_estimators: int = 100, max_features: Union[int, float, str] = 'auto', class_weight: Optional[dict] = None)
Builds a random forest model.
- Parameters
n_estimators (int) – Number of trees in the forest.
max_features (Union[int, float, str]) – The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. - If “auto”, then max_features=sqrt(n_features). - If “sqrt”, then max_features=sqrt(n_features). - If “log2”, then max_features=log2(n_features). - If None, then max_features=n_features.
class_weight (dict) – Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
- Returns
rf_model – Random forest model.
- Return type
RandomForestClassifier
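The max_features rules listed above can be written out as a small function. This is a sketch of the documented rules only, not the scikit-learn implementation; the name resolve_max_features is hypothetical.

```python
import math
from typing import Union


def resolve_max_features(max_features: Union[int, float, str, None],
                         n_features: int) -> int:
    """Return the number of features considered at each split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return int(math.sqrt(n_features))
        if max_features == "log2":
            return int(math.log2(n_features))
        raise ValueError(f"unknown max_features value: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    # A float is treated as a fraction of the available features.
    return int(max_features * n_features)


n_sqrt = resolve_max_features("sqrt", 1024)
n_log2 = resolve_max_features("log2", 1024)
n_frac = resolve_max_features(0.25, 1024)
```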
- svm_model_builder(C: float = 1.0, gamma: Union[str, float] = 'auto', kernel: Union[str, callable] = 'rfb')
Builds a support vector machine model.
- Parameters
C (float) – Penalty parameter C of the error term.
gamma (Union[str, float]) – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If ‘scale’ is passed, 1 / (n_features * X.var()) is used as the value of gamma; if ‘auto’, 1 / n_features is used.
kernel (Union[str, callable]) – Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable.
- Returns
svm_model – Support vector machine model.
- Return type
SVC
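The two string options for gamma described above reduce to simple arithmetic. The sketch below reproduces the documented formulas in plain Python, taking X.var() to be the population variance over all entries of X. The function name resolve_gamma and the toy matrix are illustrative assumptions.

```python
from statistics import pvariance
from typing import List, Union


def resolve_gamma(gamma: Union[str, float], X: List[List[float]]) -> float:
    """Turn the gamma argument into a numeric kernel coefficient."""
    n_features = len(X[0])
    if gamma == "scale":
        # Variance over every entry of X, as X.var() would compute it.
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    if gamma == "auto":
        return 1.0 / n_features
    # A numeric gamma is used as given.
    return float(gamma)


X = [[0.0, 2.0], [2.0, 4.0]]  # 2 samples, 2 features, variance 2.0
gamma_scale = resolve_gamma("scale", X)
gamma_auto = resolve_gamma("auto", X)
```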
deepmol.models.deepchem_models module
- class DeepChemModel(model: Model, model_dir: Optional[str] = None, **kwargs)
Bases: Model
Wrapper class for DeepChem models. It allows DeepChem models to be trained on Dataset objects and evaluated with the metrics in Metrics.
- cross_validate(dataset: Dataset, metric: Metric, splitter: Splitter, transformers: Optional[List[NormalizationTransformer]] = None, folds: int = 3)
Cross validates the model on the specified dataset.
- Parameters
dataset (Dataset) – Dataset to cross validate on.
metric (Metric) – Metric to evaluate the model on.
splitter (Splitter) – Splitter to split the dataset into train and test sets.
transformers (List[Transformer]) – Transformers that the input data has been transformed by.
folds (int) – Number of folds to use for cross validation.
- Returns
A tuple of seven elements: the best model, the train score of the best model, the test score of the best model, the train scores of all folds, the test scores of all folds, the average train score across folds and the average test score across folds.
- Return type
Tuple[DeepChemModel, float, float, List[float], List[float], float, float]
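The return type above lists seven elements. The following sketch shows one way per-fold results could be condensed into such a tuple. It assumes the best model is the one with the highest test score, which the documentation does not state; summarize_folds and the string stand-ins for models are hypothetical.

```python
from typing import Any, List, Tuple

# One entry per fold: (model, train_score, test_score)
FoldResult = Tuple[Any, float, float]


def summarize_folds(results: List[FoldResult]):
    """Condense per-fold results into a seven-element summary tuple."""
    # Assumption: "best" means highest test score.
    best_model, best_train, best_test = max(results, key=lambda r: r[2])
    train_scores = [r[1] for r in results]
    test_scores = [r[2] for r in results]
    avg_train = sum(train_scores) / len(results)
    avg_test = sum(test_scores) / len(results)
    return (best_model, best_train, best_test,
            train_scores, test_scores, avg_train, avg_test)


folds = [("model_0", 1.0, 0.5), ("model_1", 0.75, 0.75), ("model_2", 0.5, 0.25)]
summary = summarize_folds(folds)
```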
- evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False)
Evaluates the performance of the model on the provided dataset.
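- Parameters
dataset (Dataset) – Dataset to evaluate the model on.
metrics (List[Metric]) – Metrics to evaluate the model with.
per_task_metrics (bool) – If True, scores are also returned for each task separately.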
- Returns
- multitask_scores: dict
Dictionary mapping names of metrics to metric scores.
- all_task_scores: dict
If per_task_metrics == True, then returns a second dictionary of scores for each task separately.
- Return type
Tuple[Dict, Dict]
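The two returned dictionaries can be pictured with a plain-Python sketch. It assumes that the multitask score of a metric is the mean of its per-task scores and that the second element is None when per_task_metrics is False. Both are assumptions, and evaluate_sketch and accuracy are hypothetical stand-ins for the real Dataset and Metric objects.

```python
from typing import Callable, Dict, List, Optional, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_sketch(y_true: List[List[int]],
                    y_pred: List[List[int]],
                    metrics: Dict[str, Callable],
                    per_task_metrics: bool = False
                    ) -> Tuple[Dict, Optional[Dict]]:
    """Rows are samples, columns are tasks."""
    n_tasks = len(y_true[0])
    multitask_scores, all_task_scores = {}, {}
    for name, metric in metrics.items():
        # Score each task (column) on its own.
        task_scores = [
            metric([row[t] for row in y_true], [row[t] for row in y_pred])
            for t in range(n_tasks)
        ]
        all_task_scores[name] = task_scores
        # Assumption: the multitask score is the mean over tasks.
        multitask_scores[name] = sum(task_scores) / n_tasks
    return multitask_scores, (all_task_scores if per_task_metrics else None)


y_true = [[1, 0], [0, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [1, 1], [1, 0]]
scores, per_task = evaluate_sketch(y_true, y_pred, {"accuracy": accuracy},
                                   per_task_metrics=True)
```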
- fit(dataset: Dataset) → None
Fits DeepChemModel to data.
- Parameters
dataset (Dataset) – The Dataset to train this model on.
- fit_on_batch(X: Sequence, y: Sequence, w: Sequence)
Fits the model on a batch of data.
- Parameters
X (Sequence) – The input data.
y (Sequence) – The output data.
w (Sequence) – The weights for the data.
- get_num_tasks() → int
Returns the number of tasks of the model.
- Returns
The number of tasks of the model.
- Return type
int
- get_task_type() → str
Returns the task type of the model.
- Returns
The task type of the model.
- Return type
str
- predict(dataset: Dataset, transformers: Optional[List[NormalizationTransformer]] = None) → ndarray
Makes predictions on dataset.
- Parameters
dataset (Dataset) – Dataset to make prediction on.
transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
- Returns
The return value of the predict method of the DeepChem model.
- Return type
np.ndarray
- generate_sequences(epochs: int, train_smiles: List[Union[str, int]])
Function to generate the input/output pairs for SeqToSeq model. Taken from DeepChem tutorials.
- Parameters
epochs (int) – Number of epochs to train the model.
train_smiles (List[Union[str, int]]) – The ids of the samples in the dataset (SMILES strings).
- Return type
Generator that yields one pair of SMILES strings per sample per epoch, epochs × len(train_smiles) pairs in total.
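A generator of this kind can be sketched in a few lines. The version below assumes that input and output are the same SMILES string, as in an autoencoder-style SeqToSeq setup; that detail is an assumption and is not stated in the description above.

```python
from typing import Iterator, List, Tuple


def generate_sequences(epochs: int,
                       train_smiles: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (input, output) pairs, one per sample per epoch."""
    for _ in range(epochs):
        for smiles in train_smiles:
            # Assumption: the model learns to reproduce its own input.
            yield smiles, smiles


pairs = list(generate_sequences(2, ["CCO", "c1ccccc1"]))
```

Because it is a generator, no pair is materialised until the consumer asks for it, so the full epochs × len(train_smiles) sequence never has to sit in memory.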
deepmol.models.ensembles module
- class Ensemble(models: List[Model])
Bases: ABC
Abstract class for ensembles of models.
- evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False, n_classes: int = 2)
Evaluates the performance of this model on specified dataset.
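- Parameters
dataset (Dataset) – Dataset to evaluate the ensemble on.
metrics (List[Metric]) – Metrics to evaluate the ensemble with.
per_task_metrics (bool) – If True, scores are also returned for each task separately.
n_classes (int) – Number of classes.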
- Returns
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.
- class VotingClassifier(models: List[Model], voting: str = 'soft')
Bases: Ensemble
VotingClassifier ensemble. It uses a voting strategy to predict the labels of a dataset.
- predict(dataset: Dataset, proba: bool = False)
Predicts the labels for the specified dataset.
- Parameters
dataset (Dataset) – Dataset object.
proba (bool) – If true, returns the probabilities instead of class labels.
- Returns
final_result – Predicted labels or probabilities.
- Return type
np.ndarray
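The difference between the default 'soft' strategy and majority ('hard') voting can be illustrated in plain Python. This sketch shows the general technique, not the deepmol implementation. The function names and the toy probabilities are hypothetical, and 'hard' is assumed to be the alternative value of voting.

```python
from collections import Counter
from typing import List

# One row of class probabilities per sample.
Probs = List[List[float]]


def soft_vote(model_probs: List[Probs], proba: bool = False):
    """Average the class probabilities of all models."""
    n_models = len(model_probs)
    averaged = [
        [sum(per_class) / n_models for per_class in zip(*rows)]
        for rows in zip(*model_probs)  # rows: one row per model for a sample
    ]
    if proba:
        return averaged
    return [row.index(max(row)) for row in averaged]


def hard_vote(model_probs: List[Probs]) -> List[int]:
    """Each model casts one vote for its most probable class."""
    labels_per_model = [[row.index(max(row)) for row in probs]
                        for probs in model_probs]
    # On a tie, Counter keeps the label that was seen first.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*labels_per_model)]


model_a = [[0.25, 0.75], [1.0, 0.0]]
model_b = [[0.75, 0.25], [0.375, 0.625]]
model_c = [[0.25, 0.75], [0.375, 0.625]]
soft_labels = soft_vote([model_a, model_b, model_c])
hard_labels = hard_vote([model_a, model_b, model_c])
```

On the second sample the two strategies disagree: one very confident model outweighs two mildly confident ones under soft voting, but is outvoted under hard voting.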
deepmol.models.keras_models module
- class KerasModel(model_builder: callable, mode: str = 'classification', model_dir: Optional[str] = None, loss: str = 'binary_crossentropy', optimizer: str = 'adam', learning_rate: float = 0.001, epochs: int = 150, batch_size: int = 10, verbose: int = 0, **kwargs)
Bases: Model
Wrapper class for keras models. It allows these models to be trained on Dataset objects.
- cross_validate(dataset: Dataset, metric: Metric, folds: int = 3)
Cross validates the model on a dataset.
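- Parameters
dataset (Dataset) – Dataset to cross validate on.
metric (Metric) – Metric to evaluate the model on.
folds (int) – Number of folds to use for cross validation.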
- Returns
A tuple of seven elements: the best model, the train score of the best model, the test score of the best model, the train scores of all folds, the test scores of all folds, the average train score across folds and the average test score across folds.
- Return type
Tuple[KerasModel, float, float, List[float], List[float], float, float]
- fit(dataset: Dataset, **kwargs) → None
Fits keras model to data.
- Parameters
dataset (Dataset) – The Dataset to train this model on.
kwargs – Additional arguments to pass to fit method of the keras model.
- predict(dataset: Dataset) → ndarray
Makes predictions on dataset.
- Parameters
dataset (Dataset) – Dataset to make prediction on.
- Returns
The return value of the predict_proba or predict method of the wrapped keras model. If the model has both methods, the return value of predict_proba is always used.
- Return type
np.ndarray
deepmol.models.models module
- class Model(model: Optional[BaseEstimator] = None, model_dir: Optional[str] = None, **kwargs)
Bases: BaseEstimator
Abstract base class for ML/DL models.
- evaluate(dataset: Dataset, metrics: Union[List[Metric], Metric], per_task_metrics: bool = False) → Tuple[Dict, Union[None, Dict]]
Evaluates the performance of this model on specified dataset.
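- Parameters
dataset (Dataset) – Dataset to evaluate the model on.
metrics (Union[List[Metric], Metric]) – Metric or list of metrics to evaluate the model with.
per_task_metrics (bool) – If True, scores are also returned for each task separately.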
- Returns
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.
- fit(dataset: Dataset)
Fits a model on data in a Dataset object.
- Parameters
dataset (Dataset) – the Dataset to train on
- fit_on_batch(X: Sequence, y: Sequence)
Perform a single step of training.
- Parameters
X (np.ndarray) – the inputs for the batch
y (np.ndarray) – the labels for the batch
- static get_model_filename(model_dir: str) → str
Given model directory, obtain filename for the model itself.
- Parameters
model_dir (str) – Path to directory where model is stored.
- Returns
Path to model file.
- Return type
str
- static get_params_filename(model_dir: str) → str
Given model directory, obtain filename for the model parameters.
- Parameters
model_dir (str) – Path to directory where model is stored.
- Returns
Path to file where model parameters are stored.
- Return type
str
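Both static helpers map a model directory to a file path inside it. A minimal sketch of that pattern follows. The two file names are placeholders chosen for illustration; the names deepmol actually uses are not given in this documentation.

```python
import os

# Hypothetical file names, chosen only for this illustration.
MODEL_FILENAME = "model.joblib"
PARAMS_FILENAME = "model_params.joblib"


def get_model_filename(model_dir: str) -> str:
    """Path of the serialized model inside model_dir."""
    return os.path.join(model_dir, MODEL_FILENAME)


def get_params_filename(model_dir: str) -> str:
    """Path of the serialized model parameters inside model_dir."""
    return os.path.join(model_dir, PARAMS_FILENAME)


model_path = get_model_filename("my_model_dir")
params_path = get_params_filename("my_model_dir")
```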
- predict(dataset: Dataset) → ndarray
Uses self to make predictions on provided Dataset object.
- Parameters
dataset (Dataset) – Dataset to make prediction on
- Returns
A numpy array of predictions.
- Return type
np.ndarray
deepmol.models.sklearn_models module
- class SklearnModel(model: BaseEstimator, mode: Optional[str] = None, model_dir: Optional[str] = None, **kwargs)
Bases: Model
Wrapper class for scikit-learn models. It allows scikit-learn models to be trained on Dataset objects and evaluated with the metrics in Metrics.
- cross_validate(dataset: Dataset, metric: Metric, folds: int = 3)
Performs cross-validation on a dataset.
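- Parameters
dataset (Dataset) – Dataset to cross validate on.
metric (Metric) – Metric to evaluate the model on.
folds (int) – Number of folds to use for cross validation.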
- Returns
A tuple of seven elements: the best model, the train score of the best model, the test score of the best model, the train scores of all folds, the test scores of all folds, the average train score across folds and the average test score across folds.
- Return type
Tuple[SklearnModel, float, float, List[float], List[float], float, float]
- fit(dataset: Dataset) → None
Fits scikit-learn model to data.
- Parameters
dataset (Dataset) – The Dataset to train this model on.
- Returns
The trained scikit-learn model.
- Return type
BaseEstimator
- predict(dataset: Dataset) → ndarray
Makes predictions on dataset.
- Parameters
dataset (Dataset) – Dataset to make prediction on.
- Returns
The return value of the predict_proba or predict method of the scikit-learn model. If the model has both methods, the return value of predict_proba is always used.
- Return type
np.ndarray
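The rule that predict_proba takes precedence over predict can be sketched with two dummy estimators. The classes and the predict_with helper are hypothetical stand-ins; the actual dispatch inside SklearnModel may be written differently.

```python
from typing import List, Sequence


class ProbabilisticModel:
    """Dummy estimator that offers both methods."""

    def predict(self, X: Sequence) -> List[int]:
        return [0 for _ in X]

    def predict_proba(self, X: Sequence) -> List[List[float]]:
        return [[0.5, 0.5] for _ in X]


class LabelOnlyModel:
    """Dummy estimator that can only return labels."""

    def predict(self, X: Sequence) -> List[int]:
        return [1 for _ in X]


def predict_with(model, X: Sequence):
    # Prefer probabilities whenever the estimator can provide them.
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)
    return model.predict(X)


X = [[0.1, 0.2], [0.3, 0.4]]
proba_output = predict_with(ProbabilisticModel(), X)
label_output = predict_with(LabelOnlyModel(), X)
```

Callers that need class labels should therefore be prepared to receive a probability array and convert it themselves.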