deepmol.models package
Submodules
deepmol.models.base_models module
deepmol.models.deepchem_model_builders module
deepmol.models.deepchem_models module
- class DeepChemModel(model: Model, model_dir: str | None = None, custom_objects: dict | None = None, **kwargs)[source]
Bases:
Model, Predictor
Wrapper class that wraps DeepChem models. The DeepChemModel class provides a wrapper around DeepChem models that allows them to be trained on Dataset objects and evaluated with the metrics in Metrics.
- cross_validate(dataset: Dataset, metric: Metric, splitter: Splitter | None = None, transformers: List[NormalizationTransformer] | None = None, folds: int = 3)[source]
Cross validates the model on the specified dataset.
- Parameters:
dataset (Dataset) – Dataset to cross validate on.
metric (Metric) – Metric to evaluate the model on.
splitter (Splitter) – Splitter to use for cross validation.
transformers (List[NormalizationTransformer]) – Transformers that the input data has been transformed by.
folds (int) – Number of folds to use for cross validation.
- Returns:
A tuple of seven elements, in order: the best model, the train score of the best model, the test score of the best model, the train scores of all folds, the test scores of all folds, the average train score across all folds, and the average test score across all folds.
- Return type:
Tuple[DeepChemModel, float, float, List[float], List[float], float, float]
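To make the shape of the seven-element return value concrete, here is a minimal, library-independent sketch of a k-fold loop that produces a tuple with the same layout. It is not DeepMol's implementation: MeanModel, neg_mae and cross_validate_sketch are hypothetical stand-ins, the interleaved split replaces a real Splitter, and picking the best model by highest test score is an assumption.

```python
from statistics import mean
from typing import List, Sequence, Tuple


class MeanModel:
    """Hypothetical stand-in model: predicts the mean of its training targets."""

    def fit(self, y: Sequence[float]) -> None:
        self.value = mean(y)

    def predict(self, n: int) -> List[float]:
        return [self.value] * n


def neg_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Negative mean absolute error, so that a higher score is better."""
    return -mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def cross_validate_sketch(
    y: Sequence[float], folds: int = 3
) -> Tuple[MeanModel, float, float, List[float], List[float], float, float]:
    models, train_scores, test_scores = [], [], []
    for k in range(folds):
        # Interleaved split: every `folds`-th sample lands in the test fold.
        test = [v for i, v in enumerate(y) if i % folds == k]
        train = [v for i, v in enumerate(y) if i % folds != k]
        model = MeanModel()
        model.fit(train)
        models.append(model)
        train_scores.append(neg_mae(train, model.predict(len(train))))
        test_scores.append(neg_mae(test, model.predict(len(test))))
    # Assumption: the "best" model is the one with the highest test score.
    best = max(range(folds), key=lambda k: test_scores[k])
    return (models[best], train_scores[best], test_scores[best],
            train_scores, test_scores, mean(train_scores), mean(test_scores))


result = cross_validate_sketch([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], folds=3)
(best_model, best_train, best_test,
 train_scores, test_scores, avg_train, avg_test) = result
```

Unpacking the tuple into named variables, as in the last two lines, keeps the position of each score explicit at the call site.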
- evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False)[source]
Evaluates the performance of the model on the provided dataset.
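- Parameters:
dataset (Dataset) – Dataset to evaluate the model on.
metrics (List[Metric]) – Metrics to evaluate the model with.
per_task_metrics (bool) – If True, also return the scores for each task separately.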
- Returns:
- multitask_scores: dict
Dictionary mapping names of metrics to metric scores.
- all_task_scores: dict
If per_task_metrics == True, then returns a second dictionary of scores for each task separately.
- Return type:
Tuple[Dict, Dict]
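The relationship between the two returned dictionaries can be illustrated with a small, self-contained sketch. It does not use DeepMol: evaluate_sketch and accuracy are hypothetical helpers, averaging the per-task scores to obtain the multitask score is an assumption, and so is returning an empty second dictionary when per_task_metrics is False.

```python
from typing import Callable, Dict, List, Tuple

MetricFn = Callable[[List[int], List[int]], float]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_sketch(
    y_true: List[List[int]],
    y_pred: List[List[int]],
    metrics: Dict[str, MetricFn],
    per_task_metrics: bool = False,
) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """y_true and y_pred hold one list of labels per task."""
    multitask_scores: Dict[str, float] = {}
    all_task_scores: Dict[str, List[float]] = {}
    for name, fn in metrics.items():
        per_task = [fn(t, p) for t, p in zip(y_true, y_pred)]
        # Assumption: the multitask score is the mean over tasks.
        multitask_scores[name] = sum(per_task) / len(per_task)
        if per_task_metrics:
            all_task_scores[name] = per_task
    return multitask_scores, all_task_scores


y_true = [[1, 0, 1, 1], [0, 0, 1, 1]]  # two tasks, four samples each
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0]]
scores, per_task = evaluate_sketch(
    y_true, y_pred, {"accuracy": accuracy}, per_task_metrics=True
)
```

Both dictionaries are keyed by metric name; the second one maps each name to a list with one score per task.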
- fit(dataset: Dataset)[source]
Fits the model on a dataset.
- Parameters:
dataset (Dataset) – The Dataset to train this model on.
- fit_on_batch(X: Sequence, y: Sequence, w: Sequence)[source]
Fits the model on a batch of data.
- Parameters:
X (Sequence) – The input data.
y (Sequence) – The output data.
w (Sequence) – The weights for the data.
- get_num_tasks() int[source]
Returns the number of tasks of the model.
- Returns:
The number of tasks of the model.
- Return type:
int
- get_task_type() str[source]
Returns the task type of the model.
- Returns:
The task type of the model.
- Return type:
str
- classmethod load(folder_path: str, **kwargs)[source]
Loads deepchem model from disk.
- Parameters:
folder_path (str) – Path to the folder where the model is stored.
kwargs (Dict) – Additional parameters.
custom_objects (Dict) – Dictionary of custom objects to be passed to tensorflow.keras.utils.custom_object_scope.
- model: Model
- property model_type
Returns the type of the model.
- predict(dataset: Dataset, transformers: List[NormalizationTransformer] | None = None, return_invalid: bool = False) ndarray[source]
Makes predictions on dataset.
- Parameters:
dataset (Dataset) – Dataset to make prediction on.
transformers (List[NormalizationTransformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
return_invalid (bool) – If True, invalid entries are returned with NaN.
- Returns:
The value returned by the predict method of the wrapped DeepChem model.
- Return type:
np.ndarray
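Passing the output back through the transformers means mapping predictions from the normalized space to the original target scale. The sketch below shows the idea with a hypothetical ZScore transformer and an undo_transforms_sketch helper; neither is part of DeepMol or DeepChem, and undoing the transformers in reverse order of application is an assumption about how such a chain is typically inverted.

```python
from typing import List, Sequence


class ZScore:
    """Hypothetical normalization transformer: y_norm = (y - mean) / std."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def transform(self, y: Sequence[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in y]

    def untransform(self, y: Sequence[float]) -> List[float]:
        return [v * self.std + self.mean for v in y]


def undo_transforms_sketch(
    y_pred: Sequence[float], transformers: Sequence[ZScore]
) -> List[float]:
    """Map predictions back to the original scale."""
    out = list(y_pred)
    # The last transformer applied is the first one to be undone.
    for transformer in reversed(transformers):
        out = transformer.untransform(out)
    return out


scaler = ZScore(mean=10.0, std=2.0)
normalized_predictions = [-1.0, 0.0, 1.5]  # model output in normalized space
restored = undo_transforms_sketch(normalized_predictions, [scaler])
```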
- predict_on_batch(dataset: Dataset) ndarray[source]
Makes predictions on batch of data.
- Parameters:
dataset (Dataset) – Dataset to make prediction on.
- predict_proba(dataset: Dataset, transformers: List[NormalizationTransformer] | None = None, return_invalid: bool = False) ndarray[source]
Makes predictions on dataset.
- Parameters:
dataset (Dataset) – Dataset to make prediction on.
transformers (List[NormalizationTransformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
return_invalid (bool) – If True, invalid entries are returned with NaN.
- Returns:
The value returned by the predict method of the wrapped DeepChem model.
- Return type:
np.ndarray
- generate_sequences(epochs: int, train_smiles: List[str | int])[source]
Function to generate the input/output pairs for SeqToSeq model. Taken from DeepChem tutorials.
- Parameters:
epochs (int) – Number of epochs to train the model.
train_smiles (List[str]) – The ids of the samples in the dataset (SMILES strings).
- Return type:
Generator that yields one pair of SMILES strings per sample per epoch, for a total of epochs × len(train_smiles) pairs.
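A generator of this kind can be sketched in a few lines. The version below is named generate_sequences_sketch to make clear it is an illustration and not the library's code; it assumes, as in the DeepChem SeqToSeq tutorial the description refers to, that each SMILES string serves as both the input and the target of the pair.

```python
from typing import Iterator, List, Tuple


def generate_sequences_sketch(
    epochs: int, train_smiles: List[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (input, target) pairs, one per sample per epoch."""
    for _ in range(epochs):
        for smiles in train_smiles:
            # Assumption: autoencoder-style training, so input == target.
            yield smiles, smiles


pairs = list(generate_sequences_sketch(2, ["CCO", "c1ccccc1"]))
```

Because the pairs are produced lazily, the full epochs × len(train_smiles) sequence never has to be held in memory during training; it is materialized with list() here only for inspection.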
deepmol.models.ensembles module
deepmol.models.keras_model_builders module
deepmol.models.keras_models module
deepmol.models.models module
- class Model(model: BaseEstimator | None = None, model_dir: str | None = None, **kwargs)[source]
Bases:
BaseEstimator, Predictor, ABC
Abstract base class for ML/DL models.
- evaluate(dataset: Dataset, metrics: List[Metric] | Metric, per_task_metrics: bool = False) Tuple[Dict, None | Dict][source]
Evaluates the performance of this model on specified dataset.
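- Parameters:
dataset (Dataset) – Dataset to evaluate the model on.
metrics (List[Metric] | Metric) – Metric or list of metrics to evaluate the model with.
per_task_metrics (bool) – If True, also return the scores for each task separately.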
- Returns:
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.
- fit_on_batch(dataset: Dataset) None[source]
Perform a single step of training.
- Parameters:
dataset (Dataset) – Dataset object.
- static get_model_filename(model_dir: str) str[source]
Given model directory, obtain filename for the model itself.
- Parameters:
model_dir (str) – Path to directory where model is stored.
- Returns:
Path to model file.
- Return type:
str
- static get_params_filename(model_dir: str) str[source]
Given model directory, obtain filename for the model parameters.
- Parameters:
model_dir (str) – Path to directory where model is stored.
- Returns:
Path to file where model parameters are stored.
- Return type:
str
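Both static helpers map a model directory to a file path inside it. A minimal sketch of that pattern follows; the function names carry a _sketch suffix and the file names "model.joblib" and "model_params.joblib" are assumptions chosen for illustration, not values confirmed by this page.

```python
import os


def get_model_filename_sketch(model_dir: str) -> str:
    """Path of the serialized model inside model_dir (file name assumed)."""
    return os.path.join(model_dir, "model.joblib")


def get_params_filename_sketch(model_dir: str) -> str:
    """Path of the serialized parameters inside model_dir (file name assumed)."""
    return os.path.join(model_dir, "model_params.joblib")


model_path = get_model_filename_sketch("my_model")
params_path = get_params_filename_sketch("my_model")
```

Keeping the file names in one place like this lets the save and load code paths agree on where the model and its parameters live.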
- classmethod load(folder_path: str) Model[source]
Reload trained model from disk.
- Parameters:
folder_path (str) – Path to folder where model is stored.
- Returns:
Model object.
- Return type:
Model
- predict(dataset: Dataset, return_invalid: bool = False) ndarray[source]
Uses self to make predictions on provided Dataset object.
- Parameters:
dataset (Dataset) – Dataset to make prediction on.
return_invalid (bool) – If True, invalid entries are returned with NaN.
- Returns:
A numpy array of predictions.
- Return type:
np.ndarray
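One plausible reading of return_invalid is that predictions made for the valid entries are placed back at their original positions, with NaN filling the positions of the entries that could not be processed. The sketch below illustrates that reading only; pad_invalid_with_nan and its parameters are hypothetical, and this page does not confirm how DeepMol tracks invalid entries.

```python
import math
from typing import List, Sequence


def pad_invalid_with_nan(
    predictions: Sequence[float],
    n_total: int,
    invalid_indices: Sequence[int],
) -> List[float]:
    """Re-align predictions for valid entries with the original dataset order.

    predictions holds one value per valid entry, in order; positions listed
    in invalid_indices receive NaN.
    """
    invalid = set(invalid_indices)
    valid_predictions = iter(predictions)
    return [
        math.nan if i in invalid else next(valid_predictions)
        for i in range(n_total)
    ]


# Four input entries, of which entries 1 and 3 were invalid.
aligned = pad_invalid_with_nan([0.2, 0.9], n_total=4, invalid_indices=[1, 3])
```

With this layout the output has the same length as the input, so each prediction can be matched to its original entry by position.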
- predict_on_batch(dataset: Dataset) ndarray[source]
Makes predictions on given batch of new data.
- Parameters:
dataset (Dataset) – Dataset object.
- Returns:
Predicted values.
- Return type:
np.ndarray
- predict_proba(dataset: Dataset, return_invalid: bool = False) ndarray[source]
Uses self to make predictions on provided Dataset object.
- Parameters:
dataset (Dataset) – Dataset to make prediction on.
return_invalid (bool) – If True, invalid entries are returned with NaN.
- Returns:
A numpy array of predictions.
- Return type:
np.ndarray