deepmol.models package

Submodules

deepmol.models.base_models module

deepmol.models.deepchem_model_builders module

deepmol.models.deepchem_models module

class DeepChemModel(model: Model, model_dir: str | None = None, custom_objects: dict | None = None, **kwargs)[source]

Bases: Model, Predictor

Wrapper around DeepChem models. It allows DeepChem models to be trained on Dataset objects and evaluated with the metrics in Metrics.

cross_validate(dataset: Dataset, metric: Metric, splitter: Splitter | None = None, transformers: List[NormalizationTransformer] | None = None, folds: int = 3)[source]

Cross validates the model on the specified dataset.

Parameters:
  • dataset (Dataset) – Dataset to cross validate on.

  • metric (Metric) – Metric to evaluate the model on.

  • splitter (Splitter) – Splitter to use for cross validation.

  • transformers (List[Transformer]) – Transformers that the input data has been transformed by.

  • folds (int) – Number of folds to use for cross validation.

Returns:

The first element is the best model, the second is the train score of the best model, the third is the test score of the best model, the fourth is the train scores of all models, the fifth is the test scores of all models, the sixth is the average train score of all folds and the seventh is the average test score of all folds.

Return type:

Tuple[DeepChemModel, float, float, List[float], List[float], float, float]
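The shape of this return value can be illustrated with a minimal, library-free sketch of k-fold cross validation. Everything here is hypothetical and is not DeepMol code: `MeanRegressor` is a toy stand-in for a model, `neg_mae` stands in for a Metric, the folds are contiguous slices rather than the output of a Splitter, and picking the best model by highest test score is an assumption.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into `folds` contiguous (train, test) index pairs."""
    indices = list(range(n_samples))
    sizes = [n_samples // folds + (1 if i < n_samples % folds else 0)
             for i in range(folds)]
    splits, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


class MeanRegressor:
    """Hypothetical toy model: always predicts the mean of the training targets."""

    def fit(self, y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = mean(y)
        return self

    def predict(self, n: int) -> List[float]:
        return [self.mean_] * n


def neg_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Negative mean absolute error, so that higher is better."""
    return -mean(abs(t - p) for t, p in zip(y_true, y_pred))


def cross_validate_sketch(y: Sequence[float], metric: Callable, folds: int = 3):
    train_scores, test_scores = [], []
    best_model, best_train, best_test = None, None, None
    for train_idx, test_idx in k_fold_indices(len(y), folds):
        y_train = [y[i] for i in train_idx]
        y_test = [y[i] for i in test_idx]
        model = MeanRegressor().fit(y_train)
        train_score = metric(y_train, model.predict(len(y_train)))
        test_score = metric(y_test, model.predict(len(y_test)))
        train_scores.append(train_score)
        test_scores.append(test_score)
        # Assumption: "best" means the highest test score across folds.
        if best_test is None or test_score > best_test:
            best_model, best_train, best_test = model, train_score, test_score
    # Seven elements, mirroring the documented return type.
    return (best_model, best_train, best_test, train_scores, test_scores,
            mean(train_scores), mean(test_scores))
```

Unpacking the result as `best, best_train, best_test, trains, tests, avg_train, avg_test = cross_validate_sketch(y, neg_mae)` keeps the positions explicit.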

evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False)[source]

Evaluates the performance of the model on the provided dataset.

Parameters:
  • dataset (Dataset) – Dataset to evaluate the model on.

  • metrics (List[Metric]) – Metrics to evaluate the model on.

  • per_task_metrics (bool) – If true, return computed metric for each task on multitask dataset.

Returns:

multitask_scores: dict

Dictionary mapping names of metrics to metric scores.

all_task_scores: dict

If per_task_metrics == True, then returns a second dictionary of scores for each task separately.

Return type:

Tuple[Dict, Dict]
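A small sketch of how the two returned dictionaries might relate, written without DeepMol. The function name, the plain-callable metrics and the choice of averaging per-task scores into the multitask score are all assumptions made for illustration; the empty second dictionary when `per_task_metrics` is false is likewise a guess.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def evaluate_sketch(
    y_true_by_task: List[Sequence[int]],
    y_pred_by_task: List[Sequence[int]],
    metrics: Dict[str, Callable],
    per_task_metrics: bool = False,
) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    multitask_scores: Dict[str, float] = {}
    all_task_scores: Dict[str, List[float]] = {}
    for name, metric in metrics.items():
        # One score per task, computed independently.
        per_task = [metric(t, p) for t, p in zip(y_true_by_task, y_pred_by_task)]
        # Assumption: the multitask score is the mean over tasks.
        multitask_scores[name] = sum(per_task) / len(per_task)
        if per_task_metrics:
            all_task_scores[name] = per_task
    return multitask_scores, all_task_scores
```

With two tasks scoring 1.0 and 0.5, this sketch reports a multitask accuracy of 0.75 and, when `per_task_metrics=True`, the list `[1.0, 0.5]` under the same metric name.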

fit(dataset: Dataset)[source]

Fits the model on a dataset.

Parameters:

dataset (Dataset) – The Dataset to train this model on.

fit_on_batch(X: Sequence, y: Sequence, w: Sequence)[source]

Fits the model on a batch of data.

Parameters:
  • X (Sequence) – The input data.

  • y (Sequence) – The output data.

  • w (Sequence) – The weights for the data.

get_num_tasks() int[source]

Returns the number of tasks of the model.

Returns:

The number of tasks of the model.

Return type:

int

get_task_type() str[source]

Returns the task type of the model.

Returns:

The task type of the model.

Return type:

str

classmethod load(folder_path: str, **kwargs)[source]

Loads deepchem model from disk.

Parameters:
  • folder_path (str) – Path to the folder where the model is stored.

  • kwargs (Dict) –

    Additional parameters. custom_objects: Dict

    Dictionary of custom objects to be passed to tensorflow.keras.utils.custom_object_scope.

model: Model

property model_type

Returns the type of the model.

predict(dataset: Dataset, transformers: List[NormalizationTransformer] | None = None, return_invalid: bool = False) ndarray[source]

Makes predictions on dataset.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on.

  • transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • return_invalid (bool) – Return invalid entries with NaN

Returns:

The return value of the predict method of the wrapped DeepChem model.

Return type:

np.ndarray
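The two post-processing ideas in the parameter descriptions, undoing a transformation and returning invalid entries as NaN, can be sketched in plain Python. This is an illustration under assumptions, not the DeepMol implementation: it assumes the transformation is a z-score normalization with known `mean` and `std`, and that `return_invalid=True` means NaN is placed at the positions of entries that were dropped before prediction. Both helper names are hypothetical.

```python
import math
from typing import List, Sequence


def undo_normalization(y_norm: Sequence[float], mean: float, std: float) -> List[float]:
    """Invert y_norm = (y - mean) / std to recover values on the original scale."""
    return [value * std + mean for value in y_norm]


def pad_invalid_with_nan(predictions: Sequence[float],
                         valid_mask: Sequence[bool]) -> List[float]:
    """Re-insert NaN wherever valid_mask is False, keeping the input order."""
    if sum(1 for is_valid in valid_mask if is_valid) != len(predictions):
        raise ValueError("number of valid entries must match number of predictions")
    remaining = iter(predictions)
    return [next(remaining) if is_valid else math.nan for is_valid in valid_mask]
```

In this sketch the output has one entry per input molecule, so callers can align predictions with the original sample ids without tracking which rows were removed.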

predict_on_batch(dataset: Dataset) ndarray[source]

Makes predictions on a batch of data.

Parameters:

dataset (Dataset) – Dataset to make prediction on.

predict_proba(dataset: Dataset, transformers: List[NormalizationTransformer] | None = None, return_invalid: bool = False) ndarray[source]

Predicts class probabilities for the dataset.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on.

  • transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • return_invalid (bool) – Return invalid entries with NaN

Returns:

The return value of the predict method of the wrapped DeepChem model.

Return type:

np.ndarray

save(folder_path: str | None = None)[source]

Saves deepchem model to disk.

Parameters:

folder_path (str) – Path to the folder where the model will be stored.

generate_sequences(epochs: int, train_smiles: List[str | int])[source]

Generates the input/output pairs for a SeqToSeq model. Taken from the DeepChem tutorials.

Parameters:
  • epochs (int) – Number of epochs to train the model.

  • train_smiles (List[str]) – The ids of the samples in the dataset (SMILES strings).

Return type:

Generator yielding pairs of SMILES strings, epochs × len(train_smiles) pairs in total.
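A minimal sketch of such a generator follows. It assumes, as in the DeepChem SeqToSeq tutorial, that input and target are the same string because the model is trained to reconstruct its input; the function name is hypothetical and this is not the DeepMol source.

```python
from typing import Iterator, List, Tuple


def generate_sequences_sketch(epochs: int,
                              train_smiles: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (input, target) pairs, one pass over train_smiles per epoch."""
    for _ in range(epochs):
        for smiles in train_smiles:
            # Assumption: the target equals the input (autoencoder-style training).
            yield smiles, smiles
```

Because it is a generator, no pair is built until the consumer asks for it, so the full epochs × len(train_smiles) sequence never has to sit in memory.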

deepmol.models.ensembles module

deepmol.models.keras_model_builders module

deepmol.models.keras_models module

deepmol.models.models module

class Model(model: BaseEstimator | None = None, model_dir: str | None = None, **kwargs)[source]

Bases: BaseEstimator, Predictor, ABC

Abstract base class for ML/DL models.

evaluate(dataset: Dataset, metrics: List[Metric] | Metric, per_task_metrics: bool = False) Tuple[Dict, None | Dict][source]

Evaluates the performance of this model on the specified dataset.

Parameters:
  • dataset (Dataset) – Dataset object.

  • metrics (Union[List[Metric], Metric]) – The set of metrics provided.

  • per_task_metrics (bool) – If true, return computed metric for each task on multitask dataset.

  • kwargs – Additional keyword arguments to pass to Evaluator.compute_model_performance.

Returns:

  • multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.

  • all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.

fit_on_batch(dataset: Dataset) None[source]

Perform a single step of training.

Parameters:

dataset (Dataset) – Dataset object.

static get_model_filename(model_dir: str) str[source]

Given the model directory, returns the filename of the model itself.

Parameters:

model_dir (str) – Path to directory where model is stored.

Returns:

Path to model file.

Return type:

str

get_num_tasks() int[source]

Get number of tasks.

static get_params_filename(model_dir: str) str[source]

Given the model directory, returns the filename of the file where the model parameters are stored.

Parameters:

model_dir (str) – Path to directory where model is stored.

Returns:

Path to file where model parameters are stored.

Return type:

str
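Both filename helpers map a model directory to a path inside it. A sketch of that pattern is shown below; the file names `model.joblib` and `model_params.joblib` are assumptions chosen for illustration and may differ from what DeepMol actually writes.

```python
import os


class ModelPathsSketch:
    """Hypothetical holder for the two static path helpers."""

    @staticmethod
    def get_model_filename(model_dir: str) -> str:
        # Assumed file name; only the join-with-directory pattern is the point.
        return os.path.join(model_dir, "model.joblib")

    @staticmethod
    def get_params_filename(model_dir: str) -> str:
        # Assumed file name for the separately stored parameters.
        return os.path.join(model_dir, "model_params.joblib")
```

Keeping these as static methods means the paths can be computed without constructing a model, for example when checking whether a saved model already exists.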

get_task_type() str[source]

Currently models can only be classifiers or regressors.

classmethod load(folder_path: str) Model[source]

Reload trained model from disk.

Parameters:

folder_path (str) – Path to folder where model is stored.

Returns:

Model object.

Return type:

Model

predict(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on

  • return_invalid (bool) – Return invalid entries with NaN

Returns:

A numpy array of predictions.

Return type:

np.ndarray

predict_on_batch(dataset: Dataset) ndarray[source]

Makes predictions on given batch of new data.

Parameters:

dataset (Dataset) – Dataset object.

Returns:

Predicted values.

Return type:

np.ndarray

predict_proba(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Uses self to predict class probabilities for the provided Dataset object.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on

  • return_invalid (bool) – Return invalid entries with NaN

Returns:

A numpy array of predictions.

Return type:

np.ndarray

save(file_path: str | None = None) None[source]

Function for saving models. Each subclass is responsible for overriding this method.

Parameters:

file_path (str) – Path to file where model should be saved.
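The contract described here, an abstract base that leaves persistence to its subclasses, might look like the following sketch. The class names, the use of `pickle` and the file name `model.pkl` are hypothetical and only illustrate the override pattern with the standard library.

```python
import os
import pickle
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseModelSketch(ABC):
    """Hypothetical abstract base: subclasses must implement save and load."""

    @abstractmethod
    def save(self, file_path: Optional[str] = None) -> None:
        ...

    @classmethod
    @abstractmethod
    def load(cls, folder_path: str) -> "BaseModelSketch":
        ...


class PickleBackedModel(BaseModelSketch):
    """Toy subclass that persists a list of weights with pickle."""

    FILE_NAME = "model.pkl"  # assumed file name, for illustration only

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights

    def save(self, file_path: Optional[str] = None) -> None:
        if file_path is None:
            raise ValueError("file_path is required in this sketch")
        with open(file_path, "wb") as handle:
            pickle.dump(self.weights, handle)

    @classmethod
    def load(cls, folder_path: str) -> "PickleBackedModel":
        with open(os.path.join(folder_path, cls.FILE_NAME), "rb") as handle:
            return cls(pickle.load(handle))
```

Instantiating `BaseModelSketch` directly raises `TypeError`, which is how the `ABC` machinery enforces that each subclass supplies its own `save` and `load`.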

deepmol.models.sklearn_model_builders module

deepmol.models.sklearn_models module

Module contents