deepmol.pipeline_optimization package

Submodules

deepmol.pipeline_optimization.objective_wrapper module

class Objective(objective_steps, study, direction, save_top_n)

Bases: object

Wrapper for the objective function of the pipeline optimization. It creates, saves and evaluates pipelines for each trial.
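The create, evaluate and save cycle described above can be sketched in plain Python. This is a hypothetical illustration, not DeepMol's implementation. SketchObjective and its evaluate hook are made-up names, the trial is only assumed to expose a number attribute (as Optuna trials do), and direction is assumed to be the string 'minimize' or 'maximize'. Only the save_top_n best pipelines are kept, using a heap keyed so that a larger key is always better.

```python
import heapq
from typing import Any, Callable, List, Tuple


class SketchObjective:
    """Hypothetical objective wrapper: build, evaluate and keep the best pipelines."""

    def __init__(self, objective_steps: Callable[[Any], list], direction: str, save_top_n: int):
        if direction not in ("minimize", "maximize"):
            raise ValueError("direction must be 'minimize' or 'maximize'")
        self.objective_steps = objective_steps
        self.direction = direction
        self.save_top_n = save_top_n
        # Min-heap of (key, trial_number, steps): the worst kept pipeline sits at index 0.
        self._kept: List[Tuple[float, int, list]] = []

    def evaluate(self, steps: list) -> float:
        # Subclasses decide how a pipeline is scored (e.g. train/test evaluation).
        raise NotImplementedError

    def __call__(self, trial: Any) -> float:
        steps = self.objective_steps(trial)  # create the pipeline steps for this trial
        score = self.evaluate(steps)         # evaluate them
        # Convert the score to a key where larger is better, whatever the direction.
        key = score if self.direction == "maximize" else -score
        entry = (key, trial.number, steps)
        if len(self._kept) < self.save_top_n:
            heapq.heappush(self._kept, entry)
        elif key > self._kept[0][0]:
            heapq.heapreplace(self._kept, entry)  # evict the current worst pipeline
        return score

    def saved_trials(self) -> List[int]:
        # Trial numbers of the kept pipelines, best first.
        return [number for _, number, _ in sorted(self._kept, reverse=True)]
```

In this sketch a subclass supplies evaluate, loosely analogous to ObjectiveTrainEval extending Objective; the heap keeps each update at O(log save_top_n).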

class ObjectiveTrainEval(objective_steps, study, direction, save_top_n, trial_timeout=86400, **kwargs)

Bases: Objective

Wrapper for the objective function of the pipeline optimization. For each trial, it creates a pipeline, trains it on the training dataset, evaluates it on the test dataset with the given metric, and saves the best pipelines.
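The train-then-evaluate flow can be sketched as follows, under stated assumptions: the pipeline is a list of (name, step) pairs in the scikit-learn style, where every step except the last exposes fit(X, y) and transform(X) and the last exposes fit(X, y) and predict(X); datasets are plain (X, y) tuples; and metric is a function metric(y_true, y_pred). MeanCenter, ThresholdClassifier, train_eval and accuracy are hypothetical names, not DeepMol classes.

```python
from typing import Callable, List, Sequence, Tuple

Dataset = Tuple[List[float], List[int]]


class MeanCenter:
    """Hypothetical transformer: subtracts the mean learned on the training data."""

    def fit(self, X: Sequence[float], y=None) -> "MeanCenter":
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X: Sequence[float]) -> List[float]:
        return [x - self.mean_ for x in X]


class ThresholdClassifier:
    """Hypothetical model: predicts 1 when the feature is above a fixed threshold."""

    def __init__(self, threshold: float = 0.0):
        self.threshold = threshold

    def fit(self, X, y) -> "ThresholdClassifier":
        return self  # nothing to learn in this toy model

    def predict(self, X: Sequence[float]) -> List[int]:
        return [1 if x > self.threshold else 0 for x in X]


def train_eval(steps: list, train: Dataset, test: Dataset,
               metric: Callable[[Sequence[int], Sequence[int]], float]) -> float:
    X_train, y_train = train
    X_test, y_test = test
    *transformers, (_, model) = steps
    # Fit each transformer on the training split only, then apply it to both splits.
    for _, step in transformers:
        step.fit(X_train, y_train)
        X_train = step.transform(X_train)
        X_test = step.transform(X_test)
    model.fit(X_train, y_train)
    # Score the fitted pipeline on the held-out test split.
    return metric(y_test, model.predict(X_test))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Fitting every step on the training split only keeps test-set information out of the pipeline, so the returned score is an honest estimate for that trial.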

Parameters:
  • objective_steps (callable) – Function that returns the steps of the pipeline for a given trial.

  • study (optuna.study.Study) – Study object that stores the optimization history.

  • direction (str or optuna.study.StudyDirection) – Direction of the optimization (minimize or maximize).

  • train_dataset (deepmol.datasets.Dataset) – Dataset used for training the pipeline.

  • test_dataset (deepmol.datasets.Dataset) – Dataset used for evaluating the pipeline.

  • metric (deepmol.metrics.Metric) – Metric used for evaluating the pipeline.

  • save_top_n (int) – Number of best pipelines to save.

  • trial_timeout (int) – Timeout for each trial in seconds. Defaults to 86400 (24 hours).

  • **kwargs – Additional keyword arguments passed to the objective_steps function.
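A sketch of an objective_steps function, assuming it receives an Optuna-style trial (plus any extra keyword arguments) and returns the pipeline as a list of (name, step) pairs. The suggest_categorical(name, choices) and suggest_float(name, low, high, log=...) calls follow Optuna's Trial API; FakeTrial, my_objective_steps, the step names and the returned configuration dictionaries are hypothetical stand-ins so the example runs without Optuna or DeepMol.

```python
import math
import random
from typing import Any, Dict, List, Sequence, Tuple


class FakeTrial:
    """Stand-in for optuna.trial.Trial that samples uniformly at random."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self.params: Dict[str, Any] = {}  # sampled values, like Trial.params

    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any:
        value = self._rng.choice(list(choices))
        self.params[name] = value
        return value

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        if log:
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
        else:
            value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


def my_objective_steps(trial, **kwargs) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical objective_steps: the trial picks each step and its hyperparameters.

    Extra keyword arguments are accepted but unused in this sketch.
    """
    featurizer = trial.suggest_categorical("featurizer", ["morgan", "maccs"])
    model = trial.suggest_categorical("model", ["random_forest", "svc"])
    # Hyperparameters are only sampled for the model that was actually chosen.
    if model == "random_forest":
        model_params = {"n_estimators": trial.suggest_categorical("n_estimators", [100, 300])}
    else:
        model_params = {"C": trial.suggest_float("C", 1e-3, 1e3, log=True)}
    # A real function would typically return step objects (e.g. a featurizer and a
    # model) rather than configuration dictionaries.
    return [("featurizer", {"type": featurizer}),
            ("model", {"type": model, **model_params})]
```

Because Optuna builds the search space while the function runs, conditional hyperparameters such as C only appear in trials that select the corresponding model.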

deepmol.pipeline_optimization.pipeline_optimization module

class PipelineOptimization(storage: str | BaseStorage | None = None, sampler: BaseSampler | None = None, pruner: BasePruner | None = None, study_name: str | None = None, direction: str | StudyDirection | None = None, load_if_exists: bool = False, directions: List[str | StudyDirection] | None = None, n_pipelines_ensemble=5, n_jobs=5)

Bases: object

Class for optimizing a pipeline with Optuna. It can optimize the choice of every step of a pipeline together with the hyperparameters of each step.

Parameters:
  • storage (str or optuna.storages.BaseStorage) – Database storage URL such as sqlite:///example.db. If None, in-memory storage is used.

  • sampler (optuna.samplers.BaseSampler) – A sampler object that implements the background algorithm for value suggestion. If None, optuna.samplers.TPESampler is used as the default.

  • pruner (optuna.pruners.BasePruner) – A pruner object that decides early stopping of unpromising trials. If None, optuna.pruners.MedianPruner is used as the default.

  • study_name (str) – Study’s name. If this argument is set to None, a unique name is generated automatically.

  • direction (str or optuna.study.StudyDirection) – Direction of the optimization (minimize or maximize).

  • load_if_exists (bool) – Controls what happens when a study with the same name already exists: if True, the existing study is loaded; if False, an exception is raised.

  • directions (list of str or optuna.study.StudyDirection) – Directions for multi-objective optimization, one per objective value, as in optuna.create_study. Specify either direction or directions, not both.

  • n_pipelines_ensemble (int) – Number of pipelines to be used in the ensemble.

  • n_jobs (int) – Number of parallel jobs.

Attributes:
  • best_params (dict) – Dictionary with the best hyperparameters.

  • best_trial (optuna.trial.FrozenTrial) – Best trial.

  • best_value (float) – Objective value of the best trial.

  • trials (list of optuna.trial.FrozenTrial) – List of all trials.

  • best_pipeline (deepmol.pipeline.Pipeline) – Best pipeline.

  • pipelines_ensemble (deepmol.pipeline.ensemble.VotingPipeline) – Ensemble of the best pipelines.
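How best_trial, best_value and best_params follow from trials can be sketched as below, assuming each trial is reduced to a (number, value, params) record and that only trials with a value count, as Optuna considers only completed trials when choosing the best one. TrialRecord and best_of are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TrialRecord:
    number: int
    value: Optional[float]  # None for pruned or failed trials
    params: Dict[str, Any] = field(default_factory=dict)


def best_of(trials: List[TrialRecord], direction: str) -> TrialRecord:
    """Return the best finished trial for the given optimization direction."""
    finished = [t for t in trials if t.value is not None]
    if not finished:
        raise ValueError("no completed trials")
    if direction == "maximize":
        return max(finished, key=lambda t: t.value)
    if direction == "minimize":
        return min(finished, key=lambda t: t.value)
    raise ValueError("direction must be 'minimize' or 'maximize'")
```

With best = best_of(trials, direction), best.value plays the role of best_value and best.params the role of best_params.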

Examples

>>> from deepmol.loaders import CSVLoader
>>> from deepmol.pipeline_optimization import PipelineOptimization
>>> from deepmol.metrics import Metric
>>> from deepmol.splitters import RandomSplitter
>>> from sklearn.metrics import accuracy_score
>>> dataset_path = "dataset.csv"
>>> loader = CSVLoader(dataset_path=dataset_path,
...                    smiles_field='Smiles',
...                    labels_fields=['Class'])
>>> dataset_smiles = loader.create_dataset(sep=";")
>>> po = PipelineOptimization(direction='maximize', study_name='test_pipeline')
>>> metric = Metric(accuracy_score)
>>> train, test = RandomSplitter().train_test_split(dataset_smiles, seed=123)
>>> po.optimize(train_dataset=train, test_dataset=test,
...             objective_steps='classification_objective', metric=metric,
...             n_trials=3, data=train, save_top_n=1)

property best_params

Returns the best hyperparameters.

property best_pipeline

Returns the best pipeline.

property best_trial

Returns the best trial.

property best_value

Returns the best value (score of the best trial).

get_param_importances()

Returns the parameter importances.

get_pipelines_ensemble()

Returns an ensemble of the best pipelines.
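The ensemble is typed as a VotingPipeline, so its members' predictions are combined by voting. A minimal sketch of two common rules follows, assuming hard majority voting for class labels and averaging for regression; the exact rule used by deepmol.pipeline.ensemble.VotingPipeline is not stated here, and majority_vote and average_vote are hypothetical helpers.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine class predictions; one inner sequence per pipeline."""
    if not predictions:
        raise ValueError("need at least one pipeline")
    combined = []
    for votes in zip(*predictions):  # the votes of every pipeline for one sample
        counts = Counter(votes)
        top = max(counts.values())
        # On a tie, keep the first pipeline's vote among the tied labels, so with
        # pipelines ordered best first the better pipeline breaks the tie.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def average_vote(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Combine regression predictions by taking the per-sample mean."""
    if not predictions:
        raise ValueError("need at least one pipeline")
    return [sum(values) / len(values) for values in zip(*predictions)]
```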

optimize(objective_steps: callable | str, n_trials: int, save_top_n: int = 1, objective: deepmol.pipeline_optimization.objective_wrapper.Objective = deepmol.pipeline_optimization.objective_wrapper.ObjectiveTrainEval, trial_timeout: int = 86400, **kwargs) → None

Optimizes the pipeline by running n_trials trials, each of which builds and evaluates one candidate pipeline.

Parameters:
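  • objective_steps (callable or str) – Function that returns the steps of the pipeline for a given trial. If a string is passed, the preset function with that name is used (for example, 'classification_objective').

  • n_trials (int) – Number of trials.

  • save_top_n (int) – Number of best pipelines to save.

  • objective (deepmol.pipeline_optimization.objective_wrapper.Objective) – Objective wrapper class that creates and evaluates the pipeline of each trial. Defaults to ObjectiveTrainEval.

  • trial_timeout (int) – Timeout for each trial in seconds. Defaults to 86400 (24 hours).

  • **kwargs – Additional keyword arguments passed to the objective class, such as train_dataset, test_dataset and metric when using ObjectiveTrainEval; the remaining arguments (e.g. data in the example above) are forwarded to objective_steps.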
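trial_timeout bounds the wall-clock time of each trial. One way to enforce such a limit is sketched below with concurrent.futures; this is an assumption for illustration, not necessarily how DeepMol does it. A thread cannot be forcibly stopped, so work that times out keeps running in the background and only its result is abandoned. run_with_timeout is a hypothetical helper.

```python
import concurrent.futures
from typing import Any, Callable, Tuple

# Shared worker pool; a thread that times out keeps running until its function returns.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_with_timeout(fn: Callable[..., Any], timeout: float,
                     *args: Any, **kwargs: Any) -> Tuple[bool, Any]:
    """Run fn(*args, **kwargs); return (True, result), or (False, None) on timeout."""
    future = _EXECUTOR.submit(fn, *args, **kwargs)
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Only the result is abandoned; an optimizer could record the trial as failed.
        return False, None
```

Returning a (finished, result) pair avoids confusing a timeout with a function that legitimately returns None.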

property pipelines_ensemble

Returns the pipelines ensemble.

property trials

Returns all trials.

trials_dataframe(cols: List[str] | None = None) → DataFrame

Returns the trials of the study as a pandas DataFrame.

Parameters:
  • cols (list of str) – Columns to be returned.

Returns:

Trials dataframe.

Return type:

pd.DataFrame
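Optuna's own Study.trials_dataframe() produces one row per trial, with columns such as number, value and state and one params_<name> column per hyperparameter. Assuming cols selects a subset of such columns (and None keeps them all), the selection can be sketched without pandas. trial_rows and select_columns are hypothetical helpers, and the trial records are plain dictionaries.

```python
from typing import Any, Dict, List, Optional


def trial_rows(trials: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten trial records into rows, prefixing hyperparameters with 'params_'."""
    rows = []
    for trial in trials:
        row = {"number": trial["number"], "value": trial["value"], "state": trial["state"]}
        row.update({f"params_{name}": v for name, v in trial["params"].items()})
        rows.append(row)
    return rows


def select_columns(rows: List[Dict[str, Any]],
                   cols: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Keep only `cols` (all columns when cols is None).

    Values missing from a row (e.g. a hyperparameter another trial never sampled)
    become None, much as pandas would show NaN.
    """
    if cols is None:
        return rows
    return [{c: row.get(c) for c in cols} for row in rows]
```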

Module contents