deepmol.pipeline package

Submodules

deepmol.pipeline.ensemble module

class VotingPipeline(pipelines: List[Pipeline], voting: Literal['hard', 'soft'] = 'hard', weights: List[float] | None = None)[source]

Bases: object

Pipeline that combines the predictions of multiple pipelines using voting.
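The combination logic can be sketched in plain Python (an illustrative sketch only, not deepmol's implementation; deepmol operates on Dataset objects and NumPy arrays, while this uses lists for clarity). With hard voting, each pipeline's predicted label casts a (possibly weighted) vote; with soft voting, the class probabilities are weight-averaged and the argmax is taken:

```python
from collections import Counter
from typing import List, Literal, Optional

def vote(predictions: List[List[int]],
         probabilities: List[List[List[float]]],
         voting: Literal["hard", "soft"] = "hard",
         weights: Optional[List[float]] = None) -> List[int]:
    """Combine per-pipeline predictions into one prediction per sample."""
    n_pipelines = len(predictions) if voting == "hard" else len(probabilities)
    w = weights if weights is not None else [1.0] * n_pipelines
    if voting == "hard":
        # Weighted majority vote over each pipeline's predicted labels.
        n_samples = len(predictions[0])
        combined = []
        for i in range(n_samples):
            tally = Counter()
            for preds, wi in zip(predictions, w):
                tally[preds[i]] += wi
            combined.append(tally.most_common(1)[0][0])
        return combined
    # Soft voting: weighted average of class probabilities, then argmax.
    n_samples = len(probabilities[0])
    n_classes = len(probabilities[0][0])
    combined = []
    for i in range(n_samples):
        avg = [sum(wi * proba[i][c] for proba, wi in zip(probabilities, w)) / sum(w)
               for c in range(n_classes)]
        combined.append(max(range(n_classes), key=lambda c: avg[c]))
    return combined
```

Soft voting generally benefits from well-calibrated probabilities; hard voting only needs the final labels.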

evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False) Tuple[Dict, None | Dict][source]

Evaluates the voting pipeline using the given metrics.

Parameters:
  • dataset (Dataset) – Dataset to be used for evaluation.

  • metrics (List[Metric]) – List of metrics to be used.

  • per_task_metrics (bool) – If true, returns the metrics for each task separately.

Returns:

Tuple containing the multitask scores and the scores for each task separately.

Return type:

Tuple[Dict, Union[None, Dict]]

fit(train_dataset: Dataset, validation_dataset: Dataset | None = None) VotingPipeline[source]

Fits the pipelines to the training dataset. A separate validation dataset can also be provided.

Parameters:
  • train_dataset (Dataset) – Dataset to be used for training.

  • validation_dataset (Dataset) – Dataset to be used for validation.

Returns:

Fitted voting pipeline.

Return type:

VotingPipeline

is_fitted() bool[source]

Returns True if all pipelines are fitted, False otherwise.

Returns:

True if all pipelines are fitted, False otherwise.

Return type:

bool
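
The all-fitted check amounts to requiring every member pipeline to report fitted. A minimal sketch (`SubPipeline` is a hypothetical stand-in for the wrapped pipelines):

```python
class SubPipeline:
    """Hypothetical stand-in for a wrapped Pipeline with an is_fitted() flag."""
    def __init__(self, fitted: bool = False):
        self._fitted = fitted

    def is_fitted(self) -> bool:
        return self._fitted

def voting_pipeline_is_fitted(pipelines) -> bool:
    # The ensemble counts as fitted only when every member pipeline is fitted.
    return all(p.is_fitted() for p in pipelines)
```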

classmethod load(path: str) VotingPipeline[source]

Loads a voting pipeline from the specified path.

Parameters:

path (str) – Path where the voting pipeline is saved.

Returns:

Loaded voting pipeline.

Return type:

VotingPipeline

predict(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Makes predictions for the given dataset using the voting pipeline.

Parameters:
  • dataset (Dataset) – Dataset to be used for prediction.

  • return_invalid (bool) – If True, return invalid entries as NaN.

Returns:

Array of predictions.

Return type:

np.ndarray

predict_proba(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Makes probability predictions for the given dataset using the voting pipeline.

Parameters:
  • dataset (Dataset) – Dataset to be used for prediction.

  • return_invalid (bool) – If True, return invalid entries as NaN.

Returns:

Array of predicted class probabilities.

Return type:

np.ndarray

save(path: str) VotingPipeline[source]

Saves the voting pipeline.

Parameters:

path (str) – Path where the voting pipeline will be saved.

Returns:

The saved voting pipeline.

Return type:

VotingPipeline

deepmol.pipeline.pipeline module

class Pipeline(steps: List[Tuple[str, Transformer | Predictor]], path: str | None = None, hpo: HyperparameterOptimizer | None = None)[source]

Bases: Transformer

Pipeline of transformers and predictors. It applies a list of transformers in sequence, optionally followed by a predictor. All steps except the last must be transformers; the last step may be either a transformer or a predictor. Transformers must implement the fit() and transform() methods; the predictor must implement the fit() and predict() methods.
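
The fit/transform chaining described above can be sketched with toy steps (illustrative only; `ToyScaler`, `ToyThreshold`, and `ToyPipeline` are hypothetical stand-ins, and deepmol's real steps operate on Dataset objects rather than lists):

```python
from typing import List, Tuple

class ToyScaler:
    """Toy transformer: fit() learns the max, transform() divides by it."""
    def fit(self, data):
        self.max_ = max(data)
        return self

    def transform(self, data):
        return [x / self.max_ for x in data]

class ToyThreshold:
    """Toy predictor: fit() is a no-op, predict() thresholds at 0.5."""
    def fit(self, data):
        return self

    def predict(self, data):
        return [1 if x >= 0.5 else 0 for x in data]

class ToyPipeline:
    def __init__(self, steps: List[Tuple[str, object]]):
        self.steps = steps

    def fit(self, data):
        # Fit and apply each transformer in order, then fit the final predictor.
        for _name, step in self.steps[:-1]:
            data = step.fit(data).transform(data)
        self.steps[-1][1].fit(data)
        return self  # returning self lets fit() and predict() be chained

    def predict(self, data):
        # Apply the already-fitted transformers, then the predictor.
        for _name, step in self.steps[:-1]:
            data = step.transform(data)
        return self.steps[-1][1].predict(data)

pipe = ToyPipeline([("scaler", ToyScaler()), ("model", ToyThreshold())])
preds = pipe.fit([2.0, 10.0, 6.0]).predict([2.0, 10.0, 6.0])
```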

evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False) Tuple[Dict, None | Dict][source]

Evaluate the pipeline on a dataset based on the provided metrics.

Parameters:
  • dataset (Dataset) – Dataset to evaluate on.

  • metrics (List[Metric]) – List of metrics to evaluate on.

  • per_task_metrics (bool) – Whether to return per-task metrics.

Returns:

  • multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.

  • all_task_scores (dict) – If per_task_metrics is True, a second dictionary with the scores for each task separately; otherwise None.
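
A sketch of the two-dictionary return shape (illustrative; the assumption here that the multitask score is the mean of the per-task scores is ours, not necessarily deepmol's aggregation):

```python
from typing import Callable, Dict, List, Optional, Tuple

def evaluate(y_true: List[List[int]], y_pred: List[List[int]],
             metrics: Dict[str, Callable[[List[int], List[int]], float]],
             per_task_metrics: bool = False) -> Tuple[Dict, Optional[Dict]]:
    """Score each task (column) separately, then aggregate across tasks."""
    n_tasks = len(y_true[0])
    per_task = {name: [] for name in metrics}
    for t in range(n_tasks):
        true_t = [row[t] for row in y_true]
        pred_t = [row[t] for row in y_pred]
        for name, metric in metrics.items():
            per_task[name].append(metric(true_t, pred_t))
    # Multitask score: mean of the per-task scores for each metric (assumed).
    multitask = {name: sum(scores) / n_tasks for name, scores in per_task.items()}
    return (multitask, per_task) if per_task_metrics else (multitask, None)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```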

fit(train_dataset: Dataset, validation_dataset: Dataset | None = None) Pipeline[source]

Fit the pipeline to the train data.

Parameters:
  • train_dataset (Dataset) – Dataset to fit the pipeline to.

  • validation_dataset (Dataset) – Dataset to validate the pipeline on if hpo is not None.

Returns:

self – Fitted pipeline.

Return type:

Pipeline

is_fitted() bool[source]

Whether the pipeline is fitted.

Returns:

is_fitted – Whether the pipeline is fitted.

Return type:

bool

is_prediction_pipeline() bool[source]

Whether the pipeline is a prediction pipeline.

Returns:

is_prediction_pipeline – Whether the pipeline is a prediction pipeline.

Return type:

bool

classmethod load(path: str) Pipeline[source]

Load the pipeline from disk. The sequence of transformers is loaded from a config file. The transformers and predictor are loaded from separate files. Transformers are loaded from pickle files, while the predictor is loaded using its own load method.

Parameters:

path (str) – Path to the directory where the pipeline is saved.

Returns:

pipeline – Loaded pipeline.

Return type:

Pipeline

predict(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Make predictions on a dataset using the pipeline predictor.

Parameters:
  • dataset (Dataset) – Dataset to make predictions on.

  • return_invalid (bool) – If True, return invalid entries as NaN.

Returns:

y_pred – Predictions.

Return type:

np.ndarray
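
The effect of return_invalid can be sketched as follows (an illustrative sketch using plain lists and a hypothetical validity mask; deepmol tracks invalid molecules, e.g. failed featurizations, internally and returns a NumPy array):

```python
import math
from typing import List

def align_predictions(preds: List[float], valid_mask: List[bool],
                      return_invalid: bool = False) -> List[float]:
    """Place predictions for valid entries back into the original order.

    `preds` holds one prediction per *valid* entry; invalid entries are
    either dropped (default) or returned as NaN (return_invalid=True).
    """
    if not return_invalid:
        return list(preds)
    out, it = [], iter(preds)
    for ok in valid_mask:
        out.append(next(it) if ok else math.nan)
    return out
```

With return_invalid=True the output keeps one slot per input molecule, which makes it easy to join predictions back onto the original dataset.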

predict_proba(dataset: Dataset, return_invalid: bool = False) ndarray[source]

Make probability predictions on a dataset using the pipeline predictor.

Parameters:
  • dataset (Dataset) – Dataset to make predictions on.

  • return_invalid (bool) – If True, return invalid entries as NaN.

Returns:

y_pred – Predicted class probabilities.

Return type:

np.ndarray

save()[source]

Save the pipeline to disk (transformers and predictor). The sequence of transformers is saved in a config file. The transformers and predictor are saved in separate files. Transformers are saved as pickle files, while the predictor is saved using its own save method.
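
The config-plus-pickles scheme described above can be sketched like this (a sketch only: the file names `config.json` and `<step>.pkl` are assumptions for illustration, and the predictor's own save method is omitted; deepmol's actual on-disk layout may differ):

```python
import json
import os
import pickle

def save_pipeline(steps, path):
    """Persist each step as a pickle and record the step order in a config file."""
    os.makedirs(path, exist_ok=True)
    config = []
    for name, transformer in steps:
        fname = f"{name}.pkl"  # assumed naming convention
        with open(os.path.join(path, fname), "wb") as fh:
            pickle.dump(transformer, fh)
        config.append({"name": name, "file": fname})
    with open(os.path.join(path, "config.json"), "w") as fh:
        json.dump(config, fh)

def load_pipeline(path):
    """Rebuild the step list in the order recorded by the config file."""
    with open(os.path.join(path, "config.json")) as fh:
        config = json.load(fh)
    steps = []
    for entry in config:
        with open(os.path.join(path, entry["file"]), "rb") as fh:
            steps.append((entry["name"], pickle.load(fh)))
    return steps
```

Keeping the step order in a separate config file is what lets load() restore the transformers in the right sequence without relying on file-system ordering.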

Module contents