deepmol.pipeline package
Submodules
deepmol.pipeline.ensemble module
- class VotingPipeline(pipelines: List[Pipeline], voting: Literal['hard', 'soft'] = 'hard', weights: List[float] | None = None)
Bases: object
Pipeline that combines the predictions of multiple pipelines using voting.
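With `voting='hard'`, ensembles usually take a weighted majority vote over the class labels predicted by each member. This is how scikit-learn's `VotingClassifier` defines hard voting. The sketch below assumes DeepMol uses the same rule. The function `hard_vote` is hypothetical and works on plain Python lists instead of `Dataset` objects. The tie-breaking rule (smallest label wins) is also an assumption.

```python
from collections import defaultdict
from typing import List, Optional, Sequence


def hard_vote(predictions: Sequence[Sequence[int]],
              weights: Optional[List[float]] = None) -> List[int]:
    """Hypothetical weighted majority vote over per-pipeline class labels.

    predictions[i][j] is the label that pipeline i assigns to sample j.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models  # weights=None means every pipeline counts equally
    if len(weights) != n_models:
        raise ValueError("one weight per pipeline is required")
    result = []
    for j in range(len(predictions[0])):
        tally = defaultdict(float)
        for i in range(n_models):
            tally[predictions[i][j]] += weights[i]
        # Label with the largest total weight; on a tie, the smallest label wins.
        result.append(max(sorted(tally), key=lambda label: tally[label]))
    return result
```

For example, `hard_vote([[0, 1, 1], [1, 1, 0], [0, 0, 1]])` returns `[0, 1, 1]`. Giving the third pipeline a weight of 3 (`weights=[1, 1, 3]`) changes the result to `[0, 0, 1]`.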
- evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False) → Tuple[Dict, None | Dict]
Evaluates the voting pipeline using the given metrics.
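- Parameters:
dataset (Dataset) – Dataset to evaluate the voting pipeline on.
metrics (List[Metric]) – Metrics used to score the predictions.
per_task_metrics (bool) – If True, also return the scores for each task separately.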
- Returns:
Tuple containing the multitask scores and the scores for each task separately.
- Return type:
Tuple[Dict, Union[None, Dict]]
- fit(train_dataset: Dataset, validation_dataset: Dataset | None = None) → VotingPipeline
Fits the pipelines to the training dataset. A separate validation dataset can also be provided.
- is_fitted() → bool
Returns True if all pipelines are fitted, False otherwise.
- Returns:
True if all pipelines are fitted, False otherwise.
- Return type:
bool
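The following minimal sketch shows how `fit` and `is_fitted` relate. It assumes the voting pipeline delegates `fit` to every member pipeline with the same training and validation data. `StubPipeline` and `ToyVoting` are hypothetical stand-ins, not DeepMol classes.

```python
from typing import List


class StubPipeline:
    """Hypothetical stand-in for a Pipeline; it only records whether fit() ran."""

    def __init__(self) -> None:
        self._fitted = False

    def fit(self, train_dataset, validation_dataset=None) -> "StubPipeline":
        self._fitted = True
        return self

    def is_fitted(self) -> bool:
        return self._fitted


class ToyVoting:
    """Hypothetical voting container illustrating fit/is_fitted delegation."""

    def __init__(self, pipelines: List[StubPipeline]) -> None:
        self.pipelines = pipelines

    def fit(self, train_dataset, validation_dataset=None) -> "ToyVoting":
        # Every member is fitted on the same training (and optional validation) data.
        for p in self.pipelines:
            p.fit(train_dataset, validation_dataset)
        return self

    def is_fitted(self) -> bool:
        # The ensemble counts as fitted only when every member is fitted.
        return all(p.is_fitted() for p in self.pipelines)
```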
- classmethod load(path: str) → VotingPipeline
Loads a voting pipeline from the specified path.
- Parameters:
path (str) – Path where the voting pipeline is saved.
- Returns:
Loaded voting pipeline.
- Return type:
VotingPipeline
- predict(dataset: Dataset, return_invalid: bool = False) → ndarray
Makes predictions for the given dataset using the voting pipeline.
- Parameters:
dataset (Dataset) – Dataset to be used for prediction.
return_invalid (bool) – If True, include invalid entries in the output, with NaN as their prediction.
- Returns:
Array of predictions.
- Return type:
np.ndarray
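The `return_invalid` flag matters when some entries cannot be processed, for example molecules that fail featurization. This sketch makes three assumptions:

- Predictions are produced only for valid entries.
- With `return_invalid=True`, NaN is re-inserted at the positions of invalid entries, so the output stays aligned with the input.
- Otherwise, invalid entries are simply dropped.

`align_with_invalid` and its `valid_mask` argument are hypothetical and are not part of the DeepMol API.

```python
import math
from typing import List, Sequence


def align_with_invalid(valid_predictions: Sequence[float],
                       valid_mask: Sequence[bool],
                       return_invalid: bool = False) -> List[float]:
    """Hypothetical helper: re-insert NaN rows for entries that could not be processed."""
    if sum(valid_mask) != len(valid_predictions):
        raise ValueError("exactly one prediction is needed per valid entry")
    if not return_invalid:
        return list(valid_predictions)  # assumed behaviour: invalid entries are omitted
    it = iter(valid_predictions)
    # Walk the original positions, taking the next prediction for valid ones.
    return [next(it) if ok else math.nan for ok in valid_mask]
```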
- predict_proba(dataset: Dataset, return_invalid: bool = False) → ndarray
Predicts class probabilities for the given dataset using the voting pipeline.
- Parameters:
dataset (Dataset) – Dataset to be used for prediction.
return_invalid (bool) – If True, include invalid entries in the output, with NaN as their prediction.
- Returns:
Array of predicted probabilities.
- Return type:
np.ndarray
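With `voting='soft'`, ensembles typically do two things:

1. Average the class probabilities returned by each member, optionally weighted by `weights`.
2. Pick the class with the highest average.

This is the definition scikit-learn's `VotingClassifier` uses. The sketch below assumes DeepMol follows the same rule. `soft_vote` is hypothetical and takes nested lists instead of arrays.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(probas: Sequence[Sequence[Sequence[float]]],
              weights: Optional[List[float]] = None) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical weighted average of per-pipeline class probabilities.

    probas[i][j][k] is the probability pipeline i assigns to class k for sample j.
    Returns the averaged probabilities and the resulting labels.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per pipeline is required")
    total = sum(weights)
    n_samples, n_classes = len(probas[0]), len(probas[0][0])
    averaged = [
        [sum(weights[i] * probas[i][j][k] for i in range(n_models)) / total
         for k in range(n_classes)]
        for j in range(n_samples)
    ]
    # Predicted label = class with the highest averaged probability.
    labels = [max(range(n_classes), key=lambda k: row[k]) for row in averaged]
    return averaged, labels
```

Weights can change the outcome. For two members predicting `[0.8, 0.2]` and `[0.3, 0.7]` for the same sample, the unweighted average favours class 0. With `weights=[1, 3]`, the second member dominates and class 1 wins. Soft voting needs every member to expose class probabilities, which is why it pairs with `predict_proba`.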
- save(path: str) → VotingPipeline
Saves the voting pipeline.
- Parameters:
path (str) – Path where the voting pipeline will be saved.
deepmol.pipeline.pipeline module
- class Pipeline(steps: List[Tuple[str, Transformer | Predictor]], path: str | None = None, hpo: HyperparameterOptimizer | None = None)
Bases: Transformer
Pipeline of transformers and predictors. It applies a sequence of transformers, optionally followed by a predictor. If a predictor is included, it must be the last step; all other steps must be transformers. The transformers must implement the fit() and transform() methods; the predictor must implement the fit() and predict() methods.
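The class below is a self-contained toy that mirrors the contract described above. Transformers are fitted and applied in order, and an optional final predictor is fitted on the transformed data. `MeanCenterer`, `SignClassifier` and `ToyPipeline` are hypothetical. The `hasattr(..., "predict")` check used to detect a predictor is an assumption for illustration, not DeepMol's implementation.

```python
from typing import Any, List, Tuple


class MeanCenterer:
    """Toy transformer: subtracts the mean seen during fit."""

    def fit(self, xs: List[float]) -> "MeanCenterer":
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs: List[float]) -> List[float]:
        return [x - self.mean_ for x in xs]


class SignClassifier:
    """Toy predictor: label 1 for non-negative inputs, 0 otherwise."""

    def fit(self, xs: List[float], ys: List[int]) -> "SignClassifier":
        return self  # nothing to learn in this toy

    def predict(self, xs: List[float]) -> List[int]:
        return [1 if x >= 0 else 0 for x in xs]


class ToyPipeline:
    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        # All steps except the last must be transformers.
        for name, step in steps[:-1]:
            if not hasattr(step, "transform"):
                raise TypeError(f"step {name!r} must be a transformer")
        self.steps = steps

    def is_prediction_pipeline(self) -> bool:
        # Assumption: a prediction pipeline is one whose last step can predict.
        return hasattr(self.steps[-1][1], "predict")

    def fit(self, xs, ys=None) -> "ToyPipeline":
        last = len(self.steps) - 1
        for idx, (_, step) in enumerate(self.steps):
            if idx == last and self.is_prediction_pipeline():
                step.fit(xs, ys)  # the predictor sees the fully transformed data
            else:
                xs = step.fit(xs).transform(xs)
        return self

    def predict(self, xs):
        if not self.is_prediction_pipeline():
            raise ValueError("the last step is not a predictor")
        for _, step in self.steps[:-1]:
            xs = step.transform(xs)
        return self.steps[-1][1].predict(xs)
```

Fitting `ToyPipeline([("center", MeanCenterer()), ("clf", SignClassifier())])` on `[1.0, 2.0, 3.0]` stores a mean of 2.0. Predicting on `[1.0, 2.5, 4.0]` then yields `[0, 1, 1]`.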
- evaluate(dataset: Dataset, metrics: List[Metric], per_task_metrics: bool = False) → Tuple[Dict, None | Dict]
Evaluate the pipeline on a dataset based on the provided metrics.
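- Parameters:
dataset (Dataset) – Dataset to evaluate the pipeline on.
metrics (List[Metric]) – Metrics used to score the predictions.
per_task_metrics (bool) – If True, also return the scores for each task separately.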
- Returns:
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict or None) – If per_task_metrics is True, a second dictionary with the scores for each task separately; otherwise None.
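The two return values can be illustrated with a small stand-alone function. It computes one score per task and metric. It assumes the multitask score is the unweighted mean over tasks; the source does not say how DeepMol aggregates. The `evaluate` function here is hypothetical. It takes a dict of plain metric functions instead of `Metric` objects and label lists instead of a `Dataset`.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence, Tuple

MetricFn = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(y_true: List[List[int]], y_pred: List[List[int]],
             metrics: Dict[str, MetricFn],
             per_task_metrics: bool = False
             ) -> Tuple[Dict[str, float], Optional[Dict[str, List[float]]]]:
    """Hypothetical evaluator; y_true[t] and y_pred[t] hold the labels of task t."""
    task_scores = {name: [fn(t, p) for t, p in zip(y_true, y_pred)]
                   for name, fn in metrics.items()}
    # Assumption: multitask score = unweighted mean of the per-task scores.
    multitask = {name: mean(scores) for name, scores in task_scores.items()}
    return multitask, (task_scores if per_task_metrics else None)
```

Take two tasks whose accuracies are 0.75 and 1.0. The multitask accuracy is 0.875. The per-task dictionary `{"accuracy": [0.75, 1.0]}` is returned only when `per_task_metrics=True`.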
- fit(train_dataset: Dataset, validation_dataset: Dataset | None = None) → Pipeline
Fit the pipeline to the training data. A separate validation dataset can also be provided.
- is_fitted() → bool
Whether the pipeline is fitted.
- Returns:
is_fitted – Whether the pipeline is fitted.
- Return type:
bool
- is_prediction_pipeline() → bool
Whether the pipeline is a prediction pipeline, i.e. whether its last step is a predictor.
- Returns:
is_prediction_pipeline – Whether the pipeline is a prediction pipeline.
- Return type:
bool
- classmethod load(path: str) → Pipeline
Load the pipeline from disk. The sequence of transformers is loaded from a config file. The transformers and predictor are loaded from separate files. Transformers are loaded from pickle files, while the predictor is loaded using its own load method.
- Parameters:
path (str) – Path to the directory where the pipeline is saved.
- Returns:
pipeline – Loaded pipeline.
- Return type:
Pipeline
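The on-disk layout described for `load` has three parts:

- a config file that records the step order,
- one pickle file per transformer,
- a predictor that persists itself with its own save/load methods.

The sketch below implements that layout under several assumptions. The file names (`config.json`, `<name>.pkl`, `<name>.json`) and the JSON config format are hypothetical. `ToyPredictor`, `save_pipeline` and `load_pipeline` are illustrative and are not DeepMol functions. The sketch also assumes the last step is the predictor.

```python
import json
import os
import pickle
import tempfile
from typing import Any, List, Tuple


class ToyPredictor:
    """Toy predictor that persists itself, standing in for a model with its own save/load."""

    def __init__(self, threshold: float = 0.0) -> None:
        self.threshold = threshold

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump({"threshold": self.threshold}, f)

    @classmethod
    def load(cls, path: str) -> "ToyPredictor":
        with open(path) as f:
            return cls(**json.load(f))


def save_pipeline(steps: List[Tuple[str, Any]], path: str) -> None:
    os.makedirs(path, exist_ok=True)
    *transformers, (pred_name, predictor) = steps  # assumes the last step is the predictor
    config = {"transformers": [name for name, _ in transformers], "predictor": pred_name}
    with open(os.path.join(path, "config.json"), "w") as f:
        json.dump(config, f)  # the step order lives in the config file
    for name, transformer in transformers:
        with open(os.path.join(path, f"{name}.pkl"), "wb") as f:
            pickle.dump(transformer, f)  # transformers: plain pickle files
    predictor.save(os.path.join(path, f"{pred_name}.json"))  # predictor: its own method


def load_pipeline(path: str) -> List[Tuple[str, Any]]:
    with open(os.path.join(path, "config.json")) as f:
        config = json.load(f)
    steps = []
    for name in config["transformers"]:  # restore transformers in the recorded order
        with open(os.path.join(path, f"{name}.pkl"), "rb") as f:
            steps.append((name, pickle.load(f)))
    pred_name = config["predictor"]
    steps.append((pred_name, ToyPredictor.load(os.path.join(path, f"{pred_name}.json"))))
    return steps
```

Keeping the predictor's persistence separate lets models that pickle poorly, such as deep learning models with their own checkpoint formats, choose their own storage.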
- predict(dataset: Dataset, return_invalid: bool = False) → ndarray
Make predictions on a dataset using the pipeline predictor.
- Parameters:
dataset (Dataset) – Dataset to make predictions on.
return_invalid (bool) – If True, include invalid entries in the output, with NaN as their prediction.
- Returns:
y_pred – Predictions.
- Return type:
np.ndarray
- predict_proba(dataset: Dataset, return_invalid: bool = False) → ndarray
Predict class probabilities on a dataset using the pipeline predictor.
- Parameters:
dataset (Dataset) – Dataset to make predictions on.
return_invalid (bool) – If True, include invalid entries in the output, with NaN as their prediction.
- Returns:
y_pred – Predicted probabilities.
- Return type:
np.ndarray