deepmol.feature_selection package

Submodules

deepmol.feature_selection.base_feature_selector module

class BaseFeatureSelector[source]

Bases: ABC

Abstract base class for feature selection. A BaseFeatureSelector uses the features present in a Dataset object to select the most important ones. Subclasses of this class should always operate on Dataset objects.

Subclasses must implement the _select_features method, which performs the actual feature selection.
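The division of labour between select_features and _select_features can be illustrated with a small, self-contained sketch. Everything in it (ToyDataset, ToyFeatureSelector, DropConstantColumns) is a hypothetical stand-in written for this illustration, not DeepMol's actual implementation; in particular, whether the real method modifies the dataset in place is an assumption.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ToyDataset:
    """Hypothetical stand-in for a Dataset: rows of numeric features."""

    def __init__(self, X: Sequence[Sequence[float]]):
        self.X = [list(row) for row in X]


class ToyFeatureSelector(ABC):
    """Illustrative base class (assumption, not the DeepMol source)."""

    def __init__(self) -> None:
        self.features2keep: List[int] = []

    def select_features(self, dataset: ToyDataset) -> ToyDataset:
        # Public entry point: ask the subclass which columns to keep,
        # remember them, then apply the selection to the dataset.
        self.features2keep = self._select_features(dataset)
        dataset.X = [[row[j] for j in self.features2keep] for row in dataset.X]
        return dataset

    @abstractmethod
    def _select_features(self, dataset: ToyDataset) -> List[int]:
        """Return the indexes of the columns to keep."""


class DropConstantColumns(ToyFeatureSelector):
    """Toy subclass: keep only the columns that take more than one value."""

    def _select_features(self, dataset: ToyDataset) -> List[int]:
        n_cols = len(dataset.X[0])
        return [j for j in range(n_cols)
                if len({row[j] for row in dataset.X}) > 1]


toy = ToyDataset([[1.0, 0.0, 3.0], [2.0, 0.0, 3.5], [3.0, 0.0, 1.0]])
selector = DropConstantColumns()
reduced = selector.select_features(toy)
print(selector.features2keep)
```

With this layout a new selection strategy only has to provide _select_features; the bookkeeping of the kept indexes stays in one place.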

select_features(dataset: Dataset)[source]

Perform feature selection for the molecules present in the dataset.

Parameters

dataset (Dataset) – Dataset to perform feature selection on

Returns

dataset – Dataset containing only the selected features. The indexes of the features kept are stored in self.features2keep.

Return type

Dataset

class BorutaAlgorithm(estimator: Optional[callable] = None, task: str = 'classification', support_weak: bool = False, n_estimators: Union[int, str] = 1000, perc: int = 100, alpha: float = 0.05, two_step: bool = True, max_iter: int = 100, random_state: Optional[int] = None, verbose: int = 0)[source]

Bases: BaseFeatureSelector

Class for Boruta feature selection.

Boruta is an all-relevant feature selection method: it treats every feature as potentially relevant until it is shown to be irrelevant. The algorithm is iterative. In each iteration it first creates shadow features, which are copies of the original features with their values randomly permuted, and fits the estimator on the original and shadow features together. It then compares the importance of each original feature with that of the shadow features and rejects the features that are not more important than their shadows. The algorithm stops when every feature has been declared either important or irrelevant, or when max_iter iterations have been reached.
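A single shadow-feature comparison can be sketched as follows. This is a simplified illustration written with NumPy and scikit-learn, not the code used by BorutaAlgorithm: it runs one round only and compares each feature with the strongest shadow feature, whereas the full procedure repeats the comparison many times and applies a statistical test to the accumulated results. The synthetic data and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data: with shuffle=False the 3 informative columns come first.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
n_features = X.shape[1]

# Shadow features: a copy of X in which every column is permuted
# independently, which destroys any relation with the target.
X_shadow = np.column_stack(
    [rng.permutation(X[:, j]) for j in range(n_features)]
)
X_ext = np.hstack([X, X_shadow])

# Fit one forest on real and shadow features together.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_ext, y)
importances = forest.feature_importances_
real, shadow = importances[:n_features], importances[n_features:]

# A real feature scores a "hit" in this round if it beats the best shadow.
hits = real > shadow.max()
print(hits)
```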

class KbestFS(k: int = 10, score_func: callable = <function chi2>)[source]

Bases: BaseFeatureSelector

Class for K-best feature selection.

Selects the k features with the highest scores.
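The default score_func (chi2) and the parameter k suggest that this class follows scikit-learn's SelectKBest; that relationship is an assumption here, and the sketch below uses scikit-learn directly on a plain array rather than on a Dataset object. Note that chi2 requires non-negative feature values.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris measurements are all non-negative, as chi2 requires.
X, y = load_iris(return_X_y=True)

# Score every feature against the labels and keep the two best.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print(kept, X_new.shape)
```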

class LowVarianceFS(threshold: float = 0.3)[source]

Bases: BaseFeatureSelector

Class for low-variance feature selection. Removes all low-variance features, as determined by threshold.
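As a minimal sketch of variance-based filtering, the example below uses scikit-learn's VarianceThreshold with the same threshold as the default above. That LowVarianceFS behaves the same way is an assumption; the toy matrix is made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 4.0, 1.1],
    [1.0, 6.0, 0.9],
    [0.0, 8.0, 1.0],
])

# Column variances (population variance): 0.1875, 5.0 and 0.005.
selector = VarianceThreshold(threshold=0.3)
X_new = selector.fit_transform(X)

kept = selector.get_support(indices=True)
print(selector.variances_, kept)
```

Only the second column has a variance above 0.3, so it is the only one kept.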

class PercentilFS(percentil: int = 10, score_func: callable = <function chi2>)[source]

Bases: BaseFeatureSelector

Class for percentile feature selection.

Select features according to a percentile of the highest scores.
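Percentile-based selection can be sketched with scikit-learn's SelectPercentile, which this class presumably mirrors (an assumption; note that the scikit-learn parameter is spelled percentile, while the signature above uses percentil). The example works on a plain array rather than a Dataset object.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, chi2

X, y = load_iris(return_X_y=True)

# Keep the features whose chi2 score falls in the top 25% of all scores.
selector = SelectPercentile(score_func=chi2, percentile=25)
X_new = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print(kept, X_new.shape)
```

With four features, the top 25% corresponds to the single best-scoring feature.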

class RFECVFS(estimator: Optional[callable] = None, step: Union[int, float] = 1, min_features_to_select: int = 1, cv: Optional[Union[int, callable, Iterable]] = None, scoring: Optional[Union[str, callable]] = None, verbose: int = 0, n_jobs: int = -1)[source]

Bases: BaseFeatureSelector

Class for RFECV feature selection.

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
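The constructor parameters above match those of scikit-learn's RFECV, so a rough sketch of the technique can be written with it. Whether RFECVFS delegates to it is an assumption, as are the estimator and the synthetic data chosen here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Repeatedly drop the weakest feature (step=1) and use 3-fold
# cross-validation to decide how many features to keep.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    min_features_to_select=1,
    cv=3,
    scoring="accuracy",
)
X_new = selector.fit_transform(X, y)

print(selector.n_features_, selector.ranking_)
```

Selected features receive rank 1 in ranking_; higher ranks indicate features that were eliminated earlier.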

class SelectFromModelFS(estimator: Optional[callable] = None, threshold: Optional[Union[str, float]] = None, prefit: bool = False, norm_order: int = 1, max_features: Optional[int] = None)[source]

Bases: BaseFeatureSelector

Class for Select From Model feature selection.

Meta-transformer for selecting features based on importance weights.
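As an illustration of importance-based selection, the sketch below uses scikit-learn's SelectFromModel, whose parameters match the signature above. That SelectFromModelFS wraps it is an assumption, and the random forest and the "median" threshold are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Fit the estimator, then keep the features whose importance is at
# least the median importance across all features.
selector = SelectFromModel(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_new = selector.fit_transform(X, y)

mask = selector.get_support()
print(mask, X_new.shape)
```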

Module contents