deepmol.compound_featurization package
Subpackages
- deepmol.compound_featurization.nc_mfp package
- Submodules
- deepmol.compound_featurization.nc_mfp.Fingerprint_representation_step module
- deepmol.compound_featurization.nc_mfp.Fragment_identifying_step module
- deepmol.compound_featurization.nc_mfp.Fragment_list_generation_step module
- deepmol.compound_featurization.nc_mfp.Preprocessing_step module
- deepmol.compound_featurization.nc_mfp.SFCP_assigning_step module
- deepmol.compound_featurization.nc_mfp.Scaffold_matching_step module
ScaffoldMatching
ScaffoldMatching.get_Scaffold_Lv1_Dictionary()
ScaffoldMatching.get_Scaffold_Lv2_Dictionary()
ScaffoldMatching.get_Scaffold_all_Dictionary()
ScaffoldMatching.get_Scaffolds_Lv1_Smarts()
ScaffoldMatching.get_Scaffolds_Lv1_classes()
ScaffoldMatching.get_Scaffolds_Lv2_Smarts()
ScaffoldMatching.get_Scaffolds_Lv2_classes()
ScaffoldMatching.get_Scaffolds_all_Smarts()
ScaffoldMatching.get_Scaffolds_all_classes()
ScaffoldMatching.match_All_Scaffold_Mol()
ScaffoldMatching.match_Scaffold_Lv1_Mol()
ScaffoldMatching.match_Scaffold_Lv1_Smarts()
ScaffoldMatching.match_Scaffold_Lv2_Mol()
ScaffoldMatching.match_Scaffold_Lv2_Smarts()
ScaffoldMatching.match_all_scaffold_smarts()
- deepmol.compound_featurization.nc_mfp.generate_database module
- deepmol.compound_featurization.nc_mfp.identify_smarts_fragmets module
- Module contents
- deepmol.compound_featurization.neural_npfp package
Submodules
deepmol.compound_featurization.base_featurizer module
- class MolecularFeaturizer(n_jobs: int = -1)[source]
Bases:
ABC, Transformer
Abstract class for calculating a set of features for a molecule. A MolecularFeaturizer uses SMILES strings or RDKit molecule objects to represent molecules.
Subclasses need to implement the _featurize method for calculating features for a single molecule.
- featurize(other_object, inplace=False, **kwargs)
Method that modifies an input object inplace or on a copy.
- Parameters:
self (object) – The class instance object.
other_object (object) – The object to apply the method to.
inplace (bool) – Whether to apply the method in place.
kwargs (dict) – Keyword arguments to pass to the method.
- Returns:
new_object – The new object.
- Return type:
object
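The subclassing pattern described above (implement `_featurize` for one molecule, call `featurize` on a whole object, optionally in place) can be sketched as follows. This is a minimal illustration, not DeepMol's implementation: `ToyDataset`, `ToyFeaturizer` and `AtomCountFeaturizer` are hypothetical names, and returning the target object in both modes is an assumption.

```python
import copy
from abc import ABC, abstractmethod


class ToyDataset:
    """Hypothetical stand-in for a dataset: SMILES strings plus a feature slot."""

    def __init__(self, smiles):
        self.smiles = list(smiles)
        self.X = None


class ToyFeaturizer(ABC):
    """Mimics the documented pattern: subclasses implement _featurize only."""

    @abstractmethod
    def _featurize(self, smiles: str):
        """Return a feature vector for a single molecule."""

    def featurize(self, dataset: ToyDataset, inplace: bool = False):
        # Modify the caller's object only when inplace=True; otherwise copy first.
        target = dataset if inplace else copy.deepcopy(dataset)
        target.X = [self._featurize(s) for s in target.smiles]
        return target


class AtomCountFeaturizer(ToyFeaturizer):
    """Toy feature: counts of the characters 'C', 'N' and 'O' in the SMILES."""

    def _featurize(self, smiles: str):
        return [smiles.count(symbol) for symbol in "CNO"]


original = ToyDataset(["CCO", "CN"])
featurized = AtomCountFeaturizer().featurize(original)  # works on a copy
```

With `inplace=False` the input object keeps `X = None`, while the returned copy carries the features.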
deepmol.compound_featurization.deepchem_featurizers module
- class ConvMolFeat(master_atom: bool = False, use_chirality: bool = False, atom_properties: bool = True, per_atom_fragmentation: bool = False, **kwargs)[source]
Bases:
MolecularFeaturizer
Duvenaud graph convolution, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#convmolfeaturizer). The featurizer computes a vector of local descriptors for each atom in a molecule.
References: Duvenaud, David K., et al. “Convolutional networks on graphs for learning molecular fingerprints.” Advances in neural information processing systems. 2015.
- class CoulombEigFeat(max_atoms: int, remove_hydrogens: bool = False, randomize: bool = False, n_samples: int = 1, max_conformers: int = 1, seed: int | None = None, generate_conformers=True, **kwargs)[source]
Bases:
MolecularFeaturizer
Calculate the eigenvalues of Coulomb matrices for molecules. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#coulombmatrixeig).
References: Montavon, Grégoire, et al. “Learning invariant representations of molecules for atomization energy prediction.” Advances in neural information processing systems. 2012.
- class CoulombFeat(max_atoms: int, remove_hydrogens: bool = False, randomize: bool = False, upper_tri: bool = False, n_samples: int = 1, max_conformers: int = 1, seed: int | None = None, generate_conformers: bool = True, **kwargs)[source]
Bases:
MolecularFeaturizer
Calculate Coulomb matrices for molecules. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#coulombmatrix).
References: Montavon, Grégoire, et al. “Learning invariant representations of molecules for atomization energy prediction.” Advances in neural information processing systems. 2012.
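As a rough illustration of what a Coulomb matrix contains, the sketch below uses the commonly cited definition: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The function name `coulomb_matrix` is hypothetical, zero-padding to `max_atoms` is an assumption about how that parameter is used, and conformer generation, randomization and upper-triangle extraction are left out.

```python
import math


def coulomb_matrix(charges, coords, max_atoms=None):
    """Build a Coulomb matrix, zero-padded to max_atoms x max_atoms."""
    n = len(charges)
    size = max_atoms if max_atoms is not None else n
    if n > size:
        raise ValueError("molecule has more atoms than max_atoms")
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Two hydrogen nuclei placed 2.0 distance units apart, padded to 3 atoms.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)], max_atoms=3)
```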
- class DMPNNFeat(features_generators: List[str] | None = None, is_adding_hs: bool = False, use_original_atom_ranks: bool = False, **kwargs)[source]
Bases:
MolecularFeaturizer
Featurizes molecules using DeepChem DMPNNFeaturizer.
This class is a featurizer for the Directed Message Passing Neural Network (D-MPNN) implementation.
The default node (atom) and edge (bond) representations are based on the paper “Analyzing Learned Molecular Representations for Property Prediction”.
Reference: https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#dmpnnfeaturizer
- class DagTransformer(max_atoms: int = 50)[source]
Bases:
Transformer
Transforms ConvMol adjacency lists into DAG calculation orders.
This transformer is used by DAGModel before training to transform its inputs to the correct shape. This expansion turns a molecule with n atoms into n DAGs, each with root at a different atom in the molecule.
Reference: https://deepchem.readthedocs.io/en/latest/api_reference/transformers.html#dagtransformer
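The expansion of one molecule with n atoms into n rooted DAGs can be pictured with a breadth-first traversal from each atom. The sketch below is only a conceptual illustration: `dag_orders` is a hypothetical helper, and it does not reproduce the array layout, padding or ordering that DeepChem's transformer actually emits.

```python
from collections import deque


def dag_orders(adjacency):
    """For every root atom, return (atom, parent) pairs in BFS order."""
    orders = []
    for root in range(len(adjacency)):
        parent = {root: None}
        queue = deque([root])
        order = []
        while queue:
            atom = queue.popleft()
            order.append((atom, parent[atom]))
            for neighbor in adjacency[atom]:
                if neighbor not in parent:  # first visit fixes the DAG edge
                    parent[neighbor] = atom
                    queue.append(neighbor)
        orders.append(order)
    return orders


# Three-atom chain 0-1-2 given as an adjacency list.
chain = [[1], [0, 2], [1]]
dags = dag_orders(chain)
```

Each of the three resulting orders starts at a different root atom and covers all atoms of the chain.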
- class MATFeat(**kwargs)[source]
Bases:
MolecularFeaturizer
Featurizes molecules using DeepChem MATFeaturizer.
This class is a featurizer for the Molecule Attention Transformer (MAT) implementation. The returned value is a numpy array which consists of molecular graph descriptions:
Node Features
Adjacency Matrix
Distance Matrix
References: [1] https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#matfeaturizer [2] Lukasz Maziarka et al. “Molecule Attention Transformer”, https://arxiv.org/abs/2002.08264
- class MolGanFeat(max_atom_count: int = 9, kekulize: bool = True, bond_labels: List[Any] | None = None, atom_labels: List[int] | None = None, **kwargs)[source]
Bases:
MolecularFeaturizer
Featurizer for the MolGAN de-novo molecular generation model, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html?highlight=CGCNN#molganfeaturizer). It is a wrapper for two matrices containing atom and bond type information.
References: Nicola De Cao et al. “MolGAN: An implicit generative model for small molecular graphs” (2018), https://arxiv.org/abs/1805.11973
- class MolGraphConvFeat(use_edges: bool = False, use_chirality: bool = False, use_partial_charge: bool = False, **kwargs)[source]
Bases:
MolecularFeaturizer
Featurizer of general graph convolution networks for molecules. Adapted from deepchem: (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#molgraphconvfeaturizer)
References: Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of computer-aided molecular design 30.8 (2016):595-608.
- class PagtnMolGraphFeat(max_length: int = 5, **kwargs)[source]
Bases:
MolecularFeaturizer
This class is a featurizer of PAGTN graph networks for molecules.
The featurization is based on the PAGTN model. It is slightly more computationally intensive than the default graph convolution featurizer, but it builds a molecular graph connecting all atom pairs, accounting for interactions of an atom with every other atom in the molecule. According to the paper, the interaction between a pair of atoms depends on the relative distance between them; hence, the function needs to calculate the shortest path between them.
References
[1] Chen, Barzilay, Jaakkola “Path-Augmented Graph Transformer Network” 10.26434/chemrxiv.8214422.
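The shortest-path computation mentioned above can be sketched with a breadth-first search over the bond graph. The helper names are hypothetical, and capping distances at `max_length` is an assumption about how that parameter is applied; this is not the DeepChem code.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Bond-count distance from source to every reachable atom (BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distances:
                distances[neighbor] = distances[atom] + 1
                queue.append(neighbor)
    return distances


def all_pair_distances(adjacency, max_length=5):
    """Distances for every atom pair (i < j), capped at max_length."""
    pairs = {}
    for i in range(len(adjacency)):
        for j, distance in shortest_path_lengths(adjacency, i).items():
            if i < j:
                pairs[(i, j)] = min(distance, max_length)
    return pairs


# Six-membered ring: atom k is bonded to its two ring neighbors.
ring = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
ring_pairs = all_pair_distances(ring)
```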
- class RawFeat(n_jobs: int = -1)[source]
Bases:
MolecularFeaturizer
- class SmileImageFeat(img_size: int = 80, res: float = 0.5, max_len: int = 250, img_spec: str = 'std', **kwargs)[source]
Bases:
MolecularFeaturizer
Converts a SMILES string to an image. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#smilestoimage).
References: Goh, Garrett B., et al. “Using rule-based labels for weak supervised learning: a ChemNet for transferable chemical property prediction.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
- class SmilesSeqFeat(char_to_idx: Dict[str, int] | None = None, max_len: int = 250, pad_len: int = 10)[source]
Bases:
Transformer
Takes SMILES strings and turns them into sequences. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#smilestoseq).
References: Goh, Garrett B., et al. “Using rule-based labels for weak supervised learning: a ChemNet for transferable chemical property prediction.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
- featurize(other_object, inplace=False, **kwargs)
Method that modifies an input object inplace or on a copy.
- Parameters:
self (object) – The class instance object.
other_object (object) – The object to apply the method to.
inplace (bool) – Whether to apply the method in place.
kwargs (dict) – Keyword arguments to pass to the method.
- Returns:
new_object – The new object.
- Return type:
object
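To make the roles of `char_to_idx`, `max_len` and `pad_len` more concrete, here is a simplified sketch of turning a SMILES string into an integer sequence. It is an assumption-laden illustration: the `<pad>` and `<unk>` token names, the character-level tokenization (which ignores multi-character atoms such as `Cl`) and the choice to return `None` for over-long strings are all hypothetical.

```python
def smiles_to_sequence(smiles, char_to_idx, max_len=250, pad_len=10):
    """Map characters to indices, pad to max_len, then add pad_len on both sides."""
    if len(smiles) > max_len:
        return None  # too long to encode in this sketch
    pad = char_to_idx["<pad>"]
    unk = char_to_idx["<unk>"]
    body = [char_to_idx.get(ch, unk) for ch in smiles]
    body += [pad] * (max_len - len(smiles))
    return [pad] * pad_len + body + [pad] * pad_len


vocabulary = {"<pad>": 0, "<unk>": 1, "C": 2, "O": 3}
ethanol_seq = smiles_to_sequence("CCO", vocabulary, max_len=5, pad_len=2)
unknown_seq = smiles_to_sequence("CN", vocabulary, max_len=5, pad_len=2)
```

Every encoded sequence has length `max_len + 2 * pad_len`, so batches can be stacked without further padding.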
- class WeaveFeat(graph_distance: bool = True, explicit_h: bool = False, use_chirality: bool = False, max_pair_distance: int | None = None, **kwargs)[source]
Bases:
MolecularFeaturizer
Weave convolution featurization, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#weavefeaturizer). Requires a quadratic matrix of interaction descriptors for each pair of atoms.
References: Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of computer-aided molecular design 30.8 (2016): 595-608.
deepmol.compound_featurization.mhfp module
- class MHFP(**kwargs)[source]
Bases:
MolecularFeaturizer
MHFP featurizer class. This module contains the MHFP encoder, which is used to encode SMILES and RDKit molecule instances as MHFP fingerprints.
deepmol.compound_featurization.mixed_descriptors module
- class MixedFeaturizer(featurizers: Iterable[MolecularFeaturizer], **kwargs)[source]
Bases:
MolecularFeaturizer
Class to apply multiple featurizers. Features from the different featurizers are concatenated.
deepmol.compound_featurization.mol2vec module
- class Mol2Vec(pretrain_model_path: str | None = None, radius: int = 1, unseen: str = 'UNK', gather_method: str = 'sum', **kwargs)[source]
Bases:
MolecularFeaturizer
Mol2Vec fingerprint implementation from https://doi.org/10.1021/acs.jcim.7b00616
Inspired by natural language processing techniques, Mol2vec is an unsupervised machine learning approach that learns vector representations of molecular substructures. These vectors point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties.
- sentences2vec(sentences: Iterable, model: Word2Vec, unseen: str | None = None)[source]
Generate vectors for each sentence (list) in a list of sentences. Vector is simply a sum of vectors for individual words.
- Parameters:
sentences (Iterable) – List with sentences
model (Word2Vec) – Gensim Word2Vec model
unseen (None, str) – Keyword for unseen words. If None, those words are skipped. https://stats.stackexchange.com/questions/163005/how-to-set-the-dictionary-for-text-analysis-using-neural-networks/163032#163032
- Returns:
Array of vectors for each sentence.
- Return type:
np.array
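The summation described for `sentences2vec` can be sketched without Gensim by using a plain dictionary in place of the Word2Vec model. `sentences_to_vectors` and the toy vectors are hypothetical; the handling of `unseen` follows the parameter description above (skip unknown words when it is `None`, otherwise substitute the vector of the unseen keyword).

```python
def sentences_to_vectors(sentences, word_vectors, unseen=None):
    """Sum the word vectors of each sentence; word_vectors is a plain dict."""
    dim = len(next(iter(word_vectors.values())))
    result = []
    for sentence in sentences:
        total = [0.0] * dim
        for word in sentence:
            if word in word_vectors:
                vector = word_vectors[word]
            elif unseen is not None:
                vector = word_vectors[unseen]  # substitute the unseen token
            else:
                continue  # unknown word and no fallback: skip it
            total = [a + b for a, b in zip(total, vector)]
        result.append(total)
    return result


toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 2.0], "UNK": [0.5, 0.5]}
toy_sentences = [["a", "b"], ["a", "x"]]
skipped = sentences_to_vectors(toy_sentences, toy_vectors)
replaced = sentences_to_vectors(toy_sentences, toy_vectors, unseen="UNK")
```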
deepmol.compound_featurization.nc_mfp_generator module
- class NcMfp(database_file_path='/home/docs/checkouts/readthedocs.org/user_builds/deepmol/checkouts/latest/src/deepmol/compound_featurization/nc_mfp/databases/ncdb', **kwargs)[source]
Bases:
MolecularFeaturizer
- static generate_new_fingerprint_database(data: SmilesDataset, output_folder: str)[source]
Generate a new fingerprint database from the given dataset.
- Parameters:
data (SmilesDataset) – A dataset of SMILES strings.
output_folder (str) – The output folder to save the database.
- Return type:
None
deepmol.compound_featurization.neural_npfp_generator module
- class NeuralNPFP(model_name='aux', device='cpu', **kwargs)[source]
Bases:
MolecularFeaturizer
deepmol.compound_featurization.np_classifier_fp module
- class NPClassifierFP(radius: int = 2, **kwargs)[source]
Bases:
MolecularFeaturizer
deepmol.compound_featurization.one_hot_encoder module
- class SmilesOneHotEncoder(tokenizer: Tokenizer | None = None, max_length: int | None = None, n_jobs: int = -1)[source]
Bases:
Transformer
A class for one-hot encoding SMILES. The SmilesOneHotEncoder tokenizes SMILES strings and one-hot encodes them.
- Parameters:
tokenizer (Tokenizer) – The tokenizer to use to tokenize SMILES strings.
max_length (int) – The maximum length of the SMILES strings.
n_jobs (int) – The number of jobs to use for tokenization.
Examples
>>> from deepmol.compound_featurization import SmilesOneHotEncoder
>>> from deepmol.loaders import CSVLoader
>>> loader = CSVLoader('data_path.csv', smiles_field='Smiles', labels_fields=['Class'])
>>> dataset = loader.create_dataset(sep=";")
>>> ohe = SmilesOneHotEncoder().fit_transform(dataset)
- featurize(other_object, inplace=False, **kwargs)
Method that modifies an input object inplace or on a copy.
- Parameters:
self (object) – The class instance object.
other_object (object) – The object to apply the method to.
inplace (bool) – Whether to apply the method in place.
kwargs (dict) – Keyword arguments to pass to the method.
- Returns:
new_object – The new object.
- Return type:
object
- inverse_transform(matrix: ndarray) List[str] [source]
Inverse transforms a dataset.
- Parameters:
matrix (np.ndarray) – The one-hot encoded matrix.
- Returns:
smiles – The SMILES strings.
- Return type:
List[str]
- property shape: tuple
Returns the shape of the one-hot encoded matrix.
- Returns:
shape – The shape of the one-hot encoded matrix.
- Return type:
tuple
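The relationship between one-hot encoding, `inverse_transform` and `shape` can be illustrated with a small standalone sketch. It assumes character-level tokenization and reserves column 0 for padding; both choices are assumptions for illustration, and the real tokenizer may treat multi-character atoms differently.

```python
def one_hot_encode(smiles_list, max_length=None):
    """Return (matrix, vocab); column 0 of every row marks padding."""
    vocab = sorted({ch for s in smiles_list for ch in s})
    index = {ch: i + 1 for i, ch in enumerate(vocab)}
    max_length = max_length or max(len(s) for s in smiles_list)
    matrix = []
    for s in smiles_list:
        rows = []
        for pos in range(max_length):  # longer strings are truncated
            row = [0] * (len(vocab) + 1)
            row[index[s[pos]] if pos < len(s) else 0] = 1
            rows.append(row)
        matrix.append(rows)
    return matrix, vocab


def one_hot_decode(matrix, vocab):
    """Invert one_hot_encode, dropping padding positions."""
    smiles = []
    for rows in matrix:
        chars = []
        for row in rows:
            position = row.index(1)
            if position != 0:  # column 0 marks padding
                chars.append(vocab[position - 1])
        smiles.append("".join(chars))
    return smiles


encoded, vocab = one_hot_encode(["CCO", "CN"])
shape = (len(encoded), len(encoded[0]), len(encoded[0][0]))
decoded = one_hot_decode(encoded, vocab)
```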
deepmol.compound_featurization.rdkit_descriptors module
- class All3DDescriptors(mandatory_generation_of_conformers=True)[source]
Bases:
MolecularFeaturizer
Class to generate all three-dimensional descriptors.
- class Asphericity(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate molecular asphericity. Reference: A. Baumgaertner, “Shapes of flexible vesicles”, J. Chem. Phys. 98:7496 (1993), https://doi.org/10.1063/1.464689
- class AutoCorr3D(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
AutoCorr3D descriptors. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- class Eccentricity(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate molecular eccentricity. Reference: G. A. Arteca, “Molecular Shape Descriptors”, Reviews in Computational Chemistry vol 9, https://doi.org/10.1002/9780470125861.ch5
- class InertialShapeFactor(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate the inertial shape factor. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- class MORSE(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Molecule Representation of Structures based on Electron diffraction (MoRSE) descriptors. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- class NormalizedPrincipalMomentsRatios(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Normalized principal moments ratios. Reference: Sauer and Schwarz, JCIM 43:987-1003 (2003)
- class PlaneOfBestFit(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Plane of best fit. Reference: Nicholas C. Firth, Nathan Brown, and Julian Blagg, JCIM 52:2516-25
- class PrincipalMomentsOfInertia(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate Principal Moments of Inertia
- class RadialDistributionFunction(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Radial distribution function. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- class RadiusOfGyration(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate the radius of gyration. Reference: G. A. Arteca, “Molecular Shape Descriptors”, Reviews in Computational Chemistry vol 9, https://doi.org/10.1002/9780470125861.ch5
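For orientation, the radius of gyration of a conformer can be computed from atomic masses and 3D coordinates. The sketch below uses the textbook mass-weighted definition, Rg = sqrt(sum(m_i * |r_i - r_cm|^2) / sum(m_i)); whether RDKit applies exactly this weighting is an assumption, and `radius_of_gyration` is a hypothetical helper rather than the library call.

```python
import math


def radius_of_gyration(masses, coords):
    """Mass-weighted root-mean-square distance from the center of mass."""
    total_mass = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Two equal masses placed symmetrically around the origin.
rg = radius_of_gyration([1.0, 1.0], [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```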
- class SpherocityIndex(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
Calculate the molecular spherocity index. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- class ThreeDimensionDescriptor(mandatory_generation_of_conformers: bool, **kwargs)[source]
Bases:
MolecularFeaturizer, ABC
Class to generate three-dimensional descriptors.
- property descriptor_function
Get the descriptor function.
- class ThreeDimensionalMoleculeGenerator(n_conformations: int = 5, max_iterations: int = 5, threads: int = 1, timeout_per_molecule: int = 40)[source]
Bases:
object
Class to generate three-dimensional conformers and optimize them.
- static check_if_mol_has_explicit_hydrogens(new_mol: Mol)[source]
Method to check if a molecule has explicit hydrogens.
- Parameters:
new_mol (Mol) – Mol object from rdkit.
- Returns:
True if molecule has explicit hydrogens and False if not.
- Return type:
bool
- generate(dataset: Dataset, etkdg_version: int = 3, mode: str = 'MMFF94')[source]
Method to generate three-dimensional conformers for a dataset.
- Parameters:
dataset (Dataset) – Dataset whose molecules receive three-dimensional conformers.
etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm
mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).
- Returns:
mol – Mol object with optimized molecular geometry.
- Return type:
Mol
- generate_conformers(new_mol: Mol, etkdg_version: int = 3, **kwargs)[source]
Method to generate three-dimensional conformers.
- Parameters:
new_mol (Mol) – Mol object from rdkit
etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm
kwargs (dict) – Parameters for the ETKDG algorithm.
- Returns:
new_mol – Mol object with three-dimensional conformers.
- Return type:
Mol
- generate_structure(mol: Mol, etkdg_version: int = 3, mode: str = 'MMFF94')[source]
Method to generate three-dimensional conformers and optimize the molecular geometry.
- Parameters:
mol (Mol) – Mol object from rdkit.
etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm
mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).
- Returns:
mol – Mol object with optimized molecular geometry.
- Return type:
Mol
- optimize_molecular_geometry(mol: Mol, mode: str = 'MMFF94')[source]
Method to optimize the molecular geometry.
- Parameters:
mol (Mol) – Mol object from rdkit.
mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).
- Returns:
mol – Mol object with optimized molecular geometry.
- Return type:
Mol
- class TwoDimensionDescriptors(**kwargs)[source]
Bases:
MolecularFeaturizer
Class to generate two-dimensional descriptors. It generates all descriptors from the RDKit library.
- class WHIM(mandatory_generation_of_conformers=False)[source]
Bases:
ThreeDimensionDescriptor
WHIM descriptors vector. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37
- check_atoms_coordinates(mol)[source]
Function to check whether all atoms of a molecule have zero coordinates. Such a molecule must be eliminated.
Example
# Assumption: pt refers to RDKit's PandasTools module
from rdkit.Chem import PandasTools as pt
# Load the test set into a data frame
sdf = 'miniset.sdf'
df = pt.LoadSDF(sdf, molColName='mol3DProt')
# Check whether each molecule contains only zero coordinates,
# then remove those molecules from the dataset
df['check_coordinates'] = [check_atoms_coordinates(x) for x in df.mol3DProt]
df_eliminated_mols = df[df.check_coordinates == False]
df = df[df.check_coordinates == True]
df.drop(columns=['check_coordinates'], inplace=True)
print('final minitest set:', df.shape[0])
print('minitest eliminated:', df_eliminated_mols.shape[0])
- Parameters:
mol (Mol) – Molecule to check coordinates.
- Returns:
True if molecule is OK and False if molecule contains zero coordinates.
- Return type:
bool
- generate_conformers(generator: ThreeDimensionalMoleculeGenerator, new_mol: Mol | str, etkg_version: int = 1, optimization_mode: str = 'MMFF94')[source]
Method to generate three-dimensional conformers and optimize them.
- Parameters:
generator (ThreeDimensionalMoleculeGenerator) – Class to generate three-dimensional conformers and optimize them.
new_mol (Union[Mol, str]) – Mol object from rdkit or SMILES string to generate conformers and optimize them.
etkg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm.
optimization_mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).
- Returns:
new_mol – Mol object with three-dimensional conformers and optimized molecular geometry.
- Return type:
Mol
- generate_conformers_to_sdf_file(dataset: Dataset, file_path: str, n_conformations: int = 20, max_iterations: int = 5, threads: int = 1, timeout_per_molecule: int = 12, etkg_version: int = 1, optimization_mode: str = 'MMFF94')[source]
Generate conformers using the experimental-torsion-knowledge distance geometry (ETKDG) algorithm from RDKit, optimize them and save in an SDF file.
- Parameters:
dataset (Dataset) – DeepMol Dataset object
file_path (str) – file_path where the conformers will be saved.
n_conformations (int) – The number of conformations per molecule.
max_iterations (int) – Maximum number of iterations for the molecule’s conformers optimization.
threads (int) – Number of threads.
timeout_per_molecule (int) – The number of seconds in which the conformers are to be generated.
etkg_version (int) – Version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm.
optimization_mode (str) – Mode for the molecular geometry optimization (MMFF or UFF).
deepmol.compound_featurization.rdkit_fingerprints module
- class AtomPairFingerprint(nBits: int = 2048, minLength: int = 1, maxLength: int = 30, nBitsPerEntry: int = 4, includeChirality: bool = False, use2D: bool = True, confId: int = -1, **kwargs)[source]
Bases:
MolecularFeaturizer
Atom pair fingerprints
Returns the atom-pair fingerprint for a molecule as an ExplicitBitVect.
- class AtomPairFingerprintCallbackHash(nBits: int = 2048, minLength: int = 1, maxLength: int = 30, includeChirality: bool = False, use2D: bool = True, confId: int = -1, **kwargs)[source]
Bases:
MolecularFeaturizer
Atom pair fingerprints
Returns the atom-pair fingerprint for a molecule as an ExplicitBitVect.
- class LayeredFingerprint(layerFlags: int = 4294967295, minPath: int = 1, maxPath: int = 7, fpSize: int = 2048, atomCounts: list | None = None, branchedPaths: bool = True, **kwargs)[source]
Bases:
MolecularFeaturizer
Calculate layered fingerprint for a single molecule.
- Layer definitions:
0x01: pure topology
0x02: bond order
0x04: atom types
0x08: presence of rings
0x10: ring sizes
0x20: aromaticity
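Because the layer definitions are single bits, a `layerFlags` value is a bitmask: layers are combined with bitwise OR and tested with bitwise AND. The sketch below only decodes such a mask; `LAYERS` and `active_layers` are hypothetical helpers, not part of RDKit or DeepMol.

```python
LAYERS = {
    0x01: "pure topology",
    0x02: "bond order",
    0x04: "atom types",
    0x08: "presence of rings",
    0x10: "ring sizes",
    0x20: "aromaticity",
}


def active_layers(layer_flags):
    """List the layer names whose bit is set in layer_flags."""
    return [name for bit, name in LAYERS.items() if layer_flags & bit]


# Combine two layers with bitwise OR.
topology_and_rings = 0x01 | 0x08
selected = active_layers(topology_and_rings)

# The documented default, 4294967295, equals 0xFFFFFFFF and sets every bit.
default_layers = active_layers(4294967295)
```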
- class MACCSkeysFingerprint(**kwargs)[source]
Bases:
MolecularFeaturizer
MACCS Keys. SMARTS-based implementation of the 166 public MACCS keys.
- draw_bit(mol: Mol, bit_index: int, file_path: str | None = None) Image [source]
Draw a molecule with a MACCS key highlighted.
- Parameters:
mol (Mol) – Molecule to draw.
bit_index (int) – Index of the MACCS key to highlight.
file_path (str) – Path to save the image to. If None, the image is not saved.
- Returns:
im – Image of the molecule with the MACCS key highlighted.
- Return type:
PIL.Image.Image
- class MorganFingerprint(radius: int = 2, size: int = 2048, chiral: bool = False, bonds: bool = True, features: bool = False, **kwargs)[source]
Bases:
MolecularFeaturizer
Morgan fingerprints. Extended Connectivity Circular Fingerprints compute a bag-of-words style representation of a molecule by breaking it into local neighborhoods and hashing into a bit vector of the specified size.
- draw_bit(mol: Mol, bit: int, molSize: Tuple[int, int] = (450, 200), file_path: str | None = None) str [source]
Draw a molecule with a Morgan fingerprint bit highlighted.
- Parameters:
mol (Mol) – Molecule to draw.
bit (int) – Bit to highlight.
molSize (Tuple[int, int]) – Size of the molecule.
file_path (str) – Path to save the image.
- Returns:
The molecule in SVG format.
- Return type:
str
- draw_bits(mol: Mol, bit_indexes: int | str | List[int], file_path: str | None = None) str [source]
Draw a molecule with one or more Morgan fingerprint bits highlighted.
- Parameters:
mol (Mol) – Molecule to draw.
bit_indexes (Union[int, str, List[int]]) – Bit to highlight. If int, only one bit is highlighted. If list, all the bits in the list are highlighted. If ‘ON’, all the bits ON are highlighted.
file_path (str) – Path to save the image.
- Return type:
str
- class RDKFingerprint(minPath: int = 1, maxPath: int = 7, fpSize: int = 2048, nBitsPerHash: int = 2, useHs: bool = True, tgtDensity: float = 0.0, minSize: int = 128, branchedPaths: bool = True, useBondOrder: bool = True, **kwargs)[source]
Bases:
MolecularFeaturizer
RDKit topological fingerprints
This algorithm functions by finding all subgraphs between minPath and maxPath in length. For each subgraph:
A hash is calculated.
The hash is used to seed a random-number generator.
nBitsPerHash random numbers are generated and used to set the corresponding bits in the fingerprint.
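The hash, seed and set-bits steps listed above can be mimicked with Python's `random` module. This is a toy sketch under stated assumptions: subgraph enumeration is omitted, the subgraph hashes are given as plain integers, and RDKit's own hash function and random-number generator are not reproduced, so the bit positions will not match real RDKit fingerprints.

```python
import random


def bits_for_subgraph(subgraph_hash, fp_size=2048, n_bits_per_hash=2):
    """Seed a generator with the hash and draw n_bits_per_hash positions."""
    rng = random.Random(subgraph_hash)
    return {rng.randrange(fp_size) for _ in range(n_bits_per_hash)}


def toy_fingerprint(subgraph_hashes, fp_size=2048, n_bits_per_hash=2):
    """Set the drawn positions for every subgraph hash in a bit list."""
    bits = [0] * fp_size
    for subgraph_hash in subgraph_hashes:
        for position in bits_for_subgraph(subgraph_hash, fp_size, n_bits_per_hash):
            bits[position] = 1
    return bits


# Hypothetical hashes for three subgraphs of one molecule.
fingerprint = toy_fingerprint([11, 22, 33], fp_size=64, n_bits_per_hash=2)
```

Seeding with the hash makes the result deterministic: the same subgraph always sets the same bits, which is what allows fingerprints of different molecules to be compared.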
- draw_bit(mol: Mol, bit: int, folder_path: str | None = None, molSize: Tuple[int, int] = (450, 200))[source]
Draw a molecule with an RDK fingerprint bit highlighted.
- Parameters:
mol (Mol) – Molecule to draw.
bit (int) – Bit to highlight.
folder_path (str) – Path for the folder to save images.
molSize (Tuple[int, int]) – Size of the molecule.
- Returns:
The molecule with the fingerprint bit highlighted.
- Return type:
Images
- draw_bits(mol: Mol, bits: int | str | List[int], file_path: str | None = None) str [source]
Draw a molecule with one or more RDK fingerprint bits highlighted.
- Parameters:
mol (Mol) – Molecule to draw.
bits (Union[int, str, List[int]]) – Bit to highlight. If int, the bit to highlight. If str, the name of the bit to highlight. If ‘ON’, all the bits ON are highlighted.
file_path (str) – Path to save the image. If None, the image is not saved.
- Returns:
The molecule with the fingerprint bits.
- Return type:
str
deepmol.compound_featurization.similarity_matrix module
- class TanimotoSimilarityMatrix(n_molecules: int | None = None, n_jobs: int = -1)[source]
Bases:
Transformer
Class to calculate Tanimoto similarity matrix for a dataset.
The similarity matrix is calculated using Morgan fingerprints.
- featurize(other_object, inplace=False, **kwargs)
Method that modifies an input object inplace or on a copy.
- Parameters:
self (object) – The class instance object.
other_object (object) – The object to apply the method to.
inplace (bool) – Whether to apply the method in place.
kwargs (dict) – Keyword arguments to pass to the method.
- Returns:
new_object – The new object.
- Return type:
object
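The Tanimoto similarity underlying `TanimotoSimilarityMatrix` is the size of the intersection of two fingerprints' ON bits divided by the size of their union. The sketch below works on sets of bit indices instead of real Morgan fingerprints; the helper names are hypothetical, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two collections of ON-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """Pairwise Tanimoto similarities as a square list of lists."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Toy fingerprints given as sets of ON-bit indices.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {5}]
matrix = similarity_matrix(toy_fps)
```

The resulting matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be computed for large datasets.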