deepmol.compound_featurization package

Subpackages

Submodules

deepmol.compound_featurization.base_featurizer module

class MolecularFeaturizer(n_jobs: int = -1)[source]

Bases: ABC, Transformer

Abstract class for calculating a set of features for a molecule. A MolecularFeaturizer uses SMILES strings or RDKit molecule objects to represent molecules.

Subclasses need to implement the _featurize method for calculating features for a single molecule.
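The subclassing pattern described above can be illustrated with a minimal, self-contained sketch. It does not use DeepMol: `SketchFeaturizer`, `featurize_all` and `CarbonCounter` are hypothetical names introduced here, and SMILES strings are treated as plain text rather than parsed molecules.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchFeaturizer(ABC):
    """Hypothetical stand-in for an abstract featurizer (not DeepMol's class)."""

    @abstractmethod
    def _featurize(self, smiles: str) -> List[float]:
        """Compute the features of a single molecule."""

    def featurize_all(self, smiles_list: List[str]) -> List[List[float]]:
        # The shared loop lives in the base class; subclasses only
        # supply the per-molecule logic.
        return [self._featurize(s) for s in smiles_list]


class CarbonCounter(SketchFeaturizer):
    """Toy subclass: counts the characters 'C' and 'c' in the SMILES text."""

    def _featurize(self, smiles: str) -> List[float]:
        return [float(smiles.count("C") + smiles.count("c"))]


features = CarbonCounter().featurize_all(["CCO", "c1ccccc1"])
```

Keeping the per-molecule logic in one overridable method is what allows the base class to handle iteration (and, in a real implementation, parallelism) in a single place.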

featurize(other_object, inplace=False, **kwargs)

Method that modifies an input object inplace or on a copy.

Parameters:
  • self (object) – The class instance object.

  • other_object (object) – The object to apply the method to.

  • inplace (bool) – Whether to apply the method in place.

  • kwargs (dict) – Keyword arguments to pass to the method.

Returns:

new_object – The new object.

Return type:

object

deepmol.compound_featurization.deepchem_featurizers module

class ConvMolFeat(master_atom: bool = False, use_chirality: bool = False, atom_properties: bool = True, per_atom_fragmentation: bool = False, **kwargs)[source]

Bases: MolecularFeaturizer

Duvenaud graph convolution, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#convmolfeaturizer). The featurizer computes a vector of local descriptors for each atom in a molecule.

References: Duvenaud, David K., et al. “Convolutional networks on graphs for learning molecular fingerprints.” Advances in neural information processing systems. 2015.

get_atom_features(mol)[source]

class CoulombEigFeat(max_atoms: int, remove_hydrogens: bool = False, randomize: bool = False, n_samples: int = 1, max_conformers: int = 1, seed: int | None = None, generate_conformers=True, **kwargs)[source]

Bases: MolecularFeaturizer

Calculate the eigenvalues of Coulomb matrices for molecules. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#coulombmatrixeig).

References: Montavon, Grégoire, et al. “Learning invariant representations of molecules for atomization energy prediction.” Advances in neural information processing systems. 2012.

class CoulombFeat(max_atoms: int, remove_hydrogens: bool = False, randomize: bool = False, upper_tri: bool = False, n_samples: int = 1, max_conformers: int = 1, seed: int | None = None, generate_conformers: bool = True, **kwargs)[source]

Bases: MolecularFeaturizer

Calculate Coulomb matrices for molecules. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#coulombmatrix).

References: Montavon, Grégoire, et al. “Learning invariant representations of molecules for atomization energy prediction.” Advances in neural information processing systems. 2012.
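As a rough illustration of what a Coulomb matrix contains, the sketch below uses the commonly cited definition: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|, where Z is the nuclear charge and R the atomic position. The function name and the toy coordinates are assumptions made here; padding to `max_atoms`, randomization and conformer handling are not shown.

```python
import math
from typing import List, Sequence, Tuple


def coulomb_matrix(charges: Sequence[float],
                   coords: Sequence[Tuple[float, float, float]]) -> List[List[float]]:
    """Build a Coulomb matrix from nuclear charges and 3D coordinates."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: pairwise Coulomb repulsion.
                matrix[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return matrix


# Two hydrogen-like atoms one distance unit apart (toy geometry).
h2 = coulomb_matrix([1.0, 1.0], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

The eigenvalue variant would additionally diagonalize this symmetric matrix, which makes the representation independent of atom ordering.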

class DMPNNFeat(features_generators: List[str] | None = None, is_adding_hs: bool = False, use_original_atom_ranks: bool = False, **kwargs)[source]

Bases: MolecularFeaturizer

Featurizes molecules using DeepChem DMPNNFeaturizer.

This class is a featurizer for the Directed Message Passing Neural Network (D-MPNN) implementation.

The default node (atom) and edge (bond) representations are based on the paper “Analyzing Learned Molecular Representations for Property Prediction”.

Reference: https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#dmpnnfeaturizer

class DagTransformer(max_atoms: int = 50)[source]

Bases: Transformer

Transforms ConvMol adjacency lists into DAG calculation orders.

This transformer is used by DAGModel before training to transform its inputs to the correct shape. This expansion turns a molecule with n atoms into n DAGs, each with root at a different atom in the molecule.

Reference: https://deepchem.readthedocs.io/en/latest/api_reference/transformers.html#dagtransformer

class MATFeat(**kwargs)[source]

Bases: MolecularFeaturizer

Featurizes molecules using DeepChem MATFeaturizer.

This class is a featurizer for the Molecule Attention Transformer (MAT) implementation. The returned value is a numpy array that consists of molecular graph descriptions:

  • Node Features

  • Adjacency Matrix

  • Distance Matrix

References

[1] https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#matfeaturizer

[2] Lukasz Maziarka et al. “Molecule Attention Transformer”, https://arxiv.org/abs/2002.08264

class MolGanFeat(max_atom_count: int = 9, kekulize: bool = True, bond_labels: List[Any] | None = None, atom_labels: List[int] | None = None, **kwargs)[source]

Bases: MolecularFeaturizer

Featurizer for the MolGAN de novo molecular generation model, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html?highlight=CGCNN#molganfeaturizer). It is a wrapper for two matrices containing atom and bond type information.

References: Nicola De Cao et al. “MolGAN: An implicit generative model for small molecular graphs” (2018), https://arxiv.org/abs/1805.11973

class MolGraphConvFeat(use_edges: bool = False, use_chirality: bool = False, use_partial_charge: bool = False, **kwargs)[source]

Bases: MolecularFeaturizer

Featurizer of general graph convolution networks for molecules. Adapted from deepchem: (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#molgraphconvfeaturizer)

References: Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of computer-aided molecular design 30.8 (2016):595-608.

class PagtnMolGraphFeat(max_length: int = 5, **kwargs)[source]

Bases: MolecularFeaturizer

This class is a featurizer of PAGTN graph networks for molecules.

The featurization is based on the PAGTN model. It is slightly more computationally intensive than the default graph convolution featurizer, but it builds a molecular graph that connects all atom pairs, accounting for the interactions of each atom with every other atom in the molecule. According to the paper, the interaction between a pair of atoms depends on the relative distance between them, so the featurizer needs to calculate the shortest path between them.

References

[1] Chen, Barzilay, Jaakkola “Path-Augmented Graph Transformer Network” 10.26434/chemrxiv.8214422.
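The shortest-path computation mentioned above can be sketched with a breadth-first search over the bond graph. This is only an illustration under stated assumptions: the adjacency dictionary and the function name are hypothetical, and how the featurizer actually uses `max_length` is not described in this reference.

```python
from collections import deque
from typing import Dict, List


def shortest_path_lengths(adjacency: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Return the number of bonds on the shortest path from source to each atom."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distances:
                # First visit in BFS order is the shortest path.
                distances[neighbor] = distances[atom] + 1
                queue.append(neighbor)
    return distances


# Chain 0-1-2 with a branch 1-3 (hypothetical atom indices).
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
lengths = shortest_path_lengths(graph, 0)
```

Running the search once per atom gives the all-pairs distances, which is one reason this kind of featurization costs more than a neighbor-only graph convolution.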

class RawFeat(n_jobs: int = -1)[source]

Bases: MolecularFeaturizer

class SmileImageFeat(img_size: int = 80, res: float = 0.5, max_len: int = 250, img_spec: str = 'std', **kwargs)[source]

Bases: MolecularFeaturizer

Converts a SMILES string to an image. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#smilestoimage).

References: Goh, Garrett B., et al. “Using rule-based labels for weak supervised learning: a ChemNet for transferable chemical property prediction.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.

class SmilesSeqFeat(char_to_idx: Dict[str, int] | None = None, max_len: int = 250, pad_len: int = 10)[source]

Bases: Transformer

Takes SMILES strings and turns them into integer sequences. Adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#smilestoseq).

References: Goh, Garrett B., et al. “Using rule-based labels for weak supervised learning: a ChemNet for transferable chemical property prediction.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
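A minimal sketch of the idea, assuming a character-level vocabulary: each character is looked up in a `char_to_idx` mapping and the result is padded to `max_len`. The `PAD` and `UNK` tokens and the function name are hypothetical, and the `pad_len` parameter of the class is not modelled here.

```python
from typing import Dict, List

PAD = "<pad>"  # hypothetical padding token
UNK = "<unk>"  # hypothetical token for characters missing from the vocabulary


def smiles_to_seq(smiles: str, char_to_idx: Dict[str, int], max_len: int) -> List[int]:
    """Map each character to its index and right-pad the sequence to max_len."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    tokens = list(smiles) + [PAD] * (max_len - len(smiles))
    return [char_to_idx.get(tok, char_to_idx[UNK]) for tok in tokens]


vocab = {PAD: 0, UNK: 1, "C": 2, "O": 3, "=": 4}
seq = smiles_to_seq("CC=O", vocab, max_len=6)
```

Fixed-length integer sequences like this are what sequence models typically expect as input to an embedding layer.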

featurize(other_object, inplace=False, **kwargs)

Method that modifies an input object inplace or on a copy.

Parameters:
  • self (object) – The class instance object.

  • other_object (object) – The object to apply the method to.

  • inplace (bool) – Whether to apply the method in place.

  • kwargs (dict) – Keyword arguments to pass to the method.

Returns:

new_object – The new object.

Return type:

object

class WeaveFeat(graph_distance: bool = True, explicit_h: bool = False, use_chirality: bool = False, max_pair_distance: int | None = None, **kwargs)[source]

Bases: MolecularFeaturizer

Weave convolution featurization, adapted from deepchem (https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#weavefeaturizer). Requires a quadratic matrix of interaction descriptors for each pair of atoms.

References: Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of computer-aided molecular design 30.8 (2016): 595-608.

deepmol.compound_featurization.mhfp module

class MHFP(**kwargs)[source]

Bases: MolecularFeaturizer

MHFP featurizer class. This module contains the MHFP encoder, which is used to encode SMILES and RDKit molecule instances as MHFP fingerprints.

deepmol.compound_featurization.mixed_descriptors module

class MixedFeaturizer(featurizers: Iterable[MolecularFeaturizer], **kwargs)[source]

Bases: MolecularFeaturizer

Class to apply multiple featurizers to the same molecules. The features produced by the different featurizers are concatenated.
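The concatenation can be pictured with the following sketch, which uses plain functions in place of featurizer objects. All names here are hypothetical, and the toy features operate on the SMILES text rather than on parsed molecules.

```python
from typing import Callable, List, Sequence

FeatureFn = Callable[[str], List[float]]


def mixed_features(smiles: str, featurizers: Sequence[FeatureFn]) -> List[float]:
    """Apply each featurizer in order and join the results into one vector."""
    combined: List[float] = []
    for featurize in featurizers:
        combined.extend(featurize(smiles))
    return combined


def length_feature(smiles: str) -> List[float]:
    # Toy feature: length of the SMILES text.
    return [float(len(smiles))]


def halogen_flags(smiles: str) -> List[float]:
    # Toy features: presence of chlorine and bromine symbols.
    return [float("Cl" in smiles), float("Br" in smiles)]


vector = mixed_features("CCCl", [length_feature, halogen_flags])
```

The order of the featurizers determines the column layout of the combined vector, so it should stay fixed between training and prediction.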

deepmol.compound_featurization.mol2vec module

class Mol2Vec(pretrain_model_path: str | None = None, radius: int = 1, unseen: str = 'UNK', gather_method: str = 'sum', **kwargs)[source]

Bases: MolecularFeaturizer

Mol2Vec fingerprint implementation from https://doi.org/10.1021/acs.jcim.7b00616

Mol2vec is an unsupervised machine learning approach, inspired by natural language processing techniques, that learns vector representations of molecular substructures. These vectors point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of their individual substructures and fed, for instance, into supervised machine learning approaches to predict compound properties.

sentences2vec(sentences: Iterable, model: Word2Vec, unseen: str | None = None)[source]

Generate vectors for each sentence (list) in a list of sentences. Vector is simply a sum of vectors for individual words.

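Parameters:
  • sentences (Iterable) – The sentences, each given as a list of words.

  • model (Word2Vec) – The Word2Vec model that provides the word vectors.

  • unseen (str | None) – Keyword used for unseen words.

Returns: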

Array of vectors for each sentence.

Return type:

np.array
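The summation can be sketched as follows. A plain dictionary stands in for the Word2Vec model, and the handling of unknown words (use the `unseen` vector if a keyword is given, otherwise skip the word) is an assumption made for this illustration rather than documented behaviour.

```python
from typing import Dict, Iterable, List, Optional


def sum_sentence_vectors(sentences: Iterable[List[str]],
                         vectors: Dict[str, List[float]],
                         unseen: Optional[str] = None) -> List[List[float]]:
    """Return one vector per sentence: the sum of its word vectors."""
    dim = len(next(iter(vectors.values())))
    result = []
    for sentence in sentences:
        total = [0.0] * dim
        for word in sentence:
            if word in vectors:
                vec = vectors[word]
            elif unseen is not None:
                vec = vectors[unseen]  # fall back to the "unseen" vector
            else:
                continue  # assumption: unknown words are skipped
            total = [t + v for t, v in zip(total, vec)]
        result.append(total)
    return result


toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 2.0], "UNK": [0.5, 0.5]}
summed = sum_sentence_vectors([["a", "b"], ["a", "zzz"]], toy_vectors, unseen="UNK")
```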

deepmol.compound_featurization.nc_mfp_generator module

class NcMfp(database_file_path='/home/docs/checkouts/readthedocs.org/user_builds/deepmol/checkouts/latest/src/deepmol/compound_featurization/nc_mfp/databases/ncdb', **kwargs)[source]

Bases: MolecularFeaturizer

static generate_new_fingerprint_database(data: SmilesDataset, output_folder: str)[source]

Generate a new fingerprint database from the given dataset.

Parameters:
  • data (SmilesDataset) – A dataset of SMILES strings.

  • output_folder (str) – The output folder to save the database.

Return type:

None

deepmol.compound_featurization.neural_npfp_generator module

class NeuralNPFP(model_name='aux', device='cpu', **kwargs)[source]

Bases: MolecularFeaturizer

deepmol.compound_featurization.np_classifier_fp module

class NPClassifierFP(radius: int = 2, **kwargs)[source]

Bases: MolecularFeaturizer

deepmol.compound_featurization.one_hot_encoder module

class SmilesOneHotEncoder(tokenizer: Tokenizer | None = None, max_length: int | None = None, n_jobs: int = -1)[source]

Bases: Transformer

A class for one-hot encoding SMILES. The SmilesOneHotEncoder tokenizes SMILES strings and one-hot encodes them.

Parameters:
  • tokenizer (Tokenizer) – The tokenizer to use to tokenize SMILES strings.

  • max_length (int) – The maximum length of the SMILES strings.

  • n_jobs (int) – The number of jobs to use for tokenization.

Examples

>>> from deepmol.compound_featurization import SmilesOneHotEncoder
>>> from deepmol.loaders import CSVLoader
>>> loader = CSVLoader('data_path.csv', smiles_field='Smiles', labels_fields=['Class'])
>>> dataset = loader.create_dataset(sep=";")
>>> ohe = SmilesOneHotEncoder().fit_transform(dataset)

featurize(other_object, inplace=False, **kwargs)

Method that modifies an input object inplace or on a copy.

Parameters:
  • self (object) – The class instance object.

  • other_object (object) – The object to apply the method to.

  • inplace (bool) – Whether to apply the method in place.

  • kwargs (dict) – Keyword arguments to pass to the method.

Returns:

new_object – The new object.

Return type:

object

inverse_transform(matrix: ndarray) → List[str][source]

Converts a one-hot encoded matrix back into SMILES strings.

Parameters:

matrix (np.ndarray) – The one-hot encoded matrix.

Returns:

smiles – The SMILES strings.

Return type:

List[str]

property shape: tuple

Returns the shape of the one-hot encoded matrix.

Returns:

shape – The shape of the one-hot encoded matrix.

Return type:

tuple
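To make the encode and decode round trip concrete, here is a small sketch that assumes character-level tokenization and zero rows as padding. A real tokenizer would treat multi-character atoms such as "Cl" as single tokens; all function names below are hypothetical.

```python
from typing import List


def build_vocab(smiles_list: List[str]) -> List[str]:
    """Collect the distinct characters in a stable, sorted order."""
    return sorted({ch for smi in smiles_list for ch in smi})


def one_hot_encode(smiles: str, vocab: List[str], max_length: int) -> List[List[int]]:
    """Return a max_length x len(vocab) matrix; unused rows stay all-zero."""
    matrix = [[0] * len(vocab) for _ in range(max_length)]
    for pos, ch in enumerate(smiles[:max_length]):
        matrix[pos][vocab.index(ch)] = 1
    return matrix


def one_hot_decode(matrix: List[List[int]], vocab: List[str]) -> str:
    """Invert the encoding, skipping all-zero padding rows."""
    return "".join(vocab[row.index(1)] for row in matrix if 1 in row)


vocab = build_vocab(["CCO", "C=O"])
encoded = one_hot_encode("C=O", vocab, max_length=4)
decoded = one_hot_decode(encoded, vocab)
```

In this sketch the shape of the encoded matrix for one molecule is (max_length, vocabulary size).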

deepmol.compound_featurization.rdkit_descriptors module

class All3DDescriptors(mandatory_generation_of_conformers=True)[source]

Bases: MolecularFeaturizer

Class to generate all three-dimensional descriptors.

class Asphericity(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate molecular asphericity. Reference: A. Baumgaertner, “Shapes of flexible vesicles”, J. Chem. Phys. 98:7496 (1993), https://doi.org/10.1063/1.464689

class AutoCorr3D(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

AutoCorr3D descriptors. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

class Eccentricity(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate molecular eccentricity. Reference: G. A. Arteca, “Molecular Shape Descriptors”, Reviews in Computational Chemistry vol 9, https://doi.org/10.1002/9780470125861.ch5

class InertialShapeFactor(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate the inertial shape factor. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

class MORSE(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Molecule Representation of Structures based on Electron diffraction (MoRSE) descriptors. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

class NormalizedPrincipalMomentsRatios(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Normalized principal moments ratios. Sauer and Schwarz JCIM 43:987-1003 (2003)
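For orientation, the normalized ratios are conventionally defined from the sorted principal moments I1 ≤ I2 ≤ I3 as NPR1 = I1 / I3 and NPR2 = I2 / I3. The sketch below takes the three moments as given; computing them from atomic coordinates is not shown, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def normalized_pmi_ratios(moments: Sequence[float]) -> Tuple[float, float]:
    """Return (NPR1, NPR2) from three principal moments of inertia."""
    i1, i2, i3 = sorted(moments)
    if i3 == 0:
        raise ValueError("largest principal moment must be non-zero")
    return i1 / i3, i2 / i3


# Idealized shapes mark the corners of the usual NPR triangle plot.
rod = normalized_pmi_ratios([0.0, 5.0, 5.0])
disc = normalized_pmi_ratios([2.0, 2.0, 4.0])
sphere = normalized_pmi_ratios([3.0, 3.0, 3.0])
```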

class PlaneOfBestFit(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Plane of best fit. Reference: Nicholas C. Firth, Nathan Brown, and Julian Blagg, JCIM 52:2516-25

class PrincipalMomentsOfInertia(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate Principal Moments of Inertia

class RadialDistributionFunction(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Radial distribution function descriptors. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

class RadiusOfGyration(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate the radius of gyration. Reference: G. A. Arteca, “Molecular Shape Descriptors”, Reviews in Computational Chemistry vol 9, https://doi.org/10.1002/9780470125861.ch5
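A sketch of the textbook mass-weighted definition, Rg = sqrt(sum_i m_i |r_i - r_cm|^2 / sum_i m_i), is given below. It works on plain lists of masses and coordinates; the exact weighting used by the underlying RDKit implementation may differ, and the function name is an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(masses: Sequence[float], coords: Sequence[Point]) -> float:
    """Mass-weighted root-mean-square distance from the center of mass."""
    total = sum(masses)
    # Center of mass, one coordinate axis at a time.
    center = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )
    weighted = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total)


# Two equal masses at x = -1 and x = +1 (toy geometry).
rg = radius_of_gyration([1.0, 1.0], [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```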

class SpherocityIndex(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

Calculate the molecular spherocity index. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

class ThreeDimensionDescriptor(mandatory_generation_of_conformers: bool, **kwargs)[source]

Bases: MolecularFeaturizer, ABC

Class to generate three-dimensional descriptors.

property descriptor_function

Get the descriptor function.

generate_descriptor(mol)[source]

Generate the descriptors.

Parameters:

mol (Mol) – Mol object from rdkit.

Returns:

descriptors – Array with the descriptors.

Return type:

np.ndarray

class ThreeDimensionalMoleculeGenerator(n_conformations: int = 5, max_iterations: int = 5, threads: int = 1, timeout_per_molecule: int = 40)[source]

Bases: object

Class to generate three-dimensional conformers and optimize them.

static check_if_mol_has_explicit_hydrogens(new_mol: Mol)[source]

Method to check if a molecule has explicit hydrogens.

Parameters:

new_mol (Mol) – Mol object from rdkit.

Returns:

True if molecule has explicit hydrogens and False if not.

Return type:

bool

generate(dataset: Dataset, etkdg_version: int = 3, mode: str = 'MMFF94')[source]

Method to generate three-dimensional conformers for a dataset.

Parameters:
  • dataset (Dataset) – DeepMol Dataset object.

  • etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm

  • mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).

Returns:

mol – Mol object with optimized molecular geometry.

Return type:

Mol

generate_conformers(new_mol: Mol, etkdg_version: int = 3, **kwargs)[source]

Method to generate three-dimensional conformers.

Parameters:
  • new_mol (Mol) – Mol object from rdkit

  • etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm

  • kwargs (dict) – Parameters for the ETKDG algorithm.

Returns:

new_mol – Mol object with three-dimensional conformers.

Return type:

Mol

generate_structure(mol: Mol, etkdg_version: int = 3, mode: str = 'MMFF94')[source]

Method to generate three-dimensional conformers and optimize their geometry.

Parameters:
  • mol (Mol) – Mol object from rdkit.

  • etkdg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm

  • mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).

Returns:

mol – Mol object with optimized molecular geometry.

Return type:

Mol

optimize_molecular_geometry(mol: Mol, mode: str = 'MMFF94')[source]

Method to optimize the molecular geometry of a molecule's conformers.

Parameters:
  • mol (Mol) – Mol object from rdkit.

  • mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).

Returns:

mol – Mol object with optimized molecular geometry.

Return type:

Mol

class TwoDimensionDescriptors(**kwargs)[source]

Bases: MolecularFeaturizer

Class to generate two-dimensional descriptors. It generates all descriptors from the RDKit library.

class WHIM(mandatory_generation_of_conformers=False)[source]

Bases: ThreeDimensionDescriptor

WHIM descriptors vector. Reference: Todeschini and Consonni, “Descriptors from Molecular Geometry”, Handbook of Chemoinformatics, https://doi.org/10.1002/9783527618279.ch37

check_atoms_coordinates(mol)[source]

Function to check whether all atoms of a molecule have zero coordinates. If so, the molecule must be eliminated.

Example

>>> from rdkit.Chem import PandasTools as pt
>>> from deepmol.compound_featurization.rdkit_descriptors import check_atoms_coordinates
>>> # Load the test set into a data frame
>>> sdf = 'miniset.sdf'
>>> df = pt.LoadSDF(sdf, molColName='mol3DProt')
>>> # Flag molecules that contain only zero coordinates, then remove them from the dataset
>>> df['check_coordinates'] = [check_atoms_coordinates(x) for x in df.mol3DProt]
>>> df_eliminated_mols = df[df.check_coordinates == False]
>>> df = df[df.check_coordinates == True]
>>> df.drop(columns=['check_coordinates'], inplace=True)
>>> print('final minitest set:', df.shape[0])
>>> print('minitest eliminated:', df_eliminated_mols.shape[0])

Parameters:

mol (Mol) – Molecule to check coordinates.

Returns:

True if molecule is OK and False if molecule contains zero coordinates.

Return type:

bool

generate_conformers(generator: ThreeDimensionalMoleculeGenerator, new_mol: Mol | str, etkg_version: int = 1, optimization_mode: str = 'MMFF94')[source]

Method to generate three-dimensional conformers and optimize them.

Parameters:
  • generator (ThreeDimensionalMoleculeGenerator) – Class to generate three-dimensional conformers and optimize them.

  • new_mol (Union[Mol, str]) – Mol object from rdkit or SMILES string to generate conformers and optimize them.

  • etkg_version (int) – version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm.

  • optimization_mode (str) – mode for the molecular geometry optimization (MMFF or UFF variants).

Returns:

new_mol – Mol object with three-dimensional conformers and optimized molecular geometry.

Return type:

Mol

generate_conformers_to_sdf_file(dataset: Dataset, file_path: str, n_conformations: int = 20, max_iterations: int = 5, threads: int = 1, timeout_per_molecule: int = 12, etkg_version: int = 1, optimization_mode: str = 'MMFF94')[source]

Generate conformers using the experimental-torsion-knowledge distance geometry (ETKDG) algorithm from RDKit, optimize them and save in an SDF file.

Parameters:
  • dataset (Dataset) – DeepMol Dataset object

  • file_path (str) – file_path where the conformers will be saved.

  • n_conformations (int) – The number of conformations per molecule.

  • max_iterations (int) – Maximum number of iterations for the molecule’s conformers optimization.

  • threads (int) – Number of threads.

  • timeout_per_molecule (int) – The number of seconds in which the conformers are to be generated.

  • etkg_version (int) – Version of the experimental-torsion-knowledge distance geometry (ETKDG) algorithm.

  • optimization_mode (str) – Mode for the molecular geometry optimization (MMFF or UFF).

get_all_3D_descriptors(mol)[source]

Function that collects all the 3D descriptor classes and uses them to featurize a molecule.

Parameters:

mol (Mol) – Mol object from rdkit.

Returns:

all_descriptors – List with all the 3D descriptors.

Return type:

list

get_all_3D_descriptors_feature_names() → List[str][source]

Method that lists all 3D featurizers feature names.

Returns:

feature_names – List with all the 3D descriptors feature names.

Return type:

List[str]

deepmol.compound_featurization.rdkit_fingerprints module

class AtomPairFingerprint(nBits: int = 2048, minLength: int = 1, maxLength: int = 30, nBitsPerEntry: int = 4, includeChirality: bool = False, use2D: bool = True, confId: int = -1, **kwargs)[source]

Bases: MolecularFeaturizer

Atom pair fingerprints

Returns the atom-pair fingerprint for a molecule as an ExplicitBitVect

class AtomPairFingerprintCallbackHash(nBits: int = 2048, minLength: int = 1, maxLength: int = 30, includeChirality: bool = False, use2D: bool = True, confId: int = -1, **kwargs)[source]

Bases: MolecularFeaturizer

Atom pair fingerprints

Returns the atom-pair fingerprint for a molecule as an ExplicitBitVect

static hash_function(bit, value)[source]

Hash function for atom pair fingerprint.

Parameters:
  • bit (int) – The bit to be hashed.

  • value (int) – The value to be hashed.

class LayeredFingerprint(layerFlags: int = 4294967295, minPath: int = 1, maxPath: int = 7, fpSize: int = 2048, atomCounts: list | None = None, branchedPaths: bool = True, **kwargs)[source]

Bases: MolecularFeaturizer

Calculate layered fingerprint for a single molecule.

Layer definitions:

  • 0x01: pure topology

  • 0x02: bond order

  • 0x04: atom types

  • 0x08: presence of rings

  • 0x10: ring sizes

  • 0x20: aromaticity
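Because each layer is a single bit, layers can be combined with bitwise OR and tested with bitwise AND. The constant names below are hypothetical labels for the layer values listed above; only the numeric values come from this reference.

```python
# Hypothetical names for the documented layer bits.
TOPOLOGY = 0x01
BOND_ORDER = 0x02
ATOM_TYPES = 0x04
RINGS = 0x08
RING_SIZES = 0x10
AROMATICITY = 0x20

# The default layerFlags value 4294967295 has every bit set.
ALL_LAYERS = 0xFFFFFFFF


def layer_enabled(flags: int, layer: int) -> bool:
    """Return True if the given layer bit is set in flags."""
    return bool(flags & layer)


# Select only topology, bond order and aromaticity.
selected = TOPOLOGY | BOND_ORDER | AROMATICITY
```

A value built this way could be passed as `layerFlags` to restrict the fingerprint to the chosen layers.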

class MACCSkeysFingerprint(**kwargs)[source]

Bases: MolecularFeaturizer

MACCS Keys. SMARTS-based implementation of the 166 public MACCS keys.

draw_bit(mol: Mol, bit_index: int, file_path: str | None = None) → Image[source]

Draw a molecule with a MACCS key highlighted.

Parameters:
  • mol (Mol) – Molecule to draw.

  • bit_index (int) – Index of the MACCS key to highlight.

  • file_path (str) – Path to save the image to. If None, the image is not saved.

Returns:

im – Image of the molecule with the MACCS key highlighted.

Return type:

PIL.Image.Image

class MorganFingerprint(radius: int = 2, size: int = 2048, chiral: bool = False, bonds: bool = True, features: bool = False, **kwargs)[source]

Bases: MolecularFeaturizer

Morgan fingerprints. Extended Connectivity Circular Fingerprints compute a bag-of-words style representation of a molecule by breaking it into local neighborhoods and hashing into a bit vector of the specified size.

draw_bit(mol: Mol, bit: int, molSize: Tuple[int, int] = (450, 200), file_path: str | None = None) → str[source]

Draw a molecule with a Morgan fingerprint bit highlighted.

Parameters:
  • mol (Mol) – Molecule to draw.

  • bit (int) – Bit to highlight.

  • molSize (Tuple[int, int]) – Size of the molecule.

  • file_path (str) – Path to save the image.

Returns:

The molecule in SVG format.

Return type:

str

draw_bits(mol: Mol, bit_indexes: int | str | List[int], file_path: str | None = None) → str[source]

Draw a molecule with one or more Morgan fingerprint bits highlighted.

Parameters:
  • mol (Mol) – Molecule to draw.

  • bit_indexes (Union[int, str, List[int]]) – Bit to highlight. If int, only one bit is highlighted. If list, all the bits in the list are highlighted. If ‘ON’, all the bits ON are highlighted.

  • file_path (str) – Path to save the image.

Return type:

str

class RDKFingerprint(minPath: int = 1, maxPath: int = 7, fpSize: int = 2048, nBitsPerHash: int = 2, useHs: bool = True, tgtDensity: float = 0.0, minSize: int = 128, branchedPaths: bool = True, useBondOrder: bool = True, **kwargs)[source]

Bases: MolecularFeaturizer

RDKit topological fingerprints

This algorithm works by finding all subgraphs between minPath and maxPath in length. For each subgraph:

  • A hash is calculated.

  • The hash is used to seed a random-number generator.

  • nBitsPerHash random numbers are generated and used to set the corresponding bits in the fingerprint.
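The bit-setting steps above can be sketched as follows, taking the subgraph hashes as given. Python's `random.Random` is only a stand-in for the generator used by RDKit, so the resulting bits will not match real RDKit fingerprints; the function name and the example hash values are hypothetical.

```python
import random
from typing import Iterable, List


def set_bits_from_hashes(subgraph_hashes: Iterable[int],
                         fp_size: int = 2048,
                         n_bits_per_hash: int = 2) -> List[int]:
    """Seed an RNG with each hash and set n_bits_per_hash bits per subgraph."""
    bits = [0] * fp_size
    for subgraph_hash in subgraph_hashes:
        rng = random.Random(subgraph_hash)  # the hash seeds the generator
        for _ in range(n_bits_per_hash):
            bits[rng.randrange(fp_size)] = 1
    return bits


fingerprint = set_bits_from_hashes([101, 202, 303], fp_size=64, n_bits_per_hash=2)
```

Seeding with the hash makes the procedure deterministic: the same subgraph always sets the same bits, which is what lets fingerprints of different molecules be compared.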

draw_bit(mol: Mol, bit: int, folder_path: str | None = None, molSize: Tuple[int, int] = (450, 200))[source]

Draw a molecule with a RDK fingerprint bit highlighted.

Parameters:
  • mol (Mol) – Molecule to draw.

  • bit (int) – Bit to highlight.

  • folder_path (str) – Path for the folder to save images.

  • molSize (Tuple[int, int]) – Size of the molecule.

Returns:

The molecule with the fingerprint bit highlighted.

Return type:

Images

draw_bits(mol: Mol, bits: int | str | List[int], file_path: str | None = None) → str[source]

Draw a molecule with one or more RDK fingerprint bits highlighted.

Parameters:
  • mol (Mol) – Molecule to draw.

  • bits (Union[int, str, List[int]]) – Bit to highlight. If int, the bit to highlight. If str, the name of the bit to highlight. If ‘ON’, all the bits ON are highlighted.

  • file_path (str) – Path to save the image. If None, the image is not saved.

Returns:

The molecule with the fingerprint bits.

Return type:

str

deepmol.compound_featurization.similarity_matrix module

class TanimotoSimilarityMatrix(n_molecules: int | None = None, n_jobs: int = -1)[source]

Bases: Transformer

Class to calculate Tanimoto similarity matrix for a dataset.

The similarity matrix is calculated using Morgan fingerprints.
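For reference, the Tanimoto similarity of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit indices; the fingerprints are made up, and returning 0.0 for two empty fingerprints is a convention chosen here, not documented behaviour.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(fingerprints: List[Set[int]]) -> List[List[float]]:
    """Pairwise Tanimoto similarities; symmetric with ones on the diagonal."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Toy fingerprints given as sets of on-bit indices.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7}]
matrix = similarity_matrix(fingerprints)
```

Since the matrix grows quadratically with the number of molecules, computing it for large datasets benefits from parallel execution.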

featurize(other_object, inplace=False, **kwargs)

Method that modifies an input object inplace or on a copy.

Parameters:
  • self (object) – The class instance object.

  • other_object (object) – The object to apply the method to.

  • inplace (bool) – Whether to apply the method in place.

  • kwargs (dict) – Keyword arguments to pass to the method.

Returns:

new_object – The new object.

Return type:

object

Module contents