# Extracting features from molecules

## Introduction

Extracting features from molecules is a common task in machine learning. Features are commonly grouped into five types: 0D, 1D, 2D, 3D and 4D.

- 0D features are descriptors that summarize the parts of the molecule as a whole, such as the number of atoms, bond counts or the molecular weight.
- 1D features are descriptors that describe substructures in the molecule (e.g. molecular fingerprints).
- 2D features are descriptors that describe the molecular topology based on the graph representation of the molecule, e.g. the number of rings or the number of rotatable bonds.
- 3D features are geometrical descriptors that describe the molecule as a 3D structure.
- 4D features are descriptors that describe the molecule as a 4D structure. A new dimension is added to characterize the interactions between the molecule and the active site of a receptor, or the multiple conformational states of the molecule, e.g. its molecular dynamics.

![features_image.png](features_image.png)

Source: Molecular Descriptors for Structure–Activity Applications: A Hands-On Approach.

As we increase the level of information about a molecule (from 0D to 4D), we also increase the computational cost of calculating the features. For example, calculating 3D features requires the generation of 3D conformers, which can be computationally expensive for large molecules. In addition, some features may not be available for certain molecules, e.g. 3D features cannot be calculated for molecules that do not have a 3D structure. Fortunately, DeepMol provides methods for generating compound 3D structures.

DeepMol provides a number of featurization methods for generating features from molecules. These features can be used for a variety of tasks, such as virtual screening, drug design, and toxicity prediction. The featurization methods are implemented as classes in the `deepmol.compound_featurization` module.
Each class has a `featurize` method that takes a dataset as input and returns a featurized dataset. The method can be called directly with a dataset object, as in the examples in this tutorial, or used in a pipeline with other featurization methods.

The following featurization methods are currently available in DeepMol:

- MorganFingerprint
- AtomPairFingerprint
- LayeredFingerprint
- RDKFingerprint
- MACCSkeysFingerprint
- TwoDimensionDescriptors
- WeaveFeat
- CoulombFeat
- CoulombEigFeat
- ConvMolFeat
- MolGraphConvFeat
- SmileImageFeat
- SmilesSeqFeat
- MolGanFeat
- All3DDescriptors

**Load the dataset**

```python
from deepmol.loaders import CSVLoader, SDFLoader
from deepmol.compound_featurization import MorganFingerprint, TwoDimensionDescriptors, MACCSkeysFingerprint, \
    AtomPairFingerprint, LayeredFingerprint, RDKFingerprint
from deepmol.compound_featurization import WeaveFeat, CoulombFeat, CoulombEigFeat, ConvMolFeat, MolGraphConvFeat, \
    SmileImageFeat, SmilesSeqFeat, MolGanFeat, All3DDescriptors, generate_conformers_to_sdf_file

import numpy as np
```

```python
from deepmol.loaders import CSVLoader

# Load the CSV file, naming the columns that hold the ids, the SMILES strings and the labels
dataset = CSVLoader("../data/CHEMBL217_reduced.csv",
                    id_field="Original_Entry_ID",
                    smiles_field="SMILES",
                    labels_fields=["Activity_Flag"]).create_dataset()
```

    2023-06-06 16:43:55,572 — ERROR — Molecule with smiles: ClC1=C(N2CCN(O)(CC2)=C/C=C/CNC(=O)C=3C=CC(=CC3)C4=NC=CC=C4)C=CC=C1Cl removed from dataset.
    2023-06-06 16:43:55,574 — INFO — Assuming classification since there are less than 10 unique y values. If otherwise, explicitly set the mode to 'regression'!

    [16:43:55] Explicit valence for atom # 6 N, 5, is greater than permitted

## 1D features: fingerprints and structural keys

![fingerprints.png](fingerprints.png)

"Structural keys" are predefined codes that have been created for various purposes in the field of chemistry. They help with tasks like finding similar molecules or exploring different chemical structures.
One specific type of structural key is the Molecular ACCess System (MACCS) keys. These keys use binary digits (bits) to show whether certain parts of a molecule are present or not. For example, if a specific structural fragment exists in a molecule, the corresponding bit is set to 1; if it is not present, the bit is set to 0. There are different versions of MACCS keys, but the most common ones are either 166 or 960 bits long. These keys provide a simplified representation of molecules, which is useful for various chemical analyses and comparisons. [1]

Hashed fingerprints are a type of chemical fingerprint that use a hash function to convert patterns in molecules into a series of bits. The length of the fingerprint can be predetermined. There are different types of hashed fingerprints. Topological or path-based fingerprints, like Daylight fingerprints, provide information about how atoms are connected in a molecule. Circular fingerprints, such as ECFP, give information about the neighborhoods of atoms. These fingerprints are useful for quickly comparing similarities between molecules, studying the relationship between chemical structures and activities, and creating maps of chemical space.

Most fingerprints have been designed for small molecules and may not work well with larger ones. For example, ECFP4 is effective for virtual screening [2] and target prediction [3] with small molecules but may not accurately represent the overall features or structural differences of larger molecules [4]. On the other hand, atom-pair fingerprints, which describe molecular shape, are better suited for larger molecules [4]. However, they don't provide detailed structural information and may perform poorly in benchmarking studies with small molecules compared to substructure fingerprints like ECFP4 [4].

[1] L. David et al. “Molecular representations in AI-driven drug discovery: a review and practical guide”. In: Journal of Cheminformatics 12.1 (2020), p. 56

[2] S. Riniker and G. A. Landrum. “Open-source platform to benchmark fingerprints for ligand-based virtual screening”. In: Journal of Cheminformatics 5.1 (2013), pp. 1–17

[3] M. Awale and J.-L. Reymond. “Polypharmacology browser PPB2: target prediction combining nearest neighbors with machine learning”. In: Journal of Chemical Information and Modeling 59.1 (2018), pp. 10–17

[4] A. Capecchi, D. Probst, and J.-L. Reymond. “One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome”. In: Journal of Cheminformatics 12.1 (2020), pp. 1–15

### Morgan Fingerprint

Morgan fingerprints, also known as circular fingerprints, are a type of molecular fingerprint that encodes the structural information of a molecule as a binary bit vector. These fingerprints are generated using the Morgan algorithm, which iteratively updates an identifier for each atom from the identifiers of its neighbors, so that each iteration describes a larger circular neighborhood around each atom. The resulting bit vector is a binary representation of the presence or absence of certain substructures within a certain radius of each atom.

Morgan fingerprints are widely used in cheminformatics and computational chemistry for tasks such as molecular similarity analysis, virtual screening, and quantitative structure-activity relationship (QSAR) modeling. They are also computationally efficient and can be generated quickly for large sets of molecules, making them useful for high-throughput screening applications.
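
To make the iterative neighborhood idea concrete, here is a toy sketch in plain Python. It is not the RDKit or DeepMol implementation: the graph encoding (`atoms`, `bonds`), the function names and the use of `zlib.crc32` as the hash are all assumptions made for illustration, and real Morgan fingerprints use richer atom invariants and bond information.

```python
import zlib


def toy_circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint (illustrative only, not the real Morgan algorithm).

    atoms: list of element symbols; bonds: list of (i, j) atom index pairs.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: each atom identifier depends on its element and its degree
    identifiers = [
        zlib.crc32(f"{symbol}:{len(neighbors[i])}".encode())
        for i, symbol in enumerate(atoms)
    ]
    bits = [0] * n_bits
    for identifier in identifiers:
        bits[identifier % n_bits] = 1

    # Each further iteration mixes in the neighbors' identifiers,
    # so the described environment grows by one bond per iteration
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            environment = sorted(identifiers[j] for j in neighbors[i])
            updated.append(zlib.crc32(f"{identifiers[i]}|{environment}".encode()))
        identifiers = updated
        for identifier in identifiers:
            bits[identifier % n_bits] = 1
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length bit lists."""
    shared = sum(a & b for a, b in zip(fp_a, fp_b))
    total = sum(a | b for a, b in zip(fp_a, fp_b))
    return shared / total if total else 1.0


# Heavy-atom graphs of ethanol (C-C-O) and 1-propanol (C-C-C-O)
ethanol = toy_circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = toy_circular_fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(sum(ethanol), sum(propanol), round(tanimoto(ethanol, propanol), 2))
```

Folding the identifiers with a modulo is what fixes the fingerprint length, and it is also why unrelated substructures can occasionally collide on the same bit.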
```python
from deepmol.compound_featurization import MorganFingerprint

MorganFingerprint(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.X.shape
```

    (16645, 2048)

```python
dataset.X[0]
```

    array([0., 1., 0., ..., 0., 0., 0.], dtype=float32)

```python
np.unique(dataset.X[0], return_counts=True)
```

    (array([0., 1.], dtype=float32), array([2006,   42]))

### Atom Pair Fingerprint

The atom pair fingerprint is a molecular fingerprinting method used in cheminformatics and computational chemistry. It encodes the presence or absence of pairs of atoms in a molecule, as well as the distance between them. The method enumerates the atom pairs in a molecule and hashes each pair, together with its distance, into a bit position. The result is a binary bitstring that represents the presence or absence of each atom pair in the molecule. The bitstring is usually folded to a fixed length to facilitate comparison and analysis.

```python
from deepmol.compound_featurization import AtomPairFingerprint

AtomPairFingerprint(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.X.shape
```

    (16645, 2048)

```python
dataset.X[0]
```

    array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

```python
np.unique(dataset.X[0], return_counts=True)
```

    (array([0., 1.], dtype=float32), array([1841,  207]))

### Layered Fingerprint

Layered fingerprints are a type of topological (path-based) molecular fingerprint used in cheminformatics and computational chemistry. They encode the presence or absence of certain substructures or functional groups in a molecule, which are represented as binary bitstrings. The method involves dividing a molecule into a series of layers, where each layer contains a different set of substructures or functional groups. The bitstring for each layer is generated by hashing the presence or absence of the substructures or functional groups in the layer.
The final fingerprint is generated by combining the bits for all layers, resulting in a binary bitstring that represents the presence or absence of all substructures or functional groups in the molecule.

More specifically, the fingerprint generation algorithm finds all possible paths or subgraphs of specified lengths in the molecule based on the input parameters. Iterating through the generated paths, the code calculates various hash layers based on the specified layer flags and stores them. Depending on the layer flags, different features are considered, such as bond topology, bond orders, atom types, ring information, ring size, and aromaticity. By default, the fingerprint considers them all. Each path's hash layers are sorted, and the distinct atom count in the path is added. The path is hashed to generate a seed, and the seed is used to determine the bit position in the fingerprint. The corresponding bit is then set in the resulting fingerprint, and atom counts are updated if necessary.

```python
from deepmol.compound_featurization import LayeredFingerprint

LayeredFingerprint(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.X.shape
```

    (16645, 2048)

```python
dataset.X[0]
```

    array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

```python
np.unique(dataset.X[0], return_counts=True)
```

    (array([0., 1.], dtype=float32), array([1485,  563]))

### RDK Fingerprint

The RDK fingerprint is RDKit's own topological (path-based) fingerprint, similar in spirit to Daylight fingerprints.

```python
from deepmol.compound_featurization import RDKFingerprint

RDKFingerprint(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.X.shape
```

    (16645, 2048)

```python
dataset.X[0]
```

    array([1., 0., 1., ..., 0., 1., 1.], dtype=float32)

```python
np.unique(dataset.X[0], return_counts=True)
```

    (array([0., 1.], dtype=float32), array([1255,  793]))

### MACCS Keys Fingerprint

MACCS (Molecular ACCess System) keys are a type of binary molecular fingerprint used in cheminformatics and computational chemistry.
They were developed by Molecular Design Limited (MDL) and are widely used in the field. The MACCS keys encode the presence or absence of certain molecular fragments or substructures in a molecule as a binary bitstring. The fragments used are based on a predefined set of SMARTS patterns, which represent specific substructures or features of a molecule. The fingerprint generated here has 167 bit positions: RDKit leaves bit 0 unused, and each of the remaining 166 bits represents the presence or absence of a specific fragment in the molecule. The bitstring can be used to compare the similarity of two molecules or to search a large database of molecules for compounds with similar structures or properties.

```python
from deepmol.compound_featurization import MACCSkeysFingerprint

MACCSkeysFingerprint(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.X.shape
```

    (16645, 167)

```python
dataset.X[0]
```

    array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.], dtype=float32)

```python
np.unique(dataset.X[0], return_counts=True)
```

    (array([0., 1.], dtype=float32), array([120,  47]))

## 0D, 1D and 2D Descriptors

We provide all 0D and 2D descriptors, and some 1D descriptors, from RDKit through a single featurizer.
These include:

- **EState index descriptors**: The EState indices are calculated based on a set of predefined atomic parameters, such as electronegativity, atomic polarizability, and resonance effects. These indices quantify the electronic characteristics of individual atoms in a molecule.
    - **MaxAbsEStateIndex (MAEstate)**: Maximum absolute EState index. It represents the highest absolute EState index value among all the atoms in a molecule. It indicates the atom with the largest charge magnitude, reflecting its potential reactivity or contribution to chemical properties.
    - **MaxEStateIndex**: Maximum EState index. It represents the highest EState index value among all the atoms in a molecule. It indicates the atom with the largest charge or electronic density, reflecting its potential reactivity or significance in the molecule's properties.
    - **MinAbsEStateIndex**: Minimum absolute EState index. It represents the lowest absolute EState index value among all the atoms in a molecule.
    - **MinEStateIndex**: Minimum EState index. It represents the lowest EState index value among all the atoms in a molecule.
- **QED**: Quantitative estimation of drug-likeness, a computational algorithm used to quantitatively assess the drug-likeness of a molecule. It combines various molecular descriptors, including 2D properties, to generate a single numerical score that represents the overall drug-likeness of the molecule.
- **Molecular weight descriptors**:
    - **MolWt**: Molecular weight.
    - **HeavyAtomMolWt**: The average molecular weight of the molecule ignoring hydrogens.
    - **ExactMolWt**: The exact molecular weight of the molecule.
- **Electron descriptors**:
    - **NumValenceElectrons**: The number of valence electrons the molecule has.
    - **NumRadicalElectrons**: The number of radical electrons the molecule has.
- **Charge descriptors**:
    - **MaxPartialCharge**: Maximum partial charge.
    - **MinPartialCharge**: Minimum partial charge.
    - **MaxAbsPartialCharge**: Maximum absolute partial charge.
    - **MinAbsPartialCharge**: Minimum absolute partial charge.
- **Morgan fingerprint density**: Quantifies the frequency of occurrence of specific substructures within the molecule at a local level, taking into account their immediate surroundings. Higher values indicate a higher density of unique substructures in the molecule, while lower values indicate fewer unique substructures or a more uniform distribution of substructures.
    - **FpDensityMorgan1**: Fingerprint density for Morgan radius 1.
    - **FpDensityMorgan2**: Fingerprint density for Morgan radius 2.
    - **FpDensityMorgan3**: Fingerprint density for Morgan radius 3.
- **BCUT2D descriptors**: BCUT2D descriptors are based on the Burden matrix, which encodes bond strengths between atoms in the molecule. The diagonal elements of the matrix are replaced with atom properties, and an eigenvalue decomposition is then performed to obtain the highest and lowest eigenvalues.
    - **BCUT2D_MWHI**: Incorporates atom masses in the Burden matrix and returns the highest eigenvalue.
    - **BCUT2D_MWLOW**: Incorporates atom masses in the Burden matrix and returns the lowest eigenvalue.
    - **BCUT2D_CHGHI**: Incorporates atom charges in the Burden matrix and returns the highest eigenvalue.
    - **BCUT2D_CHGLO**: Incorporates atom charges in the Burden matrix and returns the lowest eigenvalue.
    - **BCUT2D_LOGPHI**: Incorporates atomic contributions to the logarithm of the partition coefficient (logP) in the Burden matrix and returns the highest eigenvalue.
    - **BCUT2D_LOGPLOW**: Incorporates atomic contributions to the logarithm of the partition coefficient (logP) in the Burden matrix and returns the lowest eigenvalue.
    - **BCUT2D_MRHI**: Incorporates atomic molar refractivity in the Burden matrix and returns the highest eigenvalue.
    - **BCUT2D_MRLOW**: Incorporates atomic molar refractivity in the Burden matrix and returns the lowest eigenvalue.
- **AvgIpc**: The average information content of the coefficients of the characteristic polynomial of the adjacency matrix of a hydrogen-suppressed graph of a molecule.
- **Ipc**: The information content of the coefficients of the characteristic polynomial of the adjacency matrix of a hydrogen-suppressed graph of a molecule.
- **BalabanJ**: Balaban's J index. It quantifies the molecular topological structure by considering the connectivity of atoms and bonds in the molecule. See [here](http://www.codessa-pro.com/descriptors/topo/balaban.htm).
- **BertzCT**: Bertz complexity index. It measures the topological complexity or branching of a molecule based on its structural connectivity.
- **Chi0, Chi1, Chi2n, Chi3v, etc.**: The Chi descriptors represent the count of specific path patterns in the molecule and are calculated based on the Hall-Kier delta values or on the deviation of an atom's valence electron count from the expected count based on its atomic number.
- **HallKierAlpha**: Hall-Kier alpha value. It describes the flexibility or rigidity of atoms in a molecule.
- **Kappa1, Kappa2, Kappa3**: Kappa shape indices. They describe the shape of a molecule based on the distribution of bond lengths and angles. These descriptors are derived from the Hall-Kier alpha descriptor and the number of paths of specific lengths in the molecule.
- **LabuteASA**: Labute's Approximate Surface Area. It estimates the solvent-accessible surface area of a molecule, which is relevant for its solubility and permeability properties.
- **PEOE_VSA1, PEOE_VSA2, PEOE_VSA3, etc.**: Calculate the PEOE (Partial Equalization of Orbital Electronegativity) VSA (surface area contributions of atoms or groups of atoms in a molecule) for a molecule by assigning atom contributions to predefined bins based on their Labute ASA and Gasteiger charge values.
- **SMR_VSA1, SMR_VSA2, SMR_VSA3, etc.**: Calculate the SMR (Molar Refractivity) VSA for a molecule by assigning atom contributions to predefined bins based on their Labute ASA and MR values.
- **SlogP_VSA1, SlogP_VSA2, SlogP_VSA3, etc.**: Calculate the SlogP (atomic contribution model developed by Crippen et al. 1999 using 7000 molecular structures with the correct protonated state as training set) VSA for a molecule by assigning atom contributions to predefined bins based on their Labute ASA and SlogP values.
- **EState_VSA1, EState_VSA2, EState_VSA3, etc.**: Calculate the EState (E-State) VSA for a molecule by assigning atom contributions to predefined bins based on their Labute ASA and EState values.
- **FractionCSP3**: Fraction of carbon atoms in the molecule that are sp3-hybridized.
- **MolLogP**: Estimated logarithm of the partition coefficient (logP). It quantifies the lipophilicity or hydrophobicity of a molecule, which is important for its distribution and permeability properties.
- **TPSA**: Topological polar surface area. It estimates the surface area of a molecule that is involved in polar interactions, which is relevant for its solubility and biological activity.
- **NumHAcceptors, NumHDonors, NumHeteroatoms, etc.**: These descriptors count the number of hydrogen bond acceptor groups, hydrogen bond donor groups, heteroatoms (atoms other than carbon and hydrogen), etc., present in the molecule. They provide information about the potential for specific molecular interactions.
- **RingCount**: Number of rings in the molecule. It indicates the level of molecular complexity and rigidity.
- **fr_Al_COO, fr_ArN, fr_COO, fr_Ph_OH, etc.**: These descriptors represent the count of specific functional groups or substructures in the molecule. They provide information about the presence of particular chemical moieties.
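
The binning idea behind the `*_VSA` descriptors in the list above can be sketched in a few lines of plain Python. This is a minimal illustration and not RDKit's code: the function name, the per-atom areas, the charges, the bin edges and the boundary convention (a value equal to an edge goes to the upper bin) are all hypothetical.

```python
from bisect import bisect_right


def binned_surface_area(atom_areas, atom_values, bin_edges):
    """Sum per-atom surface area contributions into property bins.

    bin_edges must be sorted; n edges define n + 1 bins.
    """
    totals = [0.0] * (len(bin_edges) + 1)
    for area, value in zip(atom_areas, atom_values):
        # The atom's property value selects the bin; its area is what gets accumulated
        totals[bisect_right(bin_edges, value)] += area
    return totals


# Hypothetical per-atom surface areas and partial charges for a four-atom fragment
areas = [4.0, 5.0, 6.0, 2.5]
charges = [-0.4, 0.0, 0.3, 0.05]
edges = [-0.3, 0.1]  # three bins: below -0.3, from -0.3 up to 0.1, and 0.1 or above

vsa = binned_surface_area(areas, charges, edges)
print(vsa)  # [4.0, 7.5, 6.0]
```

Each entry of the result plays the role of one numbered descriptor (bin 1, bin 2, and so on), and the entries always add up to the total surface area, which is why these descriptors describe how a property is distributed over the molecular surface.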
```python
from deepmol.compound_featurization import TwoDimensionDescriptors

TwoDimensionDescriptors(n_jobs=10).featurize(dataset, inplace=True)
```

```python
dataset.feature_names
```

array(['MaxAbsEStateIndex', 'MaxEStateIndex', 'MinAbsEStateIndex', 'MinEStateIndex', 'qed', 'MolWt', 'HeavyAtomMolWt', 'ExactMolWt', 'NumValenceElectrons', 'NumRadicalElectrons', 'MaxPartialCharge', 'MinPartialCharge', 'MaxAbsPartialCharge', 'MinAbsPartialCharge', 'FpDensityMorgan1', 'FpDensityMorgan2', 'FpDensityMorgan3', 'BCUT2D_MWHI', 'BCUT2D_MWLOW', 'BCUT2D_CHGHI', 'BCUT2D_CHGLO', 'BCUT2D_LOGPHI', 'BCUT2D_LOGPLOW', 'BCUT2D_MRHI', 'BCUT2D_MRLOW', 'AvgIpc', 'BalabanJ', 'BertzCT', 'Chi0', 'Chi0n', 'Chi0v', 'Chi1', 'Chi1n', 'Chi1v', 'Chi2n', 'Chi2v', 'Chi3n', 'Chi3v', 'Chi4n', 'Chi4v', 'HallKierAlpha', 'Ipc', 'Kappa1', 'Kappa2', 'Kappa3', 'LabuteASA', 'PEOE_VSA1', 'PEOE_VSA10', 'PEOE_VSA11', 'PEOE_VSA12', 'PEOE_VSA13', 'PEOE_VSA14', 'PEOE_VSA2', 'PEOE_VSA3', 'PEOE_VSA4', 'PEOE_VSA5', 'PEOE_VSA6', 'PEOE_VSA7', 'PEOE_VSA8', 'PEOE_VSA9', 'SMR_VSA1', 'SMR_VSA10', 'SMR_VSA2', 'SMR_VSA3', 'SMR_VSA4', 'SMR_VSA5', 'SMR_VSA6', 'SMR_VSA7', 'SMR_VSA8', 'SMR_VSA9', 'SlogP_VSA1', 'SlogP_VSA10', 'SlogP_VSA11', 'SlogP_VSA12', 'SlogP_VSA2', 'SlogP_VSA3', 'SlogP_VSA4', 'SlogP_VSA5', 'SlogP_VSA6', 'SlogP_VSA7', 'SlogP_VSA8', 'SlogP_VSA9', 'TPSA', 'EState_VSA1', 'EState_VSA10', 'EState_VSA11', 'EState_VSA2', 'EState_VSA3', 'EState_VSA4', 'EState_VSA5', 'EState_VSA6', 'EState_VSA7', 'EState_VSA8', 'EState_VSA9', 'VSA_EState1', 'VSA_EState10', 'VSA_EState2', 'VSA_EState3', 'VSA_EState4', 'VSA_EState5', 'VSA_EState6', 'VSA_EState7', 'VSA_EState8', 'VSA_EState9', 'FractionCSP3', 'HeavyAtomCount', 'NHOHCount', 'NOCount', 'NumAliphaticCarbocycles', 'NumAliphaticHeterocycles', 'NumAliphaticRings', 'NumAromaticCarbocycles', 'NumAromaticHeterocycles', 'NumAromaticRings', 'NumHAcceptors', 'NumHDonors', 'NumHeteroatoms', 'NumRotatableBonds', 'NumSaturatedCarbocycles', 'NumSaturatedHeterocycles',
'NumSaturatedRings', 'RingCount', 'MolLogP', 'MolMR', 'fr_Al_COO', 'fr_Al_OH', 'fr_Al_OH_noTert', 'fr_ArN', 'fr_Ar_COO', 'fr_Ar_N', 'fr_Ar_NH', 'fr_Ar_OH', 'fr_COO', 'fr_COO2', 'fr_C_O', 'fr_C_O_noCOO', 'fr_C_S', 'fr_HOCCN', 'fr_Imine', 'fr_NH0', 'fr_NH1', 'fr_NH2', 'fr_N_O', 'fr_Ndealkylation1', 'fr_Ndealkylation2', 'fr_Nhpyrrole', 'fr_SH', 'fr_aldehyde', 'fr_alkyl_carbamate', 'fr_alkyl_halide', 'fr_allylic_oxid', 'fr_amide', 'fr_amidine', 'fr_aniline', 'fr_aryl_methyl', 'fr_azide', 'fr_azo', 'fr_barbitur', 'fr_benzene', 'fr_benzodiazepine', 'fr_bicyclic', 'fr_diazo', 'fr_dihydropyridine', 'fr_epoxide', 'fr_ester', 'fr_ether', 'fr_furan', 'fr_guanido', 'fr_halogen', 'fr_hdrzine', 'fr_hdrzone', 'fr_imidazole', 'fr_imide', 'fr_isocyan', 'fr_isothiocyan', 'fr_ketone', 'fr_ketone_Topliss', 'fr_lactam', 'fr_lactone', 'fr_methoxy', 'fr_morpholine', 'fr_nitrile', 'fr_nitro', 'fr_nitro_arom', 'fr_nitro_arom_nonortho', 'fr_nitroso', 'fr_oxazole', 'fr_oxime', 'fr_para_hydroxylation', 'fr_phenol', 'fr_phenol_noOrthoHbond', 'fr_phos_acid', 'fr_phos_ester', 'fr_piperdine', 'fr_piperzine', 'fr_priamide', 'fr_prisulfonamd', 'fr_pyridine', 'fr_quatN', 'fr_sulfide', 'fr_sulfonamd', 'fr_sulfone', 'fr_term_acetylene', 'fr_tetrazole', 'fr_thiazole', 'fr_thiocyan', 'fr_thiophene', 'fr_unbrch_alkane', 'fr_urea'], dtype=', , , ..., , , ], dtype=object) ```python dataset.feature_names ``` array(['weave_feat'], dtype=' ```python dataset.feature_names ``` array(['one_hot_0', 'one_hot_1', 'one_hot_2', 'one_hot_3', 'one_hot_4', 'one_hot_5', 'one_hot_6', 'one_hot_7', 'one_hot_8', 'one_hot_9', 'one_hot_10', 'one_hot_11', 'one_hot_12', 'one_hot_13', 'one_hot_14', 'one_hot_15', 'one_hot_16', 'one_hot_17', 'one_hot_18', 'one_hot_19', 'one_hot_20', 'one_hot_21', 'one_hot_22', 'one_hot_23', 'one_hot_24', 'one_hot_25', 'one_hot_26', 'one_hot_27', 'one_hot_28', 'one_hot_29', 'one_hot_30', 'one_hot_31', 'one_hot_32', 'one_hot_33', 'one_hot_34', 'one_hot_35', 'one_hot_36', 'one_hot_37', 'one_hot_38', 
'one_hot_39', 'one_hot_40', 'one_hot_41', 'one_hot_42', 'one_hot_43', 'one_hot_44', 'one_hot_45', 'one_hot_46', 'one_hot_47', 'one_hot_48', 'one_hot_49', 'one_hot_50', 'one_hot_51', 'one_hot_52', 'one_hot_53', 'one_hot_54', 'one_hot_55', 'one_hot_56', 'one_hot_57', 'one_hot_58', 'one_hot_59', 'one_hot_60', 'one_hot_61', 'one_hot_62', 'one_hot_63', 'one_hot_64', 'one_hot_65', 'one_hot_66', 'one_hot_67', 'one_hot_68', 'one_hot_69', 'one_hot_70', 'one_hot_71', 'one_hot_72', 'one_hot_73', 'one_hot_74', 'one_hot_75', 'one_hot_76', 'one_hot_77', 'one_hot_78', 'one_hot_79', 'one_hot_80', 'one_hot_81', 'one_hot_82', 'one_hot_83', 'one_hot_84', 'one_hot_85', 'one_hot_86', 'one_hot_87', 'one_hot_88', 'one_hot_89', 'one_hot_90', 'one_hot_91', 'one_hot_92', 'one_hot_93', 'one_hot_94', 'one_hot_95', 'one_hot_96', 'one_hot_97', 'one_hot_98', 'one_hot_99', 'one_hot_100', 'one_hot_101', 'one_hot_102', 'one_hot_103', 'one_hot_104', 'one_hot_105', 'one_hot_106', 'one_hot_107', 'one_hot_108', 'one_hot_109', 'one_hot_110', 'one_hot_111', 'one_hot_112', 'one_hot_113', 'one_hot_114', 'one_hot_115', 'one_hot_116', 'one_hot_117', 'one_hot_118', 'one_hot_119', 'one_hot_120', 'one_hot_121', 'one_hot_122', 'one_hot_123', 'one_hot_124', 'one_hot_125', 'one_hot_126', 'one_hot_127', 'one_hot_128', 'one_hot_129', 'one_hot_130', 'one_hot_131'], dtype=' ```python dataset.feature_names ``` array(['one_hot_0', 'one_hot_1', 'one_hot_2', 'one_hot_3', 'one_hot_4', 'one_hot_5', 'one_hot_6', 'one_hot_7', 'one_hot_8', 'one_hot_9', 'one_hot_10', 'one_hot_11', 'one_hot_12', 'one_hot_13', 'one_hot_14', 'one_hot_15', 'one_hot_16', 'one_hot_17', 'one_hot_18', 'one_hot_19', 'one_hot_20', 'one_hot_21', 'one_hot_22', 'one_hot_23', 'one_hot_24', 'one_hot_25', 'one_hot_26', 'one_hot_27', 'one_hot_28', 'one_hot_29', 'one_hot_30', 'one_hot_31', 'one_hot_32', 'one_hot_33', 'one_hot_34', 'one_hot_35', 'one_hot_36', 'one_hot_37', 'one_hot_38', 'one_hot_39', 'one_hot_40', 'one_hot_41', 'one_hot_42', 'one_hot_43', 
'one_hot_44', 'one_hot_45', 'one_hot_46', 'one_hot_47', 'one_hot_48', 'one_hot_49', 'one_hot_50', 'one_hot_51', 'one_hot_52', 'one_hot_53', 'one_hot_54', 'one_hot_55', 'one_hot_56', 'one_hot_57', 'one_hot_58', 'one_hot_59', 'one_hot_60', 'one_hot_61', 'one_hot_62', 'one_hot_63', 'one_hot_64', 'one_hot_65', 'one_hot_66', 'one_hot_67', 'one_hot_68', 'one_hot_69', 'one_hot_70', 'one_hot_71', 'one_hot_72', 'one_hot_73', 'one_hot_74', 'one_hot_75', 'one_hot_76', 'one_hot_77', 'one_hot_78', 'one_hot_79', 'one_hot_80', 'one_hot_81', 'one_hot_82', 'one_hot_83', 'one_hot_84', 'one_hot_85', 'one_hot_86', 'one_hot_87', 'one_hot_88', 'one_hot_89', 'one_hot_90', 'one_hot_91', 'one_hot_92', 'one_hot_93', 'one_hot_94', 'one_hot_95', 'one_hot_96', 'one_hot_97', 'one_hot_98', 'one_hot_99', 'one_hot_100', 'one_hot_101', 'one_hot_102', 'one_hot_103', 'one_hot_104', 'one_hot_105', 'one_hot_106', 'one_hot_107', 'one_hot_108', 'one_hot_109', 'one_hot_110', 'one_hot_111', 'one_hot_112', 'one_hot_113', 'one_hot_114', 'one_hot_115', 'one_hot_116', 'one_hot_117', 'one_hot_118', 'one_hot_119', 'one_hot_120', 'one_hot_121', 'one_hot_122', 'one_hot_123', 'one_hot_124', 'one_hot_125', 'one_hot_126', 'one_hot_127', 'one_hot_128', 'one_hot_129', 'one_hot_130'], dtype='