backbone_learn.heuristic_solvers package
Submodules
backbone_learn.heuristic_solvers.cart_decision_tree module
- class backbone_learn.heuristic_solvers.cart_decision_tree.CARTDecisionTree(**kwargs)
Bases: HeuristicSolverBase
Implements a Classification And Regression Tree (CART) decision tree with cross-validation using AUC. This solver is a heuristic approach for fitting a decision tree model and identifying significant features.
- _model
An instance of the sklearn DecisionTreeClassifier.
- Type
DecisionTreeClassifier
- _auc_score
The maximum AUC score obtained during cross-validation.
- Type
float
- property auc_score: float
Returns the maximum AUC score obtained from cross-validation.
- Returns
The maximum AUC score.
- Return type
float
- fit(X: ndarray, y: ndarray, cv_folds: int = 5, random_state: int = 0) -> None
Fits a CART Decision Tree model to the data using hyperparameter tuning with cross-validation and evaluates it using AUC.
- Parameters
X (np.ndarray) – The input features as a NumPy array.
y (np.ndarray) – The target labels as a NumPy array.
cv_folds (int) – The number of folds to use for cross-validation.
random_state (int) – The seed used by the random number generator.
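The fit procedure above can be approximated with scikit-learn alone. The sketch below is an illustration under assumptions, not the CARTDecisionTree source: it tunes a DecisionTreeClassifier with GridSearchCV and scoring="roc_auc", and both the search grid (max_depth, min_samples_leaf) and the helper name fit_cart_with_cv_auc are hypothetical. GridSearchCV's best_score_ plays the role of auc_score, the best mean cross-validated AUC over the grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def fit_cart_with_cv_auc(X: np.ndarray, y: np.ndarray, cv_folds: int = 5,
                         random_state: int = 0):
    """Tune a CART tree by cross-validated AUC (hypothetical helper, illustrative only)."""
    # Hypothetical grid: the grid CARTDecisionTree actually searches is not documented here.
    param_grid = {"max_depth": [2, 3, 4, None], "min_samples_leaf": [1, 5, 10]}
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid=param_grid,
        scoring="roc_auc",  # ranks candidate trees by ROC AUC on the held-out folds
        cv=cv_folds,        # an int here means stratified k-fold for classifiers
    )
    search.fit(X, y)
    # best_score_ is the highest mean cross-validated AUC across the grid.
    return search.best_estimator_, search.best_score_


X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)
model, auc = fit_cart_with_cv_auc(X, y)
print(f"best CV AUC: {auc:.3f}")
# Features the tuned tree actually splits on (nonzero impurity-based importance).
print(np.flatnonzero(model.feature_importances_))
```

Using the tree's nonzero feature importances is one natural way to read "identifying significant features"; the package's own selection rule may differ.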
backbone_learn.heuristic_solvers.heauristic_solver_base module
- class backbone_learn.heuristic_solvers.heauristic_solver_base.HeuristicSolverBase
Bases: ABC
Abstract class for heuristic solvers.
This class provides a framework for defining heuristic solvers that can fit models to data and identify relevant features. Derived classes need to implement the fit and get_relevant_variables methods according to their specific heuristic approach.
- abstract fit(X: ndarray, y: ndarray, random_state: int)
Fits a model to the given data using a heuristic approach.
This method should be implemented to solve a subproblem using the input data matrix X and the target vector y. It should fit a model based on a heuristic algorithm specific to the derived class.
- Parameters
X (np.ndarray) – The input feature matrix.
y (np.ndarray) – The target vector.
random_state (int) – The seed used by the random number generator. The abstract signature has no default; the concrete solvers in this package default it to 0.
- Returns
The method should fit the model to the data, with the results stored internally within the class instance.
- Return type
None
- abstract get_relevant_variables(**kwargs)
Identifies relevant variables with importance greater than a specified threshold.
This method should be implemented to determine the most relevant variables based on the model fitted by the fit method. It should return the indices of the variables that will be passed to the exact solver.
- property model
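To make the contract concrete, here is a minimal sketch of a derived solver. So that it runs without backbone_learn installed, it defines a local stand-in HeuristicSolverBase that mirrors only the documented members (fit, get_relevant_variables, model); the stand-in's __init__ and the hypothetical CorrelationSolver, which scores each feature by its absolute Pearson correlation with y, are illustrations and not part of the package.

```python
from abc import ABC, abstractmethod

import numpy as np


class HeuristicSolverBase(ABC):
    """Local stand-in mirroring the documented interface (not the package's source)."""

    def __init__(self):
        self._model = None  # assumption: fitted state lives in a private attribute

    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray, random_state: int = 0) -> None:
        ...

    @abstractmethod
    def get_relevant_variables(self, **kwargs):
        ...

    @property
    def model(self):
        return self._model


class CorrelationSolver(HeuristicSolverBase):
    """Hypothetical solver: scores features by |corr(X_j, y)|."""

    def fit(self, X: np.ndarray, y: np.ndarray, random_state: int = 0) -> None:
        # The "model" here is just the vector of absolute correlations; random_state is unused.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
        self._model = np.abs(Xc.T @ yc) / denom

    def get_relevant_variables(self, threshold: float = 0.5) -> np.ndarray:
        # Indices of features whose score exceeds the threshold.
        return np.flatnonzero(self._model > threshold)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)  # only feature 1 drives y
solver = CorrelationSolver()
solver.fit(X, y)
print(solver.get_relevant_variables(threshold=0.5))
```

Because fit and get_relevant_variables are abstract, the stand-in base class itself cannot be instantiated; only subclasses that implement both can.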
backbone_learn.heuristic_solvers.kmeans_solver module
- class backbone_learn.heuristic_solvers.kmeans_solver.KMeansSolver(n_clusters: int = 10, **kwargs)
Bases: HeuristicSolverBase
A heuristic solver that applies KMeans clustering to identify relevant instances.
- fit(X: ndarray, y: Optional[ndarray] = None, init: str = 'k-means++', n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, random_state: int = 0) -> None
Applies KMeans clustering to the data with customizable hyperparameters.
- Parameters
X (np.ndarray) – Input feature matrix.
y (np.ndarray, optional) – Target vector (not used in clustering).
init (str) – Method for initialization.
n_init (int) – Number of times the k-means algorithm will be run with different centroid seeds.
max_iter (int) – Maximum number of iterations of the k-means algorithm for a single run.
tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations, used to declare convergence.
random_state (int) – Determines random number generation for centroid initialization.
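A sketch of the clustering step, assuming KMeansSolver forwards these hyperparameters to sklearn.cluster.KMeans (the names and defaults match, but the forwarding itself is an assumption). How the solver turns clusters into "relevant instances" is not documented here; taking the sample nearest each centroid is one hypothetical choice, shown only for illustration. The data are synthetic, and n_clusters=3 replaces the default of 10 to match them.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs of 50 points each.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Same hyperparameter names and defaults as KMeansSolver.fit, passed straight to sklearn.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, max_iter=300,
            tol=1e-4, random_state=0)
labels = km.fit_predict(X)

# Hypothetical notion of "relevant instances": the sample closest to each centroid.
dists = km.transform(X)                       # shape (n_samples, n_clusters)
representatives = np.argmin(dists, axis=0)    # one sample index per cluster
print(km.cluster_centers_.round(1))
print(representatives)
```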
backbone_learn.heuristic_solvers.lasso_regression module
- class backbone_learn.heuristic_solvers.lasso_regression.LassoRegression(**kwargs)
Bases: HeuristicSolverBase
Implements Lasso regression for feature selection using cross-validation.
This class uses Lasso (Least Absolute Shrinkage and Selection Operator) regression, a linear regression with an L1 penalty on the coefficients. The penalty shrinks the coefficient estimates toward zero and can set some of them exactly to zero, so the lasso favors simple, sparse models (i.e., models with fewer nonzero parameters).
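For reference, scikit-learn's Lasso and LassoCV minimize the objective on the first line below, with the intercept fitted separately and left unpenalized. The second line is a standard worked case under a simplifying assumption, an orthonormal-style design with (1/n) X^T X = I: each coefficient is soft-thresholded, which shows directly why coefficients with |z_j| <= alpha become exactly zero.

```latex
\begin{aligned}
\hat{w} &= \arg\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert, \\
\hat{w}_j &= \operatorname{sign}(z_j)\,\max\bigl(\lvert z_j \rvert - \alpha,\; 0\bigr),
\qquad z = \tfrac{1}{n} X^\top y \quad \text{when } \tfrac{1}{n} X^\top X = I .
\end{aligned}
```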
- _model
The LassoCV regression model.
- Type
LassoCV
- _mse_score
The mean squared error score of the trained model.
- Type
float
- fit(X: ndarray, y: ndarray, alphas=None, max_iter=1000, tol=0.0001, selection='cyclic', cv_folds=5, random_state=0) -> None
Fits a sparse regression model to the data using LassoCV.
- Parameters
X (np.ndarray) – The input feature matrix.
y (np.ndarray) – The target variable.
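alphas (array-like, optional) – List of alphas at which to compute the models. If None, the alphas are set automatically.
max_iter (int) – The maximum number of iterations.
tol (float) – The tolerance for the optimization.
selection (str) – If set to ‘random’, a random coefficient is updated at every iteration instead of looping over the features sequentially (the default, ‘cyclic’).
cv_folds (int) – The number of folds to use for cross-validation.
random_state (int) – The seed used by the random number generator.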
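A sketch of the same fit using scikit-learn's LassoCV directly, on synthetic data with three informative features. Passing max_iter, tol, selection, cv_folds (as cv) and random_state through unchanged is an assumption about how LassoRegression uses LassoCV; alphas is left at LassoCV's own default. How mse_score is computed is not documented either, so the in-sample mean squared error below is only one plausible reading.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[[0, 5, 12]] = [3.0, -2.0, 1.5]  # only three informative features
y = X @ true_coef + 0.5 * rng.normal(size=200)

# Mirrors the documented fit() defaults; cv_folds=5 becomes cv=5.
model = LassoCV(max_iter=1000, tol=1e-4, selection="cyclic", cv=5, random_state=0)
model.fit(X, y)

# In-sample MSE: an assumed definition of mse_score, for illustration only.
mse = mean_squared_error(y, model.predict(X))
print(f"chosen alpha: {model.alpha_:.4f}, MSE: {mse:.3f}")
print(np.flatnonzero(model.coef_))  # indices of nonzero coefficients
```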
- get_relevant_variables(threshold: float) -> ndarray
Identifies features with coefficients greater than a specified threshold.
- Parameters
threshold (float) – The threshold for determining feature relevance.
- Returns
Indices of features whose coefficients are above the threshold.
- Return type
np.ndarray
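A minimal sketch of threshold-based selection on a coefficient vector. Whether get_relevant_variables compares raw coefficients or their absolute values is not stated; the hypothetical helper relevant_by_threshold assumes absolute values, so large negative coefficients also count as relevant.

```python
import numpy as np


def relevant_by_threshold(coef: np.ndarray, threshold: float) -> np.ndarray:
    """Indices whose |coef| exceeds threshold (hypothetical helper; abs() is an assumption)."""
    return np.flatnonzero(np.abs(coef) > threshold)


coef = np.array([0.0, 2.5, -0.8, 0.05, -3.1])
print(relevant_by_threshold(coef, threshold=0.1))  # [1 2 4]
```

If the real method compares signed coefficients instead, index 2 and index 4 would be dropped in this example.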
- keep_top_features(n_non_zeros: int) -> None
Retain only the top ‘n_non_zeros’ features in the Lasso model.
- Parameters
n_non_zeros (int) – Number of features to retain.
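keep_top_features does not say how features are ranked. The sketch below assumes ranking by coefficient magnitude and operates on a plain coefficient array rather than modifying a fitted model in place; keep_top_coefficients is a hypothetical name.

```python
import numpy as np


def keep_top_coefficients(coef: np.ndarray, n_non_zeros: int) -> np.ndarray:
    """Copy of coef with all but the n_non_zeros largest-|coef| entries zeroed (hypothetical)."""
    kept = coef.copy()
    if n_non_zeros < len(coef):
        # argsort is ascending by |coef|, so the first len - n_non_zeros indices are dropped.
        drop = np.argsort(np.abs(coef))[: len(coef) - n_non_zeros]
        kept[drop] = 0.0
    return kept


coef = np.array([0.0, 2.5, -0.8, 0.05, -3.1])
print(keep_top_coefficients(coef, n_non_zeros=2))
```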
- property mse_score: float
Returns the mean squared error score of the trained model.
- Returns
The mean squared error score.
- Return type
float