backbone_learn.heuristic_solvers package

Submodules

backbone_learn.heuristic_solvers.cart_decision_tree module

class backbone_learn.heuristic_solvers.cart_decision_tree.CARTDecisionTree(**kwargs)

Bases: HeuristicSolverBase

Implements a Classification and Regression Tree (CART) decision tree whose hyperparameters are tuned by cross-validation and scored with AUC. This solver is a heuristic approach for fitting a decision tree model and identifying significant features.

_model

An instance of the sklearn DecisionTreeClassifier.

Type

DecisionTreeClassifier

_auc_score

The maximum AUC score obtained during cross-validation.

Type

float

property auc_score: float

Returns the maximum AUC score obtained from cross-validation.

Returns

The maximum AUC score.

Return type

float

fit(X: ndarray, y: ndarray, cv_folds: int = 5, random_state: int = 0) → None

Fits a CART Decision Tree model to the data using hyperparameter tuning with cross-validation and evaluates it using AUC.

Parameters
  • X (np.ndarray) – The input features as a NumPy array.

  • y (np.ndarray) – The target labels as a NumPy array.

  • cv_folds (int) – The number of folds to use for cross-validation. Default is 5.

  • random_state (int) – The seed used by the random number generator. Default is 0.
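A minimal sketch of the tuning step described above, assuming the solver wraps scikit-learn's GridSearchCV with scoring="roc_auc". The helper name fit_cart_with_auc and the search grid (max_depth, min_samples_leaf) are hypothetical; the grid CARTDecisionTree actually searches is not documented here. In this sketch, best_score_ (the highest mean cross-validated AUC across the grid) is a plausible counterpart of auc_score.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def fit_cart_with_auc(X, y, cv_folds=5, random_state=0):
    """Tune a CART classifier by cross-validated AUC (illustrative sketch)."""
    # Hypothetical grid; the real solver's grid is not documented.
    param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5]}
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid,
        scoring="roc_auc",  # area under the ROC curve, binary targets
        cv=cv_folds,
    )
    search.fit(X, y)
    # best_estimator_ is refit on all data; best_score_ is its mean CV AUC.
    return search.best_estimator_, search.best_score_


# Toy binary problem driven by features 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
model, auc = fit_cart_with_auc(X, y)
print(f"best CV AUC: {auc:.3f}")
```

Because scoring="roc_auc" expects a binary target, a multiclass variant would need a different scorer such as "roc_auc_ovr".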

get_relevant_variables(threshold: float) → ndarray

Identifies features with importance greater than a specified threshold.

Parameters

threshold (float) – The threshold for determining feature relevance.

Returns

An array of indices of relevant features.

Return type

np.ndarray
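Assuming get_relevant_variables reads the fitted tree's feature_importances_ array (the source does not say so explicitly), the selection reduces to a strict comparison against threshold. The helper relevant_feature_indices below is hypothetical and works on a plain importance array.

```python
import numpy as np


def relevant_feature_indices(importances: np.ndarray, threshold: float) -> np.ndarray:
    """Return indices of features whose importance strictly exceeds `threshold`."""
    # np.flatnonzero returns the positions where the boolean mask is True.
    return np.flatnonzero(np.asarray(importances) > threshold)


importances = np.array([0.40, 0.00, 0.35, 0.05, 0.20])
print(relevant_feature_indices(importances, 0.1))  # [0 2 4]
```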

predict(X: ndarray) → ndarray

Predicts the target labels for the given data.

Parameters

X (np.ndarray) – The input features as a NumPy array.

Returns

The predicted target labels.

Return type

np.ndarray

backbone_learn.heuristic_solvers.heauristic_solver_base module

class backbone_learn.heuristic_solvers.heauristic_solver_base.HeuristicSolverBase

Bases: ABC

Abstract class for heuristic solvers.

This class provides a framework for defining heuristic solvers that can fit models to data and identify relevant features. Derived classes need to implement the fit and get_relevant_variables methods according to their specific heuristic approach.

abstract fit(X: ndarray, y: ndarray, random_state: int)

Fits a model to the given data using a heuristic approach.

This method should be implemented to solve a subproblem using the input data matrix X and the target vector y. It should fit a model based on a heuristic algorithm specific to the derived class.

Parameters
  • X (np.ndarray) – The input feature matrix.

  • y (np.ndarray) – The target vector.

  • random_state (int) – The seed used by the random number generator. The abstract signature has no default; the concrete solvers in this package default it to 0.

Returns

The method should fit the model to the data, with the results stored internally within the class instance.

Return type

None

abstract get_relevant_variables(**kwargs)

Identifies relevant variables with importance greater than a specified threshold.

This method should be implemented to determine the most relevant variables based on the model fitted by the fit method. It should return the indices of the variables that will be passed to the exact solver.

property model
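The contract described above can be illustrated with a local stand-in for HeuristicSolverBase and a hypothetical subclass, CorrelationScreen, that scores each feature by its absolute Pearson correlation with y. Neither class is the package's code: the stand-in only mirrors the documented members (fit, get_relevant_variables, model), and storing fitted state in _model is an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np


class HeuristicSolverBase(ABC):
    """Local stand-in mirroring the documented interface; not the package's source."""

    def __init__(self):
        self._model = None  # assumption: fitted state lives in _model

    @property
    def model(self):
        return self._model

    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray, random_state: int):
        """Fit a model to (X, y) and store the result on the instance."""

    @abstractmethod
    def get_relevant_variables(self, **kwargs):
        """Return indices of the variables to pass on to the exact solver."""


class CorrelationScreen(HeuristicSolverBase):
    """Hypothetical solver that scores each feature by |corr(X_j, y)|."""

    def fit(self, X: np.ndarray, y: np.ndarray, random_state: int = 0) -> None:
        # random_state is unused: this heuristic is deterministic.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        self._model = np.abs(Xc.T @ yc) / denom

    def get_relevant_variables(self, threshold: float = 0.5) -> np.ndarray:
        # Indices of features whose absolute correlation exceeds the threshold.
        return np.flatnonzero(self._model > threshold)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)
solver = CorrelationScreen()
solver.fit(X, y)
print(solver.get_relevant_variables(threshold=0.5))
```

Since both methods are declared abstract, instantiating the base class directly raises TypeError, which is what forces each heuristic to supply its own implementations.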

backbone_learn.heuristic_solvers.kmeans_solver module

class backbone_learn.heuristic_solvers.kmeans_solver.KMeansSolver(n_clusters: int = 10, **kwargs)

Bases: HeuristicSolverBase

A heuristic solver that applies KMeans clustering to identify relevant instances.

fit(X: ndarray, y: Optional[ndarray] = None, init: str = 'k-means++', n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, random_state: int = 0) → None

Applies KMeans clustering to the data with customizable hyperparameters.

Parameters
  • X (np.ndarray) – Input feature matrix.

  • y (np.ndarray, optional) – Target vector (not used in clustering).

  • init (str) – Method for initialization.

  • n_init (int) – Number of times the k-means algorithm will be run with different centroid seeds.

  • max_iter (int) – Maximum number of iterations of the k-means algorithm for a single run.

  • tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations, used to declare convergence.

  • random_state (int) – Determines random number generation for centroid initialization.
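The keyword arguments of fit() share their names with scikit-learn's KMeans constructor, so one plausible reading is that KMeansSolver forwards them there. The sketch below assumes that and runs KMeans on toy data with the documented defaults, except that n_clusters is set to 2 (instead of the class default of 10) to match two synthetic blobs.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs of 50 points each.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

# Same init/n_init/max_iter/tol/random_state values as the documented defaults.
kmeans = KMeans(
    n_clusters=2,
    init="k-means++",
    n_init=10,
    max_iter=300,
    tol=1e-4,
    random_state=0,
)
labels = kmeans.fit_predict(X)
print(np.bincount(labels))  # cluster sizes
```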

get_relevant_variables() → List[Tuple[int, int]]

Identifies pairs of instance indices that are not in the same cluster.

Returns

Each tuple contains the indices of two instances that are not in the same cluster.

Return type

List[Tuple[int, int]]
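A minimal sketch of this computation, assuming it works from the per-instance cluster labels of the fitted KMeans model. The helper pairs_in_different_clusters is hypothetical, and ordering each pair as (i, j) with i < j is also an assumption.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pairs_in_different_clusters(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Return every (i, j) with i < j whose cluster labels differ."""
    return [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j]
    ]


print(pairs_in_different_clusters([0, 0, 1, 2]))
# [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The number of candidate pairs grows quadratically with the number of instances, so on large datasets this list can become very long.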

backbone_learn.heuristic_solvers.lasso_regression module

class backbone_learn.heuristic_solvers.lasso_regression.LassoRegression(**kwargs)

Bases: HeuristicSolverBase

Implements Lasso regression for feature selection using cross-validation.

This class uses Lasso (Least Absolute Shrinkage and Selection Operator) regression, a form of linear regression with shrinkage. Here, shrinkage means that an L1 penalty pulls the coefficient estimates toward zero; when the penalty is strong enough, some coefficients become exactly zero. The lasso therefore encourages simple, sparse models (i.e., models with fewer nonzero parameters).
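For reference, scikit-learn's Lasso and LassoCV minimize the objective below over the coefficient vector w, where n is the number of samples and α ≥ 0 is the penalty strength (the intercept is omitted for brevity). LassoCV selects α by cross-validation; the ℓ1 term is what sets some coefficients exactly to zero.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
```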

_model

The LassoCV regression model.

Type

LassoCV

_mse_score

The mean squared error score of the trained model.

Type

float

fit(X: ndarray, y: ndarray, alphas=None, max_iter=1000, tol=0.0001, selection='cyclic', cv_folds=5, random_state=0) → None

Fits a sparse regression model to the data using LassoCV.

Parameters
  • X (np.ndarray) – The input feature matrix.

  • y (np.ndarray) – The target variable.

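  • alphas (array-like, optional) – List of alphas for which to compute the models. If None, the alphas are set automatically.

  • max_iter (int) – The maximum number of iterations.

  • tol (float) – The tolerance for the optimization.

  • selection (str) – If set to ‘random’, a random coefficient is updated at every iteration instead of looping over the features sequentially (the default, ‘cyclic’).

  • cv_folds (int) – The number of folds to use for cross-validation. Default is 5.

  • random_state (int) – The seed used by the random number generator. Default is 0.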
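A sketch of what fit() plausibly does, assuming the documented arguments are forwarded to scikit-learn's LassoCV with cv_folds passed as cv. How _mse_score is computed is not documented; the in-sample mean squared error below is an assumption. alphas is left out so that LassoCV builds its own grid of penalties, matching the documented alphas=None behaviour.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

# Toy data: only features 0, 3 and 7 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 1.5, 0.0, 0.0])
y = X @ true_coef + 0.1 * rng.normal(size=200)

# Mirrors the documented fit() defaults; alphas is omitted so LassoCV
# generates its own grid of candidate penalties.
model = LassoCV(max_iter=1000, tol=1e-4, selection="cyclic", cv=5, random_state=0)
model.fit(X, y)

# Assumption: the stored MSE is the in-sample error of the refitted model.
mse = mean_squared_error(y, model.predict(X))
print("chosen alpha:", model.alpha_)
print("coefficients:", np.round(model.coef_, 2))
print("in-sample MSE:", mse)
```

In scikit-learn, random_state only affects LassoCV when selection="random"; with the default "cyclic" the coordinate updates are deterministic.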

get_relevant_variables(threshold: float) → ndarray

Identifies features with coefficients greater than a specified threshold.

Parameters

threshold (float) – The threshold for determining feature relevance.

Returns

Indices of features whose coefficients are above the threshold.

Return type

np.ndarray
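The documentation does not say whether the comparison uses signed or absolute coefficients. Because Lasso coefficients can be negative, the sketch below assumes absolute values and contrasts that with a signed comparison; the helper relevant_coef_indices is hypothetical.

```python
import numpy as np


def relevant_coef_indices(coef: np.ndarray, threshold: float) -> np.ndarray:
    """Indices j with |coef[j]| > threshold (absolute value is an assumption)."""
    return np.flatnonzero(np.abs(np.asarray(coef)) > threshold)


coef = np.array([2.9, 0.0, -1.8, 0.02, 1.4])
print(relevant_coef_indices(coef, 0.1))  # [0 2 4]
# A signed comparison would silently drop the strong negative effect at index 2.
print(np.flatnonzero(coef > 0.1))  # [0 4]
```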

keep_top_features(n_non_zeros: int) → None

Retain only the top ‘n_non_zeros’ features in the Lasso model.

Parameters

n_non_zeros (int) – Number of features to retain.
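A sketch of one way to retain the top n_non_zeros features, assuming "top" means largest absolute coefficient and that all other coefficients are set to zero. The function keep_top_coefficients is hypothetical and operates on a plain coefficient array rather than on the fitted LassoCV model; ties in magnitude are broken arbitrarily by np.argsort.

```python
import numpy as np


def keep_top_coefficients(coef: np.ndarray, n_non_zeros: int) -> np.ndarray:
    """Return a copy of coef with all but the n_non_zeros largest |coef| set to zero."""
    coef = np.asarray(coef, dtype=float)
    kept = np.zeros_like(coef)
    if n_non_zeros <= 0:
        return kept
    # Sorting by -|coef| puts the largest magnitudes first.
    top = np.argsort(-np.abs(coef))[:n_non_zeros]
    kept[top] = coef[top]
    return kept


print(keep_top_coefficients(np.array([0.5, -3.0, 0.0, 1.2]), 2).tolist())
# [0.0, -3.0, 0.0, 1.2]
```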

property mse_score: float

Returns the mean squared error score of the trained model.

Returns

The mean squared error score.

Return type

float

predict(X: ndarray) → ndarray

Predicts the target values for the given data using the trained Lasso model.

Parameters

X (np.ndarray) – The input feature matrix.

Returns

The predicted target values.

Return type

np.ndarray

Module contents