backbone_learn.heuristic_solvers package

Submodules

backbone_learn.heuristic_solvers.cart_decision_tree module

class backbone_learn.heuristic_solvers.cart_decision_tree.CARTDecisionTree(**kwargs)[source]

Bases: HeuristicSolverBase

Implements a Classification And Regression Tree (CART) Decision Tree with cross-validation using AUC. This solver is a heuristic approach for fitting a decision tree model and identifying significant features.

_model

An instance of the sklearn DecisionTreeClassifier.

Type: DecisionTreeClassifier

_auc_score

The maximum AUC score obtained during cross-validation.

Type: float

property auc_score: float

Returns the maximum AUC score obtained from cross-validation.

Returns: The maximum AUC score.
Return type: float

fit(X: ndarray, y: ndarray, cv_folds: int = 5, random_state: int = 0) → None[source]

Fits a CART Decision Tree model to the data using hyperparameter tuning with cross-validation and evaluates it using AUC.

Parameters

X (np.ndarray) – The input features as a NumPy array.
y (np.ndarray) – The target labels as a NumPy array.
cv_folds (int) – The number of folds to use for cross-validation.
random_state (int) – The seed used by the random number generator.

get_relevant_variables(threshold: float) → ndarray[source]

Identifies features with importance greater than a specified threshold.

Parameters: threshold (float) – The threshold for determining feature relevance.
Returns: An array of indices of relevant features.
Return type: np.ndarray

predict(X: ndarray) → ndarray[source]

Predicts the target labels for the given data.

Parameters: X (np.ndarray) – The input features as a NumPy array.
Returns: The predicted target labels.
Return type: np.ndarray
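The workflow described above (cross-validated tuning scored by AUC, followed by thresholding feature importances) can be sketched directly with scikit-learn. This is an illustrative sketch, not the CARTDecisionTree implementation: the helper names fit_cart_with_auc and relevant_variables, the hyperparameter grid, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def fit_cart_with_auc(X, y, cv_folds=5, random_state=0):
    """Tune a decision tree by cross-validated AUC; return (model, best AUC)."""
    # Assumed grid for illustration only; the library's grid may differ.
    param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5]}
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid,
        scoring="roc_auc",  # mean AUC across folds picks the best candidate
        cv=cv_folds,
    )
    search.fit(X, y)
    return search.best_estimator_, float(search.best_score_)


def relevant_variables(model, threshold):
    """Indices of features whose importance exceeds the threshold."""
    return np.where(model.feature_importances_ > threshold)[0]


# Synthetic binary classification data (assumption for the example).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)
model, auc = fit_cart_with_auc(X, y)
selected = relevant_variables(model, threshold=0.05)
```

Here auc plays the role of the documented auc_score property and selected corresponds to the output of get_relevant_variables(threshold).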

backbone_learn.heuristic_solvers.heauristic_solver_base module

class backbone_learn.heuristic_solvers.heauristic_solver_base.HeuristicSolverBase[source]

Bases: ABC

Abstract class for heuristic solvers.

This class provides a framework for defining heuristic solvers that can fit models to data and identify relevant features. Derived classes need to implement the fit and get_relevant_variables methods according to their specific heuristic approach.

abstract fit(X: ndarray, y: ndarray, random_state: int)[source]

Fits a model to the given data using a heuristic approach.

This method should be implemented to solve a subproblem using the input data matrix X and the target vector y. It should fit a model based on a heuristic algorithm specific to the derived class.

Parameters

X (np.ndarray) – The input feature matrix.
y (np.ndarray) – The target vector.
random_state (int) – The seed used by the random number generator. The abstract signature declares no default; the concrete solvers in this package default it to 0.

Returns

The method should fit the model to the data, with the results stored internally within the class instance.

Return type

None

abstract get_relevant_variables(**kwargs)[source]

Identifies relevant variables with importance greater than a specified threshold.

This method should be implemented to determine the most relevant variables based on the model fitted by the fit method. It should return the indices of the variables that will be passed to the exact solver.

property model
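The contract above can be illustrated with a small, self-contained sketch. SolverInterface below is a hypothetical stand-in that mirrors the documented abstract methods so the example runs without the package installed; CorrelationSolver and its correlation-based heuristic are likewise assumptions and not part of backbone_learn.

```python
from abc import ABC, abstractmethod

import numpy as np


class SolverInterface(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    def __init__(self):
        self._model = None

    @property
    def model(self):
        return self._model

    @abstractmethod
    def fit(self, X, y, random_state=0):
        ...

    @abstractmethod
    def get_relevant_variables(self, **kwargs):
        ...


class CorrelationSolver(SolverInterface):
    """Toy heuristic: score each feature by |correlation| with the target."""

    def fit(self, X, y, random_state=0):
        # Deterministic heuristic, so random_state is accepted but unused.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
        # Results are stored on the instance; nothing is returned.
        self._model = np.abs(Xc.T @ yc) / denom

    def get_relevant_variables(self, threshold=0.5):
        # Indices of features whose score exceeds the threshold.
        return np.where(self._model > threshold)[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=100)  # only feature 2 matters

solver = CorrelationSolver()
solver.fit(X, y)
indices = solver.get_relevant_variables(threshold=0.5)
```

A real subclass would inherit from HeuristicSolverBase instead of the stand-in; instantiating the abstract class directly raises TypeError until both methods are implemented.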

backbone_learn.heuristic_solvers.kmeans_solver module

class backbone_learn.heuristic_solvers.kmeans_solver.KMeansSolver(n_clusters: int = 10, **kwargs)[source]

Bases: HeuristicSolverBase

A heuristic solver that applies KMeans clustering to identify relevant instances.

fit(X: ndarray, y: Optional[ndarray] = None, init: str = 'k-means++', n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, random_state: int = 0) → None[source]

Applies KMeans clustering to the data with customizable hyperparameters.

Parameters

X (np.ndarray) – Input feature matrix.
y (np.ndarray, optional) – Target vector (not used in clustering).
init (str) – Method for initialization.
n_init (int) – Number of times the k-means algorithm will be run with different centroid seeds.
max_iter (int) – Maximum number of iterations of the k-means algorithm for a single run.
tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers.
random_state (int) – Determines random number generation for centroid initialization.

get_relevant_variables() → List[Tuple[int, int]][source]

Identifies tuples of instance indices that are not in the same cluster.

Returns: A list of tuples, each containing the indices of two instances that are not in the same cluster.
Return type: List[Tuple[int, int]]
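As a rough illustration of the behaviour described for fit and get_relevant_variables, the sketch below clusters a tiny data set with scikit-learn's KMeans and lists every pair of instances assigned to different clusters. The function name pairs_in_different_clusters, the pair ordering, and the toy data are assumptions; the actual KMeansSolver may build its result differently.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans


def pairs_in_different_clusters(X, n_clusters=10, random_state=0):
    """Return (i, j) index pairs, i < j, whose instances fall in different clusters."""
    km = KMeans(
        n_clusters=n_clusters,
        init="k-means++",
        n_init=10,
        max_iter=300,
        tol=1e-4,
        random_state=random_state,
    )
    labels = km.fit_predict(X)
    return [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j]
    ]


# Two well-separated groups: rows 0-1 near the origin, rows 2-3 near (5, 5).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
pairs = pairs_in_different_clusters(X, n_clusters=2)
# pairs == [(0, 2), (0, 3), (1, 2), (1, 3)]
```

Enumerating all pairs is quadratic in the number of instances, which is acceptable for a sketch but worth keeping in mind on large data sets.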

backbone_learn.heuristic_solvers.lasso_regression module

class backbone_learn.heuristic_solvers.lasso_regression.LassoRegression(**kwargs)[source]

Bases: HeuristicSolverBase

Implements Lasso regression for feature selection using cross-validation.

This class uses Lasso (Least Absolute Shrinkage and Selection Operator) regression, a type of linear regression that adds an L1 penalty on the coefficients. The penalty shrinks coefficients towards zero and sets some of them exactly to zero, so the lasso procedure encourages simple, sparse models (i.e. models with fewer non-zero parameters).

_model

The LassoCV regression model.

Type: LassoCV

_mse_score

The mean squared error score of the trained model.

Type: float

fit(X: ndarray, y: ndarray, alphas=None, max_iter=1000, tol=0.0001, selection='cyclic', cv_folds=5, random_state=0) → None[source]

Fits a sparse regression model to the data using LassoCV.

Parameters

X (np.ndarray) – The input feature matrix.
y (np.ndarray) – The target variable.
alphas (array-like, optional) – List of alphas where to compute the models. If None alphas are set automatically.
max_iter (int) – The maximum number of iterations.
tol (float) – The tolerance for the optimization.
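selection (str) – If set to ‘random’, a random coefficient is updated every iteration instead of looping over the features sequentially (‘cyclic’).
cv_folds (int) – The number of folds to use for cross-validation.
random_state (int) – The seed used by the random number generator.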

get_relevant_variables(threshold: float) → ndarray[source]

Identifies features with coefficients greater than a specified threshold.

Parameters: threshold (float) – The threshold for determining feature relevance.
Returns: Indices of features whose coefficients are above the threshold.
Return type: np.ndarray

keep_top_features(n_non_zeros: int) → None[source]

Retain only the top ‘n_non_zeros’ features in the Lasso model.

Parameters: n_non_zeros (int) – Number of features to retain.
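One plausible reading of "top" features is the coefficients with the largest absolute values. The sketch below zeroes out all others on a plain NumPy array. Both the ranking criterion and the helper name keep_top_coefficients are assumptions; the documented method updates the model in place and returns None, whereas this sketch returns a new array.

```python
import numpy as np


def keep_top_coefficients(coef, n_non_zeros):
    """Return a copy of coef with all but the n_non_zeros largest |values| zeroed."""
    coef = np.asarray(coef, dtype=float)
    if n_non_zeros >= coef.size:
        return coef.copy()
    kept = np.zeros_like(coef)
    if n_non_zeros > 0:
        # argsort is ascending, so the last entries index the largest magnitudes.
        top = np.argsort(np.abs(coef))[-n_non_zeros:]
        kept[top] = coef[top]
    return kept


pruned = keep_top_coefficients([0.5, -2.0, 0.0, 1.2, -0.1], n_non_zeros=2)
# pruned keeps only -2.0 (index 1) and 1.2 (index 3); every other entry is 0.0
```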

property mse_score: float

Returns the mean squared error score of the trained model.

Returns: The mean squared error score.
Return type: float

predict(X: ndarray) → ndarray[source]

Predicts the target values for the given data using the trained Lasso model.

Parameters: X (np.ndarray) – The input feature matrix.
Returns: The predicted target values.
Return type: np.ndarray
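The fit, get_relevant_variables, and mse_score members above map naturally onto scikit-learn's LassoCV. The following sketch shows that mapping under stated assumptions: the synthetic data, the threshold value, the comparison on absolute coefficient values, and the use of training-set error for the MSE are illustrative choices rather than confirmed details of LassoRegression.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

# Synthetic regression problem with 4 informative features out of 20.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=4, noise=1.0, random_state=0
)

# Keyword values mirror the documented defaults of fit();
# the alpha grid is left to LassoCV's automatic selection.
lasso = LassoCV(max_iter=1000, tol=1e-4, selection="cyclic", cv=5, random_state=0)
lasso.fit(X, y)

# Assumption: MSE is measured here on the training data.
mse = mean_squared_error(y, lasso.predict(X))

# Assumption: relevance is judged on the absolute coefficient value.
threshold = 1.0
relevant = np.where(np.abs(lasso.coef_) > threshold)[0]
```

With an L1 penalty most uninformative coefficients are driven to or near zero, so relevant is typically a small subset of the 20 column indices.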

Module contents