soft package

Subpackages

Submodules

soft.datasets module

Implements classes or functions related to datasets.

class soft.datasets.RegressionDatasetConfig(name: str, frame: DataFrame, feature_names: List[str], target_names: List[str])

Bases: object

A configuration for a regression dataset.

feature_names: List[str]
frame: DataFrame
name: str
target_names: List[str]
class soft.datasets.RegressionDatasets

Bases: object

The regression datasets; some are from OpenML and others are from RKEEL.

load_datasets() None

Load the datasets.

Returns:

None

static load_openml_dataset(dataset_name: str, dataset_id: int) RegressionDatasetConfig

Load a dataset from OpenML.

Parameters:
  • dataset_name – The name of the dataset to load.

  • dataset_id – The ID of the dataset to load.

Returns:

The dataset configuration.

load_rkeel_dataset(dataset_name) RegressionDatasetConfig

Load a dataset from RKEEL.

Parameters:

dataset_name – The name of the dataset to load.

Returns:

The dataset configuration.

class soft.datasets.RegressionImageFolder(root: str, image_scores: Dict[str, float], **kwargs: Any)

Bases: ImageFolder

A custom ImageFolder class that stores the targets as a tensor.

class soft.datasets.SupervisedDataset(inputs: Tensor, targets: None | Tensor)

Bases: Dataset

A custom Dataset class that relates the inputs with their targets.

soft.datasets.convert_dat_to_csv(dat_file_name: Path) RegressionDatasetConfig

Convert a .dat file to a .csv file.
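KEEL .dat files list @attribute declarations followed by an @data section of comma-separated rows. A minimal, stdlib-only sketch of such a conversion (a hypothetical helper, not the package's implementation, which also builds a RegressionDatasetConfig):

```python
import csv
import io

def dat_to_csv(dat_text: str) -> str:
    """Convert KEEL-style .dat content to CSV text.

    Assumes the simple KEEL layout: '@attribute <name> ...' header lines
    followed by '@data' and comma-separated rows.
    """
    names, rows = [], []
    in_data = False
    for line in dat_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower().startswith("@attribute"):
            # The second token is the attribute name.
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
        elif in_data:
            rows.append([cell.strip() for cell in line.split(",")])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()
```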

soft.datasets.convert_data_frame_to_supervised_dataset(data_frame: DataFrame, input_features: List[str], target_features: List[str]) SupervisedDataset

Make the data into a supervised dataset.

Parameters:
  • data_frame – The dataframe to convert to a torch tensor.

  • input_features – The input features.

  • target_features – The target features.

Returns:

A supervised dataset.
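Conceptually, the conversion selects the input and target columns and stacks them row-wise. A rough sketch with a plain dict of columns standing in for the DataFrame and nested lists standing in for torch tensors (the real function returns a SupervisedDataset of tensors):

```python
def to_supervised(frame: dict, input_features: list, target_features: list):
    """Split a column-dict into (inputs, targets) as row-major nested lists.

    Stand-in for the pandas/torch version: each returned value is a list
    of rows, one row per sample, analogous to a 2-D tensor.
    """
    n_rows = len(next(iter(frame.values())))
    inputs = [[frame[f][i] for f in input_features] for i in range(n_rows)]
    targets = [[frame[f][i] for f in target_features] for i in range(n_rows)]
    return inputs, targets
```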

soft.datasets.load_from_openml_and_split(data_id: int, random_state: int) Tuple[SupervisedDataset, SupervisedDataset, SupervisedDataset]

Load the dataset from sklearn.datasets.fetch_openml and split it into train, val, and test sets.

Parameters:
  • data_id – The ID of the dataset to load from OpenML.

  • random_state – The random state to use for reproducibility.

Returns:

The train, validation, and test datasets.
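The split itself can be sketched with a seeded shuffle, so the same random_state always yields the same partition. The split fractions below are illustrative assumptions; the ratios actually used by this function are not documented here:

```python
import random

def three_way_split(indices, random_state, val_frac=0.15, test_frac=0.15):
    """Deterministically split indices into train/val/test index lists."""
    shuffled = list(indices)
    # Seeding makes the shuffle reproducible for a given random_state.
    random.Random(random_state).shuffle(shuffled)
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```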

soft.explain module

This script implements the class that is responsible for explaining the model. It can be used to explain the model in a variety of ways and serves as a convenient interface for the user to interact with the model.

class soft.explain.SelectOutputWrapper(model, output_index, *args, **kwargs)

Bases: Module

The SelectOutputWrapper class is a wrapper that selects a specific output from a model. This is useful when the model has multiple outputs and only one of them is needed for the explanation. The class is initialized with a model and an output_index: the model is the PyTorch model being explained, and output_index is the index of the output to select.

forward(input_data: Tensor) Tensor

Forward pass of the wrapped model. The input_data is a tensor of shape (batch_size, num_features); the output at output_index is returned as a tensor of shape (batch_size, 1).

Parameters:

input_data – A tensor of shape (batch_size, num_features).

Returns:

A tensor of shape (batch_size, 1).
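Without pulling in PyTorch, the selection step amounts to slicing one column out of the model's (batch_size, num_outputs) output while keeping the batch dimension. A list-based sketch of the same operation:

```python
def select_output(batch_outputs, output_index):
    """Select one output per sample, preserving a (batch_size, 1) shape.

    batch_outputs is a list of rows, one per sample, mimicking a
    (batch_size, num_outputs) tensor.
    """
    return [[row[output_index]] for row in batch_outputs]
```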

class soft.explain.SoftExplainer(model: Module, train_data: Tensor | CustomDataset, val_data: Tensor | CustomDataset, config: Config)

Bases: object

The SoftExplainer class is responsible for explaining the model. It can be used to explain the model in a variety of ways and serves as a convenient interface for the user to interact with the model. The class is initialized with a model, the training data that was used to train it, the validation data that was used to validate it, and the configuration that was used to train it. The class provides a variety of explanation methods, described below.

explain_with_captum(input_data: Tensor, visualize: bool = False)

Explain the model’s prediction for a given input with Captum, a library that provides a variety of methods for explaining a model’s output. The input is a tensor of shape (batch_size, num_features); the attributions returned have the same shape.

Parameters:
  • input_data – The input to the model.

  • visualize – Whether to visualize the attributions.

Returns:

The attributions.
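Captum implements many attribution algorithms (Integrated Gradients, DeepLIFT, and so on). The core idea behind the returned (batch_size, num_features) attributions can be conveyed with a simple finite-difference saliency sketch; this is a stand-in illustration, not what Captum or this method actually computes:

```python
def finite_difference_attributions(model, inputs, eps=1e-4):
    """Approximate per-feature attributions by perturbing each feature.

    model: callable mapping a list of features to a scalar output.
    inputs: list of samples, each a list of features.
    Returns one attribution row per sample (same shape as inputs).
    """
    attributions = []
    for sample in inputs:
        base = model(sample)
        row = []
        for i in range(len(sample)):
            perturbed = list(sample)
            perturbed[i] += eps
            # Sensitivity of the output to feature i, scaled by the input
            # value (gradient * input, as in saliency-style attributions).
            row.append((model(perturbed) - base) / eps * sample[i])
        attributions.append(row)
    return attributions
```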

explain_with_counterfactuals(input_data: Tensor, possible_alternative_action: str | bool = 'opposite', number_of_counterfactuals: int = 10, export_to_csv: bool = False) DataFrame

Explain the model’s prediction for a given input with counterfactuals. The input is a tensor of shape (batch_size, num_features), and the counterfactuals have the same shape. A counterfactual explains the model’s output by finding an input, close to the original, for which the model produces a different output; the features that differ between the original input and the counterfactual indicate what would have to change for the model’s decision to change.

Parameters:
  • input_data – The input to the model.

  • possible_alternative_action – The possible alternative action. The default is "opposite", which means that the possible alternative action is the opposite of the current action (i.e. if the current action is 0, the possible alternative action is 1). The other option is "random", which means that the possible alternative action is a random action.

  • number_of_counterfactuals – The number of counterfactuals to generate. The default is 10.

  • export_to_csv – Whether to export the counterfactuals to a csv file. The default is False.

Returns:

The counterfactuals as a pandas DataFrame.
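For a binary decision, a brute-force search over small perturbations illustrates the idea: enumerate inputs near the original whose predicted action differs, nearest-first. This is a hypothetical sketch, not the search algorithm this method uses:

```python
import itertools

def find_counterfactuals(model, sample, step=0.5, max_steps=4, limit=10):
    """Enumerate per-feature perturbations whose predicted action differs
    from the original prediction, returning the closest ones first.

    model: callable returning 0 or 1 for a list of features.
    Returns up to `limit` counterfactual inputs.
    """
    original_action = model(sample)
    deltas = [k * step for k in range(-max_steps, max_steps + 1)]
    candidates = []
    for offsets in itertools.product(deltas, repeat=len(sample)):
        candidate = [x + d for x, d in zip(sample, offsets)]
        if model(candidate) != original_action:
            # L1 distance from the original input measures how small
            # the required change is.
            distance = sum(abs(d) for d in offsets)
            candidates.append((distance, candidate))
    candidates.sort(key=lambda pair: pair[0])
    return [c for _, c in candidates[:limit]]
```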

static plot_attributions(results: Dict[str, Tensor], input_data: Tensor) None

Plot the attributions for a given input. The input is a tensor of shape (batch_size, num_features), and the attributions are a tensor of the same shape. An attribution measures how much the model’s output changes when the corresponding input feature changes; a variety of methods are used to calculate the attributions.

Parameters:
  • results – The attributions for a given input.

  • input_data – The input to the model.

Returns:

None

static sum_and_normalize(results: Dict[str, Tensor], metric: str) ndarray

Sum and normalize the attributions for a given input.

Parameters:
  • results

  • metric

Returns:

The summed and normalized attributions.
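A plausible reading of this helper, given its signature: sum the per-method attribution tensors and rescale the result by the chosen norm. A hedged, list-based sketch; the actual metric options accepted by the real function are not documented here, so 'l1' and 'max' below are guesses:

```python
def sum_and_normalize(results: dict, metric: str = "l1"):
    """Sum attribution rows across methods, then normalize.

    results maps method names to equal-length attribution vectors;
    metric selects the norm used for rescaling ('l1' or 'max' here,
    an assumption about the real options).
    """
    vectors = list(results.values())
    # Element-wise sum across all attribution methods.
    summed = [sum(column) for column in zip(*vectors)]
    if metric == "l1":
        scale = sum(abs(v) for v in summed)
    elif metric == "max":
        scale = max(abs(v) for v in summed)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [v / scale for v in summed] if scale else summed
```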
Module contents