Useful interpretation functions for tabular models, such as Feature Importance

base_error[source]

base_error(err, val)
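The library defines the exact formula, but a `perm_func` of this shape typically reports how much worse the permuted error `err` is relative to a baseline `val`. A minimal sketch (this relative-difference form is an assumption, not the library's verbatim implementation):

```python
def base_error(err, val):
    # Hypothetical sketch: the relative increase of the permuted error `err`
    # over the baseline error `val`, normalized by the permuted error.
    return (err - val) / err

base_error(0.5, 0.25)  # 0.5 -> permuting this feature doubled the error
```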

TabularLearner.feature_importance[source]

TabularLearner.feature_importance(x:TabularLearner, df=None, dl=None, perm_func='base_error', metric='accuracy', bs=None, reverse=True, plot=True)

Calculate and plot the Feature Importance based on df

We can pass in either a DataFrame (or a section of one) via df, or a DataLoader via dl. perm_func dictates how to calculate our importance, and reverse determines how the output is sorted
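The permutation scheme behind this can be sketched in a few lines: score the model on untouched data, then shuffle one column at a time and measure how much the score drops. This toy version uses a stand-in scoring function rather than a fastai learner, so the names here (`permutation_importance`, `score_fn`) are illustrative, not the library's API:

```python
import numpy as np
import pandas as pd

def permutation_importance(df, target, score_fn, seed=42):
    # Shuffle each column independently and record the drop in score
    # versus the baseline; bigger drop -> more important feature.
    rng = np.random.default_rng(seed)
    base = score_fn(df, target)  # baseline score on untouched data
    importances = {}
    for col in df.columns:
        shuffled = df.copy()
        shuffled[col] = rng.permutation(shuffled[col].values)
        importances[col] = base - score_fn(shuffled, target)
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))

# Toy data: `target` is fully determined by `signal`; `noise` is irrelevant.
df = pd.DataFrame({'signal': [0, 1, 0, 1, 0, 1, 0, 1] * 5,
                   'noise':  [0, 0, 1, 1, 0, 1, 1, 0] * 5})
target = df['signal'].values
acc = lambda d, y: float((d['signal'].values == y).mean())  # "model" reads signal only
imp = permutation_importance(df, target, acc)
```

Shuffling `signal` hurts accuracy while shuffling `noise` changes nothing, so `signal` comes out on top.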

TabularLearner.get_top_corr_dict[source]

TabularLearner.get_top_corr_dict(x:TabularLearner, df:DataFrame, thresh:float=0.8)

Grabs the top pairs of correlated features from a correlation matrix computed on df, filtered by thresh

This, along with plot_dendrogram and any helper functions along the way, is based upon this post by Pack911 on the fastai forums.

TabularLearner.plot_dendrogram[source]

TabularLearner.plot_dendrogram(x:TabularLearner, df:DataFrame, figsize=None, leaf_font_size=16)

Plots dendrogram for a calculated correlation matrix
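The usual recipe behind a correlation dendrogram is to turn the correlation matrix into a distance matrix (highly correlated features become close together), run hierarchical clustering on it, and draw the tree with scipy. A self-contained sketch under those assumptions (function name and linkage method are illustrative choices, not necessarily what the library uses):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

def plot_corr_dendrogram(df, figsize=(10, 6), leaf_font_size=16):
    # Correlated features -> small distance, so 1 - |corr| works as a metric.
    corr = df.corr().values
    dist = 1 - np.abs(corr)
    np.fill_diagonal(dist, 0)  # squareform expects an exactly-zero diagonal
    link = hierarchy.linkage(squareform(dist, checks=False), method='average')
    fig, ax = plt.subplots(figsize=figsize)
    hierarchy.dendrogram(link, labels=df.columns.tolist(),
                         leaf_font_size=leaf_font_size, ax=ax)
    return link
```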

Example Usage

We'll run an example on the ADULT_SAMPLE dataset

from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
splits = RandomSplitter()(range_of(df))
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'salary'
to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, splits=splits)
dls = to.dataloaders()
learn = tabular_learner(dls, layers=[200,100], metrics=accuracy)
learn.fit(3)
epoch train_loss valid_loss accuracy time
0 0.360690 0.360770 0.832463 00:40
1 0.358146 0.355560 0.834152 00:38
2 0.346212 0.353760 0.834152 00:39

After fitting, let's first calculate the relative feature importance over the DataFrame:

fi = learn.feature_importance(df=df)
Calculating Permutation Importance
100.00% [9/9 00:22<00:00]

Next we'll calculate the correlation matrix, and then we will plot its dendrogram:

corr_dict = learn.get_top_corr_dict(df, thresh=0.3); corr_dict
100.00% [45/45 00:33<00:00]
OrderedDict([('workclass vs sex', 0.991),
             ('marital-status vs race', 0.506),
             ('education vs occupation', 0.493),
             ('fnlwgt vs education-num', 0.488),
             ('age vs education', 0.397),
             ('relationship vs race', 0.363),
             ('education-num vs race', 0.305)])
learn.plot_dendrogram(df)
100.00% [45/45 00:32<00:00]

This allows us to see which families of features are closely related based on our thresh, and also to show (in combination with the feature importance) how our model uses each variable.
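Combining the two outputs is a common next step: for each highly correlated pair, the less important member is a candidate to drop. A hypothetical helper sketching that idea (it assumes the importances are available as a {feature: score} mapping, which may differ from what feature_importance actually returns):

```python
def redundant_drop_candidates(corr_dict, importance, thresh=0.5):
    # For each pair above `thresh`, suggest dropping the member with the
    # lower importance score; pairs use the 'a vs b' key format shown above.
    drops = set()
    for pair, corr in corr_dict.items():
        if corr < thresh:
            continue
        a, b = pair.split(' vs ')
        if a in importance and b in importance:
            drops.add(a if importance[a] < importance[b] else b)
    return sorted(drops)

corr = {'workclass vs sex': 0.991, 'age vs education': 0.397}
imp = {'workclass': 0.02, 'sex': 0.05, 'age': 0.10, 'education': 0.04}
redundant_drop_candidates(corr, imp)  # ['workclass']
```

Retraining after each drop (rather than dropping several features at once) is the safer way to confirm the feature really was redundant.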