keras-io/neural-decision-forest

Classification with Neural Decision Forests

This is an example notebook for the Keras sprint prepared by Hugging Face. The Keras sprint aims to reproduce Keras examples and build interactive demos for them. The markdown parts beginning with 🤗 and the code snippets that follow them were added by the Hugging Face team to show how to host your model and build a demo.

Original Author of the Neural Decision Forests Example: Khalid Salama

Introduction

This example provides an implementation of the Deep Neural Decision Forest model introduced by P. Kontschieder et al. for structured data classification. It demonstrates how to build a stochastic and differentiable decision tree model, train it end-to-end, and unify decision trees with deep representation learning.

Numerical Features	Categorical Features
age	workclass
education-num	education
capital-gain	marital-status
capital-loss	occupation
hours-per-week	relationship
	race
	gender
	native-country

Dropped Feature: fnlwgt
Labelled Feature: income_bracket

The dataset comes in two parts meant for training and testing. The training dataset has 32561 samples whereas the test dataset has 16282 samples.

Training procedure

Prepare Data: Create tf.data.Dataset objects for training and validation

We create an input function that reads and parses the file and converts the features and labels into a tf.data.Dataset for training and validation. We preprocess the input by mapping the target label to an index, and we use layers.StringLookup to prepare the categorical data.
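
As a rough illustration of the parsing step, the sketch below reads a few CSV rows with the Python standard library, drops fnlwgt, and maps the label string to an index. It is not the example's input function, which builds tf.data.Dataset objects. The reduced column list, the sample rows, the label vocabulary, and the helper name `parse_rows` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical subset of the columns; the real files contain more features.
COLUMNS = ["age", "workclass", "fnlwgt", "income_bracket"]
# Assumed label vocabulary; the position in this list is the class index.
TARGET_LABELS = ["<=50K", ">50K"]

# Made-up rows, for illustration only.
SAMPLE_CSV = "39,State-gov,77516,<=50K\n52,Private,209642,>50K\n"


def parse_rows(text):
    """Yield (features, label_index) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    for row in reader:
        label = row.pop("income_bracket").strip()
        row.pop("fnlwgt")  # dropped feature
        row["age"] = float(row["age"])  # numerical feature
        yield dict(row), TARGET_LABELS.index(label)


examples = list(parse_rows(SAMPLE_CSV))
```

Each element of `examples` pairs a feature dictionary with an integer class index, which is the shape of data the later encoding steps expect.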

Encode Features: We encode the categorical and numerical features as follows:

Create a lookup to convert string values to integer indices. Since we are not using a mask token and do not expect any out-of-vocabulary (OOV) tokens, we set mask_token to None and num_oov_indices to 0.

Categorical Features: Create an embedding layer with the specified dimensions.
Numerical Features: Use the numerical feature as it is, adding a trailing dimension with tf.expand_dims.
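
The encoding steps can be sketched without Keras to show what happens to the data. In the NumPy version below, a plain dictionary stands in for layers.StringLookup and a random matrix stands in for a trainable embedding layer. The vocabulary, the embedding size, and the sample batch are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary for one categorical feature.
vocabulary = ["Private", "Self-emp", "State-gov"]
# String -> integer index, with no mask token and no OOV bucket.
lookup = {value: index for index, value in enumerate(vocabulary)}

embedding_dim = 4  # assumed size, for illustration only
# Stand-in for a trainable embedding matrix: one row per vocabulary entry.
embedding_table = rng.normal(size=(len(vocabulary), embedding_dim))

# A made-up batch of three instances.
workclass = ["State-gov", "Private", "Private"]
age = np.array([39.0, 52.0, 28.0])

indices = np.array([lookup[value] for value in workclass])  # shape [3]
encoded_workclass = embedding_table[indices]  # shape [3, 4]
encoded_age = np.expand_dims(age, axis=-1)  # shape [3, 1]

# One feature vector per instance, ready to be fed to the tree model.
features = np.concatenate([encoded_workclass, encoded_age], axis=1)  # shape [3, 5]
```

Because there is no OOV bucket, a string missing from the vocabulary raises a `KeyError` in this sketch, which mirrors the assumption that every value is known in advance.
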
Create Model:

Deep Neural Decision Tree

A neural decision tree model has two sets of weights to learn. The first set is pi, which represents the probability distribution of the classes in the tree leaves. The second set is the weights of the routing layer decision_fn, which represents the probability of going to each leaf. The forward pass of the model works as follows:

1. The model expects input features as a single vector encoding all the features of an instance in the batch. This vector can be generated from a Convolutional Neural Network (CNN) applied to images or dense transformations applied to structured data features.
2. The model first applies a used_features_mask to randomly select a subset of input features to use.
3. Then, the model computes the probabilities (mu) for the input instances to reach the tree leaves by iteratively performing a stochastic routing throughout the tree levels.
4. Finally, the probabilities of reaching the leaves are combined with the class probabilities at the leaves to produce the final outputs.
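
The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration, not the trained Keras model: it assumes the routing layer is a dense layer with a sigmoid activation, that internal nodes are stored in breadth-first order, and that pi is turned into class probabilities with a softmax. The function name `tree_forward` and the arguments `w` and `b` are hypothetical.

```python
import numpy as np


def tree_forward(features, used_features_mask, w, b, pi, depth):
    """Forward pass of one neural decision tree (NumPy sketch).

    features:           [batch, num_features]
    used_features_mask: [num_used_features, num_features], one-hot rows
    w, b:               routing weights, [num_used_features, num_leaves] and [num_leaves]
    pi:                 leaf class logits, [num_leaves, num_classes]
    """
    batch_size = features.shape[0]
    num_leaves = 2 ** depth

    # Keep only the randomly selected input features.
    x = features @ used_features_mask.T  # [batch, num_used_features]

    # Sigmoid routing layer; pair each probability with its complement.
    decisions = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # [batch, num_leaves]
    decisions = np.stack([decisions, 1.0 - decisions], axis=2)  # [batch, num_leaves, 2]

    # Route level by level: each node splits its probability mass in two.
    mu = np.ones((batch_size, 1, 1))
    begin, end = 1, 2
    for level in range(depth):
        mu = mu.reshape(batch_size, -1, 1)  # [batch, 2**level, 1]
        mu = np.tile(mu, (1, 1, 2))  # [batch, 2**level, 2]
        mu = mu * decisions[:, begin:end, :]
        begin, end = end, end + 2 ** (level + 1)
    mu = mu.reshape(batch_size, num_leaves)  # probability of reaching each leaf

    # Softmax over pi gives a class distribution per leaf.
    leaf_probs = np.exp(pi - pi.max(axis=1, keepdims=True))
    leaf_probs /= leaf_probs.sum(axis=1, keepdims=True)

    # Mix the leaf distributions by the probability of reaching each leaf.
    return mu @ leaf_probs  # [batch, num_classes]


# Usage with random weights and a small tree.
rng = np.random.default_rng(0)
depth, num_features, num_classes, num_used = 3, 6, 2, 4
mask = np.eye(num_features)[rng.choice(num_features, num_used, replace=False)]
w = rng.normal(size=(num_used, 2 ** depth))
b = np.zeros(2 ** depth)
pi = rng.normal(size=(2 ** depth, num_classes))
outputs = tree_forward(rng.normal(size=(5, num_features)), mask, w, b, pi, depth)
```

Since the leaf-reaching probabilities sum to one and each leaf holds a valid class distribution, every row of `outputs` is itself a probability distribution over the classes.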

Compile, Train and Evaluate Model:
- The loss function chosen was SparseCategoricalCrossentropy.
- The metric chosen for evaluating the model's performance was SparseCategoricalAccuracy.
- The optimizer chosen was Adam with a learning rate of 0.01.
- The batch size chosen was 265 and the model was trained for 5 epochs.
- Finally, the model was evaluated on the test dataset, reaching an accuracy of ~85% for both the decision tree model and the forest model.
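
To make the loss and metric concrete, here is a small NumPy sketch of what sparse categorical cross-entropy and sparse categorical accuracy compute on integer labels. It is a simplified stand-in for the Keras classes named above (for example, it does no clipping of probabilities), and the predictions and labels are made up.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probabilities):
    """Mean negative log-probability assigned to the true class index."""
    picked = probabilities[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))


def sparse_categorical_accuracy(labels, probabilities):
    """Fraction of rows whose highest-probability class equals the label."""
    return float(np.mean(np.argmax(probabilities, axis=1) == labels))


# Made-up predictions for four instances and two classes.
probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])

loss = sparse_categorical_crossentropy(labels, probabilities)
accuracy = sparse_categorical_accuracy(labels, probabilities)
```

"Sparse" here means the labels are class indices rather than one-hot vectors, which matches the label-to-index mapping done during data preparation.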

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameters	Value
optimizer	Adam
learning-rate	0.01
batch-size	265
num-epochs	5
num-trees	10
depth	10
used-features-rate	1.0
num-classes	2
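
A few quantities follow directly from these values. The sketch below derives them with plain arithmetic; it assumes a full binary tree of the given depth and that the last, partial batch of each epoch is kept rather than dropped. The dictionary keys are illustrative spellings of the table entries.

```python
import math

hyperparameters = {
    "learning_rate": 0.01,
    "batch_size": 265,
    "num_epochs": 5,
    "num_trees": 10,
    "depth": 10,
    "used_features_rate": 1.0,
    "num_classes": 2,
}
train_samples = 32561  # training set size stated in this card

# A full binary tree of depth d has 2**d leaves.
leaves_per_tree = 2 ** hyperparameters["depth"]
# Each leaf holds one weight per class (the pi matrix).
leaf_weights_per_tree = leaves_per_tree * hyperparameters["num_classes"]
# Batches per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(train_samples / hyperparameters["batch_size"])
total_steps = steps_per_epoch * hyperparameters["num_epochs"]
```

With a depth of 10, each tree has 1024 leaves, so the leaf weights grow exponentially with depth while the number of trees only scales them linearly.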

Credits:

HF Contribution: Tarun R Jain