Tuning Capabilities

Rationale

How capable is this package when tuning neural networks? One of its key capabilities is fine-tuning the whole architecture, including its depth — not only the number of hidden neurons per layer, but also the number of layers. Because neural networks built with {torch} natively support different activation functions for different layers, {kindling} supports:

Custom grid creation

{kindling} has its own function for defining a grid that includes the depth of the architecture: grid_depth(), an analogue of dials::grid_space_filling(), except that it can also create a "regular" grid. The n_hlayer parameter controls the depths included in the grid. It accepts a scalar (e.g. 2), an integer vector (e.g. 1:2), or the {dials}-style parameter function n_hlayer(). When n_hlayer is greater than 1, the hidden_neurons and activations columns become list-columns, each holding one vector per grid row whose length matches the depth you defined.

Setup

We won’t stop you from using the library() function, but we strongly recommend using box::use() and explicitly importing the names you want to attach from each namespace.

# library(kindling)
# library(tidymodels)
# library(modeldata)

box::use(
    kindling[mlp_kindling, act_funs, args, hidden_neurons, activations, grid_depth],
    dplyr[select, ends_with, mutate, slice_sample],
    tidyr[drop_na],
    rsample[initial_split, training, testing, vfold_cv],
    recipes[
        recipe, step_dummy, step_normalize,
        all_nominal_predictors, all_numeric_predictors
    ],
    modeldata[penguins],
    parsnip[tune, set_mode, fit, augment],
    workflows[workflow, add_recipe, add_model],
    dials[learn_rate],
    tune[tune_grid, show_best, collect_metrics, select_best, finalize_workflow, last_fit],
    yardstick[metric_set, rmse, rsq],
    ggplot2[autoplot]
)

We’ll use the penguins dataset from {modeldata} to predict body mass (in kilograms) from physical measurements — a straightforward regression task that lets us focus on the tuning workflow.

Usage

{kindling} provides the mlp_kindling() model spec. Parameters you want to search over are marked with tune().

spec = mlp_kindling(
    hidden_neurons = tune(),
    activations = tune(),
    epochs = 50,
    learn_rate = tune()
) |>
    set_mode("regression")

Note that n_hlayer is not listed here — it is handled inside grid_depth() rather than the model spec directly.

Data Preparation

We sample 30 rows per species to keep the example lightweight, and stratify splits on species to preserve class balance. The target variable is body_mass_kg, derived from the original body_mass_g column.

penguins_clean = penguins |>
    drop_na() |>
    select(body_mass_g, ends_with("_mm"), sex, species) |>
    mutate(body_mass_kg = body_mass_g / 1000) |>
    slice_sample(n = 30, by = species)

set.seed(123)
split = initial_split(penguins_clean, prop = 0.8, strata = species)
train = training(split)
test = testing(split)
folds = vfold_cv(train, v = 5, strata = body_mass_kg)


rec = recipe(body_mass_kg ~ ., data = train) |>
    step_dummy(all_nominal_predictors()) |>
    step_normalize(all_numeric_predictors())

Using grid_depth()

You can still use standard {dials} grids, but they don’t know about network depth, so {kindling} provides grid_depth(). The n_hlayer argument controls which depths to search over. Remember, it accepts a scalar, an integer vector, or the n_hlayer() parameter function.

When n_hlayer > 1, the hidden_neurons and activations columns become list-columns, where each row holds a vector of per-layer values.

set.seed(42)
depth_grid = grid_depth(
    hidden_neurons(c(16, 32)),
    activations(c("relu", "elu", "softshrink(lambd = 0.2)")),
    learn_rate(),
    n_hlayer = 1:3,
    size = 10,
    type = "latin_hypercube"
)

depth_grid

Here we constrain hidden_neurons to the range [16, 32] and limit activations to three candidates — including the parametric softshrink. Latin hypercube sampling spreads the 10 candidates more evenly across the search space compared to a random grid.

Tuning

What about the tuning step itself? The solution is simple: the depth-dependent parameters are stored in list-columns, so each grid cell looks like list(c(1, 2)). Internally, the configured argument is unlisted via list(c(1, 2))[[1]], which always yields exactly one element — the per-layer vector for that candidate.
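The unlisting mechanic can be seen in isolation with plain base R (the values below are hypothetical, standing in for one grid cell of a two-layer candidate):

```r
cell = list(c(16, 32))  # one list-column cell: per-layer hidden_neurons
length(cell)            # 1 -- a cell always holds exactly one element
cell[[1]]               # c(16, 32) -- the per-layer vector handed to the model
```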

wflow = workflow() |>
    add_recipe(rec) |>
    add_model(spec)

tune_res = tune_grid(
    wflow,
    resamples = folds,
    grid = depth_grid,
    metrics = metric_set(rmse, rsq)
)

Inspect

Even with the list-columns, tuning produces the usual output. Use the standard helpers to extract metrics after the grid search, e.g. collect_metrics() and show_best().

collect_metrics(tune_res)
show_best(tune_res, metric = "rmse", n = 5)

Visualizing Results
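The autoplot() generic imported from {ggplot2} above dispatches to the method {tune} provides for tuning results, plotting each metric against the tuned parameter values. A quick sketch (note that list-column parameters such as hidden_neurons and activations may not render cleanly in this plot; scalar parameters such as learn_rate will):

```r
autoplot(tune_res)
```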

Finalizing the Model

Once we’ve identified the best configuration, we finalize the workflow and fit it on the full training set.

best_params = select_best(tune_res, metric = "rmse")
final_wflow = wflow |>
    finalize_workflow(best_params)

final_model = fit(final_wflow, data = train)
final_model

Evaluating on the test set

final_model |>
    augment(new_data = test) |>
    metric_set(rmse, rsq)(
        truth = body_mass_kg,
        estimate = .pred
    )

A Note on Parametric Activations

{kindling} supports parametric activation functions, meaning each layer’s activation can carry its own tunable parameter. When passed as a string such as "softshrink(lambd = 0.2)", {kindling} parses and constructs the activation automatically. This means you can include them directly in the activations() candidate list inside grid_depth() without any extra setup, as shown above.

For manual (non-tuned) use, you can also specify activations per layer explicitly:

spec_manual = mlp_kindling(
    hidden_neurons = c(50, 15),
    activations = act_funs(
        softshrink[lambd = 0.5],
        relu
    ),
    epochs = 150,
    learn_rate = 0.01
) |>
    set_mode("regression")