Getting Started

Mixed Adaptive Random Search (MARS) is a method for optimizing user-defined black-box problems, such as those that arise in hyperparameter tuning for machine learning and deep learning. MARS explores the variable space broadly in early iterations and exploits promising regions in later ones. Mathematically, MARS can be used to solve

\[ \min\{f(x) : x \in \mathcal{X}\}, \]

where \(f\) is the real-valued objective function and \(\mathcal{X}\) is the variable space. MARS handles diverse variable types, including:

numerical (integer or float, optionally on a log scale),
categorical (e.g., optimizer types, feature encoders, and so on).

To provide an easy-to-use interface for MARS, we have implemented a Python library, marsopt, which the following sections introduce. Note that we refer to the iterates of MARS interchangeably as trials, solutions, or points; these all reside in \(\mathcal{X}\).

1. Installation

Install marsopt using pip:

pip install marsopt

2. Basic Concepts

This section introduces the key components of marsopt. Its Python objects are named similarly to those in the popular optuna package, which should make the structure easier to navigate for users who already know optuna.

The Study Class

A Study object encapsulates your entire optimization experiment. Key configuration options include:

direction:
- "minimize" or "maximize".
- If your objective is a loss (such as cross-entropy), minimize it; if it is a score (such as accuracy), maximize it.
n_init_points:
- The number of purely random initial trials (defaults to max(10, round(√n_trials)) if not specified).
- These initial random trials help the optimizer gather a broad sense of the search space.
initial_noise and final_noise:
- Control how much variability (i.e., “noise”) is introduced when suggesting new variable values.
- The noise decreases over time, enabling exploration early on and fine-tuning later.
random_state:
- Seed for reproducibility. Provide an integer so you can replicate results exactly.
verbose:
- True prints logs after each trial; False runs silently.

Once configured, you call the .optimize() method to run a specified number of trials (n_trials).

The Trial Class

A Trial represents a single evaluation of your objective function. Inside your objective function, which receives the trial as an argument, you suggest a value for each variable with one of the following methods:
- suggest_float(name, low, high, log=False)
- suggest_int(name, low, high, log=False)
- suggest_categorical(name, categories)

You then return a float or integer that indicates your objective value.
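The log=True flag is usually read as "search this range on a logarithmic scale". As a rough illustration of that idea, and not of marsopt's internal sampler, the sketch below draws values whose logarithm is uniform over the range; `sample_log_uniform` is a hypothetical helper introduced only for this example.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-1, rng) for _ in range(10_000)]

# The range spans three decades, so roughly one third of the draws
# should land in each decade (here: the share below 1e-3).
share_below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)
print(f"share of samples below 1e-3: {share_below_1e3:.2f}")
```

A plain uniform draw over the same range would place almost all samples above 1e-3, which is why a log scale is the usual choice for quantities such as learning rates.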

Objective Function

The objective function must accept a Trial object and use that object’s suggest methods to propose values.
After running your model or simulation with those values, it must return a single real number. NaN is not accepted; positive or negative infinity is allowed.
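Because NaN is rejected while infinities are accepted, one defensive pattern is to map a NaN result to infinity before returning it. The sketch below assumes a minimization study; `raw_loss` and `safe_objective` are hypothetical names, and the only part of the Trial interface used is `suggest_float` as listed above.

```python
import math


def raw_loss(lr: float) -> float:
    """Hypothetical stand-in for a training run; returns NaN when it 'diverges'."""
    if lr > 0.05:
        return float("nan")  # simulate a diverged run
    return (math.log10(lr) + 3) ** 2


def safe_objective(trial) -> float:
    """Objective for a minimization study that never returns NaN."""
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    value = raw_loss(lr)
    # NaN is not an accepted objective value, so report the worst possible
    # loss (+inf) instead. This choice assumes direction="minimize".
    return math.inf if math.isnan(value) else float(value)
```

For a maximization study the same idea would apply with negative infinity as the fallback value.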

3. Minimal Working Example

Below is a simplified example of how to use marsopt to optimize a set of typical machine learning hyperparameters: learning rate, number of layers, and optimizer type. The objective is a synthetic score that stands in for a real training run:

from marsopt import Study, Trial
import numpy as np

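def objective(trial: Trial) -> float:
    # Suggest one value per hyperparameter
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    layers = trial.suggest_int("num_layers", 1, 5)
    optimizer = trial.suggest_categorical("optimizer", ["adam", "sgd", "rmsprop"])

    # Synthetic score standing in for a real training run
    score = -5 * (np.log10(lr) + 3) ** 2  # peaks at lr = 1e-3
    score += np.log1p(layers) * 10  # more layers score higher
    score += {"adam": 15, "sgd": 5, "rmsprop": 20}[optimizer]  # per-optimizer bonus

    # Negate the score so that minimizing the objective maximizes the score
    return -score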

# Run optimization
study = Study(direction="minimize", random_state=42)
study.optimize(objective, n_trials=50)

[I ...] Optimization started with 50 trials.
[I ...] Trial 1 finished with value: -7.249446 and variables: {'learning_rate': 0.020983, 'num_layers': 2, 'optimizer': sgd}. Best is trial 1 with value: -7.249446.
[I ...] Trial 2 finished with value: -8.678749 and variables: {'learning_rate': 0.037652, 'num_layers': 4, 'optimizer': sgd}. Best is trial 2 with value: -8.678749.
[I ...] Trial 3 finished with value: -7.42204 and variables: {'learning_rate': 0.084502, 'num_layers': 2, 'optimizer': adam}. Best is trial 2 with value: -8.678749.
...
...
[I ...] Trial 50 finished with value: -32.903512 and variables: {'learning_rate': 0.000885, 'num_layers': 5, 'optimizer': adam}. Best is trial 37 with value: -37.91758.

4. Accessing Detailed Results

This section shows how to retrieve information about an optimization run carried out with marsopt.

Trial History

After the optimization completes, you can inspect the details of each trial:

study.trials

[{'iteration': 1,
  'objective_value': -7.249445914023765,
  'trial_time': ...,
  'variables': {'learning_rate': 0.020983027299866144,
   'num_layers': 2,
   'optimizer': 'sgd'},
  'user_attrs': {}},
 {'iteration': 2,
  'objective_value': -8.6787492582556,
  'trial_time': ...,
  'variables': {'learning_rate': 0.03765249501831187,
   'num_layers': 4,
   'optimizer': 'sgd'},
  'user_attrs': {}},
  ...
 {'iteration': 50,
  'objective_value': -32.90351179940006,
  'trial_time': ...,
  'variables': {'learning_rate': 0.0008849700072462417,
   'num_layers': 5,
   'optimizer': 'adam'},
  'user_attrs': {}}]

Each trial dictionary contains:

iteration: The trial index.
objective_value: The final metric or loss returned by your objective function.
trial_time: How long that trial took to run.
variables: A dictionary of all variables suggested for that trial.
user_attrs: A dictionary of user-defined attributes added via trial.add_attr().

You can also inspect the best trial:

study.best_trial

{'iteration': 37,
 'objective_value': -37.91757992304764,
 'trial_time': ...,
 'variables': {'learning_rate': 0.0010039652381640435,
  'num_layers': 5,
  'optimizer': 'rmsprop'},
 'user_attrs': {}}
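Since each trial is a plain dictionary, the history can be processed with ordinary Python. The sketch below works on a hand-copied sample of the records shown above (the elided trial_time values are left out) and shows how the best trial and a ranking could be recovered for a minimization study; with a real study you would start from study.trials instead of the literal list.

```python
# Sample records copied from the outputs above; 'trial_time' is omitted
# because its values are elided there.
trials = [
    {"iteration": 1, "objective_value": -7.249445914023765,
     "variables": {"learning_rate": 0.020983027299866144,
                   "num_layers": 2, "optimizer": "sgd"}},
    {"iteration": 2, "objective_value": -8.6787492582556,
     "variables": {"learning_rate": 0.03765249501831187,
                   "num_layers": 4, "optimizer": "sgd"}},
    {"iteration": 37, "objective_value": -37.91757992304764,
     "variables": {"learning_rate": 0.0010039652381640435,
                   "num_layers": 5, "optimizer": "rmsprop"}},
    {"iteration": 50, "objective_value": -32.90351179940006,
     "variables": {"learning_rate": 0.0008849700072462417,
                   "num_layers": 5, "optimizer": "adam"}},
]

# For direction="minimize", the best trial has the smallest objective value.
best = min(trials, key=lambda t: t["objective_value"])

# Rank the sampled trials from best to worst.
ranked = sorted(trials, key=lambda t: t["objective_value"])

print(best["iteration"], best["variables"])
print([t["iteration"] for t in ranked])
```

On this sample the best record is iteration 37, which agrees with the best trial reported above.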

Objective Values and Elapsed Times

For quick visualization or analysis, the objective values and the per-trial execution times are also available as arrays:

study.objective_values

array([-7.24944591, -8.67874926, -7.42203965, ..., -32.9035118])

study.elapsed_times

array([...])  # execution times in seconds
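A common first analysis is the best-so-far curve, which shows how quickly the search improves. The sketch below computes it for a minimization study from the first three values printed above; in practice the list would come from study.objective_values.

```python
from itertools import accumulate

# First three values copied from the array above; with a real study this
# would be list(study.objective_values).
objective_values = [-7.24944591, -8.67874926, -7.42203965]

# Best (smallest) value seen after each trial, for a minimization study.
running_best = list(accumulate(objective_values, min))

# 1-based index of the trial that achieved the overall best value.
best_iteration = objective_values.index(min(objective_values)) + 1

print(running_best)
print(best_iteration)
```

After three trials the best value is still the one from trial 2, which matches the log line for trial 3 in the minimal example.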

5. Advanced Configuration

This section describes additional parameters that you can adjust.

Controlling Noise

initial_noise (float): The initial sampling noise. Default is 0.33.
final_noise (float): How much noise remains at the end of the search. Defaults to max(1e-7, min(1 / n_trials, initial_noise)) if not set.

Internally, a cosine annealing schedule adjusts noise from initial_noise down to final_noise, facilitating broad exploration early on and refinement later.
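To make the schedule concrete, the sketch below combines the final_noise default stated above with the textbook cosine annealing formula. The exact expression used inside marsopt is not given in this guide, so `cosine_noise` and its `progress` argument are assumptions made for illustration only.

```python
import math


def default_final_noise(n_trials: int, initial_noise: float = 0.33) -> float:
    """Default stated above: max(1e-7, min(1 / n_trials, initial_noise))."""
    return max(1e-7, min(1.0 / n_trials, initial_noise))


def cosine_noise(progress: float, initial_noise: float, final_noise: float) -> float:
    """Textbook cosine annealing; progress runs from 0.0 (start) to 1.0 (end)."""
    weight = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_noise + (initial_noise - final_noise) * weight


n_trials = 50
final_noise = default_final_noise(n_trials)  # 1 / 50 = 0.02

# Noise at the start, the midpoint, and the end of the annealed phase.
schedule = [cosine_noise(p, 0.33, final_noise) for p in (0.0, 0.5, 1.0)]
print([round(v, 4) for v in schedule])
```

Under this formula the noise stays close to initial_noise at first, drops fastest around the midpoint, and flattens out again near final_noise.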

Initial Random Points

n_init_points (int): Number of random points sampled before adaptive strategies kick in. Defaults to max(10, round(√n_trials)) if unspecified.
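The default formula can be evaluated directly to see how many random trials a given budget receives. The helper name `default_n_init_points` is hypothetical; only the formula comes from the text above.

```python
import math


def default_n_init_points(n_trials: int) -> int:
    """Default stated above: max(10, round(sqrt(n_trials)))."""
    return max(10, round(math.sqrt(n_trials)))


for n in (50, 100, 400, 1000):
    print(n, default_n_init_points(n))
```

With this formula the floor of 10 applies up to 110 trials; the square-root term only takes over for larger budgets.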

Epsilon-Greedy Exploration

epsilon (float, default 1.0): Controls a small dose of pure random exploration that is mixed into the adaptive phase. At each adaptive trial, with probability epsilon / (t + 1), where t is the trial index, MARS ignores the elite-guided sampler and draws a uniform random sample from the search space. The probability decays harmonically with t, so exploration is strongest early on and fades over time. Set epsilon to a smaller value to reduce the random fallback, or to 0 to disable it.
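The decay can be sketched in a few lines. The function names below are hypothetical, the clipping to [0, 1] is an added safeguard rather than documented behavior, and how marsopt counts the trial index t internally is not specified here.

```python
import random


def random_fallback_probability(epsilon: float, t: int) -> float:
    """Probability epsilon / (t + 1), clipped to [0, 1] (clipping is an assumption)."""
    return min(1.0, max(0.0, epsilon / (t + 1)))


def use_random_sample(epsilon: float, t: int, rng: random.Random) -> bool:
    """Decide whether this trial falls back to a uniform random sample."""
    return rng.random() < random_fallback_probability(epsilon, t)


rng = random.Random(42)
for t in (10, 20, 49):
    p = random_fallback_probability(1.0, t)
    print(t, round(p, 4), use_random_sample(1.0, t, rng))
```

With epsilon = 1.0 the fallback probability is already below 10% at t = 10 and reaches 2% at t = 49.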

Elite Window

elite_window (int, default None): If set, only the most recent elite_window completed trials are considered when forming the elite set (and the candidate pool used by the categorical good/bad scoring). Useful when the search space drifts, when older trials are no longer representative, or when you want the optimizer to “forget” early random exploration faster. If None, the full completed history is used.
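The effect of the window can be illustrated with a simplified selection routine for a minimization study. This is only a sketch of the windowing idea: `elite_candidates` and its `n_elite` parameter are hypothetical, the actual size of the elite set is not stated in this guide, and the categorical good/bad scoring is not modeled.

```python
from typing import List, Optional, Sequence


def elite_candidates(
    objective_values: Sequence[float],
    elite_window: Optional[int] = None,
    n_elite: int = 2,  # hypothetical elite-set size, chosen for illustration
) -> List[int]:
    """Return 0-based indices of the best trials inside the window (minimization)."""
    n = len(objective_values)
    # With no window, the full completed history is eligible.
    start = 0 if elite_window is None else max(0, n - elite_window)
    recent = range(start, n)
    return sorted(recent, key=lambda i: objective_values[i])[:n_elite]


values = [5.0, 1.0, 4.0, 3.0, 2.0, 6.0]
print(elite_candidates(values))                  # full history
print(elite_candidates(values, elite_window=3))  # last three trials only
```

With the full history the early trial at index 1 remains an elite; with a window of three it is forgotten and the elites are drawn from the most recent trials.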

Adding More Trials Later

If you decide 50 trials aren’t enough, you can resume with additional trials:

study.optimize(objective, n_trials=50)

[I ...] Trial 51 finished with value: -36.412249 and variables: {'learning_rate': 0.000283, 'num_layers': 5, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.
[I ...] Trial 52 finished with value: -35.939487 and variables: {'learning_rate': 0.0015, 'num_layers': 4, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.
...
[I ...] Trial 100 finished with value: -37.901111 and variables: {'learning_rate': 0.000876, 'num_layers': 5, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.

marsopt retains its internal state and continues from the previously explored space.