# Getting Started

**Mixed Adaptive Random Search** (MARS) is a method for **optimizing** user-defined **black-box problems**, such as those that arise in **machine learning** or **deep learning** hyperparameter tuning workflows. MARS explores the variable space broadly in early iterations and exploits promising regions in later ones. Mathematically, MARS can be used to solve

$$
\min\{f(x) : x \in \mathcal{X}\},
$$

where $f$ is a real-valued function denoting the **objective function** and $\mathcal{X}$ is the **variable space**. MARS effectively handles diverse variable types including:
- **numerical** (integer or float, optionally on a log scale),  
- **categorical** (e.g., optimizer types, feature encoders, and so on).

To provide an easy-to-use interface for MARS, we have implemented a Python library, `marsopt`, which is introduced below. Note that we refer to the iterates of MARS interchangeably as **trials**, **solutions**, or **points**; all of these reside in $\mathcal{X}$.

## 1. Installation

Install `marsopt` using `pip`:

```bash
pip install marsopt
```

## 2. Basic Concepts

This section introduces the key components of `marsopt`. The Python objects are named similarly to those in the popular `optuna` package, which makes the structure easier to navigate for users who already know it.

### The **Study** Class

A `Study` object encapsulates your entire optimization experiment. Key configuration options include:

- **`direction`**:  
  - `"minimize"` or `"maximize"`.  
  - If you have a loss function (like cross-entropy), you would **minimize** it; if you have a score (like accuracy), you would **maximize** it.  

- **`n_init_points`**:  
  - The number of purely random initial trials (defaults to `max(10, round(√n_trials))` if not specified).  
  - These initial random trials help the optimizer gather a broad sense of the search space.

- **`initial_noise`** and **`final_noise`**:  
  - Control how much variability (i.e., "noise") is introduced when suggesting new variable values.  
  - The noise decreases over time, enabling exploration early on and fine-tuning later.

- **`random_state`**:  
  - Seed for reproducibility. Provide an integer so you can replicate results exactly.

- **`verbose`**:  
  - `True` prints logs after each trial; `False` runs silently.

Once configured, you call the **`.optimize()`** method to run a specified number of trials (`n_trials`).

### The **Trial** Class

A `Trial` represents a **single** evaluation of your objective function. Inside the `objective_function(trial)`:

- You define how to **suggest** each variable:
  - `suggest_float(name, low, high, log=False)`  
  - `suggest_int(name, low, high, log=False)`  
  - `suggest_categorical(name, categories)`

You then **return** a **float or integer** that indicates your objective value.  
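
For intuition about the `log=True` flag, the sketch below shows what sampling a float "on a log scale" generally means: the draw is uniform in `log(x)` rather than in `x`, so each decade of the range is equally likely. This is a generic illustration that uses only the standard library. `sample_log_uniform` is a hypothetical helper, and the sketch does not describe how `marsopt` samples internally.

```python
import math
import random

def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)].

    Hypothetical helper for illustration; requires 0 < low <= high.
    """
    if not 0 < low <= high:
        raise ValueError("log-scale sampling needs 0 < low <= high")
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point round-off at the boundaries.
    return min(max(value, low), high)

rng = random.Random(0)
samples = [sample_log_uniform(rng, 1e-4, 1e-1) for _ in range(10_000)]

# Roughly one third of the draws should fall in each of the three decades.
share_lowest_decade = sum(s < 1e-3 for s in samples) / len(samples)
print(f"share in [1e-4, 1e-3): {share_lowest_decade:.2f}")
```

A plain uniform draw over the same range would place about 99% of the samples above `1e-3`, which is why learning rates and similar scale-like variables are usually searched on a log scale.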

### Objective Function

- The objective function must accept a `Trial` object and use that object’s **suggest** methods to propose values.  
- After configuring and running your model or simulation with those values, it must **return a single real number**. NaN is not accepted; positive or negative infinity is allowed.
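
The return-value rule above can be expressed as a small standalone check. This is only a sketch of the stated contract (a single real number, NaN rejected, infinities allowed). The helper name `is_valid_objective_value` is hypothetical and is not part of `marsopt`.

```python
import math
from numbers import Real

def is_valid_objective_value(value) -> bool:
    """Return True if `value` is a single real number other than NaN.

    Hypothetical helper for illustration only; not part of marsopt.
    """
    if not isinstance(value, Real):
        return False  # lists, arrays, None, strings, complex numbers, ...
    return not math.isnan(value)

print(is_valid_objective_value(0.25))          # True
print(is_valid_objective_value(float("inf")))  # True
print(is_valid_objective_value(float("nan")))  # False
print(is_valid_objective_value([0.1, 0.2]))    # False
```

A check like this can be useful inside your own objective while debugging, for example to catch a training run that diverged to NaN before the value is returned.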

## 3. Minimal Working Example

Below is a simplified example of how to use `marsopt` to optimize a set of **typical machine learning hyperparameters**: learning rate, number of layers, and optimizer type. The objective is a synthetic formula rather than a real training run:

```python
from marsopt import Study, Trial
import numpy as np

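def objective(trial: Trial) -> float:
    # Suggest one value per hyperparameter for this trial.
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    layers = trial.suggest_int("num_layers", 1, 5)
    optimizer = trial.suggest_categorical("optimizer", ["adam", "sgd", "rmsprop"])

    # Synthetic score standing in for a real training run (higher is better).
    score = -5 * (np.log10(lr) + 3) ** 2  # peaks at learning_rate = 1e-3
    score += np.log1p(layers) * 10  # more layers score higher
    score += {"adam": 15, "sgd": 5, "rmsprop": 20}[optimizer]

    # Negate the score because the study below minimizes.
    return -score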

# Run optimization
study = Study(direction="minimize", random_state=42)
study.optimize(objective, n_trials=50)
```
```
[I ...] Optimization started with 50 trials.
[I ...] Trial 1 finished with value: -7.249446 and variables: {'learning_rate': 0.020983, 'num_layers': 2, 'optimizer': sgd}. Best is trial 1 with value: -7.249446.
[I ...] Trial 2 finished with value: -8.678749 and variables: {'learning_rate': 0.037652, 'num_layers': 4, 'optimizer': sgd}. Best is trial 2 with value: -8.678749.
[I ...] Trial 3 finished with value: -7.42204 and variables: {'learning_rate': 0.084502, 'num_layers': 2, 'optimizer': adam}. Best is trial 2 with value: -8.678749.
...
[I ...] Trial 50 finished with value: -32.903512 and variables: {'learning_rate': 0.000885, 'num_layers': 5, 'optimizer': adam}. Best is trial 37 with value: -37.91758.
```
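
As a sanity check on the log above, the toy objective is simple enough to minimize by hand: the quadratic term vanishes at `learning_rate = 1e-3`, `log1p` grows with `num_layers`, and `"rmsprop"` carries the largest bonus. The sketch below re-implements the same formula with the standard `math` module (the name `toy_objective` is ours) and evaluates it at that point, without calling `marsopt`.

```python
import math

def toy_objective(lr: float, layers: int, optimizer: str) -> float:
    """Same formula as the example objective, written with `math` instead of NumPy."""
    score = -5 * (math.log10(lr) + 3) ** 2
    score += math.log1p(layers) * 10
    score += {"adam": 15, "sgd": 5, "rmsprop": 20}[optimizer]
    return -score

# With the learning rate fixed at its optimum, enumerate the discrete choices.
candidates = [(layers, opt) for layers in range(1, 6) for opt in ("adam", "sgd", "rmsprop")]
best_discrete = min(candidates, key=lambda c: toy_objective(1e-3, *c))
best_possible = toy_objective(1e-3, *best_discrete)

print(best_discrete)            # (5, 'rmsprop')
print(round(best_possible, 4))  # -37.9176
```

The best value reported in the log (`-37.91758` at trial 37) is within about `2e-5` of this analytic optimum.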

## 4. Accessing Detailed Results

This section shows how to retrieve information about an optimization run performed with `marsopt`.  

### Trial History

After the optimization completes, you can inspect the details of each trial:

```python
study.trials
```

```python
[{'iteration': 1,
  'objective_value': -7.249445914023765,
  'trial_time': ...,
  'variables': {'learning_rate': 0.020983027299866144,
   'num_layers': 2,
   'optimizer': 'sgd'},
  'user_attrs': {}},
 {'iteration': 2,
  'objective_value': -8.6787492582556,
  'trial_time': ...,
  'variables': {'learning_rate': 0.03765249501831187,
   'num_layers': 4,
   'optimizer': 'sgd'},
  'user_attrs': {}},
  ...
 {'iteration': 50,
  'objective_value': -32.90351179940006,
  'trial_time': ...,
  'variables': {'learning_rate': 0.0008849700072462417,
   'num_layers': 5,
   'optimizer': 'adam'},
  'user_attrs': {}}]
```

Each trial dictionary contains:
- **iteration**: The trial index.  
- **objective_value**: The final metric or loss returned by your `objective` function.  
- **trial_time**: How long that trial took to run, in seconds.  
- **variables**: A dictionary of all variables suggested for that trial.
- **user_attrs**: A dictionary of user-defined attributes added via `trial.add_attr()`.

You can likewise inspect the **best trial**:

```python
study.best_trial
```

```python
{'iteration': 37,
 'objective_value': -37.91757992304764,
 'trial_time': ...,
 'variables': {'learning_rate': 0.0010039652381640435,
  'num_layers': 5,
  'optimizer': 'rmsprop'},
 'user_attrs': {}}
```
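
Because the trial history is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below works on a hand-copied subset of the records shown above (`trial_time` is left out because it is elided in the output) and recovers the best record for a minimization study. It does not call `marsopt` itself.

```python
# Records copied from the outputs above; `trial_time` is omitted on purpose.
trials = [
    {"iteration": 1, "objective_value": -7.249445914023765,
     "variables": {"learning_rate": 0.020983027299866144, "num_layers": 2, "optimizer": "sgd"}},
    {"iteration": 2, "objective_value": -8.6787492582556,
     "variables": {"learning_rate": 0.03765249501831187, "num_layers": 4, "optimizer": "sgd"}},
    {"iteration": 37, "objective_value": -37.91757992304764,
     "variables": {"learning_rate": 0.0010039652381640435, "num_layers": 5, "optimizer": "rmsprop"}},
    {"iteration": 50, "objective_value": -32.90351179940006,
     "variables": {"learning_rate": 0.0008849700072462417, "num_layers": 5, "optimizer": "adam"}},
]

# For a "minimize" study the best trial has the smallest objective value.
best = min(trials, key=lambda t: t["objective_value"])
print(best["iteration"], best["variables"]["optimizer"])  # 37 rmsprop
```

For a `"maximize"` study, the same idea applies with `max` in place of `min`.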

### Objective Values and Elapsed Times

To visualize or analyze a run quickly, you can retrieve all objective values and the per-trial elapsed times as arrays:

```python
study.objective_values
```

```python
array([-7.24944591, -8.67874926, -7.42203965, ..., -32.9035118])
```

```python
study.elapsed_times
```

```python
array([...])  # execution times in seconds
```
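
A common use of the objective values is a convergence curve, that is, the best value found up to each trial. The sketch below computes it for the first three values shown above, using a plain list in place of the array and assuming a minimization study.

```python
from itertools import accumulate

# First three objective values from the output above.
objective_values = [-7.24944591, -8.67874926, -7.42203965]

# Running minimum: the best (lowest) value seen up to and including each trial.
best_so_far = list(accumulate(objective_values, min))
print(best_so_far)  # [-7.24944591, -8.67874926, -8.67874926]
```

With NumPy, `np.minimum.accumulate(values)` gives the same running minimum for an array of values.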

## 5. Advanced Configuration

This section describes additional parameters that users can adjust.

### Controlling Noise

- **`initial_noise`** (float): The initial sampling noise. Default is `0.33`.
- **`final_noise`** (float): How much noise remains at the end of the search. Defaults to `max(1e-7, min(1 / n_trials, initial_noise))` if not set.

Internally, a **cosine annealing** schedule adjusts noise from `initial_noise` down to `final_noise`, facilitating broad exploration early on and refinement later.
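
To make the two defaults concrete, the sketch below evaluates the `final_noise` formula quoted above and a textbook cosine annealing curve between the two noise levels. The exact schedule used inside `marsopt` is not spelled out here, so the form of `cosine_annealed_noise` and its `progress` argument (0 at the start of the search, 1 at the end) are assumptions made for illustration.

```python
import math

def default_final_noise(n_trials: int, initial_noise: float = 0.33) -> float:
    """Default quoted in the text: max(1e-7, min(1 / n_trials, initial_noise))."""
    return max(1e-7, min(1 / n_trials, initial_noise))

def cosine_annealed_noise(progress: float, initial_noise: float, final_noise: float) -> float:
    """Textbook cosine annealing (assumed form); `progress` runs from 0 to 1."""
    return final_noise + 0.5 * (initial_noise - final_noise) * (1 + math.cos(math.pi * progress))

final_noise = default_final_noise(n_trials=50)  # min(0.02, 0.33) -> 0.02
for progress in (0.0, 0.25, 0.5, 0.75, 1.0):
    noise = cosine_annealed_noise(progress, 0.33, final_noise)
    print(f"progress={progress:.2f}  noise={noise:.4f}")
```

Under this assumed form the noise starts at `initial_noise`, passes through the midpoint of the two levels halfway through, and ends at `final_noise`, changing slowly near both ends and fastest in the middle.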

### Initial Random Points

- **`n_init_points`** (int): Number of random points sampled before adaptive strategies kick in. Defaults to `max(10, round(√n_trials))` if unspecified.
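
The default above is easy to evaluate by hand. The sketch below does so for a few budgets; `default_n_init_points` is a hypothetical helper that mirrors the quoted formula with Python's built-in `round`, which may differ from the library's rounding in exact half-way cases.

```python
import math

def default_n_init_points(n_trials: int) -> int:
    """Quoted default: max(10, round(sqrt(n_trials))). Illustrative helper only."""
    return max(10, round(math.sqrt(n_trials)))

for n_trials in (50, 110, 111, 400, 1000):
    print(n_trials, default_n_init_points(n_trials))
# 50 -> 10, 110 -> 10, 111 -> 11, 400 -> 20, 1000 -> 32
```

In other words, the floor of 10 applies to any budget of up to 110 trials, and the square-root term only takes over for larger budgets.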

### Epsilon-Greedy Exploration

- **`epsilon`** (float, default `1.0`): Controls a small dose of pure random exploration that is mixed into the adaptive phase. At each adaptive trial, with probability `epsilon / (t + 1)`, where `t` is the trial index, MARS ignores the elite-guided sampler and draws a uniform random sample from the search space. The probability decays harmonically with the trial index, so exploration is strongest early on and fades over time. Set it to a smaller value (or `0`) to reduce or disable the random fallback.
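
The decay is easiest to see numerically. The sketch below evaluates `epsilon / (t + 1)` for a few trial indices and shows how such a probability could gate a random draw. The helper name, the clipping to `1.0`, and the indexing convention for `t` are assumptions for illustration, not the library's code.

```python
import random

def random_fallback_probability(t: int, epsilon: float = 1.0) -> float:
    """Probability epsilon / (t + 1) of a uniform random draw at trial index t.

    Clipped to 1.0 so that values of epsilon above 1 still yield a valid
    probability (the clipping is an assumption of this sketch).
    """
    return min(1.0, epsilon / (t + 1))

for t in (10, 20, 50, 100):
    print(t, round(random_fallback_probability(t), 4))
# 10 0.0909
# 20 0.0476
# 50 0.0196
# 100 0.0099

# Gate one decision: True means "draw uniformly at random for this trial".
rng = random.Random(0)
use_random_sample = rng.random() < random_fallback_probability(t=25)
```

With the default `epsilon = 1.0`, roughly one in eleven trials around index 10 would be a pure random draw, falling to about one in a hundred by index 100.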

### Elite Window

- **`elite_window`** (int, default `None`): If set, only the most recent `elite_window` completed trials are considered when forming the elite set (and the candidate pool used by the categorical good/bad scoring). Useful when the search space drifts, when older trials are no longer representative, or when you want the optimizer to “forget” early random exploration faster. If `None`, the full completed history is used.
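
The effect of the window can be illustrated on a toy history. In the sketch below, `select_elites` and its `n_elite` argument are hypothetical (how many elites `marsopt` keeps is not stated here), the records use the same `iteration` and `objective_value` keys as the trial dictionaries, and minimization is assumed.

```python
from typing import Optional

def select_elites(trials: list, n_elite: int, elite_window: Optional[int] = None) -> list:
    """Pick the `n_elite` best trials, optionally from the most recent ones only.

    Illustrative helper; assumes minimization and a hypothetical elite size.
    """
    if elite_window is None:
        pool = trials  # full completed history
    else:
        if elite_window <= 0:
            raise ValueError("elite_window must be a positive integer or None")
        pool = trials[-elite_window:]  # most recent completed trials only
    return sorted(pool, key=lambda t: t["objective_value"])[:n_elite]

# Made-up history in which the very first trial happens to be the best one.
history = [
    {"iteration": i + 1, "objective_value": v}
    for i, v in enumerate([-30.0, -5.0, -8.0, -7.0, -9.0])
]

full = select_elites(history, n_elite=2)
windowed = select_elites(history, n_elite=2, elite_window=3)
print([t["iteration"] for t in full])      # [1, 5]
print([t["iteration"] for t in windowed])  # [5, 3]
```

With the window in place, the early trial no longer influences the elite set, which is the "forgetting" behaviour described above.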

### Adding More Trials Later

If you decide 50 trials aren’t enough, you can resume with additional trials:

```python
study.optimize(objective, n_trials=50)
```
```
[I ...] Trial 51 finished with value: -36.412249 and variables: {'learning_rate': 0.000283, 'num_layers': 5, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.
[I ...] Trial 52 finished with value: -35.939487 and variables: {'learning_rate': 0.0015, 'num_layers': 4, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.
...
[I ...] Trial 100 finished with value: -37.901111 and variables: {'learning_rate': 0.000876, 'num_layers': 5, 'optimizer': rmsprop}. Best is trial 37 with value: -37.91758.
```
`marsopt` retains its internal state and continues from the previously explored space.