109 changes: 109 additions & 0 deletions neutone_midi_sdk/README.md
@@ -0,0 +1,109 @@
# Neutone-MIDI SDK

The goal of this SDK is to provide an environment where researchers, musicians, and engineers
can quickly 'wrap' an existing machine-learning model for symbolic music tasks into a format that can be
deployed in a real-time plugin for DAWs.

There are two guides to help with this process:
1. model_training_guide: details the setup you should follow in the training pipeline
   to ensure your model will be compatible with the SDK
2. model_preparation_guide: explains how to export your model

Once your model is trained and serialized following the above guides, the remaining
instructions in this README show how to 'wrap' it for deployment in the Neutone-MIDI plugin.

We have designed the SDK to work in conjunction with [MIDITok](https://github.com/Natooz/MidiTok),
which lets you tokenize an entire collection of MIDI files in a few simple commands. The SDK converts
the MIDI data in the DAW to and from this format, allowing your model to interact with the same data
format it was trained on.
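
For context, here is a minimal sketch of the training-side tokenization with MidiTok. The exact API
differs between MidiTok versions, so treat the calls below (``REMI``, the callable tokenizer,
``save_params``) as assumptions rather than a recipe:

```python
from pathlib import Path
from miditok import REMI  # one of the tokenization formats the SDK supports

# Assumed MidiTok v2-style API: tokenizers are callable on a MIDI file path.
tokenizer = REMI()
tokens = tokenizer(Path("my_file.mid"))  # tokenize a single MIDI file
# Save the vocab/config JSON that the wrapping step below loads.
tokenizer.save_params(Path("tokenizer_config.json"))
```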


# Wrapping your model

Once you have serialized a model trained on a supported tokenization format, it's time to wrap it!

**First, load your vocab and config files:**
```python
import torch
import json
from neutone_midi_sdk import MidiToMidiBase
from neutone_midi_sdk.data_preparation import prepare_token_data
from neutone_midi_sdk.tokenization import TokenData

with open(vocab_file_path, 'r') as fp:
    vocab = json.load(fp)
with open(config_file_path, 'r') as fp:
    config = json.load(fp)
tokenizer_type = config["tokenization"]
```

Load your serialized model:
```python
remi_model = torch.jit.load("path_to_model")
```

Wrap it:
```python
tokenizer_data: TokenData = prepare_token_data(tokenizer_type, vocab, config)
wrapped_model = MidiToMidiBase(model=remi_model,
                               vocab=vocab,
                               tokenizer_type=tokenizer_type,
                               tokenizer_data=tokenizer_data)
scripted_model = torch.jit.script(wrapped_model)
scripted_model.save('REMI_Model.pt')
```
And... that's it! Your model is now ready to deploy in the Neutone-MIDI plugin.


# SDK Components
### Neutone-MIDI SDK

Provides the base wrapper for a MIDI-to-MIDI model, which is saved as a scripted PyTorch `.pt` file.


### Data Preparation
Each tokenization method has a particular set of quantized values that are available,
related to pitch, timing, velocity, etc. Because sequence length often has a large impact
on computational time, each model can use a slightly different granularity. To maintain efficiency,
it is helpful for the scripted model to have lists already identifying these available values.

For example, if a MIDI message comes in with ``velocity=43`` and the available values are
``[20, 40, 60, 80, 100, 120]``, then the tokenizer can quickly round the incoming velocity to the
nearest value of ``40``.

Given the original vocab JSON and the type of tokenization method, the data preparation utility
returns a tuple of dictionaries of lists of the relevant data values. Because this happens during the
wrapping procedure, the plugin does not need to recompute the available values on each forward pass.
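
As a concrete illustration, here is a minimal sketch of this nearest-value rounding; the names are
illustrative, not the SDK's internal ones:

```python
from typing import List

# Hypothetical list of available velocities, as produced during data preparation.
available_velocities: List[float] = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0]

def quantize(value: float, available: List[float]) -> float:
    # Round to the available value with the smallest absolute distance.
    return min(available, key=lambda v: abs(v - value))

assert quantize(43.0, available_velocities) == 40.0
```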


### MIDI Data Format

The input tensor has shape (n, 4), where n is the number of MIDI messages. Each MIDI message is a row of the form:

``{type, value, velocity, timestep}``

Current types:
```
0.0 = note on
1.0 = note off
```

For example, ``{0.0, 64.0, 90.0, 2.5}`` = note on, pitch 64, velocity 90, at beat 2.5.

Every tokenization method expects this format as input and returns it as output.
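
A minimal example of building such an input tensor, with a note on at beat 2.5 and its matching
note off at beat 3.0:

```python
import torch

# Shape (n, 4): one row per MIDI message, columns {type, value, velocity, timestep}.
midi_input = torch.tensor([
    [0.0, 64.0, 90.0, 2.5],  # note on,  pitch 64, velocity 90, at beat 2.5
    [1.0, 64.0,  0.0, 3.0],  # note off, pitch 64,              at beat 3.0
])
assert midi_input.shape == (2, 4)
```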

**Timing**:

Within the C++ environment, timing is always expressed as **PPQ**, a float value measured in quarter
notes. Continuing the example above, '2.5' means an eighth note (.5) after the second quarter note (2).
MIDI can communicate time in a number of formats and resolutions, but the input and output must always
adhere to this one, as it determines where the plugin places MIDI events within the buffer.

If, for example, your model uses a 'ticks-per-beat' system with a resolution of 96 ticks per quarter
note, then it is the job of the tokenizer to convert between the PPQ and tick systems. All included
tokenization methods already handle this conversion.
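
The conversion itself is a single multiplication. Here is a minimal sketch under the
96-ticks-per-quarter assumption above; the bundled tokenization methods already do this internally:

```python
TICKS_PER_QUARTER = 96  # example resolution from the paragraph above

def ppq_to_ticks(ppq: float) -> int:
    # Beat 2.5 -> 240 ticks at 96 ticks per quarter note.
    return round(ppq * TICKS_PER_QUARTER)

def ticks_to_ppq(ticks: int) -> float:
    return ticks / TICKS_PER_QUARTER

assert ppq_to_ticks(2.5) == 240
```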

6 changes: 6 additions & 0 deletions neutone_midi_sdk/__init__.py
@@ -0,0 +1,6 @@
from .core import NeutoneMIDIModel
from .tokenization import *
from .parameter import *
from .data_preparation import *
from .constants import *
from .neutoneMIDI_SDK import *
6 changes: 6 additions & 0 deletions neutone_midi_sdk/constants.py
@@ -0,0 +1,6 @@
SDK_VERSION = "0.1.1"

MAX_N_NUMERICAL_PARAMS = 4
MAX_N_TENSOR_PARAMS = 1
SUPPORTED_TOKENIZATIONS = ["MIDILike", "TSD", "REMI", "HVO", "HVO_taps", "Custom"]
MAX_N_CATEGORICAL_VALUES = 20
199 changes: 199 additions & 0 deletions neutone_midi_sdk/core.py
@@ -0,0 +1,199 @@
import torch as tr
from torch import nn, Tensor
from typing import List, Dict, Tuple, Union
from abc import abstractmethod
from neutone_midi_sdk.tokenization import TokenData
from neutone_midi_sdk.parameter import NeutoneParameter
import neutone_midi_sdk.constants as constants


class NeutoneMIDIModel(tr.nn.Module):
def __init__(self,
model: tr.nn.Module,
vocab: Dict[str, int],
tokenizer_type: str,
tokenizer_data: TokenData):

super().__init__()
self.MAX_N_NUMERICAL_PARAMS = constants.MAX_N_NUMERICAL_PARAMS
self.MAX_N_TENSOR_PARAMS = constants.MAX_N_TENSOR_PARAMS
self.SDK_VERSION = constants.SDK_VERSION
self.n_neutone_parameters = len(self.get_neutone_parameters())

# Allocate default numerical params to prevent dynamic allocations later
numerical_default_param_vals = self._get_numerical_default_param_values()
assert len(numerical_default_param_vals) <= self.MAX_N_NUMERICAL_PARAMS, (
f"Number of default numerical parameter values ({len(numerical_default_param_vals)}) "
f"exceeds the maximum allowed ({self.MAX_N_NUMERICAL_PARAMS})."
)
numerical_default_param_values_t = tr.tensor([v for _, v in numerical_default_param_vals])
# Ensure number of parameters is within the maximum allowed
self.n_numerical_neutone_parameters = len(numerical_default_param_vals)
assert self.n_numerical_neutone_parameters <= self.MAX_N_NUMERICAL_PARAMS
# Ensure parameter names are unique
assert len(set([p.name for p in self.get_neutone_parameters()])) == len(
self.get_neutone_parameters()
)
self.register_buffer("tensor_default_param_values", numerical_default_param_values_t.unsqueeze(-1))

# Allocate default tensor params to prevent dynamic allocations later
tensor_default_param_vals = self._get_tensor_default_param_values()
assert len(tensor_default_param_vals) <= self.MAX_N_TENSOR_PARAMS, (
f"Number of default tensor parameter values ({len(numerical_default_param_vals)}) "
f"exceeds the maximum allowed ({self.MAX_N_TENSOR_PARAMS})."
)
# TODO(nic): this assumes a common dimension for all tensor parameters
tensor_default_param_values_t = tr.cat([v for _, v in tensor_default_param_vals])
self.register_buffer("numerical_default_param_values", tensor_default_param_values_t.unsqueeze(-1))

# Save parameter metadata
        self.neutone_parameters_metadata = {
            p.name: p.to_metadata_dict()
            for p in self.get_neutone_parameters()
        }

# Allocate remapped params dictionary to prevent dynamic allocations later
self.remapped_params = {
name: tr.tensor([val])
for name, val in numerical_default_param_vals
}
self.remapped_params.update(
{
name: val
for name, val in tensor_default_param_vals
}
)
self.default_param_values = self.remapped_params

# Save parameter information
self.neutone_parameter_names = [p.name for p in self.get_neutone_parameters()]
# TODO(nic): remove from here once plugin metadata parsing is implemented
self.neutone_parameter_descriptions = [
p.description for p in self.get_neutone_parameters()
]
self.neutone_parameter_used = [p.used for p in self.get_neutone_parameters()]
self.neutone_parameter_types = [
p.type.value for p in self.get_neutone_parameters()
]

        # Store the model in eval mode
model.eval()
self.model = model

# Setup tokenization methods
assert tokenizer_type in constants.SUPPORTED_TOKENIZATIONS, \
f"{tokenizer_type} not a recognized tokenization format."
tokenizer_data = generate_fake_token_data() if tokenizer_data is None else tokenizer_data
vocab = {"v": 0} if vocab is None else vocab
self.midi_to_token_vocab = vocab
self.token_to_midi_vocab = {v: k for k, v in vocab.items()}
self.tokenizer_type = tokenizer_type
self.tokenizer_data: TokenData = TokenData(tokenizer_data.strings, tokenizer_data.floats, tokenizer_data.ints)

@abstractmethod
def _get_numerical_default_param_values(
self,
) -> List[Tuple[str, Union[float, int]]]:
"""
Returns a list of tuples containing the name and default value of each
numerical (float or int) parameter.
        This should not be overridden by SDK users.
"""
pass

@abstractmethod
def _get_tensor_default_param_values(
self,
    ) -> List[Tuple[str, Tensor]]:
"""
Returns a list of tuples containing the name and default value of each
tensor parameter.
        This should not be overridden by SDK users.
"""
pass

@abstractmethod
def get_model_name(self) -> str:
"""
Set the model name
"""
pass

@abstractmethod
def get_model_authors(self) -> List[str]:
"""
Used to set the model authors. This will be displayed on both the
website and the plugin.

Should reflect the name of the people that developed the wrapper
of the model using the SDK. Can be different from the authors of
the original model.

Maximum of 5 authors.
"""
pass

@abstractmethod
def get_model_short_description(self) -> str:
"""
Used to set the model short description. This will be displayed on both
the website and the plugin.

This is meant to be seen by the audio creators and should give a summary
of what the model does.

Maximum of 150 characters.
"""
pass

def get_neutone_parameters(self) -> List[NeutoneParameter]:
return []

@tr.jit.export
def get_neutone_parameters_metadata(self) -> Dict[str, Dict[str, str]]:
"""
Returns the metadata of the parameters as a string dictionary of string
dictionaries.
"""
return self.neutone_parameters_metadata

@tr.jit.export
def get_default_param_values(self) -> Dict[str, Tensor]:
"""
        Returns the default parameter values as a dictionary mapping each
        parameter name to a tensor of default values.
"""
return self.default_param_values

@tr.jit.export
def get_default_param_names(self) -> List[str]:
# TODO(nic): remove this once plugin metadata parsing is implemented
return self.neutone_parameter_names

@tr.jit.export
def get_default_param_descriptions(self) -> List[str]:
# TODO(nic): remove this once plugin metadata parsing is implemented
return self.neutone_parameter_descriptions

@tr.jit.export
def get_default_param_types(self) -> List[str]:
# TODO(nic): remove this once plugin metadata parsing is implemented
return self.neutone_parameter_types

@tr.jit.export
def get_default_param_used(self) -> List[bool]:
# TODO(nic): remove this once plugin metadata parsing is implemented
return self.neutone_parameter_used

def prepare_for_inference(self) -> None:
self.model.eval()
self.eval()


# TODO: deprecate this method; it is used by the "HVO" format, where no TokenData is necessary.
def generate_fake_token_data():
token_strings: Dict[str, List[str]] = {"value": ["value"]}
token_floats: Dict[str, List[float]] = {"value": [0.0]}
token_ints: Dict[str, List[int]] = {"value": [0]}
token_data: TokenData = TokenData(token_strings, token_floats, token_ints)
return token_data