The struct dataset

What is a dataset?

A dataset is a collection of tests.

How is a dataset represented?

A dataset is represented as a MATLAB cell array, in which each cell contains a test struct. Notice that tests belonging to the same dataset can be defined on different time intervals \([0,T]\), i.e. the final time \(T\) may differ from one test to another.
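As a minimal illustration of this representation (plain MATLAB, no library functions), the sketch below builds a two-test dataset whose tests are defined on \([0,10]\) and \([0,20]\). It then reads the final time \(T\) of each test from the last entry of its time vector. It assumes, as in the examples on this page, that the time instants of a test are stored in the field tt.

```matlab
% Minimal sketch: a dataset is a cell array whose cells are test structs.
% Assumption: time instants are stored in the field tt, as in the examples on this page.
dataset = {};
dataset{1}.tt = linspace(0, 10, 5);   % test 1 defined on [0, 10]
dataset{2}.tt = linspace(0, 20, 8);   % test 2 defined on [0, 20]

% Final time T of each test (linspace returns the endpoint exactly)
T = cellfun(@(test) test.tt(end), dataset);
disp(T)
```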

How to generate a dataset?

A dataset can be manually generated. For instance, the following code creates a dataset with three tests:

% first test
dataset{1}.tt = linspace(0,10,250);
dataset{1}.uu = sin(dataset{1}.tt);
dataset{1}.yy = cos(dataset{1}.tt);

% second test
dataset{2}.tt = linspace(0,20,250);
dataset{2}.uu = exp(dataset{2}.tt);
dataset{2}.yy = 2 + dataset{2}.tt;

% third test
dataset{3}.tt = linspace(0,10,250);
dataset{3}.uu = 0 * dataset{3}.tt + pi;
dataset{3}.yy = tanh(dataset{3}.tt);
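To check that a hand-built dataset is consistent, one can loop over its tests. The loop verifies that each cell is a struct with the fields tt, uu and yy, and that uu and yy have one column per time instant. This is a hypothetical sanity check written for illustration, not a function of the library. The column-per-time-instant layout is an assumption inferred from the example above, where uu and yy are row vectors with the same length as tt.

```matlab
% Rebuild the three tests of the manual example
dataset = {};
dataset{1}.tt = linspace(0,10,250);
dataset{1}.uu = sin(dataset{1}.tt);
dataset{1}.yy = cos(dataset{1}.tt);
dataset{2}.tt = linspace(0,20,250);
dataset{2}.uu = exp(dataset{2}.tt);
dataset{2}.yy = 2 + dataset{2}.tt;
dataset{3}.tt = linspace(0,10,250);
dataset{3}.uu = 0 * dataset{3}.tt + pi;
dataset{3}.yy = tanh(dataset{3}.tt);

% Hypothetical sanity check: one logical flag per test.
% Assumption: uu and yy store one column per time instant in tt.
is_ok = false(1, numel(dataset));
for k = 1:numel(dataset)
    test = dataset{k};
    if isstruct(test) && all(isfield(test, {'tt', 'uu', 'yy'}))
        n_t = numel(test.tt);
        is_ok(k) = size(test.uu, 2) == n_t && size(test.yy, 2) == n_t;
    end
end
disp(is_ok)   % all entries are true for the dataset above
```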

The following example, instead, creates a dataset with three tests, each obtained by solving the model defined in the struct model:

test.tt = [0, 10];

% first test
test.uu = @(t) sin(t);
dataset{1} = model_solve(test, model);

% second test
test.uu = @(t) cos(t);
dataset{2} = model_solve(test, model);

% third test
test.uu = @(t) sin(t) + cos(t);
dataset{3} = model_solve(test, model);
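The three calls above follow a single pattern, which can be written as a loop over a cell array of input functions. model_solve needs a model struct defined with the library, so the sketch below replaces it with a hypothetical toy model, \(\dot{y} = -y + u(t)\) with \(y(0) = 0\), integrated with MATLAB's ode45. The resulting tests are stored with the same fields tt, uu and yy used in the manual example. The sketch only illustrates the structure of the resulting dataset and does not reproduce what model_solve actually computes.

```matlab
% Input functions of the three tests
inputs = {@(t) sin(t), @(t) cos(t), @(t) sin(t) + cos(t)};
t_span = [0, 10];

dataset = cell(1, numel(inputs));
for k = 1:numel(inputs)
    u = inputs{k};
    % HYPOTHETICAL stand-in for model_solve: toy model dy/dt = -y + u(t), y(0) = 0
    [t, y] = ode45(@(t, y) -y + u(t), t_span, 0);
    % Store samples as row vectors (one column per time instant)
    dataset{k} = struct('tt', t', 'uu', u(t'), 'yy', y');
end
```

With the library, the body of the loop reduces to test.uu = inputs{k}; followed by dataset{k} = model_solve(test, model);.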

Equivalently, you can first generate a dataset containing only the inputs. You can then compute the outputs associated with the model defined in the struct model with a single call to the function dataset_generate. The following code produces the same result as the previous one:

dataset_input{1}.tt = [0, 10];
dataset_input{1}.uu = @(t) sin(t);

dataset_input{2}.tt = [0, 10];
dataset_input{2}.uu = @(t) cos(t);

dataset_input{3}.tt = [0, 10];
dataset_input{3}.uu = @(t) sin(t) + cos(t);

dataset = dataset_generate(model, dataset_input);
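When many tests share the same time interval, dataset_input can also be filled in a loop. The sketch below builds the same three input-only tests as above from a cell array of function handles. The variable name input_funcs is illustrative. The final call to dataset_generate is left as a comment, since it requires a model struct defined with the library.

```matlab
% Input functions of the three tests (illustrative variable name)
input_funcs = {@(t) sin(t), @(t) cos(t), @(t) sin(t) + cos(t)};

dataset_input = cell(1, numel(input_funcs));
for k = 1:numel(input_funcs)
    % Each test only carries its time interval and its input function
    dataset_input{k} = struct('tt', [0, 10], 'uu', input_funcs{k});
end

% With the library and a model struct available:
% dataset = dataset_generate(model, dataset_input);
```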

When a dataset is meant for training an ANN-based model, it is useful to employ random inputs. This can be done with the following command:

dataset = dataset_generate_random(model, 100)

which generates a dataset of 100 tests, whose inputs \(\mathbf{u}_j(t)\), for \(j = 1, \dots, 100\), are produced by a random time-series generation algorithm (see /tools/get_random_time_course.m).
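The algorithm actually used is the one in /tools/get_random_time_course.m. Purely to illustrate the idea of a random input time course, the sketch below uses a simpler, hypothetical scheme, not necessarily the one implemented in that file. Random values are drawn uniformly in \([u_{\min}, u_{\max}]\) at equally spaced breakpoints and joined by linear interpolation. The names T, n_knots, u_min and u_max are assumptions of this sketch, not options of dataset_generate_random.

```matlab
rng(0);                       % fix the seed for reproducibility

% Hypothetical parameters of the sketch
T = 10;                       % final time
n_knots = 6;                  % number of random breakpoints
u_min = -1;                   % lower bound of the input
u_max = 1;                    % upper bound of the input

% Random values at equally spaced breakpoints, uniform in [u_min, u_max]
t_knots = linspace(0, T, n_knots);
u_knots = u_min + (u_max - u_min) * rand(1, n_knots);

% Random time course: piecewise-linear interpolation between breakpoints
u = @(t) interp1(t_knots, u_knots, t, 'linear');

% Sample it on a fine grid, e.g. to store it in a test struct
tt = linspace(0, T, 200);
uu = u(tt);
```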

With the following command, instead, we generate a dataset with 20 tests associated with random constant inputs (i.e. \(\mathbf{u}_j(t) \equiv \overline{\mathbf{u}}_j\) for \(j = 1, \dots, 20\)), where the values \(\overline{\mathbf{u}}_j\) are obtained by Monte Carlo sampling of the input space defined in the problem struct:

dataset = dataset_generate_random(model, 20, struct('constant', 1));
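For a box-shaped input space, Monte Carlo sampling amounts to drawing each component of \(\overline{\mathbf{u}}_j\) uniformly between its lower and upper bounds. The sketch below does this for a hypothetical two-dimensional input space. The bounds u_min and u_max are assumptions of the example: the library reads the input space from the problem struct, whose field names are not shown here. Each column of U is one constant input value \(\overline{\mathbf{u}}_j\).

```matlab
rng(1);                              % fix the seed for reproducibility

% Hypothetical box-shaped input space with two inputs
u_min = [-1; 0];                     % lower bounds
u_max = [ 1; 5];                     % upper bounds
n_tests = 20;

% Monte Carlo sampling: each component uniform between its bounds.
% Column j of U is the constant input value ubar_j of test j.
% (Uses implicit expansion, available since MATLAB R2016b.)
U = u_min + (u_max - u_min) .* rand(numel(u_min), n_tests);
```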

By additionally specifying the option lhs = 1, the values \(\overline{\mathbf{u}}_j\) are generated by Latin hypercube sampling instead of plain Monte Carlo sampling:

dataset = dataset_generate_random(model, 20, struct('constant', 1, 'lhs', 1));
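Latin hypercube sampling splits the range of each input component into n equal strata, where n is the number of tests, and places exactly one sample in each stratum. The strata of different components are paired by independent random permutations. The pure-MATLAB sketch below implements this textbook scheme for a hypothetical two-dimensional input space with bounds u_min and u_max. It does not use lhsdesign (Statistics and Machine Learning Toolbox) and is not necessarily how dataset_generate_random implements the lhs option.

```matlab
rng(2);                              % fix the seed for reproducibility

% Hypothetical box-shaped input space with two inputs
u_min = [-1; 0];
u_max = [ 1; 5];
n_tests = 20;
n_u = numel(u_min);

% Latin hypercube on [0,1]^n_u: in every row, each stratum
% ((k-1)/n_tests, k/n_tests), k = 1..n_tests, receives exactly one point;
% randperm pairs the strata of different components at random.
L = zeros(n_u, n_tests);
for i = 1:n_u
    L(i, :) = (randperm(n_tests) - rand(1, n_tests)) / n_tests;
end

% Map to the input box; column j of U is the constant input value ubar_j
U = u_min + (u_max - u_min) .* L;
```

Compared with plain Monte Carlo sampling, this guarantees that every component of the input is spread evenly over its whole range, even with few tests.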

How to save and load a dataset?

It is often useful to save a dataset under a given name, so that it can be reused later. This can be done with the following function:

dataset_save(problem, dataset, 'my_dataset.mat')

When a dataset is generated through the functions dataset_generate or dataset_generate_random, it can be stored directly by passing the options do_save = 1 and outFile = 'FILENAME.mat'. For instance, the following code generates and stores a dataset with 100 random tests:

opt_gen.do_save = 1;
opt_gen.outFile = 'samples_rnd.mat';
dataset_generate_random(model, 100, opt_gen);

The dataset is stored under the name given by outFile (here, 'samples_rnd.mat'), in an automatically generated path inside the data folder defined in options.ini (see Installation). Notice that each problem has its own path (stored in problem.dir_data). Therefore, the same dataset name can be used for different problems without any conflict. On the other hand, if a dataset with the same name has already been saved for the same problem, it is overwritten by the new one.
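Saving silently overwrites an existing dataset with the same name, so it can be worth checking beforehand whether the file already exists. The sketch below assumes that the file ends up directly inside problem.dir_data; the text above only states that this is the problem's path. tempdir is used as a stand-in for a real problem.dir_data so that the snippet runs on its own.

```matlab
% Stand-in for a real problem struct (assumption: only dir_data is needed here)
problem = struct('dir_data', tempdir);
outFile = 'samples_rnd.mat';

% Assumption: the dataset is stored directly inside problem.dir_data
target = fullfile(problem.dir_data, outFile);
will_overwrite = exist(target, 'file') == 2;
if will_overwrite
    warning('%s already exists and would be overwritten.', target);
end
```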

The following code loads a previously saved dataset:

dataset_def.problem = problem;
dataset_def.type = 'file';
dataset_def.source = 'samples_rnd.mat';
train_dataset = dataset_get(dataset_def);

It is possible to load only a subset of a dataset with the following syntax, which loads tests 2, 3, 5, 6, 7 and 8:

dataset_def.source = 'samples_rnd.mat;[2,3,5:8]';
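The index list after the semicolon reads like an ordinary MATLAB index expression, so [2,3,5:8] expands to 2, 3, 5, 6, 7, 8. The sketch below reproduces this selection on an in-memory stand-in for a dataset of 10 tests. The field id is hypothetical and only serves to identify the tests. The sketch illustrates the indexing, not how dataset_get parses the string internally.

```matlab
% Hypothetical in-memory stand-in for a saved dataset with 10 tests;
% the field id only identifies each test
full_dataset = cell(1, 10);
for k = 1:10
    full_dataset{k} = struct('id', k);
end

% Index expression, as written after ';' in the source string
index_spec = '[2,3,5:8]';
idx = str2num(index_spec);                % evaluates to [2 3 5 6 7 8]
subset = full_dataset(idx);               % keep only the selected tests
ids = cellfun(@(test) test.id, subset);   % identifiers of the kept tests
```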

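It is also possible to combine several datasets (or subsets of them) into a single dataset by separating their specifications with |. For instance, the following line selects tests 2, 3 and 5 to 8 of samples_step.mat together with tests 1 to 8 of samples_rnd.mat: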

dataset_def.source = 'samples_step.mat;[2,3,5:8]|samples_rnd.mat;1:8';
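The sketch below shows one way to split such a specification into its parts. The string is first split at |, then each part is split at ;, and the index expression (if present) is evaluated with str2num. This is a hypothetical parser written for illustration, not the one used by dataset_get. In particular, treating a missing index list as "all tests" is an assumption.

```matlab
source = 'samples_step.mat;[2,3,5:8]|samples_rnd.mat;1:8';

% Hypothetical parser: one entry per dataset to be combined
parts = strsplit(source, '|');
specs = struct('file', {}, 'idx', {});
for p = 1:numel(parts)
    tokens = strsplit(parts{p}, ';');
    spec = struct('file', tokens{1}, 'idx', []);
    if numel(tokens) > 1
        spec.idx = str2num(tokens{2});    % index expression, e.g. '1:8'
    end                                   % assumption: empty idx means "all tests"
    specs(end + 1) = spec;
end
```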

How to plot a dataset?

To plot a dataset (here, the train_dataset loaded above), type:

dataset_plot(train_dataset, problem)