The struct dataset
What is a dataset?
A dataset is a collection of tests.
How is a dataset represented?
A dataset is represented as a MATLAB cell array in which each cell is a test struct. Note that the tests in a dataset can be defined on different time intervals \([0,T]\).
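As a minimal sketch of this representation, the snippet below builds a two-test dataset by hand and reads off the number of tests and the final time \(T\) of each one. The field names tt, uu and yy follow the examples in this page; the numerical values are placeholders chosen only for illustration.

```matlab
% Two tests defined on different time intervals [0, T]
dataset = cell(1, 2);
dataset{1} = struct('tt', linspace(0, 10, 5), 'uu', zeros(1, 5), 'yy', ones(1, 5));
dataset{2} = struct('tt', linspace(0, 20, 5), 'uu', zeros(1, 5), 'yy', ones(1, 5));

n_tests = numel(dataset);                     % number of tests in the dataset
T = cellfun(@(test) test.tt(end), dataset);   % final time of each test

fprintf('%d tests, final times: %s\n', n_tests, mat2str(T));
```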
How to generate a dataset?
A dataset can be manually generated. For instance, the following code creates a dataset with three tests:
% first test
dataset{1}.tt = linspace(0,10,250);
dataset{1}.uu = sin(dataset{1}.tt);
dataset{1}.yy = cos(dataset{1}.tt);
% second test
dataset{2}.tt = linspace(0,20,250);
dataset{2}.uu = exp(dataset{2}.tt);
dataset{2}.yy = 2 + dataset{2}.tt;
% third test
dataset{3}.tt = linspace(0,10,250);
dataset{3}.uu = 0 * dataset{3}.tt + pi;
dataset{3}.yy = tanh(dataset{3}.tt);
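When a dataset is built by hand, it can help to check that every test is internally consistent before using it. The sketch below assumes the convention suggested by the example above, namely that uu and yy store one column per time instant of tt; this is an assumption of the sketch, not a documented requirement of the library.

```matlab
tt = linspace(0, 10, 250);
dataset = cell(1, 2);
dataset{1} = struct('tt', tt, 'uu', sin(tt), 'yy', cos(tt));          % consistent
dataset{2} = struct('tt', tt, 'uu', sin(tt), 'yy', cos(tt(1:100)));   % yy too short

% A test is flagged as consistent when uu and yy have as many columns as tt
is_consistent = false(1, numel(dataset));
for i = 1:numel(dataset)
    n_t = numel(dataset{i}.tt);
    is_consistent(i) = size(dataset{i}.uu, 2) == n_t && ...
                       size(dataset{i}.yy, 2) == n_t;
end

fprintf('Inconsistent tests: %s\n', mat2str(find(~is_consistent)));
```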
The following example, instead, creates a dataset with three tests, each one being the solution of the model defined in the struct model:
test.tt = [0, 10];
% first test
test.uu = @(t) sin(t);
dataset{1} = model_solve(test, model);
% second test
test.uu = @(t) cos(t);
dataset{2} = model_solve(test, model);
% third test
test.uu = @(t) sin(t) + cos(t);
dataset{3} = model_solve(test, model);
Equivalently, you can first generate a dataset containing only the inputs and then compute the outputs of the model defined in the struct model with a single call to the function dataset_generate. The following code produces the same result as the previous one:
dataset_input{1}.tt = [0, 10];
dataset_input{1}.uu = @(t) sin(t);
dataset_input{2}.tt = [0, 10];
dataset_input{2}.uu = @(t) cos(t);
dataset_input{3}.tt = [0, 10];
dataset_input{3}.uu = @(t) sin(t) + cos(t);
dataset = dataset_generate(model, dataset_input);
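Since the two snippets above produce the same result, dataset_generate can be pictured as one model solve per input test. The sketch below illustrates that loop under stated assumptions: toy_solve is a hypothetical stand-in for model_solve that samples the input on a uniform grid and returns a made-up output yy = 2 * uu, so the code runs without the library. The actual implementation of dataset_generate may differ.

```matlab
% Input-only dataset, in the same form as the example above
dataset_input = cell(1, 2);
dataset_input{1}.tt = [0, 10];
dataset_input{1}.uu = @(t) sin(t);
dataset_input{2}.tt = [0, 10];
dataset_input{2}.uu = @(t) cos(t);

% Hypothetical stand-in for model_solve(test, model): not a real model
time_grid = @(test) linspace(test.tt(1), test.tt(end), 50);
toy_solve = @(test) struct('tt', time_grid(test), ...
                           'uu', test.uu(time_grid(test)), ...
                           'yy', 2 * test.uu(time_grid(test)));

% One solve per input test, collected in a cell array
dataset = cell(1, numel(dataset_input));
for i = 1:numel(dataset_input)
    dataset{i} = toy_solve(dataset_input{i});
end
```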
To generate a dataset with the goal of training an ANN-based model, it is useful to employ random inputs. This operation can be easily performed with the following command:
dataset = dataset_generate_random(model, 100)
which generates a dataset with 100 tests, where the inputs \(\mathbf{u}_j(t)\), for \(j = 1, \dots, 100\), are produced by a random time-series generation algorithm (see /tools/get_random_time_course.m).
With the following command, instead, we generate a dataset with 20 tests associated with random constant inputs (i.e. \(\mathbf{u}_j(t) \equiv \overline{\mathbf{u}}_j\) for \(j = 1, \dots, 20\)), where the values of \(\overline{\mathbf{u}}_j\) are obtained by Monte Carlo sampling of the input space defined in the problem struct:
dataset = dataset_generate_random(model, 20, struct('constant', 1));
By specifying the option lhs = 1, the values of \(\overline{\mathbf{u}}_j\) are generated by Latin hypercube sampling:
dataset = dataset_generate_random(model, 20, struct('constant', 1, 'lhs', 1));
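To make the difference between the two sampling strategies concrete, the sketch below draws 20 constant input values with plain Monte Carlo and with a hand-written Latin hypercube. The bounds u_min and u_max are hypothetical placeholders for the input space (the library reads it from the problem struct), and this is a textbook construction, not necessarily the one used by dataset_generate_random.

```matlab
n = 20;                 % number of tests
u_min = [-1, 0];        % hypothetical lower bounds of a 2D input space
u_max = [ 1, 5];        % hypothetical upper bounds
d = numel(u_min);

% Monte Carlo: independent uniform samples in the unit cube
U_mc = rand(n, d);

% Latin hypercube: in each dimension, exactly one sample falls in each of
% the n equal-width strata, and the strata are visited in random order
U_lhs = zeros(n, d);
for k = 1:d
    U_lhs(:, k) = (randperm(n)' - rand(n, 1)) / n;
end

% Map unit-cube samples to the input space (implicit expansion, R2016b+)
u_bar_mc  = u_min + U_mc  .* (u_max - u_min);
u_bar_lhs = u_min + U_lhs .* (u_max - u_min);
```

With the same number of tests, the Latin hypercube covers each input component more evenly, because no two samples share a stratum along any dimension.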
How to save and load a dataset?
Sometimes it is useful to give a name to a dataset and to save it, so that it can be later reused. This can be done with the following function:
dataset_save(problem, dataset, 'my_dataset.mat')
When the dataset is generated through the functions dataset_generate or dataset_generate_random, it can be stored directly by passing the options do_save = 1 and outFile = 'FILENAME.mat'. For instance, the following code generates and stores a dataset with 100 random tests:
opt_gen.do_save = 1;
opt_gen.outFile = 'samples_rnd.mat';
dataset_generate_random(model, 100, opt_gen);
Datasets are stored in an automatically generated path inside the data folder defined in options.ini (see Installation), in this example under the name 'samples_rnd.mat'. Note that each problem has its own path (available in problem.dir_data), so the same dataset name can be used for different problems without any conflict. On the other hand, if a dataset with the same name already exists for the same problem, it is overwritten by the new one.
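Because saving silently overwrites an existing dataset of the same name, a guard before saving can be useful. The sketch below assumes that the file sits directly inside problem.dir_data, which is an inference from the text above rather than a documented fact; the problem struct used here is a hypothetical stand-in pointing to the system temporary folder.

```matlab
% Hypothetical stand-in for the real problem struct
problem.dir_data = tempdir;

out_name = 'samples_rnd.mat';
out_path = fullfile(problem.dir_data, out_name);   % assumed file location

% exist(..., 'file') returns 2 when the path is an existing file
already_there = exist(out_path, 'file') == 2;
if already_there
    fprintf('Dataset %s already exists and would be overwritten.\n', out_path);
else
    fprintf('No dataset named %s for this problem yet.\n', out_name);
end
```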
The following code loads a previously saved dataset:
dataset_def.problem = problem;
dataset_def.type = 'file';
dataset_def.source = 'samples_rnd.mat';
train_dataset = dataset_get(dataset_def);
It is possible to load a subset of a dataset with the following syntax, which loads only tests number 2, 3, 5, 6, 7 and 8:
dataset_def.source = 'samples_rnd.mat;[2,3,5:8]';
It is also possible to combine several datasets into a single one by separating the sources with the character |:
dataset_def.source = 'samples_step.mat;[2,3,5:8]|samples_rnd.mat;1:8';
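To clarify how such a source string is structured, the sketch below splits it into file names and test indices using only base MATLAB functions. It illustrates the syntax and is not the parser used by dataset_get; in particular, treating a missing index list as "all tests" is a convention of this sketch.

```matlab
source = 'samples_step.mat;[2,3,5:8]|samples_rnd.mat;1:8';

parts = strsplit(source, '|');          % one entry per dataset file
files = cell(1, numel(parts));
idx   = cell(1, numel(parts));
for i = 1:numel(parts)
    tokens = strsplit(parts{i}, ';');   % file name, then optional indices
    files{i} = tokens{1};
    if numel(tokens) > 1
        idx{i} = str2num(tokens{2});    % evaluates '[2,3,5:8]' or '1:8'
    else
        idx{i} = [];                    % sketch convention: empty means all tests
    end
end

for i = 1:numel(files)
    fprintf('%s -> tests %s\n', files{i}, mat2str(idx{i}));
end
```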
How to plot a dataset?
To plot the dataset, type:
dataset_plot(train_dataset, problem)