Testing Guidelines for Custom Datastores

All datastores that are derived from the custom datastore classes share some common behaviors. This test procedure provides guidelines to test the minimal set of behaviors and functionalities that all custom datastores should have. You will need additional tests to qualify any unique functionalities of your custom datastore.

If you have developed your custom datastore based on instructions in Develop Custom Datastore, then follow these test procedures to qualify your custom datastore. First perform the unit tests, followed by the workflow tests:

Unit tests qualify the datastore constructor and methods.
Workflow tests qualify the datastore usage.

For all these test cases:

Unless the test description specifies otherwise, assume that you are testing a nonempty datastore ds.
Verify the test cases on the file extensions, file encodings, and data locations (such as Hadoop®) that your custom datastore is designed to support.

Unit Tests

Construction

The unit test guidelines for the datastore constructor are as follows.

Test Case Description	Expected Output
Check if your custom datastore constructor works with the minimal required inputs.	Datastore object of your custom datastore type with the minimal expected properties and methods
Check if your datastore object `ds` has `matlab.io.Datastore` as one of its superclasses. Run this command: isa(ds,'matlab.io.Datastore')	`1` or `true`
Call your custom datastore constructor with the required inputs and any supported input arguments and name-value pair arguments.	Datastore object of your custom datastore type with the minimal expected properties and methods
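
You can script the construction checks with plain `assert` calls. This is a minimal sketch, not a required test harness. So that it runs as-is, it uses the built-in `arrayDatastore` (R2020b or later) as a stand-in for your custom datastore. The list of required methods is an assumption based on the methods these guidelines cover. Replace the constructor line with a call to your own constructor.

```matlab
% Stand-in so that this sketch runs as-is (arrayDatastore, R2020b or later).
% Replace it with your own constructor and its minimal required inputs,
% e.g. ds = MyDatastore(location), where MyDatastore and location are
% hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

% The object must have matlab.io.Datastore as one of its superclasses.
isDatastore = isa(ds, 'matlab.io.Datastore');
assert(isDatastore, 'ds does not derive from matlab.io.Datastore.');

% Assumed minimal set of methods that these guidelines exercise.
requiredMethods = {'read', 'hasdata', 'reset', 'readall', 'preview', 'progress'};
hasRequired = ismember(requiredMethods, methods(ds));
assert(all(hasRequired), 'Missing methods: %s', ...
    strjoin(requiredMethods(~hasRequired), ', '));
```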

`read`

Unit test guidelines for the read method

Test Case Description	Expected Output
Call the `read` method on a datastore object `ds`. t = read(ds);	Data from the beginning of the datastore. If you specify a read size, then the size of the returned data is equal to the read size.
Call the `read` method again on the datastore object. t = read(ds);	Data starting from the end point of the previous read operation. If you specify a read size, then the size of the returned data is equal to the read size.
Continue calling the `read` method on the datastore object in a `while` loop. while hasdata(ds), t = read(ds); end	No errors. Correct data in the correct format.
When data is available to read, check the `info` output (if any) of the `read` method. Call `read` on the datastore object `ds`. [t,info] = read(ds);	No errors. `info` contains the expected information. `t` contains the expected data.
When no more data is available to read, call `read` on the datastore object.	Either expected output or an error message based on your custom datastore implementation.
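
You can script the `read` checks as in this hedged sketch. It uses the built-in `arrayDatastore` (R2020b or later) over 10 rows with a read size of 3 as a stand-in for your datastore. It also assumes that the output of `read` is a numeric column that can be concatenated vertically. Because the behavior at the end of the data is implementation-defined, the sketch only reports it and does not assert on it.

```matlab
% Stand-in so that this sketch runs as-is. Replace these lines with your
% own data and constructor, e.g. ds = MyDatastore(location), where
% MyDatastore and location are hypothetical names.
A = (1:10)';
readSize = 3;
ds = arrayDatastore(A, "ReadSize", readSize, "OutputType", "same");

% First read: data from the beginning of the datastore.
t1 = read(ds);
assert(isequal(t1, A(1:readSize)), 'First read did not start at the beginning.');

% Second read: data starting where the first read stopped.
t2 = read(ds);
assert(isequal(t2, A(readSize+1:2*readSize)), 'Second read did not continue.');

% Read the rest in a while loop; the pieces must reassemble the source data.
chunks = {t1; t2};
while hasdata(ds)
    chunks{end+1, 1} = read(ds); %#ok<AGROW>
end
assert(isequal(vertcat(chunks{:}), A), 'Concatenated reads differ from the source data.');

% Reading past the end: either an error or an implementation-defined output.
try
    extra = read(ds);
    fprintf('read at end returned a %s of size %s\n', class(extra), mat2str(size(extra)));
catch err
    fprintf('read at end errored: %s\n', err.message);
end

% Inspect the optional info output while data is still available.
reset(ds);
[t, info] = read(ds);
disp(info)  % compare these fields with what your implementation documents
```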

`readall`

Unit test guidelines for the readall method

Test Case Description	Expected Output
Call the `readall` method on the datastore object.	All data
Call the `readall` method on the datastore object when `hasdata(ds)` is `false`. Read from the datastore until `hasdata(ds)` is `false`, and then call the `readall` method. while hasdata(ds), t = read(ds); end, readall(ds)	All data
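
This minimal sketch scripts both `readall` cases. It adds a cross-check against the concatenation of individual reads. As a stand-in for your datastore, it uses the built-in `arrayDatastore` (R2020b or later). The cross-check assumes that the output of `read` can be concatenated vertically, as numeric columns or tables can. Adapt that step if your data has a different shape.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own
% constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

% Case 1: readall on the datastore returns all data.
allData = readall(ds);

% Case 2: read until hasdata(ds) is false, and then call readall.
while hasdata(ds)
    t = read(ds);
end
allDataAtEnd = readall(ds);
assert(isequal(allDataAtEnd, allData), 'readall depends on the read position.');

% Cross-check: readall matches the concatenation of individual reads.
% Assumption: the output of read can be concatenated vertically.
reset(ds);
chunks = {};
while hasdata(ds)
    chunks{end+1, 1} = read(ds); %#ok<AGROW>
end
assert(isequal(vertcat(chunks{:}), allData), 'readall differs from the concatenated reads.');
```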

`hasdata`

Unit test guidelines for the hasdata method

Test Case Description	Expected Output
Call the `hasdata` method on the datastore object before making any calls to `read`	`true`
Call the `hasdata` method on the datastore object after making a few calls to `read`, but before all the data is read	`true`
When more data is available to read, call the `readall` method, and then call the `hasdata` method.	`true`
When no more data is available to read, call the `hasdata` method.	`false`
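
You can check the four `hasdata` cases in sequence on a single datastore. In this sketch, the built-in `arrayDatastore` (R2020b or later) stands in for your datastore. It holds 10 rows with a read size of 3, so two reads leave data available.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own
% constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

% Before any call to read.
beforeRead = hasdata(ds);

% After a few reads, but before all data is read.
read(ds);
read(ds);
afterSomeReads = hasdata(ds);

% readall must not consume data or move the read position.
readall(ds);
afterReadall = hasdata(ds);

% After all data has been read.
while hasdata(ds)
    read(ds);
end
atEnd = hasdata(ds);

assert(beforeRead && afterSomeReads && afterReadall, 'hasdata returned false too early.');
assert(~atEnd, 'hasdata returned true after all data was read.');
```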

`reset`

Unit test guidelines for the reset method

Test Case Description	Expected Output
Call the `reset` method on the datastore object before making any calls to the `read` method. Verify that the `read` method returns the appropriate data after a call to the `reset` method. reset(ds); t = read(ds);	No errors. The `read` method returns data from the beginning of the datastore. If you specify a read size, then the size of the returned data is equal to the read size.
When more data is available to read, call the `reset` method after making a few calls to the `read` method. Verify that the `read` method returns the appropriate data after the call to the `reset` method.	No errors. The `read` method returns data from the beginning of the datastore. If you specify a read size, then the size of the returned data is equal to the read size.
When more data is available to read, call the `reset` method after making a call to the `readall` method. Verify that the `read` method returns the appropriate data after the call to the `reset` method.	No errors. The `read` method returns data from the beginning of the datastore. If you specify a read size, then the size of the returned data is equal to the read size.
When no more data is available to read, call the `reset` method on the datastore object, and then call the `read` method. Verify that `read` returns the appropriate data after the call to the `reset` method.	No errors. The `read` method returns data from the beginning of the datastore. If you specify a read size, then the size of the returned data is equal to the read size.
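
This sketch runs all four `reset` cases against one expected first chunk. It uses the built-in `arrayDatastore` (R2020b or later) as a stand-in for your datastore. `expectedFirst` assumes that you know the first read-size rows of your test data. With your own datastore, substitute that known beginning.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own data
% and constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
A = (1:10)';
readSize = 3;
ds = arrayDatastore(A, "ReadSize", readSize, "OutputType", "same");
expectedFirst = A(1:readSize);   % first read-size rows of the source data

% Case 1: reset before any call to read.
reset(ds);
case1 = read(ds);

% Case 2: reset after a few reads, while more data is available.
read(ds);
reset(ds);
case2 = read(ds);

% Case 3: reset after readall, while more data is available.
readall(ds);
reset(ds);
case3 = read(ds);

% Case 4: reset after all data has been read.
while hasdata(ds)
    read(ds);
end
reset(ds);
case4 = read(ds);

% Every case must restart at the beginning of the datastore.
results = {case1, case2, case3, case4};
for k = 1:numel(results)
    assert(isequal(results{k}, expectedFirst), ...
        'Case %d: read after reset did not start at the beginning.', k);
end
```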

`progress`

Unit test guidelines for the progress method

Test Case Description	Expected Output
Call the `progress` method on the datastore object before making any calls to the `read` method.	`0` or an expected output based on your custom datastore implementation.
Call the `progress` method on the datastore object after making a call to `readall`, but before making any calls to `read` readall(ds); progress(ds)	`0` or an expected output based on your custom datastore implementation.
Call the `progress` method on the datastore object after making a few calls to `read` and while more data is available to read.	A fraction between `0` and `1` or an expected output based on your custom datastore implementation.
Call the `progress` method on the datastore object when no more data is available to read.	`1` or an expected output based on your custom datastore implementation.
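
Here is a minimal sketch of the `progress` checks, with the built-in `arrayDatastore` (R2020b or later) as a stand-in for your datastore. The assertions encode the default values from the table: `0` at the start, a fraction strictly between `0` and `1` midway, and `1` at the end. If your implementation documents different values, adjust the assertions.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own
% constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

% Before any call to read.
pStart = progress(ds);

% After readall, but before any call to read.
readall(ds);
pAfterReadall = progress(ds);

% After a few reads, while more data is available.
read(ds);
read(ds);
pMiddle = progress(ds);

% After all data has been read.
while hasdata(ds)
    read(ds);
end
pEnd = progress(ds);

fprintf('progress: start %g, after readall %g, middle %g, end %g\n', ...
    pStart, pAfterReadall, pMiddle, pEnd);
assert(pStart == 0 && pAfterReadall == 0, 'progress should start at 0.');
assert(pMiddle > 0 && pMiddle < 1, 'progress should be between 0 and 1 midway.');
assert(pEnd == 1, 'progress should be 1 after all data is read.');
```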

`preview`

Unit test guidelines for the preview method

Test Case Description	Expected Output
Call `preview` on the datastore object before making any calls to `read`.	The `preview` method returns the expected data from the beginning of the datastore, based on your custom datastore implementation.
Call `preview` on the datastore object after making a few calls to `read` and while more data is available to read.	The `preview` method returns the expected data from the beginning of the datastore, based on your custom datastore implementation.
Call `preview` on the datastore object after making a call to `readall` and while more data is available to read.	The `preview` method returns the expected data from the beginning of the datastore, based on your custom datastore implementation.
Call `preview` on the datastore object after making a few calls to `read` and a call to `reset`.	The `preview` method returns the expected data from the beginning of the datastore, based on your custom datastore implementation.
Call `preview` on the datastore object when no more data is available to read.	The `preview` method returns the expected data from the beginning of the datastore, based on your custom datastore implementation.
Call `preview` after making a few calls to the `read` method, and then call `read` again.	The `read` method returns data starting from the end point of the previous read operation. If you specify a read size, then the size of the returned data is equal to the read size.
Call `preview`, and then call `readall` on the datastore.	The `readall` method returns all the data from the datastore.
While the datastore has data available to read, call `preview`, and then call `hasdata`.	The `hasdata` method returns `true`.
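
The `preview` rows reduce to two properties. First, every call returns the same data from the beginning of the datastore. Second, `preview` never moves the read position. The following sketch checks both and uses the built-in `arrayDatastore` (R2020b or later) as a stand-in for your datastore. It assumes that you know the source data `A`, so that it can confirm that `readall` still returns everything.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own data
% and constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
A = (1:10)';
readSize = 3;
ds = arrayDatastore(A, "ReadSize", readSize, "OutputType", "same");

% Reference preview, taken before any call to read.
p0 = preview(ds);

% After a few reads while data remains. The next read must continue where
% the previous read stopped, so preview must not move the read position.
read(ds);
read(ds);
pAfterReads = preview(ds);
nextChunk = read(ds);
assert(isequal(nextChunk, A(2*readSize+1:3*readSize)), 'preview moved the read position.');

% After readall while data remains; hasdata must still return true.
readall(ds);
pAfterReadall = preview(ds);
assert(hasdata(ds), 'hasdata returned false after preview.');

% After a reset that follows earlier reads.
reset(ds);
pAfterReset = preview(ds);

% When no more data is available to read.
while hasdata(ds)
    read(ds);
end
pAtEnd = preview(ds);

% Every call returns the same data from the beginning of the datastore.
previews = {pAfterReads, pAfterReadall, pAfterReset, pAtEnd};
assert(all(cellfun(@(p) isequaln(p, p0), previews)), 'preview output changed.');

% preview followed by readall still returns all the data.
preview(ds);
assert(isequal(readall(ds), A), 'readall after preview did not return all data.');
```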

`partition`

Unit test guidelines for the partition method

Test Case Description	Expected Output
Call `partition` on the datastore object `ds` with a valid number of partitions and a valid partition index. Call `read` on the partition and verify the data. subds = partition(ds,n,index); read(subds) Verify that the partition is valid. isequal(properties(ds),properties(subds)), isequal(methods(ds),methods(subds))	The `partition` method partitions the datastore into `n` partitions and returns the partition corresponding to the specified `index`. The returned partition `subds` must be a datastore object of your custom datastore type. The partition `subds` must have the same methods and properties as the original datastore, so both `isequal` statements return `true`. Calling `read` on the partition returns data starting from the beginning of the partition. If you specify a read size, then the size of the returned data is equal to the read size.
Call `partition` on the datastore object `ds` with the number of partitions specified as `1` and the partition index specified as `1`. Verify the data returned by calling `read` and `preview` on the resulting partition. subds = partition(ds,1,1), isequal(properties(ds),properties(subds)), isequal(methods(ds),methods(subds)), isequaln(read(subds),read(ds)), isequaln(preview(subds),preview(ds))	The partition `subds` must be a datastore object of your custom datastore type. The partition `subds` must have the same methods and properties as the original datastore `ds`. The `isequal` and `isequaln` statements return `true`.
Call `partition` on the partition `subds` with a valid number of partitions and a valid partition index.	The repartitioning of a partition of the datastore should work without errors.
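
This sketch scripts the three `partition` rows. It uses the built-in `arrayDatastore` (R2020b or later) as a stand-in because that datastore supports `partition`. The choice of `n = 2` and `index = 1` is arbitrary. Your datastore needs `partition` only if it inherits from `matlab.io.datastore.Partitionable`.

```matlab
% Stand-in so that this sketch runs as-is. Replace it with your own
% constructor, e.g. ds = MyDatastore(location), where MyDatastore and
% location are hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

% Case 1: a valid number of partitions and a valid partition index.
n = 2;
index = 1;
subds = partition(ds, n, index);
sameClass = strcmp(class(subds), class(ds));
sameProps = isequal(properties(ds), properties(subds));
sameMethods = isequal(methods(ds), methods(subds));
firstOfPartition = read(subds);   % data from the beginning of the partition
disp(firstOfPartition)
assert(sameClass && sameProps && sameMethods, 'partition changed the datastore interface.');

% Case 2: a single partition must reproduce the original datastore.
reset(ds);
subds1 = partition(ds, 1, 1);
assert(isequal(properties(ds), properties(subds1)), 'Properties differ.');
assert(isequal(methods(ds), methods(subds1)), 'Methods differ.');
assert(isequaln(read(subds1), read(ds)), 'read differs for partition(ds,1,1).');
assert(isequaln(preview(subds1), preview(ds)), 'preview differs for partition(ds,1,1).');

% Case 3: repartitioning a partition works without errors.
subsub = partition(subds, 2, 1);
assert(isa(subsub, class(ds)), 'Repartitioned object has the wrong class.');
```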

`initializeDatastore`

If your datastore inherits from matlab.io.datastore.HadoopFileBased, then verify the behavior of initializeDatastore using the guidelines in this table.

Test Case Description	Expected Output
Call `initializeDatastore` on the datastore object `ds` with a valid `info` struct. The `info` struct contains the fields `FileName`, `Offset`, and `Size`. `FileName` is of data type `char`, and the fields `Offset` and `Size` are of data type `double`. For example, initialize the `info` struct, and then call `initializeDatastore` on the datastore object `ds`. info = struct('FileName','myFileName.ext','Offset',0,'Size',500); initializeDatastore(ds,info) Verify the initialization by displaying the properties of your datastore object. ds	The `initializeDatastore` method initializes the custom datastore object `ds` with the necessary information from the `info` struct.

`getLocation`

If your datastore inherits from matlab.io.datastore.HadoopFileBased, then verify the behavior of getLocation using these guidelines.

Test Case Description	Expected Output
Call `getLocation` on the datastore object. location = getLocation(ds) Based on your custom datastore implementation, the `location` output is either a list of files or directories, or a `matlab.io.datastore.DsFileSet` object. If `location` is a `matlab.io.datastore.DsFileSet` object, then call `resolve` to verify the files in the `location` output. resolve(location)	The `getLocation` method returns the location of files in Hadoop.

`isfullfile`

If your datastore inherits from matlab.io.datastore.HadoopFileBased, then verify the behavior of isfullfile using these guidelines.

Test Case Description	Expected Output
Call `isfullfile` on the datastore object.	Based on your custom datastore implementation, the `isfullfile` method returns `true` or `false`.

Workflow Tests

Verify your workflow tests in the appropriate environment.

If your datastore inherits only from matlab.io.Datastore, then verify all workflow tests in a local MATLAB® session.
If your datastore has parallel processing support (inherits from matlab.io.datastore.Partitionable), then verify your workflow tests in parallel execution environments, such as Parallel Computing Toolbox™ and MATLAB Parallel Server™.
If your datastore has Hadoop support (inherits from matlab.io.datastore.HadoopFileBased), then verify your workflow tests in a Hadoop cluster.

Tall Workflow

Testing guidelines for the tall workflow

Test Case Description	Expected Output
Create a tall array by calling `tall` on the datastore object `ds`. t = tall(ds)	The `tall` function returns a tall array whose underlying data type is the same as the data type of the output of the `read` method of the datastore.
For this test step, create a datastore object with data that fits in your system memory. Then, create a tall array using this datastore object. t = tall(ds) If your data is numeric, then apply an appropriate function, such as `mean`, to both the data in `ds` and `t`, and then compare the results. If your data is of data type `string` or `categorical`, then apply the `unique` function to a column of `ds` and to the corresponding column of `t`, and then compare the results. Apply `gather` and verify the result. For examples, see Big Data Workflow Using Tall Arrays and Datastores (Parallel Computing Toolbox).	No errors. The function returns an output of the correct data type (not of a `tall` data type). The function returns the same result whether it is applied to `ds` or to `t`.
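
Here is a scripted version of the tall checks for numeric data. This is a sketch under several assumptions. The built-in `arrayDatastore` (R2020b or later) is an in-memory stand-in for your datastore. `tall` is assumed to accept that datastore in a local MATLAB session. `classUnderlying` and `mean` are assumed to be appropriate for your data. For `string` or `categorical` data, compare the results of `unique` instead, as the table describes.

```matlab
% Stand-in that fits in memory, so this sketch runs as-is. Replace it with
% your own constructor, e.g. ds = MyDatastore(location), where MyDatastore
% and location are hypothetical names.
ds = arrayDatastore((1:10)', "ReadSize", 3, "OutputType", "same");

t = tall(ds);

% Test 1: the tall array has the same underlying type as the output of read.
% gather is a no-op if classUnderlying already returns an in-memory value.
reset(ds);
readClass = class(read(ds));
tallClass = gather(classUnderlying(t));
assert(strcmp(readClass, tallClass), ...
    'tall type %s differs from read type %s.', tallClass, readClass);

% Test 2: a numeric reduction gives the same result in memory and as tall.
expectedMean = mean(readall(ds));
tallMean = gather(mean(t));
assert(abs(tallMean - expectedMean) < 1e-12, 'mean differs between ds and the tall array.');
```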

MapReduce Workflow

Testing guidelines for the MapReduce workflow

Test Case Description	Expected Output
Call `mapreduce` on the datastore object `ds`. outds = mapreduce(ds,@mapper,@reducer) For more information, see `mapreduce`. To support the use of the `mapreduce` function, the `read` method of your custom datastore must return both the `data` and the `info` output arguments.	No errors. The MapReduce operation returns the expected result.

Next Steps

Note

This test procedure provides guidelines to test the minimal set of behaviors and functionalities for custom datastores. Additional tests are necessary to qualify any unique functionalities of your custom datastore.

After you complete the implementation and validation of your custom datastore, your custom datastore is ready to use.

To add help for your custom datastore implementation, see Create Help for Classes.
To share your custom datastore with other users, see Create and Share Toolboxes.
