Validating and Converting Data
Validating dictionaries
When a data owner has created their dataset and uploaded their data files, they can confirm that the dataset dictionaries match the underlying data by running a dataset validation report.
The data owner can run a validation report by navigating to the Dataset Administration tab and selecting Data:
This will open the validation report tab:
All dictionaries from the dataset are listed on the left, and the data owner can choose to either validate these individually, or all at the same time by selecting Validate at the top of the screen. When validation has been run, the results will appear on the right of the page. The image below shows a successful validation:
Where the data owner has chosen to validate all dictionaries, the results are displayed on a per-dictionary basis. The data owner can switch between them by selecting the dictionary they wish to view on the left-hand side.
If the validation has produced errors, these will be itemised in the right-hand panel:
Where an error exists, the report will provide the data owner with the name and label of the dictionary field where the error has occurred, and a description of the error. The validation report can produce three types of error:
- Field exists in dictionary but not in data
- Field exists in data but not in dictionary
- Database and Dictionary Types do not match up
In each case the data owner will need to decide on the appropriate resolution.
Field exists in dictionary but not in data - there is a field in the dictionary that is not in the underlying data. This can be resolved by removing the field from the dictionary, or adding the appropriate missing data to the dataset.
Field exists in data but not in dictionary - there is a field in the data that is not in the dictionary. This can be resolved by adding the field to the dictionary or removing it from the data.
Database and Dictionary Types do not match up - where this error occurs, the validation report will also provide the data owner with the data type for the field in both the data and the dictionary. This can be resolved by making the field type the same in the data and the dictionary.
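The three checks described above can be illustrated with a minimal sketch. This is not FAIR's implementation, which is not described here; the function name `validate_dictionary`, the field names, and the type names are hypothetical and chosen only for illustration. It assumes the dictionary and the data can each be summarised as a mapping of field name to type name.

```python
def validate_dictionary(dictionary_fields, data_fields):
    """Compare two {field_name: type_name} mappings and list the mismatches.

    Hypothetical helper for illustration only; it mirrors the three error
    messages described in the text, not FAIR's internal logic.
    """
    errors = []
    for name, dictionary_type in dictionary_fields.items():
        if name not in data_fields:
            errors.append((name, "Field exists in dictionary but not in data"))
        elif data_fields[name] != dictionary_type:
            # Report both types, as the validation report does for this error.
            errors.append((
                name,
                "Database and Dictionary Types do not match up "
                f"(data: {data_fields[name]}, dictionary: {dictionary_type})",
            ))
    for name in data_fields:
        if name not in dictionary_fields:
            errors.append((name, "Field exists in data but not in dictionary"))
    return errors


# Example with made-up fields: one type mismatch, one field missing on each side.
dictionary = {"participant_id": "integer", "age": "integer"}
data = {"participant_id": "text", "visit_date": "date"}
for field, message in validate_dictionary(dictionary, data):
    print(f"{field}: {message}")
```

An empty result corresponds to a successful validation; each entry otherwise points to one field the data owner needs to resolve.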
Validation History
Data Owners can view the history of all validation reports run against a dictionary. These can be accessed by selecting the ellipsis in the right-hand column of the dictionary table and choosing 'View validation reports'. This provides a downloadable list of all previous validation reports run against the dictionary:
For Data Owners, the result of the most recent validation report is also visible on the dataset page, above the dictionary on the right of the screen. This is not visible to non-data owners.
Where the most recent report was successful the Data Owner will see the dictionary row count and a green tick:
Where the most recent report produced errors, the Data Owner will see the dictionary row count and a red exclamation mark:
Converting data
Note: data conversion is only available for datasets hosted in FAIR. Externally hosted data cannot be converted.
When a data validation report shows a type mismatch between the data dictionary and the underlying data, the data owner has two options:
- Update the dictionary field in question to match the type of the underlying data
- Update the underlying data to match the field type in the dictionary
This article details how the data owner can use the Convert function in FAIR to update the underlying data to match the dictionary in the event of a type mismatch.
Resolving a Mismatch in your Data
The data owner runs a validation report which produces an error:
In this case there is a type mismatch for the Participant ID field. The dictionary shows it as an integer and the data as text.
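To make this kind of mismatch concrete, the sketch below shows what converting a text field to an integer field involves in general terms. It is an illustration only and makes no claim about how FAIR's Convert function works internally; the function name `convert_text_to_integer` and the sample values are hypothetical. The sketch assumes the conversion should only be applied if every value can be converted.

```python
def convert_text_to_integer(values):
    """Try to convert a column of text values to integers.

    Hypothetical helper for illustration only. Returns (converted, failures):
    converted is the list of integers, or None if any value could not be
    converted; failures lists (position, original_value) for each bad value.
    """
    converted = []
    failures = []
    for position, value in enumerate(values):
        try:
            # int() tolerates surrounding whitespace but rejects text such as "abc".
            converted.append(int(value))
        except ValueError:
            failures.append((position, value))
    if failures:
        return None, failures
    return converted, failures


# Example with made-up Participant ID values stored as text.
ids, problems = convert_text_to_integer(["101", " 102", "103"])
print(ids, problems)
```

Values that cannot be interpreted as integers are reported rather than silently dropped, so the underlying data can be corrected before the field type is changed.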
The data owner can update the underlying data to match the dictionary by selecting Convert from the tab at the top of the screen (this runs Convert on all dictionaries at once) or from the right-hand column of the dictionary table (this runs Convert on the selected dictionary only):
A notification will appear in the top right of the screen when a conversion job starts and when it completes. The data owner does not have to remain on the data tab to receive these notifications.
When the conversion job is complete, the data owner can confirm it has succeeded by re-running the validation report. They can also view the history of data conversion tasks run on each dictionary in a dataset. This is accessible from the ellipsis on the right of the dictionary table, under View conversion reports: