It is possible to create a new dataset within the workspace by converting from a CSV file.
To convert a CSV to a dataset
From the Files tab, highlight a CSV file from either the blob store or the file store by selecting its row in the main window. Select the option "Convert to Dataset" in the sidebar; a new tab will open in the workspace.
The New Dataset tab allows you to:
- Name the new dataset
- Specify whether the file includes a header row
- Select the delimiter for separating the file contents
- Set one or more qualifiers for text and null values (comma-separated list)
- Set the file encoding
The tab will also show a summary of the expected outcome of the transformation, given the parameters set.
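To check how a file will be read with a given set of parameters before converting it, the settings above can be approximated locally. The following is a minimal sketch using Python's standard csv module, not the workspace's own parser; the function name preview_csv, its parameter names, the default null values and the generated column_N names are all assumptions made for illustration.

```python
import csv
import io


def preview_csv(raw_bytes, encoding="utf-8", delimiter=",", text_qualifier='"',
                null_values=("", "NULL"), has_header=True, max_rows=10):
    """Return (column_names, preview_rows) for raw CSV bytes.

    Hypothetical helper: it mirrors the New Dataset tab options
    (encoding, delimiter, text qualifier, null values, header row)
    but is not the workspace's implementation.
    """
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=text_qualifier)
    rows = [row for row in reader if row]  # skip completely blank lines
    if not rows:
        return [], []
    if has_header:
        columns, data = rows[0], rows[1:]
    else:
        # Assumed naming scheme for files without a header row.
        columns = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    # Map any configured null marker to None in the preview.
    preview = [[None if value in null_values else value for value in row]
               for row in data[:max_rows]]
    return columns, preview


if __name__ == "__main__":
    sample = b'name;age\n"Smith; J";42\nDoe;NULL\n'
    print(preview_csv(sample, delimiter=";"))
```

If the preview shows a single wide column or text split in unexpected places, the delimiter or text qualifier is probably wrong for that file.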
Check the data
The New Dataset tab lets you preview the first 10 rows of your new dataset. It also lets you set the following values for your dataset:
- Type (dropdown list)
- Primary key: columns which are suitable can be selected as the primary key on the table
- Include or exclude: use the checkbox beside the column name to add a column to your dataset or exclude it
Workspaces now enforce a Primary Key on all tables which are created using this method.
By default, a new column is added to all tables in the New Dataset tab called id; this column is an index which is the default Primary Key for any new table.
The Primary Key can easily be changed at this stage by selecting another column which can qualify as a Primary Key (e.g. it is numerical and has no duplicate values in the first 10GB).
Note, if another column is selected as the Primary Key, the auto-generated index column is no longer added to the table.
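To judge in advance whether a column is likely to qualify as a Primary Key, the criteria mentioned above (numerical, no duplicate values) can be tested on a sample of the column. This is a rough sketch only: the function name can_be_primary_key is hypothetical, and treating "numerical" as "whole number" and rejecting empty values are assumptions, since the exact rules the workspace applies are not described here.

```python
def can_be_primary_key(values):
    """Return True if every value is a whole number and none repeats.

    Assumptions (not confirmed by the workspace documentation):
    - "numerical" is interpreted as integer-valued text such as "42"
    - missing values (None or "") disqualify the column
    """
    seen = set()
    for value in values:
        if value is None:
            return False
        try:
            key = int(str(value))  # "1.5", "abc" and "" raise ValueError
        except ValueError:
            return False
        if key in seen:
            return False  # duplicate value found
        seen.add(key)
    return True


if __name__ == "__main__":
    print(can_be_primary_key(["101", "102", "103"]))  # unique integers
    print(can_be_primary_key(["101", "101", "103"]))  # contains a duplicate
```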
When you are happy with the data preview, select the confirm button. This closes the current tab and opens a new dataset preview tab.
New dataset is created
For larger datasets, you will see a progress bar in the Dataset preview. The dataset is not accessible while the progress bar is shown, but you can already add metadata to your dataset.
Once the dataset is fully uploaded, you can access your data as normal. You can also manually add metadata to the dataset using the Add Metadata function.
Conversion and error report
For all File to Dataset conversions, a report is generated. This report is accessible from the Summary tab as well as from the activity tab of any successfully created datasets.
If the workspace could not create the dataset, a report is generated and shown in the summary tab under the activity. This report contains a list of the errors encountered while trying to create the dataset, including a reference to the line or character whenever possible.
To read the report, simply click on the link in the summary tab or activity of the dataset. The report will open in a new tab.
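Some problems that lead to a failed conversion can be spotted before uploading. As an illustration of the kind of line-referenced error such a report might contain, the sketch below flags rows whose number of fields differs from the first row. The function name find_row_length_errors and the message wording are hypothetical; the actual report format and the checks the workspace performs are not specified here.

```python
import csv
import io


def find_row_length_errors(text, delimiter=","):
    """List rows whose field count differs from the first row.

    Hypothetical pre-check; it does not reproduce the workspace's report.
    Blank lines are reported too, since they parse as rows with 0 fields.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    errors = []
    expected = None
    for row in reader:
        if expected is None:
            expected = len(row)  # the first row defines the column count
            continue
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far.
            errors.append(
                f"line {reader.line_num}: expected {expected} fields, "
                f"found {len(row)}"
            )
    return errors


if __name__ == "__main__":
    for message in find_row_length_errors("a,b\n1,2\n3\n4,5,6\n"):
        print(message)
```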
I can't find the 'convert to dataset' button
This action is only available for files ending in .csv (or .CSV). Make sure that you are trying to make a conversion from a CSV file.
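When checking many file names at once, the rule above can be expressed as a simple test. This is a sketch with a hypothetical function name, is_convertible; it accepts only the two spellings named above (.csv and .CSV), and whether the workspace also accepts mixed-case extensions such as .Csv is not stated, so those are treated as not eligible here.

```python
def is_convertible(filename):
    """Return True if the file name ends in .csv or .CSV.

    Assumption: only these two spellings are accepted; mixed-case
    extensions are rejected in this sketch.
    """
    return filename.endswith((".csv", ".CSV"))


if __name__ == "__main__":
    for name in ["patients.csv", "PATIENTS.CSV", "patients.xlsx"]:
        print(name, is_convertible(name))
```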
My dataset is taking a long time to show up
Large datasets can take several minutes to be converted and created. This takes longer when you choose to exclude certain columns from your dataset or when multiple users are carrying out the action at once.