We've come up with a generic overview of a simple traffic light system for datasets held in the system.
The Red Category
User requirements:
• The uploader/author must provide basic metadata about the dataset and contact details, as well as a copy of the data itself. Institutions that have their own data repositories need only provide the metadata and contact details, together with a link to the appropriate area of the institutional repository. The data itself need not be presented in a readily reusable format.
FISHNet provides:
• Ability to have metadata for a given dataset indexed by our search tools in order to facilitate data discovery by FISHNet users.
Other users:
• Can search and browse for data and contact the owner for more information or a copy of the dataset.
Details: It is envisioned that a dataset will initially be uploaded into the Red category, where it will have basic metadata and the author/uploader's contact information. Requiring contact information should minimise the number of 'dead' objects in the system. Uploaders need not provide a dataset that is reusable by others, but must make themselves available for contact by others who are interested in their data. Other institutions with their own repositories, such as CEH or the Environment Agency, can also share metadata about their data objects in this category. This should facilitate data discovery whether the data is held in the FISHNet repository or elsewhere.
The Orange Category
User requirements:
• The owner/author must complete more detailed metadata forms (the appropriate metadata schema will depend on the nature of the data) and make the dataset available in a reusable format for other users.
FISHNet provides:
• Tools to help make data reusable (spreadsheet templates being one possible example), guidance on required metadata, and appropriate workflow tools to aid the user. Once the data is in a reusable format with appropriate metadata, a DOI is added to make the data citable. FISHNet will also curate the data for the long term.
Other users:
• Can view the improved metadata. Only users given permission by the uploader/owner can download the dataset itself.
Details: It is expected that most data will stay in this category whilst being actively used by researchers. To be moved from the Red to the Orange category, the data will have to meet certain reuse and curation standards, so that it can be shared effectively with other researchers without the original author's help in interpreting it, and can be maintained for the long term. This will require extra effort from the author/uploader of the data, so there must be some 'reward' for making that effort. The suggested return on their time is a DOI, so that the dataset becomes directly citable by others. Our experience from talking to the FISHNet participants and other researchers is that some researchers will still be reluctant to share their data even if it is citable, as they may want to be involved in its further use, or may wish to continue working on it and publishing research on it, either alone or with selected colleagues. For this reason, data in the Orange category will only be made available to registered users of the system selected by the data uploader/author (there may also be a 'share with the public' option for those who wish to use it).
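The Orange access rule described above can be sketched in a few lines of Python. This is only an illustration, not FISHNet's implementation: the `Dataset` class, its field names and the `can_download` function are all hypothetical, and the sketch assumes that Red data is never downloaded through the system (interested users contact the owner instead).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Category(Enum):
    RED = "red"
    ORANGE = "orange"
    GREEN = "green"


@dataclass
class Dataset:
    # All field names here are hypothetical, chosen for illustration only.
    title: str
    owner: str
    category: Category
    # Registered users the owner has chosen to share with (Orange only).
    permitted_users: Set[str] = field(default_factory=set)
    # The optional "share with the public" switch suggested for Orange data.
    public: bool = False


def can_download(dataset: Dataset, user: Optional[str] = None) -> bool:
    """Return True if `user` (None for an anonymous visitor) may download the data."""
    if dataset.category is Category.RED:
        # Assumption: Red exposes metadata and contact details only.
        return False
    if dataset.category is Category.GREEN:
        # Green data is freely available to everyone.
        return True
    # Orange: the owner, users the owner selected, or anyone if shared publicly.
    return (
        dataset.public
        or user == dataset.owner
        or user in dataset.permitted_users
    )
```

For example, an Orange dataset owned by `alice` and shared with `bob` would be downloadable by both of them, but not by an anonymous visitor unless `public` is set.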
The Green Category
User requirements:
• The owner/author must make the dataset freely available to the public under an appropriate licence (expected to be the Creative Commons Zero (CC0) licence). Subject to technical development of the project, they may also wish to help in mapping the dataset to an RDF triple store so that it can be stored as linked data and semantically queried.
FISHNet provides:
• Tools to map data to an RDF Triple Store, long-term curation of the data, tools to integrate the data with other datasets and reuse it in different ways.
Other users:
• Can view the dataset freely and download it in its original format. Can (possibly, dependent on technology available) query the data semantically and derive new datasets from it and other datasets stored in the system.
Details: Datasets in the Green category are made freely available to the public in a reusable format, unencumbered by licensing and IPR issues. They may also be mapped to a triple store and made available as Linked Data. To make this Linked Data as useful as possible, the underlying datasets must be released to the public under as permissive a licence as possible, namely CC0. This prevents problems of attribution and rights stacking in successive generations of derived datasets. The provenance of the data remains important, however, so DOIs assigned in the Orange category will be used to record provenance. It is envisioned that most researchers will only wish to migrate their data from the Orange to the Green category once they have effectively finished working on it and are happy for it to be released (currently, this is the stage at which many researchers put their data on a CD and store it in a drawer somewhere).
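The progression from Red to Orange to Green can be read as a series of checks on a dataset record. The following is a minimal sketch of that reading, not a description of the actual system: the `Record` fields and both functions are hypothetical, and requiring a DOI before the move to Green is an assumption based on the DOI being assigned in the Orange category.

```python
from dataclasses import dataclass
from typing import List, Optional

ORDER = ["red", "orange", "green"]


@dataclass
class Record:
    # Hypothetical fields, for illustration only.
    category: str = "red"
    detailed_metadata: bool = False
    reusable_format: bool = False
    doi: Optional[str] = None
    licence: Optional[str] = None


def missing_for_promotion(record: Record) -> List[str]:
    """List what still blocks a move from the record's category to the next one."""
    missing = []
    if record.category == "red":
        # Red -> Orange: fuller metadata and a reusable format.
        if not record.detailed_metadata:
            missing.append("detailed metadata")
        if not record.reusable_format:
            missing.append("reusable format")
    elif record.category == "orange":
        # Orange -> Green: a DOI for provenance (assumption) and the CC0 licence.
        if record.doi is None:
            missing.append("DOI")
        if record.licence != "CC0":
            missing.append("CC0 licence")
    return missing


def promote(record: Record) -> Record:
    """Move the record up one category, or raise ValueError if it is not ready."""
    if record.category == "green":
        raise ValueError("green is the final category")
    missing = missing_for_promotion(record)
    if missing:
        raise ValueError("cannot promote, missing: " + ", ".join(missing))
    record.category = ORDER[ORDER.index(record.category) + 1]
    return record
```

Returning the list of missing items, rather than a bare yes/no, reflects the idea that the system should guide uploaders towards the next category.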
The table below summarises the traffic light system.
| | Red | Orange | Green |
| --- | --- | --- | --- |
| Basic metadata is available to the public, who can contact the author/uploader for more information | Yes | Yes | Yes |
| Data is in a managed format that makes it reusable by others | | Yes | Yes |
| Data has an assigned DOI so that it is citable | | Yes | Yes |
| Data can be shared with other users (as determined by the uploader/owner) | | Yes | Yes |
| Data is freely available in full to all users of the system under the CC0 licence | | | Yes |
| Data may be mapped for entry into a triple store | | | Yes |
| Data may be available in the triple store for querying via SPARQL | | | Yes |
| Data can be consumed by tools built in FreshwaterLife, e.g. map-based visualisation tools | | | Yes |
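
The same summary can be held as a simple lookup structure, for example to drive which options a user interface shows for a dataset. The sketch below is illustrative only: the feature keys and the `features_for` helper are hypothetical names, and the assignment of features to categories follows the category descriptions given earlier.

```python
# Feature -> categories in which it applies (keys are hypothetical names).
FEATURES = {
    "public_metadata": {"red", "orange", "green"},
    "managed_format": {"orange", "green"},
    "doi": {"orange", "green"},
    "owner_controlled_sharing": {"orange", "green"},
    "cc0_open_access": {"green"},
    "triple_store_mapping": {"green"},
    "sparql_query": {"green"},
    "freshwaterlife_tools": {"green"},
}


def features_for(category: str) -> list:
    """Return the features that apply to a category, in table order."""
    return [name for name, cats in FEATURES.items() if category in cats]


if __name__ == "__main__":
    for cat in ("red", "orange", "green"):
        print(cat, features_for(cat))
```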