Vector Search

FAIR supports AI-backed vector search. It is disabled by default but can be enabled by hub owners. This article explains the differences between standard and vector search, and how to enable vector search on FAIR.

What is vector search?

Standard search returns results based on string similarity. For example, a user searching for datasets on Alzheimer's would likely have to use the search term 'Alzheimer's' to return results.

With vector search enabled, search also returns results based on semantic similarity. For example, the search term 'cognitive decline' will return results that include Alzheimer's disease datasets.
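The difference can be illustrated with a minimal sketch. It assumes that semantic similarity is measured as the cosine similarity between embedding vectors, which is a common approach but is not confirmed by this article. The three-dimensional vectors, the dataset titles and the substring check standing in for string similarity are all invented for illustration; a real embedding model produces much longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def string_match(query, title):
    """Crude stand-in for standard search: case-insensitive substring test."""
    return query.lower() in title.lower()

# Hand-written toy vectors, NOT output from a real embedding model.
toy_embeddings = {
    "cognitive decline": [0.9, 0.8, 0.1],
    "Alzheimer's disease cohort": [0.8, 0.9, 0.2],
    "Road traffic counts": [0.1, 0.0, 0.9],
}

query = "cognitive decline"
titles = ["Road traffic counts", "Alzheimer's disease cohort"]

# Standard search: neither title contains the query text.
string_hits = [t for t in titles if string_match(query, t)]

# Vector search: rank titles by closeness of meaning to the query.
vector_ranked = sorted(
    titles,
    key=lambda t: cosine_similarity(toy_embeddings[query], toy_embeddings[t]),
    reverse=True,
)

print("String matches:", string_hits)
print("Vector ranking:", vector_ranked)
```

In this sketch the string comparison finds nothing, while the vector comparison places the Alzheimer's dataset first because its toy vector points in nearly the same direction as the query's.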

Where vector search is enabled, the dataset title and description are indexed by default. Keywords are also indexed in the standard search configuration.

At present, dictionaries and lookup values are not indexed for vector search. Details on how hub owners can index specific catalogue fields for vector search are available here.

Using vector search requires indexed dataset metadata to be sent to external Azure services. These are hosted in the same region as the DRE hub but are not controlled by Aridhia. Vector search is enabled on a per-hub basis, not a per-dataset basis.

We recognise that not all hub owners will be comfortable with this, which is why it is disabled by default.

How to enable vector search on FAIR

Given these sensitivities, enabling vector search in FAIR is a multi-step process, detailed below.

  1. Contact the Aridhia service desk. They can enable the vector search service on your hub. This is a necessary first step.

  2. Update search configuration. The default FAIR search configuration contains a pre-set weighting for vector search, and for all required metadata elements. These pre-set weights are retained by custom search configurations, but can be modified. Instructions for applying a weight to specific catalogue fields and adding them to the vector index are available here.

  3. Add vector search permission and search weighting permission to required roles. None of the standard roles in FAIR have vector search or search weightings enabled by default. To use vector search, users need to be given a role that has both the 'Users can perform AI Enhanced Searches [Preview]' and 'Users can perform Searches with custom field weighting (affects search rankings) [Preview]' permissions enabled.
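The permission requirement in step 3 can be expressed as a simple check. This is only an illustrative sketch: the role names and the dictionary of roles are hypothetical, and it does not call any FAIR API. Only the two permission labels are taken from the step above.

```python
# Permission labels as quoted in step 3.
AI_SEARCH = "Users can perform AI Enhanced Searches [Preview]"
CUSTOM_WEIGHTING = (
    "Users can perform Searches with custom field weighting "
    "(affects search rankings) [Preview]"
)
REQUIRED_PERMISSIONS = {AI_SEARCH, CUSTOM_WEIGHTING}

def missing_permissions(role_permissions):
    """Return the required permissions that a role does not yet have."""
    return sorted(REQUIRED_PERMISSIONS - set(role_permissions))

# Hypothetical roles, invented for this example.
roles = {
    "Example standard role": set(),
    "Example AI search role": {AI_SEARCH, CUSTOM_WEIGHTING},
}

for name, permissions in roles.items():
    missing = missing_permissions(permissions)
    status = "ready for vector search" if not missing else f"missing: {missing}"
    print(f"{name}: {status}")
```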

Explaining search weighting

Field weighting

This is the search weighting given to individual metadata elements e.g. the dataset title.

Azure search allows users to add weights to any indexed metadata field. Matches on a field with a higher weighting are ranked higher in search results than matches on a field with a lower weighting.

Our current default configuration gives the highest weight to the dataset title, followed by the catalogue description, dictionaries and lookups. Our standard catalogue also gives a strong weighting to the keywords catalogue field. However, as this is not a mandatory metadata element, it will not be applied to custom catalogue templates by default.
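As a rough illustration of how field weighting affects ranking, the sketch below scores each dataset by adding up the weights of the fields that contain the search term. The weight values are invented to mirror the ordering described above and are not FAIR's actual defaults. The scoring function is a simplification, not the algorithm used by the search service.

```python
# Hypothetical weights: only their relative order reflects the text above.
FIELD_WEIGHTS = {
    "title": 3.0,
    "description": 2.0,
    "dictionaries": 1.0,
    "lookups": 1.0,
}

def weighted_score(record, term, weights=FIELD_WEIGHTS):
    """Sum the weights of every weighted field whose text contains the term."""
    term = term.lower()
    return sum(
        weight
        for field, weight in weights.items()
        if term in record.get(field, "").lower()
    )

# Invented example datasets.
datasets = [
    {"name": "B", "title": "Imaging archive", "description": "Includes dementia patients"},
    {"name": "A", "title": "Dementia imaging study", "description": "MRI scans"},
]

ranked = sorted(datasets, key=lambda d: weighted_score(d, "dementia"), reverse=True)
for d in ranked:
    print(d["name"], weighted_score(d, "dementia"))
```

Dataset A matches on the title and dataset B only on the description, so A is ranked first even though both contain the search term.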

Vector weighting

This is the overall weighting given to matches returned by vector search versus those returned by the standard string similarity searches.

String similarity searches are always weighted at 1.0. By default, vector searches are weighted at 0.7, meaning that results returned only by vector search will be ranked lower than those that have a match on string similarity.
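The effect of the 0.7 weighting can be sketched as follows. The example assumes that the string and vector result lists are merged using a weighted Reciprocal Rank Fusion (RRF), which is how Azure AI Search documents its hybrid ranking. Whether FAIR's deployment combines results in exactly this way is an assumption, and the constant of 60 and the dataset names are illustrative only.

```python
from collections import defaultdict

RRF_K = 60  # commonly used RRF constant; assumed, not taken from FAIR

def fuse(string_ranked, vector_ranked, string_weight=1.0, vector_weight=0.7):
    """Merge two ranked lists of dataset IDs using weighted RRF scores."""
    scores = defaultdict(float)
    for rank, dataset_id in enumerate(string_ranked, start=1):
        scores[dataset_id] += string_weight / (RRF_K + rank)
    for rank, dataset_id in enumerate(vector_ranked, start=1):
        scores[dataset_id] += vector_weight / (RRF_K + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented result lists, best match first.
string_ranked = ["string_only_dataset", "matched_by_both"]
vector_ranked = ["vector_only_dataset", "matched_by_both"]

for dataset_id, score in fuse(string_ranked, vector_ranked):
    print(f"{dataset_id}: {score:.5f}")
```

Under these assumptions, a dataset found by both searches scores highest, and a dataset found only by vector search scores lower than one found only by string similarity at the same rank position.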

Updated on August 19, 2025
