# This file contains the metadata for the datasets (formats) which do not have a dedicated
# metadata facility. The MFD and HDF5 loaders use this file to determine the similarity function, among other things.
# (HDF5 metadata support is moot for us since the runtime support falls short in other ways.)
#
# Ideally, this metadata is part of the format and access layer for a given dataset format. This file exists because
# the dataset names herein are in a form which does _not_ support proper bundled configuration data with the raw data.
# When possible, these datasets should be provided through another mechanism which fully handles this aspect of dataset
# management so that we don't have to maintain separate parts in different places.
#
# You can put additional metadata here, but it will not be type-safe and reified properly unless there is an accompanying
# change in the DataSetProperties interface and associated implementations.
cohere-english-v3-100k:
  similarity_function: COSINE
  # examples of supported properties
  # If not present, presumed to be false
  # is_normalized: false
  # is_zero_vector_free: false
  # is_duplicate_vector_free: false
ada002-100k:
  similarity_function: COSINE
openai-v3-small-100k:
  similarity_function: COSINE
gecko-100k:
  similarity_function: COSINE
openai-v3-large-3072-100k:
  similarity_function: COSINE
openai-v3-large-1536-100k:
  similarity_function: COSINE
e5-small-v2-100k:
  similarity_function: COSINE
e5-base-v2-100k:
  similarity_function: COSINE
e5-large-v2-100k:
  similarity_function: COSINE
ada002-1M:
  similarity_function: COSINE
colbert-1M:
  similarity_function: COSINE
cap-1M:
  similarity_function: COSINE
cap-6M:
  similarity_function: COSINE
dpr-1M:
  similarity_function: COSINE
dpr-10M:
  similarity_function: COSINE
cohere-english-v3-1M:
  similarity_function: COSINE
cohere-english-v3-10M:
  similarity_function: COSINE
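#
# A minimal sketch of a fully specified entry, assuming the optional properties are written as
# plain booleans nested under the dataset name. The dataset name below is hypothetical, and the
# property values are illustrative only. The block is left commented out so that it does not
# register a dataset that does not exist.
#
# example-dataset-100k:
#   similarity_function: COSINE
#   is_normalized: true
#   is_zero_vector_free: true
#   is_duplicate_vector_free: true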