Indices
bocoel.Index
Index(*args: Any, **kwargs: Any)
Bases: Protocol
Index is responsible for fast retrieval given a vector query.
Source code in bocoel/corpora/indices/interfaces/indices.py
_embeddings abstractmethod property
_embeddings: NDArray | IndexedArray
The embeddings used by the index.
dims property
dims: int
The number of dimensions that the query vector should be.
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
_search abstractmethod
_search(query: NDArray, k: int = 1) -> InternalResult
Search the index with a given query.
Parameters
query: NDArray
The query vector. Must be of shape [dims].
k: int
The number of nearest neighbors to return.
Returns
A numpy array of shape [k]. This corresponds to the indices of the nearest neighbors.
Source code in bocoel/corpora/indices/interfaces/indices.py
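To make the split between search and _search concrete, here is a minimal sketch of a class that follows the documented shapes. It is not bocoel's implementation: BruteForceIndex, the specific checks in search, the squared-L2 scan, and the plain dict and tuple return values (standing in for SearchResultBatch and InternalResult) are all assumptions made for illustration.

```python
import numpy as np


class BruteForceIndex:
    """Hypothetical stand-in that mirrors the documented interface; not part of bocoel."""

    def __init__(self, embeddings):
        self._embeddings = np.asarray(embeddings, dtype=float)  # [n, dims]

    @property
    def dims(self) -> int:
        # Number of dimensions every query vector must have.
        return self._embeddings.shape[1]

    def search(self, query, k: int = 1) -> dict:
        # Assumed checks: the query is a [batch, dims] array and k is usable.
        query = np.asarray(query, dtype=float)
        if query.ndim != 2 or query.shape[1] != self.dims:
            raise ValueError(f"query must have shape [batch, {self.dims}], got {query.shape}")
        if not 1 <= k <= len(self._embeddings):
            raise ValueError(f"k must be between 1 and {len(self._embeddings)}, got {k}")

        # Delegate one [dims] vector at a time, then stack into batched arrays.
        per_query = [self._search(vector, k) for vector in query]
        distances = np.stack([d for d, _ in per_query])  # [batch, k]
        indices = np.stack([i for _, i in per_query])    # [batch, k]
        return {
            "query": query,                         # [batch, dims]
            "vectors": self._embeddings[indices],   # [batch, k, dims]
            "distances": distances,
            "indices": indices,
        }

    def _search(self, query, k: int = 1):
        # Exhaustive squared-L2 scan over all embeddings for a single [dims] query.
        dist = ((self._embeddings - query) ** 2).sum(axis=1)  # [n]
        nearest = np.argsort(dist)[:k]                        # [k]
        return dist[nearest], nearest


embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
index = BruteForceIndex(embeddings)
result = index.search([[0.9, 0.1]], k=2)
print(result["indices"], result["vectors"].shape)
```

The public method owns validation and batching, so a backend only has to supply the nearest-neighbour lookup itself.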
bocoel.HnswlibIndex
HnswlibIndex(
embeddings: NDArray,
distance: str | Distance,
threads: int = -1,
batch_size: int = 64,
)
Bases: Index
HNSWLIB index. Uses the hnswlib library.
Note that hnswlib calculates scores slightly differently; see https://github.com/nmslib/hnswlib#supported-distances for the supported distances.
Source code in bocoel/corpora/indices/backend/hnswlib.py
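For reference, the three distances listed in the hnswlib README can be written out in a few lines of plain Python. The function names below are hypothetical, and how bocoel's Distance values map onto these spaces is not stated here, so treat this as a sketch of the formulas only.

```python
import math


def hnswlib_l2(a, b):
    # 'l2' space: squared Euclidean distance (no square root is taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hnswlib_ip(a, b):
    # 'ip' space: one minus the inner product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def hnswlib_cosine(a, b):
    # 'cosine' space: one minus the cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norm


a, b = [1.0, 0.0], [0.0, 2.0]
print(hnswlib_l2(a, b), hnswlib_ip(a, b), hnswlib_cosine(a, b))
```

In all three spaces a smaller value means a closer match, which is why the inner product and cosine similarity are subtracted from one.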
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
bocoel.FaissIndex
FaissIndex(
embeddings: NDArray,
distance: str | Distance,
index_string: str,
cuda: bool = False,
batch_size: int = 64,
)
Bases: Index
Faiss index. Uses the faiss library.
Source code in bocoel/corpora/indices/backend/faiss.py
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
bocoel.WhiteningIndex
WhiteningIndex(
embeddings: NDArray,
distance: str | Distance,
reduced: int,
whitening_backend: type[Index],
**backend_kwargs: Any
)
Bases: Index
Whitening index. Whitens the data before indexing. See https://arxiv.org/abs/2103.15316 for more info.
Source code in bocoel/corpora/indices/whitening.py
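The whitening transform from the linked paper can be sketched with NumPy as follows. The helper names fit_whitening and whiten are hypothetical, and this is not taken from bocoel's source; it only illustrates what "whitening before indexing" with a reduced number of dimensions typically involves.

```python
import numpy as np


def fit_whitening(embeddings: np.ndarray, reduced: int):
    # Mean and covariance of the corpus embeddings.
    mu = embeddings.mean(axis=0, keepdims=True)        # [1, dims]
    centered = embeddings - mu
    cov = centered.T @ centered / len(embeddings)      # [dims, dims]

    # SVD of the covariance: cov = U diag(s) U^T.
    u, s, _ = np.linalg.svd(cov)

    # W = U diag(1 / sqrt(s)); keeping the first `reduced` columns
    # also reduces the dimensionality.
    w = u @ np.diag(1.0 / np.sqrt(s))
    return mu, w[:, :reduced]                          # [1, dims], [dims, reduced]


def whiten(vectors: np.ndarray, mu: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Apply the same shift and projection to corpus vectors and queries alike.
    return (vectors - mu) @ w


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
mu, w = fit_whitening(embeddings, reduced=4)
whitened = whiten(embeddings, mu, w)
print(whitened.shape)
```

After the transform the retained dimensions have zero mean and identity covariance, so queries must be passed through the same mu and w before they are compared against the whitened corpus.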
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
bocoel.PolarIndex
PolarIndex(
embeddings: NDArray,
distance: str | Distance,
polar_backend: type[Index],
**backend_kwargs: Any
)
Bases: Index
Index that uses N-sphere coordinates as its interface. See https://en.wikipedia.org/wiki/N-sphere#Spherical_coordinates for the coordinate system.
Source code in bocoel/corpora/indices/polar.py
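As a reminder of the coordinate system involved, the sketch below converts N-sphere coordinates (a radius and a list of angles) to Cartesian coordinates using the convention from the linked Wikipedia article. The function name is hypothetical, and whether PolarIndex uses exactly this convention or ordering of angles is an assumption.

```python
import math


def spherical_to_cartesian(r: float, angles: list[float]) -> list[float]:
    # x_1 = r cos(phi_1)
    # x_i = r sin(phi_1) ... sin(phi_{i-1}) cos(phi_i)
    # x_n = r sin(phi_1) ... sin(phi_{n-1})
    coords = []
    sin_prod = 1.0
    for phi in angles:
        coords.append(r * sin_prod * math.cos(phi))
        sin_prod *= math.sin(phi)
    coords.append(r * sin_prod)
    return coords


point = spherical_to_cartesian(2.0, [0.3, 1.1, 2.5])
radius = math.sqrt(sum(x * x for x in point))
print(len(point), radius)
```

A point described by n - 1 angles lives in n Cartesian dimensions, and its Euclidean norm is always the radius r.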
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
bocoel.StatefulIndex
StatefulIndex(index: Index)
Bases: Index
An index that tracks states.
Source code in bocoel/corpora/indices/stateful.py
dims property
dims: int
The number of dimensions that the query vector should be.
history property
history: Sequence[SearchResult]
History for looking up the results of previous searches with index handles.
search
search(query: ArrayLike, k: int = 1) -> SearchResultBatch
Calls the search function and performs some checks.
Parameters
query: ArrayLike
The query vector. Must be of shape [batch, dims].
k: int
The number of nearest neighbors to return.
Returns
A SearchResultBatch instance. See SearchResultBatch for details.
Source code in bocoel/corpora/indices/interfaces/indices.py
stateful_search
stateful_search(query: ArrayLike, k: int = 1) -> Mapping[int, SearchResult]
Search while tracking states.
Parameters
query: ArrayLike
The query to search for.
k: int = 1
The number of nearest neighbors to return.
Returns
Mapping[int, SearchResult]
A mapping from the index of each search result to the search result itself. The index acts as a handle: use it to retrieve the result later by indexing the history property of this object.
Source code in bocoel/corpora/indices/stateful.py
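The handle-based bookkeeping described for stateful_search can be illustrated without any index at all. HistoryTracker below is a hypothetical helper, not bocoel's StatefulIndex, and it stores plain dicts where the real class stores SearchResult objects.

```python
from collections.abc import Mapping, Sequence


class HistoryTracker:
    """Hypothetical helper illustrating handle-based history; not bocoel's StatefulIndex."""

    def __init__(self) -> None:
        self._history: list[dict] = []

    @property
    def history(self) -> Sequence[dict]:
        # Every result recorded so far, in insertion order.
        return self._history

    def record(self, results: list[dict]) -> Mapping[int, dict]:
        # Each result is appended to the history; its position doubles as its handle.
        start = len(self._history)
        self._history.extend(results)
        return {start + offset: result for offset, result in enumerate(results)}


tracker = HistoryTracker()
first = tracker.record([{"indices": [3]}, {"indices": [7]}])
second = tracker.record([{"indices": [1]}])

# Handles stay valid across calls and index directly into the history.
handle = next(iter(second))
print(handle, tracker.history[handle])
```

Because handles are just positions in an append-only sequence, results from earlier searches remain retrievable after later ones.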
bocoel.Boundary dataclass
The boundary of embeddings in a corpus. The boundary is defined as a hyperrectangle in the embedding space.
bounds instance-attribute
bounds: NDArray
The boundary arrays of the corpus. Must be of shape [dims, 2], where dims is the number of dimensions. The first column is the lower bound, the second column is the upper bound.
dims property
dims: int
The number of dimensions.
lower property
lower: NDArray
The lower bounds. Must be of shape [dims].
upper property
upper: NDArray
The upper bounds. Must be of shape [dims].
fixed classmethod
fixed(lower: float, upper: float, dims: int) -> Self
Create a boundary with fixed bounds. If lower > upper, a ValueError is raised.
Parameters
lower: float
The lower bound.
upper: float
The upper bound.
dims: int
The number of dimensions.
Returns
A Boundary instance.
Source code in bocoel/corpora/indices/interfaces/boundaries.py
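The documented attributes fit together as in the following sketch. BoundarySketch is a hypothetical stand-in that mirrors the description above (a [dims, 2] array with lower and upper columns and a fixed constructor); it is not copied from bocoel's Boundary.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BoundarySketch:
    """Hypothetical stand-in for the documented Boundary; not bocoel's class."""

    bounds: np.ndarray  # [dims, 2]: column 0 is lower, column 1 is upper

    @property
    def dims(self) -> int:
        return self.bounds.shape[0]

    @property
    def lower(self) -> np.ndarray:
        return self.bounds[:, 0]  # [dims]

    @property
    def upper(self) -> np.ndarray:
        return self.bounds[:, 1]  # [dims]

    @classmethod
    def fixed(cls, lower: float, upper: float, dims: int) -> "BoundarySketch":
        # The same interval is repeated for every dimension.
        if lower > upper:
            raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
        return cls(np.array([[lower, upper]] * dims, dtype=float))


box = BoundarySketch.fixed(-1.0, 1.0, dims=3)
print(box.dims, box.lower, box.upper)
```

The result is the hyperrectangle [-1, 1]^3, with each row of bounds describing one axis.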
bocoel.Distance
Bases: StrEnum
bocoel.corpora.indices.interfaces.results._SearchResult dataclass
query instance-attribute
query: NDArray
Query vector. Has shape [batch, dims] if batched, otherwise [dims].
vectors instance-attribute
vectors: NDArray
Nearest neighbors. Has shape [batch, k, dims] if batched, otherwise [k, dims].
distances instance-attribute
distances: NDArray
Calculated distances. Has shape [batch, k] if batched, otherwise [k].
indices instance-attribute
indices: NDArray
Indices into the original embeddings. Must be integers. Has shape [batch, k] if batched, otherwise [k].
bocoel.corpora.SearchResultBatch dataclass
Bases: _SearchResult
A batched version of search result.
query instance-attribute
query: NDArray
Query vector. Has shape [batch, dims] if batched, otherwise [dims].
vectors instance-attribute
vectors: NDArray
Nearest neighbors. Has shape [batch, k, dims] if batched, otherwise [k, dims].
distances instance-attribute
distances: NDArray
Calculated distances. Has shape [batch, k] if batched, otherwise [k].
indices instance-attribute
indices: NDArray
Indices into the original embeddings. Must be integers. Has shape [batch, k] if batched, otherwise [k].
bocoel.corpora.SearchResult dataclass
Bases: _SearchResult
A non-batched version of search result.
query instance-attribute
query: NDArray
Query vector. Has shape [batch, dims] if batched, otherwise [dims].
vectors instance-attribute
vectors: NDArray
Nearest neighbors. Has shape [batch, k, dims] if batched, otherwise [k, dims].
distances instance-attribute
distances: NDArray
Calculated distances. Has shape [batch, k] if batched, otherwise [k].
indices instance-attribute
indices: NDArray
Indices into the original embeddings. Must be integers. Has shape [batch, k] if batched, otherwise [k].