Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support basic searcher #351

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

support basic searcher #351

wants to merge 1 commit into from

Conversation

ShawnShawnYou
Copy link
Collaborator

public:
InnerIdType total_count_{0};
InnerIdType max_capacity_{1000000};
uint32_t code_size_{0};
uint32_t prefetch_neighbor_codes_num_{1};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why flatten datacell need neighbor

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used for batch query

template <typename T, typename U>
bool
operator==(const AllocatorWrapper<T>&, const AllocatorWrapper<U>&) noexcept {
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always true?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

private:
std::shared_ptr<hnswlib::HierarchicalNSW> alg_hnsw_;
};
} // namespace vsag
Copy link
Collaborator

@LHT129 LHT129 Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used on test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -70,6 +70,11 @@ class FlattenDataCell : public FlattenInterface {
io_->Prefetch(id * code_size_);
};

bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

virtual void
SetRuntimeParameters(const UnorderedMap<std::string, ParamValue>& new_params) {
// TODO(ZXY): implement
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not use

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


template <typename GraphTmpl, typename VectorDataTmpl>
BasicSearcher<GraphTmpl, VectorDataTmpl>::BasicSearcher(std::shared_ptr<GraphTmpl> graph,
std::shared_ptr<VectorDataTmpl> vector,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use interface is better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


#ifdef USE_SSE
for (uint32_t i = 0; i < prefetch_neighbor_visit_num_; i++) {
_mm_prefetch(vl->mass + neighbors[i], _MM_HINT_T0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Prefetch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (uint32_t i = 0; i < neighbors.size(); i++) {
#ifdef USE_SSE
if (i + prefetch_neighbor_visit_num_ < neighbors.size()) {
_mm_prefetch(vl->mass + neighbors[i + prefetch_neighbor_visit_num_], _MM_HINT_T0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (vl->mass[neighbors[i]] != vl->curV) {
to_be_visited_rid[count_no_visited] = i;
to_be_visited_id[count_no_visited] = neighbors[i];
count_no_visited++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes we don't need to_be_visited

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for graph searcher, it's required


BasicSearcher::BasicSearcher(GraphInterfacePtr graph,
FlattenInterfacePtr vector,
const IndexCommonParam& common_param) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const GraphInterfacePtr& graph
const FlattenInterfacePtr& vector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified to graph & flat interface

#include <shared_mutex>

#include "../utils.h"
#include "ThreadPool.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused header files

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const IndexCommonParam& common_param);

virtual MaxHeap
Search(const float* query, InnerSearchParam& inner_search_param) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we search different graph using one Searcher

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified, use SetGraphAndVector to allow search with different graph and vector

#include <string>
#include <thread>
#include <variant>
#include <vector>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#include "fixtures.h"
#include "io/memory_io.h"
#include "quantization/fp32_quantizer.h"
#include "safe_allocator.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

REQUIRE(result.top().second == valid_result.top().second);
REQUIRE(result.top().second == ids[i]);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

int cur_{0};
bool is_end_{false};
};
} // namespace vsag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

uint32_t prefetch_neighbor_visit_num_{1};
};

} // namespace vsag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

MaxHeap
BasicSearcher::Search(const float* query, InnerSearchParam& inner_search_param) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const InnerSearchParam& inner_search_param

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -226,7 +226,18 @@ FlattenDataCell<QuantTmpl, IOTmpl>::query(float* result_dists,
const std::shared_ptr<Computer<QuantTmpl>>& computer,
const InnerIdType* idx,
InnerIdType id_count) {
for (uint32_t i = 0; i < this->prefetch_neighbor_codes_num_ and i < id_count; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefetch_neighbor_codes_num_ is always 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be further optimized by optimizer

@ShawnShawnYou ShawnShawnYou force-pushed the support-searcher branch 2 times, most recently from 8af23a1 to 6450d83 Compare January 20, 2025 15:53
Copy link

codecov bot commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.

@@            Coverage Diff             @@
##             main     #351      +/-   ##
==========================================
- Coverage   90.28%   88.02%   -2.27%     
==========================================
  Files         118      135      +17     
  Lines        7372     8693    +1321     
==========================================
+ Hits         6656     7652     +996     
- Misses        716     1041     +325     
Flag Coverage Δ
cpp 88.02% <98.87%> (-2.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 88.77% <ø> (+2.12%) ⬆️
datacell 91.31% <100.00%> (+1.13%) ⬆️
index 91.14% <100.00%> (+0.42%) ⬆️
simd 69.28% <ø> (-30.72%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04283a8...cb903ef. Read the comment docs.


void
BasicSearcher::SetGraphAndVector(GraphInterfacePtr graph_data_cell,
FlattenInterfacePtr vector_data_cell) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split two function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

std::pair<float, uint64_t>& current_node_pair,
Vector<InnerIdType>& to_be_visited_rid,
Vector<InnerIdType>& to_be_visited_id) const {
// to_be_visited_rid is used in redundant storage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant not used now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment for distinguish to_be_visited_rid and to_be_visited_id in basic_search.h

}

void
BasicSearcher::SetGraphAndVector(GraphInterfacePtr graph_data_cell,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use interface name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

uint32_t hops = 0;
uint32_t dist_cmp = 0;
uint32_t count_no_visited = 0;
Vector<InnerIdType> to_be_visited_rid(graph_data_cell_->MaximumDegree(), allocator_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: zhongxiaoyao.zxy <[email protected]>
@ShawnShawnYou ShawnShawnYou self-assigned this Jan 21, 2025
Copy link
Collaborator

@inabao inabao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants