
A kind request to update the results + some potential issues with the measurement #9

Open
eyalz800 opened this issue Aug 21, 2022 · 1 comment

Comments

eyalz800 (Contributor) commented Aug 21, 2022

Hi,

So I finally had some time to read the benchmark code, and I wanted to get your opinion on some issues I think I found. Maybe you can help me understand the intended behavior and correct me if I'm wrong:

  1. In the Flatbuffers fbs.h, the construction of the graph appears to be measured as part of serialization:
  void serialize() {
    auto const edges = utl::to_map(GRAPH.edges_, [&](auto&& e) {
      return std::pair{e.get(), graph::CreateEdge(fbb_, e->from_->id_,
                                                  e->to_->id_, e->weight_)};
    });
    fbb_.Finish(graph::CreateGraph(
        fbb_, fbb_.CreateVector(utl::to_vec(GRAPH.nodes_, [&](auto&& n) {
          return graph::CreateNode(
              fbb_, n->id_, fbb_.CreateString(n->name_),
              fbb_.CreateVector(utl::to_vec(
                  n->out_edges_, [&](auto&& e) { return edges.at(e); })),
              fbb_.CreateVector(utl::to_vec(
                  n->in_edges_, [&](auto&& e) { return edges.at(e); })));
        }))));
  }
  2. The same is true for Cap'n Proto:
  void serialize() {
    capnp::MallocMessageBuilder message;
    auto graph = message.initRoot<Graph>();
    auto nodes = graph.initNodes(GRAPH.nodes_.size());

    auto node_idx = 0u;
    for (auto const& n : GRAPH.nodes_) {
      auto node = nodes[node_idx++];

      node.setId(n->id_);
      node.setName(n->name_);

      auto edge_idx = 0u;
      auto in_edges = node.initInEdges(n->in_edges_.size());
      for (auto const& e : n->in_edges_) {
        auto edge = in_edges[edge_idx++];
        edge.setFrom(e->from_->id_);
        edge.setTo(e->to_->id_);
        edge.setWeight(e->weight_);
      }

      edge_idx = 0;
      auto out_edges = node.initOutEdges(n->out_edges_.size());
      for (auto const& e : n->out_edges_) {
        auto edge = out_edges[edge_idx++];
        edge.setFrom(e->from_->id_);
        edge.setTo(e->to_->id_);
        edge.setWeight(e->weight_);
      }
    }

    serialized_ = capnp::messageToFlatArray(message);
  }

Most other libraries, including cista itself, do not measure this step; only the serialization itself is timed. I know serialization is supposed to be zero-copy for some of the libraries, including cista, but wouldn't it make more sense to include construction time for all serializers (both zero-copy and non-zero-copy)?
Can you help me understand the benchmark's approach here?
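To make the distinction concrete, one way to separate the two costs is to time construction and serialization independently. This is an illustrative sketch only; build_graph and serialize_graph are hypothetical stand-ins, not functions from the benchmark:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the "construction" phase: build the
// in-memory representation (here just a vector of ids).
std::vector<std::uint32_t> build_graph(std::size_t n) {
  std::vector<std::uint32_t> g(n);
  for (std::size_t i = 0; i < n; ++i) {
    g[i] = static_cast<std::uint32_t>(i);
  }
  return g;
}

// Hypothetical stand-in for the "serialization" phase: copy the
// in-memory representation into a flat byte buffer.
std::vector<std::uint8_t> serialize_graph(std::vector<std::uint32_t> const& g) {
  auto const* p = reinterpret_cast<std::uint8_t const*>(g.data());
  return std::vector<std::uint8_t>{p, p + g.size() * sizeof(std::uint32_t)};
}

// Time a single phase in milliseconds, so construction cost is not
// silently attributed to serialization (or vice versa).
template <typename F>
double time_ms(F&& f) {
  auto const start = std::chrono::steady_clock::now();
  f();
  auto const stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Reporting the two timings side by side would make it explicit which frameworks pay a conversion cost before serializing.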

felixguendling (Owner) commented Aug 22, 2022

This has to do with the use case we're using cista for. We're creating snapshots of large graph data structures combined with meta data and lookup data structures (hash maps, etc.). Our use case is MOTIS, the intermodal timetable information system. MOTIS listens to update data streams to update the in-memory representation of the timetable to include delays, reroutings, etc. With cista it is possible to

  • serialize/snapshot/deep-copy the data structures that are currently in use, directly from the actual in-memory data structures we're also using for routing. We use cista::vector<T> as a drop-in replacement for std::vector<T>, which means we can use modifying non-const functions such as emplace_back directly on the data structures we also serialize. This implies that there is no need to call functions such as the Flatbuffers-generated graph::CreateGraph. The graph is already there and ready to be written to disk in this form. Thus, the graph::CreateGraph functions are extra overhead we don't have with cista, since the cista::vector<T> is also what we use for normal operations: routing, updating the data, etc.
  • With cista, data structures that were read from an on-disk snapshot can be used to start operations (such as routing or updating the graph) directly. It's truly zero-copy in the sense that all data structures in the zero-copy deserialized buffer can be written/modified (e.g. push_back on a vector, which requires re-allocation). This is enabled by a flag we call self_allocated_, which is set to false before serialization. So when deserializing, the vector<T> (as an example) knows that it should not call free, since the address where the data is stored is inside the serialized buffer. However, if you call push_back() on the vector<T>, it will allocate memory on the heap and set self_allocated_ to true (so it frees that memory if it needs to be resized again). With Flatbuffers and Cap'n Proto you can directly start reading the data zero-copy, whereas with cista you can read and write the serialized snapshot zero-copy, which is a big advantage for our use case. Since we need hash maps for fast lookups, cista even has a zero-copy (de-)serializable hash map/set that you can directly start using (including modifications that require new allocations) in a zero-copy fashion.
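A minimal sketch of the self_allocated_ idea (this is NOT cista's actual implementation, just an illustration under the assumptions described above): storage initially points into the snapshot buffer, and the first growing mutation copies the data to the heap and flips the flag so only self-allocated memory is ever freed.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Toy vector over an externally owned serialized buffer.
struct int_vec {
  std::uint32_t* data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
  bool self_allocated_ = false;  // false => data_ points into the snapshot

  ~int_vec() {
    if (self_allocated_) {
      std::free(data_);  // never free memory inside the snapshot buffer
    }
  }

  // "Deserialize": point directly at the buffer, zero-copy.
  static int_vec from_buffer(std::uint32_t* buf, std::size_t n) {
    int_vec v;
    v.data_ = buf;
    v.size_ = v.capacity_ = n;
    v.self_allocated_ = false;
    return v;
  }

  void push_back(std::uint32_t x) {
    if (size_ == capacity_) {
      // First growth: copy out of the snapshot onto the heap.
      auto const new_cap = capacity_ == 0 ? 1 : capacity_ * 2;
      auto* n = static_cast<std::uint32_t*>(
          std::malloc(new_cap * sizeof(std::uint32_t)));
      std::memcpy(n, data_, size_ * sizeof(std::uint32_t));
      if (self_allocated_) {
        std::free(data_);
      }
      data_ = n;
      capacity_ = new_cap;
      self_allocated_ = true;  // from now on we own the storage
    }
    data_[size_++] = x;
  }
};
```

The snapshot buffer stays untouched; only the view that was mutated migrates to heap-owned storage.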

Of course every use case is different and for some use cases this is probably irrelevant. In the end, if you want a benchmark representing exactly your use case, you probably need to design a new one for every use case you have.

However, making a distinction between frameworks that let you use the data structures your algorithms need (like vector<T>) for serialization as well, as is the case with cereal, cista, etc., and those that require you to convert them into another format (like Flatbuffers, Cap'n Proto), makes sense for most use cases.

Being able to start using and modifying data from a serialized snapshot (enabled by the self_allocated_ flag approach) is critical for our use case. So to represent our use case, we would even need to transform the Cap'n Proto and Flatbuffers data structures into modifiable in-memory data structures such as vector<T>, etc. This is currently not reflected in the benchmark.

Maybe to reflect different use cases (like modifying deserialized data zero-copy, or just reading, etc.), different benchmarks would be required. But I currently don't have the time to design those benchmarks and implement them with every framework.

Edit: Building a hash map from serialized data can be costly, and with cista it is not necessary, which saves a lot of time. None of the other frameworks support that. This is another use case (important for us) that's missing from the benchmark.
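As a sketch of that cost (illustrative code, not from the benchmark): a framework without a serializable hash map has to rebuild its lookup index from flat key/value pairs after every deserialization, paying O(n) hashing and inserts on every load, whereas a zero-copy serializable hash map is usable immediately.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Rebuild a lookup index from the flat pair list a typical
// serialization format would give you after deserialization.
std::unordered_map<std::uint32_t, std::uint32_t> rebuild_index(
    std::vector<std::pair<std::uint32_t, std::uint32_t>> const& pairs) {
  std::unordered_map<std::uint32_t, std::uint32_t> m;
  m.reserve(pairs.size());
  for (auto const& [k, v] : pairs) {
    m.emplace(k, v);  // O(n) hashing + insertion on every load
  }
  return m;
}
```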
