-
A few thoughts:
-
A few more thoughts:
Come to think of it, do we even need to present the endpoint abstraction to our users? AFAIU they just want to schedule execution of a callback in the TX thread from a user thread. Let's give them just that:

```c
/**
 * Initializes the remote call subsystem.
 *
 * Must be called by each user thread that needs to invoke callbacks
 * in the tx thread. Under the hood, it creates a pipe to the "tx"
 * endpoint.
 *
 * [to be called in user threads]
 */
void
tnt_remote_init(void);

/**
 * Frees the remote call subsystem.
 *
 * Must be called at exit by each user thread that has initialized
 * the remote call subsystem. Under the hood, it destroys the pipe to
 * the "tx" endpoint created with tnt_remote_init().
 *
 * [to be called in user threads]
 */
void
tnt_remote_free(void);

/**
 * Sets the max number of messages to buffer in the user thread before
 * flushing them to the tx thread.
 *
 * The default value is INT_MAX (flush manually).
 *
 * See also tnt_remote_flush().
 *
 * [to be called in user threads]
 */
void
tnt_remote_set_backlog(int backlog);

/**
 * Schedules a callback to be invoked in the tx thread.
 *
 * The callback will be executed in a fiber taken from the tx fiber pool.
 * (This pool is also used for processing IPROTO requests.)
 *
 * Note the callback isn't necessarily pushed to the tx thread immediately.
 * Instead it may be queued locally in the user thread. To flush the local
 * queue, use tnt_remote_flush().
 *
 * See also tnt_remote_set_backlog().
 *
 * [to be called in user threads]
 */
void
tnt_remote_call(void (*cb)(void *), void *arg);

/**
 * Flushes all pending messages queued with tnt_remote_call() to the tx thread.
 *
 * [to be called in user threads]
 */
void
tnt_remote_flush(void);
```
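For illustration, a minimal sketch of how a user thread might drive this API, assuming the declarations above (the thread entry point and the callback are hypothetical):

```c
#include <stdio.h>

/* Runs in the tx thread, in a fiber from the tx fiber pool. */
static void
say_hello(void *arg)
{
	printf("hello from tx, sent by %s\n", (const char *)arg);
}

/* Entry point of a user-created thread. */
static void *
user_thread_f(void *arg)
{
	(void)arg;
	tnt_remote_init();
	tnt_remote_set_backlog(64);	/* auto-flush every 64 queued messages */
	tnt_remote_call(say_hello, "a user thread");
	tnt_remote_flush();		/* push whatever is still queued */
	tnt_remote_free();
	return NULL;
}
```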
-
This section looks like a leftover from the previous document version because there are no mentions of
If we used one endpoint for both system and user threads, this would be a problem. However, since we agreed to introduce a new endpoint for user messages, I think we don't need to bother making this endpoint work with system threads.
I wouldn't mix user and system pipes in one class, even if it entailed some code duplication, because they are quite different. Probably, it'd be better to introduce a new thread-local C++ singleton for initializing user pipes on demand.
-
I want @mkokryashkin to have a look, as one of the "other reviewers" mentioned.
-
I’ve reviewed the RFC and the discussion, and I believe the most valuable insight I can provide is from the perspective of someone integrating with the Tarantool C API via a module.
What happens if a user initiates a transaction and submits it incrementally, call by call, with regular flushes? While this use case may seem unusual, I think it could enable highly efficient interactive transactions without additional overhead. Currently, I have to implement a complex WASM VM rescheduling process to achieve similar behavior from an external thread. At what exact point does the behavior become undefined? Is it safe to use it within a box_on_shutdown handler?

In the discussion about whether threads should be controlled by Tarantool or the user, I agree with @Gerold103. Tarantool's fiber model has significant limitations compared to what Rust or Golang can offer. Having your own thread with a Golang or Tokio event loop is a lifesaver. Some runtimes, like Tokio, allow you to choose whether to manage threads yourself or let the runtime handle them, but not all runtimes offer this flexibility. For example, WASI runtimes include a threading proposal that allows user applications to create threads independently. My WASM module relies on that, and as far as I know, there's no elegant way to override threading handles in WASI while maintaining the same safety guarantees.

That said, as you mentioned, Tarantool's threads have certain non-ev-related setup, like signal masks, which is essential for applications. In my opinion, a good middle ground would be to offer handles like the following:
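The names below are hypothetical placeholders, only to illustrate the shape of such handles:

```c
/*
 * Hypothetical illustration: handles that would apply Tarantool's
 * non-ev per-thread setup (signal masks and similar) to a thread
 * created and owned by the user's own runtime.
 */
void tnt_thread_setup(void);     /* call right after the thread starts */
void tnt_thread_teardown(void);  /* call right before the thread exits */
```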
These would handle the non-ev setup. This approach is similar to what's done in wasm-micro-runtime: if you choose not to let the runtime manage your application's threads, you can manage them manually. @locker mentioned that our Rust team is fine with no signal control and the like, but the only reason they're okay with it is that they don't handle signal masks or implement any form of graceful shutdown, so the point is not really valid.
-
Sounds good to me. I'd prefer this variant to splitting the struct into
TBH I don't see the point in keeping a builtin pipe-to-tx in each cord. How would this help us? In a user-created thread the pipe would be stored in thread local storage, no? If yes, then what's the difference where to store our system pipes?
-
@mkokryashkin and @sergos, please have another look. The Open Question has been resolved.
-
Reviewers
Changelog
v3:
- No more cqueue. Now cpipe can work both with and without libev.
v2:
Summary
There is a demand for being able to send messages to the TX thread from other threads. The existing https://github.com/tarantool/xtm is not good enough: its queue is limited, and it won't work as well on Mac, which apparently is important at least for developers, even though Mac won't be used in prod. Also, it is an external tool and isn't very "native" to the existing Tarantool event loop mechanisms.
It was suggested by Picodata folks that we export our cbus endpoints and a subset of cpipe. The idea was that users, via some sort of public API, would be able to create their own cbus_endpoint in the TX thread, run cbus_loop() on it in a user fiber, create a custom cpipe in any other thread, connect it to this new endpoint, and send stuff there.

Besides just suggesting it, the Picodata guys did us a big favor: they brought PR #9129, took it on board in their fork, and started using it, and now some of our projects have started using this patch for PoCs as well. Clearly, the idea works well, but it needs a bit of polishing before we merge it into vanilla Tarantool.
This document suggests a clear and extensible public API for doing that. However, it takes a simpler route: instead of exposing new public types like pipes and endpoints, the proposal is to expose only a few functions which, when called from any thread, send callbacks to the TX thread.
What we already have
cbus is the "middleware" for exchanging messages between threads. It allows the creation of endpoints which can receive messages.

cbus_endpoint is the endpoint. Each endpoint's name is globally unique in the whole process, and each endpoint's messages can be processed separately and independently from the other endpoints. For example, 2 fibers can be serving 2 different endpoints.

cpipe is a unidirectional message queue. A pipe can be created in any thread, connected to a given endpoint by name, and used to send messages there.

cbus_endpoint is tightly integrated with libev. Specifically, a cbus_endpoint gets connected to the current thread's event loop, and its events ("new messages received") are delivered in the form of libev callbacks.

cpipe is actually not so much integrated with libev. Most of cpipe is just a size-limited forward-list of cmsg objects. Until the size limit is reached, the elements are pushed into the queue without any locks. When the limit is reached, the queue is flushed into the destination endpoint under a mutex lock. The only integration with libev is that cpipe doesn't require an explicit flush of the queue: when a first item is pushed, cpipe schedules its own flush at the end of the current event loop iteration.
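A rough sketch of how these pieces are typically wired together today, assuming the internal API from src/lib/core/cbus.h (signatures are approximate and may differ between Tarantool versions):

```c
#include <stdarg.h>
#include <stdlib.h>
#include "cbus.h"
#include "fiber.h"

/* Executed in the consumer thread once the message is delivered. */
static void
do_work(struct cmsg *msg)
{
	/* ... handle the message ... */
	free(msg);
}

static const struct cmsg_hop route[] = {
	{ do_work, NULL },
};

/* Consumer side: a fiber serving an endpoint named "worker". */
static int
worker_fiber_f(va_list ap)
{
	(void)ap;
	struct cbus_endpoint endpoint;
	cbus_endpoint_create(&endpoint, "worker", fiber_schedule_cb, fiber());
	cbus_loop(&endpoint);	/* deliver incoming messages via libev */
	cbus_endpoint_destroy(&endpoint, cbus_process);
	return 0;
}

/* Producer side, in another cord: connect a pipe to the endpoint by
 * name and push a message; the pipe auto-flushes at the end of the
 * current event loop iteration. */
static void
producer(struct cpipe *pipe)
{
	cpipe_create(pipe, "worker");
	struct cmsg *msg = malloc(sizeof(*msg));
	cmsg_init(msg, route);
	cpipe_push(pipe, msg);
}
```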
The original idea

Given that the needed usage is a cbus_endpoint in the TX thread and a cpipe in external threads, it looks like the current APIs are almost suitable as is. The Picodata guys saw it too and did the following:
- exposed cbus_endpoint in the public C API as is;
- turned cpipe into a new, simpler object called lcpipe (from "light" cpipe).

This lcpipe has all the libev stuff stripped. It is a simple queue which can be flushed into its destination endpoint (manually or when it reaches the max size). The flush is still the mutex-locked move of the items plus an ev_async_send() into the endpoint's file descriptor (hence waking it up in the target thread).

The lcpipe and cbus_endpoint APIs were made public as is.

The suggested idea
This document revisits the original goal: being able to send callbacks to the TX thread from other threads. For that, users really don't need any explicit public access to cbus endpoints or pipes. All they really need is a function which sends a callback plus its argument to the TX thread.

The proposal is to do just that. No explicit setup is required in user threads: on first use, everything needed is created lazily in thread_local data, similar to cord_on_demand(). The TX-side part is set up as part of box.cfg.

Public C API for external threads
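Judging by the Internal C API section below, the external-thread API boils down to two functions. A sketch with assumed signatures (the callback-plus-argument form is an assumption, not the final API):

```c
/**
 * Schedules a callback to be invoked in the TX thread, in a fiber from
 * the "tx_user" fiber pool. The callback may be buffered locally in the
 * calling thread until tnt_tx_flush() is called.
 *
 * [to be called in non-TX threads]
 */
void
tnt_tx_push(void (*cb)(void *), void *arg);

/**
 * Flushes all callbacks queued with tnt_tx_push() by the calling thread
 * to the TX thread.
 *
 * [to be called in non-TX threads]
 */
void
tnt_tx_flush(void);
```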
Public API for TX thread
The TX thread gets a new endpoint, "tx_user". It works similarly to the "tx" endpoint, which handles IProto requests. This endpoint has a fiber pool, whose size is configurable with box.cfg.tx_user_pool_size (default: same as net_msg_max). The worker fibers inherit the pool's name.

The pool is needed because the user callbacks are very likely to do fiber yields, which can be especially long if caused by WAL writes. A fiber pool allows new callbacks to be executed while the already running ones are waiting for something.

The pool size must be configurable because, unless the default is infinite, some users would always want to be able to run more callbacks in parallel.
Example of usage
TX thread
Any other non-TX thread (in C, but doesn't have to be)
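A minimal sketch of the non-TX-thread side, assuming the tnt_tx_push()/tnt_tx_flush() signatures sketched above:

```c
#include <pthread.h>
#include <stdio.h>

/* Runs in the TX thread, in a fiber from the "tx_user" pool, so it may
 * yield, call box_*() functions, and so on. */
static void
tx_callback(void *arg)
{
	printf("executed in TX, arg=%s\n", (const char *)arg);
}

/* A plain pthread created by the user. No explicit setup is needed:
 * the thread-local pipe to "tx_user" is created on the first push. */
static void *
user_thread_f(void *arg)
{
	(void)arg;
	tnt_tx_push(tx_callback, "user thread");
	tnt_tx_flush();
	return NULL;
}

static void
start_user_thread(void)
{
	pthread_t tid;
	pthread_create(&tid, NULL, user_thread_f, NULL);
	pthread_detach(tid);
}
```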
Internal C API
Instead of creating a new type of pipe, the decision is to make cpipe able to work both with and without libev. This isn't hard, since cpipe doesn't really use libev much: it is only used for auto-flush, which can simply be disabled when a libev loop is not provided.

The public functions tnt_tx_push() and tnt_tx_flush(), on their first invocation in each thread, are going to create a thread-local cpipe without libev, connected to the "tx_user" endpoint in the TX thread.
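A rough sketch of how this could look inside, reusing the existing cmsg/cpipe machinery (everything except the cpipe/cmsg calls is hypothetical, and the libev-less behavior of cpipe_create() is exactly what this RFC proposes to add):

```c
#include <stdbool.h>
#include <stdlib.h>
#include "cbus.h"

/* A message wrapping the user callback and its argument. */
struct tx_user_msg {
	struct cmsg base;
	void (*cb)(void *);
	void *arg;
};

/* Executed in the TX thread by a fiber from the "tx_user" pool. */
static void
tx_user_msg_f(struct cmsg *m)
{
	struct tx_user_msg *msg = (struct tx_user_msg *)m;
	msg->cb(msg->arg);
	free(msg);
}

static const struct cmsg_hop tx_user_route[] = {
	{ tx_user_msg_f, NULL },
};

/* One libev-less pipe per user thread, created lazily on first use. */
static __thread struct cpipe tx_user_pipe;
static __thread bool tx_user_pipe_created;

void
tnt_tx_push(void (*cb)(void *), void *arg)
{
	if (!tx_user_pipe_created) {
		cpipe_create(&tx_user_pipe, "tx_user");
		tx_user_pipe_created = true;
	}
	struct tx_user_msg *msg = malloc(sizeof(*msg));
	cmsg_init(&msg->base, tx_user_route);
	msg->cb = cb;
	msg->arg = arg;
	cpipe_push(&tx_user_pipe, &msg->base);
}

void
tnt_tx_flush(void)
{
	if (tx_user_pipe_created)
		cpipe_flush_input(&tx_user_pipe);
}
```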
Alternatives

Expose cbus_endpoint and a light cpipe in the public API. It would work, but the API turned out to require too many actions, like explicit creation of the pipe and the endpoint. It would give more freedom, but was discarded because that freedom apparently isn't needed for the immediate task.

Extract a part of cpipe into a new struct lcpipe, like in the original PR. That is too much code movement and renaming, and it isn't really needed: cpipe is almost able to work without libev anyway.