Log fiber #365
-
One of the main problems here is in the proposal name: you are trying to build the monitoring on top of fibers instead of services or modules. It stops working the moment something stops being a fiber. For example, take Or take Or take In other words, I don't think we need to rely on having a fiber in any way.
What do you mean? Not have any info by default at all? Or only the last logs?
We should think in terms of a "service state" rather than a "fiber state". One fiber can potentially host many services, and some services may have no fiber at all.
I think at first it should be fine to have the number hardcoded in You can still try to add the logs though, I don't insist on dropping them. I just think they somewhat complicate the main feature ("state"), which is in turn mainly motivated by tests, where we need to implement Another point about the logs is that the name is probably a bit unfortunate too. We don't need to just duplicate the
What do you need a heap for? A heap is used for sorting and for quick push/pop of the max/min item. Here you just have time-series data: new messages always go to the end and old messages are always deleted from the front. You can simply have a list like in
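A minimal sketch of the list-based alternative: a fixed-capacity FIFO that appends at the tail and evicts from the head, which is all that time-ordered log messages need. The `new_log_queue` name and its fields are hypothetical and not part of vshard.

```lua
-- Hypothetical bounded FIFO for log messages (not part of vshard).
-- Messages arrive in time order, so no heap ordering is required.
local function new_log_queue(capacity)
    local queue = {first = 1, last = 0, capacity = capacity, items = {}}

    -- Append a message; evict the oldest one when the queue is full.
    function queue:push(msg)
        self.last = self.last + 1
        self.items[self.last] = msg
        if self.last - self.first + 1 > self.capacity then
            self.items[self.first] = nil
            self.first = self.first + 1
        end
    end

    -- Return the stored messages, oldest first.
    function queue:list()
        local result = {}
        for i = self.first, self.last do
            result[#result + 1] = self.items[i]
        end
        return result
    end

    return queue
end

local q = new_log_queue(2)
q:push('a'); q:push('b'); q:push('c')
assert(table.concat(q:list(), ',') == 'b,c')  -- 'a' was evicted
```

Both push and eviction are O(1) here, and the oldest-first order falls out of the indices without any comparisons.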
The idea is to have a single info point -
As you yourself noted, the value can be taken from
This is correct. But you are missing the fact that we won't show a copy of I would recommend collecting, for each background service, what kinds of activities it performs and what states it has. Maybe we will indeed be able to just store a state. For tests it would probably be enough.
The problem is that people don't know what the rebalancer or any other service is doing right now: did it stop? Does it still work? In tests we have the same problem. If the states would be just Another problem to think about, mostly related to tests. Assume the rebalancer state is 'balanced'. Now we change something and expect the rebalancer to perform rebalancing. It will end up in the same state, 'balanced'. But from the outside, by just looking at the state, we can't tell whether it worked at all. It could be the old 'balanced', or it could have quickly gone through 'applying routes' and back to 'balanced'. Maybe we also need a monotonic number like a "generation", which is updated on every change in 'state'/'activity'. Then we can tell whether the service is making progress.
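The "generation" idea can be sketched as follows. This is only an illustration: `new_service_state` and its fields are hypothetical names, and bumping the counter on every `set_state` call (rather than only when the string differs) is an assumption.

```lua
-- Hypothetical service-state holder with a monotonic generation counter.
local function new_service_state(initial)
    local service = {state = initial, generation = 0}

    -- Record a state update. The generation grows on every call
    -- (an assumption), so observers can detect that work happened.
    function service:set_state(state)
        self.state = state
        self.generation = self.generation + 1
    end

    return service
end

local rebalancer = new_service_state('balanced')
local seen = rebalancer.generation
rebalancer:set_state('applying routes')
rebalancer:set_state('balanced')
-- Same state string as before, but the generation has moved forward.
assert(rebalancer.state == 'balanced')
assert(rebalancer.generation > seen)
```

A test can remember the generation, trigger the change, and wait until the generation grows before checking the state.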
-
As for the original idea of setting states, I see an association with the world of Kubernetes, where objects can have such concepts as conditions, events, states, etc. https://maelvls.dev/kubernetes-conditions/
-
The related issue is #107. The discussion starts with a description of how the task looks in my understanding. Then I provide my vision of the API and behaviour, some insights into the internals, open and frequently asked questions, alternatives, and what I want to do in the future.
Problems with how it works now
A lot of fibers work constantly in the background of vshard's routers and storages (discovery, failover, rebalancer, recovery, etc.). But there's no way to check their current activity or the errors that occurred during a fiber's execution other than manually grepping the logs.
We need to introduce a more convenient way of monitoring background fibers and of testing them with luatest.
How it should work
The idea is to introduce a module which is capable of saving logs, states and maybe the current activity of the background fibers. The data of such a module can be accessed via `vshard.router.info()` and `vshard.storage.info()`:
The initial purpose of the module is monitoring. So I think it should be disabled by default, in order not to waste space and not to increase the execution time of a fiber step. Moreover, the user will not need all messages forever, so I suppose we should introduce a way to configure the number of saved log messages.
API and behavior
The module (let's call it `log_fiber`) should fulfill the following roles:
I think that every instance (router or storage) should have this module right inside it as an attribute, and initialize or reconfigure it during `vshard.router/storage.cfg()`.
The structure of the `log_fiber` can be like this:
As `logs[fiber_name].messages` we can use a heap, which is already written and used in ref's `session_heap`.
I won't talk about the activity field and its API, as I'm not sure that we really need it (see the 'Open questions' section).
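One possible shape for the structure mentioned above, as a hedged sketch. Only `logs[fiber_name].messages` and `logs[fiber_name].state` are named in this proposal; `is_enabled`, `log_length`, `get_entry` and the `'unknown'` default are assumptions made for illustration.

```lua
-- Hypothetical layout of a log_fiber instance. Only logs[name].messages
-- and logs[name].state come from the proposal; the rest is assumed.
local function new_log_fiber()
    return {
        is_enabled = false,  -- assumed flag: disabled by default
        log_length = 0,      -- assumed: max number of saved messages
        logs = {},           -- fiber_name -> {state = ..., messages = {...}}
    }
end

-- Return the per-fiber entry, creating it on first use.
local function get_entry(log_fiber, fiber_name)
    local entry = log_fiber.logs[fiber_name]
    if entry == nil then
        entry = {state = 'unknown', messages = {}}
        log_fiber.logs[fiber_name] = entry
    end
    return entry
end

local lf = new_log_fiber()
local entry = get_entry(lf, 'vshard.rebalancer')
assert(entry.state == 'unknown' and #entry.messages == 0)
```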
Fiber state API
Essentially, the state would be like vshard.storage.info().status: a simple string which tells whether all is good or bad. It can also be used in the cartridge GUI, which could translate the states to colors indicating whether any kind of error has occurred in a fiber.
As far as I can see, the only needed function is:
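A minimal sketch of how such a setter (`set_state`) and the matching getter might look. It is not the proposal's actual code: the constructor, the `is_enabled` field and `get_state` are hypothetical, and the fiber name is injected through a callback so that the sketch runs without Tarantool's `fiber` module.

```lua
-- Hypothetical sketch; in Tarantool the name would come from the fiber
-- module, here it is injected so the code runs standalone.
local function new_log_fiber(get_fiber_name)
    local log_fiber = {is_enabled = true, logs = {}}

    -- Save the state under the entry of the calling fiber.
    function log_fiber:set_state(state)
        if not self.is_enabled then
            return  -- nothing is saved while the module is disabled
        end
        local name = get_fiber_name()
        self.logs[name] = self.logs[name] or {messages = {}}
        self.logs[name].state = state
    end

    -- Read the state of the calling fiber, or nil if none was saved.
    function log_fiber:get_state()
        local entry = self.logs[get_fiber_name()]
        return entry and entry.state
    end

    return log_fiber
end

local lf = new_log_fiber(function() return 'vshard.rebalancer' end)
lf:set_state('ok')
assert(lf:get_state() == 'ok')
lf.is_enabled = false
lf:set_state('error')
assert(lf:get_state() == 'ok')  -- the update was ignored while disabled
```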
It should extract the name of the fiber from which it was invoked via `fiber.self():name()` and save the state to the storage (by the storage I mean an entry in the `logs` table (see the [structure](#api-and-behavior) of the module)) which is related to the corresponding fiber.
We could also need a way to get the state of the current fiber, which will do almost the same thing as `set_state`.
We should not save any state if the `log_fiber` is disabled.
Fiber log message API
The module should also be a wrapper around the default log module used in tarantool, so we should introduce the same API `log` has:
These methods do the following:
`log`, it can be formatted string, table (for json format), and any other type, which has `__tostring`).
`fiber.clock()`, fiber's name, log level.
`logs[fiber_name].messages`,
Actions 3-4 are performed only if the `log_fiber` is enabled and the level of the message is <= the `log_level` of the `log` module.
Basic API
log_fiber:cfg()
The function for configuring the `log_fiber` instance. We can introduce a new option to `vshard.router/storage.cfg()`: `log_vshard_fiber`, which can be a boolean or a non-negative number. It will indicate the following:
`log_fiber` instance is disabled.
`log_length` from consts is used.
`log_length`.
`log_length` represents the number of log messages saved for all fibers.
On reconfiguration from the enabled state to the disabled one, all `logs` will be dropped.
log_fiber:drop()
Drop all info about the fiber whose name is `fiber_name`. Sometimes all info about one fiber needs to be deleted without deleting info about all the other ones (e.g. when `discovery_mode` is changed to `off`).
log_fiber:get_info()
Get a table of all known data in a formatted way. By formatted I mean that the messages should be converted from a table:
to a single string:
See the example in the 'How it should work' section.
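A hedged sketch of such a table-to-string conversion. The field names (`time`, `fiber_name`, `level`, `text`) and the output layout are assumptions chosen for illustration, not the format from the proposal.

```lua
-- Hypothetical stored message and output format; the proposal's own
-- field names and string layout are not reproduced here.
local function format_message(msg)
    return string.format('%.3f %s %s> %s',
                         msg.time, msg.fiber_name, msg.level, msg.text)
end

local line = format_message({
    time = 12.5,  -- would come from fiber.clock() in Tarantool
    fiber_name = 'vshard.rebalancer',
    level = 'info',
    text = 'rebalancing is done',
})
assert(line == '12.500 vshard.rebalancer info> rebalancing is done')
```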
Module API
Creates the instance of the `log_fiber`.
Note: all of the above functions are part of the `log_fiber`'s index metatable, but `log_fiber.new()` is the function returned by the `log_fiber` module.
FAQ
`log_level` is not passed to `log_fiber:cfg()`. How will we get it?
Yes, we should not save the log message if the level of the message (`debug`, e.g.) is more than the `log_level` of the `log` module (`info`, e.g.). So we need a way to get `log_level` from somewhere.
I don't really want to get this option from the user's cfg table passed to `vshard.router/storage.cfg()`, as some day we will drop `box.cfg()` from the router and there will be only vshard's cfg table passed to `vshard.router.cfg`, with no `log_level` in it.
We can easily access `log_level` from the log module with `log.cfg.log_level`. But in old versions there's no `cfg` in `log`, so we can access `box.cfg.log_level` if box is configured. Otherwise, we'll use the default `log_level`, which is `info`.
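The fallback chain described above can be sketched like this. The `log_cfg` and `box_cfg` parameters are stand-ins for Tarantool's `log.cfg` and `box.cfg`, passed in so the code runs standalone; the field name follows the text above, and the numeric levels (5 for `info`, 7 for `debug`) are assumptions.

```lua
-- Hypothetical resolver. `log_cfg` and `box_cfg` stand in for Tarantool's
-- log.cfg and box.cfg and are passed in so the sketch runs standalone.
local DEFAULT_LOG_LEVEL = 5  -- assumed numeric value of 'info'

local function resolve_log_level(log_cfg, box_cfg)
    if log_cfg ~= nil and log_cfg.log_level ~= nil then
        return log_cfg.log_level
    end
    if box_cfg ~= nil and box_cfg.log_level ~= nil then
        return box_cfg.log_level
    end
    return DEFAULT_LOG_LEVEL
end

-- A message is saved only if its level does not exceed the limit.
local function should_save(msg_level, log_level)
    return msg_level <= log_level
end

assert(resolve_log_level({log_level = 7}, {log_level = 4}) == 7)
assert(resolve_log_level(nil, {log_level = 4}) == 4)
assert(resolve_log_level(nil, nil) == 5)
assert(should_save(7, 5) == false)  -- assumed: debug message, info limit
```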
.Open questions
Fiber activity
In addition to the state, we can also save what the fiber is doing right now: idle, sending buckets, etc. I don't think we'll need it, and here's the reason why.
The problem is that the user will almost always see just one activity: idle. This is because the console's fiber is executed in the same thread with all other background fibers, and there are not many `fiber.yield()` calls (both explicit and implicit) during the execution of a single fiber step. Only in case of a context switch during the step can the user catch the last activity which was set before this yield.
The user cannot get 'first_one', as the fiber doesn't stop its execution before `set_state('second_one')`.
Please correct me if I'm wrong.
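The argument can be illustrated with plain Lua coroutines, which are cooperative in the same way fibers are. This is a model only: the `set_state` function and the worker loop are hypothetical, and real fibers are scheduled by Tarantool rather than resumed by hand.

```lua
-- Plain Lua coroutines stand in for Tarantool fibers here: both are
-- cooperative, so an observer runs only when the worker yields.
local state = 'idle'
local function set_state(new_state) state = new_state end

local worker = coroutine.create(function()
    while true do
        set_state('first_one')
        -- ... work without any yield ...
        set_state('second_one')
        coroutine.yield()  -- the only point where others can run
    end
end)

-- The "console" observes the state only between worker steps.
local observed = {}
for _ = 1, 3 do
    coroutine.resume(worker)
    observed[#observed + 1] = state
end
-- 'first_one' is never visible from the outside.
assert(table.concat(observed, ',') == 'second_one,second_one,second_one')
```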
Fiber state
Basically, there will be two states for every fiber: everything is good or not. Do we really want to make several versions of the good state, e.g. `balanced/good/full`? It will lead to harder processing of these states, e.g. in cartridge.
Maybe three states will be enough: `ok/error/unknown`. The last one will represent the state of a fiber which hasn't even made its first step and which doesn't know whether there is some error.
Alternatives
Static log_fiber
We can make the `log_fiber` module static and have all loggers saved right inside it, and not inside the router and storage instances as an attribute. We can use the last part of the fiber's name (e.g. `_static_router`) as the name for such loggers. Calling `log_fiber.info(...)` would extract the name of the corresponding logger from a part of `fiber.self():name()` and save all info to `loggers[log_name].logs[fiber_name].messages/state`.
I don't really see any need to implement it this way. It looks like an unnecessary complication. Moreover, it will require renaming all the storage's fibers to be not `vshard.rebalancer` but `vshard.rebalancer.storage`, or complicating the extraction of the logger name in the `log_fiber` module.
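For reference, the name extraction that the static variant would depend on might be as small as the sketch below. The `logger_name` helper and the full fiber name `vshard.discovery._static_router` are hypothetical; only the `_static_router` suffix and `vshard.rebalancer` appear in the text above.

```lua
-- Hypothetical helper: take the last dot-separated part of a fiber name
-- as the logger name.
local function logger_name(fiber_name)
    return fiber_name:match('([^.]+)$')
end

assert(logger_name('vshard.discovery._static_router') == '_static_router')
-- A storage fiber without such a suffix yields the service name instead,
-- which is why renaming or extra parsing would be needed.
assert(logger_name('vshard.rebalancer') == 'rebalancer')
```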