Improve consistency of python and binary logging #4186

ataymano · 2022-10-05T02:54:37Z

Short description

There are mismatches in how python and vw binary logs are working:

There are two logs channels in bin (driver_output/log_output) and only one in python (get_log())
get_log() seems to be not associated with neither driver_output nor log_output: looks like it is having most of driver_output messages, but not having last lines from --explore_evals runs (update_count / final_multiplier)
cmd:

>>vw --cb_explore_adf --dsjson -d cb_simple.json --explore_eval --log_output stderr --driver_output stderr
...
weighted label sum = 0.000000
average loss = -1.000000
total feature number = 4
update count = 1
final multiplier = 1.000000

python:

>>from vowpalwabbit import pyvw
>>execution = pyvw.vw('--cb_explore_adf --dsjson -d cb_simple.json --explore_eval --driver_output stderr --log_output stderr', enable_logging=True)
>>execution.finish()
>>execution.get_log()
...
 'average loss = -1.000000',
 'total feature number = 4',
 '']

get_log() is not associated neither with stdout nor with stderr (is not affected by --driver_output / --log_output parameters)

How this suggestion will help you/others

Given 2 and 3 there are messages that are impossible to see from python (explore_eval/audit/etc)

Possible solution/implementation details

Maybe python get_log() method can be replaced with either get_stdout/get_stderr or get_driver/get_logs pairs?

The text was updated successfully, but these errors were encountered:

jackgerrits · 2022-10-05T13:35:12Z

Ideally, Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure. Then the user can filter or consume as need be

Harsh188 · 2023-02-06T05:05:20Z

Hey @ataymano, I tried reproducing this issue with vw v9.0.0+

pyvw.vw() has been deprecated and now uses Workspace. The outputs create

train.dat

1:2:0.4 | a c 
3:0:0.2 | b d 
4:1:0.5 | a b 
2:1:0.3 | a b c
3:1:0.7 | a d

Python command

import vowpalwabbit

vw = vowpalwabbit.Workspace("--cb 4 -d train.dat --driver_output stderr --log_output stderr", quiet=False, enable_logging=True)
vw.finish()
for line in vw.get_log():
    print(line)

output:

using no cache
Reading datafile = train.dat
num sources = 1
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
cb_type = mtr
Enabled reductions: gd, scorer-identity, csoaa_ldf-rank, cb_adf, shared_feature_merger, cb_to_cbadf
Input label = CB
Output pred = MULTICLASS
average  since         example        example        current        current  current
loss     last          counter         weight          label        predict features
5.000000 5.000000            1            1.0        0:2:0.4            0:0       12
2.500000 0.000000            2            2.0        2:0:0.2            1:0       12
2.083333 1.666667            4            4.0        1:1:0.3            1:0       16

finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 64

CLI Command

vw --cb 4 -d train.dat

output:

Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = train.dat
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
5.000000 5.000000            1            1.0    known        1        3
2.500000 0.000000            2            2.0    known        2        3
2.083333 1.666667            4            4.0    known        2        4

finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 16

According to my observed output, the difference between Python and CLI is in "cb_type", "Enabled reductions", "Input label" and "Output pred".

I tried to take a look at pylibvw.cc to understand how the logging wrapper is setup but I'm really confused. It states "do not use vw_log". However, vw_log is what's being called during .get_log().

Any insights on this?

Harsh188 · 2023-02-13T03:11:07Z

Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure.

@jackgerrits Isn't this what's currently happening with the py_log_wrapper class?

jackgerrits · 2023-02-13T15:48:50Z

I implemented my take on this feature in the new set of python bindings (https://github.com/VowpalWabbit/py-vowpal-wabbit-next/blob/f5727409f9674168e7468865aa83b7e6ba8cdccd/src/main.cpp#L266). Essentially any internal log in VW is forwarded to Python's logging module so that the user can control the output using standard methods

ataymano added Bug Bug in learning semantics, critical by default Feature Request New feature requested in system labels Oct 5, 2022

ataymano changed the title ~~Improve consistency of python and bin logging~~ Improve consistency of python and binary logging Oct 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve consistency of python and binary logging #4186

Improve consistency of python and binary logging #4186

ataymano commented Oct 5, 2022

jackgerrits commented Oct 5, 2022

Harsh188 commented Feb 6, 2023

Harsh188 commented Feb 13, 2023

jackgerrits commented Feb 13, 2023

Improve consistency of python and binary logging #4186

Improve consistency of python and binary logging #4186

Comments

ataymano commented Oct 5, 2022

Short description

How this suggestion will help you/others

Possible solution/implementation details

jackgerrits commented Oct 5, 2022

Harsh188 commented Feb 6, 2023

train.dat

Python command

CLI Command

Harsh188 commented Feb 13, 2023

jackgerrits commented Feb 13, 2023