separate unoconvert so it can be run from user virtualenv without uno or system-site-packages #29

Open
hottwaj opened this issue Apr 25, 2022 · 17 comments
Labels
enhancement New feature or request

Comments

@hottwaj

hottwaj commented Apr 25, 2022

This would completely isolate unoconvert and make it safer to use in virtualenvs.

unoconvert is already a remote process that, in theory, doesn't need direct access to uno (via import, etc.). Using system-site-packages is also a bit fragile: if your virtualenv is not set up correctly, you might accidentally import a system package instead of the one you thought you had in your virtualenv.

Thanks for a great package!

@jannek-aalto

There should be no need for libreoffice packages etc. at all on a unoconvert-client-only host. It's horrible to install (and update) the whole of libreoffice as a dependency for what's probably a very simple TCP thingy.
I'm not fluent with Python, at all, unfortunately.

@regebro
Member

regebro commented May 12, 2022

That's not possible, sorry. The communication is handled by the uno library that is a part of Libreoffice.

@regebro regebro closed this as completed May 12, 2022
@hottwaj
Author

hottwaj commented May 12, 2022

Hi there, it is possible and there are a few ways to do it.

One way (I did something similar recently) is to use Python's xmlrpc to add an RPC server to converter.py. The server would need to wrap Converter.convert so that it can be called from another process via RPC.

An xmlrpc client would also be needed, but only one function needs to be wrapped, so it would be quick to do.

The xmlrpc client would be installable in virtualenvs. Users would need to take care of starting the server/converter before using it via the xmlrpc client.

See e.g. https://docs.python.org/3/library/xmlrpc.server.html#module-xmlrpc.server
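A minimal sketch of the idea described above, using only the standard library. The `convert` function here is a hypothetical stand-in for `Converter.convert` (it does not call LibreOffice at all), and `start_server` and `remote_convert` are illustrative names, not part of unoserver. It shows that the document payload can travel over XML-RPC as `xmlrpc.client.Binary`, so the client side needs nothing beyond stock Python.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def convert(data, convert_to):
    """Hypothetical stand-in for Converter.convert.

    A real server would hand the bytes to LibreOffice via uno; this one
    only tags the payload so the round trip can be observed.
    """
    payload = data.data  # xmlrpc.client.Binary wraps the raw bytes
    return xmlrpc.client.Binary(convert_to.encode() + b":" + payload)


def start_server(host="127.0.0.1", port=0):
    """Start the RPC server in a background thread (port 0 = pick a free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(convert, "convert")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_convert(host, port, data, convert_to):
    """Client side: needs only the standard library, no uno import."""
    with xmlrpc.client.ServerProxy(f"http://{host}:{port}") as proxy:
        result = proxy.convert(xmlrpc.client.Binary(data), convert_to)
    return result.data
```

In this sketch the server process would run under the system Python that can import uno, while `remote_convert` could live in any virtualenv.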

hottwaj added a commit to hottwaj/unoserver that referenced this issue May 12, 2022
@hottwaj
Author

hottwaj commented May 12, 2022

In #32 I've added the rough changes that would be needed to add the RPC server. The client will also be quite simple.

@regebro
Member

regebro commented May 13, 2022

Yes, sure we could build a completely different rpcserver and do this, but that's not what this module does, and you could do that on your side as well. The convert server in your case would still have the exact same problem, it needs to have access to the uno library. Yes, you would now get a "client client" that does not, but that doesn't really solve the issue itself. You still need to install unoserver so it has access to LibreOffice.

@hottwaj
Author

hottwaj commented May 13, 2022

Thanks & yes agreed that this library would still need to be installed into system python env along with libreoffice+uno, there is no way around that.

The problem I was trying to solve is that other projects need to use this library, but at the moment they can only do so by either: i) running converter.py as a separate process (taking care to use the system Python, not the virtualenv Python), or ii) installing the project into the system Python env. Both have their downsides, especially the latter: you risk breaking your system Python or being unable to satisfy the project's dependencies.

Also, as an improvement to the suggested structure, it would make more sense to run the RPC server in the same process that starts soffice (the RPC server would still run from the system Python env). All the conversion code could be moved back into a "unoserver+rpc" module, and the converter client could be very lightweight, using only RPC for conversion via both the command line and the API. This would also allow the converter to be installed in virtualenvs. Thanks!

@regebro
Member

regebro commented May 13, 2022

It would make it possible to make a Python library to make conversions, so that's a benefit.

@regebro regebro reopened this Jun 19, 2022
@regebro
Member

regebro commented Jun 19, 2022

Having unoserver start a server, for example with XML-RPC, and letting unoconvert use that instead is something we are fine with maintaining. It doesn't look like I will have time to implement it at the moment, but we are happy to accept contributions.

I would like the protocol to be versioned somehow, so that we give an explicit error message if client and server end up on mismatched versions, and possibly even support different versions.
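One way the versioning request above could look, as a sketch only: each side advertises the protocol versions it supports, and the client picks the highest common one or fails with an explicit message. `ProtocolMismatch` and `negotiate_version` are hypothetical names; nothing in this thread says unoserver implements it this way.

```python
class ProtocolMismatch(RuntimeError):
    """Raised when client and server share no protocol version."""


def negotiate_version(client_versions, server_versions):
    """Return the highest protocol version both sides support.

    Raises ProtocolMismatch with an explicit message when there is no
    overlap, instead of failing later with an obscure RPC error.
    """
    common = set(client_versions) & set(server_versions)
    if not common:
        raise ProtocolMismatch(
            f"client speaks protocol versions {sorted(client_versions)}, "
            f"server speaks {sorted(server_versions)}; upgrade one side"
        )
    return max(common)
```

A client would typically fetch the server's list through a dedicated RPC call before sending any document.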

@regebro regebro added the enhancement New feature or request label Apr 29, 2023
@regebro
Member

regebro commented Aug 18, 2023

2.0a is out now with an XML-RPC server

@jannek-aalto

> 2.0a is out now with an XML-RPC server

A great improvement! Will start testing ASAP and move into production rather quickly, I assume...
Can't wait to drop libreoffice from application servers. It's excellent that the RPC version already seems capable of shipping the payload too. We've been using NFS, but it gives the converter servers too much access (due to bad application design, which is out of our hands).

@regebro
Member

regebro commented Aug 21, 2023

Heads up! I noticed the --daemon argument doesn't work in 2.0b1. If you need that, I'll release a 2.0b2.

@jannek-aalto

Daemons are kind of redundant these days, with systemd it's much better to run service processes in the foreground.

I managed to get the server bit running on a dedicated conversion server for a test (moodle) environment, ie. installed 2.0b1 over what was there previously and it still just magically works.

But then, I hit sort of a brick wall. Sorry for the stupid question, but how might one actually run the light client?
It seems it's a library at this stage, needing a bit of a wrapper script to work. That, unfortunately, is beyond my pretty much nonexistent Python skills. Getting it to run with the RHEL 8.8 system python3 (3.6) would rock.

We now have a situation where transferring the data to convert and the resulting file via the XML-RPC mechanism (instead of NFS access currently used elsewhere) would be pure gold too...

@regebro
Member

regebro commented Oct 4, 2023

The unoconvert client script acts as that wrapper, so you would still use it the same way: start unoserver, and then use unoconvert to convert files.
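For anyone who wants to drive the client from Python rather than from a shell, a rough sketch follows. It uses only the `unoconvert` options that appear in the wrapper script posted later in this thread (`--host`, `--port`, `--convert-to`, and `-` as the output file to write to stdout). The function names and the `executable` parameter are illustrative assumptions, not part of unoserver.

```python
import subprocess


def unoconvert_argv(infile, convert_to, host, port, executable="unoconvert"):
    """Build the command line for a remote conversion written to stdout."""
    return [
        executable,
        "--host", host,
        "--port", str(port),
        "--convert-to", convert_to,
        infile,
        "-",  # "-" as the output file sends the result to stdout
    ]


def convert_remote(infile, convert_to, host, port, executable="unoconvert"):
    """Run unoconvert against a running unoserver and return the converted bytes."""
    argv = unoconvert_argv(infile, convert_to, host, port, executable)
    done = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return done.stdout
```

`convert_remote` raises `subprocess.CalledProcessError` on a non-zero exit, which a caller can use to discard partial output.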

@jannek-aalto

Ah. Thanks for that golden tip. Running setup install with the system python and a custom prefix actually works, I did get the whole thing running and a remote conversion working.
Had a bit of head-banging with haproxy on the conversion server, as I didn't first realize one now needs to specify --uno-port too... Didn't help that I use ports starting at 2001 for the workers! 2002.. doh. :D
Just some final tweaks, packaging, distribution, testing, documentation and we're golden.
Big thanks from Aalto University, we'll be very early adopters for 2.0.

@regebro
Member

regebro commented Oct 6, 2023

Great stuff! Thanks for that, I'll release a 2.0 final shortly.

@jannek-aalto

Happy to report that after some stress- and other testing we went into production (with 2.0b1) on our main Moodle instance, so now there are quite a lot of (end user) beta testers involved. :D

In case you'd like to add some docs/tips/examples, I've written a wrapper script which emulates unoconv 0.7, which Moodle directly supports - hopefully, can get rid of it via native support some day, but it works. Other than that, the most interesting bits are probably the converter service multi-instance systemd unit (ie. 'systemctl enable unoserver@2001 unoserver@2002') and maybe some haproxy config tips. (Perhaps the non-interactive installation scripts for everything too, but they're a bit too specific for our environment to share...)

[Unit]
Description=Unoserver document conversion service
Documentation=https://github.com/unoconv/unoserver
Wants=network-online.target
After=network-online.target
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
User=apache
Group=apache
ExecStart=/opt/bin/unoserver-wrapper %i
TimeoutStartSec=60
TimeoutStopSec=15
RestartSec=10s
Restart=always

[Install]
WantedBy=multi-user.target

/opt/bin/unoserver-wrapper:

#!/bin/sh
# exec replaces the shell, so systemd supervises unoserver directly
exec /opt/bin/unoserver --interface 0.0.0.0 --port "$1" --uno-port "$(($1 + 100))"
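The port arithmetic in the wrapper (uno port = instance port + 100) can be checked ahead of time. The sketch below is illustrative only, with hypothetical helper names. It maps each systemd instance name to its two ports and rejects a plan in which any port is used twice, which is the kind of clash described earlier in the thread with worker port 2002.

```python
UNO_PORT_OFFSET = 100  # same offset as the wrapper script: uno port = port + 100


def instance_ports(instance):
    """Map an instance name such as "2001" to (port, uno_port)."""
    port = int(instance)
    return port, port + UNO_PORT_OFFSET


def check_no_collisions(instances):
    """Return {port: instance} for all instances, or raise on a reused port."""
    used = {}
    for instance in instances:
        for port in instance_ports(instance):
            if port in used:
                raise ValueError(
                    f"port {port} needed by instance {instance} "
                    f"is already used by instance {used[port]}"
                )
            used[port] = instance
    return used
```

For example, `check_no_collisions(["2001", "2002"])` passes, while an instance named `2101` would clash with the uno port of instance `2001`.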

haproxy config bits (from a keepalived+haproxy balancer pair, shared with other related use):

defaults
    mode                    tcp
    log                     global
    retries                 3
    timeout queue           45s
    timeout connect         5s
    timeout client          5m
    timeout server          5m
    timeout check           10s
    maxconn                 1000
    balance                 roundrobin
    log-format              %ci:%cp\ %fi:%fp\ %bi:%bp\ %si:%sp\ %b/%s\ %U/%B\ %t

listen unoserver_prod
  bind 1.2.3.4:2000 name unoserver
  default-server inter 20s fastinter 5s downinter 10s
  server unosrvp1_2001 1.2.3.5:2001 maxconn 1 check
  server unosrvp1_2002 1.2.3.5:2002 maxconn 1 check
  server unosrvp2_2001 1.2.3.6:2001 maxconn 1 check
  server unosrvp2_2002 1.2.3.6:2002 maxconn 1 check

(In production, we actually use 2 converter servers with 6 libreoffice 7.6.2's running on each.)
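With more workers per host, the `server` lines get repetitive. A small generator like the one below could produce them. It is a sketch that simply reproduces the naming pattern of the config above (`<name>_<port> <address>:<port> maxconn 1 check`); the function name and its parameters are made up for illustration.

```python
def haproxy_server_lines(hosts, first_port, workers_per_host):
    """Generate haproxy "server" lines, one per unoserver worker.

    hosts maps a short name to an address, e.g. {"unosrvp1": "1.2.3.5"}.
    Each worker gets maxconn 1, as in the config above, so a worker only
    handles one conversion at a time.
    """
    lines = []
    for name, address in hosts.items():
        for port in range(first_port, first_port + workers_per_host):
            lines.append(f"  server {name}_{port} {address}:{port} maxconn 1 check")
    return lines
```

Calling it with the two example hosts, `first_port=2001` and `workers_per_host=2` yields the four `server` lines shown in the `listen` block.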

Last but not least, the unoconv-remote translator script for Moodle (it uses a static document formats list generated with unoconv 0.7):

#!/bin/ksh
typeset -i EC=0
CONFIGFILE=/opt/etc/unoconv-remote.env
[[ -s $CONFIGFILE ]] && . $CONFIGFILE

## config defaults
PYTHONPATH=/opt/unoserver-client/lib
DEBUG=${DEBUG:-0}
LOG=${LOG:-/var/log/unoconv-remote/unoconv-remote.log}
UNOCONVERT=${UNOCONVERT:-/opt/unoserver-client/bin/unoconvert}
UNOCVERS=${UNOCVERS:-unoconv 0.7}
FORMATS=${FORMATS:-$(cat /var/lib/unoconv-remote/formats.txt)}
SERVER=${SERVER:-1.2.3.4}
PORT=${PORT:-2000}

## globals
HOST=${HOST:-$(uname -n)}
STARTTS=$(date +%FT%T)
ENDTS=$STARTTS
MISCHEAD="============== $HOST"
CONVHEAD="++++++++++++++ $HOST"
ENDCHEAD="-------------- $HOST"

export PYTHONPATH

function logmsg {
  echo "$@" >>"$LOG" 2>&1
}

function logparams {
  typeset -i _i=1
  for _p in "$@"; do
    print "\$$_i: '$_p'" >>$LOG 2>&1
    _i+=1
  done
}

case $1 in
  --version)
    logmsg "$MISCHEAD $STARTTS, version check"
    (( $DEBUG > 0 )) && logparams "$@"
    print "$UNOCVERS"
    ;;
  --show)
    logmsg "$MISCHEAD $STARTTS, show formats"
    (( $DEBUG > 0 )) && logparams "$@"
    print "$FORMATS" 1>&2
    ;;
  -f)
    logmsg "$CONVHEAD $STARTTS, conversion start"
    (( $DEBUG > 0 )) && logparams "$@"
    logmsg "Input:  '$5'"
    logmsg "Output: '$4' [$2] via $SERVER:$PORT"
    $UNOCONVERT \
      --host "$SERVER" \
      --port "$PORT" \
      --convert-to "$2" \
      "$5" - \
      2>>"$LOG" > "$4"
    EC=$?
    ENDTS=$(date +%FT%T)
    if (( EC )) && [[ -f "$4" ]]; then
      logmsg "WARNING - $ENDTS: non-zero exit for '$5', removing destination file '$4'"
      rm "$4"
    fi
    logmsg "$ENDCHEAD $ENDTS, conversion end in $SECONDS s, exit code $EC"
    ;;
  *)
    logmsg "$MISCHEAD $STARTTS, unknown 1st option"
    logparams "$@"
    ;;
esac

exit $EC
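The dispatch logic of the script above, restated as a small Python function for readers who find ksh hard to follow. It is only a sketch. It assumes the conversion call has the shape `-f FORMAT <flag> OUTPUT INPUT`, which is inferred from the positional parameters the script reads (`$2`, `$4`, `$5`), and the action names are invented for illustration.

```python
def translate_unoconv_args(args, version="unoconv 0.7"):
    """Map unoconv-0.7-style arguments to an (action, payload) pair.

    Mirrors the case statement of the ksh wrapper: --version, --show,
    -f (conversion), anything else is "unknown".
    """
    if not args:
        return ("unknown", None)
    if args[0] == "--version":
        return ("print_version", version)
    if args[0] == "--show":
        return ("show_formats", None)
    if args[0] == "-f" and len(args) >= 5:
        # Same positions as the ksh script: $2 format, $4 output, $5 input
        return (
            "convert",
            {"convert_to": args[1], "outfile": args[3], "infile": args[4]},
        )
    return ("unknown", None)
```

The `convert` payload carries everything needed to call `unoconvert` with `--convert-to` and the input file, and to write the result to the output path.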

@regebro
Member

regebro commented Oct 31, 2023

Glad to hear it!

It would be good to have some place to store configuration tips.
