Parse and import both rpm and deb packages metadata #9101

Open: wants to merge 40 commits into base: master

Commits (40)
dad4efe
Create lzreposync subdirectory
agraul May 23, 2024
bc1fe58
Correct error message
waterflow80 Aug 4, 2024
4cea811
Add remote_path column
waterflow80 Aug 4, 2024
4832718
Add expand_full_filelist parameter
waterflow80 Aug 4, 2024
ad74bb1
Update deprecated method
waterflow80 Aug 4, 2024
e3bd4d3
Add import_signatures parameter
waterflow80 Aug 4, 2024
b71d9d8
Implement Primary.xml file parser
waterflow80 Aug 4, 2024
66455c6
Implement filelists.xml file parser
waterflow80 Aug 4, 2024
8daab3f
Implement full rpm metadata parsing
waterflow80 Aug 4, 2024
bb3ec40
Parse and import rpm patches/updates
waterflow80 Aug 4, 2024
3caddb5
Import parsed rpm & deb packages to db
waterflow80 Aug 4, 2024
7846311
Implement the deb Packages md file
waterflow80 Aug 4, 2024
85cb26a
Implement the Translation file parser
waterflow80 Aug 4, 2024
7f4075f
Implement full deb metadata parsing
waterflow80 Aug 4, 2024
dbbb508
Fetch repository information from the db
waterflow80 Aug 4, 2024
da4f376
Complete lzreposync service entry point
waterflow80 Aug 4, 2024
9195cad
Add new dependency
waterflow80 Aug 4, 2024
feba67e
Add unit tests for rpm metadata parsers
waterflow80 Aug 4, 2024
19357f6
Delete no longer used files
waterflow80 Aug 4, 2024
ca311e4
Remove already defined function
waterflow80 Aug 4, 2024
bb7955b
Fix linting complain
waterflow80 Aug 4, 2024
7e523d6
Complete code for lzreposync version 0.1
waterflow80 Aug 15, 2024
efaab5f
Complete tests for lzreposync service
waterflow80 Aug 15, 2024
a698afb
Fix error: too many clients already
waterflow80 Aug 15, 2024
24275e2
Complete latest version
waterflow80 Aug 17, 2024
ab31971
Optimize code and do some cleanup
waterflow80 Aug 26, 2024
43a367b
Optimize and consolidate code
waterflow80 Aug 29, 2024
0397029
Fix cachedir path formatting issue
waterflow80 Aug 29, 2024
3458720
fixup! Complete lzreposync service entry point
waterflow80 Sep 2, 2024
abf709d
fixup! Optimize code and do some cleanup
waterflow80 Sep 2, 2024
612bc5c
fixup! Optimize and consolidate code
waterflow80 Sep 2, 2024
7512b4b
fixup! Complete latest version
waterflow80 Sep 2, 2024
8831d42
fixup! Optimize and consolidate code
waterflow80 Sep 2, 2024
979523e
Complete gpg signature check for rpm
waterflow80 Sep 9, 2024
544e98b
fixup! Add remote_path column
waterflow80 Sep 9, 2024
f8ef20d
Refactor: Allow more input variants in makedirs()
agraul Sep 9, 2024
1b43565
fixup! Refactor: Allow more input variants in makedirs()
waterflow80 Sep 9, 2024
c2fee2e
Complete gpg signature check for debian
waterflow80 Sep 11, 2024
6bb43b5
Mock spacewak gpg home directory
waterflow80 Oct 24, 2024
574e43e
fixup! Mock spacewak gpg home directory
waterflow80 Oct 24, 2024
8 changes: 6 additions & 2 deletions .gitignore
@@ -67,8 +67,6 @@ yarn-error.log
# This should never be used since we use Yarn, but avoid anyone accidentally committing it
package-lock.json

rel-eng/custom/__pycache__

# Intellij IDEA
.idea/
*.iml
@@ -88,6 +86,12 @@ python/.vscode
# Python
venv/
.venv/
*.egg-info/
*.egg
wheels/
__pycache__/
build/
.pytest_cache/

# Schema

1 change: 1 addition & 0 deletions python/lzreposync/.gitignore
@@ -0,0 +1 @@
.cache/
73 changes: 73 additions & 0 deletions python/lzreposync/README.md
@@ -0,0 +1,73 @@
# lzreposync

TODO: project description

## How to work in this project

1. Create a new virtual environment
```sh
$ python3.11 -m venv .venv
$ . .venv/bin/activate
```
2. Install `lzreposync` in *editable* mode
``` sh
$ pip install -e .
```
3. Install other required dependencies (required by spacewalk and other modules)
Review comment (Member): We should add these to `pyproject.toml` as dependencies; then step 2 will install them as well.

```sh
pip install rpm
pip install salt
```
4. Add a path configuration file (**Important!**)
Review comment (Member): The correct fix will be that we build python311 versions of these libraries and install them as RPMs. We'll do that later; it's not for you (and especially not for this PR).

```
echo "absolute/path/to/uyuni/python/" > .venv/lib64/python3.11/site-packages/uyuni_python_paths.pth
# This is a temporary solution that lets the lzreposync service locate other modules such as spacewalk.
```
5. Add configuration environment variables
```sh
vim /etc/rhn/rhn.conf  # create the directory/file if it does not exist, then add:

DB_BACKEND=postgresql
DB_USER=spacewalk
DB_PASSWORD=spacewalk
DB_NAME=susemanager
DB_HOST=127.0.0.1 # might not work with 'localhost'
DB_PORT=5432
PRODUCT_NAME=any
TRACEBACK_MAIL=any
DB_SSL_ENABLED=
DB_SSLROOTCERT=any
DEBUG=1
ENABLE_NVREA=1
MOUNT_POINT=/tmp
SYNC_SOURCE_PACKAGES=0

# Some values might not be the right ones
```
6. Try `lzreposync`
``` sh
$ lzreposync -u https://download.opensuse.org/update/leap/15.5/oss/ --type yum [--no-errata]
$ lzreposync --type deb --url 'https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu?uyuni_suite=jammy&uyuni_component=main&uyuni_arch=amd64'
```
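
Step 4 above relies on Python's path configuration (`.pth`) files: the `site` module reads each `.pth` file found in a site-packages directory and appends the directories it lists to `sys.path`. The following is a minimal sketch of that mechanism using throwaway temporary directories. The directory names are made up for illustration, and nothing here touches your real virtual environment.

```python
import os
import site
import sys
import tempfile

# Hypothetical stand-ins for site-packages and for the uyuni/python checkout.
site_dir = tempfile.mkdtemp(prefix="fake-site-packages-")
extra_dir = tempfile.mkdtemp(prefix="fake-uyuni-python-")

# A .pth file lists one directory per line.
pth_file = os.path.join(site_dir, "uyuni_python_paths.pth")
with open(pth_file, "w", encoding="utf-8") as handle:
    handle.write(extra_dir + "\n")

# site.addsitedir() adds the directory itself to sys.path and processes the
# .pth files inside it, as the interpreter does for site-packages at startup.
site.addsitedir(site_dir)

print("extra directory importable:", os.path.abspath(extra_dir) in sys.path)
```

Because the listed directory must exist for `site` to add it, a typo in the path written to the `.pth` file fails silently; checking `sys.path` as above is a quick way to confirm the file was picked up.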

### How do I ...?

- add a new dependency? Add the *pypi* name to the `dependencies` list in the `[project]` section in `pyproject.toml`.

## Tests
We use a dedicated PostgreSQL Docker container that ships with the complete `susemanager` database schema already built.

To pull and start the database, you should:
```sh
cd /uyuni/java
sudo make -f Makefile.docker EXECUTOR=podman dockerrun_pg
# Wait a few seconds until the db is fully initialized
```
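
The "wait a few seconds" step can be made less ad hoc by polling the database port before starting the tests. The helper below is a sketch using only the standard library; `wait_for_port` is a hypothetical name, not something this PR provides. Note that an open port only shows that PostgreSQL accepts connections, not that the schema has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (assumes the container publishes PostgreSQL on 127.0.0.1:5432):
#     if not wait_for_port("127.0.0.1", 5432, timeout=60):
#         raise SystemExit("database did not come up in time")
```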

After installing with `pip install .` (or `pip install -e .`), run all tests with `python3.11 -m pytest tests/`. Sometimes a `rehash` is required to ensure that your shell uses `.venv/bin/pytest`.

You can connect to the test database with:
```sh
psql -h localhost -d susemanager -U spacewalk # password: spacewalk
```

22 changes: 22 additions & 0 deletions python/lzreposync/pyproject.toml
@@ -0,0 +1,22 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "lzreposync"
version = "0.1"
dependencies = [
"memory_profiler",
"pytest",
"requests",
"python-gnupg",
"pycurl",
"pyopenssl",
"psycopg2-binary",
"urlgrabber",
"python-debian",
"python-dateutil"
]

[project.scripts]
lzreposync = "lzreposync:main"
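
The `[project.scripts]` entry maps the `lzreposync` command to the string `lzreposync:main`, meaning "import the module `lzreposync` and call its `main` attribute". As a rough illustration of how such a `module:attribute` reference is resolved, here is a small sketch; `resolve_entry_point` is a hypothetical helper written for this example, and it is demonstrated with a standard-library target because `lzreposync` may not be installed where the snippet runs.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "Class.method"); walk each piece.
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return obj


# Standard-library target so the sketch runs anywhere; with the package
# installed, "lzreposync:main" would be resolved in the same way.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```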
242 changes: 242 additions & 0 deletions python/lzreposync/src/lzreposync/__init__.py
@@ -0,0 +1,242 @@
# pylint: disable=missing-module-docstring

import argparse
import logging

from lzreposync import db_utils, updates_util
from lzreposync.db_utils import (
    get_compatible_arches,
    get_channel_info_by_label,
    get_all_arches,
    create_channel,
    ChannelAlreadyExistsException,
    NoSourceFoundForChannel,
)
from lzreposync.import_utils import (
    import_package_batch,
    batched,
    import_repository_packages_in_batch,
)
from lzreposync.rpm_repo import RPMRepo
from spacewalk.common.repo import GeneralRepoException
from spacewalk.satellite_tools.repo_plugins.deb_src import DebRepo


def main():
    # Review comment (Member): In the future, i.e. not in this PR, we should align
    # the CLI interface as much as possible with spacewalk-repo-sync. We don't have
    # to support all the flags, a subset is fine. Right now we have conflicting flags
    # (e.g. -d means debug here but dry-run in spacewalk-repo-sync).
    parser = argparse.ArgumentParser(
        description="Lazy reposync service",
        conflict_handler="resolve",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    parser.add_argument(
        "--url",
        "-u",
        help="The target url of the remote repository of which we'll "
        "parse the metadata",
        dest="url",
        type=str,
        default=None,
    )

    parser.add_argument(
        "-n",
        "--name",
        help="Name of the repository",
        dest="name",
        type=str,
        default="noname",
    )

    parser.add_argument(
        "-D",
        "--debug",
        help="Show debug messages",
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
        default=logging.INFO,
    )

    parser.add_argument(
        "-c",
        "--cache",
        help="Path to the cache directory",
        dest="cache",
        default=".cache",
        type=str,
    )

    parser.add_argument(
        "-b",
        "--batch-size",
        help="Size of the batch (num of packages by batch)",
        dest="batch_size",
        default=20,
        type=int,
    )

    parser.add_argument(
        "-a",
        "--arch",
        help="A filter for package architecture. Can be a regex, for example: 'x86_64', '(x86_64|arch_64)'",
        default=".*",
        dest="arch",
        type=str,
    )

    parser.add_argument(
        "-c",
        "--channel",
        help="The channel label of which you want to synchronize repositories",
        dest="channel",
        type=str,
        default=None,
    )

    parser.add_argument(
        "--type",
        help="Repo type (yum or deb)",
        dest="repo_type",
        type=str,
        default=None,
    )

    parser.add_argument(
        "--no-errata",
        help="Do not sync errata",
        action="store_true",
        dest="no_errata",
        default=False,
    )

    parser.add_argument(
        "--create-channel",
        help="Create a new channel by providing the 'channel_label', and the 'channel_arch' eg: x86_64.\n"
        "Eg: --create-channel test_channel x86_64",
        dest="channel_info",
        type=str,
        nargs=2,
    )

    args = parser.parse_args()

    # Remove any existing handlers (loggers)
    for handler in logging.root.handlers[:]:
        logging.root.removeHandler(handler)
    logging.getLogger().setLevel(args.loglevel)

    # Creating a new channel
    if args.channel_info:
        channel_label, channel_arch = args.channel_info[0], args.channel_info[1]
        print(
            f"Creating a new channel with label: {channel_label}, and arch: {channel_arch}"
        )
        try:
            channel = create_channel(
                channel_label=channel_label, channel_arch=channel_arch
            )
            print(
                f"Info: successfully created channel: {channel_label} -> id={channel.get_id()}, name={channel.get_label()}"
            )
        except ChannelAlreadyExistsException:
            print(f"Warn: failed to create channel {channel_label}. Already exists !!")
        return

    # Always keep noarch packages when a specific arch filter is given
    arch = args.arch
    if arch != ".*":
        arch = f"(noarch|{args.arch})"

    if args.url:
        # Sync a single repository given directly by its url
        if not args.repo_type:
            print("ERROR: --type (yum/deb) must be specified when using --url")
            return
        if args.repo_type == "yum":
            repo = RPMRepo(args.name, args.cache, args.url, arch)
        elif args.repo_type == "deb":
            repo = DebRepo(args.url, args.cache, "/tmp")
            try:
                repo.verify()
            except GeneralRepoException as e:
                logging.error("__init__.py: Couldn't verify signature ! %s", e)
                exit(0)
        else:
            print(f"ERROR: not supported repo_type: {args.repo_type}")
            return
        compatible_arches = get_all_arches()
        failed = import_repository_packages_in_batch(
            repo, args.batch_size, compatible_arches=compatible_arches
        )
        logging.debug("Completed import with %d failed packages", failed)

    else:
        # No url specified
        if args.channel:
            # Sync every repository attached to the given channel
            channel_label = args.channel
            channel = get_channel_info_by_label(channel_label)
            if not channel:
                logging.error("Couldn't fetch channel with label %s", channel_label)
                return
            compatible_arches = get_compatible_arches(channel_label)
            if args.arch and args.arch != ".*" and args.arch not in compatible_arches:
                # Fixed: argparse stores this option as args.arch (dest="arch");
                # args.channel_arch does not exist and would raise AttributeError.
                logging.error(
                    "Not compatible arch: %s for channel: %s",
                    args.arch,
                    args.channel,
                )
                return
            try:
                target_repos = db_utils.get_repositories_by_channel_label(channel_label)
            except NoSourceFoundForChannel as e:
                print("Error:", e.msg)
                return
            for repo in target_repos:
                if repo.repo_type == "yum":
                    rpm_repo = RPMRepo(
                        repo.repo_label, args.cache, repo.source_url, repo.channel_arch
                    )
                    logging.debug("Importing package for repo %s", repo.repo_label)
                    failed = import_repository_packages_in_batch(
                        rpm_repo,
                        args.batch_size,
                        channel,
                        compatible_arches=compatible_arches,
                        no_errata=args.no_errata,
                    )
                    logging.debug(
                        "Completed import for repo %s with %d failed packages",
                        repo.repo_label,
                        failed,
                    )
                elif repo.repo_type == "deb":
                    deb_repo = DebRepo(
                        repo.source_url,
                        args.cache,
                        pkg_dir="/tmp",
                        channel_label=repo.channel_label,
                    )
                    try:
                        deb_repo.verify()
                    except GeneralRepoException as e:
                        logging.error("__init__.py: Couldn't verify signature ! %s", e)
                        exit(0)

                    logging.debug("Importing package for repo %s", repo.repo_label)
                    failed = import_repository_packages_in_batch(
                        deb_repo,
                        args.batch_size,
                        channel,
                        compatible_arches=compatible_arches,
                    )
                    logging.debug(
                        "Completed import for repo %s with %d failed packages",
                        repo.repo_label,
                        failed,
                    )
                else:
                    # TODO: handle repositories other than yum and deb
                    logging.debug("Not supported repo type: %s", repo.repo_type)
                    continue

        else:
            logging.error("Either --url or --channel must be specified")
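
One detail of `main()` worth illustrating: `-c` is registered for both `--cache` and `--channel`, and the parser uses `conflict_handler="resolve"`. With that handler, argparse does not raise an error; it silently hands the short flag to whichever argument was added last. The reduced parser below keeps only those two options to show the effect. It is a standalone sketch written for this note, not code from the PR.

```python
import argparse

parser = argparse.ArgumentParser(conflict_handler="resolve")
# Same registration order as in main(): --cache first, then --channel.
parser.add_argument("-c", "--cache", dest="cache", default=".cache", type=str)
parser.add_argument("-c", "--channel", dest="channel", default=None, type=str)

# The short flag now belongs to --channel; --cache keeps only its long form.
args_short = parser.parse_args(["-c", "my-channel"])
print("cache:", args_short.cache, "| channel:", args_short.channel)

# Using the long options is unambiguous.
args_long = parser.parse_args(["--cache", "/tmp/lz-cache", "--channel", "my-channel"])
print("cache:", args_long.cache, "| channel:", args_long.channel)
```

So `lzreposync -c /some/dir` would currently be read as a channel label rather than a cache path; spelling out `--cache` avoids the surprise until the short flags are revisited.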