-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
openbsd updater needs adapting for model changes in sqlports ? #1349
Comments
I've had a quick look at the parser, the DISTFILES and MASTER_SITES* fields you are using are indeed gone. We've changed the layout in bsd.port.mk, so that we can have arbitrary DISTFILES.sufx variables, and SITES.sufx that match (also taken the opportunity to shorten the name). Accordingly, there's no longer a fixed list of MASTER_SITES names, thus it's gone from the main table. I don't know what you use DISTFILES and MASTER_SITES for, but at a guess, you might want to use the "simpler" ROACH_URL and ROACH_SITES variables instead. On complicated ports with multiple origins, these now should contain the main url that we fetch for the package being built, and the corresponding originating sites (ROACH_URL has all the "fun" parts of OpenBSD-style MASTER_SITES already parsed). That should actually make the data you report more accurate, hopefully. If you really need to, you can instead find the full data in subsidiary tables: |
Thanks for the heads up. I'll try to update the parser this week, until then I have to disable openbsd repo. |
I use repology for openbsd very regularly so losing it on repology isn’t ideal. Instead of taking down the entire thing, would it be possible to just show the main master site as a quick stop gap for now? I think that will show the same as before for about 93% of ports as most ports don’t need more than one master site. Example diff here: repology_diff |
I'm working on this. For now I came up with this query to return all site+distfile pairs for all ports, which can then be converted to real URLs. Does it look right? I'm particularly puzzled with _distfiles' SELECT
_distfiles.FullPkgPath AS FullPkgPath,
_sites.Value AS Sites,
_fetchfiles.Value AS Files
FROM _distfiles
JOIN _fetchfiles
ON KeyRef=_Distfiles.Value
JOIN _sites
ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (coalesce(_distfiles.N, _distfiles.SUFX) is not distinct from _sites.N) |
What does repology need exactly? The OpenBSD sqlports package includes a man page which has some light documentation. I'm not sure if you've seen it, but if not I've uploaded it here in case it is helpful: sqlports man page The database has a table called Meta which might be useful to check for freshness and schema versioning.
Once that's known, the raw tables are the ones starting with underscores. So for example I see repology is using the raw tables today. Is that what you want to do? Then there are some convenience views which merge the data in the raw tables. Those convenience views do not start with underscores. The "Ports" view may be very useful to use. But since it's a view it's slow, so the results from the dynamic view are also stored as static data in "PortsQ" which breaks the pattern and is actually a table. I'm not an expert on sqlports, but if you can share how you need the data for repology I can try to work it out with you. @marcespie is the real expert here. Maybe the PortsQ table is what Marc was suggesting for you to use when he mentioned using the ROACH_SITES and ROACH_URL variables? Here is an example:
Nevertheless if you can describe what repology needs it might be helpful for those of us on the OpenBSD side to try to help. |
On Mon, Sep 18, 2023 at 07:19:43PM -0700, Dmitry Marakasov wrote:
I'm working on this. For now I came up with this query to return all site+distfile pairs for all ports, which can then be converted to real URLs. Does it look right? I'm particularly puzzled with _distfiles' `SUFX` vs `N` but it seems to work with ports using either.
```sql
SELECT
_distfiles.FullPkgPath AS FullPkgPath,
_sites.Value AS Sites,
_fetchfiles.Value AS Files
FROM _distfiles
JOIN _fetchfiles
ON KeyRef=_Distfiles.Value
JOIN _sites
ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (coalesce(_distfiles.N, _distfiles.SUFX) is not distinct from _sites.N)
```
N has to do with the distfiles table containing DISTFILES, SUPDISTFILES
and PATCHFILES.
As I mentioned *MUCH EARLIER* in the thread, unless you really want
the full list of all files which are grabbed by the port, in the
new scheme of things, *THE MAIN DISTFILE* for a given port will be
located at ROACH_URL, grabbable from ROACH_SITE.
This is your best guess at figuring out whether a port is up-to-date
(more or less), unless you get into stuff which releases as main
files along with mammoth patches. Stuff like vim, for instance.
Note that this is also the data that other tools like portroach are
supposed to use.
By just grabbing this info, instead of painting the shed red, you
will get OpenBSD to >99% functionality again.
|
https://repology.org/docs/requirements
Thank you, but it contains only the basic info which is already clear from the database structure. It doesn't answer my question regarding |
As I understand, that is Back to
And this is and example of linkage via
I would have already implemented |
On Tue, Sep 19, 2023 at 05:13:55AM -0700, Dmitry Marakasov wrote:
> N has to do with the distfiles table containing DISTFILES, SUPDISTFILES
and PATCHFILES.
As I understand, that is `_distfiles.Type`, not `N`.
Back to `N` and `SUFX`, if my guess is right, this is an example of distfiles-sites linkage via `SUFX`:
```
sqlite> select * from _distfiles where fullpkgpath=4448;
FullPkgPath|Value|N|SUFX|Type
4448|18545||.github|0
4448|18546||.github|0
sqlite> select * from _sites where fullpkgpath=4448;
FullPkgPath|N|Value
4448|.github|https://github.com/
4448|.gitlab|https://gitlab.com/
4448|.srht|https://git.sr.ht/
```
And this is and example of linkage via `N`:
```
sqlite> select * from _distfiles where fullpkgpath=13146;
FullPkgPath|Value|N|SUFX|Type
13146|131315|||0
13146|131316|0||0
13146|131317|1||0
13146|131318|2||0
sqlite> select * from _sites where fullpkgpath=13146;
FullPkgPath|N|Value
13146||https://github.com/stumpwm/stumpwm/archive/
13146|0|https://github.com/edicl/cl-ppcre/archive/
13146|1|https://beta.quicklisp.org/archive/clx/2022-07-07/
13146|2|https://beta.quicklisp.org/archive/alexandria/2022-07-07/
```
hence, `coalesce` in the query above should, which should handle both cases. So am I right?
Ah right, I had entirely forgotten about that part. The "N" part is transient.
It's there so that we know about stuff where we haven't converted from the old way
to the new way (it still records the old DISTFILES = somefile:0 -> MASTER_SITES0
as opposed to the new DISTFILES.sufx = somefile -> SITES.sufx.
Don't spend too much time on that field, it will go away as soon as I find time to finish
a full sweep of the ports tree :)
In fact, even if you "miss" out on a few files, I would advise not even trying to support it.
That part of the schema will go away entirely as soon as I find the time to finish
the sweep.
|
As forecasted, I just finished removing MASTER_SITES0..9 So you don't need to worry about N, it's gone in sqlports 7.49 |
Fantastic! Thank you so much for the quick adaptation! Following the final bit of work Marc just did, the tweak mentioned in the comment is now required. diff --git a/repology/parsers/parsers/openbsd.py b/repology/parsers/parsers/openbsd.py
index 076ad7ac..5962b880 100644
--- a/repology/parsers/parsers/openbsd.py
+++ b/repology/parsers/parsers/openbsd.py
@@ -95,7 +95,6 @@ def _iter_sqlports(path: str) -> Iterator[Port]:
yield Port(**row_dict, distfiles_cursor=distfiles_cursor)
-# TODO: drop _distfiles.N for sqlports >= 7.49
_DISTFILES_QUERY = """
SELECT
_sites.Value AS sites,
@@ -104,7 +103,7 @@ FROM _distfiles
JOIN _fetchfiles
ON KeyRef=_Distfiles.Value
LEFT JOIN _sites
- ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (coalesce(_distfiles.N, _distfiles.SUFX) is not distinct from _sites.N)
+ ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (_distfiles.SUFX is not distinct from _sites.N)
WHERE
_distfiles.FullPkgPath = ?
""" |
Following-up (somewhat): in the more generic case of all package distributions, we all have different rules about splitting up things and how "unique" things get. I think I might get some students (!!!!) looking at it this year. For starters, the number of packages varies widely depending on distributions, but it doesn't mean the number of supported software is that different (depending on supported versions, available options, various splits in packages). Speaking for OpenBSD, I know the whole pkgpath/fullpkgpath/flavors/subpackages enchilada can be mighty confusing when coming from outside. I have zero idea what more info could be provided in views from sqlports to make this less confusing... nor do I know where other distros get their info. |
I won't count that as quick, but you're welcome)
Yesterday sqlports was still at 7.48 so couldn't drop it right away. |
On Tue, Sep 26, 2023 at 07:46:21AM -0700, Dmitry Marakasov wrote:
> Fantastic! Thank you so much for the quick adaptation!
I won't count that as quick, but you're welcome)
> Following the final bit of work Marc just did, the tweak mentioned in the comment is now required.
Yesterday sqlports was still at 7.48 so couldn't drop it right away.
Depending on our snaps, you'll soon get 7.49. after all, it still takes a day to run a full bulk...
|
Repology primarily counts projects, which is specifically "number of supported software", so apart from a few exceptions subpackages and splits does not count. For instance, despite CentOS 6:
ConanCenter:
Cygwin:
it's counted once for each repo. Of course there are discrepancies, and you can estimate the error rate by percentage of "unique" projects (not matched with any other repository), which is usually below 15%, or by percentage of unique projects which have "related" ones (with matches by url, which mean they can potentialy be merged with something) which is usually below 10%.
I don't have a problem with it. There's minor problem with a lot of flavors which Repology cannot deduplicate yet (thus showing a bunch of similar entries on e.g. package versions page), but that does not affect statistics and statuses and will be fixed on Repology side someday.
As said, I don't feel much need to. From what I can see, most unmatched projects in OpenBSD are truly unique or have ambiguous names and have unmatched counterparts in other repos, so there's no OpenBSD specific problems here. Still, it can be improved somewhat by submitting rules to repology or renaming packages in OpenBSD where that makes sense. For instance, I see some potentially misnamed python modules, like
Yes, it's already there and I've deployed updated parser. |
Renaming packages is a bitch. We got a mechanism for that, but it is fairly expensive on the updater-side. |
Well, just suggesting, it's not fatal as it is. |
Looking at some false positives on OpenBSD, I noticed repology says security/ghidra is out of date. However that port is marked BROKEN on all archs right now. So it is not built. Therefore does the below change make sense? It filters out ports that are broken on ALL archs. This would stop showing what I feel is a false positive for security/ghidra being vulnarable on OpenBSD. (in fact the security issues were noted as part of the reason it was disabled. see: marc.info This diff purposely does not filter out ports that are only broken on SOME archs. diff --git a/repology/parsers/parsers/openbsd.py b/repology/parsers/parsers/openbsd.py
index e88fcc33..6357d297 100644
--- a/repology/parsers/parsers/openbsd.py
+++ b/repology/parsers/parsers/openbsd.py
@@ -58,6 +58,8 @@ FROM _Ports
ON Categories_ordered.FullPkgpath=_Ports.FullPkgpath
JOIN _Email
ON _Email.KeyRef=MAINTAINER
+WHERE
+ _Ports.FullPkgPath NOT IN (SELECT DISTINCT FullPkgPath FROM _Broken where Arch IS NULL)
"""
|
Another possible false positive. I can't figure out why repology says Python 3.10.13 in our tree has a vulnerability. Blindly guessing I see that we have CVE-2023-40217 which states that some affected versions include "[3.10.0, 3.10.13)". To my knowledge a square bracket on the right would mean version <= 3.10.13, while a parenthesis means version < 3.10.13. i.e. strictly less than version 3.10.13. Is it possible repology isn't marking the affected versions correctly? Or did I miss a vulnerability that's in Python 3.10.13 on my side? |
This makes sense, but I don't feel good about hiding information. Repology's target audience is not only maintainers but also upstream, and visibility of broken packages is especially important for these. Also there are maintainers from other repositories who may find even disabled package useful. Showing it as outdated is also a valid call for action. The point that its vulnerability does not in fact affect users because the package is not available is valid though, and it would be fair to at least exclude it from count of vulnerable packages. This requires a bit of development though (#1352).
You can click an exclamation mark to get a page which highlights CVEs matched with a given version: https://repology.org/project/python/cves?version=3.10.13 |
On Wed, Sep 27, 2023 at 02:53:34PM -0700, Dmitry Marakasov wrote:
> This would stop showing what I feel is a false positive for security/ghidra being vulnarable on OpenBSD
This makes sense, but I don't feel good about hiding information. Repology's target audience is not only maintainers but also upstream, and visibility of broken packages is especially important for these. Also there are maintainers from other repositories who may find even disabled package useful. Showing it as outdated is also a valid call for action. The point that its vulnerability does not in fact affect users because the package is not available is valid though, and it would be fair to at least exclude it from count of vulnerable packages. This requires a bit of development though (#1352).
Our ghidra is totally out-of-date. There are few OpenBSD developers who care
about updating java software.
I have zero idea about the status of our patches to make ghidra run,
whether they were committed to upstream (apparently there's at least
an issue).
From what I can see, it looks like somewhat poorly architectured software
that actually wants lists and lists of OSes instead of doing somewhat OO
stuff... well, not that I'm surprised considering the usual verbosity of
java.
Anyhow, if someone is interested enough in running a recent ghidra on
OpenBSD, they're welcome to try and update the port and submit patches.
|
some recent changes made it to sqlports (cf https://github.com/openbsd/ports/commits/master/databases/sqlports), which might have broken the openbsd updater in repology according to @ajacoutot. @marcespie might know better, i'm only the messenger and havent looked at the actual changes nor potential breakage.
right now sqlports doesnt have a "stable api" but https://marc.info/?l=openbsd-ports&m=169419062712772&w=2 hints that there might be one someday.
The text was updated successfully, but these errors were encountered: