-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid full object reindex on security reindex #143
Conversation
Add `update_metadata=0` parameter to reindexObjectSecurity to avoid a full reindex instead of just `allowedRolesAndUsers`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should at least add a CHANGES.rst
entry describing your change.
Ideally, you would also provide a test verifying that your change is effective. However, as this might be difficult and effectiveness obvious, I do not insist on such a test.
To contribute you should sign a |
@d-maurer Sorry, I don't really have an explanation as to why I have not initially updated the changelog (but it is done now). And yes, testing in this case would be far too complicated for the simplicity and effectiveness of the fix. Also, I was under the impression, that the Plone Contributor Agreement is used, which I have signed years ago, I am a contributor on Plone already. But true, back then I specified Plone, not Zope. I have just sent another signed copy to agreements@plone with Zope mentioned. Please let me know if I need to wait for that to go through and create another branch directly here, or you could merge this private fork. |
Valentin Dumitru wrote at 2024-8-20 03:58 -0700:
...
> You should at least add a `CHANGES.rst` entry describing your change.
>
> Ideally, you would also provide a test verifying that your change is effective. However, as this might be difficult and effectiveness obvious, I do not insist on such a test.
@d-maurer Sorry, I don't really have an explanation as to why I have not initially updated the changelog (but it is done now). And yes, testing in this case would be far too complicated for the simplicity and effectiveness of the fix.
Also, I was under the impression, that the Plone Contributor Agreement is used, which I have signed years ago, I am a contributor on Plone already. But true, back then I specified Plone, not Zope. I have just sent another signed copy to ***@***.*** with Zope mentioned.
@icemac processes the contributor agreements (and adds you to
the `zopefoundation` organisation).
After that, you can create pull requests in `zopefoundation`
repositories.
Please let me know if I need to wait for that to go through and create another branch directly here, or you could merge this private fork.
I have run the tests for your pull request and there are wide spread
failures - at least most of them related to your change.
e.g.
`py310`
Failure in test test_reindexObjectSecurity_missing_noraise (Products.CMFCore.tests.test_CMFCatalogAware.CMFCatalogAwareTests)
Traceback (most recent call last):
...
File "/home/runner/work/Products.CMFCore/Products.CMFCore/src/Products/CMFCore/tests/test_CMFCatalogAware.py", line 228, in test_reindexObjectSecurity_missing_noraise
self.assertEqual(
...
AssertionError: Lists differ: ["reindex /site/foo ('allowedRolesAndUsers',) 0"] != ["reindex /site/foo ('allowedRolesAndUsers',) 1"]
First differing element 0:
"reindex /site/foo ('allowedRolesAndUsers',) 0"
"reindex /site/foo ('allowedRolesAndUsers',) 1"
- ["reindex /site/foo ('allowedRolesAndUsers',) 0"]
? ^
+ ["reindex /site/foo ('allowedRolesAndUsers',) 1"]
`lint`
/home/runner/work/Products.CMFCore/Products.CMFCore/src/Products/CMFCore/CMFCatalogAware.py:128:80: E501 line too long (80 > 79 characters)
lint: exit 1 (2.19 seconds) /home/runner/work/Products.CMFCore/Products.CMFCore> flake8 /home/runner/work/Products.CMFCore/Products.CMFCore/src /home/runner/work/Products.CMFCore/
You can run tests locally via `tox`, e.g.
* `tox -e lint` to check for cosmetic problems
* `tox -e py310` to check for Python 3.10.
You must have installed `tox` to do this.
|
fix lint error
@d-maurer I have fixed those tests, which were clearly designed to just mirror the incorrect situation where the metadata was updated. I have also fixed the linting error. |
Just a small correction: I am not the one who processes contributor agreements, this is done by the Plone foundation. To bo honest I do not even know who actually does this. |
The broken docs test is fixed in #144 and might require a rebase or merge with master, so it gets also fixed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be safer to only skip updating the catalog metadata if one of the _cmf_security_indexes is not also included in the metadata columns.
David Glick wrote at 2024-8-21 12:54 -0700:
@davisagli requested changes on this pull request.
It would be safer to only skip updating the catalog metadata if one of the _cmf_security_indexes is not also included in the metadata columns.
I do not think that this is necessary:
unless overridden by derived classes, `_cmf_security_indexes`
is `('allowedRolesAndUsers',)`.
This is a specialized index with the (likely) only purpose
to help the catalog not to include in search results objects the
searcher cannot access. It is unlikely that someone adds this
to the metadata (as the information is not relevant to be displayed
for "normal" users; if necessary for debugging purposes, it
can be looked up via the catalog's ZMI).
If a class overrides `_cmf_security_indexes` and includes
an index likely to be used as metadata, too,
it can provide its own `reindexObjectSecurity`.
Currently, I would not add additional complexity
combining `_cmf_catalog_indexes` and metadata.
|
@d-maurer I mentioned it because I have added |
David Glick wrote at 2024-8-22 07:56 -0700:
... I have added `allowedRolesAndUsers` to the catalog metadata for one project, where I needed to be able to read it from catalog results.
This is surprising: the catalog will ensure that only documents
the current searcher can access are returned as part of the search result.
In principle, this is everything the current searcher should be interested in:
why should he want to know what roles or other users could access the
document as well?
For special purposes, the information could be easily obtained from the
"reverse map" associated with the "allowed_roles_and_users" index
(each index is actually composed of a forward
(term --> did) and a reverse (did -> terms) map).
There should be no real need to store the information redundantly
in the metadata.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-maurer I went to check the project I was thinking of, and can't find it, so perhaps I was remembering wrong. I'll approve this change.
I do wish that the ZCatalog API worked differently here. It should require only passing a list of attributes that changed, and then it should be the catalog's responsibility to efficiently update the indexes and catalog metadata that is based on those attributes, if any. Not possible to change now, but perhaps a new API could be added...
For special purposes, the information could be easily obtained from the
"reverse map" associated with the "allowed_roles_and_users" index
This is a good point -- but as far as I'm aware, there isn't a convenient API for getting this info from a catalog query result.
David Glick wrote at 2024-8-22 13:31 -0700:
...
I do wish that the ZCatalog API worked differently here. It should require only passing a list of attributes that changed, and then it should be the catalog's responsibility to efficiently update the indexes and catalog metadata that is based on those attributes, if any.
I agree. In particular, this would make the metadata update symmetric
to the index update -- and symmetrie is always a nice thing.
...
This is a good point -- but as far as I'm aware, there isn't a convenient API for getting this info from a catalog query result.
I cannot much say about the "convenience", but you can get the information
from a catalog result proxy:
Such a proxy (AKA "brain") has the attribute `data_record_id_`; this
gives the id used by the catalog to identify the object.
You can then use `ZCatalog.getIndexDataForRID` to get all index information
for this "RID" (aka "data_record_id_").
Because this gives you the complete index information, it is
inefficient and likely should not be used regularly.
Obtaining the information for a single index it more difficult:
you use `ZCatalog._catalog.getIndex(name)` to get the index named `name`
and then use `index._unindex` to get the reverse map (mapping
the `data_record_id_` to the indexed terms).
|
So the |
@icemac Sorry, I forgot to tag you. |
@icemac Also perhaps worth mentioning that I have signed the contributor agreement and I have been approved as a contributor. |
@valipod Thank you for signing the contributor agreement. |
@d-maurer Can we have new release? |
I'll do a release after #145 is reviewed and merged. |
Release 3.6 is now live. |
Add
update_metadata=0
parameter toreindexObjectSecurity
to avoid a full reindex and instead just reindexallowedRolesAndUsers
.This fixes #142