Generalized paramset insert? #42
Thanks @CBroz1. Yes, if possible, let's centralize. The attributes are not exactly the same in all cases, so it will require refactoring. Also, @dimitri-yatsenko had suggested the following refactor:

```python
@classmethod
def insert_params(cls, clustering_method: str, paramset_idx: int, paramset_desc: str, params: dict):
    param_hash = {'param_set_hash': dict_to_uuid(params)}
    try:
        existing_idx = (cls & param_hash).fetch1('paramset_idx')
    except dj.DataJointError:
        cls.insert1(dict(locals(), **param_hash), ignore_extra_fields=True)
    else:
        if existing_idx != paramset_idx:
            raise dj.DataJointError(
                f'The specified param-set already exists - paramset_idx: {existing_idx}')
```
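For context, `dict_to_uuid` derives a deterministic UUID from a parameter dictionary, so identical parameter sets always hash to the same value. A minimal sketch of that idea, assuming an MD5 digest over sorted key/value pairs (the exact serialization in element-interface may differ):

```python
import hashlib
import uuid


def dict_to_uuid(key: dict) -> uuid.UUID:
    """Hash a dictionary into a deterministic UUID, independent of key order.

    Minimal illustrative stand-in; not necessarily the exact element-interface
    implementation.
    """
    hashed = hashlib.md5()
    for k, v in sorted(key.items()):
        hashed.update(str(k).encode())
        hashed.update(str(v).encode())
    return uuid.UUID(hex=hashed.hexdigest())


# Key order does not affect the hash; values do.
assert dict_to_uuid({'a': 1, 'b': 2}) == dict_to_uuid({'b': 2, 'a': 1})
assert dict_to_uuid({'a': 1}) != dict_to_uuid({'a': 2})
```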
Aha! If the idea is to skip duplicates if the secondary parameters match, then we can consider the following type of implementation:

```python
def insert1_skip_full_duplicates(table, entry):
    """
    This function inserts one entry into the table.
    It ignores duplicates only if all the other attributes also match.
    Duplicates on secondary keys are not ignored.
    After validation, this functionality will be integrated into core DataJoint.
    """
    try:
        table.insert1(entry)
    except dj.errors.DuplicateError as err:
        if "PRIMARY" not in err.args[0]:
            # secondary indexes are checked only after confirming primary uniqueness
            raise
        key = {k: v for k, v in entry.items() if k in table.primary_key}
        if (table & key).fetch1() != entry:
            raise err.suggest(
                'Duplicate primary key with different secondary attributes from existing value.')


@schema
class PreprocessParamSet(dj.Lookup):
    definition = """ # Parameter set used for pre-processing of calcium imaging data
    paramset_idx: smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter
        the same parameters under a different primary key.
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid(params)
        entry = {k: v for k, v in locals().items() if k in cls.heading.names}
        insert1_skip_full_duplicates(cls(), entry)
```
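The intended semantics can be illustrated outside of DataJoint with a toy in-memory stand-in (`ToyTable`, `DuplicateError`, and the sample data are all made up for illustration): a repeated identical insert is silently skipped, while reusing a primary key with different secondary attributes raises.

```python
class DuplicateError(Exception):
    pass


class ToyTable:
    """Minimal in-memory stand-in for a table with one primary-key attribute."""
    primary_key = ['paramset_idx']

    def __init__(self):
        self.rows = {}  # primary-key value -> full entry

    def insert1(self, entry):
        if entry['paramset_idx'] in self.rows:
            raise DuplicateError("PRIMARY key duplicate")
        self.rows[entry['paramset_idx']] = dict(entry)

    def fetch1(self, key):
        return self.rows[key['paramset_idx']]


def insert1_skip_full_duplicates(table, entry):
    """Same control flow as the proposal above, minus the DataJoint types."""
    try:
        table.insert1(entry)
    except DuplicateError as err:
        if "PRIMARY" not in err.args[0]:
            raise
        key = {k: v for k, v in entry.items() if k in table.primary_key}
        if table.fetch1(key) != entry:
            raise DuplicateError(
                'Duplicate primary key with different secondary attributes.')


table = ToyTable()
entry = {'paramset_idx': 0, 'params': 'p'}
insert1_skip_full_duplicates(table, entry)
insert1_skip_full_duplicates(table, entry)  # exact duplicate: silently skipped
conflict = False
try:
    insert1_skip_full_duplicates(table, {'paramset_idx': 0, 'params': 'q'})
except DuplicateError:
    conflict = True
assert conflict  # same key with different params is rejected
```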
I would put the primary key as the first argument. To enforce backward compatibility, I would put
Let's have a quick code review on this and propagate throughout.
I'd replace

```python
entry = {k: v for k, v in locals().items() if k in cls.heading.names}
```

with explicitly creating the dictionary:

```python
entry = dict(paramset_idx=paramset_idx,
             paramset_desc=paramset_desc,
             param_set_hash=param_set_hash,
             params=params)
```

It's a bit longer but more readable compared to using `locals()`.
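A quick illustration of the readability concern (toy names, not DataJoint): filtering `locals()` produces the same result but hides which attributes end up in the entry, and silently drops any argument whose name does not match a heading attribute.

```python
# Hypothetical stand-in for cls.heading.names
HEADING_NAMES = {'paramset_idx', 'paramset_desc', 'param_set_hash', 'params'}


def build_entry_implicit(paramset_idx, paramset_desc, param_set_hash, params, extra=None):
    # `extra` (and any future argument) is silently dropped by the filter
    return {k: v for k, v in locals().items() if k in HEADING_NAMES}


def build_entry_explicit(paramset_idx, paramset_desc, param_set_hash, params):
    # explicit construction: the entry's contents are visible at a glance
    return dict(paramset_idx=paramset_idx, paramset_desc=paramset_desc,
                param_set_hash=param_set_hash, params=params)


# Both build the same entry; only the explicit form documents itself.
assert build_entry_implicit(0, 'desc', 'hash', {}) == build_entry_explicit(0, 'desc', 'hash', {})
```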
```python
@schema
class PreprocessParamSet(dj.Lookup):
    definition = """ # Parameter set used for pre-processing of calcium imaging data
    paramset_idx: smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter
        the same parameters under a different primary key.
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid({**params, 'preprocess_method': preprocess_method})
        entry = dict(paramset_idx=paramset_idx,
                     paramset_desc=paramset_desc,
                     param_set_hash=param_set_hash,
                     params=params)
        insert1_skip_full_duplicates(cls(), entry)
```

Note that the `param_set_hash` now incorporates the `preprocess_method`.
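Because the hash now covers the method name as well as the parameters, the same `params` registered under two different preprocessing methods yield distinct `param_set_hash` values. A self-contained check, using a minimal MD5-based stand-in for `dict_to_uuid` (the method names and parameter values are made up for illustration):

```python
import hashlib
import uuid


def dict_to_uuid(key: dict) -> uuid.UUID:
    """Minimal stand-in: deterministic UUID from sorted key/value pairs."""
    hashed = hashlib.md5()
    for k, v in sorted(key.items()):
        hashed.update(str(k).encode())
        hashed.update(str(v).encode())
    return uuid.UUID(hex=hashed.hexdigest())


params = {'tau': 1.5, 'smooth': True}
hash_a = dict_to_uuid({**params, 'preprocess_method': 'suite2p'})
hash_b = dict_to_uuid({**params, 'preprocess_method': 'caiman'})
assert hash_a != hash_b  # same params, different method -> different hash
```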
But not
It does not cost any extra to include the
@tdincer
In my testing, I found that our previous drafts couldn't handle tables without a

```python
def insert1_skip_full_duplicates(table: Table, entry: dict):
    """Insert one entry into a table, ignoring duplicates if all entries match.

    Duplicates on either primary or secondary attributes log existing entries. After
    validation, this functionality will be integrated into core DataJoint.

    Cases:
        0. New entry: insert as normal
        1. Entry has an exact match: return silently
        2. Same primary key, new secondary attributes: log warning with full entry
        3. New primary key, same secondary attributes: log warning with existing key

    Arguments:
        table (Table): datajoint Table object
        entry (dict): table entry as a dictionary
    """
    if table & entry:  # Test for 1. Return silently if exact match
        return
    existing_entry_via_secondary = table & {  # Test for Case 3
        k: v for k, v in entry.items() if k not in table.primary_key
    }
    if existing_entry_via_secondary:
        logger.warning(  # Log warning if secondary attribs already exist under diff key
            "Entry already exists with a different primary key:\n\t"
            + str(existing_entry_via_secondary.fetch1("KEY"))
        )
        return
    existing_entry_via_primary = table & {  # Test for 2, existing primary key
        k: v for k, v in entry.items() if k in table.primary_key
    }
    if existing_entry_via_primary:
        logger.warning(  # Log warning if existing primary key
            "Primary key already exists in the following entry:\n\t"
            + str(existing_entry_via_primary.fetch1())
        )
        return
    table.insert1(entry)  # Handles 0, full new entry
```
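The four cases can be exercised with a small in-memory stand-in (the `rows` list and key names are made up for illustration); the classification mirrors the order of the checks in the function above.

```python
def classify_insert(rows, primary_key, entry):
    """Classify an entry against existing rows, mirroring the checks above.

    Returns one of: 'exact', 'dup_secondary', 'dup_primary', 'new'.
    """
    pk = lambda r: {k: r[k] for k in primary_key}
    secondary = lambda r: {k: v for k, v in r.items() if k not in primary_key}
    if any(r == entry for r in rows):
        return 'exact'          # Case 1: return silently
    if any(secondary(r) == secondary(entry) for r in rows):
        return 'dup_secondary'  # Case 3: warn with existing key
    if any(pk(r) == pk(entry) for r in rows):
        return 'dup_primary'    # Case 2: warn with full existing entry
    return 'new'                # Case 0: insert as normal


rows = [{'paramset_idx': 0, 'desc': 'a', 'params': 'p'}]
assert classify_insert(rows, ['paramset_idx'], {'paramset_idx': 0, 'desc': 'a', 'params': 'p'}) == 'exact'
assert classify_insert(rows, ['paramset_idx'], {'paramset_idx': 1, 'desc': 'a', 'params': 'p'}) == 'dup_secondary'
assert classify_insert(rows, ['paramset_idx'], {'paramset_idx': 0, 'desc': 'b', 'params': 'q'}) == 'dup_primary'
assert classify_insert(rows, ['paramset_idx'], {'paramset_idx': 1, 'desc': 'b', 'params': 'q'}) == 'new'
```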
Issue was incorrectly closed due to PR closure after the solution was tabled.
Currently, multiple elements (and sometimes multiple schemas within each) have a paramset insert function.
Should `element-interface` host a generalized version of this process?