Generalized paramset insert? #42

CBroz1 · 2022-07-19T18:15:10Z

Currently, multiple elements (and sometimes multiple schemas within each) have a paramset insert function

ephys
calcium imaging
deeplabcut (under revision)

Should element-interface host a generalized version of this process?

kabilar · 2022-07-19T18:29:31Z

Thanks @CBroz1. Yes, if possible, let's centralize. The attributes are not exactly the same in all cases so it will require refactoring.

Also, @dimitri-yatsenko had suggested the following refactor for ClusteringParamSet.insert_new_params. I would suggest keeping the function name as insert_new_params to ensure backwards compatibility.

@classmethod
def insert_params(cls, clustering_method: str, paramset_idx: int, paramset_desc: str, params: dict):
    param_hash = {'param_set_hash': dict_to_uuid(params)}
    try:
        existing_idx = (cls & param_hash).fetch1('paramset_idx')
    except dj.DataJointError:
        cls.insert1(dict(locals(), **param_hash), ignore_extra_fields=True)
    else:
        if existing_idx != paramset_idx: 
            raise dj.DataJointError(
                f'The specified param-set already exists - paramset_idx: {existing_idx}')

dimitri-yatsenko · 2022-08-04T19:22:37Z

Aha!

If the idea is to skip duplicates if the secondary parameters match, then we can consider the following type of implementation:

def insert1_skip_full_duplicates(table, entry):
    """
    This function inserts one entry into the table. 
    It ignores duplicates on if all the other entries also match.
    Duplicates on secondary keys are not ignored.
    After validation, this functionality will be integrating into core DataJoint.
    """
    try:
        table.insert1(entry)
    except dj.errors.DuplicateError as err:
        if "PRIMARY" not in err.args[0]:
            # secondary indexes are checked only after confirming primary uniqueness
            raise  
        key = {k: v for k, v in entry.items() if k in table.primary_key}
        if (table & key).fetch1() != entry:
            raise err.suggest(
                'Duplicate primary key with different secondary attributes from existing value.')
            
    
                                  
@schema
class PreprocessParamSet(dj.Lookup):
    definition = """  #  Parameter set used for pre-processing of calcium imaging data
    paramset_idx:  smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter 
        the same parameters under a different primary key
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid(params)
        entry = dict({k: v for k, v in locals().items() if k in cls.heading.names})
        insert1_skip_full_duplicates(cls(), entry)

dimitri-yatsenko · 2022-08-04T19:24:13Z

I would put the primary key as the first argument. To enforce backward validity, I would put * after the first argument as in the example above.

dimitri-yatsenko · 2022-08-04T19:32:46Z

Let's have a quick code review on this and propagate throughout.

ttngu207 · 2022-08-17T01:13:23Z

I'd replace

entry = dict({k: v for k, v in locals().items() if k in cls.heading.names})

with explicitly creating the dictionary

entry = dict(paramset_idx=paramset_idx, paramset_desc=paramset_desc, param_set_hash=param_set_hash, params=params)

It's a bit longer but more readable compared to using locals()

ttngu207 · 2022-08-17T01:14:42Z

@schema
class PreprocessParamSet(dj.Lookup):
    definition = """  #  Parameter set used for pre-processing of calcium imaging data
    paramset_idx:  smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter 
        the same parameters under a different primary key
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid({**params, 'preprocess_method': preprocess_method})
        entry = dict(paramset_idx=paramset_idx, 
          paramset_desc=paramset_desc, 
          param_set_hash=param_set_hash, 
          params=params)
        insert1_skip_full_duplicates(cls(), entry)

note that the param_set_hash needs to include also the preprocess_method

dimitri-yatsenko · 2022-08-31T20:50:14Z

note that the param_set_hash needs to include also the preprocess_method

But not paramset_desc?

tdincer · 2022-08-31T20:54:37Z

Let's not complicate it. process_method (or preprocess_method) is to indicate the suite2p vs caiman like information. These cases would have completely different set of parameters.

dimitri-yatsenko · 2022-08-31T22:22:59Z

It does not cost any extra to include the preprocess_methods in the hash and this would catch the condition when the method has changed but the parameters are the same, which can hypothetically happen.

ttngu207 · 2022-08-31T22:32:19Z

@tdincer process_method could be suite2p_2.0 or caiman_matlab, caiman_py (hypothetical cases but could very well happen)

CBroz1 · 2022-12-30T18:01:03Z

In my testing, I found that our previous drafts couldn't handle tables without a UNIQUE INDEX. I put together the following draft to handle more cases (tables without the hash attribute), at the while having some speed costs

def insert1_skip_full_duplicates(table: Table, entry: dict):
    """Insert one entry into a table, ignoring duplicates if all entries match.

    Duplicates on either primary or secondary attributes log existing entries. After
    validation, this functionality will be integrated into core DataJoint.

    Cases:
        0. New entry: insert as normal
        1. Entry has an exact match: return silently
        2. Same primary key, new secondary attributes: log warning with full entry
        3. New primary key, same secondary attributes: log warning with existing key

    Arguments:
        table (Table): datajoint Table object 
        entry (dict): table entry as a dictionary
    """
    if table & entry:  # Test for 1. Return silently if exact match
        return

    existing_entry_via_secondary = table & {  # Test for Case 3
        k: v for k, v in entry.items() if k not in table.primary_key
    }
    if existing_entry_via_secondary:
        logger.warning(  # Log warning if secondary attribs already exist under diff key
            "Entry already exists with a different primary key:\n\t"
            + str(existing_entry_via_secondary.fetch1("KEY"))
        )
        return

    existing_entry_via_primary = table & {  # Test for 2, existing primary key
        k: v for k, v in entry.items() if k in table.primary_key
    }
    if existing_entry_via_primary:
        logger.warning(  # Log warning if existing primary key
            "Primary key already exists in the following entry:\n\t"
            + str(existing_entry_via_primary.fetch1())
        )
        return

    table.insert1(entry)  # Handles 0, full new entry

CBroz1 · 2023-02-01T20:22:26Z

Issue incorrectly closed due to PR closure after solve was tabled

CBroz1 added the enhancement New feature or request label Jul 19, 2022

CBroz1 assigned kabilar Jul 19, 2022

kabilar assigned CBroz1 and unassigned kabilar Jul 19, 2022

CBroz1 mentioned this issue Aug 4, 2022

Add imaging_preprocess module datajoint/element-calcium-imaging#83

Merged

This was referenced Aug 10, 2022

Add insert1_skip_full_duplicates and remove recursive_search #43

Closed

Query simplifications, edits in table explanations, some code refactoring datajoint/element-calcium-imaging#82

Merged

dimitri-yatsenko mentioned this issue Sep 1, 2022

Advanced handling of duplicates in insert1 datajoint/datajoint-python#1049

Open

CBroz1 linked a pull request Dec 20, 2022 that will close this issue

Add insert1_skip_full_duplicates and remove recursive_search #43

Closed

CBroz1 linked a pull request Dec 30, 2022 that will close this issue

Add pre-commit & utilities for pytests #71

Merged

6 tasks

CBroz1 mentioned this issue Dec 30, 2022

Add pre-commit & utilities for pytests #71

Merged

6 tasks

kushalbakshi closed this as completed in #71 Jan 11, 2023

CBroz1 reopened this Feb 1, 2023

CBroz1 removed their assignment Feb 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalized paramset insert? #42

Generalized paramset insert? #42

CBroz1 commented Jul 19, 2022

kabilar commented Jul 19, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 4, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 4, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 4, 2022

ttngu207 commented Aug 17, 2022

ttngu207 commented Aug 17, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 31, 2022

tdincer commented Aug 31, 2022

dimitri-yatsenko commented Aug 31, 2022

ttngu207 commented Aug 31, 2022

CBroz1 commented Dec 30, 2022

CBroz1 commented Feb 1, 2023

Generalized paramset insert? #42

Generalized paramset insert? #42

Comments

CBroz1 commented Jul 19, 2022

kabilar commented Jul 19, 2022 • edited Loading

dimitri-yatsenko commented Aug 4, 2022 • edited Loading

dimitri-yatsenko commented Aug 4, 2022 • edited Loading

dimitri-yatsenko commented Aug 4, 2022

ttngu207 commented Aug 17, 2022

ttngu207 commented Aug 17, 2022 • edited Loading

dimitri-yatsenko commented Aug 31, 2022

tdincer commented Aug 31, 2022

dimitri-yatsenko commented Aug 31, 2022

ttngu207 commented Aug 31, 2022

CBroz1 commented Dec 30, 2022

CBroz1 commented Feb 1, 2023

kabilar commented Jul 19, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 4, 2022 •

edited

Loading

dimitri-yatsenko commented Aug 4, 2022 •

edited

Loading

ttngu207 commented Aug 17, 2022 •

edited

Loading