Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalized paramset insert? #42

Open
CBroz1 opened this issue Jul 19, 2022 · 12 comments · Fixed by #71
Open

Generalized paramset insert? #42

CBroz1 opened this issue Jul 19, 2022 · 12 comments · Fixed by #71
Labels
enhancement New feature or request

Comments

@CBroz1
Copy link
Contributor

CBroz1 commented Jul 19, 2022

Currently, multiple elements (and sometimes multiple schemas within each) have a paramset insert function

Should element-interface host a generalized version of this process?

@CBroz1 CBroz1 added the enhancement New feature or request label Jul 19, 2022
@kabilar
Copy link
Collaborator

kabilar commented Jul 19, 2022

Thanks @CBroz1. Yes, if possible, let's centralize. The attributes are not exactly the same in all cases so it will require refactoring.

Also, @dimitri-yatsenko had suggested the following refactor for ClusteringParamSet.insert_new_params. I would suggest keeping the function name as insert_new_params to ensure backwards compatibility.

@classmethod
def insert_params(cls, clustering_method: str, paramset_idx: int, paramset_desc: str, params: dict):
    param_hash = {'param_set_hash': dict_to_uuid(params)}
    try:
        existing_idx = (cls & param_hash).fetch1('paramset_idx')
    except dj.DataJointError:
        cls.insert1(dict(locals(), **param_hash), ignore_extra_fields=True)
    else:
        if existing_idx != paramset_idx: 
            raise dj.DataJointError(
                f'The specified param-set already exists - paramset_idx: {existing_idx}')

@dimitri-yatsenko
Copy link
Member

dimitri-yatsenko commented Aug 4, 2022

Aha!

If the idea is to skip duplicates if the secondary parameters match, then we can consider the following type of implementation:

def insert1_skip_full_duplicates(table, entry):
    """
    This function inserts one entry into the table. 
    It ignores duplicates on if all the other entries also match.
    Duplicates on secondary keys are not ignored.
    After validation, this functionality will be integrating into core DataJoint.
    """
    try:
        table.insert1(entry)
    except dj.errors.DuplicateError as err:
        if "PRIMARY" not in err.args[0]:
            # secondary indexes are checked only after confirming primary uniqueness
            raise  
        key = {k: v for k, v in entry.items() if k in table.primary_key}
        if (table & key).fetch1() != entry:
            raise err.suggest(
                'Duplicate primary key with different secondary attributes from existing value.')
            
    
                                  
@schema
class PreprocessParamSet(dj.Lookup):
    definition = """  #  Parameter set used for pre-processing of calcium imaging data
    paramset_idx:  smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter 
        the same parameters under a different primary key
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid(params)
        entry = dict({k: v for k, v in locals().items() if k in cls.heading.names})
        insert1_skip_full_duplicates(cls(), entry)

@dimitri-yatsenko
Copy link
Member

dimitri-yatsenko commented Aug 4, 2022

I would put the primary key as the first argument. To enforce backward validity, I would put * after the first argument as in the example above.

@dimitri-yatsenko
Copy link
Member

Let's have a quick code review on this and propagate throughout.

@ttngu207
Copy link
Contributor

I'd replace

entry = dict({k: v for k, v in locals().items() if k in cls.heading.names})

with explicitly creating the dictionary

entry = dict(paramset_idx=paramset_idx, paramset_desc=paramset_desc, param_set_hash=param_set_hash, params=params)

It's a bit longer but more readable compared to using locals()

@ttngu207
Copy link
Contributor

ttngu207 commented Aug 17, 2022

@schema
class PreprocessParamSet(dj.Lookup):
    definition = """  #  Parameter set used for pre-processing of calcium imaging data
    paramset_idx:  smallint
    ---
    -> PreprocessMethod
    paramset_desc: varchar(128)
    param_set_hash: uuid
    unique index (param_set_hash)
    params: longblob  # dictionary of all applicable parameters
    """

    @classmethod
    def insert_params(cls, paramset_idx: int, *, preprocess_method: str, paramset_desc: str, params: dict):
        """
        Insert a set of parameters. Ignore duplicates unless attempting to enter 
        the same parameters under a different primary key
        """
        # construct entry from input arguments
        param_set_hash = dict_to_uuid({**params, 'preprocess_method': preprocess_method})
        entry = dict(paramset_idx=paramset_idx, 
          paramset_desc=paramset_desc, 
          param_set_hash=param_set_hash, 
          params=params)
        insert1_skip_full_duplicates(cls(), entry)

note that the param_set_hash needs to include also the preprocess_method

@dimitri-yatsenko
Copy link
Member

note that the param_set_hash needs to include also the preprocess_method

But not paramset_desc?

@tdincer
Copy link
Contributor

tdincer commented Aug 31, 2022

Let's not complicate it. process_method (or preprocess_method) is to indicate the suite2p vs caiman like information. These cases would have completely different set of parameters.

@dimitri-yatsenko
Copy link
Member

It does not cost any extra to include the preprocess_methods in the hash and this would catch the condition when the method has changed but the parameters are the same, which can hypothetically happen.

@ttngu207
Copy link
Contributor

@tdincer process_method could be suite2p_2.0 or caiman_matlab, caiman_py (hypothetical cases but could very well happen)

@CBroz1
Copy link
Contributor Author

CBroz1 commented Dec 30, 2022

In my testing, I found that our previous drafts couldn't handle tables without a UNIQUE INDEX. I put together the following draft to handle more cases (tables without the hash attribute), at the while having some speed costs

def insert1_skip_full_duplicates(table: Table, entry: dict):
    """Insert one entry into a table, ignoring duplicates if all entries match.

    Duplicates on either primary or secondary attributes log existing entries. After
    validation, this functionality will be integrated into core DataJoint.

    Cases:
        0. New entry: insert as normal
        1. Entry has an exact match: return silently
        2. Same primary key, new secondary attributes: log warning with full entry
        3. New primary key, same secondary attributes: log warning with existing key

    Arguments:
        table (Table): datajoint Table object 
        entry (dict): table entry as a dictionary
    """
    if table & entry:  # Test for 1. Return silently if exact match
        return

    existing_entry_via_secondary = table & {  # Test for Case 3
        k: v for k, v in entry.items() if k not in table.primary_key
    }
    if existing_entry_via_secondary:
        logger.warning(  # Log warning if secondary attribs already exist under diff key
            "Entry already exists with a different primary key:\n\t"
            + str(existing_entry_via_secondary.fetch1("KEY"))
        )
        return

    existing_entry_via_primary = table & {  # Test for 2, existing primary key
        k: v for k, v in entry.items() if k in table.primary_key
    }
    if existing_entry_via_primary:
        logger.warning(  # Log warning if existing primary key
            "Primary key already exists in the following entry:\n\t"
            + str(existing_entry_via_primary.fetch1())
        )
        return

    table.insert1(entry)  # Handles 0, full new entry

@CBroz1 CBroz1 linked a pull request Dec 30, 2022 that will close this issue
6 tasks
@CBroz1
Copy link
Contributor Author

CBroz1 commented Feb 1, 2023

Issue incorrectly closed due to PR closure after solve was tabled

@CBroz1 CBroz1 reopened this Feb 1, 2023
@CBroz1 CBroz1 removed their assignment Feb 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants