GridsearchCV and pipeline: input dimensionality #263

FlyingFordAnglia · 2024-11-11T21:43:02Z

Hi! I am trying to fit a GLM to spiking data from a population of neurons. My design matrix is the binned spike counts of all neurons, and my `y` is the spike counts of the neuron I am interested in. Before fitting the GLM, I wanted to run a grid search for hyperparameter tuning.
When I run the code below, I get the following error:

TypeError: Input dimensionality mismatch. This basis evaluation requires 1 inputs, 15 inputs provided instead.

From what I can gather, the fit_transform method that GridSearchCV calls internally expects a design matrix with a single column, not one of shape (n_samples, n_features). How can I get this to work?
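The mismatch comes from the basis being a single-input transform. A convolutional basis turns one time series of length T into a (T, n_basis_funcs) block of features, so a design matrix with 15 neuron columns needs one such block per column. The NumPy sketch below shows that per-column expansion. It uses hypothetical boxcar kernels as stand-ins for raised cosines, and the helper names `causal_conv_features` and `per_neuron_design` are illustrative; this is not the nemos implementation.

```python
import numpy as np


def causal_conv_features(x, kernels):
    """Convolve one 1-D series with each kernel; row t only uses x[:t].

    Returns an array of shape (len(x), len(kernels)). Row 0 is zero
    because there is no past input yet.
    """
    n_samples = x.shape[0]
    out = np.zeros((n_samples, len(kernels)))
    for j, k in enumerate(kernels):
        full = np.convolve(x, k)            # full[t] = sum_i k[i] * x[t - i]
        out[1:, j] = full[: n_samples - 1]  # shift by one bin: strictly past inputs
    return out


def per_neuron_design(X, kernels):
    """Apply the single-input expansion to every column and stack the blocks."""
    blocks = [causal_conv_features(X[:, n], kernels) for n in range(X.shape[1])]
    return np.hstack(blocks)  # shape (n_samples, n_neurons * n_kernels)


# hypothetical stand-in kernels: normalized boxcars of different widths
kernels = [np.ones(w) / w for w in (2, 4, 8)]

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 15))  # 200 time bins, 15 neurons
D = per_neuron_design(X, kernels)
print(D.shape)  # (200, 45)
```

An additive basis with one component per neuron does, in effect, this per-column expansion, which is why its number of inputs has to equal the number of columns of the design matrix.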

```python
# Imports added for completeness; spike_dat, filter_size, neurons_slice,
# time_vec_cut_index, glm_neuron_id and plot_heatmap_cv_results are
# defined elsewhere in the original script.
import sys

import nemos
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# region Hyperparameter tuning
num_bases = 10
print(f'Number of basis functions: {num_bases}')
basis = nemos.basis.RaisedCosineBasisLinear(n_basis_funcs=num_bases, mode="conv", window_size=filter_size)
transformer_basis = basis.to_transformer()
neuron = 15
print(f'{neuron} Neurons considered = {neurons_slice[0:neuron]}')
# design matrix: (n_bins, n_neurons)
spike_counts = spike_dat[:][neurons_slice[0:neuron], :time_vec_cut_index].T
train_spike_counts = spike_counts[0:int(len(spike_counts) * 0.7), :]
pipeline = Pipeline(
    [
        (
            "transformerbasis",
            transformer_basis,
        ),
        (
            "glm",
            nemos.glm.GLM(regularizer_strength=0.5, regularizer="Ridge", solver_kwargs={'verbose': True}),
        ),
    ]
)
param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__n_basis_funcs=(5, 10, 15, 20),
)
gridsearch = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    cv=2
)
# raises: TypeError: Input dimensionality mismatch ...
gridsearch.fit(train_spike_counts, train_spike_counts[:, glm_neuron_id].flatten())
cvdf = pd.DataFrame(gridsearch.cv_results_)

cvdf_wide = cvdf.pivot(
    index="param_transformerbasis__n_basis_funcs",
    columns="param_glm__regularizer_strength",
    values="mean_test_score",
)
plot_heatmap_cv_results(cvdf_wide)
# best_params = hyper_param_tuning()
sys.exit()
# endregion
```
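The script trains on the first 70% of the time bins and passes `cv=2` to GridSearchCV. Assuming scikit-learn does not flag the GLM pipeline as a classifier, an integer `cv` means an unshuffled `KFold`, so each validation fold is one contiguous block of time bins. The sketch below reproduces that splitting rule in plain NumPy; `contiguous_folds` is a hypothetical helper for illustration, not scikit-learn's code.

```python
import numpy as np


def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs like an unshuffled K-fold split.

    Fold sizes differ by at most one; the first n_samples % n_splits folds
    get the extra sample. Each test fold is a contiguous block of indices.
    """
    indices = np.arange(n_samples)
    fold_sizes = np.full(n_splits, n_samples // n_splits)
    fold_sizes[: n_samples % n_splits] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = indices[start:stop]
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, test_idx
        start = stop


n_bins = 1000
n_train = int(n_bins * 0.7)  # same 70% cut as the script: 700 bins
for train_idx, test_idx in contiguous_folds(n_train, n_splits=2):
    print(test_idx[0], test_idx[-1], len(train_idx))
# 0 349 350
# 350 699 350
```

Because the folds are contiguous rather than shuffled, each fit is validated on a stretch of time it did not see, which suits autocorrelated spike trains.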

My installed nemos version is 0.1.6 and my sklearn version is 1.5.0.

A tangential question: how do I integrate batch gradient descent into this pipeline?

Any help would be appreciated, thanks!


sjvenditto · 2024-11-12T22:38:35Z

Currently, basis objects assume a single input; addressing this is a work in progress. The current workaround is to define a basis in param_grid whose dimensionality matches the input. In your case, this is an additive basis with as many components (RaisedCosineBasisLinear bases) as there are neurons. This will look like:

```python
param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__basis=[basis*neuron],
)
```

where `basis*neuron` is meant as shorthand for adding `basis` to itself `neuron` times. Unfortunately, this solution raises another error on both the main and dev branches, related to the transformer basis property names (and the shorthand does not exist yet either). This is being fixed in PR #235; in the meantime, if you installed nemos from source, you can try it out on the fix_transformer branch. Let me know if this works for you!
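Until a `basis * n` shorthand exists, the same additive basis can be built by folding `+` over copies of the basis with `functools.reduce`. The sketch below uses a hypothetical `ToyBasis` stand-in (not the nemos API) whose `+` only mirrors the bookkeeping of an additive basis, so it runs without nemos. With nemos installed, applying the same `repeat_add` helper (also hypothetical) to a real basis should work, since nemos bases support `+`; the deep copies keep each component independent.

```python
import copy
import operator
from functools import reduce


class ToyBasis:
    """Hypothetical stand-in for a single-input basis (not the nemos API).

    Tracks only how many inputs it consumes and how many features it outputs.
    """

    def __init__(self, n_basis_funcs, n_inputs=1):
        self.n_basis_funcs = n_basis_funcs
        self.n_inputs = n_inputs

    def __add__(self, other):
        # additive combination: inputs and output features both concatenate
        return ToyBasis(self.n_basis_funcs + other.n_basis_funcs,
                        self.n_inputs + other.n_inputs)


def repeat_add(basis, n):
    """basis + basis + ... + basis (n terms), built from independent copies."""
    return reduce(operator.add, [copy.deepcopy(basis) for _ in range(n)])


combined = repeat_add(ToyBasis(n_basis_funcs=10), 15)
print(combined.n_inputs, combined.n_basis_funcs)  # 15 150
```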

BalzaniEdoardo · 2024-12-16T14:27:26Z

Hi, sorry for the delay with this fix; we were in the middle of restructuring the basis module.
We have not merged the fix into the main release yet, but you can try out the new API by forking the repo and checking out the improve_transformer_api branch.

A simpler alternative is to install the branch directly into your environment:

```shell
pip install git+https://github.com/flatironinstitute/nemos.git@improve_transformer_api
```

You should update your installation once we release a new version of nemos that incorporates these fixes.

Below is your script, fixed to use the new TransformerBasis API.

```python
import nemos
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# generate some data for illustration: 20 neurons x 1000 time bins
spike_dat = np.random.poisson(size=(20, 1000))

# region Hyperparameter tuning
num_bases = 10
filter_size = 100
neurons_slice = np.arange(20)  # one index per neuron in spike_dat

time_vec_cut_index = 10000  # larger than the number of bins, so all 1000 are kept
glm_neuron_id = 0

print(f'Number of basis functions: {num_bases}')

# new basis class name (there are now distinct classes for conv and eval)
basis = nemos.basis.RaisedCosineLinearConv(n_basis_funcs=num_bases, window_size=filter_size)

transformer_basis = basis.to_transformer()
neuron = 15
print(f'{neuron} Neurons considered = {neurons_slice[0:neuron]}')
# design matrix: (n_bins, n_neurons)
spike_counts = spike_dat[:][neurons_slice[0:neuron], :time_vec_cut_index].T

# must tell the transformer how many inputs the basis has to process;
# you can pass either the number of inputs or the input itself
transformer_basis.set_input_shape(spike_counts)

train_spike_counts = spike_counts[0:int(len(spike_counts) * 0.7), :]
pipeline = Pipeline(
    [
        (
            "transformerbasis",
            transformer_basis,
        ),
        (
            "glm",
            nemos.glm.GLM(regularizer_strength=0.5, regularizer="Ridge", solver_kwargs={'verbose': True}),
        ),
    ]
)
param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__n_basis_funcs=(5, 10, 15, 20),
)
gridsearch = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    cv=2
)
gridsearch.fit(train_spike_counts, train_spike_counts[:, glm_neuron_id].flatten())
cvdf = pd.DataFrame(gridsearch.cv_results_)

# one row per n_basis_funcs, one column per regularizer strength
cvdf_wide = cvdf.pivot(
    index="param_transformerbasis__n_basis_funcs",
    columns="param_glm__regularizer_strength",
    values="mean_test_score",
)
```
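To gauge the cost of this search: GridSearchCV fits the pipeline once per parameter combination per fold, then (with the default `refit=True`) once more on all training data using the best combination. The sketch below expands the same `param_grid` with `itertools.product` and counts the fits; scikit-learn's own `ParameterGrid` does the equivalent expansion, though the ordering here is not guaranteed to match, and `expand_grid` is a hypothetical helper. The width calculation assumes each of the 15 neurons from the original error is expanded into `n_basis_funcs` columns.

```python
from itertools import product

param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__n_basis_funcs=(5, 10, 15, 20),
)


def expand_grid(grid):
    """List every combination of values as a dict of parameter settings."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
n_splits = 2                             # cv=2
n_fits = len(candidates) * n_splits + 1  # +1 refit on the full training set
print(len(candidates), n_fits)           # 16 33

# assumption: each of the 15 inputs is expanded into n_basis_funcs columns
n_neurons = 15
widths = sorted({c["transformerbasis__n_basis_funcs"] * n_neurons for c in candidates})
print(widths)                            # [75, 150, 225, 300]
```

The largest candidate gives the GLM 300 predictors, which is worth keeping in mind when picking the Ridge strengths.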

BalzaniEdoardo · 2024-12-16T14:33:58Z

@FlyingFordAnglia let me know if that's working well for you and thank you for bringing this up!
