GridsearchCV and pipeline: input dimensionality #263

Open
FlyingFordAnglia opened this issue Nov 11, 2024 · 3 comments

Comments


FlyingFordAnglia commented Nov 11, 2024

Hi! I am trying to fit a GLM to spiking data from a population of neurons. My design matrix is the binned spike counts of all neurons, and my 'y' is the spike counts of the neuron I am interested in. Before fitting the GLM, I wanted to run a grid search for hyperparameter tuning.
When I run the code below, I get the following error:

TypeError: Input dimensionality mismatch. This basis evaluation requires 1 inputs, 15 inputs provided instead.

From what I can gather, the fit_transform method that GridSearchCV calls internally expects a design matrix with a single column, not a matrix of shape (n_samples, n_features). How can I get this to work?

            # region Hyperparameter tuning
            num_bases = 10
            print(f'Number of basis functions: {num_bases}')
            basis = nemos.basis.RaisedCosineBasisLinear(n_basis_funcs=num_bases, mode="conv", window_size=filter_size)
            transformer_basis = basis.to_transformer()
            neuron = 15
            print(f'{neuron} Neurons considered = {neurons_slice[0:neuron]}')
            spike_counts = spike_dat[:][neurons_slice[0:neuron], :time_vec_cut_index].T
            train_spike_counts = spike_counts[0:int(len(spike_counts) * 0.7), :]
            pipeline = Pipeline(
                [
                    (
                        "transformerbasis",
                        transformer_basis,
                    ),
                    (
                        "glm",
                        nemos.glm.GLM(regularizer_strength=0.5, regularizer="Ridge", solver_kwargs={'verbose': True}),
                    ),
                ]
            )
            param_grid = dict(
                glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
                transformerbasis__n_basis_funcs=(5, 10, 15, 20),
            )
            gridsearch = GridSearchCV(
                pipeline,
                param_grid=param_grid,
                cv=2
            )
            gridsearch.fit(train_spike_counts, train_spike_counts[:, glm_neuron_id].flatten())
            cvdf = pd.DataFrame(gridsearch.cv_results_)

            cvdf_wide = cvdf.pivot(
                index="param_transformerbasis__n_basis_funcs",
                columns="param_glm__regularizer_strength",
                values="mean_test_score",
            )
            plot_heatmap_cv_results(cvdf_wide)
            # best_params = hyper_param_tuning()
            sys.exit()
            # endregion

My installed nemos version is 0.1.6 and my sklearn version is 1.5.0.

A tangential question: How do I integrate batch gradient descent with this pipeline?

Any help would be appreciated, thanks!

@sjvenditto
Collaborator

Currently, basis objects assume a single input; lifting this restriction is a work in progress. The current workaround is to define a basis in param_grid that matches the dimensionality of the input. In your case, that is an additive basis whose number of components (RaisedCosineBasisLinear bases) equals the number of neurons. It will look like:

param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__basis=[basis*neuron],
)

where basis*neuron is shorthand for adding basis together neuron times. Unfortunately, this solution raises another error on both the main and dev branches, related to transformer basis property names (and the basis*neuron shorthand itself does not exist yet). This is being fixed in PR #235, but in the meantime you can try it out using the fix_transformer branch of nemos if you've installed it from source. Let me know if this works for you!
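
As a rough illustration of what "adding basis together neuron times" means, here is a minimal sketch. `ToyBasis` and `sum_bases` are hypothetical stand-ins, not part of nemos: the toy class only tracks how many input columns the composite expects and how many design-matrix columns it would produce. With nemos installed, the assumption is that you would pass a factory that builds a real basis (for example a RaisedCosineBasisLinear) and rely on its `+` operator in the same way.

```python
import operator
from functools import reduce


class ToyBasis:
    """Hypothetical stand-in for a basis object (not the nemos class)."""

    def __init__(self, n_basis_funcs, n_inputs=1):
        self.n_basis_funcs = n_basis_funcs  # output columns produced
        self.n_inputs = n_inputs            # input columns expected

    def __add__(self, other):
        # An additive basis handles the inputs of both components and
        # concatenates their output columns.
        return ToyBasis(
            self.n_basis_funcs + other.n_basis_funcs,
            self.n_inputs + other.n_inputs,
        )


def sum_bases(make_basis, n_components):
    """Add n_components freshly built bases together with `+`."""
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return reduce(operator.add, (make_basis() for _ in range(n_components)))


# 15 neurons, 10 basis functions each (the numbers used in the question).
additive = sum_bases(lambda: ToyBasis(n_basis_funcs=10), n_components=15)
print(additive.n_inputs, additive.n_basis_funcs)  # 15 150
```

The point of the sketch is the bookkeeping: one component per neuron gives a composite that accepts 15 input columns, which is what the error message in the question was complaining about.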

@BalzaniEdoardo
Collaborator

BalzaniEdoardo commented Dec 16, 2024

Hi, sorry for the delay with this fix; we were in the middle of restructuring the basis module.
We have not yet merged the fix into the main release, but you can try out the new API by forking the repo and checking out the improve_transformer_api branch.

A simpler alternative is to install the branch directly into your environment:

pip install git+https://github.com/flatironinstitute/nemos.git@improve_transformer_api

You should update your installation once we release a new version of nemos that incorporates these fixes.
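
To confirm which build is active after installing from the branch, a small check like the following may help. It is a sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and the distribution names `nemos` and `scikit-learn` are assumed to match what pip reports in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("nemos", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```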

Below is how to fix your script with the new TransformerBasis fixes.

import nemos
import pandas as pd

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# generate some data for illustration: 20 neurons, 1000 time bins
spike_dat = np.random.poisson(size=(20, 1000))

# region Hyperparameter tuning
num_bases = 10
filter_size = 100
neurons_slice = np.arange(20)  # must hold at least `neuron` entries (15 below)

time_vec_cut_index = 10000
glm_neuron_id = 0

print(f'Number of basis functions: {num_bases}')

# new basis class name (now, distinct classes for conv and eval)
basis = nemos.basis.RaisedCosineLinearConv(n_basis_funcs=num_bases, window_size=filter_size)


transformer_basis = basis.to_transformer()
neuron = 15
print(f'{neuron} Neurons considered = {neurons_slice[0:neuron]}')
spike_counts = spike_dat[:][neurons_slice[0:neuron], :time_vec_cut_index].T

# tell the transformer how many inputs the basis has to process;
# you can pass either the number of inputs or the input array itself
transformer_basis.set_input_shape(spike_counts)

train_spike_counts = spike_counts[0:int(len(spike_counts) * 0.7), :]
pipeline = Pipeline(
    [
        (
            "transformerbasis",
            transformer_basis,
        ),
        (
            "glm",
            nemos.glm.GLM(regularizer_strength=0.5, regularizer="Ridge", solver_kwargs={'verbose': True}),
        ),
    ]
)
param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__n_basis_funcs=(5, 10, 15, 20),
)
gridsearch = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    cv=2
)
gridsearch.fit(train_spike_counts, train_spike_counts[:, glm_neuron_id].flatten())
cvdf = pd.DataFrame(gridsearch.cv_results_)

cvdf_wide = cvdf.pivot(
    index="param_transformerbasis__n_basis_funcs",
    columns="param_glm__regularizer_strength",
    values="mean_test_score",
)
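
Once the search has run, the best combination can be read off directly. Fitted GridSearchCV objects expose `best_params_` and `best_score_`; the sketch below does the same lookup by hand on a dictionary shaped like `cv_results_`. The scores are placeholder numbers invented purely so the example runs without fitting anything, and `best_from_cv_results` is a hypothetical helper name.

```python
def best_from_cv_results(cv_results):
    """Return (params, score) for the entry with the highest mean test score."""
    scores = cv_results["mean_test_score"]
    params = cv_results["params"]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return params[best_index], scores[best_index]


# Placeholder values standing in for gridsearch.cv_results_ (not real results).
fake_cv_results = {
    "params": [
        {"glm__regularizer_strength": 0.1, "transformerbasis__n_basis_funcs": 5},
        {"glm__regularizer_strength": 0.1, "transformerbasis__n_basis_funcs": 10},
        {"glm__regularizer_strength": 0.01, "transformerbasis__n_basis_funcs": 5},
        {"glm__regularizer_strength": 0.01, "transformerbasis__n_basis_funcs": 10},
    ],
    "mean_test_score": [-0.52, -0.48, -0.50, -0.45],
}

best_params, best_score = best_from_cv_results(fake_cv_results)
print(best_params, best_score)
```

Note that if any fit fails, scikit-learn records NaN as the score by default, which this simple maximum does not handle; in that case `gridsearch.best_params_` is the safer route.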

@BalzaniEdoardo
Collaborator

@FlyingFordAnglia let me know if that's working well for you and thank you for bringing this up!
