Feature requests: when linked by ID, 1) allow cross-dataset visualization, and 2) merge datasets on ID #2529

janeadams · 2025-01-23T17:33:02Z

Describe the problem
Say I have two datasets with an ID, and I want to visualize a 2D scatter of some measure in two different experiments. Currently, even though I have linked the datasets by ID, I cannot accomplish this in either of the following ways:

1) I can't keep the two datasets separate and drag them both to the same chart, because glue doesn't allow a chart to rely on more than one dataset.
2) I can't select both datasets and choose "merge", because even though they are linked by ID, they are merged by index. I know this because the chart below shows the same rank order of the measures for these genes in each experiment, which I know is not the case. I could "force" the merge to work correctly by sorting the datasets by ID ahead of time, but this is a shaky solution: it only works if exactly the same genes are in both datasets, with none existing in only one of them.
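
A minimal, dependency-free sketch of the difference between the two merge behaviours. The gene IDs and values are made up for illustration and are not from the real datasets; this only shows the pairing logic, not how glue implements merging.

```python
# Hypothetical data: overlapping (but not identical) genes, in a different order per experiment.
exp_a = [("geneA", 1.0), ("geneB", 2.0), ("geneC", 3.0)]
exp_b = [("geneC", 30.0), ("geneA", 10.0), ("geneD", 40.0)]

# Merge by index: row i of exp_a is paired with row i of exp_b, ignoring the IDs.
by_index = [(a_val, b_val) for (_, a_val), (_, b_val) in zip(exp_a, exp_b)]

# Merge by ID: pair values only where the same ID appears in both datasets.
b_lookup = dict(exp_b)
by_id = [(gene, a_val, b_lookup[gene]) for gene, a_val in exp_a if gene in b_lookup]

print(by_index)  # [(1.0, 30.0), (2.0, 10.0), (3.0, 40.0)]
print(by_id)     # [('geneA', 1.0, 10.0), ('geneC', 3.0, 30.0)]
```

The index-based pairs look like a clean monotonic relationship even though no row actually compares a gene with itself, which is the kind of misleading chart described above.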

Describe the solution you'd like:
The chart above should look like this instead:

Describe alternatives you've considered:
I wrote the following code to merge all my datasets on my ID before bringing them into glue as a single dataset. This isn't a general solution because it involves traversing a file system to find the correct files, but it could be generalized within glue using dataset selections. Note that I have adapted this code from my use case, so it is closer to pseudo-code; I haven't run this specific version.

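import os
import pandas as pd

id_to_link_on = 'my_id'
folder = 'my_folder'  # placeholder: subfolder of 'data' that holds the CSV files
files = sorted(f for f in os.listdir(os.path.join('data', folder)) if f.endswith('.csv'))

dfs = []
for file in files:
    df = pd.read_csv(os.path.join('data', folder, file))
    # Suffix each measure column with the file name, but leave the ID column
    # untouched so that every dataframe can still be merged on it
    df = df.rename(columns={a: f'{a}_{file}' for a in df.columns if a != id_to_link_on})
    dfs.append(df)

# Merge on the shared ID; the default inner join keeps only IDs present in every file
merged = dfs[0]
for df in dfs[1:]:
    merged = merged.merge(df, on=id_to_link_on)

merged = merged.set_index(id_to_link_on)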

This would be a broadly useful tool for anyone trying to visualize measures for the same entities across datasets.
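
The merge above relies on pandas' default inner join, which silently drops any ID missing from one of the files. A hedged sketch of a variant that keeps those IDs and records where each one came from is below; the two dataframes and their values are hypothetical stand-ins, and the `_exp1`/`_exp2` suffixes are just illustrative names.

```python
import pandas as pd

# Hypothetical stand-ins for two experiments; 'my_id' plays the role of id_to_link_on.
exp1 = pd.DataFrame({"my_id": ["geneA", "geneB", "geneC"], "measure": [1.0, 2.0, 3.0]})
exp2 = pd.DataFrame({"my_id": ["geneC", "geneA", "geneD"], "measure": [30.0, 10.0, 40.0]})

merged = exp1.merge(
    exp2,
    on="my_id",
    how="outer",                  # keep IDs found in either dataset
    suffixes=("_exp1", "_exp2"),  # disambiguate the shared 'measure' column
    indicator=True,               # adds '_merge': 'both', 'left_only' or 'right_only'
    validate="one_to_one",        # raise if an ID is duplicated within a dataset
).set_index("my_id")

# Only IDs present in both experiments have a value on each axis of a 2D scatter.
plottable = merged[merged["_merge"] == "both"]
print(plottable[["measure_exp1", "measure_exp2"]])
```

Keeping the `_merge` column makes it easy to report which entities were left out of the scatter instead of losing them without notice.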

janeadams added the enhancement label Jan 23, 2025
