Front-end for Initiative Portfolio Participation #91

Merged 2 commits on Dec 28, 2023

@gbdubs (Contributor) commented Dec 23, 2023

  • Creates a frontend for initiative memberships, including adding and removing portfolio participation in an initiative.
  • Creates a frontend for incomplete upload management, flagging to a user using a yellow banner if there are incomplete uploads for them to delete.
  • Renames the 'portfolios' page to 'my-data', and adjusts URLs accordingly.
  • Changes the URL param structure to accommodate a wider range of tab configurations (missing tabs, optional tabs, named tabs).
  • Centralizes PVToast so that toasts don't overlap.
  • Makes a ton of semantic tweaks to the data management page, using icons + colors more consistently.
  • Fixes a number of i18n misses.

Additionally, fixes a few backend errors that arose during testing:

  • Fixes a cardinality bug where we were imprecisely using ARRAY_AGG, which led to strange outcomes when we had aggregated arrays of differing lengths.
  • Adds a uniqueness constraint to the portfolio initiative membership table, which was missing.
  • Adds better handling in the conv layer for partially populated entities.

@gbdubs gbdubs requested a review from bcspragu December 23, 2023 19:58
gbdubs added a commit that referenced this pull request Dec 27, 2023
@bcspragu (Collaborator) left a comment

LGTM!

WHERE portfolio_id IN (SELECT id FROM selected_portfolio_ids)
GROUP BY portfolio_id
) itvs ON itvs.portfolio_id = portfolio.id
%[1]s;`, where)
Collaborator

I'm not sure if I understand these changes, specifically why these LEFT JOINs are now structured as subqueries, since they just join on the portfolio ID anyway. I know you mentioned a "cardinality bug", but I don't follow how this fixes that, when the subqueries are grouping by the same thing we were initially grouping by anyway (portfolio ID).

Contributor Author

Great question. This wasn't clear to me and took a while to debug. Here's my best explanation for posterity:

  • Joins happen before any aggregation or group by.
  • 3-table joins will create rows based on the Cartesian product of their rows. If we have a primary table (P) and two secondary tables (A) and (B), and for a given primary key we have 3 values in A (A1, A2, A3) and 2 values in B (B1, B2), then the Cartesian product will have 6 rows in the join.
  • When we then group by the primary key, the aggregated row has six values for both A and B in the join result: (A1, A2, A3, A1, A2, A3) and (B1, B2, B1, B2, B1, B2).
  • We could unwind this through uniqueness sets or similar. However, three complicating factors make this much harder to do: (a) nulls, (b) objects that are unbundled into different columns, and (c) (most difficult) objects that are unbundled into different columns across tables.

After playing around with some row-based solutions, I found this was the simplest way (and probably the most extensible and easiest to change): do your group-bys where they're not over a join result, and then just select. This also has the advantage of not generating huge cross products when the cardinality of the subtables is large (as it might be in this case).
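
As a rough illustration of the point above, here is a minimal sketch on a toy schema. Everything in it is an assumption made for the example: the tables `p`, `a`, and `b` are hypothetical stand-ins rather than the project's schema, SQLite (via Python's `sqlite3`) stands in for PostgreSQL, and `group_concat` stands in for `ARRAY_AGG`. It compares aggregating over the three-table join with aggregating each child table first and joining afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical toy schema: one parent table and two child tables.
CREATE TABLE p (id TEXT PRIMARY KEY);
CREATE TABLE a (id TEXT PRIMARY KEY, p_id TEXT NOT NULL REFERENCES p (id));
CREATE TABLE b (id TEXT PRIMARY KEY, p_id TEXT NOT NULL REFERENCES p (id));
INSERT INTO p (id) VALUES ('p1');
INSERT INTO a (id, p_id) VALUES ('a1', 'p1'), ('a2', 'p1'), ('a3', 'p1');
INSERT INTO b (id, p_id) VALUES ('b1', 'p1'), ('b2', 'p1');
""")

# Aggregating over the join: every a row is paired with every b row
# before GROUP BY runs, so each aggregate sees 3 * 2 = 6 values.
joined = conn.execute("""
SELECT p.id, group_concat(a.id), group_concat(b.id)
FROM p
LEFT JOIN a ON a.p_id = p.id
LEFT JOIN b ON b.p_id = p.id
GROUP BY p.id
""").fetchone()

# Aggregating each child table on its own, then joining: each derived
# table contributes at most one row per parent, so nothing multiplies.
pre_aggregated = conn.execute("""
SELECT p.id, a_agg.ids, b_agg.ids
FROM p
LEFT JOIN (SELECT p_id, group_concat(id) AS ids FROM a GROUP BY p_id) a_agg
  ON a_agg.p_id = p.id
LEFT JOIN (SELECT p_id, group_concat(id) AS ids FROM b GROUP BY p_id) b_agg
  ON b_agg.p_id = p.id
""").fetchone()


def split_ids(value):
    """Turn a group_concat result into a sorted list (empty for NULL)."""
    return sorted(value.split(",")) if value else []


joined_a, joined_b = split_ids(joined[1]), split_ids(joined[2])
fixed_a, fixed_b = split_ids(pre_aggregated[1]), split_ids(pre_aggregated[2])

print(len(joined_a), len(joined_b))  # 6 6
print(fixed_a, fixed_b)              # ['a1', 'a2', 'a3'] ['b1', 'b2']
```

The results are sorted before printing because `group_concat` does not promise an order; in PostgreSQL the equivalent would use `ARRAY_AGG` in the derived tables.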

Collaborator

* 3-table joins will create rows based on the cartesian product of their rows. If we have a primary table (P) and two secondary tables (A) and (B), for a given primary key, if we have 3 values in A (A1, A2, A3), and 2 values in B (B1, B2), then the Cartesian product will have 6 values in the join.

Huh yeah, sure enough:

CREATE TABLE p (
  id TEXT PRIMARY KEY NOT NULL
);

CREATE TABLE a (
  id TEXT PRIMARY KEY NOT NULL,
  p_id TEXT REFERENCES p (id) NOT NULL
);

CREATE TABLE b (
  id TEXT PRIMARY KEY NOT NULL,
  p_id TEXT REFERENCES p (id) NOT NULL
);


INSERT INTO p (id) VALUES ('p1');
INSERT INTO a (id, p_id) VALUES ('a1', 'p1'), ('a2', 'p1');
INSERT INTO b (id, p_id) VALUES ('b1', 'p1'), ('b2', 'p1'), ('b3', 'p1');

SELECT p.id, ARRAY_AGG(a.id), ARRAY_AGG(b.id)
FROM p
LEFT JOIN a ON p.id = a.p_id
LEFT JOIN b ON p.id = b.p_id
GROUP BY p.id;

produces

 id |      array_agg      |      array_agg
----+---------------------+---------------------
 p1 | {a2,a1,a2,a1,a2,a1} | {b1,b1,b2,b2,b3,b3}

When asking the internet about this, I got the suggestion:

  SELECT p.id,
      (SELECT ARRAY_AGG(a.id) FROM a WHERE a.p_id = p.id),
      (SELECT ARRAY_AGG(b.id) FROM b WHERE b.p_id = p.id)
  FROM p;

Which involves subqueries but otherwise seems like the simplest approach, and I think applies here?

Contributor Author

Totally fair, that approach would also work. I think the downside shows up when we need multiple columns from a or b (and because memberships will often have an id and a created-at timestamp, I think this is most cases). A nested query in the SELECT clause then needs to either (a) assume that ordering is equivalent across the multiple aggregate expressions, which might or might not hold, or (b) select the columns together into a composite object and then unnest (decompose) it.
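
To make the multi-column case concrete, here is a hedged sketch of the derived-table shape with two columns taken from one child table. The schema (`p`, `a`, `b`, and the `created_at` column) is hypothetical, SQLite via Python's `sqlite3` stands in for PostgreSQL, and `group_concat` stands in for `ARRAY_AGG`. Both aggregates for `a` sit in the same grouped subquery and consume the same rows in one pass, which is what keeps the two lists aligned here; in PostgreSQL an explicit `ORDER BY` inside each `ARRAY_AGG` would make that alignment explicit rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical toy schema; created_at is stored as text for simplicity.
CREATE TABLE p (id TEXT PRIMARY KEY);
CREATE TABLE a (id TEXT PRIMARY KEY, p_id TEXT NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE b (id TEXT PRIMARY KEY, p_id TEXT NOT NULL);
INSERT INTO p (id) VALUES ('p1');
INSERT INTO a (id, p_id, created_at) VALUES
  ('a1', 'p1', '2023-12-01'), ('a2', 'p1', '2023-12-02');
INSERT INTO b (id, p_id) VALUES ('b1', 'p1'), ('b2', 'p1'), ('b3', 'p1');
""")

# Each child table is grouped on its own, so a's two aggregated columns
# come from the same two rows and are never multiplied by b's three rows.
row = conn.execute("""
SELECT p.id, a_agg.ids, a_agg.created_ats, b_agg.ids
FROM p
LEFT JOIN (
  SELECT p_id,
         group_concat(id) AS ids,
         group_concat(created_at) AS created_ats
  FROM a
  GROUP BY p_id
) a_agg ON a_agg.p_id = p.id
LEFT JOIN (
  SELECT p_id, group_concat(id) AS ids FROM b GROUP BY p_id
) b_agg ON b_agg.p_id = p.id
""").fetchone()

# Pair each id with its timestamp by position, then sort for stable output.
a_pairs = sorted(zip(row[1].split(","), row[2].split(",")))
b_ids = sorted(row[3].split(","))

print(a_pairs)
print(b_ids)
```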

Resolved (outdated) review threads: db/sqldb/portfolio.go, db/sqldb/portfolio_group.go, frontend/pages/my-data.vue
Collaborator

Good catch! Do you expect running this migration to cause any problems (for my own edification when I try to deploy this)?

Contributor Author

Great question. My understanding is that it's essentially equivalent to adding a NOT NULL constraint, creating an index, and adding a uniqueness constraint. The NOT NULL constraint should already hold based on our bizlogic. The uniqueness constraint is the actual reason for the change, so it's plausible that duplicate rows exist. Should that be the case, we could delete those rows when migrating. However, since nobody has used these features in dev yet, we probably won't hit this at all, and if we needed to, we could drop the contents of the table to make the constraints hold.
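
One way to check for the duplicate-row case ahead of a deploy is sketched below. It is only a sketch under assumptions: the `membership` table and its `portfolio_id` and `initiative_id` columns are hypothetical names (the real table and columns may differ), SQLite via Python's `sqlite3` stands in for PostgreSQL, and the `rowid`-based cleanup is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the membership table, seeded with one duplicate.
CREATE TABLE membership (portfolio_id TEXT NOT NULL, initiative_id TEXT NOT NULL);
INSERT INTO membership VALUES ('p1', 'i1'), ('p1', 'i1'), ('p1', 'i2');
""")

# Pairs that would violate a uniqueness constraint on both columns.
duplicates = conn.execute("""
SELECT portfolio_id, initiative_id, COUNT(*) AS n
FROM membership
GROUP BY portfolio_id, initiative_id
HAVING COUNT(*) > 1
""").fetchall()

create_index = (
    "CREATE UNIQUE INDEX membership_unique "
    "ON membership (portfolio_id, initiative_id)"
)

# While duplicates exist, adding the constraint is rejected.
try:
    conn.execute(create_index)
    index_created_with_duplicates = True
except sqlite3.DatabaseError:
    index_created_with_duplicates = False

# Keep one row per pair (rowid is SQLite-specific), then retry.
conn.execute("""
DELETE FROM membership
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM membership GROUP BY portfolio_id, initiative_id
)
""")
conn.execute(create_index)
remaining = conn.execute("SELECT COUNT(*) FROM membership").fetchone()[0]

print(duplicates)                     # [('p1', 'i1', 2)]
print(index_created_with_duplicates)  # False
print(remaining)                      # 2
```

If the duplicate query returns no rows, the constraint can be added without any cleanup.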

@gbdubs gbdubs merged commit ca3c484 into main Dec 28, 2023
2 checks passed