e56df8a
to
db60039
Compare
2023-07-11 conversation:
5265703
to
4664ac7
Compare
lib/graph_gateway.go
Outdated
escapedPath := url.PathEscape(path.String())
escapedPath = strings.ReplaceAll(escapedPath, "%2F", "/")
paramsStr := paramsToString(params)
urlWithoutHost := fmt.Sprintf("%s?%s", escapedPath, paramsStr)
In the beginning we only had string content paths, and passed them to the fetcher.
Now we have URL query params and we append them blindly to content path.
I am afraid mixing URL paths and query params with IPFS content paths will lead to ugly code, special handling like the one above, and even more bugs down the road.
We should clean this up in this PR.
Current state (iiuc)
I think the underlying source of problems is that we have a content path concatenated with URL query params. This is invalid, mixes abstractions, and will lead to more bugs, such as unnecessary escaping of entity-bytes values.
My understanding of "right thing" is the edge case where content path includes characters which look like percent-encoding, and we want to preserve that by double encoding:
In Kubo 0.21 this works:
http://localhost:8080/ipns/en.wikipedia-on-ipfs.org/I/Auditorio_de_Tenerife%252C_Santa_Cruz_de_Tenerife%252C_Espa%C3%B1a%252C_2012-12-15%252C_DD_02.jpg.webp?format=car&dag-scope=entity&entity-bytes=0:*
→ returns HTTP 301 → http://en.wikipedia-on-ipfs.org.ipns.localhost:8080/I/Auditorio_de_Tenerife%252C_Santa_Cruz_de_Tenerife%252C_Espa%C3%B1a%252C_2012-12-15%252C_DD_02.jpg.webp?format=car&dag-scope=entity&entity-bytes=0:*
We want the same to work in bifrost-gateway.
Proposed fix
We need to separate the content path from the query params. It can be done in two ways:
- (A) pass ImmutablePath + params as url.Values (requires the fetcher to construct the URL correctly)
- (B) create a detached (only path + query, no host) url.URL and pass that to the fetcher (imo cleaner and less error prone: we control the source of truth, and the fetcher won't mangle it because it already has a valid url.URL to execute and will only add hostname and protocol)
Rationale: url.URL already takes care of escaping, so we should reuse it as the abstraction if possible.
JS has dedicated functions for escaping a full URI vs a URI component:
encodeURIComponent('/ą/ę') → %2F%C4%85%2F%C4%99
encodeURI('/ą/ę') → /%C4%85/%C4%99
In Go we have url.PathEscape and url.QueryEscape instead of encodeURIComponent, and the url.URL type's String() func itself already acts as encodeURI, so there is no need for extra escaping.
And I think we can reuse it here, as it takes care of all escaping + a detached path is a valid ("relative") URL:
escapedPath := (&url.URL{Path: "/ą/ę"}).String() // → /%C4%85/%C4%99
@aschmahmann @willscott my preference would be to do a proper cleanup and go with (B) to reduce the number of moving pieces passed to the fetcher. Would that be ok?
I can help with code tomorrow if that helps.
Passing a url object rather than a string here seems like a good idea to me. We should be clear about the contract of what's allowed/not in the URL though.
For example, whether it's valid or invalid to put format=raw in the params or headers, or whether the Fetch function implies CAR. While it's somewhat obviously true now, it doesn't have to be that way, so the docs should clarify either way.
@aschmahmann I've pushed a small refactor in 982535a that does not change the fetcher API (we still pass the URL as a string), but fixes the way we encode content paths. I also added an explicit format=car to make it easier to debug and test.
I don't think we should block this PR on the fetcher API cleanup unless we really need to include it, and it does not seem to me that we need it to solve the problem at hand: the current change already seems to work as expected.
Would it be ok to file an issue to do it as follow-up work?
@lidel yeah, I don't think we need to block on a refactor unless it's hindering functionality in some way. If this is enough for now and the refactor is cleanup + reducing headaches here then can definitely be a follow up.
Updating the top-level post with the areas currently under evaluation/investigation before merging.
this implements part of #160 (comment) that fixes the way we encode content paths, without changing the interface of fetcher in caboose.
b0fca23
to
59887ff
Compare
@@ -43,14 +43,20 @@ func (ps *proxyBlockStore) Fetch(ctx context.Context, path string, cb lib.DataCa
 return err
 }
 goLog.Debugw("car fetch", "url", req.URL)
-req.Header.Set("Accept", "application/vnd.ipld.car")
+req.Header.Set("Accept", "application/vnd.ipld.car;order=dfs;dups=y")
Note: this resolves #177 for non-Saturn backends. Caboose/Saturn will have to change when they're ready.
2023-08-01 conversation:
TODO: check if we have tests for when CAR backend returns HTTP 502 or 504.
We want to ensure pass-through works and the same code is returned to end user.
case data.Data_Raw, data.Data_Metadata:
	// TODO: for now, we decided to return nil here. The different implementations are inconsistent
	// and UnixFS is not properly specified: https://github.com/ipfs/specs/issues/316.
	// - Is Data_Raw different from Data_File?
	// - Data_Metadata is handled differently in boxo/ipld/unixfs and go-unixfsnode.
	return nil
💭 @aschmahmann do we know what was the old behavior in Kubo?
Unsure how popular these are, but if we don't have spec, we may want to derisk and not change behavior that has been in production for years.
We sort of ended up giving this a 🤷 anyhow here ipfs/boxo#303 (comment). We can revisit both, but don't think that needs to happen in this PR.
b0de13d
to
03a7448
Compare
Co-authored-by: Will Scott <[email protected]>
…nto feat/backpressure
…rectories for dir_index.html rendering
688d15c
to
e2be930
Compare
@aschmahmann I've enabled gateway-conformance tests with GRAPH_BACKEND=true on this PR and the main branch (merged both #189 and #190). In the past we've only run conformance against the block-by-block backend, because the gw code paths were the same. With the Rhea changes, we've diverged so much that we need to run the tests twice now.
Main branch is green, but around 200 checks are failing in this PR:
The number looks big, but it is most likely one or two bugs which impact a lot of tests.
Some quick thoughts (I did not dig too deep, just shallow pass over the log linked above):
- seem to be mostly related to 404 / "not found" scenarios and related error handling:
  - sometimes the returned code is 5XX instead of the expected 404
  - index.html is not found in a few cases (no link named \"index.html\" under bafybeig6ka5mlwkl4subqhaiatalkcleo4jgnr3hqwvpmsqfca27cijp3i)
  - some _redirects file tests are failing because HTTP 502 is returned instead of the expected 200|302 etc
@lidel thanks for noticing the discrepancies in how the conformance tests were run 🙏 and getting that fixed up. I'll poke around a little more to get to the bottom of it, but I suspect some of the failures come from kubo v0.22.0 (and any released version of boxo) returning an error instead of CAR blocks when the path doesn't exist. That is not great for tooling like bifrost-gateway, but it is being fixed in the version of boxo this PR relies on. Is it alright to swap out the action that pulls a released kubo for one that custom-builds a branch?
…hat happens to be for a raw block
… CAR fetches for duplicates
53b76b3
to
4f6b224
Compare
This is to handle a bug where if there is a CAR request for a non-existent path an error is returned rather than a CAR that proves that the requested path cannot exist.
4f6b224
to
004ea10
Compare
Relies on ipfs/boxo#369
7/31:
Other outstanding (non functionally mandatory) items:
Note: This change makes the testing very heavily rely on code paths other than kubo sharness tests between
There have been bugs discovered in each, so being eagle eyed and reporting bugs is appreciated.