-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YUNIKORN-2249] Add compression option to getQueueApplication API #757
base: master
Are you sure you want to change the base?
Conversation
pkg/webservice/handlers.go
Outdated
@@ -614,6 +616,33 @@ func getQueueApplications(w http.ResponseWriter, r *http.Request) { | |||
appsDao = append(appsDao, getApplicationDAO(app)) | |||
} | |||
|
|||
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have test for gzip header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we try to make it modular? We already thinking of doing it in two more places. We should be able to use the method wherever required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Test is missing
- Pls extract all headers to a
const
section, it's better for maintainability. We have string literals all over the place, eg.writeHeaders()
. I don't think we need to import a separate library for this, but at least have a single place where all of them are defined. Also do the same for MIME types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By modular I mean create two functions. One that tests for the header content see other review point and one that does all the compress work and takes two parameters and returns nothing. We can use that compress function anywhere.
func checkHeader(h http.Header, key, value string) bool
func compress(w http.ResponseWriter, v any)
Need to file a follow up jira when we add this to add support for all REST endpoints to support compressed data...
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #757 +/- ##
==========================================
+ Coverage 77.72% 79.24% +1.51%
==========================================
Files 82 82
Lines 13430 11418 -2012
==========================================
- Hits 10439 9048 -1391
+ Misses 2664 2040 -624
- Partials 327 330 +3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test + small refactor
pkg/webservice/handlers.go
Outdated
@@ -614,6 +616,33 @@ func getQueueApplications(w http.ResponseWriter, r *http.Request) { | |||
appsDao = append(appsDao, getApplicationDAO(app)) | |||
} | |||
|
|||
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Test is missing
- Pls extract all headers to a
const
section, it's better for maintainability. We have string literals all over the place, eg.writeHeaders()
. I don't think we need to import a separate library for this, but at least have a single place where all of them are defined. Also do the same for MIME types.
pkg/webservice/handlers.go
Outdated
@@ -614,6 +616,33 @@ func getQueueApplications(w http.ResponseWriter, r *http.Request) { | |||
appsDao = append(appsDao, getApplicationDAO(app)) | |||
} | |||
|
|||
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing for gzip
as we do now is broken. The header is an array of strings and get
only returns the first value. So if I call the URL from the browser it depends on the browser if gzip
is the first in the list. We should use this:
func checkHeader(h map[string][]string, key, value string) bool {
values := h.Values(key)
for _, v := range values {
if v == value {
return true
}
}
return false
}
pkg/webservice/handlers.go
Outdated
@@ -614,6 +616,33 @@ func getQueueApplications(w http.ResponseWriter, r *http.Request) { | |||
appsDao = append(appsDao, getApplicationDAO(app)) | |||
} | |||
|
|||
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By modular I mean create two functions. One that tests for the header content see other review point and one that does all the compress work and takes two parameters and returns nothing. We can use that compress function anywhere.
func checkHeader(h http.Header, key, value string) bool
func compress(w http.ResponseWriter, v any)
Need to file a follow up jira when we add this to add support for all REST endpoints to support compressed data...
Thanks @chia7712 @pbacsko @wilfred-s for the advice. |
pkg/webservice/handlers.go
Outdated
@@ -746,6 +752,11 @@ func getQueueApplications(w http.ResponseWriter, r *http.Request) { | |||
appsDao = append(appsDao, getApplicationDAO(app)) | |||
} | |||
|
|||
if checkHeader(r.Header, "Content-Encoding", "gzip") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR aims to enhance only getQueueApplication
. However, I'd like to open the room to discuss whether we should bring this enhancement to all restful APIs. @targetoee WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. It can be an optional choice for user to provide more flexibility.
Some APIs typically don't return large amounts of data in common cases, so it may be necessary to discuss which ones require this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be necessary to discuss which ones require this functionality.
The improvement you propose is a kind of infra to our Restful APIs, so I prefer to bring such benefit to all APIs if there is no obvious side-effect or cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is exactly why I would make the two functions so we can easily apply it to any REST end points. Anything that sends more than a single IP packet as the response can benefit.
For the streaming API, which uses really small messages that fit in a single IP packet, compressing might be more overhead than the gains we get so that one might not be a candidate everything else is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I leave two comments below. overall LGTM. We should have some follow-up to enhance it.
I also run YK with this patch, and all LGTM
pkg/webservice/handlers.go
Outdated
@@ -1216,3 +1227,40 @@ func getStream(w http.ResponseWriter, r *http.Request) { | |||
} | |||
} | |||
} | |||
|
|||
func checkHeader(h http.Header, key string, value string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return error if users use unsupported compression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be necessary. Skipping it is a solution when dealing with unsupported compression types in requests. Do you think it's essential for users to require this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406
It seems to me following the standard error can avoid the misunderstanding in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me following the standard error can avoid the misunderstanding in the future.
NO. We should never return an error to the client if it request an encoding we don't understand. Accept-Encoding: identity
is the default, which means the identity encoding is always allowed. Therefore, if an unacceptable encoding is requested, we simply send the request uncompressed and without a Content-Encoding: gzip
header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO. We should never return an error to the client if it request an encoding we don't understand. Accept-Encoding: identity is the default, which means the identity encoding is always allowed. Therefore, if an unacceptable encoding is requested, we simply send the request uncompressed and without a Content-Encoding: gzip header.
That is an acceptable way to me, but I'd like to have more discussion for my own education :)
Should we support full representation of Accept-Encoding
( weight
and coding
)? If yes, we need to consider the Accept-Encoding: gzip;q=1.0, identity; q=0
.
Or we ignore the weight
and only check the existence of gzip
from the Accept-Encoding
. This is the solution adopted by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a common saying in network protocol design: Be liberal in what you accept, strict in what you produce. In other words, we can get away with just checking for the substring gzip
in Accept-Encoding
, and produce exactly Content-Encoding: gzip
in that case. If we choose not to compress due to size, or gzip was not requested, then we use the standard identity version. Weights are not really necessary; yes, they are part of the spec, but the client is only giving its preference; we do not have to honor it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a very simple function like: GetCompressedWriter(headers, writer) (writer)
that checks for the gzip header and wraps the given writer with a gzip-compressed one, else returns the original writer. Then in any endpoint we want to (potentially compress), we just replace our writer with that one instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will cause a leak as the gzip writer must be closed for it not to leak. Calling close on the http.ResponseWriter is not possible so we need more. Probably the easiest solution is to use the same solution as we have for the loggingHandler(). We wrap the compression choice in a handler function, which then gets wrapped in the logging handler. That means we have it all in one place and expand on it with compressor pooling or other things in the future.
Example code, which is not complete but gives some idea on how we can close the compressor. That can be expanded to use a sync.Pool to not recreate the zip writer each time and just reset it before use.
type gzipResponseWriter struct {
io.Writer
http.ResponseWriter
}
func (w gzipResponseWriter) Write(b []byte) (int, error) {
return w.Writer.Write(b)
}
func makeGzipHandler(fn http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
fn(w, r)
return
}
w.Header().Set("Content-Encoding", "gzip")
w.Header().Del("Content-Length")
gz := gzip.NewWriter(w)
defer gz.Close()
gzr := gzipResponseWriter{Writer: gz, ResponseWriter: w}
fn(gzr, r)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to everyone for all the advice. I have got some questions about what @wilfred-s said.
In the provided example, the compress handler would be wrapped within the logging handler. So we first see which API is being called, and then decide whether to use the gzip handler, since we haven't decided to compress all APIs yet? Is there any misunderstanding?
Some adjust based on the review. |
@targetoee please take a look at https://issues.apache.org/jira/browse/YUNIKORN-2499 for the failed test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor nits
pkg/webservice/handlers.go
Outdated
@@ -59,6 +60,7 @@ const ( | |||
GroupNameMissing = "Group name is missing" | |||
ApplicationDoesNotExists = "Application not found" | |||
NodeDoesNotExists = "Node not found" | |||
UnsupportedCompType = "Compression type not support" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "not supported"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here at all. As I've repeatedly indicated we should never fail with an error, just fall back to no compression.
pkg/webservice/handlers.go
Outdated
return | ||
} | ||
|
||
writer := gzip.NewWriter(w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close the writer with a defer call eg. (defer writer.Close()
or similar)
@targetoee is this still a draft? |
Yes, there is still something which needs to be modified. I will change the tag after the modification. |
A draft means that you don't want the changes to be reviewed just yet because modifications are still being made. We're already reviewing this PR. I think this can be marked as "Ready for review". |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #757 +/- ##
==========================================
- Coverage 79.40% 77.15% -2.25%
==========================================
Files 82 97 +15
Lines 11317 12011 +694
==========================================
+ Hits 8986 9267 +281
- Misses 2009 2407 +398
- Partials 322 337 +15 ☔ View full report in Codecov by Sentry. |
pkg/webservice/handlers_test.go
Outdated
@@ -2602,3 +2607,82 @@ func NewResponseRecorderWithDeadline() *ResponseRecorderWithDeadline { | |||
ResponseRecorder: httptest.NewRecorder(), | |||
} | |||
} | |||
|
|||
func TestCompressGetQueueApplicationAPI(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we apply the compression to all restful APIs by changing the webservice
directly, it seems we should add test for webservice
instead of verifying all APIs here. For example: we can modify newRouter
to make it accept custom routes, and then we write some dumb routes in testing to check the gzip compression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the dummy route
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test about webservice
has been added. I don't think we should modify newRouter
just for the test case, so I kept it as it is. Instead, I simulate the action in webservice
and then check whether the result is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits
pkg/webservice/handlers_test.go
Outdated
@@ -2602,3 +2607,82 @@ func NewResponseRecorderWithDeadline() *ResponseRecorderWithDeadline { | |||
ResponseRecorder: httptest.NewRecorder(), | |||
} | |||
} | |||
|
|||
func TestCompressGetQueueApplicationAPI(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the dummy route
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 small comments from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@targetoee thanks for updated PR. two major comments are left.
|
||
// start simulation server | ||
m.httpServer = &http.Server{Addr: ":9080", Handler: router, ReadHeaderTimeout: 5 * time.Second} | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reuse the code of WebService
? For example, we can add variety startWebApp
to accept a custom route.
func newRouter(routes []route) *httprouter.Router {
router := httprouter.New()
for _, webRoute := range routes {
handler := gzipHandler(loggingHandler(webRoute.HandlerFunc, webRoute.Name))
router.Handler(webRoute.Method, webRoute.Pattern, handler)
}
return router
}
func (m *WebService) StartWebApp() {
m.startWebApp(webRoutes)
}
func (m *WebService) startWebApp(routes []route) {
m.httpServer = &http.Server{Addr: ":9080", Handler: newRouter(routes)}
log.Log(log.REST).Info("web-app started", zap.Int("port", 9080))
go func() {
httpError := m.httpServer.ListenAndServe()
if httpError != nil && !errors.Is(httpError, http.ErrServerClosed) {
log.Log(log.REST).Error("HTTP serving error",
zap.Error(httpError))
}
}()
}
and then we call startWebApp
instead of invoking thread.
m := NewWebApp(scheduler.NewScheduler().GetClusterContext(), nil)
// start simulation server
m.startWebApp([]route{route{
"testHelloWord",
"GET",
"/ws/v1/helloWorld",
getHelloWorld,
}})
assert.NilError(t, err, "Create new http request failed.") | ||
req.Header.Set("Accept", "application/json") | ||
|
||
// prevent http.DefaultClient from automatically adding gzip header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge those code into a helper function. For example:
check := func(compression bool) {
req, err := http.NewRequest("GET", u.String(), nil)
assert.NilError(t, err, "Create new http request failed.")
req.Header.Set("Accept", "application/json")
if compression {
req.Header.Set("Accept-Encoding", "gzip")
} else {
req.Header.Set("Accept-Encoding", "deflate")
}
resp, err := http.DefaultClient.Do(req)
assert.NilError(t, err, "Http request failed.")
defer resp.Body.Close()
var reader io.Reader
if compression {
gzipReader, err := gzip.NewReader(resp.Body)
assert.NilError(t, err, "Failed to create gzip reader.")
defer gzipReader.Close()
reader = gzipReader
} else {
reader = resp.Body
}
byteArr, err := io.ReadAll(reader)
assert.NilError(t, err, "Failed to read body.")
var respMsg map[string]string
err = json.Unmarshal(byteArr, &respMsg)
assert.NilError(t, err, unmarshalError)
assert.Equal(t, len(respMsg), 1)
assert.Equal(t, respMsg["data"], "hello world")
}
check(false)
check(true)
@@ -43,7 +46,7 @@ type WebService struct { | |||
func newRouter() *httprouter.Router { | |||
router := httprouter.New() | |||
for _, webRoute := range webRoutes { | |||
handler := loggingHandler(webRoute.HandlerFunc, webRoute.Name) | |||
handler := gzipHandler(loggingHandler(webRoute.HandlerFunc, webRoute.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should unconditionally GZIP every response, only those that might be large. I'd suggest adding a "Compress" boolean to the route object in routes.go and use that to determine whether to add the gzip handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, none of the /debug routes should have it (many of those generate binary data which is not compressible, and/or are simple text that wouldn't benefit). It should probably not be on anything that typically results in < 10 KB of data.
What is this PR for?
When the cluster has many pods, the API response size can be quite large (e.g. 50,000 pod creating a 30MB response in getQueueApplications). It causes long response time due to data transmission.
So this PR add a compression option to getQueueApplications API. If user specifies the 'Accept-Encoding' header as gzip, then the scheduler will compress the data before sending back to user.
What type of PR is it?
Todos
The compression option can be applied to other APIs facing similar response size issues.
What is the Jira issue?
YUNIKORN-2249
How should this be tested?
local build
Screenshots (if appropriate)
N/A
Questions:
N/A