I would like to make a few suggestions. I am happy to make these changes myself, but I would like to outline a few ideas and make sure we are aligned before I start opening PRs.
Organization and File Structure
The CLI and server are both stored in a `cmd` directory, which is consistent with Go conventions. It would probably make sense to do something similar with the various packages using a `pkg` directory. In addition to being consistent with Go conventions, it would allow static analysis and code coverage to target a single directory rather than the entire project.
I would also suggest using a config file for the server. There are many formats and packages to choose from, but there are enough configuration parameters set through the environment that it would be nice to have the option to store them in a file.
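To make the idea concrete, here is a minimal sketch of what a file-backed config with environment overrides could look like. The field names (`ListenAddr`, `DataDir`), the env var names, and JSON as the format are all placeholders for illustration, not the project's actual parameters:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Config holds server settings; these fields are hypothetical
// placeholders, not the project's actual parameters.
type Config struct {
	ListenAddr string `json:"listen_addr"`
	DataDir    string `json:"data_dir"`
}

// LoadConfig reads settings from a JSON file if it exists, then lets
// environment variables override individual values so existing
// env-based deployments keep working unchanged.
func LoadConfig(path string) (*Config, error) {
	cfg := &Config{ListenAddr: ":8080", DataDir: "/var/lib/server"} // defaults
	if data, err := os.ReadFile(path); err == nil {
		if err := json.Unmarshal(data, cfg); err != nil {
			return nil, fmt.Errorf("parsing %s: %w", path, err)
		}
	}
	if addr := os.Getenv("SERVER_LISTEN_ADDR"); addr != "" {
		cfg.ListenAddr = addr
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig("config.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(cfg.ListenAddr)
}
```

The precedence here (defaults, then file, then environment) is just one reasonable choice; libraries like viper formalize the same layering if we want a heavier dependency.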
Concurrency and Synchronization
Looking through the code, there are several places where the use of concurrency should be revisited. For example:
`KeyMatch` in `tree.go`
This does not benefit from the use of `WaitGroup`. It spawns a single goroutine and collects its result by blocking the parent goroutine. It is functionally no different from iterating the tree sequentially and collecting each matching key, and it may even be slower (albeit negligibly) due to the concurrency overhead. To use a `WaitGroup` correctly here, the tree should be iterated first, with each key being passed into a goroutine where `re.MatchString(k)` is run and the results are written to some shared structure, like a mutex-protected string slice. I think it is debatable whether this is even worth doing, as the underlying tree structure does not lend itself to parallel processing. If there were a way to parallelize the iteration itself, that would probably be faster than anything we could do with a `WaitGroup` here.
Protocol and Packet Structure
There might be some aspects of this I don't completely understand, but I would like to raise a few points about how we pass information between the client and server:
`PacketSize` in `protocol.go`
```go
const (
	PacketSize int = 1500
)
```
My main question here is the use of 1500 as an MTU. In the OSI model for a modern network stack, 1500 is the MTU for Ethernet frames at layer 2. If this were a layer 2 protocol, 1500 would make sense, but we are sending the `Packet` at the transport layer, either as a TCP segment or a UDP datagram, each of which has its own headers in addition to the payload defined in `Packet`. Additionally, each segment or datagram is sent as an IPv4 packet, which also has its own header. In other words, sending a padded buffer of 1500 bytes at layer 5 will exceed the 1500-byte MTU for Ethernet frames. The good news is that the respective maximum sizes at higher layers are significantly larger (65,535 bytes for an IPv4 packet), so there is no real issue with sending buffers larger than 1500 bytes. If 1500 is just an arbitrary value, fair enough, but it will not map to any actual limitation in the networking stack as we are using it.
`Packet` in `protocol.go`
Given the comment above, I would consider modifying the protocol to use a higher-level serialization framework such as Cap'n Proto or gRPC (this would be especially useful given that you also have a REST API). But regardless of that consideration, I would revisit the packet structure and decouple the packet data from the serialized data. `HashBytes` is a network representation of `Hash`, and has no value to the client or server outside of sending the hash value over the wire. If we wanted to decouple the data from the serialization, it would be useful to create a separate structure, `SerializedPacket`, or add a function that packs all of these values into a `bytes.Buffer` which is only used when reading/writing over the network. This is mostly cosmetic, but I think it would be useful to separate this logic.
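As a rough illustration of the `bytes.Buffer` approach, here is a sketch with a stand-in `Packet` (the fields `Command`, `Hash`, and `Payload` are hypothetical; the real struct lives in protocol.go). The point is that the in-memory struct keeps `Hash` in its native form, and the byte-level representation only exists at the wire boundary:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Packet is a stand-in with hypothetical fields; the real struct lives
// in protocol.go. Hash stays in its in-memory form everywhere except
// the serialization path.
type Packet struct {
	Command uint8
	Hash    uint64
	Payload []byte
}

// Serialize packs a Packet into a bytes.Buffer for the wire:
// fixed-width fields first, then a length-prefixed payload. Nothing
// outside the network read/write path needs this representation.
func (p *Packet) Serialize() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, p.Command); err != nil {
		return nil, err
	}
	if err := binary.Write(buf, binary.BigEndian, p.Hash); err != nil {
		return nil, err
	}
	if err := binary.Write(buf, binary.BigEndian, uint32(len(p.Payload))); err != nil {
		return nil, err
	}
	buf.Write(p.Payload)
	return buf, nil
}

func main() {
	p := &Packet{Command: 1, Hash: 0xdeadbeef, Payload: []byte("hello")}
	buf, _ := p.Serialize()
	fmt.Println(buf.Len()) // 1 + 8 + 4 + 5 = 18 bytes on the wire
}
```

A matching `Deserialize` on the receiving side would complete the boundary, and the length prefix also removes any dependence on a fixed `PacketSize`.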
Miscellaneous Refactoring
There are a few other cases where the code can be shortened or simplified for the sake of readability. This will also improve code coverage if we choose to add that. Here are a few examples:
`worker` in `server.go`
This is a giant switch statement with a lot of overlap in how each case is handled. I have only included a few blocks to illustrate the point, but the code is identical in each case after the first few lines. There are many ways to do this, but the switch statement could be shrunk down to handle only the command-specific logic, with writing and sending the packet handled in a shared block after the switch (provided the command was valid and successfully handled). This would greatly reduce the amount of repeated code.
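The shape I have in mind looks roughly like this. The command constants, the `response` type, and the handler bodies are all hypothetical stand-ins; only the structure matters:

```go
package main

import "fmt"

// Hypothetical command constants and a stand-in response type; the
// real worker in server.go dispatches on the packet's command field.
const (
	CmdGet = iota
	CmdSet
	CmdDel
)

type response struct {
	body string
}

// handle keeps only the command-specific logic inside the switch and
// returns a result; the caller performs the shared write-and-send once.
func handle(cmd int, key, val string) (response, error) {
	var resp response
	switch cmd {
	case CmdGet:
		resp = response{body: "value-for-" + key}
	case CmdSet:
		resp = response{body: "stored " + key + "=" + val}
	case CmdDel:
		resp = response{body: "deleted " + key}
	default:
		return response{}, fmt.Errorf("unknown command %d", cmd)
	}
	return resp, nil
}

func main() {
	// Shared post-switch block: one place to serialize and send the
	// reply, instead of repeating it inside every case.
	for _, cmd := range []int{CmdGet, CmdSet} {
		resp, err := handle(cmd, "k", "v")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(resp.body) // real server: pack and write the packet here
	}
}
```

Beyond cutting duplication, this makes the per-command logic independently testable without standing up a connection.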