cleanup(minipipeline): factor code into utility functions #1400

Merged
merged 73 commits into from
Nov 28, 2023
766d4d0
chore: start investigating LTE vs v0.4
bassosimone Nov 22, 2023
f3701a0
document why some QA tests with redirects are broken
bassosimone Nov 23, 2023
f1cc4bb
document more doubts about emitting events
bassosimone Nov 23, 2023
0b2203d
document more caveats
bassosimone Nov 23, 2023
8eb1ba1
[ci skip] remember to update files in sync
bassosimone Nov 23, 2023
92eb7b7
doc: document more doubts that I have
bassosimone Nov 23, 2023
ea0d3bf
[ci skip] more documentation on what to do
bassosimone Nov 23, 2023
fd06406
feat: progress towards fixing some fundamental issues
bassosimone Nov 23, 2023
ad75714
resolve one more test case
bassosimone Nov 23, 2023
d6cbfd9
more fixes
bassosimone Nov 23, 2023
41fbd3f
doc: explain issues caused by adding HTTP response
bassosimone Nov 23, 2023
e5e4c37
try to sketch out an ooni/data-inspired pipeline
bassosimone Nov 23, 2023
6aff4f0
convert more of v0.5's analysis to the ooni/data-like style
bassosimone Nov 24, 2023
94f9fd7
some more progress
bassosimone Nov 24, 2023
ff42f3c
break the code in a different way
bassosimone Nov 24, 2023
5ad88d5
feat: rewrite the pipeline to match ooni/data more closely
bassosimone Nov 24, 2023
132ba4d
also implement the analysis
bassosimone Nov 24, 2023
be18947
work
bassosimone Nov 24, 2023
18da855
we're mostly done in terms of passing the existing QA tests
bassosimone Nov 25, 2023
e7764c8
tests now green
bassosimone Nov 25, 2023
dac4170
make more test cases work with LTE
bassosimone Nov 26, 2023
258d7fb
we now pass all tests
bassosimone Nov 27, 2023
c6a49ff
[ci skip] remove TODO
bassosimone Nov 27, 2023
8d25a65
fix tricky case with order of DNS processing
bassosimone Nov 27, 2023
44541ea
adjust test case where actually dns is consistent with lte
bassosimone Nov 27, 2023
183f524
make all lte tests pass consistently
bassosimone Nov 27, 2023
a4bedcc
x
bassosimone Nov 27, 2023
3ded283
start generating test cases for the minipipeline
bassosimone Nov 27, 2023
1eaaac0
start adding tests for the minipipeline
bassosimone Nov 27, 2023
df33632
add tests for the minipipeline command
bassosimone Nov 27, 2023
5ad8387
more testing
bassosimone Nov 27, 2023
9fc77fc
more minipipeline tests
bassosimone Nov 27, 2023
c7c310a
x
bassosimone Nov 27, 2023
281e38d
x
bassosimone Nov 27, 2023
0ea4803
add more test cases
bassosimone Nov 27, 2023
9ec20fc
x
bassosimone Nov 27, 2023
66364ed
x
bassosimone Nov 27, 2023
329b2c8
x
bassosimone Nov 27, 2023
72b2be9
x
bassosimone Nov 27, 2023
7f8c143
start documenting code and existing bugs
bassosimone Nov 27, 2023
14840ac
attempt to fix the model problems
bassosimone Nov 27, 2023
55c05ac
commit the measurements
bassosimone Nov 27, 2023
c45a2f6
okay, this looks relatively good
bassosimone Nov 27, 2023
28f93aa
other changes
bassosimone Nov 27, 2023
d849894
x
bassosimone Nov 27, 2023
5091608
add measurements
bassosimone Nov 27, 2023
f1d7137
x
bassosimone Nov 27, 2023
c228c4e
add measurements
bassosimone Nov 27, 2023
8d668fe
x
bassosimone Nov 27, 2023
29bfdd4
x
bassosimone Nov 27, 2023
310ab28
x
bassosimone Nov 27, 2023
dfdf673
meas
bassosimone Nov 27, 2023
9110384
meas
bassosimone Nov 27, 2023
1a8c235
obs
bassosimone Nov 27, 2023
29fec45
x
bassosimone Nov 27, 2023
05f8838
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
b6c643f
x
bassosimone Nov 28, 2023
cc60691
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0956494
fix potential bug with failed DNS lookups
bassosimone Nov 28, 2023
fae9155
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0c6d012
x
bassosimone Nov 28, 2023
0b3a979
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
7a6f00b
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
1040d73
simplify
bassosimone Nov 28, 2023
cfe7643
x
bassosimone Nov 28, 2023
9138fb6
x
bassosimone Nov 28, 2023
a6cf23b
[ci skip] Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
5b7aa00
Butcher lte and make sure tests are aligned with v0.4
bassosimone Nov 28, 2023
4331276
we need to trust everything that v0.4 emits
bassosimone Nov 28, 2023
383eb69
x
bassosimone Nov 28, 2023
8683d63
x
bassosimone Nov 28, 2023
05ca7ee
x
bassosimone Nov 28, 2023
10cbef2
x
bassosimone Nov 28, 2023
2 changes: 1 addition & 1 deletion internal/cmd/minipipeline/testdata/analysis.json
@@ -20,4 +20,4 @@
 "TCPTransactionsWithUnexpectedTLSHandshakeFailures": {},
 "TCPTransactionsWithUnexpectedHTTPFailures": {},
 "TCPTransactionsWithUnexplainedUnexpectedFailures": {}
-}
+}
8 changes: 1 addition & 7 deletions internal/cmd/minipipeline/testdata/observations.json
@@ -8,7 +8,6 @@
 "DNSEngine": "udp",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -46,7 +45,6 @@
 "DNSEngine": "doh",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -86,7 +84,6 @@
 "DNSEngine": "udp",
 "IPAddress": "130.192.16.171",
 "IPAddressASN": 137,
-"IPAddressOrg": "Consortium GARR",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -124,7 +121,6 @@
 "DNSEngine": "getaddrinfo",
 "IPAddress": "130.192.16.171",
 "IPAddressASN": 137,
-"IPAddressOrg": "Consortium GARR",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -162,7 +158,6 @@
 "DNSEngine": "doh",
 "IPAddress": "130.192.16.171",
 "IPAddressASN": 137,
-"IPAddressOrg": "Consortium GARR",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -202,7 +197,6 @@
 "DNSEngine": null,
 "IPAddress": "130.192.16.171",
 "IPAddressASN": 137,
-"IPAddressOrg": "Consortium GARR",
 "IPAddressBogon": false,
 "EndpointTransactionID": 4,
 "EndpointProto": "tcp",
@@ -263,4 +257,4 @@
 "ControlHTTPResponseTitle": "Nexa Center for Internet \u0026 Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino"
 }
 }
-}
+}
105 changes: 26 additions & 79 deletions internal/minipipeline/observation.go
@@ -5,7 +5,6 @@ import (
 	"net"
 	"net/url"
 	"strconv"
-	"strings"

 	"github.com/ooni/probe-cli/v3/internal/geoipx"
 	"github.com/ooni/probe-cli/v3/internal/measurexlite"
@@ -107,11 +106,6 @@ type WebObservation struct {
 	// means that the probe failed to discover the IP address ASN.
 	IPAddressASN optional.Value[int64]

-	// IPAddressOrg is the optional organization name associated to this IP adddress
-	// as discovered by the probe while performing the measurement. When this field is
-	// optional.None, it means that the probe failed to discover the IP address org.
-	IPAddressOrg optional.Value[string]
-
 	// IPAddressBogon is true if IPAddress is a bogon.
 	IPAddressBogon optional.Value[bool]
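
The testdata in this PR encodes a missing value as `null` (for example `"IPAddressASN": null`), which matches the `optional.None` wording in the struct comments. A minimal sketch of how such a field could serialize, assuming a hypothetical `Optional[T]` type that stands in for `optional.Value[T]` (the real implementation is not part of this diff):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Optional is a hypothetical stand-in for optional.Value[T]; the real
// type lives in another package and is not shown in this diff.
type Optional[T any] struct {
	value *T
}

// Some wraps a value that is present.
func Some[T any](v T) Optional[T] { return Optional[T]{value: &v} }

// None returns the empty optional.
func None[T any]() Optional[T] { return Optional[T]{} }

// MarshalJSON encodes None as null and Some(v) as the JSON encoding of v.
func (o Optional[T]) MarshalJSON() ([]byte, error) {
	if o.value == nil {
		return []byte("null"), nil
	}
	return json.Marshal(*o.value)
}

// observation is a hypothetical two-field cut of WebObservation.
type observation struct {
	IPAddress    Optional[string]
	IPAddressASN Optional[int64]
}

// encode serializes an observation and panics on error (sketch only).
func encode(o observation) string {
	data, err := json.Marshal(o)
	if err != nil {
		panic(err)
	}
	return string(data)
}

func main() {
	fmt.Println(encode(observation{IPAddress: None[string](), IPAddressASN: None[int64]()}))
	fmt.Println(encode(observation{
		IPAddress:    Some("130.192.16.171"),
		IPAddressASN: Some(int64(137)),
	}))
}
```

With this shape, removing a field from the struct removes its key from every serialized record, which is the kind of change the testdata hunks show for `IPAddressOrg`.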

@@ -284,41 +278,27 @@ func (c *WebObservationsContainer) ingestDNSLookupSuccesses(evs ...*model.Archiv
 		}

 		// walk through the answers
-		for _, answer := range ev.Answers {
-			// extract the IP address we resolved
-			var addr string
-			switch answer.AnswerType {
-			case "A":
-				addr = answer.IPv4
-			case "AAAA":
-				addr = answer.IPv6
-			default:
-				continue
-			}
-
+		utilsForEachIPAddress(ev.Answers, func(ipAddr string) {
 			// create the record
 			obs := &WebObservation{
 				DNSTransactionID: optional.Some(ev.TransactionID),
 				DNSDomain: optional.Some(ev.Hostname),
 				DNSLookupFailure: optional.Some(""),
 				DNSQueryType: optional.Some(ev.QueryType),
 				DNSEngine: optional.Some(ev.Engine),
-				IPAddress: optional.Some(addr),
-				IPAddressBogon: optional.Some(netxlite.IsBogon(addr)),
-			}
-			if asn, asOrg, err := geoipx.LookupASN(addr); err == nil {
-				obs.IPAddressASN = optional.Some(int64(asn))
-				obs.IPAddressOrg = optional.Some(asOrg)
+				IPAddress: optional.Some(ipAddr),
+				IPAddressASN: utilsGeoipxLookupASN(ipAddr),
+				IPAddressBogon: optional.Some(netxlite.IsBogon(ipAddr)),
 			}

 			// add record
 			c.DNSLookupSuccesses = append(c.DNSLookupSuccesses, obs)

 			// store the first lookup that resolved this address
-			if _, found := c.knownIPAddresses[addr]; !found {
-				c.knownIPAddresses[addr] = obs
+			if _, found := c.knownIPAddresses[ipAddr]; !found {
+				c.knownIPAddresses[ipAddr] = obs
 			}
-		}
+		})
 	}
 }
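
The body of `utilsForEachIPAddress` is not shown in this diff. Judging from the inline loop it replaces, it could look roughly like the sketch below, where `dnsAnswer` is a hypothetical stand-in for the archival DNS answer type and `forEachIPAddress` and `collectIPAddresses` are illustrative names:

```go
package main

import "fmt"

// dnsAnswer is a hypothetical stand-in for the archival DNS answer
// type; only the fields used by the removed loop are modeled.
type dnsAnswer struct {
	AnswerType string
	IPv4       string
	IPv6       string
}

// forEachIPAddress calls fx once per A or AAAA answer, in order,
// and skips every other answer type, like the removed switch did.
func forEachIPAddress(answers []dnsAnswer, fx func(ipAddr string)) {
	for _, answer := range answers {
		switch answer.AnswerType {
		case "A":
			fx(answer.IPv4)
		case "AAAA":
			fx(answer.IPv6)
		default:
			// not an IP address (for example a CNAME): nothing to emit
		}
	}
}

// collectIPAddresses gathers the visited addresses in answer order.
func collectIPAddresses(answers []dnsAnswer) []string {
	var out []string
	forEachIPAddress(answers, func(ipAddr string) {
		out = append(out, ipAddr)
	})
	return out
}

func main() {
	answers := []dnsAnswer{
		{AnswerType: "CNAME"},
		{AnswerType: "A", IPv4: "130.192.16.171"},
		{AnswerType: "AAAA", IPv6: "2001:db8::1"},
	}
	fmt.Println(collectIPAddresses(answers))
}
```

Passing a callback keeps the answer-type filtering in one place, so callers such as the DNS ingestion code only deal with IP addresses.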

@@ -331,12 +311,9 @@ func (c *WebObservationsContainer) IngestTCPConnectEvents(evs ...*model.Archival
 		if !found {
 			obs = &WebObservation{
 				IPAddress: optional.Some(ev.IP),
+				IPAddressASN: utilsGeoipxLookupASN(ev.IP),
 				IPAddressBogon: optional.Some(netxlite.IsBogon(ev.IP)),
 			}
-			if asn, asOrg, err := geoipx.LookupASN(ev.IP); err == nil {
-				obs.IPAddressASN = optional.Some(int64(asn))
-				obs.IPAddressOrg = optional.Some(asOrg)
-			}
 		}

 		// clone the record because the same IP address MAY belong
@@ -350,7 +327,6 @@ func (c *WebObservationsContainer) IngestTCPConnectEvents(evs ...*model.Archival
 			DNSLookupFailure: obs.DNSLookupFailure,
 			IPAddress: obs.IPAddress,
 			IPAddressASN: obs.IPAddressASN,
-			IPAddressOrg: obs.IPAddressOrg,
 			IPAddressBogon: obs.IPAddressBogon,
 			EndpointTransactionID: optional.Some(ev.TransactionID),
 			EndpointProto: optional.Some("tcp"),
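
Both call sites now obtain the ASN through `utilsGeoipxLookupASN`, whose body is not part of this diff. A minimal sketch of such a wrapper, assuming the lookup returns `(asn, org, err)` as in the removed code; `optionalInt64`, `asnLookupFunc`, and `fakeLookup` are hypothetical names introduced only for this example:

```go
package main

import (
	"errors"
	"fmt"
)

// optionalInt64 is a hypothetical stand-in for optional.Value[int64].
type optionalInt64 struct {
	Value int64
	Valid bool
}

// asnLookupFunc mirrors the three-value shape used by the removed code
// (asn, organization, error); the exact integer type is an assumption.
type asnLookupFunc func(ipAddr string) (asn uint, org string, err error)

// lookupASN returns the ASN as an optional and drops the organization
// name. A failed lookup yields the empty optional.
func lookupASN(lookup asnLookupFunc, ipAddr string) optionalInt64 {
	asn, _, err := lookup(ipAddr)
	if err != nil {
		return optionalInt64{}
	}
	return optionalInt64{Value: int64(asn), Valid: true}
}

// fakeLookup is a toy database used only by this sketch; the single
// entry is taken from the testdata in this PR.
func fakeLookup(ipAddr string) (uint, string, error) {
	if ipAddr == "130.192.16.171" {
		return 137, "Consortium GARR", nil
	}
	return 0, "", errors.New("not found")
}

func main() {
	fmt.Printf("%+v\n", lookupASN(fakeLookup, "130.192.16.171"))
	fmt.Printf("%+v\n", lookupASN(fakeLookup, "192.0.2.1"))
}
```

Returning an optional lets the helper be used directly inside a struct literal, which is how the new code fills `IPAddressASN`.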
@@ -390,37 +366,20 @@ func (c *WebObservationsContainer) IngestHTTPRoundTripEvents(evs ...*model.Archi
 			continue
 		}

-		// update the record
+		// start updating the record
 		obs.HTTPRequestURL = optional.Some(ev.Request.URL)
 		obs.HTTPFailure = optional.Some(utilsStringPointerToString(ev.Failure))
-		obs.HTTPResponseStatusCode = optional.Some(ev.Response.Code)
-		obs.HTTPResponseBodyLength = optional.Some(int64(len(ev.Response.Body)))
-		obs.HTTPResponseBodyIsTruncated = optional.Some(ev.Request.BodyIsTruncated)

-		httpResponseHeadersKeys := make(map[string]bool)
-		for key := range ev.Response.Headers {
-			httpResponseHeadersKeys[key] = true
-		}
-		obs.HTTPResponseHeadersKeys = optional.Some(httpResponseHeadersKeys)
-
-		if value := measurexlite.WebGetTitle(string(ev.Response.Body)); value != "" {
-			obs.HTTPResponseTitle = optional.Some(value)
-		}
-		for key, value := range ev.Response.Headers {
-			if strings.ToLower(key) == "location" {
-				obs.HTTPResponseLocation = optional.Some(string(value))
-				break // only first entry (typically there's just a single entry)
-			}
+		// consider the response authoritative only in case of success
+		if ev.Failure == nil {
+			obs.HTTPResponseStatusCode = optional.Some(ev.Response.Code)
+			obs.HTTPResponseBodyLength = optional.Some(int64(len(ev.Response.Body)))
+			obs.HTTPResponseBodyIsTruncated = optional.Some(ev.Request.BodyIsTruncated)
+			obs.HTTPResponseHeadersKeys = utilsExtractHTTPHeaderKeys(ev.Response.Headers)
+			obs.HTTPResponseTitle = optional.Some(measurexlite.WebGetTitle(string(ev.Response.Body)))
+			obs.HTTPResponseLocation = utilsExtractHTTPLocation(ev.Response.Headers)
+			obs.HTTPResponseIsFinal = utilsDetermineWhetherHTTPResponseIsFinal(ev.Response.Code)
 		}
-
-		obs.HTTPResponseIsFinal = optional.Some((func() bool {
-			switch ev.Response.Code / 100 {
-			case 2, 4, 5:
-				return true
-			default:
-				return false
-			}
-		}()))
 	}
 }
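
The hunk above moves the `Location` lookup and the "final response" check into `utilsExtractHTTPLocation` and `utilsDetermineWhetherHTTPResponseIsFinal`, whose bodies are not shown here. The removed inline code suggests roughly the following logic. This is a sketch with hypothetical names that uses a plain `map[string]string` for headers and a `(value, ok)` pair instead of the optional type:

```go
package main

import (
	"fmt"
	"strings"
)

// extractLocation returns the value of the Location header, matching
// the key case-insensitively like the removed loop did. The boolean is
// false when no such header exists.
func extractLocation(headers map[string]string) (string, bool) {
	for key, value := range headers {
		if strings.ToLower(key) == "location" {
			return value, true // only the first match is used
		}
	}
	return "", false
}

// responseIsFinal reports whether the status code is 2xx, 4xx, or 5xx,
// that is, whether the response ends the redirect chain.
func responseIsFinal(statusCode int64) bool {
	switch statusCode / 100 {
	case 2, 4, 5:
		return true
	default:
		return false
	}
}

func main() {
	headers := map[string]string{"LOCATION": "https://example.com/"}
	location, found := extractLocation(headers)
	fmt.Println(location, found)
	fmt.Println(responseIsFinal(200), responseIsFinal(301))
}
```

Note that in the new code these helpers only run when `ev.Failure == nil`, so a failed round trip leaves the response fields unset instead of recording zero values.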

@@ -554,30 +513,18 @@ func (c *WebObservationsContainer) controlXrefTLSFailures(resp *model.THResponse
 }

 func (c *WebObservationsContainer) controlSetHTTPFinalResponseExpectation(resp *model.THResponse) {
-	// Implementation note: the TH response does not have a clear semantics for "missing" values
-	// therefore we are accepting as valid values within the correct range
-	//
 	// also note that we add control information to all endpoints and then we check for "final"
 	// responses and only compare against "final" responses during the analysis
 	for _, obs := range c.KnownTCPEndpoints {
 		obs.ControlHTTPFailure = optional.Some(utilsStringPointerToString(resp.HTTPRequest.Failure))
-		if value := resp.HTTPRequest.StatusCode; value > 0 {
-			obs.ControlHTTPResponseStatusCode = optional.Some(value)
-		}
-		if value := resp.HTTPRequest.BodyLength; value >= 0 {
-			obs.ControlHTTPResponseBodyLength = optional.Some(value)
-		}
-
-		controlHTTPResponseHeadersKeys := make(map[string]bool)
-		for key := range resp.HTTPRequest.Headers {
-			controlHTTPResponseHeadersKeys[key] = true
-		}
-		if len(controlHTTPResponseHeadersKeys) > 0 {
-			obs.ControlHTTPResponseHeadersKeys = optional.Some(controlHTTPResponseHeadersKeys)
+		// leave everything else nil if there was a failure, like we
+		// already do when processing the probe events
+		if resp.HTTPRequest.Failure != nil {
+			continue
 		}
-
-		if v := resp.HTTPRequest.Title; v != "" {
-			obs.ControlHTTPResponseTitle = optional.Some(v)
-		}
+		obs.ControlHTTPResponseStatusCode = optional.Some(resp.HTTPRequest.StatusCode)
+		obs.ControlHTTPResponseBodyLength = optional.Some(resp.HTTPRequest.BodyLength)
+		obs.ControlHTTPResponseHeadersKeys = utilsExtractHTTPHeaderKeys(resp.HTTPRequest.Headers)
+		obs.ControlHTTPResponseTitle = optional.Some(resp.HTTPRequest.Title)
 	}
 }
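
The control-side change follows the same rule as the probe side: record the failure, and fill the response expectations only when the request succeeded. The sketch below illustrates that guard together with a header-key extraction in the spirit of `utilsExtractHTTPHeaderKeys` (whose real body is not in this diff). `controlResponse`, `expectation`, `headerKeys`, and `newExpectation` are hypothetical names, and nil pointers and maps stand in for the optional type:

```go
package main

import "fmt"

// controlResponse is a hypothetical cut of the test helper's HTTP result.
type controlResponse struct {
	Failure    *string
	StatusCode int64
	Headers    map[string]string
}

// expectation holds what the analysis may later compare against; nil
// fields play the role of optional.None in this sketch.
type expectation struct {
	Failure     string
	StatusCode  *int64
	HeadersKeys map[string]bool
}

// headerKeys returns the set of header names. Returning a non-nil set
// even for empty input is an assumption of this sketch.
func headerKeys(headers map[string]string) map[string]bool {
	keys := make(map[string]bool)
	for key := range headers {
		keys[key] = true
	}
	return keys
}

// newExpectation records the failure and fills the response fields
// only when the control request succeeded.
func newExpectation(resp controlResponse) expectation {
	var exp expectation
	if resp.Failure != nil {
		exp.Failure = *resp.Failure
		return exp // everything else stays unset
	}
	code := resp.StatusCode
	exp.StatusCode = &code
	exp.HeadersKeys = headerKeys(resp.Headers)
	return exp
}

func main() {
	failure := "connection_reset"
	failed := newExpectation(controlResponse{Failure: &failure, StatusCode: 200})
	fmt.Println(failed.Failure, failed.StatusCode == nil)

	ok := newExpectation(controlResponse{
		StatusCode: 200,
		Headers:    map[string]string{"Server": "x"},
	})
	fmt.Println(*ok.StatusCode, len(ok.HeadersKeys))
}
```

Compared with the removed range checks (`value > 0`, `value >= 0`), keying on the failure gives a single, explicit condition for when control values are trusted.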
@@ -8,7 +8,6 @@
 "DNSEngine": "udp",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -48,7 +47,6 @@
 "DNSEngine": "getaddrinfo",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -86,7 +84,6 @@
 "DNSEngine": "udp",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -126,7 +123,6 @@
 "DNSEngine": null,
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": 3,
 "EndpointProto": "tcp",
@@ -8,7 +8,6 @@
 "DNSEngine": "udp",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -46,7 +45,6 @@
 "DNSEngine": "doh",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -86,7 +84,6 @@
 "DNSEngine": "getaddrinfo",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -124,7 +121,6 @@
 "DNSEngine": "udp",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -162,7 +158,6 @@
 "DNSEngine": "doh",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -202,7 +197,6 @@
 "DNSEngine": null,
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": 4,
 "EndpointProto": "tcp",
@@ -8,7 +8,6 @@
 "DNSEngine": "udp",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -48,7 +47,6 @@
 "DNSEngine": "getaddrinfo",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -86,7 +84,6 @@
 "DNSEngine": "udp",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -126,7 +123,6 @@
 "DNSEngine": null,
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": 3,
 "EndpointProto": "tcp",
@@ -169,7 +165,6 @@
 "DNSEngine": null,
 "IPAddress": "93.184.216.34",
 "IPAddressASN": 15133,
-"IPAddressOrg": "Edgecast Inc.",
 "IPAddressBogon": false,
 "EndpointTransactionID": 4,
 "EndpointProto": "tcp",
@@ -8,7 +8,6 @@
 "DNSEngine": "udp",
 "IPAddress": null,
 "IPAddressASN": null,
-"IPAddressOrg": null,
 "IPAddressBogon": null,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -48,7 +47,6 @@
 "DNSEngine": "udp",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -86,7 +84,6 @@
 "DNSEngine": "getaddrinfo",
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": null,
 "EndpointProto": null,
@@ -126,7 +123,6 @@
 "DNSEngine": null,
 "IPAddress": "104.154.89.105",
 "IPAddressASN": 396982,
-"IPAddressOrg": "Google LLC",
 "IPAddressBogon": false,
 "EndpointTransactionID": 3,
 "EndpointProto": "tcp",