feat(minipipeline): introduce "classic" observations filtering #1402

Merged on Nov 29, 2023 (81 commits)
766d4d0
chore: start investigating LTE vs v0.4
bassosimone Nov 22, 2023
f3701a0
document why some QA tests with redirects are broken
bassosimone Nov 23, 2023
f1cc4bb
document more doubts about emitting events
bassosimone Nov 23, 2023
0b2203d
document more caveats
bassosimone Nov 23, 2023
8eb1ba1
[ci skip] remember to update files in sync
bassosimone Nov 23, 2023
92eb7b7
doc: document more doubts that I have
bassosimone Nov 23, 2023
ea0d3bf
[ci skip] more documentation on what to do
bassosimone Nov 23, 2023
fd06406
feat: progress towards fixing some fundamental issues
bassosimone Nov 23, 2023
ad75714
resolve one more test case
bassosimone Nov 23, 2023
d6cbfd9
more fixes
bassosimone Nov 23, 2023
41fbd3f
doc: explain issues caused by adding HTTP response
bassosimone Nov 23, 2023
e5e4c37
try to sketch out an ooni/data-inspired pipeline
bassosimone Nov 23, 2023
6aff4f0
convert more of v0.5's analysis to the ooni/data-like style
bassosimone Nov 24, 2023
94f9fd7
some more progress
bassosimone Nov 24, 2023
ff42f3c
break the code in a different way
bassosimone Nov 24, 2023
5ad88d5
feat: rewrite the pipeline to match ooni/data more closely
bassosimone Nov 24, 2023
132ba4d
also implement the analysis
bassosimone Nov 24, 2023
be18947
work
bassosimone Nov 24, 2023
18da855
we're mostly done in terms of passing the existing QA tests
bassosimone Nov 25, 2023
e7764c8
tests now green
bassosimone Nov 25, 2023
dac4170
make more test cases work with LTE
bassosimone Nov 26, 2023
258d7fb
we now pass all tests
bassosimone Nov 27, 2023
c6a49ff
[ci skip] remove TODO
bassosimone Nov 27, 2023
8d25a65
fix tricky case with order of DNS processing
bassosimone Nov 27, 2023
44541ea
adjust test case where actually dns is consistent with lte
bassosimone Nov 27, 2023
183f524
make all lte tests pass consistently
bassosimone Nov 27, 2023
a4bedcc
x
bassosimone Nov 27, 2023
3ded283
start generating test cases for the minipipeline
bassosimone Nov 27, 2023
1eaaac0
start adding tests for the minipipeline
bassosimone Nov 27, 2023
df33632
add tests for the minipipeline command
bassosimone Nov 27, 2023
5ad8387
more testing
bassosimone Nov 27, 2023
9fc77fc
more minipipeline tests
bassosimone Nov 27, 2023
c7c310a
x
bassosimone Nov 27, 2023
281e38d
x
bassosimone Nov 27, 2023
0ea4803
add more test cases
bassosimone Nov 27, 2023
9ec20fc
x
bassosimone Nov 27, 2023
66364ed
x
bassosimone Nov 27, 2023
329b2c8
x
bassosimone Nov 27, 2023
72b2be9
x
bassosimone Nov 27, 2023
7f8c143
start documenting code and existing bugs
bassosimone Nov 27, 2023
14840ac
attempt to fix the model problems
bassosimone Nov 27, 2023
55c05ac
commit the measurements
bassosimone Nov 27, 2023
c45a2f6
okay, this looks relatively good
bassosimone Nov 27, 2023
28f93aa
other changes
bassosimone Nov 27, 2023
d849894
x
bassosimone Nov 27, 2023
5091608
add measurements
bassosimone Nov 27, 2023
f1d7137
x
bassosimone Nov 27, 2023
c228c4e
add measurements
bassosimone Nov 27, 2023
8d668fe
x
bassosimone Nov 27, 2023
29bfdd4
x
bassosimone Nov 27, 2023
310ab28
x
bassosimone Nov 27, 2023
dfdf673
meas
bassosimone Nov 27, 2023
9110384
meas
bassosimone Nov 27, 2023
1a8c235
obs
bassosimone Nov 27, 2023
29fec45
x
bassosimone Nov 27, 2023
05f8838
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
b6c643f
x
bassosimone Nov 28, 2023
cc60691
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0956494
fix potential bug with failed DNS lookups
bassosimone Nov 28, 2023
fae9155
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0c6d012
x
bassosimone Nov 28, 2023
0b3a979
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
7a6f00b
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
1040d73
simplify
bassosimone Nov 28, 2023
cfe7643
x
bassosimone Nov 28, 2023
9138fb6
x
bassosimone Nov 28, 2023
a6cf23b
[ci skip] Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
5b7aa00
Butcher lte and make sure tests are aligned with v0.4
bassosimone Nov 28, 2023
4331276
we need to trust everything that v0.4 emits
bassosimone Nov 28, 2023
383eb69
x
bassosimone Nov 28, 2023
1488d05
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
4217859
x
bassosimone Nov 28, 2023
99d5e1e
add classic filter
bassosimone Nov 28, 2023
6eeb352
[ci skip]
bassosimone Nov 28, 2023
7d5bed9
x
bassosimone Nov 28, 2023
e1f87b1
x
bassosimone Nov 28, 2023
f7f5c3e
x
bassosimone Nov 28, 2023
90457e9
Merge branch 'master' into issue/2634
bassosimone Nov 29, 2023
1bed6a7
x
bassosimone Nov 29, 2023
a342a11
x
bassosimone Nov 29, 2023
e017555
Apply suggestions from code review
bassosimone Nov 29, 2023
14 changes: 13 additions & 1 deletion internal/cmd/minipipeline/main.go
@@ -42,7 +42,9 @@ func main() {
 fmt.Fprintf(os.Stderr, "\n")
 fmt.Fprintf(os.Stderr, "Analyzes the <file> provided using -measurement <file> and writes the\n")
 fmt.Fprintf(os.Stderr, "observations.json and analysis.json files in the -destdir <dir> directory,\n")
-fmt.Fprintf(os.Stderr, "which must already exist.\n")
+fmt.Fprintf(os.Stderr, "which must already exist. Additionally, we also perform a \"classic\"\n")
+fmt.Fprintf(os.Stderr, "analysis like the one in Web Connectivity v0.4 and generate accordingly the\n")
+fmt.Fprintf(os.Stderr, "observations_classic.json and analysis_classic.json files.\n")
 fmt.Fprintf(os.Stderr, "\n")
 fmt.Fprintf(os.Stderr, "Use -prefix <prefix> to add <prefix> in front of the generated files names.\n")
 fmt.Fprintf(os.Stderr, "\n")
@@ -58,8 +60,18 @@ func main() {
container := runtimex.Try1(minipipeline.IngestWebMeasurement(&parsed))
mustWriteFileFn(observationsPath, must.MarshalAndIndentJSON(container, "", " "), 0600)

// generate and write classic observations
classicObservationsPath := filepath.Join(*destdirFlag, *prefixFlag+"observations_classic.json")
containerClassic := minipipeline.ClassicFilter(container)
mustWriteFileFn(classicObservationsPath, must.MarshalAndIndentJSON(containerClassic, "", " "), 0600)

// generate and write observations analysis
analysisPath := filepath.Join(*destdirFlag, *prefixFlag+"analysis.json")
analysis := minipipeline.AnalyzeWebObservations(container)
mustWriteFileFn(analysisPath, must.MarshalAndIndentJSON(analysis, "", " "), 0600)

// generate and write the classic analysis
classicAnalysisPath := filepath.Join(*destdirFlag, *prefixFlag+"analysis_classic.json")
analysisClassic := minipipeline.AnalyzeWebObservations(containerClassic)
mustWriteFileFn(classicAnalysisPath, must.MarshalAndIndentJSON(analysisClassic, "", " "), 0600)
}
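As a reading aid for the hunk above, here is a minimal sketch of how the four output paths are derived from `-destdir` and `-prefix`. The helper name `outputPaths` and the order of the returned slice are assumptions made for illustration; the command itself builds each path inline with `filepath.Join`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// outputPaths is a hypothetical helper (not part of this PR) that lists the
// four files the usage text says the command writes into destdir.
func outputPaths(destdir, prefix string) []string {
	names := []string{
		"observations.json",
		"observations_classic.json",
		"analysis.json",
		"analysis_classic.json",
	}
	paths := make([]string, 0, len(names))
	for _, name := range names {
		// The prefix is prepended to the file name, not to the directory.
		paths = append(paths, filepath.Join(destdir, prefix+name))
	}
	return paths
}

func main() {
	// "xo" and "y-" are the destdir and prefix values used by the test suite.
	for _, p := range outputPaths("xo", "y-") {
		fmt.Println(p)
	}
}
```

With `-destdir xo -prefix y-` this yields the same file names that `main_test.go` looks up in its content map.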
14 changes: 14 additions & 0 deletions internal/cmd/minipipeline/main_test.go
@@ -45,12 +45,26 @@ func TestMainSuccess(t *testing.T) {
t.Fatal(diff)
}

// make sure the generated classic observations are good
expectedObservationsClassic := mustloadfile(filepath.Join("testdata", "observations_classic.json"))
gotObservationsClassic := mustloaddata(contentmap, filepath.Join("xo", "y-observations_classic.json"))
if diff := cmp.Diff(expectedObservationsClassic, gotObservationsClassic); diff != "" {
t.Fatal(diff)
}

// make sure the generated analysis is good
expectedAnalysis := mustloadfile(filepath.Join("testdata", "analysis.json"))
gotAnalysis := mustloaddata(contentmap, filepath.Join("xo", "y-analysis.json"))
if diff := cmp.Diff(expectedAnalysis, gotAnalysis); diff != "" {
t.Fatal(diff)
}

// make sure the generated classic analysis is good
expectedAnalysisClassic := mustloadfile(filepath.Join("testdata", "analysis_classic.json"))
gotAnalysisClassic := mustloaddata(contentmap, filepath.Join("xo", "y-analysis_classic.json"))
if diff := cmp.Diff(expectedAnalysisClassic, gotAnalysisClassic); diff != "" {
t.Fatal(diff)
}
}

func TestMainUsage(t *testing.T) {
25 changes: 25 additions & 0 deletions internal/cmd/minipipeline/testdata/analysis_classic.json
@@ -0,0 +1,25 @@
{
"DNSExperimentFailure": null,
"DNSTransactionsWithBogons": {},
"DNSTransactionsWithUnexpectedFailures": null,
"DNSPossiblyInvalidAddrs": {},
"DNSPossiblyInvalidAddrsClassic": {},
"DNSPossiblyNonexistingDomains": null,
"HTTPDiffBodyProportionFactor": 1,
"HTTPDiffStatusCodeMatch": true,
"HTTPDiffTitleDifferentLongWords": {},
"HTTPDiffUncommonHeadersIntersection": {
"x-drupal-cache": true,
"x-generator": true
},
"HTTPFinalResponsesWithControl": {
"4": true
},
"HTTPFinalResponsesWithTLS": {
"4": true
},
"TCPTransactionsWithUnexpectedTCPConnectFailures": {},
"TCPTransactionsWithUnexpectedTLSHandshakeFailures": {},
"TCPTransactionsWithUnexpectedHTTPFailures": {},
"TCPTransactionsWithUnexplainedUnexpectedFailures": {}
}
111 changes: 111 additions & 0 deletions internal/cmd/minipipeline/testdata/observations_classic.json
@@ -0,0 +1,111 @@
{
"DNSLookupFailures": [],
"DNSLookupSuccesses": [
{
"DNSTransactionID": 1,
"DNSDomain": "nexa.polito.it",
"DNSLookupFailure": "",
"DNSQueryType": "ANY",
"DNSEngine": "getaddrinfo",
"IPAddress": "130.192.16.171",
"IPAddressASN": 137,
"IPAddressBogon": false,
"EndpointTransactionID": null,
"EndpointProto": null,
"EndpointPort": null,
"EndpointAddress": null,
"TCPConnectFailure": null,
"TLSHandshakeFailure": null,
"TLSServerName": null,
"HTTPRequestURL": null,
"HTTPFailure": null,
"HTTPResponseStatusCode": null,
"HTTPResponseBodyLength": null,
"HTTPResponseBodyIsTruncated": null,
"HTTPResponseHeadersKeys": null,
"HTTPResponseLocation": null,
"HTTPResponseTitle": null,
"HTTPResponseIsFinal": null,
"ControlDNSDomain": null,
"ControlDNSLookupFailure": null,
"ControlTCPConnectFailure": null,
"MatchWithControlIPAddress": null,
"MatchWithControlIPAddressASN": null,
"ControlTLSHandshakeFailure": null,
"ControlHTTPFailure": null,
"ControlHTTPResponseStatusCode": null,
"ControlHTTPResponseBodyLength": null,
"ControlHTTPResponseHeadersKeys": null,
"ControlHTTPResponseTitle": null
}
],
"KnownTCPEndpoints": {
"4": {
"DNSTransactionID": 3,
"DNSDomain": "nexa.polito.it",
"DNSLookupFailure": "",
"DNSQueryType": null,
"DNSEngine": null,
"IPAddress": "130.192.16.171",
"IPAddressASN": 137,
"IPAddressBogon": false,
"EndpointTransactionID": 4,
"EndpointProto": "tcp",
"EndpointPort": "443",
"EndpointAddress": "130.192.16.171:443",
"TCPConnectFailure": "",
"TLSHandshakeFailure": "",
"TLSServerName": "nexa.polito.it",
"HTTPRequestURL": "https://nexa.polito.it/",
"HTTPFailure": "",
"HTTPResponseStatusCode": 200,
"HTTPResponseBodyLength": 36564,
"HTTPResponseBodyIsTruncated": false,
"HTTPResponseHeadersKeys": {
"Cache-Control": true,
"Content-Language": true,
"Content-Type": true,
"Date": true,
"Etag": true,
"Expires": true,
"Last-Modified": true,
"Link": true,
"Server": true,
"Vary": true,
"X-Content-Type-Options": true,
"X-Drupal-Cache": true,
"X-Frame-Options": true,
"X-Generator": true
},
"HTTPResponseLocation": null,
"HTTPResponseTitle": "Nexa Center for Internet \u0026 Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino",
"HTTPResponseIsFinal": true,
"ControlDNSDomain": null,
"ControlDNSLookupFailure": null,
"ControlTCPConnectFailure": "",
"MatchWithControlIPAddress": true,
"MatchWithControlIPAddressASN": true,
"ControlTLSHandshakeFailure": "",
"ControlHTTPFailure": "",
"ControlHTTPResponseStatusCode": 200,
"ControlHTTPResponseBodyLength": 36564,
"ControlHTTPResponseHeadersKeys": {
"Cache-Control": true,
"Content-Language": true,
"Content-Type": true,
"Date": true,
"Etag": true,
"Expires": true,
"Last-Modified": true,
"Link": true,
"Server": true,
"Vary": true,
"X-Content-Type-Options": true,
"X-Drupal-Cache": true,
"X-Frame-Options": true,
"X-Generator": true
},
"ControlHTTPResponseTitle": "Nexa Center for Internet \u0026 Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino"
}
}
}
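The fixture above should satisfy the property the classic filter is meant to enforce: every entry in `KnownTCPEndpoints` uses an IP address that also appears in `DNSLookupSuccesses`. A minimal sketch of checking that property on a trimmed copy of the fixture follows. The `orphanEndpoints` helper and the reduced struct types are assumptions for illustration and are not part of the minipipeline package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// observation mirrors only the field this sketch needs; the real
// WebObservation type has many more fields. A JSON null decodes to "".
type observation struct {
	IPAddress string
}

// container mirrors the top-level layout of observations_classic.json.
type container struct {
	DNSLookupSuccesses []observation
	KnownTCPEndpoints  map[string]observation
}

// orphanEndpoints (hypothetical helper) returns the sorted transaction IDs of
// endpoints whose IP address is not among the successful DNS lookups.
func orphanEndpoints(raw []byte) ([]string, error) {
	var c container
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	resolved := map[string]bool{}
	for _, obs := range c.DNSLookupSuccesses {
		if obs.IPAddress != "" {
			resolved[obs.IPAddress] = true
		}
	}
	orphans := []string{}
	for txid, obs := range c.KnownTCPEndpoints {
		if !resolved[obs.IPAddress] {
			orphans = append(orphans, txid)
		}
	}
	sort.Strings(orphans)
	return orphans, nil
}

// trimmed is a cut-down copy of the fixture shown above.
const trimmed = `{
  "DNSLookupFailures": [],
  "DNSLookupSuccesses": [
    {"DNSTransactionID": 1, "DNSEngine": "getaddrinfo", "IPAddress": "130.192.16.171"}
  ],
  "KnownTCPEndpoints": {
    "4": {"DNSEngine": null, "IPAddress": "130.192.16.171", "EndpointAddress": "130.192.16.171:443"}
  }
}`

func main() {
	orphans, err := orphanEndpoints([]byte(trimmed))
	if err != nil {
		panic(err)
	}
	fmt.Println("endpoints without a matching lookup:", len(orphans))
}
```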
13 changes: 11 additions & 2 deletions internal/cmd/qatool/main.go
@@ -79,14 +79,23 @@ func runWebConnectivityLTE(tc *webconnectivityqa.TestCase) {
 // serialize the observations
 mustSerializeMkdirAllAndWriteFile(actualDestdir, "observations.json", observationsContainer)

+// convert to classic observations
+observationsContainerClassic := minipipeline.ClassicFilter(observationsContainer)
+
+// serialize the classic observations
+mustSerializeMkdirAllAndWriteFile(actualDestdir, "observations_classic.json", observationsContainerClassic)
+
 // analyze the observations
 analysis := minipipeline.AnalyzeWebObservations(observationsContainer)

 // serialize the observations analysis
 mustSerializeMkdirAllAndWriteFile(actualDestdir, "analysis.json", analysis)

-// print the analysis to stdout
-fmt.Printf("%s\n", must.MarshalAndIndentJSON(analysis, "", " "))
+// perform the classic analysis
+analysisClassic := minipipeline.AnalyzeWebObservations(observationsContainerClassic)
+
+// serialize the classic analysis results
+mustSerializeMkdirAllAndWriteFile(actualDestdir, "analysis_classic.json", analysisClassic)
 }
 }

26 changes: 17 additions & 9 deletions internal/cmd/qatool/main_test.go
@@ -65,15 +65,23 @@ func TestMainSuccess(t *testing.T) {

 // make sure we attempted to write the desired files
 expect := map[string]bool{
-"xo/dnsBlockingAndroidDNSCacheNoData/measurement.json": true,
-"xo/dnsBlockingAndroidDNSCacheNoData/observations.json": true,
-"xo/dnsBlockingAndroidDNSCacheNoData/analysis.json": true,
-"xo/dnsBlockingBOGON/analysis.json": true,
-"xo/dnsBlockingBOGON/measurement.json": true,
-"xo/dnsBlockingBOGON/observations.json": true,
-"xo/dnsBlockingNXDOMAIN/measurement.json": true,
-"xo/dnsBlockingNXDOMAIN/observations.json": true,
-"xo/dnsBlockingNXDOMAIN/analysis.json": true,
+"xo/dnsBlockingAndroidDNSCacheNoData/measurement.json": true,
+"xo/dnsBlockingAndroidDNSCacheNoData/observations.json": true,
+"xo/dnsBlockingAndroidDNSCacheNoData/observations_classic.json": true,
+"xo/dnsBlockingAndroidDNSCacheNoData/analysis.json": true,
+"xo/dnsBlockingAndroidDNSCacheNoData/analysis_classic.json": true,
+
+"xo/dnsBlockingBOGON/analysis.json": true,
+"xo/dnsBlockingBOGON/analysis_classic.json": true,
+"xo/dnsBlockingBOGON/measurement.json": true,
+"xo/dnsBlockingBOGON/observations.json": true,
+"xo/dnsBlockingBOGON/observations_classic.json": true,
+
+"xo/dnsBlockingNXDOMAIN/measurement.json": true,
+"xo/dnsBlockingNXDOMAIN/observations.json": true,
+"xo/dnsBlockingNXDOMAIN/observations_classic.json": true,
+"xo/dnsBlockingNXDOMAIN/analysis.json": true,
+"xo/dnsBlockingNXDOMAIN/analysis_classic.json": true,
 }
 got := make(map[string]bool)
 for key := range contentmap {
53 changes: 53 additions & 0 deletions internal/minipipeline/classic.go
@@ -0,0 +1,53 @@
package minipipeline

// ClassicFilter takes a [*WebObservationsContainer] as input and returns
// another [*WebObservationsContainer] in which we only keep:
//
// 1. DNS lookups using getaddrinfo;
//
// 2. IP addresses discovered using getaddrinfo;
//
// 3. endpoints using such IP addresses.
//
// We use this filter to produce a backward compatible Web Connectivity analysis
// when the input [*WebObservationsContainer] was built using LTE.
//
// The result should approximate what v0.4 would have measured.
func ClassicFilter(input *WebObservationsContainer) (output *WebObservationsContainer) {
output = &WebObservationsContainer{
DNSLookupFailures: []*WebObservation{},
DNSLookupSuccesses: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{},
knownIPAddresses: map[string]*WebObservation{},
}

// DNSLookupFailures
for _, entry := range input.DNSLookupFailures {
if !utilsEngineIsGetaddrinfo(entry.DNSEngine) {
continue
}
output.DNSLookupFailures = append(output.DNSLookupFailures, entry)
}

// DNSLookupSuccesses & knownIPAddresses
for _, entry := range input.DNSLookupSuccesses {
if !utilsEngineIsGetaddrinfo(entry.DNSEngine) {
continue
}
ipAddr := entry.IPAddress.Unwrap() // it MUST be there
output.DNSLookupSuccesses = append(output.DNSLookupSuccesses, entry)
output.knownIPAddresses[ipAddr] = entry
}

// KnownTCPEndpoints
for _, entry := range input.KnownTCPEndpoints {
ipAddr := entry.IPAddress.Unwrap() // it MUST be there
txid := entry.EndpointTransactionID.Unwrap()
if output.knownIPAddresses[ipAddr] == nil {
continue
}
output.KnownTCPEndpoints[txid] = entry
}

return
}
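To make the filtering rule easier to follow outside the package, here is a self-contained sketch of the same idea on simplified types. The `observation` and `container` types, the `sampleInput` data, and the exact string comparison against `"getaddrinfo"` are assumptions for illustration. The real code uses `WebObservation`, optional fields accessed with `Unwrap`, and `utilsEngineIsGetaddrinfo`, whose definition is not shown in this diff.

```go
package main

import "fmt"

// observation is a simplified, hypothetical stand-in for WebObservation.
type observation struct {
	DNSEngine string // empty for endpoint observations
	IPAddress string
}

// container is a simplified, hypothetical stand-in for WebObservationsContainer.
type container struct {
	DNSLookupSuccesses []*observation
	KnownTCPEndpoints  map[int64]*observation
}

// classicFilter keeps the getaddrinfo lookups and only those endpoints whose
// IP address was discovered by one of the kept lookups.
func classicFilter(input *container) *container {
	output := &container{
		DNSLookupSuccesses: []*observation{},
		KnownTCPEndpoints:  map[int64]*observation{},
	}
	known := map[string]bool{}
	for _, entry := range input.DNSLookupSuccesses {
		// Assumption: an exact match stands in for utilsEngineIsGetaddrinfo.
		if entry.DNSEngine != "getaddrinfo" {
			continue
		}
		output.DNSLookupSuccesses = append(output.DNSLookupSuccesses, entry)
		known[entry.IPAddress] = true
	}
	for txid, entry := range input.KnownTCPEndpoints {
		if !known[entry.IPAddress] {
			continue
		}
		output.KnownTCPEndpoints[txid] = entry
	}
	return output
}

// sampleInput builds made-up data: one address resolved by getaddrinfo and
// one resolved by another engine, each with a TCP endpoint.
func sampleInput() *container {
	return &container{
		DNSLookupSuccesses: []*observation{
			{DNSEngine: "getaddrinfo", IPAddress: "192.0.2.1"},
			{DNSEngine: "doh", IPAddress: "192.0.2.2"},
		},
		KnownTCPEndpoints: map[int64]*observation{
			4: {IPAddress: "192.0.2.1"},
			5: {IPAddress: "192.0.2.2"},
		},
	}
}

func main() {
	out := classicFilter(sampleInput())
	fmt.Println(len(out.DNSLookupSuccesses), len(out.KnownTCPEndpoints))
}
```

In this sketch the endpoint with transaction ID 5 is dropped because its address was only discovered by the non-getaddrinfo lookup, which matches the intent stated in the doc comment of approximating what v0.4 would have measured.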
30 changes: 30 additions & 0 deletions internal/minipipeline/qa_test.go
@@ -42,32 +42,62 @@ func testMustRunAllWebTestCases(t *testing.T, topdir string) {
var expectedContainerData minipipeline.WebObservationsContainer
must.UnmarshalJSON(expectedContainerRaw, &expectedContainerData)

// load the expected classic container from the test case
expectedClassicContainerFile := filepath.Join(fullpath, "observations_classic.json")
expectedClassicContainerRaw := must.ReadFile(expectedClassicContainerFile)
var expectedClassicContainerData minipipeline.WebObservationsContainer
must.UnmarshalJSON(expectedClassicContainerRaw, &expectedClassicContainerData)

// load the expected analysis from the test case
expectedAnalysisFile := filepath.Join(fullpath, "analysis.json")
expectedAnalysisRaw := must.ReadFile(expectedAnalysisFile)
var expectedAnalysisData minipipeline.WebAnalysis
must.UnmarshalJSON(expectedAnalysisRaw, &expectedAnalysisData)

// load the expected classic analysis from the test case
expectedClassicAnalysisFile := filepath.Join(fullpath, "analysis_classic.json")
expectedClassicAnalysisRaw := must.ReadFile(expectedClassicAnalysisFile)
var expectedClassicAnalysisData minipipeline.WebAnalysis
must.UnmarshalJSON(expectedClassicAnalysisRaw, &expectedClassicAnalysisData)

// load the measurement into the pipeline
gotContainerData, err := minipipeline.IngestWebMeasurement(&measurementData)
if err != nil {
t.Fatal(err)
}

// convert the container into a classic container
gotClassicContainerData := minipipeline.ClassicFilter(gotContainerData)

// analyze the measurement
gotAnalysisData := minipipeline.AnalyzeWebObservations(gotContainerData)

// perform the classic web-connectivity-v0.4-like analysis
gotClassicAnalysisData := minipipeline.AnalyzeWebObservations(gotClassicContainerData)

t.Run("observations", func(t *testing.T) {
if diff := testCmpDiffUsingGenericMaps(&expectedContainerData, gotContainerData); diff != "" {
t.Fatal(diff)
}
})

t.Run("observations_classic", func(t *testing.T) {
if diff := testCmpDiffUsingGenericMaps(&expectedClassicContainerData, gotClassicContainerData); diff != "" {
t.Fatal(diff)
}
})

t.Run("analysis", func(t *testing.T) {
if diff := testCmpDiffUsingGenericMaps(&expectedAnalysisData, gotAnalysisData); diff != "" {
t.Fatal(diff)
}
})

t.Run("analysis_classic", func(t *testing.T) {
if diff := testCmpDiffUsingGenericMaps(&expectedClassicAnalysisData, gotClassicAnalysisData); diff != "" {
t.Fatal(diff)
}
})
})
}
})
@@ -0,0 +1,18 @@
{
"DNSExperimentFailure": null,
"DNSTransactionsWithBogons": {},
"DNSTransactionsWithUnexpectedFailures": null,
"DNSPossiblyInvalidAddrs": {},
"DNSPossiblyInvalidAddrsClassic": {},
"DNSPossiblyNonexistingDomains": null,
"HTTPDiffBodyProportionFactor": null,
"HTTPDiffStatusCodeMatch": null,
"HTTPDiffTitleDifferentLongWords": null,
"HTTPDiffUncommonHeadersIntersection": null,
"HTTPFinalResponsesWithControl": null,
"HTTPFinalResponsesWithTLS": null,
"TCPTransactionsWithUnexpectedTCPConnectFailures": {},
"TCPTransactionsWithUnexpectedTLSHandshakeFailures": {},
"TCPTransactionsWithUnexpectedHTTPFailures": null,
"TCPTransactionsWithUnexplainedUnexpectedFailures": null
}