---
title: "Proposal for JSphere: Classification of The Use of JavaScript on The Web"
subtitle: "Project A for CSci 651 by John Heidemann"
author:
- name: Steven Hé (Sīchàng)
- name: "Mentor: Harsha V. Madhyastha"
format:
  html:
    html-math-method: katex
  pdf:
    pdf-engine: latexmk
    papersize: a4
    margin-left: 1in
    margin-right: 1in
    margin-top: 1in
    margin-bottom: 1in
    indent: 2m
bibliography: main.bib
csl: acm-sig-proceedings-long-author-list.csl
---
JavaScript (JS) plays a major role in content delivery on the modern Internet.
For the median webpage, reports show that JS code amounts to
around a quarter of the bytes transferred [@httparchive2024web;
@httparchive2024javascript] and JS execution time accounts for about half of
the compute delay [@goel2024sprinter].
In this sense, JS is central to steering traffic and displaying information in
most end users' Internet browsing experiences.
<!--
1. How does your research "fit in"? (The competition, from NABC.) 1.
why is it interesting research (to the field)? 2.
why is it interesting for you (what will you learn)?3.
what is relationship to other work (briefly, a few sentences or so,
ideally with references to prior work or other projects)
-->
However, why such a large amount of JS code is used remains understudied.
While observations from the industry have argued that most of
the JS is unnecessary (e.g., [@ceglowski2015website]),
research efforts have instead focused on its security vulnerabilities,
privacy invasions, or performance (e.g., [@snyder2016browser;
@iqbal2021fingerprinting; @mardani2021horcrux]).
Despite these efforts, the questions underlying these issues remain unanswered:
Why is so much JS used? How effective is JS in achieving its intended purposes?
<!-- 1. What need will your research address? -->
A better understanding of how JS is used will broadly benefit developers and
users.
Although JS was originally designed for beginner programmers to
script interactive user experience (UX) [@paolini1994netscape],
it has been adopted for a plethora of systems tasks, which
may explain its high resource usage.
For example, as the popularity of Single-Page Applications (SPAs)
[@oh2013automated] rose, many websites that could otherwise be served as
static HTML instead ship entire SPA frameworks, fetch JSON data from
the server, and convert them into DOM elements.
If we can identify common uses of JS, we may guide developers interested in
optimizing their websites, improve the general online browsing experience, and
inform future use of the Internet for serving Web content.
<!-- 1. What research you plan to do (briefly, a paragraph or so).
(What is approach, from NABC.) -->
We propose to classify the functionalities of JS code on top websites on
the public Web. Specifically, we aim to classify each JS file in
the top 1000 websites, without any logins, into one or more of
the following categories, which we call *spheres*:

- **frontend processing**, e.g., user interaction, form validation, and
  other logic;
- **DOM element generation**, including styling application;
- **UX enhancement**, e.g., animations and effects;
- **extensional features**, e.g., authentication and analytics; or
- **silent**: code that does not call any JS APIs, likely never executed.

To infer these spheres, we will analyze JS browser API calls when
interacting with these websites.
We plan to follow prior practice [@snyder2016browser] when interacting with
each website: we will randomly interact with ("monkey test") its root page,
three second-level linked webpages, and nine third-level webpages,
five times each.
Meanwhile, we will record traces of JS API calls with VisibleV8,
a modified Chromium browser [@jueckstock2019visiblev8].
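
As a rough illustration of the page selection described above, the following Python sketch picks one root page, three second-level pages, and nine third-level pages, and schedules five visits to each. It rests on assumptions that the proposal does not state: the nine third-level pages are taken as three per second-level page, links are sampled uniformly at random, and `links_of` is a hypothetical callback that returns the URLs linked from a page. The browsing and tracing themselves are not shown.

```python
import random
from typing import Callable


def pick_fresh(candidates: list[str], seen: set[str],
               rng: random.Random, k: int) -> list[str]:
    """Sample up to k not-yet-selected URLs and mark them as selected."""
    fresh = [url for url in dict.fromkeys(candidates) if url not in seen]
    chosen = rng.sample(fresh, k=min(k, len(fresh)))
    seen.update(chosen)
    return chosen


def plan_pages(root: str, links_of: Callable[[str], list[str]],
               rng: random.Random, fanout: int = 3) -> list[str]:
    """Return the root, up to `fanout` second-level pages, and up to
    `fanout` third-level pages per second-level page (an assumption)."""
    seen = {root}
    second = pick_fresh(links_of(root), seen, rng, fanout)
    third: list[str] = []
    for page in second:
        third.extend(pick_fresh(links_of(page), seen, rng, fanout))
    return [root] + second + third


def plan_visits(pages: list[str], repeats: int = 5) -> list[tuple[str, int]]:
    """Schedule `repeats` monkey-testing runs for every selected page."""
    return [(page, run) for page in pages for run in range(repeats)]


# Toy link graph (hypothetical): every page links to four child pages.
def toy_links(url: str) -> list[str]:
    return [f"{url}/{i}" for i in range(4)]


pages = plan_pages("https://example.test", toy_links, random.Random(0))
visits = plan_visits(pages)
```

With the toy link graph, the plan contains 13 pages and 65 scheduled visits; on real websites, a page with fewer than three unseen links would simply yield fewer pages.
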
Using the API names and their call stacks in the recorded traces,
we will apply heuristics to
classify each corresponding JS file into the above spheres.
For example, the `addEventListener` API call indicates frontend processing,
`createElement` indicates DOM element generation, `requestAnimationFrame`
indicates animation (UX enhancement), and `sendBeacon` indicates tracking
(extensional features).
Although there are thousands of JS APIs,
only a fraction is used widely [@snyder2016browser]; therefore, we expect to
classify most JS code through a few popular APIs.
We may select several key APIs, and
design the classification heuristics based on their call counts and
relative frequencies.
We will conduct random sampling to
manually validate the classification accuracy of our heuristics.
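
To make the idea of API-based heuristics concrete, here is a minimal Python sketch, not the final classifier. It assumes the traces have already been reduced to a list of API names per JS file (parsing VisibleV8 logs is out of scope here), and it uses only the four example APIs named above. The `min_share` threshold of 0.1 and the function name `classify_script` are hypothetical placeholders; the actual key APIs and thresholds remain to be designed.

```python
from collections import Counter

# Example API-to-sphere mapping taken from the four APIs named in the text.
API_SPHERES = {
    "addEventListener": "frontend processing",
    "createElement": "DOM element generation",
    "requestAnimationFrame": "UX enhancement",
    "sendBeacon": "extensional features",
}


def classify_script(api_calls: list[str], min_share: float = 0.1) -> set[str]:
    """Assign a JS file to every sphere whose key APIs make up at least
    `min_share` of the file's key-API calls (hypothetical threshold)."""
    if not api_calls:
        # No API call recorded at all: the file falls in the silent sphere.
        return {"silent"}
    counts = Counter(
        API_SPHERES[api] for api in api_calls if api in API_SPHERES
    )
    total = sum(counts.values())
    if total == 0:
        # APIs were called, but none is a key API: leave it unclassified.
        return set()
    return {sphere for sphere, n in counts.items() if n / total >= min_share}


example = classify_script(["createElement"] * 8 + ["addEventListener"] * 2)
```

Here `example` contains both *DOM element generation* and *frontend processing*, since a file may belong to more than one sphere. A file whose calls match none of the key APIs is returned as an empty set so that it can be sent to manual inspection rather than mislabeled as silent.
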
<!-- 1. What do you expect the results of your research to be?
(What are the benefits, from NABC.) -->
We expect to produce a summary of inferred sphere distributions of
the websites we analyze, and explore the relationships between JS spheres and
website performance metrics.
First, we will report the percentage of JS code in each sphere,
the sphere distributions across websites, and
the overall JS-sphere relationships as a Venn diagram.
Then, we plan to report the distributions of the number of bytes of
JS files (bytes of JS) across the spheres.
Additionally, we may associate the bytes of JS per sphere with
each webpage's page load time, including time to first byte and time to
interactive.
Finally, we will show the most popular JS APIs and conduct case studies on
them to see what specific functionalities they are used for.
Our results will provide a first look at the common uses of JS on the Web,
which may uncover patterns and trends in JS usage and guide
future investigation of JS usage optimality and impact.
For example, if *extensional features* occupies a significant area of
the Venn diagram,
we may infer that websites frequently engage in activities other than
content delivery; if *frontend processing* is common among JS files,
it may suggest that websites rely too much on JS for displaying content and
should instead consider server-side rendering; if
*UX enhancement* is associated with longer page load time, it may reveal that
the use of JS often works against the developers' intention in these cases.
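
The per-sphere share of JS bytes mentioned above could be computed along the following lines. This is a sketch under assumptions: each JS file is represented as a hypothetical `(size_in_bytes, spheres)` pair, and a file that belongs to several spheres contributes its full size to each of them, so the shares may sum to more than 1.

```python
def sphere_byte_shares(
    scripts: list[tuple[int, set[str]]],
) -> dict[str, float]:
    """Return, per sphere, the fraction of all JS bytes in files that
    belong to that sphere. Multi-sphere files count toward each sphere."""
    total = sum(size for size, _ in scripts)
    if total == 0:
        return {}
    bytes_per_sphere: dict[str, int] = {}
    for size, spheres in scripts:
        for sphere in spheres:
            bytes_per_sphere[sphere] = bytes_per_sphere.get(sphere, 0) + size
    return {sphere: b / total for sphere, b in bytes_per_sphere.items()}


# Made-up sizes for illustration only; these are not measurement results.
shares = sphere_byte_shares([
    (600, {"DOM element generation"}),
    (300, {"DOM element generation", "frontend processing"}),
    (100, {"silent"}),
])
```

Counting multi-sphere files in every sphere they touch matches the overlapping regions of a Venn diagram; an alternative that splits a file's bytes evenly across its spheres would make the shares sum to 1 but hide the overlap.
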
<!-- 1. what specific *deliverables* you will have for
Project B.-->
We will include the following deliverables for Project B, as required:

- Weekly meetings between Steven and Harsha.
- Weekly progress notes in a Google Docs document. These notes will include:
  - What Steven accomplished last week.
  - Questions and problems Steven needs help with.
  - What Steven plans to do next week.
  - Comments from Harsha.

  Each week, Steven will prepare these notes and send them to
  Harsha the night before their meeting.
  Steven will update the notes after the meeting with answers to
  their questions and any revisions to their plan.
- The resulting code, intermediate measurement data, and
analysis documentation.
Specifically, we will include the code used to browse and interact with
the websites, collect the API traces, and classify the JS files.
As an intermediate stage, we will design heuristics only for
classifying the *DOM element generation* sphere.
We will include the process and results of the manual validation on
this single sphere. The measurement data will include the website domains,
<!-- page load time and bytes, -->
the API traces, the JS files, and the relations among them.
- A 2-6 page Project B report summarizing the progress of the project as of
the Project B deadline and the plan for Research Project C.
- If the Project C proposal is approved, a 3-8 page Project C report and
an end-of-semester poster summarizing the progress of
  the project at the end of the semester.

# References