---
title: "JSphere: Classification of The Use of JavaScript on The Web"
author:
- name: Sichang "Steven" He
- name: "Mentor: Harsha V. Madhyastha"
format: acm-pdf
keep-tex: true
fig-pos: 'tb'
filters:
- table_workaround_filter.lua
acm-metadata:
final: true
acmart-options: sigconf,screen,nonacm
copyright: none
abstract: |
JavaScript (JS) scripts are widely used on websites and
have a high impact on the Web browsing experience of millions of users.
Why is JS used so extensively? What purposes does it serve?
We begin the first known study to
classify JS scripts into common functionalities, which we call *spheres*.
Through web crawling and analyzing browser API calls,
we look into the usage pattern of JS among the top 1000 subdomains on
the public Web. We find that
JS script sizes follow a tail-heavy distribution and
many large scripts fall into multiple spheres, based on which
we propose a novel `eval`-based approach to
break down large JS scripts into smaller execution contexts.
Our results reveal that frontend processing remains the core of JS usage,
but half of the functional code is related to Document Object Model (DOM)
element generation.
Additionally, we observe that half of the code does not invoke any browser APIs
or mutate global states, which may suggest much JS code is unnecessary.
Our findings may provide insights into JS usage and
subsequently inform website optimization and future Web development.
bibliography: main.bib
---
# Weekly Meeting Notes {.unnumbered}
<!-- TODO: add this in private repo. -->
# Introduction {#sec-intro}
<!-- giving an overview of what need you're trying to address,
what you are doing and plan to do, what the novelty and
new results will be (i.e., why will your work be interesting to others in
the field), and why is this work interesting to you, personally.
Except for the last part (about you), that
structure is roughly like a typical paper. -->
The amount of JavaScript (JS) websites include has been growing over the years,
with visible impacts for users browsing the Web. On today's median webpage,
JS constitutes around 25% of the bytes transferred [@httparchive2024web;
@httparchive2024javascript], and has been reported to account for about 50% of
the compute delay [@goel2024sprinter], so JS is responsible for
much of the waiting time that millions of website visitors experience every day.
Additionally, JS provides client-side interactivity and
sometimes renders content.
Therefore, JS plays a central role in shaping the Web browsing experience in
terms of both resource usage and functionalities.
However, despite the extensive use of JS, the exact underlying reasons for
its popularity remain understudied.
While some believe increased JS usage provides end users with
more interactive browsing experiences, others suggest much of
this JS is unnecessary and
instead inflates webpage sizes [@ceglowski2015website].
Unfortunately, user-facing JS research efforts have primarily focused on
security vulnerabilities, privacy invasions, and performance issues.
The purposes behind the extensive use of JS, and its effectiveness in
achieving them, remain unanswered.
As a step towards understanding the common purpose and necessity of JS use,
JSphere aims to classify the functionalities of JS scripts on
the public Web. Although JS was originally designed for beginner programmers to
script interactive user experience (UX) [@paolini1994netscape],
it has been adopted for a plethora of systems tasks, which
may explain its high resource usage.
For example, as the popularity of Single-Page Applications (SPAs)
[@oh2013automated] rose, many websites that could otherwise be served as
static HTML instead ship entire SPA frameworks, fetch JSON data from
the server, and convert them into DOM elements.
By understanding the common use of JS, we can potentially guide developers in
optimizing their websites, improve general online browsing experiences, and
inform future utilization of the Internet for serving Web content.
As we alluded to above, some functionalities are more suitable to
be implemented on the client, while others are less essential or
can be offloaded to the server.
Based on this observation, we propose to classify each JS script into one or
multiple categories, which we call *spheres*.
We describe the spheres in @sec-spheres, and
discuss how we classify JS into these spheres.
JSphere analyzes browser API calls to
approximately classify JS scripts into the above spheres.
While perfect classification is generally impossible because
JS is Turing-complete, browser API calls reveal the effects each script has on
the webpage.
Additionally, browser APIs usually have distinct and specific purposes.
By observing which browser APIs a script calls when the webpage loads or
is interacted with, we can approximately infer its purpose.
Thus, we visit and interact with webpages, then
analyze their browser API calls to classify their JS into spheres, as
we detail in @sec-methods.
Sphere classification is challenging not only because we need to
infer script intents, but also because, as we discovered,
a single JS script may fall into multiple or even all spheres.
Overlapping spheres hinder our ability to
meaningfully identify what portions of JS scripts on
the Web fall into each sphere.
To address this issue, in @sec-methods2 we propose a novel approach to
break down large JS scripts into small blocks of `eval` calls, which
allows us to classify JS code on a finer granularity.
Our results in @sec-class-results and @sec-results2 reveal that
essential frontend processing remains the core of JS usage on
the top 1000 subdomains, whereas a sizable portion of JS code is related to
rendering content or to extensional features like tracking.
JSphere also identifies a large portion of JS code that
does not invoke any browser APIs or mutate global states, suggesting that
much JS code may be unnecessary.
We hope our findings can provide insights into JS usage and
inform future Web development.
This work is personally interesting to me because I have strong opinions on
the trend of increasing JS usage on the Web. Although popular opinion on
social media advocates for more JS usage, claiming it provides better user and
developer experiences, my personal experience suggests otherwise.
As a user browsing the Web, I frequently experienced the frustration of
slow webpage loading and interaction even when
very little content was being loaded.
When I checked the network tab in my browser, I always saw many requests and
much JS transferred.
As a web developer, I also found SPA frameworks unnecessary:
they caused constant struggles to keep frontend and
backend implementations and state in sync.
I hope JSphere can bridge some gaps between the popular beliefs and
the reality of JS usage.
# Related Work
<!-- summarizes prior work related to yours and describes how your work compares.
When you compare to prior work, summarize the prior work and then
identify how it is alike or different from your work.
(Usually, related work is similar to your work in some ways and different in
others.) You need to include complete citations to the prior work
(those citations should point to a bibliography at the end of your report). -->
<!-- TODO: These are copied over. -->
There have been many user-facing studies on JS.
Researchers have studied how to detect privacy-invading or
malicious intention in JS, such as tracking when
user data flow towards the server [@li2018jsgraph],
identifying browser APIs associated with advertisements and
tracking [@snyder2016browser],
identifying fingerprinting behaviors [@iqbal2021fingerprinting], and,
most recently, detecting techniques for malicious code to
evade observations [@pantelaios2024fv8].
Besides privacy and security, researchers have also invented techniques to
improve the performance bottlenecks brought by JS, such as
speeding up webpage load time by executing JS in
parallel [@mardani2021horcrux], or reducing JS execution during web crawling by
imitating their effects on previously-visited webpages [@goel2024sprinter].
These studies have provided insights into how to solve some of the issues of
JS and improve user experiences.
There is, however, no research focusing on identifying the purposes of
general JS, to the best of our knowledge.
JSphere aims to bridge this gap in understanding the use of JS in
general.
Instead of working around the JS issues, we aim to understand the root cause;
instead of focusing on script intentions centering around privacy and security,
we focus on the ordinary use of JS on webpages.
By looking into the common use of ordinary JS, we hope to
provide insights into avoiding JS issues in the first place and
improving web browsing experiences.
# Goals {#sec-goal}
<!-- listing what your goals were from Research Projects A and B with
a statement about which goals you completed, and for those you didn't complete,
why you didn't complete them.
Possible Reasons may vary from "ran out of time" or "unable to get access to
data" to "decided with mentor that result Y was more interesting than
result X"; you may have other reasons as well.
You should also list any goals you added after you submitted Project B, and
annotate these as "additional".
(This section is the only one that would not appear in a real paper.) -->
Our main goal is to classify each JS script of
the top 1000 popular websites into one or more spheres defined in @sec-spheres.
We would like to visualize the overall sphere distribution and
sphere distributions across websites, and gain insights into the common use of
JS on the Web.
Besides classifying scripts into spheres, we also aim to correlate them with
website performance.
We would like to associate the JS sizes per sphere with
each webpage's load time, and
visualize the relationship between sphere distributions and
performance metrics.
We completed the classification of JS scripts into spheres across websites, and
additionally improved the granularity of classification to sub-script level.
We did not complete the correlation with website performance: our
improved-granularity approach introduces overhead when crawling webpages,
so we would need to rerun the experiment to collect performance metrics, but
we prioritized the classification granularity improvements and ran out of
time.
# JSphere {#sec-spheres}
<!-- Section 4 A discussion of your work.
You will likely want to use multiple sections or subsections here, and
you should identify your goal, methodology/approach, data collection, and
results. -->
Based on our observation in @sec-intro that
some functionalities suit JS better than others, we aim to
classify each JS script of the top 1000 popular websites into one or
more of the following spheres:
- **Frontend processing.** Client-side functionalities such as
user interaction, form validation, and
other domain-specific logic are essential JS purposes.
- **DOM element generation.** Instead of serving static HTML, many websites
fetch JSON data from the server and convert them into DOM elements, and
even apply styling after page load, shifting rendering workloads from
the server to the client.
- **UX enhancement.** Animations, effects, and
font loading improve the look and feel of the webpage, but
may be replaceable by modern HTML and CSS features.
- **Extensional features.** Authentication, tracking, and
hardware access are inessential to the core functionalities of webpages,
but often provide important extra features comparable to those of
native applications.
- **Silent.** Some scripts do not call any browser APIs or mutate global states,
making them essentially ineffective; these scripts are possibly never executed.
We automate browsers to crawl and collect browser JS API call logs from
the top 1000 subdomains in the Tranco sites ranking downloaded on September 25,
2024 [@pochat2018tranco].
In @sec-methods, we first experiment at a smaller scale, the top 100 subdomains,
to test the feasibility of our approach, which
we call the vanilla setup.
To meaningfully classify scripts into the spheres we defined above,
we improve our methods by modifying the JS scripts before executing them, and
expand our experiments to the top 1000 subdomains in @sec-methods2.
## Vanilla Experiment Setup {#sec-methods}
We start by crawling the top 100 subdomains, then record and analyze logs of
API calls made by JS scripts on these webpages.
This setup is "vanilla" in that the browser executes the original JS scripts on
the webpages, which differs from our experiment in @sec-methods2.
<!-- TODO: These are copied over. -->
### Web Crawling
We visit and interact with webpages in a fashion used in [@snyder2016browser].
For each subdomain, we visit the root webpage, 3 second-level linked webpages,
and 9 third-level webpages, repeating this process five times.
After each webpage's `load` event fires, we conduct "chaos testing",
randomly interacting with the webpage's elements for 30 seconds in total.
During interactions, we block all page navigations by
fulfilling such requests with an HTML webpage containing a single JS block to
go back in history, and record this navigation attempt as a second- or
third-level linked webpage.
With the browser's back-forward cache enabled,
we find this method reliably resumes the interaction, with
infrequent glitches that we detect and solve by visiting the webpage again.
We use Playwright[^playwright] to automate this browsing process and
inject Gremlins.js[^gremlins] for the chaos testing.
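
Below is a minimal Playwright sketch of this navigation blocking and Gremlins.js injection. The URLs, the flag-based bookkeeping, and the horde configuration are illustrative placeholders rather than our exact implementation.

```js
import { chromium } from "playwright";

// Minimal sketch (not our exact implementation): fulfill navigation attempts
// with a page that goes back in history, record the attempted URL, and
// unleash Gremlins.js for chaos testing.
const backPage =
  `<!DOCTYPE html><html><body><script>history.back();</script></body></html>`;
const browser = await chromium.launch();
const page = await browser.newPage();
const linkedWebpages = []; // Attempted navigations, recorded as linked webpages.
let blockNavigations = false;

await page.route("**/*", async (route) => {
  const request = route.request();
  if (
    blockNavigations &&
    request.isNavigationRequest() &&
    request.frame() === page.mainFrame()
  ) {
    linkedWebpages.push(request.url());
    await route.fulfill({ contentType: "text/html", body: backPage });
  } else {
    await route.continue();
  }
});

await page.goto("https://example.com/");
blockNavigations = true; // Only block navigations after the initial load.
// Inject Gremlins.js (URL illustrative) and interact randomly for a while.
await page.addScriptTag({ url: "https://unpkg.com/gremlins.js" });
await page.evaluate(() => window.gremlins.createHorde().unleash());
```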
While loading and interacting with webpages,
VisibleV8 records the API calls they make at run time and provides logs for
our analysis.
VisibleV8 is a modified Chromium browser [@jueckstock2019visiblev8]; for
each Chromium thread, it writes a log file that
records each script execution context and every browser API call made in that
context.
We open a new browser page for every webpage visit to ensure that,
in each log file, all API calls recorded before our injected interaction script
are made during page loading.
We also collect the HTTP Archive (HAR) files and a list of reachable or
visited webpages for potential further analysis.
The HAR files contain detailed information about the network requests made and
the timing, which we can use for website performance comparisons.
The list of reachable or
visited webpages helps identify the links we clicked and webpages we visited.
### Log Processing
We parse the VisibleV8 logs to aggregate the API calls of
each script execution context.
An execution context is either a standalone JS file being executed,
a script tag in an HTML file, or a script created using `eval()`.
For each context, we group API calls made based on their API types, `this`
values, and attributes accessed (*e.g.,* a Get API on `Window`'s `scrollY`
attribute), and count the presence of each API call group.
Additionally, we filter out internal, user-defined, and
injected calls using log information and API name filtering, so we can focus on
browser API usage. Our custom Rust library powers this processing.
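
As a simplified illustration of this grouping (the actual processing is implemented in our custom Rust library over parsed VisibleV8 logs; the record shape below is hypothetical), each call is keyed by its API type, `this` receiver, and accessed attribute:

```js
// Simplified sketch of the API call grouping; the real implementation is our
// Rust library, and the record fields below are hypothetical.
function groupApiCalls(calls) {
  const groups = new Map();
  for (const call of calls) {
    // E.g., "get:Window.scrollY" for a Get API on `Window`'s `scrollY`.
    const key = `${call.apiType}:${call.thisType}.${call.attribute}`;
    groups.set(key, (groups.get(key) ?? 0) + 1);
  }
  return groups;
}

const groups = groupApiCalls([
  { apiType: "get", thisType: "Window", attribute: "scrollY" },
  { apiType: "call", thisType: "Node", attribute: "appendChild" },
  { apiType: "get", thisType: "Window", attribute: "scrollY" },
]);
// Map(2) { "get:Window.scrollY" => 2, "call:Node.appendChild" => 1 }
```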
We divide API calls into two portions: calls made during page load and
calls made after interaction begins.
This division helps us filter out API calls not made for interaction handling.
We make this distinction based on when the browser enters a context of
our injected interaction script in the log file.
In addition to the API call logs, we also collect the script sizes in bytes from
their source code provided in the logs.
The script sizes may be a more indicative metric than script counts because
they better reflect the transferred bytes and the execution time of
the scripts, which directly affect user experiences.
### Heuristics for Classification {#sec-heuristics}
We count specific "anchor" APIs that strongly indicate specific spheres, and
develop heuristics based on them to determine if
a script context falls under a sphere.
Although there are tens of thousands of browser APIs,
prior studies observed a 20/80 rule of browser API usage, where
only a small fraction is popularly used [@snyder2016browser].
Therefore, we expect to classify most scripts through a few popular APIs.
By manually inspecting the 678 most popular APIs that cover 90% of
all API calls we observed in the vanilla experiment,
we identified anchor APIs for each sphere and
handcrafted the following heuristics.
**Frontend processing.** Scripts that listen to webpage events via
`addEventListener`, read `Event` attributes, or read form elements
(`HTMLInputElement` or `HTMLTextAreaElement`) handle interaction.
Additionally, interactions with client-side-only information,
including reading the webpage `Location`, writing to
URL search parameters (`URLSearchParams`), or accessing element text
(`textContent`) or bounding rectangles (`DOMRect`), are done by
frontend processing scripts.
**DOM element generation.** Scripts that add elements to the DOM or
apply styling before interaction begins are classified as
DOM element generation.
Such APIs include the `createElement`, `createElementNS`, `createTextNode`,
`appendChild`, `insertBefore`, and `CSSStyleDeclaration.setProperty`
functions, as well as modifications to `CSSStyleDeclaration` or `style` values.
**UX enhancement.** If a script removes or
hides elements using `removeAttribute` or `removeChild`,
animates elements using `requestAnimationFrame` or `cancelAnimationFrame`,
queries the media type with `matchMedia` or `MediaQueryList.matches`, or
sets font faces with `FontFaceSet.load`, we consider it UX enhancement.
**Extensional features.** Scripts that extract webpage performance data by
interacting with the `Performance`, `PerformanceTiming`, or
`PerformanceResourceTiming` classes, or send tracking data with
`Navigator.sendBeacon`, are classified as extensional features.
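
The sketch below condenses these heuristics into a single predicate over a context's aggregated API call groups. The helper methods (`called`, `calledBeforeInteraction`, `accessed`) and the abridged anchor API lists are illustrative, not our full implementation.

```js
// Abridged sketch of the sphere heuristics; `called`, `calledBeforeInteraction`,
// and `accessed` are hypothetical helpers over a context's aggregated API call
// groups, and each anchor API list is truncated.
function classifySpheres(ctx) {
  const spheres = new Set();
  if (
    ctx.called("EventTarget.addEventListener") ||
    ctx.accessed("Event") ||
    ctx.accessed("HTMLInputElement") ||
    ctx.accessed("Location")
  ) {
    spheres.add("Frontend Processing");
  }
  if (
    ctx.calledBeforeInteraction("Document.createElement") ||
    ctx.calledBeforeInteraction("Node.appendChild") ||
    ctx.calledBeforeInteraction("CSSStyleDeclaration.setProperty")
  ) {
    spheres.add("DOM Element Generation");
  }
  if (
    ctx.called("Element.removeAttribute") ||
    ctx.called("Window.requestAnimationFrame") ||
    ctx.called("Window.matchMedia")
  ) {
    spheres.add("UX Enhancement");
  }
  if (ctx.accessed("Performance") || ctx.called("Navigator.sendBeacon")) {
    spheres.add("Extensional Features");
  }
  // In this sketch, a context with no API call groups is silent
  // (the real heuristic also requires no global state mutations).
  if (spheres.size === 0 && ctx.apiGroupCount === 0) {
    spheres.add("Silent");
  }
  return spheres;
}
```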
There are no clear ways to validate our heuristics due to the lack of
ground truth on developer intentions.
One proposed approach is to have a human or language model inspect the code to
determine which sphere it belongs to, but JS scripts on
the Web are usually minified and unreadable.
Thus, during testing, we found manual classification to be difficult, laborious,
and unlikely to be more accurate than our heuristics, which directly rely on
the semantics of browser APIs.
## Vanilla Experiment Results
For the results from our vanilla setup, we are mostly concerned with
two aspects: validating our assumptions about a small set of
browser APIs dominating all API usage (the 20/80 rule), and testing whether
we can classify scripts into spheres based on these APIs.
### The 20/80 Rule of Browser APIs
**Dominating APIs.** We find that 1.75% (318) of all observed APIs account for
80% of all API calls and 3.74% (678) of APIs cover 90% of all API calls, as
demonstrated in @fig-api-calls-cdf.
This tail-heavy distribution suggests that, at run time,
a relatively small set of APIs dominates JS usage on the web.
![Cumulative distribution function of observed API calls by
increasing fraction of
APIs.](data/api_calls_cdf.png){width=45% #fig-api-calls-cdf}
**"Anchor" APIs.** We manually inspected the top 678 APIs that make up 90% of
all API calls and identified "anchor" APIs that
strongly indicate specific spheres.
For example, `addEventListener` for Frontend Processing and `createElement` for
DOM Element Generation.
Additionally, we manually inspected five scripts from YouTube to
gain intuition on the purpose of several anchor APIs.
Using these anchor APIs, we handcrafted the heuristics in @sec-heuristics to
classify scripts into spheres.
### Vanilla Classification Results {#sec-class-results}
We apply our heuristics to JS scripts from the top 100 subdomains,
aggregate classification results, and correlate them with script sizes.
In addition to the spheres,
we also classify scripts into auxiliary categories (Has Request,
Queries Element, Uses Storage) in the hope of discovering additional cues for
classification.
We found no significant correlation between script size and classification,
among the spheres, or between any other pair of metrics.
**Single-Category Results.** We apply our heuristics to 40,116 scripts,
totaling 3,192.7 MB, and report both how many scripts fall into each category
(count) and how many megabytes they total (size).
@tbl-script-classification shows our heuristics successfully categorized 93.0%
of script bytes into at least one sphere (the inverse of "No Sure Sphere"),
though only 67.2% of scripts by count.
This size coverage cannot be improved beyond 94.4% even if
we use the auxiliary categories (the inverse of "No Category"), so
we do not see any potential to significantly improve coverage.
Additionally, a quarter of scripts executed do not call any browser APIs or
mutate global states, but they are only a tiny fraction of
the total script size, meaning there are many small scripts that
seemingly have no effects.
The popularities of spheres are mildly surprising.
Frontend processing is the most popular sphere, encompassing almost 90% of
scripts, which may indicate that
most scripts are doing essential client-side work.
However,
our heuristics are very lenient toward classifying scripts into frontend
processing, so some of these classifications may be false positives.
For example, a script primarily responsible for
DOM element generation may listen for the DOM load event and then
generate elements, and is then unfortunately also classified as
frontend processing.
Other spheres also have surprisingly high popularities by size, which
alludes to the peculiar script size distribution we discuss below.
| Feature | Count | Count (%) | Size (MB) | Size (%) |
| :---------------- | -----: | --------: | --------: | -------: |
| Frontend Proc. | 14,129 | 35.22 | 2864.3 | 89.72 |
| DOM Elem. Gen. | 8,196 | 20.43 | 2248.9 | 70.44 |
| UX Enhancement | 4,496 | 11.21 | 1840.7 | 57.65 |
| Extensional Feat. | 4,915 | 12.25 | 1888.1 | 59.14 |
| Silent Scripts | 10,260 | 25.58 | 28.1 | 0.88 |
| Has Request | 4,205 | 10.48 | 1432.1 | 44.86 |
| Queries Element | 13,640 | 34.00 | 2731.6 | 85.56 |
| Uses Storage | 4,571 | 11.39 | 1641.0 | 51.40 |
| No Sure Sphere | 13,148 | 32.77 | 221.1 | 6.93 |
| No Category | 10,178 | 25.37 | 179.7 | 5.63 |
: Classification results for the top 100 subdomains: the script counts,
sizes and their corresponding percentages out of the respective total.
{#tbl-script-classification}
**Script Size Distribution.** @fig-script-size-cdf shows the distribution of
script sizes in all scripts and in scripts belonging to each sphere.
Script sizes range from 9 bytes to 8.7 MB.
The average script size is 80 kB, with a median of 2.2 kB;
therefore, there are many small scripts and a few large scripts, with
the large scripts contributing most of the total script size.
Silent scripts are much smaller than others, but scripts in
each sphere have similar size distributions.
![Cumulative distribution function of script sizes among all scripts and
scripts belonging to
each sphere.](data/script_size_cdf.png){width=45% #fig-script-size-cdf}
**Multi-Sphere Scripts.** Many large scripts belong to multiple spheres.
@tbl-2-sphere-scripts shows high overlap among spheres: if
we pick any two spheres, around half of all scripts by size belong to
both.
In fact, 1296.1 MB of scripts (40.60% by size) fall into all sure spheres,
although the count is only 1473 (3.67%).
The all-sure-sphere scripts have sizes ranging from 7.1 kB to 8.7 MB,
suggesting a small fraction of large scripts tightly couples multiple spheres.
Therefore, classifying by each script file is not fine-grained enough to
meaningfully separate spheres.
<!-- NOTE: Generated. Do not edit directly.
Instead, comment out Markdown table below, edit it, regenerate the TeX file,
copy it over, and change `table` to `table*`. -->
\begin{table*}
\caption{\label{tbl-2-sphere-scripts}Counts (and percentages) and sizes
(and percentages) of scripts that belong to two spheres.}
\centering{
\begin{tabular}{lccc}
\toprule
Feature Combination & Frontend Processing & DOM Element Generation & UX
Enhancement \\
\midrule
DOM Element Generation & 6602 (16.46\%), 2197.1MB (68.82\%)
& - & - \\ UX Enhancement & 3840 (9.57\%), 1821.5MB (57.05\%) & 3125 (7.79\%),
1613.2MB (50.53\%) & - \\ Extensional Features & 4229 (10.54\%),
1841.1MB (57.67\%) & 2703 (6.74\%), 1562.9MB (48.95\%) & 1844 (4.60\%),
1440.5MB (45.12\%) \\
\bottomrule
\end{tabular}
}
\end{table*}
<!--
| Feature Combination | Frontend Processing | DOM Element Generation | UX Enhancement |
| :------------------------- | -------------------------------- | ------------------------------- | ------------------------------- |
| **DOM Element Generation** | 6602 (16.46%), 2197.1MB (68.82%) | - | - |
| **UX Enhancement** | 3840 (9.57%), 1821.5MB (57.05%) | 3125 (7.79%), 1613.2MB (50.53%) | - |
| **Extensional Features** | 4229 (10.54%), 1841.1MB (57.67%) | 2703 (6.74%), 1562.9MB (48.95%) | 1844 (4.60%), 1440.5MB (45.12%) |
: Counts (and percentages) and sizes (and percentages) of scripts that
belong to two spheres. {#tbl-2-sphere-scripts}
-->
## Classification Granularity Improvement {#sec-methods2}
@sec-class-results shows that
our vanilla setup classifies many large scripts into multiple overlapping
spheres, so classification at a granularity finer than
the script file is crucial to understanding the popularity of each sphere.
Therefore, we modify our experiment to classify JS code at a sub-script level.
Unfortunately, the VisibleV8 browser inherently limits us to the granularity of
execution contexts, which generally correspond to individual script files, so
we cannot classify individual functions or statements with our current setup.
To work around this limitation, we notice that
execution contexts may also correspond to scripts created by calling the `eval`
function with the source code as a string.
Building on this observation, our approach is to
break down large scripts into smaller blocks of `eval`
calls before they arrive at the browser.
Specifically, we intercept all requests to JS scripts via Playwright,
make the requests ourselves, rewrite the responses into JS code with
small `eval` blocks, and then fulfill the browser's requests with
the rewritten responses.
This method allows us to maintain the rest of our setup, while
reducing the size of the classification unit from script files to `eval`
blocks.
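
A minimal sketch of this interception with Playwright follows; `page` is a Playwright page as in the earlier sketch, `rewriteToEvalBlocks` stands in for our rewriter, and error handling and content-type checks are omitted.

```js
// Sketch of the interception: fetch each JS script ourselves, rewrite it into
// small `eval` blocks, and fulfill the browser's request with the result.
// `rewriteToEvalBlocks` stands in for our actual rewriter.
await page.route("**/*", async (route) => {
  if (route.request().resourceType() !== "script") {
    return route.continue();
  }
  const response = await route.fetch(); // Make the request ourselves.
  const original = await response.text();
  let body;
  try {
    body = rewriteToEvalBlocks(original); // Break into ~1 kB `eval` blocks.
  } catch {
    body = original; // Skip scripts we cannot rewrite correctly.
  }
  await route.fulfill({ response, body });
});
```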
However, the mechanisms of JS and `eval` introduce many challenges to
this approach, for which we introduce workarounds to
achieve high correctness and efficacy.
### JS Rewriting with `eval`
Our two main goals of JS rewriting are correctness and efficacy.
Correctness is to preserve the scripts' functionalities; efficacy means to
break down the scripts into reasonably small blocks.
We judge correctness by whether
we can browse the webpages normally after rewriting their JS, and we aim to
create `eval` blocks of around 1 kB in size for efficacy.
**Hoisting and scoping impact correctness.** Naively,
one may group all the top-level statements in
a JS script into multiple small blocks, then include each block in an `eval`
call.
However, variables declared with `var` and function declarations are hoisted,
treated as if they were declared at the top of the script.
Thus, in JS code, they may be accessed before they are declared.
In contrast, `eval` calls are executed sequentially, causing some variables and
functions to be undefined when accessed before they are declared in
our naive approach.
Additionally, variables declared with `let` and `const` do not leak out of
`eval` calls due to scoping, so they are not accessible in adjacent `eval`
calls.
These issues break most of the webpages we tested due to
accessing undefined variables or functions.
To mitigate issues brought by hoisting and scoping,
we manually hoist declarations.
We extract all the function and variable declarations, and declare them as
variables at the top of the script.
Then, we evaluate all function declarations first.
This workaround ensures that variables and
functions are declared before they are accessed, and that they leak out of
`eval` calls as needed.
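
As a simplified before-and-after illustration (not our rewriter's exact output), consider a script that relies on hoisting:

```js
// Original script: the call works because `greet` is hoisted.
greet("JSphere");
function greet(name) {
  console.log("Hello, " + name);
}
var counter = 0;
```

Naively wrapping each top-level statement in its own `eval` call breaks this script, because the first `eval` runs before the one defining `greet`. With manual hoisting, the declarations come first and function declarations become expressions:

```js
// Rewritten (simplified): hoist all names, evaluate function declarations
// (as expressions) first, then the remaining statements in order.
var greet, counter;
eval(`greet = function (name) { console.log("Hello, " + name); };`);
eval(`greet("JSphere");`);
eval(`counter = 0;`);
```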
**Efficacy requires recursive rewriting.** Splitting scripts only on
the top level results in large `eval` blocks.
This pitfall is due to the extensive use of closure functions in JS, where
functions are defined within other functions, resulting in
large top-level functions.
Additionally, many scripts avoid polluting the global scope with
immediately invoked function expressions (IIFE), where
a function expression is defined and immediately invoked.
To split scripts into blocks of around 1 kB,
JSphere recursively breaks down code blocks, such as function bodies and
the first block with an if statement, and then introduces more `eval` blocks for
them.
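
For instance, the sketch below illustrates the idea of breaking a large function body into nested `eval` blocks; it is a simplified illustration rather than our rewriter's exact output.

```js
// Simplified illustration of recursive breakdown: the function body itself
// contains `eval` blocks, so each block becomes its own execution context.
var render;
eval(`render = function (items) {
  var list;
  eval('list = document.createElement("ul");');
  eval('for (const item of items) {' +
    ' const li = document.createElement("li");' +
    ' li.textContent = item;' +
    ' list.appendChild(li); }');
  return list;
};`);
```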
We apply many more workarounds until we can successfully browse most webpages
we tested after rewriting their JS.
We wrap some code blocks in IIFE or async IIFE to enable `return` and `await`
syntax.
We convert function declarations to function expressions to leak them out of
IIFE.
We preserve blocks with `break`, `continue` and `yield` to avoid syntax errors.
To enable efficient `eval` nesting, we inject JS raw string literals and
use nested function calls to construct `eval`'s argument strings at runtime.
We increase Chromium's stack size to 4 GB and limit the depth of `eval`
nesting to 8, to avoid exceeding the stack size limit.
Finally, we skip rewriting scripts that export top-level functions and
scripts we cannot rewrite correctly.
Some of
these workarounds introduce obscure edge cases we do not yet
handle[^eval_trick].
We use Acorn[^acorn] to parse JS code into an abstract syntax tree (AST) and
generate the rewritten JS code from this AST.
An *effective length* is calculated and prepended to each `eval` block, so that
postprocessing knows the size corresponding to the original code in bytes.
Our implementation finishes rewriting an 8.6 MB script in under 1 second,
so it introduces tolerable overhead to web crawling.
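
A minimal sketch of the parsing step is shown below; the code generation, recursion into function bodies, and effective-length bookkeeping are omitted, and the split-point inspection is illustrative.

```js
import * as acorn from "acorn";

// Minimal sketch: parse a script with Acorn and inspect its top-level
// statements; each node carries source offsets we can use to size the
// resulting `eval` blocks (recursion and code generation omitted).
const source = `var counter = 0;
function greet(name) { console.log("Hello, " + name); }
greet("JSphere");`;

const ast = acorn.parse(source, { ecmaVersion: "latest", sourceType: "script" });
for (const statement of ast.body) {
  const snippet = source.slice(statement.start, statement.end);
  console.log(statement.type, snippet.length, "characters");
}
```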
## Granularity-Improved Experiment Results {#sec-results2}
After applying our JS rewriting approach, we reran our experiment in
@sec-methods for the top 1000 subdomains for results with
improved classification granularity.
We measure the same metrics as in @sec-class-results, except
using execution contexts from `eval` blocks instead of script files as
the granularity unit.
Since the count of execution contexts is not meaningful for
analyzing the source scripts, we focus on the sizes of the corresponding code.
Overall, we observe 21.0 million execution contexts with
source code effective size ranging from 1 byte to 6.1 MB, totaling 26.2 GB.
Only 10.86% of code is skipped and not rewritten (2.9 GB), but
some rewritten contexts remain large in size, possibly because
we do not yet break down some JS constructs like classes during our rewriting,
and more likely because of hitting the `eval` nesting limit.
Unfortunately, 89.9% of code is in contexts larger than 10 kB,
leaving much room for improvement in rewriting coverage.
However, the improved classification granularity still reveals
further observations regarding JS usage.
**Silent Contexts.** The improved granularity reveals that a much higher
fraction of code, 49% (12.9 GB), does not invoke any browser APIs or
mutate global states.
This is not because of the small contexts for function headers and
other peripheral code the rewriting produced, since only 10.0% of
the silent code (1.3 GB) is smaller than 1 kB.
Therefore, most silent code either performs only computations that
do not require browser APIs, or is likely never executed, *e.g.,* definition of
unused functions.
This reasoning echoes the observation in @goel2022jawa that
much JS code is unreachable and may not need to be sent to the client in
the first place.
**Overall sphere classification.** With the finer granularity,
we see less overlap among each sphere compared to our results in
@sec-class-results.
Whereas 50.9% of code is not silent, @fig-sphere-venn illustrates how 38% of
code is classified into at least one sphere and 12.4% of
code is classified into all spheres.
@tbl-context-classification shows frontend processing continues to be much more
popular than the other spheres, accounting for 36% of code, which
reinforces the argument that most scripts contain essential functionalities.
DOM element generation, UX enhancement, and
extensional features all enjoy similar popularities of around 22%.
Since 50.9% of code is not silent, nearly half of
the functional code does DOM element generation, which
shows generating DOM elements is a major cause of high JS usage;
similar arguments can be made for UX enhancement and extensional features.
| Feature | Size (MB) | Size (%) |
| :--------------------- | --------: | -------: |
| Frontend Processing | 9,469.6 | 36.19 |
| DOM Element Generation | 6,250.1 | 23.89 |
| UX Enhancement | 5,392.0 | 20.61 |
| Extensional Features | 5,617.2 | 21.47 |
| Silent Contexts | 12,850.4 | 49.12 |
| No Sure Sphere | 3,366.6 | 12.87 |
| All Sure Spheres | 3,235.2 | 12.37 |
: Classification results for execution contexts for the top 1000 subdomains, in
the granularity-improved setup. {#tbl-context-classification}
![Code size distribution by sphere classification among the 38% of
classified code (not to
scale).](data/spheres_venn.pdf){width=45% #fig-sphere-venn}
<!--
This is no longer meaningful.
![Cumulative distribution function of sizes of
execution contexts among all contexts and contexts belonging to
each sphere.](data/script_size_cdf2.pdf){width=45% #fig-script-size-cdf2}
-->
**Spheres per website.** @fig-subdomain-stacked shows the distribution of
code in each sphere per subdomain, which is less skewed by outliers than
@tbl-context-classification.
Among the 555 subdomains with at least one execution context,
most have some of their code spanning multiple spheres,
so their per-sphere code sizes sum to more than 100%.
The relatively large number of subdomains without any execution context
may be attributed to many subdomains on the Tranco list being CDN endpoints or
inaccessible websites.
These per-subdomain data reveal that UX enhancement and
extensional features are far less popular than shown in
the overall feature distribution, while frontend processing and
DOM element generation remain dominant for most websites.
This matches our intuition that
most websites include essential client-side processing scripts and scripts that
generate the DOM to display content.
However, the sizable portion of subdomains using performance- and
analytics-related extensional features strikingly suggests that
the top websites extensively collect telemetry and track visitors.
![Code Size Distribution per Sphere per
Subdomain.](data/subdomain_spheres_size_stacked.pdf){width=45%
#fig-subdomain-stacked}
# Future Work
<!-- You are not required to actually do this work, but it's always good to
identify where the work might go, especially since
the project work might not have perfectly aligned with the semester deadline. -->
JSphere is a start to understanding the common use of JS on the Web, and
there are many directions to continue or extend this work.
First, collecting the performance metrics of websites and correlating them with
sphere classifications may help understand the impact of different JS usage on
webpage performance.
We can further validate and refine our sphere classification heuristics.
The sphere distribution may be interpreted to understand how much of
the JS is essential and how much is optional.
Lastly, in response to the popularity of SPA frameworks, we can investigate
whether shipping JS or JSON and converting it into DOM elements on
the client side saves server compute or network resources compared to
traditional HTML transfer.
# Conclusion
We presented JSphere[^jsphere], an early attempt to
classify the functionalities of JS scripts on
the Web. Our methods pioneer classifying JS scripts based on browser API calls.
Early results show that frontend processing remains the core of JS usage, while
content rendering and analytics are also major contributors.
JSphere's experiments provide insights into JS usage on the Web and
will hopefully inspire future user-facing JS research.
# References {.unnumbered}
<!-- Section 6 A bibliography, with at least what you cite in related work -->
::: {#refs}
:::
[^acorn]: <https://github.com/acornjs/acorn>.
[^eval_trick]: See <https://github.com/SichangHe/JSphere/blob/main/eval_trick.md> for details.
[^playwright]: <https://playwright.dev/>.
[^gremlins]: <https://github.com/marmelab/gremlins.js>.
[^jsphere]: See code and documentation at <https://github.com/SichangHe/JSphere>.