Optimize parser by removing repeated hash merges #515

andrewts129 · 2024-11-27T04:10:23Z

Changes

Thanks for the helpful gem!

While profiling the startup time of a large Rails app that manually invokes Dotenv.load very early in the boot process (to get access to envvars ASAP), I noticed that the subsequent Dotenv.load, called automatically by this gem's railtie, was taking an unusually long time to complete. Most of that time was spent in the variable substitution module, on this line:

combined_env = overwrite ? ENV.to_h.merge(env) : env.merge(ENV)

Since ENV had already been loaded with ~2,000 extra variables by the first run of dotenv, this hash merge is not trivially cheap, and the cost adds up because it runs again for each line parsed.
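To illustrate the cost, here is a minimal sketch (not the gem's actual code) that compares rebuilding a merged hash on every lookup against a plain two-step lookup with the same priority rules. The names `fake_env`, `parsed`, `lookup_with_merge`, and `lookup_direct` are hypothetical, and a 2,000-entry hash stands in for a heavily populated ENV.

```ruby
require "benchmark"

# Hypothetical stand-in for a heavily populated ENV (not the real ENV).
fake_env = Array.new(2000) { |i| ["VAR_#{i}", "value_#{i}"] }.to_h

# Mirrors the shape of the line quoted above: one full merge per lookup.
def lookup_with_merge(key, parsed, env, overwrite)
  combined = overwrite ? env.merge(parsed) : parsed.merge(env)
  combined[key]
end

# Same priority rules, without allocating a merged hash.
def lookup_direct(key, parsed, env, overwrite)
  first, second = overwrite ? [parsed, env] : [env, parsed]
  first.fetch(key) { second[key] }
end

parsed = {}
keys = fake_env.keys.first(200)

merge_time = Benchmark.realtime do
  keys.each { |k| lookup_with_merge(k, parsed, fake_env, false) }
end
direct_time = Benchmark.realtime do
  keys.each { |k| lookup_direct(k, parsed, fake_env, false) }
end

puts format("merge per lookup: %.4fs, direct lookup: %.4fs", merge_time, direct_time)
```

The absolute timings depend on the machine; the point is only that the merge version does work proportional to the size of the environment on every call.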

From what I understand, the purpose of that line is to build a lookup table that gives priority either to envvars already in ENV or to envvars from an earlier line in the file, depending on the value of the "overwrite" flag. We can make this operation unnecessary by updating the parser to simply skip lines that redefine a variable already in ENV when "overwrite = false", so the variable substitution module no longer has to handle the prioritization at all.
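The idea can be sketched as follows. This is an illustrative toy, not the gem's parser: `parse_lines`, the simplified `KEY=value` line format, and the `${NAME}` substitution regex are assumptions, and an explicit `env` hash stands in for ENV.

```ruby
# Toy parser (hypothetical): skip keys already present in `env` unless
# overwrite is set, so substitution needs no merged lookup table.
def parse_lines(lines, env:, overwrite: false)
  parsed = {}
  lines.each do |line|
    key, value = line.strip.split("=", 2)
    next if key.nil? || key.empty? || value.nil?
    # Existing variables win when overwrite is false, so skip the line.
    next if !overwrite && env.key?(key)

    # Resolve ${NAME} from earlier lines first, then fall back to env.
    parsed[key] = value.gsub(/\$\{(\w+)\}/) do
      name = Regexp.last_match(1)
      parsed.fetch(name) { env.fetch(name, "") }
    end
  end
  parsed
end

env = { "HOST" => "example.com" }
lines = ["HOST=localhost", "URL=http://${HOST}/"]

p parse_lines(lines, env: env)                  # HOST line is skipped
p parse_lines(lines, env: env, overwrite: true) # HOST from the file wins
```

Because a skipped key never enters `parsed`, checking `parsed` first and `env` second gives the same priority as the merged table in both modes. In this toy, a skipped key is also absent from the returned hash.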

This leads to a modest performance improvement when parsing a large .env file, and a significant one when ENV is already heavily populated by some other process (most likely a previous run of dotenv, but I can imagine other, less avoidable reasons this could happen):

The .env file used for this benchmark was created from this script:

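require "securerandom"

# Generate 2,000 random KEY="value" lines (UUIDs with the dashes stripped).
lines = Array.new(2000) { "#{SecureRandom.uuid.tr("-", "")}=\"#{SecureRandom.uuid.tr("-", "")}\"" }
# Assumes the ./tmp directory already exists.
IO.write("./tmp/.env", lines.join("\n"))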

Validation

The RSpec test suite for this gem looks pretty thorough, and it all still passes after this change, so I don't believe this introduces any unintended changes in functionality.

This leads to a noticeable performance improvement when a large number of environment variables have already been set

bkeepers · 2024-12-12T21:32:31Z

@andrewts129 awesome, thanks for finding this and working on a fix!

I pushed some benchmarks in 1877fa0 just to compare. Here are the results of loading a 1000-line .env file:

main

parse, overwrite:false
                         43.265 (±11.6%) i/s   (23.11 ms/i) -    196.000 in   5.056455s
parse, overwrite:true
                         26.042 (± 0.0%) i/s   (38.40 ms/i) -    132.000 in   5.069958s

this branch

parse, overwrite:false
                        552.652 (± 0.9%) i/s    (1.81 ms/i) -      2.805k in   5.075897s
parse, overwrite:true
                        569.266 (± 0.7%) i/s    (1.76 ms/i) -      2.850k in   5.006666s

13-21x faster. Nice work!
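As a quick check, the speedup can be recomputed from the iterations-per-second figures quoted above. The variable names here are just for illustration.

```ruby
# Iterations per second from the benchmark output above, keyed by overwrite flag.
main_ips   = { false => 43.265,  true => 26.042 }
branch_ips = { false => 552.652, true => 569.266 }

speedups = main_ips.to_h do |overwrite, ips|
  [overwrite, (branch_ips[overwrite] / ips).round(1)]
end

speedups.each do |overwrite, factor|
  puts "overwrite:#{overwrite} is #{factor}x faster"
end
```

This gives roughly 12.8x for overwrite:false and 21.9x for overwrite:true.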

Follow-up in #519: part of the optimization in #515 was to skip parsing variables that were already defined, but that had the side effect of not returning them in the resulting hash. #519 adds a test for this behavior and restores it.

andrewts129 and others added 2 commits November 17, 2024 14:37

Optimize parser by removing repeated hash merges

05983b1

This leads to a noticeable performance improvement when a large number of environment variables have already been set

Add benchmarks for parsing

1877fa0

bkeepers merged commit b396779 into bkeepers:main Dec 12, 2024
12 checks passed

bkeepers mentioned this pull request Dec 13, 2024

Restore previous parser behavior of returning existing variables #519

Merged
