Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parser by removing repeated hash merges #515

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

andrewts129
Copy link
Contributor

Changes

Thanks for the helpful gem!

While profiling the startup time for a large Rails app that's manually invoking Dotenv.load very early on in the boot process in order to get access to envvars ASAP, I noticed that the subsequent Dotenv.load being automatically called by this gem's provided railtie was taking an unusual amount of time to complete. The time was mostly being spent in the variable substitution module, on this line:

combined_env = overwrite ? ENV.to_h.merge(env) : env.merge(ENV)

Since ENV had already been loaded up with ~2,000 extra variables from the first run of dotenv, this hash merge is not a trivially cheap operation and it added up being run when parsing each line.

From what I understand, the purpose of that line is to build a lookup table that gives priority to either envvars already in ENV or envvars from an earlier line in the file, depending on the value of the "overwrite" flag. We can make this operation unnecessary by updating the parser to simply skip over lines re-defining a variable already in ENV when "overwrite = false", leaving the variable substitution module not even having to worry about the prioritization.

This leads to a modest performance improvement when parsing a large .env file, and a significant one when parsing a large .env file when ENV is already very populated by some other process (most likely a previous run of dotenv, but I can imagine there are other, less avoidable reasons this could happen as well):

image

image

The .env file used for this benchmark was created from this script:

require "securerandom"

lines = Array.new(2000) { "#{SecureRandom.uuid.tr("-", "")}=\"#{SecureRandom.uuid.tr("-", "")}\""}
IO.write("./tmp/.env", lines.join("\n"))

Validation

The RSpec test suite for this gem looks to be pretty thorough and it's all still passing after this change, so from that I don't believe that this will have any unintended changes in functionality.

andrewts129 and others added 2 commits November 17, 2024 14:37
This leads to a noticeable performance improvement when a large number of environment variables have already been set
@bkeepers
Copy link
Owner

@andrewts129 awesome, thanks for finding this and working on a fix!

I pushed some benchmarks in 1877fa0 just to compare. Here's the results of loading a 1000 line .env file:

main

parse, overwrite:false
                         43.265 (±11.6%) i/s   (23.11 ms/i) -    196.000 in   5.056455s
parse, overwrite:true
                         26.042 (± 0.0%) i/s   (38.40 ms/i) -    132.000 in   5.069958s

this branch

parse, overwrite:false
                        552.652 (± 0.9%) i/s    (1.81 ms/i) -      2.805k in   5.075897s
parse, overwrite:true
                        569.266 (± 0.7%) i/s    (1.76 ms/i) -      2.850k in   5.006666s

13-21x faster. Nice work!

@bkeepers bkeepers merged commit b396779 into bkeepers:main Dec 12, 2024
12 checks passed
bkeepers added a commit that referenced this pull request Dec 13, 2024
Part of the optimization in #515 was to skip parsing variables that were already defined. But that had the side-effect of not returning them in the resulting hash. This adds a test for this behavior and restores it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants