Allow loading html from file uri #209

Merged: 5 commits, Jul 1, 2024
28 changes: 26 additions & 2 deletions README.md
@@ -352,7 +352,7 @@ Grover.configure do |config|
end
```

#### ignore_path
### ignore_path
The `ignore_path` configuration option can be used to tell Grover's middleware whether it should handle/modify
the response. There are three ways to set up the `ignore_path`:
* a `String` which matches the start of the request path.
@@ -378,7 +378,7 @@ Grover.configure do |config|
end
```

#### ignore_request
### ignore_request
The `ignore_request` configuration option can be used to tell Grover's middleware whether it should handle/modify
the response. It should be set with a `Proc` which accepts the request (Rack::Request) as a parameter.

@@ -398,6 +398,30 @@ Grover.configure do |config|
end
```

### allow_file_uris
The `allow_file_uris` option can be used to render an HTML document from the file system.
This should be used with *EXTREME CAUTION*. If used improperly, it could be manipulated to reveal
sensitive files on the system. Do not enable it if rendering content from outside entities
(user uploads, external URLs, etc.).

It defaults to `false`, preventing local system files from being read.

```ruby
# config/initializers/grover.rb
Grover.configure do |config|
  config.allow_file_uris = true
end
```

And used as such:
```ruby
# Grover.new accepts a file URI and optional parameters for Puppeteer
grover = Grover.new('file:///some/local/file.html', format: 'A4')

# Get an inline PDF of the local file
pdf = grover.to_pdf
```
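
Because the option exposes the local file system, one way to limit the risk is to build the `file://` URI only from paths that resolve inside a trusted directory before they reach `Grover.new`. The sketch below is an assumption and not part of Grover: `safe_file_uri` and `TRUSTED_ROOT` are hypothetical names, and only the Ruby standard library is used.

```ruby
# Hypothetical helper (not part of Grover): only build a file URI for paths
# that resolve inside a trusted directory.
TRUSTED_ROOT = '/srv/app/templates' # assumed location of trusted HTML files

def safe_file_uri(relative_path, root: TRUSTED_ROOT)
  root = File.expand_path(root)
  full_path = File.expand_path(relative_path, root)

  # Reject anything that escapes the root, e.g. via "../" segments
  unless full_path.start_with?("#{root}/")
    raise ArgumentError, "#{relative_path.inspect} is outside #{root}"
  end

  "file://#{full_path}"
end

uri = safe_file_uri('invoices/42.html')
puts uri
# The result could then be passed on, e.g. Grover.new(uri, format: 'A4').to_pdf
```

Note that `File.expand_path` does not resolve symlinks and the helper does not percent-encode the path, so a production version would likely need both.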

## Cover pages

Since the header/footer for Puppeteer is configured globally, displaying of front/back cover
29 changes: 15 additions & 14 deletions lib/grover.rb
@@ -28,40 +28,40 @@ class Grover
attr_reader :front_cover_path, :back_cover_path

#
# @param [String] url URL of the page to convert
# @param [String] uri URI of the page to convert
# @param [Hash] options Optional parameters to pass to PDF processor
# see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.pdfoptions.md
# and https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.screenshotoptions.md
#
def initialize(url, **options)
@url = url.to_s
@options = OptionsBuilder.new(options, @url)
def initialize(uri, **options)
@uri = uri.to_s
@options = OptionsBuilder.new(options, @uri)
@root_path = @options.delete 'root_path'
@front_cover_path = @options.delete 'front_cover_path'
@back_cover_path = @options.delete 'back_cover_path'
end

#
# Request URL with provided options and create PDF
# Request URI with provided options and create PDF
#
# @param [String] path Optional path to write the PDF to
# @return [String] The resulting PDF data
#
def to_pdf(path = nil)
processor.convert :pdf, @url, normalized_options(path: path)
processor.convert :pdf, @uri, normalized_options(path: path)
end

#
# Request URL with provided options and render HTML
# Request URI with provided options and render HTML
#
# @return [String] The resulting HTML string
#
def to_html
processor.convert :content, @url, normalized_options(path: nil)
processor.convert :content, @uri, normalized_options(path: nil)
end

#
# Request URL with provided options and create screenshot
# Request URI with provided options and create screenshot
#
# @param [String] path Optional path to write the screenshot to
# @param [String] format Optional format of the screenshot
@@ -70,11 +70,11 @@ def to_html
def screenshot(path: nil, format: nil)
options = normalized_options(path: path)
options['type'] = format if %w[png jpeg].include? format
processor.convert :screenshot, @url, options
processor.convert :screenshot, @uri, options
end

#
# Request URL with provided options and create PNG
# Request URI with provided options and create PNG
#
# @param [String] path Optional path to write the screenshot to
# @return [String] The resulting PNG data
@@ -84,7 +84,7 @@ def to_png(path = nil)
end

#
# Request URL with provided options and create JPEG
# Request URI with provided options and create JPEG
#
# @param [String] path Optional path to write the screenshot to
# @return [String] The resulting JPEG data
@@ -116,10 +116,10 @@ def show_back_cover?
#
def inspect
format(
'#<%<class_name>s:0x%<object_id>p @url="%<url>s">',
'#<%<class_name>s:0x%<object_id>p @uri="%<uri>s">',
class_name: self.class.name,
object_id: object_id,
url: @url
uri: @uri
)
end

@@ -147,6 +147,7 @@ def processor
def normalized_options(path:)
normalized_options = Utils.normalize_object @options, excluding: ['extraHTTPHeaders']
normalized_options['path'] = path if path.is_a? ::String
normalized_options['allowFileUri'] = Grover.configuration.allow_file_uris == true
normalized_options
end
end
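
The `== true` comparison added to `normalized_options` means that only the literal boolean `true` turns on `allowFileUri`; other truthy values such as `'true'` or `1` do not. A minimal sketch of that behaviour, using a hypothetical stand-in method rather than Grover's own code:

```ruby
# Hypothetical stand-in for the comparison used in normalized_options:
# only the boolean true enables file URIs.
def allow_file_uri_flag(configured_value)
  configured_value == true
end

[true, false, nil, 'true', 1].each do |value|
  puts "#{value.inspect} => #{allow_file_uri_flag(value)}"
end
```

Only the first value prints `true`; the strict check guards against a misconfigured string or integer silently enabling file access.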
3 changes: 2 additions & 1 deletion lib/grover/configuration.rb
@@ -7,7 +7,7 @@ class Grover
class Configuration
attr_accessor :options, :meta_tag_prefix, :ignore_path, :ignore_request,
:root_url, :use_pdf_middleware, :use_png_middleware,
:use_jpeg_middleware, :node_env_vars
:use_jpeg_middleware, :node_env_vars, :allow_file_uris

def initialize
@options = {}
@@ -19,6 +19,7 @@ def initialize
@use_png_middleware = false
@use_jpeg_middleware = false
@node_env_vars = {}
@allow_file_uris = false
end
end
end
1 change: 1 addition & 0 deletions lib/grover/errors.rb
@@ -15,4 +15,5 @@ def self.const_missing(name)
const_set name, Class.new(Error)
end
end
UnsafeConfigurationError = Class.new(Error)
end
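
Since the new error class is created with `Class.new(Error)`, callers that already rescue the base error class would also catch it. The sketch below illustrates this with a hypothetical `GroverSketch` namespace; it assumes the base `Error` derives from `StandardError`, which this diff does not show.

```ruby
# Hypothetical stand-in namespace; assumes the base Error is a StandardError.
module GroverSketch
  Error = Class.new(StandardError)
  UnsafeConfigurationError = Class.new(Error)
end

begin
  raise GroverSketch::UnsafeConfigurationError, 'unsafe'
rescue GroverSketch::Error => e
  # The subclass is caught by a rescue for the base class
  puts "#{e.class}: #{e.message}"
end
```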
10 changes: 6 additions & 4 deletions lib/grover/js/processor.cjs
@@ -26,7 +26,7 @@ const fs = require('fs');
const os = require('os');
const path = require('path');

const _processPage = (async (convertAction, urlOrHtml, options) => {
const _processPage = (async (convertAction, uriOrHtml, options) => {
let browser, page, errors = [], tmpDir, wsConnection = false;

try {
@@ -173,10 +173,12 @@ const _processPage = (async (convertAction, urlOrHtml, options) => {
}

const waitUntil = options.waitUntil; delete options.waitUntil;
if (urlOrHtml.match(/^http/i)) {
const allowFileUri = options.allowFileUri; delete options.allowFileUri;
const uriRegex = allowFileUri ? /^(https?|file):\/\//i : /^https?:\/\//i;
if (uriOrHtml.match(uriRegex)) {
// Request is for a URL, so request it
requestOptions.waitUntil = waitUntil || 'networkidle2';
await page.goto(urlOrHtml, requestOptions);
await page.goto(uriOrHtml, requestOptions);
} else {
// Request is some HTML content. Use request interception to assign the body
requestOptions.waitUntil = waitUntil || 'networkidle0';
@@ -188,7 +190,7 @@ const _processPage = (async (convertAction, urlOrHtml, options) => {
request.continue();
else {
htmlIntercepted = true
request.respond({ body: urlOrHtml === '' ? ' ' : urlOrHtml });
request.respond({ body: uriOrHtml === '' ? ' ' : uriOrHtml });
}
});
const displayUrl = options.displayUrl; delete options.displayUrl;
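
The scheme check in this hunk decides whether the input is navigated to or served as HTML content. With `allowFileUri` unset, a `file://` string does not match and falls through to the HTML branch, so it is used as the page body rather than opened. The following is a Ruby mirror of that check for illustration only (the real check runs in JavaScript); `navigates_to_uri?` is a hypothetical name.

```ruby
# Ruby mirror of the processor's scheme check (illustration only).
# \A is used because Ruby's ^ matches at the start of every line,
# whereas the JavaScript regex without the m flag matches only at input start.
def navigates_to_uri?(uri_or_html, allow_file_uri:)
  pattern = allow_file_uri ? %r{\A(https?|file)://}i : %r{\Ahttps?://}i
  uri_or_html.match?(pattern)
end

inputs = ['https://example.com', 'file:///tmp/report.html', '<h1>Hello</h1>']
inputs.each do |input|
  [true, false].each do |allowed|
    puts "#{input} (allow_file_uri: #{allowed}) => #{navigates_to_uri?(input, allow_file_uri: allowed)}"
  end
end
```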
13 changes: 12 additions & 1 deletion lib/grover/middleware.rb
@@ -28,7 +28,10 @@ def _call(env)
@request = Rack::Request.new(env)
identify_request_type

configure_env_for_grover_request(env) if grover_request?
if grover_request?
check_file_uri_configuration
configure_env_for_grover_request(env)
end
status, headers, response = @app.call(env)
response = update_response response, headers if grover_request? && html_content?(headers)

@@ -45,6 +48,14 @@ def _call(env)

attr_reader :pdf_request, :png_request, :jpeg_request

def check_file_uri_configuration
return unless Grover.configuration.allow_file_uris

# The combination of middleware and allowing file URLs is exceptionally
# unsafe as it can lead to data exfiltration from the host system.
raise UnsafeConfigurationError, 'using `allow_file_uris` configuration with middleware is exceptionally unsafe'
end

def identify_request_type
@pdf_request = Grover.configuration.use_pdf_middleware && path_matches?(PDF_REGEX)
@png_request = Grover.configuration.use_png_middleware && path_matches?(PNG_REGEX)
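
The guard only runs for requests the middleware would convert, so ordinary requests still pass through when `allow_file_uris` is enabled. A self-contained sketch of that control flow, using hypothetical stand-ins (`MiddlewareSketch`, `Config`, `handle`) instead of Grover's real classes:

```ruby
# Hypothetical stand-ins; not Grover's real middleware or configuration.
module MiddlewareSketch
  Error = Class.new(StandardError)
  UnsafeConfigurationError = Class.new(Error)
  Config = Struct.new(:allow_file_uris)

  def self.check_file_uri_configuration(config)
    return unless config.allow_file_uris

    raise UnsafeConfigurationError,
          'using `allow_file_uris` configuration with middleware is exceptionally unsafe'
  end

  # grover_request is true for PDF/PNG/JPEG requests the middleware converts
  def self.handle(config, grover_request:)
    check_file_uri_configuration(config) if grover_request
    :passed_downstream
  end
end

unsafe = MiddlewareSketch::Config.new(true)
puts MiddlewareSketch.handle(unsafe, grover_request: false) # plain request is untouched

begin
  MiddlewareSketch.handle(unsafe, grover_request: true)
rescue MiddlewareSketch::UnsafeConfigurationError => e
  puts "raised: #{e.message}"
end
```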
12 changes: 6 additions & 6 deletions lib/grover/options_builder.rb
@@ -8,12 +8,12 @@ class Grover
# Build options from Grover.configuration, meta_options, and passed-in options
#
class OptionsBuilder < Hash
def initialize(options, url)
def initialize(options, uri)
super()
@url = url
@uri = uri
combined = grover_configuration
Utils.deep_merge! combined, Utils.deep_stringify_keys(options)
Utils.deep_merge! combined, meta_options unless url_source?
Utils.deep_merge! combined, meta_options unless uri_source?

update OptionsFixer.new(combined).run
end
@@ -41,11 +41,11 @@ def meta_options
end

def meta_tags
Nokogiri::HTML(@url).xpath('//meta')
Nokogiri::HTML(@uri).xpath('//meta')
end

def url_source?
@url.match(/\Ahttp/i)
def uri_source?
@uri.match?(%r{\A(https?|file)://}i)
end
end
end
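
The renamed `uri_source?` is also stricter than the old `url_source?`: the previous pattern matched any string beginning with `http`, while the new one requires a full `http://`, `https://` or `file://` prefix. It does not consult `allow_file_uris`, so a `file://` string skips meta tag parsing regardless of configuration. A small comparison of the two patterns (the sample strings are made up for illustration):

```ruby
# The two patterns as they appear in the diff
OLD_PATTERN = /\Ahttp/i
NEW_PATTERN = %r{\A(https?|file)://}i

# Made-up sample inputs for illustration
samples = [
  'http://example.com',
  'https://example.com',
  'file:///tmp/a.html',
  'httpd is a web server',
  '<html></html>'
]

samples.each do |sample|
  puts format('%-24s old=%-5s new=%s', sample, sample.match?(OLD_PATTERN), sample.match?(NEW_PATTERN))
end
```

The fourth sample is the interesting case: HTML or text content that happens to start with `http` was previously treated as a URL source and is now treated as content.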
6 changes: 6 additions & 0 deletions spec/fixtures/test.html
@@ -0,0 +1,6 @@
<!DOCTYPE html>
<html>
<body>
<h1>Hello World!</h1>
</body>
</html>
71 changes: 68 additions & 3 deletions spec/grover/middleware_spec.rb
@@ -52,6 +52,21 @@
expect(last_response.body.bytesize).to eq response_size
expect(last_response.headers['Content-Length']).to eq response_size.to_s
end

context 'when `allow_file_uris` configuration option is set' do
before { allow(Grover.configuration).to receive(:allow_file_uris).and_return true }

it 'raises an `UnsafeConfigurationError`' do
expect do
get 'http://www.example.org/test.PDF'
end.to(
raise_error(
Grover::UnsafeConfigurationError,
'using `allow_file_uris` configuration with middleware is exceptionally unsafe'
)
)
end
end
end

context 'when requesting a PNG' do
@@ -72,6 +87,21 @@
expect(last_response.body.bytesize).to eq response_size
expect(last_response.headers['Content-Length']).to eq response_size.to_s
end

context 'when `allow_file_uris` configuration option is set' do
before { allow(Grover.configuration).to receive(:allow_file_uris).and_return true }

it 'raises an `UnsafeConfigurationError`' do
expect do
get 'http://www.example.org/test.PNG'
end.to(
raise_error(
Grover::UnsafeConfigurationError,
'using `allow_file_uris` configuration with middleware is exceptionally unsafe'
)
)
end
end
end

context 'when requesting a JPEG' do
@@ -100,6 +130,21 @@
expect(last_response.body.bytesize).to eq response_size
expect(last_response.headers['Content-Length']).to eq response_size.to_s
end

context 'when `allow_file_uris` configuration option is set' do
before { allow(Grover.configuration).to receive(:allow_file_uris).and_return true }

it 'raises an `UnsafeConfigurationError`' do
expect do
get 'http://www.example.org/test.JPG'
end.to(
raise_error(
Grover::UnsafeConfigurationError,
'using `allow_file_uris` configuration with middleware is exceptionally unsafe'
)
)
end
end
end

context 'when request doesnt have an extension' do
@@ -109,6 +154,17 @@
expect(last_response.body).to eq 'Grover McGroveryface'
expect(last_response.headers['Content-Length']).to eq '20'
end

context 'when `allow_file_uris` configuration option is set' do
before { allow(Grover.configuration).to receive(:allow_file_uris).and_return true }

it 'returns the downstream content and content type' do
get 'http://www.example.org/test'
expect(last_response.headers['Content-Type']).to eq 'text/html'
expect(last_response.body).to eq 'Grover McGroveryface'
expect(last_response.headers['Content-Length']).to eq '20'
end
end
end

context 'when request has a non-PDF/PNG/JPEG extension' do
@@ -541,7 +597,10 @@
and_return(grover)
)
allow(grover).to receive(:to_pdf).with(no_args).and_return 'A converted PDF'
expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
expect(Grover).to(
receive(:new).
with('Grover McGroveryface', display_url: 'http://www.example.org/test')
)
expect(grover).to receive(:to_pdf).with(no_args)
get 'http://www.example.org/test.pdf'
expect(last_response.body).to eq 'A converted PDF'
@@ -596,7 +655,10 @@
and_return(grover)
)
allow(grover).to receive(:to_png).with(no_args).and_return 'A converted PNG'
expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
expect(Grover).to(
receive(:new).
with('Grover McGroveryface', display_url: 'http://www.example.org/test')
)
expect(grover).to receive(:to_png).with(no_args)
get 'http://www.example.org/test.png'
expect(last_response.body).to eq 'A converted PNG'
@@ -623,7 +685,10 @@
and_return(grover)
)
allow(grover).to receive(:to_jpeg).with(no_args).and_return 'A converted JPEG'
expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
expect(Grover).to(
receive(:new).
with('Grover McGroveryface', display_url: 'http://www.example.org/test')
)
expect(grover).to receive(:to_jpeg).with(no_args)
get 'http://www.example.org/test.jpeg'
expect(last_response.body).to eq 'A converted JPEG'