Skip to content

Commit

Permalink
Merge pull request #3390 from Earlopain/parser-translator-binary-enco…
Browse files Browse the repository at this point in the history
…ding

Fix binary encoding for parser translator
  • Loading branch information
kddnewton authored Jan 12, 2025
2 parents 4204d07 + fa0154d commit 01cbec9
Show file tree
Hide file tree
Showing 8 changed files with 119 additions and 3 deletions.
20 changes: 17 additions & 3 deletions lib/prism/translation/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ def parse(source_buffer)
source = source_buffer.source

offset_cache = build_offset_cache(source)
result = unwrap(Prism.parse(source, filepath: source_buffer.name, version: convert_for_prism(version), partial_script: true, encoding: false), offset_cache)
result = unwrap(Prism.parse(source, **prism_options), offset_cache)

build_ast(result.value, offset_cache)
ensure
Expand All @@ -64,7 +64,7 @@ def parse_with_comments(source_buffer)
source = source_buffer.source

offset_cache = build_offset_cache(source)
result = unwrap(Prism.parse(source, filepath: source_buffer.name, version: convert_for_prism(version), partial_script: true, encoding: false), offset_cache)
result = unwrap(Prism.parse(source, **prism_options), offset_cache)

[
build_ast(result.value, offset_cache),
Expand All @@ -83,7 +83,7 @@ def tokenize(source_buffer, recover = false)
offset_cache = build_offset_cache(source)
result =
begin
unwrap(Prism.parse_lex(source, filepath: source_buffer.name, version: convert_for_prism(version), partial_script: true, encoding: false), offset_cache)
unwrap(Prism.parse_lex(source, **prism_options), offset_cache)
rescue ::Parser::SyntaxError
raise if !recover
end
Expand Down Expand Up @@ -285,6 +285,20 @@ def build_range(location, offset_cache)
)
end

# Options for how prism should parse/lex the source.
def prism_options
options = {
filepath: @source_buffer.name,
version: convert_for_prism(version),
partial_script: true,
}
# The parser gem always encodes to UTF-8, unless it is binary.
# https://github.com/whitequark/parser/blob/v3.3.6.0/lib/parser/source/buffer.rb#L80-L107
options[:encoding] = false if @source_buffer.source.encoding != Encoding::BINARY

options
end

# Converts the version format handled by Parser to the format handled by Prism.
def convert_for_prism(version)
case version
Expand Down
9 changes: 9 additions & 0 deletions test/prism/fixtures/encoding_binary.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# encoding: binary

"\xcd"

:"\xcd"

/#{"\xcd"}/

%W[\xC0]
6 changes: 6 additions & 0 deletions test/prism/fixtures/encoding_euc_jp.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# encoding: euc-jp

# \x8E indicates a double-byte character, \x01 is not a valid second byte in euc-jp
"\x8E\x01"

%W["\x8E\x01"]
12 changes: 12 additions & 0 deletions test/prism/ruby/parser_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,18 @@
# First, opt in to every AST feature.
Parser::Builders::Default.modernize

# The parser gem rejects some strings that would most likely lead to errors
# in consumers due to encoding problems. RuboCop however monkey-patches this
# method out in order to accept such code.
# https://github.com/whitequark/parser/blob/v3.3.6.0/lib/parser/builders/default.rb#L2289-L2295
Parser::Builders::Default.prepend(
Module.new {
def string_value(token)
value(token)
end
}
)

# Modify the source map == check so that it doesn't check against the node
# itself so we don't get into a recursive loop.
Parser::Source::Map.prepend(
Expand Down
1 change: 1 addition & 0 deletions test/prism/ruby/ruby_parser_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ def ==(other)
module Prism
class RubyParserTest < TestCase
todos = [
"encoding_euc_jp.txt",
"newline_terminated.txt",
"regex_char_width.txt",
"seattlerb/bug169.txt",
Expand Down
49 changes: 49 additions & 0 deletions test/prism/snapshots/encoding_binary.txt

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

24 changes: 24 additions & 0 deletions test/prism/snapshots/encoding_euc_jp.txt

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions test/prism/snippets_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
module Prism
class SnippetsTest < TestCase
except = [
"encoding_binary.txt",
"newline_terminated.txt",
"seattlerb/begin_rescue_else_ensure_no_bodies.txt",
"seattlerb/case_in.txt",
Expand Down

0 comments on commit 01cbec9

Please sign in to comment.