Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape sequences in string arguments - unicode and others #4721

Closed
Eli-Siegel1 opened this issue Dec 4, 2023 · 2 comments · Fixed by #4824
Closed

Escape sequences in string arguments - unicode and others #4721

Eli-Siegel1 opened this issue Dec 4, 2023 · 2 comments · Fixed by #4824

Comments

@Eli-Siegel1
Copy link

Describe the bug
handling unicode has some bugs and weird behaviors:

  1. parsing unicode with double escape chars. this makes it impossible to pass strings with the pattern "\uXXXX"
  2. any escape sequence, not only unicodes: string within triple quotes are still parsed the same as single quotes (possible ruby related?)

Versions

graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none

GraphQL schema

in both the examples below i used the following simple resolver.
note that the result is the rendered result to the client, so there are escape characters added.
i also printed the string passed to the resolver char by char to make sure i'm not making mistakes:

class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end

class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end


class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end

GraphQL query

Example GraphQL query and response (if query execution is involved)

example for bug 1:

query bug1 {
  example1: get_string(string: "\u0064") # correct: should return "d"
  example2: get_string(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006") # correct: validation error
  example4: get_string(string: "\\u006") # correct: should return "\\u006"
}

the log:

***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***

the result:

{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}

example for bug 2

query bug2 {
  # example1: get_string(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""") # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""") # incorrect: should be "\\u0064"
}

the log:

***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***

the result:

{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}

Steps to reproduce

Steps to reproduce the behavior

Expected behavior

parsing string arguments:

  1. "\\u0064" => "\u0064"
  2. """\u0064""" => "\u0064"

Actual behavior

parsing string arguments:

  1. "\\u0064" => "n"
  2. """\u0064""" => "n"
Click to view exception backtrace n/a

Additional context
I saw an old version changing the behavior of \\u back and forth, stating it was reverted because of failing other parsers test suite:
#372
it mentions a test suite in graphql-js in this url (it moved, this is the updated link)
note: in the tests, the string that are written are parsed by the language parser (js). so, for example, the string '\u{1234}' is actually passed to the parser as , while the string \\u{1234} is passed as the characters \, u, {, etc.

the tests there behave as i the expected behavior i described. examples:

  1. in this test the parser is given the characters: "unicode \u1234\u5678\u90AB\uCDEF", and it returns the characters: unicode ሴ噸邫췯
    expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 34,
      value: 'unicode \u1234\u5678\u90AB\uCDEF',
    });

this works the same as ruby.

  1. in this test the parser is given the characters: "escaped \n\r\b\t\f" and returns escaped followed by newline, carriage return, etc.
    expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 20,
      value: 'escaped \n\r\b\t\f',
    });

this works the same as ruby.

  1. in this test the parser is given the characters: """unescaped \n\r\b\t\f\u1234""" and returns the characters: unescaped \n\r\b\t\f\u1234
    expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
      kind: TokenKind.BLOCK_STRING,
      start: 0,
      end: 32,
      line: 1,
      column: 1,
      value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
    });

in ruby the chars are converted to special characters, same as 2 - which is a bug.

  1. to check the escape sequence of unicodes, i cloned the repo and added another local test.
    in the test i passed the characters: "unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters: unicode \u{1234}\u{5678}\u{90AB}\u{CDE. the unicode is not parsed, and the escape of the \ before the u works.
    expect(
      lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
    ).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 52,
      value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
    });

in ruby the first 3 unicodes are parsed, and the last one is not. the escape of the \ before the u does not work.

@rmosolgo
Copy link
Owner

rmosolgo commented Dec 8, 2023

Hey! Thanks for the detailed writeup. Yes, as you noticed, there have been many attempts to match the spec on this 😅

I've created a replication script for this bug:

Replicate unicode escape bugs

require "bundler/inline"

gemfile do
  gem "graphql", "2.1.7"
end

class MySchema < GraphQL::Schema
  class QueryType < GraphQL::Schema::Object
    field :get_string, String do
      argument :string, String
    end

    def get_string(string:)
      puts "***"
      puts "string.length: #{string.length}"
      puts "string value: \"#{string}\""
      string.each_char.each_with_index do |c, i|
        puts "string[#{i}] = \"#{c}\""
      end
      puts "***"
      string
    end
  end

  query(QueryType)
end

query_str = File.read("./example1.graphql")
puts query_str
pp MySchema.execute(query_str).to_h

query_str = File.read("./example2.graphql")
puts query_str
pp MySchema.execute(query_str).to_h
# example1.graphql 
{
  example1: getString(string: "\u0064") # correct: should return "d"
  example2: getString(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: getString(string: "\u006") # correct: validation error
  example4: getString(string: "\\u006") # correct: should return "\\u006"
}
# example2.graphql 
query bug2 {
  # example1: getString(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: getString(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: getString(string: """\n""") # incorrect: should be "\\n"
  example4: getString(string: """\u0064""") # incorrect: should be "\\u0064"
}

It seems like it boils down to two things:

  • In block strings, GraphQL-Ruby is interpreting \... as escapes, but it shouldn't.
  • In single-quoted strings, GraphQL-Ruby is interpreting \\u just like \u, but it shouldn't -- it should be a literal \u instead.

I've got an open PR to rewrite the lexer (#4718) and I'll dive in on that when I'm done there!

@rmosolgo
Copy link
Owner

rmosolgo commented Feb 5, 2024

I've got a fix in the works over at #4824. The most interesting part of the bug was basically that these two .gsub! calls resulted in a double-replacing of escaped characters:

raw_string.gsub!(ESCAPES, ESCAPES_REPLACE)
raw_string.gsub!(UTF_8) do |_matched_str|

They had to be merged into a single gsub! call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants