Escape sequences in string arguments - unicode and others #4721

Eli-Siegel1 · 2023-12-04T21:46:44Z

Describe the bug
handling unicode has some bugs and weird behaviors:

parsing unicode with double escape chars. this makes it impossible to pass strings with the pattern "\uXXXX"
any escape sequence, not only unicodes: string within triple quotes are still parsed the same as single quotes (possible ruby related?)

Versions

graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none

GraphQL schema

in both the examples below i used the following simple resolver.
note that the result is the rendered result to the client, so there are escape characters added.
i also printed the string passed to the resolver char by char to make sure i'm not making mistakes:

class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end

class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end


class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end

GraphQL query

Example GraphQL query and response (if query execution is involved)

example for bug 1:

query bug1 {
  example1: get_string(string: "\u0064") # correct: should return "d"
  example2: get_string(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006") # correct: validation error
  example4: get_string(string: "\\u006") # correct: should return "\\u006"
}

the log:

***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***

the result:

{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}

example for bug 2

query bug2 {
  # example1: get_string(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""") # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""") # incorrect: should be "\\u0064"
}

the log:

***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***

the result:

{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}

Steps to reproduce

Steps to reproduce the behavior

Expected behavior

parsing string arguments:

"\\u0064" => "\u0064"
"""\u0064""" => "\u0064"

Actual behavior

parsing string arguments:

"\\u0064" => "n"
"""\u0064""" => "n"

Click to view exception backtrace

n/a

Additional context
I saw an old version changing the behavior of \\u back and forth, stating it was reverted because of failing other parsers test suite:
#372
it mentions a test suite in graphql-js in this url (it moved, this is the updated link)
note: in the tests, the string that are written are parsed by the language parser (js). so, for example, the string '\u{1234}' is actually passed to the parser as ሴ, while the string \\u{1234} is passed as the characters \, u, {, etc.

the tests there behave as i the expected behavior i described. examples:

in this test the parser is given the characters: "unicode \u1234\u5678\u90AB\uCDEF", and it returns the characters: unicode ሴ噸邫췯

    expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 34,
      value: 'unicode \u1234\u5678\u90AB\uCDEF',
    });

this works the same as ruby.

in this test the parser is given the characters: "escaped \n\r\b\t\f" and returns escaped followed by newline, carriage return, etc.

    expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 20,
      value: 'escaped \n\r\b\t\f',
    });

this works the same as ruby.

in this test the parser is given the characters: """unescaped \n\r\b\t\f\u1234""" and returns the characters: unescaped \n\r\b\t\f\u1234

    expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
      kind: TokenKind.BLOCK_STRING,
      start: 0,
      end: 32,
      line: 1,
      column: 1,
      value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
    });

in ruby the chars are converted to special characters, same as 2 - which is a bug.

to check the escape sequence of unicodes, i cloned the repo and added another local test.
in the test i passed the characters: "unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters: unicode \u{1234}\u{5678}\u{90AB}\u{CDE. the unicode is not parsed, and the escape of the \ before the u works.

    expect(
      lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
    ).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 52,
      value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
    });

in ruby the first 3 unicodes are parsed, and the last one is not. the escape of the \ before the u does not work.

The text was updated successfully, but these errors were encountered:

rmosolgo · 2023-12-08T16:03:16Z

Hey! Thanks for the detailed writeup. Yes, as you noticed, there have been many attempts to match the spec on this 😅

I've created a replication script for this bug:

Replicate unicode escape bugs

require "bundler/inline"

gemfile do
  gem "graphql", "2.1.7"
end

class MySchema < GraphQL::Schema
  class QueryType < GraphQL::Schema::Object
    field :get_string, String do
      argument :string, String
    end

    def get_string(string:)
      puts "***"
      puts "string.length: #{string.length}"
      puts "string value: \"#{string}\""
      string.each_char.each_with_index do |c, i|
        puts "string[#{i}] = \"#{c}\""
      end
      puts "***"
      string
    end
  end

  query(QueryType)
end

query_str = File.read("./example1.graphql")
puts query_str
pp MySchema.execute(query_str).to_h

query_str = File.read("./example2.graphql")
puts query_str
pp MySchema.execute(query_str).to_h

# example1.graphql 
{
  example1: getString(string: "\u0064") # correct: should return "d"
  example2: getString(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: getString(string: "\u006") # correct: validation error
  example4: getString(string: "\\u006") # correct: should return "\\u006"
}

# example2.graphql 
query bug2 {
  # example1: getString(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: getString(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: getString(string: """\n""") # incorrect: should be "\\n"
  example4: getString(string: """\u0064""") # incorrect: should be "\\u0064"
}

It seems like it boils down to two things:

In block strings, GraphQL-Ruby is interpreting \... as escapes, but it shouldn't.
In single-quoted strings, GraphQL-Ruby is interpreting \\u just like \u, but it shouldn't -- it should be a literal \u instead.

I've got an open PR to rewrite the lexer (#4718) and I'll dive in on that when I'm done there!

rmosolgo · 2024-02-05T23:12:36Z

I've got a fix in the works over at #4824. The most interesting part of the bug was basically that these two .gsub! calls resulted in a double-replacing of escaped characters:

graphql-ruby/lib/graphql/language/lexer.rb

Lines 302 to 303 in 18f3ded

    
           raw_string.gsub!(ESCAPES, ESCAPES_REPLACE) 
        
           raw_string.gsub!(UTF_8) do |_matched_str|

They had to be merged into a single gsub! call.

rmosolgo mentioned this issue Feb 5, 2024

Fix escapes in string literals #4824

Merged

3 tasks

rmosolgo closed this as completed in #4824 Mar 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Escape sequences in string arguments - unicode and others #4721

Escape sequences in string arguments - unicode and others #4721

Eli-Siegel1 commented Dec 4, 2023

rmosolgo commented Dec 8, 2023 •

edited

Loading

rmosolgo commented Feb 5, 2024

Escape sequences in string arguments - unicode and others #4721

Escape sequences in string arguments - unicode and others #4721

Comments

Eli-Siegel1 commented Dec 4, 2023

rmosolgo commented Dec 8, 2023 • edited Loading

rmosolgo commented Feb 5, 2024

rmosolgo commented Dec 8, 2023 •

edited

Loading