`bytes.to_s(encoding)` should validate the passed encoding #1090
Related: `.to_s(encoding)`: make encoding constant #1051

Cc @GreyCat

In kaitai-io/kaitai_struct_compiler#254, a precompile step `CanonicalizeEncodingNames` was added. It validates every `encoding` YAML key, issues a warning if it specifies a non-canonical encoding, and converts it to a canonical one if possible.

However, this is currently only done for the `encoding` key, not for the argument of the `bytes.to_s(encoding)` method. For example, a spec that calls `bytes.to_s('utf8')` compiles with KSC at commit 6aef6fae without any warnings, despite using the non-canonical `utf8` encoding (the canonical name is `UTF-8`).

Checking the argument of the `bytes.to_s(encoding)` method at compile time is always possible, because it was decided in #1051 that it will accept only string literals.

A real problem stemming from the lack of validation and canonicalization is that the unchecked encoding name is reflected in the generated code exactly as written in the .ksy spec:
m_v = kaitai::kstream::bytes_to_str(buf(), "utf8");
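
For illustration, a spec along the following lines would be expected to produce a generated line like the one above. This is only a minimal sketch, not the original example from this report: the `meta/id`, the `size` value, and the field names `buf` and `v` are assumptions chosen to match the generated code.

```yaml
# Hypothetical minimal spec; names and sizes are illustrative assumptions.
meta:
  id: to_s_encoding_demo
seq:
  - id: buf
    size: 4
instances:
  v:
    # 'utf8' is a non-canonical spelling of UTF-8, passed as a string literal
    value: buf.to_s('utf8')
```

By contrast, writing the same spelling under an `encoding` key goes through `CanonicalizeEncodingNames`, so it is reported with a warning and converted where possible.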
However, such non-canonical encodings might not work in some target languages. In particular, the new Win32 API-based `bytes_to_str` implementation in the C++/STL runtime (`STRING_ENCODING_TYPE = "WIN32API"`) will not accept such non-canonical spellings, because it expects only the standardized names (kaitai-io/kaitai_struct_cpp_stl_runtime#61).

So `encoding: utf8` would work (though with a warning), but `bytes.to_s('utf8')` would not.