Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API proposal: custom interpolated string handler #2136

Open
mgravell opened this issue Dec 14, 2024 · 20 comments
Open

API proposal: custom interpolated string handler #2136

mgravell opened this issue Dec 14, 2024 · 20 comments
Assignees

Comments

@mgravell
Copy link
Member

mgravell commented Dec 14, 2024

Right now, the preferred way of passing args to Dapper requires a second parameter, for example:

string name = ...
int id = ...
conn.Execute("""
    update customer
    set name = @name
    where id = @id
    """, new { name, id });

This works; Dapper (vanilla) has code to emit custom per-type code to extract the parameters, and DapperAOT has additional code to pre-gen that as AOT and better validation (mismatched args etc). Additional per-value parameter settings are awkward, but overall: it works.

However!

There is also a possibility to use a custom "interpolated string handler". I have a fully working prototype that allows the following:

string name = ...
int id = ...
conn.Execute($"""
    update customer
    set name = @{name}
    where id = @{id}
    """);

This is not a string, and is zero alloc, fully parameterized (SQLi safe), etc. Note that the leading @ (or : etc) is primarily because ADO.NET does not directly expose the parameter token of a given connection, but IMO it helps make it very clear what is going on.

Under the hood, this emits fundamentally the same SQL, even using the argument-expression feature to generate sensible parameter names where possible (name and id in this case). Additionally, from .NET 9 the "alt-lookup" feature of dictionaries is used to avoid allocating a new string per usage. There is zero runtime ref-emit etc needed for packing parameters - we basically trick the C# compiler into doing that work for us!

We could also potentially use the optional format parameters to convey other information, for example:l {name:1000} could set the .Size to 1000, and {qty:P=5,S=3} could set the .Precision and .Scale.

Genuine question: is this an improvement? Is this worth adding new overloads of some core methods? Is this technically nice but not worth the mental addition? Or is this hell-yeah-lets-do-this?


Separately, an analyzer in AOT is proposed to spot interpolated string uses that are susceptible to SQLi (i.e. the type is string); I would also propose that we start shipping Dapper.Advisor inside Dapper, to light up all those checks by default.

@NielsPilgaard
Copy link

Genuine question: is this an improvement? Is this worth adding new overloads of some core methods? Is this technically nice but not worth the mental addition? Or is this hell-yeah-lets-do-this?

In my opinion this is hell-yeah-lets-do-this 😄
I think this would improve Dapper adoption, as the getting started experience would be even easier, and more intuitive.

...I would also propose that we start shipping Dapper.Advisor inside Dapper, to light up all those checks by default.

I think that is a good idea,

@fdonnet
Copy link

fdonnet commented Dec 14, 2024

I use
var p = new DynamicParameters();
to have my args at the top of the Sql string, and pass "p" in the dapper command.

But yes, i will change for your new proposition.

@mgravell
Copy link
Member Author

mgravell commented Dec 14, 2024

I use var p = new DynamicParameters();
to have my args at the top of the Sql string, and pass "p" in the dapper command.

DynamicParameters is literally the worst option available; more expensive, and impossible to verify in an analyzer. I'd love to understand why you're using that option. If it is "because nobody said not to", then: that's really useful feedback that can help us prioritize docs and guidance. The main intended use-case for DynamicParameters is when the SQL is constructed on-the-fly and thus no fixed parameters make sense.

@mika76
Copy link

mika76 commented Dec 14, 2024

The main intended use-case for DynamicParameters is when the SQL is constructed on-the-fly and thus no fixed parameters make sense.

Don't forget output params too :) I have lots of those.

@mgravell
Copy link
Member Author

Don't forget output params too :) I have lots of those.

Yep, absolutely.

@fdonnet
Copy link

fdonnet commented Dec 14, 2024

DynamicParameters is literally the worst option available; more expensive, and impossible to verify in an analyzer. I'd love to understand why you're using that option. If it is "because nobody said not to", then: that's really useful feedback that can help us prioritize docs and guidance. The main intended use-case for DynamicParameters is when the SQL is constructed on-the-fly and thus no fixed parameters make sense.

" If it is "because nobody said not to" => Yeah, kind of, even worse:

That's good I said that here, because I use Dapper for a long time... And since I begun to use it (before async was part of the lib), I read some random tutorials. I use this Dynamic parametter thing to pass params even in a small generator I use to implement basic CRUD for small projects.

I never changed since then... never to late, to know that I was doing it wrong.

I will use the normal way and the new way if you implement it. :)

EDIT: maybe it was at the begining, I don't remember if we can get the inserted "id" back without an output param... and I used that way as a standard for all the stuff...

EDIT2: or maybe because I used a lot of sp in the past...

@albyrock87
Copy link

The main problem is that someone may already have used $"..." syntax to format multiple pieces of a query into one.

So when introducing this awesome API I would evaluate using a different signature or at least be very clear that this is a breaking change.

@JulianRooze
Copy link
Contributor

One thing to consider is that Npgsql considers named parameters to be legacy as they require runtime processing (conversion into positional). Using positional parameters ($1, $2, etc) is currently impossible (AFAIK) in Dapper, so if this proposal could also address this that would be great 👍

@mgravell
Copy link
Member Author

mgravell commented Dec 15, 2024 via email

@roji
Copy link

roji commented Dec 16, 2024

Hey @mgravell 👋

Yeah, interpolated strings can indeed provide a really nice experience for parameterizing SQL. In EF we do this too for APIs that accept SQL, though we (for now) only use FormattableString, rather than custom interpolated string handlers.

The one thing I'd recommend, is to think long and hard on the consequences of having two method overloads with the same name (Execute), one accepting a regular string and another an interpolated one... EF has had a rocky history here, where we initially did exactly that, and then ended up doing a breaking change to split them to two names, to make it 100% clear which kind of interpolation is taking place (see dotnet/efcore#10996). For example, an unsuspecting user can start off with the safe, interpolated version:

conn.Execute($"""
    update customer
    set name = @{name}
    where id = @{id}
    """);

... and then extract the string out for some reason:

var sql = $"""
    update customer
    set name = @{name}
    where id = @{id}
    """;
conn.Execute(sql);

This looks OK, but while the 1st fragment resolves to a FormattableString-based overload (that's safe against SQL injection), the 2nd one isn't. As this is all closely related to security, I'd definitely err on the conservative side here.

FWIW we ended up with the plainly-named, "default" FromSql being the FormattableString overload, and FromSqlRaw being the overload that accepts a string (if I had my way it would be called FromSqlUnsafe or FromSqlDangerous). This way, the unsuspecting user likely ends up just using FromSql, and gets SQL injection safety by default; the downside is that they have to use a $-string even when they don't need to interpolate anything, otherwise things don't compile, which is a bit odd.

See dotnet/efcore#10996 for lots of EF-side discussion on this.

@albyrock87
Copy link

That's exactly what I was trying to mention above.

Instead of changing the Dapper method names, I would suggest something like:

connection
    .FromSqlFormat($"...")
    .QueryFirstOrDefault<Animal>();

@mgravell
Copy link
Member Author

mgravell commented Dec 16, 2024

@albyrock87 that split between command and execution is something that DapperAOT already uses behind the scenes (not as the primary API), with a Command<T> that represents this intermediate layer, with that having all the Execute, Query<T> etc methods. Something similar here would make a decent amount of sense.

Perhaps, then, with the AOT library in scope (whether or not AOT is being used):

public static class DapperAotExtensions
{
    public static Command InterpolatedCommand(this DbConnection connection, ref SqlBuilder sql,
        int timeout = 0);
    public static Command InterpolatedCommand(this DbTransaction transaction, ref SqlBuilder sql,
        int timeout = 0);
}
public struct InterpolatedArgs { } // some wrapper around all the thing we captured
public struct Command // new API similar to Command<T> but for the "no args, or args already supplied via interpolation" scenario
{
    public int Execute(...); // etc - like Command<TArgs>, but minus the TArgs parameter
}

which would allow:

var animal = connection
    .InterpolatedCommand($"""
        select *
        from Animals
        where Genus = @{genus}
        and Region = @{region}
    """)
    .QueryFirstOrDefault<Animal>();

As for the name; naming is hard. Just thinking aloud.

And to be explicit, there would not be a string overload, so this would fail to compile:

string sql = $"""
        select *
        from Animals
        where Genus = @{genus}
        and Region = @{region}
    """;

var animal = connection
    .InterpolatedCommand(sql)
    .QueryFirstOrDefault<Animal>();

I think this is moving in the right direction, adds safety, avoids confusion and avoids adding 20+ methods to the public API.

@mgravell
Copy link
Member Author

@roji great input as always, thanks; as an aside: if EF wants, note that custom interpolated string handlers take overload priority over both string and FormattableString, so if EF wanted to, you could add your own lower-alloc more specialized implementation:

using System.Runtime.CompilerServices;

var obj = new Foo();

int id = 42;
obj.Bar($"""
    select 1
    from Orders
    where Id={id}
    """);
// output: "The new thing"

public class Foo
{
    public void Bar(FormattableString value) => Console.WriteLine("The old thing");
    public void Bar(ref CustomInterpolatedStringHandler value) => Console.WriteLine("The new thing");
}

[InterpolatedStringHandler]
public ref struct CustomInterpolatedStringHandler(int x, int y)
{
    public void AppendLiteral(string value) { }
    public void AppendFormatted<T>(T value) { }
}

@albyrock87
Copy link

albyrock87 commented Dec 16, 2024

@mgravell to be honest I was thinking about CreateCommandInterpolated too so it's intuitive.. but then I looked for something shorter.

Also, unless you plan to move timeout and other command-related switches, it's strange to name it CreateCommand when it only contains the SQL and parameters.

Naming things is so hard 🤣

@mgravell
Copy link
Member Author

mgravell commented Dec 16, 2024

Also, unless you plan to move timeout and other command-related switches, it's strange to name it CreateCommand when it only contains the SQL and parameters.

You're right, that would be an optional on the end of the CreateNamingIsHard(...) method; Command<> already captures timeout. Clarified, including renaming to InterpretedCommand for parity with the existing connection.Command(...) API.

@mgravell
Copy link
Member Author

I do wonder whether .FromSql or .FromFormatted is better than .InterpretedCommand though...

@mgravell
Copy link
Member Author

mgravell commented Dec 16, 2024

Side note: we would need to update to code-gen to consider use of the AOT API in addition to the vanilla Dapper API; that's fine, we should do that anyway! All that means is that under the bonnet when you do:

var animal = connection
    .InterpolatedCommand(sql)
    .QueryFirstOrDefault<Animal>();

behind the scenes if possible we emit a custom parser for reading Animal and we use "interceptors" to rewrite your .QueryFirstOrDefault<Animal>(); to a .QueryFirstOrDefault<Animal>(rowFactory: GeneratedAnimalRowFactory.Instance);

@mgravell
Copy link
Member Author

Another advantage of using a completely separate API is that it avoids a current compiler tendency to prefer string versions when all the formatted tokens are string and const; see this example for more context, and the corresponding Roslyn ticket

@roji
Copy link

roji commented Dec 16, 2024

as an aside: if EF wants, note that custom interpolated string handlers take overload priority over both string and FormattableString, so if EF wanted to, you could add your own lower-alloc more specialized implementation:

Yeah, there's an issue somewhere tracking looking at [InterpolatedStringHandler]... It has never bubbled up in terms of importance... In terms of overload resolution, yeah, custom interpolated string handler over Formattable string is fine (both would be conceptually the same API, with one just being a more efficient variant) - the string overload I'd keep very separate no matter what, for the reasons above.

@nRoger-Env
Copy link

I spotted 2 libraries that already does that with Dapper:
https://github.com/Drizin/InterpolatedSql/tree/main/src/InterpolatedSql.Dapper
and
https://github.com/mishael-o/Dapper.SimpleSqlBuilder
It might worth checking them out before you decide to fix your API.

IMO it would be a great addition to Dapper. Just make sure it stays very simple to use because this is the reason why we use Dapper in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants