Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What are the permitted ways to make prepared statement queries or inserts? Ie. To avoid SQL injection? Is it just JDBC? Postgres? #1829

Open
jonmdev opened this issue Nov 19, 2024 · 12 comments
Assignees
Labels
question Further information is requested
Milestone

Comments

@jonmdev
Copy link

jonmdev commented Nov 19, 2024

Why prepared statements are needed

The documentation links to this practice document for SQL safety:

https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html

It states:

Defense Option 4: STRONGLY DISCOURAGED: Escaping All User-Supplied Input

In this approach, the developer will escape all user input before putting it in a query. It is very database specific in its implementation. This methodology is frail compared to other defenses and we CANNOT guarantee that this option will prevent all SQL injections in all situations.

If an application is built from scratch or requires low risk tolerance, it should be built or re-written using parameterized queries, stored procedures, or some kind of Object Relational Mapper (ORM) that builds your queries for you.

In other words, we absolutely need prepared statements, if users are writing their own inputs to our queries or inserts.

Postgres?

I thought this must be possible by Postgres connection as this allows prepared statements. I managed to get a Postgres connection working by Rust tonight. But then I came to understand, as per the documentation:

Particularly, ArcadeDB does only support "simple" query mode and does not support SSL!

Simple query mode is no different than HTTP query. For example, in Rust with Postgres this is the same format of an HTTP Request (plain string, which is unsafe when you may have user written inputs):

    let query_result = client.simple_query("SELECT * FROM Customer"); //hypothetically, just to show syntax

Other practices like whitelisting input are not possible when users can type their own text inputs such as free entry text fields. Limiting characters to a-z,A-Z, 0-9 is also not possible when you have global users inputting every variety of unicode or emojis or punctuation.

JDBC?

Is JDBC then the only way to perform prepared statements with Arcade? If so I think this should be better documented in the documentation.

Say from a Rust or Elixir server, which is interfacing to Arcade installed and running on a separate Ubuntu server - I can get a JDBC system working under my Rust server. Is that then my only choice to connect and query/insert to Arcade from there safely?

Whatever the answer, I think the options for prepared statements should be more clearly enumerated in the documentation, given how critical this is.

Are the options (whatever they are) ever likely to change as well? Thanks for any thoughts or clarification.

@gramian
Copy link
Collaborator

gramian commented Nov 19, 2024

The HTTP API can pass parameters into queries and commands. Also, the Java API allows passing parameters. I guess, in both cases these parameterized queries can easily be encapsulated into functions (or objects) making them prepared statements.

@jonmdev
Copy link
Author

jonmdev commented Nov 19, 2024

The HTTP API can pass parameters into queries and commands.

I don't see any examples of anything that could protect against SQL injection there? My understanding is that to protect against SQL injection you need to prepare and finalize the query function the database will perform in terms of its operations and constraints (ie. INSERT INTO BUCKET:Customer_Europe CONTENT $1) separately from the variables ($1 = { firstName: 'Enzo', lastName: 'Ferrari' }) so that the variables that go into it can't alter the operations that can be performed (ie. even if the user sneaks in DROP with malicious character escaping, there will be no capacity for this operation to be performed, because it was not included in the preparation of the statement/function).

Also, the Java API allows passing parameters.

How does that work? I have read those Java snippets since a year ago and I still don't understand how they could be used.

If you have:

  1. ArcadeDB installed on Ubuntu server 1.
  2. Rust/Elixir installed on Ubuntu server 2.
  3. Users connect to Rust/Elixir server 2.

How do you use the Java API? The only way I see is by the Rust server connecting to the Arcade server by JDBC. Which then again, means the only way to protect against SQL injection (which is really a critical thing we all need to do unless there is zero user written input) is by JDBC connection. And this is not commonly or clearly available.

Why not at least Postgres too? Given there is more availability? It took me only a few hours (yeah, "only" 😬) to get Rust connected by Postgres. I was going to add it to the examples, but it now seems like a pointless method as it is not remotely safe.

Or if we must do it by JDBC, can we get some better clarity on how?

I see for example many JDBC drivers:

https://www.codejava.net/java-se/jdbc/jdbc-driver-library-download

I can wrap one into Rust, for example with this: https://lib.rs/crates/jni

But I'm not sure what JDBC version I am supposed to be using. For something so critical, I don't think this is well explained. I don't get it. I used to think I just needed to come back to it later, and it would make sense eventually. But now I actually need it running. And now I have done HTTP requests in multiple languages (I posted several of the examples on the documentation), tonight I succeeded with Postgres requests. But I am still no closer.

Any further thoughts? @lvca any thoughts as well?

Thanks as always for any help. 👍

@gramian
Copy link
Collaborator

gramian commented Nov 19, 2024

I don't think I can solve your problem(s), but here are some remarks that may hopefully help:

  • Above, I meant, for example, using a function to wrap the statement and after asserting the wrapping function's arguments, passing those as parameters into the wrapped statement.
  • With the query endpoint or method only read operations can be performed, altering operation are only admissible via the command endpoint or method.
  • The query language sql allows one and only one statement, so if you constrain your user to sql they cannot inject an additional statement.
  • Disallow and/or filter whitespace from inputs.
  • There maybe the corner case of an injectible sub-query or local LET

@lvca
Copy link
Contributor

lvca commented Nov 19, 2024

Our postgres driver implementation supports prepared statements. Look at:

final int paramFormatCount = channel.readShort();
. Postgres protocol has the "BIND" message where parameters are passed and then executed with ArcadeDB in a safe manner..

@jonmdev have you tried using parameters with the Postgres driver?

@lvca lvca added the question Further information is requested label Nov 19, 2024
@lvca lvca added this to the 24.11.2 milestone Nov 19, 2024
@jonmdev
Copy link
Author

jonmdev commented Nov 19, 2024

Above, I meant, for example, using a function to wrap the statement and after asserting the wrapping function's arguments, passing those as parameters into the wrapped statement.

That was my original instinct as well. ie. Just do escaping and try to be safe. Probably you are right in many cases. But it seems like an game of Russian roulette. As I read more over the past few weeks, everything I have encountered has told me this approach is extremely dangerous. If you slip up or someone out-thinks you you can have a major data breach or the service could be wiped out. I am not so confident in myself in that manner.

If your system requires allowing users to input text and then have that written into the database, the only certainly safe way everyone says is using prepared statements.

For example, I asked ChatGPT something like if this was safe (eg. HTTP Request or Postgres Simple Query):

user_input = sanitize(user_input) //where sanitize is an external function to check for escaping or other problems
let query_result = client.simple_query("SELECT ${user_input} FROM Customer"); //hypothetically, just to show syntax

And it says over and over in different words same as every resource I see online including the official one above:

What to Avoid: String Interpolation in SQL Queries

If you attempt to insert user input directly into SQL queries via string interpolation (like this):
let query_result = client.simple_query(SELECT * FROM Customer WHERE name = '${user_input}');
This approach is vulnerable to SQL injection. If an attacker enters malicious input such as '; DROP TABLE Customer; --, they could modify or delete your database. Never use string interpolation to construct SQL queries with user input directly.

They suggest we need prepared statements such as:

Safe String Substitution in SQL: When building SQL queries or similar strings, parameterized queries are the most reliable and secure method of including user input inside queries. This approach avoids SQL injection vulnerabilities and ensures input is properly escaped and handled.

Then they give examples of prepared (parameterized) queries like:

  const sanitized_input = sanitize(user_input);
  // Parameterized query
  const res = await client.query(
    'SELECT * FROM Customer WHERE name = $1', // $1 is a placeholder
    [sanitized_input]  // user input passed as parameter
  );

Even as I look at the API's for Postgres adapters they keep warning this constantly and every Stackoverflow thread I pull up says the same. So while it may not be so vulnerable as they say (based on your points), I think it is good enough for me now to read that enough that I am paranoid and do not wish to touch anything except by prepared/parameterized statements. Probably I suspect everyone should then just follow that approach as there is no harm and only safety and benefit.

It is just then a matter of how we do it then to figure out, but I think it definitely ought to be the default approach to be safe.

@gramian
Copy link
Collaborator

gramian commented Nov 19, 2024

To my understanding this is not how parameters work, at least in ArcadeDB. A parameter is placed into the parameterized query or command as its type. For example SELECT FROM doc WHERE name = :myparam becomes SELECT FROM doc WHERE name = 'test' or SELECT FROM doc WHERE name = 4 depending of the type of the parameter myparam.

@jonmdev
Copy link
Author

jonmdev commented Nov 19, 2024

Our postgres driver implementation supports prepared statements.

That would be great @lvca. I am not sure how this would work, though. Because the documentation we have on Arcade says that Arcade only supports "simple" mode for Postgres.

All the documentation I have on "simple postgres mode" suggests this does not permit parameterized/prepared statements but rather just plain string ones like HTTP.

For example, the Postgres package I got working for simple mode with Rust looks like this in practice:

    let mut config = Config::new();
    config.host("localhost")
          .user("root")
          .password("password")
          .dbname("mydb");
    let result: Result<Client, postgres::Error> = config.connect(NoTls);
    let mut client = result.expect("FAILED TO UNWRAP CLIENT");  // This will panic if the connection fails
    
    let query_result = client.simple_query("SELECT * FROM Customer");

This package is the main one for Rust with >5 million downloads and is here:

https://crates.io/crates/postgres

You can see the documentation on Simple Queries here:

https://docs.rs/postgres/latest/postgres/enum.SimpleQueryMessage.html?search=simplequery

pub fn simple_query(
&mut self,
query: &str,
) -> Result<Vec<SimpleQueryMessage>, Error>

Executes a sequence of SQL statements using the simple query protocol.

Statements should be separated by semicolons. If an error occurs, execution of the sequence will stop at that point. The simple query protocol returns the values in rows as strings rather than in their binary encodings, so the associated row type doesn’t work with the FromSql trait. Rather than simply returning the rows, this method returns a sequence of an enum which indicates either the completion of one of the commands, or a row of data. This preserves the framing between the separate statements in the request.

Warning

Prepared statements should be used for any query which contains user-specified data, as they provided the functionality to safely embed that data in the request. Do not form statements via string concatenation and pass them to this method!

So they are again warning us not to use this method as it only takes a string input. The prepared version (correct query method) is query, which does clearly allow parameters in the arguments, but this is then not "simple" mode.

ChatGPT confirms this as well:

Question: does simple mode for postgres allow parameters?

In PostgreSQL, the simple_query method provided by the PostgreSQL client libraries (like pg for Node.js) is typically a low-level query execution function that does not support parameterized queries directly. Instead, you would typically use query or preparedStatement methods for executing queries with parameters safely.

The simple_query method is generally used for executing straightforward SQL queries. However, it does not support parameterized queries or placeholders (like $1, $2, etc.). It's more useful for quick, single-line queries or administrative commands that don't require dynamic user input.

This method just sends the query directly to PostgreSQL without any parameterization, so you cannot use placeholders or safely insert user input into the query.

query (Supports Parameterized Queries)

On the other hand, the query method is the one you would typically use if you need to pass parameters to your query safely.
You can safely use parameterized queries with placeholders (like $1, $2, etc.) using query.

And their conclusion:

Why Use query over simple_query for Parameters?

  • simple_query doesn't support placeholders or parameterized queries, which means you must manually escape values or rely on external sanitization, which can be error-prone and insecure.

  • query with placeholders like $1, $2, etc., ensures safe and secure handling of dynamic values (like user input) and is the recommended method to avoid SQL injection vulnerabilities.

This then matches the Rust API. Thus if the Postgres system on your end allows parameterized queries, it does not require "simple" mode, or at least I believe these are two counteracting things.

Thanks for any further thoughts.

@gramian
Copy link
Collaborator

gramian commented Nov 19, 2024

If this is a response to me: sorry, I was referring to the HTTP API here.

@jonmdev
Copy link
Author

jonmdev commented Dec 7, 2024

Any thoughts @lvca on implementing a method to allow this? I keep reading more ways to try to manage the issue but everything I read doesn't sound good.

It would be good to have a solution. I see this was added to the next milestone. I am just curious your thoughts if you don't mind sharing. 🙂

For example, if I ask ChatGPT:

Can a hacker escape sql and damage my server if I am escaping all ' and " characters as \' and \" in user provided strings?

Escaping ' and " characters (as \' and \") is an important step, but it is not enough to fully protect against SQL injection attacks. Even if you escape these characters, there are still other methods that a hacker can use to potentially escape SQL and damage your server. Here's why:

1. Other SQL Metacharacters Not Escaped

There are many other SQL metacharacters and techniques that hackers can use, even if quotes are escaped:

  • Semicolon (;): This can terminate one SQL statement and start another. If not handled properly, attackers can use it to execute multiple queries in a single request, such as dropping tables or inserting malicious data.

    • Example: SELECT * FROM users WHERE username = 'admin'; DROP TABLE users;
  • Comment Syntax (--, #, /* */): Attackers can use comments to ignore parts of the query after a certain point, bypassing your intended logic.

    • Example: SELECT * FROM users WHERE username = 'admin' -- AND password = 'password'
  • Other Characters ((, ), &, |, =, etc.): These can be used in other injection techniques, like logical operations (e.g., OR 1=1), altering the query structure, or chaining statements together.

2. Blind SQL Injection

Even if you're escaping quotes, an attacker might still be able to perform a blind SQL injection. In this case, the attacker can infer information from the database by using conditional logic and observing the application's behavior (e.g., timing differences or response codes). A time-based blind SQL injection can be particularly dangerous because it doesn't require error messages from the server.

  • Example: SELECT * FROM users WHERE username = 'admin' AND IF(1=1, SLEEP(5), 0) could cause a delay in the response, indicating that the condition is true.

3. Error-Based SQL Injection

Even if input is properly escaped, detailed error messages from the database could still reveal useful information about the structure of your database (e.g., table names, column names, etc.), which an attacker can use for further attacks. Attackers might inject malicious queries to deliberately trigger errors that leak sensitive information.

  • Example: A database might return an error like "Unknown column 'password123' in 'field list'", which reveals table and column names to the attacker.

4. Second-Order SQL Injection

Escaping input properly only protects against immediate queries, but second-order SQL injection involves injecting malicious data that is later executed when retrieved from the database. For instance, an attacker might inject a payload into a field that will later be executed in a different query (e.g., when the data is retrieved and used in an SQL query).

  • Example: An attacker inputs ' OR 1=1 -- into a registration form, and this data is later used in a query when they log in, which could allow unauthorized access.

5. NoSQL Injection or Other Injection Vulnerabilities

If you're using a NoSQL database like MongoDB, escaping quotes won't protect you. NoSQL databases have their own query syntaxes and injection techniques. An attacker may be able to exploit other types of injections that are specific to the database you are using (e.g., MongoDB injections).

6. Stored Procedures and Dynamic SQL

If you're using dynamic SQL or stored procedures, escaping quotes may not be enough. If input is dynamically inserted into the SQL statements, it can still result in vulnerabilities if the input is not handled properly in the dynamic context.

  • Example (dynamic SQL in MySQL):
    SET @query = CONCAT('SELECT * FROM users WHERE username = "', user_input, '"');
    PREPARE stmt FROM @query;
    EXECUTE stmt;

Best Practices for Preventing SQL Injection

While escaping quotes is part of the solution, it is not sufficient on its own. Here are the best practices for properly preventing SQL injection:

  1. Use Parameterized Queries (Prepared Statements):

    • With parameterized queries, user input is never treated as executable code. The database engine knows that the user input is data, not SQL code, so it can't inject malicious code. This is the most reliable and secure method.
    • Example (using MySQLi in PHP):
      $stmt = $conn->prepare("SELECT * FROM users WHERE username = ? AND password = ?");
      $stmt->bind_param("ss", $username, $password);
      $stmt->execute();
  2. Use ORM (Object-Relational Mapping) Libraries:

    • Many modern programming languages and frameworks come with ORM libraries that automatically use parameterized queries and abstract away SQL execution. This reduces the risk of SQL injection.
  3. Limit Database Permissions:

    • Limit the database user’s permissions to only what's necessary. For example, if the user only needs to read from a table, don't grant them permissions to modify or delete tables.
  4. Use Web Application Firewalls (WAFs):

    • A WAF can help detect and block known attack patterns (including SQL injection attempts) before they reach your application.
  5. Input Validation and Sanitization:

    • In addition to escaping characters, always validate and sanitize user input. Ensure that input matches the expected format (e.g., a valid email address or number) and reject input that doesn't conform to these expectations.
  6. Regularly Update and Patch:

    • Keep your database, web server, and application frameworks up to date with security patches. Vulnerabilities in outdated software can provide entry points for attackers.

Conclusion

Escaping single and double quotes is a good practice but insufficient on its own to fully prevent SQL injection attacks. You should adopt parameterized queries, input validation, and other best practices (such as least privilege access and error handling) to ensure that your application is secure against SQL injection and other attack vectors.

@lvca
Copy link
Contributor

lvca commented Dec 7, 2024

Again, our postgres driver supports prepared statements. Have you tried and failed? What is the error?

@lvca
Copy link
Contributor

lvca commented Dec 7, 2024

Just asked to Llama3.2 how to use prepared statement with that rust driver:

To execute a prepared statement using the Rust postgres driver, you can follow these steps:

Step 1: Create a Connection

First, establish a connection to your PostgreSQL database using the postgres crate.

use postgres::{Connection, TlsMode};

let conn = Connection::connect("host=localhost user=myuser password=mypass dbname=mydb", TlsMode::None).unwrap();

Step 2: Prepare the Statement

Prepare the SQL statement using the prepare method.

let stmt = conn.prepare("SELECT * FROM mytable WHERE id = $1").unwrap();

Step 3: Execute the Prepared Statement

Execute the prepared statement using the query_one or query method, depending on your use case.

let row = stmt.query_one(&[&1]).unwrap();

Alternatively, you can use the query method to execute the prepared statement with multiple parameters.

let rows = stmt.query(&[&1, &"hello"]).unwrap();

Note that the query_one method returns a single row, while the query method returns an iterator over the result set.

Error Handling

Don't forget to handle errors properly using Result or unwrap. In a real-world application, you should handle errors more robustly.

use postgres::{Connection, TlsMode, Error};

let conn = Connection::connect("host=localhost user=myuser password=mypass dbname=mydb", TlsMode::None);

match conn {
    Ok(conn) => {
        let stmt = conn.prepare("SELECT * FROM mytable WHERE id = $1");

        match stmt {
            Ok(stmt) => {
                let row = stmt.query_one(&[&1]);

                match row {
                    Ok(row) => println!("Row: {:?}", row),
                    Err(err) => println!("Error: {}", err),
                }
            }
            Err(err) => println!("Error preparing statement: {}", err),
        }
    }
    Err(err) => println!("Error connecting to database: {}", err),
}

@jonmdev
Copy link
Author

jonmdev commented Dec 8, 2024

Yes, @lvca , I have tried with the parameterized method, but it doesn't appear to be working. Is there anything we need to do to activate it in Arcade?

The basic Rust method for parameterized/prepared Postgres query is:

    let first_name="Carl";
    let query= "SELECT * FROM Customer WHERE firstName = $1";
    let query_result = client.query(query, &[&first_name]);

This is illustrated directly from the documentation:

Executes a statement, returning the resulting rows.
A statement may contain parameters, specified by $n, where n is the index of the parameter of the list provided, 1-indexed.
The query argument can either be a Statement, or a raw query string. If the same statement will be repeatedly executed (perhaps with different query parameters), consider preparing the statement up front with the prepare method.
Examples
use postgres::{Client, NoTls};
let mut client = Client::connect("host=localhost user=postgres", NoTls)?;
let baz = true;
for row in client.query("SELECT foo FROM bar WHERE baz = $1", &[&baz])? {
let foo: i32 = row.get("foo");
println!("foo: {}", foo);
}

This query gets the following from Arcade when it is run:

bin/server.sh -Darcadedb.server.databaseDirectory=/usr/local/arcade_db/arcade_data -Darcadedb.server.plugins="Postgres:com.arcadedb.postgres.PostgresProtocolPlugin" -Darcadedb.postgres.debug=true 
2024-12-08 03:31:14.201 INFO  [ArcadeDBServer] <ArcadeDB_0> ArcadeDB Server started in 'development' mode (CPUs=8 MAXRAM=7.80GB)
2024-12-08 03:31:14.212 INFO  [ArcadeDBServer] <ArcadeDB_0> Studio web tool available at http://172.30.97.77:2480
2024-12-08 03:31:21.278 INFO  [PostgresNetworkExecutor] PSQL:-> request for password (R - 8b) (thread=36)
2024-12-08 03:31:21.355 INFO  [PostgresNetworkExecutor] PSQL:-> authentication ok (R - 8b) (thread=36)
2024-12-08 03:31:21.357 INFO  [PostgresNetworkExecutor] PSQL:-> backend key data (K - 12b) (thread=36)
2024-12-08 03:31:21.358 INFO  [PostgresNetworkExecutor] PSQL:-> parameter status (S - 24b) (thread=36)
2024-12-08 03:31:21.358 INFO  [PostgresNetworkExecutor] PSQL:-> parameter status (S - 25b) (thread=36)
2024-12-08 03:31:21.359 INFO  [PostgresNetworkExecutor] PSQL:-> parameter status (S - 25b) (thread=36)
2024-12-08 03:31:21.359 INFO  [PostgresNetworkExecutor] PSQL:-> ready for query (Z - 5b) (thread=36)
2024-12-08 03:31:21.360 INFO  [PostgresNetworkExecutor] PSQL: parse (portal=s0) -> SELECT * FROM Customer WHERE firstName = $1 (params=0) (errorInTransaction=false thread=36)
2024-12-08 03:31:21.426 INFO  [PostgresNetworkExecutor] PSQL:-> parse complete (1 - 4b) (thread=36)
2024-12-08 03:31:21.427 INFO  [PostgresNetworkExecutor] PSQL: describe 's0' type=S (errorInTransaction=false thread=36)
2024-12-08 03:31:21.427 INFO  [PostgresNetworkExecutor] PSQL:-> no data (n - 4b) (thread=36)
2024-12-08 03:31:21.427 INFO  [PostgresNetworkExecutor] PSQL: sync (thread=36)
2024-12-08 03:31:21.428 INFO  [PostgresNetworkExecutor] PSQL:-> ready for query (Z - 5b) (thread=36)

It claims there as you can see that it received params=0

2024-12-08 03:31:21.360 INFO  [PostgresNetworkExecutor] PSQL: parse (portal=s0) -> SELECT * FROM Customer WHERE firstName = $1 (params=0) (errorInTransaction=false thread=36)

If it is any help, I have shared my demo project which can be run just by cargo run here: https://github.com/jonmdev/postgres_test

The full code is just a few functions: https://github.com/jonmdev/postgres_test/blob/master/src/main.rs

You can toggle in or out of main run_postgres_prepared or run_postgres_simple then see the outcomes upon cargo run.

Thanks for any help.

@lvca lvca modified the milestones: 24.11.2, 24.12.1 Dec 9, 2024
@robfrank robfrank self-assigned this Dec 17, 2024
@lvca lvca modified the milestones: 24.12.1, 25.1.1 Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants