Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix search issue with special characters, and escape them in SQL. #844 #846

Merged
merged 8 commits into from
Aug 15, 2024
3 changes: 2 additions & 1 deletion include/functions_entries.inc.php
Original file line number Diff line number Diff line change
Expand Up @@ -883,7 +883,8 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') {
$term = str_replace('"', '"', $term);
$relevance_enabled = true;
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
$term = serendipity_db_escape_string($term);
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('\"$term\"' IN BOOLEAN MODE)";
} else {
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, we should use db_escape_string here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that mean you can remove if cond if you apply it anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, we should use db_escape_string here too?

I will also fix db_escape_string for mysqli because it use deprecated mysqli alias.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @@[\+\-\*~<>\(\)"].[\S]*@ is currently wrong there, maybe a copy+paste mistake?

Anyhow... I'm afraid we regular-expression based evaluation of this might not bring us further. Because the regexp would match true on a string like: "+valid +valid +".

So I think we have two options here:

A: Introduce a tokenizer that analyzes the whole string and ALL parts in it, and performs checks on it
B: Execute the query and check if it works or not

Approach A will be quite complex and way too much effort to do this reliably. Not sure there is a parser/tokenizer for this out there, maybe part of Doctrine DBAL.

Approach B is a bit nasty but might be little effort. We could execute the query, use the "expect error" flag, and check if there is a result, something like:

$searchMode = 'non-boolean';
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
$termRows = serendipity_db_query("SELECT * FROM serendipity_entries WHERE $term LIMIT 1", true, 'both', false, false, false, true); // Last "true" is important here
  if ($termRows !== false) {
    $searchMode = 'boolean'; // Terms works as boolean expression
  }  
}
if ($searchMode === 'bolean') {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
} else {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
}

(untested, just a thought)

}
Expand Down