-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support optimizer hints for select queries #260
Comments
We've tried optimizer hints before and found out that they're just as likely to hurt performance as to help. There are several reasons why optimizer hints are not a good idea.
Even some database vendors consider hints to be an anti-pattern. |
agree that it is better to trust the optimizer of db is doing a good job on execution plan for most queries, but no all queries, at least not for the query we are facing.
could hardly agree on the this point, whatever hints we human add to the query is not only basing on the data as of now, but prediction on data change in future as well, which the DB will never know.
the correct index was chosen after we analyzed the table manually, but sadly the execution plan was changed back to use wrong index in several hours.
sadly, the wrong index which is being used is primary key, which could not be removed or changed.
here is the query, very simple, 1 table, without any or-clause or in-clause. primary key is on item_seq and processing_date_to;
|
Given this query, I would also choose the The other index is rather bizarre. The advice I gave you was very specific: "arrange columns in an index in selectivility order". For this triplet, that's either |
considering the order by clause, in worst case: item_seq='yyy' located at the first line, and the last record satisfied the condition located at the end, any index start from item_seq will cause a full index table scan. |
For this query, an index like From your description, it sounds like P.S. Are you using the database as a queue? There is no fix for that particular anti-pattern, other than to simply stop and use the right tool for that instead. |
the purpose of this query is to fetch all item data of specified cp_seq, with pagination. |
if that's the case, the most appropriate index is: If you're not willing to try that, there is no point in continuing this conversation.
Yes, you do. It's called a count/group by query. |
please refer to the test result below, hint used to force index to be selected. case 1: as you suggested case 2: case 3: |
are you sure you're not running into caching effects here? Did you execute them in that order? What happens if you execute them in a different order or multiple times? |
case 1: as you suggested case 3: case 2: |
so you told me the difficult case was when cp_seq had millions of rows, for which I suggested an index and then choose a query with a result of 56 rows? I already told you what you should do with the rare cp case: issue the query without conditions on item_seq. |
this test is executed on local environment, data is different from real environment, which we could not add any index simply. |
on other hand, I checked the release notes of later version of Mysql. here is the bug report here is the release note DBA prepared 2 testing environment with the version we are using now, and the latest version, with new environment variable prefer_ordering_index to off, and imported same data from real environment. but obviously, the optimizer of DB itself have a limitation. |
Where you're running the query shouldn't matter. What does matter is running the right query. First, run this query: SELECT count(1), cp_seq FROM item t0 WHERE t0.processing_date_to = '9999-12-01 23:59:00.0' group by t0.cp_seq Then based on that, either issue a query with or without item_seq conditions. If you do that, you shouldn't have to force any indices. |
Data keeps changing every minute, and our system has been running for 3 years.
or you can simply assume the item count of each cp_seq will be between 0 and 10% of total records. |
The data from the group by is an estimate. You don't have to do it with every query. Just do it once an hour in the background. If the estimate is low (< 10K or not found), issue without item_seq. If it's high (> 10K), issue with item_seq. |
this is to implement an optimizer ourselves, which is far costy than using a simple hint clause. if you could kindly support. |
How is it costly? If it takes you more than 50 lines of code to do this, you're doing it wrong. It uses correct data science to issue proper queries, now and forever, instead of an irrational belief that a human can predict data distributions years into the future. |
sorry, is there any evidence shows the solution you shared is using correct data science and will retrieve a higher performance? the fact is, in real environment, the item count is more than 1 million, and local environment, it is 56.
|
did you follow what I said? In this particular case, you have to: in your production environment, add a in your test environment, run the query without item_seq clauses. |
what do you mean
|
with: SELECT * FROM item t0 WHERE t0.cp_seq = 'xxx' AND t0.processing_date_to = '9999-12-01 23:59:00.0' AND t0.item_seq >= 'yyy' ORDER BY t0.item_seq LIMIT 1002; without
|
for those cp_seqs which owns less than 10k, you mean we have to change specification of our system to improve performance, right? what about the performance more than 10k? |
fortunately there are some cp_seqs which own items more than 1 million in local environment. case 1: as you suggested case 3: case 2: |
and the primary key is selected if hint to force index is removed. |
hence, we still need your help to support hint, to force to use the index you suggested. |
Something isn't making sense. What's the output of |
there is no other index available for this query. case 1: as you suggested case 2: case 3: |
Try the following: Also, issue this query: SELECT count(1), cp_seq, min(item_seq) FROM item t0 WHERE t0.processing_date_to = '9999-12-01 23:59:00.0' group by t0.cp_seq then use the min value from the above query in your original query: SELECT * FROM item t0 WHERE t0.cp_seq = 'xxx' AND t0.processing_date_to = '9999-12-01 23:59:00.0' AND t0.item_seq >= min_val_from_above ORDER BY t0.item_seq LIMIT 1002; |
since the parameter used in above execution exactly satisfy your condition, same parameter executed with the new index you suggested. case 4: as you suggested by the way, primary key is still being selected by Mysql, after the new index created. |
On other hand, I reported the issue to Mysql, and it has been verified. but, I don"t think it will be resolved, since there are more than 600 bugs verified on optimizer, and some of them are reported more than 10 years back. hence, hint seems is the optimal solution, we are still looking forward to your support. |
I'm waiting on the results of this:
SELECT count(1), cp_seq, min(item_seq) FROM item t0 WHERE t0.processing_date_to = '9999-12-01 23:59:00.0' group by t0.cp_seq
SELECT * FROM item t0 WHERE t0.cp_seq = 'xxx' AND t0.processing_date_to = '9999-12-01 23:59:00.0' AND t0.item_seq >= min_val_from_above ORDER BY t0.item_seq LIMIT 1002; |
I have already tried as you suggested and here is the result. |
There are too many things labeled "case x" in this conversation. Can you clarify what exact queries you ran? |
As mentioned earlier, hint is used to force index to be used in the query. here is the query, the index name is the only difference in
|
what was the value for |
Can you please explain does the value matter? |
The point of #260 (comment) was to use values that are specific to each |
any item_seq maybe used in the query, data keeps changing. |
so just so we're clear, you did not follow the instructions provided in #260 (comment) ? |
I don't understand, do you mean that we are facing performance issue with the parameter for example 'xxx' and 'yyy', and you are telling us to use parameter 'aaa' and 'bbb' to resolve it? |
No, I'm trying to see if there is a better way to construct the query. Let's take a step back: where does the query value ( |
as you mentioned, the first item_seq of target cp_seq. |
If you have mysql environment and would like to tune the query, please refer to the bug details, there is simple way to generate testing data, and repeat the bug. and also you may have an image on the data distribution. |
So in your production code, you run the aggregate query before running the normal query? and for the test results here you've done the same? |
the parameters are from client, not decided by us. |
Hi, is it good to commit a change to support optimizer hints for select queries?
The text was updated successfully, but these errors were encountered: