Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] OWSql does not save selected table/query #2659

Merged
merged 7 commits into from
Oct 17, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 16 additions & 7 deletions Orange/data/sql/backend/mssql.py
Original file line number Diff line number Diff line change
Expand Up @@ -117,17 +117,26 @@ def _guess_variable(self, field_name, field_metadata, inspect_table):
EST_ROWS_RE = re.compile(r'StatementEstRows="(\d+)"')

def count_approx(self, query):
try:
with self.connection.cursor() as cur:
with self.connection.cursor() as cur:
try:
cur.execute("SET SHOWPLAN_XML ON")
try:
cur.execute(query)
result = cur.fetchone()
return int(self.EST_ROWS_RE.search(result[0]).group(1))
except AttributeError:
# This is to catch a bug from SQL Server 2012
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the query plan in SQL Server 2012 produces an XML, which does not match the EST_ROWS_RE regexp as it represents the number of rows as float? If this is the case, why not update the regular expression to match both?

Instead of

StatementEstRows="(\d+)"

which expects an integer between the parentheses, we could use something like this

r'StatementEstRows="(\d+)(\.\d*)?"'

the second regular expressions allow the number to end with a decimal point and some numbers, but only parses the whole part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exacty, after searching it seem that our SQL server statistics are not maintained often enough and this results in bad values being returned in form of a float thus the AttributeError. I'm not sure if a well maintained server would return a float: I have a second database where statistics are maintained and I never received a float from the Plan query.
It would require more samples to figure out what to do. This code works in all situations, the drawback is the execution time if a float is returned. Maybe there should be a warning issued like those deprecated warnings found in some libraries saying that "statistics are out of date, SELECT count(*) used." but I do not know how to throw such a message in Orange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code comments to explain the reason of the AttributeError. If someone ever comes back with a float with the right estimated result, the code will have to be updated as you suggested and if possible with a sanity check on the statistics (so un-maintained statistics would be caught).

# resulting StatementEstRows is float instead of int
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read your comment and I agree, if the float is a result of a reallly bad estimate, it should not be used. I still do not like the flow of this function though. I would put it like this:

match = self.EST_ROWS_RE.search(result[0])
if not match:
    # Comment explaining why this happens
    return None
return int(match.group(1))

finally:
cur.execute("SET SHOWPLAN_XML OFF")
except pymssql.Error as ex:
if "SHOWPLAN permission denied" in str(ex):
warnings.warn("SHOWPLAN permission denied, count approximates will not be used")
return None
raise BackendError(str(ex)) from ex
except pymssql.Error as ex:
if "SHOWPLAN permission denied" in str(ex):
warnings.warn("SHOWPLAN permission denied, count approximates will not be used")
return None
raise BackendError(str(ex)) from ex
# In case of an AttributeError, give a second chance:
# Use the long method for counting (or upgrade SQL Server version :) )
cur.execute("SELECT count(*) FROM ( {} ) x".format(query))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary. When count_approx returns None, an exact count is already performed (see SqlTable.count_approx)

result = cur.fetchone()
return result[0]
62 changes: 38 additions & 24 deletions Orange/widgets/data/owsql.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
import sys
from collections import OrderedDict

Expand All @@ -20,7 +21,7 @@
MAX_DL_LIMIT = 1000000


class TableModel(PyListModel):
class TableModels(PyListModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be singular Model as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have anticipated this one.

def data(self, index, role=Qt.DisplayRole):
row = index.row()
if role == Qt.DisplayRole:
Expand All @@ -43,7 +44,7 @@ class OWSql(OWWidget):
icon = "icons/SQLTable.svg"
priority = 30
category = "Data"
keywords = ["data", "file", "load", "read"]
keywords = ["data", "file", "load", "read", "SQL"]

class Outputs:
data = Output("Data", Table, doc="Attribute-valued data set read from the input file.")
Expand Down Expand Up @@ -85,10 +86,10 @@ def __init__(self):
vbox = gui.vBox(self.controlArea, "Server", addSpace=True)
box = gui.vBox(vbox)

self.backendmodel = BackendModel(Backend.available_backends())
self.backends = BackendModel(Backend.available_backends())
self.backendcombo = QComboBox(box)
if len(self.backendmodel):
self.backendcombo.setModel(self.backendmodel)
if len(self.backends):
self.backendcombo.setModel(self.backends)
else:
self.Error.no_backends()
box.setEnabled(False)
Expand Down Expand Up @@ -124,17 +125,24 @@ def __init__(self):
box.layout().addWidget(self.passwordtext)

self._load_credentials()
self.tablemodels = TableModels()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tablemodel or tables, but not tablemodels


tables = gui.hBox(box)
self.tablemodel = TableModel()
self.tablecombo = QComboBox(
minimumContentsLength=35,
sizeAdjustPolicy=QComboBox.AdjustToMinimumContentsLength
)
self.tablecombo.setModel(self.tablemodel)
self.tablecombo.setModel(self.tablemodels)
self.tablecombo.setToolTip('table')
tables.layout().addWidget(self.tablecombo)
self.connect()

index = self.tablecombo.findText(str(self.table))
if index != -1:
self.tablecombo.setCurrentIndex(index)
# set up the callback to select_table in case of selection change
self.tablecombo.activated[int].connect(self.select_table)

self.connectbutton = gui.button(
tables, self, '↻', callback=self.connect)
self.connectbutton.setSizePolicy(
Expand Down Expand Up @@ -167,7 +175,8 @@ def __init__(self):
callback=self.open_table)

gui.rubber(self.buttonsArea)
QTimer.singleShot(0, self.connect)

QTimer.singleShot(0, self.select_table)

def _load_credentials(self):
self._parse_host_port()
Expand All @@ -182,8 +191,8 @@ def _load_credentials(self):

def _save_credentials(self):
cm = self._credential_manager(self.host, self.port)
cm.username = self.username
cm.password = self.password
cm.username = self.username or ''
cm.password = self.password or ''

def _credential_manager(self, host, port):
return CredentialManager("SQL Table: {}:{}".format(host, port))
Expand Down Expand Up @@ -217,7 +226,7 @@ def connect(self):
try:
if self.backendcombo.currentIndex() < 0:
return
backend = self.backendmodel[self.backendcombo.currentIndex()]
backend = self.backends[self.backendcombo.currentIndex()]
self.backend = backend(dict(
host=self.host,
port=self.port,
Expand All @@ -232,24 +241,24 @@ def connect(self):
("Database", self.database), ("User name", self.username)
))
self.refresh_tables()
self.select_table()
except BackendError as err:
error = str(err).split('\n')[0]
self.Error.connection(error)
self.database_desc = self.data_desc_table = None
self.tablecombo.clear()

def refresh_tables(self):
self.tablemodel.clear()
self.tablemodels.clear()
self.Error.missing_extension.clear()
if self.backend is None:
self.data_desc_table = None
return

self.tablemodel.append("Select a table")
self.tablemodel.extend(self.backend.list_tables(self.schema))
self.tablemodel.append("Custom SQL")
self.tablemodels.append("Select a table")
self.tablemodels.append("Custom SQL")
self.tablemodels.extend(self.backend.list_tables(self.schema))

# Called on tablecombo selection change:
def select_table(self):
curIdx = self.tablecombo.currentIndex()
if self.tablecombo.itemText(curIdx) != "Custom SQL":
Expand All @@ -260,6 +269,8 @@ def select_table(self):
self.data_desc_table = None
self.database_desc["Table"] = "(None)"
self.table = None
if len(str(self.sql)) > 14:
return self.open_table()

#self.Error.missing_extension(
# 's' if len(missing) > 1 else '',
Expand All @@ -272,19 +283,22 @@ def open_table(self):
self.Outputs.data.send(table)

def get_table(self):
if self.tablecombo.currentIndex() <= 0:
curIdx = self.tablecombo.currentIndex()
if curIdx <= 0:
if self.database_desc:
self.database_desc["Table"] = "(None)"
self.data_desc_table = None
return

if self.tablecombo.currentIndex() < self.tablecombo.count() - 1:
self.table = self.tablemodel[self.tablecombo.currentIndex()]
if self.tablecombo.itemText(curIdx) != "Custom SQL":
self.table = self.tablemodels[self.tablecombo.currentIndex()]
self.database_desc["Table"] = self.table
if "Query" in self.database_desc:
del self.database_desc["Query"]
what = self.table
else:
self.sql = self.table = self.sqltext.toPlainText()
what = self.sql = self.sqltext.toPlainText()
self.table = "Custom SQL"
if self.materialize:
import psycopg2
if not self.materialize_table_name:
Expand All @@ -297,11 +311,10 @@ def get_table(self):
pass
with self.backend.execute_sql_query("CREATE TABLE " +
self.materialize_table_name +
" AS " + self.table):
" AS " + self.sql):
pass
with self.backend.execute_sql_query("ANALYZE " + self.materialize_table_name):
pass
self.table = self.materialize_table_name
except (psycopg2.ProgrammingError, BackendError) as ex:
self.Error.connection(str(ex))
return
Expand All @@ -312,7 +325,7 @@ def get_table(self):
database=self.database,
user=self.username,
password=self.password),
self.table,
what,
backend=type(self.backend),
inspect_values=False)
except BackendError as ex:
Expand All @@ -321,8 +334,8 @@ def get_table(self):

self.Error.connection.clear()


sample = False

if table.approx_len() > LARGE_TABLE and self.guess_values:
confirm = QMessageBox(self)
confirm.setIcon(QMessageBox.Warning)
Expand All @@ -347,6 +360,7 @@ def get_table(self):
domain = s.get_domain(inspect_values=True)
self.Information.data_sampled()
else:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this empty line is necessary

domain = table.get_domain(inspect_values=True)
QApplication.restoreOverrideCursor()
table.domain = domain
Expand Down