Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PSQL KeyValueStore implementation for python 2/3 compatibility #365

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions docs/release_notes/pending_release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,10 @@ Updates / New Features
Fixes
-----

Representation

- ``KeyValueStore``

- PostgreSQL

- Fix python<->SQL bytea conversion functions for python 2/3 compatibility.
23 changes: 18 additions & 5 deletions python/smqtk/representation/key_value/postgres.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,15 @@
class PostgresKeyValueStore (KeyValueStore):
"""
PostgreSQL-backed key-value storage.

NOTE: Due to the differences in pickle serialization between python
versions 2 and 3 (even with the same protocol version) the same underlying
postgresql table should not be shared between instances of
``PostgresKeyValueStore`` in different python major versions.
"""

PICKLE_PROTOCOL_VER = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Python 3 attempt to read objects that have been pickled in Python 2? The documentation makes it seem like this is the case, which would defeat the purpose of changing the protocol number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A) I just noticed that I didn't try using the fix_imports flag for python 3's pickle. I'll have to try that in my compatibility failures I was seeing.

B) A python 3 instance of PostgresKeyValueStore will attempt to read the data in the table and columns its pointed to, even if that data was written by a python 2 instance of PostgresKeyValueStore. I had tried using a pinned protocol version of 2 (since both python versions understand that version) but was still seeing compatibility issues. This may have been the correct thing to do when I try (A).


class SqlTemplates (object):
"""
Container for static PostgreSQL queries used by the containing class.
Expand Down Expand Up @@ -198,16 +205,22 @@ def _py_to_bin(k):
:rtype: psycopg2.Binary

"""
return psycopg2.Binary(pickle.dumps(k))
return psycopg2.Binary(pickle.dumps(
k, protocol=PostgresKeyValueStore.PICKLE_PROTOCOL_VER
))

@staticmethod
def _bin_to_py(b):
"""
Un-"translate" psycopg2.Binary value (buffer) to a python type.
Un-"translate" binary return from a psycopg2 query (buffer) to a python
object instance.

:param b: ``psycopg2.Binary`` buffer instance as retrieved from a
PostgreSQL query.
:type b: psycopg2.Binary
:param b: Buffer instance as retrieved from a PostgreSQL query. This
may be either a ``buffer`` instead (python 2.x) or a ``memoryview``
instance (python 3.x). Generally, the type passed here should be
passable to the ``bytes`` constructor to get the underlying byte
string.
:type b: buffer | memoryview

:return: Python object instance as loaded via pickle from the given
``psycopg2.Binary`` buffer.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,8 @@ def test_remove(self, m_gcp):
the mock cursor.
"""
expected_key = 'test_remove_key'
expected_key_bytea = bytes(
PostgresKeyValueStore._py_to_bin(expected_key))
expected_key_bytea = \
PostgresKeyValueStore._py_to_bin(expected_key).getquoted()

# Cut out create table calls.
s = PostgresKeyValueStore(create_table=False)
Expand All @@ -66,7 +66,12 @@ def test_remove(self, m_gcp):
"DELETE FROM .+ WHERE .+ LIKE .+")
self.assertEqual(set(mock_execute.call_args[0][1].keys()),
{'key_like'})
self.assertEqual(bytes(mock_execute.call_args[0][1]['key_like']),
# Value passed to ``cursor.execute`` should be the result of
# ``_py_to_bin``, AKA the ``psycopg2.Binary`` instance. Compare
# using the quoted representation that will be used in the SQL
# query.
self.assertIsNotNone(mock_execute.call_args[0][1]['key_like'].getquoted())
self.assertEqual(mock_execute.call_args[0][1]['key_like'].getquoted(),
expected_key_bytea)

@mock.patch('smqtk.utils.postgres.get_connection_pool')
Expand Down Expand Up @@ -156,22 +161,28 @@ def test_remove_many(self, m_psqlExecBatch, m_gcp):
"%(key_tuple)s"
mock_execute_call_args = mock_execute.call_args[0]
self.assertEqual(mock_execute_call_args[0], expected_has_q)
# Value passed to ``cursor.execute`` should be the result of
# ``_py_to_bin``, AKA the ``psycopg2.Binary`` instance. Compare
# using the quoted representation that will be used in the SQL
# query.
self.assertEqual(set(mock_execute_call_args[1].keys()), {'key_tuple'})
for k in mock_execute_call_args[1]['key_tuple']:
self.assertIsNotNone(k.getquoted())
self.assertEqual(
[bytes(k) for k in mock_execute_call_args[1]['key_tuple']],
[bytes(k) for k in [exp_key_1_bytea, exp_key_2_bytea]]
[k.getquoted() for k in mock_execute_call_args[1]['key_tuple']],
[k.getquoted() for k in [exp_key_1_bytea, exp_key_2_bytea]]
)

# Confirm call arguments to ``psycopg2.extras.execute_batch``
expected_del_q = "DELETE FROM data_set WHERE key LIKE %(key_like)s"
expected_del_vals = [{'key_like': exp_key_1_bytea},
{'key_like': exp_key_2_bytea}]
psqlExecBatch_call_args = m_psqlExecBatch.call_args[0]
self.assertEqual(psqlExecBatch_call_args[0], mock_cursor)
self.assertEqual(psqlExecBatch_call_args[1], expected_del_q)
# 3rd argument is a list of dictionaries for 'key_like' replacements
self.assertEqual(len(psqlExecBatch_call_args[2]), 2)
for d in psqlExecBatch_call_args[2]:
self.assertIsNotNone(d['key_like'].getquoted())
self.assertListEqual(
[bytes(d['key_like']) for d in psqlExecBatch_call_args[2]],
[bytes(exp_key_1_bytea), bytes(exp_key_2_bytea)]
[d['key_like'].getquoted() for d in psqlExecBatch_call_args[2]],
[exp_key_1_bytea.getquoted(), exp_key_2_bytea.getquoted()]
)