Inconsistencies of MapReduce when using http or pbc [JIRA: CLIENTS-131] #394

Open
gglanzani opened this issue Jan 5, 2015 · 6 comments

@gglanzani

I have the following Erlang code (very simple, just for illustration):

-module(grid_mr).

-export([yearfun/3]).

%% Map function: emit {Grid, Year} for objects whose "year" field is after 2006.
yearfun(O, _KeyData, _Arg) ->
  {struct, Map} = mochijson2:decode(riak_object:get_value(O)),
  Year = proplists:get_value(<<"year">>, Map, -1.0),
  Grid = proplists:get_value(<<"grid">>, Map, -1.0),
  case Year > 2006 of
    true -> [{Grid, Year}];
    false -> []
  end.

When I run a MapReduce job via curl, I get:

$ curl -XPOST http://172.17.12.21:8089/mapred \
   -H 'Content-Type: application/json'   \
   -d '{"inputs":[["STATS", "grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2008"],
["STATS", "grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2010"]],"query":[{"map":{"language":"erlang","module":"grid_mr","function":"yearfun"}}]}'

{"C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0":2008,"C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0":2010}

However, when I submit the same job from Python over HTTP:

from riak import RiakClient
from riak import RiakMapReduce
riak = RiakClient(protocol='http', host='172.17.12.22', http_port=8089)
bucket = riak.bucket("STATS")
mr = RiakMapReduce(riak)
keys = ["grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2008", 
        "grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2010"]
mr.add("STATS", keys)
#mr.search(index="grid_stats", query="industry_id:22 AND customer_segmentation_id:6")
mr.map(['grid_mr', 'yearfun'])
for result in mr.run():
    print(result)

I get only the key:

C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0
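
One possible explanation for the single line of output, assuming the HTTP transport decodes the response body with a standard JSON parser (an assumption, not something confirmed in this thread): both results share the same grid key, so they collapse into one dict entry, and iterating over a dict yields only its keys. A minimal sketch using the standard `json` module and the curl output from above:

```python
import json

# Response body copied from the curl call above: a single JSON object in
# which the same key appears twice.
body = (
    '{"C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0":2008,'
    '"C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0":2010}'
)

# With duplicate keys, json.loads keeps only the last name/value pair.
decoded = json.loads(body)

# Iterating over a dict yields its keys, not its values.
printed = ["%s" % result for result in decoded]

for line in printed:
    print(line)
```

If this is what happens inside the client, the year values are still present in the decoded dict but are never printed by the loop, and the 2008 entry is lost at parse time.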

If, instead, I use pbc

from riak import RiakClient
from riak import RiakMapReduce
riak = RiakClient(protocol='pbc', host='172.17.12.22', pb_port=8087)
bucket = riak.bucket("STATS")
mr = RiakMapReduce(riak)
keys = ["grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2008", 
        "grid_stats_C7FF2796D6BD1153E847E84277F6B4A1022E29DACA35989004C10FFA92E2A5F0-2010"]
mr.add("STATS", keys)
#mr.search(index="grid_stats", query="industry_id:22 AND customer_segmentation_id:6")
mr.map(['grid_mr', 'yearfun'])
for result in mr.run():
    print(result)

the client raises an exception

[...]
.virtualenvs/numpy/lib/python2.7/site-packages/riak/transports/pbc/transport.pyc in mapred(self, inputs, query, timeout)
    410         for phase, content in self.stream_mapred(inputs, query, timeout):
    411             if phase in result:
--> 412                 result[phase] += content
    413             else:
    414                 result[phase] = content
TypeError: unsupported operand type(s) for +=: 'dict' and 'dict'
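
The `TypeError` can be reproduced without Riak. The sketch below mirrors the accumulation loop shown in the traceback; `list_chunks` and `dict_chunks` are hypothetical stand-ins for the `(phase, content)` pairs that `stream_mapred` yields, and `grid-a` is a placeholder key.

```python
def accumulate(chunks):
    """Same accumulation pattern as lines 410-414 in the traceback."""
    result = {}
    for phase, content in chunks:
        if phase in result:
            result[phase] += content  # fine for lists, fails for dicts
        else:
            result[phase] = content
    return result


# Hypothetical chunks: one phase that delivers its results in two pieces.
list_chunks = [(0, [2008]), (0, [2010])]
dict_chunks = [(0, {"grid-a": 2008}), (0, {"grid-a": 2010})]

print(accumulate(list_chunks))

try:
    accumulate(dict_chunks)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError: %s" % error_message)
```

Lists support `+=` (concatenation) while dicts do not, which is why the same loop works for list results and fails as soon as a phase returns a dict in more than one chunk.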

Am I doing something wrong, or…? Right now, the only workaround I have found is to change the Erlang code so that it returns JSON-encoded binaries:

-module(binary_grid).

-export([yearfun/3]).

%% Same as grid_mr:yearfun/3, but returns the pair JSON-encoded as a binary.
yearfun(O, _KeyData, _Arg) ->
  {struct, Map} = mochijson2:decode(riak_object:get_value(O)),
  Year = proplists:get_value(<<"year">>, Map, -1.0),
  Grid = proplists:get_value(<<"grid">>, Map, -1.0),
  case Year > 2006 of
    true -> [list_to_binary(mochijson2:encode([{Grid, Year}]))];
    false -> []
  end.
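
With this workaround, each map result reaches the client as a JSON-encoded string rather than a structured term, so the client has to decode it. A minimal sketch, assuming each element returned by `mr.run()` is a string holding one JSON object (the sample `results` list and the `grid-a` key are hypothetical, not output captured from Riak):

```python
import json

# Hypothetical stand-in for the values returned by mr.run() when the map
# function emits JSON-encoded binaries.
results = [
    '{"grid-a": 2008}',
    '{"grid-a": 2010}',
]


def decode_results(results):
    """Decode each JSON string and return a flat list of (grid, year) pairs."""
    pairs = []
    for raw in results:
        for grid, year in json.loads(raw).items():
            pairs.append((grid, year))
    return pairs


for grid, year in decode_results(results):
    print("%s %s" % (grid, year))
```

Returning pairs rather than merging everything into one dict keeps both years even though they share the same grid key.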

@Basho-JIRA changed the title from "Inconsistencies of MapReduce when using http or pbc" to "Inconsistencies of MapReduce when using http or pbc [JIRA: CLIENTS-131]" on Jan 5, 2015
@hazen

hazen commented Jan 7, 2015

@gglanzani We'll take a look at this. I assume it's not related to the mochijson2 issue you reported in #395?

@gglanzani
Author

No, that issue seems to be related to an older version of mochijson2, while this one presents itself only when I'm NOT using mochi.

@hazen

hazen commented Jan 20, 2015

@gglanzani Sorry for the delay. Trying to understand your situation. So you are running your own, custom Erlang yearfun on your nodes. You are getting the desired results in curl, but are only getting a partial result with the Python HTTP interface, right? And PBC seems to raise an exception in this case, right?

I'm looking at our current test cases, and all of them seem to return a list of values, not more complicated types like proplists. I can see that curl simply echoes back whatever Erlang returns, whereas the Python client has to serialize and deserialize. As you can see from the PBC results, it's trying to convert the result into a Python dict(), which is the source of the exception. I find it interesting that HTTP sort of works. I'm guessing it unpacks part of the JSON we get back from Riak and throws the rest on the floor.

So my guess at the moment is the Python client is only supporting returning a list of values. We could look at lists of dicts() as an enhancement.
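
As a rough illustration of what such an enhancement could look like (a sketch only, not the client's actual code; `accumulate_mixed` is a hypothetical helper), the accumulation step could wrap a non-list chunk in a list so that list and dict chunks are handled the same way:

```python
def accumulate_mixed(chunks):
    """Collect (phase, content) chunks into one list of results per phase.

    A chunk that is already a list is extended into the phase's results;
    any other value (for example a dict) is appended as a single item.
    """
    result = {}
    for phase, content in chunks:
        items = content if isinstance(content, list) else [content]
        result.setdefault(phase, []).extend(items)
    return result


# Hypothetical chunks for a single phase, once as lists and once as dicts.
print(accumulate_mixed([(0, [2008]), (0, [2010])]))
print(accumulate_mixed([(0, {"grid-a": 2008}), (0, {"grid-a": 2010})]))
```

Keeping each dict as a separate list item avoids merging dicts, so results that share a key do not overwrite each other.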

@gglanzani
Author

Yes, you gave an accurate description of what (I think) is going on.

I think lists of dicts were fairly common but I see that some extra code is needed to make it work. Btw, curl converts [{grid1, year1}, {grid2 , year2}] to {grid1: year1, grid2: year2}, so some conversion is happening.

Is this difficult to fix?
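
The conversion described above can lose information when keys repeat, as they do in the curl output earlier in this thread. A minimal sketch, using only the standard `json` module, of reading such an object back as a list of pairs instead of a dict (the short `grid1` key is a placeholder for the real grid ID):

```python
import json

# Placeholder body with a repeated key, shaped like the curl output.
body = '{"grid1": 2008, "grid1": 2010}'

# Default decoding builds a dict, so only one entry per key survives.
as_dict = json.loads(body)

# object_pairs_hook receives the ordered (key, value) pairs of each object;
# passing list keeps every pair, including repeated keys.
as_pairs = json.loads(body, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

This only helps on the decoding side; whether the client should expose pairs, dicts, or lists of dicts is a separate design decision.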

@hazen

hazen commented Jan 21, 2015

@gglanzani I'll take a good look to see how difficult it would be to add. Probably not too hard. We just have to prioritize it.

@gglanzani
Author

@javajolt At least it's in jira 😜

@lukebakken modified the milestone: riak-python-client-2.7.1 (Dec 16, 2016)
@lukebakken modified the milestones: riak-python-client-2.7.1, riak-python-client-3.0.0 (Feb 22, 2017)