Amplitude Personalize Workshop #41

Merged
Changes from 45 commits
47 commits
2ea785e
Added Amplitude notebook
manbearshark May 29, 2020
b4f3ce5
Post personalization features
manbearshark May 29, 2020
37475d5
Switched config file name
manbearshark May 29, 2020
b51609b
More detail on post-personalize reports
manbearshark May 29, 2020
b335a46
Post personalize part I
manbearshark May 30, 2020
70bc088
Datagenerator library + Amplitude reports
manbearshark Jun 1, 2020
d577caa
Fix for file paths
manbearshark Jun 1, 2020
bc7f06b
Users file read + SSM config read
manbearshark Jun 1, 2020
2426f64
Additional config steps
manbearshark Jun 1, 2020
15429ae
Fixed settings image
manbearshark Jun 2, 2020
07abe32
Fixed SSM query issue
manbearshark Jun 2, 2020
5c118b0
Adding final steps
manbearshark Jun 3, 2020
af38a50
Search report images and content
manbearshark Jun 3, 2020
0a078e7
Staged Amplitude deploy
manbearshark Jun 5, 2020
27c883a
User ID to string
manbearshark Jun 6, 2020
332040c
Data reference and stage copy update
manbearshark Jun 9, 2020
4d96050
Braze workshop (#40)
manbearshark Jun 9, 2020
21046c7
Braze added to workshop intro (#42)
manbearshark Jun 11, 2020
0c10437
Added Amplitude notebook
manbearshark May 29, 2020
7310e08
Post personalization features
manbearshark May 29, 2020
bd98487
Switched config file name
manbearshark May 29, 2020
497926d
More detail on post-personalize reports
manbearshark May 29, 2020
2f74422
Post personalize part I
manbearshark May 30, 2020
69ec124
Datagenerator library + Amplitude reports
manbearshark Jun 1, 2020
9898d19
Fix for file paths
manbearshark Jun 1, 2020
31b3b6e
Users file read + SSM config read
manbearshark Jun 1, 2020
1aa1c0b
Additional config steps
manbearshark Jun 1, 2020
40e4bb0
Fixed settings image
manbearshark Jun 2, 2020
b4978d7
Fixed SSM query issue
manbearshark Jun 2, 2020
0569c9b
Adding final steps
manbearshark Jun 3, 2020
e8afbc1
Search report images and content
manbearshark Jun 3, 2020
6ac9987
Staged Amplitude deploy
manbearshark Jun 5, 2020
b0f3709
User ID to string
manbearshark Jun 6, 2020
f085c55
Data reference and stage copy update
manbearshark Jun 9, 2020
e49bd09
Merge branch 'amplitude-personalize-workshop' of github.com:manbearsh…
manbearshark Jun 11, 2020
912b207
Welcome notebook update
manbearshark Jun 11, 2020
2089b8b
Architecture updates
manbearshark Jun 12, 2020
6c9c2bd
Added license info to data generator files
manbearshark Jun 12, 2020
3134a86
Moved user gen to datagenerator library
manbearshark Jun 12, 2020
45aedc5
datagenerator README
manbearshark Jun 12, 2020
b811a23
Check for workshop data dir in stage
manbearshark Jun 12, 2020
52841dc
requirements.txt to workshop
manbearshark Jun 12, 2020
e77c607
requirements for amplitude notebook
manbearshark Jun 13, 2020
ae48e7d
datagenerator requirements.txt update
manbearshark Jun 13, 2020
dc76153
upgrade to conda3
manbearshark Jun 13, 2020
bba9ff1
cleanup documentation
manbearshark Jun 18, 2020
0a2ecfb
datagenerator readme update
manbearshark Jun 18, 2020
7 changes: 5 additions & 2 deletions .gitignore
@@ -1,7 +1,6 @@
__pycache__/
.vscode
build/*
*.zip
workshop/*/.ipynb_checkpoints
workshop/1-Personalization/interactions.csv
workshop/1-Personalization/items.csv
@@ -16,4 +15,8 @@ demo.md
generators/*.gz
/csvs/
.unotes/*
!workshop/5-Conversational/RetailDemoStore_Lex.zip
!workshop/5-Conversational/RetailDemoStore_Lex.zip
*.zip
workshop/data/*
workshop/datagenerator/*
workshop/requirements.txt
17 changes: 17 additions & 0 deletions generators/README.md
@@ -0,0 +1,17 @@
# User Data Generator

generate_users_json.py generates a set of users for the Retail Demo Store.

These user profiles are used in the following ways:

* The Users service provides login for the user profiles that this script creates for the Retail Demo Store.
* Workshops which need to generate simulated user behavior data can use the datagenerator library to create simulated events for these user profiles after they are created. This provides realistic and consistent data across all integrated tools in the Retail Demo Store.

## datagenerator Library

The datagenerator library is a Python library that provides the following functions:

* A pool of randomly generated users (see ./datagenerator/users.py)
* The ability to specify a set of user behavior funnels and to then generate events that can be sent to Amazon Personalize, Segment, or Amplitude. (see ./datagenerator/file.py, amplitude.py, and segment.py)

For examples of how to use the event generator features, see ../workshop/3-Experimentation/3.5-Amplitude-Performance-Metrics.ipynb.
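As a minimal sketch of what a behavior funnel can look like, the shape below is inferred from how `datagenerator/funnel.py` reads it: a dict with a `platform`, a `templates` list of `(event name, properties)` pairs, and an optional `user_props`. The event names and property values are hypothetical, and `expand_props` is an illustrative stand-in for the library's property expansion, not part of the library.

```python
import random

# Hypothetical funnel definition; the keys mirror those read by datagenerator/funnel.py.
search_funnel = {
    'platform': 'web',
    'user_props': {'loyalty': 'gold'},  # optional: its presence turns on identify events
    'templates': [
        ('ProductSearched', {'query': ['boots', 'tents'], 'source': 'web-ui'}),
        ('ProductViewed', {'price': lambda state: 9.99}),
    ],
}

def expand_props(template_props, state=None):
    """Illustrative stand-in: call callables, pick one value from lists, keep constants."""
    props = {}
    for key, value in template_props.items():
        if callable(value):
            props[key] = value(state)
        elif isinstance(value, (list, tuple)):
            props[key] = random.choice(value)
        else:
            props[key] = value
    return props

for name, template_props in search_funnel['templates']:
    print(name, expand_props(template_props))
```

Each run picks one of the listed `query` values at random, while `source` and the computed `price` stay fixed.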
4 changes: 4 additions & 0 deletions generators/datagenerator/__init__.py
@@ -0,0 +1,4 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

aws_datagenerator_version = '1.8.0'
80 changes: 80 additions & 0 deletions generators/datagenerator/amplitude.py
@@ -0,0 +1,80 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import datagenerator
import json
import requests
import yaml

# Amplitude event support
# This follows the Amplitude V2 HTTP Bulk API spec, here:
# https://help.amplitude.com/hc/en-us/articles/360032842391-HTTP-API-V2
#
# These classes accept a user, platform, and general event properties and map them
# into an Amplitude API compatible representation.

class AmplitudeEvent:
    def __init__(self, timestamp, user, platform):
        self.time = int(timestamp.timestamp() * 1000)  # Amplitude time is milliseconds since epoch
        self.user_id = f'{user.id:0>5}'  # Amplitude user ID is a string; zero-pad to its minimum length of 5

        platform_data = user.get_platform_data(platform)
        self.device_id = platform_data['anonymous_id']
        if platform == 'ios':
            self.idfa = platform_data['advertising_id']
            self.platform = 'iOS'
            self.device_model = platform_data['model']
            self.os_version = platform_data['version']
        elif platform == 'android':
            self.adid = platform_data['advertising_id']
            self.device_model = platform_data['model']
            self.os_version = platform_data['version']

    def toJson(self):
        return self.__repr__()

    def __repr__(self):
        return json.dumps(self.__dict__)

class AmplitudeIdentifyEvent(AmplitudeEvent):
    def __init__(self, timestamp, user, platform):
        super().__init__(timestamp, user, platform)
        self.event_type = '$identify'
        # Copy the traits so the profile fields added below do not leak back into user.traits
        self.user_properties = dict(user.traits)
        self.user_properties['name'] = user.name
        self.user_properties['email'] = user.email
        self.user_properties['age'] = user.age
        self.user_properties['gender'] = user.gender
        self.user_properties['persona'] = user.persona
        self.user_properties['username'] = user.username

class AmplitudeTrackEvent(AmplitudeEvent):
    def __init__(self, name, timestamp, user, platform, properties):
        super().__init__(timestamp, user, platform)
        self.event_type = name
        self.event_properties = properties

class AmplitudeSender:
    def __init__(self, config):
        self.config = config  # MUST BE: { 'api_key': <Amplitude API Key> }
        self.endpoint = 'https://api.amplitude.com/2/httpapi'

    def send_batch(self, platform, events, debug=False):
        batch_events = {
            "api_key": self.config['api_key'],
            "events": events
        }

        # Serialize event objects through their attribute dicts
        events_str = json.dumps(batch_events, default=lambda x: x.__dict__)
        #print(f'Batch length bytes: {len(events_str)}')
        if debug:
            # Debug mode prints the payload instead of sending it
            parsed = json.loads(events_str)
            print(f'{json.dumps(parsed, indent=4)}')
            response = None
        else:
            response = requests.post(self.endpoint,
                                     data=events_str)
            #print(self.config_keys[platform])
            #print(json.dumps(batch_events, default=lambda x: x.__dict__))
            #print(f'Sent {len(batch_events["batch"])} events and got {response}')
        return response
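The request body that `send_batch` assembles can be sketched without touching the network. This is a minimal illustration, assuming an integer user ID and a placeholder API key; `to_amplitude_user_id`, `to_amplitude_time`, and `build_payload` are hypothetical helper names, not part of the library, and the event name and properties are made up.

```python
import json
from datetime import datetime, timezone

def to_amplitude_user_id(user_id):
    # Zero-pad to five characters, matching the f'{user.id:0>5}' formatting in AmplitudeEvent.
    return f'{user_id:0>5}'

def to_amplitude_time(timestamp):
    # Amplitude expects milliseconds since the Unix epoch.
    return int(timestamp.timestamp() * 1000)

def build_payload(api_key, events):
    # Same top-level shape as the batch_events dict in AmplitudeSender.send_batch.
    return json.dumps({'api_key': api_key, 'events': events})

event = {
    'event_type': 'ProductViewed',  # hypothetical event name
    'user_id': to_amplitude_user_id(42),
    'time': to_amplitude_time(datetime(2020, 6, 1, tzinfo=timezone.utc)),
    'event_properties': {'productId': 'abc'},
}
payload = build_payload('PLACEHOLDER_API_KEY', [event])
print(payload)
```

IDs that are already five characters or longer pass through the padding unchanged.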
25 changes: 25 additions & 0 deletions generators/datagenerator/file.py
@@ -0,0 +1,25 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

class FileEvent:
    def __init__(self, name, timestamp, user, platform, properties):
        self.event = name
        self.timestamp = timestamp.isoformat()
        self.user_id = user.id
        self.anonymous_id = user.get_platform_data(platform)['anonymous_id']
        self.platform = platform
        self.traits = ''

        # Append each trait value as an extra comma-separated column
        if len(user.traits.items()) > 0:
            for (k,v) in user.traits.items():
                self.traits += f',{v}'

    def str(self):
        return self.__repr__()

    def __repr__(self):
        output = f'{self.event},{self.timestamp},{self.user_id},{self.anonymous_id},{self.platform}'
        if len(self.traits) > 0:
            output += self.traits
        output += '\n'
        return output
68 changes: 68 additions & 0 deletions generators/datagenerator/funnel.py
@@ -0,0 +1,68 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import random
import numpy as np
import datetime
import inspect
from datagenerator.output import OutputFormatter
from collections.abc import Mapping, Iterable

class Funnel:
    def __init__(self, timestamp, funnel, user):
        self.funnel = funnel
        self.event_index = 0
        self.timestamp = timestamp
        self.platform = self.funnel['platform']
        self.user = user

        if 'user_props' in self.funnel:
            self.user.set_traits(self.funnel['user_props'])
            self.identify = True
        else:
            self.identify = False

        if 'state' in self.funnel:
            self.state = self.funnel['state'](self.user)  # Passes the user to the state lambda
        else:
            self.state = None

    def __iter__(self):
        return self

    def __next__(self):
        # Chance of continuing grows by 10 points per event, capped at 100%
        success_percent = min(100, 50 + (self.event_index * 10)) / 100
        proceed = self.proceed(success_percent)
        at_start = self.event_index == 0
        not_at_end = self.event_index < len(self.funnel['templates'])
        # This is to make sure that you always get at least the first event in a funnel,
        # rest will be stochastic
        if (proceed and not_at_end) or at_start:
            formatter = OutputFormatter(
                self.timestamp,
                self.user,
                self.platform,
                self.generate_props(self.event_index),
                self.funnel['templates'][self.event_index][0])
            # Space events 30 seconds to 10 minutes apart
            self.timestamp += datetime.timedelta(seconds=random.randint(30, 600))
            self.event_index += 1
            return formatter
        else:
            raise StopIteration

    def generate_props(self, index):
        template = self.funnel['templates'][index]
        props = {}
        for (k,v) in template[1].items():
            if k == 'expand' and callable(v):
                # Merge a whole dict of generated properties into the event
                props = {**props, **v(self.state)}
            elif callable(v):
                props[k] = v(self.state)
            elif isinstance(v, Iterable) and not isinstance(v, (str, bytes)):
                # Pick one value from a collection; strings are constants, not character pools
                props[k] = random.choice(v)
            else:
                props[k] = v
        return props

    def proceed(self, p):
        # Single Bernoulli trial: 1 with probability p, otherwise 0
        return np.random.binomial(1, p)
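To make the drop-off in `Funnel.__next__` concrete, the sketch below computes the probability that a session reaches each event, using the same `min(100, 50 + index * 10) / 100` rule. The first event is always emitted, and each later event requires one more successful draw. `reach_probabilities` is a hypothetical helper written for this illustration, not part of the library.

```python
def reach_probabilities(num_events):
    """Probability that a funnel session emits the event at each index."""
    probabilities = []
    reach = 1.0  # the first event is always emitted
    for index in range(num_events):
        if index > 0:
            # Later events need a successful draw at this index
            reach *= min(100, 50 + index * 10) / 100
        probabilities.append(reach)
    return probabilities

for index, probability in enumerate(reach_probabilities(6)):
    print(f'event {index}: {probability:.4f}')
```

From the sixth event onward the continuation chance is 100%, so a session that gets that far runs to the end of its templates.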
85 changes: 85 additions & 0 deletions generators/datagenerator/output.py
@@ -0,0 +1,85 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from datagenerator.segment import SegmentIdentifyEvent, SegmentTrackEvent, SegmentSender
from datagenerator.amplitude import AmplitudeIdentifyEvent, AmplitudeTrackEvent, AmplitudeSender
from datagenerator.file import FileEvent

# TODO: Add Personalize output file formatter
# TODO: Add Amplitude output formatter

class OutputFormatter:
    def __init__(self, timestamp, user, platform, properties, name = None):
        self.event = name
        self.timestamp = timestamp
        self.user = user
        self.properties = properties
        self.platform = platform

    def amplitude_identify(self):
        return AmplitudeIdentifyEvent(self.timestamp, self.user, self.platform)

    def amplitude_event(self):
        return AmplitudeTrackEvent(self.event, self.timestamp, self.user, self.platform, self.properties)

    def segment_track(self):
        return SegmentTrackEvent(self.event, self.timestamp, self.user, self.platform, self.properties)

    def segment_identify(self):
        return SegmentIdentifyEvent(self.timestamp, self.user, self.platform)

    def file_event(self):
        return FileEvent(self.event, self.timestamp, self.user, self.platform, self.properties)

class OutputWriter:
    def __init__(self, sessions):
        self.sessions = sessions

    def to_file(self, file_name):
        # Write to the specified file using the FileEvent output formatter;
        # the with block closes the file even if a funnel raises
        with open(file_name, 'w') as f:
            for funnel in self.sessions:
                for formatter in funnel:
                    event = formatter.file_event()
                    f.write(event.str())

    def to_amplitude(self, config, debug=False):
        sender = AmplitudeSender(config)
        print(f'Send config is: {config}.')
        count = 0
        for funnel in self.sessions:
            batch = []
            count += 1
            for formatter in funnel:
                if funnel.identify:
                    # Send an identify call if specified in the funnel
                    event = formatter.amplitude_identify()
                    batch.append(event)
                event = formatter.amplitude_event()
                batch.append(event)
            if len(batch) > 0:
                response = sender.send_batch(funnel.platform, batch, debug)
                if response is not None and response.status_code > 200:
                    print(f'Error sending to Amplitude: {response.text}')
        print(f'Processed {count} funnels...')

    def to_segment(self, config_file, debug=False):
        # Write to Segment, using the specified config file
        sender = SegmentSender(config_file)
        print(f'Send config is: {sender.config_keys}')
        count = 0
        for funnel in self.sessions:
            batch = []
            count += 1
            for formatter in funnel:
                if funnel.identify:
                    # Send an identify call if specified in the funnel
                    event = formatter.segment_identify()
                    batch.append(event)
                event = formatter.segment_track()
                batch.append(event)
            if len(batch) > 0:
                response = sender.send_batch(funnel.platform, batch, debug)
                if response is not None and response.status_code > 200:
                    print(f'Error sending to Segment: {response.text}')
        print(f'Processed {count} funnels...')
14 changes: 14 additions & 0 deletions generators/datagenerator/rdscatalog.py
@@ -0,0 +1,14 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import yaml
from collections import UserList

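class RDSCatalog(UserList):
    def __init__(self, file):
        super().__init__()
        # Load the product catalog from a YAML file, closing the handle when done
        with open(file) as f:
            self.data = yaml.load(f, Loader=yaml.FullLoader)

    def subcategory_sample(self, categories):
        # Return every catalog item whose category is in the given list
        return list(filter(lambda item: item['category'] in categories, self.data))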
75 changes: 75 additions & 0 deletions generators/datagenerator/rdsuserstate.py
@@ -0,0 +1,75 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import random
import uuid

class RDSUserSelectionState:
    def __init__(self, catalog, user):
        if user.persona != '':  # Added to support RDS personas from the catalog
            self.search_results = catalog.subcategory_sample(user.persona.split('_'))
        else:
            self.search_results = random.sample(catalog, 10)
        self.subsample = random.sample(self.search_results, 5)
        self.cart = random.sample(self.subsample, 3)
        self.cart_id = str(uuid.uuid4())
        self.search_terms = []
        for item in self.search_results:
            self.search_terms.extend(item['name'].split(' '))

    def search(self):
        return self.search_results

    def user_search(self):
        separator = ' '
        query = separator.join(random.sample(self.search_terms, 2))
        return query

    def recommendations(self):
        return random.sample(self.subsample, 3)

    def cart_items(self):
        return self.cart

    def num_results(self):
        return len(self.search_results)

    def cart_value(self):
        total = 0.0
        for item in self.cart:
            total += item['price']
        return total

    def item(self):
        return random.choice(self.cart)

    # These are specific to RDS event properties
    def item_added_event_props(self):
        item = self.item()
        return {
            'productId': item['id'],
            'cartId': self.cart_id,
            'name': item['name'],
            'category': item['category'],
            'image': item['image'],
            'price': item['price'],
            'quantity': 1
        }

    def item_viewed_event_props(self):
        item = self.item()
        return {
            'productId': item['id'],
            'name': item['name'],
            'category': item['category'],
            'image': item['image'],
            'price': item['price']
        }

    def cart_viewed_event_props(self):
        return {
            'cartId': self.cart_id,
            'cartSubTotal': self.cart_value(),
            'cartTotal': self.cart_value(),
            'cartQuantity': len(self.cart)
        }
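The persona handling above can be sketched in isolation: a persona string is split on underscores into category names, the catalog is filtered to those categories, and a cart is sampled from the result. The catalog entries and the persona value below are hypothetical, and the standalone `subcategory_sample` function only mirrors the filter in `RDSCatalog`; the real catalog is loaded from YAML.

```python
import random

# Hypothetical catalog entries with the fields the state class reads.
catalog = [
    {'id': '1', 'name': 'Trail Boots', 'category': 'footwear', 'price': 89.0},
    {'id': '2', 'name': 'Camp Stove', 'category': 'outdoors', 'price': 45.5},
    {'id': '3', 'name': 'Desk Lamp', 'category': 'housewares', 'price': 19.99},
    {'id': '4', 'name': 'Rain Sandals', 'category': 'footwear', 'price': 24.0},
]

def subcategory_sample(items, categories):
    # Keep only the items whose category appears in the persona's category list
    return [item for item in items if item['category'] in categories]

persona = 'footwear_outdoors'  # assumed format: category names joined by underscores
search_results = subcategory_sample(catalog, persona.split('_'))
cart = random.sample(search_results, 2)
cart_total = sum(item['price'] for item in cart)

print([item['name'] for item in search_results])
print(f'cart total: {cart_total:.2f}')
```

Note that `random.sample` raises `ValueError` when asked for more items than the filtered list holds, so a persona that matches only a few products needs a catalog large enough for the sample sizes used.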