Merge pull request #2 from panditanvita/dev
Dev
panditanvita committed Jul 26, 2015
2 parents 6136014 + 7965209 commit 073c8cd
Showing 6 changed files with 186 additions and 39 deletions.
89 changes: 58 additions & 31 deletions README.md
@@ -1,18 +1,20 @@
# MovieBot
natural language movie requests for Magic Tiger

<</in progress/>>
--in progress--

***
###to play:

run Bot file on python interpreter

'''
`
bot = Bot()

bot.run()
'''
`

interact with bot on the console
Interact with bot on the console
ask for a movie that's in theatres. ask for 4 tickets. suggest a time of day.
or suggest an exact time. or ask for a theatre and suggest a time of day.

@@ -21,16 +23,21 @@ Finish by either fulfilling a request, or if you input "bye"

***
###Purpose:

MovieBot gives a valid response to every line of movie-related input

Each bot corresponds to a single conversation, as the final product of a successful conversation should be a single
completed movie request

Bot has access to unchangeable dictionaries of movie names and theatres, which come from the knowledge base
each bot instance keeps track of all its conversations.

Two options for running:
1. with a debug flag (in which case, call bot.run() to play with the features,

1. With a debug flag (in which case, call bot.run() to play with the features,
and the bot will interact using System.in and System.out)
2. without the debug flag, in which case the bot will keep track of its state in the MovieRequest and conversation

2. Without the debug flag, in which case the bot will keep track of its state in the MovieRequest and conversation
objects created at instantiation, and you must call the sleek_get_response(message) function to get the bot's
response to a particular input

@@ -39,7 +46,18 @@ response to a particular input
***
###Design:

idea of an 'expert system' with a knowledge base and logical rule-set. keeps track of state and can respond to certain movie-related inputs
idea of an 'expert system' with a knowledge base and logical rule-set.

keeps track of state in a State object and can respond to certain movie-related inputs

State object has a question and option.
Question corresponds to the attribute the bot is expecting to hear about, and is used in the
tagging functions, to favor entities which are indicated by the question. For example -
if question is 1, then the tagging functions try harder to find a valid movie title.
If the bot response involves multiple options, we want to make it easier for the customer
to choose a specific one. Option field keeps track of given options, if the bot gave
a list of valid theatres or movies, for example.
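
A minimal sketch of how the options list could support numbered choices. The `State` fields follow the description above; `format_options` and `pick_option` are hypothetical helpers written for illustration, not functions from this repository.

```python
class State:
    """Stand-in for the State object described above."""
    def __init__(self):
        self.question = 0   # index of the attribute the bot is asking about
        self.options = []   # options listed in the bot's last reply


def format_options(options):
    """Number the options from 1 so the customer can answer with a number."""
    return '\n'.join('{}. {}'.format(i, name)
                     for i, name in enumerate(options, 1))


def pick_option(state, number):
    """Return the option with the given 1-based number, or None if out of range."""
    if 1 <= number <= len(state.options):
        return state.options[number - 1]
    return None


state = State()
state.options = ['PVR Koramangala', 'Sri Srinivasa']  # illustrative names
print(format_options(state.options))
print(pick_option(state, 2))
```

Numbering from 1 keeps the printed list and the stored list in step: option number i+1 is the item at index i.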


###Knowledge : scraping and parsing information from the internet/stored files

@@ -61,21 +79,22 @@ Speed: takes about five seconds to finish
###Tokeniser: tokenises and tags information from the customer

####tokeniser.py
tokenizing, categorizing and tagging words done in here
Tokenizing, categorizing and tagging words done in here

tokeniser splits up incoming string into valid words, attempts to correct for slang,
Tokeniser splits up incoming string into valid words, attempts to correct for slang,
and tries to keep times and phone numbers as one token

tagging done in tag_tokens_num (which looks for ticket numbers and times)
Tagging is done in tag_tokens_num (which looks for ticket numbers and times)
and tag_tokens_movies (which looks for movie titles and theatres).

idea is to allow for some typos using the typo() function for all string comparisons
The idea is to allow for some typos using the typo() function for all string comparisons
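
One way a typo-tolerant comparison could look, as a rough sketch. The real typo() function is not shown in this commit, so `is_close_match` and its `threshold` parameter are assumptions; the sketch uses `difflib.SequenceMatcher` from the standard library, which may not be what the tokeniser actually does.

```python
from difflib import SequenceMatcher


def is_close_match(a, b, threshold=0.8):
    """Hypothetical stand-in for typo(): True when two strings are
    similar enough to be treated as the same word."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


print(is_close_match('koramangala', 'koramangla'))  # one dropped letter
print(is_close_match('pvr', 'inox'))                # unrelated words
```

The threshold is a trade-off: lower values forgive more typos but also start confusing short, similar titles with each other.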

currently theatre name is the hardest to select for, because the full theatre name is
Currently the theatre name is the hardest to select for, because the full theatre name is
never used - people will mention several keywords out of order like 'pvr koramangala' or
'sri srinivasa', and those keywords may even match to multiple theatres. current
implementation attempts to look for a subset of matching keywords , and narrows
down the total space as far as possible
'sri srinivasa', and those keywords may even match to multiple theatres.

The current implementation attempts to look for a subset of matching keywords, and narrows
down the total space as far as possible. It returns all best matching options.
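
The keyword-subset idea can be sketched roughly like this. `match_theatres` is a hypothetical helper, it uses exact word overlap rather than typo-tolerant comparison, and 'PVR Orion Mall' is a made-up name for the example.

```python
def match_theatres(keywords, theatre_names):
    """Return every theatre name that shares the highest number of
    keywords with the customer's input (all best matches, not just one)."""
    wanted = {k.lower() for k in keywords}
    scores = {}
    for name in theatre_names:
        overlap = len(wanted & set(name.lower().split()))
        if overlap:
            scores[name] = overlap
    if not scores:
        return []
    best = max(scores.values())
    return [name for name in theatre_names if scores.get(name) == best]


names = ['PVR Koramangala', 'PVR Orion Mall', 'Sri Srinivasa']
print(match_theatres(['pvr', 'koramangala'], names))
print(match_theatres(['pvr'], names))
```

Because word order is ignored, 'koramangala pvr' and 'pvr koramangala' score the same, and an ambiguous keyword such as 'pvr' returns every tied theatre so the bot can ask the customer to choose.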
***

***
@@ -84,23 +103,28 @@ down the total space as far as possible
####logic.py
Logic is thought of in terms of cases: given a limited set of total
attributes to fulfill, we want to fit in as many as possible while also
making sure it is all mutually compatible. case 1: we have one movie, case 1.1: we have a movie
and a theatre, is this movie playing at this theatre? case 2: we have two movies and so on..
order of attempting-to-fit is movies - theatres - time. So it might input a movie
making sure it is all mutually compatible.

There are many cases and sub-cases. For example:
Case 1: we have one movie in the list of tags
Case 2: we have a movie and a theatre, is this movie playing at this theatre?
Case 3: we have multiple movies and one theatre. Which movies are playing at the theatre?
Case 4: We have one movie and one time of day. Which theatres can we return?
And so on.
The order of attempting-to-fit multiple options is movies - theatres - time. So it might input a movie
and a theatre and then fail at selecting the chosen time.
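
As an illustration of the compatibility check behind the movie-and-theatre cases, here is a small sketch. It uses a plain dictionary of showtimes instead of the Theatre objects, and `theatres_showing` is a hypothetical name.

```python
def theatres_showing(movie, tagged_theatres, showtimes):
    """Keep only the tagged theatres that have at least one showing of the movie.

    showtimes maps theatre key -> {lower-cased movie title -> list of times}.
    """
    key = movie.lower()
    return [t for t in tagged_theatres if showtimes.get(t, {}).get(key)]


showtimes = {
    't1': {'zabod': ['6pm', '630pm']},
    't2': {'zabod': []},                          # listed, but no showings today
    't3': {'interesting short stories': ['9am']},
}
print(theatres_showing('Zabod', ['t1', 't2', 't3'], showtimes))
```

An empty result corresponds to the "isn't playing at any of those locations today" reply; a result with several theatres is the case where the bot lists options.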

narrow() takes in tokeniser output - the tagged movie/theatre/time/day entities
it subcontracts the work. there is a submodule for each attribute, each function
makes sure that what it inputs into the movie request is
correct given all the other information it knows.
each submodule updates the request object based on what it knows. each one creates a
narrow() takes in tokeniser output - the tagged movie/theatre/time/day entities.
It subcontracts the work to specific subfunctions for each kind of entity.
Each function makes sure that what it inputs into the movie request is
correct given all the other information it knows. Each function returns a
potential reply message for the customer.

output of narrow() is given to eval()
finally, decisions: do we have enough information?
which questions must we ask to get more information?
maybe the selected movie is not playing in the selected theatre?
give alternate showtimes
The output of narrow() is given to eval()
Finally, the bot must make decisions: Do we have enough information?
Which questions must we ask to get more information?
Maybe the selected movie is not playing in the selected theatre? What
alternative options are there?
eval() chooses which output to return.
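
A rough sketch of the "which question next" decision, assuming the done flags are ordered movie, theatre, time. That ordering, the `next_question` helper, and the "At what time?" wording are assumptions made for the example; the actual eval() may weigh the narrow() outputs differently.

```python
# Attribute names paired with the question that asks for them.
QUESTIONS = [
    ('movie', 'Which movie?'),
    ('theatre', 'At which theatre?'),
    ('time', 'At what time?'),   # hypothetical wording
]


def next_question(done):
    """Return the question for the first unfilled attribute,
    or None when the request is complete."""
    for is_done, (_, question) in zip(done, QUESTIONS):
        if not is_done:
            return question
    return None


print(next_question([1, 0, 0]))
print(next_question([1, 1, 1]))
```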

Note that eval() re-evaluates based on every time narrow() is called on a set
@@ -111,10 +135,13 @@ saved to the request object is lost.
***
####Further improvements:

- google scraping doesn't return a lot of valid showtimes - need the book my show api
(most important)
- google scraping doesn't return a lot of the valid showtimes - need the book my show api
- save all past information returned by the narrow() sub-functions in some sort of State
object, which should keep track of both the question and the narrowed down options


- save all past information returned by the narrow() sub-functions
- long if/else cases in logic are awkward (but it seems to work)
- long if/else cases in logic are awkward (but it seems to work). what alternatives?
- options for choosing numbered answers still needs to be done
- timeout for repeating the same question
- options for choosing a different day (will need to scrape theatres for the
38 changes: 37 additions & 1 deletion classes.py
@@ -54,7 +54,7 @@ def __init__(self, bms_name, address, company):
Theatre.theatres.append(self)

# String movie
# String[] timings ex: "10:30am"
# Time[] timings ex: [Time('10:30am')]
def put(self, movie, timings):
self.check()
self.movies[movie.lower()] = timings
@@ -106,13 +106,20 @@ def getAgentChat(self):

'''
classes for bot
'''

'''
MovieRequest: object that keeps track of information that we are completely
sure of, which fits in with all the other information that we have
stored in the object
Title is String movie title, cased
num_tickets is Integer number of tickets
Theatre is String Theatre.bms_name, cased
date
time is instance of Time, time of showing
payment_method is 0 for COD, 1 for online
(currently nothing to support payment_method or comments)
'''
class MovieRequest:
@@ -167,3 +174,32 @@ def readout(self):
self.title, self.theatre,t, self.date)
return readout


'''
Keeping track of what we are learning.
Int question: corresponds to the index of an attribute in request.done. Initialised as 0,
which means the initial question is about the movie.
Options keeps track of a list of options (keys), whether of movies or theatres,
where the option number is i+1 for index i of the item in the list.
Options is used in logic.py. If we are given a tagged number, and there are
multiple items in state.options (indicating that the last thing the bot said
was a list of options), and the question isn't looking for a time, then
we should use that number to pick out the correspondingly numbered item in options,
treat it as equivalent to the case where tag_theats or tag_movs had a single item,
and rewrite the option list, either to [] or to a new list.
(Times are excluded because the bot can return multiple showtimes as examples, but
people will use the time value itself to refer to them, not the number.)
Hence it must be re-created every time the logic module runs.
'''
class State:
    def __init__(self):
        self.question = 0
        self.options = []
        self.option_type = 0  # 0 for theatres, 1 for movies
6 changes: 6 additions & 0 deletions knowledge.py
@@ -312,5 +312,11 @@ def f(i):

url = startUrl + "&start=" + str(len(theatreList))

# add all theatres into dictionary, even if it doesn't have any movies for today
# that way, we can always recognise when a theatre is mentioned
for t in Theatre.theatres:
if t.bms_name.lower() not in namesToTheatres.keys():
namesToTheatres[t.bms_name.lower()] = t

print("Knowledge base loaded")
return namesToMovies, namesToTheatres, theatreList
15 changes: 8 additions & 7 deletions logic.py
@@ -16,7 +16,7 @@
'''
def narrow_movies(req,tag_movs,ntm):
r1 = 0, "Which movie?"
if req.done[0] != 1:
if req.done[0] != 1: # doesn't re-write if a movie is already selected
if len(tag_movs) == 1:
m_nice = ntm[tag_movs[0]].title
req.add_title(m_nice)
@@ -61,13 +61,14 @@ def narrow_theatres(req,tag_theats,ntt):
if req.done[0]:
ft = [t for t in tag_theats if len(ntt[t].movies.get(mk, [])) > 0]
if len(ft) == 0:
statement = "{} isn't playing there today".format(req.title)
statement = "{} isn't playing at any of those locations today".format(req.title)
else:
ft_nice = [ntt[t].bms_name for t in ft]
statement = "{} is playing in: ".format(req.title) + '\n'.join(ft_nice)
# ['{}. {}'.format(i, t) for i, t in enumerate(ft)]
# not using because cannot support user choosing numbers
# but it would be nice
statement = "{} is playing in: ".format(req.title) \
+ '\n'.join(['{}. {}'.format(i, t) for i, t in enumerate(ft_nice, 1)])  # number from 1, matching State.options
#'\n'.join(ft_nice)
#
# support user choosing numbers!

# ['{}. {}'.format(i, t) for i, t in enumerate(tag_theats)]
r2 = 2, statement
@@ -160,7 +161,7 @@ def get_options(time):
req.add_time(time1)
r4 = 1,""
else:
#list of movies and theatres
#list of movies and theatres, cut off because it can get long
r4 = 2, statement[:400] + '...'
else:
# no movie, no theatre either
35 changes: 35 additions & 0 deletions tests/knowledge.py
@@ -0,0 +1,35 @@
__author__ = 'V'

from MovieBot.classes import *
from MovieBot.showtime import *

# for testing, we want known ntm, ntt dictionaries
def get_info():
    req1 = MovieRequest("test")
    req2 = MovieRequest("test")

    ntt, ntm = {}, {}

    times1 = [Time('6pm'), Time('630pm'), Time('730pm')]
    times2 = [Time('9am'), Time('1030am'), Time('1730pm'), Time('10pm')]

    #title, description, theatres
    m1 = Movie("Zabod")
    m2 = Movie("Interesting Short Stories")

    #bms_name, address, company
    t1 = Theatre("t1",['outer','ring','road'], 'pvr')
    t1.put('zabod',times1)
    t2 = Theatre("t2",['outer','koramangala'], 'cinemax')
    t2.put('zabod',times2)
    t3 = Theatre("t3",['marathalli'], 'innovative multiplex')
    t3.put('interesting short stories', times2)
    ntt['t1'] = t1
    ntt['t2'] = t2
    ntt['t3'] = t3

    # movies go in ntm (names to movies), which narrow_movies looks titles up in
    ntm['zabod'] = m1
    ntm['interesting short stories'] = m2

    req1.add_title('zabod')
    return req1, req2, ntm, ntt
42 changes: 42 additions & 0 deletions tests/test_narrow.py
@@ -0,0 +1,42 @@
__author__ = 'V'

import unittest

from MovieBot.logic import *

from knowledge import get_info

class Test_narrow(unittest.TestCase):
    req1, req2, ntm, ntt = get_info()

    def test_narrow_movies(self):
        # expected text matches the "Which movie?" default in narrow_movies
        tag_movs = []
        r1 = narrow_movies(self.req1,tag_movs,self.ntm)
        self.assertEqual(r1, (0,"Which movie?"))

        tag_movs = ['zabod']
        r1_ = narrow_movies(self.req1, tag_movs, self.ntm)
        self.assertEqual(r1_, (0,"Which movie?"))

        r1_1 = narrow_movies(self.req2, tag_movs, self.ntm)
        self.assertEqual(r1_1, (1,""))


    def test_narrow_theatres(self):
        tag_theats = ['t1']
        r2 = narrow_theatres(self.req1,tag_theats,self.ntt)

        # NOTE: assertTrue only checks that r2 is truthy; the tuple is
        # used as the failure message, not as an expected value
        self.assertTrue(r2, (0, "At which theatre?"))


    def test_narrow_num(self):
        self.assertTrue(True)



def main():
    unittest.main()


if __name__ == '__main__':
    main()
