Please fill out:
- Student name: AURALIA ADILLA MBOYA
- Student pace: Full Time
- Scheduled project review date/time: Nov 20th/ 11:59pm
- Instructor name: Mark Tiba
- Blog post URL: N/A
- Introduction
- Business Understanding
- Business problem
- Objectives
- Data Understanding
- Data preparation
- Data Loading
- Data cleaning
- Data Analysis
- Exploratory Descriptive Analysis (EDA)
- Translating data into visual context
- Plotting of graphs.
- Conclusion
- Recommendations
I will use exploratory data analysis to produce insights for a business stakeholder in this segment.
I'll walk you through my research findings and how I turn them into useful information that stakeholders can use to guide their decision-making.
Microsoft sees all the big companies creating original video content and they want to get in on the fun. They have decided to create a new movie studio, but they don’t know anything about creating movies. They have hired you to help them better understand the movie industry. Your team is charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.
- What is the correlation between the genre and movie runtime?
- Which genre has the highest rating?
- Which of the genres has the highest production budget?
- Which genre has the highest worldwide gross?
- Which genre has received the most votes?
The majority of the information I used for this project came from a zipped folder that contained materials provided by the school. Since they have different file formats, they were all compressed into one folder.
The sources of the data used in this project are listed below:
b. IMDB
d. TheMovieDB
e. The Numbers
For a film studio to exist or be successful, we must conduct research, comprehend the information from the content provided, choose the right performers, and identify the top authors and writers for the various genres. To make Microsoft Film Studio successful, we will need to comprehend all the facts at our disposal. Data from four of these sources was used for this project.
From this point on, I'll be transforming the data into a usable format.
The ERD below shows the relationships our datasets should have once they have been cleaned, so that the stakeholder can follow what we are attempting to do.
# loading all the necessary libraries
# pandas (as pd) for a more user-friendly data representation
# sqlite3 for the SQL database
# numpy (as np) for arrays
# seaborn and matplotlib for visualizations
# json for the available structured data
import pandas as pd
import sqlite3
import numpy as np
import seaborn as sns
import json
import matplotlib.pyplot as plt
%matplotlib inline
import csv
#Checking that all the necessary data files are present in the working directory
!ls -a
.
..
.canvas
.git
.gitignore
.ipynb_checkpoints
CONTRIBUTING.md
LICENSE.md
Production Budget Vs Genres.png
Production Budget Vs Genres.png
README.md
awesome.gif
bom.movie_gross.csv
im.db
movie_data_erd.jpeg
rt.movie_info.tsv
rt.reviews.tsv
student.ipynb
tmdb.movies.csv
tn.movie_budgets.csv
untitled
zippedData
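The listing above is checked by eye. As a minimal sketch, the same check could be done programmatically. The file names are taken from the listing, while the helper name `find_missing` is a hypothetical one introduced only for illustration.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = [
    "bom.movie_gross.csv",
    "im.db",
    "rt.movie_info.tsv",
    "rt.reviews.tsv",
    "tmdb.movies.csv",
    "tn.movie_budgets.csv",
]

def find_missing(data_dir=".", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in expected if not (base / name).is_file()]

missing = find_missing()
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All expected data files are present.")
```

Failing early with a clear list of missing files is usually easier to debug than a `FileNotFoundError` raised halfway through the loading cells.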
#loading the Box Office Mojo file
movie_gross = pd.read_csv('bom.movie_gross.csv')
movie_gross
| | title | studio | domestic_gross | foreign_gross | year |
---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010 |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010 |
3 | Inception | WB | 292600000.0 | 535700000 | 2010 |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010 |
... | ... | ... | ... | ... | ... |
3382 | The Quake | Magn. | 6200.0 | NaN | 2018 |
3383 | Edward II (2018 re-release) | FM | 4800.0 | NaN | 2018 |
3384 | El Pacto | Sony | 2500.0 | NaN | 2018 |
3385 | The Swan | Synergetic | 2400.0 | NaN | 2018 |
3386 | An Actor Prepares | Grav. | 1700.0 | NaN | 2018 |
3387 rows × 5 columns
#loading movie_budgets file
movie_budgets = pd.read_csv('tn.movie_budgets.csv')
movie_budgets
| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
---|---|---|---|---|---|---|
0 | 1 | Dec 18, 2009 | Avatar | $425,000,000 | $760,507,625 | $2,776,345,279 |
1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 |
2 | 3 | Jun 7, 2019 | Dark Phoenix | $350,000,000 | $42,762,350 | $149,762,350 |
3 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 |
4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | $317,000,000 | $620,181,382 | $1,316,721,747 |
... | ... | ... | ... | ... | ... | ... |
5777 | 78 | Dec 31, 2018 | Red 11 | $7,000 | $0 | $0 |
5778 | 79 | Apr 2, 1999 | Following | $6,000 | $48,482 | $240,495 |
5779 | 80 | Jul 13, 2005 | Return to the Land of Wonders | $5,000 | $1,338 | $1,338 |
5780 | 81 | Sep 29, 2015 | A Plague So Pleasant | $1,400 | $0 | $0 |
5781 | 82 | Aug 5, 2005 | My Date With Drew | $1,100 | $181,041 | $181,041 |
5782 rows × 6 columns
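#loading the TMDB (TheMovieDB) file
#stored under its own name so that loading rt.movie_info.tsv into movie_info does not overwrite it
tmdb_movies = pd.read_csv('tmdb.movies.csv')
tmdb_movies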
| | Unnamed: 0 | genre_ids | id | original_language | original_title | popularity | release_date | title | vote_average | vote_count |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | [12, 14, 10751] | 12444 | en | Harry Potter and the Deathly Hallows: Part 1 | 33.533 | 2010-11-19 | Harry Potter and the Deathly Hallows: Part 1 | 7.7 | 10788 |
1 | 1 | [14, 12, 16, 10751] | 10191 | en | How to Train Your Dragon | 28.734 | 2010-03-26 | How to Train Your Dragon | 7.7 | 7610 |
2 | 2 | [12, 28, 878] | 10138 | en | Iron Man 2 | 28.515 | 2010-05-07 | Iron Man 2 | 6.8 | 12368 |
3 | 3 | [16, 35, 10751] | 862 | en | Toy Story | 28.005 | 1995-11-22 | Toy Story | 7.9 | 10174 |
4 | 4 | [28, 878, 12] | 27205 | en | Inception | 27.920 | 2010-07-16 | Inception | 8.3 | 22186 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
26512 | 26512 | [27, 18] | 488143 | en | Laboratory Conditions | 0.600 | 2018-10-13 | Laboratory Conditions | 0.0 | 1 |
26513 | 26513 | [18, 53] | 485975 | en | _EXHIBIT_84xxx_ | 0.600 | 2018-05-01 | _EXHIBIT_84xxx_ | 0.0 | 1 |
26514 | 26514 | [14, 28, 12] | 381231 | en | The Last One | 0.600 | 2018-10-01 | The Last One | 0.0 | 1 |
26515 | 26515 | [10751, 12, 28] | 366854 | en | Trailer Made | 0.600 | 2018-06-22 | Trailer Made | 0.0 | 1 |
26516 | 26516 | [53, 27] | 309885 | en | The Church | 0.600 | 2018-10-05 | The Church | 0.0 | 1 |
26517 rows × 10 columns
#loading movie info file
#to check if there is any relevant data
movie_info= pd.read_table('rt.movie_info.tsv')
movie_info
| | id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | This gritty, fast-paced, and innovative police... | R | Action and Adventure|Classics|Drama | William Friedkin | Ernest Tidyman | Oct 9, 1971 | Sep 25, 2001 | NaN | NaN | 104 minutes | NaN |
1 | 3 | New York City, not-too-distant-future: Eric Pa... | R | Drama|Science Fiction and Fantasy | David Cronenberg | David Cronenberg|Don DeLillo | Aug 17, 2012 | Jan 1, 2013 | $ | 600,000 | 108 minutes | Entertainment One |
2 | 5 | Illeana Douglas delivers a superb performance ... | R | Drama|Musical and Performing Arts | Allison Anders | Allison Anders | Sep 13, 1996 | Apr 18, 2000 | NaN | NaN | 116 minutes | NaN |
3 | 6 | Michael Douglas runs afoul of a treacherous su... | R | Drama|Mystery and Suspense | Barry Levinson | Paul Attanasio|Michael Crichton | Dec 9, 1994 | Aug 27, 1997 | NaN | NaN | 128 minutes | NaN |
4 | 7 | NaN | NR | Drama|Romance | Rodney Bennett | Giles Cooper | NaN | NaN | NaN | NaN | 200 minutes | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1555 | 1996 | Forget terrorists or hijackers -- there's a ha... | R | Action and Adventure|Horror|Mystery and Suspense | NaN | NaN | Aug 18, 2006 | Jan 2, 2007 | $ | 33,886,034 | 106 minutes | New Line Cinema |
1556 | 1997 | The popular Saturday Night Live sketch was exp... | PG | Comedy|Science Fiction and Fantasy | Steve Barron | Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner | Jul 23, 1993 | Apr 17, 2001 | NaN | NaN | 88 minutes | Paramount Vantage |
1557 | 1998 | Based on a novel by Richard Powell, when the l... | G | Classics|Comedy|Drama|Musical and Performing Arts | Gordon Douglas | NaN | Jan 1, 1962 | May 11, 2004 | NaN | NaN | 111 minutes | NaN |
1558 | 1999 | The Sandlot is a coming-of-age story about a g... | PG | Comedy|Drama|Kids and Family|Sports and Fitness | David Mickey Evans | David Mickey Evans|Robert Gunter | Apr 1, 1993 | Jan 29, 2002 | NaN | NaN | 101 minutes | NaN |
1559 | 2000 | Suspended from the force, Paris cop Hubert is ... | R | Action and Adventure|Art House and Internation... | NaN | Luc Besson | Sep 27, 2001 | Feb 11, 2003 | NaN | NaN | 94 minutes | Columbia Pictures |
1560 rows × 12 columns
#Using the encoding parameter to load the tab-separated (tsv) file, which is not plain UTF-8
rt_reviews = pd.read_table('rt.reviews.tsv', encoding='unicode_escape')
rt_reviews
| | id | review | rating | fresh | critic | top_critic | publisher | date |
---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 |
1 | 3 | It's an allegory in search of a meaning that n... | NaN | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 |
2 | 3 | ... life lived in a bubble in financial dealin... | NaN | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 |
3 | 3 | Continuing along a line introduced in last yea... | NaN | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 |
4 | 3 | ... a perverse twist on neorealism... | NaN | fresh | NaN | 0 | Cinema Scope | October 12, 2017 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | NaN | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 |
54428 | 2000 | NaN | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 |
54429 | 2000 | NaN | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 |
54430 | 2000 | NaN | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 |
54431 | 2000 | NaN | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 |
54432 rows × 8 columns
#Sqlite3 connection to the database for reading the files
conn = sqlite3.connect("im.db")
conn
<sqlite3.Connection at 0x2b2c795b3f0>
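Before querying individual tables, it can help to see which tables the database actually contains. The sketch below reads the table names from SQLite's built-in `sqlite_master` catalogue. It runs against a throwaway in-memory database with two made-up tables. In the notebook, the existing `conn` object would be passed instead. The helper name `list_tables` is hypothetical.

```python
import sqlite3

def list_tables(connection):
    """Return the names of all tables in an SQLite database, sorted alphabetically."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    ).fetchall()
    return [name for (name,) in rows]

# Demonstration on an in-memory database with two illustrative tables.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movie_basics (movie_id TEXT, genres TEXT)")
demo_conn.execute("CREATE TABLE movie_ratings (movie_id TEXT, averagerating REAL)")
tables = list_tables(demo_conn)
print(tables)
demo_conn.close()
```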
#load the first 10 rows of the movie_ratings table from the SQL database
movie_ratings = pd.read_sql_query("""
SELECT *
FROM movie_ratings
LIMIT 10
;
""", conn)
movie_ratings
| | movie_id | averagerating | numvotes |
---|---|---|---|
0 | tt10356526 | 8.3 | 31 |
1 | tt10384606 | 8.9 | 559 |
2 | tt1042974 | 6.4 | 20 |
3 | tt1043726 | 4.2 | 50352 |
4 | tt1060240 | 6.5 | 21 |
5 | tt1069246 | 6.2 | 326 |
6 | tt1094666 | 7.0 | 1613 |
7 | tt1130982 | 6.4 | 571 |
8 | tt1156528 | 7.2 | 265 |
9 | tt1161457 | 4.2 | 148 |
#load the data from the movie_basics
movie_basics = pd.read_sql_query("""
SELECT *
FROM movie_basics
;
""", conn)
movie_basics
| | movie_id | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|
0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
... | ... | ... | ... | ... | ... | ... |
146139 | tt9916538 | Kuambil Lagi Hatiku | Kuambil Lagi Hatiku | 2019 | 123.0 | Drama |
146140 | tt9916622 | Rodolpho Teóphilo - O Legado de um Pioneiro | Rodolpho Teóphilo - O Legado de um Pioneiro | 2015 | NaN | Documentary |
146141 | tt9916706 | Dankyavar Danka | Dankyavar Danka | 2013 | NaN | Comedy |
146142 | tt9916730 | 6 Gunn | 6 Gunn | 2017 | 116.0 | None |
146143 | tt9916754 | Chico Albuquerque - Revelações | Chico Albuquerque - Revelações | 2013 | NaN | Documentary |
146144 rows × 6 columns
#load movie_akas to display relevant data needed.
movie_akas = pd.read_sql_query("""
SELECT *
FROM movie_akas
;
""", conn)
movie_akas
| | movie_id | ordering | title | region | language | types | attributes | is_original_title |
---|---|---|---|---|---|---|---|---|
0 | tt0369610 | 10 | Джурасик свят | BG | bg | None | None | 0.0 |
1 | tt0369610 | 11 | Jurashikku warudo | JP | None | imdbDisplay | None | 0.0 |
2 | tt0369610 | 12 | Jurassic World: O Mundo dos Dinossauros | BR | None | imdbDisplay | None | 0.0 |
3 | tt0369610 | 13 | O Mundo dos Dinossauros | BR | None | None | short title | 0.0 |
4 | tt0369610 | 14 | Jurassic World | FR | None | imdbDisplay | None | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
331698 | tt9827784 | 2 | Sayonara kuchibiru | None | None | original | None | 1.0 |
331699 | tt9827784 | 3 | Farewell Song | XWW | en | imdbDisplay | None | 0.0 |
331700 | tt9880178 | 1 | La atención | None | None | original | None | 1.0 |
331701 | tt9880178 | 2 | La atención | ES | None | None | None | 0.0 |
331702 | tt9880178 | 3 | The Attention | XWW | en | imdbDisplay | None | 0.0 |
331703 rows × 8 columns
I'll be correcting or deleting inaccurate, corrupted, improperly formatted, duplicate, or incomplete data from the relevant datasets.
#starting data cleaning from the first dataset
#checking for any erroneous data, null values or incomplete records
movie_gross
| | title | studio | domestic_gross | foreign_gross | year |
---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010 |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010 |
3 | Inception | WB | 292600000.0 | 535700000 | 2010 |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010 |
... | ... | ... | ... | ... | ... |
3382 | The Quake | Magn. | 6200.0 | NaN | 2018 |
3383 | Edward II (2018 re-release) | FM | 4800.0 | NaN | 2018 |
3384 | El Pacto | Sony | 2500.0 | NaN | 2018 |
3385 | The Swan | Synergetic | 2400.0 | NaN | 2018 |
3386 | An Actor Prepares | Grav. | 1700.0 | NaN | 2018 |
3387 rows × 5 columns
#convert domestic_gross float to integer type
movie_gross['domestic_gross'] = movie_gross['domestic_gross'].fillna(0).astype(int)
#confirm the conversion
movie_gross
| | title | studio | domestic_gross | foreign_gross | year |
---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000 | 652000000 | 2010 |
1 | Alice in Wonderland (2010) | BV | 334200000 | 691300000 | 2010 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000 | 664300000 | 2010 |
3 | Inception | WB | 292600000 | 535700000 | 2010 |
4 | Shrek Forever After | P/DW | 238700000 | 513900000 | 2010 |
... | ... | ... | ... | ... | ... |
3382 | The Quake | Magn. | 6200 | NaN | 2018 |
3383 | Edward II (2018 re-release) | FM | 4800 | NaN | 2018 |
3384 | El Pacto | Sony | 2500 | NaN | 2018 |
3385 | The Swan | Synergetic | 2400 | NaN | 2018 |
3386 | An Actor Prepares | Grav. | 1700 | NaN | 2018 |
3387 rows × 5 columns
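Filling missing grosses with 0 before casting makes a film with no recorded gross look the same as a film that earned nothing. As a hedged alternative, pandas' nullable `Int64` dtype converts whole-number floats to integers while keeping missing entries as `<NA>`. The sketch below uses a small made-up sample, not the project data.

```python
import numpy as np
import pandas as pd

# Small made-up sample with one missing gross value.
sample = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "domestic_gross": [415000000.0, np.nan, 1700.0],
})

# fillna(0) turns the missing value into a real 0 ...
filled = sample["domestic_gross"].fillna(0).astype("int64")

# ... whereas the nullable Int64 dtype keeps it as <NA>.
nullable = sample["domestic_gross"].astype("Int64")

print(filled.tolist())
print(nullable.isna().sum())
```

Which option is appropriate depends on whether a missing gross should count as zero in later averages. With `Int64`, aggregations such as `mean()` skip the missing entries by default.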
#confirm that domestic_gross now holds integers (astype returns a new Series and does not change movie_gross)
movie_gross['domestic_gross'].astype(int)
0 415000000
1 334200000
2 296000000
3 292600000
4 238700000
...
3382 6200
3383 4800
3384 2500
3385 2400
3386 1700
Name: domestic_gross, Length: 3387, dtype: int32
#check for any null values
movie_gross.isna().sum()
title 0
studio 5
domestic_gross 0
foreign_gross 1350
year 0
dtype: int64
#checked for null values
#1350 null values were found in the foreign_gross column and 5 in studio
#Both columns will be needed later, so rows with missing values are dropped.
#drop all rows containing null values and keep the result
movie_gross = movie_gross.dropna()
movie_gross
| | title | studio | domestic_gross | foreign_gross | year |
---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000 | 652000000 | 2010 |
1 | Alice in Wonderland (2010) | BV | 334200000 | 691300000 | 2010 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000 | 664300000 | 2010 |
3 | Inception | WB | 292600000 | 535700000 | 2010 |
4 | Shrek Forever After | P/DW | 238700000 | 513900000 | 2010 |
... | ... | ... | ... | ... | ... |
3275 | I Still See You | LGF | 1400 | 1500000 | 2018 |
3286 | The Catcher Was a Spy | IFC | 725000 | 229000 | 2018 |
3309 | Time Freak | Grindstone | 10000 | 256000 | 2018 |
3342 | Reign of Judges: Title of Liberty - Concept Short | Darin Southa | 93200 | 5200 | 2018 |
3353 | Antonio Lopez 1970: Sex Fashion & Disco | FM | 43200 | 30000 | 2018 |
2033 rows × 5 columns
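`dropna()` returns a new DataFrame rather than modifying the one it is called on, so the result has to be assigned if the cleaned rows are meant to be used later. The sketch below shows this on made-up rows. It also shows the `subset` argument, which drops rows only when selected columns are missing. Choosing `studio` as that column is an assumption for illustration.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "studio": ["BV", None, "WB"],
    "foreign_gross": ["652000000", "1500000", np.nan],
})

# dropna() returns a new DataFrame; without assignment the original is unchanged.
sample.dropna()
rows_before = len(sample)

# Drop only the rows that lack a studio, and keep the result.
cleaned = sample.dropna(subset=["studio"]).reset_index(drop=True)
print(rows_before, len(cleaned))
```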
#checking for any null values to clean
movie_budgets.isna().sum()
id 0
release_date 0
movie 0
production_budget 0
domestic_gross 0
worldwide_gross 0
dtype: int64
#displaying movie_budgets before cleaning
movie_budgets
| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
---|---|---|---|---|---|---|
0 | 1 | Dec 18, 2009 | Avatar | $425,000,000 | $760,507,625 | $2,776,345,279 |
1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 |
2 | 3 | Jun 7, 2019 | Dark Phoenix | $350,000,000 | $42,762,350 | $149,762,350 |
3 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 |
4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | $317,000,000 | $620,181,382 | $1,316,721,747 |
... | ... | ... | ... | ... | ... | ... |
5777 | 78 | Dec 31, 2018 | Red 11 | $7,000 | $0 | $0 |
5778 | 79 | Apr 2, 1999 | Following | $6,000 | $48,482 | $240,495 |
5779 | 80 | Jul 13, 2005 | Return to the Land of Wonders | $5,000 | $1,338 | $1,338 |
5780 | 81 | Sep 29, 2015 | A Plague So Pleasant | $1,400 | $0 | $0 |
5781 | 82 | Aug 5, 2005 | My Date With Drew | $1,100 | $181,041 | $181,041 |
5782 rows × 6 columns
#In the production_budget, domestic_gross and worldwide_gross columns of movie_budgets, strip '$' and ',' and convert the values to integers.
#regex=False treats '$' as a literal character; int64 is used because worldwide_gross exceeds the 32-bit integer range
for col in ['production_budget', 'domestic_gross', 'worldwide_gross']:
    movie_budgets[col] = (movie_budgets[col]
                          .str.replace('$', '', regex=False)
                          .str.replace(',', '', regex=False)
                          .astype('int64'))
movie_budgets
| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
---|---|---|---|---|---|---|
0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 |
1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 |
2 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 |
3 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 |
4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 |
... | ... | ... | ... | ... | ... | ... |
5777 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 |
5778 | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 |
5779 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 |
5780 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 |
5781 | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 181041 |
5782 rows × 6 columns
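Removing the `$` and `,` characters leaves strings that only look like numbers. They still need a numeric dtype before they can be summed or averaged. A minimal sketch of the full conversion follows, wrapped in a hypothetical helper named `currency_to_int` and applied to three values taken from the table above. `int64` is requested explicitly because 2,776,345,279 does not fit in a 32-bit integer.

```python
import pandas as pd

def currency_to_int(series):
    """Strip '$' and ',' from a string column and return int64 values."""
    stripped = (series.str.replace("$", "", regex=False)
                      .str.replace(",", "", regex=False))
    return stripped.astype("int64")

sample = pd.Series(["$425,000,000", "$2,776,345,279", "$0"])
converted = currency_to_int(sample)
print(converted.dtype)
```

Passing `regex=False` keeps `$` from being read as the regular-expression end-of-string anchor, a behaviour that has differed between pandas versions.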
#calling the next dataset, movie_info
movie_info
| | id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | This gritty, fast-paced, and innovative police... | R | Action and Adventure|Classics|Drama | William Friedkin | Ernest Tidyman | Oct 9, 1971 | Sep 25, 2001 | NaN | NaN | 104 minutes | NaN |
1 | 3 | New York City, not-too-distant-future: Eric Pa... | R | Drama|Science Fiction and Fantasy | David Cronenberg | David Cronenberg|Don DeLillo | Aug 17, 2012 | Jan 1, 2013 | $ | 600,000 | 108 minutes | Entertainment One |
2 | 5 | Illeana Douglas delivers a superb performance ... | R | Drama|Musical and Performing Arts | Allison Anders | Allison Anders | Sep 13, 1996 | Apr 18, 2000 | NaN | NaN | 116 minutes | NaN |
3 | 6 | Michael Douglas runs afoul of a treacherous su... | R | Drama|Mystery and Suspense | Barry Levinson | Paul Attanasio|Michael Crichton | Dec 9, 1994 | Aug 27, 1997 | NaN | NaN | 128 minutes | NaN |
4 | 7 | NaN | NR | Drama|Romance | Rodney Bennett | Giles Cooper | NaN | NaN | NaN | NaN | 200 minutes | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1555 | 1996 | Forget terrorists or hijackers -- there's a ha... | R | Action and Adventure|Horror|Mystery and Suspense | NaN | NaN | Aug 18, 2006 | Jan 2, 2007 | $ | 33,886,034 | 106 minutes | New Line Cinema |
1556 | 1997 | The popular Saturday Night Live sketch was exp... | PG | Comedy|Science Fiction and Fantasy | Steve Barron | Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner | Jul 23, 1993 | Apr 17, 2001 | NaN | NaN | 88 minutes | Paramount Vantage |
1557 | 1998 | Based on a novel by Richard Powell, when the l... | G | Classics|Comedy|Drama|Musical and Performing Arts | Gordon Douglas | NaN | Jan 1, 1962 | May 11, 2004 | NaN | NaN | 111 minutes | NaN |
1558 | 1999 | The Sandlot is a coming-of-age story about a g... | PG | Comedy|Drama|Kids and Family|Sports and Fitness | David Mickey Evans | David Mickey Evans|Robert Gunter | Apr 1, 1993 | Jan 29, 2002 | NaN | NaN | 101 minutes | NaN |
1559 | 2000 | Suspended from the force, Paris cop Hubert is ... | R | Action and Adventure|Art House and Internation... | NaN | Luc Besson | Sep 27, 2001 | Feb 11, 2003 | NaN | NaN | 94 minutes | Columbia Pictures |
1560 rows × 12 columns
#show all null values in the datasets
movie_info.isna().sum()
id 0
synopsis 62
rating 3
genre 8
director 199
writer 449
theater_date 359
dvd_date 359
currency 1220
box_office 1220
runtime 30
studio 1066
dtype: int64
#The movie_info dataset contains a large number of null values,
#most of them in the currency, box_office and studio columns.
#drop all rows containing null values and keep the result
movie_info = movie_info.dropna()
movie_info
| | id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3 | New York City, not-too-distant-future: Eric Pa... | R | Drama|Science Fiction and Fantasy | David Cronenberg | David Cronenberg|Don DeLillo | Aug 17, 2012 | Jan 1, 2013 | $ | 600,000 | 108 minutes | Entertainment One |
6 | 10 | Some cast and crew from NBC's highly acclaimed... | PG-13 | Comedy | Jake Kasdan | Mike White | Jan 11, 2002 | Jun 18, 2002 | $ | 41,032,915 | 82 minutes | Paramount Pictures |
7 | 13 | Stewart Kane, an Irishman living in the Austra... | R | Drama | Ray Lawrence | Raymond Carver|Beatrix Christian | Apr 27, 2006 | Oct 2, 2007 | $ | 224,114 | 123 minutes | Sony Pictures Classics |
15 | 22 | Two-time Academy Award Winner Kevin Spacey giv... | R | Comedy|Drama|Mystery and Suspense | George Hickenlooper | Norman Snider | Dec 17, 2010 | Apr 5, 2011 | $ | 1,039,869 | 108 minutes | ATO Pictures |
18 | 25 | From ancient Japan's most enduring tale, the e... | PG-13 | Action and Adventure|Drama|Science Fiction and... | Carl Erik Rinsch | Chris Morgan|Hossein Amini | Dec 25, 2013 | Apr 1, 2014 | $ | 20,518,224 | 127 minutes | Universal Pictures |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1530 | 1968 | This holiday season, acclaimed filmmaker Camer... | PG | Comedy|Drama | Cameron Crowe | Aline Brosh McKenna|Cameron Crowe | Dec 23, 2011 | Apr 3, 2012 | $ | 72,700,000 | 126 minutes | 20th Century Fox |
1537 | 1976 | Embrace of the Serpent features the encounter,... | NR | Action and Adventure|Art House and International | Ciro Guerra | Ciro Guerra|Jacques Toulemonde Vidal | Feb 17, 2016 | Jun 21, 2016 | $ | 1,320,005 | 123 minutes | Buffalo Films |
1541 | 1980 | A band of renegades on the run in outer space ... | PG-13 | Action and Adventure|Science Fiction and Fantasy | Joss Whedon | Joss Whedon | Sep 30, 2005 | Dec 20, 2005 | $ | 25,335,935 | 119 minutes | Universal Pictures |
1542 | 1981 | Money, Fame and the Knowledge of English. In I... | NR | Comedy|Drama | Gauri Shinde | Gauri Shinde | Oct 5, 2012 | Nov 20, 2012 | $ | 1,416,189 | 129 minutes | Eros Entertainment |
1545 | 1985 | A woman who joins the undead against her will ... | R | Horror|Mystery and Suspense | Sebastian Gutierrez | Sebastian Gutierrez | Jun 1, 2007 | Oct 9, 2007 | $ | 59,371 | 98 minutes | IDP Distribution |
235 rows × 12 columns
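Dropping every incomplete row leaves 235 of the original 1560 rows, mostly because a few columns are largely empty. One possible alternative, sketched below on made-up data, is to measure the share of missing values per column first, drop the sparsest columns, and only then drop incomplete rows. The 0.5 threshold is an arbitrary assumption, not a value used in this project.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Horror"],
    "runtime": ["104 minutes", "108 minutes", "116 minutes", np.nan],
    "box_office": [np.nan, "600,000", np.nan, np.nan],
})

# Fraction of missing values in each column.
missing_share = sample.isna().mean()

# Hypothetical threshold: drop columns that are more than half empty,
# then drop the rows that are still incomplete.
THRESHOLD = 0.5
sparse_cols = missing_share[missing_share > THRESHOLD].index.tolist()
trimmed = sample.drop(columns=sparse_cols).dropna()

print(sparse_cols, len(trimmed), len(sample.dropna()))
```

This keeps more rows for the columns that matter, at the cost of losing the sparse columns entirely. That trade-off only makes sense if those columns are not needed for the analysis.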
#cleaning data in rt_reviews
#call rt_reviews
rt_reviews
| | id | review | rating | fresh | critic | top_critic | publisher | date |
---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 |
1 | 3 | It's an allegory in search of a meaning that n... | NaN | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 |
2 | 3 | ... life lived in a bubble in financial dealin... | NaN | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 |
3 | 3 | Continuing along a line introduced in last yea... | NaN | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 |
4 | 3 | ... a perverse twist on neorealism... | NaN | fresh | NaN | 0 | Cinema Scope | October 12, 2017 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | NaN | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 |
54428 | 2000 | NaN | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 |
54429 | 2000 | NaN | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 |
54430 | 2000 | NaN | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 |
54431 | 2000 | NaN | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 |
54432 rows × 8 columns
#checking for the sum of null values in this dataset
rt_reviews.isna().sum()
id 0
review 5563
rating 13517
fresh 0
critic 2722
top_critic 0
publisher 309
date 0
dtype: int64
#rt_reviews has many null values:
#review has 5563, rating 13517, critic 2722 and publisher 309
#drop all rows containing null values and keep the result
rt_reviews = rt_reviews.dropna()
rt_reviews
| | id | review | rating | fresh | critic | top_critic | publisher | date |
---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 |
6 | 3 | Quickly grows repetitive and tiresome, meander... | C | rotten | Eric D. Snider | 0 | EricDSnider.com | July 17, 2013 |
7 | 3 | Cronenberg is not a director to be daunted by ... | 2/5 | rotten | Matt Kelemen | 0 | Las Vegas CityLife | April 21, 2013 |
11 | 3 | While not one of Cronenberg's stronger films, ... | B- | fresh | Emanuel Levy | 0 | EmanuelLevy.Com | February 3, 2013 |
12 | 3 | Robert Pattinson works mighty hard to make Cos... | 2/4 | rotten | Christian Toto | 0 | Big Hollywood | January 15, 2013 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
54419 | 2000 | Sleek, shallow, but frequently amusing. | 2.5/4 | fresh | Gene Seymour | 1 | Newsday | September 27, 2002 |
54420 | 2000 | The spaniel-eyed Jean Reno infuses Hubert with... | 3/4 | fresh | Megan Turner | 1 | New York Post | September 27, 2002 |
54421 | 2000 | Manages to be somewhat well-acted, not badly a... | 1.5/4 | rotten | Bob Strauss | 0 | Los Angeles Daily News | September 27, 2002 |
54422 | 2000 | Arguably the best script that Besson has writt... | 3.5/5 | fresh | Wade Major | 0 | Boxoffice Magazine | September 27, 2002 |
54424 | 2000 | Dawdles and drags when it should pop; it doesn... | 1.5/5 | rotten | Manohla Dargis | 1 | Los Angeles Times | September 26, 2002 |
33988 rows × 8 columns
#checking for any NaN values
movie_ratings.isna().sum()
movie_id 0
averagerating 0
numvotes 0
dtype: int64
#checking for any null values
movie_basics.isna().sum()
movie_id 0
primary_title 0
original_title 21
start_year 0
runtime_minutes 31739
genres 5408
dtype: int64
#checking for null values
movie_akas.isna().sum()
movie_id 0
ordering 0
title 0
region 53293
language 289988
types 163256
attributes 316778
is_original_title 25
dtype: int64
#many null values have been found in movie_akas
#replacing null values with 0
movie_akas.fillna(0, inplace= True)
movie_akas
| | movie_id | ordering | title | region | language | types | attributes | is_original_title |
---|---|---|---|---|---|---|---|---|
0 | tt0369610 | 10 | Джурасик свят | BG | bg | 0 | 0 | 0.0 |
1 | tt0369610 | 11 | Jurashikku warudo | JP | 0 | imdbDisplay | 0 | 0.0 |
2 | tt0369610 | 12 | Jurassic World: O Mundo dos Dinossauros | BR | 0 | imdbDisplay | 0 | 0.0 |
3 | tt0369610 | 13 | O Mundo dos Dinossauros | BR | 0 | 0 | short title | 0.0 |
4 | tt0369610 | 14 | Jurassic World | FR | 0 | imdbDisplay | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
331698 | tt9827784 | 2 | Sayonara kuchibiru | 0 | 0 | original | 0 | 1.0 |
331699 | tt9827784 | 3 | Farewell Song | XWW | en | imdbDisplay | 0 | 0.0 |
331700 | tt9880178 | 1 | La atención | 0 | 0 | original | 0 | 1.0 |
331701 | tt9880178 | 2 | La atención | ES | 0 | 0 | 0 | 0.0 |
331702 | tt9880178 | 3 | The Attention | XWW | en | imdbDisplay | 0 | 0.0 |
331703 rows × 8 columns
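Filling every column with 0 puts the number 0 into text columns such as `region` and `language`, which mixes types within a column. As a hedged alternative, `fillna` also accepts a dictionary that gives each column its own placeholder. The sketch below uses made-up rows, and the placeholder string `"unknown"` is an assumption chosen for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "title": ["Jurassic World", "La atención"],
    "region": ["FR", None],
    "language": [None, None],
    "is_original_title": [0.0, None],
})

# A dict gives each column its own placeholder, so text columns stay text
# and the numeric flag stays numeric.
fill_values = {"region": "unknown", "language": "unknown", "is_original_title": 0.0}
filled = sample.fillna(value=fill_values)
print(filled)
```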
#previewing movie_ratings with movie_id as the index (set_index returns a new DataFrame; movie_ratings itself is unchanged)
movie_ratings.set_index("movie_id")
movie_id | averagerating | numvotes |
---|---|---|
tt10356526 | 8.3 | 31 |
tt10384606 | 8.9 | 559 |
tt1042974 | 6.4 | 20 |
tt1043726 | 4.2 | 50352 |
tt1060240 | 6.5 | 21 |
tt1069246 | 6.2 | 326 |
tt1094666 | 7.0 | 1613 |
tt1130982 | 6.4 | 571 |
tt1156528 | 7.2 | 265 |
tt1161457 | 4.2 | 148 |
#setting index for movie_basics
movie_basics.set_index("movie_id")
movie_id | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|
tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
... | ... | ... | ... | ... | ... |
tt9916538 | Kuambil Lagi Hatiku | Kuambil Lagi Hatiku | 2019 | 123.0 | Drama |
tt9916622 | Rodolpho Teóphilo - O Legado de um Pioneiro | Rodolpho Teóphilo - O Legado de um Pioneiro | 2015 | NaN | Documentary |
tt9916706 | Dankyavar Danka | Dankyavar Danka | 2013 | NaN | Comedy |
tt9916730 | 6 Gunn | 6 Gunn | 2017 | 116.0 | None |
tt9916754 | Chico Albuquerque - Revelações | Chico Albuquerque - Revelações | 2013 | NaN | Documentary |
146144 rows × 5 columns
#merging movie_basics and movie_ratings on movie_id
#call new table basics_and_ratings
basics_and_ratings = movie_ratings.merge(movie_basics, on='movie_id', how='inner')
basics_and_ratings
| | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|
0 | tt10356526 | 8.3 | 31 | Laiye Je Yaarian | Laiye Je Yaarian | 2019 | 117.0 | Romance |
1 | tt10384606 | 8.9 | 559 | Borderless | Borderless | 2019 | 87.0 | Documentary |
2 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama |
3 | tt1043726 | 4.2 | 50352 | The Legend of Hercules | The Legend of Hercules | 2014 | 99.0 | Action,Adventure,Fantasy |
4 | tt1060240 | 6.5 | 21 | Até Onde? | Até Onde? | 2011 | 73.0 | Mystery,Thriller |
5 | tt1069246 | 6.2 | 326 | Habana Eva | Habana Eva | 2010 | 106.0 | Comedy,Romance |
6 | tt1094666 | 7.0 | 1613 | The Hammer | Hamill | 2010 | 108.0 | Biography,Drama,Sport |
7 | tt1130982 | 6.4 | 571 | The Night Clerk | Avant l'aube | 2011 | 104.0 | Drama,Thriller |
8 | tt1156528 | 7.2 | 265 | Silent Sonata | Circus Fantasticus | 2011 | 77.0 | Drama,War |
9 | tt1161457 | 4.2 | 148 | Vanquisher | The Vanquisher | 2016 | 90.0 | Action,Adventure,Sci-Fi |
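An inner merge silently drops rows that have no partner in the other table. One way to check how many rows matched, sketched here on made-up frames (ratings_demo, basics_demo and the short ids are assumptions), is to run an outer merge with indicator=True and validate="one_to_one" before keeping only the matches:

```python
import pandas as pd

# Toy stand-ins for movie_ratings and movie_basics; ids are illustrative.
ratings_demo = pd.DataFrame({
    "movie_id": ["tt1", "tt2", "tt3"],
    "averagerating": [8.3, 6.4, 4.2],
})
basics_demo = pd.DataFrame({
    "movie_id": ["tt1", "tt2", "tt4"],
    "genres": ["Romance", "Drama", "Comedy"],
})

# validate raises an error if movie_id is duplicated on either side;
# indicator adds a _merge column saying where each row came from.
checked = ratings_demo.merge(
    basics_demo, on="movie_id", how="outer",
    validate="one_to_one", indicator=True,
)
match_counts = checked["_merge"].value_counts()
print(match_counts)

# Keeping only the matched rows is equivalent to the inner merge.
inner = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(inner)
```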
movie_akas.set_index('movie_id')
movie_id | ordering | title | region | language | types | attributes | is_original_title |
---|---|---|---|---|---|---|---|
tt0369610 | 10 | Джурасик свят | BG | bg | 0 | 0 | 0.0 |
tt0369610 | 11 | Jurashikku warudo | JP | 0 | imdbDisplay | 0 | 0.0 |
tt0369610 | 12 | Jurassic World: O Mundo dos Dinossauros | BR | 0 | imdbDisplay | 0 | 0.0 |
tt0369610 | 13 | O Mundo dos Dinossauros | BR | 0 | 0 | short title | 0.0 |
tt0369610 | 14 | Jurassic World | FR | 0 | imdbDisplay | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... |
tt9827784 | 2 | Sayonara kuchibiru | 0 | 0 | original | 0 | 1.0 |
tt9827784 | 3 | Farewell Song | XWW | en | imdbDisplay | 0 | 0.0 |
tt9880178 | 1 | La atención | 0 | 0 | original | 0 | 1.0 |
tt9880178 | 2 | La atención | ES | 0 | 0 | 0 | 0.0 |
tt9880178 | 3 | The Attention | XWW | en | imdbDisplay | 0 | 0.0 |
331703 rows × 7 columns
#merging basics_and_ratings & movie_akas
b_r_akas = basics_and_ratings.merge(movie_akas, on = 'movie_id', how= 'inner')
b_r_akas
| | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres | ordering | title | region | language | types | attributes | is_original_title |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama | 1 | Just Inès | 0 | 0 | original | 0 | 1.0 |
1 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama | 2 | Samo Ines | RS | 0 | imdbDisplay | 0 | 0.0 |
2 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama | 3 | Just Inès | GB | 0 | 0 | 0 | 0.0 |
3 | tt1043726 | 4.2 | 50352 | The Legend of Hercules | The Legend of Hercules | 2014 | 99.0 | Action,Adventure,Fantasy | 10 | The Legend of Hercules | 0 | 0 | original | 0 | 1.0 |
4 | tt1043726 | 4.2 | 50352 | The Legend of Hercules | The Legend of Hercules | 2014 | 99.0 | Action,Adventure,Fantasy | 11 | Hércules - A Lenda Começa | PT | 0 | imdbDisplay | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
61 | tt1156528 | 7.2 | 265 | Silent Sonata | Circus Fantasticus | 2011 | 77.0 | Drama,War | 7 | Circus Fantasticus | 0 | 0 | original | 0 | 1.0 |
62 | tt1156528 | 7.2 | 265 | Silent Sonata | Circus Fantasticus | 2011 | 77.0 | Drama,War | 8 | Circus Fantasticus | FI | sv | imdbDisplay | 0 | 0.0 |
63 | tt1156528 | 7.2 | 265 | Silent Sonata | Circus Fantasticus | 2011 | 77.0 | Drama,War | 9 | Соната без думи | BG | bg | 0 | 0 | 0.0 |
64 | tt1161457 | 4.2 | 148 | Vanquisher | The Vanquisher | 2016 | 90.0 | Action,Adventure,Sci-Fi | 1 | Vanquisher | US | 0 | 0 | new title | 0.0 |
65 | tt1161457 | 4.2 | 148 | Vanquisher | The Vanquisher | 2016 | 90.0 | Action,Adventure,Sci-Fi | 2 | The Vanquisher | 0 | 0 | original | 0 | 1.0 |
66 rows × 15 columns
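Because movie_akas holds several alternate titles per film, this merge repeats each film's rating once per title, as the repeated tt1042974 rows above show. A small sketch of the effect on an average, using a cut-down copy of those rows (the *_demo names are hypothetical):

```python
import pandas as pd

# Two films and their alternate titles, reduced from the output above.
ratings_demo = pd.DataFrame({
    "movie_id": ["tt1042974", "tt1161457"],
    "averagerating": [6.4, 4.2],
})
akas_demo = pd.DataFrame({
    "movie_id": ["tt1042974"] * 3 + ["tt1161457"] * 2,
    "title": ["Just Inès", "Samo Ines", "Just Inès",
              "Vanquisher", "The Vanquisher"],
})

merged = ratings_demo.merge(akas_demo, on="movie_id", how="inner")

# Averaging over the merged rows weights each film by its number of titles.
per_title_mean = merged["averagerating"].mean()

# Keeping one row per film restores the per-film average.
one_row_per_film = merged.drop_duplicates(subset="movie_id")
per_film_mean = one_row_per_film["averagerating"].mean()

print(len(merged), per_title_mean, per_film_mean)
```

Any per-genre summary built on the merged table would benefit from a similar de-duplication step first.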
#setting index for movie_budgets (preview only; the result is not assigned)
movie_budgets.set_index('domestic_gross')
domestic_gross | id | release_date | movie | production_budget | worldwide_gross |
---|---|---|---|---|---|
760507625 | 1 | Dec 18, 2009 | Avatar | 425000000 | 2776345279 |
241063875 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 1045663875 |
42762350 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 149762350 |
459005868 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 1403013963 |
620181382 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 1316721747 |
... | ... | ... | ... | ... | ... |
0 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 |
48482 | 79 | Apr 2, 1999 | Following | 6000 | 240495 |
1338 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 |
0 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 |
181041 | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 |
5782 rows × 5 columns
#setting index for movie_gross (preview only; the result is not assigned)
movie_gross.set_index('domestic_gross')
domestic_gross | title | studio | foreign_gross | year |
---|---|---|---|---|
415000000 | Toy Story 3 | BV | 652000000 | 2010 |
334200000 | Alice in Wonderland (2010) | BV | 691300000 | 2010 |
296000000 | Harry Potter and the Deathly Hallows Part 1 | WB | 664300000 | 2010 |
292600000 | Inception | WB | 535700000 | 2010 |
238700000 | Shrek Forever After | P/DW | 513900000 | 2010 |
... | ... | ... | ... | ... |
6200 | The Quake | Magn. | NaN | 2018 |
4800 | Edward II (2018 re-release) | FM | NaN | 2018 |
2500 | El Pacto | Sony | NaN | 2018 |
2400 | The Swan | Synergetic | NaN | 2018 |
1700 | An Actor Prepares | Grav. | NaN | 2018 |
3387 rows × 4 columns
#combining tables to view the gross and budget data together
#concatenating movie_gross and movie_budgets column-wise
#axis=1 aligns rows by index position, not by movie title
joined_gross_budget = pd.concat([movie_gross, movie_budgets], axis=1)
joined_gross_budget
| | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | NaN | NaN | NaN | NaN | NaN | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 |
5778 | NaN | NaN | NaN | NaN | NaN | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 |
5779 | NaN | NaN | NaN | NaN | NaN | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 |
5780 | NaN | NaN | NaN | NaN | NaN | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 |
5781 | NaN | NaN | NaN | NaN | NaN | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 181041 |
5782 rows × 11 columns
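In the output above, row 0 pairs Toy Story 3 with Avatar's budget, because pd.concat with axis=1 lines rows up by index position rather than by film. A minimal sketch of a title-based alternative follows; the frames are made up, and the budget figures other than Avatar's are placeholders, not real data:

```python
import pandas as pd

# Toy stand-ins for movie_gross and movie_budgets.
gross_demo = pd.DataFrame({
    "title": ["Toy Story 3", "Inception"],
    "studio": ["BV", "WB"],
})
budgets_demo = pd.DataFrame({
    "movie": ["Avatar", "Inception", "Toy Story 3"],
    "production_budget": [425000000, 2, 3],  # 2 and 3 are placeholders
})

# Positional concat: row 0 holds Toy Story 3 next to Avatar.
side_by_side = pd.concat([gross_demo, budgets_demo], axis=1)
print(side_by_side)

# Key-based merge: rows are paired only when the titles are equal.
by_title = gross_demo.merge(
    budgets_demo, left_on="title", right_on="movie", how="inner"
)
print(by_title)
```

Exact title matching will miss entries whose titles are formatted differently between the two sources, such as "Alice in Wonderland (2010)", so some title clean-up would likely be needed on the real data.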
#concatenating joined_gross_budget and b_r_akas column-wise
akas_gross = pd.concat([joined_gross_budget, b_r_akas], axis=1)
akas_gross
| | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | ... | start_year | runtime_minutes | genres | ordering | title | region | language | types | attributes | is_original_title |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | ... | 2010.0 | 90.0 | Drama | 1.0 | Just Inès | 0 | 0 | original | 0 | 1.0 |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | ... | 2010.0 | 90.0 | Drama | 2.0 | Samo Ines | RS | 0 | imdbDisplay | 0 | 0.0 |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | ... | 2010.0 | 90.0 | Drama | 3.0 | Just Inès | GB | 0 | 0 | 0 | 0.0 |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | ... | 2014.0 | 99.0 | Action,Adventure,Fantasy | 10.0 | The Legend of Hercules | 0 | 0 | original | 0 | 1.0 |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | ... | 2014.0 | 99.0 | Action,Adventure,Fantasy | 11.0 | Hércules - A Lenda Começa | PT | 0 | imdbDisplay | 0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | NaN | NaN | NaN | NaN | NaN | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5778 | NaN | NaN | NaN | NaN | NaN | 79 | Apr 2, 1999 | Following | 6000 | 48482 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5779 | NaN | NaN | NaN | NaN | NaN | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5780 | NaN | NaN | NaN | NaN | NaN | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5781 | NaN | NaN | NaN | NaN | NaN | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5782 rows × 26 columns
#setting index for rt_reviews dataframe
rt_reviews.set_index('id')
id | review | rating | fresh | critic | top_critic | publisher | date |
---|---|---|---|---|---|---|---|
3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 |
3 | It's an allegory in search of a meaning that n... | NaN | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 |
3 | ... life lived in a bubble in financial dealin... | NaN | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 |
3 | Continuing along a line introduced in last yea... | NaN | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 |
3 | ... a perverse twist on neorealism... | NaN | fresh | NaN | 0 | Cinema Scope | October 12, 2017 |
... | ... | ... | ... | ... | ... | ... | ... |
2000 | The real charm of this trifle is the deadpan c... | NaN | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 |
2000 | NaN | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 |
2000 | NaN | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 |
2000 | NaN | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 |
2000 | NaN | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 |
54432 rows × 7 columns
#setting index for movie_info
movie_info.set_index('id')
id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | This gritty, fast-paced, and innovative police... | R | Action and Adventure|Classics|Drama | William Friedkin | Ernest Tidyman | Oct 9, 1971 | Sep 25, 2001 | NaN | NaN | 104 minutes | NaN |
3 | New York City, not-too-distant-future: Eric Pa... | R | Drama|Science Fiction and Fantasy | David Cronenberg | David Cronenberg|Don DeLillo | Aug 17, 2012 | Jan 1, 2013 | $ | 600,000 | 108 minutes | Entertainment One |
5 | Illeana Douglas delivers a superb performance ... | R | Drama|Musical and Performing Arts | Allison Anders | Allison Anders | Sep 13, 1996 | Apr 18, 2000 | NaN | NaN | 116 minutes | NaN |
6 | Michael Douglas runs afoul of a treacherous su... | R | Drama|Mystery and Suspense | Barry Levinson | Paul Attanasio|Michael Crichton | Dec 9, 1994 | Aug 27, 1997 | NaN | NaN | 128 minutes | NaN |
7 | NaN | NR | Drama|Romance | Rodney Bennett | Giles Cooper | NaN | NaN | NaN | NaN | 200 minutes | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1996 | Forget terrorists or hijackers -- there's a ha... | R | Action and Adventure|Horror|Mystery and Suspense | NaN | NaN | Aug 18, 2006 | Jan 2, 2007 | $ | 33,886,034 | 106 minutes | New Line Cinema |
1997 | The popular Saturday Night Live sketch was exp... | PG | Comedy|Science Fiction and Fantasy | Steve Barron | Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner | Jul 23, 1993 | Apr 17, 2001 | NaN | NaN | 88 minutes | Paramount Vantage |
1998 | Based on a novel by Richard Powell, when the l... | G | Classics|Comedy|Drama|Musical and Performing Arts | Gordon Douglas | NaN | Jan 1, 1962 | May 11, 2004 | NaN | NaN | 111 minutes | NaN |
1999 | The Sandlot is a coming-of-age story about a g... | PG | Comedy|Drama|Kids and Family|Sports and Fitness | David Mickey Evans | David Mickey Evans|Robert Gunter | Apr 1, 1993 | Jan 29, 2002 | NaN | NaN | 101 minutes | NaN |
2000 | Suspended from the force, Paris cop Hubert is ... | R | Action and Adventure|Art House and Internation... | NaN | Luc Besson | Sep 27, 2001 | Feb 11, 2003 | NaN | NaN | 94 minutes | Columbia Pictures |
1560 rows × 11 columns
The merged datasets contain too many null values to fill in reliably, so rows with null values are removed using .dropna().
#combining rt_reviews and movie_info column-wise
reviews_info = pd.concat([rt_reviews, movie_info], axis=1)
reviews_info
| | id | review | rating | fresh | critic | top_critic | publisher | date | id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 | 1.0 | This gritty, fast-paced, and innovative police... | R | Action and Adventure|Classics|Drama | William Friedkin | Ernest Tidyman | Oct 9, 1971 | Sep 25, 2001 | NaN | NaN | 104 minutes | NaN |
1 | 3 | It's an allegory in search of a meaning that n... | NaN | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 | 3.0 | New York City, not-too-distant-future: Eric Pa... | R | Drama|Science Fiction and Fantasy | David Cronenberg | David Cronenberg|Don DeLillo | Aug 17, 2012 | Jan 1, 2013 | $ | 600,000 | 108 minutes | Entertainment One |
2 | 3 | ... life lived in a bubble in financial dealin... | NaN | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 | 5.0 | Illeana Douglas delivers a superb performance ... | R | Drama|Musical and Performing Arts | Allison Anders | Allison Anders | Sep 13, 1996 | Apr 18, 2000 | NaN | NaN | 116 minutes | NaN |
3 | 3 | Continuing along a line introduced in last yea... | NaN | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 | 6.0 | Michael Douglas runs afoul of a treacherous su... | R | Drama|Mystery and Suspense | Barry Levinson | Paul Attanasio|Michael Crichton | Dec 9, 1994 | Aug 27, 1997 | NaN | NaN | 128 minutes | NaN |
4 | 3 | ... a perverse twist on neorealism... | NaN | fresh | NaN | 0 | Cinema Scope | October 12, 2017 | 7.0 | NaN | NR | Drama|Romance | Rodney Bennett | Giles Cooper | NaN | NaN | NaN | NaN | 200 minutes | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | NaN | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54428 | 2000 | NaN | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54429 | 2000 | NaN | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54430 | 2000 | NaN | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54431 | 2000 | NaN | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54432 rows × 20 columns
#the two dataframes are now combined
#preview with every row that contains a NaN value dropped (result is not assigned)
reviews_info.dropna()
| | id | review | rating | fresh | critic | top_critic | publisher | date | id | synopsis | rating | genre | director | writer | theater_date | dvd_date | currency | box_office | runtime | studio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 3 | Quickly grows repetitive and tiresome, meander... | C | rotten | Eric D. Snider | 0 | EricDSnider.com | July 17, 2013 | 10.0 | Some cast and crew from NBC's highly acclaimed... | PG-13 | Comedy | Jake Kasdan | Mike White | Jan 11, 2002 | Jun 18, 2002 | $ | 41,032,915 | 82 minutes | Paramount Pictures |
7 | 3 | Cronenberg is not a director to be daunted by ... | 2/5 | rotten | Matt Kelemen | 0 | Las Vegas CityLife | April 21, 2013 | 13.0 | Stewart Kane, an Irishman living in the Austra... | R | Drama | Ray Lawrence | Raymond Carver|Beatrix Christian | Apr 27, 2006 | Oct 2, 2007 | $ | 224,114 | 123 minutes | Sony Pictures Classics |
15 | 3 | For better or worse - often both - Cosmopolis ... | 3/5 | fresh | Adam Ross | 0 | The Aristocrat | September 27, 2012 | 22.0 | Two-time Academy Award Winner Kevin Spacey giv... | R | Comedy|Drama|Mystery and Suspense | George Hickenlooper | Norman Snider | Dec 17, 2010 | Apr 5, 2011 | $ | 1,039,869 | 108 minutes | ATO Pictures |
18 | 3 | It's fascinating to watch Pattinson actually a... | 2/4 | rotten | Sean P. Means | 0 | Salt Lake Tribune | September 14, 2012 | 25.0 | From ancient Japan's most enduring tale, the e... | PG-13 | Action and Adventure|Drama|Science Fiction and... | Carl Erik Rinsch | Chris Morgan|Hossein Amini | Dec 25, 2013 | Apr 1, 2014 | $ | 20,518,224 | 127 minutes | Universal Pictures |
19 | 3 | A black comedy as dry and deadpan as a bleache... | 4/4 | fresh | John Beifuss | 0 | Commercial Appeal (Memphis, TN) | September 10, 2012 | 26.0 | A comic series of short vignettes build on one... | R | Art House and International|Comedy|Drama|Music... | Jim Jarmusch | Jim Jarmusch | May 14, 2004 | Sep 21, 2004 | $ | 1,971,135 | 96 minutes | MGM |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1511 | 45 | Hello, Deedles. Terrible to meet you. | 1/5 | rotten | Scott Weinberg | 0 | eFilmCritic.com | July 29, 2002 | 1945.0 | Left on a nun's doorstep, Larry, Curly and Moe... | PG | Comedy | Bobby Farrelly|Peter Farrelly | Bobby Farrelly|Peter Farrelly|Mike Cerrone | Apr 13, 2012 | Jul 17, 2012 | $ | 41,800,000 | 92 minutes | 20th Century Fox |
1518 | 45 | Steve Van Wormer and Paul Walker, as Stew and ... | 0/4 | rotten | Steve Rhodes | 0 | Internet Reviews | January 1, 2000 | 1953.0 | A glimpse into the comedic process and private... | R | Comedy|Documentary|Television | Ricki Stern|Anne Sundberg | Ricki Stern | Jun 11, 2010 | Dec 14, 2010 | $ | 2,927,972 | 84 minutes | IFC Films |
1537 | 46 | Leaves the audience smiling and giggling, all ... | 3/4 | fresh | Michael Dequina | 0 | TheMovieReport.com | March 8, 2009 | 1976.0 | Embrace of the Serpent features the encounter,... | NR | Action and Adventure|Art House and International | Ciro Guerra | Ciro Guerra|Jacques Toulemonde Vidal | Feb 17, 2016 | Jun 21, 2016 | $ | 1,320,005 | 123 minutes | Buffalo Films |
1541 | 46 | The briskly paced, high-spirited movie is comp... | 3.5/4 | fresh | Judith Egerton | 0 | Courier-Journal (Louisville, KY) | June 25, 2004 | 1980.0 | A band of renegades on the run in outer space ... | PG-13 | Action and Adventure|Science Fiction and Fantasy | Joss Whedon | Joss Whedon | Sep 30, 2005 | Dec 20, 2005 | $ | 25,335,935 | 119 minutes | Universal Pictures |
1545 | 46 | It's a familiar show-biz routine but one that'... | 3.5/4 | fresh | Susan Wloszczyna | 1 | USA Today | January 1, 2000 | 1985.0 | A woman who joins the undead against her will ... | R | Horror|Mystery and Suspense | Sebastian Gutierrez | Sebastian Gutierrez | Jun 1, 2007 | Oct 9, 2007 | $ | 59,371 | 98 minutes | IDP Distribution |
148 rows × 20 columns
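Calling dropna() with no arguments removes a row if any column is missing, which is why only 148 of the 54432 rows remain. A sketch of a narrower option, assuming a small made-up frame (reviews_demo) and an assumed choice of key columns, drops rows only when the columns needed for the analysis are missing:

```python
import numpy as np
import pandas as pd

# Toy stand-in for reviews_info; the values are illustrative only.
reviews_demo = pd.DataFrame({
    "review": ["A", np.nan, "C", "D"],
    "rating": ["3/5", "1/5", np.nan, "2/4"],
    "box_office": [np.nan, np.nan, np.nan, "59,371"],
})

# Default behaviour: a row is dropped if any column is NaN.
all_complete = reviews_demo.dropna()

# subset limits the check to the listed columns (an assumed choice here).
key_complete = reviews_demo.dropna(subset=["review", "rating"])

print(len(reviews_demo), len(all_complete), len(key_complete))
```

As with set_index, dropna returns a new DataFrame, so the result has to be assigned to take effect.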
#concatenating joined_gross_budget and basics_and_ratings column-wise
budget_ratings = pd.concat([joined_gross_budget, basics_and_ratings], axis=1)
budget_ratings
| | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | NaN | NaN | NaN | NaN | NaN | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5778 | NaN | NaN | NaN | NaN | NaN | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5779 | NaN | NaN | NaN | NaN | NaN | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5780 | NaN | NaN | NaN | NaN | NaN | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5781 | NaN | NaN | NaN | NaN | NaN | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 181041 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5782 rows × 19 columns
#replace the null values with 0
budget_ratings.fillna(0, inplace=True)
budget_ratings
| | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | 0 | 0 | 0.0 | 0 | 0.0 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5778 | 0 | 0 | 0.0 | 0 | 0.0 | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5779 | 0 | 0 | 0.0 | 0 | 0.0 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5780 | 0 | 0 | 0.0 | 0 | 0.0 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5781 | 0 | 0 | 0.0 | 0 | 0.0 | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 181041 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5782 rows × 19 columns
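Replacing missing numbers with 0 changes later statistics, because pandas skips NaN when averaging but counts a 0 as a real value. A tiny sketch with a made-up ratings column (ratings_demo is a hypothetical name) shows the size of the effect:

```python
import numpy as np
import pandas as pd

# Two known ratings and two missing ones; values are illustrative.
ratings_demo = pd.DataFrame({"averagerating": [8.3, 6.4, np.nan, np.nan]})

# mean() ignores NaN by default, so only the two known ratings count.
mean_skipping_nan = ratings_demo["averagerating"].mean()

# After fillna(0) the two zeros are averaged in as if they were ratings.
mean_with_zeros = ratings_demo["averagerating"].fillna(0).mean()

print(mean_skipping_nan, mean_with_zeros)
```

For columns such as averagerating or runtime_minutes, leaving the values as NaN may therefore give more faithful summaries than filling with 0.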
#combining all the dataframes
#concatenating reviews_info and budget_ratings column-wise
budget_ratings_akas = pd.concat([reviews_info, budget_ratings], axis=1)
budget_ratings_akas
| | id | review | rating | fresh | critic | top_critic | publisher | date | id | synopsis | ... | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 | 1.0 | This gritty, fast-paced, and innovative police... | ... | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | 3 | It's an allegory in search of a meaning that n... | NaN | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 | 3.0 | New York City, not-too-distant-future: Eric Pa... | ... | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | 3 | ... life lived in a bubble in financial dealin... | NaN | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 | 5.0 | Illeana Douglas delivers a superb performance ... | ... | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | 3 | Continuing along a line introduced in last yea... | NaN | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 | 6.0 | Michael Douglas runs afoul of a treacherous su... | ... | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | 3 | ... a perverse twist on neorealism... | NaN | fresh | NaN | 0 | Cinema Scope | October 12, 2017 | 7.0 | NaN | ... | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | NaN | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54428 | 2000 | NaN | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54429 | 2000 | NaN | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54430 | 2000 | NaN | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54431 | 2000 | NaN | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
54432 rows × 39 columns
#replacing null values with 0
budget_ratings_akas.fillna(0, inplace=True)
budget_ratings_akas
| | id | review | rating | fresh | critic | top_critic | publisher | date | id | synopsis | ... | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 | 1.0 | This gritty, fast-paced, and innovative police... | ... | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | 3 | It's an allegory in search of a meaning that n... | 0 | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 | 3.0 | New York City, not-too-distant-future: Eric Pa... | ... | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | 3 | ... life lived in a bubble in financial dealin... | 0 | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 | 5.0 | Illeana Douglas delivers a superb performance ... | ... | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | 3 | Continuing along a line introduced in last yea... | 0 | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 | 6.0 | Michael Douglas runs afoul of a treacherous su... | ... | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | 3 | ... a perverse twist on neorealism... | 0 | fresh | 0 | 0 | Cinema Scope | October 12, 2017 | 7.0 | 0 | ... | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | 0 | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54428 | 2000 | 0 | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54429 | 2000 | 0 | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54430 | 2000 | 0 | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54431 | 2000 | 0 | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54432 rows × 39 columns
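Filling every missing value with 0 keeps all rows, but the zeros then count as real values in any later average. The sketch below illustrates the effect on a small made-up dataframe (`toy` and its values are hypothetical, not taken from the project data) and shows `dropna` on just the needed columns as one possible alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the merged dataframe.
toy = pd.DataFrame({
    "genres": ["Drama", "Drama", "Romance", None],
    "runtime_minutes": [90.0, np.nan, 117.0, np.nan],
})

# pandas skips NaN by default, so this mean uses only the real runtimes.
mean_with_nan = toy["runtime_minutes"].mean()

# After fillna(0), the placeholder zeros are averaged in as real values.
mean_with_zeros = toy.fillna(0)["runtime_minutes"].mean()

# dropna on the columns a plot needs keeps only the complete rows.
complete = toy.dropna(subset=["genres", "runtime_minutes"])

print(mean_with_nan, mean_with_zeros, len(complete))
```

In this toy case the average runtime roughly halves once the zeros are included, which is why zero-filled rows can distort a per-genre bar plot.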
We'll now employ techniques that are sometimes referred to as descriptive statistics because they only describe the available data or offer estimations based on it.
#open the needed dataframe
budget_ratings_akas
index | id | review | rating | fresh | critic | top_critic | publisher | date | id | synopsis | ... | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | A distinctly gallows take on contemporary fina... | 3/5 | fresh | PJ Nabarro | 0 | Patrick Nabarro | November 10, 2018 | 1.0 | This gritty, fast-paced, and innovative police... | ... | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | 3 | It's an allegory in search of a meaning that n... | 0 | rotten | Annalee Newitz | 0 | io9.com | May 23, 2018 | 3.0 | New York City, not-too-distant-future: Eric Pa... | ... | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | 3 | ... life lived in a bubble in financial dealin... | 0 | fresh | Sean Axmaker | 0 | Stream on Demand | January 4, 2018 | 5.0 | Illeana Douglas delivers a superb performance ... | ... | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | 3 | Continuing along a line introduced in last yea... | 0 | fresh | Daniel Kasman | 0 | MUBI | November 16, 2017 | 6.0 | Michael Douglas runs afoul of a treacherous su... | ... | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | 3 | ... a perverse twist on neorealism... | 0 | fresh | 0 | 0 | Cinema Scope | October 12, 2017 | 7.0 | 0 | ... | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
54427 | 2000 | The real charm of this trifle is the deadpan c... | 0 | fresh | Laura Sinagra | 1 | Village Voice | September 24, 2002 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54428 | 2000 | 0 | 1/5 | rotten | Michael Szymanski | 0 | Zap2it.com | September 21, 2005 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54429 | 2000 | 0 | 2/5 | rotten | Emanuel Levy | 0 | EmanuelLevy.Com | July 17, 2005 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54430 | 2000 | 0 | 2.5/5 | rotten | Christopher Null | 0 | Filmcritic.com | September 7, 2003 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54431 | 2000 | 0 | 3/5 | fresh | Nicolas Lacroix | 0 | Showbizz.net | November 12, 2002 | 0.0 | 0 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
54432 rows × 39 columns
# plotting a sns.barplot of runtime by genre:
fig, ax1= plt.subplots(figsize=(10,8))
#drawing on ax1 so the labels below apply to this plot
sns.barplot(data = budget_ratings, x = 'runtime_minutes', y = 'genres', ax = ax1)
#labelling plot
ax1.set_title('Correlation between Genres & Runtime', fontsize=16)
ax1.set_xlabel("Runtime_minutes",fontsize=16)
ax1.set_ylabel("Genres", fontsize=16)
#will display the plot
plt.show()
The bar plot above shows the runtime, in minutes, for each genre. Romance has the longest runtime, whereas thriller has the shortest.
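Because the `genres` column holds comma-separated combinations such as `Mystery,Thriller`, each combination is treated as its own category in the plot. As a rough sketch, assuming one wanted runtime per individual genre instead, the combinations could be split and averaged as below. The `sample` dataframe is hypothetical and only borrows the column names used above.

```python
import pandas as pd

# Hypothetical sample shaped like the genres/runtime columns above.
sample = pd.DataFrame({
    "genres": ["Romance", "Action,Adventure,Fantasy",
               "Mystery,Thriller", "Drama,Thriller"],
    "runtime_minutes": [117.0, 99.0, 73.0, 104.0],
})

# Turn "A,B" strings into lists, then give each genre its own row.
per_genre = (
    sample.assign(genre=sample["genres"].str.split(","))
          .explode("genre")
)

# Average runtime per single genre, longest first.
mean_runtime = (
    per_genre.groupby("genre")["runtime_minutes"]
             .mean()
             .sort_values(ascending=False)
)
print(mean_runtime)
```

A film with several genres contributes its runtime to each of them, so this is one possible convention rather than the only way to count.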
#loading data
#confirming the columns needed are available
budget_ratings
index | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | 0 | 0 | 0.0 | 0 | 0.0 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5778 | 0 | 0 | 0.0 | 0 | 0.0 | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5779 | 0 | 0 | 0.0 | 0 | 0.0 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5780 | 0 | 0 | 0.0 | 0 | 0.0 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5781 | 0 | 0 | 0.0 | 0 | 0.0 | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 181041 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5782 rows × 19 columns
#plotting
fig, ax1= plt.subplots(figsize=(10,8))
#drawing on ax1 so the labels below apply to this plot
sns.barplot(data = budget_ratings, x = 'genres', y = 'averagerating', ax = ax1)
#labelling plot
ax1.set_title('Average rating by genre', fontsize=16)
ax1.set_xlabel("Genres",fontsize=16)
ax1.set_ylabel("Average Rating", fontsize=16)
#rotating the x labels
plt.xticks(rotation = 45)
#will display the plot
plt.show()
Documentary is the highest-rated genre, with an average rating of 8.9.
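An average rating based on a handful of votes can outrank one based on thousands. As a hedged sketch, not the method used in this notebook, a vote-weighted mean per genre could be compared with the plain mean. The `ratings` dataframe and its numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical ratings; the column names mirror the dataframe above.
ratings = pd.DataFrame({
    "genres": ["Documentary", "Documentary", "Drama", "Drama"],
    "averagerating": [9.0, 6.0, 7.0, 7.0],
    "numvotes": [10, 990, 500, 500],
})

# Plain mean: every title counts equally, however few votes it has.
plain = ratings.groupby("genres")["averagerating"].mean()

# Vote-weighted mean: sum(rating * votes) / sum(votes) within each genre.
rating_times_votes = ratings["averagerating"] * ratings["numvotes"]
weighted = (
    rating_times_votes.groupby(ratings["genres"]).sum()
    / ratings.groupby("genres")["numvotes"].sum()
)

print(plain)
print(weighted)
```

In this made-up example the plain mean puts Documentary ahead of Drama, while the weighted mean reverses the order because the 9.0 rating rests on only 10 votes.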
#To enable it to load in the plot, convert worldwide gross to float.
budget_ratings['worldwide_gross']=budget_ratings['worldwide_gross'].astype(float)
#confirming changes
budget_ratings
index | title | studio | domestic_gross | foreign_gross | year | id | release_date | movie | production_budget | domestic_gross | worldwide_gross | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010.0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2.776345e+09 | tt10356526 | 8.3 | 31.0 | Laiye Je Yaarian | Laiye Je Yaarian | 2019.0 | 117.0 | Romance |
1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010.0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1.045664e+09 | tt10384606 | 8.9 | 559.0 | Borderless | Borderless | 2019.0 | 87.0 | Documentary |
2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010.0 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 1.497624e+08 | tt1042974 | 6.4 | 20.0 | Just Inès | Just Inès | 2010.0 | 90.0 | Drama |
3 | Inception | WB | 292600000.0 | 535700000 | 2010.0 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1.403014e+09 | tt1043726 | 4.2 | 50352.0 | The Legend of Hercules | The Legend of Hercules | 2014.0 | 99.0 | Action,Adventure,Fantasy |
4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010.0 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1.316722e+09 | tt1060240 | 6.5 | 21.0 | Até Onde? | Até Onde? | 2011.0 | 73.0 | Mystery,Thriller |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5777 | 0 | 0 | 0.0 | 0 | 0.0 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0.000000e+00 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5778 | 0 | 0 | 0.0 | 0 | 0.0 | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 2.404950e+05 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5779 | 0 | 0 | 0.0 | 0 | 0.0 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1.338000e+03 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5780 | 0 | 0 | 0.0 | 0 | 0.0 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0.000000e+00 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5781 | 0 | 0 | 0.0 | 0 | 0.0 | 82 | Aug 5, 2005 | My Date With Drew | 1100 | 181041 | 1.810410e+05 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 |
5782 rows × 19 columns
fig, ax1= plt.subplots(figsize=(10,5))
#plot (drawn on ax1 so the labels below apply to it):
sns.scatterplot( x='movie', y='worldwide_gross', data = budget_ratings, ax = ax1)
#labelling plot
ax1.set_title('Movie with the highest worldwide gross')
ax1.set_xlabel("Movie")
ax1.set_ylabel("worldwide_gross")
#rotating the x labels to reduce overlap
plt.xticks(rotation= 45)
#will display the plot
plt.show()
This scatter plot was created to display the worldwide gross of each film, with the films on the x-axis and the worldwide gross, in dollars, on the y-axis. I made a few attempts to stop the x-axis labels from overlapping, but they were unsuccessful, so I need to do additional research on how to lay out a plot whose labels don't overlap. The y-axis shows that many of the films brought in good money worldwide.
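One possible way to reduce the label overlap, sketched here under the assumption that only the highest-grossing films matter for the question, is to keep the top few rows with `nlargest` and draw horizontal bars so the titles sit on the y-axis. The `films` dataframe and its figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical films; the gross figures are made up for illustration.
films = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "worldwide_gross": [2.5e9, 4.0e8, 1.2e9, 9.0e7, 1.6e9],
})

# Keep only the N highest-grossing films so the axis stays readable.
top = films.nlargest(3, "worldwide_gross")

fig, ax = plt.subplots(figsize=(8, 4))
# Horizontal bars put the titles on the y-axis, where they do not collide.
ax.barh(top["movie"], top["worldwide_gross"] / 1e6)
ax.invert_yaxis()  # largest value at the top
ax.set_xlabel("Worldwide gross (millions of dollars)")
ax.set_title("Top films by worldwide gross")
fig.tight_layout()
```

With several thousand films, no rotation angle will make every title legible, so limiting the number of rows is usually the simpler fix.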
#checking for the needed columns first
basics_and_ratings
index | movie_id | averagerating | numvotes | primary_title | original_title | start_year | runtime_minutes | genres |
---|---|---|---|---|---|---|---|---|
0 | tt10356526 | 8.3 | 31 | Laiye Je Yaarian | Laiye Je Yaarian | 2019 | 117.0 | Romance |
1 | tt10384606 | 8.9 | 559 | Borderless | Borderless | 2019 | 87.0 | Documentary |
2 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama |
3 | tt1043726 | 4.2 | 50352 | The Legend of Hercules | The Legend of Hercules | 2014 | 99.0 | Action,Adventure,Fantasy |
4 | tt1060240 | 6.5 | 21 | Até Onde? | Até Onde? | 2011 | 73.0 | Mystery,Thriller |
5 | tt1069246 | 6.2 | 326 | Habana Eva | Habana Eva | 2010 | 106.0 | Comedy,Romance |
6 | tt1094666 | 7.0 | 1613 | The Hammer | Hamill | 2010 | 108.0 | Biography,Drama,Sport |
7 | tt1130982 | 6.4 | 571 | The Night Clerk | Avant l'aube | 2011 | 104.0 | Drama,Thriller |
8 | tt1156528 | 7.2 | 265 | Silent Sonata | Circus Fantasticus | 2011 | 77.0 | Drama,War |
9 | tt1161457 | 4.2 | 148 | Vanquisher | The Vanquisher | 2016 | 90.0 | Action,Adventure,Sci-Fi |
#Plotting a seaborn lineplot
plt.figure(figsize=(12,6))
sns.lineplot(x="genres", y="numvotes", data=basics_and_ratings)
plt.title("The most voted for Genres") #labelling
plt.xticks(rotation = 60)
plt.show()
The most-voted-for genre is Action,Adventure,Fantasy, followed by Biography,Drama,Sport.
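The ranking can also be checked numerically rather than read off the plot. The sketch below sums `numvotes` per genre combination using only the ten rows previewed in the table above, so it illustrates the approach and is not a result for the full dataset. The name `preview` is introduced here for illustration.

```python
import pandas as pd

# The ten rows previewed above (genres and numvotes columns only).
preview = pd.DataFrame({
    "genres": ["Romance", "Documentary", "Drama", "Action,Adventure,Fantasy",
               "Mystery,Thriller", "Comedy,Romance", "Biography,Drama,Sport",
               "Drama,Thriller", "Drama,War", "Action,Adventure,Sci-Fi"],
    "numvotes": [31, 559, 20, 50352, 21, 326, 1613, 571, 265, 148],
})

# Total votes per genre combination, most voted first.
votes_by_genre = (
    preview.groupby("genres")["numvotes"]
           .sum()
           .sort_values(ascending=False)
)
print(votes_by_genre.head(3))
```

On these ten rows the order agrees with the plot: Action,Adventure,Fantasy first, then Biography,Drama,Sport.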
- A movie's average rating does not guarantee that it is a good movie, and a low rating does not guarantee that it is a bad one.
- Film studios should provide several online and offline ways to access their content.
- Movie fans send differing signals about what they find appealing in movies.
- According to the data provided, romantic films had longer runtimes than thrillers. Films that are close to the hearts of the audience should receive more attention than those that frighten them, as the production budget also increases somewhat as a result.
- More research is needed. To determine the number of people who actually watch movies in theaters versus those who prefer to stream, surveys can be sent to movie theater owners, moviegoers, and online respondents.
- Depending on their genre and production costs, movies can make money in both domestic and foreign markets. The worse the quality, the fewer people will watch, and vice versa.
- The dataframes displayed the various movie genres, the titles of the films, their production budgets, and their domestic, foreign, and worldwide box office receipts. Despite having a worldwide audience, the films did not account for the languages of other continents; for instance, there was no Swahili-language film or actor. Therefore, the accessibility of content in different markets when movies come out should be considered. Allow growth by giving everyone the chance to watch a new movie in every region of the world.
- Major markets to invest in: TV licensing, foreign distribution, domestic box office, physical copy sales, and digital streaming and video on demand.
- Consider first going through the company planning process.
- Work in all languages and with the more popular genres.