Best selling video games

9 minute read

Published:

In this project, we'll explore the top 400 best-selling video games created between 1977 and 2020. We'll compare a dataset on game sales with critic and user reviews to determine whether or not video games have improved as the gaming market has grown.

Let's begin by looking at some of the top selling video games of all time!

Skills : SQL, Data Exploration, Data cleaning, SQL Join, SQL Sub query

%%sql
postgresql:///games

-- Select all information for the top ten best-selling games
-- Order the results from best-selling game down to tenth best-selling


SELECT * 
FROM game_sales
ORDER BY games_sold DESC
LIMIT 10;
10 rows affected.
gameplatformpublisherdevelopergames_soldyear
Wii Sports for WiiWiiNintendoNintendo EAD82.902006
Super Mario Bros. for NESNESNintendoNintendo EAD40.241985
Counter-Strike: Global Offensive for PCPCValveValve Corporation40.002012
Mario Kart Wii for WiiWiiNintendoNintendo EAD37.322008
PLAYERUNKNOWN'S BATTLEGROUNDS for PCPCPUBG CorporationPUBG Corporation36.602017
Minecraft for PCPCMojangMojang AB33.152010
Wii Sports Resort for WiiWiiNintendoNintendo EAD33.132009
Pokemon Red / Green / Blue Version for GBGBNintendoGame Freak31.381998
New Super Mario Bros. for DSDSNintendoNintendo EAD30.802006
New Super Mario Bros. Wii for WiiWiiNintendoNintendo EAD30.302009

2. Missing review scores

Wow, the best-selling video games were released between 1985 to 2017! That's quite a range; we'll have to use data from the reviews table to gain more insight on the best years for video games.

First, it's important to explore the limitations of our database. One big shortcoming is that there is not any reviews data for some of the games on the game_sales table.

%%sql 

-- Join games_sales and reviews
-- Select a count of the number of games where both  critic_scoreand user_score are null

SELECT COUNT(g.game) 
FROM game_sales as g
LEFT JOIN reviews as r
ON g.game = r.game
WHERE user_score IS NULL AND critic_score IS NULL;


 * postgresql:///games
1 rows affected.
count
31

3. Years that video game critics loved

It looks like a little less than ten percent of the games on the game_sales table don't have any reviews data. That's a small enough percentage that we can continue our exploration, but the missing reviews data is a good thing to keep in mind as we move on to evaluating results from more sophisticated queries.

There are lots of ways to measure the best years for video games! Let's start with what the critics think.

%%sql

-- Select release year and average critic score for each year, rounded and aliased
-- Join the game_sales and reviews tables
-- Group by release year
-- Order the data from highest to lowest avg_critic_score and limit to 10 results


SELECT year,
ROUND(AVG(critic_score),2) as avg_critic_score
FROM game_sales as g
LEFT JOIN reviews as r
ON g.game = r.game
GROUP BY year
ORDER BY avg_critic_score DESC
LIMIT 10;

 * postgresql:///games
10 rows affected.
yearavg_critic_score
19909.80
19929.67
19989.32
20209.20
19939.10
19959.07
20049.03
19829.00
20028.99
19998.93

4. Was 1982 really that great?

The range of great years according to critic reviews goes from 1982 until 2020: we are no closer to finding the golden age of video games!

Hang on, though. Some of those avg_critic_score values look like suspiciously round numbers for averages. The value for 1982 looks especially fishy. Maybe there weren't a lot of video games in our dataset that were released in certain years.

Let's update our query and find out whether 1982 really was such a great year for video games.

%%sql 

-- Paste your query from the previous task; update it to add a count of games released in each year called num_games
-- Update the query so that it only returns years that have more than four reviewed games

SELECT g.year,
ROUND(AVG(critic_score),2) as avg_critic_score,
COUNT(g.game) as num_games
FROM game_sales as g
INNER JOIN reviews as r
ON g.game = r.game
GROUP BY g.year
HAVING COUNT(g.game) > 4
ORDER BY avg_critic_score DESC
LIMIT 10;
 * postgresql:///games
10 rows affected.
yearavg_critic_scorenum_games
19989.3210
20049.0311
20028.999
19998.9311
20018.8213
20118.7626
20168.6713
20138.6618
20088.6320
20178.6213

5. Years that dropped off the critics’ favorites list

That looks better! The num_games column convinces us that our new list of the critics' top games reflects years that had quite a few well-reviewed games rather than just one or two hits. But which years dropped off the list due to having four or fewer reviewed games? Let's identify them so that someday we can track down more game reviews for those years and determine whether they might rightfully be considered as excellent years for video game releases!

It's time to brush off your set theory skills. To get started, we've created tables with the results of our previous two queries:

top_critic_years

columntypemeaning
yearintYear of video game release
avg_critic_scorefloatAverage of all critic scores for games released in that year

top_critic_years_more_than_four_games

columntypemeaning
yearintYear of video game release
num_gamesintCount of the number of video games released in that year
avg_critic_scorefloatAverage of all critic scores for games released in that year
%%sql 

-- Select the year and avg_critic_score for those years that dropped off the list of critic favorites 
-- Order the results from highest to lowest avg_critic_score


SELECT year, avg_critic_score
FROM top_critic_years
EXCEPT
SELECT year, avg_critic_score
FROM top_critic_years_more_than_four_games
ORDER BY avg_critic_score DESC;
 * postgresql:///games
6 rows affected.
yearavg_critic_score
19909.80
19929.67
20209.20
19939.10
19959.07
19829.00

6. Years video game players loved

Based on our work in the task above, it looks like the early 1990s might merit consideration as the golden age of video games based on critic_score alone, but we'd need to gather more games and reviews data to do further analysis.

Let's move on to looking at the opinions of another important group of people: players! To begin, let's create a query very similar to the one we used in Task Four, except this one will look at user_score averages by year rather than critic_score averages.

%%sql 

-- Select year, an average of user_score, and a count of games released in a given year, aliased and rounded
-- Include only years with more than four reviewed games; group data by year
-- Order data by avg_user_score, and limit to ten results


SELECT year,
ROUND(AVG(user_score),2) as avg_user_score,
COUNT(g.game) as num_games
FROM game_sales as g
INNER JOIN reviews as r
ON g.game = r.game
GROUP BY g.year
HAVING COUNT(g.game) > 4
ORDER BY avg_user_score DESC
LIMIT 10;


 * postgresql:///games
10 rows affected.
yearavg_user_scorenum_games
19979.508
19989.4010
20109.2423
20099.1820
20089.0320
19969.005
20058.9513
20068.9516
20008.808
20028.809

7. Years that both players and critics loved

Alright, we've got a list of the top ten years according to both critic reviews and user reviews. Are there any years that showed up on both tables? If so, those years would certainly be excellent ones!

Recall that we have access to the top_critic_years_more_than_four_games table, which stores the results of our top critic years query from Task 4:

top_critic_years_more_than_four_games

columntypemeaning
yearintYear of video game release
num_gamesintCount of the number of video games released in that year
avg_critic_scorefloatAverage of all critic scores for games released in that year

We've also saved the results of our top user years query from the previous task into a table:

top_user_years_more_than_four_games

columntypemeaning
yearintYear of video game release
num_gamesintCount of the number of video games released in that year
avg_user_scorefloatAverage of all user scores for games released in that year
%%sql 

-- Select the year results that appear on both tables
SELECT year
FROM top_critic_years_more_than_four_games
INTERSECT
SELECT year 
FROM top_user_years_more_than_four_games AS new;

 * postgresql:///games
3 rows affected.
year
1998
2008
2002

8. Sales in the best video game years

Looks like we've got three years that both users and critics agreed were in the top ten! There are many other ways of measuring what the best years for video games are, but let's stick with these years for now. We know that critics and players liked these years, but what about video game makers? Were sales good? Let's find out.

This time, we haven't saved the results from the previous task in a table for you. Instead, we'll use the query from the previous task as a subquery in this one! This is a great skill to have, as we don't always have write permissions on the database we are querying.

%%sql 

-- Select year and sum of games_sold, aliased as total_games_sold; order results by total_games_sold descending
-- Filter game_sales based on whether each year is in the list returned in the previous task


SELECT g.year, 
SUM(games_sold) as total_games_sold
FROM game_sales as g
INNER JOIN reviews as r
ON g.game = r.game
WHERE year IN 
(SELECT year
FROM top_critic_years_more_than_four_games
INTERSECT
SELECT year 
FROM top_user_years_more_than_four_games)
GROUP BY g.year
ORDER BY total_games_sold DESC;

 * postgresql:///games
3 rows affected.
yeartotal_games_sold
2008175.07
1998101.52
200258.67