Visualising PUBG Deaths with Datashader

Let’s use Datashader to understand some of the gameplay mechanics of a hit video game while also making some abstract art.
Datashader
Visualisation
PUBG
Published: May 31, 2020

While browsing Kaggle, I came across this interesting dataset, and I thought it would form the basis for some exciting blog posts.

The dataset contains 65M player deaths from 720,000 matches of PlayerUnknown’s Battlegrounds (PUBG), a wildly popular online game.

An Introduction to PUBG

Wikipedia sums up the aim of the game pretty well:

“In the game, up to one hundred players parachute onto an island and scavenge for weapons and equipment to kill others while avoiding getting killed themselves. The available safe area of the game’s map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round.”

For something a bit less dry but just as accurate, there is this video on YouTube.

Data preprocessing

First, let’s load some of the libraries we will need later.

import glob
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [15, 15]

The dataset comes in several different .csv files, which we will load and concatenate.

def load_deaths():
    # Columns that are not needed to plot where victims died
    unused_columns = [
        'match_id', 'victim_placement', 'killed_by', 'killer_name',
        'killer_placement', 'killer_position_x', 'killer_position_y',
        'victim_name',
    ]
    li = []
    for filename in glob.glob("/Users/cooke_c/Documents/Blog_Staging/PUBG/9372_13466_bundle_archive/deaths/*.csv"):
        df = pd.read_csv(filename)
        li.append(df.drop(unused_columns, axis='columns'))
    # Stack the per-file frames into one and renumber the index
    return pd.concat(li, axis=0, ignore_index=True)

deaths_df = load_deaths()
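
With a dataset this size, it may be worth skipping the unwanted columns while reading rather than dropping them afterwards. Here is a minimal sketch of that idea using the `usecols` argument of `pd.read_csv`. The function name `load_deaths_selected`, the `KEEP_COLUMNS` list (the four columns the rest of this post uses), and the tiny temporary CSV files are all made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Assumed: the only columns the later steps of the post rely on.
KEEP_COLUMNS = ["map", "time", "victim_position_x", "victim_position_y"]


def load_deaths_selected(pattern, columns=KEEP_COLUMNS):
    """Read every CSV matching `pattern`, parsing only `columns`."""
    paths = sorted(glob.glob(pattern))
    frames = [pd.read_csv(path, usecols=columns) for path in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Tiny made-up files standing in for the real Kaggle CSVs.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        pd.DataFrame({
            "killer_name": ["a", "b"],
            "map": ["ERANGEL", "MIRAMAR"],
            "time": [30 + i, 900 + i],
            "victim_position_x": [1000.0, 2000.0],
            "victim_position_y": [3000.0, 4000.0],
        }).to_csv(os.path.join(tmp, f"deaths_{i}.csv"), index=False)

    demo_df = load_deaths_selected(os.path.join(tmp, "*.csv"))

print(demo_df.shape)
```

The result should match the drop-after-read approach, but the unused columns are never parsed, so less memory is held while loading.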

Matches in PUBG are limited in time to approximately 32.5 minutes. Let’s create a new categorical variable called “phase”. It will represent which of the following match phases a player died in:

  1. Early Phase (0-10m) (Lime Green points)
  2. Mid Phase (10-25m) (Cyan points)
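  3. Late Phase (>25m) (Purple points)

Deaths in the first minute are labelled “very_early” by the code below and drawn in black, so they vanish into the black background.
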
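def create_phase_category(deaths_df):
    # Upper bounds are inclusive so that a death at exactly 10 or 25 minutes
    # gets a phase rather than falling through to the default
    conditions = [
        (1*60 < deaths_df.time) & (deaths_df.time <= 10*60),
        (10*60 < deaths_df.time) & (deaths_df.time <= 25*60),
        (25*60 < deaths_df.time)]

    choices = ['early', 'mid', 'late']
    # Rows matching no condition (the first minute) become 'very_early'
    deaths_df['phase'] = np.select(conditions, choices, default='very_early')
    deaths_df['phase'] = deaths_df['phase'].astype('category')

    return deaths_df

deaths_df = create_phase_category(deaths_df)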
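
As an aside, the same kind of binning could also be written with `pd.cut`, which makes the bin edges explicit. This is only a sketch on made-up times. It assumes each bin includes its upper edge (the `pd.cut` default), so a death at exactly 10 minutes counts as “early”.

```python
import numpy as np
import pandas as pd

# Bin edges in seconds; the labels mirror the ones used in the post.
edges = [-np.inf, 1 * 60, 10 * 60, 25 * 60, np.inf]
labels = ["very_early", "early", "mid", "late"]

# Made-up death times, chosen to sit on and around each boundary.
toy = pd.DataFrame({"time": [30, 60, 61, 600, 601, 1500, 1501, 1900]})

# right=True (the default) makes each bin (low, high], so every time
# falls into exactly one bin.
toy["phase"] = pd.cut(toy["time"], bins=edges, labels=labels)
print(toy)
```

`pd.cut` returns a categorical column directly, so no separate `astype('category')` step is needed.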

Datashader

Now, this is where the fun begins.

Datashader is a highly efficient Python library for visualising very large datasets.

Taking Pandas data frames as inputs, Datashader aggregates the data to form visualisations.

There are three key components that we use to generate our visualisation:

  1. Defining a canvas. It’s going to be 4,000 by 4,000 pixels. The data range we want to visualise is 800,000 by 800,000.
cvs = ds.Canvas(plot_width=4_000, plot_height=4_000, x_range=[0,800_000],y_range=[0,800_000])
  2. We want to aggregate data from deaths_df, using the ‘victim_position_x’ variable as the x coordinate and ‘victim_position_y’ as the y coordinate. Effectively, we are computing a separate 2D histogram for each category (game phase).
agg = cvs.points(deaths_df, 'victim_position_x', 'victim_position_y',ds.count_cat('phase'))
  3. We visualise our 2D histogram, colouring each bin/pixel according to our colour map. We also use histogram equalisation (how='eq_hist').
img = tf.shade(agg, color_key=color_key, how='eq_hist')
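
To make the “separate 2D histogram for each category” idea concrete, here is a rough NumPy equivalent on made-up points. It is a conceptual sketch of the aggregation step, not how Datashader works internally, and the random data and the coarse 40 by 40 grid are assumptions chosen to keep it fast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up points in the same 0..800,000 coordinate range as the canvas.
n = 10_000
x = rng.uniform(0, 800_000, n)
y = rng.uniform(0, 800_000, n)
phase = rng.choice(["early", "mid", "late"], size=n)

bins = 40  # coarse grid for the sketch; the post uses 4,000 x 4,000
extent = [[0, 800_000], [0, 800_000]]

# One 2D histogram per category: counts[name][i, j] is the number of
# points of that phase falling in x-bin i and y-bin j.
counts = {}
for name in ["early", "mid", "late"]:
    mask = phase == name
    counts[name], _, _ = np.histogram2d(x[mask], y[mask], bins=bins, range=extent)

total = sum(c.sum() for c in counts.values())
print({k: v.shape for k, v in counts.items()}, int(total))
```

Each pixel ends up with one count per phase, which is what lets the shading step mix the phase colours in proportion to those counts.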

This post is heavily inspired by this example, which is more detailed about the pipeline involved.

def visualise_with_datashader(deaths_df):
    # One colour per phase; 'very_early' is black, so it blends into the background
    color_key = {'very_early':'black', 'early':'lime',  'mid':'aqua', 'late':'fuchsia'}
    
    cvs = ds.Canvas(plot_width=4_000, plot_height=4_000, x_range=[0,800_000],y_range=[0,800_000])
    
    # Per-pixel counts of deaths, split by phase
    agg = cvs.points(deaths_df,'victim_position_x','victim_position_y',ds.count_cat('phase'))
    
    img = tf.shade(agg, color_key=color_key, how='eq_hist')
    img = tf.set_background(img,"black", name="Black bg")
    return img
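
Why bother with `how='eq_hist'`? A handful of pixels have huge counts, and a linear scale would leave everything else nearly invisible. The idea behind histogram equalisation can be sketched in plain NumPy. This is a simplified illustration, not Datashader's actual implementation, and the `equalise` function and toy counts are made up.

```python
import numpy as np


def equalise(counts):
    """Map non-zero counts to (0, 1] by their position in the empirical CDF."""
    out = np.zeros(counts.shape, dtype=float)
    nonzero = counts > 0
    values = counts[nonzero]
    if values.size == 0:
        return out
    # Sorted distinct count values, and how many pixels hold each one.
    uniq, freq = np.unique(values, return_counts=True)
    cdf = np.cumsum(freq) / values.size
    # Look up each pixel's CDF value through its index in `uniq`.
    out[nonzero] = cdf[np.searchsorted(uniq, values)]
    return out


# Toy per-pixel counts with one extreme hotspot.
counts = np.array([[0, 1, 1], [2, 5, 1000]])

linear = counts / counts.max()
print(linear.round(3))
print(equalise(counts).round(3))
```

On the linear scale, every pixel except the hotspot sits at 0.005 or below. After equalisation the non-zero pixels are spread evenly between 0.4 and 1.0, so sparse areas stay visible.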

One minor detail is that we need to invert the y coordinates we want to render to match the coordinate system used for the game maps.

deaths_df.victim_position_y = 800_000 - deaths_df.victim_position_y
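
One thing to watch with an in-place flip like this: running the cell twice flips the coordinates back again. A small sketch of a non-mutating variant, on made-up values, might look like the following. The `MAP_SIZE` constant and the toy frame are assumptions for illustration.

```python
import pandas as pd

MAP_SIZE = 800_000  # the coordinate extent used for the canvas

# Made-up deaths: one with a small raw y, one with a large raw y.
raw = pd.DataFrame({"victim_position_y": [10_000, 790_000]})

# assign() returns a new frame and leaves `raw` untouched, so re-running
# this line always starts from the original coordinates.
flipped = raw.assign(victim_position_y=MAP_SIZE - raw["victim_position_y"])

print(flipped["victim_position_y"].tolist())
```

The flip simply mirrors each point about the horizontal centre line of the map, so a small raw y becomes a large plotted y and vice versa.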

Erangel

  1. Early Phase (0-10m) (Lime Green points)
  2. Mid Phase (10-25m) (Cyan points)
  3. Late Phase (>25m) (Purple points)
# Use bracket access for 'map': deaths_df.map is the DataFrame.map method in pandas 2.1+
erangel_df = deaths_df[deaths_df['map']=='ERANGEL']

num_points = erangel_df.shape[0]
print(f'Total points : {num_points}')

img = visualise_with_datashader(erangel_df)

ds.utils.export_image(img=img,filename='Erangel', fmt=".png");

Total points : 52964245

Figure: Erangel

Miramar

  1. Early Phase (0-10m) (Lime Green points)
  2. Mid Phase (10-25m) (Cyan points)
  3. Late Phase (>25m) (Purple points)
# Use bracket access for 'map': deaths_df.map is the DataFrame.map method in pandas 2.1+
miramar_df = deaths_df[deaths_df['map']=='MIRAMAR']

num_points = miramar_df.shape[0]
print(f'Total points : {num_points}')

img = visualise_with_datashader(miramar_df)
ds.utils.export_image(img=img,filename='Miramar', fmt=".png");

Total points : 11622838

Figure: Miramar

Analysis

Let’s take a closer look at the lower part of the Erangel map.

We can see the three phases of the game: the early phase in green, the mid phase in cyan, and the late phase in purple.

I will confess to having played a total of 2 games of PUBG before deciding that playing virtual hide and seek wasn’t that fun. Even so, we can see some clear patterns.

In the early phases of the game, deaths are in and around buildings as players search for supplies and weapons.

In the middle phase, the deaths appear to be more spread over the map, with concentrations on roads and natural chokepoints like bridges.

In the last phase of the game, the decreasing size of the “safe zone” forces the players into a concentrated area for a final stand. Because the final zone ends up somewhere different in each match, aggregating many matches produces a constellation of purple dots spread across the map.
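
For anyone wanting to zoom in on a region like this, one option is to restrict the data to a bounding box. Here is a minimal sketch on made-up points. The `crop` helper and the bounds are hypothetical, and it assumes the y coordinates have already been flipped, so the lower part of the map has small y values.

```python
import pandas as pd


def crop(df, x_range, y_range):
    """Keep only rows whose victim position falls inside the given box."""
    inside = (
        df["victim_position_x"].between(*x_range)
        & df["victim_position_y"].between(*y_range)
    )
    return df[inside]


# Made-up deaths scattered over the 800,000 x 800,000 coordinate range.
toy = pd.DataFrame({
    "victim_position_x": [100_000, 400_000, 700_000],
    "victim_position_y": [50_000, 150_000, 600_000],
})

# Hypothetical bounds for the bottom quarter of the map.
lower = crop(toy, x_range=(0, 800_000), y_range=(0, 200_000))
print(len(lower))
```

The same bounds could instead be passed as `x_range` and `y_range` to `ds.Canvas`, which would spend the full pixel grid on just that region.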

Figure: Subsection of Erangel 1

Figure: Subsection of Erangel 2

Figure: Subsection of Miramar