by Sherry He

Check out the comment sections of YouTube gaming videos. What do they say about the giant entertainment firms’ upcoming financial performance?

Our motivation

Early research on stock market prediction was based on random walk theory and the Efficient Market Hypothesis (EMH), which assumes that stock market prices are largely driven by new information, i.e. news, rather than by present and past prices. Since news is unpredictable, stock market prices will follow a random walk pattern.

Yet these assumptions are often challenged. Recent research suggests that, although news and social media may be unpredictable, very early signals can be extracted from online sources to predict changes in various economic and commercial indicators. This may conceivably also be the case for the stock market.

Battlefield V « click to watch this trailer video

A case in point is Battlefield V. See the like/dislike ratio of the trailer video above. Apparently the video game producer (Electronic Arts) mistakenly thought its market, i.e. the gaming community, was still keen on futuristic / unrealistic games. Although it’s too early to say this was the only reason, EA’s stock price has indeed tumbled during the period:

EA

Our project aims to use the comment section as an alternative data source to predict gaming firms’ financial performance and stock price movements.

Closer look: text mining

To start, I did some exploratory text mining. Here I’d like to share how to make a word cloud. Below are example word clouds for some games.

Battlefield V (which unfortunately received excessive negative reviews):
BattleV

Call of Duty: Infinite Warfare:
cod

Pokémon: Let’s Go, Pikachu!:
Pikachu

Wordclouds are useful in several ways:

  • They give a direct representation of what’s in the document
  • Word clouds of different corpora can be compared at a glance to judge their similarity (since our brain is certainly the best NLP machine :)
  • They can be used to check whether the data pre-processing stage was done properly, e.g. whether all stopwords were removed
  • They are eye-appealing, which is useful if you’re going to talk about some boring data analysis later
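The second and third points can also be checked with a few lines of code rather than by eye. The sketch below is only an illustration: the tiny stopword set, the helper names `top_words` and `jaccard`, and the two sample comments are all made up for this example, and the tokenisation is a simple regex, not what the wordcloud library does internally.

```python
import re
from collections import Counter

# Illustrative stopword set (an assumption); in practice you could reuse
# wordcloud.STOPWORDS plus your own custom words.
DEMO_STOPWORDS = {"the", "a", "is", "this", "and", "i", "to", "of"}


def top_words(text, stopwords=DEMO_STOPWORDS, n=5):
    """Return the n most frequent lowercase words that are not stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]


def jaccard(words_a, words_b):
    """Share of words that two word lists have in common (0.0 to 1.0)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up comments, for illustration only
comments_a = "This trailer is bad. The trailer is not the game I wanted."
comments_b = "The game looks great and the trailer is great."

top_a = top_words(comments_a)
top_b = top_words(comments_b)

# Pre-processing check: no stopword should survive among the top words
leaked = [w for w in top_a if w in DEMO_STOPWORDS]

# Rough similarity of the two corpora based on their most frequent words
overlap = jaccard(top_a, top_b)

print(top_a, top_b, leaked, round(overlap, 2))
```

If `leaked` is not empty, the stopword list probably needs extending before the word cloud is drawn. The Jaccard overlap is just one simple way to put a number on how alike two word clouds look.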

Here’s a minimal version of the Python code to create a word cloud:

# Start by loading the necessary libraries
import pandas as pd
from wordcloud import WordCloud, STOPWORDS

import matplotlib.pyplot as plt

# Load the dataframe that contains comments
# You can download all our raw data from our GitHub
comments_df = pd.read_csv("Battlefield V_comments.csv", index_col=0)

# Join all comments into one string, skipping missing values
text = " ".join(str(t) for t in comments_df.Comment.dropna())

# Create the stopword list:
stopwords = set(STOPWORDS)
# We recommend adding customised stopwords to the default set provided by wordcloud
stopwords.update(["like", "wait", "game"])

# Create and generate a word cloud image, using the custom stopwords:
wordcloud = WordCloud(stopwords=stopwords, background_color="black").generate(text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

# Save the image to a file (the img/ directory must already exist)
wordcloud.to_file("img/first_review.png")

Hope you find this interesting. Stay tuned for updates on our project!