Classification analysis that predicts the combat power category for month N+1 from the following data:
========================================
power_difference: monthly change in combat power per character
low_dungeon: monthly plays in low-level dungeons
high_dungeon: monthly plays in high-level dungeons
quest_dungeon: monthly plays in quest dungeons
current_power_level: combat power category in month N (higher numbers indicate greater combat strength)
# Import libraries
import os
import itertools
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from time import time
from pycaret.classification import *
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
# setting font
plt.rc('font', family='AppleGothic') # AppleGothic ships with macOS; on Windows pick an installed font such as 'Malgun Gothic'
print(plt.rcParams['font.family'])
# Function for confusion matrix visualization
def plot_confusion_matrix(cm, model=None, target_names=None, cmap=None, normalize=True, labels=True, title='Confusion matrix'):
    # Accuracy is computed from the raw counts, before any normalization
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy
    if cmap is None:
        cmap = plt.get_cmap('Blues')
    if normalize:
        # Row-normalize so that each true-label row sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    # Cells darker than this threshold get white text for contrast
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names)
        plt.yticks(tick_marks, target_names)
    if labels:
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            if normalize:
                plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                         horizontalalignment="center",
                         color="white" if cm[i, j] > thresh else "black")
            else:
                plt.text(j, i, "{:,}".format(cm[i, j]),
                         horizontalalignment="center",
                         color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.grid(False)
    # Save the figure only when a model name is given (the default model=None would break the string concatenation)
    if model is not None:
        NAME_FIG = "./" + model + "_CM.png"
        plt.savefig(NAME_FIG, dpi=300, bbox_inches='tight')
    plt.show()
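The two quantities the function derives from the confusion matrix, the overall accuracy and the row-normalized matrix, can be checked by hand on a small case. The sketch below uses a made-up 2x2 matrix (the counts are illustrative only, not results from this analysis) and repeats the same arithmetic as the function: the trace divided by the total, and each row divided by its own sum.

```python
import numpy as np

# Toy confusion matrix: rows are true labels, columns are predicted labels.
# The counts are invented purely to illustrate the arithmetic.
cm = np.array([[8, 2],
               [1, 9]])

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

# Row normalization: each row is divided by its own total, so cell (i, j)
# becomes the share of true class i that was predicted as class j
row_normalized = cm / cm.sum(axis=1, keepdims=True)

print("accuracy:", accuracy)
print(row_normalized)
```

After row normalization the diagonal holds the per-class recall, which is easier to compare across classes of very different sizes than raw counts are.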
# Data Load
df2021_character = pd.read_csv("./preprocessed_data/df2021_character.csv")
# The distribution of target column data
df2021_character['next_power_level'].value_counts().sort_index()
1 1799
2 3693
3 3267
4 3528
5 2449
6 531
7 105
Name: next_power_level, dtype: int64
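The cell that creates features_train, features_test, label_train and label_test is not shown here. The sketch below is one way such a split could look, not the original code. It assumes an 80/20 split, which is consistent with the 15,372 rows in the distribution above and with the shapes reported in the next step. The stratify and random_state arguments are assumptions added for this illustration, as is the placeholder frame `demo`, which stands in for df2021_character so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

feature_cols = ['power_difference', 'low_dungeon', 'high_dungeon',
                'quest_dungeon', 'current_power_level']

# Placeholder data standing in for df2021_character (hypothetical):
# label counts copy the printed distribution, feature values are random filler.
counts = {1: 1799, 2: 3693, 3: 3267, 4: 3528, 5: 2449, 6: 531, 7: 105}
rng = np.random.default_rng(0)
labels = np.repeat(list(counts.keys()), list(counts.values()))
demo = pd.DataFrame(rng.integers(0, 10, size=(labels.size, len(feature_cols))),
                    columns=feature_cols)
demo['next_power_level'] = labels

# Assumed 80/20 split; stratifying keeps the rare levels 6 and 7 in both parts
features_train, features_test, label_train, label_test = train_test_split(
    demo[feature_cols], demo['next_power_level'],
    test_size=0.2, stratify=demo['next_power_level'], random_state=42)

print(features_train.shape, features_test.shape)
```

Stratifying matters here because level 7 has only 105 rows, so an unstratified split could leave very few of them in the test set.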
# Verify dimensions of split data
print("Training Features Dimension:", features_train.shape)
print("Training Labels Dimension:", label_train.shape)
print("Test Features Dimension:", features_test.shape)
print("Test Labels Dimension:", label_test.shape)
Training Features Dimension: (12297, 5)
Training Labels Dimension: (12297,)
Test Features Dimension: (3075, 5)
Test Labels Dimension: (3075,)
# Finalize Model Selection
final_gbc_model = finalize_model(gbc_model)
# Input test data into the final model to verify model predictions
prediction = predict_model(final_gbc_model, data = features_test)
– The most influential variable in predicting next month’s Power Level is the current month’s Power Level. This suggests that users with low combat power tend to drop out quickly, while users with high combat power find it hard to raise it further. In other words, Power Level changes little from one month to the next, and that stability is what makes the prediction look ‘good’.
– Given this limitation, however, it is hard to regard the result as a genuinely ‘good’ prediction. Additional research will therefore be conducted by dividing combat power levels into finer categories and refining the analysis.
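Both points can be checked with two small measurements: the feature importances of the fitted model, and the accuracy of a naive "no change" baseline that predicts next month's level to equal the current one. The sketch below is an illustration under stated assumptions, not the analysis itself. It fits scikit-learn's GradientBoostingClassifier directly instead of going through PyCaret, and it runs on synthetic data in which the level usually stays put. The 80% stay rate and the random dungeon counts are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data (NOT the real dataset): the level mostly stays
# the same and occasionally moves one step up or down, clipped to 1..7.
current = rng.integers(1, 8, size=n)
step = rng.choice([-1, 0, 1], size=n, p=[0.1, 0.8, 0.1])
next_level = np.clip(current + step, 1, 7)

X = pd.DataFrame({
    'power_difference': rng.normal(size=n),
    'low_dungeon': rng.poisson(5, size=n),
    'high_dungeon': rng.poisson(2, size=n),
    'quest_dungeon': rng.poisson(3, size=n),
    'current_power_level': current,
})

# Gradient boosting fitted directly with scikit-learn (a stand-in for gbc_model)
model = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, next_level)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# Persistence baseline: predict that next month's level equals this month's
baseline_accuracy = float(np.mean(next_level == current))
print("no-change baseline accuracy:", round(baseline_accuracy, 3))
```

On the real data, the same comparison would use label_test and features_test['current_power_level']. If the model's accuracy is only slightly above this baseline, most of its apparent skill comes from the levels not changing, which is the limitation described above.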