Practice some SQL, data analysis and visualizations! This is going to be split into a series of practices.
Practice #1 and #2 are quite easy. This practice, #3, is more advanced. I suppose this might be considered like an assignment a company may give with 1-2 days to return, perhaps with the last question as a bonus question.
UPDATE; SET; CASE WHEN; CTE; UNION ALL; DATE FUNCTIONS; COUNT(); GROUP BY; HAVING; JOIN; FULL OUTER JOIN; COALESCE; SUM; WINDOW FUNCTIONS; WHERE; ORDER BY; CAST; FLOOR; NULLIF; TEMPORARY TABLE; ROUND; And more.
You can download the datasets below.
The Data
Download the three datasets (CSV):
You may copy the table creation and data loading from Practice #1.
Practice #3:
You’ll be using all datasets. Decide which to use on your own.
The Questions
Question #1: The deprecated servers (old servers dataset) are missing server location. Analyze the “worlds” dataset to figure out which (old) server was located where. The server world name has a structure matching with server location.
Then, update the “oldservers” dataset accordingly.
If you need, see hints at “The Answers” section below. Scroll slowly if you don’t want to be exposed to both hints/the answer at once.
Question #2: Create a single visualization which presents the journey of expansion of CipSoft's Tibia to new geographical areas via the server locations throughout the game's existence.
Although there is no one "correct" answer, ensure that with your query/queries, the visualization is clear about how many servers Tibia had at least per year throughout its existence and the geographical locations.
Note: I propose two versions for this question. One, visualizing with Tableau, and the other, visualizing using Excel. The required output for fast visualization with Excel essentially warrants a somewhat more thoughtful query in my opinion. So consider tackling both versions.
Question #3: Find out how many (regular) creatures and how many (unique monsters) bosses there are, given that all "races" are either Creatures, Bosses or Players.
Note that you can distinguish between Creatures and Bosses based on the fact Bosses' names start with Capitalized letters while the rest do not.
Desired output:
Question 4#: Find the 15 most lethal races (from most to least lethal), by their rank, displaying their Lethality Rank, Race name, and Lethality Rate (players killed per race killed). Adjust the Lethality Rate as shown in the desired output image.
Desired output:
Question #5: Produce the output as shown below. It is a table which displays the Amount of, and overall Percentage in the population, the lethality rate of, the different races once classified into groups.
Arrange the data into six different Lethality Group categories as shown.
Pay attention that your output produces LethalityGroup E.
Perform the following data cleaning steps as well, and account for it in your calculations:
Some bosses evolve in stages during the boss fight and thus, require to merge the different names they appear under into one. Use the following list to handle it:
- 'Urmahlullu the Immaculate', 'Wisdom of Urmahlullu', 'Urmahlullu the Tamed', 'Wildness of Urmahlullu' = 'Urmahlullu the Weakened'
- 'Dragonking Zyrtarch' = 'Soul of Dragonking Zyrtarch'
- 'Channeling Earl Osam' = 'Earl Osam'
- 'Charged Anomaly' = 'Anomaly'
- 'glooth powered minotaurs' = 'Depowered Minotaur'
- 'The Fire Empowered Duke' = 'The Duke Of The Depths'Remove races which are game mechanisms, and therefore, cannot be killed (e.g. elements such as fire or magic pillars). You can spot them by the fact they cannot, and therefore never are, killed.
Consider a minimum threshold of 30 "encounters" in your evaluation (encounter being one of the parties is killed- players or creatures)
Desired output:
Question #6: Produce the same final output of the previous question (question #5), but this time, in the formation as shown at the desired output.
General note:
SQL does not have Transpose. If you want to convert data for more than 2 columns, it's usually easiest to export the output to e.g. Excel or use Python.
Yet consider the following challenge here and try it out. I found it to be an interesting thought-exercise and a way to instill stronger understanding of some concepts.
That noted, I’m sure in the future I’ll find out there was some easier/smarter way to do this :) A way I simply don’t know/haven’t thought of at this very moment.
But if you know of a better way, please leave a comment!
Desired output:
or if you prefer to see how it looks in Excel:
(Bonus?) Question 7#: Examine players' deaths at different servers (PvP Type & Location). Provide one or two interesting insight(s).
No specific desired output.
No specific solution code.
The Answers
Hint #1 for Question #1:
Hint 1: The pattern is most clear for the South America ones, then North America ones, and least for the European servers.
Hint #2 for Question #1:
Focus on the name ending.
Answer #1: The deprecated servers (old servers dataset) are missing server location. Analyze the “worlds” dataset to figure out which (old) server was located where. The server world name has a structure matching with server location.
~bra = South America (servers located in Brazil)
~era = North America (servers located in North America)
~a = Europe (servers located in Europe; names ending with A which don't end with BRA nor ERA)
UPDATE oldservers
SET located =
CASE
WHEN servName LIKE '%bra' THEN 'South America'
WHEN servName LIKE '%era' THEN 'North America'
ELSE 'Europe'
END
Answer #2: Create a single visualization which presents the journey of expansion of CipSoft's Tibia to new geographical areas via the server locations throughout the game's existence.
Tableau version - desired output example:
Solution code below:
WITH old_and_current AS (
select servName
, located
, YEAR(onlineSince) AS startYear
, YEAR(offlineSince) AS endYear
from oldservers
union all
select worldName
, servLocation
, YEAR(serverCETCreatedDate) AS startYear
, CASE WHEN serverCETCreatedDate is not null THEN '0000' END AS endYear
from worlds
)
, start_year AS (
SELECT
startYear
, located
, COUNT(servName) AS sYcounter
FROM old_and_current
GROUP BY
startYear
, located
)
, end_year AS (
SELECT
endYear
, located
, COUNT(servName) AS eYcounter
FROM old_and_current
GROUP BY
endYear
, located
)
, yearly_status AS (
SELECT
startYear AS "Year"
, start_year.located AS located
, COALESCE((sYcounter - eYcounter), sYcounter) AS NumStatus
FROM start_year
FULL OUTER JOIN end_year
ON start_year.startYear = end_year.endYear
AND start_year.located = end_year.located
)
SELECT
"Year"
, located
, SUM(NumStatus) OVER (PARTITION BY located ORDER BY "Year") AS Total
FROM yearly_status
WHERE "Year" IS NOT NULL
ORDER BY 2, 1
Excel version - desired output example:
Solution code below:
WITH old_and_current AS (
SELECT servName
, located
, YEAR(onlineSince) AS startYear
, COALESCE(YEAR(offlineSince), 0) AS endYear
FROM oldservers
UNION ALL
SELECT worldName
, servLocation
, YEAR(serverCETCreatedDate) AS startYear
, 0 AS endYear
FROM worlds
),
addition AS (
SELECT startYear
, SUM(CASE WHEN located = 'Europe' THEN 1 ELSE 0 END) AS EuropePlus
, SUM(CASE WHEN located = 'North America' THEN 1 ELSE 0 END) AS NorthAmericaPlus
, SUM(CASE WHEN located = 'South America' THEN 1 ELSE 0 END) AS SouthAmericaPlus
FROM old_and_current
GROUP BY startYear
),
subtraction AS (
SELECT endYear
, SUM(CASE WHEN located = 'Europe' THEN 1 ELSE 0 END) AS EuropeMinus
, SUM(CASE WHEN located = 'North America' THEN 1 ELSE 0 END) AS NorthAmericaMinus
, SUM(CASE WHEN located = 'South America' THEN 1 ELSE 0 END) AS SouthAmericaMinus
FROM old_and_current
WHERE endYear != 0
GROUP BY endYear
),
trendPerYear AS (
SELECT
startYear AS "Year"
, COALESCE((EuropePlus - EuropeMinus), EuropePlus) AS EuropeTrend
, COALESCE((NorthAmericaPlus - NorthAmericaMinus), NorthAmericaPlus) AS NorthAmericaTrend
, COALESCE((SouthAmericaPlus - SouthAmericaMinus), SouthAmericaPlus) AS SouthAmericaTrend
FROM addition
FULL OUTER JOIN subtraction
ON addition.startYear = subtraction.endYear
),
statusPerYear AS (
SELECT
"Year"
, SUM(EuropeTrend) OVER (ORDER BY "Year") AS Europe
, SUM(NorthAmericaTrend) OVER (ORDER BY "Year") AS NorthAmerica
, SUM(SouthAmericaTrend) OVER (ORDER BY "Year") AS SouthAmerica
FROM trendPerYear
)
SELECT "Year"
, Europe
, NorthAmerica
, SouthAmerica
, (Europe + NorthAmerica + SouthAmerica) AS Total
FROM statusPerYear
Answer 3#: Find out how many (regular) creatures and how many (unique monsters) bosses there are, given that all "races" are either Creatures, Bosses or Players.
Solution code below:
SELECT
COUNT(CASE WHEN ASCII(LEFT(race, 1)) BETWEEN 65 AND 90 THEN 1 END ) AS 'Boss'
, COUNT(CASE WHEN ASCII(LEFT(race, 1)) NOT BETWEEN 65 AND 90 THEN 1 END ) AS 'Creature'
FROM (SELECT distinct race
FROM killstats
WHERE race != 'players'
) source
Answer: 4#: Find the 15 most lethal races (from most to least lethal), by their rank, displaying their Lethality Rank, Race name, and Lethality Rate (players killed per race killed). Adjust the Lethality Rate as shown in the desired output image.
Solution code below:
WITH LethalityTable AS (
SELECT
race
, COALESCE(
CAST(
FLOOR(
CAST(SUM(lastWeekKilledPlayers) AS DECIMAL(18, 6))
/
NULLIF
(CAST(SUM(lastWeekKilledbyPlayers) AS DECIMAL(18, 6))
, 0) --nullIF
* 1000 --for FLOOR
) / 1000 --for FLOOR
AS DECIMAL(18, 3))
, 0) -- coalesce
AS LethalityRate
FROM killstats
GROUP BY race
)
SELECT
TOP 16
DENSE_RANK() OVER (ORDER BY LethalityRate DESC) AS LethalityRank
, Race
, LethalityRate
FROM LethalityTable
ORDER BY LethalityRate DESC
Answer #5: Produce the output as shown below. It is a table which displays the Amount of, and overall Percentage in the population, the lethality rate of, the different races once classified into groups. i.e. according to the described details.
Solution code below:
DROP TABLE IF EXISTS #all_categories;
CREATE TABLE #all_categories
(
LethalityGroup VARCHAR(50)
);
INSERT INTO #all_categories(LethalityGroup)
VALUES ('(A) 0-0.99'), ('(B) 1-9.99'), ('(C) 10-49.99'), ('(D) 50-89.99'), ('(E) 90-99.99'), ('(F) 100+');
WITH AggregatedTable AS (--Grouped for aggregation, changed integer into decimal for rates
SELECT
serverName
, race
, CAST(SUM(lastWeekKilledPlayers) AS DECIMAL(13,3)) AS sumKilledPlayers
, CAST(SUM(lastWeekKilledbyPlayers) AS DECIMAL(13,3)) AS sumPlayersKilled
FROM killstats
GROUP BY
serverName
, race
)
, StagedBossesMerger AS (--sumPlayersKilled equal zero reveals list of creatures which
--cannot be killed at all or that got different forms before they can be killed in their final form.
--the CASE WHEN merged the different forms into the Killable-form.
--the HAVING part removed the UnKillable Creatures.
--And yes, to figure it out, there was a need to manual check on wiki-fandom website.
SELECT
CASE
WHEN race IN ('Urmahlullu the Immaculate', 'Wisdom of Urmahlullu', 'Urmahlullu the Tamed', 'Wildness of Urmahlullu') THEN 'Urmahlullu the Weakened'
WHEN race IN ('Dragonking Zyrtarch') THEN 'Soul of Dragonking Zyrtarch'
WHEN race IN ('Channeling Earl Osam') THEN 'Earl Osam'
WHEN race IN ('Charged Anomaly') THEN 'Anomaly'
WHEN race IN ('glooth powered minotaurs') THEN 'Depowered Minotaur'
WHEN race IN ('The Fire Empowered Duke') THEN 'The Duke Of The Depths'
ELSE race
END AS race
, SUM(sumKilledPlayers) AS sumKilledPlayers
, SUM(sumPlayersKilled) AS sumPlayersKilled
FROM AggregatedTable
GROUP BY
CASE
WHEN race IN ('Urmahlullu the Immaculate', 'Wisdom of Urmahlullu', 'Urmahlullu the Tamed', 'Wildness of Urmahlullu') THEN 'Urmahlullu the Weakened'
WHEN race IN ('Dragonking Zyrtarch') THEN 'Soul of Dragonking Zyrtarch'
WHEN race IN ('Channeling Earl Osam') THEN 'Earl Osam'
WHEN race IN ('Charged Anomaly') THEN 'Anomaly'
WHEN race IN ('glooth powered minotaurs') THEN 'Depowered Minotaur'
WHEN race IN ('The Fire Empowered Duke') THEN 'The Duke Of The Depths'
ELSE race
END
HAVING SUM(sumPlayersKilled) > 0
)
, LethalityTable AS (
SELECT race
, ROUND((sumKilledPlayers/sumPlayersKilled)*100,2) AS Lethality
FROM StagedBossesMerger
WHERE sumKilledPlayers+sumPlayersKilled >= 30 ---threshold
)
, LethalityTable2 AS (
SELECT
cate.LethalityGroup
, ISNULL(COUNT(l.race), 0) AS Amount
FROM LethalityTable l
RIGHT JOIN #all_categories cate
ON cate.LethalityGroup =
CASE WHEN l.Lethality BETWEEN 0 AND 0.99 THEN '(A) 0-0.99'
WHEN l.Lethality BETWEEN 1 AND 9.99 THEN '(B) 1-9.99'
WHEN l.Lethality BETWEEN 10 AND 49.99 THEN '(C) 10-49.99'
WHEN l.Lethality BETWEEN 50 AND 89.99 THEN '(D) 50-89.99'
WHEN l.Lethality BETWEEN 90 AND 99.99 THEN '(E) 90-99.99'
WHEN l.Lethality >= 100 THEN '(F) 100+'
END
GROUP BY
cate.LethalityGroup
),
LethalityTable3 AS (
SELECT LethalityGroup
, Amount
, CAST(Amount AS DECIMAL(10,4))
/
(SELECT CAST(SUM(Amount) AS DECIMAL(10,4))
FROM LethalityTable2)*100
AS "Percent"
FROM LethalityTable2
)
SELECT LethalityGroup
, Amount
, ROUND("Percent",2) AS "Percent"
FROM LethalityTable3
Answer #6: Produce the same final output of the previous question (question #5), but this time, in the formation as shown at the desired output.
Solution code below:
WITH AggregatedTable AS ( --Grouped for aggregation and changed integer into decimal for rates
SELECT
serverName
, race
, CAST(SUM(lastWeekKilledPlayers) AS DECIMAL(13,3)) AS sumKilledPlayers
, CAST(SUM(lastWeekKilledbyPlayers) AS DECIMAL(13,3)) AS sumPlayersKilled
FROM killstats
GROUP BY
serverName
, race
)
, StagedBossesMerger AS (--sumPlayersKilled equal zero reveals list of creatures which
--cannot be killed at all or that got different forms before they can be killed in their final form.
--the CASE WHEN merged the different forms into the Killable-form.
--the HAVING part removed the UnKillable Creatures.
--And yes, to figure it out, there was a need to manual check on wiki-fandom website.
SELECT
CASE
WHEN race IN ('Urmahlullu the Immaculate', 'Wisdom of Urmahlullu', 'Urmahlullu the Tamed', 'Wildness of Urmahlullu') THEN 'Urmahlullu the Weakened'
WHEN race IN ('Dragonking Zyrtarch') THEN 'Soul of Dragonking Zyrtarch'
WHEN race IN ('Channeling Earl Osam') THEN 'Earl Osam'
WHEN race IN ('Charged Anomaly') THEN 'Anomaly'
WHEN race IN ('glooth powered minotaurs') THEN 'Depowered Minotaur'
WHEN race IN ('The Fire Empowered Duke') THEN 'The Duke Of The Depths'
ELSE race
END AS race
, SUM(sumKilledPlayers) AS sumKilledPlayers
, SUM(sumPlayersKilled) AS sumPlayersKilled
FROM AggregatedTable
GROUP BY
CASE
WHEN race IN ('Urmahlullu the Immaculate', 'Wisdom of Urmahlullu', 'Urmahlullu the Tamed', 'Wildness of Urmahlullu') THEN 'Urmahlullu the Weakened'
WHEN race IN ('Dragonking Zyrtarch') THEN 'Soul of Dragonking Zyrtarch'
WHEN race IN ('Channeling Earl Osam') THEN 'Earl Osam'
WHEN race IN ('Charged Anomaly') THEN 'Anomaly'
WHEN race IN ('glooth powered minotaurs') THEN 'Depowered Minotaur'
WHEN race IN ('The Fire Empowered Duke') THEN 'The Duke Of The Depths'
ELSE race
END
HAVING SUM(sumPlayersKilled) > 0
)
, LethalityTable AS (
SELECT race
, ROUND((sumKilledPlayers/sumPlayersKilled)*100,2) AS Lethality
FROM StagedBossesMerger
WHERE sumKilledPlayers+sumPlayersKilled >= 30 ---threshold
),
LethalityAmount AS (
SELECT
COUNT(CASE WHEN Lethality BETWEEN 0 AND 0.99 THEN 1 END)
AS '(A) 0-0.99'
, COUNT(CASE WHEN Lethality BETWEEN 1 AND 9.99 THEN 1 END)
AS '(B) 1-9.99'
, COUNT(CASE WHEN Lethality BETWEEN 10 AND 49.99 THEN 1 END)
AS '(C) 10-49.99'
, COUNT(CASE WHEN Lethality BETWEEN 50 AND 89.99 THEN 1 END)
AS '(D) 50-89.99'
, COUNT(CASE WHEN Lethality BETWEEN 90 AND 99.99 THEN 1 END)
AS '(E) 90-99.99'
, COUNT(CASE WHEN Lethality >= 100 THEN 1 END)
AS '(F) 100+'
FROM LethalityTable
),
LethalityPercent AS (
SELECT
(SELECT CAST(COUNT(CASE WHEN Lethality BETWEEN 0 AND 0.99 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(A) 0-0.99'
,
(SELECT CAST(COUNT(CASE WHEN Lethality BETWEEN 1 AND 9.99 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(B) 1-9.99'
,
(SELECT CAST(COUNT(CASE WHEN Lethality BETWEEN 10 AND 49.99 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(C) 10-49.99'
,
(SELECT CAST(COUNT(CASE WHEN Lethality BETWEEN 50 AND 89.99 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(D) 50-89.99'
,
(SELECT CAST(COUNT(CASE WHEN Lethality BETWEEN 90 AND 99.99 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(E) 90-99.99'
,
(SELECT CAST(COUNT(CASE WHEN Lethality >= 100 THEN 1 END) AS DECIMAL(8,2)) FROM LethalityTable)
/
(SELECT CAST(COUNT(race) AS DECIMAL(8,2)) FROM LethalityTable)
AS '(F) 100+'
)
SELECT *
FROM LethalityAmount
UNION ALL
SELECT *
FROM LethalityPercent
(Bonus?) Answer 7#: Examine players' deaths at different servers (PvP Type & Location). Provide one or two interesting insight(s).
The insight I chose to show was that players on South American servers die to PvE (Player versus the Environment) significantly more than players on North American & European servers. This can be seen at Open PvP, Retro Open PvP and Retro Hardcore PvP. But I chose to illustrate this with the Open PvP servers:
I wonder if the reason for it is a bad internet infrastructure.
Anyhow, below is most of the SQL code used for the evaluation:
Question #E
And \/
WITH step1 AS (
SELECT servLocation
, pvpType
, CASE WHEN (race = 'players') THEN 'PvP' --P for Players
ELSE 'PvE' --E for Environment
END AS DeathType
, SUM(lastWeekKilledPlayers) AS Total_Players_Killed_TYPE
, CONCAT(
CAST(
FORMAT(
ROUND(
(
CAST(SUM(lastWeekKilledPlayers) AS DECIMAL(18,2))
/
CAST(SUM(SUM(lastWeekKilledPlayers)) OVER (PARTITION BY pvpType, servLocation) AS DECIMAL(18,2))
)*100 --multiply by 100 for Percent
, 2) --round to 2 digits past period
, 'F' --to Format the digit in a way which eliminates the 0 trailing after the 2 rounded digits
) AS VARCHAR --turn to String for CONCAT to work
), '%' --added % sign with CONCAT to signal percent
) AS Percent_Players_Killed_TYPE
FROM killstats
JOIN worlds
ON killstats.serverName = worlds.worldName
GROUP BY servLocation
, pvpType
, CASE WHEN (race = 'players') THEN 'PvP'
ELSE 'PvE' END
)
SELECT pvpType
, servLocation
, DeathType
, Percent_Players_Killed_TYPE
FROM step1
ORDER BY pvpType DESC
That’s it for the Tibia SQL Data Analysis Practice. I hope you enjoyed the questions and this practice.
The data gives ways to do more interesting analysis, I’m sure of it. Feel free to drop suggestions at the comments or contact me directly via e-mail or LinkedIn.