Migration is something that many people resort to in search of a better life. It is a big part of our generation, shaping the way people live, learn, and build their future. Many move in search of better education or new job opportunities, hoping to improve their quality of life. This assignment showcases how migration connects to education and wealth. I will visualise the data and create interactive charts that help people understand why migration happens and which regions of the world see the most of it. To achieve this, I have used the following datasets:
Global IQ Data
https://www.kaggle.com/datasets/mlippo/average-global-iq-per-country-with-other-stats
This dataset contains the average IQ per country. Using it, I can look for correlations between IQ and Nobel Prizes, and check whether IQ is affected by investment in education.
World Happiness Data
https://www.kaggle.com/datasets/simonaasm/world-happiness-index-by-reports-2013-2023
This dataset contains data about each country's happiness. Unfortunately, it does not list enough countries to be fully representative at country level, but it is still useful for comparing happiness across continents.
Country Population Data
https://www.kaggle.com/datasets/tanuprabhu/population-by-country-2020
This dataset contains data about each country's population. I was not able to find a dataset for 2022, so I used 2020. From manual checking, the numbers did not change significantly between 2020 and 2022, and the trends stayed the same, which is what matters most.
Country Financial Data
https://www.kaggle.com/datasets/yusufglcan/country-data
This is the main dataset, containing information about each country's financial position and where it spends its money. The issue with this dataset is that not all countries have information about where they spend money.
Extra
I also use a world_map dataset to draw the global map in the next panel, but that is not the focus of this big idea.
Using these datasets, I will be able to show how migration is connected to wealth and education. I will showcase how important it is to invest in education, and what results it yields. I will also investigate what effects unemployment and GDP per capita have on migration.
(1) Primary groups or individuals:
(2) Single person:
(3) Audience’s Interests:
(4) Audience’s Actions:
Benefits:
Risks:
Understanding the economic world is essential for overall intelligence, especially considering ongoing migration trends and cultural shifts.
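Original datasets have been renamed to:
- VI_AvgIqPerCountry.csv
- VI_WorldHappinessIndex.csv
- VI_PopulationByCountry2020.csv
- VI_CountriesFinancialData.csv
The final dataset is created by the code, but is also submitted as:
- VI_FullDataset.csv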
The files need to be in the same directory as the .qmd file, and then this command must be run:
#setwd("Your/File/Directory")
if (!require("ggplot2")) install.packages("ggplot2", dependencies = TRUE)
if (!require("dplyr")) install.packages("dplyr", dependencies = TRUE)
if (!require("tidyr")) install.packages("tidyr", dependencies = TRUE)
if (!require("plotly")) install.packages("plotly", dependencies = TRUE)
if (!require("forcats")) install.packages("forcats", dependencies = TRUE)
if (!require("ggiraph")) install.packages("ggiraph", dependencies = TRUE)
if (!require("maps")) install.packages("maps", dependencies = TRUE)
if (!require("simputation")) install.packages("simputation", dependencies = TRUE)
if (!require("tmap")) install.packages("tmap", dependencies = TRUE)
if (!require("sf")) install.packages("sf", dependencies = TRUE)
if (!require("rnaturalearth")) install.packages("rnaturalearth", dependencies = TRUE)
if (!require("rnaturalearthdata")) install.packages("rnaturalearthdata", dependencies = TRUE)
if (!require("scales")) install.packages("scales", dependencies = TRUE)
if (!require("gganimate")) install.packages("gganimate", dependencies = TRUE)
if (!require("cowplot")) install.packages("cowplot", dependencies = TRUE)
if (!require("reshape2")) install.packages("reshape2", dependencies = TRUE)
library(ggplot2)
library(dplyr)
library(tidyr)
library(plotly)
library(forcats)
library(ggiraph)
library(maps)
library(simputation)
library(tmap)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(scales)
library(gganimate)
library(cowplot)
library(reshape2)
# The original csv files from kaggle
population_by_country <- read.table("VI_PopulationByCountry2020.csv", header = TRUE, sep = ",", quote = "", encoding = "UTF-8", fill = TRUE)
world_happiness_index <- read.table("VI_WorldHappinessIndex.csv", header = TRUE, sep = ",", quote = "", encoding = "UTF-8", fill = TRUE)
countries_financial_data <- read.table("VI_CountriesFinancialData.csv", header = TRUE, sep = ",", quote = "", encoding = "UTF-8", fill = TRUE)
average_iq_per_country <- read.table("VI_AvgIqPerCountry.csv", header = TRUE, sep = ",", quote = "", encoding = "UTF-8", fill = TRUE)
colnames(population_by_country) <- c("Country", "Population", "Population Yearly Change", "Population Net Change", "Population Density", "Land Area", "Net Migrants", "Fertility Rate", "Median Age", "Urban Population", "World Share")
world_happiness_index <- world_happiness_index[world_happiness_index$Year == 2022, ]
row.names(world_happiness_index) <- NULL
world_happiness_index <- world_happiness_index[, !names(world_happiness_index) %in% "Year"]
colnames(world_happiness_index) <- c("Country", "Happiness Index", "Happiness Rank")
average_iq_per_country <- average_iq_per_country[, !names(average_iq_per_country) %in% "Population...2023"]
colnames(average_iq_per_country) <- c("IQ Rank", "Country", "Average IQ", "Continent", "Literacy Rate", "Nobel Prizes", "HDI", "Mean Schooling Years", "GNI")
average_iq_per_country <- average_iq_per_country[, !names(average_iq_per_country) %in% "GNI"]
countries_financial_data <- countries_financial_data[countries_financial_data$Year == 2022, ]
row.names(countries_financial_data) <- NULL
# Drop the columns that are not needed for this analysis in a single step
financial_columns_to_drop <- c("Country.Code", "Year", "Population", "Population.Density", "R.D",
"Service....GDP.", "Continent.Name", "Land", "Import....GDP.", "Industry....GDP.",
"Export....GDP.", "Agriculture....GDP.", "Education.Expenditure", "Health.Expenditure")
countries_financial_data <- countries_financial_data[, !names(countries_financial_data) %in% financial_columns_to_drop]
colnames(countries_financial_data) <- c("Country", "Ease Of Doing Business", "Education Expenditure", "Country GDP", "Health Expenditure", "Inflation Rate", "Unemployment", "Country Export", "Country Import", "Country Net Trade", "GDP Per Capita")
global_information_dataset <- population_by_country %>%
left_join(world_happiness_index, by = "Country") %>%
left_join(average_iq_per_country, by = "Country") %>%
left_join(countries_financial_data, by = "Country")
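Because these joins match on the exact spelling of Country, a name that is written differently in two sources is silently left with NA values. A minimal sketch of how unmatched names could be listed with dplyr's anti_join; toy_population and toy_happiness are hypothetical stand-ins with made-up values, not the real datasets:

```r
library(dplyr)

# Hypothetical toy tables (made-up values) standing in for the real datasets
toy_population <- data.frame(
  Country = c("Czech Republic (Czechia)", "Turkey", "Samoa"),
  Population = c(100, 200, 300),
  stringsAsFactors = FALSE
)
toy_happiness <- data.frame(
  Country = c("Czechia", "Turkey"),
  `Happiness Index` = c(1, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rows of the left table that have no partner in the right table
unmatched <- anti_join(toy_population, toy_happiness, by = "Country")
print(unmatched$Country)
```

Running the same check against each source before joining would show which country names need to be harmonised first.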
global_information_dataset <- global_information_dataset %>%
mutate(across(where(is.character), ~ na_if(., "NULL"))) %>%
mutate(across(where(is.character), ~ na_if(., "N.A.")))
global_information_dataset <- global_information_dataset %>%
filter(!is.na(`Median Age`))
global_information_dataset <- global_information_dataset %>%
mutate(Continent = case_when(
Country == "DR Congo" ~ "Africa",
Country == "Turkey" ~ "Europe/Asia",
Country == "Côte d'Ivoire" ~ "Africa",
Country == "Czech Republic (Czechia)" ~ "Europe",
Country == "State of Palestine" ~ "Asia",
Country == "Moldova" ~ "Europe",
Country == "Guinea-Bissau" ~ "Africa",
Country == "Equatorial Guinea" ~ "Africa",
Country == "Timor-Leste" ~ "Asia",
Country == "Réunion" ~ "Africa",
Country == "Western Sahara" ~ "Africa",
Country == "Cabo Verde" ~ "Africa",
Country == "Guadeloupe" ~ "North America",
Country == "Martinique" ~ "North America",
Country == "French Guiana" ~ "South America",
Country == "French Polynesia" ~ "Oceania",
Country == "Mayotte" ~ "Africa",
Country == "Sao Tome & Principe" ~ "Africa",
Country == "Samoa" ~ "Oceania",
Country == "Channel Islands" ~ "Europe",
Country == "Guam" ~ "Oceania",
Country == "Curaçao" ~ "North America",
Country == "Kiribati" ~ "Oceania",
Country == "Micronesia" ~ "Oceania",
Country == "Grenada" ~ "North America",
Country == "St. Vincent & Grenadines" ~ "North America",
Country == "Aruba" ~ "North America",
Country == "Tonga" ~ "Oceania",
Country == "U.S. Virgin Islands" ~ "North America",
TRUE ~ Continent
))
global_information_dataset <- impute_median(global_information_dataset, `Inflation Rate` ~ Continent)
global_information_dataset <- impute_median(global_information_dataset, `HDI` ~ Continent)
global_information_dataset <- impute_median(global_information_dataset, `Unemployment` ~ Continent)
global_information_dataset <- impute_median(global_information_dataset, `GDP Per Capita` ~ Continent)
global_information_dataset$`Nobel Prizes`[is.na(global_information_dataset$`Nobel Prizes`)] <- 0
global_information_dataset <- global_information_dataset %>%
mutate(`Happiness Index` = as.numeric(`Happiness Index`))
global_information_dataset <- global_information_dataset %>%
mutate(`Happiness Rank` = as.numeric(`Happiness Rank`))
The goal of this section is to look at the data for each country and let people view key metrics such as GDP and population. This can be a way to learn which countries are rich or struggling, and which countries have the largest populations. I will also analyse the countries that are losing and gaining the most people, and which continents are the happiest.
#have to create a smaller dataset and change GDP to match world_map_data
countries_population_gdp <- global_information_dataset[c("Country", "Population", "Country GDP")]
colnames(countries_population_gdp) <- c("Country", "Population", "GDP")
#import world_map_data from libraries
if (!exists("world_map_data")) {
world_map_data <- ne_countries(scale = "medium", returnclass = "sf")
}
#edit them to match my dataset
world_map_data <- world_map_data %>%
mutate(admin = case_when(
admin == "eSwatini" ~ "Eswatini",
admin == "United States of America" ~ "United States",
admin == "Greenland" ~ "Greenland",
admin == "Ivory Coast" ~ "Côte d'Ivoire",
admin == "Republic of the Congo" ~ "Congo",
admin == "Democratic Republic of the Congo" ~ "DR Congo",
admin == "United Republic of Tanzania" ~ "Tanzania",
admin == "Somaliland" ~ "Somalia",
admin == "Republic of Serbia" ~ "Serbia",
admin == "Czechia" ~ "Czech Republic (Czechia)",
TRUE ~ admin
))
#sf library
world_sf <- st_as_sf(world_map_data)
#admin best fits my dataset Countries
world_sf <- world_sf %>%
left_join(countries_population_gdp, by = c("admin" = "Country"))
tmap_mode("view")
tm_shape(world_sf) +
tm_polygons(col = "admin",
id = "admin",
popup.vars = c("Population", "GDP"),
palette = "Set3",
legend.show = FALSE) +
tm_layout(legend.show = FALSE)
Reasoning
This is the easiest way to explore information about countries. Users can see where a country is located, along with its population and GDP.
interactive_scatter_plot <- ggplot(global_information_dataset, aes(
x = Population, y = `Country GDP`,
text = paste("Country:", Country,
"\nPopulation:", comma(Population),
"\nCountry GDP:", comma(`Country GDP`),
"\nHDI:", round(HDI, 3)))) +
geom_point(aes(color = HDI), alpha = 0.8) +
geom_smooth(method = "lm", se = FALSE, color = "white", linetype = "dashed") +
scale_x_log10(labels = label_number(scale_cut = cut_short_scale())) +
scale_y_log10(labels = label_number(scale_cut = cut_short_scale())) +
scale_color_viridis_c(option = "cividis", name = "HDI") +
theme_minimal(base_size = 14) +
theme(panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "white"),
text = element_text(color = "black"),
axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(vjust = 1),
legend.position = "bottom") +
labs(title = "GDP vs Population",
x = "Population",
y = "GDP")
ggplotly(interactive_scatter_plot, tooltip = "text")
Observations
With the help of this interactive scatterplot, it is clear which countries have the largest populations and GDP.
country_population_change <- global_information_dataset %>%
select(Country, Population, `Population Net Change`, `Net Migrants`, Continent)
create_population_plot <- function(data, title, y_title, colour1, colour2) {
data <- data %>%
mutate(Country = fct_reorder(Country, `Population Net Change`)) %>%
arrange(`Population Net Change`)
data_long <- data %>%
pivot_longer(cols = c(`Population Net Change`, `Net Migrants`), names_to = "Metric", values_to = "Value") %>%
mutate(tooltip_text = paste0(Metric, ": ", comma(Value)))
temp_plot <- ggplot(data_long, aes(x = Country, y = Value, fill = Metric, text = tooltip_text)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.6)) +
scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()),
breaks = pretty_breaks(n = 5)) +
scale_fill_manual(values = c("Population Net Change" = colour1, "Net Migrants" = colour2)) +
labs(title = title, x = "Country", y = y_title, fill = "Metric") +
theme_minimal() +
coord_flip()
ggplotly(temp_plot, tooltip = "text")
}
top_population_loss <- country_population_change %>% arrange(`Population Net Change`) %>% slice_head(n = 20)
top_population_gain <- country_population_change %>% arrange(desc(`Population Net Change`)) %>% slice_head(n = 20)
loss_plotly <- create_population_plot(top_population_loss, "Countries with Biggest Population Loss", "Population Loss / Net Migrants","red", "lightblue" )
gain_plotly <- create_population_plot(top_population_gain, "Countries with Biggest Population Gain", "Population Gain / Net Migrants", "lightgreen", "orange")
loss_plotly
Observations
Here we can see which countries have the biggest population losses, and how many people are migrating.
gain_plotly
Observations
Here we can see which countries have the biggest population gains, and how many people are migrating.
continent_happiness <- global_information_dataset %>%
select(Continent, `Happiness Index`, `Happiness Rank`)
avg_continent_happiness <- continent_happiness %>%
group_by(Continent) %>%
summarise(Avg_Happiness = mean(`Happiness Index`, na.rm = TRUE))
interactive_avg_continent_happiness <- ggplot(avg_continent_happiness, aes(x = Continent, y = Avg_Happiness, color = Continent, text = paste("Happiness Index:", round(Avg_Happiness, 2)))) +
geom_segment(aes(xend = Continent, y = 0, yend = Avg_Happiness), linewidth = 1.5) +
geom_point(size = 4) +
labs(title = "Happiness by Continent",
x = "Continent",
y = "Happiness Index") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(interactive_avg_continent_happiness, tooltip = "text") %>%
layout(legend = list(title = list(text = "Continent")))
Observations
Correlations
inflation_unemployment_gdppc <- global_information_dataset %>%
select(`Inflation Rate`, Unemployment, `GDP Per Capita`, `Net Migrants`)
cor_test_result <- cor.test(
inflation_unemployment_gdppc$`Inflation Rate`,
inflation_unemployment_gdppc$Unemployment,
use = "complete.obs"
)
cat("Correlation between Inflation and Unemployment:",
"\n Coefficient:", round(cor_test_result$estimate, 3),
"\n p-value:", format.pval(cor_test_result$p.value, eps = 0.001), "\n")
Correlation between Inflation and Unemployment:
Coefficient: 0.151
p-value: 0.030257
The correlation between inflation and unemployment is positive, but weak. Even so, the p-value (about 0.03) is below the conventional 0.05 threshold, so the relationship is statistically significant and unlikely to be due to chance alone.
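To put the size of this coefficient in perspective, squaring it gives the share of variance in one variable that a straight-line relationship with the other would account for. A short worked calculation using the coefficient reported above (the variable names are illustrative only):

```r
# Coefficient taken from the printed output above
r <- 0.151

# Share of variance a linear relationship would account for
r_squared <- r^2

# Expressed as a percentage, rounded to one decimal place
variance_explained_pct <- round(r_squared * 100, 1)
print(variance_explained_pct)
```

The result is roughly 2.3 percent, which is why the relationship can be statistically significant and still be described as weak.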
cor_test_gdp <- cor.test(
inflation_unemployment_gdppc$Unemployment,
inflation_unemployment_gdppc$`GDP Per Capita`,
use = "complete.obs"
)
cat("Correlation between Unemployment and GDP per capita:",
"\n Coefficient:", round(cor_test_gdp$estimate, 3),
"\n p-value:", format.pval(cor_test_gdp$p.value, eps = 0.001), "\n")
Correlation between Unemployment and GDP per capita:
Coefficient: -0.216
p-value: 0.001795
The correlation between unemployment and GDP per capita is negative, but fairly weak. This means that when unemployment decreases, GDP per capita tends to increase. The p-value (about 0.002) is far below the 0.05 threshold, so the relationship is statistically significant and very unlikely to be due to chance.
inflation_unemployment <- ggplot(global_information_dataset, aes(x = Unemployment, y = `Inflation Rate`)) +
geom_point(alpha = 0.6, color = "black", size = 1) +
geom_smooth(method = "lm", color = "red", fill = "pink") +
labs(title = "Inflation / Unemployment",
subtitle = "Weak positive correlation",
x = "Unemployment Rate",
y = "Inflation Rate") +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(color = "grey", size = 13),
axis.title = element_text(size = 12),
panel.grid.minor = element_blank()
) +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_x_continuous(labels = scales::percent_format(scale = 1))
gdppc_unemployment <- ggplot(global_information_dataset, aes(x = Unemployment, y = `GDP Per Capita`)) +
geom_point(alpha = 0.6, color = "black", size = 1) +
geom_smooth(method = "lm", color = "steelblue", fill = "lightblue") +
labs(title = "GDP per Capita / Unemployment",
subtitle = "Weak negative correlation",
x = "Unemployment Rate",
y = "GDP per Capita") +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(color = "grey", size = 13),
axis.title = element_text(size = 12),
panel.grid.minor = element_blank()
) +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
scale_y_continuous(labels = scales::dollar_format())
gdpcc_infl_unempl <- plot_grid(inflation_unemployment, gdppc_unemployment, ncol = 2)
gdpcc_infl_unempl
Observations
gdppc_matrix <- cor(inflation_unemployment_gdppc, use = "complete.obs")
melted_gdppc_matrix <- melt(gdppc_matrix)
ggplot(data = melted_gdppc_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "red", high = "orange", mid = "lightyellow",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
coord_fixed() +
geom_text(aes(label = round(value, 3)), color = "black", size = 4) +
labs(title = "Correlation Heatmap",
x = "", y = "")
Observations
inflation_unemployment_gdppc_migrant <- inflation_unemployment_gdppc %>%
mutate(GDP_Bin = cut(`GDP Per Capita`, breaks = 10, labels = FALSE)) %>%
mutate(bin_min = min(`GDP Per Capita`, na.rm = TRUE) + (max(`GDP Per Capita`, na.rm = TRUE) - min(`GDP Per Capita`, na.rm = TRUE)) * (GDP_Bin - 1) / 10,
bin_max = min(`GDP Per Capita`, na.rm = TRUE) + (max(`GDP Per Capita`, na.rm = TRUE) - min(`GDP Per Capita`, na.rm = TRUE)) * GDP_Bin / 10,
GDP_Bin = paste0(format(round(bin_min), big.mark = ","), " - ", format(round(bin_max), big.mark = ","))) %>%
group_by(GDP_Bin) %>%
summarise(`Total Net Migrants` = sum(`Net Migrants`, na.rm = TRUE)) %>%
ungroup() %>%
filter(!(GDP_Bin %in% tail(unique(GDP_Bin), 2))) %>%
mutate(GDP_Bin = factor(GDP_Bin, levels = unique(GDP_Bin)))
gdppc_migrant_bar <- ggplot(inflation_unemployment_gdppc_migrant,
aes(x = 1, y = GDP_Bin, size = abs(`Total Net Migrants`),
text = paste("GDP Range:", GDP_Bin, "Net Migrants:", scales::comma(`Total Net Migrants`)))) +
geom_point(aes(color = ifelse(`Total Net Migrants` > 0, "Coming", "Leaving"))) +
scale_color_manual(values = c("Coming" = "darkgreen", "Leaving" = "pink"),
name = "Migration Direction") +
scale_size_continuous(range = c(2, 20), "") +
labs(title = "Net Migrants by GDP Per Capita",
y = "GDP Per Capita") +
theme_minimal(base_size = 12) +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_text(size = 11))
ggplotly(gdppc_migrant_bar, tooltip = "text")
Observations
Correlations
smart_data <- global_information_dataset %>%
select(`Mean Schooling Years`, `Nobel Prizes`, `Literacy Rate`, `Average IQ`, `Education Expenditure`, `Net Migrants`)
cor_test_msy_lit <- cor.test(
smart_data$`Mean Schooling Years`,
smart_data$`Literacy Rate`,
use = "complete.obs"
)
cat("Correlation between Mean Schooling Years and Literacy Rate:",
"\n Coefficient:", round(cor_test_msy_lit$estimate, 3),
"\n p-value:", format.pval(cor_test_msy_lit$p.value, eps = 0.001), "\n\n")
Correlation between Mean Schooling Years and Literacy Rate:
Coefficient: 0.84
p-value: < 0.001
cor_test_iq_nobel <- cor.test(
smart_data$`Average IQ`,
smart_data$`Nobel Prizes`,
use = "complete.obs"
)
cat("Correlation between Average IQ and Nobel Prizes:",
"\n Coefficient:", round(cor_test_iq_nobel$estimate, 3),
"\n p-value:", format.pval(cor_test_iq_nobel$p.value, eps = 0.001), "\n\n")
Correlation between Average IQ and Nobel Prizes:
Coefficient: 0.208
p-value: 0.0055678
cor_test_ee_nobel <- cor.test(
smart_data$`Education Expenditure`,
smart_data$`Nobel Prizes`,
use = "complete.obs"
)
cat("Correlation between Education Expenditure and Nobel Prizes:",
"\n Coefficient:", round(cor_test_ee_nobel$estimate, 3),
"\n p-value:", format.pval(cor_test_ee_nobel$p.value, eps = 0.001), "\n")
Correlation between Education Expenditure and Nobel Prizes:
Coefficient: -0.061
p-value: 0.43852
lit_rate_school_year <- ggplot(smart_data, aes(x = `Mean Schooling Years`, y = `Literacy Rate`)) +
geom_point(color = "darkgreen", alpha = 0.6, size = 2) +
geom_smooth(method = "lm", formula = y ~ x,
color = "black", se = FALSE, linewidth = 1.2) +
labs(title = paste("Mean Schooling Years / Literacy Rate"),
x = "Mean Schooling Years",
y = "Literacy Rate") +
theme_minimal(base_size = 14) +
scale_y_continuous(labels = scales::percent_format(scale = 100))
ggplotly(lit_rate_school_year, tooltip = c("x", "y")) %>%
layout(hoverlabel = list(bgcolor = "lightgreen",
font = list(size = 14)),
margin = list(t = 60)) %>%
config(displayModeBar = TRUE)
Observations
The trend is quite clear: the more schooling people get, the higher the literacy rate. However, some countries reach the top of the chart without many years of schooling.
smart_data_range <- smart_data %>%
mutate(iq_range = cut(`Average IQ`, breaks = 10))
nobel_summary <- smart_data_range %>%
group_by(iq_range) %>%
summarize(Total_Nobel_Prizes = sum(`Nobel Prizes`))
# Plot the pre-aggregated totals with geom_col(); stat = "sum" counts overlapping points rather than summing y
nobel_plot <- ggplot(nobel_summary, aes(x = iq_range, y = Total_Nobel_Prizes, fill = iq_range)) +
geom_col(alpha = 0.9) +
scale_fill_viridis_d(option = "cividis") +
labs(title = "Nobel Prizes by IQ Range",
x = "Average IQ Range",
y = "Total Nobel Prizes") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none",
panel.grid.major.x = element_blank()
)
interactive_nobel_plot <- ggplotly(nobel_plot, tooltip = c("x", "y")) %>%
layout(hoverlabel = list(bgcolor = "white", font = list(size = 12))) %>%
config(displayModeBar = TRUE)
interactive_nobel_plot
Observations
As expected, the higher IQ ranges account for most Nobel Prizes. However, it is not always the top range that wins them; some lower ranges have also won Nobel Prizes.
smart_data_binned <- smart_data %>%
mutate(Expenditure_Category = cut(`Education Expenditure`,
breaks = quantile(`Education Expenditure`, probs = seq(0, 1, by = 0.25), na.rm = TRUE),
labels = c("Low", "Medium", "High", "Very High"),
include.lowest = TRUE))
mean_data <- smart_data_binned %>%
group_by(Expenditure_Category) %>%
summarize(Mean_Literacy = mean(`Literacy Rate`, na.rm = TRUE),.groups = "drop") %>%
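# Literacy Rate is treated as a 0-1 fraction (as the scale = 100 axis labels imply), so convert it to a percentage
mutate(Tooltip_Label = paste0("Mean Literacy Rate: ", round(Mean_Literacy * 100, 1), "%"))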
lit_ee_boxplot <- ggplot() +
geom_boxplot(data = smart_data_binned, aes(x = Expenditure_Category,y = `Literacy Rate`, fill = Expenditure_Category), alpha = 0.8, outlier.shape = 21, outlier.fill = "white", outlier.alpha = 0.7,width = 0.6) +
geom_point(data = mean_data, aes(x = Expenditure_Category, y = Mean_Literacy, text = Tooltip_Label), shape = 23, size = 3, fill = "white",color = "black") +
scale_fill_viridis_d(option = "cividis") +
labs(
title = "Literacy Rate / Education Expenditure",
x = NULL,
y = "Literacy Rate"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 13, hjust = 0.5),
axis.text = element_text(size = 10),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none",
plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
) +
scale_y_continuous(labels = scales::percent_format(scale = 100))
interactive_tidy_boxplot <- ggplotly(lit_ee_boxplot, tooltip = "text") %>%
layout(hoverlabel = list(bgcolor = "white", font = list(size = 11)),
margin = list(t = 50, r = 50, b = 50, l = 50)
) %>%config(displayModeBar = FALSE)
interactive_tidy_boxplot
Observations
While high expenditure does not guarantee that a country will top the literacy rankings, it does help: countries with the lowest expenditure show the widest spread of literacy rates.
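One way to put a number on this spread is the interquartile range of literacy within each expenditure quartile. A minimal sketch on made-up values, assuming the same quartile binning as the boxplot; toy_data and its columns are hypothetical and do not come from the real dataset:

```r
# Hypothetical toy data: expenditure in arbitrary units, literacy as a 0-1 fraction
toy_data <- data.frame(
  expenditure = c(1, 2, 3, 4, 5, 6, 7, 8),
  literacy    = c(0.30, 0.90, 0.60, 0.80, 0.85, 0.95, 0.90, 0.98)
)

# Bin expenditure into quartiles, mirroring the categories used for the boxplot
toy_data$category <- cut(
  toy_data$expenditure,
  breaks = quantile(toy_data$expenditure, probs = seq(0, 1, by = 0.25)),
  labels = c("Low", "Medium", "High", "Very High"),
  include.lowest = TRUE
)

# Interquartile range of literacy within each category
spread_by_category <- tapply(toy_data$literacy, toy_data$category, IQR)
print(spread_by_category)
```

In this toy data the "Low" category has the largest interquartile range; applying the same calculation to the real data would show whether the visual impression from the boxplot holds numerically.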
smart_data_migrants <- smart_data %>%
mutate(Education_Bin = cut(`Education Expenditure`, breaks = 25)) %>%
group_by(Education_Bin) %>%
summarise(`Mean Net Migrants` = mean(`Net Migrants`, na.rm = TRUE),
`Mean Education Expenditure` = mean(`Education Expenditure`, na.rm = TRUE)) %>%
ungroup()
migrant_line <- ggplot(smart_data_migrants, aes(x = `Mean Education Expenditure`, y = `Mean Net Migrants`)) +
geom_hline(yintercept = 0, color = "red", linewidth = 0.5, linetype = "solid") +
geom_line(color = "black", linewidth = 1) +
geom_point(aes(text = paste("Education Expenditure:", round(`Mean Education Expenditure`, 4),
"<br>Net Migrants:", round(`Mean Net Migrants`, 2))),
color = "purple", size = 2) +
labs(title = "Net Migrants / Education Expenditure",
x = "Education Expenditure of GDP",
y = "Average Net Migrants") +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
theme_minimal(base_size = 14)
ggplotly(migrant_line, tooltip = "text")
Observations
na_gdp_countries <- global_information_dataset %>%
filter(is.na(`Country GDP`)) %>%
select(Country, Population)
total_population <- sum(na_gdp_countries$Population, na.rm = TRUE)
threshold <- 0.01 * total_population
na_countries_grouped <- na_gdp_countries %>%
  mutate(Country = ifelse(Population < threshold, "Other", Country)) %>%
  group_by(Country) %>%
  summarise(Population = sum(Population)) %>%
  ungroup()
plot_ly(na_countries_grouped, labels = ~Country, values = ~Population, type = 'pie', textinfo = 'label+value', hoverinfo = 'label+value') %>%
  layout(title = "Countries with Missing GDP By Population")
Observations
These are the countries that lack a lot of financial data, including their GDP. The reason cannot be established with certainty, but from my research, these countries do not openly report their investments, so experts can only make rough estimates. This is often the case in countries with higher corruption rates, or for similar reasons, but not always, and I do not have a chart to demonstrate it.
Why do people migrate?
As shown, there are two main reasons for migration: financial security and education. It is understandable that people strive for financial security first, as it means survival. It is also understandable that people want to migrate to countries where their children can get a better education.
Migration Issues
Migration has been in the news a lot recently, and my charts reinforce my earlier observations. It is understandable that people want a better life. Asia alone shows very high levels of migration, which can cause issues for smaller countries, as they are not able to keep up with the demand.
Making final dataset into .csv
write.csv(global_information_dataset, "VI_FullDataset.csv", row.names = FALSE)