Modifying the order of a legend in ggplot by total sum of observations

I'm graphing a line plot in ggplot of numbers of migrants to a city over x years, based on country of origin. Each country is graphed as its own line, plotted on a graph against other countries, over a period of five years.
I want to order the legend by country from largest to smallest total sum of migrants over the x years, regardless of the total number of countries, instead of alphabetically as it is now.
I've tried using forcats commands such as fct_relevel, but haven't been able to find anything other than doing it manually, which can be time consuming for multiple graphs.
My data frame has variables year, country, and number_migrants, and each observation is a year-country pair.
library(tidyverse)
g <- ggplot(migrants, aes(x=year, y=number_migrants, col=country)) +
geom_line()

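You need fct_reorder. By default it orders the levels by the median of the second variable, so pass .fun = sum to order by the total instead.
library(dplyr)
library(forcats)
library(ggplot2)
migrants %>%
  mutate(
    # Order countries by total migrants, largest first
    country = fct_reorder(country, number_migrants, .fun = sum, .desc = TRUE)
  ) %>%
  # The data arrives through the pipe, so ggplot() only needs the mapping
  ggplot(aes(x = year, y = number_migrants, col = country)) +
  geom_line()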
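To check that the resulting level order follows the totals, the reordering can be tried on a small made-up data set. This is only a sketch: the migrants values below are invented for illustration, and .fun = sum is passed explicitly because fct_reorder() summarises with the median unless told otherwise.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: three countries over three years
migrants <- tibble(
  year = rep(2015:2017, times = 3),
  country = rep(c("Albania", "Brazil", "Chile"), each = 3),
  number_migrants = c(10, 20, 30,    # Albania: total 60
                      100, 90, 80,   # Brazil: total 270
                      50, 40, 60)    # Chile: total 150
)

migrants_ordered <- migrants %>%
  mutate(country = fct_reorder(country, number_migrants,
                               .fun = sum, .desc = TRUE))

# ggplot2 draws the legend in factor-level order
country_levels <- levels(migrants_ordered$country)
print(country_levels)
```

Because the order is computed from the data, the same mutate() step can be reused for each graph without listing the countries by hand.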

Related

GGplot2: plotting values per year of specific columns in a dataframe

I am trying to plot values on the y axis against years on the x axis with ggplot2.
This is the dataset: https://drive.google.com/file/d/1nJYtXPrxD0xvq6rBz2NXlm4Epi52rceM/view?usp=sharing
I want to plot the values of specific countries.
It won't work by just specifying year as the x axis and a country's values on the y axis. I'm reading I need to melt the data frame, so I did that, but it's now in a format that doesn't seem convenient to get the job done.
I'm assuming I haven't correctly melted, but I have a hard time finding what I need to specifically do.
What I did beforehand is manually transpose the data and make the years a column, as well as all the countries.
This is the dataset transposed:
https://drive.google.com/file/d/131wNlubMqVEG9tID7qp-Wr8TLli9KO2Q/view?usp=sharing
Here's how I melted:
inv_melt.data <- melt(investments_t.data, id.vars="Year")
ggplot() +
geom_line(aes(x=Year, y=value), data = inv_melt.data)
The plot shows the aggregated values of all countries per year, but I want them per country in such a manner that I can also select to plot certain countries only.
How do I utilize melt in such a manner? Could someone walk me through this?
There is no column named "Year" in the linked data set; there is one column per year. So it needs to be melted by "country", and then the "variable" column cleaned up with sub.
inv_melt.data <- reshape2::melt(investments_t.data, id.vars="country")
inv_melt.data$variable <- as.integer(sub("^X", "", inv_melt.data$variable))
ggplot(inv_melt.data, aes(variable, value, color = country)) +
geom_line(show.legend = FALSE)
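For the part of the question about plotting only certain countries, one option is to subset the melted data before handing it to ggplot. The sketch below uses a made-up wide data frame (the country names and values are assumptions, not the linked data) with the X-prefixed year columns that read.csv() produces.

```r
library(reshape2)

# Hypothetical wide data: one row per country, one column per year.
# read.csv() prefixes numeric column names with "X", hence X2000, X2001.
investments <- data.frame(
  country = c("Belgium", "France", "Spain"),
  X2000 = c(1.0, 2.0, 3.0),
  X2001 = c(1.5, 2.5, 3.5)
)

inv_long <- melt(investments, id.vars = "country")

# "variable" is a factor; strip the "X" from its labels before converting,
# because as.integer() on the factor itself would return level codes
inv_long$year <- as.integer(sub("^X", "", inv_long$variable))

# Keep only the countries of interest
selected <- subset(inv_long, country %in% c("Belgium", "Spain"))
print(selected)
```

The result can then be plotted with ggplot(selected, aes(year, value, color = country)) + geom_line().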
Edit.
The following code keeps only some countries, filtering out the ones with any missing values.
i <- sapply(investments_t.data[-1], function(x) sum(is.na(x)) == 0)
# which(i) indexes the columns after dropping the first one, so shift by 1
i <- c(1, which(i) + 1)
inv_melt.data <- reshape2::melt(investments_t.data[i], id.vars = "Year")
ggplot(inv_melt.data, aes(Year, value, color = variable)) +
geom_line(show.legend = FALSE)

Multiple barplots of different mean of years in one plot

I currently have two data frames. I wish to get multiple bar plots from both of them in one plot using ggplot. I want to get the average of the 'NEE' variable over different year ranges (1850-1950, 1951-2012, 2013-2100) from both data frames and plot them side by side, just like in this green bar plot visualization (https://ars.els-cdn.com/content/image/1-s2.0-S0048969716303424-fx1_lrg.jpg).
The header of the two data frames is as follows (this is only a portion). The header is the same for both data frames for the years 1850-1859:
How can I plot bars for, let's say, the years 1850-1852, 1854-1856 and 1857-1859 from both data frames in one plot? I know the bar plots will be the same in this case because both data frames are similar, but I would like to get the idea so I can edit the code for my desired years.
(Note that I have 39125 obs. of 9 variables.)
This is what I have done so far (following a solution posted by a member of this website). I produced the geom_col plots for data1 and data2 successfully, but how can I merge them and plot geom_col for 1850-1852, 1854-1856 and 1857-1859 side by side from both data frames?
data1 %>%
# case_when lets us define yr_group based on Year:
mutate(yr_group = case_when(Year <= 1950 ~ "1850-1950",
Year <= 2012 ~ "1951-2012",
Year <= 2100 ~ "2013-2100",
TRUE ~ "Other range")) %>%
# For each location and year group, get the mean of all the columns:
group_by(Lon, Lat, yr_group) %>%
summarise_all(mean) %>%
# Plot the mean Total for each yr_group
ggplot(aes(yr_group, NEE)) +
  geom_col(position = "dodge") +
  theme_classic() + xlab("Year") + ylab(ln) +
  labs(subtitle = "CCSM4 RCP2.6") +
  geom_hline(yintercept = 0, color = "black", size = 1)
My preferred approach is usually to do the data summarization first and then send the output to ggplot. In this case, you might use dplyr from the tidyverse meta-package to add a variable relating to which time epoch a given year belongs to, and then collect the stats for that whole epoch.
For instance, just using your example data, we might group those years arbitrarily and find the averages for 1850-51, 1852-53, and 1854-55, and then display those next to each other:
library(tidyverse)
df %>%
# case_when lets us define yr_group based on Year:
mutate(yr_group = case_when(Year <= 1851 ~ "1850-51",
Year <= 1853 ~ "1852-53",
Year <= 1855 ~ "1854-55",
TRUE ~ "Other range")) %>%
# For each location and year group, get the mean of all the columns:
group_by(Lon, Lat, yr_group) %>%
summarise_all(mean) %>%
# Plot the mean Total for each yr_group
ggplot(aes(yr_group, Total)) + geom_col()
If you have multiple locations, you might use ggplot facets to display those separately, or use dodge within geom_col (equivalent to geom_bar(stat = "identity"), btw) to show the different locations next to each other.
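To get bars from both data frames side by side, one possible approach is to stack them into a single data frame with a column recording the source, summarise by source and year group, and then dodge on that column. This is a sketch under assumptions: data1 and data2 below are invented stand-ins with only Year and NEE, and the Lon/Lat grouping is left out for brevity.

```r
library(dplyr)

# Hypothetical stand-ins for the two data frames in the question
data1 <- tibble(Year = 1850:1855, NEE = c(1, 2, 3, 4, 5, 6))
data2 <- tibble(Year = 1850:1855, NEE = c(2, 4, 6, 8, 10, 12))

combined <- bind_rows(data1 = data1, data2 = data2, .id = "source") %>%
  mutate(yr_group = case_when(Year <= 1851 ~ "1850-51",
                              Year <= 1853 ~ "1852-53",
                              TRUE ~ "1854-55")) %>%
  # One mean per data frame and year group
  group_by(source, yr_group) %>%
  summarise(NEE = mean(NEE), .groups = "drop")

print(combined)
```

Plotting is then a matter of ggplot(combined, aes(yr_group, NEE, fill = source)) + geom_col(position = "dodge"), which puts the data1 and data2 bars next to each other within each year group.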

How to make density histogram divided up on second value in ggplot2?

I have a problem with my density histogram in ggplot2. I am working in RStudio, and I am trying to create a density histogram of income, dependent on a person's occupation. My problem is that when I use my code:
data = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
sep=",",header=F,col.names=c("age", "type_employer", "fnlwgt", "education",
"education_num","marital", "occupation", "relationship", "race","sex",
"capital_gain", "capital_loss", "hr_per_week","country", "income"),
fill=FALSE,strip.white=T)
ggplot(data=data, aes(x=income)) +
geom_histogram(stat='count',
aes(x= income, y=stat(count)/sum(stat(count)),
col=occupation, fill=occupation),
position='dodge')
In response I get a histogram where each count is divided by the overall count across all categories. What I would like instead is, for example, the number of people earning >50K whose occupation is 'craft-repair' divided by the total number of people whose occupation is craft-repair, the same for <=50K within that occupation, and so on for every other occupation.
My second question: once the density histogram is right, how can I sort the bars in decreasing order?
This is a situation where it makes sense to re-aggregate your data first, before plotting. Aggregating within the ggplot call works fine for simple aggregations, but when you need to aggregate and then peel off a group for a second calculation, it doesn't work so well. Also, note that because your x axis is discrete, we don't use a histogram here; instead we'll use geom_bar().
First we aggregate by count, then calculate percent of total using occupation as the group.
d2 <- data %>% group_by(income, occupation) %>%
summarize(count= n()) %>%
group_by(occupation) %>%
mutate(percent = count/sum(count))
Then simply plot a bar chart using geom_bar and position = 'dodge' so the bars are side by side, rather than stacked.
d2 %>% ggplot(aes(income, percent, fill = occupation)) +
geom_bar(stat = 'identity', position='dodge')
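For the second question (sorting the bars in decreasing order), one option is to turn occupation into a factor whose levels follow the share you care about. The sketch below is an assumption-laden illustration: the toy data is invented, and it sorts by the >50K share, which may not be the ordering the asker had in mind.

```r
library(dplyr)

# Hypothetical toy data standing in for the adult data set
d <- tibble(
  income = c(">50K", ">50K", "<=50K", "<=50K",
             ">50K", "<=50K", "<=50K", "<=50K"),
  occupation = rep(c("Sales", "Craft-repair"), each = 4)
)

# Share of each income class within each occupation
d2 <- d %>%
  count(income, occupation, name = "count") %>%
  group_by(occupation) %>%
  mutate(percent = count / sum(count)) %>%
  ungroup()

# Order occupations by their >50K share, largest first.
# Caveat: an occupation with no >50K rows would get an NA level here.
high_share <- d2 %>%
  filter(income == ">50K") %>%
  arrange(desc(percent))
d2$occupation <- factor(d2$occupation, levels = high_share$occupation)

print(d2)
```

With position = 'dodge' and fill = occupation, the bars within each income group are then drawn in this level order.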

Cumulative sum across time

I have a dataset that I want to summarize through time. I have a period of ten dates and flower counts on three plants (Tomato, Pepper, Squash). I would like to create a ggplot bar plot that sums the number of flowers and displays them as a stacked bar plot colored by plant. The y axis should be the cumulative sum of flowers and the x axis should be time. When I use cumsum the output does not make sense to me. Any help would be great! Thanks.
dataset here
df.sum<- df.sub%>% group_by(Date) %>% mutate(cumsum_covered = cumsum(Tomato))
ggplot (df.sum, aes (x=Date, y=cumsum_covered)) + geom_bar(stat="identity")
You are grouping by date, so each group holds a single row and the cumsum is always just that row's value. We want the cumsum of each fruit, ordered by date.
df.sum <- df.sub %>%
# This gives us Date, fruit, amount
gather(fruit, amount, Tomato, Pepper, Squash) %>%
# We group by the fruit to get only the cumsums for the correct fruit and order by date
group_by(fruit) %>%
arrange(Date) %>%
mutate(cumsum_covered = cumsum(amount))
ggplot(df.sum, aes(Date, cumsum_covered, fill=fruit)) +
geom_col(position="stack")
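The difference between the two groupings can be seen on a few made-up rows. This sketch assumes invented counts for three dates and uses tidyr's pivot_longer(), the newer equivalent of gather(), for the reshaping step.

```r
library(dplyr)
library(tidyr)

# Hypothetical flower counts for three dates
df.sub <- tibble(
  Date = as.Date("2020-06-01") + 0:2,
  Tomato = c(1, 2, 3),
  Pepper = c(0, 1, 1),
  Squash = c(2, 0, 4)
)

# Grouping by Date: every group has one row, so cumsum changes nothing
by_date <- df.sub %>%
  group_by(Date) %>%
  mutate(cumsum_covered = cumsum(Tomato)) %>%
  ungroup()

# Grouping by fruit: the sum accumulates across dates within each fruit
df.sum <- df.sub %>%
  pivot_longer(c(Tomato, Pepper, Squash),
               names_to = "fruit", values_to = "amount") %>%
  arrange(Date) %>%
  group_by(fruit) %>%
  mutate(cumsum_covered = cumsum(amount)) %>%
  ungroup()

print(df.sum)
```

In by_date the new column simply repeats the Tomato column, while in df.sum each fruit carries its own running total.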

How to plot using ggplot2

I have a task where I need to plot a graph using ggplot2.
I have a vector of ratings (Samsung S4 ratings from its users).
I generate this data like this:
TestRate<- data.frame (rating=sample (x =1:5, size=100, replace=T ), month= sample(x=1:12,size=100,rep=T) )
Now I need to plot a graph with dates on the x axis (months in our example data) and 5 different lines, one for each of the 5 ratings (1, 2, 3, 4, 5). Each line shows the count of its rating for the corresponding month.
How can I plot this in ggplot2?
You first need to count the number of elements per (rating, month) pair:
library(data.table)
setDT(TestRate)[,count:=.N,by=list(month, rating)]
And then you can plot the result:
ggplot(TestRate, aes(month, count, color=as.factor(rating))) + geom_line()
If your data.table is not set (so to speak), you can use dplyr (and rename the legend while you are at it).
df <- TestRate %>% group_by(rating, month) %>% summarise(count = n())
ggplot(df, aes(x=month, y=count, color=as.factor(rating))) + geom_line() + labs(color = "Rating")
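With random sampling, some (rating, month) pairs may not occur at all, and those months are then simply missing from that rating's line. A possible refinement, sketched here with a fixed seed as an assumption so the example is reproducible, is to count with dplyr::count() and fill the absent pairs with zero using tidyr::complete().

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility
TestRate <- data.frame(
  rating = sample(x = 1:5, size = 100, replace = TRUE),
  month = sample(x = 1:12, size = 100, replace = TRUE)
)

# One row per (rating, month) pair that actually occurs
counts <- TestRate %>%
  count(rating, month, name = "count")

# Add the pairs that never occur, with a count of zero
counts_full <- counts %>%
  complete(rating = 1:5, month = 1:12, fill = list(count = 0L))

print(head(counts_full))
```

Plotting counts_full instead of counts makes each of the five lines run through all twelve months, dropping to zero where a rating was not given.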
