I'm new to R, so please bear with me if my code isn't the cleanest. I want to combine these two line graphs into one; right now I have separate code for each country I am analyzing. This is the dataset I am using: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-01/readme.md (I used the "Arable_Land" dataset).
##USA Arable Land
plot_arable_land_USA <- arable_land %>%
filter(Code == "USA") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_USA)
##Canada Arable Land
plot_arable_land_CAN <- arable_land %>%
filter(Code == "CAN") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_CAN)
Ideally, I would like a single graph that shows both: one line (in purple) for the USA and another line (in brown) for Canada.
Thank you!
Try this. Reshaping the data to long format, as you did, is good practice. In your case you can use filter() to keep only the desired countries, then reshape to long and build the plot. The key is mapping both color and group to Code so that each country gets its own line. You can set the colors using scale_color_manual(), and I have kept the facet so the panel title is shown. Here is the code:
library(plotly)
library(tidyverse)
#Code
plot_arable_land_CAN <- arable_land %>% select(-Entity) %>%
filter(Code %in% c('USA','CAN')) %>%
pivot_longer(-c(Code,Year)) %>%
ggplot(aes(x = Year, y = value,color=Code,group=Code)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')+
scale_color_manual(values = c(CAN = 'brown', USA = 'purple'))
#Transform
ggplotly(plot_arable_land_CAN)
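The same approach can be sketched on a tiny made-up data frame, which makes it easier to see how the colors get assigned. The `toy` data and its numbers are invented for illustration; only the column names `Year`, `Code`, and `value` mirror the reshaped data above. A named `values` vector ties each color to a country code regardless of the order in which the codes appear.

```r
library(ggplot2)

# Invented numbers for illustration only; not taken from the arable_land data
toy <- data.frame(
  Year  = rep(c(1961, 1990, 2014), times = 2),
  Code  = rep(c("USA", "CAN"), each = 3),
  value = c(1.00, 0.60, 0.40, 1.00, 0.70, 0.50)
)

# A named vector maps each country code to its color explicitly
country_colors <- c(USA = "purple", CAN = "brown")

p <- ggplot(toy, aes(x = Year, y = value, color = Code, group = Code)) +
  geom_line() +
  scale_color_manual(values = country_colors) +
  theme_light()

p
```

Wrapping `p` in `ggplotly()` should work the same way as in the answer above.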
So, I'm trying to plot missing values here over time (longitudinal data).
I would prefer placing them in a geom_col() to fill up with colours of certain treatments afterwards. But for some weird reason, geom_col() gives me weird values, while geom_point() gives me the correct values using the same function. I'm trying to wrap my head around why this is happening. Take a look at the y-axis.
Disclaimer:
I know the missing values disappear on days 19-20. This is why I'm making the plot.
Sorry about the layout of the plot. It is not polished yet.
For the geom_point:
gaussian_transformed %>%
group_by(factor(time)) %>%
mutate(missing = sum(is.na(Rose_width))) %>%
ggplot(aes(x = factor(time), y = missing)) +
geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>%
group_by(factor(time)) %>%
mutate(missing = sum(is.na(Rose_width))) %>%
ggplot(aes(x = factor(time), y = missing)) +
geom_col()
Picture: geom_col
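The problem is that mutate() keeps every row, so each group's count is repeated on all of that group's rows. You cannot see this in the geom_point plot because the points overlap exactly, but geom_col stacks the repeated values, which inflates the y-axis.
One fix is to use either summarise() or distinct() so that each group ends up with a single row.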
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of over plotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
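To make the difference concrete without any plotting, here is a small sketch on the same `msleep` data. The object names (`with_mutate`, `with_summarise`, `stacked`) are made up for this illustration. It shows that `mutate()` leaves the row count unchanged, and that summing the repeated values per group, which is what a stacked `geom_col()` effectively displays, gives the group size times the per-group count.

```r
library(dplyr)
library(ggplot2)  # only needed here because it provides the msleep data

# mutate() keeps every row, repeating the group total on each of them
with_mutate <- msleep %>%
  group_by(order) %>%
  mutate(missing = sum(is.na(sleep_cycle))) %>%
  ungroup()

# summarise() returns one row per group
with_summarise <- msleep %>%
  group_by(order) %>%
  summarise(missing = sum(is.na(sleep_cycle)))

# The bar height geom_col() would show when fed the mutate() result:
# the repeated values stacked on top of each other
stacked <- with_mutate %>%
  group_by(order) %>%
  summarise(bar_height = sum(missing), rows = n(), missing = first(missing))

nrow(with_mutate)     # same number of rows as msleep
nrow(with_summarise)  # one row per order
stacked
```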
So after some digging:
What happened is that geom_col() sums up all the missing values while geom_point() does not, hence the large values for y. Why this happens, I do not know. However, doing the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>%
ggplot(aes(x = time, y = missing)) +
geom_col(fill = "blue", alpha = 0.5) +
theme_minimal() +
labs(title = "Missing values in Gaussian Outcome over the days",
x = "Time (in days)",
y = "Amount of missing values") +
scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing
I am trying to create a line graph that shows how many pounds each milk type sold in 2017. It comes from this dataset https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/fluid_milk_sales.csv
This is what I have but I get a message asking if I need to adjust the group aesthetic. Not sure what I am doing wrong so I would love some assistance.
options(scipen = 999)
fluid_milk_sales %>%
filter(year == 2017) %>%
select(milk_type, pounds) %>%
ggplot(aes(x = milk_type, y = pounds)) +
geom_line()
You get that message because your x variable is categorical, so ggplot cannot join the points into a line. I guess you need a bar plot (I flipped the plot so that the types can be read; you can remove coord_flip() if you don't need that):
fluid_milk_sales %>%
filter(year == 2017) %>%
ggplot(aes(x = reorder(milk_type,pounds), y = pounds)) +
geom_col() + xlab("milk_type") + coord_flip()
Or, if you want a lollipop plot:
fluid_milk_sales %>%
filter(year == 2017) %>%
ggplot(aes(x = reorder(milk_type,pounds), y = pounds)) +
geom_point() +
geom_segment(aes(xend = milk_type, yend = 0)) +
coord_flip() + xlab("milk_type")
If you really want to force a line, which I think doesn't make sense (note I reorder with the negative to start with the highest):
fluid_milk_sales %>%
filter(year == 2017) %>%
ggplot(aes(x = reorder(milk_type,-pounds), y = pounds,group=1)) +
geom_line() + xlab("milk_type")
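A line becomes meaningful once the x axis is continuous, for example when each milk type is followed across several years instead of comparing types within one year. The sketch below assumes the column names `year`, `milk_type`, and `pounds` from the question; the `toy_milk` data frame and all of its numbers are invented for illustration. Mapping `group` to `milk_type` tells ggplot which points belong to the same line, which is exactly what the message about the group aesthetic is asking for.

```r
library(ggplot2)

# Invented figures for illustration; only the column names follow the dataset
toy_milk <- data.frame(
  year      = rep(2015:2017, times = 2),
  milk_type = rep(c("Whole", "Skim"), each = 3),
  pounds    = c(100, 110, 120, 80, 70, 60)
)

# One line per milk type, drawn across a continuous year axis
p_milk <- ggplot(toy_milk,
                 aes(x = year, y = pounds, color = milk_type, group = milk_type)) +
  geom_line() +
  scale_x_continuous(breaks = 2015:2017)

p_milk
```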
I would like to create a line plot using ggplot's geom_line() in which all years are equally spaced regardless of the actual values of the year variable, and in which the points drawn by geom_point() are connected when consecutive observations are only two years apart, but not when the gap is larger.
Example:
my.data<-data.frame(
year=c(2001,2003,2005,NA,NA,NA,NA,NA,NA,2019),
value=c(runif(10)))
As for the plot I have tried two different things, both of which are not ideal:
Plotting year as a continuous variable with breaks=year and minor_breaks=F. Here the first three observations sit much closer together than 2005 and 2019, and unfortunately all dots are connected:
library(ggplot2)
library(dplyr)
my.data %>%
ggplot(aes(x=year,y=value)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=F) +
theme_minimal()
Removing NAs and plotting year as a factor, which yields equal spacing between the years but removes the lines between data points:
my.data %>%
filter(!is.na(year)) %>%
ggplot(aes(x=factor(year),y=value)) +
geom_line() +
geom_point() +
theme_minimal()
Are there any solutions to these issues? What am I overlooking?
First attempt:
Second attempt:
What I need (but ideally without the help of Paint):
my.data %>%
ggplot(aes(x=year)) +
geom_line(aes(y = ifelse(year <= 2005,value,NA))) +
geom_point(aes(y = value)) +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=F) +
theme_minimal()
Maybe something like this would work.
I arrived at a somewhat convoluted and not very clean solution, but it might get the job done. I use lead() to check whether each year should be connected to the next one, and then "remove" the unwanted connections by coloring them white. The dummy column is there to put all years on one line instead of two.
library(tidyverse) # dplyr for the pipe, mutate() and lead(); tidyr for fill()
my.data = data.frame(year=c(2001,2003,2005,2008,2009,2012,2015,2016,NA,2019),
value=c(runif(10))) %>%
filter(!is.na(year)) %>%
mutate(grouped = if_else(lead(year) - year <= 2, "yes", "no")) %>%
fill(grouped, .direction = "down") %>%
mutate(dummy = "all")
my.data %>%
ggplot(aes(x = factor(year),y = value)) +
geom_line(aes(y = value, group = dummy, color = grouped), show.legend = FALSE) +
geom_point() +
scale_color_manual(values = c("yes" = "black", "no" = "white")) +
theme_classic()
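An alternative sketch, not part of the answer above: instead of painting unwanted segments white, the years can be split into runs, with a new run starting whenever the gap to the previous year exceeds two. Mapping `group` to that run id makes `geom_line()` draw each run separately. The names `segments`, `gap`, and `run` are made up for this illustration, and `set.seed()` is only there to make the random values repeatable.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
years <- c(2001, 2003, 2005, 2008, 2009, 2012, 2015, 2016, 2019)

segments <- data.frame(year = years, value = runif(length(years))) %>%
  arrange(year) %>%
  mutate(
    gap = year - lag(year, default = first(year)),  # distance to the previous year
    run = cumsum(gap > 2)                           # new run id after every gap > 2
  )

# Points in the same run are connected; runs are not connected to each other
p_runs <- ggplot(segments, aes(x = factor(year), y = value, group = run)) +
  geom_line() +
  geom_point() +
  theme_classic()

p_runs
```

Runs that contain a single year (2012 and 2019 here) produce a message from `geom_line()` that the group has only one observation; the point is still drawn by `geom_point()`.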
The names of the countries are long and overlap each other in the x-axis labels. How can I make them readable?
ggplot(results, aes(x = Nationality, horiz=TRUE)) +
theme_solarized() +
geom_bar() +
labs(y = "Number of Medals",
title = "Number of Medals by Country")
Welcome to Stack Overflow. Here are some suggestions for dealing with the many values. In both methods I am using the forcats package from the tidyverse. You can read more about it here: https://r4ds.had.co.nz/factors.html
First, some fake data that replicates your problem
library(tidyverse)
df <-
mpg %>%
arrange(manufacturer) %>%
mutate(
n = row_number(),
vehicle = paste(year, manufacturer, model)
) %>%
uncount(n)
# this replicates your problem
ggplot(df, aes(vehicle)) +
geom_bar() +
coord_flip()
Option 1: consolidate
df %>%
mutate(
vehicle = # making heavy use of forcats here
fct_lump(vehicle, 35) %>% # keep only the 35 most frequent values, others in "Other" category
fct_infreq() %>% # order them by frequency
fct_rev() #reverse the order
) %>%
ggplot(aes(vehicle)) +
geom_bar() +
coord_flip()
Option 2: facet
Someone may have a more elegant way of getting these groups but I use this method quite a bit
df %>%
mutate(
vehicle = # similar methods to earlier
fct_infreq(vehicle) %>%
fct_rev(),
num_fct = as.integer(vehicle), # generates a number for each factor
facet = (max(num_fct)-num_fct) %/% 20 # will make groups of 20, but they need to be in descending order within each facet
) %>%
ggplot(aes(vehicle)) +
geom_bar() +
coord_flip() +
facet_wrap(~facet, scales = "free_y", nrow = 1) +
theme(
strip.background = element_blank(),
strip.text = element_blank()
)
Hope this helps.
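If the number of countries is small enough that lumping or faceting is not needed, rotating the axis labels may already be enough. This is a minimal sketch, assuming a `Nationality` column as in the question; the `toy_results` data frame and its contents are invented stand-ins for the real `results` data.

```r
library(ggplot2)

# Hypothetical stand-in for the `results` data; names and counts are invented
toy_results <- data.frame(
  Nationality = c(
    rep("United States of America", 5),
    rep("Democratic Republic of the Congo", 2),
    rep("United Kingdom of Great Britain", 3)
  )
)

# Rotate the x labels by 45 degrees and right-align them under their bars
p_rotated <- ggplot(toy_results, aes(x = Nationality)) +
  geom_bar() +
  labs(y = "Number of Medals", title = "Number of Medals by Country") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_rotated
```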
Simplifying my question in terms of the generic Titanic dataset:
How can I get the following plot for all the attributes in my dataset?
If possible, I would also like to show the count or percentage for each category.
Thank you for your help in advance.
Regards, Trupti
With the Titanic data set this can be accomplished using
library(tidyverse)
data("Titanic")
Titanic %>%
as.data.frame() %>% # transform from a table to dataframe
gather(variable, value, -Freq) %>% # change to long format
group_by(variable, value) %>%
summarise(Freq = sum(Freq)) %>% # get the freq for each level of each variable
ggplot(aes(variable, Freq, fill = value)) +
geom_col(position = position_stack()) +
geom_text(aes(label = paste0(value, " (", Freq, ")")), vjust = 1,
position = position_stack()) +
theme(legend.position = "none")
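For percentages instead of counts, one possible variation is sketched below. It uses `pivot_longer()` in place of `gather()` and computes each level's share within its variable; the names `titanic_share` and `pct` are made up for this illustration, and the label placement is just one choice.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

titanic_share <- as.data.frame(Titanic) %>%
  mutate(across(-Freq, as.character)) %>%   # factors with different levels -> character
  pivot_longer(-Freq, names_to = "variable", values_to = "value") %>%
  group_by(variable, value) %>%
  summarise(Freq = sum(Freq), .groups = "drop_last") %>%  # still grouped by variable
  mutate(pct = round(100 * Freq / sum(Freq), 1)) %>%      # share within each variable
  ungroup()

# Stacked bars with "level (xx%)" labels centered in each segment
p_share <- ggplot(titanic_share, aes(variable, Freq, fill = value)) +
  geom_col() +
  geom_text(aes(label = paste0(value, " (", pct, "%)")),
            position = position_stack(vjust = 0.5)) +
  theme(legend.position = "none")

p_share
```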