Outline grouped plot - r

I would like to draw only the outline of the shape produced by a grouped plot.
Using this code:
# Libraries
library(ggplot2)
library(babynames)
library(dplyr)
# Keep only 3 names
don <- babynames %>%
  filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
  filter(sex == "F")
# Plot
don %>%
  ggplot(aes(x = year, y = n, group = name, color = name)) +
  geom_line()
I get this:
Is it possible to keep only the outline of the produced graph?
Example output:

There might be a ggplot2 way to do this, but here is one attempt using dplyr:
library(dplyr)
library(ggplot2)
don %>%
group_by(year) %>%
slice(which.max(n)) %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line()
The logic here is that we keep only the row with the maximum n for each year, which removes every point lying below the outline we want.
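A minimal sketch of the same per-year maximum idea, assuming a single unbroken outline is wanted. The `toy` data frame is made up to stand in for `don`, and `group = 1` is an assumption that joins all points into one path whose colour changes with the leading name (this needs a solid line type).

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for `don`
toy <- data.frame(
  year = rep(2000:2002, times = 2),
  name = rep(c("A", "B"), each = 3),
  n    = c(5, 2, 7, 3, 6, 1)
)

# Keep the single largest n per year; ties are broken by row order
outline <- toy %>%
  group_by(year) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

# One path through all yearly maxima, coloured by whichever name leads
p <- ggplot(outline, aes(x = year, y = n, group = 1, color = name)) +
  geom_line()
print(p)
```

With `group = name` instead, each name gets its own path, so a name that leads in two separate periods would be connected across the gap between them.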

Related

ggplot2 - Continuous color scale for plot with many lines

I have a data set where I want to plot many lines in a single plot. The lines represent events that are ordered, and I would like to use the color scale to represent that order. If I do this, I get
library(purrr)
library(dplyr) # provides tibble(), mutate(), row_number(), union_all()
library(ggplot2)
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
  map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
  imap(~tibble(y = .x, color = as.integer(.y))) %>%
  map(mutate, x = row_number()) %>%
  reduce(union_all) %>%
  ggplot(aes(x = x, y = y, color = color)) + geom_line()
I can solve the issue of the incorrect line by making color a factor
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
imap(~tibble(y=.x, color=as.factor(.y))) %>% #this is the only changed line
map(mutate, x=row_number()) %>%
reduce(union_all) %>%
ggplot(aes(x=x, y=y, color=color))+ geom_line()
to get the correct line plot, but now both my color scale and the legend are discrete. I would like the legend to look like the first example and the plot like the second example. Furthermore, in the actual data the events are not uniformly spaced, so the behavior of the continuous color scale is important because the color conveys distance. I tried group=color but that doesn't work. What aesthetic am I missing here that would help me achieve the desired outcome?
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
imap(~tibble(y=.x, color=as.integer(.y))) %>%
map(mutate, x=row_number()) %>%
reduce(union_all) %>%
ggplot(aes(x=x, y=y, group=color, color=color))+ geom_line()
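A minimal sketch of how the group aesthetic can separate the lines while a numeric column still drives a continuous colour scale. The event positions (1, 2, 5, 20) and the column name `event` are made up here to mimic unevenly spaced events; they are not from the original data.

```r
library(ggplot2)

set.seed(100)
events <- c(1, 2, 5, 20)  # hypothetical, unevenly spaced event positions

# One random walk of 50 steps per event, stacked into a single data frame
df <- do.call(rbind, lapply(events, function(e) {
  data.frame(x = 1:50, y = cumsum(rnorm(50)), event = e)
}))

# `group` splits the data into one line per event;
# `colour` stays numeric, so the scale and legend remain continuous
p <- ggplot(df, aes(x = x, y = y, group = event, colour = event)) +
  geom_line()
print(p)
```

Because `event` is kept numeric, the colour distance between two lines reflects the distance between their event values.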

How to input data in excel/csv to make multiple chart in R studio

I have some data here: my data.
I would like to make a graph like this example: multichart.
I have tried to run the script below.
However, I don't understand how to input my data from Excel to run this script.
Can anyone help me? I have been thinking about this for 3 days and the deadline is very soon. Thank you for your help.
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
library(hrbrthemes)
library(viridis)
# Keep only 3 names
don <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
filter(sex=="F")
# Plot
don %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
ylab("Number of babies born")
You may read the data using readxl::read_excel, get it in long format and plot using ggplot.
library(tidyverse)
data <- readxl::read_excel('example data.xlsx')
data %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row, values_drop_na = TRUE) %>%
ggplot() + aes(row, value, color = name) +
geom_line()
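A minimal sketch of the reshaping step on its own, assuming the spreadsheet holds one column per series. The tibble `wide` and its column names `site_a` and `site_b` are made up to stand in for the Excel file; for a CSV file, readr::read_csv could be used in place of readxl::read_excel.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per series, one row per observation
wide <- tibble(site_a = c(1, 2, 3), site_b = c(4, NA, 6))

# Add a row index, then stack all series columns into name/value pairs,
# dropping missing values
long <- wide %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row, values_drop_na = TRUE)

print(long)
```

The resulting `name` column holds the original column names, which is why `color = name` gives one line per series in the plot.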

r filter() issue: plotly vs ggplot

I am starting to learn interactive data viz and basic data analysis with R (mainly plotly).
I am having an issue while using the dplyr function filter() while plotting with plotly in R.
Here is an example using the gapminder dataset:
library(gapminder)
library(dplyr)
library(plotly)
# filter by year and continent
gapminder_2002_asia <- gapminder %>%
  filter(year == 2002 & continent == "Asia")
# plot GDP per capita bar chart using plotly
gapminder_2002_asia %>%
  plot_ly() %>%
  add_bars(x = ~country, y = ~gdpPercap, color = ~country)
This is the result: all the countries in the initial data set appear on the x axis:
plotly graph as image
On the other hand, if I just make a static graph with ggplot, only the Asian countries appear on the x axis:
gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()
ggplot graph
I really do not understand how this is happening as they both come from the same df..
Very odd.
As an alternative while you debug that code, why not try using ggplotly()? For example:
p <- gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()
plotly::ggplotly(p)
I'd be curious which version of the plot came out the far end!
The reason is that plotly uses all the levels of the country factor, while ggplot2 only uses the values actually present in your dataset. So, to get the same results, you can use this:
library(plotly)
library(ggplot2)
#Plotly
gapminder_2002_asia %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
And with ggplot2:
#ggplot2
gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()+
scale_x_discrete(limits=levels(gapminder_2002_asia$country))+
theme(axis.text.x = element_text(angle=90))
Output:
Update: In order to get the same output in plotly you could use something like this, which will be similar to your ggplot2 initial code for plotting:
#Plotly 2
gapminder_2002_asia %>%
mutate(country=as.character(country)) %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
The key is how the factor levels in your dataset are handled.
Another option is fct_drop() from forcats (many thanks and credit to @user2554330):
library(forcats)
#Plotly 2
gapminder_2002_asia %>%
mutate(country=fct_drop(country)) %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
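The factor behaviour described above can be checked without any plotting. This is a minimal base R sketch with a made-up three-country factor: filtering keeps the unused levels, and droplevels() (the base R counterpart of fct_drop()) removes them.

```r
# Hypothetical factor standing in for gapminder$country
country <- factor(c("Japan", "France", "China"))

# Filtering rows does not remove the unused level "France"
asia <- country[country != "France"]
print(levels(asia))

# Dropping unused levels leaves only the countries still present
asia_dropped <- droplevels(asia)
print(levels(asia_dropped))
```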

Keeping dodge position in boxplot passed to plotly

I have a regular boxplot in ggplot2:
# working example
library(ggplot2)
library(dplyr)
mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  mutate(vs = as.factor(vs)) %>%
  ggplot(aes(y = mpg, x = cyl)) +
  geom_boxplot(aes(colour = vs))
It looks like this:
However, when I create an object and pass it to plotly, I lose the dodge position:
library(plotly)
mtcars_boxplot <-
mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
ggplot(aes(y=mpg, x=cyl)) +
geom_boxplot(aes(colour=vs))
mtcars_boxplot %>%
ggplotly()
It looks like this:
I tried adding position=position_dodge() and position=position_dodge2(), but neither worked:
library(plotly)
mtcars_boxplot <-
mtcars %>%
mutate(cyl=as.factor(cyl)) %>%
mutate(vs=as.factor(vs)) %>%
ggplot(aes(y=mpg, x=cyl)) +
geom_boxplot(aes(colour=vs), position=position_dodge2())
mtcars_boxplot %>%
ggplotly()
What should I do to keep the dodge position like the first plot?
As suggested here, add layout(boxmode = "group")
library(plotly)
mtcars_boxplot %>%
ggplotly() %>%
layout(boxmode = "group")

Unable to loop through ggplot histogram

I'm trying to loop through every column of the iris data set and plot a histogram in ggplot. So I'm expecting 5 different histograms to appear. However, my for loop below returns nothing. How can I fix this?
library(ggplot2)
for (i in colnames(iris)){
ggplot(iris, aes(x = i))+
geom_histogram()
}
Instead of using a for loop, the tidyverse/ggplot way would be to reshape the data from wide to long and then plot using facet_wrap
library(tidyverse)
iris %>%
gather(key, val, -Species) %>%
ggplot(aes(val)) +
geom_histogram(bins = 30) +
facet_wrap(~key, scales = "free_x")
Using dplyr, tidyr and ggplot:
library(ggplot2)
library(dplyr)
library(tidyr)
iris %>%
  gather(Measure, Value, -Species) %>%
  ggplot(aes(x = Value)) +
  geom_histogram() +
  facet_grid(rows = vars(Species), cols = vars(Measure))
Result:
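If a for loop is still preferred, here is a minimal sketch of one way to make it work, under two assumptions: only the numeric columns are plotted (Species is a factor, which geom_histogram() does not bin by default), and the `.data` pronoun is used so that the string in `col` is looked up as a column instead of being treated as a constant. Plots are not auto-printed inside a loop, so print() is called explicitly.

```r
library(ggplot2)

# Histograms need continuous data, so keep only the numeric columns
numeric_cols <- names(iris)[sapply(iris, is.numeric)]

plots <- list()
for (col in numeric_cols) {
  # .data[[col]] maps the column named by the string in `col`
  p <- ggplot(iris, aes(x = .data[[col]])) +
    geom_histogram(bins = 30) +
    xlab(col)
  print(p)           # explicit print is required inside a for loop
  plots[[col]] <- p  # keep the plot objects for later use
}
```

This produces four histograms, one per numeric column, and stores them in the list `plots` by column name.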
