I want to plot variables from a dataframe into different groups, but cannot figure out how to do this.
Mock data frame:
df<-data.frame(animal=c("cat1","cat2","mouse1","mouse2","dog1","dog2"),size=c(3,4,1,2,6,7))
plot<-ggplot(df)+geom_col(mapping=aes(x=animal,y=size))
plot
My desired output should look like this:
I would suggest the following approach: create a grouping variable and then use facets. I learnt most of these tricks from #AllanCameron, who has great answers to problems of this style. Here is the code that does that:
library(tidyverse)
#Data
df<-data.frame(animal=c("cat1","cat2","mouse1","mouse2","dog1","dog2"),
size=c(3,4,1,2,6,7),stringsAsFactors = FALSE)
#Create group
df$Group <- gsub('\\d+','',df$animal)
#Now plot
ggplot(df,aes(x=animal,y=size))+
geom_col()+
facet_wrap(.~Group,scales = 'free', strip.position = "bottom")+
theme(strip.placement = "outside",
panel.spacing = unit(0, "points"),
strip.background = element_blank(),
strip.text = element_text(face = "bold", size = 12),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
The output:
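As a quick check of the grouping step on its own, here is a minimal base R sketch. The animal names (including "mouse10") are made up for illustration; the point is only that the pattern '\\d+' strips any run of digits, so multi-digit suffixes end up in the same group.

```r
# Hypothetical animal names, including a multi-digit suffix
animal <- c("cat1", "cat2", "mouse1", "mouse10", "dog1")

# Remove every run of digits to obtain the group label
group <- gsub("\\d+", "", animal)

print(group)
print(table(group))
```

If the names contained digits in the middle (for example "cat2b"), this pattern would remove those as well, so it is only a sketch for names of the form shown here.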
You need to tell ggplot which group each animal belongs to. You can then map group to the x aesthetic and map animal to fill to distinguish the individual animals within each group. position = "dodge" places the bars next to each other instead of stacking them.
library(ggplot2)
df <- data.frame(animal = c("cat1","cat2","mouse1","mouse2","dog1","dog2"),
size=c(3,4,1,2,6,7),
group = c("cat", "cat", "mouse", "mouse", "dog", "dog"))
ggplot(df, aes(x = group, y = size, fill = animal)) +
geom_col(position = "dodge")
Created on 2020-08-23 by the reprex package (v0.3.0)
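A possible variation, sketched under the assumption that the group can be read off the animal name: derive the group column instead of typing it by hand, and store it as a factor whose levels follow the order of first appearance. By default ggplot2 orders a character x axis alphabetically (cat, dog, mouse); the factor keeps the order in the data (cat, mouse, dog).

```r
library(ggplot2)

df <- data.frame(animal = c("cat1", "cat2", "mouse1", "mouse2", "dog1", "dog2"),
                 size   = c(3, 4, 1, 2, 6, 7))

# Strip the trailing digits to get the group label
df$group <- sub("\\d+$", "", df$animal)

# Keep the groups in order of first appearance rather than alphabetical order
df$group <- factor(df$group, levels = unique(df$group))

p <- ggplot(df, aes(x = group, y = size, fill = animal)) +
  geom_col(position = "dodge")
print(p)
```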
I am trying to do a bubble plot. My data are:
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
mydata<-data.frame(Year,Name,Value)
So far I've got to this point:
ggplot(mydata, aes(x=Year, y=Name, size = Value)) +
geom_point() +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
I need many modifications but I couldn't understand how to do that (I am not very familiar with ggplot2).
First, I would like to use the viridis scale, but neither scale_color_viridis nor scale_fill_viridis is working (I have also tried setting the discrete=T argument).
Second, I would like to avoid plotting the 0 values (i.e., have a blank space where a 0 value would be plotted), but neither using na.omit (e.g. as ggplot(na.omit(mydata), aes(x=Year, y=Name, size = Value)) or as ggplot(mydata, aes(x=Year, y=Name, size = na.omit(Value)))) nor removing the 0 from the Value object works.
Third, I'd like the legend to be a continuous scale: the plotted values of Value range from 1 to 25 (as I would like to remove the zeros), but the default legend is discrete with 5 break points.
I would like the plot to look more or less like this (with the bubble sizes depending on the value of Value):
Any suggestions? Sorry for the many questions but I have some real difficulties in understanding how ggplot works. Thanks!
In order to map a variable in your data to some scale, you use the aes() function to couple what ggplot2 calls an 'aesthetic' to an expression (typically a symbol for a column in your data). Thus, to make a colour scale, you have to specify a colour aesthetic inside the aes() function. In the code below, I also specify an alpha aesthetic, which is 1 if Value > 0 and 0 otherwise, making the 0-value points completely transparent. I specify I() to let ggplot2 know that it should take this value literally instead of mapping it to a scale.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.3
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
mydata<-data.frame(Year,Name,Value)
g <- ggplot(mydata, aes(x=Year, y=Name, size = Value)) +
geom_point(aes(colour = Value,
alpha = I(as.numeric(Value > 0))))
Once we have specified the aesthetics, we can begin customising the scales. The typical pattern is scale_{the aesthetic}_{type of scale}, so we need to add scale_colour_viridis_c() if we want to map the colour values to the viridis scale (the _c suffix is for continuous scales). In the scales, we can specify, for example, the limits, which you've indicated should be between 1 and 25. I also added a scale_size_area() in which we say that we do not want a legend for the size of the points by setting guide = "none".
g + scale_colour_viridis_c(option = "C", direction = -1,
limits = c(1, 25)) +
scale_size_area(guide = "none") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
Created on 2021-02-24 by the reprex package (v1.0.0)
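To make the two ideas above easier to inspect, here is a small sketch on made-up data (the toy data frame and its alpha column are assumptions, not part of the original question). It shows the 0/1 values that are passed literally through I(), and the scale_{aesthetic}_{type} naming pattern for the colour and size scales.

```r
library(ggplot2)

# Hypothetical toy data with two zeros that should not be drawn
toy <- data.frame(
  Year  = c(2001, 2002, 2003, 2004, 2005),
  Name  = c("John", "Ellen", "Mark", "Randy", "Luisa"),
  Value = c(0, 5, 12, 0, 25)
)

# 1 where Value > 0, 0 otherwise
toy$alpha <- as.numeric(toy$Value > 0)
print(toy$alpha)

p <- ggplot(toy, aes(x = Year, y = Name, size = Value)) +
  # I() asks ggplot2 to use the 0/1 values as-is instead of scaling them
  geom_point(aes(colour = Value, alpha = I(alpha))) +
  scale_colour_viridis_c(limits = c(1, 25)) +   # colour aesthetic, viridis, continuous
  scale_size_area(guide = "none")               # size aesthetic, no legend
print(p)
```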
Is that what you are looking for?
library(ggplot2)
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
Value <- ifelse(Value == 0, NA, Value)
mydata<-data.frame(Year,Name,Value)
ggplot(mydata, aes(x=Year, y=Name, size = Value, colour = Value)) +
geom_point() +
scale_colour_viridis_c() +
scale_size(guide = "none") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
#> Warning: Removed 1 rows containing missing values (geom_point).
Concerning your points:
I only found the scale_colour_viridis_c and scale_colour_viridis_b functions, which as far as I could see differ in the colours they produce. Maybe I am missing some package?
Second, regarding the NAs: you just needed to replace the 0s with NAs.
And lastly, regarding the scale: the colour scale is automatically continuous. Depicting sizes continuously is a bit tricky, so the size legend will always be discrete. But I removed it from the legend so that only the colour is shown there, as in your example.
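The second point can be checked without plotting. The sketch below uses a small made-up data frame (an assumption for illustration) to show why na.omit() had no effect in the question: a 0 is an ordinary value, not a missing one, so nothing is dropped until the zeros are replaced by NA.

```r
# Hypothetical data with two zeros
mydata <- data.frame(Year  = 2001:2005,
                     Name  = "John",
                     Value = c(0, 4, 9, 0, 25))

# na.omit() keeps every row, because 0 is not NA
n_before <- nrow(na.omit(mydata))

# Replace the zeros by NA; now na.omit() (and geom_point) will drop them
mydata$Value[mydata$Value == 0] <- NA
n_after <- nrow(na.omit(mydata))

print(c(before = n_before, after = n_after))
```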
Just as an alternative way to think about this... maybe it's helpful. :-)
library(tidyverse)
set.seed(123)
df <- tibble(
year = rep(2001:2005, each = 5),
name = rep(c("John","Ellen","Mark","Randy","Luisa"),5),
value = sample(seq(0,25,by=1),25)
)
df %>%
mutate(name_2 = ifelse(year>2001 & year<2005, NA, name)) %>%
ggplot(aes(year, value, group = name, label = name_2, color = name)) +
geom_line() +
geom_point() +
geom_text(vjust = -1) +
scale_color_brewer(palette = "Set1") +
theme_minimal(base_family = "serif") +
theme(legend.position = "none") +
xlab("")
I am trying to create a horizontal bar plot where each variable has two bars. The first should be a stacked bar with the counts of positive and negative values, the second only the neutral values. So far I haven't managed to do that.
Here is an example of the data:
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)
And here is the plot I am aiming for (made with Excel):
Edit:
Thanks everyone for the help! I got great feedback already.
I am planning to continue with the option provided by #Duck, as I feel that ggplot2 is more flexible. However, I figured out that I would like to add (and learn) one more thing:
Automatically reorder the y axis values (id: A, B, C, D, E) based on the number of positive values (I edited the plot above). In this case the order would be C, D, B, A, E:
I tried reorder() and factor(), but unsuccessfully. Should it be applied to the data or directly in ggplot? The solutions I have found so far do not cover this case of two levels of variables (here the name of the variable (id) and the flag (TRUE = positive/negative, FALSE = neutral)).
I might be the only person on the planet using base R plotting these days, but it is actually pretty flexible for non-standard plots like this.
xl <- c(0,20)
cols <- c("blue","orange","grey")
bd <- t(as.matrix(df))
bp <- barplot(unname(bd[1:2, rbind(1:5,NA,NA)]), xlim=xl, horiz=TRUE, col=cols[1:2])
barplot(unname(bd[3, rbind(NA,1:5,NA)]), xlim=xl, horiz=TRUE, col=cols[3],
axes=FALSE, add=TRUE)
axis(2, at=colMeans(matrix(bp, nrow=3)[1:2,]), labels=colnames(bd), las=1, lty=0)
par(xpd=NA)
legend("bottom", rownames(bd), fill=cols, horiz=TRUE, inset=-1 / par("pin")[2], bty="n")
Result:
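For readers puzzled by the rbind(1:5, NA, NA) index in the code above, this sketch shows what it appears to do: each real column is followed by two NA columns, so every id gets three bar slots and barplot() leaves the NA slots empty. The second call uses rbind(NA, 1:5, NA), which fills the middle slot instead. The variable names idx and slots are introduced here only for illustration.

```r
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)

bd <- t(as.matrix(df))

# Column index with two NA "gaps" after each real column: 1 NA NA 2 NA NA ...
idx <- as.vector(rbind(1:5, NA, NA))
print(idx)

# Indexing with NA yields all-NA columns, which barplot() draws as empty slots
slots <- unname(bd[1:2, idx])
print(dim(slots))
print(slots[, 1:4])
```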
Try facets like this: create a reference variable (Flag) that separates the positive/negative values from the neutral ones, so that each id gets two bars, and plot them using ggplot2 and some tidyverse functions:
library(ggplot2)
library(tidyverse)
#Data
df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
mutate(Flag=name %in% c('Positive','Negative')) %>%
ggplot(aes(x=Flag,y=value,fill=name))+
geom_bar(stat = 'identity')+
facet_grid(id~., switch = "y") +
scale_fill_manual(values = c("red", "tomato", "cyan3"), name = "") +
coord_flip() +
theme_classic() +
theme(panel.spacing = unit(0, "points"),
strip.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
axis.title = element_blank(),
strip.placement = "outside",
strip.text = element_text(),
legend.position = "bottom",
panel.grid.major.x = element_line())
Output:
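It may help to look at the reshaped data before it reaches ggplot. The sketch below repeats only the data preparation from the pipeline above and stores it in a variable called long (a name introduced here for illustration): each id contributes three rows, and Flag is TRUE for the two rows that are stacked together and FALSE for the neutral row.

```r
library(tibble)
library(tidyr)
library(dplyr)

df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)

long <- df %>%
  rownames_to_column("id") %>%      # move the row names A..E into a column
  pivot_longer(-id) %>%             # one row per id and category
  mutate(Flag = name %in% c("Positive", "Negative"))

print(head(long, 6))
print(table(long$id, long$Flag))
```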
Update: To get the desired order, you could create an auxiliary data frame that holds the order of the labels and then convert the id variable to a factor with those levels. Here is the code:
#Auxiliary data
levs <- df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
filter(name=='Positive') %>%
arrange(desc(value))
#Data
df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
mutate(Flag=name %in% c('Positive','Negative'),
id=factor(id,levels = levs$id,ordered = TRUE)) %>%
ggplot(aes(x=Flag,y=value,fill=name))+
geom_bar(stat = 'identity')+
facet_grid(id~., switch = "y") +
scale_fill_manual(values = c("red", "tomato", "cyan3"), name = "") +
coord_flip() +
theme_classic() +
theme(panel.spacing = unit(0, "points"),
strip.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
axis.title = element_blank(),
strip.placement = "outside",
strip.text = element_text(),
legend.position = "bottom",
panel.grid.major.x = element_line())
Output:
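The level order used in the update can also be computed and checked in base R. This is only a sketch of the same idea, not a replacement for the pipeline above: sort the ids by their Positive count in decreasing order and use that as the factor levels.

```r
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)

# ids sorted by the number of positive values, largest first
levs <- rownames(df)[order(df$Positive, decreasing = TRUE)]
print(levs)

# A factor with these levels controls the order of the facet rows
id <- factor(rownames(df), levels = levs)
print(levels(id))
```

This answers the "data or ggplot" question from the edit in the sense that the ordering is applied to the data (as factor levels) before plotting.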
I'd love to make a timeline density graph like the New York Times does for COVID-19 cases (screenshot below). I am trying to do it with crime data instead of COVID data. Any ideas on how to use R (ggplot2, plotly, etc) to make a graph like this that shows density by day?
So far, I haven't found a similar style in the R-graph Gallery. Thanks.
update: Here is the closest approximation I have come up with so far:
dat_c_grp <- dat_c %>%
group_by(report_date, month) %>%
summarize(count = n())
p <- ggplot(dat_c_grp, aes(report_date, month, fill = count))+
geom_tile(color= "white",size=0.1) +
scale_fill_viridis(name="Daily",option ="C")
output:
p
I'd like the months collapsed into one row, but I can't figure out how to do that.
Here's a full reprex of one way to do this which emulates the look of the original fairly well (obviously I've had to make the data up):
library(ggplot2)
set.seed(1)
Dates <- rep(seq(as.Date("2020-03-01"), by = "1 week", length.out = 36), 3)
Places <- rep(c("Connecticut", "Fairfield", "New Haven"), each = 36)
Cases <- as.numeric(replicate(3, rpois(36, dgamma((1:36)/3, 2.5) * 100))) +
as.numeric(replicate(3, rpois(36, 0.0002 * exp(1:36)^(1/3))))
df <- data.frame(Dates, Places, Cases)
ggplot(df, aes(Dates, Places, fill = Cases)) +
geom_tile(color = "gray92") +
facet_grid(Places~., scales = "free_y") +
scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06")) +
theme_minimal() +
scale_x_date(date_breaks = "month", labels = scales::date_format("%b")) +
theme(panel.spacing = unit(50, "points"),
legend.position = "top",
axis.title.y = element_blank(),
strip.text = element_blank(),
panel.grid = element_blank())
Thanks to both Allan Cameron and Konrad Rudolph.
Here is the code to answer my question:
dat_c_grp <- dat_c %>%
count(report_date, month, name = 'count')
p <- ggplot(dat_c_grp, aes(report_date, 0, fill = count))+
geom_tile(color= "white",size=0.1) +
scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06"))+
removeGrid()+ # removeGrid() comes from the ggExtra package
theme(panel.spacing = unit(50, "points"),
legend.position = "top",
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_blank(),
panel.grid = element_blank())
p
And output:
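Since dat_c is not shown, here is a self-contained sketch of the same single-row idea with simulated data (the dates and counts are made up, and the counting is done in base R instead of dplyr::count). The key step is mapping y to a constant, so that all tiles fall into one row.

```r
library(ggplot2)

# Hypothetical stand-in for dat_c: one row per reported incident
set.seed(42)
days <- seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day")
dat_c <- data.frame(report_date = sample(days, size = 500, replace = TRUE))

# Count incidents per day
daily <- as.data.frame(table(report_date = dat_c$report_date),
                       stringsAsFactors = FALSE)
daily$report_date <- as.Date(daily$report_date)
names(daily)[2] <- "count"

# A constant y value puts every day into the same row
p <- ggplot(daily, aes(report_date, 0, fill = count)) +
  geom_tile(color = "white") +
  scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06")) +
  theme(axis.title.y = element_blank(),
        axis.text.y  = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid   = element_blank(),
        legend.position = "top")
print(p)
```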
I am using geom_tile to visualize random draws.
Generate data:
set.seed(1)
df= crossing(sim=1:10,part= 1:10)
df$result = sample(c(1,0),size = nrow(df), replace=T)
df = df %>%
group_by(sim)%>%
# find out how many successful (1) pilots there were in the first 4 participants
summarize(good_pilots = sum(result[1:4])) %>%
arrange(good_pilots) %>%
ungroup() %>%
# add this back into full dataframe
full_join(df)
# plot data
plot = ggplot(df, aes( y=factor(sim), x=part)) +
geom_tile(aes(fill = factor(result)), colour = "black",
show.legend = T)+
scale_fill_manual(values=c("lightgrey", "darkblue"))+# c(0,1)
theme(panel.border = element_rect(size = 2),
plot.title = element_text(size = rel(1.2)),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.title = element_blank(),
legend.position = "right")+ theme_classic()+ coord_fixed(ratio=1)
This results in:
What I actually want is for the y axis to be ordered by the number of blue tiles (i.e. 1's) in the first four columns of the block (which is calculated in good_pilots).
I tried scale_y_discrete but that cannot be what is intended:
plot + scale_y_discrete(limits=df$sim[order(df$good_pilots)])
resulting in:
From what I can tell it seems like the ordering worked correctly, but using scale_y_discrete caused the plot to be messed up.
You can use reorder() here:
ggplot(df, aes(y = reorder(sim, good_pilots), x = part)) +
...
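To see what reorder() does to the factor levels, here is a tiny base R sketch with made-up values (three simulations, two rows each). The levels are sorted by the mean of the second argument within each level, in increasing order, and ggplot2 draws a discrete axis in level order.

```r
# Hypothetical values: sim 1 has 3 good pilots, sim 2 has 1, sim 3 has 2
sim         <- c(1, 1, 2, 2, 3, 3)
good_pilots <- c(3, 3, 1, 1, 2, 2)

ordered_sim <- reorder(factor(sim), good_pilots)

# Levels are now sorted by good_pilots rather than by sim
print(levels(ordered_sim))
```

Unlike passing limits = df$sim[order(df$good_pilots)] to scale_y_discrete, which supplies one entry per row of the data, the reordered factor has exactly one level per simulation.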
I have a plot created using ggplot2 where I'm trying to modify some of the minor grid lines. Here is the current version:
library(tidyverse)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color))
ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
) +
geom_hline(yintercept = seq(0, length(unique(starwars$skin_color))) + .5, color="gray30")
The y axis is a factor, and a facet grid is used with an uneven number of categories in each panel. I added some minor grid lines using geom_hline (my understanding is that panel.grid.minor does not work with categorical data, i.e., factors).
I would like to remove the lines highlighted in yellow below, and then ADD a single black line in between the two facet grids (i.e., where the current double lines are that are highlighted in yellow).
Any way to do this? I'd prefer avoiding hard coding the position of any lines, in case the data change. Thanks.
Removing the top and bottom grid lines dynamically is relatively easy. You code the line positions in the data set based on the faceting groups, exclude the highest and lowest values, and plot the geom_hline with a yintercept inside the aes() statement. That approach is robust to changes in the data (to check this, uncomment the # filter(!is.na(birth_year)) line below).
library(tidyverse)
library(grid)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color)) %>%
# filter(!is.na(birth_year)) %>%
group_by(tatooine) %>%
# here we assign the line_positions
mutate(line_positions = as.numeric(factor(skin_color, levels = unique(skin_color))),
line_positions = line_positions + .5,
line_positions = ifelse(line_positions == max(line_positions), NA, line_positions))
plot_out <- ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
geom_hline(aes(yintercept = line_positions)) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_line(colour = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
)
print(plot_out)
gives
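The line-position step can be checked on a small made-up data set. The sketch below mirrors the mutate() logic from the code above in base R (the toy data frame and its column names are assumptions): within each panel, the categories are numbered, shifted by 0.5 so that the line falls between two categories, and the topmost position is set to NA so that no line is drawn at the upper edge.

```r
# Hypothetical data: panel "a" has three categories, panel "b" has two
toy <- data.frame(panel = c("a", "a", "a", "b", "b"),
                  level = c("x", "y", "z", "x", "y"))

toy$line_positions <- NA_real_
for (p in unique(toy$panel)) {
  rows <- toy$panel == p
  # Number the categories within the panel and shift by half a step
  pos <- as.numeric(factor(toy$level[rows],
                           levels = unique(toy$level[rows]))) + 0.5
  # Drop the line above the last category of the panel
  pos[pos == max(pos)] <- NA
  toy$line_positions[rows] <- pos
}

print(toy)
```

Because the positions start at 1.5, no line is ever placed at the bottom edge (0.5), which is why only the maximum has to be removed explicitly.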
However, adding a solid line between the facets without any hardcoding is difficult. There are some possible ways to add borders between facets (see here), but if we don't know whether the facets will change, it is not obvious which value the border should be assigned to. I guess there is a possible solution that draws a hard-coded line in the plot to divide the facets, but the tricky part is to determine dynamically where that border is going to be located, based on the data and how the facets are ultimately drawn (e.g. in which order). I'd be interested in hearing other opinions on this.