I'm trying to use ggplot to create sequence plots, so that my paper, which uses sequence analysis, keeps a consistent visual style. I do:
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
data(mvad)
mvad_seq<-seqdef(mvad,15:length(mvad))
mvad_trate<-seqsubm(mvad_seq,method="TRATE")
mvad_dist<-seqdist(mvad_seq,method="OM",sm=mvad_trate)
cluster<-cutree(hclust(d=as.dist(mvad_dist),method="ward.D2"),k=6)
mvad$cluster<-cluster
mvad_long<-gather(select(mvad,id,contains("."),-matches("N.Eastern"),-matches("S.Eastern")),
key="Month",value="state",
Jul.93, Aug.93, Sep.93, Oct.93, Nov.93, Dec.93, Jan.94, Feb.94, Mar.94,
Apr.94, May.94, Jun.94, Jul.94, Aug.94, Sep.94, Oct.94, Nov.94, Dec.94, Jan.95,
Feb.95, Mar.95, Apr.95, May.95, Jun.95, Jul.95, Aug.95, Sep.95, Oct.95, Nov.95,
Dec.95, Jan.96, Feb.96, Mar.96, Apr.96, May.96, Jun.96, Jul.96, Aug.96, Sep.96,
Oct.96, Nov.96, Dec.96, Jan.97, Feb.97, Mar.97, Apr.97, May.97, Jun.97, Jul.97,
Aug.97, Sep.97, Oct.97, Nov.97, Dec.97, Jan.98, Feb.98, Mar.98, Apr.98, May.98,
Jun.98, Jul.98, Aug.98, Sep.98, Oct.98, Nov.98, Dec.98, Jan.99, Feb.99, Mar.99,
Apr.99, May.99, Jun.99)
mvad_long<-left_join(mvad_long,select(mvad,id,cluster))
ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+geom_tile()+facet_wrap(~cluster)
I try to plot the sequences by cluster, and this gives me the following plot:
As you can see, there are gaps for the ids that don't belong to the cluster represented by each facet. I would like to get rid of these gaps, so that the sequences show up stacked just as with the seqIplot() function of TraMineR as in the next figure:
Any suggestions of how to proceed?
Two small changes:
mvad_long$id <- as.factor(mvad_long$id)
ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+
geom_tile()+facet_wrap(~cluster,scales = "free_y")
ggplot was treating id as a numeric variable rather than a factor, and the facet scales were fixed, so every facet reserved space for all ids.
An update: I needed to convert the month into a date for it to work. Full solution follows:
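To see the effect in isolation, here is a minimal sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not part of mvad). With a factor on the y axis and `scales = "free_y"`, each facet should keep only the ids that occur in it:

```r
library(ggplot2)

# Hypothetical toy data: 4 sequences of length 3, split over two clusters
toy <- data.frame(
  id      = rep(1:4, each = 3),
  month   = rep(1:3, times = 4),
  state   = c("a", "a", "b",  "b", "b", "b",  "a", "b", "b",  "a", "a", "a"),
  cluster = rep(c(1, 2, 1, 2), each = 3)
)

# Discrete y axis: ids become categories instead of positions on a number line
toy$id <- factor(toy$id)

# Free y scales: unused ids are dropped from each facet
p <- ggplot(toy, aes(x = month, y = id, fill = state)) +
  geom_tile() +
  facet_wrap(~cluster, scales = "free_y")
p
```

Leaving out either change (the factor conversion or `scales = "free_y"`) brings the gaps back.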
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
library(lubridate)
data(mvad)
mvad_seq <- seqdef(mvad, 15:length(mvad))
mvad_trate <- seqsubm(mvad_seq, method = "TRATE")
mvad_dist <- seqdist(mvad_seq, method = "OM", sm = mvad_trate)
cluster <- cutree(hclust(d = as.dist(mvad_dist), method = "ward.D2"), k = 6)
mvad$cluster <- cluster
mvad_long <- mvad %>%
select(id, matches("\\.\\d\\d")) %>%
gather(key = "month", value = "state", -id) %>%
inner_join(
mvad %>%
select(id, cluster),
by = "id"
) %>%
mutate(
id = factor(id),
date = myd(paste0(month, "01"))
)
mvad_long %>%
ggplot(aes(x = date, y = id, fill = state, color = state)) +
geom_tile() +
facet_wrap(~cluster, scales = "free_y", ncol = 2) +
theme_bw() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank()
) +
scale_fill_brewer(palette = "Accent") +
scale_colour_brewer(palette = "Accent") +
labs(x = "", y = "")
I am trying to make several bar plots with their standard errors added to the plot. Adding a second y-axis was not that hard, but now I also want my standard errors to fit this new y-axis. I know that I can manipulate the y-axis, but this is not really what I want: the standard errors themselves should be drawn against the new y-axis. To illustrate, this is the plot I have now, where I just divided the first y-axis by 100.
What I want is something more like this, shown for all bar plots (this was done for the first bar plot using Excel). Here is my code:
df_bar <- as.data.frame(
rbind(
c('g1', 0.945131015, 1.083188828, 1.040164338,
1.115716593, 0.947886795),
c('g2', 1.393211286, 1.264193745, 1.463434395,
1.298126006, 1.112718796),
c('g3', 1.509976099, 1.450923745, 1.455102201,
1.280102338, 1.462689245),
c('g4', 1.591697668, 1.326292649, 1.767207296,
1.623619341, 2.528108183),
c('g5', 2.625114848, 2.164050167, 2.092843287,
2.301950359, 2.352736806)
)
)
colnames(df_bar)<-c('interval', 'lvl3.Mellem.Høj', 'lvl1.Lav', 'TOM',
'lvl4.Høj', 'lvl2.Lav.Mellem')
df_bar <- melt(df_bar, id.vars = "interval",
variable.name = "name",
value.name = "value")
df_line <- as.data.frame(
rbind(
c('g1', 0.0212972, 0.0164494, 0.0188898, 0.01888982,
0.03035883),
c('g2', 0.0195600, 0.0163811, 0.0188747, 0.01887467,
0.03548092),
c('g3', 0.0192249, 0.0161914, 0.02215852, 0.02267605,
0.03426538),
c('g4', 0.0187961, 0.0180842, 0.01962371, 0.02103450,
0.03902890),
c('g5', 0.0209987, 0.0164596, 0.01838280, 0.02282300,
0.03516818)
)
)
colnames(df_line)<-c('interval', 'lvl3.Mellem.Høj', 'lvl1.Lav', 'TOM',
'lvl4.Høj', 'lvl2.Lav.Mellem')
df_line <- melt(df_line, id.vars = "interval",
variable.name = "name",
value.name = "sd")
df <- inner_join(df_bar,df_line, by=c("interval", "name"))
df %>%
mutate(value = as.numeric(value)) %>%
mutate(sd = as.numeric(sd)) %>%
mutate(interval = as.factor(interval)) %>%
mutate(name = as.factor(name)) %>%
ggplot() +
geom_bar(aes(x = interval, y = value, fill = interval), stat = "identity") +
geom_line(aes(x = interval, y = sd, group = 1),
color = "black", size = .75) +
scale_y_continuous("Value", sec.axis = sec_axis(~ . /100, name = "sd")) +
facet_grid(~name, scales = "free") +
theme_bw() + theme(legend.position = "none") +
xlab("Interval") + ylab("Value") +
labs(caption = "Black line indicates standard deviation.")
Thanks in advance..
As described in this example, a secondary axis only relabels the primary axis, so you also have to transform your sd values to match the scale of the second axis. In your example the secondary axis divides by 100, so you have to multiply sd by 100, as shown below:
library(tidyverse)
library(data.table)
df %>%
mutate(value = as.numeric(value)) %>%
mutate(sd = as.numeric(sd)) %>%
mutate(interval = as.factor(interval)) %>%
mutate(name = as.factor(name)) %>%
ggplot() +
geom_bar(aes(x = interval, y = value, fill = interval), stat = "identity") +
scale_y_continuous("Value", sec.axis = sec_axis(~ ./100, name = "sd"))+
geom_line(aes(x = interval, y = sd*100, group = 1),
color = "black", size = .75)+
facet_grid(~name, scales = "free")+
theme_bw() + theme(legend.position = "none") +
xlab("Interval") + ylab("Value") +
labs(caption = "Black line indicates standard deviation.")
You can also use a different value to scale your second axis. In this example I used 50 as a scaling factor, which in my opinion looks a bit better:
Created on 2022-08-25 with reprex v2.0.2
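The scaling factor of 50 mentioned above can be sketched as follows. This is a minimal, self-contained version on made-up numbers (the `toy` data frame and the name `k` are assumptions for illustration); the point is that the same factor is used twice, once to scale the line up and once to scale the axis labels back down:

```r
library(ggplot2)

# Hypothetical toy data standing in for the joined df above
toy <- data.frame(
  interval = factor(c("g1", "g2", "g3")),
  value    = c(0.95, 1.39, 1.51),
  sd       = c(0.021, 0.020, 0.019)
)

k <- 50  # scaling factor shared by the line and the secondary axis

p <- ggplot(toy) +
  geom_col(aes(x = interval, y = value, fill = interval)) +
  # draw sd on the primary axis, scaled up by k
  geom_line(aes(x = interval, y = sd * k, group = 1)) +
  # label the secondary axis with the inverse transformation
  scale_y_continuous("Value", sec.axis = sec_axis(~ . / k, name = "sd")) +
  theme_bw() +
  theme(legend.position = "none")
p
```

Keeping the factor in one variable avoids the line and the axis labels drifting apart when you try other values.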
I'm trying to create a barplot by month that includes two columns, with each column stacked. For each month, the first column would be the total number of video visits, split by vid_new and vid_return. The second column would be the total number of phone visits, split by phone_charge and phone_nocharge.
I still haven't been able to get the bars side-by-side correct. This code uses the data frame in the second picture and it's counting the instances of the word "video" and "phone", not the Count column resulting in the third picture.
plot <- ggplot(data=new_df, aes(x=Month, y = count, fill = gen_type)) +
geom_bar(stat = "identity", position = "dodge")
Below is a picture of the data I'm working with. I've converted it into a few different forms to try new methods but have not been able to produce this graph.
How can I make a bar plot that is both grouped and stacked in ggplot? What data structure do I need to make it?
Thanks in advance for your advice!
You can try any of these options, which reshape your data to long format and create an additional variable that identifies the visit type. Here is the code using tidyverse functions:
library(ggplot2)
library(dplyr)
library(tidyr)
#Data
df <- data.frame(Month=c(rep('Mar',4),rep('Apr',4),rep('May',2)),
spec_type=c('vid_new','vid_return','phone_charge','phone_nocharge',
'vid_new','vid_return','phone_charge','phone_nocharge',
'vid_new','vid_return'),
Count=c(7,85,595,56,237,848,2958,274,205,1079))
#Plot 1
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = 'stack')+
facet_wrap(.~Month,strip.position = 'bottom')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
Or this:
#Plot 2
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = 'fill')+
facet_wrap(.~Month,strip.position = 'bottom',scales = 'free')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
Or this:
#Plot 3
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = position_dodge2(preserve = 'single'))+
facet_wrap(.~Month,strip.position = 'bottom',scales = 'free')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
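To show the bars by month, use facet_wrap() and place the strip labels below the panels (strip.position = 'bottom' together with strip.placement = 'outside'), so that they read like x-axis labels.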
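The reshaping step shared by the three plots above can be sketched in isolation. This is only an illustration on the first four rows of the data; the name `long` and the use of `remove = FALSE` (instead of the duplicated `Dup` column) are choices made here, not part of the code above:

```r
library(tidyr)

# First four rows of the example data
df <- data.frame(
  Month     = rep("Mar", 4),
  spec_type = c("vid_new", "vid_return", "phone_charge", "phone_nocharge"),
  Count     = c(7, 85, 595, 56)
)

# Split spec_type at the underscore; keep the original column for the fill
long <- separate(df, spec_type, into = c("Type", "Class"),
                 sep = "_", remove = FALSE)

# Type ("vid" / "phone") becomes the x position within each month,
# spec_type stays available as the stacking variable
long
```

With this structure, x = Type gives the two side-by-side columns, fill = spec_type gives the stacks, and the facet on Month groups them.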
The country names are long and overlap each other in the x-axis labels. How can I make them readable?
ggplot(results, aes(x = Nationality, horiz=TRUE)) +
theme_solarized() +
geom_bar() +
labs(y = "Number of Medals",
title = "Number of Medals by Country")
Welcome to stackoverflow. Here are some suggestions on how you can deal with the many values. In both methods, I am using the forcats library within the tidyverse. You can read more about it here: https://r4ds.had.co.nz/factors.html
First, some fake data & replicating your problem
library(tidyverse)
df <-
mpg %>%
arrange(manufacturer) %>%
mutate(
n = row_number(),
vehicle = paste(year, manufacturer, model)
) %>%
uncount(n)
# this replicates your problem
ggplot(df, aes(vehicle)) +
geom_bar() +
coord_flip()
Option 1: consolidate
df %>%
mutate(
vehicle = # making heavy use of forcats here
fct_lump(vehicle, 35) %>% # keep only the 35 most frequent values, others in "Other" category
fct_infreq() %>% # order them by frequency
fct_rev() #reverse the order
) %>%
ggplot(aes(vehicle)) +
geom_bar() +
coord_flip()
Option 2: facet
Someone may have a more elegant way of getting these groups but I use this method quite a bit
df %>%
mutate(
vehicle = # similar methods to earlier
fct_infreq(vehicle) %>%
fct_rev(),
num_fct = as.integer(vehicle), # generates a number for each factor
facet = (max(num_fct)-num_fct) %/% 20 # will make groups of 20, but they need to be in descending order within each facet
) %>%
ggplot(aes(vehicle)) +
geom_bar() +
coord_flip() +
facet_wrap(~facet, scales = "free_y", nrow = 1) +
theme(
strip.background = element_blank(),
strip.text = element_blank()
)
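The grouping arithmetic in the facet option can be checked on its own. A small sketch, assuming 45 factor levels (the number 45 is made up for illustration):

```r
# Hypothetical stand-in for as.integer(vehicle): 45 factor levels
num_fct <- 1:45

# Same arithmetic as in the facet option: count down from the highest
# level and start a new group every 20 levels
facet <- (max(num_fct) - num_fct) %/% 20

# Number of levels that land in each facet
table(facet)
```

Here facets 0 and 1 receive 20 levels each and facet 2 the remaining 5, with the highest level numbers in facet 0.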
Hope this helps.
I have the following dataset:
HIU,0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375
TTHY,0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0
Full,0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951
I made a grouped bar plot from the HIU and TTHY rows (figure 1). But I want to add a line for the "Full" row, as in the second image.
Figure 1:
Figure 2:
How can I do it with R? This is my current code:
df = read.csv('TTR-HIU/resultados.csv',header=FALSE,colClasses=c("NULL",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df.bar <- barplot(as.matrix(df[-nrow(df),]),beside=TRUE,col=c("darkblue","red"))
Using ggplot2, you could try something like this:
# put data in data frame:
df <- data.frame(HIU = c(0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375),
TTHY = c(0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0),
Full= c(0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951))
library(ggplot2)
library(tidyr) # to make data long (gather)
# create x-values:
df$x <- as.factor(seq_len(nrow(df)))
# make data long for ggplot2:
df_long <- df %>% gather(key, value, -x)
ggplot() +
# plot bars:
geom_col(data = subset(df_long, key %in% c("HIU", "TTHY")),
mapping = aes(x = x, y = value, fill = key),
position = position_dodge()) +
# plot lines:
geom_line(data = subset(df_long, key == "Full"),
mapping = aes(x = x, y = value, group = key, color = key),
size = 2) +
# make plot look a little like your desired output:
scale_color_manual(values = c("Full" = "yellow")) +
scale_fill_manual(values = c("HIU" = "blue", "TTHY" = "red")) +
theme_minimal() +
theme(axis.title = element_blank(),
legend.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
However, you might have to put your data into a data frame shaped as in this example. Use dput to show exactly what your data looks like if you need further help.
I am looking to create a cycle plot of hours within months. I am hoping it will look something like the plot below. I am aiming for the plot to indicate the mean temperature for each month with a horizontal line, and then within each month have the graph show the temperature fluctuations across the typical day of that month. I was trying to use monthplot() but it doesn't seem to be working:
library(nycflights13)
tempdata <- weather %>% group_by(hour)
monthplot(tempdata, labels = NULL, ylab = "temp")
It keeps saying argument is not numeric or logical: returning NA but I am not sure where the code is going wrong.
Hope that this ggplot2 solution will work:
library(nycflights13)
library(ggplot2)
library(dplyr)
# Prepare data
tempdata <- weather %>%
group_by(month, day) %>%
summarise(temp = mean(temp, na.rm = TRUE))
meanMonth <- tempdata %>%
group_by(month) %>%
summarise(temp = mean(temp, na.rm = TRUE))
# Plot using ggplot2
ggplot(tempdata, aes(day, temp)) +
geom_hline(data = meanMonth, aes(yintercept = temp)) +
geom_line() +
facet_grid(~ month, switch = "x") +
labs(x = "Month",
y = "Temperature") +
theme_classic() +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.line.x = element_blank())
temp has a missing value, which causes the error. You also need to set the times and phase arguments of monthplot().
library(nycflights13)
# Find mean value : this automatically removes the observation with missing data that was causing an error
tempdata <- aggregate(temp ~ month + day, data=weather, mean)
with(tempdata, monthplot(temp, times=day , phase=month, ylab = "temp"))
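Why the formula interface of aggregate() gets rid of the missing value can be seen on a tiny example (the `toy` data frame is made up for illustration). By default it applies na.omit to the rows before computing the means, whereas a plain mean() over the same values returns NA:

```r
# Hypothetical toy data with one missing temperature
toy <- data.frame(
  month = c(1, 1, 2, 2),
  temp  = c(10, NA, 20, 30)
)

# Formula interface: rows with NA are dropped before FUN is applied
agg <- aggregate(temp ~ month, data = toy, FUN = mean)

# Plain mean per group keeps the NA
raw <- tapply(toy$temp, toy$month, mean)

agg
raw
```

In this sketch the aggregated mean for month 1 is 10, while the plain per-group mean for month 1 is NA.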