Plotting dummy variables with ggplot2 - r

I actually need help building on this question:
ggplot2 graphic order by grouped variable instead of in alphabetical order.
I need to produce a similar graph, and I have a problem with the black points. My data has dates as column names, and the rows are filled with 0 or 1; I need to plot a point wherever the value is 1. To reproduce, here is a small sample (my real dataset has over 300 columns):
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0))
I need to plot the dates on the x axis, match the id to the canton and show the points where the value is 1.
Could anyone help?

Try this:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data = df %>%
## put data in long format
pivot_longer(-id, names_to = "colname") %>%
## keep only 1s
filter(value == 1) %>%
## convert dates to Date class (%B matches month names in the current locale, so English names need an English or C locale)
mutate(date = as.Date(colname, format = "%d%B%Y"))
plot_data
# # A tibble: 2 x 4
# id colname value date
# <dbl> <chr> <dbl> <date>
# 1 2 14August1970 1 1970-08-14
# 2 3 26April1970 1 1970-04-26
## plot
ggplot(plot_data, aes(x = date, y = factor(id))) +
geom_point()
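Using this data (check.names = FALSE keeps the date-like column names as they are instead of turning them into X26April1970 and so on):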
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0), check.names = FALSE)
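For comparison, here is a base-R sketch of the same reshaping (not taken from either answer) that avoids tidyr and works the same way with 300+ columns. It assumes, as in the sample, that every column except id holds 0/1 values, and that the month names are English, since %B in as.Date() is locale-dependent.

```r
# Sample data from the question; check.names = FALSE keeps the date-like names
df <- data.frame(id = c(1, 2, 3),
                 "26April1970" = c(0, 0, 1),
                 "14August1970" = c(0, 1, 0),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Row/column positions of every 1 in the indicator columns (all but id);
# which() scans column by column
hits <- which(as.matrix(df[-1]) == 1, arr.ind = TRUE)

# One row per 1: the matching id and the column name it came from
plot_data <- data.frame(
  id      = df$id[hits[, "row"]],
  colname = colnames(df)[-1][hits[, "col"]],
  stringsAsFactors = FALSE
)

# Parse names such as "26April1970"; %B assumes English month names
plot_data$date <- as.Date(plot_data$colname, format = "%d%B%Y")
plot_data
```

The result can go into the same ggplot(plot_data, aes(x = date, y = factor(id))) + geom_point() call.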

Maybe you are looking for this:
library(ggplot2)
library(dplyr)
library(tidyr)
#Data (check.names = FALSE keeps the column names as dates instead of X26April1970)
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0), check.names = FALSE)
#Code
df %>% pivot_longer(-id) %>%
ggplot(aes(x=name,y=factor(value)))+
geom_point(aes(color=factor(value)))+
scale_color_manual(values=c('transparent','black'))+
theme(legend.position = 'none')+xlab('Date')+ylab('value')

Related

Creating a grouped boxplot with different numbers of rows for each grouped column?

I have data that I would like to compare in a grouped boxplot, i.e. comparing the before/after response to each treatment. The issue is that the number of trials differs between treatments, so the vectors have different lengths and data.frame() fails with an error:
QXpre <- c(3,4,2,1,4,5,4,2,8)
QXpost <- c(0,4,0,0,0,7,0,1,6)
lidopre <-c(5,3,4,5,6)
lidopost <- c(0,0,0,1,2)
vehipre <- c(3,3,5,3,4,3,4)
vehipost <- c(4,3,3,12,6,4,10)
DF1D <- data.frame(QXpre, QXpost, lidopre, lidopost, vehipre, vehipost)
To clarify, I would like: within each group to compare the pre and post values, but have each group show up on the same plot so I can compare statistics across groups.
Thank you!
Instead of putting all vectors in one data frame, create a list of data frames, one per treatment. Then reshape each one to long (tidy) format using e.g. tidyr::pivot_longer and bind them by rows; I use purrr::imap_dfr for convenience (the native pipe |> requires R 4.1 or later):
library(tidyverse)
dat <- list(
QX = data.frame(QXpre, QXpost),
lido = data.frame(lidopre, lidopost),
vehi = data.frame(vehipre, vehipost)
) |>
purrr::imap_dfr(~ tidyr::pivot_longer(.x, everything(), names_prefix = .y), .id = "treatment")
head(dat)
#> # A tibble: 6 × 3
#> treatment name value
#> <chr> <chr> <dbl>
#> 1 QX pre 3
#> 2 QX post 0
#> 3 QX pre 4
#> 4 QX post 4
#> 5 QX pre 2
#> 6 QX post 0
dat$name <- factor(dat$name, levels = c("pre", "post"))
ggplot(dat, aes(treatment, value, fill = name)) +
geom_boxplot()
Just to offer another solution: you can create a named list of all your vectors and then use stack() to create a data frame in long format. Afterwards you can use strsplit() to split the names into two variables, group and time point. The rest is the same as in stefan's answer.
library(ggplot2)
vector.list = list(
QXpre = c(3,4,2,1,4,5,4,2,8),
QXpost = c(0,4,0,0,0,7,0,1,6),
lidopre =c(5,3,4,5,6),
lidopost = c(0,0,0,1,2),
vehipre = c(3,3,5,3,4,3,4),
vehipost = c(4,3,3,12,6,4,10)
)
df <- stack(vector.list) # creates a data.frame in long format
df[, c("group", "time")] <- do.call(rbind, strsplit(as.character(df$ind), "(?<=.)(?=pre|post)", perl = TRUE)) # splits the names into two variables
df$time <- factor(df$time, levels = c("pre", "post")) # set the order of pre and post
ggplot(df, aes(group, values, fill = time)) +
geom_boxplot()
Created on 2023-02-16 by the reprex package (v2.0.1)
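If the lookaround pattern in strsplit() feels opaque, two sub() calls anchored on the suffix produce the same two columns. This is an alternative sketch, not part of the answer above, and it assumes every name ends in exactly "pre" or "post".

```r
# Named list of the question's vectors (unequal lengths are fine for stack())
vector.list <- list(
  QXpre   = c(3,4,2,1,4,5,4,2,8), QXpost   = c(0,4,0,0,0,7,0,1,6),
  lidopre = c(5,3,4,5,6),         lidopost = c(0,0,0,1,2),
  vehipre = c(3,3,5,3,4,3,4),     vehipost = c(4,3,3,12,6,4,10)
)
df <- stack(vector.list)          # columns: values, ind
nm <- as.character(df$ind)

# Drop the trailing "pre"/"post" to get the treatment group...
df$group <- sub("(pre|post)$", "", nm)
# ...and keep only that suffix to get the time point, in pre -> post order
df$time <- factor(sub("^.*(pre|post)$", "\\1", nm), levels = c("pre", "post"))

table(df$group, df$time)
```

df can then go into the same ggplot(df, aes(group, values, fill = time)) + geom_boxplot() call.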

Ggplot: how to show boxplots in a given order?

I have a peculiar problem with ordering boxplots along the x-axis. I am adding boxplots from two different data frames to the same plot, and each time I add the second geom_boxplot, R reorders my x-axis alphabetically instead of following the ordered levels of factor(x).
So, I have two data frames of different lengths, looking something like this:
df1:
id value
1 A 1
2 A 2
3 A 3
4 A 5
5 B 10
6 B 8
7 B 1
8 C 3
9 C 7
df2:
id value
1 A 4
2 A 5
3 B 6
4 B 8
There are always more observations per id in df1 than in df2, and some ids in df1 are not present in df2.
I'd like df1 to be sorted by the median(value) (ascending) and to first plot boxplots for each id in that order.
Then I add a second layer with boxplots for all other measurements per id from df2, which should maintain the same order on the x-axis.
Here's how I approached that:
vec <- df1 %>%
group_by(id) %>%
summarize(m = median(value)) %>%
arrange(m) %>%
pull(id)
p1 <- df1 %>%
ggplot(aes(x = factor(id, levels = vec), y = value)) +
geom_boxplot()
p1
p2 <- p1 +
geom_boxplot(data = df2, aes(x = factor(id, levels = vec), y = value))
p2
p1 shows the right order (ids sorted by ascending median), but p2 always throws the order off and goes back to plotting ids alphabetically (my id is actually a character column with names). With sample data frames, the code above does what I want, so I am not sure what is specifically wrong with my real data that makes the code fail on it but not on the mock data.
Any ideas?
Thanks a lot in advance!
If I understood correctly, this should work.
library(tidyverse)
# Sample data
df1 <-
tibble(
id = c("A","A","A","A","B","B","B","C","C"),
value = c(1,2,3,5,10,8,1,3,7),
type = "df1"
)
df2 <-
tibble(
id = c("A","A","B","B"),
value = c(4,5,6,8),
type = "df2"
)
df <-
# Create single data.frame
df1 %>%
bind_rows(df2) %>%
# Reorder id by median(value)
mutate(id = fct_reorder(id,value,median))
df %>%
ggplot(aes(id, y = value, fill = type)) +
geom_boxplot()
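Note that fct_reorder() above ranks the ids by the median of df1 and df2 combined. If the order must come from df1 alone, as the question asks, one option is to compute the levels from df1 first and apply them to the combined data. A base-R sketch with the sample values (ids that appear only in df2 would become NA under these levels):

```r
# Sample data from the question, tagged by source
df1 <- data.frame(id = c("A","A","A","A","B","B","B","C","C"),
                  value = c(1,2,3,5,10,8,1,3,7), type = "df1")
df2 <- data.frame(id = c("A","A","B","B"),
                  value = c(4,5,6,8), type = "df2")

# Level order = ids sorted by ascending median(value) in df1 only
lvls <- names(sort(tapply(df1$value, df1$id, median)))

# Combine, then give id those levels so both data sources share one x order
df <- rbind(df1, df2)
df$id <- factor(df$id, levels = lvls)
levels(df$id)
```

With these values the df1 medians are 2.5 (A), 8 (B) and 5 (C), so the order is A, C, B; ggplot(df, aes(id, value, fill = type)) + geom_boxplot() then keeps it.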

Mutate statement not working to enable graph

I am working through a tidyverse time series example on the sunspots data. When I try to graph in 10-year increments I get the following error message; it seems not to recognize the new date variable created in step 1, and I wonder whether this is a format issue.
Error: Problem with mutate() column decade.
i decade = f(date).
x subscript out of bounds
Appreciate any help
# libraries
library(datasets)
library(tidyverse)
library(tsibble)
library(lubridate) # year(), month(), floor_date(), years()
library(ggplot2) # the package is ggplot2, not ggplot
# Tidy the data
tidy_ts <- sunspots %>%
as_tsibble() %>% # Convert to timeseries tibble
mutate( # Create new variables
year = year(index), # Create a year column
month = month(index) # Create a month column
) %>%
select( # Select, reorder, rename vars
date = index, # Rename "index" to "date"
year, # Use "year" as second variable
month, # Use "month" as third variable
spots = value # Rename "value" as "spots"
) %>%
print() # Show data
view(tidy_ts)
# Graph the tidy data by decade
tidy_ts %>%
index_by( # By decade
decade = ~ floor_date(tidy_ts$date, years(10))
) %>%
summarise(mean_s = mean(spots)) %>% # Mean for decade
ggplot(aes(decade, mean_s)) + # Plot means
geom_point() + # Scatterplot
geom_smooth() + # Smoother
ylab("Sunspots: Mean by Decade") # Label
Try replacing:
index_by( # By decade
decade = ~ floor_date(tidy_ts$date, years(10))
)
with:
index_by(decade = floor(year/10)*10)
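floor(year/10)*10 maps each year to the first year of its decade (1749 becomes 1740, 1983 becomes 1980). As a rough check outside tsibble, here is a base-R sketch of the same decade means on the built-in monthly sunspots series (January 1749 to December 1983); it is an illustration of the grouping rule, not a replacement for the tsibble pipeline.

```r
# Built-in monthly series from the datasets package
tt <- as.numeric(time(sunspots))

# Year of each observation; rounding to whole months first avoids
# floating-point values like 1749.9999 being floored into the wrong year
year <- round(tt * 12) %/% 12

# Same decade rule as in the answer
decade <- floor(year / 10) * 10

by_decade <- aggregate(spots ~ decade,
                       data = data.frame(decade = decade,
                                         spots = as.numeric(sunspots)),
                       FUN = mean)
head(by_decade)
```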

copy factor level order from one column to another

I have two columns in a data.frame, that should have levels sorted in the same order, but I don't know how to do it in a straightforward manner.
Here's the situation:
library(ggplot2)
library(dplyr)
library(magrittr)
set.seed(1)
df1 <- data.frame(rating = sample(c("GOOD","BAD","AVERAGE"),10,T),
div = sample(c("A","B","C"),10,T),
n = sample(100,10,T))
# I'm adding a label column that I use for plotting purposes
df1 <- df1 %>% group_by(rating) %>% mutate(label = paste0(rating," (",sum(n),")")) %>% ungroup
# # A tibble: 10 x 4
# rating div n label
# <fctr> <fctr> <int> <chr>
# 1 BAD C 48 BAD (220)
# 2 BAD B 87 BAD (220)
# 3 BAD C 44 BAD (220)
# 4 GOOD B 25 GOOD (77)
# 5 AVERAGE B 8 AVERAGE (117)
# 6 AVERAGE C 10 AVERAGE (117)
# 7 AVERAGE A 32 AVERAGE (117)
# 8 GOOD B 52 GOOD (77)
# 9 AVERAGE C 67 AVERAGE (117)
# 10 BAD C 41 BAD (220)
# rating levels are sorted
df1$rating <- factor(df1$rating,c("BAD","AVERAGE","GOOD"))
ggplot(df1,aes(x=rating,y=n,fill=div)) + geom_col() # plots in the order I want
ggplot(df1,aes(x=label,y=n,fill=div)) + geom_col() # doesn't because levels aren't sorted
How do I manage to copy the factor order from one column to another ?
I can make it work this way but I think it's really awkward:
lvls <- df1 %>% select(rating,label) %>% unique %>% arrange(rating) %>% extract2("label")
df1$label <- factor(df1$label,lvls)
ggplot(df1,aes(x=label,y=n,fill=div)) + geom_col()
Instead of adding a label column and using aes(x = label), you can keep aes(x = rating) and create the labels in scale_x_discrete():
ggplot(df1, aes(x = rating, y = n, fill = div)) +
geom_col() +
scale_x_discrete(labels = df1 %>%
group_by(rating) %>%
summarize(n = sum(n)) %>%
mutate(lab = paste0(rating, " (", n, ")")) %>%
pull(lab))
Once you have set the levels of rating, you can use forcats to set the levels of label by the order of rating like this...
library(forcats)
df1 <- df1 %>% group_by(rating) %>%
mutate(label=paste0(rating," (",sum(n),")")) %>%
ungroup %>%
arrange(rating) %>% #sort by rating
mutate(label=fct_inorder(label)) #set levels by order in which they appear
Or you can use forcats::fct_reorder to do the same thing...
df1$label <- fct_reorder(df1$label, as.numeric(df1$rating))
The plot then has the bars in the right order.
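Without forcats, the same level copy can be done in base R: order the labels by rating and keep the first occurrence of each. A sketch with small hard-coded data (so the result does not depend on sample()), assuming the rating levels are already set:

```r
# Hard-coded stand-in for df1 (rating levels already in the desired order)
df1 <- data.frame(
  rating = factor(c("GOOD", "BAD", "AVERAGE", "BAD"),
                  levels = c("BAD", "AVERAGE", "GOOD")),
  n = c(25, 48, 8, 87)
)
# Label each rating with its total n, as in the question
df1$label <- paste0(df1$rating, " (", ave(df1$n, df1$rating, FUN = sum), ")")

# Copy the order: sort labels by rating, keep the first occurrence of each
df1$label <- factor(df1$label, levels = unique(df1$label[order(df1$rating)]))
levels(df1$label)
```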

box plot for multiple observations

I have multiple rainfall simulations for the same station covering about 14 years. The data frame looks something like this:
df (daily, starting 01/01/2000)
v1   v2   v3   v4   v5   v6   ...   v20
1    1    2    4    8    9    ...
1.4  4    3.8  ...
1.5  3    1.6  ...
1.6  8    ...
...
until 31/12/2013, i.e. 5114 daily observations in total,
where v1, v2, ..., v20 are rainfall simulations for the same point. I want to plot monthly box plots showing the median and quantile range when all the simulations are taken together.
I can plot a box plot for single monthly values using:
df$month<-factor(month.name,levels=month.name)
library(reshape2)
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()
but in this problem the data is daily and there are multiple simulations, so I don't know where to start.
sample data
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
In case you want to work with a zoo object:
date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")
If you want, you can also convert it to a zoo object:
library(zoo)
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
I am not familiar with zoo, so I converted your sample to a data frame. Your idea of using melt() is the right way. Then you need to aggregate the rain amount by month; aggregate() and similar options are worth looking up. Here, I used dplyr and tidyr to arrange the sample data. I hope this lets you move forward.
### zoo to data frame by # Joshua Ulrich
### http://stackoverflow.com/questions/14064097/r-convert-between-zoo-object-and-data-frame-results-inconsistent-for-different
zoo.to.data.frame <- function(x, index.name="Date") {
stopifnot(is.zoo(x))
xn <- if(is.null(dim(x))) deparse(substitute(x)) else colnames(x)
setNames(data.frame(index(x), x, row.names=NULL), c(index.name,xn))
}
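### to data frame (x is the zoo object from the question; df is a plain data frame, so is.zoo() would fail on it)
library(zoo)
foo <- zoo.to.data.frame(x)
str(foo)
library(dplyr)
library(tidyr)
library(reshape2) # for melt()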
### wide to long data frame, aggregate rain amount by Date
ana <- foo %>%
melt(., id.vars = "Date") %>%
group_by(Date) %>%
summarize(rain = sum(value))
### Aggregate rain amount by year and month
bob <- ana %>%
separate(Date, c("year", "month", "date")) %>%
group_by(year, month) %>%
summarize(rain = sum(rain))
### Drawing a ggplot figure
ggplot(data = bob, aes(x = month, y = rain)) +
geom_boxplot()
I just found an easier way to do it; however, your answer really helped, jazzuro.
# install.packages("reshape2") # run once if reshape2 is not installed
library(dplyr)
library(reshape2)
library(ggplot2)
library(zoo) # zoo(), as.yearmon()
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"),
as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
v<-aggregate(x, as.yearmon, mean)
months<- rep(1:12,14)
lol<-data.frame(v,months)
df.m <- melt(lol, id.var = "months")
View(df.m)
p <- ggplot(df.m, aes(factor(months), value))
p + geom_boxplot(aes(fill = months))
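If the box plots should instead pool the daily values of every simulation (rather than first averaging to monthly means, as above), grouping by calendar month is enough. A base-R sketch on synthetic data: the 20 simulation columns and the gamma-distributed rainfall are made up for illustration only.

```r
set.seed(1)
# Daily dates covering the question's period: 5114 days
dates <- seq(as.Date("2000-01-01"), as.Date("2013-12-31"), by = "day")

# Synthetic, non-negative "rainfall": one column per simulation (v1..v20)
sims <- matrix(rgamma(length(dates) * 20, shape = 0.5, rate = 0.2),
               ncol = 20, dimnames = list(NULL, paste0("v", 1:20)))

# Long format: as.vector() stacks the matrix column by column,
# so the month vector is repeated once per simulation
long <- data.frame(
  month = rep(as.integer(format(dates, "%m")), times = ncol(sims)),
  rain  = as.vector(sims)
)

# One box per calendar month, pooling all days, years and simulations
boxplot(rain ~ month, data = long, xlab = "Month", ylab = "Rain")
```

The ggplot2 equivalent would be ggplot(long, aes(factor(month), rain)) + geom_boxplot().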
