I am trying to create a time series plot for each individual (ID) I have in my dataset.
Example data:
ID <- rep(c(2:5), each = 9, times = 4)
Attitude <- rep(c('A1', 'A2','A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9'), 16)
Answer <- rep(1:5, length.out = 144)
time <- as.character(rep(c(0, 1, 3, 4), each = 9, times = 4))
first_answer <- rep(1:5, length.out = 144)
df <- data.frame(ID, Attitude, Answer, time, first_answer)
df$time <- as.character(df$time)
The function code I am currently using:
library(dplyr)
library(ggplot2)
spaghetti_plot <- function(input, MV, item_level){
  MV <- enquo(MV)
  input %>%
    filter(!!MV == item_level) %>%
    mutate(first_answer = first_answer) %>%
    ggplot(aes(x = time, y = jitter(Answer), group = ID)) +
    geom_line(aes(colour = first_answer)) +
    # item_level is already a string, so it can be used as the title directly
    labs(title = item_level, x = 'Time', y = 'Answer', colour = 'Answer given at time 0')
}
This gives me a single plot with one line per individual (as many lines as there are IDs). Instead, I would like one plot with one panel per ID. For example, if I have data for 10 individuals, I would like a graph with 10 panels.
I tried using facet_wrap and facet_panel to get the job done, but I haven't found a proper solution yet.
EDIT using facet_wrap(~ID) gives
The result that I am after would look something like this:
Which was originally made in SAS.
EDIT2 Solution is in the comments.
The data from your reproducible example are a bit odd because each ID has only one time value, but I believe this is the code you are looking for:
library(ggplot2)
ggplot(df,aes(x = time, y = Answer)) +
geom_line()+
facet_grid(. ~ ID)
If you have too many facets, the data may not show up. Try increasing the size of the plot window, or export the image directly with ggsave. With suitable ggsave parameters (for example, width and height), all the plots should be visible in the saved image.
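As a minimal sketch of that advice, the snippet below uses facet_wrap(~ ID) on a small hypothetical data set (toy, in which every individual has several time points, unlike the example data) and saves the result with explicit dimensions. The file name and the width, height and dpi values are assumptions to adjust to the number of panels.

```r
library(ggplot2)

# Hypothetical toy data: 4 individuals, each measured at 4 time points
toy <- data.frame(
  ID     = rep(2:5, each = 4),
  time   = rep(c(0, 1, 3, 4), times = 4),
  Answer = rep(1:5, length.out = 16)
)

p <- ggplot(toy, aes(x = time, y = Answer, group = ID)) +
  geom_line() +
  facet_wrap(~ ID)  # one panel per individual

# Save with explicit dimensions so that every panel stays legible
out_file <- file.path(tempdir(), "per_id_panels.png")
ggsave(out_file, plot = p, width = 8, height = 6, dpi = 150)
```

With many individuals, increasing width and height (or setting ncol in facet_wrap) is usually the first thing to try.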
Related
I am, in R and using ggplot2, plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (using long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
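One way to check which axes a given scales setting frees is to inspect the panel layout that ggplot2 builds. The sketch below uses a small hypothetical data set (toy, row_var and col_var are made-up names) and reads the SCALE_X and SCALE_Y columns of the built layout; panels that share an index share an axis.

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data: 2 row groups x 2 column groups, 10 points each
toy <- data.frame(
  x       = rep(1:10, times = 4),
  row_var = rep(c("r1", "r2"), each = 10, times = 2),
  col_var = rep(c("c1", "c2"), each = 20),
  y       = c(rnorm(20), rgamma(20, shape = 2))
)

p <- ggplot(toy, aes(x, y)) +
  geom_line() +
  facet_grid(row_var ~ col_var, scales = "free_y")

# One row per panel; SCALE_Y is the index of the y scale used by that panel
lay <- ggplot_build(p)$layout$layout
print(lay[, c("ROW", "COL", "SCALE_X", "SCALE_Y")])
```

In this layout the y-scale index follows the row, so panels in the same row share a Y axis while panels in the same column do not, which is the opposite of what is wanted here.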
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest approach would be to create one plot per facet column and bind them with something like {patchwork}. To get the facet look, you can still add a faceting layer.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map({
~ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)
}) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
Below is a sample code:
library(cmprsk)     # provides cuminc()
library(survminer)  # provides ggcompetingrisks()
set.seed(2)
failure_time <- rexp(100)
status <- factor(sample(0:3, 100, replace=TRUE), 0:3,
c('no event', 'death', 'progression','other'))
disease <- factor(sample(1:6, 100,replace=TRUE), 1:6,
c('BRCA','LUNG','OV','CANCER','AIDS','HEART'))
fit <- cuminc(ftime = failure_time, fstatus = status,
group = disease)
ggcompetingrisks(fit)
R automatically generates a plot that is organized in 3 columns and 2 rows. I would like it to be arranged as two columns and three rows. Is there a way to do this with ggcompetingrisks, or would I have to plot everything from scratch?
I'm not aware of an option that does this, which means you would need to change the function's code.
Open the function's code for editing with:
trace(survminer:::ggcompetingrisks.cuminc, edit = T)
On line 23 add ncol = 2, to facet_wrap(~group) like:
pl <- ggplot(df, aes(time, est, color = event)) + facet_wrap(~group, ncol = 2)
plot normally:
ggcompetingrisks(fit)
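A possible alternative that avoids editing the function relies on a general ggplot2 behaviour: adding a new facet specification to an existing plot replaces the old one. The sketch below demonstrates this with ggplot2's built-in mpg data only. Whether ggcompetingrisks(fit) + facet_wrap(~group, ncol = 2) works the same way is an untested assumption; it depends on the function returning an ordinary ggplot object whose data contain a group column, as the line quoted above suggests.

```r
library(ggplot2)

# A plot that already has a facet specification (default layout)
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class)

# Adding another facet_wrap() replaces the earlier one
p2 <- p + facet_wrap(~ class, ncol = 2)

# Inspect the resulting panel grid: one row per panel
lay <- ggplot_build(p2)$layout$layout
print(lay[, c("PANEL", "ROW", "COL")])
```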
Problem description
I have thousands of lines (~4000) that I want to plot. However, it is infeasible to plot all lines using geom_line() and just use, for example, alpha=0.1 to illustrate where there is a high density of lines and where there is not. I came across something similar in Python; the second plot in the answers there looks especially nice, but I do not know if something similar can be achieved in ggplot2. Thus something like this:
An example dataset
It would make much more sense to demonstrate this with a set showing a pattern, but for now I just generated random sine curves:
library(dplyr)  # provides %>% and bind_rows()
set.seed(1)
gen.dat <- function(key) {
c <- sample(seq(0.1,1, by = 0.1), 1)
time <- seq(c*pi,length.out=100)
val <- sin(time)
time = 1:100
data.frame(time,val,key)
}
dat <- lapply(seq(1,10000), gen.dat) %>% bind_rows()
Tried heatmap
I tried a heatmap like answered here, however this heatmap will not consider the connection of points over the complete axis (like in a line) but rather show the "heat" per time point.
Question
How can we, in R using ggplot2, plot a heatmap of lines similar to that shown in the first figure?
Looking closely, one can see that the graph to which you are linking consists of many, many, many points rather than lines.
The ggpointdensity package produces a similar visualisation. Note that with so many data points there are noticeable performance issues. I am using the development version because it contains the method argument, which allows different smoothing estimators and apparently copes better with larger numbers of points. There is a CRAN version too.
You can adjust the smoothing with the adjust argument.
I have increased the density of x values in your code to make the result look more like lines, and slightly reduced the number of 'lines' in the plot.
library(tidyverse)
#devtools::install_github("LKremer/ggpointdensity")
library(ggpointdensity)
set.seed(1)
gen.dat <- function(key) {
c <- sample(seq(0.1,1, by = 0.1), 1)
time <- seq(c*pi,length.out=500)
val <- sin(time)
time = seq(0.02,100,0.1)
data.frame(time,val,key)
}
dat <- lapply(seq(1, 1000), gen.dat) %>% bind_rows()
ggplot(dat, aes(time, val)) +
geom_pointdensity(size = 0.1, adjust = 10)
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
Created on 2020-03-19 by the reprex package (v0.3.0)
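If installing an extra package is not an option, a rougher version of the same idea (colouring by how many points fall in a region) can be sketched with ggplot2's own geom_bin2d(). This swaps the smoothed point density for plain rectangular bin counts, so it is a different technique; the toy data and the bins value are assumptions.

```r
library(ggplot2)

set.seed(1)
n_lines  <- 200
time_seq <- seq(0, 10, length.out = 200)

# Hypothetical toy data: sine curves with a random phase shift per line
toy <- do.call(rbind, lapply(seq_len(n_lines), function(key) {
  data.frame(time = time_seq,
             val  = sin(time_seq + runif(1, 0, pi)),
             key  = key)
}))

# Count the points in each cell of a 60 x 60 grid and map the count to fill
p <- ggplot(toy, aes(time, val)) +
  geom_bin2d(bins = 60) +
  scale_fill_viridis_c()

binned <- ggplot_build(p)$data[[1]]
```

The more densely each line is sampled along x, the more the binned picture resembles a heatmap of lines rather than of isolated points.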
update
Thanks to user Robert Gertenbach for creating some more interesting sample data. Here is the suggested use of ggpointdensity on this data:
library(tidyverse)
library(ggpointdensity)
gen.dat <- function(key) {
has_offset <- runif(1) > 0.5
time <- seq(1, 1000, length.out = 1000)
val <- sin(time / 100 + rnorm(1, sd = 0.2) + (has_offset * 1.5)) *
rgamma(1, 20, 20)
data.frame(time,val,key)
}
dat <- lapply(seq(1,1000), gen.dat) %>% bind_rows()
ggplot(dat, aes(time, val, group = key)) +
  stat_pointdensity(geom = "line", size = 0.05, adjust = 10) +
  scale_color_gradientn(colors = c("blue", "yellow", "red"))
Created on 2020-03-24 by the reprex package (v0.3.0)
Your data will result in a quite uniform polkadot density.
I generated some slightly more interesting data like this:
gen.dat <- function(key) {
has_offset <- runif(1) > 0.5
time <- seq(1, 1000, length.out = 1000)
val <- sin(time / 100 + rnorm(1, sd = 0.2) + (has_offset * 1.5)) *
rgamma(1, 20, 20)
data.frame(time,val,key)
}
dat <- lapply(seq(1,1000), gen.dat) %>% bind_rows()
We then get a 2D density estimate. kde2d doesn't have a predict function, so we model its output with a LOESS fit and predict the density at each data point:
dens <- MASS::kde2d(dat$time, dat$val, n = 400)
dens_df <- data.frame(with(dens, expand_grid( y, x)), z = as.vector(dens$z))
fit <- loess(z ~ y * x, data = dens_df, span = 0.02)
dat$z <- predict(fit, with(dat, data.frame(x=time, y=val)))
Plotting it then gets this result:
ggplot(dat, aes(time, val, group = key, color = z)) +
geom_line(size = 0.05) +
theme_minimal() +
scale_color_gradientn(colors = c("blue", "yellow", "red"))
This is all highly reliant on:
The number of series
The resolution of series
The density of kde2d
The span of loess
so your mileage may vary
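Because kde2d returns the density on a regular grid, one way to sidestep the LOESS span is to look up, for each point, the grid cell it falls into. The sketch below does this with findInterval() on hypothetical toy data; it is a cruder substitute for the LOESS step above (no interpolation between cells), not a reproduction of it, and it assumes the MASS package is installed.

```r
set.seed(1)
# Hypothetical toy data: noisy sine values
x <- runif(500, 0, 10)
y <- sin(x) + rnorm(500, sd = 0.2)

# Density on a 100 x 100 grid; dens$z[i, j] belongs to dens$x[i], dens$y[j]
dens <- MASS::kde2d(x, y, n = 100)

# Index of the grid cell at or just below each point
ix <- findInterval(x, dens$x, all.inside = TRUE)
iy <- findInterval(y, dens$y, all.inside = TRUE)

# Matrix indexing with a two-column index picks one density per point
z <- dens$z[cbind(ix, iy)]
```

The resulting z could then be mapped to colour in geom_line() in the same way as the LOESS prediction.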
I came up with the following solution using geom_segment(). However, I'm not sure geom_segment() is the way to go: it only counts segments whose start and end values are exactly the same, whereas in a heatmap (as in my question) values near each other also contribute to the 'heat'.
# Simple stats to get all possible line segments
vals <- unique(dat$time)
min.val = min(vals)
max.val = max(vals)
# Get all possible line segments
comb.df <- data.frame(
time1 = min.val:(max.val - 1),
time2 = (min.val + 1): max.val
)
# Join the original data to all possible line segments
comb.df <- comb.df %>%
left_join(dat %>% select(time1 = time, val1 = val, key )) %>%
left_join(dat %>% select(time2 = time, val2 = val, key ))
# Count how often each line segment occurs in the data
comb.df <- comb.df %>%
group_by(time1, time2, val1, val2) %>%
summarise(n = n_distinct(key))
# ggplot2 to plot segments
ggplot(comb.df %>% arrange(n)) +
geom_segment(aes(x = time1, y = val1, xend = time2, yend = val2, color = n), alpha =0.9) +
scale_colour_gradient( low = 'green', high = 'red') +
theme_bw()
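The same segment counts can also be sketched without the two joins by pairing each point with its successor inside each line using dplyr's lead(). The toy data below are hypothetical and the approach assumes every line is observed at the same time points.

```r
library(dplyr)

# Hypothetical toy data: 3 lines observed at 3 time points
toy <- data.frame(
  key  = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  val  = c(0, 1, 0,
           0, 1, 1,
           0, 1, 0)
)

segs <- toy %>%
  group_by(key) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(time2 = lead(time), val2 = lead(val)) %>%  # next point of the same line
  ungroup() %>%
  filter(!is.na(time2)) %>%                         # the last point has no successor
  count(time, time2, val, val2, name = "n")         # identical segments are tallied

print(segs)
```

Here the segment from (1, 0) to (2, 1) is shared by all three lines, so it gets n = 3; the resulting table can be passed to geom_segment() as in the code above.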
I have data in long form that looks like this:
id <- rep(seq(1:16), each = 3)
trial <- rep(seq(1:3), times = 16)
repeatedMeasure <- round(rnorm(48, mean = 3, sd = 2))
measuredOnce <- rep(10:14, times = c(9,6,6,12,15))
con1 <- rep(c('hi', 'lo'), each = 6, times = 4)
con2 <- rep(c('up', 'down'), each = 3, times = 8)
dat <- as.data.frame(cbind(id, trial, con1, con2, repeatedMeasure, measuredOnce))
dat$measuredOnce <- as.character(dat$measuredOnce)
dat$measuredOnce <- as.numeric(dat$measuredOnce)
Participants complete multiple trials. There is a unique measurement for each trial in the 'repeatedMeasure' variable. However, they are only measured once for the variable titled 'measuredOnce'. I want to produce a bar plot of the measuredOnce variable, something like this:
ggplot(data = dat) +
aes(x = measuredOnce) +
geom_bar() +
facet_wrap(~con1*con2)
However, I want to specify that the measurements for measuredOnce are grouped by id, so that the number of observations (and hence the height of the bar) is divided by three.
I know I could produce what I want by using spread() or taking every third row, but would like to work with the same (long) data frame.
Edit: plot using code above with group = id and fill = id added to aesthetics.
Edit 2: What I am looking for is something that looks like the plot produced by this code
dat %>%
spread(key = trial, value = repeatedMeasure) %>%
ggplot() +
aes(x = measuredOnce) +
geom_bar() +
facet_wrap(~con1*con2)
but without creating a new data frame using spread().
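One possible way to stay with the long data frame is to rescale the computed count inside the layer with after_stat(). This is only a sketch under a strong assumption, namely that every participant has exactly three rows (one per trial), so that dividing the row count by 3 gives the number of participants. The data frame toy mirrors the example data but is built with data.frame() so that the columns keep their types.

```r
library(ggplot2)

# Hypothetical copy of the example data: 16 participants x 3 trials
toy <- data.frame(
  id           = rep(1:16, each = 3),
  trial        = rep(1:3, times = 16),
  measuredOnce = rep(10:14, times = c(9, 6, 6, 12, 15)),
  con1         = rep(c("hi", "lo"), each = 6, times = 4),
  con2         = rep(c("up", "down"), each = 3, times = 8)
)

p <- ggplot(toy, aes(x = measuredOnce)) +
  geom_bar(aes(y = after_stat(count / 3))) +  # rows per bar / trials per participant
  facet_wrap(~ con1 + con2) +
  labs(y = "participants")

bars <- ggplot_build(p)$data[[1]]
```

If the number of trials differs between participants, this fixed divisor would give wrong bar heights, and reducing the data to one row per id is the safer route.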
I need to make some plots for work and I've been learning to use ggplot2, but I can't quite figure out how to get it to work with the dataset I'm using. I can't post my actual data here, but can give a brief example of what it is like. I have two main dataframes; one contains quarterly total revenue for a variety of companies and the other contains quarterly revenue for various segments within each company. For example:
Quarter, CompA, CompB, CompC...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...
and
Quarter, CompA_Footwear, CompA_Apparel, CompB_Wholesale...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...
The script I've been building loops through each company in the first table and uses select() to grab all of the columns in the second table, so for the purposes of this question, forget about the other companies and assume that the first table is just CompA and the second table is all of the different CompA segments.
What I'm trying to do is for each segment, create a line plot that has both the total company revenue and the segment revenue charted over time. Something like this is what it would look like. Ideally, I'd like to be able to use a facet_wrap() or something to be able to make all the different graphs for each segment at once, but that's not absolutely necessary. To clarify, each individual graph should only have two lines: the overall company and one specific segment.
I'm fine with having to restructure my data in any way necessary. Does anyone know how I can get this to work?
I think the below should work. Note that you need to move data around a fair bit.
# Load packages
library(dplyr)
library(ggplot2)
library(reshape2)
library(tidyr)
Make a reproducible data set:
# Create companies
# Could pull this from column names in your data
companies <- paste0("Comp",LETTERS[1:4])
set.seed(12345)
sepData <-
lapply(companies, function(thisComp){
nDiv <- sample(3:6,1)
temp <-
sapply(1:nDiv,function(idx){
round(rnorm(24, rnorm(1,100,25), 6))
}) %>%
as.data.frame() %>%
setNames(paste(thisComp,sample(letters,nDiv), sep = "_"))
}) %>%
bind_cols()
sepData$Quarter <-
rep(2010:2015
, each = 4) +
(0:3)/4
meltedSep <-
melt(sepData, id.vars = "Quarter"
, value.name = "Revenue") %>%
separate(variable
, c("Company","Division")
, sep = "_") %>%
mutate(Division = factor(Division
, levels = c(sort(unique(Division))
, "Total")))
fullCompany <-
meltedSep %>%
group_by(Company, Quarter) %>%
summarise(Revenue = sum(Revenue)) %>%
mutate(Division = factor("Total"
, levels = levels(meltedSep$Division)))
The plot you say you want is here. Note that you need to set Division = NULL to prevent the total from showing up in its own facet:
theme_set(theme_minimal())
catch <- lapply(companies, function(thisCompany){
tempPlot <-
meltedSep %>%
filter(Company == thisCompany) %>%
ggplot(aes(y = Revenue
, x = Quarter)) +
geom_line(aes(col = "Division")) +
facet_wrap(~Division) +
geom_line(aes(col = "Total")
, fullCompany %>%
filter(Company == thisCompany) %>%
mutate(Division = NULL)
) +
ggtitle(thisCompany) +
scale_color_manual(values = c(Division = "darkblue"
, Total = "green3"))
print(tempPlot)
})
Example of the output:
Note, however, that that looks sort of terrible. The difference between the "Total" and any one division is always going to be huge. Instead, you may want to just plot all the divisions on one plot:
allData <-
bind_rows(meltedSep, fullCompany)
catch <- lapply(companies, function(thisCompany){
tempPlot <-
allData %>%
filter(Company == thisCompany) %>%
ggplot(aes(y = Revenue
, x = Quarter
, col = Division)) +
geom_line() +
ggtitle(thisCompany)
# I would add manual colors here, assigned so that, e.g. "Clothes" is always the same
print(tempPlot)
})
Example:
The difference between Total and each is still large, but at least you can compare the divisions.
If it were mine to make though, I would probably make two plots. One with each division from each company (faceted) and one with the totals:
meltedSep %>%
ggplot(aes(y = Revenue
, x = Quarter
, col = Division)) +
geom_line() +
facet_wrap(~Company)
fullCompany %>%
ggplot(aes(y = Revenue
, x = Quarter
, col = Company)) +
geom_line()
There are two other ways I can think to do it using facet_wrap() that are a little more bare-bones:
using annotate() in ggplot2 (simple approach)
doubling your data frames for each company (still relatively simple, just more prone to errors)
Either way, let's recreate your two data frames so that we can reproduce your example:
First create the "total company revenue" data frame:
Quarter <- seq(2011, 2012, by = .25)
CompA <- as.integer(runif(5, 5, 15))
CompB <- as.integer(runif(5, 6, 16))
CompC <- as.integer(runif(5, 7, 17))
df1 <- data.frame(Quarter, CompA, CompB, CompC)
Next, the "segment revenue" data frame of Company A:
CompA_Footwear <- as.integer(runif(5, 0, 5))
CompA_Apparel <- as.integer(runif(5,1 , 6))
CompA_Wholesale <- as.integer(runif(5, 2, 7))
df2 <- data.frame(Quarter, CompA_Footwear, CompA_Apparel, CompA_Wholesale)
Now we will rearrange your data into a long format that ggplot2 handles more easily, using melt() from reshape2:
require(reshape2)
melt.df1 <- melt(df1, id = "Quarter")
melt.df2 <- melt(df2, id = "Quarter")
df <- rbind(melt.df1, melt.df2)
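For readers who prefer tidyr over reshape2, the same wide-to-long step can be sketched with pivot_longer(). The fixed values in df1 below are placeholders rather than the random ones above. Note two differences from melt(): the rows come out ordered by Quarter rather than by company, and variable is a character column instead of a factor.

```r
library(tidyr)

Quarter <- seq(2011, 2012, by = 0.25)
# Placeholder revenue values for illustration
df1 <- data.frame(Quarter, CompA = 1:5, CompB = 2:6, CompC = 3:7)

# Every column except Quarter becomes a (variable, value) pair
long1 <- pivot_longer(df1, cols = -Quarter,
                      names_to = "variable", values_to = "value")

print(head(long1))
```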
We are mostly ready to graph now. For sake of example, I'll only focus on "Company A"
Using annotate()
Subset the data so that it only contains "segment revenue" for Company A
CompA.df2 <- df[grep("CompA_", df$variable),]
This assumes all your segment revenue is coded starting with "CompA_*". You will have to subset according to your data.
Now plot:
require(ggplot2)
ggplot(data = CompA.df2, aes(x = Quarter, y = value,
group = variable, colour = variable)) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap(~variable) + # Facets by segment
# Next, adds the total revenue data as an annotation
annotate(geom = "line", x = Quarter, y = df1$CompA) +
annotate(geom = "point", x = Quarter, y = df1$CompA)
Basically, we are just annotating the graph with a line and points from our original "total company revenue" data frame for Company A. The major downside to this is the lack of a legend.
The second approach will produce a legend for all values
Duplicating your data
The way facet_wrap() works, we need to define the same facet variables for each of the intended plotted lines on each facet. So we are going to replicate our total revenue for each "segment revenue" level, and group each of these pairs together.
Using the same data frames as above, we are going to separate out the Total Company A Revenue and the Segment Revenue of Company A
CompA.df1 <- df[which(df$variable == "CompA"),] # Total Company A Revenue
CompA.df2 <- droplevels(df[grep("CompA_", df$variable),]) # Segment Revenue of Company A
Now repeat the total revenue data frame for Company A based on how many levels we have for the "Segment Revenue"
rep.CompA.df1 <- CompA.df1[rep(seq_len(nrow(CompA.df1)), nlevels(CompA.df2$variable)), ]
This might be prone to errors if you have NA's or NaN's
Now merge the repeated data frame, and add a facet variable (facet.var here) to pair these together.
CompA.df3 <- rbind(rep.CompA.df1, CompA.df2)
CompA.df3$facet.var <- rep(CompA.df2$variable,2)
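The repetition step depends on the row order of both data frames lining up. A sketch of a less position-dependent variant uses base R's merge(), which returns every combination of rows when the two inputs share no column names. The names total, segments and total_rep and the revenue values are hypothetical.

```r
# Hypothetical total revenue for one company, in long format
total <- data.frame(
  Quarter  = seq(2011, 2012, by = 0.25),
  variable = "CompA",
  value    = c(10, 12, 9, 14, 11)
)
segments <- c("CompA_Footwear", "CompA_Apparel", "CompA_Wholesale")

# No shared column names, so merge() pairs every total row with every segment
total_rep <- merge(total, data.frame(facet.var = segments))

print(table(total_rep$facet.var))
```

Each segment then carries its own copy of the total series, and the segment rows only need facet.var set to their own variable before the two are bound together.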
Now you are ready to graph. You can still define group = variable, but this time we will set facet_wrap() to our newly created facet.var
require(ggplot2)
ggplot(data = CompA.df3, aes(x = Quarter, y = value,
group = variable, colour = variable)) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap(~facet.var)
As you can see, we now have our "Total Revenue" added to the legend:
That plot's a real beaut