I have the following example dataframe and want to plot columns b (x-axis) and d (y-axis) but based on column a. Meaning I want the rows in b and d that correspond to value 1 in column a to be plotted next to each other (as points or vertical lines) and then similarly for value 2 in column a etc. (all in one graph).
I am having trouble doing this as it would mean the values on the x-axis will increase and decrease/fluctuate.
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
plot(df$b, df$d)
I have tried basic plotting as shown above (not the results I want) and have tried other methods to relabel the values on the x-axis but that is also incorrect.
Lastly, I am unable to use any R libraries as I cannot download them on the computer I am using.
Thank you in advance for any help!
Not sure if I understand the question properly, but perhaps plotting the interaction of "a" and "b" would work?
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = interaction(b, a), "d" = d)
df$b <- factor(df$b, levels = df$b)
png("test.png", width = 900, height = 400, units = "px")
plot(df$b, df$d)
dev.off()
Edit
Or maybe remove the x-axis labels and add them in 'manually'?
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
png("test.png", width = 900, height = 400, units = "px")
plot(seq_along(df$b), df$d, xaxt = "n")
axis(1, at = seq_along(df$b), labels = df$b)
dev.off()
Edit 2
Another approach that may better "preserve the spacing" of the x-axis is to split the dataframe and plot each group separately:
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
split_df <- split(df, df$a)
par(mfrow = c(1, 3), mar = c(2, 2, 1, 1))
plot(split_df$`1`$b, split_df$`1`$d, ylim = c(0, 1))
plot(split_df$`2`$b, split_df$`2`$d, ylim = c(0, 1))
plot(split_df$`3`$b, split_df$`3`$d, ylim = c(0, 1))
Created on 2021-10-15 by the reprex package (v2.0.1)
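To keep everything in a single panel while preserving the spacing of b within each group, one option is to shift each group's b values by a running offset and then relabel the axis with the original values. This is a minimal base-R sketch, assuming b is sorted within each group; the names gap, grp_max, offsets and x are made up for illustration:

```r
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame(a = a, b = b, d = d)

gap <- 2                                        # assumed empty space between groups
grp_max <- sapply(split(df$b, df$a), max)       # largest b in each group
offsets <- c(0, cumsum(head(grp_max, -1) + gap))  # where each group starts
names(offsets) <- names(grp_max)

# shifted x position: original b plus the offset of its group
df$x <- df$b + unname(offsets[as.character(df$a)])

# vertical lines, colored by group, with the original b values as tick labels
plot(df$x, df$d, type = "h", col = df$a, xaxt = "n",
     xlab = "b (grouped by a)", ylab = "d")
axis(1, at = df$x, labels = df$b)
abline(v = offsets[-1] - gap / 2, lty = 2)      # dashed separators between groups
```

Some tick labels may be skipped by axis() where they would overlap; widening the device or increasing gap is one way to deal with that.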
I have the following data frame which contains 4 columns of data in addition to the vector of labels c.
Time <-c(1:4)
d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))
I would like to use ggplot to plot the 4 curves x1, ..., x4 above each other, where each curve is colored according to its label. So curves x1 and x2 get the same color since they have the same label, whereas curves x3 and x4 get different colors.
I did the following
d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
rename(class=value) %>% select(-name) %>%
pivot_longer(-c(Time,class)) %>%
mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
Label=ifelse(duplicated(Label),NA,Label)) %>%
ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
geom_line()+
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))+
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE,show.legend = F,color='black')
but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.
To add: I would like to obtain the same plot above in the following general case, where I can't add the vector c to the data frame as length(c) is not equal to length(x1)=...=length(x4)
Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))
and c=c(1,1,2,3)
As you point out in your comments, it is only possible to put the vector of colors as a column in the original data.frame because it happens to be square, but this is a dangerous way to store the information because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join into the long format data by variable name prior to plotting.
Below is an example of how I'd do this with your data.
First, prepare the data without the color mapping for each variable (we'll add that next):
# load necessary packages
library(tidyverse)
library(ggrepel)
# set seed to make simulated data reproducible
set.seed(1)
# simulate data
Time <-c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5))
Next, make a separate data.frame that maps the color grouping to the variable names. At some point you'll want to make this a factor (i.e. discrete rather than continuous) to map it to color so I just do it here but it can be done later in the ggplot call if you prefer. Per your request, this solution easily scales with your dataset without needing to manually set each level, but it requires that your vector of color mappings is in the same order and the same length as the variable names in d unless you have some other way to establish that relationship.
# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
color_group = factor(c(1, 1, 2, 3)))
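If the column order cannot be relied on, one alternative is to key the classes by column name instead of by position. This is a minimal sketch in base R, using the five-row data from the question; class_map is a hypothetical name and its values are taken from the c vector in the question:

```r
set.seed(1)
Time <- 1:5
d <- data.frame(Time,
                x1 = rpois(n = 5, lambda = 10),
                x2 = runif(n = 5, min = 1, max = 10),
                x3 = rpois(n = 5, lambda = 5),
                x4 = runif(n = 5, min = 1, max = 5))

# hypothetical lookup: names are column names of d, values are the classes
class_map <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)

vars <- setdiff(names(d), "Time")
stopifnot(all(vars %in% names(class_map)))   # fail early on an unmapped column

# one row per variable, independent of how many time points d has
color_grouping <- data.frame(var = vars,
                             color_group = factor(unname(class_map[vars])))
color_grouping
```

Because the mapping has one entry per column rather than per row, it works the same way whether d has 4 time points or 5.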
Then you pivot_longer and do a join to merge the color mapping with the data for plotting.
# pivot d to long and merge in color codes
d_long <- d %>%
pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
left_join(color_grouping, by = "var")
# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)
# # A tibble: 4 x 4
# Time var value color_group
# <int> <chr> <dbl> <fct>
# 1 1 x1 8 1
# 2 1 x2 1.56 1
# 3 1 x3 4 2
# 4 1 x4 4.97 3
Finally, generate a line plot where color is mapped to the color_group variable. To ensure you get one line per original variable you also need to set group = var. For more info on this, check the documentation on grouping.
# plot data adding labels for each line
p <- d_long %>%
ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
geom_line() +
labs(color='class') +
scale_color_manual(values=c('red','blue','green')) +
geom_label_repel(aes(label = var),
data = d_long %>% slice_max(order_by = Time, n = 1),
nudge_x = 1.5,
na.rm = TRUE,
show.legend = F,
color='black')
p
This produces the following plot:
In your comment you suggested that you want to separate out and stack the plots. I'm not sure I fully understood, but one way to accomplish this is with faceting.
For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:
p + facet_grid(rows = "color_group")
Which gives this plot:
Note that with this form the faceting variable is passed as a quoted string; facet_grid(rows = vars(color_group)) is the unquoted equivalent.
You were on the right path, but you need a little bit of a different structure to use ggplot:
# delete old color column
d$c <- NULL
# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))
# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3
# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
geom_line() +
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))
Note that I omitted the labels for brevity, but you can add them back in using the same logic. The code above gives the following result:
Here is a solution for how I understood your question.
The data frame is converted to long format, and the variable c is rebuilt with mutate / case_when using the number codes you used.
I have set a seed for better reproducibility.
library(tidyverse)
library(ggrepel)
set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5),
c = c(1, 1, 2, 3)
)
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(c = as.factor(case_when( # replace consistently
name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
name == "x3" ~ 2,
name == "x4" ~ 3
))) %>%
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = c, group = name)) +
geom_line() +
labs(color = "class") +
scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
ADDED
You could leave the c out and color according to name.
The color code was necessary because you wanted 2 names with the same color. If that is not needed, the following code can do it.
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = name, group = name)) +
geom_line() +
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
I need to draw scatter plots of Observed vs Predicted data for each Variable using the facet_wrap functionality of ggplot. I might be close, but I am not there yet. I used a suggestion from an answer to my previous question to gather the data and automate the plotting process. Here is my code so far. I understand that the aes of my ggplot is wrong, but I used it on purpose to make my point clear. I would also like to add geom_smooth to show the confidence interval.
library(tidyverse)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1,DF2)
DF_long = gather(DF, key = "Variable", value = "Value", -df)
ggplot(DF_long, aes(x = Observed, y = Predicted))+
geom_point() + facet_wrap(Variable~.)+ geom_smooth()
I should see a plot like below, comparing Observed Vs Predicted for each Variable.
Observed goes on x and Predicted on y, so each data frame needs to be converted to long format separately and the results combined with cbind before faceting. See this example:
library(ggplot2)
library(tidyr) # provides gather()
# reproducible data with seed
set.seed(1)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1_long <- gather(DF1, key = "group", value = "Observed")
DF2_long <- gather(DF2, key = "group", value = "Predicted")
plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])
head(plotDat)
# group Observed Predicted
# 1 A 3.389578 10.590824
# 2 A 4.349115 10.234584
# 3 A 6.155680 8.298577
# 4 A 9.173870 11.750885
# 5 A 2.815137 7.942874
# 6 A 9.085507 6.203175
ggplot(plotDat, aes(x = Observed, y = Predicted))+
geom_point() +
facet_wrap(group~.) +
geom_smooth()
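The cbind step relies on both long tables listing the variables in the same order, row for row. This is a small base-R sketch of the same reshape with an explicit check; stack() is used here in place of gather(), and obs and pred are illustrative names:

```r
set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

# stack() returns two columns: values and ind (the source column name)
obs  <- stack(DF1)
pred <- stack(DF2)

# the pairing is only valid if the variable labels line up row for row
stopifnot(identical(obs$ind, pred$ind))

plotDat <- data.frame(group     = obs$ind,
                      Observed  = obs$values,
                      Predicted = pred$values)
head(plotDat, 3)
```

If the two data frames could have different column orders, reordering one of them first (for example DF2[names(DF1)]) keeps the check from failing.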
We can use ggpubr to add P and R values to the plot; see the answers in this post:
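If only the numbers are needed, the per-variable correlation and p-value can also be computed directly with cor.test() from base R. This is a sketch of that calculation, not of what ggpubr does internally; stats is an illustrative name and Pearson correlation is assumed:

```r
set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

# one row per variable: Pearson R and its p-value (Observed vs Predicted)
stats <- t(sapply(names(DF1), function(v) {
  ct <- cor.test(DF1[[v]], DF2[[v]])   # method = "pearson" is the default
  c(R = unname(ct$estimate), P = ct$p.value)
}))
round(stats, 3)
```

The resulting matrix has one row per facet, so its values could be added to the panels as text labels.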
Similarly, consider merge on reshaped data frames using base R's reshape (avoiding any tidyr dependencies in case you are a package author). Below lapply + Reduce dynamically merges to bypass helper objects, DF1_long and DF2_long, in global environment:
Data
set.seed(10312019)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10),
C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14),
C = runif(12, 3,12), D = runif(12, 4,8))
Plot
library(ggplot2) # ONLY IMPORTED PACKAGE
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1, DF2)
DF_long <- Reduce(function(x,y) merge(x, y, by=c("Variable", "id")),
lapply(list(DF1, DF2), function(df)
reshape(df, varying=names(DF)[1:(length(names(DF))-1)],
times=names(DF)[1:(length(names(DF))-1)],
v.names=df$df[1], timevar="Variable", drop="df",
new.row.names=1:1E5, direction="long")
)
)
head(DF_long)
# Variable id Observed Predicted
# 1 A 1 6.437720 11.338586
# 2 A 10 4.690934 9.861456
# 3 A 11 6.116200 9.020343
# 4 A 12 6.499371 5.904779
# 5 A 2 6.779087 5.901970
# 6 A 3 6.499652 8.557102
ggplot(DF_long, aes(x = Observed, y = Predicted)) +
geom_point() + geom_smooth() + facet_wrap(Variable~.)
types = c("A", "B", "C")
df = data.frame(n = rnorm(100), type=sample(types, 100, replace = TRUE))
ggplot(data=df, aes(n)) + geom_histogram() + facet_grid(~type)
Above is how I normally used facetting. But can I use it when instead of a categorical variable I have a set of columns that are indicator variables such as:
df = data.frame(n = rnorm(100), A=rbinom(100, 1, .5), B=rbinom(100, 1, .5), C=rbinom(100, 1, .5))
Now the "Type" variable from my previous example isn't mutually exclusive. An observation can be "A and B" or "A and B and C" for example. However, I'd still like an individual histogram for any observation that has the presence of A, B, or C?
I would reshape the data with tidyr so that observations in more than one category are duplicated, then filter to remove unwanted cases.
df <- data.frame(
n = rnorm(100),
A = rbinom(100, 1, .5),
B = rbinom(100, 1, .5),
C = rbinom(100, 1, .5)
)
library("tidyr")
library("dplyr")
library("ggplot2")
df %>% gather(key = "type", value = "value", -n) %>%
filter(value == 1) %>%
ggplot(aes(x = n)) +
geom_histogram() +
facet_wrap(~type)
I've always despised gather, so I'll add another method and one for the data.table fans.
library(data.table)
DT <- melt(setDT(df), id.vars = "n", variable.name = "type")[value > 0]
ggplot(DT,aes(n)) + geom_histogram() + facet_grid(~type)
#tidyland
library(reshape2)
library(dplyr)
library(ggplot2)
df %>%
melt(id.vars = "n", variable.name = "type") %>%
filter(value > 0) %>%
ggplot(aes(n)) + geom_histogram() + facet_grid(~type)
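For completeness, the same "duplicate each observation once per indicator it has" reshape can be sketched without any packages. This version assumes the indicator columns are named A, B and C as in the question; ind_cols and long are illustrative names:

```r
set.seed(1)
df <- data.frame(n = rnorm(100),
                 A = rbinom(100, 1, .5),
                 B = rbinom(100, 1, .5),
                 C = rbinom(100, 1, .5))

ind_cols <- c("A", "B", "C")

# stack() puts the indicator columns end to end, so n is repeated once per column
long <- data.frame(n = rep(df$n, times = length(ind_cols)),
                   stack(df[ind_cols]))
long <- long[long$values == 1, ]            # keep rows where the indicator is set
names(long)[names(long) == "ind"] <- "type"

# one histogram per indicator, side by side
op <- par(mfrow = c(1, length(ind_cols)))
for (ty in ind_cols) hist(long$n[long$type == ty], main = ty, xlab = "n")
par(op)
```

An observation with A and B both set appears in two rows of long, which is what lets it show up in both histograms.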
Here is my question. I have data like this:
A B C D
a 24 1 2 3
b 26 2 3 1
c 25 3 1 2
Now I would like to plot A on one y-axis (0 to 30) and B to D on another y-axis (0 to 5) in one graph. Also, I want the values in each of the rows a, b, c to be linked by a line (let's say a, b, c represent mouse IDs). Could anyone come up with ideas on how to do it? I would prefer to use R. Thanks in advance!
# create some data
data = as.data.frame(list(A = c(24,26,25),
B = c(1,2,3),
C = c(2,3,1),
D = c(3,1,2)))
# adjust the margins to leave room for the second axis
par(mar = c(5, 4, 4, 4) + 0.1)
# create the first plot (left axis, 0 to 30)
plot(1:3, data$A, pch = 19, ylim = c(0, 30), ylab = "1st ylab", xlab = "index")
# set new = TRUE so the next plot() call does not clear the current plot
par(new = TRUE)
# draw B without axes or labels, using the range of the second axis
plot(1:3, data$B, ylim = c(0, 5), pch = 19, col = 2,
     xlab = "", ylab = "", axes = FALSE)
# add the points for C and D
points(1:3, data$C, pch = 19, col = 3)
points(1:3, data$D, pch = 19, col = 4)
# draw the second axis on the right and label it
axis(4)
mtext("2nd ylab", side = 4, line = 2.5)
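To also link each row with a line, one possible reading of the question is one line per mouse running across the variables A to D. The sketch below takes that reading as an assumption: B to D are multiplied by a fixed factor so they share the 0 to 30 coordinate system, and the right-hand axis is labeled in their original units. The names k and scaled are illustrative.

```r
data <- data.frame(A = c(24, 26, 25), B = c(1, 2, 3),
                   C = c(2, 3, 1),    D = c(3, 1, 2),
                   row.names = c("a", "b", "c"))

k <- 30 / 5                                   # assumed ratio between the two axis ranges
scaled <- as.matrix(data)
scaled[, c("B", "C", "D")] <- scaled[, c("B", "C", "D")] * k

par(mar = c(5, 4, 4, 4) + 0.1)
# matplot draws one line per column, so transpose to get one line per mouse
matplot(t(scaled), type = "b", pch = 19, lty = 1, col = 1:3,
        ylim = c(0, 30), xaxt = "n", xlab = "variable", ylab = "A")
axis(1, at = 1:4, labels = colnames(scaled))
# right axis: tick positions in plot units, labels in the original 0 to 5 units
axis(4, at = seq(0, 30, by = k), labels = seq(0, 5))
mtext("B, C, D", side = 4, line = 2.5)
legend("bottomleft", legend = rownames(scaled), col = 1:3,
       lty = 1, pch = 19, title = "mouse")
```

Rescaling keeps everything in a single plot() call, so there is no need for par(new = TRUE); the trade-off is that the two axes are tied together by the factor k.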
I greatly prefer using ggplot2 for plotting. Sadly, ggplot2 does not support two independent y-axes, for philosophical reasons (its sec_axis() only draws a one-to-one transformation of the primary axis).
I would like to propose an alternative which uses facets, i.e. subplots. Note that to be able to plot the data using ggplot2, we need to change the data structure. We do this using gather from the tidyr package. In addition, I use the programming style as defined in dplyr (which uses piping a lot):
library(ggplot2)
library(dplyr)
library(tidyr)
df = data.frame(A = c(24, 26, 25), B = 1:3, C = c(2, 3, 1), D = c(3, 1, 2))
plot_data = df %>% mutate(x_value = rownames(df)) %>% gather(variable, value, -x_value)
ggplot(plot_data) + geom_line(aes(x = x_value, y = value, group = variable)) +
facet_wrap(~ variable, scales = 'free_y')
Here, each subplot has its own y-axis.