I would like to generate a correlation plot pairing my "True" variable with each of the other (People) variables. I am pretty sure this has been asked before, but the solutions I have found do not work for me.
library(ggplot2)
set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
ggplot(dt, aes(x=Salary, y=value)) +
geom_point() +
facet_grid(.~Salary)
This gives the error: Error: Column y must be a 1d atomic vector or a list.
I know one solution is to write out all of the variables in y, which I am trying to avoid because my real data has 15 columns.
Also, I am not entirely sure what "value" and "variable" refer to in ggplot calls. I see them a lot in example code.
Any suggestion is appreciated!
You want to convert your data from wide to long format, for example with tidyr::gather(). Here is a solution using tidyverse packages:
library(tidyr)
library(ggplot2)
theme_set(theme_bw(base_size = 14))
set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
### convert data frame from wide to long format
dt_long <- gather(dt, key, value, -Salary)
head(dt_long)
#> Salary key value
#> 1 106.31477 People1 98.87866
#> 2 98.36883 People1 101.88698
#> 3 106.64900 People1 100.66668
#> 4 106.36215 People1 104.02095
#> 5 102.07321 People1 99.71447
#> 6 92.30025 People1 102.51804
### plot
ggplot(dt_long, aes(x = Salary, y = value)) +
geom_point() +
facet_grid(. ~ key)
### if you want to add regression lines
library(ggpmisc)
# define regression formula
formula1 <- y ~ x
ggplot(dt_long, aes(x = Salary, y = value)) +
geom_point() +
facet_grid(. ~ key) +
geom_smooth(method = 'lm', se = TRUE) +
stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~")),
label.x.npc = "left", label.y.npc = "top",
formula = formula1, parse = TRUE, size = 3) +
coord_equal()
### if you also want ggpairs() from the GGally package
library(GGally)
ggpairs(dt)
Created on 2019-02-28 by the reprex package (v0.2.1.9000)
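In newer tidyr releases (1.0.0 and later) gather() is superseded by pivot_longer(). The sketch below shows what should be the equivalent reshaping step on the same simulated data. It also shows that "key" and "value" are not special ggplot words: they are simply the names chosen for the two columns created by the reshape (reshape2::melt() calls them "variable" and "value" by default). Note that the row order differs from the gather() output above.

```r
library(tidyr)

set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c("Salary", paste0("People", 1:5))

# Keep Salary as an identifier column and stack the five People columns:
# their names go into "key", their contents into "value"
dt_long <- pivot_longer(dt, cols = -Salary,
                        names_to = "key", values_to = "value")

head(dt_long)
```

The resulting dt_long can be passed to the same ggplot() call with aes(x = Salary, y = value) and facet_grid(. ~ key).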
You need to stack() your data first; that is probably what you have "seen".
dt <- setNames(stack(dt), c("value", "Salary"))
library(ggplot2)
ggplot(dt, aes(x=Salary, y=value)) +
geom_point() +
facet_grid(.~Salary)
EDITED:
I have a large database in which energy expenditure is assessed repeatedly over time, and I want to compare it across multiple binary variables (0/1, e.g. presence vs. absence of severe head trauma). The graph analysis should be repeated for all available variables in the database. All tables should be exported to a PDF file.
Currently I'm using the following code:
library(tidyverse)
library(ggpmisc)
my_data %>%
pdf(file="Plots.pdf" )
print(colnames(my_data) %>%
map(function(x) my_data%>%
ggplot(aes(x = Day,
y = REE,
color=as_factor(x)))+
scale_x_continuous(breaks = c(0,2,4,6,8,10,12,14,16,18,20,22,24,26,28))+
scale_y_continuous(limits= c(0000,4000))+
geom_point()+
geom_smooth(method=lm,
se=TRUE,
size=2/10,
aes(group=as_factor(x)))+
stat_poly_eq(aes(label = paste(after_stat(eq.label),
after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")),
label.y="bottom", label.x="right")+
labs(x="Time [d]",
y="Resting Energy Expenditure [kcal]")+
scale_colour_grey(start=0.7,
end=0.3)+
theme_bw()
))
dev.off()
It generates the PDF file with all graphs. However, it does not group/color according to as_factor(x); all data points are put into the same group.
Can anyone explain why the grouping by the factor variable does not work and how to fix it?
The issue is that you loop over column names which are character strings. Doing color=as.factor(x) (or group = as.factor(x)) you are mapping a constant character string on the color aes, i.e. you are doing something like color="foo".
If you pass variable names as character strings, you have to tell ggplot2 that you want to map the data column with this name to an aesthetic. This can be done via the so-called .data pronoun, e.g. color=as.factor(.data[[x]]).
Using a minimal reprex based on mtcars:
Note: Personally, I would suggest putting your plotting code in a separate function instead of passing it as an anonymous function to purrr::map, as I do below. It makes debugging easier and your code cleaner.
library(tidyverse)
library(ggpmisc)
my_data <- mtcars
plot_fun <- function(x) {
ggplot(my_data, aes(
x = mpg,
y = hp,
color = as_factor(.data[[x]])
)) +
geom_point() +
geom_smooth(
method = lm,
se = TRUE,
size = 2 / 10,
aes(group = as_factor(.data[[x]]))
) +
stat_poly_eq(aes(label = paste(after_stat(eq.label),
after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*"
)),
label.y = "bottom", label.x = "right"
) +
labs(
x = "Time [d]",
y = "Resting Energy Expenditure [kcal]"
) +
scale_colour_grey(
start = 0.7,
end = 0.3
) +
theme_bw()
}
cols <- c("cyl", "am", "gear") # colnames(my_data)
# pdf(file = "Plots.pdf")
purrr::map(cols, plot_fun)
#> [[1]]
#> `geom_smooth()` using formula 'y ~ x'
#>
#> [[2]]
#> `geom_smooth()` using formula 'y ~ x'
#>
#> [[3]]
#> `geom_smooth()` using formula 'y ~ x'
# dev.off()
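To get the plots into a PDF, one option is to build the list of plots first and then print each one while the pdf() device is open. This is only a minimal sketch: plot_by is a hypothetical helper (a stripped-down stand-in for plot_fun above), mtcars stands in for the real data, and the file is written to a temporary directory.

```r
library(ggplot2)

# Hypothetical helper: colour the points by a column given as a string
plot_by <- function(col) {
  ggplot(mtcars, aes(x = mpg, y = hp, colour = factor(.data[[col]]))) +
    geom_point() +
    labs(colour = col)
}

cols <- c("cyl", "am", "gear")
plots <- lapply(cols, plot_by)

# Open the PDF device, print one plot per page, then close the device
out_file <- file.path(tempdir(), "Plots.pdf")
pdf(file = out_file)
for (p in plots) print(p)
invisible(dev.off())
```

Printing explicitly inside the loop matters because ggplot objects are not drawn automatically inside for loops or functions.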
I'm trying to evaluate the above data in a boxplot similar to this: https://www.r-graph-gallery.com/89-box-and-scatter-plot-with-ggplot2.html
I want the x axis to reflect my "Year" variable and each boxplot to show the 8 methods as a distribution. Eventually I'd like to pinpoint the "Selected" variable in relation to that distribution, but currently I just want this thing to render!
I can't figure out how to code my y variable, and I get various errors no matter what I try. I think the PY needs to be as.factor, but I've tried some code that way and I just get other errors.
Anyway, here is my code (send help):
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(ggplot2)
library(readxl) # For reading in Excel files
library(lubridate) # For handling dates
library(dplyr) # for mutate and pipe functions
# Path to current and prior data folders
DataPath_Current <- "C:/R Projects/Box Plot Test"
Ult_sum <- read_excel(path = paste0(DataPath_Current, "/estimate.XLSX"),
sheet = "Sheet1",
range = "A2:J12",
guess_max = 100)
# just want to see what my table looks like
Ult_sum
# create a dataset - the below is code I commented out
# data <- data.frame(
# name=c(Ult_sum[,1]),
# value=c(Ult_sum[1:11,2:8])
#)
value <- Ult_sum[2,]
# Plot
Ult_sum %>%
ggplot( aes(x= Year, y= value, fill=Year)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("A boxplot with jitter") +
xlab("")
I do not see how your code matches the screenshot of your dataset. However, just a general hint: ggplot likes data in long format. I suggest you reshape your data using tidyr::pivot_longer or data.table::melt. This way you get 3 columns: year, method, value, of which the first two should be factors. The resulting dataset can then be used neatly in aes() as aes(x=year, y=value, fill=method).
Edit: Added an example. Does this do what you want?
library(data.table)
library(magrittr)
library(ggplot2)
DT <- data.table(year = factor(rep(2010:2014, 10)),
method1 = rnorm(50),
method2 = rnorm(50),
method3 = rnorm(50))
DT_long <- DT %>% melt(id.vars = "year")
ggplot(DT_long, aes(x = year, y = value, fill = variable)) +
geom_boxplot()
I just joined the community and am looking forward to getting some help with the data analysis for my master's thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot using facet_wrap:
`ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)`
That works very well, but I would like to annotate the plots with the r squared values of the regression lines. I have two treatments and 42 varieties, therefore 84 regression lines.
Is there any way to calculate all r squared values and integrate them into the ggplot? I already found the function
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that works for just one variety and one treatment. Could a loop over the lm() function be an option?
Here is an example with the ggpmisc package:
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, parse = TRUE,
mapping = aes(label = stat(rr.label)))
You can't apply different labels to different facets unless you add another r^2 column to your data. One way is to use geom_text, but you need to calculate the stats you need first. Below I show an example with iris; for your case, just change Species to Variety, and so on.
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot like you did
# note I use "free" scales, if scales very different between Species
# your facet plots will be squished
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width)) %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
For the last geom_text() call, I let the vertical position of the text differ by Treatment (via the VJUST vector), so the two labels do not overlap. You might need to adjust that depending on your plot.
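Regarding the question of looping over lm(): the per-group fit can also be done in base R by splitting the data and fitting one model per piece. The sketch below uses iris with Species standing in for Variety and reports the plain r.squared rather than the adjusted value used above; the names groups, rsq and rsq_df are illustrative only.

```r
# Split the data into one data frame per group
groups <- split(iris, iris$Species)

# Fit one linear model per group and extract its R squared
rsq <- vapply(
  groups,
  function(d) summary(lm(Sepal.Length ~ Sepal.Width, data = d))$r.squared,
  numeric(1)
)

# Collect the results in a data frame that geom_text() could use as labels
rsq_df <- data.frame(Variety = names(rsq), Rsq = signif(rsq, 3),
                     row.names = NULL)
rsq_df
```

With two grouping factors (variety and treatment), the same pattern should work by splitting on both columns, e.g. split(DF, list(DF$Variety, DF$Treatment)).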
I want to use ggplot to loop over several columns to create multiple plots, but using the placeholder in the for loop changes the behavior of ggplot.
If I have this:
t <- data.frame(w = c(1, 2, 3, 4), x = c(23,45,23, 34),
y = c(23,34,54, 23), z = c(23,12,54, 32))
This works fine:
ggplot(data=t, aes(w, x)) + geom_line()
But this does not:
i <- 'x'
ggplot(data=t, aes(w, i)) + geom_line()
Which is a problem if I want to eventually loop over x, y and z.
Any help?
You just need to use aes_string instead of aes, like this:
ggplot(data=t, aes_string(x = "w", y = i)) + geom_line()
Note that w then needs to be specified as a string, too.
ggplot2 > 3.0.0 supports tidy evaluation pronoun .data. So we can do the following:
Build a function that takes x- & y- column names as inputs. Note the use of .data[[]].
Then loop through every column using purrr::map.
library(rlang)
library(tidyverse)
dt <- data.frame(
w = c(1, 2, 3, 4), x = c(23, 45, 23, 34),
y = c(23, 34, 54, 23), z = c(23, 12, 54, 32)
)
Define a function that accepts strings as input
plot_for_loop <- function(df, x_var, y_var) {
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
geom_point() +
geom_line() +
labs(x = x_var, y = y_var) +
theme_classic(base_size = 12)
}
Loop through every column
plot_list <- colnames(dt)[-1] %>%
map( ~ plot_for_loop(dt, colnames(dt)[1], .x))
# view all plots individually (not shown)
plot_list
# Combine all plots
library(cowplot)
plot_grid(plotlist = plot_list,
ncol = 3)
Edit: the above function can also be written w/ rlang::sym & !! (bang bang).
plot_for_loop2 <- function(df, .x_var, .y_var) {
# convert strings to variable
x_var <- sym(.x_var)
y_var <- sym(.y_var)
# unquote variables using !!
ggplot(df, aes(x = !! x_var, y = !! y_var)) +
geom_point() +
geom_line() +
labs(x = .x_var, y = .y_var) +
theme_classic(base_size = 12)
}
Or we can just use facet_grid/facet_wrap after converting the data frame from wide to long format (tidyr::gather)
dt_long <- dt %>%
tidyr::gather(key, value, -w)
dt_long
#> w key value
#> 1 1 x 23
#> 2 2 x 45
#> 3 3 x 23
#> 4 4 x 34
#> 5 1 y 23
#> 6 2 y 34
#> 7 3 y 54
#> 8 4 y 23
#> 9 1 z 23
#> 10 2 z 12
#> 11 3 z 54
#> 12 4 z 32
### facet_grid
ggp1 <- ggplot(dt_long,
aes(x = w, y = value, color = key, group = key)) +
facet_grid(. ~ key, scales = "free", space = "free") +
geom_point() +
geom_line() +
theme_bw(base_size = 14)
ggp1
### facet_wrap
ggp2 <- ggplot(dt_long,
aes(x = w, y = value, color = key, group = key)) +
facet_wrap(. ~ key, nrow = 2, ncol = 2) +
geom_point() +
geom_line() +
theme_bw(base_size = 14)
ggp2
### bonus: reposition legend
# https://cran.r-project.org/web/packages/lemon/vignettes/legends.html
library(lemon)
reposition_legend(ggp2 + theme(legend.direction = 'horizontal'),
'center', panel = 'panel-2-2')
The problem is how you access the data frame t. As you probably know, there are several ways of doing so, but unfortunately passing a character string to aes() is not one of them in ggplot.
One way that could work is using the numerical position of the column, e.g., you could try i <- 2. However, whether this works depends on ggplot, which I have never used (but I know other work by Hadley and I guess it should work).
Another way of circumventing this is to create a new temporary data frame every time you call ggplot, e.g.:
tmp <- data.frame(a = t[['w']], b = t[[i]])
ggplot(data=tmp, aes(a, b)) + geom_line()
Depending on what you are trying to do, I find facet_wrap or facet_grid to work well for creating multiple plots with the same basic structure. Something like this should get you in the right ballpark:
library(reshape2) # provides melt()
t.m = melt(t, id="w")
ggplot(t.m, aes(w, value)) + facet_wrap(~ variable) + geom_line()