ggplot2 - passing dataframe with column names


I looked at other solutions but cannot get a logical within ggplot to work correctly. I have the following function. A dataframe is passed along with two columns to plot as a scatter plot.
scatter_plot2 <- function(df, xaxis, yaxis){
b <- ggplot(data = df, aes_string(xaxis, yaxis), environment = environment())
gtype <- geom_point(aes(alpha = 0.2, color = yaxis > 0))
sm <- geom_smooth(formula = xaxis ~ yaxis, color="black")
b + gtype + sm + theme_bw()
}
which I call using :
scatter_plot2(train_df, "train_df$signal", "train_df$yhat5")
The color = yaxis > 0 mapping is intended to plot points with yaxis above 0 in "green" and those below in "red". While I'm able to get the string names to display correctly on the axes, I'm not able to get the logical to work correctly.
Please help.

Since you're creating your own function for this, just calculate the needed color ahead of time. Because you're passing in a data frame and the variable names as strings, you'll need some standard evaluation (you're already doing this with aes_string).
I cleaned up the code a bit, putting the ggplot statement into a single chain, making some aes calls explicit, and making your smooth formula y ~ x. You also don't want to use $ when passing in the variables; just pass quoted column names.
library(ggplot2)
scatter_plot2 <- function(df, xaxis, yaxis){
  # yaxis is a string, so look the column up by name before comparing to 0
  df$color <- ifelse(df[[yaxis]] > 0, "green", "red")
  ggplot(data = df, aes_string(x = xaxis, y = yaxis)) +
    # alpha is a fixed setting, so it goes outside aes()
    geom_point(aes(color = color), alpha = 0.2) +
    geom_smooth(formula = y ~ x, color = "black") +
    # use the values in the color column as literal colors
    scale_color_identity() +
    theme_bw()
}
The call would be (using iris for an example):
scatter_plot2(iris, "Sepal.Width", "Sepal.Length")
Every Sepal.Length value in iris is positive, so all points in this example are drawn in green.
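On recent ggplot2 releases aes_string is deprecated, and the same idea can be written with the .data pronoun instead. The following is a minimal sketch under that assumption, not the answer's original code: scatter_plot3, the point_color column, the toy data frame d, and the choice of method = "lm" are all introduced here for illustration.

```r
library(ggplot2)

# Hypothetical variant of scatter_plot2 that avoids aes_string().
# xaxis and yaxis are quoted column names, as in the answer above.
scatter_plot3 <- function(df, xaxis, yaxis) {
  # Look the y column up by name and derive a literal color per row
  df$point_color <- ifelse(df[[yaxis]] > 0, "green", "red")
  ggplot(df, aes(x = .data[[xaxis]], y = .data[[yaxis]])) +
    geom_point(aes(color = point_color), alpha = 0.2) +
    # method = "lm" is an assumption made here to keep the toy example quiet
    geom_smooth(method = "lm", formula = y ~ x, color = "black") +
    scale_color_identity() +
    theme_bw()
}

# Toy data with both negative and positive y values (made up for this sketch)
d <- data.frame(signal = c(-2, -1, 0, 1, 2, 3),
                yhat5  = c(-1.5, -0.2, 0.1, 0.8, 2.1, 2.7))
p <- scatter_plot3(d, "signal", "yhat5")
```

With this toy data, the two rows with negative yhat5 get "red" and the remaining four get "green", which makes the color rule easier to check than with iris.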

Related

Is there a way to pass the data of a ggplot2 call to the scale_* functions that works with .+gg in one pass [duplicate]

I would like to use a variable of the dataframe passed to the data parameter of the ggplot function in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x of the dataframe passed to the data parameter of ggplot inside another function, scale_x_continuous:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot

The scale_x_continuous function does not evaluate its parameters in the data environment. One reason for this is that each layer can have its own data source, so by the time you get to the scales it would no longer be clear which data environment is the "correct" one.
You could write a helper function to initialize the plot with your default. For example:
helper <- function(df, col) {
  ggplot(data = df, mapping = aes_string(x = col)) +
    scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example:
scale_x_custom <- function(x) {
  scale_x_continuous(breaks = seq(min(x), max(x)))
}
and then you can add your custom scale to your plot:
ggplot(data = df, mapping = aes(x = x)) +
  geom_bar() +
  scale_x_custom(df$x)
Or, since you just want breaks at integer values, you can calculate the breaks from the default limits without needing to specify the data at all. For example:
scale_x_custom <- function() {
  scale_x_continuous(expand = expansion(0, .3),
                     breaks = function(x) {
                       seq(ceiling(min(x)), floor(max(x)))
                     })
}
ggplot(data = df, mapping = aes(x = x)) +
  geom_bar() +
  scale_x_custom()
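The break rule in that last variant can be checked without drawing anything. A small sketch, assuming only that a breaks function is handed the scale limits as a numeric range; integer_breaks and the sample limits are names and values chosen here for illustration:

```r
# Same break rule as in scale_x_custom() above, pulled out so it can be
# tested on its own. `limits` stands for the numeric range c(min, max)
# that is passed to a breaks function.
integer_breaks <- function(limits) {
  seq(ceiling(min(limits)), floor(max(limits)))
}

# Limits slightly wider than the data, as happens after scale expansion
example_breaks <- integer_breaks(c(0.7, 20.3))
print(example_breaks)
```

For limits of 0.7 to 20.3 this yields the integers 1 through 20, so every bar of the sampled values gets its own tick without the data being named in the scale.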

Another, less than ideal, alternative is to use the . placeholder of the magrittr pipe (%>%) in combination with {}.
Enclosing the ggplot call in curly brackets allows one to reference . multiple times.
library(magrittr)
data.frame(x = samp) %>%
  {ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
    scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}

How to use sec_axis() for discrete data in ggplot2 R?

I have discrete data that looks like this:
height <- c(1,2,3,4,5,6,7,8)
weight <- c(100,200,300,400,500,600,700,800)
person <- c("Jack","Jim","Jill","Tess","Jack","Jim","Jill","Tess")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,person,height,weight)
I'm trying to plot a graph with the same x-axis (person) and two different y-axes (weight and height). All the examples I find either plot the secondary axis (sec_axis) or plot discrete data using base plots.
Is there an easy way to use sec_axis for discrete data in ggplot2?
Edit: Someone in the comments suggested I try the suggested reply. However, I now run into an error.
Here is my current code:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("height",sec_axis(~.*1.2, name="height"))
p2
I get the error: Error in x < range[1] :
comparison (3) is possible only for atomic and list types
Alternately, now I have modified the example to match this example posted.
p <- ggplot(dat, aes(x = person))
p <- p + geom_line(aes(y = height, colour = "Height"))
# adding the relative weight data, transformed to match roughly the range of the height
p <- p + geom_line(aes(y = weight/100, colour = "Weight"))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*100, name = "Relative weight [%]"))
# modifying colours and theme options
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(y = "Height [inches]",
x = "Person",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9))+ facet_wrap(~set, scales="free")
p
I get an error that says
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
I get the template, but no points get plotted

R function arguments are matched by position if argument names are not specified explicitly. As mentioned by @Z.Lin in the comments, you need sec.axis = before your sec_axis call to indicate that you are passing it to the sec.axis argument of scale_y_continuous. If you don't do that, it is passed to the second argument of scale_y_continuous, which is breaks. The error message thus arises because the value is not an acceptable data type for the breaks argument:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("weight", sec.axis = sec_axis(~.*1.2, name="height"))
p2
The first argument (name =) of scale_y_continuous is for the first y scale, whereas the sec.axis = argument is for the second y scale. I changed your first y scale name to correct that.
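The positional-matching rule can be seen with a plain R function. This is only a sketch: scale_stub is a hypothetical stand-in whose first two arguments are ordered like those of scale_y_continuous, and the string "secondary" stands in for a sec_axis() object.

```r
# Hypothetical stand-in with name first and breaks second; it is not part
# of ggplot2 and simply reports where each argument ended up.
scale_stub <- function(name = NULL, breaks = NULL, sec.axis = NULL) {
  list(name = name, breaks = breaks, sec.axis = sec.axis)
}

# Unnamed second argument: matched by position, so it lands in `breaks`
positional <- scale_stub("height", "secondary")

# Named argument: lands in `sec.axis`, and `breaks` stays at its default
named <- scale_stub("weight", sec.axis = "secondary")

str(positional)
str(named)
```

In the positional call the value ends up in breaks and sec.axis stays NULL, which mirrors why the original call failed while validating breaks.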

ggplot2: how to add sample numbers to density plot?

I am trying to generate a (grouped) density plot labelled with sample sizes.
Sample data:
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))
The unlabelled density plot is generated and looks as follows:
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.
I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013
n_fun <- function(x){
return(data.frame(y = max(x), label = paste0("n = ",length(x))))
}
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4) +
stat_summary(geom = "text", fun.data = n_fun)
However, this fails with Error: stat_summary requires the following missing aesthetics: y.
I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.
I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.
Where am I going wrong?

The y in the return value of fun.data is not the y aesthetic. stat_summary complains that it cannot find y, which should be specified either globally, as in ggplot(df, aes(x = val, group = ab.class, y = ...)), or in the layer, as in stat_summary(aes(y = ...)), if no global y is available. The fun.data function computes where to display the point/text/... at each x, based on the y values supplied through aes. (I am not sure whether I have made this clear. Not a native English speaker.)
Even if you specify y through aes, you won't get the desired result, because stat_summary computes a y at each x.
However, you can add text to desired positions by geom_text or annotate:
# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]
# 'scaled' is the density rescaled to a maximum of 1 within each group,
# so its largest value marks the peak row for each group
p.text <- lapply(split(p.data, f = p.data$group), function(df){
df[which.max(df$scaled), ]
})
p.text <- do.call(rbind, p.text) # we can also get p.text with dplyr.
# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
label = sprintf('n = %d', p.text$n), vjust = 0)
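If the label colour should follow the group through aes(), as the question asks, one option is to compute the label positions yourself and add them with geom_text. This is a sketch under the assumption that stats::density with its default bandwidth is close enough to what geom_density draws; labs_df and the helper code are introduced here and are not part of the answer above.

```r
library(ggplot2)

# Sample data from the question
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# One row per group: location of the density peak plus an "n = ..." label
groups <- split(df$val, df$ab.class)
labs_df <- do.call(rbind, lapply(names(groups), function(g) {
  d <- density(groups[[g]])      # kernel density estimate for this group
  i <- which.max(d$y)            # index of the peak
  data.frame(ab.class = g,
             x = d$x[i],
             y = d$y[i],
             label = paste0("n = ", length(groups[[g]])))
}))

p_labelled <- ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  # colour is mapped through aes(), so it follows the group
  geom_text(data = labs_df,
            aes(x = x, y = y, label = label, colour = ab.class),
            vjust = 0)
```

Because the labels live in an ordinary data frame, the same approach generalises to any number of groups, at the cost of estimating the density a second time outside ggplot2.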

Why are the colors wrong on this ggplot? [duplicate]

I am new to ggplot2 so please have mercy on me.
My first attempt produces a strange result (at least it's strange to me). My reproducible R code is:
library(ggplot2)
iterations = 7
variables = 14
data <- matrix(ncol=variables, nrow=iterations)
data[1,] = c(0,0,0,0,0,0,0,0,10134,10234,10234,10634,12395,12395)
data[2,] = c(18596,18596,18596,18596,19265,19265,19390,19962,19962,19962,19962,20856,20856,21756)
data[3,] = c(7912,11502,12141,12531,12718,12968,13386,17998,19996,20226,20388,20583,20879,21367)
data[4,] = c(0,0,0,0,0,0,0,43300,43500,44700,45100,45100,45200,45200)
data[5,] = c(11909,11909,12802,12802,12802,13202,13307,13808,21508,21508,21508,22008,22008,22608)
data[6,] = c(11622,11622,11622,13802,14002,15203,15437,15437,15437,15437,15554,15554,15755,16955)
data[7,] = c(8626,8626,8626,9158,9158,9158,9458,9458,9458,9458,9458,9458,9558,11438)
df <- data.frame(data)
n_data_rows = nrow(df)
previous_volumes = df[1:(n_data_rows-1),]/1000
todays_volume = df[n_data_rows,]/1000
time = seq(ncol(df))/6
min_y = min(previous_volumes, todays_volume)
max_y = max(previous_volumes, todays_volume)
ylimit = c(min_y, max_y)
x = seq(nrow(previous_volumes))
# This gives a plot with 6 gray lines and one red line, but no legend
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1"), color="gray")
}
p
This code produces a correct plot, but no legend.
If I move "color" inside "aes", I now get a legend... but the colors are wrong.
For example, the code:
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1", color="gray"))
}
y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p
produces a plot with the wrong line colors.
Why are the line colors wrong?
Charles

Colours can be set on an individual layer (i.e. colour = XYZ outside aes()); however, colours set this way will not appear in any legend. Legends are produced when an aesthetic (here, colour) is mapped to a variable in your data, in which case you need to tell ggplot2 how to represent that mapping. If you do not specify this explicitly, ggplot2 makes a best guess (for example, a discrete scale for factor data versus a continuous scale for numeric data). There are many options available, including (but not limited to): scale_colour_continuous, scale_colour_discrete, scale_colour_brewer, scale_colour_manual.
By the sounds of it, scale_colour_manual is probably what you are after. Note that below I have mapped the 'variable' column in the data to the colour aesthetic. That column contains the discrete values [PREV-A to PREV-F, Today], so we need to specify which actual colour each of 'PREV-A', 'PREV-B', ... 'PREV-F' and 'Today' represents.
Alternatively, if the variable column contains 'actual' colours (i.e. hex '#FF0000' or a name such as 'red'), you can use scale_colour_identity. We can also create another column of categories ('Previous', 'Today') to make things a little easier; in that case, be sure to add the 'group' aesthetic mapping so that different series sharing the same colour are not joined into one continuous line.
First prepare the data, then go through some different methods to assign colours.
# Put data as points 1 per row, series as columns, start with
# previous days
df.new = as.data.frame(t(previous_volumes))
#Rename the series, for colour mapping
colnames(df.new) = sprintf("PREV-%s",LETTERS[1:ncol(df.new)])
#Add the times for each point.
df.new$Times = seq(0,1,length.out = nrow(df.new))
#Add the Todays Volume
df.new$Today = as.numeric(todays_volume)
#Put in long format, to enable mapping of the 'variable' to colour.
df.new.melt = reshape2::melt(df.new,'Times')
#Create some colour mappings for use later
df.new.melt$color_group = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='Today','Previous'))
df.new.melt$color_identity = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='red','grey'))
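And here are a few different ways of manipulating the colours:
# Base plot shared by all variants below. Its definition is missing from the
# text; this is the assumed form (Times on x, value on y).
base = ggplot(df.new.melt, aes(x = Times, y = value))
#1. Base plot + color mapped to variable
plot1 = base + geom_path(aes(color=variable)) +
  ggtitle("Plot #1")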
#2. Base plot + color mapped to variable, Manual scale for Each of the previous days and today
colors = setNames(c(rep('gray',nrow(previous_volumes)),'red'),
unique(df.new.melt$variable))
plot2 = plot1 + scale_color_manual(values = colors) +
ggtitle("Plot #2")
#3. Base plot + color mapped to color group
plot3 = base + geom_path(aes(color = color_group,group=variable)) +
ggtitle("Plot #3")
#4. Base plot + color mapped to color group, Manual scale for each of the groups
plot4 = plot3 + scale_color_manual(values = c('Previous'='gray','Today'='red')) +
ggtitle("Plot #4")
#5. Base plot + color mapped to color identity
plot5 = base + geom_path(aes(color = color_identity,group=variable))
plot5a = plot5 + scale_color_identity() + #Identity not usually in legend
ggtitle("Plot #5a")
plot5b = plot5 + scale_color_identity(guide='legend') + #Identity forced into legend
ggtitle("Plot #5b")
gridExtra::grid.arrange(plot1,plot2,plot3,plot4,
plot5a,plot5b,ncol=2,
top="Various Outputs")
So given your question, #2 or #4 is probably what you are after. Using #2, we can add another layer to render the value of the last point in each series:
#Additionally, add label of the last point in each series.
df.new.melt.labs = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%.2f",df$value)
df
})
baseWithLabels = base +
geom_path(aes(color=variable)) +
geom_label(data = df.new.melt.labs,aes(label=label,color=variable),
position = position_nudge(y=1.5),size=3,show.legend = FALSE) +
scale_color_manual(values=colors)
print(baseWithLabels)
If you want to be able to distinguish between the various 'PREV-X' lines, you can also map linetype to this variable and/or make the label geometry more descriptive. The code below demonstrates both modifications:
#Add labels of the last point in each series, include series info:
df.new.melt.labs2 = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%s: %.2f",df$variable,df$value)
df
})
baseWithLabelsAndLines = base +
geom_path(aes(color=variable,linetype=variable)) +
geom_label(data = df.new.melt.labs2,aes(label=label,color=variable),
position = position_nudge(y=1.5),hjust=1,size=3,show.legend = FALSE) +
scale_color_manual(values=colors) +
labs(linetype = 'Series')
print(baseWithLabelsAndLines)

My solution, which I got from here, is to add scale_colour_identity() to your ggplot object:
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p = p + scale_colour_identity()
p
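The difference between the approaches in these answers can be inspected on a tiny plot. This is a minimal sketch with made-up data; d, line_colour, and the three plot objects are names introduced here, and ggplot_build is used only to read back the colour that would actually be drawn.

```r
library(ggplot2)

# Made-up data for illustration
d <- data.frame(x = 1:3, y = c(2, 4, 3))

# "red" inside aes() is treated as a data value and sent through the
# default discrete colour scale
p_mapped <- ggplot(d, aes(x, y)) + geom_line(aes(colour = "red"))

# The identity scale uses the mapped value itself as the colour
p_identity <- p_mapped + scale_colour_identity()

# Outside aes(), colour is a fixed setting (and produces no legend)
p_fixed <- ggplot(d, aes(x, y)) + geom_line(colour = "red")

# Helper: the colour(s) the first layer ends up with after the scales are applied
line_colour <- function(p) unique(ggplot_build(p)$data[[1]]$colour)

print(line_colour(p_mapped))
print(line_colour(p_identity))
print(line_colour(p_fixed))
```

The identity and fixed versions both report "red", while the mapped version reports whatever the default palette assigns to its single level, which is why the question's lines did not come out gray and red.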
