Hi everyone. I'm reading two numeric vectors from files, and I want to plot two ECDFs on one plot using ggplot2, but it fails:
> exp = rnorm(100)
> cont = rnorm(100)
> ggplot() + stat_ecdf(data = exp) + stat_ecdf(data = cont)
Error: ggplot2 doesn't know how to deal with data of class numeric
How do I plot them together without getting this kind of error?
library(ggplot2)
var1 = rnorm(100)
var2 = rnorm(100)
DF <- data.frame(variable=rep(c('var1', 'var2'), each=100), value=c(var1, var2))
ggplot(DF) + stat_ecdf(aes(value, color=variable))
You get an error because you are not using a data.frame, which should be a fundamental practice in ggplot2: the data argument of a layer must be a data.frame, not a bare numeric vector. Moreover, you are missing aes(), which is mandatory for mapping variables to aesthetics. Lastly, try to use stat_ecdf only once, and use colour, linetype, etc. to distinguish among the different variables.
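A minimal sketch of building the same long data frame without writing the rep() calls by hand: base R's stack() turns a named list of vectors into a data frame with a numeric values column and a factor ind column, and it does not require the vectors to have equal lengths. The names exp_vals, cont_vals and ecdf_df are illustrative stand-ins, not from the question.

```r
library(ggplot2)

set.seed(1)
# Illustrative stand-ins for the two vectors read from files;
# the lengths differ on purpose to show stack() does not need equal lengths
exp_vals  <- rnorm(100)
cont_vals <- rnorm(80, mean = 0.5)

# stack() returns a data.frame with columns 'values' and 'ind' (factor of list names)
ecdf_df <- stack(list(exp = exp_vals, cont = cont_vals))

# A single stat_ecdf layer; mapping 'ind' to colour splits it into one curve per vector
p <- ggplot(ecdf_df, aes(x = values, colour = ind)) +
  stat_ecdf()
print(p)
```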
I am trying to make a simple heatmap from a simple dataframe in R. The example is fully reproducible.
I have a simple data.frame from the breast cancer dataset (BreastCancer in the mlbench package):
data(BreastCancer, package="mlbench")
bc <- BreastCancer[complete.cases(BreastCancer), ]
I am using ggplot to try and create a simple heatmap using the following code:
x <- bc[2:9]
y <- bc[10]
data <- expand.grid(X=x, Y=y)
# Heatmap
ggplot(data, aes(X, Y)) +
geom_tile()
However, I get the following error:
Error in is.finite(x) : default method not implemented for type 'list'
I have followed the example in the ggplot2 guide but don't understand this error; I'm a fairly new R user.
Could someone please help?
Thanks
What sort of plot are you looking for? You are giving ggplot a list where it expects a data.frame: calling expand.grid() on data frames produces list columns. Which variables do you want on the X and Y axes?
ggplot(data[[1]], aes(Cl.thickness, Cell.size)) + geom_tile()
I suspect you simply want to plot this directly from the bc data.frame:
ggplot(bc, aes(x = Cl.thickness, y = Cell.size)) +
geom_tile(aes(fill = Cell.shape))
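With raw rows, many observations share the same (Cl.thickness, Cell.size) cell, so geom_tile() draws several tiles on top of each other and the visible fill is just whichever row was drawn last. A sketch of one alternative, assuming a count per cell is what the heatmap should show: tabulate the pairs first and map the count to fill. The toy data frame is a made-up stand-in so the example runs without mlbench; with the real data you would tabulate bc$Cl.thickness and bc$Cell.size instead.

```r
library(ggplot2)

set.seed(2)
# Made-up stand-in for two ordinal columns of bc (levels "1" to "10")
toy <- data.frame(
  Cl.thickness = factor(sample(1:10, 200, replace = TRUE), levels = 1:10),
  Cell.size    = factor(sample(1:10, 200, replace = TRUE), levels = 1:10)
)

# table() counts every combination (empty ones as 0); as.data.frame() turns
# the table into columns Cl.thickness, Cell.size and Freq
counts <- as.data.frame(table(toy$Cl.thickness, toy$Cell.size,
                              dnn = c("Cl.thickness", "Cell.size")))

# One tile per cell, coloured by how many rows fall into it
p <- ggplot(counts, aes(x = Cl.thickness, y = Cell.size, fill = Freq)) +
  geom_tile()
print(p)
```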
Based on your comment, I think you need a different way to reshape your data than expand.grid, and then use facets, like this:
library(dplyr)
library(tidyr)
xdat <- bc %>% gather(variable, value, -Id, - Mitoses)
ggplot(xdat, aes(x = Mitoses, y = value)) +
geom_tile() +
facet_wrap(~ variable)
I want to create a correlation matrix plot, i.e. a plot where each variable is plotted in a scatterplot against each other variable, like with pairs() or splom(). I want to do this with ggplot2. See here for examples. The link mentions some code someone wrote for doing this in ggplot2; however, it is outdated and no longer works (even after you swap out the deprecated parts).
One could do this with a loop within a loop and then multiplot(), but there must be a better way. I tried melting the dataset to long format, copying the value and variable columns, and then using facets. This gives something that is almost correct:
d = data.frame(x1=rnorm(100),
x2=rnorm(100),
x3=rnorm(100),
x4=rnorm(100),
x5=rnorm(100))
library(reshape2)
d = melt(d)
d$value2 = d$value
d$variable2 = d$variable
library(ggplot2)
ggplot(data=d, aes(x=value, y=value2)) +
geom_point() +
facet_grid(variable ~ variable2)
This gets the general structure right, but it only plots each variable against itself. Is there a more clever way of doing this without resorting to two loops?
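One loop-free way to get every pair of variables (not just each variable against itself) is to give every row an id, stack the data to long format, and join the long data to itself on that id; each observation then appears once for every (variable, variable) combination. The sketch below uses base R's stack() and merge(), so the column names id, values and ind come from that choice, not from the question.

```r
library(ggplot2)

set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
vars <- names(d)

# Long format: one row per (observation, variable), columns 'values' and 'ind'
long <- stack(d[vars])
# stack() stacks column by column, so the row id repeats once per variable
long$id <- rep(seq_len(nrow(d)), times = length(vars))

# Self-join on id: each observation gets one row per pair of variables
pairs_long <- merge(long, long, by = "id", suffixes = c(".x", ".y"))

# Facet rows/columns by the two variable names, like a pairs() matrix
p <- ggplot(pairs_long, aes(x = values.x, y = values.y)) +
  geom_point(size = 0.5) +
  facet_grid(ind.y ~ ind.x)
print(p)
```

The diagonal panels still show each variable against itself; GGally's ggpairs() is the more complete option when density or blank diagonals are wanted.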
library(GGally)
set.seed(42)
d = data.frame(x1=rnorm(100),
x2=rnorm(100),
x3=rnorm(100),
x4=rnorm(100),
x5=rnorm(100))
# estimated density in diagonal
ggpairs(d)
# blank
ggpairs(d, diag = list("continuous"="blank"))
Using the PerformanceAnalytics package:
library("PerformanceAnalytics")
chart.Correlation(d, histogram = TRUE, pch = 19)
I have a huge data frame and I would like to make some plots to get an idea of the associations among different variables. I cannot use pairs(data), because that would give me 400+ plots. However, there's one response variable y I'm particularly interested in. Thus, I'd like to plot y against all variables, which would reduce the number of plots from n^2 to n. How can I do it?
EDIT: I've added an example for clarity. Let's say I have the data frame
foo=data.frame(x1=1:10,x2=seq(0.1,1,0.1),x3=-7:2,x4=runif(10,0,1))
and my response variable is x3. Then I'd like to generate four plots arranged in a row: x1 vs x3, x2 vs x3, a histogram of x3, and finally x4 vs x3. I know how to make each plot individually:
plot(foo$x1,foo$x3)
plot(foo$x2,foo$x3)
hist(foo$x3)
plot(foo$x4,foo$x3)
However, I have no idea how to arrange them in a row. Also, it would be great if there were a way to make all n plots automatically, without having to call plot (or hist) each time. When n=4 it's not a big issue, but I usually deal with n=20+ variables, so it can be a drag.
You could use a combination of the reshape2, ggplot2 and gridExtra packages. This way you don't need to specify the number of plots: the code works for any number of explanatory variables without modification.
foo <- data.frame(x1=1:10,x2=seq(0.1,1,0.1),x3=-7:2,x4=runif(10,0,1))
library(reshape2)
foo2 <- melt(foo, "x3")
library(ggplot2)
p1 <- ggplot(foo2, aes(value, x3)) + geom_point() + facet_grid(.~variable)
p2 <- ggplot(foo, aes(x = x3)) + geom_histogram()
library(gridExtra)
grid.arrange(p1, p2, ncol=2)
The tidyr package helps do this efficiently; please refer here for more options.
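library(tidyr)    # gather(); tidyr also re-exports %>%
library(ggplot2)
# foo and its response x3 (from the example above) stand in for your data and y
foo %>%
  gather(-x3, key = "some_var_name", value = "some_value_name") %>%
  ggplot(aes(x = some_value_name, y = x3)) +
  geom_point() +
  facet_wrap(~ some_var_name, scales = "free")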
This gives one scatterplot panel per variable, each plotted against the response, with free axis scales in every panel.
If your goal is only to get an idea of the associations among different variables, you can also use:
plot(y~., data = foo)
It is not as nice as using ggplot, and it doesn't automatically put all the graphs in one window (although you can change that with par(mfrow = c(a, b))), but it is a quick way to get what you want.
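For the exact layout asked for in the question (every variable against x3 in one row, with a histogram in x3's own slot), a plain base-graphics loop with par(mfrow) works without knowing n in advance. plot_against() is a hypothetical helper name, not something from base R; it restores the previous mfrow setting when it returns.

```r
# Hypothetical helper: one row of panels, each column plotted against 'resp',
# with a histogram in the panel that belongs to 'resp' itself
plot_against <- function(data, resp) {
  vars <- names(data)
  op <- par(mfrow = c(1, length(vars)))  # one row, one panel per column
  on.exit(par(op))                       # restore the previous layout afterwards
  for (v in vars) {
    if (v == resp) {
      hist(data[[v]], main = v, xlab = v)
    } else {
      plot(data[[v]], data[[resp]], main = v, xlab = v, ylab = resp)
    }
  }
  invisible(vars)
}

foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))
plot_against(foo, "x3")
```

With 20+ variables a single row becomes very narrow, so a grid such as par(mfrow = c(4, 5)) is likely more readable.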
I faced the same problem and have no experience with ggplot2, so I wrote a function using base plot() that takes the data frame and the variables to be plotted as arguments and generates the graphs.
dfplot <- function(data, xvar, yvars = NULL)
{
  if (is.null(yvars)) {
    # default: every column except xvar
    yvars <- setdiff(names(data), xvar)
  }
  if (length(yvars) > 25) {
    warning("number of variables to be plotted exceeds 25, only the first 25 will be plotted")
    yvars <- yvars[1:25]
  }
  # choose a near-square grid of panels
  ncharts <- length(yvars)
  nrows <- ceiling(sqrt(ncharts))
  ncols <- ceiling(ncharts / nrows)
  op <- par(mfrow = c(nrows, ncols))
  on.exit(par(op))  # restore the previous layout when done
  for (i in seq_len(ncharts)) {
    plot(data[, xvar], data[, yvars[i]], main = yvars[i], xlab = xvar, ylab = "")
  }
}
Notes:
You can provide the list of variables to be plotted as yvars; otherwise it plots all the variables in the data frame (or the first 25, whichever is fewer) against xvar.
Margins were going out of bounds when the number of plots exceeded 25, so I limited it to 25 charts. Any suggestions for handling this more nicely are welcome.
The y-axis labels are removed, since the chart titles take care of them. The x-axis label is set to xvar.
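One possible way around the 25-chart limit, sketched under the assumption that several pages (for example in a PDF) are acceptable: split the y variables into chunks and give each chunk its own page, instead of dropping the extra variables. dfplot_paged() and per_page are hypothetical names, and the slightly smaller margins are a guess at what keeps the panels from running out of room.

```r
# Hypothetical helper: plot xvar against every other column,
# at most 'per_page' panels per page
dfplot_paged <- function(data, xvar, yvars = setdiff(names(data), xvar),
                         per_page = 25) {
  # Group the y variables into consecutive chunks of at most per_page
  chunks <- split(yvars, ceiling(seq_along(yvars) / per_page))
  op <- par(c("mfrow", "mar"))   # save only the settings changed below
  on.exit(par(op))
  for (chunk in chunks) {
    nrows <- ceiling(sqrt(length(chunk)))
    ncols <- ceiling(length(chunk) / nrows)
    # Setting mfrow again makes the next plot() start a new page
    par(mfrow = c(nrows, ncols), mar = c(4, 2, 2, 1))
    for (v in chunk) {
      plot(data[[xvar]], data[[v]], main = v, xlab = xvar, ylab = "")
    }
  }
  invisible(length(chunks))  # number of pages drawn
}

# Example: 29 y variables with 10 panels per page gives three PDF pages
pdf("dfplot_pages.pdf")
set.seed(3)
wide <- as.data.frame(matrix(rnorm(300), ncol = 30))
names(wide)[1] <- "x"
dfplot_paged(wide, "x", per_page = 10)
dev.off()
```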
I'm currently writing my bachelor's thesis, and all of my plots are created with ggplot2. Now I need a plot of two ECDFs, but the two data frames have different lengths. Padding the shorter one to equalize the lengths would change its distribution, so that is not an option. However, ggplot2 does not seem to let me draw one ECDF plot from two data frames of different lengths.
daten <- peptidPSMotherExplained[peptidPSMotherExplained$V3!=-1,]
daten <- cbind ( daten , "scoreDistance"= daten$V2-daten$V3 )
daten2 <- peptidPSMotherExplained2[peptidPSMotherExplained2$V3!=-1,]
daten2 <- cbind ( daten2 , "scoreDistance"= daten2$V2-daten2$V3 )
p <- ggplot(daten, aes(x = scoreDistance)) + stat_ecdf()
p <- p + geom_point(aes(x = daten2$lengthDistance))
p
With R's base plot function it is possible:
plot(ecdf(daten$scoreDistance))
plot(ecdf(daten2$scoreDistance),add=TRUE)
but it looks different from all of my other plots, which I dislike.
Does anybody have a solution for me?
Thank you,
Tobias
Example:
df <-data.frame(scoreDifference = rnorm(10,0,12))
df2 <- data.frame(scoreDifference = rnorm(5,-3,9))
plot(ecdf(df$scoreDifference))
plot(ecdf(df2$scoreDifference),add=TRUE)
So how can I achieve this kind of plot in ggplot?
I don't know which geom one should use for such plots, but to combine two datasets you can simply specify the data in a new layer:
ggplot(df, aes(x = scoreDifference)) +
stat_ecdf(geom = "point") +
stat_ecdf(data=df2, geom = "point")
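If you keep separate layers, each with its own data frame (so the different lengths are no problem), you can still get a legend by mapping a constant label to colour inside each layer's aes(); ggplot2 then builds a single colour scale from those labels. The labels "df" and "df2" and the legend title are arbitrary choices for this sketch.

```r
library(ggplot2)

set.seed(4)
df  <- data.frame(scoreDifference = rnorm(10, 0, 12))
df2 <- data.frame(scoreDifference = rnorm(5, -3, 9))

p <- ggplot(df, aes(x = scoreDifference)) +
  # A constant inside aes() becomes a legend entry rather than a fixed colour
  stat_ecdf(aes(colour = "df")) +
  stat_ecdf(data = df2, aes(colour = "df2")) +
  labs(colour = "Data set")
print(p)
```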
I think reshaping your data the right way will make ggplot2 work for you:
df <-data.frame(scoreDiff1 = rnorm(10,0,12))
df2 <- data.frame(scoreDiff2 = rnorm(5,-3,9))
library('reshape2')
data <- merge(melt(df),melt(df2),all=TRUE)
Then, with the data in the right shape, you can simply plot it, using colour (or shape, or whatever you wish) to distinguish the two datasets:
p <- ggplot(data, aes(x = value, colour = variable)) + stat_ecdf()
Hope this is what you were looking for!?
I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
1. The colours are wrong; my attempt to control the colours seems to be completely wrong or ignored.
2. There are unwanted horizontal and vertical lines.
3. I would like the legend to show "Imputed" and "Observed", but my code gives the error "invalid argument to unary operator".
Moreover, it seems like quite a lot of work to do what densityplot(impute) accomplishes in one line, so I wonder whether I am going about this the wrong way entirely.
Edit: I should add a fourth problem, as noted by @ROLO:
4. The range of the plots seems to be incorrect.
The reason it is more complicated with ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids, to be precise; check out its code), not from lattice itself. That function has all the functionality for plotting mids objects from mice built in. If you tried the same with lattice::densityplot, you would find it at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
require(ggplot2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
As you can see, however, the ranges of these plots are narrower than those from densityplot. This behaviour should be controlled by the trim parameter of stat_density, but that does not seem to work. After fixing the code of stat_density, I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix we'll need to wait for the next major version of ggplot2; see GitHub.
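If the main goal is curves that reach past the observed range, one workaround that does not depend on stat_density at all is to compute the densities yourself with stats::density(), whose evaluation grid by default extends cut = 3 bandwidths beyond the data, and draw them with geom_line(). This is a swapped-in technique, not the stat_density fix described above; dens_by_group() is a hypothetical helper and the values are simulated rather than taken from nhanes.

```r
library(ggplot2)

# Hypothetical helper: one stats::density() curve per group, returned in long format
dens_by_group <- function(values, groups) {
  pieces <- lapply(split(values, groups), function(v) {
    d <- density(v[!is.na(v)])   # defaults: n = 512 grid points, cut = 3
    data.frame(x = d$x, y = d$y)
  })
  out <- do.call(rbind, pieces)
  out$group <- rep(names(pieces), vapply(pieces, nrow, integer(1)))
  out
}

set.seed(5)
# Simulated stand-in for observed vs imputed values of one variable
values <- c(rnorm(50, 26, 4), rnorm(30, 28, 3))
groups <- rep(c("Observed", "Imputed"), times = c(50, 30))

dens <- dens_by_group(values, groups)
p <- ggplot(dens, aes(x = x, y = y, colour = group)) +
  geom_line()
print(p)
```

With the long data from complete(impute, "long", include = TRUE), the .imp column could play the role of groups, one curve per imputation.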
You could ask Hadley to add a fortify method for this mids class. For example:
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects before plotting.
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
Note that each line except the last ends with a '+'; otherwise R treats the command as complete. That is why the legend did not change in your code, and why the line starting with '+' resulted in the "invalid argument to unary operator" error.
You can melt the result of fortify.mids to plot all variables in one graph:
library(reshape2)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")