matplot() makes it easy to plot a matrix/two dimensional array by columns (also works on data frames):
a <- matrix(rnorm(100), 10, 10)
matplot(a, type='l')
Is there something similar using ggplot2, or does ggplot2 require data to be melted into a dataframe first?
Also, is there a way to arbitrarily color/style subsets of the matrix columns using a separate vector (of length=ncol(a))?
Maybe a little easier for this specific example:
library(ggplot2)
a <- matrix(rnorm(100), 10, 10)
sa <- stack(as.data.frame(a))
sa$x <- rep(seq_len(nrow(a)), ncol(a))
qplot(x, values, data = sa, group = ind, colour = ind, geom = "line")
The answers to questions posed in the past have generally advised the melt strategy before specifying the group parameter:
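library(reshape2); library(ggplot2)
dataL <- melt(a)  # for a matrix, melt() returns Var1 (row index), Var2 (column index), value
qplot(Var1, value, data = dataL, group = Var2, geom = "line")
# or, with ggplot(); factor() gives each column its own discrete colour
p <- ggplot(dataL, aes(x = Var1, y = value, colour = factor(Var2), group = Var2))
p <- p + geom_line()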
Just somewhat simplifying what was stated before (matrices are wrapped in c() to make them vectors):
require(ggplot2)
a <- matrix(rnorm(200), 20, 10)
qplot(c(row(a)), c(a), group = c(col(a)), colour = c(col(a)), geom = "line")
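The second part of the question (styling subsets of columns from a separate vector) can be handled the same way once the data is in long form. A minimal sketch, assuming a hypothetical vector `grp` with one label per column of `a`; the names `grp`, `long` and `series` are made up for illustration:

```r
library(ggplot2)

set.seed(1)
a <- matrix(rnorm(100), 10, 10)

# Hypothetical styling vector: one label per column of `a`
grp <- rep(c("control", "treated"), each = 5)

# Long format: one row per matrix cell, in column-major order like c(a)
long <- data.frame(
  x      = c(row(a)),
  y      = c(a),
  series = factor(c(col(a))),
  grp    = grp[c(col(a))]   # look up each cell's label by its column index
)

# One line per column; colour and line type follow the label vector
p <- ggplot(long, aes(x, y, group = series, colour = grp, linetype = grp)) +
  geom_line()
p
```

`group = series` keeps the ten lines separate, while `colour` and `linetype` only depend on `grp`, so any vector of length `ncol(a)` can drive the styling.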
As the title suggests, I want to plot all columns of my data.frame, but I want to do it in a generic way. All my columns are factors.
Here is my code so far:
nums <- sapply(train_dataset, is.factor) #Select factor columns
factor_columns <- train_dataset[ , nums]
plotList <- list()
for (i in c(1:NCOL(factor_columns))){
name = names(factor_columns)[i]
p <- ggplot(data = factor_columns) + geom_bar(mapping = aes(x = name))
plotList[[i]] <- p
}
multiplot(plotList, cols = 3)
where multiplot function came from here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
And my dataset came from Kaggle (house pricing prediction): https://www.kaggle.com/c/house-prices-advanced-regression-techniques
What I get from my code is the image below, which appears to be the last column badly represented.
This would be the last column well represented:
EDIT:
Using gridExtra, as @LAP suggests, doesn't give me a good result either. I use this instead of multiplot:
nCol <- floor(sqrt(length(plotList)))
do.call("grid.arrange", c(plotList, ncol=nCol))
but what I get is this:
Again, SaleCondition is the only thing plotted, and not very well.
PS: I also tried cowplot, with the same result.
Using tidyr you can do something like the following:
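library(tidyr)    # gather() and the %>% pipe
library(ggplot2)
factor_columns %>%
gather(factor, level) %>%   # long format: one row per (column name, level)
ggplot(aes(level)) + geom_bar() + facet_wrap(~factor, scales = "free_x")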
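If you would rather keep one plot per column, the likely culprit in the original loop is `aes(x = name)`: `name` holds a string, not a column, and it is only looked up when the plot is drawn, so every plot ends up with a single bar labelled with the last column name. A minimal sketch using ggplot2's `.data` pronoun, with a small made-up data frame standing in for the Kaggle data:

```r
library(ggplot2)

# Toy stand-in for the factor columns of train_dataset (made-up values)
factor_columns <- data.frame(
  Street   = factor(c("Pave", "Pave", "Grvl", "Pave")),
  SaleType = factor(c("WD", "New", "WD", "COD"))
)

plotList <- lapply(names(factor_columns), function(name) {
  # .data[[name]] refers to the column whose name is stored in `name`
  ggplot(factor_columns, aes(x = .data[[name]])) +
    geom_bar() +
    xlab(name)
})
```

The resulting list can then be arranged as before, for example with `do.call(gridExtra::grid.arrange, c(plotList, ncol = 2))`.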
I would like to create a grid of histograms using a loop and ggplot2. Say I have the following code:
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-NULL
for (i in 1:5){
out[[i]]<-ggplot(df, aes(x=df[,i])) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
Note that all of the plots appear, but that they all have the same mean and shape, despite having set each of the columns of df to have different means.
It seems to only plot the last plot (out[[5]]), that is, the loop seems to be reassigning all of the out[[i]]s with out[[5]].
I'm not sure why, could someone help?
I agree with @GabrielMagno, faceting is the way to go. But if for some reason you need to work with the loop, then either of these will do the job.
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-list()
for (i in 1:5){
x = df[,i]
out[[i]] <- ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
or
out1 = lapply(df, function(x){
ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5) })
grid.arrange(out1[[1]],out1[[2]],out1[[3]],out1[[4]],out1[[5]], ncol=2)
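Both versions work because each plot carries its own copy of the data. In the original loop, `aes(x = df[,i])` is not evaluated until the plot is drawn, and by then `i` has its final value for every plot. A small sketch that makes the difference visible with two columns; `lazy`, `eager` and `mean_x` are illustrative names, and `layer_data()` is used only to inspect what each plot would draw:

```r
library(ggplot2)

set.seed(42)
df <- data.frame(X1 = rnorm(2000, 1), X2 = rnorm(2000, 5))

lazy  <- list()
eager <- list()
for (i in 1:2) {
  # aes() stores the expression df[, i]; `i` is looked up only at draw time
  lazy[[i]] <- ggplot(df, aes(x = df[, i])) + geom_histogram(binwidth = .5)
  # here the column is copied into the plot's own data right away
  x <- df[, i]
  eager[[i]] <- ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth = .5)
}

# Count-weighted mean of the bin centres, roughly the mean of the plotted data
mean_x <- function(p) {
  d <- layer_data(p)
  weighted.mean(d$x, d$count)
}

sapply(lazy, mean_x)   # identical values: both plots read column 2
sapply(eager, mean_x)  # one value close to each column's own mean
```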
I would recommend using facet_wrap instead of building and arranging the plots yourself. It requires a grouping variable in the data frame that separates the values of each distribution. You can use the melt function from the reshape2 package to create such a data frame. So, with your data stored in df, you could simply do this:
library(ggplot2)
library(reshape2)
ggplot(melt(df), aes(x = value)) +
facet_wrap(~ variable, scales = "free", ncol = 2) +
geom_histogram(binwidth = .5)
That would give you something similar to this:
I'm trying to make a plot in R from a data frame with several columns and I'd like to have ggplot plot one of the columns as points, and the other several as lines of different colors.
I can find examples about how to make each of these plots separately, but I can't seem to find the command to combine the plots...
Thanks for any help you can provide.
Like this:
dat <- data.frame(points.x = c(1:10), points.y = c(1:10),
lines.x = c(10:1), lines.y = c(1:10))
ggplot(dat, aes(points.x, points.y)) + geom_point() +
geom_line(aes(lines.x,lines.y))
In order to plot several different columns as lines of different colors, use the melt function from the reshape2 package.
For example:
df <- data.frame(A=1:10, B=rnorm(10), C=rnorm(10), D=rnorm(10))
melted <- melt(df, id="A")
ggplot(melted[melted$variable!="B",], aes(A, value, color=variable)) + geom_line() +
geom_point(data=melted[melted$variable=="B",])
I'm currently writing my bachelor thesis, and all of my plots are created with ggplot2. Now I need a plot of two ECDFs, but the two data frames have different lengths. Padding the shorter one with extra values to equalize the lengths would change its distribution, so that is not an option. Yet an ECDF plot built from two data frames of different lengths does not seem to be allowed:
daten <- peptidPSMotherExplained[peptidPSMotherExplained$V3!=-1,]
daten <- cbind ( daten , "scoreDistance"= daten$V2-daten$V3 )
daten2 <- peptidPSMotherExplained2[peptidPSMotherExplained2$V3!=-1,]
daten2 <- cbind ( daten2 , "scoreDistance"= daten2$V2-daten2$V3 )
p <- ggplot(daten, aes(x = scoreDistance)) + stat_ecdf()
p <- p + geom_point(aes(x = daten2$lengthDistance))
p
With R's base plot function it is possible:
plot(ecdf(daten$scoreDistance))
plot(ecdf(daten2$scoreDistance),add=TRUE)
But the result looks different from all of my other plots, which I dislike.
Does anybody have a solution for me?
Thank you,
Tobias
Example:
df <-data.frame(scoreDifference = rnorm(10,0,12))
df2 <- data.frame(scoreDifference = rnorm(5,-3,9))
plot(ecdf(df$scoreDifference))
plot(ecdf(df2$scoreDifference),add=TRUE)
So how can I achieve this kind of plot in ggplot?
I don't know what geom one should use for such plots, but for combining two datasets you can simply specify the data in a new layer,
ggplot(df, aes(x = scoreDifference)) +
stat_ecdf(geom = "point") +
stat_ecdf(data=df2, geom = "point")
I think reshaping your data in the right way will probably make ggplot2 work for you:
df <-data.frame(scoreDiff1 = rnorm(10,0,12))
df2 <- data.frame(scoreDiff2 = rnorm(5,-3,9))
library('reshape2')
data <- merge(melt(df),melt(df2),all=TRUE)
Then, with data in the right shape, you can simply go on to plot the stuff with colour (or shape, or whatever you wish) to distinguish the two datasets:
p <- ggplot(data, aes(x = value, colour = variable)) + stat_ecdf()
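A variant of the same idea that avoids `merge`: stack the two data frames with `rbind` and label each row with its origin. Different lengths are not a problem, since `stat_ecdf()` computes the ECDF separately for each group. This is only a sketch; the column name `source` and its labels are assumptions for illustration:

```r
library(ggplot2)

set.seed(1)
df  <- data.frame(scoreDifference = rnorm(10, 0, 12))
df2 <- data.frame(scoreDifference = rnorm(5, -3, 9))

# Tag each row with its origin, then stack (10 + 5 = 15 rows)
both <- rbind(
  cbind(df,  source = "df"),
  cbind(df2, source = "df2")
)

# One step function per source, distinguished by colour
p <- ggplot(both, aes(x = scoreDifference, colour = source)) +
  stat_ecdf(geom = "step")
p
```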
Hope this is what you were looking for!?
I have the following script that emulates the type of data structure I have and the analysis that I want to do on it:
library(ggplot2)
library(reshape2)
library(splines)  # provides ns(), used in the smoothing formula below
n <- 10
df <- data.frame(t=seq(n)*0.1, a =sort(rnorm(n)), b =sort(rnorm(n)),
a.1=sort(rnorm(n)), b.1=sort(rnorm(n)),
a.2=sort(rnorm(n)), b.2=sort(rnorm(n)))
head(df)
mdf <- melt(df, id=c('t'))
## head(mdf)
levels(mdf$variable) <- rep(c('a','b'),3)
g <- ggplot(mdf,aes(t,value,group=variable,colour=variable))
g +
stat_smooth(method='lm', formula = y ~ ns(x,3)) +
geom_point() +
facet_wrap(~variable)  # the empty opts() call was dropped; opts() is no longer part of ggplot2
What I would like to do in addition to this is plot the first derivative of the smoothing function against t and against the factors, c('a','b'), as well. Any suggestions how to go about this would be greatly appreciated.
You'll have to construct the derivative yourself, and there are two possible ways to do that. Let me illustrate using only one group:
require(splines) #thx @Chase for the notice
lmdf <- mdf[mdf$variable=="b",]
model <- lm(value~ns(t,3),data=lmdf)
You then simply define your derivative as diff(Y)/diff(X) based on your predicted values, as you would do for differentiation of a discrete function. It's a very good approximation if you take enough X points.
X <- data.frame(t=seq(0.1,1.0,length=100) ) # make an ordered sequence
Y <- predict(model,newdata=X) # calculate predictions for that sequence
plot(X$t,Y,type="l",main="Original fit") #check
dY <- diff(Y)/diff(X$t) # the derivative of your function
dX <- rowMeans(embed(X$t,2)) # centers the X values for plotting
plot(dX,dY,type="l",main="Derivative") #check
As you can see, this way you obtain the points for plotting the derivative. From here you can apply the same steps to both levels and combine the points into the plot you like. Below are the plots from this sample code:
Here's one approach to plotting this with ggplot. There may be a more efficient way to do it, but this uses the manual calculations done by @Joris. We'll simply construct a long data.frame with all of the X and Y values while also supplying a variable to "facet" the plots:
require(ggplot2)
originalData <- data.frame(X = X$t, Y, type = "Original")
derivativeData <- data.frame(X = dX, Y = dY, type = "Derivative")
plotData <- rbind(originalData, derivativeData)
ggplot(plotData, aes(X,Y)) +
geom_line() +
facet_wrap(~type, scales = "free_y")
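To cover both levels at once, the same finite-difference steps can be wrapped in a function and applied per level. A minimal sketch that rebuilds the question's data so it runs on its own; `deriv_one` and `derivs` are hypothetical names, not part of either answer:

```r
library(ggplot2)
library(reshape2)
library(splines)

set.seed(1)
n <- 10
df <- data.frame(t = seq(n) * 0.1,
                 a = sort(rnorm(n)), b = sort(rnorm(n)),
                 a.1 = sort(rnorm(n)), b.1 = sort(rnorm(n)),
                 a.2 = sort(rnorm(n)), b.2 = sort(rnorm(n)))
mdf <- melt(df, id.vars = "t")
levels(mdf$variable) <- rep(c("a", "b"), 3)  # collapse to two levels

# Finite-difference derivative of the fitted spline for one level
deriv_one <- function(lev, grid = seq(0.1, 1.0, length.out = 100)) {
  model <- lm(value ~ ns(t, 3), data = mdf[mdf$variable == lev, ])
  Y <- predict(model, newdata = data.frame(t = grid))
  data.frame(t = rowMeans(embed(grid, 2)),          # interval midpoints
             dY = unname(diff(Y) / diff(grid)),     # slope on each interval
             variable = lev)
}

derivs <- do.call(rbind, lapply(c("a", "b"), deriv_one))

ggplot(derivs, aes(t, dY, colour = variable)) +
  geom_line() +
  facet_wrap(~variable)
```

Each level contributes 99 derivative points (one per interval of the 100-point grid), so the facets line up with those of the original plot.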
If the data is smoothed using smooth.spline, the derivative of the fitted curve can be obtained directly through the deriv argument of predict. Following on from @Joris's solution:
lmdf <- mdf[mdf$variable == "b",]
model <- smooth.spline(x = lmdf$t, y = lmdf$value)
Y <- predict(model, x = seq(0.1,1.0,length=100), deriv = 1) # first derivative
plot(Y$x, Y$y, type = 'l')  # predict() returns plain vectors x and y here
Any dissimilarity in the output is most likely due to differences in the smoothing.