R: ggplot. Axis labels in a boxplot for loop

I am using a for loop to create multiple box plots for a large dataset I have (320269 observations of 170 variables).
For this I am using the following code to generate the boxplots:
nm <- names(data)
for (i in 1:(ncol(data)-1)){
print(ggplot(data,aes(as.factor(data$Month),data[c(i)],color=as.factor(data$Month),aes_string("Month",nm[i])))
+ geom_boxplot(outlier.colour="black",outlier.shape=16,outlier.size=1,notch=FALSE))}
The graphs are printed in pdf and the boxplot itself comes out correctly, but something goes wrong with the axis labels.
No matter what I try, I get the x-axis label: as.factor(data$Month), and on the y-axis: data[c(i)], instead of "Month" on the x-axis and the actual column-names from the dataset on the y-axis.
What am I missing?
Your help is much appreciated.

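You can specify the x- and y-axis labels with + xlab() and + ylab(). Without them, ggplot uses the expression passed to aes() as the label, which is why you see as.factor(data$Month) and data[c(i)]: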
for (i in 1:(ncol(data)-1)){
print(ggplot(data,aes(as.factor(data$Month),data[c(i)],color=as.factor(data$Month)))
+ geom_boxplot(outlier.colour="black",outlier.shape=16,outlier.size=1,notch=FALSE)
+ xlab("Month")
+ ylab(colnames(data)[i])
)
}
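As an alternative sketch (not part of the original answer), the loop can refer to columns by name through the .data pronoun, which is available in ggplot2 3.0.0 and later, and set both labels with labs(). The small data frame and its column names temp and ozone are made-up stand-ins for the real dataset.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: two measurement columns plus Month
set.seed(1)
data <- data.frame(
  temp  = rnorm(60),
  ozone = rnorm(60),
  Month = factor(rep(5:7, each = 20))
)

# Every column except Month gets its own boxplot
value_cols <- setdiff(names(data), "Month")

plots <- lapply(value_cols, function(col) {
  ggplot(data, aes(x = Month, y = .data[[col]], colour = Month)) +
    geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
    labs(x = "Month", y = col)  # label the y-axis with the column name
})
names(plots) <- value_cols
```

To write them to a PDF, open the device with pdf(), print each element of plots in a loop, and close it with dev.off().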

Related

Barchart with ggplot 2 y axis labels

I have a little problem with a ggplot barchart.
I wanted to make a barchart with ggplot2 in order to compare the Svolumes of my 4 stocks over a period of a few months.
I have two problems:
The first one is that my y axis is wrong. My graph/data seems correct, but the y axis doesn't "follow": I expected it to show a different scale. I would like it to show the totals of my Svolume values, but I think it is showing the individual Svolume values instead. I don't know how to explain it, but I would like a scale that covers all of my data on the graph, like 10, 20, etc., up to my highest sum of Svolumes.
Here is my code:
Date=c(rep(data$date))
Subject=c(rep(data$subject))
Svolume=c(data$svolume)
Data=data.frame(Date,Subject,Svolume)
Data=ddply(Data, .(Date),transform,pos=cumsum(as.numeric(Svolume))-(0.5*(as.numeric(Svolume))))
ggplot(Data, aes(x=Date, y=Svolume))+
geom_bar(aes(fill=Subject),stat="identity")+
geom_text(aes(label=Svolume,y=pos),size=3)
and here is my plot:
I got help from the question here
Finally, how could I make the same plot for each month, please? I don't know how to get the values per month in order to have a more readable barchart, as we can't read anything here...
If you have other ideas for me, I would be very glad to take any ideas and advice! Maybe the same with a line chart would be more readable...? Or maybe the same barchart for each stock? (I don't know how to get the values per stock either...)
I just found how to do it with lines.... but once again my y axis is wrong, and it's not very readable....
Thanks for your help !! :)
Try adding the following line right before your ggplot call. It looks like your y-axis variable is stored as character rather than numeric.
[edit] Incorporated @user20650's comment: convert with as.character() first, then to numeric.
Data$Svolume <- as.numeric(as.character(Data$Svolume))
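A quick illustration of why the as.character() step matters, assuming Svolume was read in as a factor (the toy values below are made up): calling as.numeric() directly on a factor returns the internal level codes, not the numbers you see printed.

```r
# Toy factor standing in for Data$Svolume (values are made up)
f <- factor(c("200", "10", "35"))

codes  <- as.numeric(f)                # level codes: 2 1 3
values <- as.numeric(as.character(f))  # actual values: 200 10 35

print(codes)
print(values)
```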
To produce the same plot for each month, you can add the month variable first: Data$Month <- month(as.Date(Data$Date)) (month() comes from the lubridate package). Then add a facet to your ggplot object.
ggplot(Data, aes(x=Date, y=Svolume)) +
...
+ facet_wrap(~ Month)
For example, your bar chart code will be:
Data$Svolume <- as.numeric(as.character(Data$Svolume))
library(lubridate) # provides month()
Data$Month <- month(as.Date(Data$Date))
ggplot(Data, aes(x=Date, y=Svolume)) +
geom_bar(aes(fill=Subject),stat="identity") +
geom_text(aes(label=Svolume,y=pos),size=3) +
facet_wrap(~ Month)
and your Line chart code will be:
Data$Svolume <- as.numeric(as.character(Data$Svolume))
library(lubridate) # provides month()
Data$Month <- month(as.Date(Data$Date))
ggplot(Data, aes(x=Date, y=Svolume, colour=Subject)) +
geom_line() +
facet_wrap(~ Month)
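If you would rather not depend on an extra package for the month, a minimal sketch with base R's format() looks like this. The four rows of data are invented for illustration, and using a "%Y-%m" label (so that the same month in different years stays separate) is an assumption about what you want.

```r
library(ggplot2)

# Made-up data with the same column names as in the question
Data <- data.frame(
  Date    = as.Date(c("2015-01-05", "2015-01-20", "2015-02-03", "2015-02-17")),
  Subject = c("A", "B", "A", "B"),
  Svolume = c(10, 20, 15, 5)
)

# Year-month label built with base R only
Data$Month <- format(Data$Date, "%Y-%m")

p <- ggplot(Data, aes(x = Date, y = Svolume, fill = Subject)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Month, scales = "free_x")  # one panel per month
```

With scales = "free_x" each panel only spans the dates of its own month, which tends to be easier to read than a shared x-axis.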

Plotting each column of a dataframe as one line using ggplot

The whole dataset describes a module (or cluster if you prefer).
In order to reproduce the example, the dataset is available at:
https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0
(54kb file)
You can read as:
test_example <- read.table(file='example_dataset.txt')
What I would like to have in my plot is this
On the plot, the x-axis is my Timepoints column, and the y-axis are the columns on the dataset, except for the last 3 columns. Then I used facet_wrap() to group by the ConditionID column.
This is exactly what I want, but the way I achieved this was with the following code:
plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...
As you can see it is not very automated. I thought about putting in a loop, like
columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap( ~ ConditionID, ncol=6) )
That doesn't work.
I found this topic
Use for loop to plot multiple lines in single plot with ggplot2 which corresponds to my problem.
I tried the solution given with the melt() function.
The problem is that when I use melt on my dataset, I lose information of the Timepoints column to plot as my x-axis. This is how I did:
data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)
I tried using aggregate
aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)
Now with aggdata at least I have the information on how many Timepoints for each ConditionID I have, but I don't know how to proceed from here and combine this on ggplot.
Can anyone suggest an approach?
I know I could use the ugly solution of creating new datasets on a loop with rbind(also given in that link), but I don't wanna do that, as it sounds really inefficient. I want to learn the right way.
Thanks
You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
geom_line(aes(group=paste0(variable, InModule)))
p
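To make the effect of id.vars concrete, here is a small sketch on an invented miniature of the dataset. It assumes melt() comes from the reshape2 package, and the column names g1 and g2 are hypothetical; only Timepoints, InModule and ConditionID are taken from the question.

```r
library(reshape2)
library(ggplot2)

# Hypothetical miniature: two measured columns plus the three id columns
dataset <- data.frame(
  g1          = c(1, 2, 3, 2, 3, 4),
  g2          = c(2, 1, 2, 3, 2, 3),
  Timepoints  = rep(1:3, times = 2),
  InModule    = "yes",
  ConditionID = rep(c("c1", "c2"), each = 3)
)

# The id columns are repeated for every measured column, so Timepoints survives
melted <- melt(dataset, id.vars = c("Timepoints", "InModule", "ConditionID"))

p <- ggplot(melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = interaction(variable, InModule))) +  # one line per original column
  facet_wrap(~ ConditionID)
```

Here melted has 12 rows (6 original rows times 2 measured columns), with the former column names in variable and the measurements in value.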

Problems making a graphic in ggplot

I am working with ggplot. I want to design a graphic with two continuous variables, and I would like to get a graphic like this:
Where x and y are the continuous variables. My problem is that I can't get it to show circles on the line of the plot. I would like the plot to have a circle for each pair of observations from the continuous variables. For example, the attached graphic has a circle for the pairs (1,1), (2,2) and (3,3). Is it possible to get this? (The colour of the line doesn't matter.)
# dummy data
dat <- data.frame(x = 1:5, y = 1:5)
ggplot(dat, aes(x,y,color=x)) +
geom_line(size=3) +
geom_point(size=10) +
scale_colour_continuous(low="blue",high="red")
Playing with low/high will change the colours.
In general, to remove the legend, use + theme(legend.position="none")

Different Plottypes in facet_grid

Once again I'm confronted with a complicated ggplot. I want to plot different plot types within one plot using facet_grid.
I hope I can make my point clear using the following example:
I want to produce a plot similar to the first picture but the upper plot should look like the second picture.
I already found the trick using the subset function but I can't add vertical lines to only one plot let alone two or three (or specify the color).
CODE:
a <- rnorm(100)
b <- rnorm(100,8,1)
c <- rep(c(0,1),50)
dfr <- data.frame(a=a,b=b,c=c,d=seq(1:100))
dfr_melt <- melt(dfr,id.vars="d")
#I want only two grids, not three
ggplot(dfr_melt,aes(x=d,y=value)) + facet_grid(variable~.,scales="free")+
geom_line(subset=.(variable=="a")) + geom_line(subset=.(variable=="b"))
#Upper plot should look like this
ggplot(dfr,aes(x=d,y=a)) + geom_line() + geom_line(aes(y=c,color="c"))+
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
If I understand your question correctly, you just need to add a variable column to dfr in order to allow the faceting to work:
dfr$variable = "a"
ggplot(subset(dfr_melt, variable=="a"),aes(x=d,y=value)) +
facet_grid(variable~.,scales="free")+
geom_line(data=subset(dfr_melt,variable=="a")) +
geom_line(data=subset(dfr_melt, variable=="b")) +
geom_line(data=dfr, aes(y=c, colour=factor(c))) +
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
Notice that my plot doesn't have the zig-zag line. This is because I changed:
#This is almost certainly not what you want
geom_line(data=dfr, aes(y=c, colour="c"))
to
#I made c a factor since it only takes the values 0 or 1
geom_line(data=dfr, aes(y=c, colour=factor(c)))
##Alternatively, you could have
geom_line(data=dfr, aes(y=c), colour="red") #or
geom_line(data=dfr, aes(y=c, colour=c))
To my knowledge, you can't put multiple plot types in a single plot using facet_grid(). Your two options, as far as I can see, are
to put empty data in the first facet, so the lines are 'there' but not displayed, or
to combine multiple plots into one using viewports.
I think the second solution is more general, so that's what I did:
#name each of your plots
p2 <- ggplot(subset(dfr_melt, variable=="a"),aes(x=d,y=value)) + facet_grid(variable~.,scales="free")+
geom_line(subset=.(variable=="a")) + geom_line(subset=.(variable=="b"))
#Upper plot should look like this
p1 <- ggplot(dfr,aes(x=d,y=a)) + geom_line() + geom_line(aes(y=c,color="c"))+
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
#From Wickham ggplot2, p154
vplayout <- function(x,y) {
viewport(layout.pos.row=x, layout.pos.col=y)
}
require(grid)
png("myplot.png", width = 600, height = 300) #or use a different device, e.g. quartz for onscreen display on a mac
grid.newpage()
pushViewport(viewport(layout=grid.layout(2, 1)))
print(p1, vp=vplayout(1, 1))
print(p2, vp=vplayout(2, 1))
dev.off()
You might need to fiddle a bit to get them to line up exactly right. Turning off the faceting on the upper plot, and moving the legend on the lower plot to the bottom, should do the trick.

Adding individual lines to facet_wrapped plot

I'm trying to use facet_wrap to plot a bunch of datasets as scatter plots, each with an individual line indicating when a specific event happened. However, I haven't been able to get the lines to show up individually in each plot; instead, they all show up in all of the plots. After melting, the data looks like:
names(data) = c("Date","ID", "event_date", "variable", "value")
where I want each plot to be a scatter plot of value ~ Date, and each plot divided up by ID with a vertical line appearing at each "event_date" that shows up for each ID. My best efforts have gotten me to:
p <- qplot(Date, value, data=dat, colour=variable)
p <- p + geom_vline(xintercept=as.numeric(dat$event_date))
p + facet_wrap(~ID)
Which works perfectly except for all of the vertical lines showing up in every subplot. Any suggestions? Reading through the documentation hasn't gotten me anywhere yet.
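ggplot(dat, aes(Date,value))+ geom_point() + geom_vline(data=dat,aes(xintercept=as.numeric(event_date))) + facet_wrap(~ID)
This is how I do the same thing using facet_grid(). I'm pretty sure it will work for facet_wrap as well. The difference from your code is that xintercept is mapped inside aes() with a data argument, so the lines are split by ID like everything else.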
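A variation on the same idea, as a sketch with invented data: keep the events in their own small data frame with one row per ID and event. Because that data frame carries the faceting variable ID, each line is drawn only in its own panel. The events data frame and all values below are assumptions for illustration, and event_date is mapped as a Date rather than converted to numeric.

```r
library(ggplot2)

# Made-up measurements for two IDs over ten days
dat <- data.frame(
  Date  = as.Date("2020-01-01") + rep(0:9, times = 2),
  ID    = rep(c("a", "b"), each = 10),
  value = c(1:10, 10:1)
)

# Hypothetical events table: one row per ID and event
events <- data.frame(
  ID         = c("a", "b"),
  event_date = as.Date(c("2020-01-03", "2020-01-08"))
)

p <- ggplot(dat, aes(Date, value)) +
  geom_point() +
  geom_vline(data = events, aes(xintercept = event_date), linetype = "dashed") +
  facet_wrap(~ ID)  # events$ID routes each line to its own panel
```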
