Multiple plots on same figure using a for loop

I have 2 data frames, mydf1 and mydf2
> mydf1
id a b c
1 1 2 10 2
2 2 3 11 4
3 3 5 12 6
4 4 7 13 8
5 5 8 14 10
> mydf2
id a b c
1 1 4 20 4
2 2 6 22 8
3 3 10 24 12
4 4 14 26 16
5 5 16 28 20
I would like to plot variables a, b and c against id (a sample graph is given below). I want similar graphs for variables b and c too, and I want to do it in a loop and then export each graph to a local folder. So I am using the following code:
for (i in 2:4) {
jpeg(paste("C:/Data/myplot",i,".jpg"))
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf)[i])
points(mydf2[,1],mydf2[,i],pch=2)
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
dev.off()
}
My problem is that I would like to get all three graphs (id vs a, id vs b and id vs c, each showing both mydf1 and mydf2) in one figure, with something like two graphs along the first row and the third one in the second row, with a legend. I tried the following:
jpeg("C:/Data/myplot.jpg")
par(mfrow=c(2,2))
for (i in 2:4) {
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf)[i])
points(mydf2[,1],mydf2[,i],pch=2)
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
dev.off()
}
But it didn't work. Any suggestions on how to do this?
P.S. This is a simplified version of my task. I actually have hundreds of columns, which is why I am using a loop.
Sample plot id vs a (mydf1 and mydf2) plotted on the same graph

It is unclear what you are trying to do. Do you want two plots, one for mydf1 and one for mydf2, or everything on one figure? If you want two panels, you should change to mfrow=c(2,1) instead of c(2,2), which currently makes 4 panels.
If you want them all on a single plot, then remove the par(mfrow... line.
Then within the plots, you are plotting the first series from mydf1 and the other two series from mydf2. Is that actually what you want?
Using base graphics, you should move your plot line outside the loop so it is done once, change the loop to start at 3, and then keep the points statements inside the loop. Alternatively, you could put an if statement inside the loop to see if it is the first time.
You also have a typo in the plot statement with mydf (no number).
And move your dev.off() outside the loop so it only closes the figure once.
Here is some code that generates a single-panel plot, and you should be able to modify it to work for your desired output...
jpeg("myplot.jpg")
# One y-range covering all plotted columns, so later series are not clipped
ymin<-min(mydf1[,2:4],mydf2[,2:4])
ymax<-max(mydf1[,2:4],mydf2[,2:4])
for (i in 2:4) {
if (i==2){
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf1)[i])
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
}
else{
points(mydf2[,1],mydf2[,i],pch=2)
}
}
dev.off()
EDIT: After your clarified question, I think the only problem is that dev.off() should be outside the loop. (I recommend PNG or PDF instead of JPEG for any plot worth presenting....)
png("myplot.png")
par(mfrow=c(2,2))
for (i in 2:4) {
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf1)[i])
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
points(mydf2[,1],mydf2[,i],pch=2)
}
dev.off()
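For the layout described in the question (two plots in the first row, the third in the second row with the legend), one possible variation is sketched below. It is an illustration only, not part of the original answer: the data frames are re-created from the question, the output path under tempdir() and the name out_file are assumptions, and the legend is drawn once in the otherwise empty fourth panel.

```r
# Data re-created from the question
mydf1 <- data.frame(id = 1:5, a = c(2, 3, 5, 7, 8), b = 10:14, c = seq(2, 10, by = 2))
mydf2 <- data.frame(id = 1:5, a = c(4, 6, 10, 14, 16), b = seq(20, 28, by = 2), c = seq(4, 20, by = 4))

out_file <- file.path(tempdir(), "myplot_grid.png")  # assumed output location
png(out_file, width = 800, height = 800)
par(mfrow = c(2, 2))                                 # 2 x 2 grid: three plots plus one spare panel

for (i in 2:4) {
  yrange <- range(mydf1[, i], mydf2[, i])            # common y-limits for both data frames
  plot(mydf1[, 1], mydf1[, i], ylim = yrange,
       xlab = "id", ylab = colnames(mydf1)[i])
  points(mydf2[, 1], mydf2[, i], pch = 2)
}

# Use the spare fourth panel for a single shared legend
plot.new()
legend("center", legend = c("mydf1", "mydf2"), pch = c(1, 2))

dev.off()                                            # close the device once, after the loop
```

With hundreds of columns a single figure cannot hold every panel legibly. In that case a file name containing a page counter, such as myplot%03d.png, makes png() write one file per page of panels.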

I'd do something like
mydf1$g <- 1
mydf2$g <- 2
d3 <- rbind(mydf1, mydf2)
library(reshape2)
d3 <- melt(d3, id.vars = c('id', 'g'))
library(ggplot2)
ggplot(d3, aes(x=id, y=value)) +
geom_point(aes(colour = as.factor(g), shape = variable))
or using facets
ggplot(d3, aes(x=id, y=value)) +
geom_point(aes(colour = as.factor(g))) +
facet_wrap(~variable)
to finally export it
ggsave(filename = file.path(tempdir(), 'myplot.png'),
plot = last_plot()
)

Related

Plot aligned barplots in the same graph

I am trying to use the R barplot function to plot the following array on the same graph:
ID 1 2 3 4 5 6 7 8
HeL 0 2 1 4 2 3 2 4
CaC 2 0 0 2 1 5 7 8
NIH 1 2 5 6 3 5 7 9
I would need to have the barplot of each row having its own y-axis, but the x-axis should be common for all rows. What I have achieved so far, is to read the matrix from the file "rna.tab" and then plot each row separately:
dat <- read.table ("rna.tab", row.names=1, header=TRUE)
barplot (as.matrix (dat[,1]))
barplot (as.matrix (dat[,2]))
barplot (as.matrix (dat[,3]))
but I didn't succeed in plotting them all together.
Thanks in advance-
Arturo
Is this what you are looking for? If it isn't could you please make a manual example of what you want and post the image?
par(mfrow = c(ncol(dat),1), mar = c(2.5,4,1,1))
apply(dat, 2, barplot, beside = TRUE)
par(mfrow = c(1,1))
The first par call says you want a grid of plots with as many rows as there are columns of dat and 1 column, and changes the margins of the plot to be appropriate. The apply function makes a barplot for each column of dat, and beside = TRUE puts the bars next to each other. The last par call resets the plotting grid to a single graph, so the next time you plot something you aren't just making a bunch of tiny plots.
Thanks Barker for the fix and sorry for taking so long to get back to you, but I was sick for almost one week.
Your code works great, the only thing is that, since I need to plot the rows and not the columns, it should be:
apply(dat, 1, barplot, beside = TRUE)
Sorry for not being clear about this point.
I have just one last question, if you don't mind. Usually my real life matrix is 6000*30. This means that I have to plot 30 rows.
Usually I save the image to disk:
png ("plot.png")
par(mfrow = c(ncol(dat),1), mar = c(2.5,4,1,1))
apply(dat, 1, barplot, beside = TRUE)
dev.off ()
When I do this, I get only the plot of the last 4 rows in the file "plot.png", instead of the plot of all rows. Also, since the x-axis is the same for all plots, would it be possible to draw it only at the end?
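One way to approach both follow-up points is sketched below. It is an illustration, not the answerer's code, and it has not been checked against the real 30-row data. It rests on an assumption about the cause: the grid is sized with ncol(dat) while one plot is drawn per row, so plots that do not fit start a new page, and each new page overwrites the single PNG file. The file name under tempdir(), the pixel sizes and the variable names m and out_file are also assumptions.

```r
# Example matrix from the question (one row per sample, columns 1..8)
dat <- read.table(text = "ID 1 2 3 4 5 6 7 8
HeL 0 2 1 4 2 3 2 4
CaC 2 0 0 2 1 5 7 8
NIH 1 2 5 6 3 5 7 9", row.names = 1, header = TRUE, check.names = FALSE)
m <- as.matrix(dat)

out_file <- file.path(tempdir(), "plot_rows.png")     # assumed output location
# Scale the image height with the number of rows so every panel fits on one page
png(out_file, width = 800, height = 200 * nrow(m))
par(mfrow = c(nrow(m), 1), mar = c(2.5, 4, 1, 1))     # one panel per ROW of the matrix

for (r in seq_len(nrow(m))) {
  barplot(m[r, ],
          ylab = rownames(m)[r],
          axisnames = (r == nrow(m)))                 # x-axis labels only on the last panel
}
dev.off()
```

A plain loop is used instead of apply() so that the row index is available for the y-axis label and for deciding which panel gets the x-axis labels.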

Scatterplot with R from text file with log scale

I have data saved in a text file with a couple of thousand lines. Each line has only one value, like this:
52312
2
3
4
5
7
9
4
5
3
The first value is always roughly 10,000 times bigger than all the other values.
I can read in the data with data<-read.table("data.txt")
When I just use plot(data) all the data have the same y-value, resulting in a line, where the x values just represent the values given from the data.
What I want, however, is that the x-value represents the linenumber and y-value the actual data. So for the above example my values would be (1,52312), (2,2), (3,3), (4,4), (5,5), (6,7), (7,9), (8,4), (9,5), (10,3).
Also, since the first value is way higher than all the other values, I'd like to use a log scale for the y-axis.
Sorry, very new to R.
set.seed(1000)
df = data.frame(a=c(9999999,sample(2:78,77,replace = F)))
plot(x=1:nrow(df), y=log(df$a))
i) set.seed(1000) makes sample() return the same random numbers each time you run this code, so the example is reproducible.
ii) Type ?sample in the R console for its documentation.
iii) Since you wanted the x-axis to be the line number, I create it with the ":" operator: 1:3 gives 1, 2, 3. Similarly, 1:nrow(df) creates an index whose length matches the number of rows in your data.
iv) For the log scale, simply apply log() to the values. Read more in ?plot about its parameters.
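If the goal is a logarithmic y-axis rather than log-transformed values, base plot() also accepts log = "y". A minimal sketch with the ten values from the question follows. The names values and line_no are assumptions, and the commented read.table line assumes a one-column file called data.txt as in the question.

```r
# The ten example values from the question
values <- c(52312, 2, 3, 4, 5, 7, 9, 4, 5, 3)
# With the real file, something like: values <- read.table("data.txt")[[1]]

line_no <- seq_along(values)          # 1, 2, ..., one index per line of the file

# log = "y" keeps the original units on the axis but spaces them logarithmically;
# all values must be strictly positive for this to work
plot(line_no, values, log = "y",
     xlab = "line number", ylab = "value (log scale)")
```

Compared with plotting log(values), the tick labels stay in the original units, which can be easier to read.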
Try this:
df
x y
1 1 52312
2 2 2
3 3 3
4 4 4
5 5 5
6 6 7
7 7 9
8 8 4
9 9 5
10 10 3
library(ggplot2)
ggplot(df, aes(x, y)) + geom_point(size=2) + scale_y_log10()

Creating stacked barplots in R using different variables

I am a novice R user, hence the question. I refer to the solution on creating stacked barplots from R programming: creating a stacked bar graph, with variable colors for each stacked bar.
My issue is slightly different. I have data with 4 columns. The last column is the sum of the first 3 columns. I want to plot a bar chart in which 1) the height of each bar is the summed total (i.e. the 4th column), and 2) each bar is split by the relative contributions of the three columns.
I was hoping someone could help.
Regards,
Bernard
If I understood correctly, this may do the trick.
The following code works well for this example df data frame:
df <- read.table(text = "a b c sum
1 9 8 18
3 6 2 11
1 5 4 10
23 4 5 32
5 12 3 20
2 24 1 27
1 2 4 7", header = TRUE)
As you don't want to plot a count of observations but the actual values in your data frame, you need to use geom_bar(stat = "identity") in ggplot2. Some data manipulation is necessary too. And you don't need the sum column: ggplot stacks the values, so the height of each bar is the sum.
library(reshape2) # for melt()
library(ggplot2)
df <- df[,-ncol(df)] # drop the last column (assumed to be the sum)
df$event <- seq.int(nrow(df)) # row index: marks which values came from the same row
df <- melt(df, id.vars='event') # reshape to long format so ggplot can read it
px = ggplot(df, aes(x = event, y = value, fill = variable)) + geom_bar(stat = "identity")
print (px)
This code generates the plot below.
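For comparison, base graphics can draw the same kind of stacked bars without reshaping. This is an alternative sketch rather than the method above. barplot() stacks the rows of a matrix within each bar, so the three component columns are transposed first. The names parts and totals are assumptions introduced for illustration.

```r
# Example data from the answer above
df <- read.table(text = "a b c sum
1 9 8 18
3 6 2 11
1 5 4 10
23 4 5 32
5 12 3 20
2 24 1 27
1 2 4 7", header = TRUE)

parts  <- t(as.matrix(df[, c("a", "b", "c")]))  # 3 x 7 matrix: one column per bar
totals <- colSums(parts)                        # total height of each stacked bar

# Each bar is split into the a, b and c contributions
barplot(parts, names.arg = seq_len(ncol(parts)),
        legend.text = rownames(parts),
        xlab = "event", ylab = "value")
```

Comparing totals with df$sum is a quick check that the stacked heights match the precomputed sum column.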

Histograms in R with a "more" category, similar to MS Excel

Consider the following frequency data:
> table(income)
income
3 5 6 7 8 5000
2 7 2 2 2 1
When I type hist(income) I get the following histogram:
So as you can see, most income values are concentrated around 5, and the one value that is far from the others makes the histogram not look very good. MS Excel can treat the 5000 value as a separate category, so the data would look like this instead:
> table(income)
income
3 5 6 7 8 more
2 7 2 2 2 1
So plotting this as a histogram would look much better, so you can see the frequency within a shorter range:
Is there any way to do this, either with the hist() function or with other functions from lattice or ggplot2? However, I don't want to overwrite the values that exceed a certain threshold, so that I don't lose any information.
Thanks a lot!
Data generation:
income <- c(rep(3,2), rep(5,7), rep(6,2), rep(7,2), rep(8,2), 5000)
Function for preparing data for plotting:
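nice.data <- function(x, threshold=10){
# Return a copy of x as a factor; values above the threshold become "More"
lev <- c(sort(unique(x[x<=threshold])), "More")
x[x>threshold] <- "More"
factor(x, levels=lev)
}
Plotting:
library(ggplot2)
# The labels are discrete, so use geom_bar (it counts each level); geom_histogram needs a continuous x
ggplot() + geom_bar(aes(x=nice.data(income))) + xlab("Income")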
Result:
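The same grouping can be cross-checked in base R without ggplot2. The sketch below is an illustration under stated assumptions: the threshold of 10 is taken from the function above, the names labels, lvls and counts are hypothetical, and the original income vector is left untouched so that no information is lost.

```r
income <- c(rep(3, 2), rep(5, 7), rep(6, 2), rep(7, 2), rep(8, 2), 5000)
threshold <- 10   # assumed cut-off, as in the function above

# Labels for plotting only; the income vector itself is left unchanged
labels <- ifelse(income > threshold, "more", as.character(income))

# Keep the numeric order of the small values and put "more" last
lvls <- c(as.character(sort(unique(income[income <= threshold]))), "more")
counts <- table(factor(labels, levels = lvls))

barplot(counts, xlab = "Income", ylab = "Frequency")
```

Fixing the factor levels explicitly matters once values such as 10 or 12 appear, because plain alphabetical sorting would place "10" before "3".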

Combining ggplot with ddply

I have sample data test.data as follows.
income expend id
9142.7 1576.2 1
23648.75 2595 2
9014.25 156 1
4670.4 604.4 3
6691.4 3654.4 3
14425.2 66 2
8563.45 1976.2 2
2392 6 1
7915.95 619.2 3
4424.2 504.2 2
I first use ddply to get the mean income and expend for each id
library(plyr)
group<-ddply(test.data, .(id), summarize, income.mean=mean(income),expend.mean=mean(expend))
Now, I use the qplot function from ggplot2 to plot income.mean and expend.mean by id.
library (ggplot2)
plot.income<-qplot(id,income.mean,data=group)
plot.expend<-qplot(id,expend.mean,data=group)
While the above code runs without any error, I am looking for an efficient way to combine the qplot call with ddply, or vice versa. Also, if I need to combine both of these plots, how do I do that?
Thanks .
I think what you're trying to get at is going to require switching from the 'qplot' function to the 'ggplot' function. Including the graphing functions inside your 'ddply' function is not going to be very pretty, and vice versa. You're better leaving them separate, so I'm going to just focus on combining the graphs. There are two good (in my opinion) ways to do this:
Option 1: Just do both plots as separate geometries on the same 'ggplot' object. This isn't too hard to do, and works like this:
ggplot(group) + geom_point(aes(x=id, y=income.mean), colour="red") + geom_point(aes(x=id, y=expend.mean), colour="blue")
This is a fast option and gets the job done with minimal computation. However, it requires that you specify a new geometry for each column. In your sample data, this isn't an issue, but in many cases, you want to do this with code, instead of doing it by hand.
Option 2: Reshape your data to combine both sets inside one plot. Then we can specify groupings by coloring by the variable:
library(reshape2)
plot_Data <- melt(group, id="id")
# Output of plot_Data
# id variable value
# 1 1 income.mean 6849.650
# 2 2 income.mean 12765.400
# 3 3 income.mean 6425.917
# 4 1 expend.mean 579.400
# 5 2 expend.mean 1285.350
# 6 3 expend.mean 1626.000
ggplot(plot_Data, aes(x=id, y=value, col=variable)) + geom_point()
The disadvantage of this method is that we are doing a lot more computation, so large complicated data frames may become slow to process. However, the advantage (and this is huge) is that we don't have to know what columns existed in the data frame we are plotting. Everything is sorted, colored, and plotted without our intervention, so we can use this flexibly for just about anything.
You should be able to adjust from here to suit your needs.
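To make the reshaping step concrete, here is a base R sketch that builds the same group means and long format by hand. It is a stand-in for illustration only: aggregate() replaces the ddply step, the long data frame is assembled manually instead of with melt(), and the red and blue colours are arbitrary choices.

```r
test.data <- read.table(text = "income expend id
9142.7 1576.2 1
23648.75 2595 2
9014.25 156 1
4670.4 604.4 3
6691.4 3654.4 3
14425.2 66 2
8563.45 1976.2 2
2392 6 1
7915.95 619.2 3
4424.2 504.2 2", header = TRUE)

# Group means per id (base R stand-in for the ddply step)
group <- aggregate(cbind(income, expend) ~ id, data = test.data, FUN = mean)
names(group) <- c("id", "income.mean", "expend.mean")

# Long format by hand: one row per (id, variable) pair, as melt() would produce
plot_data <- data.frame(
  id       = rep(group$id, times = 2),
  variable = rep(c("income.mean", "expend.mean"), each = nrow(group)),
  value    = c(group$income.mean, group$expend.mean)
)

# One colour per variable
with(plot_data, plot(id, value, pch = 19,
                     col = ifelse(variable == "income.mean", "red", "blue")))
```

The hand-built plot_data has the same three columns as the melt() output shown above, which is what lets a single plotting call handle any number of summary columns.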
To combine both plots I had to throw in the reshape2 package to melt the data:
library(ggplot2)
library(plyr)
library(reshape2)
test.data <- read.table(text="income expend id
9142.7 1576.2 1
23648.75 2595 2
9014.25 156 1
4670.4 604.4 3
6691.4 3654.4 3
14425.2 66 2
8563.45 1976.2 2
2392 6 1
7915.95 619.2 3
4424.2 504.2 2", header=TRUE)
qplot(data=melt(ddply(test.data, .(id), colwise(mean)), id.vars="id"), x=id, y=value, colour=variable)
Well, your question is not very precise because we don't know exactly what you want to do. But here is a guess:
d <- read.table(textConnection("income expend id
9142.7 1576.2 1
23648.75 2595 2
9014.25 156 1
4670.4 604.4 3
6691.4 3654.4 3
14425.2 66 2
8563.45 1976.2 2
2392 6 1
7915.95 619.2 3
4424.2 504.2 2"), header=TRUE)
library(reshape2)
library(ggplot2)
d2 <- melt(d, id.vars="id")
# fun= replaces the deprecated fun.y= argument (ggplot2 3.3.0 and later)
ggplot(data=d2, aes(x=id,y=value)) + stat_summary(fun="mean", geom="bar") + facet_grid(.~variable)
Will give :
