R statistical programming - r

I am trying to write R code that plots a histogram for each column and saves each histogram to a separate file.
I have a data set "Dummy" and I want to plot one histogram per column, so there will be 100 histogram plots in total.
I have the following R code that draws each histogram:
library(ggplot2)
i<-1
for(i in 1:100)
{
jpeg(file="d:/R Data/hist.jpeg", sep=",")
hist(Dummy$colnames<-1, ylab= "Score",ylim=c(0,3),col=c("blue"));
dev.off()
i++
if(i>100)
break()
}

As a start, let's make your for loop more idiomatic R by taking out the lines that try to change i; the for loop will do that for you.
We'll also include a file= value that changes with each loop run.
for(i in 1:100)
{
jpeg(file = paste0("d:/R Data/hist", i, ".jpeg"))
hist(Dummy[[i]], ylab = "Score", ylim = c(0, 3), col = "blue")
dev.off()
}
Now we just need to decide what you want to plot. Will each plot be different? How will each plot extract the data it needs?
EDIT: I've taken a stab at what you're trying to do. Are you trying to take each of 100 columns from the Dummy dataset? If so, Dummy[[i]] should achieve that (or Dummy[,i] if Dummy is a matrix).
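If the goal is one file per column, naming each file after its column can make the output easier to match back to the data. Here is a minimal sketch, assuming Dummy is a data frame. The small random Dummy and the temporary output folder below are stand-ins made up for illustration, not your actual data or path:

```r
# Hypothetical stand-in for Dummy: 5 numeric columns instead of 100
set.seed(1)
Dummy <- as.data.frame(matrix(rnorm(50), ncol = 5))

# Hypothetical output folder; swap in your own directory
out_dir <- file.path(tempdir(), "hists")
dir.create(out_dir, showWarnings = FALSE)

for (nm in names(Dummy)) {
  # One JPEG per column, named after the column
  jpeg(filename = file.path(out_dir, paste0("hist_", nm, ".jpeg")))
  hist(Dummy[[nm]], main = nm, xlab = "Score", col = "blue")
  dev.off()
}

# List what was written
saved <- list.files(out_dir, pattern = "\\.jpeg$")
```

Looping over names(Dummy) rather than 1:100 also means the loop keeps working if the number of columns changes.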

Related

plot density of multiple csv files of different size in R

I have multiple csv files, each with a single column.
I want to read them and plot their density distribution in a single plot.
Can anyone help me?
There are answers elsewhere about reading multiple csv files, so I will mainly concentrate on the density plotting part. Since you did not provide any data, I will use the built-in iris data to create some example files. The first step is to make a reproducible example. I am assuming that you already have the data on disk and have a list of the file names.
## Create some test data
FileNames = paste(names(iris[,1:4]), ".csv", sep="")
for(i in 1:4) {
write.csv(iris[,i], FileNames[i], row.names=FALSE)
}
So, on to the density plots. There is one small sticky point. Each of the different density plots will cover a different range of x and y values. If you want them all in one plot, you will need to leave enough room in your plot to hold them all. The code below first computes that range, then makes the plots.
## Read in all of the data from csv (first column of each file)
DataList = list()
for(i in seq_along(FileNames)) {
DataList[[i]] = read.csv(FileNames[i], header=TRUE)[[1]]
}
## Find the range that we will need to include all plots
## (na.rm=TRUE here matches the na.rm=TRUE passed to density below)
XRange = range(DataList[[1]], na.rm=TRUE)
YRange = c(0,0)
for(i in seq_along(DataList)) {
Rx = range(DataList[[i]], na.rm=TRUE)
XRange[1] = min(XRange[1], Rx[1])
XRange[2] = max(XRange[2], Rx[2])
YRange[2] = max(density(DataList[[i]], na.rm=TRUE)$y, YRange[2])
}
## Now make all of the plots
plot(density(DataList[[1]], na.rm=TRUE), xlim=XRange, ylim=YRange,
xlab=NA, ylab=NA, main="Density Plots")
for(i in seq_along(DataList)) {
lines(density(DataList[[i]], na.rm=TRUE), col=i)
}
## One legend colour per file, however many files there are
legend("topright", legend=FileNames, lty=1, col=seq_along(FileNames), bty='n')
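One detail worth knowing: density() extends each curve a little beyond the minimum and maximum of the data (by a multiple of the bandwidth), so a range taken from the raw data can clip the tails slightly. A possible variation, sketched here on the in-memory iris columns rather than the csv files, computes each density once and takes the ranges from the density objects themselves. The name DensList is made up for this sketch:

```r
# Use the iris columns directly as a stand-in for the data read from csv
DataList <- as.list(iris[, 1:4])

# Compute each density once and reuse it
DensList <- lapply(DataList, density, na.rm = TRUE)

# Ranges taken from the curves, not from the raw data
XRange <- range(unlist(lapply(DensList, function(d) d$x)))
YRange <- c(0, max(unlist(lapply(DensList, function(d) d$y))))

plot(DensList[[1]], xlim = XRange, ylim = YRange,
     xlab = NA, ylab = NA, main = "Density Plots")
for (i in seq_along(DensList)) {
  lines(DensList[[i]], col = i)
}
legend("topright", legend = names(DataList), lty = 1,
       col = seq_along(DensList), bty = "n")
```

Whether the extra margin matters depends on the data; for most plots the difference is only visible at the very edges of the curves.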

Errors in R Histogram

Can anyone understand why this block of code isn't producing a histogram? Here is the code:
incremental <- c()
for (i in 1:1000) {
set.seed(42)
avg_2 = mean(runif(100))
incremental <- rbind(incremental, c(avg_2))
}
incremental <- as.numeric(incremental)
hist(incremental, main = "Histogram of Averages From For Loop",
xlab = "Averages")
Don't worry about the set.seed; it is part of the exercise. All the data points will be the same, but nothing shows up on the histogram. Why is this so?
Actually, you are just looking at a plot with one big bar. It's very hard for R (or anyone) to guess where to create breaks if you only observe one value. Maybe you want something like this:
hist(incremental, main = "Histogram of Averages From For Loop",
xlab = "Averages",
breaks=seq(0,1, length.out=10))
This tells hist() to use 10 evenly spaced break points (that is, 9 bins) covering the range from 0 to 1.
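To see why there is only one value to plot, it may help to compare the loop from the question with a version that sets the seed once before the loop. The question says the seed placement is part of the exercise, so this is only a contrast for illustration; the names same and varied are made up here:

```r
# Seed reset inside the loop: every iteration draws the same 100 numbers
same <- numeric(1000)
for (i in seq_along(same)) {
  set.seed(42)
  same[i] <- mean(runif(100))
}

# Seed set once before the loop: each iteration draws new numbers
set.seed(42)
varied <- numeric(1000)
for (i in seq_along(varied)) {
  varied[i] <- mean(runif(100))
}

# Count the distinct averages in each case
n_same <- length(unique(same))
n_varied <- length(unique(varied))

# With real variation, the default breaks work without extra arguments
hist(varied, main = "Histogram of Averages From For Loop",
     xlab = "Averages")
```

Preallocating with numeric(1000) also avoids growing the vector with rbind on every iteration.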

Prevent a plot from being overwritten in a for loop

I am trying to create three different plots in a for loop and then plot them together in the same graph.
I know that some questions on this topic have already been asked, but I do not know what I am doing wrong. Why is my plot being overwritten?
I tried both suggested solutions (creating a list or using the assign function), and I do not know why my plot gets overwritten at the end of the loop.
So, the first solution is to create a list:
library(gridExtra)
library(ggplot2)
out<-list()
for (i in c(1,2,4)){
print(i)
name= paste("WT.1",colnames(WT.1@meta.data[i]), sep=" ")
print(name)
out[[length(out) + 1]] <- qplot(NEW.1@meta.data[i],
geom="density",
main= name)
print(out[[i]])
}
grid.arrange(out[[1]], out[[2]], out[[3]], nrow = 2)
When I print the plot inside the loop, I get what I want...but of course they are not together.
First Plot
When I plot them all together at the end, I get the same plot for all of the three: the last Plot I did.
All together
This is the second option: assign function. I have exactly the same problem.
for ( i in c(1,2,4)) {
assign(paste("WT.1",colnames(WT.1@meta.data[i]),sep="."),
qplot(NEW.1@meta.data[i],geom="density",
main=paste0("WT.1",colnames(WT.1@meta.data[i]))))
}
You're missing a call to dev.off() inside the loop for each iteration. Reproducible code below:
library(gridExtra)
library(ggplot2)
out<-list()
for (i in c(1,2,3)){
print(i)
out[[i]] <- qplot(1:100, rnorm(100), colour = runif(100))
print(out[[i]])
dev.off()
}
grid.arrange(out[[1]], out[[2]], out[[3]], nrow = 2)
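Another possible explanation, not taken from the answer above: ggplot2 evaluates the plotted expression when the plot is drawn, not when it is created, so plots built in a loop around a changing i can all end up showing the data for the last i. A minimal sketch of one way around that, building each plot inside a function so it keeps its own column name. The meta data frame is a hypothetical stand-in for the metadata table in the question, and the code assumes a ggplot2 version that supports the .data pronoun:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for the metadata table in the question
set.seed(1)
meta <- data.frame(a = rnorm(100), b = runif(100),
                   c = rexp(100), d = rnorm(100, mean = 5))

# Columns 1, 2 and 4, referred to by name
cols <- names(meta)[c(1, 2, 4)]

# Each call to the function has its own nm, so each plot keeps its own column
out <- lapply(cols, function(nm) {
  ggplot(meta, aes(x = .data[[nm]])) +
    geom_density() +
    ggtitle(paste("WT.1", nm))
})
names(out) <- cols

# grid.arrange accepts a list of plots through its grobs argument
grid.arrange(grobs = out, nrow = 2)
```

Indexing the list by position in cols rather than by i also avoids the gap that out[[i]] leaves when i jumps from 2 to 4.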

Saving plots as R objects and displaying them in a grid

In the following reproducible example I try to create a function that builds a ggplot distribution plot and saves it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution with mean 10.
output1[1]
I get the same distribution as the second plot, not the first one.
If, however, I take the information returned by the function and reset it as global variables, it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens, I would like to know. It seems that in order to draw the intended distribution I have to reset the data frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
The answer to "Grid of multiple ggplot2 plots which have been made in a for loop" suggests using a list to contain the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
will produce the distribution with mean=100 again instead of mean=10,
and:
grid.arrange(p1,p2)
will produce the same error:
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same problems as before: redrawing depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and the result similarly can't be put inside a grid.
I've tried similar things like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't find it intuitive at all.
The only solution I can think of is to draw the plot as a png file, save it somewhere, and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in the question "How can I reference the local environment within a function, in R?". Inserting
localenv <- environment()
inside the function and referencing that in the ggplot call
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work, even with grid.arrange!
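For comparison, here is a sketch of another way to get self-contained plot objects, under a few assumptions: a ggplot2 version recent enough to provide the .data pronoun and after_stat(), and a function that returns the plot itself rather than a list. Referring to the column by name through .data means the plot does not need dat or var1 to exist outside the function. Note also that output1[1] with single brackets returns a one-element list rather than the plot, which is consistent with the "only 'grobs' allowed" error; output1[[1]] would extract the plot.

```r
library(ggplot2)
library(gridExtra)

# Sketch of the function from the question, rewritten to return only the plot
ggplothist <- function(dat, var1) {
  # Accept a column position as well as a column name
  if (is.numeric(var1)) {
    var1 <- names(dat)[var1]
  }
  ggplot(dat, aes(x = .data[[var1]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                   colour = "black", fill = "white") +
    xlab(var1)
}

set.seed(100)
df <- data.frame(x = rnorm(100, mean = 10), y = rep(1, 100))
df2 <- data.frame(x = rep(1, 1000), y = rnorm(1000, mean = 100))

p1 <- ggplothist(df, "x")
p2 <- ggplothist(df2, "y")

# Both plots keep their own data and can be arranged together
grid.arrange(p1, p2)
```

Each returned object carries its own data frame, so p1 can be redrawn after p2 has been created without resetting any global variables.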

Graphic of binary variable in R

I would like to plot a simple graphic. I have a data set with n rows and k columns, in which each row is a sequence of 0s and 1s. I would like to plot exactly this sequence for all rows.
Actually, I want to reproduce figure 24.1, p. 516, of Gelman and Hill's book (Data Analysis Using Regression and Multilevel/Hierarchical Models). I suspect that the graphic was made in LaTeX, but it seems quite ridiculous that I'm not able to replicate this simple graphic in R. The figure is something like this. As you can see from the link, the "ones" are replaced by "S" and the "zeros" by ".". It's a simple graphic, but it shows each individual response by time.
I would go with a formatted text output using sprintf. Much cleaner and simpler. If you still want a plot, you could go with the following:
Given matrix tbl containing your data:
tbl <- matrix(data=rep(0:1,25), nrow=5)
You can generate a plot as:
plot(1, 1, xlim=c(1,dim(tbl)[2]+.5), ylim=c(0.5,dim(tbl)[1]), type="n")
lapply(1:dim(tbl)[1], function(x) {
text(x=c(1:dim(tbl)[2]), y=rep(x,dim(tbl)[2]), labels=tbl[x,])
})
Using this as a base you can play around with the text and plot args to stylize the plot the way you wish.
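The sprintf route mentioned at the top could look something like the following. This is only a sketch: the "S" and "." symbols follow the description in the question, and the names sym, rows and lines_out are made up here.

```r
# Same example matrix as above: 5 rows, 10 columns of 0/1
tbl <- matrix(data = rep(0:1, 25), nrow = 5)

# Replace 1 with "S" and 0 with "."
sym <- ifelse(tbl == 1, "S", ".")

# Collapse each row into one string
rows <- apply(sym, 1, paste, collapse = " ")

# Prefix each row with a right-aligned row number
lines_out <- sprintf("%3d  %s", seq_len(nrow(tbl)), rows)

# Print one row per line
cat(lines_out, sep = "\n")
```

The same lines could be written to a file with writeLines(lines_out, "res.txt") if a text file is wanted instead of console output.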
Here are two possible solutions, based on fake data generated with this helper function:
generate.data <- function(rate=.3, dim=c(25,25)) {
tmp <- rep(".", prod(dim))
tmp[sample(1:prod(dim), ceiling(prod(dim)*rate))] <- "S"
m <- matrix(tmp, nr=dim[1], nc=dim[2])
return(m)
}
Text-based output
x <- generate.data()
rownames(x) <- colnames(x) <- 1:25
capture.output(as.table(x), file="res.txt")
The file res.txt includes a pretty-printed version of the console output; you can convert it to pdf using any txt to pdf converter (I use the one from PDFlib).
Image-based output
First, here is the plotting function I used:
make.table <- function(x, labels=NULL) {
# x = matrix
# labels = list of labels for x and y
coord.xy <- expand.grid(x=1:nrow(x), y=1:ncol(x))
opar <- par(mar=rep(1,4), las=1)
plot.new()
plot.window(xlim=c(0, ncol(x)), ylim=c(0, nrow(x)))
text(coord.xy$x, coord.xy$y, c(x), adj=c(0,1))
if (!is.null(labels)) {
mtext(labels[[1]], side=3, line=-1, at=seq(1, ncol(x)), cex=.8)
mtext(labels[[2]], side=2, line=-1, at=seq(1, nrow(x)), cex=.8, padj=1)
}
par(opar)
}
Then I call it as
make.table(x, list(1:25, 1:25))
and here is the result (save it as png, pdf, jpg, or whatever).
As far as I can see, this is a text table. I am wondering why you want to make it a graph? Anyway, two quick solutions are:
Make the text table (by programming or typing), take a screenshot of it, and embed the image into the plot.
Make a blank plot and put the text on the plot using R's "text" function. For more info on "text", refer to http://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_adtxt.html
