R, graph of binomial distribution - r

I have to write own function to draw the density function of binomial distribution and hence draw
appropriate graph when n = 20 and p = 0.1,0.2,...,0.9. Also i need to comments on the graphs.
I tried this ;
graph <- function(n,p){
x <- dbinom(0:n,size=n,prob=p)
return(barplot(x,names.arg=0:n))
}
graph(20,0.1)
graph(20,0.2)
graph(20,0.3)
graph(20,0.4)
graph(20,0.5)
graph(20,0.6)
graph(20,0.7)
graph(20,0.8)
graph(20,0.9)
#OR
graph(20,scan())
My first question : is there any way so that i don't need to write down the line graph(20,p) several times except using scan()?
My second question :
I want to see the graph in one device or want to hit ENTER to see the next graph. I wrote
par(mfcol=c(2,5))
graph(20,0.1)
graph(20,0.2)
graph(20,0.3)
graph(20,0.4)
graph(20,0.5)
graph(20,0.6)
graph(20,0.7)
graph(20,0.8)
graph(20,0.9)
but the graph is too tiny. How can i present the graphs nicely with giving head line n=20 and p=the value which i used to draw the graph?[though it can be done by writing mtext() after calling the function graphbut doing so i have to write a similar line few times. So i want to do this including in function graph. ]
My last question :
About comment. The graphs are showing that as the probability of success ,p is increasing the graph is tending to right, that is , the graph is right skewed.
Is there any way to comment on the graph using program?

Here a job of mapply since you loop over 2 variables.
graph <- function(n,p){
x <- dbinom(0:n,size=n,prob=p)
barplot(x,names.arg=0:n,
main=sprintf(paste('bin. dist. ',n,p,sep=':')))
}
par(mfcol=c(2,5))
mapply(graph,20,seq(0.1,1,0.1))

Plotting base graphics is one of the times you often want to use a for loop. The reason is because most of the plotting functions return an object invisibly, but you're not interested in these; all you want is the side-effect of plotting. A loop ignores the returned obects, whereas the *apply family will waste effort collecting and returning them.
par(mfrow=c(2, 5))
for(p in seq(0.1, 1, len=10))
{
x <- dbinom(0:20, size=20, p=p)
barplot(x, names.arg=0:20, space=0)
}

Related

How to make multiple plots with a for loop?

I was experimenting with the waffle package in r, and was trying to use a for loop to make multiple plots at once but was not able to get my code to work. I have a dataset with values for each year of renewables,and since it is over 40 years of data, was looking for a simple way to plot these with a for loop rather than manyally year by year. What am I doing wrong?
I have it from 1:16 as an experiment to see if it would work, although in reality I would do it for all the years in my dataset.
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If I get your question correctly you want to plot all the 16 iterations in a same panel? You can parametrise your plot window to be divided into 16 smaller plots using par(mfrow = c(4,4)) (creating a 4 by 4 matrix and plotting into each cells recursively).
## Setting the graphical parameters
par(mfrow = c(4,4))
## Running the loop normally
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If you need more plots (e.g. 40) you can increase the numbers in the graphical parameters (e.g. par(mfrow = c(6,7))) but that will create really tiny plots. One solution is to do it in multiple loops (for(i in 1:16); for(i in 17:32); etc.)
UPDATE: The code simply wasn't plotting anything when i tried putting in anything above one value (ex. 1:16) or a letter, both in terms of separate plots or many in one plot window (which I think perhaps waffle does not support in the same way as regular plots). In the end, I managed by making it into a function, although I'm still not sure why my original method wouldn't work if this did. See the code that worked below. I also tweaked it a bit, adding ggsave for example.
#function
waffling <- function(x){
renperc<-islren$Value[x]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"), title="",
xlab=islren$TIME[x])
ggsave(file=paste0("plot_", x,".png"))}
for(i in 1:57){
waffling(i)
}

R - plotting points on the same plot in for loop

lambda <- runif(10,min=0,max=3)
mean(lambda)
for (i in 1:10){
N <- rpois(i,mean(lambda))
mean(N)
plot(i,mean(N))
}
Hi all, this is a really simple R code block I have here. I am basically trying to create a plot where I can see how the mean of the poisson distribution is changing as I increase its iteration using the rpois function. I would like to post these values (mean(N)) all on the same graph so I can see the change but I am not quite sure how to do that.
I have been googling a lot and I came across qqplot or so but I just started using R few days ago and I have having a lot of trouble.
Any insights would be helpful
You can use the points function once a plot has been called:
lambda <- runif(10,min=0,max=3)
mean(lambda)
## First plot
N <- rpois(1,mean(lambda))
plot(1,mean(N), xlim = c(1,10))
## Subsequent points
for (i in 2:10){
N <- rpois(i,mean(lambda))
points(i,mean(N))
}

How to only change parameters for "lower" plots in the ggpairs function from GGally package

I have the following example
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
Resulting in a really nice figure:
But my problem is that in the real dataset I have to many points whereby I would like to change the parameters for the point geom. I want to reduce the dot size and use a lower alpha value. I can however not doe this with the "param" option it applies to all plot - not just the lower one:
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
params=c(alpha=1/10),
axisLabels='show'
)
resulting in this plot:
Is there a way to apply parameters to only "lower" plots - or do I have to use the ability to create custom plots as suggested in the topic How to adjust figure settings in plotmatrix?
In advance - thanks!
There doesn't seem to be any elegant way to do it, but you can bodge it by writing a function to get back the existing subchart calls from the ggally_pairs() object and then squeezing the params in before the last bracket. [not very robust, it'll only work for if the graphs are already valid]
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
g<-ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
add_p<-function(g,i,params){
side=length(g$columns) # get number of cells per side
lapply(i,function(i){
s<-as.character(g$plots[i]) # get existing call as a template
l<-nchar(s)
p<-paste0(substr(s,1,l-1),",",params,")") # append params before last bracket
r<-i%/%side+1 # work out the position on the grid
c<-i%%side
array(c(p,r,c)) # return the sub-plot and position data
})
}
rep_cells<-c(4,7,8)
add_params<-"alpha=0.3, size=0.1, color='red'"
ggally_data<-g$data # makes sure that the internal parameter picks up your data (it always calls it's data 'ggally_data'
calls<-add_p(g,rep_cells,params=add_params) #call the function
for(i in 1:length(calls)){g<-putPlot(g,calls[[i]][1],as.numeric(calls[[i]][2]),as.numeric(calls[[i]][3]))}
g # call the plot

Trying to determine why my heatmap made using heatmap.2 and using breaks in R is not symmetrical

I am trying to cluster a protein dna interaction dataset, and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the data-set I am using after it is run through pearson:DataSet
Here is the complete process that I am following to generate these graphs: Generate a distance matrix using some correlation in my case pearson, then take that matrix and pass it to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is once I use breaks to help me control the color separation the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoff's for the sequences to make sure for instance one sequence does not end exactly where the other begins, but I am not able to solve this problem. I would like to use the breaks to help bring out the clusters more.
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigating I noticed was that after running my matrix through heatmap, or heatmap.2 the values were changing, for example the interaction taken from the provided data set of
Pacdh-2
and
pegg-2
gave a value of 0.0250313 before the matrix was sent to heatmap.
After that I looked at the matrix values using result$carpet and the values were then
-0.224333135
-1.09805379
for the two interactions
So then I decided to reorder the original matrix based on the dendrogram from the clustered matrix so that I was sure that the values would be the same. I used the following stack overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
I then used another program "matrix2png" to draw the heatmap:
I still have to play around with the colors but at least now the heatmap is symmetrical and clustered.
Looking into it even more the issue seems to be that I was running scale(matrix_a) when I change my code to just be mtscaled <- as.matrix(matrix_a) the result now looks symmetrical.
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another though which should return the full rand of red which tha above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) that was asking for a blue to red solution for a set of values from -1 to 3 with white at the center. This it the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am able only to draw points on a graph and i would like to have them as lines (line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
## firstpoint is the y value of the first point in the series
lastpt <- firstpoint
lastptind <- 1
addpoint <- function(nextpt, ...){
pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
points(pts, ... )
lastpt <<- nextpt
lastptind <<- lastptind + 1
}
return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0,y0, x1,y1
npups <-length(pupil_data$PupilLeft)
segments(1:(npups-1), pupil_data$PupilLeft[-npups], # the starting points
2:npups, pupil_data$PupilLeft[-1] ) # the ending points

Resources