I am trying to produce multiple plots using split.screen, and I need 7 plots on the page: one plotted on its own and the other 6 drawn repeatedly inside a for loop.
This is the code for a simulation I am carrying out. It runs, but I have two problems:
1. I am not sure which plot actually ends up on which screen, because I couldn't get the assigned title to show up on the bigger plot.
2. The plot shown on screen 1 is not the actual data: I have plotted the data separately and know what it should look like.
Simulating the data:
numpop = 2
N = 1250
nSNP = 5000
Fst = 0.001
omega = c(0.5, 0.5)
propnExtreme = 0.1
nsim = 10
Fst.obs = vector(length = nSNP)
pdiffs = vector(length = nSNP)
genomat = matrix(nrow = N, ncol = nSNP)
for (i in 1:nSNP){
  p = runif(1, 0.1, 0.9)
  alpha = p * (1 - Fst) / Fst
  beta = (1 - p) * (1 - Fst) / Fst
  ps = rbeta(numpop, shape1 = alpha, shape2 = beta)
  vars = var(ps)
  pdiffs[i] = max(ps) - min(ps)
  Fst.obs[i] = vars / (p * (1 - p))
  for (j in 1:numpop){
    ind1 = (j-1) * N * omega[j] + 1
    ind2 = j * N * omega[j]
    freqs = c(ps[j]^2, 2 * ps[j] * (1 - ps[j]), (1 - ps[j])^2)
    genomat[ind1:ind2, i] = sample(c(0, 1, 2), size = N*omega[j], replace = TRUE, prob = freqs)
  }
}
snpmeans = colMeans(genomat)
pi = (1 + colSums(genomat)) / (2 + 2*nrow(genomat))
stdmat = scale(genomat, center=snpmeans, scale=sqrt(pi*(1-pi)))
pr = prcomp(stdmat, center=F, scale=F)
Plotting:
get(getOption("device"))()
png(file="myplot.png", width=2000, height = 1200)
par(oma = c(0,0,3,0))
split.screen(c(1,2)) # split display into two screens
plot(pr$x,
col = c(rep("red", N*omega[1]), rep("blue", N*omega[2])),
main = "Whole genotype data")
split.screen(c(2, 3), screen = 2) # now split the second into 2x3
for(i in 1:8) ## 8=#of screens
{
  screen(i) # prepare screen i for output
  fA = 0.5
  fa = 1 - fA
  combined_SNP <- sample(c(0:2), N, prob = c(fA^2, 2*fA*fa, fa^2), replace = T)
  pheno_indep <- c()
  ## Phenotypes
  for (i in 1:length(combined_SNP)){
    if (combined_SNP[i] == '0') {
      pheno_indep <- c(pheno_indep, rnorm(1, mean = 0.07, sd = 1))
    } else if (combined_SNP[i] == '1') {
      pheno_indep <- c(pheno_indep, rnorm(1, mean = 0, sd = 1))
    } else {
      pheno_indep <- c(pheno_indep, rnorm(1, mean = -0.07, sd = 1))
    }
  }
  l <- 1:N
  combined_indep <- cbind(combined_SNP, pheno_indep, l)
  sorted_combined <- combined_indep[order(combined_indep[, 2]), ]
  ## eps data
  f = 0.1
  Nums = nrow(sorted_combined)
  keep <- c(1:(f*Nums), (Nums-(f*Nums)+1):Nums)
  epsdat <- c(rep("0", f*Nums), rep("1", f*Nums))
  EPS_dat <- as.factor(cbind(sorted_combined[keep, ], epsdat))
  dim(EPS_dat) <- c(length(keep), 4)
  #colnames(EPS_dat) <- c("Genotypes", "Phenotypes", "ID", "position")
  PC_EPS <- prcomp((genomat[EPS_dat[, 3], ]))
  plot(PC_EPS$x,
       col = c(rep("red", f*Nums), rep("blue", f*Nums)))
}
close.screen(all=TRUE)
dev.off()
Result:
I have spent a lot of time trying to figure this out, even with other approaches such as layout() and layout.show(). Thanks!
Is the following what you expect to be plotted? (I added a screen title to the small plots for illustration.)
When you split the screens, you should have gotten the following on your console:
> split.screen(c(1, 2))
[1] 1 2
# (code used to plot first chart on the left)
> split.screen(c(2, 3), screen = 2)
[1] 3 4 5 6 7 8
As described in the help file ?split.screen, this is a vector of the screen numbers for the newly-created screens. So your valid screen numbers are 1 (already plotted) and 3-8 (the 6 small screens).
As such, the next line doesn't work as expected, since you're now looping through screens 1-8 rather than screens 3-8.
# instead of
for(i in 1:8) ## 8=#of screens
# use this
for(i in 3:8) ## 8=#of screens
As a side note, you should also use different loop counters for nested loops. Your outer loop (over the 6 small plots) uses i as its counter, and the inner phenotype loop reuses i. Because the screen is selected at the start of each outer iteration, the code still works in this case, but in general it is best to keep the counters separate.
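To make this concrete, here is a minimal skeleton of the corrected screen handling (the small plots are reduced to placeholders here):
png(file = "myplot.png", width = 2000, height = 1200)
par(oma = c(0, 0, 3, 0))
split.screen(c(1, 2))               # returns 1 2
screen(1)                           # select the big screen explicitly
plot(pr$x, main = "Whole genotype data")
split.screen(c(2, 3), screen = 2)   # returns 3 4 5 6 7 8
for (scr in 3:8) {                  # loop over the six small screens only
  screen(scr)
  plot(rnorm(100), main = paste("screen", scr))   # placeholder plot
}
close.screen(all = TRUE)
dev.off()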
I have a Bayesian MCMC in R; the code is below:
RWM = function(Niter, Y, X){
  p = ncol(X)
  alpha = 0.7
  beta = matrix(0, ncol = 1, nrow = 3)
  beta = as.matrix(beta)
  sig_p = 0
  mu_p = beta
  C = diag(p)
  R = t(chol(C))
  lpi = posterior(beta, Y, X)
  OUT = matrix(NA, ncol = 3, nrow = Niter)
  for (j in 1:Niter){
    rr = rnorm(p)
    beta_p = beta + exp(sig_p) * as.vector(R %*% rr)
    lpi_p = posterior(beta_p, Y, X)
    A = exp(lpi_p - lpi)
    Acc = min(1, A)
    if (runif(1) <= Acc){
      beta = beta_p
      lpi = lpi_p
    }
    OUT[j, ] = beta
    sig_p = sig_p + (1/j^alpha) * (Acc - 0.3)
    mu_p = mu_p + (1/j) * (as.matrix(beta) - mu_p)
    bmu = as.matrix(beta - mu_p)
    C = C + (1/j) * (as.matrix(t(bmu) %*% bmu) - C)
  }
  return(OUT)
}
It looks as though the vector beta should update, with its three elements differing because of the rnorm draw. However, this is not what happens: in every row of the output, the 3 columns (one per element) are exactly the same. I have stepped through the function in the console several times, and in no case did the elements of beta come out identical.
For example: beta = [1, 2, 3] but the output = [1, 1, 1]
The MCMC iterates and does not get stuck, as the histogram shows a wide range of values in the output. It is just the sampled betas that are giving me the issue.
I'm just not understanding what is wrong with my code that prevents my vector beta from being added directly to the matrix OUT.
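For instance, running one proposal step by hand (a minimal check, leaving the posterior out) gives three distinct values, exactly as expected:
p = 3
beta = matrix(0, ncol = 1, nrow = p)
R = t(chol(diag(p)))
beta_p = beta + exp(0) * as.vector(R %*% rnorm(p))
str(beta_p)             # a 3 x 1 matrix; the three values differ
OUT = matrix(NA, ncol = 3, nrow = 1)
OUT[1, ] = beta_p
OUT                     # all three columns differ here as well
# One thing that may be worth checking: for a 3 x 1 matrix bmu, t(bmu) %*% bmu is
# 1 x 1 (a scalar), while bmu %*% t(bmu) is the 3 x 3 outer product usually used
# in adaptive covariance updates.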
I am relying on edge detection (as opposed to colour detection) to extract features from blood cells. The original image looks like:
I am using the R EBImage package to run a Sobel + low-pass filter to get to something like this:
library(EBImage)
library(data.table)
img <- readImage("6hr-007-DIC.tif")
#plot(img)
#print(img, short = T)
# 1. define filter for edge detection
hfilt <- matrix(c(1, 2, 1, 0, 0, 0, -1, -2, -1), nrow = 3) # sobel
# rotate horizontal filter to obtain vertical filter
vfilt <- t(hfilt)
# get horizontal and vertical edges
imgH <- filter2(img, hfilt, boundary="replicate")
imgV <- filter2(img, vfilt, boundary="replicate")
# combine edge pixel data to get overall edge data
hdata <- imageData(imgH)
vdata <- imageData(imgV)
edata <- sqrt(hdata^2 + vdata^2)
# transform edge data to image
imgE <- Image(edata)
#print(display(combine(img, imgH, imgV, imgE), method = "raster", all = T))
display(imgE, method = "raster", all = T)
# 2. Enhance edges with low pass filter
hfilt <- matrix(c(1, 1, 1, 1, 1, 1, 1, 1, 1), nrow = 3) # low pass
# rotate horizontal filter to obtain vertical filter
vfilt <- t(hfilt)
# get horizontal and vertical edges
imgH <- filter2(imgE, hfilt, boundary="replicate")
imgV <- filter2(imgE, vfilt, boundary="replicate")
# combine edge pixel data to get overall edge data
hdata <- imageData(imgH)
vdata <- imageData(imgV)
edata <- sqrt(hdata^2 + vdata^2)
# transform edge data to image
imgE <- Image(edata)
plot(imgE)
I would like to know if there are any methods to fill in the holes in the large rings (blood cells) so they are solid bodies a bit like:
(obviously this is not the same image but imagine that last image only started out with edges.)
I would then like to use something like the computeFeatures() function from the EBImage package (which, as far as I'm aware, only works on solid bodies).
EDIT: A little more code to extract the interior of objects with "connections" to the border. The additional code defines the convex hull of each segmented cell and creates a filled mask.
The short answer is that fillHull and floodFill may be helpful for filling cells that have well-defined borders.
The longer (edited) answer below suggests an approach with floodFill that might be useful. You did a great job extracting information from the low-contrast DIC images, but even more image processing might help, such as "flat-field correction" for noisy DIC images; the principle is described on Wikipedia, and even a simple implementation does wonders. The coding solution suggested here requires user interaction to select the cells, which is not especially robust; still, more image processing combined with code to locate cells could make it work automatically. In the end, the interiors of the cells are segmented and available for analysis with computeFeatures.
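As a rough illustration of the flat-field idea (a sketch only; the sigma value is arbitrary and needs tuning per image), dividing the image by a heavily blurred copy of itself removes much of the slowly varying background:
# Approximate flat-field correction: estimate the background with a very wide
# Gaussian blur, divide it out, then rescale the result to [0, 1]
bg  <- gblur(img, sigma = 50)
ffc <- img / (bg + 1e-6)                        # small offset avoids division by zero
ffc <- (ffc - min(ffc)) / (max(ffc) - min(ffc))
display(ffc, method = "raster")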
The code starts with the thresholded image (having trimmed the edges and converted to binary).
# Set up plots for 96 dpi images
library(EBImage)
dm <- dim(img2)/96
dev.new(width = dm[1], height = dm[2])
# Low pass filter with gblur and make binary
xb <- gblur(img2, 3)
xt <- thresh(xb, offset = 0.0001)
plot(xt) # thresh.jpg
# dev.print(jpeg, "thresh.jpg", width = dm[1], unit = "in", res = 96)
# Keep only "large" objects
xm <- bwlabel(xt)
FS <- computeFeatures.shape(xm)
sel <- which(FS[,"s.area"] < 800)
xe <- rmObjects(xm, sel)
# Make binary again and plot
xe <- thresh(xe)
plot(xe) # trimmed.jpg
# dev.print(jpeg, "trimmed.jpg", width = dm[1], unit = "in", res = 96)
# Choose cells with intact interiors
# This is done by hand here but with more pre-processing, it may be
# possible to have the image suitable for more automated analysis...
pp <- locator(type = "p", pch = 3, col = 2) # marked.jpg
# dev.print(jpeg, "marked.jpg", width = dm[1], unit = "in", res = 96)
# Fill interior of each cell with a unique integer
myCol <- seq_along(pp$x) + 1
xf1 <- floodFill(xe, do.call(rbind, pp), col = myCol)
# Discard original objects from threshold (value = 1) and see
cells1 <- rmObjects(xf1, 1)
plot(colorLabels(cells1))
# dev.print(jpeg, "cells1.jpg", width = dm[1], unit = "in", res = 96)
I need to introduce algorithms to connect the integer points between vertices and to fill a convex polygon. The code below implements Bresenham's line algorithm and a simplistic polygon-filling routine that only works for convex (simple) polygons.
#
# Bresenham's balanced integer line drawing algorithm
#
bresenham <- function(x, y = NULL, close = TRUE)
{
  # accept any coordinate structure
  v <- xy.coords(x = x, y = y, recycle = TRUE, setLab = FALSE)
  if (!all(is.finite(v$x), is.finite(v$y)))
    stop("finite coordinates required")
  v[1:2] <- lapply(v[1:2], round) # Bresenham's algorithm IS for integers
  nx <- length(v$x)
  if (nx == 1) return(list(x = v$x, y = v$y)) # just one point
  if (nx > 2 && close == TRUE) { # close polygon by replicating 1st point
    v$x <- c(v$x, v$x[1])
    v$y <- c(v$y, v$y[1])
    nx <- nx + 1
  }
  # collect the result in 'ans', starting with the 1st point
  ans <- lapply(v[1:2], "[", 1)
  # process all vertices in pairs
  for (i in seq.int(nx - 1)) {
    x <- v$x[i] # coordinates updated in x, y
    y <- v$y[i]
    x.end <- v$x[i + 1]
    y.end <- v$y[i + 1]
    dx <- abs(x.end - x); dy <- -abs(y.end - y)
    sx <- ifelse(x < x.end, 1, -1)
    sy <- ifelse(y < y.end, 1, -1)
    err <- dx + dy
    # process one segment
    while(!(isTRUE(all.equal(x, x.end)) && isTRUE(all.equal(y, y.end)))) {
      e2 <- 2 * err
      if (e2 >= dy) { # increment x
        err <- err + dy
        x <- x + sx
      }
      if (e2 <= dx) { # increment y
        err <- err + dx
        y <- y + sy
      }
      ans$x <- c(ans$x, x)
      ans$y <- c(ans$y, y)
    }
  }
  # remove duplicated points (typically 1st and last)
  dups <- duplicated(do.call(cbind, ans), MARGIN = 1)
  return(lapply(ans, "[", !dups))
}
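As a quick sanity check (not needed for the main workflow), rasterizing a small triangle:
# Rasterize a triangle and overlay the integer path on its vertices
tri <- list(x = c(1, 10, 5), y = c(1, 3, 9))
pts <- bresenham(tri)
plot(tri, asp = 1, pch = 19, col = "red")
points(pts, pch = 3)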
And a simple routine to find the interior points of a convex polygon.
#
# Return x,y integer coordinates of the interior of a CONVEX polygon
#
cPolyFill <- function(x, y = NULL)
{
  p <- xy.coords(x, y = y, recycle = TRUE, setLab = FALSE)
  p[1:2] <- lapply(p[1:2], round)
  nx <- length(p$x)
  if (any(!is.finite(p$x), !is.finite(p$y)))
    stop("finite coordinates are needed")
  yc <- seq.int(min(p$y), max(p$y))
  xlist <- lapply(yc, function(y) sort(seq.int(min(p$x[p$y == y]), max(p$x[p$y == y]))))
  ylist <- Map(rep, yc, lengths(xlist))
  ans <- cbind(x = unlist(xlist), y = unlist(ylist))
  return(ans)
}
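And a quick check of the fill, using the bresenham() perimeter (the raw vertices alone would not cover every row of y values):
per <- bresenham(list(x = c(1, 10, 5), y = c(1, 3, 9)))
xy  <- cPolyFill(per)
plot(per, asp = 1, pch = 3)
points(xy, pch = ".", col = "blue")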
Now these can be used along with ocontour() and chull() to create and fill a convex hull around each segmented cell. This "fixes" the cells with intrusions.
# Create convex hull mask
oc <- ocontour(cells1) # for all points along perimeter
oc <- lapply(oc, function(v) v + 1) # off-by-one flaw in ocontour
sel <- lapply(oc, chull) # find points that define convex hull
xh <- Map(function(v, i) rbind(v[i,]), oc, sel) # new vertices for convex hull
oc2 <- lapply(xh, bresenham) # perimeter points along convex hull
# Collect interior coordinates and fill
coords <- lapply(oc2, cPolyFill)
cells2 <- Image(0, dim = dim(cells1))   # blank image for the mask
for (i in seq_along(coords))
  cells2[coords[[i]]] <- i
xf2 <- xe                               # early binary mask
for (i in seq_along(coords))
  xf2[coords[[i]]] <- i
# Compare before and after
img <- combine(colorLabels(xf1), colorLabels(cells1),
colorLabels(xf2), colorLabels(cells2))
plot(img, all = T, nx = 2)
labs <- c("xf1", "cells1", "xf2", "cells2")
ix <- c(0, 1, 0, 1)
iy <- c(0, 0, 1, 1)
text(dm[1]*96*(ix + 0.05), 96*dm[2]*(iy + 0.05), labels = labs,
col = "white", adj = c(0.05,1))
# dev.print(jpeg, "final.jpg", width = dm[1], unit = "in", res = 96)
I am using the following R code to compute the log-likelihood for the left and right segments at each split point i = 1, 2, ..., n - 1.
But I want to run this procedure for a large number of generated datasets, for instance a = 10000, and to repeat the entire loop 1000 times. How can I speed up the following program? Can I use an apply function instead of the for loops?
Thank you in advance!
n1 = 100
n2 = 100
a = 1000
n= n1 + n2
# number of simulated copies of y
sim.data = matrix(NA, nrow = n, ncol = a)
for (i in 1:a) {
#for(j in 1:a){
sim.data[,i] = c(rnorm(n1, 2, 1), rnorm(n-n1, 4, 1))
#}
}
dim(sim.data)
# Compute the log-likelihood
B = ncol(sim.data)
loglike_profb = matrix(NA, n - 1, B)
for (j in 1:B) {
  for (i in 1:(n - 1)) {
    loglike_profb[i, j] = -0.5 * (sum(((sim.data[1:i, j]) - mean(sim.data[1:i, j]))^2) +
                                  sum(((sim.data[(i + 1):n, j]) - mean(sim.data[(i + 1):n, j]))^2))
  }
}
You can put the calculation of loglike_profb into a function and then use mapply:
loglike_profb_func <- function(i, j){
  -0.5 * (sum(((sim.data[1:i, j]) - mean(sim.data[1:i, j]))^2) + sum(((sim.data[(i + 1):n, j]) - mean(sim.data[(i + 1):n, j]))^2))
}
mapply(loglike_profb_func, rep(1:(n - 1), B), rep(1:B, each = n - 1))
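With the second index written as rep(1:B, each = n - 1), the i/j pairs come out in column-major order, so the returned vector can be reshaped to match loglike_profb directly:
ll_vec <- mapply(loglike_profb_func, rep(1:(n - 1), B), rep(1:B, each = n - 1))
loglike_mat <- matrix(ll_vec, nrow = n - 1, ncol = B)
all.equal(loglike_mat, loglike_profb)   # should be TRUE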
Let's assume I have a data frame with 1000 entries/rows. Each row has an ID, a 2nd column with some data, and a 3rd column also with some data.
So the data frame would look something like:
ID   yesNo   Id_specific_data
 1       1                  4
 2       0                  8
 3       0                 43
 4       1                 11
 5       0                  9
... and so on.
I now need to do the following:
n = 4
ID_range <- c(1:n)
ID_spec_data <- floor(runif(n, min=10, max=100))
yesNo_data <- sample(c(0,1), replace=TRUE, size=n)
df <- data.frame("ID" = ID_range, "yesNo" = yesNo_data, "ID_specific_data" = ID_spec_data)
m <- 1
for (i in seq(1, 100, 1)) {
  for (j in seq(0.1, 1, 0.1)) {
    log_like_list <- c()
    for (k in seq(0.1, 1, 0.1)) {
      total_ID_list <- c()
      for (l in seq(1, length(df$ID))) {
        x = (df$ID_specific_data[[l]]*k - j) / (i*j)
        calc = pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
        total_ID_list[[l]] = calc
      }
      # log likelihood function
      final_calc = sum(df$yesNo*log(total_ID_list)+(1-df$yesNo)*log(1 - total_ID_list))
      log_like_list[[m]] = final_calc
      m <- m + 1
    }
  }
}
So basically the end result (log_like_list) should be a list/vector with 1500*200*100 values. But to get each of those values, the same calculation has to be done for every ID in the data frame (about 500-1000 in my case). All in all, a lot of calculations.
I know for loops are probably the worst thing for speed, but I'm not sure that using apply would make it much faster with this many calculations. I have read about Rcpp, which in principle could reduce the calculation time the most, but it requires knowledge of C++ (which I really lack), and I'm not even sure it is applicable to my problem.
So could the calculation times be reduced significantly with any R tricks, or do I just have to wait it out?
I think your current edit is still wrong: you probably shouldn't be re-defining log_like_list inside any of the loops. Here's an alternative that first allocates all parameter combinations with expand.grid, which is a bit wasteful in terms of RAM, but I think it's manageable:
n <- 4L
df <- data.frame(
  ID = 1L:n,
  yesNo = sample(c(0,1), replace=TRUE, size=n),
  ID_specific_data = floor(runif(n, min=10, max=100))
)
params <- expand.grid(
  i = seq(1, 100, 1),
  j = seq(0.1, 1, 0.1),
  k = seq(0.1, 1, 0.1)
)
log_like <- sapply(1L:nrow(params), function(row_id) {
  i <- params$i[row_id]
  j <- params$j[row_id]
  k <- params$k[row_id]
  calc <- sapply(df$ID_specific_data, function(idsd) {
    x <- (idsd * k - j) / (i * j)
    pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
  })
  sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))
})
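As an aside, pnorm is vectorized, so the inner sapply can be dropped entirely; timing this on the small grid gives a rough idea of how the full parameter grid will scale:
system.time(
  log_like2 <- sapply(1L:nrow(params), function(row_id) {
    i <- params$i[row_id]; j <- params$j[row_id]; k <- params$k[row_id]
    calc <- pnorm((df$ID_specific_data * k - j) / (i * j))
    sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))
  })
)
all.equal(log_like, log_like2)   # should be TRUE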
However, for your final use case, this is probably still too slow... You can try to use parallelization, which might have acceptable times if you have many cores:
library(doParallel)
library(itertools)
# do NOT run these lines several times without calling stopCluster() on the created workers
workers <- makeCluster(detectCores())
registerDoParallel(workers)
n <- 1000L
df <- data.frame(
ID = 1L:n,
yesNo = sample(c(0,1), replace=TRUE, size=n),
ID_specific_data = floor(runif(n, min=10, max=100))
)
params <- expand.grid(
i = seq(1, 150, 0.1),
j = seq(0.1, 2, 0.01),
k = seq(0.1, 1, 0.01)
)
params_chunk <- isplitRows(params, chunks = getDoParWorkers())
log_like_par <- foreach(param = params_chunk, .combine = c, .multicombine = TRUE) %dopar% {
# return from foreach body here
sapply(1L:nrow(param), function(row_id) {
i <- param$i[row_id]
j <- param$j[row_id]
k <- param$k[row_id]
calc <- sapply(df$ID_specific_data, function(idsd) {
x <- (idsd * k - j) / (i * j)
pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
})
# return from sapply body here
sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))
})
}
stopCluster(workers); registerDoSEQ()
I tried running it on my system (4 cores), but stopped it after a few minutes. If you wait it out, let me know how long it took.
This is not going to be a 100% answer that you can copy and paste, but I think it will help you get part of the way. Mainly, you need to think about why you are spending time in loops when you are really dealing with essentially constant values.
For example:
i <- seq(1, 100, 1)
j <- seq(0.1, 1, 0.1)
ioxj <- i %o% j
df_ij <- data.frame("i" = i, "j" = j, "ioxj" = ioxj)
df_ij$ixj <- df_ij$i * df_ij$j
This gets you every combination of i and j and their product, and there is no reason to use a loop for that basic math. You might use a loop to go through the columns at some point, which could make sense because the values of i and j might change.
You can also work similarly with k.
Also, there is never a reason to do something like this
x = (df$ID_specific_data[[l]]*k - j) / (i*j)
in a loop over each row of a data frame; that defeats the whole idea of vectorization. You want to end up with this instead:
x = (df$ID_specific_data*k - j) / (i*j)
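To make that concrete, for a single (i, j, k) (values chosen arbitrarily here) the whole inner loop over IDs collapses to a couple of vectorized lines:
i <- 10; j <- 0.5; k <- 0.3
calc <- pnorm((df$ID_specific_data * k - j) / (i * j))
final_calc <- sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))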
You need to play around with the code to get it exactly the way you want it, but it will be worth spending the time to do so. It's possible the occasional loop may be correct but I think you will likely end up doing something much simpler.
I am trying to generate a matrix sz by first applying a binomial draw, then adding values from the corresponding column of pombe_new_subs, with this combined value used as the size for the following column.
After much frustration, the following code is what I've ended up with, and it just doesn't work. The problem I'm running into is:
# Error in sz[j, i + 1] = sz[, i] + pombe_new_subs[, i] :
# number of items to replace is not a multiple of replacement length
pombe_new_subs <- rmultinom(3, 15, prob = c(0.3, 0.3, 0.3))
randomdiv <- function(nchrom, ndivs, size) {
  sz <- matrix(nrow = nchrom, ncol = ndivs)
  for (j in 1:nchrom) {
    n <- size
    for (i in 1:ndivs) {
      n <- rbinom(1, n, 0.5)
      sz[j, i] <- n
    }
    sz[j, i+1] = sz[, i] + pombe_new_subs[, i]
    sz[j, i+1] <- n
  }
  return(sz)
}
randomdiv(3, 3, 10)
I know this is probably a fairly simple looping exercise but frustration has entirely taken over.
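For reference, the error itself can be reproduced minimally: the left-hand side sz[j, i + 1] is a single cell, while the right-hand side sz[, i] + pombe_new_subs[, i] has one value per row, and assigning a length-3 vector to one cell triggers exactly this message:
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 2] <- 1:3
# Error in m[1, 2] <- 1:3 :
#   number of items to replace is not a multiple of replacement length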