R: Processing data within a loop - r

Below, I run a loop over the sample data provided, appending the necessary items to geneexptest and processing it further within the loop. In building dfs, I want the endpoint of each run to be data.frame(geneexptotal,...), as shown. The problem is that the loop seems to stop at geneexptestapp and output that into dfs each round instead. Please let me know how I might include the rest of the loop in the output.
gex <- data.frame("sample" = c("BIX","HEF","TUR","ZOP","VAG","JUF","FED","MEQ","YIF","HRB","LOP","LIX","COT","DRP","KFC","TUY","DOG","KEX","RAV","UEH"),
"TCGA-F4-6703-01" = runif(20, -1, 1),
"TCGA-DM-A28E-01" = runif(20, -1, 1),
"TCGA-AY-6197-01" = runif(20, -1, 1),
"TCGA-A6-5657-01" = runif(20, -1, 1))
colnames(gex) <- gsub("[.]", "_",colnames(gex))
listx <- c("TCGA_DM_A28E_01","TCGA_A6_5657_01")
mxy <- data.frame("TCGA-AD-6963-01" = runif(20, -1, 1),
"TCGA-AA-3663-11" = runif(20, -1, 1),
"TCGA-AD-6901-01" = runif(20, -1, 1),
"TCGA-AZ-2511-01" = runif(20, -1, 1),
"TCGA-A6-A567-01" = runif(20, -1, 1))
colnames(mxy) <- gsub("[.]", "_",colnames(mxy))
zScore <- function(x,y)((as.numeric(x) - as.numeric(rowMeans(y,na.rm=T)))/as.numeric(sd(y,na.rm=T)))
dfs <- lapply(listx, function(colName) {
do.call(rbind, lapply(seq(nrow(mxy)), function(i) {
zvalues <- zScore(gex[i,colName], mxy[i,])
geneexptest <- data.frame(gex$sample[i], zvalues, row.names = NULL, stringsAsFactors = TRUE)
geneexptest$zvalues <- as.numeric(as.character(geneexptest$zvalues))
is.na(geneexptest) <- sapply(geneexptest, is.infinite)
geneexptestapp <- na.omit(geneexptest)
geneexptestorder <- geneexptestapp[order(geneexptestapp$zvalues, decreasing = FALSE, na.last = NA), ]
geneexpa <- geneexptestorder[1:((0.05)*nrow(geneexptest)),]
geneexpz <- geneexptestorder[(nrow(geneexptestorder)-((0.05)*nrow(geneexptest))):nrow(geneexptestorder),]
geneexptotal <- rbind(geneexpa, geneexpz)
data.frame(geneexptotal$gex.sample, row.names = NULL, stringsAsFactors = TRUE)
}))
})

Your code, as it is, is working fine. You just have unexpected output because of some data management you're doing. I broke your code out a bit to help with readability. I made two new functions, fun1 and fun2 - fun2 is your inner function and fun1 is the outer function. fun2 takes colName as an argument to pass it along.
fun2 = function(i,colName) {
zvalues <- zScore(gex[i,colName], mxy[i,])
geneexptest <- data.frame(gex$sample[i], zvalues, row.names = NULL, stringsAsFactors = TRUE)
geneexptest$zvalues <- as.numeric(as.character(geneexptest$zvalues))
is.na(geneexptest) <- sapply(geneexptest, is.infinite)
geneexptestapp <- na.omit(geneexptest)
geneexptestorder <- geneexptestapp[order(geneexptestapp$zvalues, decreasing = FALSE, na.last = NA), ]
geneexpa <- geneexptestorder[1:((0.05)*nrow(geneexptest)),]
geneexpz <- geneexptestorder[(nrow(geneexptestorder)-((0.05)*nrow(geneexptest))):nrow(geneexptestorder),]
geneexptotal <- rbind(geneexpa, geneexpz)
data.frame(geneexptotal, row.names = NULL, stringsAsFactors = TRUE)
}
fun1 = function(colName) {
do.call(rbind, lapply(seq(nrow(mxy)), fun2, colName=colName))
}
dfs <- lapply(listx, fun1)
So, listx has two column names in it - let's start with the first one, TCGA_DM_A28E_01. We call fun1("TCGA_DM_A28E_01"), which rbinds the results of 20 iterations of fun2 (mxy has 20 rows and 5 columns, so nrow(mxy) = 20).
So, let's start with i=1 (we're now in fun2). You take one value from gex and one row from mxy and pass them to zScore, which gives a value of -0.6955057. You then make a one-row data.frame and convert the zvalues column to numeric. So we have:
> geneexptest
gex.sample.i. zvalues
1 BIX -0.6955057
Next you mark any infinite values as NA and drop those rows. Nothing is infinite here, so we still have:
> geneexptestapp
gex.sample.i. zvalues
1 BIX -0.6955057
Now you order a one-row data frame, which changes nothing. This is where the problem is: nrow(geneexptest) = 1, so for geneexpa you're asking for rows 1:0.05, which evaluates to 1, and for geneexpz you're asking for rows 0.95:1, which evaluates to 0.95. There are no fractional rows; R truncates a fractional index toward zero, so 0.95 becomes 0 and selects nothing. This results in:
> geneexpa;geneexpz
gex.sample.i. zvalues
1 BIX -0.6955057
[1] gex.sample.i. zvalues
<0 rows> (or 0-length row.names)
which you rbind and return as a data frame. In this case, it results in a single-row data frame. fun1 produces 20 of these and rbinds them together, giving you a 20-row data.frame. dfs is a list of 2 of these.
Your code suggests you expect each call to work on more than 1 row (taking 5% from each end only makes sense with at least 20 rows), but there's only 1. Either the example here isn't complete or you need to rethink what's happening in fun2.
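To see the indexing behaviour in isolation, here is a minimal sketch on a made-up one-row data frame. The name toy and its values are invented for illustration and are not taken from the question:

```r
# Toy one-row data frame standing in for geneexptestorder (values are made up)
toy <- data.frame(id = "BIX", zvalues = -0.7)

# The same index expressions as in fun2, evaluated for a single row
low_idx  <- 1:(0.05 * nrow(toy))                       # 1:0.05
high_idx <- (nrow(toy) - 0.05 * nrow(toy)):nrow(toy)   # 0.95:1

print(low_idx)
print(high_idx)
print(nrow(toy[low_idx, ]))   # index 1 selects the single row
print(nrow(toy[high_idx, ]))  # 0.95 is truncated to 0, so no rows are selected
```

The first subset keeps the row and the second is empty, which is why each call to fun2 contributes exactly one row.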
UPDATE
OK, given your updated requirements, consider this function:
getExtremeValues = function(x,p=0.05){
z = x[,2]
n = ceiling(nrow(x)*p)
r = x[order(z),1]
return(as.character(r[c(1:n,length(r):(length(r)-n+1))]))
}
You pass this function a data frame (x) whose first column holds the sample names and whose second column holds the zvalues (the column index 2 is hardcoded), plus the proportion you want from the top and bottom (p = 0.05 by default). It then returns the entries of the first column whose zvalues fall in the lowest and highest p share.
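As a quick check of what the function returns, here is a small sketch on an invented five-row data frame. The name toy and its values are assumptions made for the illustration; the function body is the one given above:

```r
# Same function as above, repeated so the sketch runs on its own
getExtremeValues <- function(x, p = 0.05) {
  z <- x[, 2]
  n <- ceiling(nrow(x) * p)      # at least one row from each end
  r <- x[order(z), 1]            # names sorted by ascending z-value
  as.character(r[c(1:n, length(r):(length(r) - n + 1))])
}

# Invented data: names in column 1, z-values in column 2
toy <- data.frame(sample  = c("A", "B", "C", "D", "E"),
                  zvalues = c(0.2, -1.5, 3.1, 0.0, -0.4),
                  stringsAsFactors = FALSE)

extremes <- getExtremeValues(toy)  # n = ceiling(5 * 0.05) = 1
print(extremes)                    # lowest first, then highest
```

With five rows and the default p, one name comes back from each end: the sample with the smallest z-value followed by the one with the largest.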
How to make this all work:
fun2 = function(i,colName) {
zvalues <- zScore(gex[i,colName], mxy[i,])
geneexptest <- data.frame(gex$sample[i], zvalues, row.names = NULL, stringsAsFactors = TRUE)
geneexptest$zvalues <- as.numeric(as.character(geneexptest$zvalues))
is.na(geneexptest) <- sapply(geneexptest, is.infinite)
return(na.omit(geneexptest))
}
fun1 = function(colName) {
getExtremeValues(do.call(rbind, lapply(seq(nrow(mxy)), fun2, colName=colName)))
}
dfs <- lapply(listx, fun1)
This returns:
> dfs
[[1]]
[1] "BIX" "TUY"
[[2]]
[1] "BIX" "TUR"
Given there are 20 samples, 1 is in the bottom 5% and 1 is in the top 5%. There are two column names in listx, so 4 samples are returned in total.

Related

R: How to access a 'complicated list'

I am working on an assignment, which tasks me to generate a list of data, using the below code.
##Use the make_data function to generate 25 different datasets, with mu_1 being a vector
x <- seq(0, 3, len=25)
make_data <- function(a){
n = 1000
p = 0.5
mu_0 = 0
mu_1=a
sigma_0 = 1
sigma_1 = 1
y <- rbinom(n, 1, p)
f_0 <- rnorm(n, mu_0, sigma_0)
f_1 <- rnorm(n, mu_1, sigma_1)
x <- ifelse(y == 1, f_1, f_0)
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x,make_data)
The code looks good to go, and 'dat' appears to be a 25 column, 2 row table, each with its own data frame.
Now, each data frame within a cell has 2 columns.
And this is where I get stuck.
While I can get to the data frame in row 1, column 1, just fine (i.e. just use dat[1,1]), I can't reach the column of 'x' values within dat[1,1]. I've experimented with
dat[1,1]$x
dat[1,1][1]
But they only throw weird responses: error/null.
Any idea how I can pull the column? Thanks.
dat[1, 1] is a list.
class(dat[1, 1])
#[1] "list"
So to reach to x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a side note, instead of having this 2 x 25 matrix of lists as output in dat, I would actually prefer to have a nested list.
dat <- lapply(x,make_data)
#Access `x` column of first list from `train` dataset.
dat[[1]]$train$x
However, this is quite subjective and you can choose whichever format you like best.
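For comparison, here is a small self-contained sketch of both layouts. It uses make_pair, a hypothetical stand-in for make_data that needs no extra packages and returns tiny made-up data frames:

```r
# Hypothetical stand-in for make_data(): returns a list of two small data frames
make_pair <- function(a) {
  list(train = data.frame(x = a + 1:3, y = factor(c(0, 1, 0))),
       test  = data.frame(x = a + 4:6, y = factor(c(1, 0, 1))))
}

# sapply() simplifies the results into a matrix whose cells are lists
m <- sapply(c(0, 10), make_pair)
print(dim(m))                    # rows are train/test, one column per input
train_x_matrix <- m[[1, 1]]$x    # [[ ]] with two indices returns the data frame itself

# lapply() keeps a plain nested list
l <- lapply(c(0, 10), make_pair)
test_x_list <- l[[2]]$test$x
```

The double-bracket form m[[1, 1]] skips the extra list layer that single-bracket indexing leaves in place.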

How to create a single data frame with multiple vectors result of a loop operation?

I have a .wav file and want to get power spectra for successive non-overlapping time windows.
The power spectrum data is obtained with the following code, once the seewave and tuneR libraries are loaded:
n <- 0:1
sound1 <- readWave("D:\\sound.wav")
result <- do.call(cbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,wl=16,plot=FALSE)))
result1 <- data.frame(result)
The output will be
structure(list(x = c(0, 2.75625, 5.5125, 8.26875, 11.025, 13.78125,
16.5375, 19.29375), y = c(1, 0.551383594277632, 0.0742584974502194,
0.0399059818168578, 0.0218500553648978, 0.0176655910374274,
0.00904887363707214,
0.00333698474894753), x.1 = c(0, 2.75625, 5.5125, 8.26875, 11.025,
13.78125, 16.5375, 19.29375), y.1 = c(1, 0.558106398109396,
0.145460335046358,
0.0804097312947365, 0.0476025570412434, 0.0393549921764155,
0.0203584314573552,
0.00737927765210362)), class = "data.frame", row.names = c(NA,
But in the resulting df I only need y and y.1, not x and x.1. As you may notice, x and x.1 hold the same data, so that information is redundant. In short: I only need the y data.
Thank you for your suggestions!
There are more than a few ways to do what you are talking about. I don't know the length of the vector you are talking about though or the way meanspec returns its data, so you will have to fill that in yourself
vec_length <- length(amplitude_vector)
wav_df <- data.frame(matrix(nrow = 0, ncol = vec_length + 1))
for(i in 0:(end-1)){
#Add relevant code to get the amplitude vector from the function below
amp_vec <- meanspec(sound1, from = i, to = i+1, plot = FALSE)...
wav_df <- rbind(wav_df,c(i,amp_vec))
}
colnames(wav_df) <- c("start-time",...)#Add in the other column names that you want
wav_df should then have the information you want.
You may use lapply -
n <- 0:9 #to end at 9-10;change as per your preference
sound1 <- readWave("D:\\Sound.wav")
result <- do.call(rbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,plot=FALSE)))
result
#to get dataframe as output
#result <- data.frame(result)
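If the goal is to keep only the y values, one option is to pull the y column out of each window's result before combining. The sketch below assumes that each call returns a two-column matrix with columns named x and y, as the x, y, x.1, y.1 names in the question's output suggest. fake_spec is a hypothetical stand-in for meanspec so that the example runs without a .wav file:

```r
# Hypothetical stand-in for meanspec(): a two-column matrix (x = frequency, y = amplitude)
fake_spec <- function(from) {
  cbind(x = c(0, 2.75625, 5.5125),
        y = c(1, 0.5, 0.1) / (from + 1))
}

n <- 0:1
specs <- lapply(n, fake_spec)

freq <- specs[[1]][, "x"]                     # x is identical in every window
amp  <- sapply(specs, function(m) m[, "y"])   # one y column per window
colnames(amp) <- paste0("y", n)

result1 <- data.frame(x = freq, amp)          # drop x here if it is not needed
print(result1)
```

With the real data, replacing fake_spec(x) by the meanspec(sound1, from = x, to = x + 1, plot = FALSE) call should give one y column per time window.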

Sample method error in r

I have a vector like this:
objetosDisponibles <- c(1,2,3,4)
I pick one element at random with this:
objetoAleatorio <- sample(objetosDisponibles,size = 1, replace = F)
Then, I delete the chosen element from the vector:
objetosDisponibles <- objetosDisponibles[objetosDisponibles!=objetoAleatorio]
If I do this 4 times, I expect each element to be picked once, in random order. But when I call sample on the vector once only 1 element is left, it returns a different value. You can test this with the following code:
cont <- 0
objetosDisponibles <- c(1,2,3,4)
while(cont < 4){
objetoAleatorio <- sample(objetosDisponibles,size = 1, replace = F)
print(objetoAleatorio)
objetosDisponibles <- objetosDisponibles[objetosDisponibles!=objetoAleatorio]
print(objetosDisponibles)
cont <- cont +1
}
The error, a number with "L":
Print output:
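A likely explanation, going by the documented behaviour of sample(): when x is a single number of at least 1, sample(x, ...) draws from 1:x rather than from x itself. So once objetosDisponibles is down to one element, say 3, the call picks from 1, 2, 3, and the value it returns is an integer. A minimal sketch of one workaround is to draw a position with sample.int and index the vector with it. sample_one and elegidos are hypothetical names introduced for the illustration:

```r
# Hypothetical helper: pick one element of x by position,
# so a length-1 vector is never expanded to 1:x
sample_one <- function(x) {
  x[sample.int(length(x), size = 1)]
}

objetosDisponibles <- c(1, 2, 3, 4)
elegidos <- numeric(0)   # elements in the order they were drawn

while (length(objetosDisponibles) > 0) {
  objetoAleatorio <- sample_one(objetosDisponibles)
  elegidos <- c(elegidos, objetoAleatorio)
  objetosDisponibles <- objetosDisponibles[objetosDisponibles != objetoAleatorio]
}

print(elegidos)   # each of 1, 2, 3, 4 exactly once, in random order
```

If only a random ordering is needed, sample(c(1, 2, 3, 4)) on the full vector returns a permutation in a single call.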

Vectorization of nested for loops

I am trying to vectorize my nested for loop code using apply/mapply/lapply/sapply or any other way to reduce the running time. My code is as follows:
for (i in 1:dim){
for (j in i:dim){
if(mydist.fake[i,j] != d.hat.fake[i,j]){
if((mydist.fake[i,j]/d.hat.fake[i,j] > 1.5)|(d.hat.fake[i,j]/mydist.fake[i,j]>1.5)){
data1 = cbind(rowNames[i],rowNames[j], mydist.fake[i,j], d.hat.fake[i,j], 1)
colnames(data1) = NULL
row.names(data1) = NULL
data = rbind(data, data1)
}else{
data1 = cbind(rowNames[i],rowNames[j], mydist.fake[i,j], d.hat.fake[i,j], 0)
colnames(data1) = NULL
row.names(data1) = NULL
data = rbind(data, data1)
}
}
}
}
write.table(data, file = "fakeTest.txt", sep ="\t", col.names = FALSE, row.names = FALSE)
rowNames is the vector of rownames of all data points
data is a dataframe
mydist.fake and d.hat.fake are distance matrices (the diagonal is zero and the upper and lower triangles hold the same values), so I am only interested in traversing one triangle, skipping the diagonal.
The dimensions of both matrices are the same.
The major problem I am facing is the vectorization of the j loop where j is initialized as i.
A vectorized version of your code is:
dist1 <- mydist.fake
dist2 <- d.hat.fake
data <- data.frame(i = rowNames[row(dist1)[lower.tri(dist1)]],
j = rowNames[col(dist1)[lower.tri(dist1)]],
d1 = dist1[lower.tri(dist1)],
d2 = dist2[lower.tri(dist2)])
data <- transform(data, outcome = d1/d2 > 1.5 | d2/d1 > 1.5)
I tested it successfully using the following sample data:
X <- matrix(runif(200), 20, 10)
Y <- matrix(runif(200), 20, 10)
rowNames <- paste0("var", seq_len(nrow(X)))
mydist.fake <- as.matrix(dist(X))
d.hat.fake <- as.matrix(dist(Y))
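The original loop also skips pairs whose two distances are equal and stores the flag as 1/0 rather than TRUE/FALSE. If that matters for the output file, a sketch along these lines reproduces both details on the same kind of sample data. set.seed and the names lt and res are additions made for the illustration:

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
X <- matrix(runif(200), 20, 10)
Y <- matrix(runif(200), 20, 10)
rowNames <- paste0("var", seq_len(nrow(X)))
mydist.fake <- as.matrix(dist(X))
d.hat.fake  <- as.matrix(dist(Y))

lt <- lower.tri(mydist.fake)          # one triangle, diagonal excluded
res <- data.frame(i  = rowNames[row(mydist.fake)[lt]],
                  j  = rowNames[col(mydist.fake)[lt]],
                  d1 = mydist.fake[lt],
                  d2 = d.hat.fake[lt])

res <- res[res$d1 != res$d2, ]        # the loop ignores equal distances
res$outcome <- as.integer(res$d1 / res$d2 > 1.5 | res$d2 / res$d1 > 1.5)

print(head(res))
```

The resulting data frame can be passed to write.table with the same arguments as in the question.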

for loop to output different objects in r

I am trying to use a for loop to create multiple objects. Here is a simplified example (not the exact code):
l_gr <- list (1:10, 11:20, 21:30)
for (i in 1:length(l_gr)){
grp <- NULL
grp[[i]] <- mean(l_gr[[i]])
}
This is not what I expect. Instead, I need to output multiple objects (possibly of different classes) whose names vary with i, for example grp1, grp2, grp3.
Each of these objects holds the output of the function for the i-th list element. Sorry for the simple question.
Edit: here is a more specific example, as requested:
install.packages("onemap")
require(onemap)
data(example.out)
twopts <- rf.2pts(example.out)
all.data <- make.seq(twopts,"all")
link_gr <- group(all.data)
link_gr$n.groups
starts the loop
# without loop:
# for 1
grp1 <- make.seq(link_gr, 1)
grp1.od <- order.seq(input.seq=grp1, n.init = 5, subset.search = "twopt",
twopt.alg = "rcd", THRES = 3, draw.try = TRUE, wait = 1, touchdown=TRUE)
# for 2
grp2 <- make.seq(link_gr, 2)
grp2.od <- order.seq(input.seq=grp2, n.init = 5, subset.search = "twopt",
twopt.alg = "rcd", THRES = 3, draw.try = TRUE, wait = 1, touchdown=TRUE)
the same process repeats for each i in 1:link_gr$n.groups
So I want create a for loop and output objects:
for (i in 1:link_gr$n.groups){
grp <- NULL
grp[i] <- make.seq(link_gr, i)
grp[i].od <- order.seq(input.seq=grp[i], n.init = 5, subset.search = "twopt",
twopt.alg = "rcd", THRES = 3, draw.try = TRUE, wait = 1, touchdown=TRUE)
}
Note that your for loops are wrong. If you set grp <- NULL within the loop, you'll just wipe your results variable with each iteration - probably not what you want. You need to put the variable initialisation outside the loop.
Note, too, that I'd suggest you are still better off using a single variable instead of multiple ones. list objects are very flexible in R and can accommodate objects of different classes. You can do
require(onemap)
data(example.out)
twopts <- rf.2pts(example.out)
all.data <- make.seq(twopts,"all")
link_gr <- group(all.data)
link_gr$n.groups
# initialise list outputs
grp = list()
grp.od = list()
for (i in 1:2){
grp[[i]] <- make.seq(link_gr, i)
grp.od[[i]] <- order.seq(input.seq=grp[[i]], n.init = 5, subset.search = "twopt",
twopt.alg = "rcd", THRES = 3, draw.try = TRUE, wait = 1, touchdown=TRUE)
}
#check out output
str(grp)
str(grp.od)
grp[[1]]
grp[[2]]
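The same list pattern can be tried without onemap, using the l_gr example from the question. This is only a sketch; naming the list elements grp1, grp2, grp3 is an optional extra that was not part of the code above:

```r
l_gr <- list(1:10, 11:20, 21:30)

# Initialise the result list once, outside the loop
grp <- vector("list", length(l_gr))
for (i in seq_along(l_gr)) {
  grp[[i]] <- mean(l_gr[[i]])
}

# Optional: name the elements so they can be reached as grp$grp1, grp$grp2, ...
names(grp) <- paste0("grp", seq_along(grp))
print(grp$grp2)
str(grp)
```

Named elements give the grp1, grp2, grp3 handles the question asks for while keeping everything in one object.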
If you must insist on using different variables, consider ?assign and ?get. Something like this will work:
i = 1
assign(paste("grp", i, sep = ""), grp[[i]])
exists("grp1")
str(get(paste("grp", i, sep = "")))
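Put into a loop, the assign approach might look like the sketch below. It again uses the l_gr example from the question rather than the onemap objects, and the name all_grps is introduced only for the illustration:

```r
l_gr <- list(1:10, 11:20, 21:30)

# Create grp1, grp2, grp3 as separate variables in the current environment
for (i in seq_along(l_gr)) {
  assign(paste0("grp", i), mean(l_gr[[i]]))
}

print(grp1)

# mget() collects the variables back into a named list when needed
all_grps <- mget(paste0("grp", seq_along(l_gr)))
str(all_grps)
```

Having to gather the variables again with mget is one reason the single-list version tends to be easier to work with.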
