Creating a for loop for a dataframe - r

I have a list of vectors stored as follows:
library(seqinr)
mydata <- read.fasta(file="mydata.fasta")
mydatavec <- mydata[[1]]
lst <- split(mydatavec, as.integer(gl(length(mydatavec), 100,length(mydatavec))))
df <- data.frame(matrix(unlist(lst), nrow=2057, byrow=T), stringsAsFactors=FALSE)
Now each vector (row) in df is 100 letters long and made up of the letters "a", "c", "g" and "t". I would like to calculate the Shannon entropy of each of these vectors. Here is an example of what I mean:
v1 <- count(df[1,], 1)
a c g t
27 26 24 23
v2 <- v1/sum(v1)
a c g t
0.27 0.26 0.24 0.23
v3 <- -sum(log(v2)*v2) ; print(v3)
[1] 1.384293
In total I need 2057 printed values, because that is how many vectors I have. My question is: is it possible to create a for loop or repeat loop that would do this operation for me? I tried myself but didn't get anywhere.
dput(head(sequence))
structure(c("function (nvec) ", "unlist(lapply(nvec, seq_len))"
), .Dim = c(2L, 1L), .Dimnames = list(c("1", "2"), ""), class = "noquote")
My attempt: I wanted to focus on the count function only and created this:
A <- matrix(0, 2, 4)
for (i in 1:2) {
A[i] <- count(df[i,], 1)
}
The loop correctly counts the number of "a" in the first vector and then moves on to the second one, but it completely ignores the rest of the letters:
A
[,1] [,2] [,3] [,4]
[1,] 27 0 0 0
[2,] 28 0 0 0
I also naively thought that adding a bunch of "i" indices everywhere would make it work:
s <- matrix(0, 1, 4)
s1 <- matrix(0, 1, 4)
s2 <- numeric(4)
for (i in 1:2) {
s[i] <- count(df[i,],1)
s1[i] <- s[i]/sum(s[i])
s2[i] <- -sum(log(s1[i])*s1[i])
}
But that didn't get me anywhere either.

If you don't need to save the counts and only need to print or save the calculation you show, this should work:
library(seqinr)
entropy <- numeric(nrow(df)) # create the result vector before the loop
for (i in seq_len(nrow(df))) {
  v1 <- count(df[i,], 1)
  v2 <- v1/sum(v1)
  v2 <- v2[v2 > 0] # drop absent letters so that 0*log(0) does not give NaN
  v3 <- -sum(log(v2)*v2)
  print(v3) # to print
  entropy[i] <- v3 # to save the value
}
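The problem with the loops you show is probably that the output of count is a table with four values (one per letter), and you assign it to a single matrix element: A[i] <- count(df[i,], 1) keeps only the first value, which is why only the "a" counts appear. You need to assign to a whole row instead, A[i,] <- count(df[i,], 1) rather than A[i] <- count(df[i,], 1); the same applies to s[i] in the second loop (where s would also need one row per vector).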
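As a minimal sketch of the row-indexing fix, the loop below stores all four letter counts of every row and then computes the entropies in one vectorized step. It uses base table() with fixed factor levels instead of seqinr's count() so that it runs without seqinr; row_entropy is a hypothetical helper name, and the alphabet is assumed to be exactly a, c, g, t.

```r
# Hypothetical helper: Shannon entropy (natural log) of each row of a data frame
# whose cells are single letters; the alphabet is assumed to be a, c, g, t.
row_entropy <- function(df, alphabet = c("a", "c", "g", "t")) {
  counts <- matrix(0, nrow = nrow(df), ncol = length(alphabet),
                   dimnames = list(NULL, alphabet))
  for (i in seq_len(nrow(df))) {
    # counts[i, ] fills the whole row; counts[i] would keep only the first count
    counts[i, ] <- table(factor(unlist(df[i, ]), levels = alphabet))
  }
  p <- counts / rowSums(counts)           # each row divided by its own total
  terms <- ifelse(p > 0, p * log(p), 0)   # treat 0 * log(0) as 0 instead of NaN
  list(counts = counts, entropy = -rowSums(terms))
}
```

For the data in the question, row_entropy(df)$entropy would hold one value per row, and row_entropy(df)$counts keeps the letter counts in case they are needed later.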

Would this work for you?
library(seqinr) # for count()
df <- data.frame(x = c("a","c","g","g","g"),
y = c("g","c","a","a","g"),
z = c("g","t","t","a","g"),stringsAsFactors=FALSE)
A <- sapply(1:nrow(df), FUN=function(i){count(df[i,],1)})
> A
[,1] [,2] [,3] [,4] [,5]
a 1 0 1 2 0
c 0 2 0 0 0
g 2 0 1 1 3
t 0 1 1 0 0
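If you keep the count matrix A from the sapply() call above (one row per letter, one column per sequence), the entropies can be computed column by column without an explicit loop. A minimal sketch; entropy_by_column is a hypothetical name, and natural logs are assumed, matching the example in the question:

```r
# Hypothetical helper: Shannon entropy (natural log) of every column of a
# count matrix with one row per letter and one column per sequence.
entropy_by_column <- function(A) {
  apply(A, 2, function(counts) {
    p <- counts[counts > 0] / sum(counts)  # drop absent letters: 0 * log(0) is taken as 0
    -sum(p * log(p))
  })
}
```

For the five-column example above, the first column (one "a", two "g") gives about 0.6365 and the last column (only "g") gives 0.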

Related

R language iteratively input a matrix

I'm trying to figure out how to iteratively load a matrix (this forms part of a bigger function I can't reproduce here).
Let's suppose that I create a matrix:
m <- matrix(c(1:9), nrow = 3, ncol = 3)
m
This matrix can be named "m", "x" or anything else. Then I need to load the matrix iteratively in the function:
if (interactive() ) { mat <-
readline("Your matrix, please: ")
}
So far, the function "knows" the name of the matrix, since mat returns [1] "m", which is an object listed in ls(). But when I try to get the matrix values, for example through x <- get(mat), I keep getting an error:
Error in get(mat) : unused argument (mat)
Can anybody be so kind as to tell me what I'm doing wrong here?
1) Assuming you mean interactively, not iteratively:
get_matrix <- function() {
nr <- as.numeric(readline("how many rows? "))
cat("Enter space separated data row by row. Enter empty row when finished.\n")
nums <- scan(stdin())
matrix(nums, nr, byrow = TRUE)
}
m <- get_matrix()
Here is a test:
> m <- get_matrix()
how many rows? 3
Enter space separated data row by row. Enter empty row when finished.
1: 1 2
3: 3 4
5: 5 6
7:
Read 6 items
> m
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
>
2) Another possibility is to require that the user create a matrix using R and then just give the name of the matrix:
get_matrix2 <- function(envir = parent.frame()) {
m <- readline("Enter name of matrix: ")
get(m, envir)
}
Test it:
> m <- matrix(1:6, 3)
> mat <- get_matrix2()
Enter name of matrix: m
> mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
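The error "unused argument (mat)" in the question most likely means that a function called get, taking no arguments, had been defined in the workspace and was masking base::get, which does accept the object name as its first argument. A minimal sketch that separates the lookup from readline() and calls base::get explicitly; matrix_by_name is a hypothetical helper, and the existence and matrix checks are added assumptions:

```r
# Hypothetical helper: return the matrix whose name is given as a string.
# base::get is called explicitly so that a user-defined `get` cannot mask it.
matrix_by_name <- function(name, envir = parent.frame()) {
  if (!exists(name, envir = envir)) {
    stop("no object called '", name, "' was found")
  }
  obj <- base::get(name, envir = envir)
  if (!is.matrix(obj)) {
    stop("'", name, "' is not a matrix")
  }
  obj
}
```

Interactively it can be combined with the prompt from the question, for example mat <- matrix_by_name(readline("Your matrix, please: ")).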

Create a list of matrices with 1's / 0's based on a list of matrices with the index

I have the following problem:
I have a list of matrices of indices.
Every column of a matrix shows which row indices should be equal to 1 for that specific column.
All the other values should be equal to 0.
I know the size of the output matrices, and there are no duplicated values within a column.
For example the following matrix should be translated as follows:
m_in = matrix(c(1,3,5,7,3,4), nrow =2)
m_out = matrix(c(1,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,0), nrow = 7)
I wrote code that works, but it would be great if I could achieve this without loops, in a more efficient or clever way.
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
ListIndices <- sapply(Vector, function(x)Index[0:x,])
emptylistlist <- list()
for (i in 1:length(ListIndices)) {
  # create the zero matrix once per list element, not once per column
  emptylistlist[[i]] <- matrix(nrow = 200, ncol = 100, data = 0)
  for (j in 1:100) {
    # set to 1 only the rows listed in column j
    emptylistlist[[i]][ListIndices[[i]][, j], j] <- 1
  }
}
We can try sparseMatrix from library(Matrix) and then wrap it with as.matrix.
library(Matrix)
as.matrix(sparseMatrix(i= c(m_in), j= c(col(m_in)), x=1))
# [,1] [,2] [,3]
#[1,] 1 0 0
#[2,] 0 0 0
#[3,] 1 0 1
#[4,] 0 0 1
#[5,] 0 1 0
#[6,] 0 0 0
#[7,] 0 1 0
If there is a list of matrices, then we can use lapply:
lapply(lst, function(y) as.matrix(sparseMatrix(i= c(y), j= c(col(y)), x= 1)))
The typical way is with matrix assignment:
m_out = matrix(0L, max(m_in), ncol(m_in))
m_out[cbind(c(m_in), c(col(m_in)))] <- 1L
How it works: The syntax for matrix assignment M[IND] <- V is described at help("[<-").
Each row of IND is a pair of (row, column) positions in M.
Elements of M at those positions will be overwritten with (corresponding elements of) V.
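The same matrix assignment can be wrapped in a small function and applied to every matrix in the list. A minimal sketch; indices_to_indicator is a hypothetical name, and the n_rows argument is an added assumption so that every output can have the known size (200 rows in the question) instead of max(m_in):

```r
# Hypothetical helper: 0/1 matrix with a 1 at (m_in[k, j], j) for every k and j.
indices_to_indicator <- function(m_in, n_rows = max(m_in)) {
  out <- matrix(0L, n_rows, ncol(m_in))
  # each row of the two-column index matrix is one (row, column) position
  out[cbind(c(m_in), c(col(m_in)))] <- 1L
  out
}

m_in <- matrix(c(1, 3, 5, 7, 3, 4), nrow = 2)
indices_to_indicator(m_in)  # the 7 x 3 matrix m_out from the question, as integers

# For the list in the question (200 x 100 outputs):
# lapply(ListIndices, indices_to_indicator, n_rows = 200)
```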
As far as the list of matrices goes, an array would be more natural:
set.seed(1)
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
idx <- sapply(Vector, function(x)Index[0:x,])
# "ListIndices" is too long a name
a_out = array(0L, dim=c(
max(unlist(idx)),
max(sapply(idx,ncol)),
length(idx)))
a_out[ cbind(
unlist(idx),
unlist(lapply(idx,col)),
rep(seq_along(idx),lengths(idx))
)] <- 1L
The syntax is the same as for matrix assignment.
Seeing as the OP has so many zeros and so few ones, a sparse matrix, as in @akrun's answer, makes the most sense, or a sparse array, if such a thing has been implemented.

Assign an element value based on element adjacencies in R

I have a data frame with {0,1} indicating whether a product was Small, Medium or Large.
dat <- data.frame(Sm = c(1,0,0), Med = c(0,1,0), Lg = c(0,0,1))
Sm Med Lg
1 1 0 0
2 0 1 0
3 0 0 1
I'm looking to assign 1's to the 0's leading up to a 1 in a given row. For example in row 2 the product is a "Med", so I'm looking to assign a 1 to the 0 in the "Sm" column.
Allocation size is a consideration, so I'm looking for a vectorized approach without a for loop, please. The final solution should output the following:
Sm Med Lg
1 1 0 0
2 1 1 0
3 1 1 1
I've tried several variations of the code below, but the closest I can get is a ragged array which assigns all of the 1's correctly while dropping the elements that have legitimate 0's.
apply(dat, 1, function(x) {
x[1:which.max(x)] <- 1
})
[1] 1 1 1
And the code below gets close, but without the needed trailing 0's:
apply(dat, 1, function(x) {
temp <- x[1:which.max(x)]
unlist(lapply(temp, function(y) {
y <- 1
}))
})
[[1]]
Sm
1
[[2]]
Sm Med
1 1
[[3]]
Sm Med Lg
1 1 1
First, convert to matrix and use max.col to get the index of the 1 in each row:
mat <- as.matrix(dat)
mc <- max.col(mat)
Logical construction: overwrite the matrix:
mat = +(col(mat) <= mc)
Or construct an index of matrix positions to change and change 'em.
Logical indexing:
mat[col(mat) < mc] <- 1L
# or
mat[which(col(mat) < mc)] <- 1L
Matrix indexing:
idx <- do.call( rbind, lapply( seq_along(mc), function(i)
if (mc[i]==1L) NULL
else cbind(i,seq_len(mc[i]-1))
))
mat[idx] <- 1L
Vector indexing:
nr <- nrow(mat)
idx <- unlist( lapply( seq_along(mc), function(i)
if (mc[i]==1L) NULL
else seq(from = i, by = nr, length.out = mc[i]-1L)
))
mat[idx] <- 1L
The help for all three indexing methods can be found at help("[<-").
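One caveat with max.col(): its default ties.method is "random", so a row that contains no 1 at all (or more than one 1) gets an arbitrary column. A minimal sketch of the logical-construction approach that guards against that, assuming all-zero rows should stay zero and rows with several 1s should be filled up to the last one; fill_leading is a hypothetical name:

```r
# Hypothetical helper: set every cell up to and including the last 1 in each row to 1.
fill_leading <- function(dat) {
  mat <- as.matrix(dat)
  mc <- max.col(mat, ties.method = "last")  # column of the last 1 in each row
  mc[rowSums(mat) == 0] <- 0L               # assumption: all-zero rows stay all zero
  out <- +(col(mat) <= mc)                  # mc is recycled down each column, so cell [i, j] is compared with mc[i]
  dimnames(out) <- dimnames(mat)
  as.data.frame(out)
}

dat <- data.frame(Sm = c(1, 0, 0), Med = c(0, 1, 0), Lg = c(0, 0, 1))
fill_leading(dat)
```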
For this three-column example, this will do what you want:
dat[which(dat$Med==1),]$Sm = 1
dat[which(dat$Lg==1),]$Med = 1
dat[which(dat$Lg==1),]$Sm = 1

Trying to output a list of lists from a loop in r

I am trying to do something that I am sure should be quite simple. I am trying to make a function which turns a list of number pairs (pairedList) and a vector (botList) into a series of vectors (one for each pair) of length(botList), where the numbers in those vectors are all equal to zero except at the index positions identified by the pair, which will be 1.
#generating mock data to simulate my application:
pair1 <- c(2,4)
pair2 <- c(1,3)
pair3 <- c(5,6)
pairedList <- c(pair1, pair2, pair3)
botList <- c(1:length(pairedList))
Here is what the output should ultimately look like:
[1] 0 1 0 1 0 0
[1] 1 0 1 0 0 0
[1] 0 0 0 0 1 1
The code below allows me to print the vectors in the right manner (by replacing the line inside the if statement with print(prob) and commenting out the final print statement):
library(gtools)
test <- function() {
#initialising empty list
output <- list()
for (i in botList) {
x <- rep(0, length(pairedList))
ind <- pairedList[i:(i+1)]
ind.inv <- sort(ind, decreasing=T)
val <- rep(1,length(ind))
new.x <- vector(mode="numeric",length(x)+length(val))
new.x <- new.x[-ind]
new.x[ind] <- val
prob <- new.x
if (odd(i)) {
output[i] <- prob
}
print(output)
}
}
However, I need to return this list of vectors from my function rather than printing it, and when I do so I get the following output, along with an error and a number of warnings:
[[1]]
[1] 0
[[1]]
[1] 0
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[4]]
NULL
[[5]]
[1] 0
Error in new.x[-ind] : only 0's may be mixed with negative subscripts
In addition: Warning messages:
1: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
2: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
3: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
My question is:
How can I change my code to output what I need from this function? I thought this was going to be a five-minute job, and after hours on this one little thing I am stuck!
Thanks in advance
Something you can try, although there must be nicer ways:
# create a list with all the "pair1", "pair2", ... objects
l_pairs <- mget(ls(pattern="^pair\\d+"))
# compute maximum number among the values of pair., it determines the number of columns of the results
n_max <- max(unlist(l_pairs))
# finally, create for each pair. a vector of 0s and put 1s at the positions specified in pair.
res <- t(sapply(l_pairs, function(x){y <- rep(0, n_max); y[x]<-1; y}))
res
# [,1] [,2] [,3] [,4] [,5] [,6]
#pair1 0 1 0 1 0 0
#pair2 1 0 1 0 0 0
#pair3 0 0 0 0 1 1
You could use row/col indexing:
m1 <- matrix(0, ncol=max(pairedList), nrow=3)
m1[cbind(rep(1:nrow(m1),each=2), pairedList)] <- 1
m1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 1 0 1 0 0
#[2,] 1 0 1 0 0 0
#[3,] 0 0 0 0 1 1
James, the following should work. I've just tested it.
pair1 <- c(2,4)
pair2 <- c(1,3)
pair3 <- c(5,6)
pairedList <- c(pair1, pair2, pair3)
botList <- 1:(length(pairedList)/2) # one entry per pair, i.e. length 3
test <- function(pairedList, botList) {
  #initialising empty list
  output <- list()
  for (i in botList) {
    # the i-th pair sits at positions 2*i-1 and 2*i of pairedList
    ind <- pairedList[(2*i - 1):(2*i)]
    prob <- rep(0, length(pairedList))
    prob[ind] <- 1
    # [[ ]] stores the whole vector as one list element
    output[[i]] <- prob
    print(prob)
  }
  return(output)
}
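The reason for the strange error is that botList was being created with length 6 rather than length 3. Note also that the i-th pair sits at positions 2*i-1 and 2*i of pairedList, so it has to be taken with pairedList[(2*i-1):(2*i)]; pairedList[i:(i+1)] picks overlapping pairs. Finally, to store a whole vector as a single list element you need double brackets, output[[i]], rather than output[i].
Once the function has returned the list, rbind the vectors together as follows: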
output <- test(pairedList, botList)
result <- do.call(rbind,output)
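Since the question asks for the vectors returned as a list, the pairs can also be split out of pairedList and turned into 0/1 vectors without an explicit loop. A minimal sketch, assuming pairedList always holds complete pairs one after another and the vectors should have length(pairedList) elements as in the expected output; pairs_to_vectors is a hypothetical name:

```r
# Hypothetical helper: one 0/1 vector per consecutive pair in pairedList.
pairs_to_vectors <- function(pairedList, len = length(pairedList)) {
  # group positions 1-2, 3-4, 5-6, ... into separate pairs
  pairs <- split(pairedList, rep(seq_len(length(pairedList) / 2), each = 2))
  # replace() sets the two indexed positions of a zero vector to 1
  unname(lapply(pairs, function(p) replace(numeric(len), p, 1)))
}

pairedList <- c(c(2, 4), c(1, 3), c(5, 6))
output <- pairs_to_vectors(pairedList)
do.call(rbind, output)  # same 3 x 6 matrix as the other answers
```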

R wrong result with for loop

I have the code below:
n=c('a','b','c')
one=c('a','c')
two=c('b','a')
three=data.frame(one, two)
m=matrix(0,3,2)
for (i in length(n) ) {
m[i,]=t(1*(n[i]==three[,1])-1*(n[i]==three[,2]))
}
t(1*(n[1]==three[,1])-1*(n[1]==three[,2]))
t(1*(n[2]==three[,1])-1*(n[2]==three[,2]))
t(1*(n[3]==three[,1])-1*(n[3]==three[,2]))
Why are the output of the m matrix and the output of the last three lines different? Is there a more efficient way to do this?
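Because length(n) is the single number 3, for (i in length(n)) runs only once, with i = 3, so only the last row of m is filled. You want: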
for (i in seq_along(n)) {
Since you asked if there was a better way to do this, here is one with an apply function. The result of do.call(rbind, ...) is "naturally" coerced to a matrix, so there is no need to define the matrix m beforehand.
I don't see the logic behind multiplying by 1 (subtracting two logical vectors already gives integers), so I left it out. It will still work if you need it.
> n <- c('a','b','c')
> three <- data.frame(one = c("a", "c"), two = c("b", "a"))
> m <- do.call(rbind, lapply(seq_along(n), function(i){
+ t((n[i] == three[,1]) - (n[i] == three[,2]))
+ }))
> m
[,1] [,2]
[1,] 1 -1
[2,] -1 0
[3,] 0 1
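The loop can also be dropped entirely: outer() compares every element of n with every element of a column in one call, and subtracting the two logical matrices gives the -1/0/1 result directly. A minimal sketch, assuming the columns of three are character vectors (stringsAsFactors = FALSE):

```r
n <- c("a", "b", "c")
three <- data.frame(one = c("a", "c"), two = c("b", "a"), stringsAsFactors = FALSE)
# row i, column j: (n[i] == three$one[j]) - (n[i] == three$two[j])
m <- outer(n, three$one, "==") - outer(n, three$two, "==")
m
```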
