R - Replace a double loop by a function from the apply family

I have these loops :
xall = data.frame()
for (k in 1:nrow(VectClasses))
{
for (i in 1:nrow(VectIndVar))
{
xall[i,k] = sum(VectClasses[k,] == VectIndVar[i,])
}
}
The data:
VectClasses = data frame containing the characteristics of each class
VectIndVar = data frame containing each record of the database
The two for loops work and give output I can work with, but they take too long, hence my need for the apply family.
The output I am looking for looks like this:
V1 V2 V3 V4
1 3 3 2 2
2 2 2 1 1
3 3 4 3 3
4 3 4 3 3
5 4 4 3 3
6 3 2 3 3
I tried using:
xball = data.frame()
xball = sapply(xball, function (i,k){
sum(VectClasses[k,] == VectIndVar[i,])})
xcall = data.frame()
xcall = lapply(xcall, function (i, k){sum(VectClasses[k,] == VectIndVar[i,])} )
but neither seems to fill the data frame.
Reproducible data (shortened):
VectIndVar <- data.frame(a=sample(letters[1:5], 100, rep=T), b=floor(runif(100)*25),
c = sample(c(1:5), 100, rep=T),
d=sample(c(1:2), 100, rep=T))
and:
K1 = 4
VectClasses = VectIndVar[sample(1:nrow(VectIndVar), K1, replace=FALSE), ]
Can you help me?

I would use outer instead of *apply:
res <- outer(
1:nrow(VectIndVar),
1:nrow(VectClasses),
Vectorize(function(i,k) sum(VectIndVar[i,-1]==VectClasses[k,-1]))
)
(Thanks to this Q&A for clarifying that Vectorize is needed.)
This gives
> head(res) # with set.seed(1) before creating the data
[,1] [,2] [,3] [,4]
[1,] 1 1 2 1
[2,] 0 0 1 0
[3,] 0 0 0 0
[4,] 0 0 1 0
[5,] 1 0 0 1
[6,] 1 1 1 1
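As a small illustration of why Vectorize is needed here: outer calls its function once, passing two full-length index vectors rather than one (i, k) pair at a time, so a function written for scalar indices returns a result of the wrong length. The sketch below uses a made-up helper, scalar_fun, which is not part of the answer.

```r
# Hypothetical helper: only meaningful when i and k are single numbers.
scalar_fun <- function(i, k) sum(c(i, k))

# outer() calls scalar_fun once with i = c(1,2,3,1,2,3) and k = c(1,1,1,2,2,2).
# sum() collapses that to a single number, so outer() cannot shape the result.
bad <- tryCatch(outer(1:3, 1:2, scalar_fun), error = function(e) "error")

# Vectorize() wraps the function so that it is applied pair by pair.
good <- outer(1:3, 1:2, Vectorize(scalar_fun))

bad
good
```

The same reasoning applies to the sum(VectIndVar[i,-1]==VectClasses[k,-1]) call above, which only makes sense for one i and one k at a time.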
As for speed, I would suggest using matrices instead of data.frames:
cmat <- as.matrix(VectClasses[-1]); rownames(cmat)<-VectClasses$a
imat <- as.matrix(VectIndVar[-1]); rownames(imat)<-VectIndVar$a
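One possible way to use such matrices is sketched below. It is an assumption about how the suggestion could be carried through, not the answerer's code: it loops only over the few class rows and compares each one against all records at once. The name res_mat is made up, and the seed is only there to make the example repeatable.

```r
set.seed(1)
VectIndVar <- data.frame(
  a = sample(letters[1:5], 100, replace = TRUE),
  b = floor(runif(100) * 25),
  c = sample(1:5, 100, replace = TRUE),
  d = sample(1:2, 100, replace = TRUE)
)
K1 <- 4
VectClasses <- VectIndVar[sample(seq_len(nrow(VectIndVar)), K1), ]

# Numeric matrices without the first (character) column, as in the answer.
imat <- as.matrix(VectIndVar[-1])
cmat <- as.matrix(VectClasses[-1])

# For each class row k, repeat its values down the columns so they line up
# with imat, then count the matching cells in every record with rowSums().
res_mat <- sapply(seq_len(nrow(cmat)), function(k) {
  rowSums(imat == rep(cmat[k, ], each = nrow(imat)))
})

head(res_mat)
```

The result has one row per record and one column per class, the same layout as res above.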

Related

Have 3 matrices of same dimensions - I want to get the highest value of each cell of the three different matrices

Basically I have 3 matrices of the same dimensions. They consist only of the values 0, 1, 2 and 3. I would like to create a new matrix that takes the highest value from the corresponding cells of the three matrices.
For example, if the first row of the matrices are as follows:
A: 0 1 0 0 1
B: 2 0 0 2 0
C: 0 3 0 3 0
Final: 2 3 0 3 1
I was trying to do a for function with apply but I couldn't get it working.
Edit: According to the comments, pmax is the function to use. Thanks! I am just starting out and learning R, so sorry if this is a simple question.
Here's some sample data:
m1 <- matrix(sample(0:3, 12, replace = TRUE), 4)
m2 <- matrix(sample(0:3, 12, replace = TRUE), 4)
m3 <- matrix(sample(0:3, 12, replace = TRUE), 4)
And the result
pmax(m1, m2, m3)
# [,1] [,2] [,3]
# [1,] 3 1 3
# [2,] 2 3 1
# [3,] 1 3 3
# [4,] 3 3 3
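To tie this back to the example row in the question, here is a short sketch using the A, B and C rows given there. The list mats and the Reduce call are additions for illustration, showing one way to handle matrices that are already stored in a list.

```r
# First rows of the three matrices, as given in the question.
A <- c(0, 1, 0, 0, 1)
B <- c(2, 0, 0, 2, 0)
C <- c(0, 3, 0, 3, 0)

# Element-wise ("parallel") maximum across the three vectors.
pmax(A, B, C)   # 2 3 0 3 1

# If the matrices live in a list, Reduce() applies pmax pairwise;
# the dimensions of the first matrix are kept.
mats <- list(matrix(A, nrow = 1), matrix(B, nrow = 1), matrix(C, nrow = 1))
final <- Reduce(pmax, mats)
final
```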

Rescore Items from Scoring Key

I have a set of data on which respondents were given a series of questions, each with five response options (e.g., 1:5). Given those five options, I have a scoring key for each question, where some responses are worth full points (e.g., 2), others half points (1), and others no points (0). So, the data frame is n (people) x k (questions), and the scoring key is a k (questions) x m (responses) matrix.
What I am trying to do is to programmatically create a new dataset of the rescored items. Trivial dataset:
x <- sample(c(1:5), 50, replace = TRUE)
y <- sample(c(1:5), 50, replace = TRUE)
z <- sample(c(1:5), 50, replace = TRUE)
dat <- data.frame(cbind(x,y,z)) # 3 items, 50 observations (5 options per item)
head(dat)
x y z
1 3 1 2
2 2 1 3
3 5 3 4
4 1 4 5
5 1 3 4
6 4 5 4
# Each option is scored 0, 1, or 2:
key <- matrix(sample(c(0,0,1,1,2), size = 15, replace = TRUE), ncol=5)
key
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 1 2
[2,] 2 1 1 1 2
[3,] 2 2 1 1 2
Some other options, firstly using Map:
data.frame(Map( function(x,y) key[y,x], dat, seq_along(dat) ))
# x y z
#1 0 2 2
#2 0 2 1
#3 2 1 1
#4 0 1 2
#5 0 1 1
#6 1 2 1
Secondly using matrix indexing on key:
newdat <- dat
newdat[] <- key[cbind( as.vector(col(dat)), unlist(dat) )]
newdat
# x y z
#1 0 2 2
#2 0 2 1
#3 2 1 1
#4 0 1 2
#5 0 1 1
#6 1 2 1
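The matrix-indexing step can be unpacked as follows. This is a sketch using the key printed in the question and the first three rows of dat shown there; the intermediate name idx is made up for illustration. Indexing a matrix with a two-column matrix picks one cell per row of the index, here key[item, response].

```r
# Scoring key as printed in the question: one row per item, one column per option.
key <- rbind(c(0, 0, 0, 1, 2),
             c(2, 1, 1, 1, 2),
             c(2, 2, 1, 1, 2))

# First three respondents from head(dat) in the question.
dat <- data.frame(x = c(3, 2, 5), y = c(1, 1, 3), z = c(2, 3, 4))

# One row per cell of dat: which item (column of dat) and which response was given.
idx <- cbind(item = as.vector(col(dat)), response = unlist(dat))

# key[idx] looks up key[item, response] for every cell; newdat[] keeps the shape.
newdat <- dat
newdat[] <- key[idx]
newdat
```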
Things would be even simpler if you specified key as a list:
key <- list(x=c(0,0,0,1,2),y=c(2,1,1,1,2),z=c(2,2,1,1,2))
data.frame(Map("[",key,dat))
# x y z
#1 0 2 2
#2 0 2 1
#3 2 1 1
#4 0 1 2
#5 0 1 1
#6 1 2 1
For posterity: I was discussing this issue with a friend, who suggested another approach. Its benefit is that it still uses mapvalues() to do the rescoring but does not require a for loop; instead, sapply iterates over the item indices.
library(plyr)
scored <- sapply(seq_len(ncol(dat)), function(x, dat, key){
mapvalues(dat[,x], from = 1:ncol(key), to = key[x,])
}, dat = dat, key = key)
My current working approach is to use 1) mapvalues, which lives within package:plyr to do the heavy lifting: it takes a vector of data to modify, and two additional parameters "from", which is the original data (here 1:5), and "to", or what we want to convert the data to; and, 2) A for loop with index notation, in which we cycle through the available questions, extract the vector pertaining to each using the current loop value, and use it to select the proper row from our scoring key.
library(plyr)
newdat <- matrix(data=NA, nrow=nrow(dat), ncol=ncol(dat))
for (i in 1:3) {
newdat[,i] <- mapvalues(dat[,i], from = c(1,2,3,4,5),
to = c(key[i,1], key[i,2], key[i,3], key[i,4], key[i,5]))
}
head(newdat)
[,1] [,2] [,3]
[1,] 0 2 2
[2,] 0 2 1
[3,] 2 1 1
[4,] 0 1 2
[5,] 0 1 1
[6,] 1 2 1
I am pretty happy with this solution, but if anyone has any better approaches, I would love to see them!

combine tables into a data frame

How do I turn a list of tables into a data frame?
I have:
> (tabs <- list(table(c('a','a','b')),table(c('c','c','b')),table(c()),table(c('b','b'))))
[[1]]
a b
2 1
[[2]]
b c
1 2
[[3]]
< table of extent 0 >
[[4]]
b
2
I want:
> data.frame(a=c(2,0,0,0),b=c(1,1,0,2),c=c(0,2,0,0))
a b c
1 2 1 0
2 0 1 2
3 0 0 0
4 0 2 0
PS. Please do not assume that the tables were created by table calls! They were not!
c_names <- unique(unlist(sapply(tabs, names)))
df <- do.call(rbind, lapply(tabs, `[`, c_names))
colnames(df) <- c_names
df[is.na(df)] <- 0
This assumes the tables are one dimensional.
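A variant of the same name-alignment idea is sketched below. It is not the answerer's code: it fills a zero-initialised template per list element, so no NA replacement is needed afterwards. It only relies on names() and the values, so it should also work for plain named vectors; all_names and counts are made-up names.

```r
tabs <- list(table(c("a", "a", "b")), table(c("c", "c", "b")),
             table(character(0)), table(c("b", "b")))

# Every category that occurs in any element, in order of first appearance.
all_names <- unique(unlist(lapply(tabs, names)))

# For each element, start from all zeros and overwrite the categories it has.
counts <- t(vapply(tabs, function(tb) {
  out <- setNames(numeric(length(all_names)), all_names)
  out[names(tb)] <- as.vector(tb)
  out
}, numeric(length(all_names))))
colnames(counts) <- all_names

df <- as.data.frame(counts)
df
```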
all.names <- unique(unlist(lapply(tabs, names)))
df <- as.data.frame(do.call(rbind,
lapply(
tabs, function(x) as.list(replace(c(x)[all.names], is.na(c(x)[all.names]), 0))
) ) )
names(df) <- all.names
df
There is probably a cleaner way to do this.
# a b c
# 1 2 1 0
# 2 0 1 2
# 3 0 0 0
# 4 0 2 0
tabs <- list(table(c('a','a','b')),table(c('c','c','b')),table(c()),table(c('b','b')))
dat.names <- unique(unlist(sapply(tabs, names)))
dat <- matrix(0, nrow = length(tabs), ncol = length(dat.names))
colnames(dat) <- dat.names
for (ii in 1:length(tabs)) {
dat[ii, ] <- tabs[[ii]][match(colnames(dat), names(tabs[[ii]]) )]
}
dat[is.na(dat)] <- 0
> dat
a b c
[1,] 2 1 0
[2,] 0 1 2
[3,] 0 0 0
[4,] 0 2 0
Here is a pretty clean approach:
library(reshape2)
newTabs <- melt(tabs)
newTabs
# Var1 value L1
# 1 a 2 1
# 2 b 1 1
# 3 b 1 2
# 4 c 2 2
# 5 b 2 4
newTabs$L1 <- factor(newTabs$L1, seq_along(tabs))
dcast(newTabs, L1 ~ Var1, fill = 0, drop = FALSE)
# L1 a b c
# 1 1 2 1 0
# 2 2 0 1 2
# 3 3 0 0 0
# 4 4 0 2 0
This makes use of the fact that there is a melt method for lists (see reshape2:::melt.list) which automatically adds in a variable (L1 for an unnested list) that identifies the index of the list element. Since your list has some items which are empty, they won't show up in your melted list, so you need to factor the "L1" column, specifying the levels you want. dcast takes care of restructuring your output and allows you to specify the desired fill value.

Compare all rows depending on several conditions and return value

I want to compare different cells in different rows and return a value if conditions are satisfied.
Suppose the following s_i =
[,1] [,2] [,3]
[1,] 0.43020494 0.7183179 0.4201009
[2,] 0.08625491 0.3007912 0.8768459
[3,] 0.80012649 0.8448729 0.7131344
I want to compare all ordered pairs of rows: 1,2; 1,3; 2,3; 2,1; 3,1 and 3,2.
The output dgpos contains the row numbers that are combined and the values returned.
For the first combination, rows 1 and 2, the value is:
1, if 2b ≥ 1b
0, if 1a ≥ 2c
(1a - 2c) / ((2b - 2c) - (1b - 1a)), otherwise
where a, b and c are the columns of s_i and the leading digit is the row number.
In R-ish, for rows 1 and 2:
if (s_i[2,2] >= s_i[1,2]) {
dgpos[rowindex,3] = 1
} else if (s_i[1,1] >= s_i[2,3]) {
dgpos[rowindex,3] = 0
} else {
dgpos[rowindex,3] = (s_i[1,1] - s_i[2,3]) / ((s_i[2,2] - s_i[2,3]) - (s_i[1,2] - s_i[1,1]))
}
The output I aim for contains the combinations and the values returned in dgpos[,3]:
[,1] [,2] [,3]
[1,] 1 2 0.5168453
[2,] 1 3 1
[3,] 2 3 1
[4,] 2 1 1
[5,] 3 1 0
[6,] 3 2 0.1235813
I have this:
s_i=matrix(runif(9),3)
dgpos=matrix(0,(dim(s_i)[2]*(dim(s_i)[2]-1)),3)
rowindex=1
for (i in 1:nrow(s_i)) {
for (j in 1:nrow(s_i)) {
if (i!=j)
c1=s_i[i,]
c2=s_i[j+1,]
dgpos[rowindex,1]=i
dgpos[rowindex,2]=j+1
if (c2[2] >= c1[2])
dgpos[rowindex,3]=1
dgpos[rowindex,3] = ifelse ((c1[1]=c2[3]), 0 , c1[1]-c2[3]/((c2[2]-c2[3])-(c1[2]-c1[1])))
rowindex=rowindex+1
}
}
I know that loops are not preferred, but at my current level of R-ish I don't know a better solution. I have tried adply with combn, without success.
MQ: how can I compare different cells in different rows and return a value depending on several conditions?
Your help and comments are appreciated.
I do not guarantee this is exactly the logic you want (there were inconsistencies between your example and your code) but this is the right approach to vectorize your algorithm:
First, create a data.frame of all row indices combinations:
n <- nrow(s_i)
dgpos <- rev(expand.grid(row2 = seq_len(n), row1 = seq_len(n)))
dgpos <- subset(dgpos, row1 != row2)
dgpos
# row1 row2
# 2 1 2
# 3 1 3
# 4 2 1
# 6 2 3
# 7 3 1
# 8 3 2
Then, compute your outcomes in one vectorized call, a nested ifelse:
dgpos <- transform(dgpos, out = { c1 <- s_i[row1, ]
c2 <- s_i[row2, ]
ifelse(c2[,2] >= c1[,2], 1,
ifelse(c1[,1] >= c2[,3], 0,
(c1[,1]-c2[,3]) / ((c2[,2]-c2[,3]) - (c1[,2]-c1[,1])))) })
dgpos
# row1 row2 out
# 2 1 2 0
# 3 1 3 0
# 4 2 1 0
# 6 2 3 0
# 7 3 1 0
# 8 3 2 0
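For reference, here is a self-contained sketch of the same vectorized idea applied to the fixed s_i printed in the question rather than a random matrix. The data frame name combos is made up, and the row subsets are taken outside transform for readability. With this input the out column should line up with the values the question lists as desired (about 0.517, 1, 1, 1, 0, 0.124), in row1/row2 order.

```r
# The s_i matrix printed in the question.
s_i <- rbind(
  c(0.43020494, 0.7183179, 0.4201009),
  c(0.08625491, 0.3007912, 0.8768459),
  c(0.80012649, 0.8448729, 0.7131344))

# All ordered pairs of distinct rows.
n <- nrow(s_i)
combos <- rev(expand.grid(row2 = seq_len(n), row1 = seq_len(n)))
combos <- subset(combos, row1 != row2)

# Row-expanded copies: line p of c1/c2 holds the rows of pair p.
c1 <- s_i[combos$row1, , drop = FALSE]
c2 <- s_i[combos$row2, , drop = FALSE]

# The three-way rule from the question, evaluated for all pairs at once.
combos$out <- ifelse(c2[, 2] >= c1[, 2], 1,
              ifelse(c1[, 1] >= c2[, 3], 0,
                     (c1[, 1] - c2[, 3]) /
                       ((c2[, 2] - c2[, 3]) - (c1[, 2] - c1[, 1]))))
combos
```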
This works, thanks to @flodel. No doubt it's not the most elegant solution:
dgpos = rev(expand.grid(row2 = seq_len(nrow(s_i)), row1 = seq_len(nrow(s_i))))
dgpos = subset(dgpos, row1 != row2)
for (i in 1:nrow(dgpos)) {
c1 = s_i[dgpos$row1[i], ]
c2 = s_i[dgpos$row2[i], ]
dgpos$out[i] = ifelse(c2[2] >= c1[2], 1,
ifelse(c1[1] >= c2[3], 0,
(c1[1]-c2[3]) / ((c2[2]-c2[3]) - (c1[2]-c1[1])))) }
dgpos
# row1 row2 out
# 2 1 2 0.5168453
# 3 1 3 1
# 4 2 1 1
# 6 2 3 1
# 7 3 1 0
# 8 3 2 0.1235813
I've managed to reproduce your desired output using this:
f <- function(i, j, s){
ifelse(s[j,2]>=s[i,2], 1, ifelse(s[i,1]>=s[j,3], 0,
(s[i,1]-s[j,3])/((s[j,2]-s[j,3])-(s[i,2]-s[i,1]))))
}
s_i <- rbind(
c(0.43020494, 0.7183179, 0.4201009),
c(0.08625491, 0.3007912, 0.8768459),
c(0.80012649, 0.8448729, 0.7131344))
y <- combn(nrow(s_i), 2)
dgpos <- t(cbind(y, y[2:1,]))
cbind(dgpos, f(dgpos[,1], dgpos[,2], s_i))
Result:
[,1] [,2] [,3]
[1,] 1 2 0.5168453
[2,] 1 3 1.0000000
[3,] 2 3 1.0000000
[4,] 2 1 1.0000000
[5,] 3 1 0.0000000
[6,] 3 2 0.1235813

Generate vectors using R

I would like to ask whether any of you know a simple way to solve this kind of problem:
I need to generate all vectors of A numbers taken from the set {0, 1, 2, ..., B} whose sum equals C.
E.g. if A=2, B=3, C=2:
Solution in this case:
(1,1);(0,2);(2,0)
So the vectors are length 2 (A), sum of all its items is 2 (C), possible values for each of vectors elements come from the set {0,1,2,3} (maximum is B).
A functional version since I already started before SO updated:
A=2
B=3
C=2
myfun <- function(a=A, b=B, c=C) {
out <- do.call(expand.grid, lapply(1:a, function(x) 0:b))
return(out[rowSums(out)==c,])
}
> myfun()
Var1 Var2
3 2 0
6 1 1
9 0 2
z <- expand.grid(0:3,0:3)
z[rowSums(z)==2, ]
Var1 Var2
3 2 0
6 1 1
9 0 2
If you wanted to build the expand.grid call programmatically, this would work:
z <- expand.grid( rep( list(0:B), A) )
You need to replicate a list so that the sequences remain separate: rep(0:3, 3) would return one long vector, not 3 separate sequences. So for A=3:
> z <- expand.grid(rep(list(0:3), 3))
> z[rowSums(z)==2, ]
Var1 Var2 Var3
3 2 0 0
6 1 1 0
9 0 2 0
18 1 0 1
21 0 1 1
33 0 0 2
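The expand.grid approaches above build all (B+1)^A candidate rows before filtering, which can get large. As an alternative sketch (a different technique, not taken from any of the answers), the vectors can be built recursively so that only combinations that can still reach the target sum are extended. The function name bounded_compositions is made up, and it assumes A >= 1 and B >= 0 are whole numbers.

```r
# Hypothetical helper: all length-A vectors with entries in 0..B that sum to C.
bounded_compositions <- function(A, B, C) {
  empty <- matrix(integer(0), nrow = 0, ncol = A)
  if (C < 0 || C > A * B) return(empty)          # target cannot be reached
  if (A == 1) return(matrix(as.integer(C), nrow = 1))
  pieces <- lapply(0:min(B, C), function(first) {
    # Fix the first entry, then solve the smaller problem for the rest.
    rest <- bounded_compositions(A - 1, B, C - first)
    if (nrow(rest) == 0) return(NULL)
    cbind(first, rest, deparse.level = 0)
  })
  out <- do.call(rbind, pieces)                  # NULL pieces are skipped
  if (is.null(out)) empty else out
}

bounded_compositions(2, 3, 2)
bounded_compositions(3, 3, 2)
```

Each row of the returned matrix is one solution; the row order differs from the expand.grid output.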
Using the nifty partitions package, and more interesting values of A, B, and C:
library(partitions)
A <- 2
B <- 5
C <- 7
comps <- t(compositions(C, A))
ii <- apply(comps, 1, FUN=function(X) all(X %in% 0:B))
comps[ii, ]
# [,1] [,2]
# [1,] 5 2
# [2,] 4 3
# [3,] 3 4
# [4,] 2 5
