I have a question for R gurus out there. I'll illustrate it on the following example:
I have a vector, say 1,2,3,4,5,6,7,8
I'd like to get a vector of sums of 2 elements: 3,5,7,9,11,13,15
This is just an example; I'm not looking for a trick. I want to do it with just vectorization and indexing. Is there any way to get access to the implicit loop parameter as R goes through the vector?
Thanks a lot.
You can use rollapply from the zoo package:
> library(zoo)
> x <- 1:8
> rollapply(x, width=2, FUN=sum)
[1] 3 5 7 9 11 13 15
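For the literal request (vectorization and indexing only), a minimal base R sketch is to add the vector to a shifted copy of itself. The names n, i, k and cs are illustrative and do not come from the original posts; the cumulative-sum variant is one possible way to generalize to wider windows.

```r
x <- 1:8
n <- length(x)

# Drop the last element from one copy and the first from the other, then add.
pair_sums <- x[-n] + x[-1]

# Same result with an explicit index vector, which plays the role of the loop counter.
i <- seq_len(n - 1)
pair_sums_idx <- x[i] + x[i + 1]

# Sums over a window of k elements via differences of cumulative sums.
k <- 2
cs <- cumsum(c(0, x))
window_sums <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]

print(pair_sums)
```

There is no hidden loop counter to tap into; building the index vector yourself (i above) is the usual substitute.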
You can use sapply or a variation of it, and write a function that sums up appropriate elements given the indexes, and your matrix. For example,
m <- matrix(1:9, nrow=3)
m
Create a data frame with all possible index pairs
m_ind <- expand.grid(1:nrow(m),1:ncol(m), stringsAsFactors = FALSE)
names(m_ind) <- c("i","j")
m_ind
m[as.matrix(m_ind[,1:2])]
Diagonals, and the lines parallel to them, can be described by a constant difference of the indexes; anti-diagonals by a constant sum of the indexes
m_ind$dif_ij <- m_ind$i - m_ind$j
m_ind$sum_ij <- m_ind$i + m_ind$j
Then sum up the elements you want
m_ind$sum1 <- sapply(1:nrow(m_ind), function(k, mydf, colname, mymatr)
  sum(mymatr[as.matrix(mydf[mydf[, colname] == mydf[k, colname], c("i", "j")])]),
  mydf = m_ind, colname = "dif_ij", mymatr = m)
m_ind$sum2 <- sapply(1:nrow(m_ind), function(k, mydf, colname, mymatr)
  sum(mymatr[as.matrix(mydf[mydf[, colname] == mydf[k, colname], c("i", "j")])]),
  mydf = m_ind, colname = "sum_ij", mymatr = m)
And finally, combine them
m_ind$sum <- m_ind$sum1 + m_ind$sum2
m_ind
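As a more compact sketch of the same idea, the index difference and index sum can be taken straight from row(m) and col(m), and ave() can compute the group sums. The names d, s and total are illustrative; this is an alternative under the same assumptions, not the approach the answer above uses.

```r
m <- matrix(1:9, nrow = 3)

d <- as.vector(row(m) - col(m))   # constant along each diagonal
s <- as.vector(row(m) + col(m))   # constant along each anti-diagonal

# ave() returns, for every element, the sum over its group.
sum1 <- ave(as.vector(m), d, FUN = sum)
sum2 <- ave(as.vector(m), s, FUN = sum)

# Same combination as m_ind$sum above, arranged back into the shape of m.
total <- matrix(sum1 + sum2, nrow = nrow(m))
print(total)
```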
df<- data.frame(a=c(1:10), b=c(21:30),c=c(1:10), d=c(14:23),e=c(11:20),f=c(-6:-15),g=c(11:20),h=c(-14:-23),i=c(4:13),j=c(1:10))
In this data frame, I have three block-diagonal matrices which are as shown in the image below
I want to apply two functions: the sine function to the block-diagonal elements and the cosine function to all other elements, producing a result with the same structure as the data frame.
sin(df[1:2,1:2])
sin(df[3:5,3:5])
sin(df[6:10,6:10])
cos(the rest of the elements)
1) outer/arithmetic Create a logical block diagonal matrix indicating whether the current cell is on the block diagonal or not and then use that to take a convex combination of the sin and cos values giving a data.frame as follows:
v <- rep(1:3, c(2, 3, 5))
ind <- outer(v, v, `==`)
ind * sin(df) + (!ind) * cos(df)
2) ifelse Alternatively, this gives a matrix result (or use as.matrix on the above). ind is from above.
m <- as.matrix(df)
ifelse(ind, sin(m), cos(m))
3) Matrix::bdiag Another approach is to use bdiag in the Matrix package (which comes with R -- no need to install it).
library(Matrix)
ones <- function(n) matrix(1, n, n)
ind <- bdiag(ones(2), ones(3), ones(5)) == 1
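Now proceed as in the last line of (1) or as in (2). Note that ind is now a Matrix-package object rather than a base matrix; as.matrix(ind) converts it to an ordinary logical matrix if one is needed.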
If it's okay for you that the result is stored in a new data frame, you could change the order of your instructions and do it like this:
ndf <- cos(df)
ndf[1:2,1:2] <- sin(df[1:2,1:2])
ndf[3:5,3:5] <- sin(df[3:5,3:5])
ndf[6:10,6:10] <- sin(df[6:10,6:10])
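If the block sizes change, the three assignments can be generated from the sizes instead of being typed out. The sketch below assumes the sizes 2, 3 and 5 from the question; sizes, starts and ends are illustrative names. It also checks the result against the ifelse approach.

```r
df <- data.frame(a = 1:10, b = 21:30, c = 1:10, d = 14:23, e = 11:20,
                 f = -6:-15, g = 11:20, h = -14:-23, i = 4:13, j = 1:10)

sizes  <- c(2, 3, 5)          # block sizes along the diagonal
ends   <- cumsum(sizes)       # last row/column of each block
starts <- ends - sizes + 1    # first row/column of each block

ndf <- cos(df)                # cosine everywhere first
for (b in seq_along(sizes)) {
  r <- starts[b]:ends[b]
  ndf[r, r] <- sin(df[r, r])  # overwrite each diagonal block with sine
}

# Cross-check against the logical-mask approach.
v    <- rep(seq_along(sizes), sizes)
ind  <- outer(v, v, `==`)
m    <- as.matrix(df)
same <- isTRUE(all.equal(unname(as.matrix(ndf)),
                         unname(ifelse(ind, sin(m), cos(m)))))
print(same)
```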
I have two variables (one independent and one dependent), each containing 5 data points, and I have written a function of (x, y) that fits different models to them. This works quite nicely. However, I also need to apply the same function to different combinations of these data points. In other words, I need to apply the function to every combination of only 4, 3, and 2 data points. In total, there are 25 possible combinations. What would be the most efficient way of doing this?
Please, see below an example of my data:
tte <- c(100,172,434,857,1361) #dependent variable
po <- c(446,385,324,290,280) #independent variable
Results <- myFunction (tte=tte, po=po) # customized function
Below is an example of how I am getting all the possible combinations using 4 data points:
tte4 <- combn(tte,4)
po4 <- combn(po,4)
Please note that the first column of tte4 always has to be analyzed with the first column of po4, the second column of tte4 with the second column of po4, and so on. What I need to do is to use myFunction on all these combinations.
I have tried to implement it through a for loop and through mapply without much success.
Any thoughts?
Consider using the simplify=FALSE argument of combn, then pass the list of vectors with mapply (or its wrapper Map).
tte_list <- combn(tte,4, simplify = FALSE)
po_list <- combn(po, 4, simplify = FALSE)
# MATRIX OR VECTOR RETURN
res_matrix <- mapply(myFunction, tte_list, po_list)
# LIST RETURN
res_list <- Map(myFunction, tte_list, po_list)
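To cover all 25 subsets (sizes 2, 3 and 4) in one pass, one option is to take combinations of positions rather than of values, so that tte and po are always subset with the same indices. This is only a sketch: the myFunction defined here is a hypothetical stand-in (a straight-line fit returning the slope), since the real function is not shown in the question, and idx and results are illustrative names.

```r
tte <- c(100, 172, 434, 857, 1361)  # dependent variable
po  <- c(446, 385, 324, 290, 280)   # independent variable

# Hypothetical stand-in for the asker's myFunction.
myFunction <- function(tte, po) {
  c(n = length(tte), slope = unname(coef(lm(tte ~ po))[2]))
}

# All index subsets of size 2, 3 and 4, flattened into one list.
idx <- unlist(lapply(2:4, function(k) combn(length(tte), k, simplify = FALSE)),
              recursive = FALSE)

# Apply the function to matching elements of both vectors.
results <- lapply(idx, function(i) myFunction(tte = tte[i], po = po[i]))
print(length(results))
```

Working with positions keeps each result traceable: idx[[j]] records which data points went into results[[j]].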
Since I don't know what function you want to apply, I just summed the columns. This function takes three arguments:
index = a sequence from 1 to the number of columns in tte4 (which should be the same as in po4)
x = tte4
y = po4
It uses that index on both matrices to pick out the matching columns. In this case, I summed them.
tte <- c(100,172,434,857,1361) #dependent variable
po <- c(446,385,324,290,280) #independent variable
results <- function(index, x, y){
i.x <- x[,index]
i.y <- y[,index]
sum(i.x) + sum(i.y)
}
tte4 <- combn(tte, 4)
po4 <- combn(po,4)
index <- 1:ncol(tte4)
sapply(index, results, x = tte4, y = po4)
#[1] 3008 3502 3891 4092 4103
I am working with the consumer price index (CPI), and in order to calculate it I have to multiply the index matrix by the corresponding weights:
grossCPI77_10 <- grossIND1977 %*% weights1910/100
grossCPI82_10 <- grossIND1982 %*% weights1910/100
Of course, I would rather have code like the following:
grossIND1982 <- replicate(20, cbind(1:61))
grossIND1993 <- replicate(20, cbind(1:61))
weights1910_sc <- c(1:20)
grossIND_list <- mget(ls(pattern = "grossIND...."))
totalCPI <- mapply("*", grossIND_list, weights1910_sc)
The problem is that it gives me a 1220x20 matrix. I expected an ordinary matrix (61x20) by vector (20x1) multiplication, which should result in a 61x1 vector. Could you explain what I am doing wrong? Thanks.
Part of your problem is that you don't have matrices but 3D arrays with one singleton dimension. The other issue is that mapply tries to combine the results into a matrix, and that constant arguments should be passed via MoreArgs. But actually, this is more a case for lapply.
grossIND1982 <- replicate(20, cbind(1:61))[,1,]
grossIND1993 <- replicate(20, cbind(1:61))[,1,]
weights1910_sc <- c(1:20)
grossIND_list <- mget(ls(pattern = "grossIND...."))
totalCPI <- mapply("*", grossIND_list, MoreArgs=list(e2 = weights1910_sc), SIMPLIFY = FALSE)
totalCPI <- lapply(grossIND_list, "*", e2 = weights1910_sc)
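Note that "*" multiplies element by element, recycling the weights down each column. If the goal is the matrix product from the question (index matrix times weight vector, divided by 100), a sketch along the same lapply lines could look like this. The list is built by hand here instead of with mget, and the division by 100 is taken from the question's first two lines; both are assumptions about what is wanted.

```r
grossIND1982 <- matrix(1:61, nrow = 61, ncol = 20)
grossIND1993 <- matrix(1:61, nrow = 61, ncol = 20)
weights1910_sc <- 1:20

# Explicit named list, so no unrelated objects are picked up by a name pattern.
grossIND_list <- list(grossIND1982 = grossIND1982, grossIND1993 = grossIND1993)

# (61 x 20) %*% (20 x 1) gives a 61 x 1 result; drop() turns it into a plain vector.
totalCPI <- lapply(grossIND_list, function(ind) drop(ind %*% weights1910_sc) / 100)

print(length(totalCPI$grossIND1982))
```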
I am not sure if I understood all aspects of your problem (especially concerning what should be columns, what should be rows, and in which order the cross product should be applied), but I will try to cover at least some aspects. See the comments in the code below for clarification of what you did and what you might want. I hope it helps; let me know if this is what you need.
# Instead of using mget, I recommend using a list structure;
# otherwise you might capture other variables with similar names
# that you do not want.
INDlist <- sapply(c("1990", "1991"), function(x) {
  # This is how to set up a matrix correctly; check `?matrix`.
  # I think your combination of cbind and replicate did not give you what you wanted.
  matrix(rep(1:61, 20), nrow = 61)
}, USE.NAMES = TRUE, simplify = FALSE)
weights <- list(c(1:20))
#the first argument of mapply needs to be a function, in this case of two variables
#the body of the function calculates the cross product
#you feed the arguments (both lists) in the following part of mapply
#I have repeated your weights, but you might assign different weights for each year
res <- mapply(function(x, y) {x %*% y}, INDlist, rep(weights, length(INDlist)))
dim(res)
#[1] 61 2
(Please feel free to change the title to something more appropriate)
I would like to extract all reciprocal pairs from an asymmetric square matrix.
Some dummy data to clarify:
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA), ncol=5, nrow=5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]
require(reshape2)
m.m <- melt(m) # get all pairs
m.m <- m.m[complete.cases(m.m),] # remove NAs
How would I now extract all "reciprocal duplicates" from m.m (or directly from m)?
This is what I mean with reciprocal duplicate:
Var1 Var2 value
b a 0
a b -1
And I would like to store each value combination, i.e. {1,1},{-1,-1},{1,0},{-1,0},{0,0} in a list with its Var combination {a,b},{a,c},{a,d},{a,e},{b,c},{b,d},{b,e},{c,d},{c,e},{d,e} pointing to it, something like
$`a,b`
[1] 0,-1
I haven't managed to solve this. I feel like it could be possible with merge() or inner_join. Also, I apologize for not providing the best example.
Any pointers would be highly appreciated.
Here's an approach based on the object m.m:
# extract the unique combinations
levs <- apply(m.m[-3], 1, function(x) paste(sort(x), collapse = ","))
# create a list of values for these combinations
split(m.m$value, levs)
Using the matrix representation, you can get vectors of each triangle of the matrix (which align as you wish) using:
m[upper.tri(m)]
t(m)[upper.tri(m)]
To name them:
nm <- matrix(paste("(",rep(rownames(m),times=nrow(m)), ",",rep(rownames(m),each=nrow(m)),")",sep=""), nrow=nrow(m))
nm[as.vector(upper.tri(m))]
Finally, to convert to a list as you wish: first I put them in a new 10 x 2 matrix, then I used lapply to create the list structure.
pairs<- cbind(m[upper.tri(m)], t(m)[upper.tri(m)] )
rownames(pairs) <- nm[as.vector(upper.tri(m))]
pairs
m.list <- lapply(seq_len(nrow(pairs)),function(i) pairs[i,])
names(m.list) <- rownames(pairs)
m.list
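The same list can also be built in one step from the positions of the upper triangle. This is a sketch using the dummy matrix from the question; idx and pairs_list are illustrative names, and the order within each pair (lower-triangle value first) is chosen to match the `a,b` example in the question.

```r
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA),
            ncol = 5, nrow = 5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]

# Row/column positions of every cell above the diagonal.
idx <- which(upper.tri(m), arr.ind = TRUE)

# For each position (i, j): the value at (j, i) followed by the value at (i, j).
pairs_list <- Map(function(i, j) c(m[j, i], m[i, j]), idx[, "row"], idx[, "col"])
names(pairs_list) <- paste(rownames(m)[idx[, "row"]],
                           colnames(m)[idx[, "col"]], sep = ",")

print(pairs_list[["a,b"]])
```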
I'm trying to clean this code up and was wondering if anybody has any suggestions on how to run this in R without a loop. I have a dataset called data with 100 variables and 200,000 observations. What I want to do is essentially expand the dataset by multiplying each observation by a specific scalar and then combine the data together. In the end, I need a data set with 800,000 observations (I have four categories to create) and 101 variables. Here's a loop that I wrote that does this, but it is very inefficient and I'd like something quicker and more efficient.
datanew <- c()
for (i in 1:51){
for (k in 1:6){
for (m in 1:4){
sub <- subset(data,data$var1==i & data$var2==k)
sub[,4:(ncol(sub)-1)] <- filingstat0711[i,k,m]*sub[,4:(ncol(sub)-1)]
sub$newvar <- m
datanew <- rbind(datanew,sub)
}
}
}
Please let me know what you think and thanks for the help.
Below is some sample data with 2K observations instead of 200K
# SAMPLE DATA
#------------------------------------------------#
# 2,000 observations (rows) by 100 variables (columns)
mydf <- as.data.frame(matrix(rnorm(100 * 20e2), nrow=20e2, ncol=100))
var1 <- c(sapply(seq(41), function(x) sample(1:51)))[1:20e2]
var2 <- c(sapply(seq(2 + 20e2/6), function(x) sample(1:6)))[1:20e2]
#----------------------------------#
mydf <- cbind(var1, var2, round(mydf[3:100]*2.5, 2))
filingstat0711 <- array(round(rnorm(51*6*4)*1.5 + abs(rnorm(2)*10)), dim=c(51,6,4))
#------------------------------------------------#
You can try the following. Notice that we replaced the first two for loops with a call to mapply and the third for loop with a call to lapply.
Also, we are creating two vectors that we will combine for vectorized multiplication.
# create a table of the i-k index combinations using `expand.grid`
ixk <- expand.grid(i=1:51, k=1:6)
# Take a look at what expand.grid does
head(ixk, 60)
# create two vectors for multiplying against our dataframe subset
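# 0 for the first three columns and the last one, 1 for columns 4:(ncol-1),
# which are the columns scaled by the loop in the question
multpVec <- c(rep(c(0, 1), times=c(3, ncol(mydf)-3-1)), 0)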
invVec <- !multpVec
# example of how we will use the vectors
(multpVec * filingstat0711[1, 2, 1] + invVec)
# Instead of for loops, we can use mapply.
newdf <-
  mapply(function(i, k)
      # The function that you are `mapply`ing is:
      #   rbind'ing a list of data frames, which were subsetted by matching var1 & var2
      #   and then multiplied by a value in filingstat0711
      do.call(rbind,
        # iterating over m
        lapply(1:4, function(m)
          # the cbind is for adding newvar = m at the end of the subtable
          cbind(
            # we transpose twice: first the subset, to multiply by our vector;
            # then the result, to get back our original form
            t( t(subset(mydf, var1 == i & var2 == k)) *
                 (multpVec * filingstat0711[i, k, m] + invVec)),
            # this is an argument to cbind
            "newvar" = m)
        )),
    # the two vectors you are passing as arguments are the columns of the expanded grid
    ixk$i, ixk$k, SIMPLIFY = FALSE
  )
# flatten the data frame
newdf <- do.call(rbind, newdf)
Two points to note:
Try not to use names like data, table, df, sub, etc. for your objects, since these are commonly used functions. In the above code I used mydf in place of data.
You can use apply(ixk, 1, fu..) instead of the mapply that I used, but I think mapply makes for cleaner code in this situation
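Another possibility that avoids subsetting altogether is to look up one scalar per row with matrix indexing into the 3D array, then multiply whole columns at once. The sketch below uses small made-up stand-ins (dat, scal, and a tiny 3 x 2 x 4 array) rather than the real data, so all names and sizes are assumptions. Rows come out grouped by m rather than in the loop's i, k, m order.

```r
set.seed(1)
n <- 12
dat <- data.frame(var1 = sample(1:3, n, replace = TRUE),
                  var2 = sample(1:2, n, replace = TRUE),
                  id   = seq_len(n),
                  x1   = rnorm(n),
                  x2   = rnorm(n),
                  last = 1)
scal <- array(seq_len(3 * 2 * 4), dim = c(3, 2, 4))  # stands in for filingstat0711

cols <- 4:(ncol(dat) - 1)   # columns to scale, as in the question's loop

expanded <- do.call(rbind, lapply(1:4, function(m) {
  out <- dat
  # one scalar per row, picked by (var1, var2, m) via a 3-column index matrix
  s <- scal[cbind(dat$var1, dat$var2, m)]
  out[cols] <- out[cols] * s
  out$newvar <- m
  out
}))

print(dim(expanded))
```

This loops only over the four categories; the per-row work inside each pass is vectorized.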