I have a set of vectors of length n. For example, with n=3:
vec1<-c(1,2,3)
vec2<-c(2,2,2)
And a multidimensional array with n^n elements (n dimensions, each of extent n):
threeDarray<-array(0,dim=c(3,3,3))
I want to create a loop that goes through my set of vectors and adds 1 to the corresponding index in the array. After analysing the two vectors above the array should be like:
threeDarray[1,2,3]=1
threeDarray[2,2,2]=1
I'm trying to use the multidimensional array to store the number of occurrences of each vector (my vectors are patterns in a time series).
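For what it's worth, the update described above can be sketched in base R by indexing the array with a one-row matrix, where each column holds one coordinate. This is only a sketch for small n, since the array has n^n cells, and it reuses the names from the question:

```r
vec1 <- c(1, 2, 3)
vec2 <- c(2, 2, 2)
threeDarray <- array(0, dim = c(3, 3, 3))

for (v in list(vec1, vec2)) {
  idx <- matrix(v, nrow = 1)                 # one row = one (i, j, k) index
  threeDarray[idx] <- threeDarray[idx] + 1   # increment that single cell
}
```

Looping over the vectors one at a time means a vector that occurs twice is counted twice.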
The community is right (and the noob is wrong). Multidimensional arrays are not the way to go about this.
An example of code working with lists:
freqPatterns<-function(timeSeries,dimension){
  temp<-character()
  for (i in 1:(length(timeSeries)-dimension+1)){
    # ordinal pattern of the current window, e.g. "0, 1, 2"
    pattern<-paste(as.character(rank(timeSeries[i:(i+dimension-1)])-1),collapse=", ")
    temp[[length(temp)+1]] <- pattern
  }
  # frequency of each pattern, most common first
  freqTable=sort(table(temp),decreasing=TRUE)
  return(freqTable)
}
Thank you guys!
Like you found out yourself, I wouldn't use a multidimensional array either.
Here is a solution using a dataframe:
n=3 # dimension (must match the length of vec1 and vec2)
ll = rep(list(1:n), n) # build list of vectors (n * 1:n)
df_occurs = expand.grid(ll, KEEP.OUT.ATTRS=FALSE) # get all combinations
df_occurs$occurences = 0
# for-loop for counting the occurences
for(v in list(vec1, vec2)) {
  v_match = apply(df_occurs[,1:n], 1, function(x) all(x==v))
  df_occurs$occurences[v_match] = df_occurs$occurences[v_match] + 1
}
Performance may become an issue for large n, since the table has n^n rows. If it's possible to build a character key out of your vector, e.g.
paste(vec1, collapse="")
the lookup in the dataframe would be easier:
df_occurs = data.frame(
key = apply(expand.grid(ll, KEEP.OUT.ATTRS=F), 1, paste, collapse=""),
occurences = 0
)
for(key in list(vec1, vec2)) {
  hit = df_occurs$key==paste(key, collapse="")
  df_occurs$occurences[hit] = df_occurs$occurences[hit] + 1
}
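If only the patterns that actually occur matter, the character-key idea can be taken one step further by letting table() do the counting, so no grid of all combinations is needed. A minimal sketch, assuming a hypothetical input list vecs in which vec1 appears twice:

```r
vec1 <- c(1, 2, 3)
vec2 <- c(2, 2, 2)
vecs <- list(vec1, vec2, vec1)   # hypothetical input; vec1 is repeated on purpose

# one character key per vector, e.g. "123"
keys <- vapply(vecs, paste, character(1), collapse = "")

# number of occurrences of each key
counts <- table(keys)
```

Note that collapse="" is only unambiguous while every entry is a single digit; with larger values a separator such as collapse="," is safer.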
I would like to declare a column in a data.frame that is a multidimensional character array (3 characters in each row). I'm driving myself crazy trying to figure this out.
simulations <- 1000
data <- data.frame(nonsing = character(simulations))
for(i in 1:simulations){
data$nonsing[i] = letters[1:3]
}
You need to collapse the 3 characters in one string which can be done with toString.
simulations <- 1000
data <- data.frame(nonsing = character(simulations))
for(i in 1:simulations){
data$nonsing[i] = toString(letters[sample(1:26, 3)])
}
letters[1:3] would always give 'a, b, c', so I used sample to pick 3 random letters.
You can also use replicate:
data$nonsing <- replicate(simulations, toString(letters[sample(1:26, 3)]))
I have a number of variables that is not known in advance, and for each variable I need to define a for loop and perform a series of operations. For each subsequent variable, I need to define a nested loop inside the previous one, performing the same operations. I guess there must be a way of doing this recursively, but I am struggling with it.
Consider for instance the following easy example:
results = c()
index = 0
for(i in 1:5)
{
a = i*2
for(j in 1:5)
{
b = a*2 + j
for(k in 1:5)
{
index = index + 1
c = b*2 + k
results[index] = c
}
}
}
In this example, I would have 3 variables. The loop on j requires information from the loop on i, and the loop on k requires information from the loop on j. This is a simplified example of my problem and the operations here are pretty simple. I am not interested in another way of getting the "results" vector. What I would like to know is whether there is a way to do these operations recursively for an unknown number of variables, let's say 10 variables, so that I do not need to nest 10 loops manually.
Here is one approach that you might be able to modify for your situation...
results <- 0 #initialise
for(level in 1:3){ #3 nested loops - change as required
results <- c( #converts output to a vector
outer(results, #results so far
1:5, #as in your loops
FUN = function(x,y) {x*2+y} #as in your loops
)
)
}
The two problems with this are
a) that your formula is different in the first (outer) loop, and
b) the order of results is different from yours
However, you might be able to find workarounds for these depending on your actual problem.
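One possible workaround for both points is plain recursion. The sketch below uses a hypothetical helper nest() (not from the question) that carries the value from the enclosing loop down one level at a time; the outermost loop is handled separately because its formula (a = i*2) differs from the inner ones:

```r
# Hypothetical helper: `value` is the result carried in from the enclosing loop,
# `depth` is how many nested loops are still to run, `n` is the length of each loop.
nest <- function(value, depth, n = 5) {
  if (depth == 0) return(value)   # innermost level reached
  unlist(lapply(seq_len(n), function(j) nest(value * 2 + j, depth - 1, n)))
}

# Outermost loop (a = i * 2), followed by two nested levels (j and k)
results <- unlist(lapply(1:5, function(i) nest(i * 2, depth = 2)))
```

Because the innermost index varies fastest, the elements come out in the same order as in the hand-written loops. Changing depth gives more or fewer nested levels.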
I have tried to change the code so that it is a function that lets you define how many iterations happen at each level.
library(tidyverse)
fc <- function(i_end, j_end, k_end){
i <- 1:i_end
j <- 1:j_end
k <- 1:k_end
df <- crossing(i, j, k) %>%
mutate(
a = i*2,
b = a*2 + j,
c = b*2 + k,
index = row_number())
df
}
fc(5,5,5)
Given a list of 16 elements, where each element is a named numeric vector, I want to plot the length of the intersection of names between every 2 elements. That is; the intersection of element 1 with element 2, that of element 3 with element 4, etc.
Although I can do this in a very tedious, low-throughput manner, I'll have to repeat this sort of analysis, so I'd like a more programmatic way of doing it.
As an example, the first 5 entries of the first 2 list elements are:
topGenes[[1]][1:5]
3398 284353 219293 7450 54658
2.856363 2.654106 2.653845 2.635599 2.626518
topGenes[[2]][1:5]
1300 64581 2566 5026 146433
2.932803 2.807381 2.790484 2.739735 2.705030
Here, the first row of numbers are gene IDs, and I want to know how many IDs each pair of vectors (a treatment replicate) has in common among, say, the top 100.
I've tried using lapply() in the following manner:
vectorOfIntersectLengths <- lapply(topGenes, function(x) lapply(topGenes, function(y) length(intersect(names(x)[1:100],names(y)[1:100]))))
This only seems to operate on the first two elements; topGenes[[1]] & topGenes[[2]].
I've also been trying to do this with a for() loop, but I'm unsure how to write this. Something along the lines of this:
lengths <- c()
for(i in 1:length(topGenes)){
lens[i] <- length(intersect(names(topGenes[[i]][1:200]),
names(topGenes[[i+1]][1:200])))
}
This returns a 'subscript out of bounds' error, which I don't really understand.
Thanks a lot for any help!
Is this what you're looking for?
# make some fake data
set.seed(123)
some_list <- lapply(1:16, function(x) {
y <- rexp(100)
names(y) <- sample.int(1000,100)
y
})
# identify all possible pairs
pairs <- t( combn(length(some_list), 2) )
# note: you could also use: pairs <- expand.grid(1:length(some_list),1:length(some_list))
# but in addition to a-to-b, you'd get b-to-a, a-to-a, and b-to-b
# get the intersection of names of a pair of elements with given indices kept for bookkeeping
get_intersection <- function(a,b) {
list(a = a, b = b,
intersection = intersect( names(some_list[[a]]), names(some_list[[b]]) )
)
}
# get intersection for each pair
intersections <- mapply(get_intersection, a = pairs[,1], b = pairs[,2], SIMPLIFY=FALSE)
# print the intersections
for(indx in 1:length(intersections)){
writeLines(paste('Intersection of', intersections[[indx]]$a, 'and',
intersections[[indx]]$b, 'contains:',
paste( sort(intersections[[indx]]$intersection), collapse=', ') ) )
}
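If only the consecutive pairs from the question are needed (1 with 2, 3 with 4, and so on), a shorter sketch is to step through the odd indices. It assumes the list has an even number of elements and rebuilds the same kind of fake data so that it runs on its own:

```r
# fake data: 16 named numeric vectors
set.seed(123)
some_list <- lapply(1:16, function(x) {
  y <- rexp(100)
  names(y) <- sample.int(1000, 100)
  y
})

# first index of each pair: 1, 3, 5, ...
odd <- seq(1, length(some_list) - 1, by = 2)

# size of the name overlap between element i and element i + 1 (top 100 names)
pair_lengths <- vapply(odd, function(i) {
  length(intersect(head(names(some_list[[i]]), 100),
                   head(names(some_list[[i + 1]]), 100)))
}, integer(1))
names(pair_lengths) <- paste(odd, odd + 1, sep = "-")
```

The resulting named vector can be passed straight to barplot(pair_lengths) for a quick plot.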
I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c(j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.
An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better to use prod in FUN, as suggested by @bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))
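Since combn works on the values rather than on their positions, the same call should work for non-integer data. As a cross-check, the sum of all pairwise products also equals (sum(x)^2 - sum(x^2)) / 2, which avoids building the combinations at all. A small sketch with made-up values:

```r
x <- c(0.5, 1.25, 2, 3.75)   # hypothetical non-integer data

# all unique pairs, multiplied and summed
via_combn <- sum(combn(x, 2, FUN = prod))

# closed form: (sum of x)^2 counts every pair twice plus the squares
via_identity <- (sum(x)^2 - sum(x^2)) / 2

same <- isTRUE(all.equal(via_combn, via_identity))
```

For long vectors the closed form is likely the cheaper of the two, because the number of pairs grows quadratically.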
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k). distMat is a huge matrix and indSpam is a long vector. In particular to my situation, length(distMat[1,]) is 2412. When I enter in k as a vector of length one, I get a vector of length 2412. I want to be able to enter in k as a vector of length two and get a matrix of 2412x2. I am trying to use a while loop to let it go through the length of k, but it only returns to me a vector of length 2412. What am I doing wrong?
predNeighbor = function(k, distMat, indSpam){
counter = 1
while (counter<(length(k)+1))
{
preMatrix = apply(distMat, 1, order)
orderedMatrix = t(preMatrix)
truncate = orderedMatrix[,1:k[counter]]
checking = indSpam[truncate]
checking2 = matrix(checking, ncol = k[counter])
number = apply(checking2, 1, sum)
return(number[1:length(distMat[1,])] > (k[counter]/2))
counter = counter + 1
}
}
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k)
Here's a function that does this.
foo <- function(k, distMat) {
return(matrix(0, nrow = length(distMat[1, ]), ncol = length(k)))
}
If you have other requirements, please describe them in words.
Based on your comment, I think I better understand your goal. You have a function that returns a vector and you want to save its output as columns in a matrix. This is a pretty common task. Let's do a simple example where k starts out as 1:10, and say we want to add some noise to it with a function noise_and_rank() and see how the rank changes.
In the case where the input to the function is always the same, replicate() works very well. It will automatically put everything in a matrix:
k <- 1:10
noise_and_rank <- function(k) {
rank(k + runif(length(k), min = -2, max = 2))
}
results <- replicate(n = 8, expr = {noise_and_rank(k)})
In the case where you want to iterate, i.e., the output from one go is the input for the next, a for loop is good, and we just pre-allocate a matrix with 0's to fill in one column/row at a time:
k <- 1:10
n.sim <- 8
results <- matrix(0, nrow = length(k), ncol = n.sim)
results[, 1] <- k
for(i in 2:n.sim) {
results[, i] <- noise_and_rank(results[, i - 1])
}
What your original question seems to be about is how to do the pre-allocation. If the input is always the same, using replicate() means you don't have to worry about it. If the input is different each time, pre-allocate using matrix(); you don't need to write any special function.
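Applied to the predNeighbor() function from the question, the reason only one vector comes back is that return() inside the while loop exits on the first pass. A sketch of one way around that, assuming the intent is a majority vote over the k nearest neighbours, is to let sapply() collect one column per value of k. The test data at the bottom is made up:

```r
predNeighbor <- function(k, distMat, indSpam) {
  # neighbours of each row, nearest first (computed once, not once per k)
  ord <- t(apply(distMat, 1, order))
  # one column of TRUE/FALSE votes per entry of k
  sapply(k, function(kk) {
    votes <- matrix(indSpam[ord[, seq_len(kk), drop = FALSE]], ncol = kk)
    rowSums(votes) > kk / 2
  })
}

# made-up example: 6 points, 0/1 spam indicator
set.seed(1)
distMat <- as.matrix(dist(matrix(rnorm(12), ncol = 2)))
indSpam <- c(1, 0, 1, 1, 0, 0)
res <- predNeighbor(c(1, 3), distMat, indSpam)
```

As in the original code, each point counts as its own nearest neighbour because its distance to itself is 0.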