R: Compare vectors of differing lengths

I'm actually having trouble phrasing my question, so if anyone has feedback on that, I'd love to hear it.
I'm working in R and have a vector and a data frame, of different lengths:
xp.data <- c(400,500,600,700)
XPTable <- data.frame("Level"=1:10,"XP"=c(10,50,100,200,400,600,700,800,900,1000))
What I'm hoping to obtain is a new vector:
> lv.data
[1] 5 5 6 7
The goal is to do this without a loop, since the xp.data vector can be any length and the XPTable data frame can also vary in length.
If I were doing this for a single value rather than a vector, I'd just use:
max(XPTable$Level[XPTable$XP <= xp.data])
However, this only works if xp.data has length 1.

findInterval() does this in a vectorised way: for each element of xp.data it returns the index of the last breakpoint in XPTable$XP that does not exceed it, which here is exactly the Level.
lv.data <- findInterval(xp.data, XPTable$XP)
print(lv.data)
# [1] 5 5 6 7
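If the Level column were not simply 1:nrow(XPTable), the interval index can be used to look the level up explicitly; a minimal sketch with the same data (here it gives the identical result):
lv.data <- XPTable$Level[findInterval(xp.data, XPTable$XP)]
lv.data
# [1] 5 5 6 7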

Related

Looping through items on a list in R

This may be a simple question, but I'm fairly new to R.
What I want to do is perform some kind of addition on the indexes of a list, but once I get past a maximum value it should go back to the first value in that list and start over from there.
for example:
x <- 2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
3
or something functionally equivalent. In the end I want to be able to do something like
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some stuff with addition and subtraction of the length of my list, but it does not work well for the lower numbers.
I hope this was a clear enough explanation.
Thanks in advance!
We don't need a loop here, as a vector can be indexed by another vector of length >= 1:
data[values]
#[1] 5 7 8 11
NOTE: Both objects are vectors, not lists.
If we need the index to wrap around once it passes the end of the vector, subtract the length and then index:
values <- values + 8
values <- ifelse(values > length(data), values - length(data), values)
data[values]
#[1] 1 3 4 7
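If the shift can be larger than one full cycle, a modular reset is more robust; a small sketch (the helper name wrap_index is ours, not from the original answer):
values <- c(6, 8, 9, 12)
wrap_index <- function(i, n) ((i - 1) %% n) + 1  # 1-based modular wrap, works for any shift
data[wrap_index(values + 8, length(data))]
#[1] 1 3 4 7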

Using R to count patterns in columns

I have a matrix in R containing 1000 columns and 4 rows. Each cell in the matrix contains an integer between 1 and 4. I want to know two things:
1) What is the number of columns that contain a "1", "2", "3", and "4" in any order? Ideally, I would like the code to not require that I input each possible combination of 1,2,3,4 to perform its count.
2) What is the number of columns that contain 3 of the possible integers, but not all 4?
Solution 1
The most obvious approach is to run apply() over the columns and test for the required tabulation of the column vector using tabulate(). This requires first building a factor() out of the column vector to normalize its storage representation to an integer vector starting from 1. And since you don't care about order, we run sort() on the tabulation before comparing it against the expected one.
For the "4 of 4" problem the expected tabulation is four 1s, while for the "3 of 4" problem it is two 1s and one 2.
## generate data
set.seed(1L); NR <- 4L; NC <- 1e3L; m <- matrix(sample(1:4,NR*NC,T),NR);
sum(apply(m,2L,function(x) identical(rep(1L,4L),sort(tabulate(factor(x))))));
## [1] 107
sum(apply(m,2L,function(x) identical(c(1L,1L,2L),sort(tabulate(factor(x))))));
## [1] 545
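For a 4-row matrix with values in 1:4, the same two tests reduce to counting distinct values per column, which gives a shorter (though not necessarily faster) check; a sketch using the same m, which should reproduce the counts above:
sum(apply(m,2L,function(x) length(unique(x))==4L));
## [1] 107
sum(apply(m,2L,function(x) length(unique(x))==3L));
## [1] 545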
Solution 2
v <- c(1L,2L,4L,8L);
sum(colSums(matrix(v[m],nrow(m)))==15L);
## [1] 107
v <- c(1L,3L,9L,27L);
s3 <- c(14L,32L,38L,16L,34L,22L,58L,46L,64L,42L,48L,66L);
sum(colSums(matrix(v[m],nrow(m)))%in%s3);
## [1] 545
Here's a slightly weird solution.
I was looking into how to use colSums() or colMeans() to try to find a quick test for columns that have 4 of 4 or 3 of 4 of the possible cell values. The problem is, there are multiple combinations of the 4 values that sum to the same total. For example, 1+2+3+4 == 10, but 1+1+4+4 == 10 as well, so just getting a column sum of 10 is not enough.
I realized that one possible solution would be to change the set of values that we're summing, such that our target combinations would sum to unambiguous values. We can achieve this by spreading out the original set from 1:4 to something more diffuse. Furthermore, the original set of values of 1:4 is perfect for indexing a precomputed vector of values, so this seemed like a particularly logical approach for your problem.
I wasn't sure what degree of diffusion would be required to make unique the sums of the target combinations. Some ad hoc testing seemed to indicate that multiplication by a fixed multiplier would not be sufficient to disambiguate the sums, so I moved up to exponentiation. I wrote the following code to facilitate the testing of different bases to identify the minimal bases necessary for this disambiguation.
tryBaseForTabulation <- function(N,tab,base) {
    ## make destination value set, exponentiating from 0 to N-1
    x <- base^(seq_len(N)-1L);
    ## make a matrix of unique combinations of the original set
    g <- unique(t(apply(expand.grid(x,x,x,x),1L,sort)));
    ## get the indexes of combinations that match the required tabulation
    good <- which(apply(g,1L,function(x) identical(tab,sort(tabulate(factor(x))))));
    ## get the sums of good and bad combinations
    hs <- rowSums(g[good,,drop=F]);
    ns <- rowSums(g[-good,,drop=F]);
    ## return the number of ambiguous sums; we need to get zero!
    sum(hs%in%ns);
}; ## end tryBaseForTabulation()
The function takes the size of the set (4 for us), the required tabulation (as returned by tabulate()) in sorted order (as revealed earlier, this is four 1s for the "4 of 4" problem, two 1s and one 2 for the "3 of 4" problem), and the test base. This is the result for a base of 2 for the "4 of 4" problem:
tryBaseForTabulation(4L,rep(1L,4L),2L);
## [1] 0
So we get the result we need right away; a base of 2 is sufficient for the "4 of 4" problem. But for the "3 of 4" problem, it takes one more attempt:
tryBaseForTabulation(4L,c(1L,1L,2L),2L);
## [1] 7
tryBaseForTabulation(4L,c(1L,1L,2L),3L);
## [1] 0
So we need a base of 3 for the "3 of 4" problem.
Note that, although we are using exponentiation as the tool to diffuse the set, we don't actually need to perform any exponentiation at solution run-time, because we can simply index a precomputed vector of powers to transform the value space. Unfortunately, indexing a vector with a matrix returns a flat vector result, losing the matrix structure. But we can easily rebuild the matrix structure with a call to matrix(), thus we don't lose very much with this idiosyncrasy.
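To make that flattening concrete, here is a tiny sketch with a 2x2 example (values and dimensions chosen purely for illustration):
v <- c(1L,2L,4L,8L);
small <- matrix(c(1L,3L,2L,2L),2L); ## a 2x2 matrix of values in 1:4
v[small];                           ## indexing with a matrix drops the dim attribute
## [1] 1 4 2 2
matrix(v[small],nrow(small));       ## rebuild the matrix structure
##      [,1] [,2]
## [1,]    1    2
## [2,]    4    2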
The last step is to derive the destination value space and the set of sums that satisfy the problem condition. The value spaces are easy; we can just compute the power sequence as done within tryBaseForTabulation():
2L^(1:4-1L);
## [1] 1 2 4 8
3L^(1:4-1L);
## [1] 1 3 9 27
The set of sums was computed as hs in the tryBaseForTabulation() function. Hence we can write a new similar function for these:
getBaseSums <- function(N,tab,base) {
    ## make destination value set, exponentiating from 0 to N-1
    x <- base^(seq_len(N)-1L);
    ## make a matrix of unique combinations of the original set
    g <- unique(t(apply(expand.grid(x,x,x,x),1L,sort)));
    ## get the indexes of combinations that match the required tabulation
    good <- which(apply(g,1L,function(x) identical(tab,sort(tabulate(factor(x))))));
    ## return the sums of good combinations
    rowSums(g[good,,drop=F]);
}; ## end getBaseSums()
Giving:
getBaseSums(4L,rep(1L,4L),2L);
## [1] 15
getBaseSums(4L,c(1L,1L,2L),3L);
## [1] 14 32 38 16 34 22 58 46 64 42 48 66
Now that the solution is complete, I realize that the cost of the vector index operation, rebuilding the matrix, and the %in% operation for the second problem may render it inferior to other potential solutions. But in any case, it's one possible solution, and I thought it was an interesting idea to explore.
Solution 3
Another possible solution is to precompute an N-dimensional lookup table that stores which combinations match the problem condition and which don't. The input matrix can then be used directly as an index matrix into the lookup table (well, almost directly; we'll need a single t() call, since its combinations are laid across columns instead of rows).
For a large set of values, or for long vectors, this could easily become impractical. For example, if we had 8 possible cell values with 8 rows then we would need a lookup table of size 8^8 == 16777216. But fortunately for the sizing given in the question we only need 4^4 == 256, which is completely manageable.
To facilitate the creation of the lookup table, I wrote the following function, which stands for "N-dimensional combinations":
NDcomb <- function(N,f) {
    x <- seq_len(N);
    g <- do.call(expand.grid,rep(list(x),N));
    array(apply(g,1L,f),rep(N,N));
}; ## end NDcomb()
Once the lookup table is computed, the solution is easy:
v <- NDcomb(4L,function(x) identical(rep(1L,4L),sort(tabulate(factor(x)))));
sum(v[t(m)]);
## [1] 107
v <- NDcomb(4L,function(x) identical(c(1L,1L,2L),sort(tabulate(factor(x)))));
sum(v[t(m)]);
## [1] 545
We can use colSums(): loop over 1:4, convert the matrix to a logical matrix for each value, take the colSums(), check which are not equal to 0, and sum.
sapply(1:4, function(i) sum(colSums(m1==i)!=0))
#[1] 6 6 9 5
If we need the number of columns that contain a 3 and are not made up entirely of 4s
sum(colSums(m1!=4)!=0 & colSums(m1==3)!=0)
#[1] 9
data
set.seed(24)
m1 <- matrix(sample(1:4, 40, replace=TRUE), nrow=4)

Apply an operation to some elements of a vector by using indices

I've got a fairly basic question concerning vector operations in R. I want to apply a certain operation (i.e. increment) to specific elements of a vector by using a vector containing the indices of the elements.
For example:
ind <- c(2,5,8)
vec <- seq(1,10)
I want to add 1 to the 2nd, 5th and 8th elements of vec. In the end I'd like to have:
vec <- c(1,3,3,4,6,6,7,9,9,10)
I tried vec[ind] + 1
but that returns only the three elements. I could use a for-loop, of course, but knowing R, I'm sure there's a more elegant way.
Any help would be much appreciated.
We have to assign the result back to those elements:
vec[ind] <- vec[ind] + 1
vec
#[1] 1 3 3 4 6 6 7 9 9 10
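An equivalent one-liner is base R's replace(), which returns a modified copy rather than changing vec in place; a sketch starting again from the original vec:
vec <- seq(1, 10)
vec <- replace(vec, ind, vec[ind] + 1)
vec
#[1] 1 3 3 4 6 6 7 9 9 10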

Vectors of different lengths from a `for` cycle in R: merging in a data frame [duplicate]

This question already has answers here:
Create a Data Frame of Unequal Lengths
(6 answers)
Closed 9 years ago.
I have the following elementary issue in R.
I have a for (k in 1:x) {...} loop which produces numerical vectors whose length depends on k.
For each value of k I produce a single numerical vector.
I would like to collect them as rows of a data frame in R, if possible. In other words, I would like to introduce a data frame data s.t.
for (k in 1:x) {
data[k,] <- ...
}
where the dots represent the command producing the vector with length depending on k.
Unfortunately, as far as I know, the rows of a data frame in R must all have the same length, since a data frame is a list of vectors of equal length. I have already tried padding each row with a suitable number of zeroes to reach a constant length (in this case equal to x), but I would like to work "dynamically" instead.
I do not think this issue is equivalent to merging vectors of different lengths into a data frame: because of the for loop, only one vector is known at each step.
Edit
A very easy example of what I mean: for each k, I would like to write the vector whose components are 1,2,...,k and store it as the kth row of the data frame data. In the above setting, I would write
for (k in 1:x) {
data[k,] <- seq(1,k,1)
}
As the length of seq(1,k,1) depends on k, this code does not work.
You could consider using ldply from plyr here.
library(plyr)
set.seed(123)
# k is the length of each result
k <- sample(5, 3, replace = TRUE)
#[1] 2 4 3
# Make a list of vectors, each a sequence from 1:k
ll <- lapply( k , function(x) seq_len(x) )
#[[1]]
#[1] 1 2
#[[2]]
#[1] 1 2 3 4
#[[3]]
#[1] 1 2 3
# take our list and rbind it into a data.frame, filling in missing values with NA
ldply( ll , rbind)
# 1 2 3 4
#1 1 2 NA NA
#2 1 2 3 4
#3 1 2 3 NA
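If you would rather avoid the plyr dependency, a base-R sketch that pads each vector with NA before binding (using the same list ll as above) should give an equivalent result:
n <- max(lengths(ll))
padded <- lapply(ll, function(v) c(v, rep(NA, n - length(v))))
as.data.frame(do.call(rbind, padded))
#  V1 V2 V3 V4
#1  1  2 NA NA
#2  1  2  3  4
#3  1  2  3 NA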

swapping byrow for vector without converting to matrix

I have a matrix that I'm storing as a vector for speed and memory considerations. I want to essentially swap from byrow=FALSE to byrow=TRUE without actually converting it to a matrix (again for speed and memory reasons, as the data could potentially be very large).
It's trivial to do by going through a call to matrix(), e.g. if I have a 2x3 matrix,
> a <- 1:6
> a
[1] 1 2 3 4 5 6
> as.vector(matrix(a, nrow=2, ncol=3, byrow=TRUE))
[1] 1 4 2 5 3 6
I think I could come up with a manual solution involving pulling out every ith entry and reordering, etc, etc, but was hoping there might be a more straightforward solution.
Any ideas?
Thanks.
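One possible approach, sketched under the assumption that a holds an nr x nc matrix in the usual column-major order (2x3 in the example above), is to build the permutation of indices directly with integer arithmetic, so the data vector is only ever subset, never reshaped:
a <- 1:6
nr <- 2; nc <- 3
p <- seq_along(a)
## output position p corresponds to row ((p-1) %% nr)+1 and column ((p-1) %/% nr)+1
## of the byrow-filled matrix, which maps back to this position in a:
idx <- ((p - 1) %% nr) * nc + (p - 1) %/% nr + 1
a[idx]
# [1] 1 4 2 5 3 6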

Resources