Removing certain elements from a list of lists - r

I am working on a list object containing hundreds of "lists" of random integers in the following format:
assignments <- list(
as.integer(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3)),
as.integer(c(1, 1, 1, 0, 0, 0, 3, 3)),
as.integer(c(1, 3, 3, 3, 3, 3, 3, 2, 2)),
as.integer(c(1, 2, 0, 3, 2, 3, 2, 2, 2))
)
[[1]]
[1] 1 1 1 1 1 1 2 2 2 3 3
[[2]]
[1] 1 1 1 0 0 0 3 3
[[3]]
[1] 1 3 3 3 3 3 3 2 2
[[4]]
[1] 1 2 0 3 2 3 2 2 2
I want to extract the most frequent non-zero integer from each list. However, in some lists of this list object, zero is the most frequent integer, such as in the second list [[2]]. This created some problems in my analysis.
Is there any way to loop through a list of lists and remove certain elements, such as zero, from each list of this big list?
One method I experimented with earlier was to loop through this list of lists and use != to exclude values that equal zero
for(i in assignments){i[i != 0]}
but this didn't work.

lapply(assignments,function(x) x[x!=0])
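The for loop in the question evaluates i[i != 0] on a copy of each element and then throws the result away, so assignments never changes. lapply() instead returns a new list that can be stored. Below is a minimal sketch of the whole task, assuming the goal is the most frequent non-zero integer per element; most_frequent is a hypothetical helper name, not something from the question.

```r
assignments <- list(
  as.integer(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3)),
  as.integer(c(1, 1, 1, 0, 0, 0, 3, 3)),
  as.integer(c(1, 3, 3, 3, 3, 3, 3, 2, 2)),
  as.integer(c(1, 2, 0, 3, 2, 3, 2, 2, 2))
)

# lapply() returns a new list; the original stays as it is unless reassigned.
cleaned <- lapply(assignments, function(x) x[x != 0])

# Hypothetical helper: the most frequent value of x.
# Ties go to the smallest value, because table() sorts its names
# and which.max() returns the first maximum.
most_frequent <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}

modes <- sapply(cleaned, most_frequent)
modes
```

With the zeros removed, the second element yields 1 rather than 0.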

Related

How to identify each integer sequence regardless of ties in a vector

This question is related to "identify whenever values repeat in r".
While searching for an answer there, this new question arose:
I have this vector:
vector <- c(1, 1, 2, 3, 5, 6, 6, 7, 1, 1, 1, 1, 2, 3, 3)
I would like to identify each consecutive (by 1) integer sequence e.g. 1,2,3,.. or 3,4,5,.. or 4,5,6,7,...
BUT
It should allow ties 1,1,2,3,.. or 3,3,4,5,... or 4,5,5,6,6,7
The expected output would be a list like:
sequence1 <- c(1, 1, 2, 3)
sequence2 <- c(5, 6, 6, 7)
sequence3 <- c(1, 1, 1, 1, 2, 3, 3)
So far the nearest approach I found is "Check whether vector in R is sequential?", but I could not adapt it to what I want.
An option is with diff and cumsum
split(vector, cumsum(c(TRUE, abs(diff(vector)) > 1)))
-output
$`1`
[1] 1 1 2 3
$`2`
[1] 5 6 6 7
$`3`
[1] 1 1 1 1 2 3 3

How to select a random vector

I have 4 vectors that contain integers.
I want to perform calculations based on 2 of the vectors, selected randomly.
I tried creating a new vector containing all the vectors, but sample() only gives me the first element of each vector.
My vectors if it helps:
A <- c(4, 4, 4, 4, 0, 0)
B <- c(3, 3, 3, 3, 3, 3)
C <- c(6, 6, 2, 2, 2, 2)
D <- c(5, 5, 5, 1, 1, 1)
The output I wanted is for example: A, B or B, D or D, A etc.
A thousand thanks in advance!
This is easier to do if you store your vectors in a list:
vecs <- list(
A = c(4, 4, 4, 4, 0, 0),
B = c(3, 3, 3, 3, 3, 3),
C = c(6, 6, 2, 2, 2, 2),
D = c(5, 5, 5, 1, 1, 1)
)
idx <- sample(seq_along(vecs), 2, replace = FALSE)
sampled <- vecs[idx]
sampled
$D
[1] 5 5 5 1 1 1
$B
[1] 3 3 3 3 3 3
You can then access your two sampled vectors, regardless of their names, with sampled[[1]] and sampled[[2]].
You first need to make a list or a data frame, on which you can call sample(). The size argument gives the number of vectors that you want in each sample, which is 2 here.
LIST
> LIST <- list(A, B, C, D)
> sample(LIST, size = 2)
[[1]]
[1] 3 3 3 3 3 3
[[2]]
[1] 4 4 4 4 0 0
Dataframe
> df <- data.frame(A, B, C, D)
> sample(df, size = 2)
B C
1 3 6
2 3 6
3 3 2
4 3 2
5 3 2
6 3 2
I think you were sampling on the wrong object.
Make a list:
LIST = list(A,B,C,D)
names(LIST) = c("A","B","C","D")
This gives you a sample of 2 from the list
sample(LIST,2)
To add them for example, do:
Reduce("+",sample(LIST,2))
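The reason a list is needed is that c(A, B, C, D) flattens the four vectors into a single vector of 24 numbers, so there is nothing left to sample as a unit. A minimal sketch that samples two names and then uses the corresponding vectors; the seed value is arbitrary and only there to make the draw repeatable.

```r
vecs <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

set.seed(42)  # arbitrary seed, only for reproducibility

# Sampling without replacement (the default) guarantees two different vectors.
picked <- sample(names(vecs), 2)

first  <- vecs[[picked[1]]]
second <- vecs[[picked[2]]]

# Example calculation on the two sampled vectors: element-wise sum.
total <- first + second
picked
total
```

Sampling the names rather than the list itself keeps track of which vectors were drawn.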

index from one vector to another by closest values

Given two sorted vectors, how can you get, for each value in one, the index of the closest value in the other?
For example, given:
a = 1:20
b = seq(from=1, to=20, by=5)
how can I efficiently get the vector
c = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
which, for each value in a, provides the index of the largest value in b that is less than or equal to it. But the solution needs to work for unpredictable (though sorted) contents of a and b, and needs to be fast when a and b are large.
You can use findInterval, which constructs a sequence of intervals given by breakpoints in b and returns the interval indices in which the elements of a are located (see also ?findInterval for additional arguments, such as behavior at interval boundaries).
a = 1:20
b = seq(from = 1, to = 20, by = 5)
findInterval(a, b)
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
We can use cut
as.integer(cut(a, breaks = unique(c(b-1, Inf)), labels = seq_along(b)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
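The two approaches agree on the integer example, but they can differ on other inputs. The sketch below uses made-up values of a (including a non-integer and a value below the first breakpoint) to show the difference; the names idx and idx_cut are introduced here for illustration.

```r
b <- c(1, 6, 11, 16)
a <- c(0, 1, 5.5, 6, 20)

# findInterval() returns 0 for values below b[1] and treats each
# interval as closed on the left: b[i] <= a < b[i + 1].
idx <- findInterval(a, b)
idx

# The cut() variant shifts the breaks by 1, which assumes integer data.
# A value such as 5.5 lands in (5, 10] and is assigned to interval 2.
idx_cut <- as.integer(cut(5.5, breaks = c(0, 5, 10, 15, Inf), labels = 1:4))
idx_cut
```

For the stated requirement (largest value in b that is less than or equal to each value in a), findInterval matches the definition directly, and a result of 0 signals that no such value exists.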

Count occurrence of multiple numbers in vector one by one

I have two vectors
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
I want to know how many times each element in b occurs in a. So the result should be
c(3, 3, 2, 1, 2, 0)
All methods I found like match(),==, %in% etc. are not suited for entire vectors. I know I can use a loop over all elements in b,
counts <- integer(length(b))
for (i in seq_along(b)) {
  counts[i] <- sum(a == b[i], na.rm = TRUE)
}
but this is used often and takes too long. That's why I'm looking for a vectorized way, or a way to use apply().
You can do this using factor and table
table(factor(a, unique(b)))
#
#1 2 3 4 5 6
#3 3 2 1 2 0
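Since you mentioned match, here is a possibility without an sapply loop (thanks to @thelatemail). match(a, b) returns positions in b, so the factor levels are seq_along(b), which here happen to equal the values of b:
table(factor(match(a, b), seq_along(b)))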
#
#1 2 3 4 5 6
#3 3 2 1 2 0
Here is a base R option, using sapply with which:
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
sapply(b, function(x) length(which(a == x)))
[1] 3 3 2 1 2 0
Here is a vectorised method
x = expand.grid(b,a)
rowSums( matrix(x$Var1 == x$Var2, nrow = length(b)))
# [1] 3 3 2 1 2 0
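Another vectorised possibility, offered as a sketch rather than a benchmarked recommendation, combines match() with tabulate(). match() gives the position in b of each element of a, and tabulate() counts how often each position from 1 to nbins occurs. It avoids building the length(a) * length(b) grid that expand.grid creates.

```r
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)

# Position of every element of a within b.
pos <- match(a, b)

# Count how many times each position 1..length(b) occurs;
# nbins makes sure trailing zero counts (here for 6) are kept.
counts <- tabulate(pos, nbins = length(b))
counts
```

The result is a plain integer vector in the order of b, without the names that table() attaches.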

Group sequence of integers

I have a bunch of observations
x = c(1, 2, 4, 1, 6, 7, 11, 11, 12, 13, 14)
that I want to turn into the group:
y = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
I.e., I want the first 5 integers (1 to 5) to constitute one group, the next 5 integers (6 to 10) to constitute the next group, and so on.
Is there a straightforward way to accomplish this without a loop?
Clarification: I need to programmatically create the groups from the input vector (x)
We can use %/% to create the group (subtracting 1 first so that 5 stays in the first group)
(x - 1) %/% 5 + 1
#[1] 1 1 1 1 2 2 3 3 3 3 3
You can use ceiling to create groups
ceiling(x/5)
# [1] 1 1 1 1 2 2 3 3 3 3 3
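Exact multiples of 5 are where the formulas can differ: ceiling(x / 5) and (x - 1) %/% 5 + 1 keep 5 in group 1, whereas a plain x %/% 5 + 1 moves it to group 2. The sketch below checks this on a few boundary values; width and edge are names introduced here for illustration.

```r
x <- c(1, 2, 4, 1, 6, 7, 11, 11, 12, 13, 14)
width <- 5  # group size from the question: 1 to 5, 6 to 10, ...

by_ceiling <- ceiling(x / width)
by_intdiv  <- (x - 1) %/% width + 1

# Boundary values: each one should close its group, not open the next.
edge <- c(5, 10, 15)
edge_ceiling <- ceiling(edge / width)      # 1 2 3
edge_plain   <- edge %/% width + 1         # 2 3 4, one group too high

by_ceiling
edge_ceiling
edge_plain
```

The sample data in the question contain no multiples of 5, which is why all variants print the same groups there.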
