How to arrange/merge my resamples into a single vector in R

I have a vector from 1 to 10, which I divide into blocks of size 2 and then resample the blocks, but I am not happy with how the result is arranged in R. The R code below demonstrates the problem:
ts <- 1:10 # generate the numbers 1 to 10
bs <- 2 # let the block size equal 2
nb <- length(ts) / bs # number of blocks
blk <- split(ts, rep(1:nb, each=bs)) # split the generated numbers into blocks, stored in "blk"
res <- sample(blk, replace=TRUE, 20) # resample the blocks 20 times with replacement
res # gives me the result below
#$`5`
#[1] 9 10
#$`5`
#[1] 9 10
#$`5`
#[1] 9 10
#$`3`
#[1] 5 6
#$`1`
#[1] 1 2
#$`2`
#[1] 3 4
#$`5`
#[1] 9 10
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`3`
#[1] 5 6
#$`2`
#[1] 3 4
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
I would rather have the result as a single vector like:
(9,10,9,10,9,10,5,6,1,2,3,4,9,10,1,2,1,2,1,2,7,8,7,8,7,8,7,8,7,8,5,6,3,4,1,2,1,2,1,2)
so that I can treat "res" as a univariate series and write the result to a .csv file as either a single row or a single column, like this:
write.csv(res, "resamples.csv")

You can use unlist() to flatten the list structure. You can also remove the auto-generated names if you don't want/need them:
ts <- 1:10 # generate a number from 1 to 10
bs <- 2 # let block size equals 2
nb <- length(ts) / bs # number of blocks
blk <- split(ts, rep(1:nb, each=bs)) # split the generated numbers into "blk"
res<-unlist(sample(blk, replace=TRUE, 20)) # resample the blk 20 times with replacement
names(res) <- NULL # optional
res
#> [1] 3 4 9 10 1 2 7 8 3 4 3 4 7 8 5 6 7 8 3 4 1 2 3
#> [24] 4 9 10 5 6 7 8 9 10 9 10 5 6 9 10 1 2
Created on 2019-09-23 by the reprex package (v0.3.0)
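The question also asks about writing the result as one column or one row of a .csv file. A minimal sketch of both, assuming the flattened vector from above; the file names `col_file` and `row_file` are illustrative temporary paths, and `set.seed()` is only there to make the run reproducible:

```r
set.seed(123)                                   # assumption: fixed seed for reproducibility
ts  <- 1:10
bs  <- 2
nb  <- length(ts) / bs
blk <- split(ts, rep(1:nb, each = bs))

# flatten the resampled blocks and drop the names in one step
res <- unlist(sample(blk, 20, replace = TRUE), use.names = FALSE)

col_file <- tempfile(fileext = ".csv")          # hypothetical output paths
row_file <- tempfile(fileext = ".csv")

# one value per line, under a single column header
write.csv(data.frame(value = res), col_file, row.names = FALSE)

# all values on one comma-separated line, no header
write.table(t(res), row_file, sep = ",", row.names = FALSE, col.names = FALSE)
```

`t(res)` turns the vector into a 1 x 40 matrix, which is why `write.table()` puts everything on a single line.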

Related

Adding lists together with a loop

I'm trying to add lists together using a loop. Here is some example data.
df <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3), var2= 1:11)
> df
var1 var2
1 1 1
2 1 2
3 2 3
4 2 4
5 2 5
6 2 6
7 3 7
8 3 8
9 3 9
10 3 10
11 3 11
I've run this loop, and I would like the items to be stored in a single object that contains 3 lists:
list_container <- list()
for (i in unique(df$var1) ) {
templist <- df[ df$var1==i , "var2"]
list_container <- list(list_container, templist)
}
It doesn't work, and ends up looking like this:
> list_container
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
list()
[[1]][[1]][[2]]
[1] 1 2
[[1]][[2]]
[1] 3 4 5 6
[[2]]
[1] 7 8 9 10 11
I want the 3 lists to sit separately, so it should end up like this:
list_result <- list(1:2, 3:6, 7:11)
> list_result
[[1]]
[1] 1 2
[[2]]
[1] 3 4 5 6
[[3]]
[1] 7 8 9 10 11
Is there any way I can modify my code to get the desired result? Any help is greatly appreciated. Thanks
Using split would be more direct and faster:
with(df, unname(split(var2, var1)))
-output
[[1]]
[1] 1 2
[[2]]
[1] 3 4 5 6
[[3]]
[1] 7 8 9 10 11
If we want to keep the == comparison against the unique elements, initialize a list of NULLs whose length equals the number of unique elements in the 'var1' column. Then loop over the sequence of unique elements and assign the matching subset of 'var2' to the ith element of 'list_container':
un1 <- unique(df$var1)
list_container <- vector('list', length(un1))
for (i in seq_along(un1)) {
  list_container[[i]] <- df$var2[df$var1 == un1[i]]
}
-output
list_container
[[1]]
[1] 1 2
[[2]]
[1] 3 4 5 6
[[3]]
[1] 7 8 9 10 11
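For comparison, the original loop can also be kept almost unchanged. The nesting in the question comes from `list(list_container, templist)`, which wraps the old container inside a new list on every pass. A small sketch that appends one element per iteration instead (same variable names as in the question):

```r
df <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3), var2 = 1:11)

list_container <- list()
for (i in unique(df$var1)) {
  templist <- df[df$var1 == i, "var2"]
  # append templist as ONE new element at the end of the container
  list_container[[length(list_container) + 1]] <- templist
}
str(list_container)
```

Growing a list this way is fine for a handful of groups; for many groups the pre-allocated version or `split` avoids repeated copying.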
Another base R option using tapply
> with(df, tapply(var2, var1, c))
$`1`
[1] 1 2
$`2`
[1] 3 4 5 6
$`3`
[1] 7 8 9 10 11
or aggregate
> aggregate(var2 ~ ., df, c)$var2
[[1]]
[1] 1 2
[[2]]
[1] 3 4 5 6
[[3]]
[1] 7 8 9 10 11
You could also use unstack:
unstack(df, var2~var1)
$`1`
[1] 1 2
$`2`
[1] 3 4 5 6
$`3`
[1] 7 8 9 10 11
If you do not want the names, you can get rid of them:
unname(unstack(df, var2~var1))
[[1]]
[1] 1 2
[[2]]
[1] 3 4 5 6
[[3]]
[1] 7 8 9 10 11

How to do a simple pattern coding

If I want to have the numbers arranged like this, could you give me some suggestions for the code?
1
2 3
6 5 4
7 8 9 10
15 14 13 12 11
...........
Thanks!!
Another base R option
v <- choose((1:5) + 1, 2)
Map(function(x, y) {
  s <- (y - x + 1):y             # the x consecutive integers ending at y
  if (x %% 2 == 1) rev(s) else s # reverse the odd-numbered rows
}, seq_along(v), v)
gives
[[1]]
[1] 1
[[2]]
[1] 2 3
[[3]]
[1] 6 5 4
[[4]]
[1] 7 8 9 10
[[5]]
[1] 15 14 13 12 11
Here is one option with split
lst1 <- split(1:15, rep(1:5, 1:5))
lst1[c(TRUE, FALSE)] <- lapply(lst1[c(TRUE, FALSE)], rev)
-output
lst1
#$`1`
#[1] 1
#$`2`
#[1] 2 3
#$`3`
#[1] 6 5 4
#$`4`
#[1] 7 8 9 10
#$`5`
#[1] 15 14 13 12 11
Or another option is
m1 <- matrix(NA, 5, 5)
m1[upper.tri(m1, diag = TRUE)] <- 1:15
m1 <- t(m1)
m1[c(TRUE, FALSE),] <- t(apply(m1[c(TRUE, FALSE),], 1,
function(x) c(rev(x[!is.na(x)]), x[is.na(x)])))
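The split idea generalizes to any number of rows. A hedged sketch, where `zigzag_rows` is a hypothetical helper name (not from the answers above) and `n >= 1` is assumed:

```r
# Build rows 1..n of the pattern: row k holds k consecutive integers,
# and the odd-numbered rows are reversed.
zigzag_rows <- function(n) {
  total <- n * (n + 1) / 2                       # 1 + 2 + ... + n
  rows  <- split(seq_len(total), rep(seq_len(n), seq_len(n)))
  odd   <- seq(1, n, by = 2)                     # rows 1, 3, 5, ...
  rows[odd] <- lapply(rows[odd], rev)
  unname(rows)
}

# print one row per line
for (r in zigzag_rows(5)) cat(r, "\n")
```

Reversing row 1 is harmless because it has a single element, so the rule "reverse every odd row" reproduces the layout in the question.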

How can you generate multiple vectors of different length in R?

I would like to use a for loop to generate multiple vectors and save their values for later use. The end result ideally would be:
vector_1 = c(1)
vector_2 = c(1,2,3)
vector_3 = c(1,2,3,4,5,6)
.
.
.
vector_i = c(1,2,3,...,n) #for some n generated during the loop. This n does not always have an upper bound.
This is so that I can use each vector later on to plot multiple lines on the same graph with the axis of the graph scaled correctly.
The following code is the best example I can come up with to try and describe the idea but obviously using 'vector_i' for each loop is not going to work.
for (i in 1:n){
length = sample(1:i^2,1)
vector_i = seq(1,length)
}
You could use the following function:
make_vectors <- function(n) lapply(seq(n), function(i) seq(sample(i^2, 1)))
Which allows:
vector <- make_vectors(5)
vector
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2 3 4
#>
#> [[3]]
#> [1] 1 2 3 4
#>
#> [[4]]
#> [1] 1 2 3 4 5 6
#>
#> [[5]]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
And you can access each one like this:
vector[[5]]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
This keeps your global environment tidy and allows you to iterate through your vectors more easily than writing them all as independent entities.
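Since the stated goal is to plot several lines on one correctly scaled graph, here is a rough sketch of how the list could be used for that. The names `vecs`, `x_max` and `y_max` are illustrative, and `seq_len()` is used in place of `seq()` only as a defensive habit:

```r
set.seed(42)                                     # assumption: fixed seed for a repeatable run
make_vectors <- function(n) {
  lapply(seq_len(n), function(i) seq_len(sample(i^2, 1)))
}
vecs <- make_vectors(5)

# axis limits computed across ALL vectors, so every line fits
x_max <- max(lengths(vecs))
y_max <- max(unlist(vecs))

# empty canvas with the shared limits, then one line per vector
plot(NULL, xlim = c(1, x_max), ylim = c(1, y_max),
     xlab = "index", ylab = "value")
for (k in seq_along(vecs)) {
  lines(seq_along(vecs[[k]]), vecs[[k]], col = k)
}
```

Because the limits come from the whole list, adding more vectors later does not require touching the plotting code.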
We could use sequence
v1 <- sequence(c(1, 3, 6))
v1
#[1] 1 1 2 3 1 2 3 4 5 6
If we need it in a list
split(v1, cumsum(v1 == 1))
#$`1`
#[1] 1
#$`2`
#[1] 1 2 3
#$`3`
#[1] 1 2 3 4 5 6

How to subset a vector inside list of list

I have the following list of list:
foo <- list(a = list(x = 1:10, y = 11:25), b = list(x = 1:10, y = 100:110))
It looks like this:
> foo
$a
$a$x
[1] 1 2 3 4 5 6 7 8 9 10
$a$y
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$b
$b$x
[1] 1 2 3 4 5 6 7 8 9 10
$b$y
[1] 100 101 102 103 104 105 106 107 108 109 110
What I want to do is pick only the first 3 elements of each vector, yielding:
$a
$a$x
[1] 1 2 3
$a$y
[1] 11 12 13
$b
$b$x
[1] 1 2 3
$b$y
[1] 100 101 102
How can I achieve that?
A wild ?rapply appears:
rapply(foo, f=head, n=3, how="list")
#$a
#$a$x
#[1] 1 2 3
#
#$a$y
#[1] 11 12 13
#
#
#$b
#$b$x
#[1] 1 2 3
#
#$b$y
#[1] 100 101 102
This is a recursive apply which goes through each list until a non-list element is found and then applies the function f to it. The how="list" argument just ensures that you get the same list structure back again.
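One practical consequence of the recursion is that the nesting depth does not have to be uniform. A small sketch, where the extra branch `c` (with elements `z` and `w`) is made up purely for illustration:

```r
foo <- list(a = list(x = 1:10, y = 11:25), b = list(x = 1:10, y = 100:110))

# hypothetical extra branch that is nested one level deeper than a and b
deep <- c(foo, list(c = list(z = list(w = 1:5))))

# head(n = 3) is applied to every non-list leaf, whatever its depth
out <- rapply(deep, head, n = 3, how = "list")
str(out)
```

A fixed-depth approach such as nested `lapply` would need a different call for the deeper branch, whereas `rapply` handles both in one pass.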
Use nested lapply
lapply(foo, lapply, `[`, 1:3)
#$a
#$a$x
#[1] 1 2 3
#$a$y
#[1] 11 12 13
#$b
#$b$x
#[1] 1 2 3
#$b$y
#[1] 100 101 102
Or use head
lapply(foo, lapply, head, 3)
Or with map_depth from purrr:
library(purrr)
foo %>% map_depth(2,head,3)
#$a
#$a$x
#[1] 1 2 3
#
#$a$y
#[1] 11 12 13
#
#
#$b
#$b$x
#[1] 1 2 3
#
#$b$y
#[1] 100 101 102

Finding unique combinations irrespective of position

I'm sure it's something simple, but I have a data frame
df <- data.frame(a = c(1, 2, 3),
b = c(2, 3, 1),
c = c(3, 1, 4))
And I want a new data frame that contains the unique combinations of values in the rows, irrespective of which column they're in. So in the case above I'd want
a b c
1 2 3
3 1 4
I've tried
unique(df[c('a', 'b', 'c')])
but it sees (1, 2, 3) as unique from (2, 3, 1), which I don't want.
Maybe something like this:
indx <- !duplicated(t(apply(df, 1, sort))) # finds non - duplicates in sorted rows
df[indx, ] # selects only the non - duplicates according to that index
# a b c
# 1 1 2 3
# 3 3 1 4
If your data.frame is quite big, speed may matter. You can find duplicated sets much faster with the following idea.
Assign each possible value in the rows a distinct prime number and compute the product for each row. For example, for the given df we can take primenums = c(2,3,5,7) and get the products c(30,30,70). Because prime factorizations are unique, duplicates in this vector of products correspond to duplicated sets in our data.frame. Since multiplication is much faster than any kind of sorting, you gain efficiency.
The code is as follows.
require("numbers")
primenums <- Primes(100)[1:4]
dfmult <- apply(as.matrix(df), 1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult)
df[my_indx,]
Here we initialize the vector primenums with the function Primes from the package numbers, but you can also build it manually in some other way.
Take a look at the following example, which compares the efficiency of the approaches.
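To make the arithmetic concrete, here is the same idea on the three-row df from the question, with the primes written out by hand so that no extra package is needed (a sketch, not a replacement for the benchmark):

```r
df <- data.frame(a = c(1, 2, 3),
                 b = c(2, 3, 1),
                 c = c(3, 1, 4))

primenums <- c(2, 3, 5, 7)          # value k is mapped to the k-th prime

# row 1: 2*3*5 = 30, row 2: 3*5*2 = 30, row 3: 5*2*7 = 70
dfmult <- apply(as.matrix(df), 1, function(z) prod(primenums[z]))
dfmult

# keep the first row of each distinct product
df[!duplicated(dfmult), ]
```

Two points to keep in mind: the values are used as indices into `primenums`, so they must be positive whole numbers, and the products are stored as doubles, which represent integers exactly only up to 2^53.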
require("numbers")
# generate all unique combinations of 10 out of 20
allcomb <- t(combn(20,10))
# draw a sample of 1 million rows
set.seed(789)
df <- allcomb[sample(nrow(allcomb), 1e6, TRUE),]
# let's sort the matrix to show that we have duplicates
df <- df[do.call(order, lapply(1:ncol(df), function(i) df[, i])), ]
head(df, 10)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 6 7 8 9 10
# [2,] 1 2 3 4 5 6 7 8 9 10
# [3,] 1 2 3 4 5 6 7 8 9 10
# [4,] 1 2 3 4 5 6 7 8 9 10
# [5,] 1 2 3 4 5 6 7 8 9 11
# [6,] 1 2 3 4 5 6 7 8 9 11
# [7,] 1 2 3 4 5 6 7 8 9 11
# [8,] 1 2 3 4 5 6 7 8 9 11
# [9,] 1 2 3 4 5 6 7 8 9 11
# [10,] 1 2 3 4 5 6 7 8 9 11
# to be fair, we need to permute the numbers within rows before searching for identical sets
df <- t(apply(df, 1, function(z) z[sample(10,10)] ))
df <- as.data.frame(df)
names(df) <- letters[1:10]
# what does it look like now?
head(df, 10)
# a b c d e f g h i j
# 1 2 3 7 9 10 1 4 8 5 6
# 2 4 2 6 3 8 10 9 1 5 7
# 3 4 2 6 8 5 1 10 7 3 9
# 4 6 8 5 4 2 1 10 9 7 3
# 5 11 2 7 6 8 1 9 4 5 3
# 6 9 6 3 11 4 2 8 7 5 1
# 7 5 2 3 11 1 8 6 9 7 4
# 8 3 9 7 1 2 5 4 8 11 6
# 9 6 2 8 3 4 1 11 5 9 7
# 10 4 6 3 9 7 2 1 5 11 8
# now let's shuffle the rows to make df more plausible
df <- df[sample(nrow(df), nrow(df)),]
Now that the data.frame is ready, we can test different algorithms.
system.time(indx <- !duplicated(t(apply(df, 1, sort))) )
# user system elapsed
# 119.75 0.06 120.03
# doesn't impress, frankly speaking
library(sets)
system.time(indx <- !duplicated(apply(df, 1, as.set)) )
# user system elapsed
# 91.60 0.00 91.89
# better, but we want faster! =)
# now let's check out the method with prime numbers
primenums <- Primes(100)[1:20]
# [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
system.time({
dfmult <- apply(as.matrix(df), 1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult) })
# user system elapsed
# 6.44 0.16 6.61
# not bad, is it? but let's compare the results
identical(indx, my_indx)
# [1] TRUE
# So, if there is no difference, why wait more? ;)
There is one important assumption here: we use as.matrix(df), but what if our data.frame contains more than just numeric variables? A more general solution is as follows:
system.time({
dfmult <- apply(
apply(df, 2, function(colmn) as.integer(factor(colmn,
levels = unique(c(as.matrix(df)))))),
1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult) })
# user system elapsed
# 27.48 0.34 27.84
# distinctly slower, but still much faster than the previous methods
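The recoding step is easier to follow on a tiny non-numeric example. The sketch below uses `match()` against the vector of distinct values, which is assumed here to be an equivalent way of producing the integer codes that the `factor()` call above creates; the character df and the name `codes` are made up for illustration:

```r
df <- data.frame(a = c("x", "y", "z"),
                 b = c("y", "z", "x"),
                 c = c("z", "x", "w"))

m <- as.matrix(df)

# give every distinct value an integer code: x = 1, y = 2, z = 3, w = 4
codes <- matrix(match(m, unique(c(m))), nrow = nrow(m))

primenums <- c(2, 3, 5, 7)          # one prime per distinct value
key <- apply(codes, 1, function(z) prod(primenums[z]))

# rows 1 and 2 hold the same set {x, y, z}, so only rows 1 and 3 remain
df[!duplicated(key), ]
```

The number of primes has to be at least the number of distinct values across the whole data.frame, otherwise `primenums[z]` returns NA for the higher codes.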
And what if we have very many columns or very many distinct values? In this case, instead of prod() we can use sum(log()) (which is probably computed even faster for large numbers). Take a look at this:
pr <- Primes(5e7)
length(pr)
# [1] 3001134
system.time(N <- sum(log(pr)))
# user system elapsed
# 0.12 0.00 0.13
N
# [1] 49993718
It's hard to imagine a df with 3 million columns, but even that would be fine here. This approach lets us handle a df of practically any size, with as many columns as our RAM can hold.
As an alternative approach, the package sets provides a fast way of checking for set equality:
library(sets)
df.sets <- apply(df, 1, as.set)
#[[1]]
#{1, 2, 3}
#[[2]]
#{1, 2, 3}
#[[3]]
#{1, 3, 4}
df[!duplicated(df.sets),]
# a b c
#1 1 2 3
#3 3 1 4
