I want to take "random" samples from a vector called data, with increasing sample size and without replacement.
To illustrate, data looks for example like this:
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
What I need is a series of sample vectors of increasing size (starting with size=2 and growing by, for example, 2). Each step should keep the elements already drawn and add new ones that have not been drawn before, and everything should be stored in a list, so that the result would look something like this:
sample_1<-c("s","d")
sample_2<-c("s","d","a","f")
sample_3<-c("s","d","a","f","m","n")
sample_4<-c("s","d","a","f","m","n","l","c")
sample_5<-c("s","d","a","f","m","n","l","c","j","x")
sample_6<-c("s","d","a","f","m","n","l","c","j","x","v","k")
sample_7<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b")
sample_8<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b","h")
samples<-list(sample_1,sample_2,sample_3,sample_4,sample_5,sample_6,sample_7,sample_8)
What I have so far is:
samples <- sapply(seq(from=2, to=length(data), by=2), function(i) sample(data, size=i, replace=FALSE), simplify=FALSE, USE.NAMES=TRUE)
What does not work yet is increasing the sample size while keeping the samples from the previous steps, and having a final list element that contains all observations.
Is something like this possible?
I'm not sure whether I understood you correctly, but perhaps you only need to scramble the data once:
data = letters
data_random = sample(data)
sapply(seq(from=2, to=length(data), by=2),
function (x) data_random[1:x],
simplify = FALSE)
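A minimal sketch of the same "shuffle once, take growing prefixes" idea, extended so that the last element always holds every observation, even when the length is odd. The names shuffled, sizes and samples are illustrative, the seed is only there for reproducibility, and data is assumed to have at least two elements:

```r
data <- c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")

set.seed(42)                 # assumption: fixed only to make the sketch reproducible
shuffled <- sample(data)     # one random permutation, i.e. sampling without replacement

# Sizes 2, 4, 6, ... plus the full length, so the final element holds everything.
sizes <- unique(c(seq(from = 2, to = length(data), by = 2), length(data)))

# Each sample is a prefix of the same permutation, so earlier draws are kept.
samples <- lapply(sizes, function(n) head(shuffled, n))
```

Because every element is a prefix of one permutation, each sample contains the previous one and no value can appear twice within a sample.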
After your comments on the other answer, I think I see what you want to achieve. Extending my previous code, I end up with:
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)
nbitems=ceiling(length(data)/2) # number of steps when growing by 2 (rounds up for odd lengths)
results=vector("list",nbitems)
results[[1]] <- sample(data,2) # get first sample
for (i in 2:nbitems) { # Loop for each result
samplesavail <- data[!data %in% results[[i-1]]] # Reduce the samples available
results[[i]] <- c(results[[i-1]], sample( samplesavail, min( length(samplesavail), 2) ) ) # concatenate a new sample, size depends on step and remaining samples available.
}
Hope this matches your intended use:
> results
[[1]]
[1] "n" "f"
[[2]]
[1] "n" "f" "a" "g"
[[3]]
[1] "n" "f" "a" "g" "m" "v"
[[4]]
[1] "n" "f" "a" "g" "m" "v" "x" "l"
[[5]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j"
[[6]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h"
[[7]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s"
[[8]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s" "c"
Previous approach:
If I understood you correctly (though I'm far from sure):
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123) # fix the seed for repro of answer, remove in real case
nbitems=ceiling(length(data)/2) # number of entries we should have when stepping by 2 (rounds up for odd lengths)
results=vector("list",nbitems) # preallocate the list (as we'll start from the end)
results[[nbitems]] = sample(data,length(data)) # shuffle the data
for (i in nbitems:2) {
  results[[i-1]] <- results[[i]][1:(length(results[[i]]) - 2)] # for each iteration, drop the last 2 entries
}
This gives a single entry as the first result.
I just noticed this is the same idea as @sbstn's answer, but with a more complicated backward approach. I'm posting it in case it has some value.
I have a large sequence of bytes, and I would like to generate a list containing an arbitrary number of subsets of that sequence. I suspect I need to use one of the apply functions, but the trick is that I need to iterate over the vector of starting positions, not the sequence itself.
Here's an example of how I want it to work --
extrct_by_mod <- function(x, startpos, endpos, lrecl)
{
x[1:length(x) %% lrecl %in% startpos:endpos]
}
tmp_seq <- letters[1:25]
startpos <- c(0, 2)
endpos <- c(1, 5)
lrecl <- 5
list_one <- extrct_by_mod(x=tmp_seq, startpos=startpos[1], endpos=endpos[1], lrecl=lrecl)
list_two <- extrct_by_mod(x=tmp_seq, startpos=startpos[2], endpos=endpos[2], lrecl=lrecl)
what_i_want <- list(list_one, list_two)
Ideally, I'd like to be able to just add more values to startpos and endpos, thus automatically generate more subsets to add to my list. Note that the subsets will not be the same length, and in some cases, not even the same type.
My datasets are fairly large, so something that scales well would be ideal. I realize that this could be done with a loop, but my understanding is that you generally want to avoid loops in R.
Thank you!
Saving some time by pre-calculating the modulo-selection index:
> cats <- 1:length(tmp_seq) %% lrecl
> mapply(function(start,end) { tmp_seq[cats %in% start:end]} , startpos, endpos)
[[1]]
[1] "a" "e" "f" "j" "k" "o" "p" "t" "u" "y"
[[2]]
[1] "b" "c" "d" "g" "h" "i" "l" "m" "n" "q" "r" "s" "v" "w" "x"
(It is not correct that R's apply functions are inherently faster than equivalent loops.)
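One caveat, sketched here with the question's own inputs: with its default SIMPLIFY = TRUE, mapply() returns a matrix whenever all subsets happen to have the same length. Map() always returns a list. The names subsets and simplified are illustrative, and the second call uses made-up positions chosen only to trigger the simplification:

```r
tmp_seq  <- letters[1:25]
startpos <- c(0, 2)
endpos   <- c(1, 5)
lrecl    <- 5

# Pre-compute the position-within-record index once.
cats <- seq_along(tmp_seq) %% lrecl

# Map() pairs up startpos[i] with endpos[i] and always returns a list.
subsets <- Map(function(start, end) tmp_seq[cats %in% start:end],
               startpos, endpos)

# With equal-length results, mapply() comes back as a character matrix instead.
simplified <- mapply(function(start, end) tmp_seq[cats %in% start:end],
                     c(0, 2), c(1, 3))
```

Adding more values to startpos and endpos then yields more list elements without any other change.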
Suppose I have the current membership status of a group, i.e. names of members. Additionally, I have data on times when some new member may have been added to the group and / or an old member may have been removed from the group.
The task at hand is to recreate the membership of the group at all these points in time. I've looked around but did not find a ready solution for this problem. Does anybody know an elegant method of doing this?
Reproducible example:
Input:
periods <- 5
indx <- paste0("t-", seq_len(periods))
[1] "t-1" "t-2" "t-3" "t-4" "t-5"
current <- letters[seq_len(10)]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
incoming <- setNames(letters[seq_len(periods) + 5], indx)
incoming[2] <- NA
t-1 t-2 t-3 t-4 t-5
"f" NA "h" "i" "j"
outgoing <- setNames(letters[seq_len(periods) + 10], indx)
outgoing[4] <- NA
t-1 t-2 t-3 t-4 t-5
"k" "l" "m" NA "o"
Output:
$current
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
$`t-1`
[1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k"
$`t-2`
[1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k" "l"
$`t-3`
[1] "a" "b" "c" "d" "e" "g" "i" "j" "k" "l" "m"
$`t-4`
[1] "a" "b" "c" "d" "e" "g" "j" "k" "l" "m"
$`t-5`
[1] "a" "b" "c" "d" "e" "g" "k" "l" "m" "o"
Disclaimer: I've written a solution for this which I will be posting as my answer to the question. The intent is to document this problem and a possible solution and to elicit other ingenious and / or existing solutions or improvements.
The function create_mem_ts (membership timeseries) will generate the desired output as posted in the question.
create_mem_ts <- function (ctime, added, removed, current) {
# Create a time-series of membership of a set.
# Inputs:
## ctime: Time of changes in set.
## An atomic vector of a time-series class or otherwise,
##
## interpretable as a time-series in descending order (e.g.
## `t-1`, `t-2`, `t-3`, etc.).
##
## Is an index of when the changes in membership happened in time.
## Allows repeats but no NAs.
## added: Member(s) added to the set.
## An atomic vector or a list of the same length as ctime.
##
## If an atomic vector, represents exactly one member added at
## the corresponding ctime.
##
## If a list, represents multiple members added at corresponding
## ctime.
## removed: Member(s) removed from the set.
## An atomic vector or a list of the same length as ctime.
##
## If an atomic vector, represents exactly one member removed at
## the corresponding ctime.
##
## If a list, represents multiple members removed at the
## corresponding ctime.
## current: Current membership of the set.
## An atomic vector listing the current membership of the set.
# Output:
## A list of the same length as ctime named by values in ctime (coerced to
## character by the appropriate method).
stopifnot(is.atomic(ctime),
is.atomic(added) || is.list(added),
is.atomic(removed) || is.list(removed))
if (any(is.na(ctime))) stop("NAs not allowed in the ctime.")
stopifnot(length(ctime) == length(added),
length(added) == length(removed))
if (any(duplicated(ctime))) {
ctime.u <- unique(ctime)
ctime.f <- factor(ctime, levels=as.character(ctime.u))
added <- split(added, ctime.f)
removed <- split(removed, ctime.f)
} else {
ctime.u <- ctime
}
out <- setNames(vector(mode="list", length=length(ctime.u) + 1),
c("current", as.character(ctime.u)))
out[["current"]] <- current
for (i in 2:length(out))
out[[i]] <- union(setdiff(out[[i - 1]], added[[i - 1]]),
na.omit(removed[[i - 1]]))
attr(out, "index") <- ctime.u
out
}
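As a cross-check of the backward step, here is a compact sketch of the same idea for the simple case of one change per period, using Reduce() with accumulate = TRUE on the question's inputs. The helper step_back and the result name membership are illustrative and not part of the function above:

```r
periods  <- 5
indx     <- paste0("t-", seq_len(periods))
current  <- letters[seq_len(10)]
incoming <- setNames(letters[seq_len(periods) + 5], indx)
incoming[2] <- NA
outgoing <- setNames(letters[seq_len(periods) + 10], indx)
outgoing[4] <- NA

# Undo the change of period i: drop the member that came in,
# then restore the member that went out (ignoring NA, i.e. "no change").
step_back <- function(members, i) {
  members  <- setdiff(members, incoming[[i]])
  restored <- outgoing[[i]]
  union(members, restored[!is.na(restored)])
}

# Walk back from the current membership, keeping every intermediate state.
membership <- Reduce(step_back, seq_len(periods),
                     accumulate = TRUE, init = current)
names(membership) <- c("current", indx)
```

This sketch does not handle repeated time stamps or multiple members per period, which the full function covers via split().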
Moreover, if ctime is of a valid time-series class, the output of create_mem_ts can be used to look up the membership at any time stamp (within the range of ctime) with the function memship_at:
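memship_at <- function (mem_ts, at) {
  stopifnot(inherits(at, class(attr(mem_ts, "index"))))
  just.before <- which(at > attr(mem_ts, "index"))[1]
  # which(...)[1] is NA when `at` is not later than any index value
  if (is.na(just.before))
    stop("'at' is not later than any time in the index.")
  if (just.before > 1)
    mem_ts[[just.before - 1]]
  else
    mem_ts[[1]]
}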
I have a character vector, and I want to exclude the elements that are present in a second vector. I don't know how to write the negation in this case while still considering the entire vector:
vector[vector ! %in% vector2]
I can obviously do vector[vector != single_character] but that only works for a single character.
You're close:
vector[!vector %in% vector2]
or, even though you said "not using setdiff"
setdiff(vector, vector2)
vector1 <- letters[1:4]
set.seed(1)
vector2 <- sample(letters[1:15], 10, replace=TRUE)
vector1
[1] "a" "b" "c" "d"
vector2
[1] "d" "f" "i" "n" "d" "n" "o" "j" "j" "a"
vector2[!(vector2 %in% vector1)] # elements in vector2 that are not in vector1
[1] "f" "i" "n" "n" "o" "j" "j"
You can also define a new "not in" operator:
`%ni%` = Negate(`%in%`)
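A short usage sketch for such an operator, which also shows how it differs from setdiff(): logical indexing keeps repeated values, while setdiff() returns each remaining value only once. The vectors x and drop are made-up illustration data:

```r
`%ni%` <- Negate(`%in%`)

x    <- c("d", "f", "i", "n", "d", "n", "a")
drop <- c("a", "b", "c", "d")

kept_all    <- x[x %ni% drop]    # keeps repeated values: "f" "i" "n" "n"
kept_unique <- setdiff(x, drop)  # drops repeats:         "f" "i" "n"
```

Which of the two is appropriate depends on whether duplicates in the original vector carry meaning.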
A more elegant solution is available now:
library(textclean)
# master character vector
vector1 = c("blue", "green", "red")
# vector containing elements to be removed from master vector
vector2 = c("green", "red")
drop_element_fixed(vector1, vector2)
# Output:
# [1] "blue"
I would like to use the character strings in a vector as the names of new character objects, aiming to get, say, first as "d","e","a","t", and so on.
I tried the approach below, but I am clearly missing some function to apply to x[i]:
x <- c("first","second","third"..)
for (i in 1:length(x)) {
x[i] <- sample(letters,4)
}
TIA
The function you are looking for is assign():
> x <- c("first","second","third")
> for (i in 1:length(x)) {
+ assign(x[i], sample(letters,4))
+ }
>
> ls()
[1] "first" "i" "second" "third" "x"
> first
[1] "t" "d" "u" "j"
> second
[1] "o" "i" "p" "l"
> third
[1] "w" "v" "r" "n"
As an alternative, you could build these vectors as different elements of a list:
> mylist <- list()
> for (i in 1:length(x)) {
+ mylist[[x[i]]] <- sample(letters,4)
+ }
> mylist
$first
[1] "e" "l" "y" "d"
$second
[1] "t" "o" "k" "h"
$third
[1] "g" "x" "p" "b"
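The list version can also be written without an explicit loop. The sketch below relies on sapply() naming its result after a character input, and then uses list2env() to show how a named list can be turned into variables when that is really needed. The environment e is illustrative:

```r
x <- c("first", "second", "third")

# simplify = FALSE keeps a list; a character input supplies the element names.
mylist <- sapply(x, function(nm) sample(letters, 4), simplify = FALSE)

# If separate variables are needed after all, copy the list into an environment.
e <- new.env()
list2env(mylist, envir = e)
```

Keeping the values in one named list usually makes later steps such as lapply(mylist, ...) simpler than working with separately assigned objects.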
You don't say what you will be doing with this object. You may get the simplest structure by using a named vector:
x <- c("first", "second", "third", "fourth")
names(x) <- x
x[] <- sample(letters, length(x)) # one random letter per name
If you omit the empty brackets on the left-hand side (x[] <- ...), the whole vector is replaced and the names are lost. You can now access the values with quoted names:
> x
first second third fourth
"w" "c" "r" "x"
> x["second"]
second
"c"