Switch some of the elements of a vector in R

For example, I have a vector below:
a = c(1,1,1,1,1,2,2,2,3,3,3,3)
Now I want to randomly pick 4 elements from the vector and change them to different values. For instance,
if the elements I pick are 1, 1, 2, 3, then I need to change them randomly, e.g. to 2, 3, 1, 2.
The resulting vector would be the following:
a' = c(1,2,3,1,1,2,1,2,3,3,3,2)
I have no idea how to do this.

Maybe this function helps:
# @param vec - input vector
# @param n - number of values to replace
# @param n1 - threshold on the number of unique values
# @return the vector with the sampled positions replaced
sample_fn <- function(vec, n, n1) {
  flag <- TRUE
  while(flag) {
    # sample the positions
    pos <- sample(seq_along(vec), n, replace = FALSE)
    print(pos)
    # extract the values based on the position index
    as <- vec[pos]
    # get the unique values
    un1 <- unique(as)
    print(un1)
    if(length(un1) > n1)
      flag <- FALSE
  }
  # sample the unique values and set them as names of the unique values,
  # use the named vector to match and replace,
  # and assign the output back to the same positions in the vector
  vec[pos] <- setNames(sample(un1), un1)[as.character(as)]
  vec
}
sample_fn(a, 4, 2)
#[1] 10 1 12 2
#[1] 3 1
#[1] 1 8 4 3
#[1] 1 2
#[1] 7 11 4 12
#[1] 2 3 1
# [1] 1 1 1 2 1 2 3 2 3 3 1 1
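To see what the named-vector replacement line does in isolation, here is a small deterministic illustration (a fixed permutation stands in for sample(un1); the values are toy values, not from the run above):
un1 <- c(1, 2, 3)                  # unique values found at the sampled positions
as  <- c(1, 1, 2, 3)               # values at those positions
map <- setNames(c(2, 3, 1), un1)   # fixed permutation in place of sample(un1)
map[as.character(as)]
#  1  1  2  3
#  2  2  3  1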

I am not sure whether the values used for random replacement should also come from a. If so, the code below might be an option:
replace(a, sample(seq_along(a), 4), sample(unique(a), 4, replace = TRUE))
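Note that sample(unique(a), 4, replace = TRUE) can return, for a given position, the same value that was already there, so not all four sampled positions are guaranteed to change. For reproducible experimentation a seed can be set first (the seed value below is arbitrary, not part of the answer):
set.seed(123)  # arbitrary seed, only for reproducibility
a <- c(1,1,1,1,1,2,2,2,3,3,3,3)
replace(a, sample(seq_along(a), 4), sample(unique(a), 4, replace = TRUE))
# four randomly chosen positions of a now hold values drawn from unique(a)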

R: pass multiple arguments to accumulate/reduce

This is related to R: use the newly generated data in the previous row
I realized the actual problem I was faced with is a bit more complicated than the example I gave in the thread above - it seems I have to pass 3 arguments to the recursive calculation to achieve what I want. Thus, accumulate2 or reduce may not work. So I opened a new question here to avoid possible confusion.
I have the following dataset grouped by ID:
ID <- c(1, 2, 2, 3, 3, 3)
pw <- c(1:6)
add <- c(1, 2, 3, 5, 7, 8)
x <- c(1, 2, NA, 4, NA, NA)
df <- data.frame(ID, pw, add, x)
df
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 NA
4 3 4 5 4
5 3 5 7 NA
6 3 6 8 NA
Within each group, for column x, I want to keep the value of the first row as it is, and fill each remaining row with the lagged (previous) value raised to the power stored in pw, plus the value in add. I want to update the lagged values as I proceed. So I would like to have:
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 2^3 + 3
4 3 4 5 4
5 3 5 7 4^5 + 7
6 3 6 8 (4^5 + 7)^6 + 8
I have to apply this calculation to a large dataset, so it would be perfect if there is a fast way to do this!
If we want to use accumulate2, we need to specify the arguments correctly: it takes two input sequences, here 'pw' and 'add', and an initialization argument, which would be the first value of 'x'. As the data is grouped by 'ID', do the grouping before calling accumulate2, refer to the lambda's default arguments ..1, ..2 and ..3 in that order (accumulated value, current 'pw', current 'add'), and build the recursive calculation from them.
library(dplyr)
library(purrr)
out <- df %>%
  group_by(ID) %>%
  mutate(x1 = accumulate2(pw[-1], add[-1], ~ ..1^..2 + ..3,
                          .init = first(x)) %>%
              flatten_dbl) %>%
  ungroup
out$x1
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
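As a minimal illustration (toy values, not from the question) of how the lambda's ..1, ..2 and ..3 map onto the accumulated value, the current element of the first sequence and the current element of the second sequence:
library(purrr)
accumulate2(c(2, 3), c(10, 20), ~ ..1^..2 + ..3, .init = 1)
# returns list(1, 11, 1351): .init, then 1^2 + 10, then 11^3 + 20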
With more than 3 arguments, a for loop would be better
# initialize an empty vector
out <- c()
# loop over the unique IDs
for(id in unique(df$ID)) {
  # create a temporary subset of data based on that id
  tmp_df <- subset(df, ID == id)
  # initialize a temporary storage output
  tmp_out <- numeric(nrow(tmp_df))
  # initialize the first value with the first element of x
  tmp_out[1] <- tmp_df$x[1]
  # if the number of rows is greater than 1
  if(nrow(tmp_df) > 1) {
    # loop over the remaining rows
    for(i in 2:nrow(tmp_df)) {
      # do the recursive calculation and update
      tmp_out[i] <- tmp_out[i - 1]^tmp_df$pw[i] + tmp_df$add[i]
    }
  }
  out <- c(out, tmp_out)
}
out
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
In base R we could use the following solution for more than two arguments.
In this solution I first subset the original data set on the ID values.
Then I choose the row ids through seq_len(nrow(tmp))[-1], omitting the first row id since its value is supplied as init.
In the anonymous function used in Reduce, the b argument represents the accumulated/previous value (starting from init) and c represents the current element of the vector of row numbers.
So in every iteration the previous value (starting from init) is raised to the power of the current value of pw, and the current value of add is added.
cbind(df[-length(df)], unlist(lapply(unique(df$ID), function(a) {
  tmp <- subset(df, df$ID == a)
  Reduce(function(b, c) {
    b ^ tmp$pw[c] + tmp$add[c]
  }, init = tmp$x[1],
  seq_len(nrow(tmp))[-1], accumulate = TRUE)
}))) |> setNames(c(names(df)))
ID pw add x
1 1 1 1 1.000000e+00
2 2 2 2 2.000000e+00
3 2 3 3 1.100000e+01
4 3 4 5 4.000000e+00
5 3 5 7 1.031000e+03
6 3 6 8 1.201025e+18
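For readers less familiar with Reduce(), a toy call (arbitrary operation, not from the data above) shows how the accumulated value b and the current element c are threaded through when accumulate = TRUE:
Reduce(function(b, c) b * 10 + c, 2:4, init = 1, accumulate = TRUE)
# [1] 1 12 123 1234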
Data
structure(list(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6, add = c(1,
2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
Base R, not using Reduce() but rather a while() loop:
# Split-apply-combine while loop: res => data.frame
res <- do.call(rbind, lapply(with(df, split(df, ID)), function(y){
  # While there are any NAs in x:
  while(any(is.na(y$x))){
    # Store the index of the first NA value: idx => integer scalar
    idx <- with(y, head(which(is.na(x)), 1))
    # Calculate x at that index using the business rule provided:
    # x => numeric vector
    y$x[idx] <- with(y, x[(idx-1)] ** pw[idx] + add[idx])
  }
  # Explicitly define the return object: y => GlobalEnv
  y
}))
Or a recursive function:
# Recursive function: estimation_func => function()
estimation_func <- function(value_vec, exponent_vec, add_vec){
  # Specify the termination condition: when all elements
  # of value_vec are no longer NA:
  if(all(!(is.na(value_vec)))){
    # Return value_vec: numeric vector => GlobalEnv
    return(value_vec)
    # Otherwise recursively apply the below:
  }else{
    # Store the index of the first NA value: idx => integer vector
    idx <- Position(is.na, value_vec)
    # Calculate the value of value_vec at that index,
    # using the provided business logic: value_vec => numeric vector
    value_vec[idx] <- (value_vec[(idx-1)] ** exponent_vec[idx]) + add_vec[idx]
    # Recursively apply the function: function => Local Env
    return(estimation_func(value_vec, exponent_vec, add_vec))
  }
}
# Split the data.frame into a list on ID;
# overwrite x values, applying the recursive function;
# combine the list into a data.frame:
# res => data.frame
res <- data.frame(
  do.call(
    rbind,
    Map(function(y){y$x <- estimation_func(y$x, y$pw, y$add); y}, split(df, df$ID))
  ), row.names = NULL
)
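As a quick check (not part of the original answer), the filled-in x column should reproduce the values computed with accumulate2 above:
res$x
# 1, 2, 11, 4, 1031 and approximately 1.201025e+18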

Adding numeric vectors of unequal length in a list together with an offset based on the index?

I have a list of vectors as given by:
list_num <- list(c(1,1,1,1,1), c(2,2), c(5), c(3,3,3,3,3))
I want to add all these vectors together, but offset each vector by its position in the list, i.e. when adding the second vector c(2,2), we add it from the second position onwards. So essentially, it'd look like the following, where all the elements are added up together:
list_num <- list(c(1,1,1,1,1), c(0,2,2), c(0,0,5), c(0,0,0,3,3,3,3,3))
# Output:
>> 1 3 8 4 4 3 3 3
My current approach involves generating a vector to house the added results and iterating over each element to add it in:
# Find the length for each of the vectors in the list
list_len <- unlist(lapply(list_num, function(x) { return(length(x))}))
# Find how long will the vector to add the results have to be
list_len <- 1:length(list_num)+list_len
# Generate a vector to house the added results
list_len <- rep(0, max(list_len)-1)
# Then iterate over each of the elements by index i
for(i in 1:length(list_num)){
  # Add the vector at position i to the subset of our aggregated vector
  list_len[i:(i+length(list_num[[i]])-1)] <- list_len[i:(i+length(list_num[[i]])-1)] + list_num[[i]]
}
print(list_len)
>> 1 3 8 4 4 3 3 3
But I think this is rather inefficient; I'm looking for a more efficient way to go about aggregating these vectors.
We can use lapply to add the offsetting 0s using rep:
out <- lapply(seq_along(list_num), function(n) c(rep(0, n-1), list_num[[n]]))
out
#[[1]]
#[1] 1 1 1 1 1
#[[2]]
#[1] 0 2 2
#[[3]]
#[1] 0 0 5
#[[4]]
#[1] 0 0 0 3 3 3 3 3
We can then add NAs to make the lengths equal and calculate the row-wise sums:
rowSums(sapply(out, `[`, 1:max(lengths(out))), na.rm = TRUE)
#[1] 1 3 8 4 4 3 3 3
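The padding works because indexing a vector beyond its length returns NA; for example:
c(0, 2, 2)[1:8]
# [1]  0  2  2 NA NA NA NA NA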
The requirement does not seem very hard to code in C++, hence here is an option using Rcpp:
library(Rcpp)
cppFunction("NumericVector psumUnevenList(List a, int len) {
  NumericVector res(len), v;
  for (int i = 0; i < a.length(); i++) {
    v = a[i];
    for (int j = 0; j < v.length(); j++) {
      res[i + j] += v[j];
    }
  }
  return res;
}")
maxn <- max(seq_along(list_num) - 1L + lengths(list_num))
psumUnevenList(list_num, maxn)
#[1] 1 3 8 4 4 3 3 3
Base R solution (two steps):
# Store a scalar valued at the length of longest vector in the list:
vec_list_max_length <- max(lengths(vec_list))
# Set the length of each vector to be equal to the maximum length, rowbind the list
# together and get the sum of each row:
rowSums(sapply(vec_list, function(x) {
  length(x) = vec_list_max_length
  return(replace(x, is.na(x), 0))
}))
Data:
vec_list <- list(c(1,1,1,1,1), c(0,2,2), c(0,0,5), c(0,0,0,3,3,3,3,3))
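As a side note, assigning a longer length pads a vector with NA, which the replace(x, is.na(x), 0) step above then turns into zeros; a minimal illustration:
x <- c(2, 2)
length(x) <- 5
x
# [1]  2  2 NA NA NA
replace(x, is.na(x), 0)
# [1] 2 2 0 0 0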

Flag rows in matrix that contain the same set of values

I have a matrix of integers
m <- rbind(c(1,2),
           c(3,6),
           c(5,1),
           c(2,1),
           c(6,3))
and I am looking for a function that takes this matrix as input and outputs a vector flag with length(flag) == nrow(m) that assigns the rows that contain the same set of integers the same unique (let's say integer) value.
For the above example, the desired output would be:
flag <- c(1, 2, 3, 1, 2)
So rows 1 and 4 in m get the same flag 1, because they both contain the same set of integers, in this case {1, 2}. Similarly, rows 2 and 5 get the same flag.
The solution should work for any number of columns.
The only thing I could come up with is the following approach ...
FlagSymmetric <- function(x) {
  vec_sim <- rep(NA, nrow(x)) # object containing flags
  ind_ord <- ncol(x)
  counter <- 1
  for(i in 1:nrow(x)) {
    if(is.na(vec_sim[i])) { # if that row is not flagged yet, proceed ...
      vec_sim[i] <- counter # ... and give it the next free flag
      for(j in (i+1):nrow(x)) {
        if( (i+1) > nrow(x) ) next # in case of tiny matrices
        ind <- x[j, ] %in% x[i, ]
        if(sum(ind) == ind_ord) vec_sim[j] <- counter # if the same, assign flag
      }
      counter <- counter + 1
    }
  }
  return(vec_sim)
}
... which does what I want:
> FlagSymmetric(m)
[1] 1 2 3 1 2
If n = nrow(m) this needs 1/2 n^2 operations. Of course, I could make it much quicker by writing this in C++, but this only alleviates my problem to some extent, because I am working with matrices with a potentially huge number of rows.
I guess there must be a smarter way of doing this.
EDIT:
Additional, more general example (where simply sorting each row and pasting it into a character string does not work):
m2 <- rbind(c(1,112),
            c(11,12),
            c(12,11),
            c(112,1),
            c(6,3))
flag2 <- c(1, 2, 2, 1, 3) # desired output
FlagSymmetric(m2) # works
[1] 1 2 2 1 3
Assuming you only have numeric data in your matrix.
First, convert the matrix to a data frame:
m <- data.frame(m)
We can sort every row and paste the values together, then convert them to factor and then to numeric to get a unique number for every combination:
m$flag <- as.numeric(factor(apply(m, 1, function(x) paste0(sort(x), collapse = ""))))
m
# X1 X2 flag
#1 1 2 1
#2 3 6 3
#3 5 1 2
#4 2 1 1
#5 6 3 3
EDIT
The above solution does not work for every combination, as shown by the new example. To keep the numbers apart, as @d.b commented, we can use any non-empty collapse argument. For the updated example:
as.numeric(factor(apply(m2, 1, function(x) paste0(sort(x), collapse = "-"))))
#[1] 1 2 2 1 3
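Note that as.numeric(factor(...)) numbers the groups by the sort order of the pasted keys, which is why the first example above returned 1 3 2 1 3 rather than the desired 1 2 3 1 2. If flags in order of first appearance are needed, matching against unique() is one option (an addition, not part of the original answer):
key <- apply(m, 1, function(x) paste0(sort(x), collapse = "-"))
match(key, unique(key))
# [1] 1 2 3 1 2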

'Random' Sorting with a condition in R for Psychology Research

I have a Valence Category for the word stimuli in my psychology experiment.
1 = Negative, 2 = Neutral, 3 = Positive
I need to sort the thousands of stimuli with a pseudo-randomised condition.
Val_Category cannot have more than 2 of the same valence stimuli in a row, i.e. no more than 2 negative stimuli in a row.
For example - 2, 2, 2 = not acceptable
2, 2, 1 = ok
I can't fix the sequence in advance, i.e. decide the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1, because I'm not allowed to have a pattern.
I tried various approaches such as dplyr, sample, order, and sort, and nothing so far solves the problem.
I think there's a thousand ways to do this, none of which are probably very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it appeared to work for what I tried.
To explain what I did, the function works as follows:
1. Take the vector of valences and sample from it.
2. If sequences are found that are longer than the desired length, then (for each such sequence) take the last value of that sequence and place it "somewhere else".
3. Check whether the problem is solved. If so, return the reordered vector. If not, go back to 2.
# some vector of valences
val <- rep(1:3, each = 50)

pseudoRandomize <- function(x, n){
  # take an initial sample
  out <- sample(x)
  # check if the sample is "bad" (containing sequences longer than n)
  bad.seq <- any(rle(out)$lengths > n)
  # length of the whole sample
  l0 <- length(out)
  while(bad.seq){
    # get lengths of all subsequences
    l1 <- rle(out)$lengths
    # find the bad ones
    ind <- l1 > n
    # take the last value of each bad sequence, and...
    for(i in cumsum(l1)[ind]){
      # take it out of the original sample
      tmp <- out[-i]
      # pick a new position at random
      pos <- sample(2:(l0-2), 1)
      # put the value back into the sample at the new position
      out <- c(tmp[1:(pos-1)], out[i], tmp[pos:(l0-1)])
    }
    # check if bad sequences (still) exist
    # if TRUE, then 'while' continues; if FALSE, then it doesn't
    bad.seq <- any(rle(out)$lengths > n)
  }
  # return the reordered sequence
  out
}
Example:
The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.
# simple unnamed vector
val <- rep(1:3,each=5)
pseudoRandomize(val, 2)
# gives:
# [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
# when names assigned to the vector
names(val) <- 1:length(val)
pseudoRandomize(val, 2)
# gives (first row shows the names):
# 1 13 9 7 3 11 15 8 10 5 12 14 6 4 2
# 1 3 2 2 1 3 3 2 2 1 3 3 2 1 1
This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)).
# reorder a data.frame using a named vector
dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
val <- dat$val
names(val) <- 1:nrow(dat)
new.val <- pseudoRandomize(val, 2)
new.dat <- dat[as.integer(names(new.val)),]
# gives:
# val stim
# 5 1 e
# 2 1 b
# 9 2 d
# 6 2 a
# 3 1 c
# 15 3 e
# ...
I believe this loop will set the Valence Categories appropriately. I've called the valence categories treat.
# Generate example data
s1 <- data.frame(id = c(1:10), treat = NA)
# Set the first two rows
s1[1, "treat"] <- sample(1:3, 1)
s1[2, "treat"] <- sample(1:3, 1)
# Loop through the remainder of the rows
for (i in 3:length(s1$id))
{
  s1[i, "treat"] <- sample(1:3, 1)
  # Check if the treat value is equal to the previous two values.
  if (s1[i, "treat"] == s1[i-1, "treat"] & s1[i-1, "treat"] == s1[i-2, "treat"])
  # If so, draw one of the values not equal to that value
  {
    a <- 1:3
    remove <- s1[i, "treat"]
    a <- a[!a == remove]
    s1[i, "treat"] <- sample(a, 1)
  }
}
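A quick sanity check (not part of the original answer): rle() confirms that no value occurs more than twice in a row.
max(rle(s1$treat)$lengths) <= 2
# [1] TRUE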
This solution is not particularly elegant. There may be a much faster way to accomplish this by sorting several columns or something.

Sorting and Ordering in R

I am currently working through an intro class and was having some difficulty with this particular problem:
Create a function that takes in a vector of numbers V.Size and a single number N as inputs and outputs a list object of size N where each list member is a vector that contains elements of V.Size, such that the largest value in V.Size is in the vector of the first list item, the second largest value in V.Size is in the vector of the second list item, etc. The (N+1) ordered value of V.Size should be in the first vector of the list, the (N+2) ordered value of V.Size should be in the second vector of the list, and so on.
Now, this is what I have done thus far, I am trying to make an example code:
V.Size <- c(5,4,2,3,1)
n <- 5
Function <- c(V.Size, n)
Function
[1] 5 4 2 3 1 5
sort(Function, decreasing=TRUE)
[1] 5 5 4 3 2 1
The issue I am having is with (N+1), (N+2) and their ordering.
The first step to addressing this would be to create a vector giving the list position for each element of the sorted V.Size. This is basically the vector (1, 2, ..., N, 1, 2, ..., N, ...), of total length length(V.Size). You can get that with:
V.Size <- c(5,4,2,3,1)
n <- 2
rep(1:n, length.out=length(V.Size))
# [1] 1 2 1 2 1
Now you can use the split function to create a list based on these assignments:
split(sort(V.Size, decreasing=TRUE), rep(1:n, length.out=length(V.Size)))
# $`1`
# [1] 5 3 1
#
# $`2`
# [1] 4 2
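To match the assignment's requirement of a function that takes V.Size and N, the two steps can be wrapped up as follows (the function name here is chosen for illustration only):
SplitBySize <- function(V.Size, N) {
  split(sort(V.Size, decreasing = TRUE), rep(1:N, length.out = length(V.Size)))
}
SplitBySize(c(5, 4, 2, 3, 1), 2)
# $`1`
# [1] 5 3 1
#
# $`2`
# [1] 4 2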
