R for loop replaces previously assigned values - r

I'd like to use a uniform distribution to randomly assign the value 1 or 2 in five groups (that is, generate five sets of uniform random numbers), with each group containing 10 samples.
I tried:
for(i in 1:5){
rf <- runif(10)
result[rf<=0.5]=1
result[rf>0.5]=2
}
However, each iteration overwrites the values assigned in the previous one.
The code produces only 10 results:
1 2 1 2 2 1 1 1 2 1
But I want a total of 50 randomized values:
1 2 1 2 ...... 2 1 1
How can I do this? Thank you.

Since you are drawing random numbers from the same distribution every time, it is simpler to generate all 50 numbers at once and assign the values with the ifelse function.
Try this:
a <- ifelse(runif(50) <= 0.5, 1, 2)
dim(a) <- c(10,5) # if you want the result as a matrix (10 rows, 5 groups)

To add to Gregor Thomas' advice, sample... You can also convert the stream into a matrix of 5 columns (groups) of 10.
nums <- sample(1:2, 50, replace = TRUE)
groups <- matrix(nums, ncol = 5)
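If the loop itself has to stay, the overwriting can be avoided by preallocating the result and writing each group into its own column. This is only a minimal sketch: the names n_groups, n_per_group and all_values, and the seed, are illustrative and not from the question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n_groups <- 5
n_per_group <- 10

# Preallocate one column per group so that no iteration overwrites another
result <- matrix(NA_integer_, nrow = n_per_group, ncol = n_groups)

for (i in seq_len(n_groups)) {
  rf <- runif(n_per_group)
  # Fill only column i with this group's 1/2 assignments
  result[, i] <- ifelse(rf <= 0.5, 1L, 2L)
}

# The 50 values as one vector, group by group
all_values <- as.vector(result)
```

The original loop wrote to the same 10 positions of result on every pass, which is why only the last group survived.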

Related

Is there any way to make a data frame using the last values of a repeated loop?

I'd like to make a data frame using only the last computed values from a Repeat loop.
For the repeat and sample functions, I'm using this data. The numbers in Prob column are the probabilities of each number to occur.
b <- 1
repeat {
c <- sample(a$Plus, size=1, prob=(a$Prob))
cat(b, '\t', c, '\n')
b <- b + 1
if (c >= 10) {
{
break
}
}
}
#I'm interested in the result greater than 10 only
If I run the code above, then it will compute something like
1 4
2 8
3 13
If I run this again, it will compute different results like..
1 9
2 3
3 7
4 3
5 11
What I'd like to do is to make a data frame using only the last outputs of each loop.
For example, using the computed data above, I'd like to make a frame that looks like
Trial Result
3 13
5 11
Is there any way to repeat this loop the number of times I want to and make a data frame using only the last outputs of each repeated function?
You can use a user defined function to do this. Since you haven't given your dataframe a, I've defined it as follows:
library(tidyverse)
a <- tibble(
Plus = 1:15,
Prob = seq(from = 15, to = 1, by = -1)
)
The following function does the same thing as your repeat loop, but stores the relevant results in a tibble. I've left your variable b out of this because as far as I can see, it doesn't contribute to your desired output.
samplefun <- function(a) {
c <- sample(a$Plus, size=length(a$Plus), prob=a$Prob)
res <- tibble(
Trial = which(c >= 10)[1],
Result = c[which(c >= 10)[1]]
)
return(res)
}
Then use map_dfr to return as many samples as you like:
nsamples <- 5
map_dfr(1:nsamples, ~ samplefun(a))
Output:
# A tibble: 5 x 2
Trial Result
<int> <int>
1 4 11
2 6 14
3 5 11
4 2 10
5 4 15
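Note that sample() draws without replacement by default, so samplefun above shuffles the values once rather than making independent draws as the repeat loop does. A base R sketch that keeps the original one-draw-at-a-time loop and returns only the last draw might look like the following; draw_until, threshold and n_runs are hypothetical names, and a is the same stand-in data as in the answer, not the asker's real data.

```r
# Hypothetical stand-in for the asker's data (same values as in the answer)
a <- data.frame(Plus = 1:15, Prob = 15:1)

# Draw one value at a time until a draw is >= threshold;
# return the number of draws needed and the final draw
draw_until <- function(a, threshold = 10) {
  trial <- 0L
  repeat {
    trial <- trial + 1L
    value <- sample(a$Plus, size = 1, prob = a$Prob)
    if (value >= threshold) break
  }
  data.frame(Trial = trial, Result = value)
}

set.seed(1)  # arbitrary seed for reproducibility
n_runs <- 5

# Run the loop n_runs times and stack the last draw of each run
last_draws <- do.call(rbind, lapply(seq_len(n_runs), function(i) draw_until(a)))
```

Each row of last_draws then corresponds to one complete run of the loop.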

How to create a counter by comparing between random number and vector that contains probabilities in R?

I'm a beginner in R :)
I created the following data frame (just an example; the real data frame contains 2000+ rows):
RowNum <-c(1:5)
Prob <- c(0.5,0.3,0.8,0.2,0.9)
Attempts <- c(rep(0,5))
Succeeded <- c(rep(0,5))
Overall <- data.frame(RowNum,Prob,Attempts,Succeeded)
I want to create the following:
Generate a random number (between 0 and 1) ->
A. If the random number is lower than the Prob in the first row -> add 1 to Attempts and 1 to Succeeded (for the first row only), then generate another random number, move to the next row, and check whether it is greater or lower. If it is lower, add 1 to Attempts and 1 to Succeeded (for the second row only), then generate another random number and move to the next row, and so on.
B. If the random number is greater than the Prob in the first row -> add 1 to Attempts only (for the first row only), then generate another random number and do the same check, but don't move on to the next row.
It's like keeping a counter for each element of the vector; the calculation should move to the next row only when the random number is lower than the Prob.
Thanks for your help!
I made two functions to solve this. addone is a recursive function that calls itself again if the random number is greater than Prob. make.new.df is just a for loop that cycles through the rows.
I'm confused about the Succeeded column: since there is only one success before moving on to the next row, will Succeeded only ever be 1?
set.seed(1)
addone <- function(vec) {
vec$Attempts <- vec$Attempts + 1
if(runif(1) < vec$Prob) {
vec$Succeeded <- vec$Succeeded + 1
} else {
vec <- addone(vec)
}
vec
}
make.new.df <- function(df) {
for(i in 1:nrow(df)) {
df[i, ] <- addone(df[i, ])
}
df
}
df1 <- make.new.df(Overall)
df1
Here is the output using set.seed(1)
RowNum Prob Attempts Succeeded
1 1 0.5 1 1
2 2 0.3 4 1
3 3 0.8 3 1
4 4 0.2 2 1
5 5 0.9 1 1
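Under the same reading of the question (keep drawing on a row until the first success, then move on), the number of attempts per row follows a geometric distribution, so the recursion can be replaced by a single vectorized call. This is a sketch under that assumption, not the answerer's method; because it consumes random numbers differently, it will not reproduce the exact table above for the same seed.

```r
set.seed(1)  # arbitrary seed for reproducibility

Overall <- data.frame(RowNum = 1:5, Prob = c(0.5, 0.3, 0.8, 0.2, 0.9))

# rgeom() returns the number of failures before the first success,
# so the number of attempts is failures + 1
Overall$Attempts <- rgeom(nrow(Overall), prob = Overall$Prob) + 1

# Each row stops at its first success, so Succeeded is always 1
Overall$Succeeded <- 1
```

For a data frame with thousands of rows this avoids both the row-by-row loop and deep recursion on rows with a small Prob.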

How to find the index of the value sampled?

In R, I would like to know how I can find the index/indices of the value(s) sampled, for example using the function sample.
In Matlab, it appears this is quite easily done by requesting the output argument idx in the function datasample. Explicitly, taken from Matlab's documentation page for the function datasample:
[y,idx] = datasample(data,k,...) returns an index vector indicating
which values datasample sampled from data.
I would like to know if such a thing can be accomplished in R, and how.
Example:
set.seed(12)
sample(c(0.3,78,45,0.8,0.3,0.8,77), size=1, replace=TRUE)
0.3
How can I know which of the two 0.3's was that one?
We can create a named vector and then sample from it
v1 <- c(LETTERS[1:10], LETTERS[1])
names(v1) <- seq_along(v1)
v2 <- sample(v1, 20, replace=TRUE)
as.integer(names(v2))
#[1] 10 11 4 2 1 4 6 9 1 1 2 9 2 2 2 3 4 7 3 6
Using the OP's data
set.seed(12)
v1 <- c(0.3,78,45,0.8,0.3,0.8,77)
names(v1) <- seq_along(v1)
set.seed(12)
sample(v1, size=1, replace=TRUE)
# 1
#0.3
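Another common idiom, shown here as a minimal sketch, is to sample the positions rather than the values and then index the vector with them. The names x, idx and y are illustrative (idx and y mirror the Matlab outputs).

```r
x <- c(0.3, 78, 45, 0.8, 0.3, 0.8, 77)

set.seed(12)
# Sample positions instead of values
idx <- sample(seq_along(x), size = 1, replace = TRUE)
# Look up the sampled value(s) by position
y <- x[idx]
```

Here idx plays the role of the idx output of datasample, and it distinguishes duplicated values such as the two 0.3's.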

R looping through multiple dataframes in a list

Attempting to calculate differences between every two values in a row then sum the total differences for each dataframe in a list. I know for/while loops in R absolutely suck. I had this working before, but I've broken it. Can someone suggest how to optimize this using an alternative in the apply family? Current code:
for (i in 1:length(refdata)) { #for each dataframe in a list
refdif <- as.data.frame(matrix(0, ncol = 1, nrow = nrow(refdata[[i]])))
refdif1 <- c()
for (z in 1:ncol(refdata[[i]])) { #for each column in a dataframe
for(x in 1:nrow(refdata[[i]])) { #for each row in a dataframe
refdif <- (refdata[[i]][x,z] - refdata[[i]][x,z+1]) #difference of first value + the next
refdif1[x,1] <- (refdif1[x,1] + refidf) #sum of latest difference
}
}
print(refdif1) #where I can conduct tests on each individual dataframe with a column of sums of differences
}
example data:
list 1 refdata[[1]]
$`1`
var1 var2 var3 var4
1 1 2 3 4
2 5 6 7 8
$`2`
var1 var2 var3 var4
1 1 2 3 4
2 5 6 7 8
var 1 + 2 has the difference calculated, var 3 and 4 has the difference calculated, then each difference is summed together and placed in a new dataframe in a single column. (5-6) + (7-8), (1-2) + (3-4), etc etc:
$`1`
dif
1 -2
2 -2
$`2`
dif
1 -2
2 -2
One way to do it (per unlisted dataframe) could be by using logical vectors for indexing - their values are recycled - that way calculating the difference between every other column and finally summing the resulting df row-wise.
refdata1<-rowSums(refdata[c(T,F)]-refdata[c(F,T)])
Edit
Exact output can be obtained by
lapply(refdata, function(df){ data.frame(dif=rowSums(df[c(T,F)]-df[c(F,T)])) })
thx Heroka
# Create test data
x <- rbind(1:4, 5:8)
refdata <- list(x,x)
# Calculate results (all elements should have an even number of columns)
lapply(refdata, FUN = function(x) x %*% rep_len(c(1, -1), NCOL(x)))
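For a self-contained check against the example data in the question, the two ideas above can be combined as follows. This is a sketch that assumes every data frame is numeric with an even number of columns; pair_diff_sums, odd and even are illustrative names.

```r
# Example data from the question, as a named list of data frames
df <- data.frame(var1 = c(1, 5), var2 = c(2, 6), var3 = c(3, 7), var4 = c(4, 8))
refdata <- list(`1` = df, `2` = df)

pair_diff_sums <- lapply(refdata, function(d) {
  m <- as.matrix(d)
  odd  <- seq(1, ncol(m), by = 2)   # var1, var3, ...
  even <- seq(2, ncol(m), by = 2)   # var2, var4, ...
  # (var1 - var2) + (var3 - var4) + ... for each row
  data.frame(dif = rowSums(m[, odd, drop = FALSE] - m[, even, drop = FALSE]))
})
```

Each element of pair_diff_sums is a one-column data frame named dif, matching the layout requested in the question.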

'Random' Sorting with a condition in R for Psychology Research

I have Valence Category for word stimuli in my psychology experiment.
1 = Negative, 2 = Neutral, 3 = Positive
I need to sort the thousands of stimuli with a pseudo-randomised condition.
Val_Category cannot have more than 2 stimuli of the same valence in a row, i.e. no more than 2 negative stimuli in a row.
for example - 2, 2, 2 = not acceptable
2, 2, 1 = ok
I can't sequence the data i.e. decide the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1 because I'm not allowed to have a pattern.
I tried various packages and functions such as dplyr, sample, order, and sort, and nothing so far solves the problem.
I think there's a thousand ways to do this, none of which are probably very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it appeared to work for what I tried.
To explain what I did, the function works as follows:
1. Take the vector of valences and sample from it.
2. If sequences are found that are longer than the desired length, then (for each such sequence) take the last value of that sequence and place it "somewhere else".
3. Check if the problem is solved. If so, return the reordered vector. If not, go back to step 2.
# some vector of valences
val <- rep(1:3,each=50)
pseudoRandomize <- function(x, n){
# take an initial sample
out <- sample(x)
# check if the sample is "bad" (containing sequences longer than n)
bad.seq <- any(rle(out)$lengths > n)
# length of the whole sample
l0 <- length(out)
while(bad.seq){
# get lengths of all subsequences
l1 <- rle(out)$lengths
# find the bad ones
ind <- l1 > n
# take the last value of each bad sequence, and...
for(i in cumsum(l1)[ind]){
# take it out of the original sample
tmp <- out[-i]
# pick new position at random
pos <- sample(2:(l0-2),1)
# put the value back into the sample at the new position
out <- c(tmp[1:(pos-1)],out[i],tmp[pos:(l0-1)])
}
# check if bad sequences (still) exist
# if TRUE, then 'while' continues; if FALSE, then it doesn't
bad.seq <- any(rle(out)$lengths > n)
}
# return the reordered sequence
out
}
Example:
The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.
# simple unnamed vector
val <- rep(1:3,each=5)
pseudoRandomize(val, 2)
# gives:
# [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
# when names assigned to the vector
names(val) <- 1:length(val)
pseudoRandomize(val, 2)
# gives (first row shows the names):
# 1 13 9 7 3 11 15 8 10 5 12 14 6 4 2
# 1 3 2 2 1 3 3 2 2 1 3 3 2 1 1
This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)).
# reorder a data.frame using a named vector
dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
val <- dat$val
names(val) <- 1:nrow(dat)
new.val <- pseudoRandomize(val, 2)
new.dat <- dat[as.integer(names(new.val)),]
# gives:
# val stim
# 5 1 e
# 2 1 b
# 9 2 d
# 6 2 a
# 3 1 c
# 15 3 e
# ...
I believe this loop will set the valence categories appropriately. I've called the valence category column treat.
#Generate example data
s1 = data.frame(id=c(1:10),treat=NA)
#Setting the first two rows
s1[1,"treat"] <- sample(1:3,1)
s1[2,"treat"] <- sample(1:3,1)
#Looping through the remainder of the rows
for (i in 3:length(s1$id))
{
s1[i,"treat"] <- sample(1:3,1)
#Check if the treat value is equal to the previous two values.
if (s1[i,"treat"]==s1[i-1,"treat"] && s1[i-1,"treat"]==s1[i-2,"treat"])
#If so draw one of the values not equal to that value
{
a = 1:3
remove <- s1[i,"treat"]
a=a[!a==remove]
s1[i,"treat"] <- sample(a,1)
}
}
This solution is not particularly elegant. There may be a much faster way to accomplish this by sorting several columns or something.
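Whichever approach is used, the constraint can be verified afterwards with rle, the same function the first answer uses internally. A minimal sketch, where max_run, ok and bad are hypothetical names introduced for illustration:

```r
# Length of the longest run of identical consecutive values
max_run <- function(x) max(rle(x)$lengths)

ok  <- c(2, 2, 1, 3, 3, 1)   # longest run is 2
bad <- c(2, 2, 2, 1, 3, 1)   # longest run is 3

max_run(ok)  <= 2   # TRUE
max_run(bad) <= 2   # FALSE
```

Applying max_run to the reordered valence column gives a quick pass/fail test before the sequence is used in an experiment.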
