I have data in blocks[[i]], where i = 4 to 6, like so:
Stimulus Response PM
stretagost s <NA>
colpublo s <NA>
zoning d <NA>
epilepsy d <NA>
resumption d <NA>
incisive d <NA>
There are 440 rows in each blocks[[i]].
Currently my script does some stuff to 1 randomly selected item out of every 15 trials (except for the first 5 trials of every 110; I also have it set so that chosen rows are never less than 2 apart) for each blocks[[i]].
What I would like to do is the same thing, but with the item randomly selected from only those rows where Response == "d"; i.e., I never want the random selection to touch rows where Response == "s". I have no idea how to achieve this, but here is the script I have so far, which just randomly chooses 1 row out of each 15:
PMpositions <- list()
for (i in 4:6) {
  positions <- c()
  x <- 0
  for (j in c(seq(5, 110 - 15, 15), seq(115, 220 - 15, 15),
              seq(225, 330 - 15, 15), seq(335, 440 - 15, 15))) {
    sub.samples <- setdiff(1:15 + j, seq(x - 2, x + 2, 1))
    x <- sample(sub.samples, 1)
    positions <- c(positions, x)
  }
  PMpositions[[i]] <- positions
  blocks[[i]]$Response[PMpositions[[i]]] <- Wordresponse
  blocks[[i]]$PM[PMpositions[[i]]] <- PMresponse
  blocks[[i]][PMpositions[[i]], ]$Stimulus <- F[[i]]
}
I ended up dealing with it like so:
PMpositions <- list()
for (i in 1:3) {
  startingpositions <- c(seq(5, 110 - 15, 15), seq(115, 220 - 15, 15),
                         seq(225, 330 - 15, 15), seq(335, 440 - 15, 15))
  positions <- c()
  x <- 0
  for (j in startingpositions) {
    sub.samples <- setdiff(1:15 + j, seq(x - 2, x + 2, 1))
    x <- sample(sub.samples, 1)
    positions <- c(positions, x)
  }
  repeat {
    # resample any chosen rows that landed on a nonword response
    hits <- which(blocks[[i]][positions, 2] == Nonwordresponse)
    positions[hits] <- startingpositions[hits] +
      sample(1:15, size = length(hits), replace = TRUE)
    # check that consecutive choices are at least 2 apart
    distancecheck <- which(abs(c(positions[2:length(positions)], 0) - positions) < 2)
    if (length(which(blocks[[i]][positions, 2] == Nonwordresponse)) == 0 &
        length(distancecheck) == 0) break
  }
  PMpositions[[i]] <- positions
  blocks[[i]]$Response[PMpositions[[i]]] <- Wordresponse
  blocks[[i]]$PM[PMpositions[[i]]] <- PMresponse
  blocks[[i]][PMpositions[[i]], ]$Stimulus <- as.character(NF[[i]][, 1])
  Nonfocal[[i]] <- blocks[[i]]
}
I realised while getting stuck in the repeat loop that sometimes I have 15 "s" responses in a row! Doh. It would be nice to fix this, but it is OK for what I need; when it gets stuck I just run the script again (the locations of d/s are randomly generated).
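One hedged way to avoid the stuck loop entirely: inside the j loop, restrict the candidate positions to the "d" rows of the current window and flag windows that have none (the NA fallback is my assumption, not part of the original script):
# sketch of a replacement body for the j loop
window <- j + 1:15                                     # the current 15-trial window
d.pos  <- window[blocks[[i]]$Response[window] == "d"]  # only "d" rows are eligible
if (!is.na(x)) d.pos <- setdiff(d.pos, seq(x - 2, x + 2))  # keep the 2-apart rule
# note: sample(d.pos, 1) misbehaves when length(d.pos) == 1, so index instead
x <- if (length(d.pos) > 0) d.pos[sample(length(d.pos), 1)] else NA
positions <- c(positions, x)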
EDIT: Here's a different approach that only samples 'd' rows. It's pretty customized code, but the main idea is to use the prob argument of sample to only sample rows where Response == "d", setting the probability of sampling all other rows to zero.
Response <- rep(c("s","d"),220)
chunk <- sort(rep(1:30,15))[1:440] # chunks of 15 up to 440
# function to randomly sample from each set of 15 rows
sampby15 <- function(i){
sample((1:440)[chunk==i], 1,
# use the `prob` argument to only sample 'd' values
prob=rep(1,length=440)[chunk==i]*(Response=="d")[chunk==i])
}
s <- sapply(1:30, FUN = sampby15) # apply to each of the 30 chunks to get sampled rows
Response[s] # confirm only 'd' values
# then you have code to do whatever to those rows...
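From there the sampled indices can drive whatever per-row updates you need; for instance, reusing the (hypothetical) objects from the question:
blocks[[i]]$Response[s] <- Wordresponse
blocks[[i]]$PM[s] <- PMresponse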
So the really basic function you'll want to operate on each block is like this:
subsetminor <- function(dataset, only = "d", rows = 1) {
remainder <- subset(dataset, Response == only)
return(remainder[sample(1:nrow(remainder), size = rows), ])
}
We can spruce it up a bit to avoid rows next to each other:
subsetminor <- function(dataset, only = "d", rows = 1) {
  remainder <- subset(dataset, Response == only)
  sampled <- sample(1:nrow(remainder), size = rows)
  if (rows > 1) {
    pairwise <- t(combn(sampled, 2))
    # resample until no two chosen rows are within 2 of each other
    while (any(abs(pairwise[, 1] - pairwise[, 2]) <= 2)) {
      sampled <- sample(1:nrow(remainder), size = rows)
      pairwise <- t(combn(sampled, 2))
    }
  }
  out <- remainder[sampled, ]
  return(out)
}
The above can be simplified/DRY'd out quite a bit, but it should get the job done.
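As a sketch of that simplification (same assumptions as above: a Response column and a minimum gap of more than 2 between chosen rows), a repeat loop over sorted indices avoids building the pairwise table twice:
subsetminor <- function(dataset, only = "d", rows = 1) {
  remainder <- subset(dataset, Response == only)
  repeat {
    sampled <- sort(sample(nrow(remainder), size = rows))
    # with sorted indices, checking adjacent gaps covers all pairs
    if (rows == 1 || all(diff(sampled) > 2)) break
  }
  remainder[sampled, ]
}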
I have 2 vectors containing numbers that I'm using to simulate the power of my study, but I keep getting this error at the for loop:
Error in pwr.2p2n.test(h, n1 = i, n2 = j, sig.level = 0.05) :
number of observations in the first group must be at least 2
I would be grateful for your suggestions to get it working.
##sample code
grp1.n <- seq(30,150,5) ##group 1, N
grp2.n <- seq(30,150,5)-15 ## group 2, N - 15
h=0.85 #specify large effect size
grp1.length <- length(grp1.n)
grp2.length <- length(grp2.n)
power.holder <- array(numeric(grp1.length*grp2.length), dim=c(grp1.length,grp2.length),dimnames=list(grp1.n,grp2.n))
for (i in 1:grp1.length){
  for (j in 1:grp2.length){
    result.pwr.2p2n.test <- pwr.2p2n.test(h, n1=i, n2=j, sig.level=0.05)
    power.holder[i,j] <- ceiling(result.pwr.2p2n.test$power)
    return(result.pwr.2p2n.test)
  }
}
I'm not entirely sure if this is what you want, but I think it is:
grp1.n <- seq(30,150,5) ##group 1, N
grp2.n <- seq(30,150,5)-15 ## group 2, N - 15
h=0.85 #specify large effect size
grp1.length <- length(grp1.n)
grp2.length <- length(grp2.n)
power.holder <- array(numeric(grp1.length*grp2.length), dim=c(grp1.length,grp2.length),dimnames=list(grp1.n,grp2.n))
for (i in 1:grp1.length){
  for (j in 1:grp2.length){
    result.pwr.2p2n.test <- pwr.2p2n.test(h, n1=grp1.n[i], n2=grp2.n[j], sig.level=0.05)
    # note: ceiling() rounds every power in (0,1) up to 1; drop it to keep raw power values
    power.holder[i,j] <- ceiling(result.pwr.2p2n.test$power)
  }
}
power.holder
The change is in the pwr.2p2n.test call: index into the vectors of group sizes instead of passing the loop counters directly.
Old: pwr.2p2n.test(h, n1=i, n2=j, sig.level=0.05)
New: pwr.2p2n.test(h, n1=grp1.n[i], n2=grp2.n[j], sig.level=0.05)
I also dropped the return() call: return() only works inside a function, and it would in any case stop the loops on the first iteration. Just inspect power.holder after the loops instead.
Note there was also a missing } bracket in your code.
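As a side note (my addition, not part of the original answer), the same grid can be computed without explicit loops using outer() with a vectorized wrapper; a minimal sketch, assuming the pwr package is attached and reusing grp1.n, grp2.n and h from above:
library(pwr)
# compute power over the whole n1 x n2 grid in one call
power.grid <- outer(grp1.n, grp2.n,
                    Vectorize(function(n1, n2)
                      pwr.2p2n.test(h, n1 = n1, n2 = n2, sig.level = 0.05)$power))
dimnames(power.grid) <- list(grp1.n, grp2.n)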
I'm trying to "pseudo-randomize" a vector in R using a while loop.
I have a vector delays with the elements that need to be randomized.
I am using sample on a vector values to index randomly into delays. I cannot have more than two of the same value in a row, so I am trying to use an if/else statement. If the conditions are met, the value should be added to random and removed from delays.
When I run the individual lines outside the loop they all work, but when I run the loop, one of the vectors is populated with NA_real_, which stops the logical operators from working.
I'm probably not great at explaining this, but can anyone spot what I'm doing wrong? :)
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- c()
random <- c(1,2)
while (length(random) < 27) {
  count <- count + 1
  b <- sample(value, 1, replace = FALSE)
  a <- delays[b]
  if (a == tail(random, 1) & a == head(tail(random, 2), 1)) {
    outcast <- outcasts + 1
  } else {
    value <- value[-(b)]
    delays <- delays[-(b)]
    random <- c(random, a)
  }
}
Two problems with your code:
b can take a value greater than the number of elements in delays. I fixed this by using sample(1:length(delays), 1, replace = FALSE).
The loop continues when delays is empty. You could either change length(random) < 27 to length(random) < 26, I think, or add length(delays) > 0 to the condition.
The code:
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- c()
random <- c(1, 2)
while (length(random) < 27 & length(delays) > 0) {
  count <- count + 1
  b <- sample(1:length(delays), 1, replace = FALSE)
  a <- delays[b]
  if (a == tail(random, 1) & a == head((tail(random, 2)), 1)) {
    outcast <- outcasts + 1
  } else {
    value <- value[-(b)]
    delays <- delays[-(b)]
    random <- c(random, a)
  }
}
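As an alternative sketch (my own variant, not part of the fix above): since the constraint is just "no more than two equal values in a row", you can also shuffle the whole vector and reject until the run lengths check out, using rle():
delays <- c(rep(6, 12), rep(0, 12))
# resample whole permutations until no value appears more than twice in a row
repeat {
  random <- sample(delays)
  if (max(rle(random)$lengths) <= 2) break
}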
I have made an algorithm in R to combine multiple sensor readings together under one timestamp.
Most sensor readings are taken every 500 ms, but some sensors only report changes. Therefore I had to make an algorithm that takes the last known value of a sensor at a given time.
Now the algorithm works, but it is so slow that when I start using it for the actual 20+ sensors it will take ages to complete. My hypothesis is that it is slow because of my use of data frames or the way I access and move my data.
I have tried making it faster by only walking through every dataframe once and not iterating over them for every timestamp. I have also preallocated all the space needed for the data.
Any suggestions would be very welcome. I am very new to the R language, so I don't really know which datatypes are slow and which are fast.
library(tidyverse)
library(tidytext)
library(stringr)
library(readr)
library(dplyr)
library(pracma)
# take a list of dataframes as a parameter
generalise_data <- function(dataframes, timeinterval){
  if (typeof(dataframes) == "list"){
    # get the biggest and smallest timestamp over all dataframes;
    # this is used to calculate the size of the resulting frame
    # ((largest time - smallest time)/timeinterval = result rows)
    largest_time <- 0
    smallest_time <- as.numeric(Sys.time())*1000 # everything will be smaller than the current time
    for (i in 1:length(dataframes)){
      dataframe_max <- max(dataframes[[i]]$TIMESTAMP)
      dataframe_min <- min(dataframes[[i]]$TIMESTAMP)
      if (dataframe_max > largest_time) largest_time <- dataframe_max
      if (dataframe_min < smallest_time) smallest_time <- dataframe_min
    }
    # the result dataframe will have this many rows
    result.size <- floor((largest_time - smallest_time)/timeinterval)
    sprintf("Result size: %i", result.size)
    # create a numeric array holding the current index into every dataframe, all set to 1
    dataframe_indexes <- numeric(length(dataframes))
    dataframe_indexes[dataframe_indexes == 0] <- 1
    # data vectors for the result dataframe
    result.timestamps <- numeric(result.size)
    result <- list(result.timestamps)
    for (i in 2:(length(dataframes)+1)) result[[i]] <- numeric(result.size) # an empty vector per datapoint
    # use a progress bar
    pb <- txtProgressBar(1, result.size, style = 3)
    # run through every row of the resulting data frame (creating one row per pass);
    # on every pass, advance each dataframe's index until its timestamp exceeds the
    # result row's timestamp, then step one index back
    for (i in 1:result.size){
      current_timestamp <- smallest_time + timeinterval*(i-1)
      result[[1]][i] <- current_timestamp
      for (i2 in 1:length(dataframes)){
        while (dataframes[[i2]]$TIMESTAMP[dataframe_indexes[i2]] < current_timestamp && dataframes[[i2]]$TIMESTAMP[dataframe_indexes[i2]] != max(dataframes[[i2]]$TIMESTAMP)){
          dataframe_indexes[i2] <- dataframe_indexes[i2]+1
        }
        if (dataframe_indexes[i2] > 1){
          dataframe_indexes[i2] <- dataframe_indexes[i2]-1 # take the one that's smaller
        }
        result[[i2+1]][i] <- dataframes[[i2]]$VALUE[dataframe_indexes[i2]]
      }
      setTxtProgressBar(pb, i)
    }
    close(pb)
    result.final <- data.frame(result)
    return(result.final)
  } else {
    return(NA)
  }
}
I fixed it today by changing every dataframe to a matrix. The code now runs in 9.5 seconds instead of 70 minutes.
Conclusion: repeatedly indexing into data frames element by element is VERY bad for performance; matrices are much faster for this access pattern.
library(tidyverse)
library(tidytext)
library(stringr)
library(readr)
library(dplyr)
library(pracma)
library(compiler)
# take a list of dataframes as a parameter
generalise_data <- function(dataframes, timeinterval){
  time.start <- Sys.time()
  if (typeof(dataframes) == "list"){
    # store the sizes of all the dataframes
    resources.largest_size <- 0
    resources.sizes <- numeric(length(dataframes))
    for (i in 1:length(dataframes)){
      resources.sizes[i] <- length(dataframes[[i]]$VALUE)
      if (resources.sizes[i] > resources.largest_size) resources.largest_size <- resources.sizes[i]
    }
    # generate a matrix that can hold all needed dataframe values
    resources <- matrix(nrow = resources.largest_size, ncol = length(dataframes)*2)
    for (i in 1:length(dataframes)){
      j <- i*2
      resources[1:resources.sizes[i], j-1] <- dataframes[[i]]$TIMESTAMP
      resources[1:resources.sizes[i], j]   <- dataframes[[i]]$VALUE
    }
    # get the biggest and smallest timestamp over all dataframes;
    # this is used to calculate the size of the resulting frame
    # ((largest time - smallest time)/timeinterval = result rows)
    largest_time <- 0
    smallest_time <- as.numeric(Sys.time())*1000 # everything will be smaller than the current time
    for (i in 1:length(dataframes)){
      dataframe_max <- max(dataframes[[i]]$TIMESTAMP)
      dataframe_min <- min(dataframes[[i]]$TIMESTAMP)
      if (dataframe_max > largest_time) largest_time <- dataframe_max
      if (dataframe_min < smallest_time) smallest_time <- dataframe_min
    }
    # the result dataframe will have this many rows
    result.size <- floor((largest_time - smallest_time)/timeinterval)
    sprintf("Result size: %i", result.size)
    # create a numeric array holding the current index into every dataframe, all set to 1
    dataframe_indexes <- numeric(length(dataframes))
    dataframe_indexes[dataframe_indexes == 0] <- 1
    # data matrix for the result
    result <- matrix(data = 0, nrow = result.size, ncol = length(dataframes)+1)
    # use a progress bar
    pb <- txtProgressBar(1, result.size, style = 3)
    # run through every row of the resulting data frame (creating one row per pass);
    # on every pass, advance each dataframe's index until its timestamp exceeds the
    # result row's timestamp, then step one index back
    for (i in 1:result.size){
      current_timestamp <- smallest_time + timeinterval*(i-1)
      result[i, 1] <- current_timestamp
      for (i2 in 1:length(dataframes)){
        j <- i2*2
        # advance while the timestamp is too early and readings remain
        # (the original compared the timestamp to the row count here, a bug)
        while (dataframe_indexes[i2] < resources.sizes[i2] && resources[dataframe_indexes[i2], j-1] < current_timestamp){
          dataframe_indexes[i2] <- dataframe_indexes[i2]+1
        }
        # at the moment the last value of the array is never selected, needs to be fixed
        if (dataframe_indexes[i2] > 1){
          dataframe_indexes[i2] <- dataframe_indexes[i2]-1 # take the one that's smaller
        }
        result[i, i2+1] <- resources[dataframe_indexes[i2], j]
      }
      setTxtProgressBar(pb, i)
    }
    close(pb)
    result.final <- data.frame(result)
    time.end <- Sys.time()
    print(time.end - time.start)
    return(result.final)
  } else {
    return(NA)
  }
}
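As an aside (my addition, not part of the original fix), base R's findInterval() performs the same "last known value at or before each timestamp" lookup in one vectorized call, which removes the inner while loop entirely. A minimal sketch for a single hypothetical sensor dataframe df with sorted TIMESTAMP/VALUE columns, reusing smallest_time, largest_time and timeinterval from above:
# output timestamp grid
grid <- seq(smallest_time, largest_time, by = timeinterval)
# for each grid point, the index of the last reading at or before it (0 if none)
idx <- findInterval(grid, df$TIMESTAMP)
# clamp to 1 so grid points before the first reading reuse the first value
values <- df$VALUE[pmax(idx, 1)]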
I'm trying to filter GPS location data based on distance (UTMs) and time (H:M:S) criteria, independently and concurrently. Here's the data structure:
head(collar)
FID animal date time zone easting northing
1 URAM01_2012 6/24/2012 10:00:00 AM 13S 356664 3971340
2 URAM01_2012 6/24/2012 1:02:00 PM 13S 356760 3971480
3 URAM01_2012 6/24/2012 4:01:00 PM 13S 357482 3972325
4 URAM01_2012 6/24/2012 7:01:00 PM 13S 356882 3971327
5 URAM01_2012 6/25/2012 4:01:00 AM 13S 356574 3971765
6 URAM01_2012 6/25/2012 7:01:00 AM 13S 357796 3972231
Right now I'm filtering by distance only, but I'm having some issues. The code should calculate the distance between FID[1] and FID[2] and assign that distance to FID[1] in a new column ($step.length). After all distances have been calculated, the data is subsetted based on a distance rule; right now I keep all locations that are >200m apart. Once subsetted, the process is repeated until the distance between all subsequent locations is >200m. Here's the code I've written, which accomplishes only a portion of what I'd like to do:
reps <- 10
#Begin loop for the number of reps. Right now it's at 10 just to see if the code works.
for(rep in 1:reps){
#Begin loop for the number of GPS locations in the file
for(i in 1:length(collar$FID)){
#Calculate the distance between a GPS location and the next GPS locations. the formula is the hypotenuse of the Pythagorean theorem.
collar$step.length[i] <- sqrt(((collar$easting[i] - collar$easting[i+1])^2) + ((collar$northing[i] - collar$northing[i+1])^2))
}
#Subset the data. Select all locations that are >200m from the next GPS location.
collar <- subset(collar, step.length >200)
}
Now, the code isn't perfect and I would like to add 2 conditions into the code.
1.) Animal ID isn't considered. Therefore, a distance for the last location of one animal is generated using the first location of the next animal, when it should be NA. I thought using for(i in 1:unique(collar$animal)) might work, but it didn't (shocking), and I'm not sure what to do, since for(i in length(collar$animal)) doesn't use only unique values.
2.) I'd also like to break out of the for loop once all locations are >200m apart. I'm sure there's a better way of doing this, but I thought I'd set reps to something large (e.g., 10000) and break once the criterion was met:
if(collar$step.length > 200){
break }
Yet, since the if condition has length >1, only the first element is used. I haven't thought about time or distance/time yet, but if anyone has suggestions for those endeavors, I'd appreciate the advice. Thanks for your help and guidance.
I don't quite understand what you are trying to do with the reps, but you can take advantage of the split and unsplit functions to focus on each individual animal.
First I created a distance() function that finds the columns named easting and northing in the object and builds a vector of distances. Then we split collar up by animal, apply the distance function to each animal, attach the list of distances to the list of animals with some mapply code, and unsplit the results to put everything back together.
Let me know what you want to do with the ">200" step.
distance <- function(x){
easting <- x$easting
northing <- x$northing
easting2 <- c(easting[-1], NA)
northing2 <- c(northing[-1], NA)
sqrt((easting - easting2)^2 + (northing - northing2)^2)
}
s <- split(collar, collar$animal)
distances <- lapply(s, distance)
s2 <- mapply(cbind, s, "Distance" = distances, SIMPLIFY = F)
collar.new <- unsplit(s2, collar$animal)
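If a single pass of the >200 m rule is enough (the EDIT below iterates until it holds everywhere), a usage sketch on the result might be:
# keep fixes farther than 200 m from the next one; the NA closing each
# animal's track is kept deliberately
collar.filtered <- subset(collar.new, is.na(Distance) | Distance > 200)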
EDIT:
Apologies if this is cumbersome; I'm sure it can be made shorter, but for now let me know if it works for you. I would also be curious to see how fast it runs, as I have been making up my own data.
filterout <- function(input, value = NULL){
# requirements of the input object
stopifnot(all(c("FID","animal","easting","northing") %in% colnames(input)))
distance <- function(x){ # internal distance function
e1 <- x$easting; e2 <- c(NA, e1[-nrow(x)])
n1 <- x$northing; n2 <- c(NA, n1[-nrow(x)])
sqrt((e1 - e2)^2 + (n1 - n2)^2)
}
nc <- ncol(input) # save so we can "rewrite" Distance values each reiteration
f <- function(input){ # the recursive function (will run until condition is met)
z <- split(input[,-(nc+1)], input$animal) # split by animal & remove (if any) prior Distance column
distances <- lapply(z, distance) # collect distances
z2 <- mapply(cbind, z, "Distance" = distances, SIMPLIFY = F) # attach distances
r1 <- lapply(z2, function(x) { # delete first row under criteria
a <- x$Distance < value # CRITERIA
a[is.na(a)] <- FALSE # Corrects NA values into FALSE so we don't lose them
first <- which(a == T)[1] # we want to remove one at a time
`if`(is.na(first), integer(0), x$FID[first]) # returns FIDs to remove
})
z3 <- unsplit(z2, input$animal)
# Whether to keep going or not
if(length(unlist(r1)) != 0){ # if list of rows under criteria is not empty
remove <- which(z3$FID %in% unlist(r1, use.names = F)) # remove them
print(unlist(r1, use.names = F)) # OPTIONAL*** printing removed FIDs
f(z3[-remove,]) # and run again
} else {
return(z3) # otherwise return the final list
}
}
f(input)
}
And the function can be used as follows:
filterout(input = collar, value = 200)
filterout(input = collar, value = 400)
filterout(input = collar, value = 600)
EDIT 2:
I opened up a bounty question to figure out how to do a certain step, but hopefully this answer helps. It might take around a minute to do 37k rows, but let me know.
x <- collar
skipdistance <- function(x, value = 200){
d <- as.matrix(dist(x[,c("easting","northing")]))
d[lower.tri(d)] <- 0
pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity
findConnectionsBase <- function(m) {
n <- nrow(m)
myConnections <- matrix(integer(0), nrow = n, ncol = 2)
i <- j <- 1L
k <- 2L
while (i <= n) {
myConnections[j, ] <- m[i, ]
while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L}
i <- k
j <- j + 1L
}
myConnections[!is.na(myConnections[,1]), ]
}
keep.ind <- findConnectionsBase(pick)
keep.row <- unique(c(keep.ind))
cbind(x[keep.row,], Distance = c(NA,d[keep.ind]))
}
a <- do.call(rbind,lapply(split(x, x$animal), skipdistance, value = 200))
dim(a)
EDIT 3:
library(lubridate) # great package for string -> dates
# changed to give just rows that satisfy greater than value criteria
skip <- function(dist.var, value = 200){
d <- as.matrix(dist(dist.var))
d[lower.tri(d)] <- 0
pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity
findConnectionsBase <- function(m) {
n <- nrow(m)
myConnections <- matrix(integer(0), nrow = n, ncol = 2)
i <- j <- 1L
k <- 2L
while (i <= n) {
myConnections[j, ] <- m[i, ]
while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L}
i <- k
j <- j + 1L
}
myConnections[!is.na(myConnections[,1]), ]
}
unique(c(findConnectionsBase(pick)))
}
collar <- structure(list(
  FID = 1:8,
  animal = c("URAM01_2012", "URAM01_2012", "URAM01_2012", "URAM01_2012",
             "URAM01_2013", "URAM01_2013", "URAM01_2013", "URAM01_2013"),
  date = c("6/24/2012", "6/24/2012", "6/24/2012", "6/24/2012",
           "6/25/2012", "6/25/2012", "6/25/2012", "6/25/2012"),
  time = c("10:00:00AM", "1:02:00PM", "4:01:00PM", "7:01:00PM",
           "4:01:00AM", "7:01:00AM", "7:01:00AM", "7:01:00AM"),
  zone = c("13S", "13S", "13S", "13S", "13S", "13S", "13S", "13S"),
  easting = c(356664L, 356760L, 356762L, 356882L,
              356574L, 357796L, 357720L, 357300L),
  northing = c(3971340L, 3971480L, 3971498L, 3971498L,
               3971765L, 3972231L, 3972230L, 3972531L)),
  class = "data.frame", row.names = c(NA, -8L))
collar[skip(dist.var = collar[,c("easting","northing")],
value = 200),]
# dist function works on dates, but it makes sense to convert to hours
dist(lubridate::mdy_hms(paste(collar$date, collar$time)))
hours <- 2.99
collar[ skip(dist.var = lubridate::mdy_hms(paste(collar$date, collar$time)),
value = hours * 3600), ]
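And since the question asks for the criteria both independently and concurrently, the two index sets returned by skip() can simply be intersected; a hedged sketch reusing the objects above:
# rows that satisfy the distance rule AND the time rule
keep.dist <- skip(dist.var = collar[, c("easting", "northing")], value = 200)
keep.time <- skip(dist.var = lubridate::mdy_hms(paste(collar$date, collar$time)),
                  value = hours * 3600)
collar[intersect(keep.dist, keep.time), ]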
Big thanks and shout out to Evan for all of his hard work. Obviously, the code he generated is a bit different from what I proposed, but that's the great thing about this community: sharing unique solutions we might not think of ourselves. See EDIT 2 for the final code, which filters GPS collar data by the distance between consecutive points.
I am currently working my way through the book 'R for Data Science'.
I am trying to solve this exercise question (21.2.1 Q1.4) but have not been able to determine the correct output before starting the for loop.
Write a for loop to:
Generate 10 random normals for each of μ= −10, 0, 10 and 100.
Like the previous questions in the book, I have been trying to insert the results into a vector output, but for this example it appears I need the output to be a data frame?
This is my code so far:
values <- c(-10,0,10,100)
output <- vector("double", 10)
for (i in seq_along(values)) {
output[[i]] <- rnorm(10, mean = values[[i]])
}
I know the output is wrong but am unsure how to create the format I need here. Any help much appreciated. Thanks!
There are many ways of doing this. Here is one. See inline comments.
set.seed(357) # to make things reproducible, set random seed
N <- 10 # number of loops
xy <- vector("list", N) # create an empty list into which values are to be filled
# run the loop N times and on each loop...
for (i in 1:N) {
# generate a data.frame with 4 columns, and add a random number into each one
# random number depends on the mean specified
xy[[i]] <- data.frame(um10 = rnorm(1, mean = -10),
u0 = rnorm(1, mean = 0),
u10 = rnorm(1, mean = 10),
u100 = rnorm(1, mean = 100))
}
# result is a list of data.frames with 1 row and 4 columns
# you can bind them together into one data.frame using do.call
# rbind means they will be merged row-wise
xy <- do.call(rbind, xy)
um10 u0 u10 u100
1 -11.241117 -0.5832050 10.394747 101.50421
2 -9.233200 0.3174604 9.900024 100.22703
3 -10.469015 0.4765213 9.088352 99.65822
4 -9.453259 -0.3272080 10.041090 99.72397
5 -10.593497 0.1764618 10.505760 101.00852
6 -10.935463 0.3845648 9.981747 100.05564
7 -11.447720 0.8477938 9.726617 99.12918
8 -11.373889 -0.3550321 9.806823 99.52711
9 -7.950092 0.5711058 10.162878 101.38218
10 -9.408727 0.5885065 9.471274 100.69328
Another way would be to pre-allocate a matrix, add in values and coerce it to a data.frame.
xy <- matrix(NA, nrow = N, ncol = 4)
for (i in 1:N) {
xy[i, ] <- rnorm(4, mean = c(-10, 0, 10, 100))
}
# notice that I set the column names after the fact
colnames(xy) <- c("um10", "u0", "u10", "u100")
xy <- as.data.frame(xy)
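Just for comparison (the exercise explicitly asks for a for loop, so treat this as a side note), the same result drops out of a single sapply over a named vector of means:
means <- c(um10 = -10, u0 = 0, u10 = 10, u100 = 100)
# sapply simplifies the list of length-10 vectors into a 10 x 4 matrix,
# carrying the names over as column names
xy <- as.data.frame(sapply(means, function(m) rnorm(10, mean = m)))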
As this is a learning question I will not provide the solution directly.
> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace
As you can see, the error says there are more elements supplied than there is space: each iteration generates 10 random numbers (40 in total), and you only have 10 slots. Consider using a data format that can store several values per iteration.
So that:
> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4 and each element all the 10 values you created in the loop
# set the number of rows
rows <- 10
# vector with the values
means <- c(-10,0,10,100)
# generating output matrix
output <- matrix(nrow = rows,
ncol = 4)
# setting seed and looping through the number of rows
set.seed(222)
for (i in 1:rows){
output[i,] <- rnorm(length(means),
mean=means)
}
#printing the output
output
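If you want the columns labelled like the other answers, an optional final step (the names are my choice, matching the means):
colnames(output) <- c("um10", "u0", "u10", "u100")
output <- as.data.frame(output)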