I intend to fill a matrix I created that has 1000 rows and 2 columns. Here B is 1000.
resampled_ests <- matrix(NA, nrow = B, ncol = 2)
colnames(resampled_ests) <- c("Intercept_Est", "Slope_Est")
I want to fill it in a for loop that runs from 1 to 1000, where each iteration b builds a resampled data set:
ds <- diamonds[resampled_values[b,],]
Here, each ds (there should be 1000 versions of it, one per loop iteration) is a data frame with 2 columns and 2000 rows, and I would like to use the lm() function to get the beta coefficients from its two columns.
for (b in 1:B) {
#Write code that fills in the matrix resampled_ests with coefficient estimates.
ds <- diamonds[resampled_values[b,],]
lm2 <- lm(ds$price~ds$carat, data = ds)
rowx <- coefficients(lm2)
resampled_ests <- rbind(rowx)
}
However, after I run the loop, resampled_ests, which is supposed to be a matrix of 1000 rows, only shows 1 row, i.e. one pair of coefficients. When I test the code outside the loop by replacing b with specific numbers, I get different results, which are correct. But inside the for loop I don't seem to be row-binding all of these different pairs of coefficients. Can someone explain why the resulting matrix resampled_ests only shows one row?
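rbind(rowx) with a single argument just returns rowx (as a one-row matrix), because you're not binding it to anything. Each iteration therefore overwrites resampled_ests with the current row, and only the last one survives. If you want to build a matrix row by row, you need something like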
resampled_ests <- rbind(resampled_ests, rowx)
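This also means you need to initialize resampled_ests before the loop, e.g. to NULL; starting from the 1000 x 2 NA matrix would leave those NA rows on top of the new ones.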
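A tiny illustration of both points, using made-up numbers in place of the regression coefficients (rowx here is a hypothetical stand-in for coefficients(lm2)):

```r
# Growing a matrix row by row: start from NULL so the first rbind() creates it.
grown <- NULL
for (b in 1:3) {
  rowx <- c(Intercept = b, Slope = 2 * b)   # stand-in for coefficients(lm2)
  grown <- rbind(grown, rowx)
}
dim(grown)   # 3 2

# rbind() with a single argument just returns that row as a 1 x 2 matrix,
# which is why the loop in the question ends up with only the last row.
dim(rbind(c(Intercept = 1, Slope = 2)))   # 1 2
```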
Since you have to initialize it anyway, I would just make a B x 2 matrix of zeros and fill in row b on each iteration. Something like:
resampled_ests <- matrix(0, nrow = B, ncol = 2)
for (b in 1:B) {
  ds <- diamonds[resampled_values[b,],]
  lm2 <- lm(price ~ carat, data = ds)
  rowx <- coefficients(lm2)
  resampled_ests[b,] <- rowx
}
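A self-contained version of this bootstrap can be sketched as follows. Since diamonds comes from ggplot2 and resampled_values is not shown in the question, both are replaced by assumptions: toy is simulated data with carat and price columns, and resampled_values is built as a B x n matrix of row indices drawn with replacement. The column names are set with dimnames (names() does not set column names on a matrix), and the block also shows an equivalent vapply() call and bootstrap standard errors computed from the filled matrix.

```r
# Self-contained bootstrap sketch. `toy` is a hypothetical stand-in for diamonds
# (simulated carat and price), so the block runs without ggplot2.
set.seed(123)
n <- 200
toy <- data.frame(carat = runif(n, 0.2, 3))
toy$price <- 500 + 4000 * toy$carat + rnorm(n, sd = 500)

B <- 1000
# Each row of resampled_values holds n row indices drawn with replacement.
resampled_values <- t(replicate(B, sample(n, n, replace = TRUE)))

# Pre-allocate the B x 2 result with proper column names.
resampled_ests <- matrix(NA_real_, nrow = B, ncol = 2,
                         dimnames = list(NULL, c("Intercept_Est", "Slope_Est")))

for (b in seq_len(B)) {
  ds <- toy[resampled_values[b, ], ]
  # Refer to columns by name in the formula; `data = ds` supplies them.
  resampled_ests[b, ] <- coef(lm(price ~ carat, data = ds))
}

# Equivalent loop-free version: vapply() returns a 2 x B matrix, so transpose it.
alt_ests <- t(vapply(seq_len(B), function(b) {
  coef(lm(price ~ carat, data = toy[resampled_values[b, ], ]))
}, numeric(2)))

# Bootstrap standard errors of the intercept and slope.
apply(resampled_ests, 2, sd)
```

Writing into row b of a pre-allocated matrix avoids re-creating the matrix on every iteration, which is why it is usually preferred over growing it with rbind().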
I currently have a dataset containing only random numbers ranging from -1 billion to +1 billion. I want to write a function that draws 5 random numbers from the dataset, 100,000 times, and counts how many times the drawn numbers are below 0. Can I use the mapply function for this, or would another function be better?
The dataset is called numbers and has 2 columns; the column I want to draw the numbers from is called listofnumbers.
Currently I have a for loop, but it takes forever to run. The code is below:
```
n=5
m=100000
base_table <- as_tibble (0)
for (j in 1:m)
{
for (i in 1:n)
{
base_table[i,j] <- as_tibble(sample_n(numbers, 1) %>% pull(listofnumbers))
}
}
```
Can anyone help?
Here is a reduced size example:
r <- -100:100
n <- 10
collect <- 1000
output <- replicate(collect, sum(sample(r, n) < 0))
hist(output)
Simply replace r, n and collect with your values, i.e. r = numbers$listofnumbers, n = 5, collect = 100000. Each element of output is the number of negative values among the n values sampled in one draw.
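If the replicate() call is still slow at 100,000 draws, one further option is to draw every value in a single sample() call, arrange the values one draw per row, and count per row with rowSums(); this should be much faster than filling a tibble cell by cell. A minimal sketch, assuming a simulated numbers data frame as a stand-in for the asker's data (the id column and the value range are made up). One difference to note: here values are drawn with replacement, matching the question's repeated sample_n(numbers, 1) calls, whereas sample(r, n) in the answer above draws without replacement within each draw.

```r
# Hypothetical stand-in for the asker's data: 1,000 values between -1e9 and 1e9.
set.seed(42)
numbers <- data.frame(id = 1:1000, listofnumbers = runif(1000, -1e9, 1e9))

n <- 5        # values per draw
m <- 100000   # number of draws

# Draw all n * m values in a single call, one draw per row.
# replace = TRUE mirrors the question's loop, which samples one row at a time,
# so the same value can appear more than once within a draw.
draws <- matrix(sample(numbers$listofnumbers, n * m, replace = TRUE),
                nrow = m, ncol = n)

# Number of negative values in each draw (0 to n).
neg_counts <- rowSums(draws < 0)
table(neg_counts)
```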
I am looking to reorder my data into a new data frame (a vector in the example below) in which the first observation comes first and the last observation second; both observations are then removed from the original data frame, and the process repeats.
data <- seq(1,12,1)
i <- 1
ii <- 1:length(data)
newData <- seq(1,12,1)
for (i in ii){
a <- 1
newData[i] <- data[a]
i <- i + 1
b <- as.numeric(length(data))
newData[i]<- data[b]
index <- c(a, b)
data <- data[-index]
i <- i + 1
}
I receive the error: "Error in newData[i] <- data[b] : replacement has length zero" and the loop stops at i = 8, and the list "data" is empty.
If I run the contents of the loop, but not the loop itself, I get my desired outcome both in this example and my task; but obviously I want to run the loop given the size of my data.
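As MrFlick mentioned, you can't change the index of a for loop from inside the loop: R assigns i the next element of ii at the start of every iteration, so your i <- i + 1 lines do not carry over to the next iteration. But since you only need every second index, you can build that into the loop definition by using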
ii <- seq(1,length(data),2)
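Put together, a minimal runnable version of the corrected loop might look like this. It keeps the original remove-first-and-last idea and pre-allocates newData with numeric(); the if guard is an addition beyond the original answer so that an odd-length vector does not write past the end.

```r
data <- seq(1, 12, 1)
newData <- numeric(length(data))   # pre-allocate the output

# Step through every second position; the loop variable is never modified in the body.
for (i in seq(1, length(data), 2)) {
  newData[i] <- data[1]                      # current first element
  if (length(data) > 1) {
    newData[i + 1] <- data[length(data)]     # current last element
  }
  data <- data[-unique(c(1, length(data)))]  # drop both from the remaining data
}
newData
```

seq(1, length(data), 2) is evaluated once, before data starts shrinking, so the loop still visits positions 1, 3, 5, ..., 11.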
However, you don't need a for loop to rearrange the elements of your vector data. You only need an index vector of the form (first, last, second, second-to-last, etc.):
m = matrix(c(1:6,12:7), ncol=2)
i = as.vector(t(m))
newdata = data[i]
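The matrix above is hard-coded for 12 elements. A sketch of the same index trick for a vector of any length, including odd lengths; interleave_ends is a hypothetical helper name:

```r
# General version of the index trick for a vector of any length.
interleave_ends <- function(x) {
  n <- length(x)
  # Row 1 counts up from the front, row 2 counts down from the back;
  # as.vector() reads the 2 x n matrix column by column: 1, n, 2, n-1, ...
  idx <- as.vector(rbind(seq_len(n), rev(seq_len(n))))
  # The first n of those indices visit every position exactly once.
  x[idx[seq_len(n)]]
}

interleave_ends(seq(1, 12, 1))   # 1 12 2 11 3 10 4 9 5 8 6 7
interleave_ends(1:5)             # 1 5 2 4 3
```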
search <- function(x,max_hp){
count <- 1
result <- matrix(NA, nrow =nrow(x), ncol = ncol(x))
for(i in 1:nrow(x)){
temp_row <- x[i,]
if(temp_row[4] < max_hp){
result[count,] <- temp_row
count <- count + 1
}
}
return(result)
}
I want to find the rows of the mtcars data frame in R that have hp > 240 using a for loop (iterating over each row of the data frame), and return only the rows that match, storing each matched row in an initially empty matrix. But my code doesn't work.
I have too few points to comment, but I have a couple of points to share. First, I agree with @Otto Kässi and @seeellayewhy. I would just add that if you don't want rows with NA in mtcars$hp to show up in your result (as rows of NA), you need to use
result <- mtcars[which(mtcars$hp > 240),]
Regarding adding further rows, I would just follow the above command with
result <- rbind(result,newrows)
rbind() will complain if newrows does not have the same number of columns, with the same names, as result. Factor columns are a different story: differing levels do not cause an error, because rbind() expands the factor levels as necessary, so check them yourself if they matter.
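For completeness, the asker's own loop-and-matrix approach can be made to work with a few changes, sketched below: compare with > instead of < (the question asks for hp > 240), convert each one-row data frame to a plain numeric vector with unlist() before storing it, and trim the unused NA rows at the end. This assumes all columns are numeric, as they are in mtcars; the function name rows_above_hp is hypothetical (the original name search masks base R's search() function).

```r
rows_above_hp <- function(x, min_hp) {
  # Pre-allocate a numeric matrix as large as the input; unused rows are trimmed later.
  result <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x),
                   dimnames = list(NULL, names(x)))
  matched <- character(0)
  count <- 0
  for (i in seq_len(nrow(x))) {
    # Use ">" as in the question, and skip NA values of hp.
    if (!is.na(x$hp[i]) && x$hp[i] > min_hp) {
      count <- count + 1
      # unlist() turns the one-row data frame into a numeric vector.
      result[count, ] <- unlist(x[i, ])
      matched <- c(matched, rownames(x)[i])
    }
  }
  # Keep only the filled rows and label them with the original row names.
  result <- result[seq_len(count), , drop = FALSE]
  rownames(result) <- matched
  result
}

res <- rows_above_hp(mtcars, 240)
res
```

The one-liner mtcars[which(mtcars$hp > 240), ] gives the same rows as a data frame; the loop version is only worth keeping as an exercise.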
I've looked through previous help threads and haven't found something that has helped me with this specific problem. I know that a for loop would be a better way to generate the same data, but I'm interested in making this work with a repeat loop (mostly just as an exercise) and am struggling with the solution.
So I'm looping to create 3 iterations of 100 rnorm observations, changing the means each time from 5, to 25, to 45.
i <- 1
repeat{
x <- rnorm(100, mean = j, sd = 3)
j <- 5*i
i <- i + 4
if (j > 45) break
cat(x, "\n",j, "\n")
}
All of my tinkering to get a combined saved output for each iteration (for a total of 300 values) has failed. Help!
You can use lapply to get this:
lapply(c(5,25,45), function(x){
rnorm(100, mean = x, sd = 3)
})
This will give you a list with 3 elements, each containing 100 observations drawn from the respective normal distribution.
It depends on what data structure you want.
For a list it would be:
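r <- list()
j <- 5
repeat {
  x <- rnorm(100, mean = j, sd = 3)
  r[[length(r) + 1]] <- list(x, j)  # append this iteration's sample and mean
  j <- j + 20
  if (j > 45) break
}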
Then r[[1]][[1]] is x from the first iteration and r[[1]][[2]] is the corresponding j.
Since you know how many observations you want to store, you can pre-allocate a matrix of that size, and store the data in it as it's generated.
# preallocate the space for the values you want to store
x <- matrix(nrow=100, ncol=3)
# save the three means in a vector
j_vals <- c(5,25,45)
# if you really need a repeat loop you can do it like so:
i <- 1
repeat {
# save the random sample in a column of the matrix x
x[,i] <- rnorm(100, mean = j_vals[i], sd = 3)
# print the random sample to the console (you can omit this)
cat(x[,i], "\n",j_vals[i], "\n")
i <- i+1
if (i > 3) break
}
You should end up with a matrix x with the random samples stored in its columns. You can access each column as x[,1], x[,2], etc.
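To get the combined output of all 300 values that the question asks for, the per-mean samples can be reshaped in a few ways. A sketch building on the lapply() answer (the names samples, all_values, as_matrix and long are illustrative):

```r
set.seed(1)
means <- c(5, 25, 45)

# One sample of 100 per mean, kept in a list (as in the lapply() answer).
samples <- lapply(means, function(m) rnorm(100, mean = m, sd = 3))

# All 300 values in one vector, in the order 5, 25, 45.
all_values <- unlist(samples)

# Or a 100 x 3 matrix, one column per mean (the shape of x in the repeat-loop answer).
as_matrix <- do.call(cbind, samples)

# Or a long data frame that records which mean each value came from.
long <- data.frame(mean = rep(means, each = 100), value = all_values)
```

The long data frame is often the most convenient form for later summaries, e.g. tapply(long$value, long$mean, mean).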
I would like to be able to create a new data frame with 6 columns from an existing data frame with 4 columns. The two extra columns should hold the values of the loop counters (i and j) at the time each row was added.
My draft code is as follows, where the columns of mydata are:
a: binary
b: categorical
c: a number (in this case 1 to 200)
d: a number (1 to 5 in this example, 1 to 2500 in real life)
#### make an example of mydata
a<- c(0,0,0,0,0,0,0,0,0,0,1,1,0,1)
b<- c("a","b","a","b","b","c","a","e","c","a","a","b","d","f")
c<- c(20,30,40,40,54,76,23,23,78,23,34,1,88,1)
d<- c(1,1,1,2,2,2,3,3,4,5,5,5,5,5)
mydata<-data.frame(a,b,c,d)
## generate random site numbers, used later to randomly
## select rows to bind together
set.seed(1)
choose.test<- data.frame(matrix(NA, nrow = 20, ncol = 30))
for (i in 1:20)
{
choose.test[,i]<-sample(5, 20, replace = TRUE, prob = NULL)
#random selection of sites WITH replacement
}
# this is the bit I am having trouble with
data<- NULL
for( j in 1:10){
for (i in choose.test[,j])
{ data <- rbind(data, mydata[mydata[,4]== i,])
data[,5]<-j
data[,6]<-i
}}
It would also be acceptable to create separate data frames at each loop iteration (in the second loop, using i as a counter), and I am open to other, better suggestions, as I am new to R. I also tried using assign() to do this, with no luck.
At each iteration I need to rbind together all the rows whose value in column 4 equals a random number between 1 and 5 (in this example; in real life it will be between 1 and 2500 sites). These random numbers are stored in a data frame called choose.test; the numbers in each column are used only once, and the next iteration then moves on to the next column.
Without the data[,5] <- j and data[,6] <- i lines it does almost what I want, but I would really like a 5th and 6th column identifying which iteration of the i and j loops each row came from, so I can analyse the data at each iteration (I am bootstrapping with this data). Clearly the code above does not work, but I am not sure how to make it do what I want. In the current version, columns 5 and 6 just end up holding the final counter values for every row.
Many thanks,
Ben
The following code fixed my problem
data <- NULL
for (j in 1:10) {
  for (i in choose.test[, j]) {
    data <- rbind(data, cbind(mydata[mydata[, 4] == i, ], i = i, j = j))
  }
}
Credit goes to MrFlick for providing a useful comment!
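The original attempt failed because data[,5] <- j assigns j to every row accumulated so far, so each iteration overwrites the labels of earlier rows; cbind() inside the rbind() call attaches the counters only to the newly selected rows. A common refinement, sketched below, is to store each piece in a pre-allocated list and call rbind() once at the end instead of growing data inside the loop, which avoids copying the growing data frame on every iteration. The example data are rebuilt from the question so the block runs on its own; pieces and k are illustrative names, and choose.test is generated with only the 10 columns the loop uses.

```r
# Rebuild the question's example data so the block runs on its own.
mydata <- data.frame(
  a = c(0,0,0,0,0,0,0,0,0,0,1,1,0,1),
  b = c("a","b","a","b","b","c","a","e","c","a","a","b","d","f"),
  c = c(20,30,40,40,54,76,23,23,78,23,34,1,88,1),
  d = c(1,1,1,2,2,2,3,3,4,5,5,5,5,5)
)
set.seed(1)
# 20 random site numbers (1 to 5) per column, one column per outer iteration j.
choose.test <- as.data.frame(replicate(10, sample(5, 20, replace = TRUE)))

# Pre-allocate one list slot per (j, i) step instead of growing `data`.
pieces <- vector("list", 10 * nrow(choose.test))
k <- 0
for (j in 1:10) {
  for (i in choose.test[, j]) {
    k <- k + 1
    # cbind() attaches the current counters only to the rows selected in this step.
    pieces[[k]] <- cbind(mydata[mydata$d == i, ], i = i, j = j)
  }
}
# Bind everything once at the end.
data <- do.call(rbind, pieces)
```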