I have this population:
MyPopulation <- c(1:100)
and I want to create a data frame with 40 columns and 5 rows. Each column has to be a random sample of MyPopulation, so I tried this:
MySample <- data.frame(NoSample = c(1:5))
for (i in 1:40) {
MySample$i <- sample(MyPopulation,5)
}
The result is a data frame with only 1 more column (named i) with a random sample as values.
What am I doing wrong?
The easiest solution is probably to index the new column by position:
MyPopulation <- c(1:100)
MySample <- data.frame(NoSample = c(1:5))
for (i in 1:40) {
MySample[,i+1] <- sample(MyPopulation,5)
}
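You cannot assign new columns that way: MySample$i always refers to a column literally named i, whatever the value of the loop variable, so every iteration overwrites the same column. Try MySample[paste(i)] = ...
That is, $ does not evaluate i, whereas brackets do, and a string gives each new column its own name.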
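To make the difference between $ and bracket indexing concrete, here is a minimal sketch. The column names S1, S2, ... are an illustrative assumption and do not come from the question.

```r
MyPopulation <- 1:100
MySample <- data.frame(NoSample = 1:5)

for (i in 1:40) {
  # `$i` would always target a column literally called "i";
  # `[[` evaluates its argument, so each pass creates a new column.
  MySample[[paste0("S", i)]] <- sample(MyPopulation, 5)
}

dim(MySample)  # 5 rows, 41 columns (NoSample plus the 40 samples)
```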
Maybe you can try replicate + as.data.frame
MySample <- as.data.frame(replicate(40,sample(MyPopulation,5)))
You can also draw all the random values in a single call and reshape them into a matrix, stating only the number of columns and letting the row count be inferred:
m <- matrix(sample(1:1000, 200, replace = TRUE), ncol = 40)
df <- as.data.frame(m)
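The two one-liners are not equivalent: replicate() draws each column separately without replacement, while the single-stream version draws all 200 values at once with replacement. A rough sketch that checks this, assuming the population 1:100 from the question (the names by_col and stream are made up for illustration):

```r
set.seed(42)
MyPopulation <- 1:100

# One independent draw of 5 per column, without replacement
by_col <- as.data.frame(replicate(40, sample(MyPopulation, 5)))

# TRUE when no value repeats inside any single column
no_repeats <- all(vapply(by_col, function(x) anyDuplicated(x) == 0L, logical(1)))

# One stream of 200 draws with replacement, reshaped to 5 x 40;
# a value may repeat inside a column here
stream <- as.data.frame(matrix(sample(MyPopulation, 200, replace = TRUE), ncol = 40))

no_repeats
dim(stream)
```

Which one is appropriate depends on whether repeats within a column are acceptable for your use.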
Related
Hello everyone. I have two data frames and am bootstrapping them with Script1 below, which uses the number of rows of each whole data frame. Instead, I want to treat each column as its own data frame, remove the zero values, take the row count of what remains, and then run the same bootstrap. In Script2 I create the per-column data frames in a for loop, but as I am new to R I am unsure how to add the Script1 logic to it efficiently.
Please advise. Below are Script1, which runs, and Script2, where I subset each column into an individual data frame.
Script1
set.seed(2)
m1 <- matrix(sample(c(0, 1:10), 100, replace = TRUE), 10)
m2 <- matrix(sample(c(0, 1:5), 50, replace = TRUE), 5)
m1 <- as.data.frame(m1)
m2 <- as.data.frame(m2)
nboot <- 1e3
n_m1 <- nrow(m1); n_m2 <- nrow(m2)
temp<- c()
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
value <- colSums(m2)/colSums(m1[boot,])
temp <- rbind(temp, value)
}
boot_data<- apply(temp, 2, median)
Script2
for (i in colnames(m1)){
m1_subset=(m1[m1[[i]] > 0, ])
m1_subset=m1_subset[i]
m2_subset=m2[m2[[i]] >0, ]
m2_subset=m2_subset[i]
num_m1 <- nrow(m1_subset); n_m2 <- nrow(m2_subset)# after this wanted add above script changing input
}
If I understand correctly, you want to do the sampling and calculation on each column individually, after removing the 0 values. I modified your code to work on a single vector instead of a data frame (i.e., using length() instead of nrow() and sum() instead of colSums()). I also suggest creating the empty matrix for your results ahead of time and filling it in, which will be faster than growing it with rbind().
temp <- matrix(nrow = nboot, ncol = ncol(m1))
for (i in seq_along(m1)){
m1_subset = m1[m1[,i] > 0, i]
m2_subset = m2[m2[,i] > 0, i]
n_m1 <- length(m1_subset); n_m2 <- length(m2_subset)
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
temp[j, i] <- sum(m2_subset)/sum(m1_subset[boot])
}
}
boot_data <- apply(temp, 2, median)
boot_data <- setNames(data.frame(t(boot_data)), names(m1))
boot_data
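The same per-column calculation can also be wrapped in a small function and applied to each pair of columns. This is only a sketch: boot_ratio is a hypothetical helper name, and it indexes with sample.int() so that a column with a single non-zero value is not misread by sample() as the range 1:x.

```r
# Hypothetical helper: median bootstrap ratio for one pair of columns
boot_ratio <- function(x, y, nboot = 1000) {
  x <- x[x > 0]  # drop zeros from the m1 column
  y <- y[y > 0]  # drop zeros from the m2 column
  ratios <- replicate(nboot, {
    idx <- sample.int(length(x), size = length(y), replace = TRUE)
    sum(y) / sum(x[idx])
  })
  median(ratios)
}

set.seed(2)
m1 <- as.data.frame(matrix(sample(c(0, 1:10), 100, replace = TRUE), 10))
m2 <- as.data.frame(matrix(sample(c(0, 1:5), 50, replace = TRUE), 5))

# mapply walks over the columns of m1 and m2 in parallel
boot_data <- mapply(boot_ratio, m1, m2)
boot_data
```

The result is a named numeric vector with one median per column, so no transposing is needed afterwards.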
I want to do an operation on each data frame in a list. I want to perform the Kolmogorov–Smirnov (KS) test on one column of each data frame. I am using the code below, but it is not working:
PDF_mean <- matrix(nrow = length(siteNumber), ncol = 4)
PDF_mean <- data.frame(PDF_mean)
names(PDF_mean) <- c("station","normal","gamma","gev")
listDF <- mget(ls(pattern="DSF_moments_"))
length(listDF)
i <- 1
for (i in length(listDF)) {
PDF_mean$station[i] <- siteNumber[i]
PDF_mean$normal[i] <- ks.test(list[i]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(list[i]$mean,"pgev")$p.value
PDF_mean$gamma[i] <- ks.test(list[i]$mean,"gamma")$p.value
}
Any help?
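The loop should not iterate over length(listDF): that is a single value, so the body runs only once, for the last element. Use seq_along(listDF) instead (1:length(listDF) also works, but seq_along is safer when the list is empty). Also index the list itself with listDF[[i]], since list[i] refers to the base function list rather than your data. Note that ks.test() expects the name of a cumulative distribution function, so "gamma" (the gamma function) should probably be "pgamma" with its shape parameter supplied, and "pgev" has to come from an add-on package.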
for(i in seq_along(listDF)) {
PDF_mean$station[i] <- listDF[[i]]$siteNumber
PDF_mean$normal[i] <- ks.test(listDF[[i]]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(listDF[[i]]$mean,"pgev")$p.value
PDF_mean$gamma[i] <- ks.test(listDF[[i]]$mean,"gamma")$p.value
}
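For comparison, the loop can be replaced by sapply() over the list. The sketch below makes several assumptions: the two data frames are simulated stand-ins for the DSF_moments_ objects, the distribution parameters are estimated from the data by the method of moments (which makes the p-values approximate), and the GEV test is left out because it needs a package not shown in the question.

```r
set.seed(1)
# Simulated stand-ins for the DSF_moments_* data frames (assumption)
listDF <- list(
  DSF_moments_A = data.frame(mean = rnorm(50, mean = 10, sd = 2)),
  DSF_moments_B = data.frame(mean = rgamma(50, shape = 2, rate = 0.5))
)

# Hypothetical helper: KS p-values for one data frame
ks_row <- function(d) {
  x <- d$mean
  shape <- mean(x)^2 / var(x)  # method-of-moments gamma shape
  rate <- mean(x) / var(x)     # method-of-moments gamma rate
  c(normal = ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value,
    gamma = ks.test(x, "pgamma", shape = shape, rate = rate)$p.value)
}

# sapply returns one column per data frame, so transpose to one row each
PDF_mean <- data.frame(station = names(listDF), t(sapply(listDF, ks_row)))
PDF_mean
```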
I want to sample 60 random rows 1000 times with replace=TRUE and calculate the correlation coefficient between the first and second columns in each sample.
I don't know how to sample rows randomly, so I tried to sample 60 numbers from 1:60 and match them to the row numbers.
The raw data is a 60x2 matrix called data1.
My code is
k <- list()
data.sam <- list()
set.seed(1)
for (j in 1:60){
for (i in 1:1000){
k[[i]] <- sample(1:60, 60, replace = TRUE)
}
data.sam[[i]][j,] <- data1[k[[i]][j],]
corr <- vector()
corr[i] <- cor(data.sam[[i]][,1],data.sam[[i]][,2])
}
and I get this error:
Error in `*tmp*`[[i]] : subscript out of bounds
It doesn't look like the j variable is doing very much. Indexing with k[[i]] is already vectorized, so it selects all 60 rows at once and you don't need two explicit loops. Also, don't reset the corr variable inside the loop.
Instead, I might write:
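# example data standing in for data1, plus the containers the loop fills
data1 <- matrix(rnorm(120), 60, 2)
k <- list()
data.sam <- list()
corr <- numeric(1000)  # create once, outside the loop
for (i in 1:1000){
k[[i]] <- sample(1:60, 60, replace = TRUE)  # resampled row indices
data.sam[[i]] <- data1[k[[i]],]  # select all 60 rows at once
corr[i] <- cor(data.sam[[i]][,1],data.sam[[i]][,2])
}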
You can then plot the distribution of the bootstrapped correlations:
hist(corr)
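If the resampled rows themselves are not needed afterwards, the loop can be condensed with replicate(). This is a sketch using the same simulated placeholder for data1; the percentile interval at the end is an illustrative extra, not part of the original question.

```r
set.seed(1)
data1 <- matrix(rnorm(120), 60, 2)  # placeholder data, as above

corr <- replicate(1000, {
  # draw 60 row indices with replacement, then correlate the two columns
  idx <- sample(nrow(data1), replace = TRUE)
  cor(data1[idx, 1], data1[idx, 2])
})

# a simple percentile interval for the correlation
quantile(corr, c(0.025, 0.975))
```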
I want to write a function that will create n random samples of a data set without replacement.
In this example I am using the iris data set.
The iris data set has 150 observations and say I want 10 samples.
My attempt:
#load libraries
library(dplyr)
# load the data
data(iris)
head(iris)
# name df
df = iris
# set the number of samples
n = 10
# assumption: the number of observations in df is divisible by n
# set the number of observations in each sample
m = nrow(df)/n
# create a column called row to contain initial row index
df$row = rownames(df)
# define the for loop
# that creates n separate data sets
# with m number of rows in each data set
for(i in 1:n){
# create the sample
sample = sample_n(df, m, replace = FALSE)
# name the sample 'dsi'
x = assign(paste("ds",i,sep=""),sample)
# remove 'dsi' from df
df = df[!(df$row %in% x$row),]
}
When I run this code I get what I want.
I get the random samples named ds1,ds2,...,ds10.
Now when I try to turn it into a function:
samplez <- function(df,n){
df$row = rownames(df)
m = nrow(df)/n
for(i in 1:n){
sample = sample_n(df, m, replace = FALSE)
x = assign(paste("ds",i,sep=""),sample)
df = df[!(df$row %in% x$row),]
}
}
Nothing happens when I execute 'samplez(iris,10)'. What am I missing?
Thanks
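Your function does create ds1, ..., ds10, but assign() puts them in the function's own environment, which is discarded when the function returns, and the for loop itself returns nothing visible. Just save the results in a list and return that. Then you'll have a single object, the list of samples, in your global environment, rather than cluttering up your environment with a bunch of similar data frames.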
I'm not sure what you're trying to do with df, but here is how to return all of the samples. Let me know what you want to do with df and I can add that as well:
samplez <- function(df,n){
samples = list()
df$row = rownames(df)
m = nrow(df)/n
for(i in 1:n){
samples[[paste0("ds",i)]] = sample_n(df, m, replace = FALSE)
df = df[!(df$row %in% samples[[i]]$row),]
}
return(samples)
}
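Because the samples are disjoint and together cover every row, the same partition can be produced in base R by shuffling group labels and calling split(). This is a sketch of an alternative, not the approach from the answer above; samplez_split is a hypothetical name.

```r
# Hypothetical alternative: partition df into n disjoint random samples
samplez_split <- function(df, n) {
  stopifnot(nrow(df) %% n == 0)  # same divisibility assumption as the question
  m <- nrow(df) / n
  # each label 1..n appears m times, in random order
  groups <- sample(rep(seq_len(n), each = m))
  out <- split(df, groups)
  names(out) <- paste0("ds", seq_len(n))
  out
}

samples <- samplez_split(iris, 10)
sapply(samples, nrow)  # 15 rows in each of the 10 samples
```

Individual samples are then available as samples$ds1, samples[["ds2"]], and so on.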
I have a data frame that contains a list of row positions, column positions and values, like so:
combs <- as.data.frame(t(combn(1:10,2)))
colnames(combs) <- c('row','column')
combs$value <- rnorm(nrow(combs))
I would like to fill in a matrix with these values such that every value appears in the matrix in exactly the position specified by row and column. I suppose I could do this manually:
mat <- matrix(nrow=10,ncol=10)
for(i in 1:nrow(combs)) {
mat[combs[i,'row'],combs[i,'column']] <- combs[i,'value']
}
But surely there is a more elegant way to accomplish this in R?
Index the matrix with a two-column matrix of (row, column) pairs, like this:
mat <- matrix(nrow = 10, ncol = 10)
mat[cbind(combs$row, combs$column)] <- combs$value
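A tiny worked example may help show what the two-column index does; the values below are made up purely for illustration.

```r
mat <- matrix(0, nrow = 3, ncol = 3)

# each row of idx is one (row, column) position
idx <- cbind(c(1, 2, 3), c(2, 3, 1))
mat[idx] <- c(10, 20, 30)

mat
# 10 lands at [1, 2], 20 at [2, 3], and 30 at [3, 1]
```

Each row of the index matrix addresses a single cell, so the assignment fills exactly three cells rather than a 3 x 3 block.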
You could also consider building a sparse matrix using the Matrix package:
library(Matrix)
mat <- sparseMatrix(i = combs$row, j = combs$column, x = combs$value,
dims = c(10, 10))