I have a numeric vector stock_data containing thousands of floating-point numbers. I know I can sample them using
sample(stock_data, sample_size)
I want to take 100 different samples and store them in a list.
How do I do that without using a loop to append each sample to the list?
I thought of creating a list that replicates the stock data 100 times and then using lapply on it.
I tried:
all_repl <- as.list(rep(stock_data,100))
all_samples <- lapply(all_repl, sample, size=100)
But all_repl doesn't contain 100 copies of the data: rep() returns a single numeric vector with the data repeated 100 times, so as.list() turns it into a list of individual numbers.
Can anyone explain what's wrong and suggest a better way to do this?
We can use replicate(). By default it simplifies the result to a matrix, with one sample per column:
replicate(100, sample(stock_data, sample_size))
Use simplify = FALSE to get the output as a list. A reproducible example:
replicate(5, sample(1:9, 5), simplify=FALSE)
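For comparison, the lapply() idea from the question also works once the vector is wrapped in list() before rep(), so that rep() repeats the whole vector rather than its individual elements. A minimal sketch; stock_data here is dummy data generated with runif(), standing in for the real prices:

```r
set.seed(42)
stock_data <- runif(5000, min = 10, max = 200)  # dummy stand-in for the real data
sample_size <- 100

# rep() on a plain vector repeats its elements, so as.list() yields single numbers
length(as.list(rep(stock_data, 100)))  # 500000 elements, not 100

# Wrapping in list() makes rep() repeat the whole vector as one element
all_samples <- lapply(rep(list(stock_data), 100), sample, size = sample_size)

# Equivalent, iterating over an index instead of over copies of the data
all_samples2 <- lapply(seq_len(100), function(i) sample(stock_data, sample_size))
```

Both give a list of 100 numeric vectors, the same shape as replicate(..., simplify = FALSE).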
I'm new to R and would like to know how to take a certain number of samples from a .csv file made in Excel that contains only numbers. I managed to import the data into R, with each number as a row, and then take random rows as samples, but this seems impractical. The whole file is displayed as a single column, and I took some samples with the following code:
Heights[sample(nrow(Heights), 5), ]
[1] 1.84 1.65 1.73 1.70 1.72
Also, is there a way to repeat this step at least 100 times and save each sample (perhaps in another table) to work with later?
This is how you'd take 100 samples and store them:
my_samples <- replicate(100, Heights[sample(nrow(Heights), 5), ])
If your .csv file just contains comma-separated values of one type (the heights) and is not structured as a table, you may want to read it into a vector instead. Most R functions that read textual data formats return a data frame or some other table-like structure.
heights <- as.numeric(unlist(strsplit(readLines("yourfile.csv"), ",")))
readLines("yourfile.csv") reads the file into a character vector with one element per line. strsplit() then splits each line at the commas, unlist() flattens the result into a single vector, and as.numeric() converts the strings to numbers.
To put this all together, with a dummy example:
writeLines(c("1,2,3,4,5", "6,7,8,9,10"), "test.csv")
heights <- as.numeric(unlist(strsplit(readLines("test.csv"), ",")))
set.seed(123)
my_samples <- replicate(100, sample(heights, 5))
dim(my_samples)
# [1] 5 100
You can see that my_samples is a matrix with 5 rows (each row holds one of the 5 values drawn in a sample) and 100 columns (each column is one of the 100 sampling events).
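To keep the samples for later work, as the question asks, the matrix can be turned into a data frame with one named column per sample and written to disk, or replicate() can return a list directly with simplify = FALSE. A sketch using dummy heights; the column names sample_1, sample_2, ... and the temporary file are illustrative choices, not requirements:

```r
set.seed(123)
heights <- seq(1.50, 1.95, by = 0.05)  # dummy values standing in for the CSV data

my_samples <- replicate(100, sample(heights, 5))   # 5 x 100 matrix

# One named column per sampling event
samples_df <- as.data.frame(my_samples)
names(samples_df) <- paste0("sample_", seq_len(ncol(samples_df)))

# Save to disk for later use (a temporary file here, for illustration)
out_file <- tempfile(fileext = ".csv")
write.csv(samples_df, out_file, row.names = FALSE)

# Alternatively, keep each sample as a separate list element
samples_list <- replicate(100, sample(heights, 5), simplify = FALSE)

# Example of later work: the mean of each sample
sample_means <- colMeans(my_samples)
```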
You can also use the infer package, which is designed for bootstrapping:
library(infer)
rep_sample_n(Heights, size = 100, replace = TRUE, reps = 1)
Here size is the number of observations in each sample. replace = TRUE samples with replacement, so an observation can be drawn more than once: you spin the roulette wheel without taking numbers off it once they come up. reps sets how many times the sampling is repeated, i.e. how many samples you get.
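If you'd rather not add a dependency, the same kind of resampling can be done in base R. This is a swapped-in base-R technique, not the infer API: sample() with replace = TRUE draws with replacement (so a value can appear more than once, and size may even exceed the data length), and replicate() repeats the draw. The long data frame at the end is only meant to resemble the shape of rep_sample_n() output, with a replicate column; heights is dummy data:

```r
set.seed(1)
heights <- seq(1.50, 1.95, by = 0.05)  # dummy data: 10 values

# 100 bootstrap samples, each the same size as the data, drawn with replacement
boot <- replicate(100, sample(heights, size = length(heights), replace = TRUE))

# Because values are not "taken off the wheel", size can exceed length(heights)
big_draw <- sample(heights, size = 25, replace = TRUE)

# Long format: one row per drawn value, tagged with its replicate number
boot_long <- data.frame(
  replicate = rep(seq_len(ncol(boot)), each = nrow(boot)),
  height = as.vector(boot)
)

# Per-replicate statistic, e.g. the bootstrap distribution of the mean
boot_means <- tapply(boot_long$height, boot_long$replicate, mean)
```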
I am trying to construct a large sparse matrix with a split-apply-combine approach: I call sparse.model.matrix() from the Matrix package separately on subsets of the columns of a data frame and then bind the results together into a full matrix. I have to do this because of memory limitations (I can't call sparse.model.matrix() on the whole data frame at once). This works, and I get a list of sparse matrices, but they have different dimensions, so I can't bind them together.
Example:
library(Matrix)
data(iris)
set.seed(100)
iris$v6 <- sample(c("a","b","c",NA), 150, replace=TRUE)
iris$v7 <- sample(c("x","y",NA), 150, replace = TRUE)
sparse_m1 <- sparse.model.matrix(~., iris[,1:5])
sparse_m2 <- sparse.model.matrix(~.-1, iris[, 6:7])
dim(sparse_m1)
[1] 150 7
dim(sparse_m2)
[1] 71 4
cbind2(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(sparse_m1, sparse_m2)
cbind(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(..1, r)
The matrices share the same row names; some rows were simply omitted from sparse_m2 because they had missing values in at least one of its columns (with the default na.action, incomplete rows are dropped). Is there any way to combine them?
I also tried using rbind.fill.matrix() from the plyr package, by first transposing and then calling it and then re-transposing, but then I lose column names since row names are ignored in rbind.fill.matrix.
Any ideas?
An old question still in need of an answer...
One approach is to create an empty Matrix of the required dimensions and then populate it:
# Full set of row names, and all columns of both matrices
m12.dimnames <- list(union(rownames(sparse_m1), rownames(sparse_m2)),
                     c(colnames(sparse_m1), colnames(sparse_m2)))
m12 <- Matrix(0, nrow = length(m12.dimnames[[1]]), ncol = length(m12.dimnames[[2]]),
              dimnames = m12.dimnames)
# Copy each matrix into its own rows and columns (everything else stays zero)
m12[rownames(sparse_m1), colnames(sparse_m1)] <- sparse_m1
m12[rownames(sparse_m2), colnames(sparse_m2)] <- sparse_m2
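Another option, sketched here rather than taken from the answer above, avoids element-wise assignment into a sparse matrix (which can be slow) by expanding sparse_m2 to the full set of rows with a sparse 0/1 selection matrix and a matrix product. It assumes, as in the question, that every row name of sparse_m2 also appears in sparse_m1, and it fills the omitted rows with zeros rather than NA; the names idx, P and m2_full are illustrative:

```r
library(Matrix)

# The question's example data
data(iris)
set.seed(100)
iris$v6 <- sample(c("a", "b", "c", NA), 150, replace = TRUE)
iris$v7 <- sample(c("x", "y", NA), 150, replace = TRUE)
sparse_m1 <- sparse.model.matrix(~ ., iris[, 1:5])
sparse_m2 <- sparse.model.matrix(~ . - 1, iris[, 6:7])   # rows with NA are dropped

# Position of each sparse_m2 row within sparse_m1
idx <- match(rownames(sparse_m2), rownames(sparse_m1))
stopifnot(!anyNA(idx))

# P[idx[k], k] = 1, so P %*% sparse_m2 places row k of sparse_m2 at row idx[k]
P <- sparseMatrix(i = idx, j = seq_along(idx), x = 1,
                  dims = c(nrow(sparse_m1), nrow(sparse_m2)))
m2_full <- P %*% sparse_m2
dimnames(m2_full) <- list(rownames(sparse_m1), colnames(sparse_m2))

# Same number of rows now, so the usual cbind() works
combined <- cbind(sparse_m1, m2_full)
```

The product stays sparse throughout, so no dense intermediate is created.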
I recently ran into the same issue; nowadays you can use rBind.fill() from the Matrix.utils package:
install.packages("Matrix.utils")
library(Matrix.utils)
sparse_filled <- rBind.fill(sparse_m1, sparse_m2)
I need a function that treats every x columns as a separate site. In df1 below there are 8 columns: 4 sites, each consisting of 2 variables. Previously I have used the procedure below, based on the answer to Selecting column sequences and creating variables.
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8*10, replace=TRUE), ncol=8))
I then calculate column sums to get a total for each variable:
colsums <- as.data.frame(t(colSums(df1)))
I then split the totals into sites using this technique:
lst1 <- setNames(lapply(split(1:ncol(colsums), as.numeric(gl(ncol(colsums),
2, ncol(colsums)))), function(i) colsums[,i]), paste0('site', 1:4))
list2env(lst1, envir=.GlobalEnv)
and combine the sites into one matrix:
Combined <- as.matrix(mapply(c,site1,site2,site3,site4))
rownames(Combined) <- c("Site.1","Site.2","Site.3","Site.4")
This technique works well on smaller data frames, but with a large number of sites (>500), typing out every site in the mapply() call takes a lot of code and risks missing some sites. Is there an easy way to do this after the colsums step?
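A matrix is a vector with dimensions, and R stores matrices in column-major order. So you can reshape the totals directly: matrix(unlist(colsums), nrow = 2) puts each site's two totals in one column (transpose with t() to get one row per site). The unlist() matters because colsums is a data frame, i.e. a list; without it you get a list-matrix rather than a numeric one.
NB: Polluting the global environment (as list2env() does here) is generally a bad idea.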
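A sketch of that reshaping idea for any number of sites, assuming every site has the same number of variables (n_vars, the x in the question) stored in consecutive columns. It starts from colSums(), which already returns a plain numeric vector, so no data frame, split() or list2env() step is needed; the Var.1/Var.2 column names are illustrative:

```r
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8 * 10, replace = TRUE), ncol = 8))

n_vars <- 2                       # columns (variables) per site
totals <- colSums(df1)            # one total per column, in column order
stopifnot(length(totals) %% n_vars == 0)
n_sites <- length(totals) / n_vars

# byrow = TRUE fills row by row, so each run of n_vars consecutive totals
# becomes one site's row
Combined <- matrix(totals, nrow = n_sites, ncol = n_vars, byrow = TRUE,
                   dimnames = list(paste0("Site.", seq_len(n_sites)),
                                   paste0("Var.", seq_len(n_vars))))
```

The same code handles the 500-site case unchanged, since n_sites is computed from the data rather than typed out.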
I am sure there is a simple way to achieve this, but I cannot find one in existing questions. I have a matrix that View() displays like a data frame, but its structure is different: it is a list, and each value is stored in it as a separate numeric element.
Example code that produces this structure:
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 500*500, replace=TRUE), ncol=500))
colsums <- as.data.frame(t(colSums(df1)))
matrix <- matrix(colsums, nrow=2)
str(matrix)
I have tried as.data.frame and melt functions but they do not seem to help the problem.
One option is to unlist the whole thing, convert it to a two-row matrix, and then convert that to a data.frame, e.g.
df <- as.data.frame(matrix(unlist(matrix), nrow = 2))
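A small reproduction of the problem and the fix, using a 6-column version of the question's data (an arbitrary size chosen to keep it small). matrix() applied to a one-row data frame produces a list-matrix, which is why as.data.frame() and melt() struggle; unlist() first gives an ordinary numeric matrix. Building the matrix from colSums(df1) directly, which is already numeric, avoids the problem altogether. The name list_mat is used instead of matrix so the base function's name isn't masked:

```r
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 10 * 6, replace = TRUE), ncol = 6))
colsums <- as.data.frame(t(colSums(df1)))

list_mat <- matrix(colsums, nrow = 2)     # a list with a dim attribute
is.list(list_mat)                          # TRUE: each cell is a separate list element

# Fix: flatten to a numeric vector first, then reshape and convert
df <- as.data.frame(matrix(unlist(list_mat), nrow = 2))

# Or skip the list step entirely
df_direct <- as.data.frame(matrix(colSums(df1), nrow = 2))
```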
I have a list containing 4 matrices, each with 21 random numbers in 3 columns and 7 rows.
I want to create a new list using lapply() in which each matrix is sorted by its first column.
I tried:
#example data
set.seed(1)
list.a <- replicate(4, list(matrix(sample(1:99, 21), nrow=7)))
ordered <- order(list.a[,1])
lapply(list.a, function(x){[ordered,]})
but the first step fails with the error "incorrect number of dimensions". I don't know what to do; it works with a single matrix, though.
Please help me. Thanks!
You were almost there, but you need to iterate through the list to reorder each matrix.
It's easier to do this in a single lapply() call:
lapply(list.a, function(x) x[order(x[,1]),])
Note that x in the anonymous function represents each matrix in the list in turn.
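A self-contained check of that one-liner on the question's example data. Adding drop = FALSE is an optional safeguard so that a one-row matrix stays a matrix; the is.unsorted() check is just a way to confirm the result:

```r
set.seed(1)
list.a <- replicate(4, list(matrix(sample(1:99, 21), nrow = 7)))

# Reorder the rows of each matrix by its own first column
sorted <- lapply(list.a, function(x) x[order(x[, 1]), , drop = FALSE])

# Confirm each first column is now in ascending order
all(vapply(sorted, function(m) !is.unsorted(m[, 1]), logical(1)))  # TRUE
```

For descending order, use order(x[, 1], decreasing = TRUE) inside the same function.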