Months ago, I searched for and found a function that could do the following:
From script A, it could execute script B n times. As a result, it would create a list with n elements, each containing only the last evaluated object of script B. For example, if the last line of script B produces a vector of the means of some data, the list would contain only that vector.
I can't seem to find this function any more.
Here is what I need to do:
My script B consists of simulations and calculations performed on the simulated data. As a result, the script prints a matrix. I want to re-execute this script n times, resulting in a list with n elements, each containing a result matrix.
A bonus would be the ability to vary the seed in script B, so that list[[1]] contains the data simulated with set.seed(1).
I know this problem calls for a function from the apply family, but in the following example, source() was not accepted as a function:
listsmalln <- lapply(n, source("Small samples/scriptB.R"))
Hope this was understandable! Thanks in advance!
I think you should use:
lapply(1:n, function(x) source("test.R")$value)  # $value is the last evaluated value of the script
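Sourcing the script inside an anonymous function also makes the seed bonus possible without rewriting script B, as long as script B does not call set.seed() itself. A minimal sketch, in which a temporary file written with writeLines() is a hypothetical stand-in for "Small samples/scriptB.R":

```r
# Hypothetical stand-in for "Small samples/scriptB.R"; its last expression is a matrix
script_b <- tempfile(fileext = ".R")
writeLines(c(
  "sim <- matrix(rnorm(40), ncol = 4)",
  "rbind(mean = colMeans(sim), sd = apply(sim, 2, sd))"
), script_b)

n <- 3
results <- lapply(seq_len(n), function(i) {
  set.seed(i)                            # element i is simulated with set.seed(i)
  source(script_b, local = TRUE)$value   # $value holds the last evaluated expression
})

str(results)
```

local = TRUE evaluates the script inside the anonymous function, so its intermediate variables (here sim) do not pile up in the global workspace.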
The easiest way to change the seed is to turn the script into a function that takes the seed as an input parameter.
If your script B can be written as a function that returns the matrix, it's straightforward:
Script B (saved as functionB.R):
functionB <- function(x){
  set.seed(x)
  return(matrix(runif(100),ncol=10))
}
Script A:
source('functionB.R')
lapply(1:10,function(x) functionB(x))
I have loaded two source files, performed some iterative calculations, and then I need to display/export the results. There are hundreds of iterative calculations, hence hundreds of results. However, only the result of the final calculation is displayed.
In this example, I have shortened the list of calculations to only 3. Please refer to line 7 (k in 1:3). How do I get R to display the results of all calculations?
Many thanks in advance to those who can offer help. If this question has already been asked before, a link would be great. I could not find it, probably because I do not know the right terms to search for.
# Load files
d1<-read.csv('testhourly.csv',sep=",",header=F)
names(d1)<-c("elapsedtime","units")
d2<-read.csv('testevent.csv',sep=",",header=F)
names(d2)<-c("eventno","starttime","endtime","starttemp","endtemp")
# Perform for calculations 1 to 3
for(k in 1:3){
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
z <- (x-y)}
results <- sum(z)
# Export results
write.csv(results, file = "results.csv")
You are not saving your output inside the loop for every iteration, so after the loop z only holds the value from the last iteration.
temp=vector("list",3)
for(k in 1:3) {
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
temp[[k]] <- (x-y)
}
results <- sum(unlist(temp))
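The code above produces one grand total. If what you want is one result per calculation, you can keep the per-iteration values and summarize each one separately. A minimal sketch, assuming small hypothetical toy versions of d1 and d2 (including the q column the loop reads), with lapply() in place of the for loop:

```r
# Hypothetical toy data standing in for testhourly.csv and testevent.csv
d1 <- data.frame(q = c(1, 4, 9, 16, 25, 36, 49, 64))
d2 <- data.frame(eventno = 1:3, starttime = c(2, 4, 6), endtime = c(3, 5, 8))

# One list element per event: the vector x - y computed in the loop body
diffs <- lapply(seq_len(nrow(d2)), function(k) {
  a <- d2$starttime[k]
  b <- d2$endtime[k]
  x <- d1$q[a:b]
  y <- d1$q[(a - 1):(b - 1)]
  x - y
})

per_event <- sapply(diffs, sum)  # one sum per event
results <- data.frame(eventno = d2$eventno, sum = per_event)
results

# Use file = "results.csv" in practice; a temporary file keeps the sketch self-contained
write.csv(results, file = tempfile(fileext = ".csv"), row.names = FALSE)
```

sum(per_event) gives back the same grand total as sum(unlist(temp)).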
I have the following code:
R <- matrix(NA, nrow = 100, ncol = 100)  # R must exist before R[i,j] can be filled
for(i in 1:100)
{
  for(j in 1:100)
    R[i,j]=gcm(i,j)
}
gcm() is some function that returns a number based on the values of i and j, so R ends up holding all the values. But this calculation takes a lot of time. My machine's power was interrupted several times, and each time I had to start over. How can I save R after every iteration, so as to be safe? Any help is highly appreciated.
You can use the saveRDS() function to save the result of each calculation in a file.
To understand the difference between save and saveRDS, here is a link I found useful. http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/
If you want to save the R workspace, have a look at ?save or ?save.image (use the first to save a subset of your objects, the second to save your workspace in toto).
Your edited code could look like this:
for(i in 1:100)
{
  for(j in 1:100)
    R[i,j]=gcm(i,j)
  # save the whole workspace (including R) after each completed row i
  save.image(file="path/to/your/file.RData")
}
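Saving the workspace with save.image() protects the values, but the loop above still starts again from i = 1 after a crash. A minimal checkpoint-and-resume sketch with saveRDS() and readRDS(), assuming a hypothetical cheap gcm() and a checkpoint file in tempdir() (use a permanent path in practice, since R removes its session temporary directory on exit):

```r
# Hypothetical cheap stand-in for the expensive gcm(i, j)
gcm <- function(i, j) i * j

n <- 5
checkpoint <- file.path(tempdir(), "R_checkpoint.rds")

if (file.exists(checkpoint)) {
  state <- readRDS(checkpoint)  # resume after an interruption
} else {
  state <- list(R = matrix(NA_real_, nrow = n, ncol = n), next_row = 1)
}

i <- state$next_row
while (i <= n) {
  for (j in 1:n) state$R[i, j] <- gcm(i, j)
  state$next_row <- i + 1
  saveRDS(state, checkpoint)    # persist after every finished row
  i <- i + 1
}
R <- state$R
```

Saving once per finished row rather than once per cell keeps the file I/O small compared with the computation, and storing next_row lets a restart skip the rows that are already done.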
About your code taking a lot of time, I would advise trying the apply function (see ?apply), which
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix
You want gcm to be run for each cell, which means you want to apply it to each combination of row and column coordinates:
nR <- 100  # number of rows
nC <- 100  # number of columns
M <- expand.grid(1:nR, 1:nC)  # Cartesian product of the coordinates
# each row of M contains the indexes of one cell of the result
# head(M)  # just to see it
# apply() hands each row of M to the function as a single vector (see ?apply for details),
# so this wrapper passes the first entry as the row index and the second as the column index
gcmWrapper <- function(x) gcm(x[1], x[2])
# run apply, which returns a vector containing *all* the evaluated values
values <- apply(M, 1, gcmWrapper)
# reshape into a matrix; expand.grid varies the row index fastest, matching R's column-major fill
R <- matrix(values, nrow = nR, ncol = nC)
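Since every cell depends only on its row and column index, base R's outer() combined with Vectorize() builds the same matrix without expand.grid(). A minimal sketch with a hypothetical gcm() that only accepts single numbers (Vectorize() is what lets outer() call it on whole index vectors). Like apply(), this still loops under the hood, so do not expect a large speed-up:

```r
# Hypothetical stand-in for gcm(): scalar-only, because if() needs a single condition
gcm <- function(i, j) {
  if (i >= j) i - j else j - i
}

nR <- 4  # number of rows
nC <- 3  # number of columns

# outer() passes whole index vectors; Vectorize() makes gcm() handle them element by element
R <- outer(1:nR, 1:nC, Vectorize(gcm))
R
```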
If the apply approach is still slow, consider the snowfall package, which lets you follow the same apply approach using parallel computing. An introduction to snowfall usage can be found in this pdf; look at pages 5 and 6 in particular.
I have a data.frame with dim = (200, 500).
I want to do a shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However, this is failing. Any pointers? (My background is mainly Python; I'm not much of an R user.)
Consider lapply(): a data frame passed to it is treated as a list of columns, so the function runs on each column and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)$p.value)
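To get a plain named vector of p-values instead of a list, and to pick out the columns to drop, sapply() works the same way. A minimal sketch on a hypothetical toy data frame (200 rows, 5 columns, the last one deliberately non-normal); the 0.05 cutoff is an assumption, not something stated in the question:

```r
set.seed(42)
# Hypothetical toy stand-in for I.df.nocov
df <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200))
df$V5 <- rexp(200)  # one clearly non-normal column

# One p-value per column, named after the columns
pvals <- sapply(df, function(col) shapiro.test(col)$p.value)

# Columns that fail the (assumed) 5% normality cutoff
colstoremove <- names(pvals)[pvals < 0.05]
colstoremove
```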
Here is what happens in the loop header:
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) returns a 2-element vector containing the minimum and maximum values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. No wonder it does not do what you want!
The problem is that R's function range and Python's function with the same name do completely different things. The closest equivalent of Python's range is seq (note that seq includes both endpoints, and seq(n) starts at 1 rather than 0). For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
Generally, if something does not work in R, you should print the subexpressions from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! If there is a function with the same name, it usually does something different.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
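A quick way to see the difference is to evaluate the candidate loop ranges directly. A small sketch on a hypothetical three-column data frame; seq_len() and seq_along() are base R functions that also behave sensibly when there are zero columns, which 1:n does not:

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # hypothetical 3-column data frame

range(ncol(df))    # 3 3: minimum and maximum, not a sequence
seq_len(ncol(df))  # 1 2 3: what the loop actually needs
seq_along(df)      # 1 2 3: a data frame is a list of columns
1:0                # 1 0: why 1:n misbehaves when n is 0
seq_len(0)         # integer(0): a for loop over this simply does nothing

# The corrected loop header
for (i in seq_len(ncol(df))) {
  print(names(df)[i])
}
```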
I am normally a Maple user, currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of a different length. My incorrect code looks like this:
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
library(MASS)
sims<-10
# rnegbin() can return 0, and 1:0 would loop over c(1, 0), so add one to make every M[k] >= 1
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists, which can hold vectors of different lengths
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]]<-rep(NA,M[k])
  X[[k]]<-rep(NA,M[k])
  for(i in 1:M[k]){
    x[[k]][i]<-runif(1,min=0,max=1)
    if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
      X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work and is, I think, what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is usually no need to use a for loop in order to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random variables, you might do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)  # reset the seed so both versions draw the same random numbers
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you can see, the results are identical.
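The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] extracts element k from a vector x. Since you are trying to assign a vector of length larger than 1 to a single element, this cannot work: if x does not exist yet, R stops with an "object 'x' not found" error, and if it does exist, R keeps only the first value and warns that the number of items to replace is not a multiple of the replacement length. What you probably want to use is a list, as you will see in the example below.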
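The Maple-style v[1][n] has a close R analogue in a list of vectors: [[k]] selects the k-th vector, and a following [n] selects its n-th element. A minimal sketch with made-up values:

```r
v <- list()               # a list can hold vectors of different lengths
v[[1]] <- c(0.2, 0.7)
v[[2]] <- c(0.1, 0.5, 0.9)

v[[2]][3]       # 0.9: third element of the second vector
length(v[[1]])  # 2
class(v[2])     # "list": single brackets return a sub-list, not the vector itself
```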
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
library(MASS)
# define M
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0,max=1)
  # as in the question: x <= 0.1056379 uses the first log-normal, otherwise the second
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll explain it starting from the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here, the vector M). It returns a list, so you can get, for example, the third vector with X[[3]] (note that [[ is used to extract elements from a list). The contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: it creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores it in x. Then it creates a vector X that contains log-normally distributed random values, where the parameters of the distribution for each element depend on whether the corresponding value of x is above 0.1056379.
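One side effect of ifelse() is that both rlnorm() calls are evaluated in full, so each call to calculate_X() draws 2m log-normal values and keeps only m of them. A variant that draws only what it needs is sketched below; calculate_X2 is a hypothetical name, and the split follows the question's rule (first distribution when x <= 0.1056379):

```r
# Hypothetical variant of calculate_X(): draws only the log-normal values it needs
calculate_X2 <- function(m) {
  low <- runif(m) <= 0.1056379  # TRUE where the first distribution applies
  X <- numeric(m)
  X[low]  <- rlnorm(sum(low),  6.228244, 0.3565041)
  X[!low] <- rlnorm(sum(!low), 8.910837, 1.1890874)
  X
}

set.seed(1)
X <- lapply(c(5, 0, 12), calculate_X2)  # toy values standing in for M
lengths(X)
```

The per-element distribution is the same as with ifelse(); only the number of random draws changes, so results will not match calculate_X() value for value under the same seed.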
Let's say I have a function that accepts a vector of parameters and returns a vector of results (of the same length). And let's say I want to call this function 100 times, always with the same parameter (a vector of 100 ones), ideally getting a list of vectors as the result.
The first thing that came to my mind was to use lapply, specifically to call lapply on a list of vectors. My testing on smaller data showed that it should work and that it returns data in the required format. The problem is that I'm unable to generate the list of vectors I need as the argument.
All I found online was how to generate a vector, which doesn't help me much as I already know how to do that. The problem is how to generate a list out of these vectors. Using list(rep(1, 100), rep(1, 100), ...) is out of the question, as I'd have to repeat the rep(1, 100) part a hundred times.
The quickest way to do this is to use R's built in replicate function, like so:
replicate(100, rep(1, 100), simplify = FALSE)
where rep(1, 100) gets replaced by the vector you actually want a list of 100 copies of. An equivalent statement would be to use lapply and an anonymous function, like so:
lapply(1:100, function(x){ rep(1, 100) })
Essentially, what this is doing is writing a function that takes its input, throws it away, and outputs your vector of choice. In fact, that's not much different from what replicate does under the hood, according to the documentation:
replicate is a wrapper for the common use of sapply for repeated evaluation of an expression
The only difference from the lapply version is that, by default, replicate simplifies your list of vectors to an array (here a 100 x 100 matrix). But as you can see, it's easy enough to force it not to do that by passing simplify = FALSE.
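With the list in hand, the function can be applied to each copy with lapply(). Base R's rep() also repeats list elements, which gives one more way to build the input. A minimal sketch with a hypothetical noisy function f(); because both versions consume the random number stream in the same order, they should give identical results under the same seed:

```r
# Hypothetical function: adds noise to each parameter, returns a vector of the same length
f <- function(params) params + rnorm(length(params))

set.seed(1)
inputs  <- rep(list(rep(1, 100)), 100)  # list of 100 copies of the same vector
results <- lapply(inputs, f)

set.seed(1)
results2 <- replicate(100, f(rep(1, 100)), simplify = FALSE)

identical(results, results2)
```

If the input never changes, the second form skips building the list of inputs altogether.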