Create empty object in R - r

I'm trying to create an empty numeric object like this:
corr <- cor()
so that I can fill it later in a loop,
but it keeps returning this error:
Error in is.data.frame(x) : argument "x" is missing, with no default.
Here is my full script:
EVI <- "D:\\Modis_EVI\\Original\\EVI_Stack_single5000.tif"
y.EVI <- brick(EVI)
m.EVI.cropped <- as.matrix(y.EVI)
time <- 1:nlayers(y.EVI)
corr <- cor()
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
for (i in 1:nrow(m.EVI.cropped)){
EVI.m <- m.EVI.cropped[i,]
time <- 1:nlayers(y.EVI)
Corr[i] <- cor(EVI.m, time, method="pearson", use="pairwise.complete.obs")
}
Any advice please?

Since you are asking for advice:
It is very likely that you don't need to do this, since you can probably use (i) a vectorized function or (ii) an lapply loop, which pre-allocates the return object for you. If you insist on using a for loop, set it up properly: pre-allocate the result, e.g. with corr <- numeric(n), where n is the number of iterations. Growing a vector by appending to it inside the loop is extremely slow. The error itself arises because cor() cannot be called without arguments. Note also that R is case-sensitive, so Corr[i] inside your loop does not refer to corr.
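As a rough sketch of both options, here is the same per-row correlation written once as a pre-allocated for loop and once with vapply. The small random matrix m is a hypothetical stand-in for the pixel-by-time matrix from the question, not the actual EVI data.

```r
# Toy stand-in for the pixel-by-time matrix (hypothetical data, not the EVI raster)
set.seed(1)
m <- matrix(rnorm(5 * 12), nrow = 5)
time <- seq_len(ncol(m))

# (a) for loop with a pre-allocated numeric vector
corr <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  corr[i] <- cor(m[i, ], time, method = "pearson", use = "pairwise.complete.obs")
}

# (b) the same computation with vapply, which allocates the result itself
corr2 <- vapply(
  seq_len(nrow(m)),
  function(i) cor(m[i, ], time, method = "pearson", use = "pairwise.complete.obs"),
  numeric(1)
)

print(corr)
```

Both versions should give the same vector; vapply additionally checks that every iteration returns exactly one numeric value.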

We can create empty objects with numeric(0), logical(0), character(0) etc.
For example
num_vec <- numeric(0)
creates an empty numeric vector that can be filled up later on:
num_vec[1] <- 2
num_vec
# [1] 2
num_vec[2] <- 1
num_vec
# [1] 2 1

Related

Creating function in R

I just started learning to write functions in R. As a start, I am trying to replicate the summary function as below, but I am not able to get the expected result.
summary_function <- function(df = as.data.frame(x)){
result <- summary(x)
return(as.table(result))
}
> summary_function(df = iris) ## below is the output I am getting
Length Class Mode
a 4 -none- numeric
b 10 -none- numeric
c 20 -none- numeric
d 100 -none- numeric
The expected output is the actual summary of iris. Is there a way to achieve this?
I also tried the function below, which should return the first 10 rows of the dataset, but the output is not as expected either.
first_ten_rows <- function(df = x){
result <- head(x, n = 10)
return(result)
}
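The as.table() call in your attempt was coercing the summary object into a less legible format. Note also that the body called summary(x) rather than summary(df), so the df argument was never used: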
summary_function <- function(df = as.data.frame(x)){
result <- summary(df)
return(result)
}
An R function returns the value of its last evaluated expression, so if you want to shorten it:
summary_function <- function(df = as.data.frame(x)){
summary(df)
}
would get you the same result.
If you would like to read more about "summary" objects, running
?summary
will open the documentation in the help panel in RStudio.
In your second question, there is an "x" out of place in "head()", which should be "df":
first_ten_rows <- function(df = x){
result <- head(df, n = 10)
return(result)
}
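A further simplification worth considering, offered as a sketch rather than a required change: the defaults df = x and df = as.data.frame(x) are only evaluated when the argument is omitted, so they can be dropped and the data frame passed explicitly. The built-in iris data set is used here for illustration.

```r
# Same two functions without the default arguments; df must be supplied by the caller
summary_function <- function(df) {
  summary(df)
}

first_ten_rows <- function(df) {
  head(df, n = 10)
}

res <- first_ten_rows(iris)
print(dim(res))
print(summary_function(iris))
```

Without a default, calling either function with no argument fails with a clear "argument is missing" error instead of silently picking up whatever x happens to exist in the calling environment.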

How to make a slice from a matrix in an env in R

I have an env that I put data frames in:
dtm <- DocumentTermMatrix(corpus)
termCount = c(".94", ".96", ".98" ,".99")
freqMatrix <- new.env()
spam <- new.env()
for (v in termCount){
# Remove sparse terms to get a manageable number of terms.
dtmEnv[[v]] <- removeSparseTerms(dtm, as.numeric(v))
# Convert the document term matrix to a standard matrix.
freqMatrix[[v]] <- as.data.frame( as.matrix(dtmEnv[[v]]))
# Normalize the frequency matrix: 0 if absent, 1 if present.
spam[[v]] <- (freqMatrix[[v]] > 0) + 0 # Add 0 to convert from logical to int.
}
However, when I try taking slices from my data frame, I get an error:
for (v in termCount){
trainData <- (spam[[v]])[folds$subsets[folds$which != i], ]
testData <- (spam[[v]])[folds$subsets[folds$which == i], ]
# ... more stuff here ...
}
Error in spam[[v]] (from #8) : wrong arguments for subsetting an environment
Print the resulting accuracies.
What am I doing wrong?
Is there a cleaner way of doing such an iteration for different values in termCount?

probe global variables to call inside function

I want to access variables in .GlobalEnv from inside a function. Basically, I want to concatenate some number of data frames into a matrix.
Here is some dummy code:
Alpha <- data.frame(lon=124.9167,lat=1.53333)
Alpha_2 <- data.frame(lon=3.13333, lat=42.48333)
Alpha_3 <- data.frame(lon=-91.50667, lat=27.78333)
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
mat <- as.matrix(rbind(vars[1], vars[2], vars[3]))
return(mat)
}
When calling myfunc('Alpha') I would like the same thing to be returned as when you run:
as.matrix(rbind(Alpha, Alpha_2, Alpha_3))
lon lat
1 124.91670 1.53333
2 3.13333 42.48333
3 -91.50667 27.78333
Any pointers would be appreciated, thanks!
You can use get to retrieve a variable by name, or mget to retrieve several at once. Here mget returns a named list of the matching objects, and do.call(rbind, ...) binds them together.
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
df <- do.call(rbind, mget(vars, .GlobalEnv)) # courtesy @Roland
return(df)
}
myfunc("Alpha")
# lon lat
# 1 124.91670 1.53333
# 2 3.13333 42.48333
# 3 -91.50667 27.78333
Note, in practice, you probably want to check that the variables that match the pattern actually are what you think they are, but this gives you the rough tools you want.
Old version (2nd line of func):
df <- do.call(rbind, lapply(vars, get, envir=.GlobalEnv))
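The check mentioned in the note above could look roughly like the following. This is only a sketch: the function name bind_matching, the separate environment env, and the extra object Alpha_note are hypothetical and exist only to keep the example self-contained and to show a non-data-frame match being dropped.

```r
# Hypothetical environment standing in for .GlobalEnv, so the example is self-contained
env <- new.env()
assign("Alpha", data.frame(lon = 124.9167, lat = 1.53333), envir = env)
assign("Alpha_2", data.frame(lon = 3.13333, lat = 42.48333), envir = env)
assign("Alpha_note", "not a data frame", envir = env)  # matches the pattern, but is not a data frame

bind_matching <- function(pattern, envir = .GlobalEnv) {
  vars <- ls(envir, pattern = pattern)   # names of all objects matching the pattern
  objs <- mget(vars, envir = envir)      # named list of those objects
  dfs <- Filter(is.data.frame, objs)     # keep only the data frames
  do.call(rbind, dfs)
}

out <- bind_matching("^Alpha", envir = env)
print(out)
```

Anchoring the pattern with ^ avoids matching names that merely contain "Alpha" somewhere in the middle.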

Store values in For Loop

I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). Inside the loop a function is called and its output is currently stored in a variable, but that variable is overwritten on each iteration. How can I store the result of each iteration and access it afterwards?
Thanks,
Example:
for (par1 in 1:n) {
  var <- some_function(par1, par2) # placeholder for the actual function call
  var2 <- c(var, par1)
  print(var2)
}
So print shows every instance of var2, but var2 itself only holds the value for the last iteration. Is there any way to get an array of the data or something similar?
Initialise an object before the loop and then assign each value by indexing:
a <- 0
for (i in 1:10) {
a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol=2, nrow=n)
for (i in 1:n) {
var1 <- function_one(i,par1)
var2 <- function_two(i,par2)
mat[i,] <- c(var1,var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
To get a list of results:
n = 3
lapply(1:n, function(par1) {
# your function and whatnot, e.g.
par1*par1
})
Or sapply if you want a vector instead.
A slightly more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
var = some_fn(par1, par2)
return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
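If the list form is awkward to work with afterwards, one possible follow-up (a sketch reusing some_fn and par2 from the example above; the element names var and par1 are added here only for readability) is to bind the per-iteration vectors into a matrix, or to use sapply when each iteration returns a single value:

```r
n <- 3
par2 <- 4
some_fn <- function(x, y) { x + y }

# One named vector per iteration, collected in a list
res_list <- lapply(seq_len(n), function(par1) {
  c(var = some_fn(par1, par2), par1 = par1)
})

# Stack the vectors row by row: one row per iteration, one column per output
res_mat <- do.call(rbind, res_list)
print(res_mat)

# A single value per iteration simplifies to a plain vector with sapply
squares <- sapply(seq_len(n), function(par1) par1 * par1)
print(squares)
```

Row i of res_mat then corresponds to iteration i, just as in the pre-allocated matrix approach.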

mapply for row cor.test function

I am trying to use cor.test over the rows in 2 matrices, namely cer and par.
cerParCorTest <-mapply(function(x,y)cor.test(x,y),cer,par)
mapply, however, works on columns.
This issue has been discussed in "Efficient apply or mapply for multiple matrix arguments by row". I tried the split solution from there (as below)
cer <- split(cer, row(cer))
par <- split(par, row(par))
and it results in the error (plus it is slow)
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
I also tried t(par) and t(cer) to get it running over the rows, but it results in the error
Error in cor.test.default(x, y) : not enough finite observations
The matrices are shown below (for cer; par has the same layout):
V1698 V1699 V1700 V1701
YAL002W(cer) 0.01860500 0.01947700 0.02043300 0.0214740
YAL003W(cer) 0.07001600 0.06943900 0.06891200 0.0684330
YAL005C(cer) 0.02298100 0.02391900 0.02485800 0.0257970
YAL007C(cer) -0.00026047 -0.00026009 -0.00026023 -0.0002607
YAL008W(cer) 0.00196200 0.00177360 0.00159490 0.0014258
My question is why transposing the matrix does not work, and what is a short solution that allows running cor.test() over rows with mapply?
I apologise for the long post and thanks in advance for any help.
I don't know what the dimensions of your matrices are, but this works fine for me:
N <- 3751 * 1900
cer.m <- matrix(1:N,ncol=1900)
par.m <- matrix(1:N+rnorm(N),ncol=1900)
ll <- mapply(cor.test,
split(par.m,row(par.m)),
split(cer.m,row(cer.m)),
SIMPLIFY=FALSE)
This will give you a list of 3751 elements (the correlation test result for each row).
EDIT: without split, you pass the row index instead; this should be fast:
ll <- mapply(function(x,y)cor.test(cer.m[x,],par.m[y,]),
1:nrow(cer.m),
1:nrow(cer.m),
SIMPLIFY=FALSE)
EDIT2: to get the estimate value, for example:
sapply(ll,'[[','estimate')
You could always just write a for loop; it seems reasonably fast on these dimensions:
x1 <- matrix(rnorm(10000000), nrow = 2000)
x2 <- matrix(rnorm(10000000), nrow = 2000)
out <- vector("list", nrow(x1))
system.time(
for (j in seq_along(out)) {
out[[j]] <- cor.test(x1[j, ], x2[j, ])
}
)
user system elapsed
1.35 0.00 1.36
EDIT: If you only want the estimate, I wouldn't store the results in a list, but a simple vector:
out2 <- vector("numeric", nrow(x1))
for (j in seq_along(out2)) {
out2[j] <- cor.test(x1[j, ], x2[j, ])$estimate
}
head(out2)
If you want to store all the results and simply extract the estimate from each, then this should do the trick:
> out3 <- as.numeric(sapply(out, "[", "estimate"))
#Confirm they are the same
> all.equal(out2, out3)
[1] TRUE
The tradeoff is that the first method stores all the results in a list, which may be useful for further processing, whereas the simpler method only grabs what you initially want.
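If only the Pearson estimates are needed (no p-values or confidence intervals), a fully vectorised alternative is possible. The following is a sketch under that assumption; the helper name row_cor is hypothetical, it does not handle NA values, and it is checked here against cor.test on small random matrices.

```r
# Row-wise Pearson correlation between two equally sized matrices (no NA handling)
row_cor <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  a <- a - rowMeans(a)   # centre each row; the vector recycles down the columns
  b <- b - rowMeans(b)
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

set.seed(42)
x1 <- matrix(rnorm(200), nrow = 20)
x2 <- matrix(rnorm(200), nrow = 20)

fast <- row_cor(x1, x2)

# Reference values from cor.test, one row at a time
slow <- vapply(
  seq_len(nrow(x1)),
  function(j) unname(cor.test(x1[j, ], x2[j, ])$estimate),
  numeric(1)
)

print(all.equal(fast, slow))
```

This trades the full htest objects for speed, so it only fits the case where the estimate is all you keep.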
