How to use data in a text file? - r

Given three rasters, I need to extract values corresponding to u (row) = 5 and c (column) from a text file smoke.
smoke <- matrix(c(5, 4, 2, 9, 2, 2), ncol=2, byrow=TRUE)
The function I'm using is:
library(raster)
r <- raster(nrows=10, ncols=10)
r <- setValues(r, 1:ncell(r))
func <- function(c, u, sit){
  rasters <- mget(c('r', paste0('r', 1:2)))
  x <- sapply(rasters, function(x) getValues(x, c)[u])
  y <- sapply(rasters, function(x) getValues(x, u)[c])
  g <- data.frame(y, x)
  write.table(g, paste0("res_", sit, u, "_", c, ".txt"))
}
Here's how I use the function to extract values that correspond to c and u in smoke:
res <- lapply(split(smoke[,c('c', 'u', 'sit')], 1:nrow(smoke)),
FUN=function(x) func(c=x[1], u=x[2], sit=x[3]))
I get this error: error: value for ‘r’ not found

By default, mget only looks for the objects r, r1, and r2 in the environment in which it's called, but in your case, those objects are in the global environment. You can either add inherits=TRUE to the mget call, which will force it to keep looking in parent environments, or else specify the environment to look in with envir=.GlobalEnv.
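The lookup behaviour described above can be checked with a small base R sketch. The plain vectors r and r1 are stand-ins for the raster objects, and the three lookup_* helper names are made up for illustration; the sketch assumes it is run at top level, so that r and r1 live in the global environment.

```r
# Stand-ins for the raster objects: plain vectors defined at top level
r  <- 1:3
r1 <- 4:6

# Default: mget() searches only the function's own environment
lookup_default <- function() mget(c("r", "r1"))

# inherits = TRUE: keep searching the enclosing environments
lookup_inherits <- function() mget(c("r", "r1"), inherits = TRUE)

# envir = .GlobalEnv: search the global environment directly
lookup_global <- function() mget(c("r", "r1"), envir = .GlobalEnv)

# The default lookup fails; capture the error instead of stopping
default_result <- tryCatch(lookup_default(), error = function(e) e)
inherits(default_result, "error")

# Both alternatives return a named list with the objects
names(lookup_inherits())
names(lookup_global())
```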
You have a couple of other problems, though.
First, r1 and r2 don't exist in your example.
Second, the list arising from split is a list of data frames, and you need to index them accordingly in your function. This means using $ notation (e.g. x$c), double brackets (e.g. x[[1]]), or a comma (e.g. x[, 1]).
Implementing all of this, you should have something like:
func <- function(c, u, sit) {
  rasters <- mget(c('r', paste0('r', 1:2)), envir=.GlobalEnv)
  x <- sapply(rasters, function(x) getValues(x, c)[u])
  y <- sapply(rasters, function(x) getValues(x, u)[c])
  g <- data.frame(y, x)
  write.table(g, paste0("res_", sit, u, "_", c, ".txt"))
}
library(raster)
res <- lapply(split(smoke[, c('c', 'u', 'sit')], 1:nrow(smoke)),
              function(x) func(c=x[[1]], u=x[[2]], sit=x[[3]]))
Finally, res will just be a list of NULL values, since your function returns the value of write.table, which is NULL. If you want func to return g, then add a final line to the function that simply reads g (or, explicitly, return(g)).
I'm not sure how closely your small example reflects your true data, but you could probably approach this more efficiently - see ?extract, ?cellFromRowCol, and ?stack, for example.

Related

Apply function to dataset when function calls from two sources

I have a function that I want to apply to a dataset, but the function also uses global variables as arguments as these variables are needed elsewhere.
With this reduced example I want to apply 'pterotest' to the rows of 'data'. This test case works when the function is given V as a vector, and M and g as a single value.
df<- data.frame(matrix(ncol = 1, nrow = 3))
row.names(df) <- c("Apsaravis_ukhaana", "Jeholornis_prima", "Changchengornis_hengdaoziensis")
colnames(df) <- "M"
mass_var <- c(0.1840000, 1.6910946, 0.0858997)
df$M <- mass_var
V <- seq(0.25,30, by = 0.05)
g <- 9.81
pterotest <- function(V, M, g) {
  out1 <- M*g
  out2 <- V*M
  return(list(V, out1, out2))
}
apply(df,1,pterotest, M = "M", g = g, V = V)
However, all I get is an error of the form:
Error in match.fun(FUN) : '1' is not a function, character or symbol
EDIT: Turning this on its head, what I could do would be to run a loop over each row, using the multiple columns as different arguments to the function, but with a 4.2M line dataset I feel vectorising might be quicker...

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows set MARGIN = 1, to apply a function to columns set MARGIN = 2.
lapply, as the name suggests, is a list-apply. As a data.frame is a list of columns you can use it to compute over columns but within rectangular data, apply is often the easiest.
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Functions written for only a pair of values often turn out to be perfectly vectorized. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
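When a function is not vectorized (for example, because it branches on scalar values with if), base R's mapply is one way to call it once per row with one argument per column. The sketch below is only an illustration: pair_function is a hypothetical stand-in for a "more complicated" function, not something from the question.

```r
x <- seq(1, 10, 0.1)
y <- seq(1, 10, 0.1)
data_frame <- expand.grid(x, y)   # columns are named Var1 and Var2

# Hypothetical function that only handles one pair of scalars at a time
pair_function <- function(x, y) {
  if (x > y) x - y else x + y
}

# mapply walks both columns in parallel and simplifies the result to a vector
data_frame$new_column <- mapply(pair_function, data_frame$Var1, data_frame$Var2)
head(data_frame)
```

For a function that is already vectorized, the direct call shown above remains the simpler and faster choice.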

R cbind with get paste

cbind() function works as x <- cbind(a,b)
where column name 'b' can be specified for the function b = get(paste0('var',i)),
that is x <- cbind(a,b = get(paste0('var',i)))
I am trying to do the following:
x <- cbind(a, get(paste0('var',i))) = j), where "j" can be a vector or a function.
however, got the following error: Error: unexpected '=' in "x <- cbind(a, get(paste0('var',i))) = j)"
If I just specify "x <- cbind(a, get(paste0('var',i))))", then the 2nd column name is "get(paste0('var',i))))", which is not convenient.
How can I define column names with a function get(paste()) within cbind() or rbind() or bind_cols()? Or what would be the alternative solution?
An example would have been helpful to understand the problem but maybe this?
x <- cbind(a, j)
colnames(x)[2] <- get(paste0('var',i))
Or if you want to do it in single line -
x <- do.call(cbind, setNames(list(a, j), c('a', get(paste0('var',i)))))
We can use
x <- data.frame(a, j)
colnames(x)[2] <- get(paste('var', i, sep=""))
Or use tibble
tibble(a, !! b := j)
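As a worked base R illustration of naming a column dynamically, here is a minimal sketch. The sample values of a, j and i are made up, and it assumes (as the answers above do) that the variable var1 holds the desired column name as a string.

```r
a <- 1:3
j <- c(10, 20, 30)
i <- 1
var1 <- "score"   # assumption: var1 stores the desired column name

# Two steps: bind first, then rename the second column
x <- cbind(a, j)
colnames(x)[2] <- get(paste0("var", i))

# One expression: build a named list and hand it to cbind via do.call,
# so the list names become the column names
cols <- setNames(list(a, j), c("a", get(paste0("var", i))))
x2 <- do.call(cbind, cols)

colnames(x)
colnames(x2)
```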

R function ambiguous call

I'm working on a project to convert an R function to CUDA C++, but I can't understand one of the R function calls. I'm really new to R and I can't find what I'm looking for. To be exact, this is the main R function code:
for (i in 1:ncy) {
  res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
  depth[i] <- sum(res[1,])
  localdepth[i] <- sum(res[2,])
}
The part that I can't really understand is "banddepthforonecurve" function call, this is the "banddepthforonecurve" function code:
banddepthforonecurve <- function(x, xdata, ydata, tau, use) {
  envsup <- apply(xdata[,x], 1, max)
  envinf <- apply(xdata[,x], 1, min)
  inenvsup <- ydata <= envsup
  inenvinf <- ydata >= envinf
  depth <- all(inenvsup) & all(inenvinf)
  localdepth <- depth & use(envsup-envinf) <= tau
  res <- c(depth,localdepth)
  return(res)
}
When it is called in:
res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
I don't really get what it sets for the first parameter "x" of "banddepthforonecurve". I supposed it's like banddepthforonecurve(i, xdata=x, ydata=y[,i], tau = tau, use=use)
but if I try to run it separately in RStudio to understand it better, I get:
apply(xdata[, x], 1, max) : dim(X) must have a positive length
Why doesn't this error occur when I run the whole R project? What is passed as the "x" parameter when it is called in "res <- apply(...)"? I hope I was clear, sorry for my bad English. Thank you in advance!
# This apply call
res = apply(X = input, MARGIN = 2, FUN = foo, ...)
# is essentially syntactic sugar for this:
res = list()
for(i in 1:ncol(input)) {
  res[[i]] = foo(input[, i], ...)
}
# plus an attempt to simplify `res` (e.g., to a matrix or vector)
So in your line:
apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
In a single iteration of your for loop, the first parameter of banddepthforonecurve (x) will be allsubset[, 1], then allsubset[, 2], ..., allsubset[, ncol(allsubset)].
The xdata parameter is always x, the tau and use parameters are always tau and use, and the for loop iterates over the columns of y to use as the ydata argument. You can think of it as a nested loop: for each column of y, use it as ydata and (via apply) iterate over all columns of allsubset.
(If the MARGIN argument of apply was 1, then it would iterate over rows instead of columns.)
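The equivalence between apply over columns and an explicit loop can be checked on a small example. This is a sketch with made-up stand-ins: subsets plays the role of allsubset, values the role of xdata, and range_width is a hypothetical function whose first argument receives one column of subsets at a time.

```r
# Each column of `subsets` holds a pair of column indices into `values`
subsets <- combn(3, 2)
values  <- matrix(1:12, nrow = 4)

# Hypothetical function: `idx` is one column of `subsets`;
# `data` and `scale` are the extra arguments forwarded by apply
range_width <- function(idx, data, scale) {
  upper <- apply(data[, idx], 1, max)
  lower <- apply(data[, idx], 1, min)
  sum(upper - lower) * scale
}

# apply passes each column of `subsets` as the first argument
via_apply <- apply(subsets, 2, range_width, data = values, scale = 2)

# The same computation written as an explicit loop
via_loop <- numeric(ncol(subsets))
for (k in seq_len(ncol(subsets))) {
  via_loop[k] <- range_width(subsets[, k], data = values, scale = 2)
}

all.equal(via_apply, via_loop)
```

Passing a single index instead of a column of indices would make data[, idx] a plain vector without dimensions, which is consistent with the "dim(X) must have a positive length" error quoted in the question.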

set matrix element using apply in R

I am trying to assign the values from the dataframe into a matrix. The columns 2 and 3 are mapped to rows and columns respectively in the matrix. This is not working since the sim.mat is not storing the values.
score <- function(x, sim.mat) {
  r <- as.numeric(x[2])
  c <- as.numeric(x[3])
  sim.mat[r,c] <- as.numeric(x[4])
}
mat <- apply(sim.data, 1, score, sim.mat)
Is this the right approach? If yes how can I get it to work.
No need for apply, try this:
score <- function(x, sim.mat) {
  r <- as.numeric(x[[2]])
  c <- as.numeric(x[[3]])
  sim.mat[cbind(r,c)] <- as.numeric(x[[4]])
  sim.mat
}
mat <- score(sim.data, sim.mat)
Check the "Matrices and arrays" section of ?"[" for documentation.
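To see the matrix indexing at work, here is a minimal sketch with made-up data. The contents of sim.data and the size of sim.mat are assumptions; only the column positions (2 = row, 3 = column, 4 = value) follow the question.

```r
sim.mat <- matrix(0, nrow = 3, ncol = 3)

# Hypothetical sample data: column 2 is the row index,
# column 3 the column index, column 4 the value to store
sim.data <- data.frame(
  id    = c("a", "b", "c"),
  r     = c(1, 2, 3),
  c     = c(2, 3, 1),
  value = c(0.5, 0.7, 0.9)
)

# A two-column matrix used as an index selects one (row, column) cell per row
idx <- cbind(sim.data[[2]], sim.data[[3]])
sim.mat[idx] <- sim.data[[4]]
sim.mat
```

All cells are assigned in one vectorized step, so no loop or apply call is needed.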
If you really wanted to use apply like you did, your function would need to modify sim.mat outside its own scope with <<-, which assigns in the enclosing environment (the global environment if score is defined at top level):
score <- function(x, sim.mat) {
  r <- as.numeric(x[2])
  c <- as.numeric(x[3])
  sim.mat[r,c] <<- as.numeric(x[4])
}
apply(sim.data, 1, score, sim.mat)
sim.mat
This type of programming where functions have side-effects is really not recommended.
