Apply concordance dataframe to zoo objects - r

I have a zoo object made of several time series, like this:
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
a <- zoo(rnorm(5), order.by=indices)
b <- zoo(rnorm(5), order.by=indices)
c <- zoo(rnorm(5), order.by=indices)
ts_origin <- merge(a,b,c)
I would like to multiply each zoo series in ts_origin by a ratio contained in a data frame, and put
the results in another zoo object (ts_final) that contains the time series d, e, f. In other words,
the data frame is a concordance file between a, b, c and d, e, f, and the ratio would be applied this way:
ts_final$d = ts_origin$a * 10 ; ts_final$e = ts_origin$b * 100 ; ts_final$f = ts_origin$c * 1000.
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
d <- zoo(, order.by=indices)
e <- zoo(, order.by=indices)
f <- zoo(, order.by=indices)
ts_final <- merge(d,e,f)
I'm not sure what the best approach is for this. I tried the apply function, but couldn't make
it work. Any help would be greatly appreciated!

1) Map/merge
Use Map to iterate over final, original and ratio, computing the required products and producing a list of zoo objects, L. Note that Map takes the names from the first argument after fun. Then merge the list components to form the zoo object ts_final.
fun <- function(f, o, r) ts_origin[, o] * r
L <- with(df, Map(fun, final, original, ratio))
ts_final <- do.call("merge", L)
The result using the inputs shown in the Note at the end is this zoo object:
> ts_final
d e f
2000-01-01 -5.6047565 46.09162 400.7715
2001-01-01 -2.3017749 -126.50612 110.6827
2002-01-01 15.5870831 -68.68529 -555.8411
2003-01-01 0.7050839 -44.56620 1786.9131
2004-01-01 1.2928774 122.40818 497.8505
2005-01-01 17.1506499 35.98138 -1966.6172
2) sweep
Another approach is to sweep out the ratios, setting the names appropriately, which gives the same result as in (1).
with(df, sweep(setNames(ts_origin[, original], final), 2, ratio, "*"))
3) rep
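Set the names and multiply by ratio, with each ratio repeated once per row of ts_origin so that it lines up with the column-major layout of the data. This gives the same result as in (1).
nr <- nrow(ts_origin)
with(df, setNames(ts_origin[, original], final) * rep(ratio, each = nr))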
Note
We can define the input reproducibly like this:
set.seed(123)
tt <- as.Date(ISOdate(2000:2005, 1, 1))
m <- matrix(rnorm(6*3), 6, dimnames = list(NULL, c("a", "b", "c")))
ts_origin <- zoo(m, tt)
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
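The column-wise scaling behind approaches (2) and (3) can be checked on the plain matrix m from the Note, without zoo. This is only a sketch in base R; the names by_sweep, by_rep and by_t are illustrative and not part of the answer above.

```r
set.seed(123)
m <- matrix(rnorm(6 * 3), 6, dimnames = list(NULL, c("a", "b", "c")))
ratio <- c(10, 100, 1000)

# sweep() multiplies column j by ratio[j]
by_sweep <- sweep(m, 2, ratio, "*")

# A matrix is stored column by column, so each ratio must be
# repeated once per row of the data to line up with its column
by_rep <- m * rep(ratio, each = nrow(m))

# Transposing makes the ratios recycle down the (former) columns
by_t <- t(ratio * t(m))

stopifnot(isTRUE(all.equal(by_sweep, by_rep)),
          isTRUE(all.equal(by_sweep, by_t)))
```

All three forms agree only when the repeat count equals the number of rows of the data, not the number of rows of the concordance table.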

Here is a one-liner, with wrong final names.
ts_final <- t(df$ratio * t(ts_origin))
ts_final
# a b c
#2000-01-01 -5.382213 -12.64773 -513.6408
#2001-01-01 -9.218280 -98.55123 -1826.6430
#2002-01-01 2.114663 -28.58910 290.8008
#2003-01-01 -3.576460 -23.47314 -166.5473
#2004-01-01 6.490508 -36.29317 -398.0389
#2005-01-01 -5.382213 -12.64773 -513.6408
Now assign final names.
colnames(ts_final) <- df$final

Related

Apply function to dataset when function calls from two sources

I have a function that I want to apply to a dataset, but the function also uses global variables as arguments as these variables are needed elsewhere.
With this reduced example I want to apply 'pterotest' to the rows of 'data'. This test case works when the function is given V as a vector, and M and g as a single value.
df<- data.frame(matrix(ncol = 1, nrow = 3))
row.names(df) <- c("Apsaravis_ukhaana", "Jeholornis_prima", "Changchengornis_hengdaoziensis")
colnames(df) <- "M"
mass_var <- c(0.1840000, 1.6910946, 0.0858997)
df$M <- mass_var
V <- seq(0.25,30, by = 0.05)
g <- 9.81
pterotest <- function(V, M, g) {
  out1 <- M*g
  out2 <- V*M
  return(list(V, out1, out2))
}
apply(df,1,pterotest, M = "M", g = g, V = V)
However, all I get is an error of the form:
Error in match.fun(FUN) : '1' is not a function, character or symbol
EDIT: Turning this on its head, what I could do is run a loop over each row, using the multiple columns as different arguments to the function, but with a 4.2M-line dataset I feel vectorising might be quicker...
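A likely cause of the error, offered as a sketch rather than a definitive diagnosis: apply() is declared as apply(X, MARGIN, FUN, ...), and the named argument M = "M" partially matches MARGIN, so the positional 1 ends up being treated as FUN. Iterating over the mass column directly avoids the clash. The names res, out1 and out2 below are illustrative.

```r
df <- data.frame(M = c(0.1840000, 1.6910946, 0.0858997),
                 row.names = c("Apsaravis_ukhaana", "Jeholornis_prima",
                               "Changchengornis_hengdaoziensis"))
V <- seq(0.25, 30, by = 0.05)
g <- 9.81

pterotest <- function(V, M, g) {
  out1 <- M * g
  out2 <- V * M
  list(V, out1, out2)
}

# One call per row: M varies, V and g are passed through unchanged
res <- lapply(df$M, function(m) pterotest(V = V, M = m, g = g))
names(res) <- rownames(df)

# Vectorised alternative for the two products in this reduced example
out1 <- df$M * g        # one value per row
out2 <- outer(df$M, V)  # rows = species, columns = speeds
```

For a table with millions of rows the outer() result may be too large to hold at once, so processing it in chunks is probably needed.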

Appending every nth column using loop in R

I have a data frame which consists of paired columns of ratings given by participants and the reasons for giving their ratings. I would like to insert a blank column after each pair of columns, so that after column 1 and 2 there's a new column. I managed to do this manually by creating a vector, inserting them all at the end, and then reorganizing myself. Here's the code for that so it is clear what I am trying to achieve:
v <- rep(NA, 184)
Scheme1$Code1.1 <- v
Scheme1$Code2.1 <- v
Scheme1$Code1.2 <- v
Scheme1$Code2.2 <- v
Scheme1$Code1.3 <- v
Scheme1$Code2.3 <- v
Scheme1$Code1.4 <- v
Scheme1$Code2.4 <- v
Scheme1$Code1.5 <- v
Scheme1$Code2.5 <- v
Scheme1$Code1.6 <- v
Scheme1$Code2.6<- v
Scheme1$Code1.7 <- v
Scheme1$Code2.7 <- v
# Reorganize
Scheme1 <- Scheme1[,c(1,2,15,16,3,4,17,18,5,6,19,20,7,8,21,22,9,10,23,24
,11,12,25,26,13,14,27,28)]
I wanted to see how this could be achieved by using a for loop.
Thanks!
Based on the description, maybe this helps:
lst1 <- split.default(Scheme1, as.integer(gl(ncol(Scheme1), 2, ncol(Scheme1))))
do.call(cbind, unname(Map(function(x, i) {x[paste0(names(x), ".", i)] <- NA;x}, lst1, names(lst1))))
data
set.seed(24)
Scheme1 <- as.data.frame(matrix(rnorm(14 * 5), ncol = 14))
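Since the question asks for a for loop, here is a minimal sketch of one, assuming the columns always come in pairs and that the new columns should be named Code1.i and Code2.i as in the question. The names pieces and Scheme1_new are illustrative.

```r
set.seed(24)
Scheme1 <- as.data.frame(matrix(rnorm(14 * 5), ncol = 14))

pieces <- list()
n_pairs <- ncol(Scheme1) %/% 2

for (i in seq_len(n_pairs)) {
  # Take the i-th pair of original columns
  pair <- Scheme1[, c(2 * i - 1, 2 * i), drop = FALSE]
  # Append the two empty coding columns for this pair
  pair[[paste0("Code1.", i)]] <- NA
  pair[[paste0("Code2.", i)]] <- NA
  pieces[[i]] <- pair
}

# Bind the pieces back together in order
Scheme1_new <- do.call(cbind, pieces)
```

Building each piece with its new columns already in place means no manual reordering of column indices is needed afterwards.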

How to repeat codes changing the variables in a sequence in R

This is the code I want to repeat
A_1981 <- Base[[1:12]]
B <- sum(A_1981)
MFI_1981 <- sum(A_1981^2)/B
Base is a RasterBrick
A_1981 holds the layers for one year
MFI_1981 is the final result
So I have to continue with the next year
A_1982 <- Base[[13:24]]
B <- sum(A_1982)
MFI_1982 <- sum(A_1982^2)/B
To repeat the same code, I thought of replacing only the values in the names:
a <- seq(1,421,by=12)
b <- seq(12,432,by=12)
c <- seq(1981,2016, by=1)
Doing it in sequence, the third year would be something like this (pseudocode):
A_a[3] <- Base[[a[3]:b[3]]]
B <- sum(A_a[3])
MFI_a[3] <- sum(A_a[3]^2)/B
There has to be some way to do this with a for loop or a function, but I have no idea where to start.
I think you are looking for something like this.
Example data (48 layers, i.e., 4 "years")
library(raster)
f <- system.file("external/rlogo.grd", package="raster")
Base <- stack(rep(f, 4*4))
Approach 1
f <- function(year) {
  start <- (year-1981) * 12 + 1
  A <- Base[[start:(start+11)]]
  sum(A^2)/sum(A)
}
mfi <- lapply(1981:1984, f)
MFI <- stack(mfi)
Approach 2
for (year in 1981:1984) {
  start <- (year-1981) * 12 + 1
  A <- Base[[start:(start+11)]]
  mfi <- sum(A^2)/sum(A)
  writeRaster(mfi, paste0(year, ".tif"))
}
s <- stack(paste0(1981:1984, ".tif"))
Approach 3, with mapply as in Rui Barradas' answer, but fixed for when Base is a RasterBrick (and also including the last year)
n <- nlayers(Base)
a <- seq(1, n, by = 12)
mfi <- mapply(function(i, j) sum(Base[[i:j]]^2)/sum(Base[[i:j]]), a, a+11)
s <- stack(mfi)
The following does what you want using mapply and creates only one object in the .GlobalEnv, which I named MFI.
I start by creating a vector Base, since you have not posted a dataset example.
set.seed(2469) # Make the results reproducible
n <- 432
Base <- sample(100, n, TRUE)
step <- 12
b <- seq(1 + step, n, by = step)
a <- seq(1, n - step, by = step)
MFI <- mapply(function(i, j) sum(Base[i:j]^2)/sum(Base[i:j]), a, b)
head(MFI)
#[1] 63.66472 70.54014 67.60567 53.15550 58.71111 65.37008
Another way would be to use Map, as @Parfait suggests in his comment.
obj <- Map(function(i, j) sum(Base[i:j]^2)/sum(Base[i:j]), a, b)
names(obj) <- paste("MFI", 1980 + seq_along(obj), sep = "_")
obj$MFI_1981
#[1] 63.66472
Note that length(obj) is 35, so the last element is obj$MFI_2015 and not MFI_2016 as stated in the question. This can easily be solved by setting n <- 444 right at the beginning of the code.
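A variant on a plain vector, assuming each year should cover exactly 12 consecutive, non-overlapping values (months 1-12, 13-24, and so on). This is a sketch under that assumption: the window ends are derived from the starts, which yields one value per year for 1981 to 2016 without changing n.

```r
set.seed(2469)  # Make the results reproducible
n <- 432
Base <- sample(100, n, TRUE)
step <- 12

a <- seq(1, n, by = step)  # first index of each year
b <- a + step - 1          # last index of each year

MFI <- mapply(function(i, j) sum(Base[i:j]^2) / sum(Base[i:j]), a, b)
names(MFI) <- paste("MFI", 1981:2016, sep = "_")
length(MFI)
```

Deriving b from a keeps the two index vectors the same length by construction, so the windows cannot drift out of step with each other.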

Least error prone way to add columns to an R data.frame through functions

The case I have is I want to "tack on" a bunch of columns to an existing data.frame, where each column is a function that does math on other columns. My goals are:
I want to specify the functions once
I don't want to worry about having to pass arguments in the right order and/or match them by name
I want to specify the order in which to apply the functions once
I want the new column names to be the function names
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once the function dfapply is defined, it can be used much like your original intention, without references such as x$a, except that it takes expressions instead of functions.
dfapply <- function(exprs, df){
  for (expr in exprs) {
    df <- within(df, eval(expr))
  }
  df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
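If you would rather keep functions than expressions, a small loop over a named list covers goals 3 and 4 as well: the list order is the application order, and the list names become the column names. This is a sketch only; add_cols is a hypothetical helper, not an existing function.

```r
set.seed(1)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# Specified once: the order here is the order of application
funs <- list(
  y = function(x) x$a + x$b,
  z = function(x) x$b * x$y
)

add_cols <- function(df, funs) {
  for (nm in names(funs)) {
    # Each function sees the columns added by the previous ones
    df[[nm]] <- funs[[nm]](df)
  }
  df
}

df2 <- add_cols(df, funs)
names(df2)
# "a" "b" "y" "z"
```

The trade-off against dfapply is that the functions still have to write x$a rather than a bare a.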

How do you find the sample sizes used in calculations on r?

I am running correlations between variables, some of which have missing data, so the sample sizes for the correlations are likely different. I tried print and summary, but neither of these shows me how big my n is for each correlation. This is a fairly simple problem that I cannot find the answer to anywhere.
Like this?
x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])
You can also get the degrees of freedom (n - 2 for the default Pearson test) like this:
y <- c(1:100,NA)
x <- c(1:100,NA)
cor.test(x,y)$parameter
But I think it would be best if you showed the code you are using to estimate the correlation, so that you can get exact help.
Here's an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (certain) numeric columns of a data frame, combine them accordingly, coerce the resulting object to matrix and apply the function.
# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))
# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs
pairwiseN <- function(mat)
{
  u <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  h <- expand.grid(x = u, y = u)
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  h$n <- mapply(f, h[, 1], h[, 2])
  h
}
# Call it
pairwiseN(dd)
The function can easily be improved; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to cut down on the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.
Here is a for-loop implementation of Dennis' function above that outputs an n x n table directly, rather than having to reshape that result with pivot_wider(). On my Databricks cluster it cut the compute time for a 1865-row x 69-column matrix from 2.5-3 minutes to 30-40 seconds.
Thanks for your answer Dennis, this helped me with my work.
pairwise_nxn <- function(mat)
{
  cols <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  nn <- data.frame(matrix(nrow = length(cols), ncol = length(cols)))
  rownames(nn) <- colnames(nn) <- cols
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  for (i in 1:nrow(nn))
    for (j in 1:ncol(nn))
      nn[i,j] <- f(rownames(nn)[i], colnames(nn)[j])
  nn
}
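The same pairwise counts can also be obtained without any explicit loop, as a sketch in base R: turn the data into a 0/1 indicator of non-missing values and take its cross-product. Entry [i, j] is then the number of rows where columns i and j are both observed. The names ok and n_mat are illustrative.

```r
set.seed(42)
xx <- rnorm(3000)
xx[sample(3000, 200)] <- NA
dd <- matrix(xx, ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))

# 1 where a value is present, 0 where it is NA
ok <- (!is.na(dd)) * 1

# t(ok) %*% ok: sums the row-wise products of each pair of indicator columns
n_mat <- crossprod(ok)
n_mat
```

The diagonal holds the number of non-missing values per column, and the result is an ordinary numeric matrix carrying the column names.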
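If your variables are vectors named a and b, would something like sum(!(is.na(a) | is.na(b))) help you? It counts the pairs in which neither value is missing; without the negation, sum(is.na(a) | is.na(b)) counts the pairs that get dropped.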

Resources