How to repeat code while changing the variables in a sequence in R

This is the code I want to repeat:
A_1981 <- Base[[1:12]]
B <- sum(A_1981)
MFI_1981 <- sum(A_1981^2)/B
Base is a RasterBrick.
A_1981 holds the layers for one year.
MFI_1981 is the final result.
Then I have to continue with the next year:
A_1982 <- Base[[13:24]]
B <- sum(A_1982)
MFI_1982 <- sum(A_1982^2)/B
To repeat the same code, I thought of replacing only the values in the names:
a <- seq(1,421,by=12)
b <- seq(12,432,by=12)
c <- seq(1981,2016, by=1)
Doing it in sequence for, say, the third year would be something like this (pseudocode):
A_c[3] <- Base[[a[3]:b[3]]]
B <- sum(A_c[3])
MFI_c[3] <- sum(A_c[3]^2)/B
There must be some way to do this with a for loop or a function, but I have no idea where to start.

I think you are looking for something like this.
Example data (48 layers, i.e., 4 "years"):
library(raster)
f <- system.file("external/rlogo.grd", package="raster")
Base <- stack(rep(f, 4*4))
Approach 1
f <- function(year) {
start <- (year-1981) * 12 + 1
A <- Base[[start:(start+11)]]
sum(A^2)/sum(A)
}
mfi <- lapply(1981:1984, f)
MFI <- stack(mfi)
Approach 2
for (year in 1981:1984) {
start <- (year-1981) * 12 + 1
A <- Base[[start:(start+11)]]
mfi <- sum(A^2)/sum(A)
writeRaster(mfi, paste0(year, ".tif"))
}
s <- stack(paste0(1981:1984, ".tif"))
Approach 3, with mapply as in Rui Barradas' answer, but fixed for when Base is a RasterBrick (and also including the last year)
n <- nlayers(Base)
a <- seq(1, n, by = 12)
mfi <- mapply(function(i, j) sum(Base[[i:j]]^2)/sum(Base[[i:j]]), a, a+11)
s <- stack(mfi)

The following does what you want using mapply and creates only one object in the .GlobalEnv, which I named MFI.
I start by creating a vector Base, since you have not posted a dataset example.
set.seed(2469) # Make the results reproducible
n <- 432
Base <- sample(100, n, TRUE)
step <- 12
b <- seq(1 + step, n, by = step)
a <- seq(1, n - step, by = step)
MFI <- mapply(function(i, j) sum(Base[i:j]^2)/sum(Base[i:j]), a, b)
head(MFI)
#[1] 63.66472 70.54014 67.60567 53.15550 58.71111 65.37008
Another way would be to use Map, as @Parfait suggests in his comment.
obj <- Map(function(i, j) sum(Base[i:j]^2)/sum(Base[i:j]), a, b)
names(obj) <- paste("MFI", 1980 + seq_along(obj), sep = "_")
obj$MFI_1981
#[1] 63.66472
Note that length(obj) is 35, so the last element is obj$MFI_2015 and not MFI_2016 as stated in the question. This can be easily solved by setting n <- 444 right at the beginning of the code.
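With a and b built as above, each window i:j spans 13 elements (for example 1:13), so consecutive windows share one element. The following is a minimal sketch of an alternative indexing, assuming each year should cover exactly 12 consecutive, non-overlapping elements of a plain numeric vector; the names a, b and MFI are reused from the answer purely for illustration.

```r
set.seed(2469)               # same seed as above; the values themselves are not shown here
n <- 432
step <- 12
Base <- sample(100, n, TRUE)

a <- seq(1, n, by = step)    # window starts: 1, 13, ..., 421
b <- a + step - 1            # window ends:   12, 24, ..., 432

# One value per 12-element window
MFI <- mapply(function(i, j) sum(Base[i:j]^2) / sum(Base[i:j]), a, b)
names(MFI) <- paste0("MFI_", 1980 + seq_along(MFI))

length(MFI)                  # 36 windows, i.e. MFI_1981 through MFI_2016
```

With this indexing the last year is included without changing n.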

Related

Appending every nth column using loop in R

I have a data frame which consists of paired columns of ratings given by participants and the reasons for giving their ratings. I would like to insert a blank column after each pair of columns, so that after column 1 and 2 there's a new column. I managed to do this manually by creating a vector, inserting them all at the end, and then reorganizing myself. Here's the code for that so it is clear what I am trying to achieve:
v <- rep(NA, 184)
Scheme1$Code1.1 <- v
Scheme1$Code2.1 <- v
Scheme1$Code1.2 <- v
Scheme1$Code2.2 <- v
Scheme1$Code1.3 <- v
Scheme1$Code2.3 <- v
Scheme1$Code1.4 <- v
Scheme1$Code2.4 <- v
Scheme1$Code1.5 <- v
Scheme1$Code2.5 <- v
Scheme1$Code1.6 <- v
Scheme1$Code2.6<- v
Scheme1$Code1.7 <- v
Scheme1$Code2.7 <- v
# Reorganize
Scheme1 <- Scheme1[,c(1,2,15,16,3,4,17,18,5,6,19,20,7,8,21,22,9,10,23,24
,11,12,25,26,13,14,27,28)]
I wanted to see how this could be achieved by using a for loop.
Thanks!
Based on the description, maybe this helps:
lst1 <- split.default(Scheme1, as.integer(gl(ncol(Scheme1), 2, ncol(Scheme1))))
do.call(cbind, unname(Map(function(x, i) {x[paste0(names(x), ".", i)] <- NA;x}, lst1, names(lst1))))
Data
set.seed(24)
Scheme1 <- as.data.frame(matrix(rnorm(14 * 5), ncol = 14))
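Since the question asks specifically for a for loop, here is a minimal sketch of one, assuming the columns come in consecutive pairs and that two blank columns named Code1.k and Code2.k should follow the k-th pair (as in the manual reordering in the question). The names pieces, pair_starts and Scheme1_new are hypothetical.

```r
set.seed(24)
Scheme1 <- as.data.frame(matrix(rnorm(14 * 5), ncol = 14))

pieces <- list()
pair_starts <- seq(1, ncol(Scheme1), by = 2)   # first column of each pair

for (k in seq_along(pair_starts)) {
  # Take the k-th pair of columns
  pair <- Scheme1[, pair_starts[k] + 0:1, drop = FALSE]
  # Append the two blank columns for this pair
  pair[paste0("Code", 1:2, ".", k)] <- NA
  pieces[[k]] <- pair
}

# Bind the pieces back together in order
Scheme1_new <- do.call(cbind, pieces)
names(Scheme1_new)[1:4]
```

Building the pieces in order avoids having to write out the column permutation by hand.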

How to quantify the frequency of all possible row combinations of a binary matrix in R in a more efficient way?

Let's assume I have a binary matrix with 24 columns and 5000 rows.
The columns are Parameters (P1 - P24) of 5000 subjects. The parameters are binary (0 or 1).
(Note: my real data can contain as much as 40,000 subjects)
m <- matrix(, nrow = 5000, ncol = 24)
m <- apply(m, c(1,2), function(x) sample(c(0,1),1))
colnames(m) <- paste("P", c(1:24), sep = "")
Now I would like to determine what are all possible combinations of the 24 measured parameters:
comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")
The final question is: How often does each of the possible row combinations from comb appear in matrix m?
I managed to write code for this that adds the counts as a new column in comb. But my code is really slow and would take an estimated 328 days to finish. Therefore the code below only considers the first 20 combinations:
comb$count <- 0
for (k in 1:20){ # considers only the first 20 combinations of comb
for (i in 1:nrow(m)){
if (all(m[i,] == comb[k,1:24])){
comb$count[k] <- comb$count[k] + 1
}
}
}
Is there a computationally more efficient way to do this, so that I can count all combinations in a short time?
Thank you very much for your help in advance.
data.table is fast at this type of operation:
m <- matrix(, nrow = 5000, ncol = 24)
m <- apply(m, c(1,2), function(x) sample(c(0,1),1))
colnames(m) <- paste("P", c(1:24), sep = "")
comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")
library(data.table)
data_t = data.table(m)
ans = data_t[, .N, by = P1:P24]
dim(ans)
head(ans)
The core of the call is by = P1:P24, which groups by all the columns from P1 to P24, and .N, which gives the number of records in each group.
I used this as inspiration: How does one aggregate and summarize data quickly?
and the data.table introduction vignette https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
If all you need is the combinations that occur in the data and how many times, this will do it:
m2 <- apply(m, 1, paste0, collapse="")
m2.tbl <- xtabs(~m2)
head(m2.tbl)
m2
# 000000000001000101010010 000000000010001000100100 000000000010001110001100 000000000100001000010111 000000000100010110101010 000000000100101000101100
# 1 1 1 1 1 1
You can use apply to paste the values of each row into a single string and then use table to count how often each string occurs.
table(apply(m, 1, paste0, collapse = '-'))
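The answers above count only the combinations that occur in the data. If a count is wanted for every row of comb, including zeros, one option is to encode each row as an integer and tabulate. This is a sketch under the assumption that the matrix contains only 0 and 1 and that comb is built with expand.grid (first column varying fastest); count_combinations, m_small and comb_small are hypothetical names, and the demo uses 4 columns instead of 24 to keep it small.

```r
count_combinations <- function(m) {
  p <- ncol(m)
  # Treat each row as a binary number with column 1 as the lowest bit,
  # which matches the row order produced by expand.grid()
  keys <- as.vector(m %*% 2^(seq_len(p) - 1))
  # Count how often each key occurs; keys start at 0, bins start at 1
  tabulate(keys + 1, nbins = 2^p)
}

set.seed(1)
m_small <- matrix(sample(0:1, 200 * 4, replace = TRUE), ncol = 4)
comb_small <- expand.grid(rep(list(0:1), 4))
comb_small$count <- count_combinations(m_small)
head(comb_small)
```

For 24 columns the result has 2^24 entries, which is large but is computed in a single pass over the data rather than one pass per combination.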

Apply concordance dataframe to zoo objects

I have a zoo object made of several time series, like this:
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
a <- zoo(rnorm(5), order.by=indices)
b <- zoo(rnorm(5), order.by=indices)
c <- zoo(rnorm(5), order.by=indices)
ts_origin <- merge(a,b,c)
I would like to multiply each zoo series from ts_origin by a ratio contained in a dataframe, and put
the results in another zoo object (ts_final) that contains the time series d, e, f. In other words,
the dataframe is a concordance file between a, b, c and d, e, f, and the ratio would be applied this way:
ts_final$d = ts_origin$a * 10 ; ts_final$e = ts_origin$b * 100 ; ts_final$f = ts_origin$c * 1000.
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
d <- zoo(, order.by=indices)
e <- zoo(, order.by=indices)
f <- zoo(, order.by=indices)
ts_final <- merge(d,e,f)
I'm not too sure what the best approach is for this. I was trying with the apply function, but couldn't make
it work... any help would be greatly appreciated!
1) Map/merge
Use Map to iterate over final, original and ratio executing the products required producing a list of zoo objects L. Note that Map takes the names from the first argument after fun. Then merge the list components forming zoo object ts_final.
fun <- function(f, o, r) ts_origin[, o] * r
L <- with(df, Map(fun, final, original, ratio))
ts_final <- do.call("merge", L)
The result using the inputs shown in the Note at the end is this zoo object:
> ts_final
d e f
2000-01-01 -5.6047565 46.09162 400.7715
2001-01-01 -2.3017749 -126.50612 110.6827
2002-01-01 15.5870831 -68.68529 -555.8411
2003-01-01 0.7050839 -44.56620 1786.9131
2004-01-01 1.2928774 122.40818 497.8505
2005-01-01 17.1506499 35.98138 -1966.6172
2) sweep
Another approach is to sweep out the ratios setting the names appropriately giving the same result as in (1).
with(df, sweep(setNames(ts_origin[, original], final), 2, ratio, "*"))
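To see what sweep does here without the zoo dependency, the following small sketch applies the same idea to a plain matrix. The matrix m, the vector ratio and the result scaled are made-up illustration values, not the data from the question.

```r
# Two columns "a" and "b", three rows
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
ratio <- c(10, 100)

# MARGIN = 2 applies ratio[1] to column 1, ratio[2] to column 2
scaled <- sweep(m, 2, ratio, "*")

# Rename the columns to the "final" names
colnames(scaled) <- c("d", "e")
scaled
#       d   e
# [1,] 10 400
# [2,] 20 500
# [3,] 30 600
```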
3) rep
Set the names and multiply by ratio repeated appropriately giving the same result as in (1).
nr <- nrow(ts_origin)  # repeat each ratio once per row (time point) of ts_origin
with(df, setNames(ts_origin[, original], final) * rep(ratio, each = nr))
Note
We can define the input reproducibly like this:
set.seed(123)
tt <- as.Date(ISOdate(2000:2005, 1, 1))
m <- matrix(rnorm(6*3), 6, dimnames = list(NULL, c("a", "b", "c")))
ts_origin <- zoo(m, tt)
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
Here is a one-liner, with wrong final names.
ts_final <- t(df$ratio * t(ts_origin))
ts_final
# a b c
#2000-01-01 -5.382213 -12.64773 -513.6408
#2001-01-01 -9.218280 -98.55123 -1826.6430
#2002-01-01 2.114663 -28.58910 290.8008
#2003-01-01 -3.576460 -23.47314 -166.5473
#2004-01-01 6.490508 -36.29317 -398.0389
#2005-01-01 -5.382213 -12.64773 -513.6408
Now assign final names.
colnames(ts_final) <- df$final

For loop function is looping too many times

I am calculating a community-weighted mean of functional trait values (I study forestry). I have to multiply the relative abundance of each tree species by its trait values. I have 2 data frames: one with the relative abundances of each species within each site and one with the average trait values for each species. I wrote a loop to automate the calculation, but the end results contain the multiplication 13 times instead of once (I have 13 plots, so maybe it has something to do with this). I have been working on this script for several days since I'm new to R, and I need it for my master's thesis. I think I reached my limit of logical thinking today and can't find my error :) Can someone help me, please? I'll paste the script below:
Load data, apply some column names, fill NAs with 0:
library(data.table)
traits <- read.csv("Trait value.csv", sep = ";")
plots_Maiz <- read.csv("CWM Maiz plot.csv", sep = ";")
plots_Maiz[is.na(plots_Maiz)] <- 0
colnames(plots_Maiz) <- c("site", "species","y0","y1", "y2", "y3", "y4", "y5")
traits[,1:17][is.na(traits[,1:17])] <- 0
#function for finding the corresponding species for a plot in the traitlist
traitsf <- function(df, traitlist){
plottraits <- subset(traitlist, species %in% df[,2])
return(plottraits)
}
traitcalc <- function(traits, plots_Maiz){
multlist <- list()
blist <- list()
vmult <- vector()
tickcount <- 0
plotsplit <- split.data.frame(plots_Maiz, plots_Maiz$site)
testlist <- lapply(plotsplit, traitsf, traitlist = traits)
for (q in 1:length(plotsplit)){
df1 <- testlist[[q]]
df2 <- plotsplit[[q]]
plot <- as.character(plotsplit[[q]][1,1])
for (i in 1:nrow(df1)){
v <- as.numeric(as.vector(t(df1[i,2:ncol(df1)])))
species <- as.character(df1[i,1])
for (j in 1:(ncol(df2)-2)){
tickcount <- tickcount + 1
vmult <-as.vector(v * (as.numeric(as.vector(df2[i,j+2]))))
vmult <- as.list(c(vmult, j-1, species, plot))
multlist[[tickcount]] <- vmult
}
}
b <- do.call(rbind, multlist)
b <- data.table::rbindlist(multlist)
blist[[q]] <- b
}
return(blist)
}
endresults <- traitcalc(traits,plots_Maiz)
endresultsdf2<- do.call("rbind", endresults)
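One possible cause, offered as a guess from reading the script rather than from running it on the real data: multlist and tickcount are created once before the outer loop over plots and never reset, so the table stored for each plot also contains the rows of all earlier plots. The toy sketch below reproduces that pattern; groups, acc, n, out_bad and out_ok are hypothetical names.

```r
groups <- list(a = 1:2, b = 3:4, c = 5:6)

# Accumulator created once, outside the outer loop: results pile up
acc <- list(); n <- 0; out_bad <- list()
for (g in seq_along(groups)) {
  for (x in groups[[g]]) {
    n <- n + 1
    acc[[n]] <- x * 10
  }
  out_bad[[g]] <- unlist(acc)   # also contains values from earlier groups
}
lengths(out_bad)   # 2 4 6

# Accumulator reset at the start of each outer iteration
out_ok <- list()
for (g in seq_along(groups)) {
  acc <- list(); n <- 0
  for (x in groups[[g]]) {
    n <- n + 1
    acc[[n]] <- x * 10
  }
  out_ok[[g]] <- unlist(acc)    # only this group's values
}
lengths(out_ok)    # 2 2 2
```

If this is what happens in traitcalc, moving the initialization of multlist and tickcount inside the loop over q would be the corresponding change.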

creating a function for processing my dataframe calculations

I am doing systematic calculations for my created dataframe. I have the code for the calculations but I would like to:
1) Write it as a function and call it for the dataframe I created.
2) reset the calculations for next ID in the dataframe.
I would appreciate your help and advice on this.
The dataframe is created in R using the following code:
#Create a dataframe
dosetimes <- c(0,6,12,18)
df <- data.frame("ID"=1,"TIME"=sort(unique(c(seq(0,30,1),dosetimes))),"AMT"=0,"A1"=NA,"WT"=NA)
doserows <- subset(df, TIME%in%dosetimes)
doserows$AMT[doserows$TIME==dosetimes[1]] <- 100
doserows$AMT[doserows$TIME==dosetimes[2]] <- 100
doserows$AMT[doserows$TIME==dosetimes[3]] <- 100
doserows$AMT[doserows$TIME==dosetimes[4]] <- 100
#Add back dose information
df <- rbind(df,doserows)
df <- df[order(df$TIME,-df$AMT),]
df <- subset(df, (TIME==0 & AMT==0)==F)
df$A1[(df$TIME==0)] <- df$AMT[(df$TIME ==0)]
#Time-dependent covariate
df$WT <- 70
df$WT[df$TIME >= 12] <- 120
#The calculations are done in a for-loop. Here is the code for it:
#values needed for the calculation
C <- 2
V <- 10
k <- C/V
#I would like this part to be written as a function
for(i in 2:nrow(df))
{
t <- df$TIME[i]-df$TIME[i-1]
A1last <- df$A1[i-1]
df$A1[i] = df$AMT[i]+ A1last*exp(-t*k)
}
head(df)
plot(A1~TIME, data=df, type="b", col="blue", ylim=c(0,150))
The other thing is that the previous code assumes subject ID=1 for all time points. Suppose instead that the subject becomes ID=2 when WT (weight) changes to 120. How can I reset the calculations and automate them for all subject IDs in the dataframe? In this case the original dataframe would be created like this:
#code:
rm(list=ls(all=TRUE))
dosetimes <- c(0,6,12,18)
df <- data.frame("ID"=1,"TIME"=sort(unique(c(seq(0,30,1),dosetimes))),"AMT"=0,"A1"=NA,"WT"=NA)
doserows <- subset(df, TIME%in%dosetimes)
doserows$AMT[doserows$TIME==dosetimes[1]] <- 100
doserows$AMT[doserows$TIME==dosetimes[2]] <- 100
doserows$AMT[doserows$TIME==dosetimes[3]] <- 100
doserows$AMT[doserows$TIME==dosetimes[4]] <- 100
df <- rbind(df,doserows)
df <- df[order(df$TIME,-df$AMT),]
df <- subset(df, (TIME==0 & AMT==0)==F)
df$A1[(df$TIME==0)] <- df$AMT[(df$TIME ==0)]
df$WT <- 70
df$WT[df$TIME >= 12] <- 120
df$ID[(df$WT>=120)==T] <- 2
df$TIME[df$ID==2] <- c(seq(0,20,1))
Thank you in advance!
In general, when doing calculations on different subjects' data, I like to split the dataframe by ID, loop over the per-subject dataframes, do all the calculations for each one, store each result in a list, and finally collapse that list back into a single dataframe with all the numbers you want. This allows a lot of control over what you do for each subject.
subjects = split(df, df$ID)
forResults = vector("list", length=length(subjects))
# initialize these constants
C <- 2
V <- 10
k <- C/V
myFunc = function(data, resultsArray){
  # 's' indexes the subjects so that 'k' keeps meaning the rate constant defined above
  for(s in seq_along(data)){
    df = data[[s]]
    df$A1 = 100 # I assume this should be 100 for t=0 for each subject?
    # you could vectorize this nested for loop..
    for(i in 2:nrow(df)) {
      t <- df$TIME[i]-df$TIME[i-1]
      A1last <- df$A1[i-1]
      df$A1[i] = df$AMT[i]+ A1last*exp(-t*k)
    }
    print(head(df)) # print() is needed to show output from inside a function
    # you can add all sorts of other calculations you want to do on each subject's data
    # when you're done doing calculations, put the resultant into
    # the resultsArray and we'll rebuild the dataframe with all the new variables
    resultsArray[[s]] = df
    # if you're not using RStudio, then you want to use dev.new() to instantiate a new plot canvas
    # dev.new() # dont need this if you're using RStudio (which doesnt allow multiple plots open)
    plot(A1~TIME, data=df, type="b", col="blue", ylim=c(0,150))
  }
  # collapse the results list into a dataframe
  resultsDF = do.call(rbind, resultsArray)
  return(resultsDF)
}
results = myFunc(subjects, forResults)
Do you want this:
ddf <- data.frame("ID"=1,"TIME"=sort(unique(c(seq(0,30,1),dosetimes))),"AMT"=0,"A1"=NA,"WT"=NA)
myfn = function(df){
dosetimes <- c(0,6,12,18)
doserows <- subset(df, TIME%in%dosetimes)
doserows$AMT[doserows$TIME==dosetimes[1]] <- 100
doserows$AMT[doserows$TIME==dosetimes[2]] <- 100
doserows$AMT[doserows$TIME==dosetimes[3]] <- 100
doserows$AMT[doserows$TIME==dosetimes[4]] <- 100
#Add back dose information
df <- rbind(df,doserows)
df <- df[order(df$TIME,-df$AMT),]
df <- subset(df, (TIME==0 & AMT==0)==F)
df$A1[(df$TIME==0)] <- df$AMT[(df$TIME ==0)]
#Time-dependent covariate
df$WT <- 70
df$WT[df$TIME >= 12] <- 120
#The calculations are done in a for-loop. Here is the code for it:
#values needed for the calculation
C <- 2
V <- 10
k <- C/V
#I would like this part to be written as a function
for(i in 2:nrow(df))
{
t <- df$TIME[i]-df$TIME[i-1]
A1last <- df$A1[i-1]
df$A1[i] = df$AMT[i]+ A1last*exp(-t*k)
}
print(head(df)) # print() is needed to show output from inside a function
plot(A1~TIME, data=df, type="b", col="blue", ylim=c(0,150))
}
myfn(ddf)
For multiple calls:
for(i in 1:N) {
myfn(ddf[ddf$ID==i,])
readline(prompt="Press <Enter> to continue...")
}
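As a compact variant of the approaches above, here is a sketch that keeps the recurrence in its own function, takes the rate constant as an argument, and restarts the calculation for each ID via split and lapply. It assumes the first row of each subject is the dose at TIME 0; calc_A1, toy and res are hypothetical names, and toy is made-up data rather than the dataframe from the question.

```r
# Recurrence for one subject: A1[i] = AMT[i] + A1[i-1] * exp(-dt * k)
calc_A1 <- function(d, k) {
  d$A1[1] <- d$AMT[1]                      # restart at the subject's first row
  for (i in seq_len(nrow(d))[-1]) {        # rows 2..n; empty if only one row
    dt <- d$TIME[i] - d$TIME[i - 1]
    d$A1[i] <- d$AMT[i] + d$A1[i - 1] * exp(-dt * k)
  }
  d
}

# Two subjects, each dosed with 100 at TIME 0
toy <- data.frame(ID   = rep(1:2, each = 3),
                  TIME = rep(0:2, 2),
                  AMT  = rep(c(100, 0, 0), 2),
                  A1   = NA_real_)

# Apply per subject, then bind the pieces back together
res <- do.call(rbind, lapply(split(toy, toy$ID), calc_A1, k = 2 / 10))
res
```

Passing k explicitly avoids relying on a global variable that a loop index could accidentally overwrite.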
