combining lists and dataframes in R from raster values - r

QUESTION EDITED FOR CLARITY AND REPRODUCIBILITY
I am trying to summarize proportions of landcover classes within many buffers contained within a list. Although it appears to be a common problem, I have not found an appropriate solution:
I have a raster stack called hab_stack with discrete values 1-6 for each of 3 layers (each layer == year). I also have locational data with >800,000 locations called dat_sf. I have extracted hab_stack raster values within a 400 m buffer around each location.
I now have a large list with ~800,000 elements (not all hab classes 1-6 are represented in each list). So I tried to create a dummy dataframe with all hab_stack values 1-6 called true_names with assigned frequency/proportion == zero for classes not represented within the buffer because I need to combine all proportions together. I have tried to accomplish this using an lapply looping structure but can't seem to get it quite right. Below is the full function and error:
sum_class <- lapply(values_hab, function(x){
  true_names <- data.frame(x = 1:6, Freq = 0)
  prop_df <- as.data.frame(prop.table(table(x))) %>%
    mutate(x = as.numeric(x))
  true_names %>%
    anti_join(prop_df, by = "x") %>%
    bind_rows(prop_df) %>%
    arrange(x)
})
Error in `mutate()`:
! Problem while computing `x = as.numeric(x)`.
x `x` must be size 0 or 1, not 1659.
Run `rlang::last_error()` to see where the error occurred.
When I dissect the function and call table(values_hab) on the whole list, it fails with: Error in table(values_hab) : all arguments must have the same length.
A hypothetical list could look something like this, where the elements contain different numbers of NAs and not all classes are represented in each element. A dataframe of my desired output follows it:
list <- list(c(1,1,1,2,2,2,3,3,4,4,4,NA,NA,NA,5,6),
             c(1,2,3,4,NA,NA,NA,NA,4,4,4,4,NA,5,1,1),
             c(5,5,5,5,5,1,2,2,2,2,NA,NA,NA,NA,NA,3))
desired_output <- data.frame(`1` = c(0.4, 0.5, 0.6, 0.5, 0.5, 0.3),
`2` = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.2),
`3` = c(0.1, 0.1, 0.0, 0.1, 0.0, 0.3),
`4` = c(0.3, 0.2, 0.0, 0.1, 0.1, 0.1),
`5` = c(0.0, 0.1, 0.2, 0.2, 0.1, 0.0),
`6` = c(0.1, 0.0, 0.1, 0.0, 0.2, 0.1))
Any help is much appreciated. Best,

It looks like my function works and this was a very easy fix. dplyr::mutate was picking up x as the entire list, when I wanted it applied to the vector x within each list element. R is still running in the background, but this should have taken care of it.
library(dplyr)

sum_class_function <- function(x){
  true_names <- data.frame(x = 1:6, Freq = 0)
  prop_df <- as.data.frame(prop.table(table(x)))
  # table() stores the classes as a factor; go through character to keep the labels, not the level codes
  prop_df$x <- as.numeric(as.character(prop_df$x))
  temp <- true_names %>%
    anti_join(prop_df, by = "x") %>%
    bind_rows(prop_df) %>%
    arrange(x)
  return(temp)
}
sum_class <- lapply(values_hab, sum_class_function)
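A base R alternative that goes straight to the wide layout described in the question is sketched below. It is only a sketch: values_demo copies the hypothetical list from the question, and class_props is a made-up helper name. Fixing the factor levels to 1:6 makes table() report absent classes as 0, and NAs are dropped by table() by default.

```r
# Hypothetical buffer values: classes 1-6, with NAs and missing classes
values_demo <- list(
  c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, NA, NA, NA, 5, 6),
  c(1, 2, 3, 4, NA, NA, NA, NA, 4, 4, 4, 4, NA, 5, 1, 1),
  c(5, 5, 5, 5, 5, 1, 2, 2, 2, 2, NA, NA, NA, NA, NA, 3)
)

# Proportion of each class in one buffer (hypothetical helper);
# fixed levels keep classes that are absent from the buffer as 0
class_props <- function(x, classes = 1:6) {
  as.vector(prop.table(table(factor(x, levels = classes))))
}

# One row per buffer, one column per class
prop_mat <- t(vapply(values_demo, class_props, numeric(6)))
colnames(prop_mat) <- paste0("class_", 1:6)
prop_df <- as.data.frame(prop_mat)
prop_df
```

Each row sums to 1 over the non-NA cells. A buffer that contains only NAs would give NaN in every column, so that case may need separate handling.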

Related

How do I need to assign values to each other in triplets using R?

The situation is as follows:
I need to create a dataset of triplets. The stock price has the discrete distribution S <- c(80,100,120,140,160) with probabilities P <- c(0.2, 0.3, 0.2, 0.2, 0.1). The call option payoff is C <- pmax(S-120, 0) = c(0,0,0,20,40), and the liability of an option that pays 30 if S lies in a certain region and zero otherwise is L = 30 * I{110 \leq S \leq 150} = c(0,0,30,30,0). Importantly, the values are linked by index: if S[1] = 80 is drawn, then C[1] and L[1] must go with it, and the same holds for i = 1,2,3,4,5. How do you create a dataset of N = 10000 simulations in which each draw for i is paired with the other two values for the same i?
This is the code I had for now. Note that X_1 = S, X_2 = C and Y = L.
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
sample(X_1 - 120, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
Y <- function(n) {
sample(L, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
##Creating triplets##
df <- data.frame(S_T = X_1(10000), C_T = X_2(10000), L_T =Y(10000))
df
I'm not sure if you want C_T to be dependent on the S_T values. If you do, I think you just want to call X_1, assign the results to an object, then use that as the argument to X_2 (or just subtract 120, which is what X_2 does).
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
# Call that function
S_T <- X_1(10) # for practice
C_T <- S_T - 120 # that's all you're doing in function X_2, if you want to use S_T
If you want C_T to contain values independent of S_T, you can call one function from within another:
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
X_1(n) - 120
}
S_T <- X_1(10) # Same as above
C_T <- X_2(10) # Gives values not dependent on S_T
EDIT to address comment below:
It's hard to read the comment, but it looks like you want to create a function that takes the results of function X_1 and returns a result based on a condition. Use ifelse to check each element one at a time. You can create another function and then pass it the results of function X_1:
Y <- function(X_1_func){
ifelse( X_1_func == 80,
return(0),
ifelse(X_1_func == 100,
return(0),
ifelse(X_1_func == 120,
return(30),
return(60) # Add a default value here or the last possible value if others are F
)
)
)
}
sapply(X_1(10), Y) # Use an apply function to pass in one element of X_1's result at a time. Assign the results to L or whatever you wish to call it.
If this all works for you, you can accept the answer.
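If the goal is for S, C and L to stay aligned by index i, one option is to sample the index once and look up all three vectors with it. The sketch below assumes the payoff definitions stated in the question; idx and n are illustrative names, and set.seed is only there for reproducibility.

```r
set.seed(1)  # only to make the sketch reproducible

S <- c(80, 100, 120, 140, 160)           # stock prices from the question
P <- c(0.2, 0.3, 0.2, 0.2, 0.1)          # their probabilities
C <- pmax(S - 120, 0)                    # call payoff: 0 0 0 20 40
L <- ifelse(S >= 110 & S <= 150, 30, 0)  # liability: 0 0 30 30 0

n <- 10000
# Draw the state index once, so every row uses the same i for S, C and L
idx <- sample(seq_along(S), size = n, replace = TRUE, prob = P)

df <- data.frame(S_T = S[idx], C_T = C[idx], L_T = L[idx])
head(df)
```

Because all three columns are indexed by the same idx, a row with S_T equal to 140 always carries C_T equal to 20 and L_T equal to 30.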

R systemfit: how to run with a list of many formula

I have a wide dataset of stock returns and their lagged values. I want to use the systemfit package for SUR estimation but cannot work out how to pass in multiple formulas.
A sample of my dataset looks like:
df = data.frame('stock.1' = c(0, -0.2, 0.3, 0.5, 0.2),
                'lag.stock.1' = c(0.2, 0, -0.2, 0.3, 0.5),
                'stock.2' = c(0, -0.1, 0.4, 0.7, -0.1),
                'lag.stock.2' = c(0.1, 0, -0.1, 0.4, 0.7))
If I use the following code, it works
fitsur = systemfit(list(stock.1 ~ lag.stock.1, stock.2 ~ lag.stock.2), data = df)
But I have many stocks, so I create a list of formulas first
stock.list = c("stock.1", "stock.2")
fm.list = list()
for (i in 1:length(stock.list)) {
stock = stock.list[i]
formula = paste0(stock,"~","lag.",stock)
fm.list[i] = formula
}
Using this list does not work
fitsur = systemfit(fm.list, data = df)
Error in systemfit(fm.list, data = return.joindf) :
the list of argument 'formula' must contain only objects of class 'formula'
Is there a way to use the formula list correctly?
Use the as.formula function to convert each string into a formula, and store it with [[i]] inside the loop:
formula = paste0(stock,"~","lag.",stock)
fm.list[[i]] = as.formula(formula)
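The same list can be built without an explicit loop. This is a minimal sketch that reuses stock.list from the question. The systemfit call is left commented out because it needs the systemfit package and the df from the question.

```r
stock.list <- c("stock.1", "stock.2")

# Build one formula per stock; lapply returns a list, so no index bookkeeping is needed
fm.list <- lapply(stock.list, function(stock) {
  as.formula(paste0(stock, " ~ lag.", stock))
})

fm.list
# fitsur <- systemfit(fm.list, data = df)  # requires library(systemfit) and df from the question
```

Every element of fm.list has class "formula", which is what the error message in the question asks for.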

How to for loop using different columns of data frame?

I am working on a portfolio return problem. The stock returns look like this:
AMZN <- c(0.1, 0.3, 0.4, 0.2)
BBY <- c(0.2, 0.4, 0.5, 0.3)
TGT <- c(-0.1, -0.3, -0.2,-0.5)
df1 <- data.frame(AMZN, BBY, TGT)
date <- c("2000-01-01","2000-02-01", "2000-03-01", "2000-04-01")
date <- as.Date(date, "%Y-%m-%d")
df1 <- cbind(date, df1)
xts <- xts(df1[,-1], order.by=df1[,1])
I want to use Return.portfolio(xts, weight) to calculate the portfolio return.
The weights look like this:
w1 <- c(0.2, 0.3, 0.1, 0.4)
w2 <- c(0.5, 0.1, 0.1, 0.3)
w3 <- c(0.1, 0.1, 0.4, 0.4)
Weights <- data.frame(w1, w2, w3)
Since several sets of weights are assigned, I need to get multiple portfolio returns.
The code I tried is
for (i in colnames(Weights)){
Return.portfolio(xts, (Weights[[i]]))
}
Although R does not report any error, the only thing I end up with is the value of i, which is "w3".
The loop computes each result but never stores it. I think you need to initialize a NULL object first and bind each result to it, maybe something like this:
Return<-NULL
for (i in 1:ncol(Weights)){
Return<- cbind(Return, Return.portfolio(xts, (Weights[[i]])))
}
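The collect-every-result pattern can also be written with sapply, which avoids growing an object inside the loop. The sketch below runs in base R only, so it uses a stand-in: weighted_return is a hypothetical fixed-weight sum per period, not what Return.portfolio computes. The weights are also made up, with one value per asset, since the ones in the question have four values for three assets.

```r
# Toy returns: 4 periods x 3 assets (values from the question)
returns <- cbind(AMZN = c(0.1, 0.3, 0.4, 0.2),
                 BBY  = c(0.2, 0.4, 0.5, 0.3),
                 TGT  = c(-0.1, -0.3, -0.2, -0.5))

# Hypothetical weights: one column per portfolio, one row per asset
Weights <- data.frame(w1 = c(0.2, 0.3, 0.5),
                      w2 = c(0.5, 0.1, 0.4),
                      w3 = c(0.1, 0.1, 0.8))

# Stand-in for Return.portfolio(): a fixed-weight sum per period
weighted_return <- function(r, w) as.numeric(r %*% w)

# sapply keeps every result: one column per weight set, one row per period
port_returns <- sapply(Weights, function(w) weighted_return(returns, w))
port_returns
```

With the real function, the anonymous function body would presumably call Return.portfolio(xts, w) instead, and the pieces could be combined with do.call(cbind, lapply(...)).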

R. lapply multinomial test to list of dataframes

I have a data frame A, which I split into a list of 100 data frames, each having 3 rows (In my real data each data frame has 500 rows). Here I show A with 2 elements of the list (row1-row3; row4-row6):
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(24878, 33605, 12100 , 25899, 34777, 13765))
# This is the list:
nest <- split(A, rep(1:2, each = 3))
I want to apply the multinomial test to each of these data frames and extract the p-value of each test. So far I have done this:
library(EMT)
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = FALSE, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
lapply(nest, fun)
However, I get:
"Error in multinomial.test(x$counts_set, prob = x$norm_genome, useChisq = F, :
Observations have to be stored in a vector, e.g. 'observed <- c(5,2,1)'"
Does anyone have a smarter way of doing this?
The results of split are created with names 1, 2 and so on. That's why x$count in fun cannot access it. To make it simpler, you can combine your split elements using the list function and then use lapply:
n <- c(0,1,2,0,1,2)
prob <- c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1)
count <- c(24878, 33605, 12100 , 25899, 34777, 13765)
A <- cbind.data.frame(n, prob, count)
nest = split(A,rep(1:2,each=3))
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = F, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
# Create a list of the split elements
new_list <- list(nest$`1`, nest$`2`)
lapply(new_list, fun)
A solution with dplyr.
A = data.frame(n = c(0,1,2,0,1,2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(43, 42, 9, 74, 82, 9))
library(dplyr)
library(EMT) # provides multinomial.test
nest <- A %>%
mutate(pattern = rep(1:2,each=3)) %>%
group_by(pattern) %>%
dplyr::summarize(mn_pvals = multinomial.test(count, prob)$p.value)
nest
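To show the split-then-extract pattern without the EMT dependency, the sketch below swaps in base R's chisq.test as a chi-square goodness-of-fit test. This is a different test from the exact multinomial test, so the p-values will not match. It uses the smaller counts from the dplyr answer.

```r
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
                prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
                count = c(43, 42, 9, 74, 82, 9))
nest <- split(A, rep(1:2, each = 3))

# Stand-in test: chi-square goodness of fit instead of EMT's multinomial.test.
# sapply returns one p-value per data frame, named after the split groups.
p_values <- sapply(nest, function(d) chisq.test(d$count, p = d$prob)$p.value)
p_values
```

Replacing the chisq.test call with multinomial.test(d$count, prob = d$prob)$p.value should give the EMT version of the same pattern, assuming the package is loaded.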

R: function cut

Here is my data:
>my.cut <- cut(my.variable, breaks = c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf), right = FALSE)
>levels(my.cut)
"[-Inf,-0.5)" "[-0.5,-0.25)" "[-0.25,-0.1)" "[-0.1,0)" "[0,0.02)" "[0.02,0.05)" "[0.05,0.15)" "[0.15,0.3)" "[0.3,0.5)" "[0.5,1)" "[1, Inf)"
Expected result:
>levels(my.cut)
"[-Inf,-0.5)" "[-0.5,-0.25)" "[-0.25,-0.1)" "[-0.1,0)" "0" "(0,0.02)" "[0.02,0.05)" "[0.05,0.15)" "[0.15,0.3)" "[0.3,0.5)" "[0.5,1)" "[1, Inf)"
In the expected result there is a separate level "0", which should contain only the cases where my.variable == 0. With my.cut as defined above there is no such level, because breaks can only define intervals. How can I do this?
Hope to get your answer soon! Thanks!
You could explicitly put each value into a group. This is more flexible, but also a lot more verbose.
One way of doing this could be to define a bespoke cut function and then apply it to every element of your vector.
my.variable <- rnorm(100)
bespoke_cut <- function(value){
if (value < -0.1) return('[-Inf, -0.1)')
if (value < 0) return('[-0.1, 0)')
if (value == 0) return('0')
return('(0, Inf]')
}
my.cut <- sapply(my.variable, bespoke_cut)
my.cut <- factor(my.cut)
I've only done a few of the groupings you wanted, but I think it should be apparent how to add extra groups.
I think the best you can hope for with 'cut' is to specify a really small range for 0, i.e.,
cps = c(-Inf, -0.1, 0-.Machine$double.eps, 0+.Machine$double.eps, 0.02, Inf)
bgroup = cut(c(-10, 10, 0,0), breaks = cps)
cat(deparse(levels(bgroup)), "\n") ## use this to edit the levels more easily
levels(bgroup) = c("(-Inf, -0.1]", "(-0.1,0)", "0", "(0,0.02]", "(0.02, Inf]")
table(bgroup)
Obviously, the display levels are not identical to those used to cut the data, but if you are okay with that window around 0, then the solution is to form the cuts with that value then change the labels.
You could do this: cut, rename the half-open bin that starts at 0, assign "0" where my.variable == 0, then rebuild the factor.
my.variable <- rnorm(100)
my.variable[sample(1:100,2)] <- 0
my.cut <- cut(my.variable, breaks = c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf), right = FALSE)
lvl <- levels(my.cut)
lvlR <- c(lvl[1:4],"0","(0,0.02)",lvl[6:11])
my.cut <- as.character(my.cut)
my.cut[my.cut == lvl[5]] <- "(0,0.02)" # rename the old "[0,0.02)" bin, otherwise those values become NA below
my.cut[my.variable == 0] <- "0"
my.cut <- factor(my.cut,levels=lvlR)
rm(lvl,lvlR)
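The same idea can be wrapped in a small reusable function. This is a sketch: cut_with_zero is a hypothetical name, and it assumes that 0 is one of the break points, so that the bin containing 0 starts exactly at 0.

```r
# Hypothetical helper: left-closed cut, with exact zeros moved to their own level "0"
cut_with_zero <- function(x, breaks) {
  f <- cut(x, breaks = breaks, right = FALSE)
  lvl <- levels(f)
  zero_lvl <- as.character(cut(0, breaks = breaks, right = FALSE))  # bin holding 0, e.g. "[0,0.02)"
  open_lvl <- sub("^\\[", "(", zero_lvl)                            # same bin with 0 excluded
  # Insert "0" just before the bin that used to contain it, then rename that bin
  new_lvl <- append(lvl, "0", after = match(zero_lvl, lvl) - 1)
  new_lvl[new_lvl == zero_lvl] <- open_lvl
  out <- as.character(f)
  out[out %in% zero_lvl] <- open_lvl
  out[which(x == 0)] <- "0"
  factor(out, levels = new_lvl)
}

breaks <- c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf)
res <- cut_with_zero(c(-0.3, 0, 0.01, 0.2, 2), breaks)
table(res)
```

The levels stay in their original order with "0" placed between the negative and positive bins, so tables and plots keep a sensible ordering.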
