Error: Unable to apply package function to each row in R

I am trying to apply the fm.Choquet function (Rfmtool package) to each row of my R data frame, without success. The function works like this:
# let x <- 0.6 (N = 1)
# and y <- c(0.3, 0.5). y always has 2^N elements (here, 2)
# env <- fm.Init(1). env is proportional to N
# fm.Choquet(0.6, c(0.3, 0.5),env) gives a single value output
I have this sample data frame:
set.seed(123456)
a <- qnorm(runif(30,min=pnorm(0),max=pnorm(1)))
b <- qnorm(runif(30,min=pnorm(0),max=pnorm(1)))
c <- qnorm(runif(30,min=pnorm(0),max=pnorm(1)))
df <- data.frame(a=a, b=b, c=c)
df$id <- seq_len(nrow(df))
I would like to apply the fm.Choquet function to each row of my df such that, for each row (or ID), a is read as x, while b and c are read as the y vector (N = 2), and add the function output as a new column. However, I am getting the dimension error "The environment mismatches the dimension to the fuzzy measure."
Here is my attempt.
df2 <- df %>% as_tibble() %>%
rowwise() %>%
mutate(ci = fm.Choquet(df$a,c(df[,2],df[,3]), env)) %>%
mutate(sum = rowSums(across(where(is.numeric)))) %>% # Also tried adding sum which works
as.matrix()
I am using dplyr::rowwise(), but I am open to looping or other suggestions. Can someone help me?
EDIT 1:
A related question was suggested as a possible solution to the above, but one of its suggestions, by(), still throws the same error:
by(df, seq_len(nrow(df)), function(row) fm.Choquet(df$a,c(df$b,df$c), env))

set.seed(123456)
a <- qnorm(runif(30, min = pnorm(0), max = pnorm(1)))
b <- qnorm(runif(30, min = pnorm(0), max = pnorm(1)))
c <- qnorm(runif(30, min = pnorm(0), max = pnorm(1)))
df <- data.frame(a = a, b = b, c = c)
df$id <- seq_len(nrow(df))
library(Rfmtool)
library(tidyverse)
env <- fm.Init(1)
map_dbl(
seq_len(nrow(df)),
~ {
row <- slice(df,.x)
fm.Choquet(
x = row$a,
v = c(row$b, row$c), env
)
}
)
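The attempts in the question likely fail because df$a (and df[,2], df[,3]) always refer to the whole 30-value column, even inside rowwise() or by(), so the function receives far more values than an environment created with fm.Init(1) expects. Here is a minimal base-R sketch of the one-row-at-a-time idea. Note that toy_agg is a hypothetical stand-in I made up so the example runs without Rfmtool. It is not the real fm.Choquet and only mimics the "one x, two v values" shape.

```r
# Hypothetical stand-in for fm.Choquet(): it only checks the argument
# lengths (one x, two values in v) and returns a single number.
toy_agg <- function(x, v) {
  stopifnot(length(x) == 1, length(v) == 2)
  max(x * v)
}

set.seed(123456)
df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

# mapply() walks the three columns in parallel, so each call sees one row.
df$ci <- mapply(function(x, y1, y2) toy_agg(x, c(y1, y2)), df$a, df$b, df$c)

# Passing whole columns, as in the question's attempt, trips the length check.
whole_cols <- tryCatch(
  toy_agg(df$a, c(df$b, df$c)),
  error = function(e) "length mismatch"
)
```

With the real package you would presumably swap toy_agg(x, c(y1, y2)) for fm.Choquet(x, c(y1, y2), env). If you prefer to stay with dplyr::rowwise(), referring to the bare column names (a, b, c) instead of df$a should have the same per-row effect.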


Using cur_column() with mutate(across(all_of())) - Recursive Error

My goal is to take a subset of columns (COL_NAMES) within a dataframe (LR_DATA) and apply a function (FUNCTION). The dataframe (LR_DATA) is mostly nested vectors except for one identifying column (var1). However, I cannot seem to correctly pass the override inputs which are nested under the current column name with the additional suffix "_OVERRIDE".
TRENDED_LR_DATA <- LR_DATA %>% mutate(across(all_of(COL_NAMES), ~list(FUNCTION(var1, unlist(var2), unlist(.x), unlist(!!sym(paste0(cur_column(), "_OVERRIDE")))))))
Specifically I get the error:
If I replace cur_column() with a hardcoded string the code works (though obviously not as intended, since it would reference the same override column for each specified column in COL_NAMES). Any tips this group has would be greatly appreciated - I'm relatively new to R so please bear with me ^_^.
EDIT: Below is code to reproduce the error in full. Sorry for not including this on the original question submission.
library(dplyr)
LR_DATA <- data.frame(STATE = c(1,2,3),
YEAR = c(2000,2001,2002),
DEVT_A = c(2,4,6),
DEVT_B = c(3,6,9),
DEVT_C = c(4,8,12))
LOSS_COLS <- c("DEVT_A", "DEVT_B", "DEVT_C")
DATA_OVERRIDE <- data.frame(STATE = c(1,2,3),
DEVT_A_OVERRIDE = c(NaN,1,1),
DEVT_B_OVERRIDE = c(1,1,1),
DEVT_C_OVERRIDE = c(1.5,1.5,1.5))
LR_DATA <- LR_DATA %>% left_join(DATA_OVERRIDE, by = 'STATE')
TRENDED_LR_DATA <- LR_DATA %>% summarise(across(everything(), list), .groups = "keep") %>%
mutate(across(all_of(LOSS_COLS), ~list(TREND_LOSS(unlist(.x), unlist(YEAR), unlist( !!sym(paste0(cur_column(), "_OVERRIDE"))) ))))
TREND_LOSS <-
function(LOSSES,
YEARS,
OVERRIDES) {
x <- YEARS
y = log(LOSSES)
xy = x * y
x_sq = x * x
sum_x <- sum(x)
sum_y <- sum(y)
sum_xy <- sum(xy)
sum_x_sq <- sum(x_sq)
n <- length(YEARS)
Slope <- (n*sum_xy - sum_x*sum_y) / (n*sum_x_sq - sum_x*sum_x)
OVERRIDES[is.na(OVERRIDES)] <- Slope
TRENDED_LOSSES <- LOSSES*exp(OVERRIDES)
return(TRENDED_LOSSES)
}
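A likely cause is that !! is evaluated as soon as mutate() captures the expression, which is before across() has set up its per-column context, so cur_column() ends up being called outside across(). One way around it is to drop !!sym() and look the override column up by name through the .data pronoun, which is only evaluated once across() is running. Below is a minimal sketch on made-up toy data. The names dat and cols and the simple multiplication are my own assumptions, not the original TREND_LOSS logic.

```r
library(dplyr)

# Toy data (hypothetical): two value columns plus matching *_OVERRIDE columns.
dat <- data.frame(
  A = c(1, 2, 3),
  B = c(10, 20, 30),
  A_OVERRIDE = c(2, 2, 2),
  B_OVERRIDE = c(0.5, 0.5, 0.5)
)
cols <- c("A", "B")

# .data[[...]] is resolved inside across(), where cur_column() is defined,
# so each column is paired with its own override column.
res <- dat %>%
  mutate(across(all_of(cols), ~ .x * .data[[paste0(cur_column(), "_OVERRIDE")]]))
```

The same lookup should slot into the nested-list version, for example unlist(.data[[paste0(cur_column(), "_OVERRIDE")]]) as the third argument. Also note that TREND_LOSS has to be defined before the pipeline that calls it is run.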

r: Use list of arguments in function

I wrote a function that calculates the odds ratio of two variables with its CIs and binds everything together with the n and the name of one of the variables. My example, including my function, looks like this:
library(DescTools)
a <- as.numeric(replicate(1, sample(c(0:1), 100, replace = TRUE)))
b <- as.numeric(replicate(1, sample(c(0:1), 100, replace = TRUE)))
c <- as.numeric(replicate(1, sample(c(0:1), 100, replace = TRUE)))
x <- as.data.frame(cbind(a, b, c))
orr <- function (var1, var2){
con <- table(var1, var2)
o <- OddsRatio(con, conf.level = 0.95)
n <- sum(con[2, 1:2])
name <- deparse(substitute(var2))
df <- data.frame(rbind(o), n, "ind.varname" = name)
return(df)
}
a<-orr(x$b,x$a)
b<-orr(x$c,x$a)
rbind(a,b)
Now, in my example, I want to pass a list of arguments to the function (arg2) to have the odds calculated for several variables with the first argument staying the same. Thus, the final output would preferably be a data.frame that looks like the following, just with more lines (see above):
Does anybody have any tips on how to do that? Thanks in advance.
You can try this function:
orr <- function (data, var1, var2){
val1 <- data[[var1]]
val2 <- data[[var2]]
con <- table(val1, val2)
o <- OddsRatio(con, conf.level = 0.95)
n <- sum(con[2, 1:2])
df <- data.frame(rbind(o), n, ind.varname = var2, row.names = NULL)
return(df)
}
do.call(rbind, lapply(names(x[-1]), orr, data = x, var1 = "a"))
# odds.ratio lwr.ci upr.ci n ind.varname
#1 1.739130 0.7706866 3.924519 61 b
#2 1.519481 0.6704328 3.443777 61 c
A little shorter with purrr::map_df:
purrr::map_df(names(x[-1]), orr, data = x, var1 = "a")
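The reason for passing column names as strings is that deparse(substitute(var2)) stops being useful once the function is called through lapply(), where it would return something like "X[[i]]" rather than the variable name. The sketch below shows the same lapply() plus rbind pattern without DescTools. cross_or is a hypothetical helper that only computes the cross-product ratio as a point estimate (no confidence interval), so treat it as an illustration of the pattern rather than a replacement for OddsRatio().

```r
set.seed(1)
x <- data.frame(
  a = sample(0:1, 100, replace = TRUE),
  b = sample(0:1, 100, replace = TRUE),
  c = sample(0:1, 100, replace = TRUE)
)

# Hypothetical helper: point estimate only, no confidence interval.
cross_or <- function(data, var1, var2) {
  con <- table(data[[var1]], data[[var2]])
  data.frame(
    odds.ratio = (con[1, 1] * con[2, 2]) / (con[1, 2] * con[2, 1]),
    n = sum(con[2, ]),
    ind.varname = var2
  )
}

# Each name in the vector is passed as var2; data and var1 stay fixed.
res <- do.call(rbind, lapply(c("b", "c"), cross_or, data = x, var1 = "a"))
```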

How to create new variable at the end of each loop iteration in R

I am trying to create a variable that is a function of 4 other variables. I have the following code:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
x_ <- vector(mode="numeric", length=1000)
assign(eval(paste0("X_", i)), globalenv()) #This is the issue
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X_[i] = (a + c*(z-df$zbar))/(-b)
}
I am unable to create a unique group-specific variable (e.g. X_A, X_B, ...) and I am unsure why the assign() function is not working properly. The data frame df1 has 6 rows (one for each group), and the number of columns is equal to the number of variables plus a string variable for group. I am not trying to append the new variables X_[i] to the dataset; I am just trying to place them in the global environment. I believe the issue lies in how I assign the variable, as it isn't generating a numeric variable X.
df1 is a dataframe with 6 observations of 9 variables containing a, sea, b, seb, c, sec, zbar, se_z. These are just the means and standard deviations of a, b, c, and z, respectively. The 9th variable is group which contains A, B, ..., F. When I use the code df <-df1[df1$group == i,] I am trying to create a unique X variable for each group entity.
Try something like this:
dynamicVariableName <- paste0("X_", i)
assign(dynamicVariableName, (a + c*(z-df$zbar))/(-b))
As an alternative to the answer from @ErrorJordan, you can write your loop like this:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for(i in group)
{
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X <- (a + c*(z-df$zbar))/(-b)
assign(paste0("X_",i),X,.GlobalEnv)
}
As suggested by @MrFlick, you can also store your data in a list. To do so, you can just modify your loop to get:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
X = vector("list",length(group))
names(X) = group
for(i in 1:length(group))
{
df <- df1[df1$group == group[i],]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X[[i]] <- (a + c*(z-df$zbar))/(-b)
}
df1 dataframe
df1 = data.frame(a = c(1:6),
b = c(1:6),
c = c(1:6),
zbar = c(1:6),
sea = rep(1,6),
seb = rep(1,6),
sec = rep(1,6),
se_z = rep(1,6),
group = group)
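For what it's worth, the list version can be written a bit more compactly with split() and lapply(), which names the list elements after the groups automatically. This is only a sketch under the same df1 layout as above, and simulate_group is a helper name I made up for illustration.

```r
set.seed(123)
iter <- 1000
group <- c("A", "B", "C", "D", "E", "F")
df1 <- data.frame(a = 1:6, b = 1:6, c = 1:6, zbar = 1:6,
                  sea = rep(1, 6), seb = rep(1, 6),
                  sec = rep(1, 6), se_z = rep(1, 6),
                  group = group)

# Hypothetical helper: draws `iter` values for one group (a one-row data frame).
simulate_group <- function(row) {
  a <- rnorm(iter, mean = row$a, sd = row$sea)
  b <- rnorm(iter, mean = row$b, sd = row$seb)
  c <- rnorm(iter, mean = row$c, sd = row$sec)
  z <- rnorm(iter, mean = row$zbar, sd = row$se_z)
  (a + c * (z - row$zbar)) / (-b)
}

# split() gives one single-row data frame per group, named A to F.
X <- lapply(split(df1, df1$group), simulate_group)
```

The results are then available as X$A, X[["B"]], and so on, which is usually easier to work with than separate X_A, X_B, ... objects created through assign().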
It's a little hard to parse what you want to do, but I'm assuming it's something like
for each value in group make an object (in the global env) called X_A, X_B, ...
for each one of those objects, assign it the value (a + c*(z-df$zbar))/(-b)
I think this should do that for you:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
assign(paste0("X_", i), (a + c*(z-df$zbar))/(-b), globalenv())
}
Note that in the code example you gave, the command x_ <- vector(mode="numeric", length=1000) has no effect. By that I mean, you make that object, but never subsequently use it in any further computation (iter, by contrast, is used as the number of draws in each rnorm() call). If that command should do something meaningful I'll need your help in explaining its intended purpose.

Calculations across dataframes

I have two lists of data frames: the data frames in the first list hold values that extend down a column, and those in the second list hold single values, like this:
dynamic_df_1 <- data.frame(x = 1:10)
dynamic_df_2 <- data.frame(y = 1:10)
df_list <- list(dynamic_df_1, dynamic_df_2)
df_list
static_df_1 <- data.frame(mu = 10,
stdev = 5)
static_df_2 <- data.frame(mu = 12,
stdev = 6)
static_df_list <- list(stat_df1 = static_df_1,
stat_df2 = static_df_2)
static_df_list
I would like to add a column to each dynamic data frame (dynamic_df_1 and dynamic_df_2) using the values from the matching static data frame: the calculation for dynamic_df_1 uses static_df_1, and the calculation for dynamic_df_2 uses static_df_2.
The result I'm aiming for is this:
df_list[[1]] <- df_list[[1]] %>%
mutate(z = dnorm(x = df_list[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))
df_list
df_list[[2]] <- df_list[[2]] %>%
mutate(z = dnorm(x = df_list[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))
df_list
I can take a loop approach which gets messy with more complex functions in my real code:
for (i in 1:length(df_list)) {
df_list[[i]]$z <- dnorm(x = df_list[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}
df_list
I'm trying to find an lapply / map / mutate type solution that calculates across dataframes - imagine a grid of dataframes where the objective is to calculate across rows. Also open to other solutions such as single df with nested values but haven't figured out how to do that yet.
Hope that is clear - I did my best!
Thanks!
This Map solution seems to be simpler. And the results are identical(). The code that creates df_list2 and df_list3 follows below.
df_list4 <- df_list
fun <- function(DF, Static_DF){
DF[["z"]] = dnorm(DF[[1]], mean = Static_DF[["mu"]], sd = Static_DF[["stdev"]])
DF
}
df_list4 <- Map(fun, df_list4, static_df_list)
identical(df_list2, df_list3)
#[1] TRUE
identical(df_list2, df_list4)
#[1] TRUE
Data.
After running the question's code that creates the initial df_list, run the dplyr pipe and for loop code:
df_list2 <- df_list
df_list2[[1]] <- df_list2[[1]] %>%
mutate(z = dnorm(x = df_list2[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))
df_list2[[2]] <- df_list2[[2]] %>%
mutate(z = dnorm(x = df_list2[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))
df_list3 <- df_list
for (i in 1:length(df_list3)) {
df_list3[[i]]$z <- dnorm(x = df_list3[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}

Two same type of dataframes perform differently in a function

Below is my data
set.seed(100)
toydata <- data.frame(A = sample(1:50,50,replace = T),
B = sample(1:50,50,replace = T),
C = sample(1:50,50,replace = T)
)
Below is my swapping function
derangement <- function(x){
if(max(table(x)) > length(x)/2) return(NA)
while(TRUE){
y <- sample(x)
if(all(y != x)) return(y)
}
}
swapFun <- function(x, n = 10){
inx <- which(x < n)
y <- derangement(x[inx])
if(length(y) == 1) return(NA)
x[inx] <- y
x
}
In the first case, I get the new data toy by swapping the entire data frame. The code is below:
toydata<-as.matrix(toydata)
toy<-swapFun(toydata)
toy<-as.data.frame(toy)
In the second case, I get the new data toy by swapping each column respectively. Below is the code:
toydata<-as.data.frame(toydata)
toy2 <- toydata # Work with a copy
toy2[] <- lapply(toydata, swapFun)
toy<-toy2
Below is the function that outputs the difference between the contingency tables before and after swapping.
# the function to compare contingency tables
f = function(x,y){
table1<-table(toydata[,x],toydata[,y])
table2<-table(toy[,x],toy[,y])
sum(abs(table1-table2))
}
# vectorise your function
f = Vectorize(f)
combn(x=names(toydata),
y=names(toydata), 2) %>%# create all combinations of your column names
t() %>% # transpose
data.frame(., stringsAsFactors = F) %>% # save as dataframe
filter(X1 != X2) %>% # exclude pairs of same
# column
mutate(SumAbs = f(X1,X2)) # apply function
In the second case, this mutate call works.
But in the first case, it does not work. It says:
+ filter(X1 != X2) %>% # exclude pairs of same column
+ mutate(SumAbs = f(X1,X2)) # apply function
Error in combn(x = names(toydata), y = names(toydata), 2) : n < m
However, in both cases toy is a data frame with the same dimensions, the same row names, and the same column names, so I am confused.
How can I fix it? Thanks.
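The difference is probably not in toy but in toydata. In the first case, toydata <- as.matrix(toydata) leaves toydata as a matrix, and names() of a matrix is NULL, so combn() is asked to choose 2 items from an empty set and stops with "n < m". In the second case toydata is converted back to a data frame, so names() returns the column names. A small sketch with made-up data (toy_df and toy_mat are illustrative names, not from the question):

```r
toy_df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
toy_mat <- as.matrix(toy_df)

names(toy_df)      # column names of the data frame
names(toy_mat)     # NULL: a matrix has dimnames, not names
colnames(toy_mat)  # works for both matrices and data frames

# combn() on NULL has nothing to choose from, hence the error message.
err <- tryCatch(combn(names(toy_mat), 2), error = function(e) conditionMessage(e))

# Using colnames() gives the column pairs in either case.
pairs <- t(combn(colnames(toy_mat), 2))
```

So either use colnames(toydata) in the combn() call, or convert toydata back with as.data.frame() before building the combinations.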
