Apply multiple formulas to create a column in a dataframe - R

I tried to create a new column in a dataframe using equations that take their inputs from other columns of the dataframe. The equation I want to apply also differs slightly depending on another column. Here is the dummy dataframe:
set.seed(123)
df <-
data.frame(
N = (c(0,0,rep(10,18))),
I0 = runif(20, 0,10),
Dt = c(1:20),
Isolator = rep(1:10,each=2)
)
I want to create a new column named Pcol using the equation 1-exp(-(x)*((df$I0/df$N)*df$Dt)),
where x changes based on Isolator. I managed to create the column Pcol using ifelse() and mutate() based on Isolator, but the input does not seem to be taken from the same row. To illustrate:
df1<-mutate(df, Pcol = ifelse(Isolator %in% 1:4, 1-exp(-(0.5)*((df$I0/df$N)*df$Dt)),
  ifelse(Isolator %in% 5:7, 1-exp(-(0.7)*((df$I0/df$N)*df$Dt)),
    ifelse(Isolator %in% 8:10, 1-exp(-(0.9)*((df$I0/df$N)*df$Dt)), NA))))
I also calculated Pcol separately by subsetting the dataframe based on Isolator:
col1<- df %>% filter(Isolator <= 4)
col2<- df %>% filter(Isolator >= 5 & Isolator < 8)
col3<- df %>% filter(Isolator >=9 )
Pcol1<-1-exp(-(0.5)*((col1$I0/col1$N)*col1$Dt))
Pcol2<-1-exp(-(0.7)*((col2$I0/col2$N)*col2$Dt))
Pcol3<-1-exp(-(0.9)*((col3$I0/col3$N)*col3$Dt))
The Pcol in the dataframe differs from the Pcol calculated from the subset groups. I think ifelse() takes its input from the wrong rows when it calculates Pcol, but I don't know how to fix it, or maybe there is a simpler way to apply equations to a dataframe.
Please help! Thank you.

The apply family of functions is normally used for iterative calculations or for manipulating lists, whereas this task only needs vectorised column arithmetic.
Will this do?
df %>% mutate(Pcol = case_when(Isolator %in% 1:4 ~ 0.5,
Isolator %in% 5:7 ~ 0.7,
TRUE ~ 0.9),
Pcol = 1-exp(-(Pcol)*((I0/N)*Dt)))
N I0 Dt Isolator Pcol
1 0 2.8757752 1 1 1.0000000
2 0 7.8830514 2 1 1.0000000
3 10 4.0897692 3 2 0.4585288
4 10 8.8301740 4 2 0.8289903
5 10 9.4046728 5 3 0.9047422
6 10 0.4555650 6 3 0.1277415
7 10 5.2810549 7 4 0.8425062
8 10 8.9241904 8 4 0.9718350
9 10 5.5143501 9 5 0.9690084
10 10 4.5661474 10 5 0.9590868
11 10 9.5683335 11 6 0.9993686
12 10 4.5333416 12 6 0.9778076
13 10 6.7757064 13 7 0.9979002
14 10 5.7263340 14 7 0.9963455
15 10 1.0292468 15 8 0.7507959
16 10 8.9982497 16 8 0.9999976
17 10 2.4608773 17 9 0.9768357
18 10 0.4205953 18 9 0.4940738
19 10 3.2792072 19 10 0.9963296
20 10 9.5450365 20 10 1.0000000
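Your manual calculations (using subsets), with the last filter changed from Isolator >= 9 to Isolator >= 8 so that the rows for Isolator 8 are not dropped: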
col1<- df %>% filter(Isolator <= 4)
col2<- df %>% filter(Isolator >= 5 & Isolator < 8)
col3<- df %>% filter(Isolator >=8 )
Pcol1 <- 1-exp(-(0.5)*((col1$I0/col1$N)*col1$Dt))
Pcol2 <- 1-exp(-(0.7)*((col2$I0/col2$N)*col2$Dt))
Pcol3 <- 1-exp(-(0.9)*((col3$I0/col3$N)*col3$Dt))
Pcol2
Pcol <- c(Pcol1, Pcol2, Pcol3)
df %>% mutate(Pcol = case_when(Isolator %in% 1:4 ~ 0.5,
Isolator %in% 5:7 ~ 0.7,
TRUE ~ 0.9),
Pcol = 1-exp(-(Pcol)*((I0/N)*Dt))) -> df1
Pcol - df1$Pcol
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
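As a further illustration of the same idea, here is a minimal base R sketch that looks the coefficient up by position instead of using case_when(). The vector name rate is hypothetical, and the sketch assumes Isolator only takes the integer values 1 to 10, as in the dummy data.

```r
set.seed(123)
df <- data.frame(
  N = c(0, 0, rep(10, 18)),
  I0 = runif(20, 0, 10),
  Dt = 1:20,
  Isolator = rep(1:10, each = 2)
)

# Hypothetical lookup vector: element k holds the coefficient x for Isolator k
# (0.5 for isolators 1-4, 0.7 for 5-7, 0.9 for 8-10, as in the question)
rate <- c(rep(0.5, 4), rep(0.7, 3), rep(0.9, 3))

# Indexing rate by Isolator yields one coefficient per row, so every input
# in the formula comes from the same row
df$Pcol <- 1 - exp(-rate[df$Isolator] * (df$I0 / df$N) * df$Dt)
df
```

Because every term is a full-length vector, the arithmetic stays row-aligned without any nested ifelse() calls.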

Related

Creating a function to run a conditional Sum in R

I have a dataframe like this:
dat<- data.frame (
'Ones'=c(0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,3,4,5,6,7,4,3,4,5))
I have to create a function (gap1) that detects each 1 in Ones and then sums rows n-1, n and n+1 of Thats, where n is the row containing the 1.
For example, in this dataset I have two 1s.
dat<- data.frame (
'Ones'=c(0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,3,4,5,6,7,4,3,4,5))
dat
This should be the output:
Ones Thats gap1
1 4 17 #(8+4+5)
1 1 7 #(3+1+3)
I would like to extend this gap at will, for example:
Ones Thats gap1 gap2 gap3 ...
1 4 17 29 #(6+8+4+5+6)
1 1 7 19 #(8+3+1+3+4)
There is another problem I have to consider:
Suppose we have this data frame:
dat<- data.frame (
'Ones'=c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))
In case there is a 1 at the beginning (or at the end), or if there is an NA, the function should use available data.
In this case, for example:
Ones Thats gap1            gap2
1    0     5  #(0+5)       8  #(0+5+3)
1    4     17 #(8+4+5)     29 #(6+8+4+5+6)
1    1     4  #(3+1+NA)    16 #(8+3+1+NA+4)
Do you have any advice?
Using tidyverse / collapse
For arbitrary number of lead and lags the collapse package offers a nice function flag, which has further arguments to specify columns (cols), or grouping variables g.
library(dplyr)
f <- function(df, n){
df %>%
collapse::flag(-n:n) %>%
transmute(Ones, Thats, gap = rowSums(., na.rm = T) - 1) %>%
filter(Ones == 1)
}
x <- data.frame (
'Ones'=c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))
# we can now specify how many lags to count:
f(x, 1)
Ones Thats gap
1 1 0 5
2 1 4 17
3 1 1 4
f(x, 2)
Ones Thats gap
1 1 0 8
2 1 4 29
3 1 1 16
Or if you want to specify the number of gaps to compute, we can simplify the function to
f <- function(df, n){
df %>%
collapse::flag(-n:n) %>%
rowSums(na.rm = T) - 1
}
x %>%
mutate(gap1 = f(., 1),
gap2 = f(., 2)) %>%
filter(Ones == 1)
Ones Thats gap1 gap2
1 1 0 5 8
2 1 4 17 29
3 1 1 4 16
Terse version
If you like terse functions (this still relies on collapse::flag, and the \(df, n) lambda syntax needs R 4.1 or later):
f <- Vectorize(\(df, n) rowSums(collapse::flag(df, -n:n), na.rm = T) - 1, "n")
x[paste0("gap", 1:2)] <- f(x, 1:2) ; subset(x, Ones == 1)
Ones Thats gap1 gap2
1 1 0 5 8
6 1 4 17 29
11 1 1 4 16
With BaseR,
myfun <- function(data,gap=1) {
points <- which(data["Ones"]==1)
sapply(points, function(x) {
bottom <- ifelse(x-gap<=0,1,x -gap)
top <- ifelse(x+ gap > nrow(data),nrow(data),x +gap)
sum(data[bottom:top,"Thats"], na.rm=T)
})
}
#> myfun(dat,1)
#[1] 5 17 4
#> myfun(dat,2)
#[1] 8 29 16
Another base R solution
f <- function(dat, width = 1)
{
  dat$gaps <- sapply(seq(nrow(dat)), function(x) {
    if(dat$Ones[x] == 0) return(0)
    i <- x + seq(2 * width + 1) - (width + 1)
    i <- i[i > 0]
    i <- i[i <= nrow(dat)]  # keep the last row when the window reaches it
    sum(dat$Thats[i], na.rm = TRUE)  # ignore NA, as the question asks
  })
  dat[dat$Ones == 1,]
}
f(dat, 1)
#> Ones Thats gaps
#> 6 1 4 17
#> 11 1 1 7
f(dat, 2)
#> Ones Thats gaps
#> 6 1 4 29
#> 11 1 1 19
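The question also asks for several gap columns at once. The following is a minimal base R sketch of that, assuming the second (NA-containing) data frame from the question; the helper name window_sum and the variables hits and out are hypothetical.

```r
dat <- data.frame(
  Ones  = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
  Thats = c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5)
)

# Sum values over rows (i - width):(i + width) for each centre i,
# clamping the window to the available rows and ignoring NA
window_sum <- function(values, centers, width) {
  vapply(centers, function(i) {
    lo <- max(1, i - width)
    hi <- min(length(values), i + width)
    sum(values[lo:hi], na.rm = TRUE)
  }, numeric(1))
}

hits <- which(dat$Ones == 1)   # rows that contain a 1
out <- dat[hits, ]
for (w in 1:3) {
  out[[paste0("gap", w)]] <- window_sum(dat$Thats, hits, w)
}
out   # gap1 and gap2 reproduce the values requested in the question
```

Changing 1:3 in the loop extends the gap as far as needed; the window only ever reads Thats, so a second 1 inside the window does not affect the sum.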

How to get the sum of rows using a vector and the make the result in a column

I have a dataframe and I want to calculate, for every row, the sum of the variables listed in a vector and store that sum in a new variable. I also want the name of the new variable to be built from the names of the variables in the vector.
for example
data
Name A_12 B_12 C_12 D_12 E_12
r1 1 5 12 21 15
r2 2 4 7 10 9
r3 5 15 16 9 6
r4 7 8 0 7 18
let's say i have two vectors
vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")
The result i want is :
New_data >
Name A_12 B_12 C_12 ABC_12 D_12 E_12 BCDE_12
r1 1 5 12 18 21 15 54
r2 2 4 7 13 10 9 32
r3 5 15 16 36 9 6 45
r4 7 8 0 15 7 18 40
I wrote a for loop to get the row sums for a vector, but I didn't get the correct result.
Please tell me if you need any more information or clarification.
Thank you
You can use rowSums and simple column-subsetting:
dat$ABC_12 <- rowSums(dat[,vector_1])
dat$BCDE_12 <- rowSums(dat[,vector_2])
dat
# Name A_12 B_12 C_12 D_12 E_12 ABC_12 BCDE_12
# 1 r1 1 5 12 21 15 18 53
# 2 r2 2 4 7 10 9 13 30
# 3 r3 5 15 16 9 6 36 46
# 4 r4 7 8 0 7 18 15 33
Note that if your frames inherit from data.table, then you'll need to use either subset(dat, select=vector_1) or dat[,..vector_1] instead of simply dat[,vector_1]; if you aren't already using data.table, then you can safely ignore this paragraph.
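Like this (using dplyr/tidyverse)
library(dplyr)
df %>%
  rowwise() %>%
  mutate(
    # all_of() makes it explicit that vector_1 / vector_2 hold column names
    ABC_12 = sum(c_across(all_of(vector_1))),
    BCDE_12 = sum(c_across(all_of(vector_2)))
  )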
Though I'm not sure the sums are correct in your example
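To check that doubt, here is a small base R sketch that recomputes the BCDE_12 sums from the example data and compares them with the values shown in the question. The variable names expected_in_question and actual are hypothetical.

```r
data <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A_12 = c(1, 2, 5, 7), B_12 = c(5, 4, 15, 8), C_12 = c(12, 7, 16, 0),
  D_12 = c(21, 10, 9, 7), E_12 = c(15, 9, 6, 18)
)
vector_2 <- c("B_12", "C_12", "D_12", "E_12")

# BCDE_12 values as printed in the question's desired result
expected_in_question <- c(54, 32, 45, 40)

# Row sums actually implied by the data
actual <- unname(rowSums(data[, vector_2]))
actual                          # 53 30 46 33
actual - expected_in_question   # -1 -2 1 -7
```

The recomputed sums agree with the rowSums output in the first answer, so the BCDE_12 column in the question's desired result appears to contain arithmetic slips.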
-=-=-=EDIT-=-=-=-
Here's a function to help with the naming.
ex_fun <- function(vec, n_len){
paste0(paste(substr(vec,1,n_len), collapse = ""), substr(vec[1],n_len+1,nchar(vec[1])))
}
Which can then be implemented like so.
df %>%
  rowwise() %>%
  mutate(
    !!ex_fun(vector_1, 1) := sum(c_across(all_of(vector_1))),
    !!ex_fun(vector_2, 1) := sum(c_across(all_of(vector_2)))
  )
-=-= Extra note -=--=
If you put your vectors in a list, you could combine this with r2evans's answer and use a loop if you prefer.
vectors = list(vector_1, vector_2)
for (v in vectors){
df[ex_fun(v, 1)] <- rowSums(df[,v])
}
I believe this might work, so long as only the starting digits are different:
library("tidyverse")
#Input dataframe.
data <- data.frame(Name =c("r1", "r2", "r3", "r4"), A_12 = c(1, 2, 5, 7), B_12 = c(5, 4, 15, 8),
C_12 = c(12, 7, 16, 0), D_12 = c(21, 10, 9, 7), E_12 = c(15, 9, 6, 18))
#add all vectors to the "vectors" list. I have added vector_1 and vector_2, but
#there can be as many vectors as needed, they just need to be put in the list.
vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")
vector_list<-list(vector_1, vector_2)
vector_sum <- function(data, vector_list){
output <- data |>
dplyr::select(1, all_of(vector_list[[1]]))
for (i in vector_list) {
name1 <- substring(as.character(i), 1,1) |> paste(collapse = '')
name2 <- substring(as.character(i[1]), 2)
input_temp <- dplyr::select(data, all_of(i))
input_temp <- mutate(input_temp, temp=rowSums(input_temp))
names(input_temp)[names(input_temp) == "temp"] <- paste0(name1, name2)  # paste0 avoids a space inside the new name
output = cbind(output, input_temp)
}
output[, !duplicated(colnames(output))]
}
vector_sum(data, vector_list)

Replace values in one dataframe with another thats not NA

I have two dataframes, A and B, that have the same column names and the same first column (Location):
A <- data.frame("Location" = 1:3, "X" = c(21,15, 7), "Y" = c(41,5, 5), "Z" = c(12,103, 88))
B <- data.frame("Location" = 1:3, "X" = c(NA,NA, 14), "Y" = c(50,8, NA), "Z" = c(NA,14, 12))
How do I replace the values in dataframe A with the values from B wherever the value in B is not NA?
Thanks.
We can use coalesce
library(dplyr)
A %>%
mutate(across(-Location, ~ coalesce(B[[cur_column()]], .)))
-output
# Location X Y Z
#1 1 21 50 12
#2 2 15 8 14
#3 3 14 5 12
Here's an answer in base R:
i <- which(!is.na(B),arr.ind = T)
A[i] <- B[i]
A
Location X Y Z
1 1 21 50 12
2 2 15 8 14
3 3 14 5 12
One option with fcoalesce from the data.table package
list2DF(Map(data.table::fcoalesce,B,A))
gives
Location X Y Z
1 1 21 50 12
2 2 15 8 14
3 3 14 5 12

How to compare the sign of two columns?

I have a dataframe with two columns. I want to compare the signs of each element in the column and see when it differs. It is easier to see with an example.
This is the dataframe:
df = data.frame(COL1 = rnorm(15, 0, 1), COL2 = rnorm(15, 0, 1))
COL1 COL2
1 0.01274137 -0.97966119
2 -0.48455106 1.19248167
3 -0.79149435 -1.45365392
4 -0.18961660 0.02216361
5 -0.34771000 1.39026672
6 0.28199427 0.49143945
7 -0.28650800 -0.71676355
8 -0.29677529 1.13092654
9 -0.24240084 0.99432286
10 2.13540200 0.66348347
11 1.94442199 0.53371032
12 -1.63108069 -0.21556863
13 0.38334186 -0.91472900
14 1.15981803 -0.54540520
15 1.04363634 -1.68835445
I would like to have a code that compares the signs of COL1 and COL2 and tells me when it differs. The outcome should be:
# rows where the sign differs: 1, 2, 4, 5, 8, 9, 13, 14, 15
Can anyone help me with this?
Thanks
You can retrieve sign of each element with sign, and which retrieves the index of the inequalities
which(sign(df$COL1) != sign(df$COL2))
Edit: Warning, all three current answers (including the one above) silently skip rows that contain NA values.
set.seed(4)
df2 = data.frame(COL1 = rnorm(15, 0, 1), COL2 = rnorm(15, 0, 1))
df2[1, 1] <- NA
COL1 COL2
1 NA 0.1690268
2 -0.54249257 1.1650268
3 0.89114465 -0.0442040
4 0.59598058 -0.1003684
5 1.63561800 -0.2834446
6 0.68927544 1.5408150
7 -1.28124663 0.1651690
8 -0.21314452 1.3076224
9 1.89653987 1.2882569
10 1.77686321 0.5928969
11 0.56660450 -0.2829437
12 0.01571945 1.2558840
13 0.38305734 0.9098392
14 -0.04513712 -0.9280281
15 0.03435191 1.2401808
which(sign(df2$COL1) != sign(df2$COL2))
[1] 2 3 4 5 7 8 11
which(sign(df2[,1] * df2[,2]) == -1)
[1] 2 3 4 5 7 8 11
which(df2$COL1 < 0 & df2$COL2 > 0 | df2$COL1 > 0 & df2$COL2 < 0)
[1] 2 3 4 5 7 8 11
Here is a solution that also works when there are NA values: it tests for equality and retrieves the indices where the result is not TRUE, using !(...) %in% TRUE instead of != TRUE.
which(!(sign(df2$COL1) == sign(df2$COL2)) %in% TRUE)
[1] 1 2 3 4 5 7 8 11
Compare output of
! NA %in% TRUE
[1] TRUE
NA != TRUE
[1] NA
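The difference between the two tests can be seen on a tiny example. This is a minimal sketch with made-up vectors (col1, col2 and same_sign are hypothetical names), not the data from the question.

```r
col1 <- c(NA, -1, 2, 3)
col2 <- c(1, 1, -2, 3)

# Comparing signs propagates NA: the first element is NA, not FALSE
same_sign <- sign(col1) == sign(col2)   # NA FALSE FALSE TRUE

# != keeps the NA, and which() silently drops it
dropped_na <- which(same_sign != TRUE)      # 2 3

# %in% never returns NA, so the NA row is reported as "not equal"
kept_na <- which(!(same_sign %in% TRUE))    # 1 2 3
```

Whether a row with a missing value should count as a sign difference is a judgment call; the %in% form simply makes that row visible instead of dropping it.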
How about multiplying the columns together and getting the sign with sign?
which(sign(df[,1] * df[,2]) == -1)
[1] 1 2 4 5 8 9 13 14 15
You can simply apply logical tests that check whether each column is below or above zero.
library(dplyr)
df %>%
filter(COL1 < 0 & COL2 > 0 | COL1 > 0 & COL2 < 0)
The index of rows can be obtained using which
which(df$COL1 < 0 & df$COL2 > 0 | df$COL1 > 0 & df$COL2 < 0)

mutate based on conditional sum in a group

Say I have a dataframe like this:
library(dplyr)
set.seed(1)
n <- 20
df <- data.frame(ID = sample(1:5, n, replace = TRUE),
Fac1 = sample(letters[1:5], n, replace = TRUE),
Fac2 = sample(LETTERS[10:15], n, replace = TRUE),
Val1 = sample(1:10, n, replace = TRUE)) %>%
arrange(ID) %>% group_by(ID,Fac1) %>%
summarise(Val1 = sum(Val1),Fac2 = first(Fac2)) %>%
group_by(ID,Fac2) %>%
mutate(Val2 = sum(Val1))
df
ID Fac1 Val1 Fac2 Val2
1 1 b 9 N 9
2 1 c 9 O 9
3 2 a 4 K 4
4 2 b 10 M 18
5 2 c 4 L 4
6 2 d 8 M 18
7 2 e 10 N 10
8 3 d 14 N 14
9 4 b 8 L 22
10 4 c 14 L 22
11 4 d 9 K 9
12 4 e 6 N 6
13 5 a 13 M 13
14 5 b 3 N 3
ID is a grouping variable. Rows with a Fac1 value of e should have their Fac2 value changed to match that of another row in the same group where Fac1 is either b or c, provided the sum of Val2 for the two rows is greater than 20. (I've simplified this to the point where you probably don't get why, but just work with me.)
This is what I have tried so far:
result <- df %>% group_by(ID) %>%
mutate(Fac2 = case_when(
Fac1 == "e" &
sum(Val2,ifelse(Fac1 %in% c("b","c"), Val2, 0)) > 20 ~
ifelse(sum(Val2,ifelse(Fac1 %in% c("b","c"),Val2,0)) > 20,
as.character(Fac2),
NA_character_),
TRUE ~ as.character(Fac2)
))
It doesn't work properly because it is summing the first value of Val2 in the group rather than only doing so when Fac1 is b or c.
Any ideas?
Adding desired outcome:
ID Fac1 Val1 Fac2 Val2
1 1 b 9 N 9
2 1 c 9 O 9
3 2 a 4 K 4
4 2 b 10 M 18
5 2 c 4 L 4
6 2 d 8 M 18
7 2 e 10 M 10 **Changed to M b/c row 4 is M and 10 + 18 > 20
8 3 d 14 N 14
9 4 b 8 L 22
10 4 c 14 L 22
11 4 d 9 K 9
12 4 e 6 L 6 **Changed to L b/c row 10 is L and 6 + 22 > 20
13 5 a 13 M 13
14 5 b 3 N 3
I'm having a hard time following what you are wanting the values to be changed to.
But when I have multiple conditions or decisions that need to be made in a sequence, I use a loop and a series of if statements to go through the data frame. I prefer while loops, so that's what I'll use in the example.
counter <- 1
stopper <- nrow(df)
while (counter <= stopper) {
  fac1 <- df$Fac1[counter]
  if (fac1 == 'e') {
    if ([INSERT NEXT CONDITION]) {
      # Change whichever value you're trying to change, using counter to reference the correct row.
    } else {
      # Change whichever value you're trying to change, using counter to reference the correct row.
    }
  }
  counter <- counter + 1
}
For me, simplifying the code makes it a lot easier for me to keep track of what decisions are being made. It also allows for complex decisions that are difficult to get functions to work with.
I was able to get the desired result with this code. I made a new column containing the result of the test for what value to replace Fac2 with, which wasn't entirely necessary but makes the code more readable and debuggable.
The key thing was to use first(na.omit()) to get the value from a different row in the same group which met the condition.
result <- df %>% group_by(ID) %>%
mutate(Max_bc_Val = ifelse(Val2 == max(ifelse(Fac1 %in% c("b","c"),
Val2,0)),
ifelse(Fac1 %in% c("b","c"),
as.character(Fac2),NA),NA)) %>%
mutate(Fac2 = case_when(
Fac1 == "e" ~ ifelse(is.na(first(na.omit(Max_bc_Val))),
NA_character_,
first(na.omit(Max_bc_Val))),
TRUE ~ as.character(Fac2)))
This works but doesn't seem like the best solution. Any other ideas?
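One other idea, offered only as a sketch: a plain base R loop over the e rows that applies the "combined Val2 greater than 20" test explicitly. It assumes the data frame printed in the question (retyped here by hand), and it assumes that when several b/c rows qualify, the one with the largest Val2 (first on ties) should win; the names result, cand and best are hypothetical.

```r
# The data frame as printed in the question
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5),
  Fac1 = c("b", "c", "a", "b", "c", "d", "e", "d", "b", "c", "d", "e", "a", "b"),
  Val1 = c(9, 9, 4, 10, 4, 8, 10, 14, 8, 14, 9, 6, 13, 3),
  Fac2 = c("N", "O", "K", "M", "L", "M", "N", "N", "L", "L", "K", "N", "M", "N"),
  Val2 = c(9, 9, 4, 18, 4, 18, 10, 14, 22, 22, 9, 6, 13, 3)
)

result <- df
for (i in which(df$Fac1 == "e")) {
  # Candidate rows: same ID, Fac1 is b or c, and combined Val2 exceeds 20
  cand <- which(df$ID == df$ID[i] &
                df$Fac1 %in% c("b", "c") &
                df$Val2 + df$Val2[i] > 20)
  if (length(cand) > 0) {
    # Assumed tie-break: take the candidate with the largest Val2
    best <- cand[which.max(df$Val2[cand])]
    result$Fac2[i] <- df$Fac2[best]
  }
}
result
```

On this data, row 7 becomes M and row 12 becomes L, matching the desired outcome, while e rows without a qualifying b/c partner keep their original Fac2.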
