I'm trying to write a function with nested if-else in R. How can I convert the values in some columns of a data.frame according to the conditions below?
Input
df <- read.table(header = TRUE, text="Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 0.7000 0 0.1032
1 115900000 116260000 B8430 0.0039 2.7202 2.7202
1 173500000 173680000 C5 -1.7738 -2.0746 -0.2722")
Conditions:
x > 0 & x < 1 : 1
x >= 1 : 2
x < 0 & x > -1 : -1
x <= -1 : -2
x == 0 : 0
Expected output:
df <- read.table(header = TRUE, text="Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 1 0 1
1 115900000 116260000 B8430 1 2 2
1 173500000 173680000 C5 -2 -2 -1")
My attempt:
fun_cond <- function(x) { ifelse( x >= 1, 2,ifelse( x > 0 & x < 1, 1),ifelse( x <= 1, 2,ifelse( x < 0 & x > -1, -1)))}
new_df[5:length(df)] <- lapply(new_df[5:length(df)], fun_cond)
I think what you want is this:
fun_cond <- function(x){
  ifelse(x >= 1, 2, ifelse(x < 1 & x > 0, 1, ifelse(x < 0 & x > -1, -1, -2)))
}
x = c(-1, 1, 0, 0, 1, -1, 0.5, 0.3, -0.4)
fun_cond(x)
#[1] -2 2 -2 -2 2 -2 1 1 -1
Try it out.
Note that x == 0 gives -2 here, because none of the conditions above covers zero.
If you want 0 to stay 0, use:
fun_cond <- function(x){
  ifelse(x >= 1, 2, ifelse(x < 1 & x > 0, 1, ifelse(x == 0, 0, ifelse(x < 0 & x > -1, -1, -2))))
}
x = c(-1, 1, 0, 0, 1, -1, 0.5, 0.3, -0.4)
fun_cond(x)
#[1] -2 2 0 0 2 -2 1 1 -1
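Try cut in base R:
cols <- grep("seg.mean", names(df))
# cut() returns factors, so convert the labels back to numbers
res <- sapply(cols, function(i)
  as.numeric(as.character(cut(df[,i], breaks = c(-Inf, -1, 0, 1, Inf), labels = c(-2,-1,1,2)))))
# to leave zeros untouched
res[df[cols]==0] <- 0
If you want to get your expected output:
df[cols] <- res
# Chr start end num seg.mean seg.mean.1 seg.mean.2
# 1 1 68580000 68640000 A8430 1 0 1
# 2 1 115900000 116260000 B8430 1 2 2
# 3 1 173500000 173680000 C5 -2 -2 -1
Note that cut() uses right-closed intervals by default, so a value of exactly 1 falls into (0, 1] and becomes 1 rather than 2. No value in this example is exactly 1.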
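For comparison, here is a vectorised sketch that avoids both nested ifelse() and cut(). It is not from the answers above. fun_sign is a hypothetical name, and the sketch assumes exactly the five rules in the question, including x == 0 giving 0. sign() supplies the direction, and the magnitude is doubled whenever abs(x) >= 1. As a result, the boundaries at 1 and -1 land where the question puts them (2 and -2):

```r
# Hypothetical helper implementing the question's rules:
# x >= 1 -> 2, 0 < x < 1 -> 1, x == 0 -> 0, -1 < x < 0 -> -1, x <= -1 -> -2
fun_sign <- function(x) {
  # sign() gives -1, 0 or 1; (abs(x) >= 1) adds 1 to the magnitude when |x| >= 1
  sign(x) * (1 + (abs(x) >= 1))
}

df <- read.table(header = TRUE, text = "Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 0.7000 0 0.1032
1 115900000 116260000 B8430 0.0039 2.7202 2.7202
1 173500000 173680000 C5 -1.7738 -2.0746 -0.2722")

# apply to every seg.mean column
cols <- grep("seg.mean", names(df))
df[cols] <- lapply(df[cols], fun_sign)
df
```

Because there is no branching, zero needs no special case: sign(0) is 0, so the product is 0.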
For the example data df, I want to replace the negative values in the first column (x1) with 0 and those in the third column (x3) with NA, using a function replace_negatives as follows:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df
Out:
x1 x2 x3
1 -3 -1 -2
2 -2 -1 -1
3 -1 -1 0
4 0 -1 1
5 1 -1 2
Please note that I do not index by column names because there are many columns in the actual data and the column names are not fixed.
replace_negatives <- function(data){
df <<- data %>%
mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
return(df)
}
lapply(df, replace_negatives)
But it raises an error:
> replace_negatives <- function(data){
+ df <<- data %>%
+ mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
Error: unexpected '=' in:
" df <<- data %>%
mutate(.[[1]] ="
> mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
Error: unexpected '=' in " mutate(.[[3]] ="
> return(df)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
Any help would be appreciated.
The expected output:
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
To perform the required operation, here's a base R method:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df[[1]] <- ifelse(df[[1]] < 0, 0, df[[1]])
df[[3]] <- ifelse(df[[3]] < 0, NA, df[[3]])
df
#> x1 x2 x3
#> 1 0 -1 NA
#> 2 0 -1 NA
#> 3 0 -1 0
#> 4 0 -1 1
#> 5 1 -1 2
Created on 2022-04-18 by the reprex package (v2.0.1)
You could use across in the function:
library(tidyverse)
replace_negatives <- function(data){
df <- data %>%
mutate(across(1, ~ ifelse(. < 0, 0, .)),
across(3, ~ ifelse(. < 0, NA, .)))
return(df)
}
replace_negatives(df)
Output
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
Here is a base R version of your function:
replace_negatives <- function(df){
  # negatives in column 1 become 0
  df[,1][df[,1] < 0] <- 0
  # negatives in column 3 become NA
  is.na(df[,3]) <- df[,3] < 0
  return(df)
}
replace_negatives(df)
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
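The question says the real column names are not fixed, so a version that takes the column positions as arguments may be easier to reuse. This is a sketch, not from the answers above. replace_negatives_at, zero_cols and na_cols are hypothetical names, and the defaults assume the layout of the example (column 1 to 0, column 3 to NA):

```r
# Hypothetical generalisation: column positions are arguments, so no names are needed
replace_negatives_at <- function(data, zero_cols = 1, na_cols = 3) {
  for (j in zero_cols) data[[j]][data[[j]] < 0] <- 0   # negatives -> 0
  for (j in na_cols)   data[[j]][data[[j]] < 0] <- NA  # negatives -> NA
  data
}

df <- data.frame(x1 = -3:1, x2 = -1, x3 = -2:2)
out <- replace_negatives_at(df)
out
```

Passing vectors, for example zero_cols = c(1, 4), handles several columns in one call.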
My objective is to compute a cumulative sum of the elements of a vector and assign the result to each element. However, when a certain condition is reached, the cumulative sum should reset.
For example:
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
Now, suppose that the condition to reset the cumulative sum is that the next element has a different sign.
Then the desired output is:
vector_B <- c(1, 2, -1, -2, -3, 1, -1, -2, 1, -1)
How can I achieve this?
A base R option with Reduce
> Reduce(function(x, y) ifelse(x * y > 0, x + y, y), vector_A, accumulate = TRUE)
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
or using ave + cumsum
> ave(vector_A, cumsum(c(1, diff(sign(vector_A)) != 0)), FUN = cumsum)
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
Using ave:
ave(vector_A, data.table::rleid(sign(vector_A)), FUN = cumsum)
# [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
A formula version of accumulate:
purrr::accumulate(vector_A, ~ ifelse(sign(.x) == sign(.y), .x + .y, .y))
# [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
You can use a custom function instead of cumsum and accumulate results using e.g. purrr::accumulate:
library(purrr)
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
purrr::accumulate(vector_A, function(a,b) {
if (sign(a) == sign(b))
a+b
else
b
})
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
or if you want to avoid any branch:
purrr::accumulate(vector_A, function(a,b) { b + a*(sign(a) == sign(b))})
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
The approach that comes to mind is to find the runs (rle()) defined by the condition (sign()) in the data, apply cumsum() to each run separately (tapply()), and then concatenate the results back into a vector (unlist()). Something like this:
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
run_length <- rle(sign(vector_A))$lengths
run_id <- rep(seq_along(run_length), run_length)
unlist(tapply(vector_A, run_id, cumsum), use.names = FALSE)
#> [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
To wrap the process up a bit, I'd put finding the grouping factor (the run index) in a function. The grouped summary can then be done with existing tools: tapply() as above, a creative ave(), or, in the context of data frames, group_by() and mutate() with dplyr.
run_index <- function(x) {
with(rle(x), rep(seq_along(lengths), lengths))
}
ave(vector_A, run_index(sign(vector_A)), FUN = cumsum)
#> [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
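The question asks about resetting "when a certain condition is reached" in general. A plain loop that takes the condition as a function may be easier to adapt than the sign-specific tricks. This is a sketch, not from the answers above, and reset_cumsum and its reset argument are hypothetical names. The loop is O(n) but runs in interpreted R, so the vectorised ave() versions will typically be faster on long vectors:

```r
# Hypothetical helper: running sum that restarts whenever reset(prev, cur) is TRUE
reset_cumsum <- function(x, reset) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    # either start a new running sum at x[i] or keep adding to the previous total
    out[i] <- if (reset(x[i - 1], x[i])) x[i] else out[i - 1] + x[i]
  }
  out
}

vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
# reset when the sign changes between consecutive elements
reset_cumsum(vector_A, function(prev, cur) sign(prev) != sign(cur))
```

Any other rule only needs a different reset function, for example function(prev, cur) cur == 0.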
I have the following dataframe:
> dput(df)
structure(list(x = c(0.871877138037235, 0.534444199409336, 0.677225327817723,
0.124835065566003, 0.972407285822555, 0.179870884865522, 0.468708630651236,
0.405605535488576, 0.717907374724746, 0.157441936200485), y = c(0,
1, 2, 0, 0, 0, 0, 0, 1, 0)), class = "data.frame", row.names = c(NA,
-10L))
i.e.
> df
x y
1 0.8718771 0
2 0.5344442 1
3 0.6772253 2
4 0.1248351 0
5 0.9724073 0
6 0.1798709 0
7 0.4687086 0
8 0.4056055 0
9 0.7179074 1
10 0.1574419 0
I would like to obtain a new data frame using the following rules:
If 1 and 2 (or 2 and 1) appear sequentially in column y, multiply the next 3 values in column x by -1.4
If 1 appears in column y (and just 1), multiply the next 3 values in column x by -1
If 1 and 3 (or 3 and 1) appear sequentially in column y, multiply the next 3 values in column x by -0.6
If 2 appears in column y (and just 2), multiply the next 3 values in column x by 1.4
In our case the desired result is:
> df
x y
1 0.8718771 0
2 0.5344442 1
3 0.6772253 2
4 -0.1747691 0
5 -1.3613702 0
6 -0.2518192 0
7 0.4687086 0
8 0.4056055 0
9 0.7179074 1
10 -0.1574419 0
This solution may look ugly, but I think it's fairly robust. It may still need further testing and improvement:
library(dplyr)
# First I set out to detect every observation that falls into any of the 4 categories
df %>%
mutate(z = case_when(
lag(y, n = 2, default = 0) %in% c(1, 2) & lag(y, default = 0) %in% c(2, 1) ~ 1,
lag(y, n = 2, default = 0) == 0 & lag(y, default = 0) == 1 & y == 0 ~ 2,
lag(y, n = 2, default = 0) %in% c(1, 3) & lag(y, default = 0) %in% c(3, 1) ~ 3,
lag(y, n = 2, default = 0) == 0 & lag(y, default = 0) == 2 & y == 0 ~ 4,
TRUE ~ 0
)) -> DF
# Then I wrote a custom function to apply the multiplication to a sequence of three rows
fn <- function(x) {
out <- x$x
for(i in 1:nrow(x)) {
if(x$z[i] == 1) {
out[i:(i+2)] <- out[i:(i+2)] * (-1.4)
} else if(x$z[i] == 2) {
out[i:(i+2)] <- out[i:(i+2)] * (-1)
} else if(x$z[i] == 3) {
out[i:(i+2)] <- out[i:(i+2)] * (-0.6)
} else if(x$z[i] == 4) {
out[i:(i+2)] <- out[i:(i+2)] * (1.4)
} else {
out[i:(i+2)] <- out[i:(i+2)] * 1
}
}
dt <- cbind(new_x = out[!is.na(out)], y = x$y) |> as.data.frame()
dt
}
fn(DF)
new_x y
1 0.8718771 0
2 0.5344442 1
3 0.6772253 2
4 -0.1747691 0
5 -1.3613702 0
6 -0.2518192 0
7 0.4687086 0
8 0.4056055 0
9 0.7179074 1
10 -0.1574419 0
A for loop
df <- structure(list(x = c(0.871877138037235, 0.534444199409336, 0.677225327817723,
0.124835065566003, 0.972407285822555, 0.179870884865522, 0.468708630651236,
0.405605535488576, 0.717907374724746, 0.157441936200485), y = c(0,
1, 2, 0, 0, 0, 0, 0, 1, 0)), class = "data.frame", row.names = c(NA,
-10L))
df
#> x y
#> 1 0.8718771 0
#> 2 0.5344442 1
#> 3 0.6772253 2
#> 4 0.1248351 0
#> 5 0.9724073 0
#> 6 0.1798709 0
#> 7 0.4687086 0
#> 8 0.4056055 0
#> 9 0.7179074 1
#> 10 0.1574419 0
for(i in 2:nrow(df)){
if((df$y[i] == 1 & df$y[i+1] ==2) | (df$y[i] == 2 & df$y[i+1] ==1)) {
df$x[seq(i+2, by = 1, length.out= min(nrow(df) - (i+1), 3))] <- df$x[seq(i+2, by = 1, length.out=min(nrow(df) - (i+1), 3))] * -1.4
} else if ((df$y[i] == 1 & df$y[i+1] ==3) | (df$y[i] == 3 & df$y[i+1] ==1)){
df$x[seq(i+2, by = 1, length.out= min(nrow(df) - (i+1), 3))] <- df$x[seq(i+2, by = 1,length.out= min(nrow(df) - (i+1), 3))] * -0.6
} else if (df$y[i] == 1 & !df$y[i+1] %in% c(1,2,3) & !df$y[i-1] %in% c(1,2,3) ) {
df$x[seq(i+1, by = 1, length.out=min(nrow(df) - (i), 3)) ] <- df$x[seq(i+1, by = 1, length.out= min(nrow(df) - (i), 3))] * -1
} else if (df$y[i] == 2 & !df$y[i+1] %in% c(1,2,3) & !df$y[i-1] %in% c(1,2,3)) {
df$x[seq(i+1, by = 1, length.out=min(nrow(df) - (i), 3)) ] <- df$x[seq(i+1, by = 1, length.out=min(nrow(df) - (i), 3))] * 1.4
}
}
df
#> x y
#> 1 0.8718771 0
#> 2 0.5344442 1
#> 3 0.6772253 2
#> 4 -0.1747691 0
#> 5 -1.3613702 0
#> 6 -0.2518192 0
#> 7 0.4687086 0
#> 8 0.4056055 0
#> 9 0.7179074 1
#> 10 -0.1574419 0
Created on 2021-06-26 by the reprex package (v2.0.0)
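A different sketch, not from the answers above, scans y once and chooses the multiplier per pattern. It rests on two assumptions that the question leaves open: a "pair" is two consecutive non-zero values in y, and any other non-zero value counts as a single. apply_rules and n_next are hypothetical names:

```r
# Data from the question
df <- data.frame(
  x = c(0.871877138037235, 0.534444199409336, 0.677225327817723,
        0.124835065566003, 0.972407285822555, 0.179870884865522,
        0.468708630651236, 0.405605535488576, 0.717907374724746,
        0.157441936200485),
  y = c(0, 1, 2, 0, 0, 0, 0, 0, 1, 0)
)

# Hypothetical helper. Assumes a "pair" is two consecutive non-zero y values
# and any other non-zero y value is a "single".
apply_rules <- function(df, n_next = 3) {
  x <- df$x
  y <- df$y
  n <- nrow(df)
  i <- 1
  while (i <= n) {
    mult <- NA   # no multiplier unless a rule matches
    end <- i     # last row of the detected pattern
    if (y[i] != 0) {
      if (i < n && y[i + 1] != 0) {
        pair <- sort(c(y[i], y[i + 1]))  # order-insensitive: 2,1 is treated as 1,2
        end <- i + 1
        if (all(pair == c(1, 2))) {
          mult <- -1.4
        } else if (all(pair == c(1, 3))) {
          mult <- -0.6
        }
      } else if (y[i] == 1) {
        mult <- -1
      } else if (y[i] == 2) {
        mult <- 1.4
      }
    }
    if (!is.na(mult)) {
      idx <- end + seq_len(n_next)  # the next n_next rows ...
      idx <- idx[idx <= n]          # ... truncated at the last row
      x[idx] <- x[idx] * mult
    }
    i <- end + 1  # continue scanning after the pattern
  }
  df$x <- x
  df
}

apply_rules(df)
```

Under this reading, pairs that match no rule (for example 2 followed by 2) leave x unchanged. Adjust the branches if your data needs something else.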
library(lpSolveAPI)
my.lp <- make.lp(nrow = 3, ncol = 2)
set.column(my.lp, 1, c(1, 1, 2))
set.column(my.lp, 2, c(3, 1, 0))
set.objfn(my.lp, c(1, 0))
set.constr.type(my.lp, rep("<=", 3))
set.rhs(my.lp, c(4, 2, 3))
set.bounds(my.lp, lower = c(-Inf, -Inf), upper = c(Inf, Inf))
> my.lp
Model name:
C1 C2
Minimize 1 0
R1 1 3 <= 4
R2 1 1 <= 2
R3 2 0 <= 3
Kind Std Std
Type Real Real
Upper Inf Inf
Lower -Inf -Inf
In my.lp, the objective function is set to minimization. How can I change this to maximization? It's not clear to me by looking at the help page of set.objfn.
Set the sense (minimize/maximize) with lp.control.
lp.control(my.lp, sense="max")
my.lp
#Model name:
# C1 C2
#Maximize 1 0
#R1 1 3 <= 4
#R2 1 1 <= 2
#R3 2 0 <= 3
#Kind Std Std
#Type Real Real
#Upper Inf Inf
#Lower -Inf -Inf
solve(my.lp) # returns 0, success
#[1] 0
get.variables(my.lp)
#[1] 1.5 0.0
get.objective(my.lp)
#[1] 1.5
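You might prefer to keep the default minimisation. In that case, the standard identity below gives the same optimum by negating the objective. This is an alternative, not what the answer above does. For this model c = (1, 0). On a model left at the default minimisation, calling set.objfn(my.lp, c(-1, 0)) and solving should report an objective of -1.5, the negative of the maximum found above:

```latex
\max_{x}\; c^{\top}x \ \text{ s.t. } Ax \le b
\;=\; -\,\min_{x}\; (-c)^{\top}x \ \text{ s.t. } Ax \le b,
\qquad c = (1,\,0)^{\top},\;
A = \begin{pmatrix} 1 & 3 \\ 1 & 1 \\ 2 & 0 \end{pmatrix},\;
b = (4,\,2,\,3)^{\top}
```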
Input
final_table =
Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 0.7000 0.1440 0.1032
1 115900000 116260000 B8430 0.0039 2.7202 2.7202
1 173500000 173680000 C5 -1.7738 -0.0746 -0.2722
How can I make a new data.frame where the values of columns 5 through 7 are set to:
-1, if value < -0.679
0, if -0.679 <= value <= 0.450
+1, if value > 0.450
Expected output
Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 1 0 0
1 115900000 116260000 B8430 0 1 1
1 173500000 173680000 C5 -1 0 0
Try this:
# read the data in
df <- read.table(header = TRUE, text="Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 0.7000 0.1440 0.1032
1 115900000 116260000 B8430 0.0039 2.7202 2.7202
1 173500000 173680000 C5 -1.7738 -0.0746 -0.2722")
# get the names of the columns you want to change
cols <- names(df[5:length(df)])
# set a function for the different values you want for the value-ranges
fun_cond <- function(x) {
ifelse(x < -0.679 , -1, ifelse(
x >= -0.679 & x <= 0.450, 0, 1))
}
# copy the data frame so the original doesn't get overwritten
new_df <- df
# use data.table to apply the function to the columns
library(data.table)
setDT(new_df)[ , (cols) := lapply(.SD, fun_cond), .SDcols = cols]
output:
Chr start end num seg.mean seg.mean.1 seg.mean.2
1: 1 68580000 68640000 A8430 1 0 0
2: 1 115900000 116260000 B8430 0 1 1
3: 1 173500000 173680000 C5 -1 0 0
The same thing without any additional packages:
cols <- names(df[5:length(df)])
fun_cond <- function(x) {
ifelse(x < -0.679 , -1, ifelse(
x >= -0.679 & x <= 0.450, 0, 1))
}
new_df <- df
new_df[5:length(df)] <- lapply(new_df[5:length(df)], fun_cond)
I'd use the cut function and apply it to the last three columns individually.
Here's a simple example:
original = data.frame(a=c(rep("A", 2), rep("B", 2)), seg.mean=c(-1, 0, 0.4, 0.5))
original$segmented = cut(original$seg.mean, c(-Inf, -0.679, 0.450, Inf), labels = c(-1,0,1))
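One thing to be careful about: the new column will be a factor. If you need numeric values, use as.numeric(as.character(...)). as.numeric() alone returns the internal level codes (1, 2, 3), not -1, 0, 1. Also, with the default right = TRUE, a value of exactly -0.679 falls into (-Inf, -0.679] and gets -1, whereas the question asks for 0.
You can also use labels = FALSE, which returns the interval codes 1, 2, 3 as integers. Subtracting 2 turns them into -1, 0, 1: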
original$segmented = cut(original$seg.mean, c(-Inf, -0.679, 0.450, Inf), labels = FALSE)-2
You can directly replace fields in the data frame by subsetting:
df[, 5:7] <- ifelse(df[, 5:7] < -0.679, -1,
ifelse(df[, 5:7] <= 0.450, 0,
1))
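Each rule is a single threshold, so the whole mapping can also be written as the difference of two comparisons, with no nested ifelse(). This is an alternative sketch, not from the answers above. It keeps the boundaries exactly as the question states them: both -0.679 and 0.450 map to 0. Each comparison on the data frame yields a logical matrix, and TRUE - FALSE is 1 while FALSE - TRUE is -1:

```r
df <- read.table(header = TRUE, text = "Chr start end num seg.mean seg.mean.1 seg.mean.2
1 68580000 68640000 A8430 0.7000 0.1440 0.1032
1 115900000 116260000 B8430 0.0039 2.7202 2.7202
1 173500000 173680000 C5 -1.7738 -0.0746 -0.2722")

# values > 0.450 become 1, values < -0.679 become -1, everything else becomes 0
new_df <- df
new_df[5:7] <- (df[5:7] > 0.450) - (df[5:7] < -0.679)
new_df
```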