dataframe column wise subtraction and division. - r

I need help with column-wise subtraction and division across N columns. Below are the columns in the input dataframe.
input dataframe:
> df
A B C D
1 1 3 6 2
2 3 3 3 4
3 1 2 2 2
4 4 4 4 4
5 5 2 3 2
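Formula: keep the first column as is; every later column becomes (b - a) / (1 - a), where a is the previous column and b is the current one.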
MY CODE
ABC <- cbind.data.frame(DF[1], (DF[-1] - DF[-ncol(DF)])/(1 - DF[-ncol(DF)]))
Expected out:
A B C D
1 Inf -1.5 0.8
3 0.00 0.0 -0.5
1 Inf 0.0 0.0
4 0.00 0.0 0.0
5 0.75 -1.0 0.5
But I don't want to use ncol here, because the actual dataframe has one more column after column D.
I want to apply this formula only to the first 4 columns. If I use ncol, it will run through to the last column of the dataframe.
Please help, thanks.

What about trying:
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
df_2 <- matrix((df[,2]-df[,1])/(1-df[,1]),5,1)
df_3 <- matrix((df[,3]-df[,2])/(1-df[,2]),5,1)
df_4 <- matrix((df[,4]-df[,3])/(1-df[,3]),5,1)
cbind(df[,1],df_2,df_3,df_4)
edit: a loop version
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
test_bind <- c()
test_bind <- cbind(test_bind, df[,1])
for (i in 1:3) {
  df_1 <- matrix((df[, i + 1] - df[, i]) / (1 - df[, i]), 5, 1)
  test_bind <- cbind(test_bind, df_1)
}
test_bind

here is one option with tidyverse
library(dplyr)
library(purrr)
map2_df(DF[2:4], DF[1:3], ~ (.x - .y)/(1- .y)) %>%
bind_cols(DF[1], .)
# A B C D
#1 1 Inf -1.5 0.8
#2 3 0.00 0.0 -0.5
#3 1 Inf 0.0 0.0
#4 4 0.00 0.0 0.0
#5 5 0.75 -1.0 0.5
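The original one-liner can also be kept in base R by indexing an explicit set of columns instead of relying on ncol. This is a minimal sketch, assuming the formula applies to columns 1 to 4; the extra column E is hypothetical and only stands in for the "last column" mentioned in the question.

```r
# Sample data from the question plus a hypothetical trailing column E
DF <- data.frame(A = c(1, 3, 1, 4, 5),
                 B = c(3, 3, 2, 4, 2),
                 C = c(6, 3, 2, 4, 3),
                 D = c(2, 4, 2, 4, 2),
                 E = letters[1:5])

cols <- 1:4                          # columns the formula should cover
prev <- DF[cols[-length(cols)]]      # columns A..C (the "a" in the formula)
nxt  <- DF[cols[-1]]                 # columns B..D (the "b" in the formula)

# keep column A, transform B..D, then re-attach the untouched columns
res <- cbind(DF[1], (nxt - prev) / (1 - prev), DF[-cols])
res
```

Changing `cols` is then the only adjustment needed if the formula should cover a different range of columns.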

Related

How to calculate values in an R dataframe when columns are dependent on each other

First-time poster, so apologies if this question is basic or poorly explained. Grateful to anyone who can help or point me in the right direction!
I would like to do the following within an R dataframe if possible:
Existing Data
Column A is a vector of values, say 10 to 20.
New Data/columns
Column B will be Column A multiplied by Column C.
Column C will be Column C minus Column B from the previous row of data, i.e. data$C[-1] - data$B[-1], apart from the first row of Column C, which I will give a fixed value.
I have tried these as separate steps, but I keep overwriting columns B and C, and I have a feeling I have been going about this the wrong way. I could share my code, but I think it would confuse matters!
Thanks in advance!
EDIT TO ADD CODE:
A <- c(0.1,0.2,0.3,0.4,0.5)
df1 <- data.frame(A)
df1$B <- 0
df1$C <- 0
df1$C[1] <- 100
df2 <- df1 %>%
mutate(B = C * A,
C = lag(C-B))
RESULT FROM THE ABOVE
    A  B  C
1 0.1 10 NA
2 0.2  0 90
3 0.3  0  0
4 0.4  0  0
5 0.5  0  0
EXPECTED OUTPUT
    A     B      C
1 0.1 10    100
2 0.2 18     90
3 0.3 21.6   72
4 0.4 20.16  50.4
5 0.5 15.12  30.24
C2 = C1 - B1
B2 = C2 * A2
We can use accumulate from purrr to do the recursive update:
library(dplyr)
library(purrr)
tmp <- with(df1, accumulate(A, ~ .x - (.x * .y), .init = first(C)))
df2 <- df1 %>%
mutate(C = head(tmp, -1), B = -diff(tmp))
df2
# A B C
#1 0.1 10.00 100.00
#2 0.2 18.00 90.00
#3 0.3 21.60 72.00
#4 0.4 20.16 50.40
#5 0.5 15.12 30.24
Or use base R
tmp <- with(df1, Reduce(function(x, y) x - (x * y), A,
accumulate = TRUE, init = C[1]))
df2 <- transform(df1, C = head(tmp, -1), B = -diff(tmp))
If you don't mind using a mathematical approach, you can first derive the general expression for the recursion and then have the R code afterwards.
Below is one implementation with base R
transform(
transform(
df1,
C = C[1] * c(1, cumprod(1 - A)[-nrow(df1)])
),
B = A * C
)
which gives
A B C
1 0.1 10.00 100.00
2 0.2 18.00 90.00
3 0.3 21.60 72.00
4 0.4 20.16 50.40
5 0.5 15.12 30.24
A data.table option in a similar manner is
library(data.table)
setDT(df1)[, C := first(C) * c(1, cumprod(1 - A)[-.N])][, B := A * C][]
A B C
1: 0.1 10.00 100.00
2: 0.2 18.00 90.00
3: 0.3 21.60 72.00
4: 0.4 20.16 50.40
5: 0.5 15.12 30.24
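The closed form used in the cumprod answers follows from the recursion itself: C[k] = C[k-1] - B[k-1] and B[k-1] = C[k-1] * A[k-1], so C[k] = C[k-1] * (1 - A[k-1]), which unrolls to C[k] = C[1] * prod(1 - A[1:(k-1)]). A minimal sketch that writes the recursion as an explicit loop and checks it against the closed form, using the values from the question:

```r
A <- c(0.1, 0.2, 0.3, 0.4, 0.5)
n <- length(A)
B <- numeric(n)
C <- numeric(n)

C[1] <- 100                      # fixed starting value from the question
B[1] <- C[1] * A[1]
for (k in 2:n) {
  C[k] <- C[k - 1] - B[k - 1]    # C2 = C1 - B1
  B[k] <- C[k] * A[k]            # B2 = C2 * A2
}

# closed form: C[k] = C[1] * prod(1 - A[1:(k - 1)])
C_closed <- C[1] * c(1, cumprod(1 - A)[-n])
stopifnot(isTRUE(all.equal(C, C_closed)))

data.frame(A, B, C)
```

The loop is the slowest but most literal version; it can be useful for checking a vectorised solution on a small input.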

How to collect outputs of multivariable vector-valued function into a dataframe?

I have a function f1 that takes a pair of real numbers (x, y) and returns a triple of real numbers. I would like to collect the outputs of this function for all x in a vector a and all y in a vector b. Could you please explain how to do so?
f1 <- function(x, y){
return (c(x+y, x-y, x*y))
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)
Update: I mean for all pairs $(x, y) \in a \times b$.
Here is a data.table option
setDT(expand.grid(a, b))[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]
where expand.grid + do.call + Vectorize are used, giving
Var1 Var2 fval
1: 0.0 0.0 0,0,0
2: 0.1 0.0 0.1,0.1,0.0
3: 0.2 0.0 0.2,0.2,0.0
4: 0.3 0.0 0.3,0.3,0.0
5: 0.4 0.0 0.4,0.4,0.0
---
2012: 2.7 6.2 8.90,-3.50,16.74
2013: 2.8 6.2 9.00,-3.40,17.36
2014: 2.9 6.2 9.10,-3.30,17.98
2015: 3.0 6.2 9.2,-3.2,18.6
2016: 3.1 6.2 9.30,-3.10,19.22
A more compact version uses CJ(a, b) instead of setDT(expand.grid(a, b)) (thanks to @akrun for the advice).
We can use expand.grid to build all combinations of the 'a' and 'b' values, then loop over the rows with apply (MARGIN = 1) and call f1 on each row:
out <- as.data.frame(t(apply(expand.grid(a, b), 1, function(x) f1(x[1], x[2]))))
Or with tidyverse
library(dplyr)
library(purrr)
library(tidyr)
out2 <- crossing(x = a, y = b) %>%
pmap_dfr(f2)
-output
head(out2)
# A tibble: 6 x 3
# add subtract multiply
# <dbl> <dbl> <dbl>
#1 0 0 0
#2 0.1 -0.1 0
#3 0.2 -0.2 0
#4 0.3 -0.3 0
#5 0.4 -0.4 0
#6 0.5 -0.5 0
where f2 is
f2 <- function(x, y){
return (tibble(add = x+y, subtract = x-y, multiply = x*y))
}
It may be better to return a list or tibble from the function, so that the results are easier to bind into a data frame.
Create all possible combinations with expand.grid and use Map to apply f1 to every pair.
val <- expand.grid(a, b)
result <- do.call(rbind, Map(f1, val$Var1, val$Var2))
head(result)
# [,1] [,2] [,3]
#[1,] 0.0 0.0 0
#[2,] 0.1 0.1 0
#[3,] 0.2 0.2 0
#[4,] 0.3 0.3 0
#[5,] 0.4 0.4 0
#[6,] 0.5 0.5 0
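The matrix above loses track of which (x, y) pair produced each row. A minimal base R sketch that keeps the inputs next to the outputs; the column names sum, diff and prod are my own labels, not part of the original f1:

```r
f1 <- function(x, y) {
  c(x + y, x - y, x * y)
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)

grid <- expand.grid(x = a, y = b)        # every pair in a x b
vals <- t(mapply(f1, grid$x, grid$y))    # one row of three values per pair
colnames(vals) <- c("sum", "diff", "prod")

out <- cbind(grid, vals)                 # data frame: x, y, sum, diff, prod
head(out)
```

mapply returns a 3-by-N matrix here because every call to f1 returns a vector of length 3, so it is transposed before binding.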

Find the nth largest values in the top row and omit the rest of the columns in R

I am trying to subset a data frame so that it keeps only the columns whose value in the first row is among the n largest values of that row.
For example, let's assume I want to keep only the columns whose row-1 value is one of the 2 largest (top 2).
dat1 = data.frame(a = c(0.1,0.2,0.3,0.4,0.5), b = c(0.6,0.7,0.8,0.9,0.10), c = c(0.12,0.13,0.14,0.15,0.16), d = c(NA, NA, NA, NA, 0.5))
a b c d
1 0.1 0.6 0.12 NA
2 0.2 0.7 0.13 NA
3 0.3 0.8 0.14 NA
4 0.4 0.9 0.15 NA
5 0.5 0.1 0.16 0.5
such that a and d are removed, because 0.1 and NA are not among the two largest values in row 1; the values 0.6 and 0.12 in columns b and c are larger.
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
Is there a simple way to subset this? I do not want to order it, because that will create problems with other data frames I have that are related.
Complementing pieca's answer, you can encapsulate that into a function.
Also, this way, the returning data.frame won't be sorted.
get_nth <- function(df, n) {
df[] <- lapply(df, as.numeric) # edit
cols <- names(sort(df[1, ], na.last = NA, decreasing = TRUE))
cols <- cols[seq(n)]
df <- df[names(df) %in% cols]
return(df)
}
Hope this works for you.
Sort the first row of your data.frame, and then subset by names:
cols <- names(sort(dat1[1,], na.last = NA, decreasing = TRUE))
> dat1[,cols[1:2]]
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
You can get an inverted rank of the first row and take the top nth columns:
> r <- rank(-dat1[1,], na.last=T)
> r <- r <= 2
> dat1[,r]
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
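The answers above call sort() or rank() directly on a one-row data frame, and I am not certain that behaves the same way in every R version. A minimal sketch that first turns the row into a plain named vector and keeps the original column order; the function name top_n_cols is hypothetical:

```r
dat1 <- data.frame(a = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   b = c(0.6, 0.7, 0.8, 0.9, 0.10),
                   c = c(0.12, 0.13, 0.14, 0.15, 0.16),
                   d = c(NA, NA, NA, NA, 0.5))

top_n_cols <- function(df, n) {
  first_row <- unlist(df[1, ])                  # named numeric vector
  # rank 1 = largest value; NA entries keep rank NA and are never selected
  ranks <- rank(-first_row, na.last = "keep", ties.method = "min")
  df[, !is.na(ranks) & ranks <= n, drop = FALSE]
}

res <- top_n_cols(dat1, 2)
res
```

With ties.method = "min", tied values share the better rank, so a tie at the cut-off can return more than n columns; that choice is an assumption about the desired behaviour.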

Conditional slicing|filtering top and bottom n rows from grouped data

I have run into a problem filtering or slicing the top and bottom n rows at the same time from grouped data.
So it is different from the question "Select first and last row from grouped data".
What I need: if sub_gr == "a", keep the top three rows;
if sub_gr == "b", keep the bottom two rows. That's it!
My data looks something like this:
df <- data.frame(gr = rep(seq(1, 2), each = 10), sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE), sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(c(seq(0.1, 0.5, 0.1), rev(seq(-0.5, -0.1, 0.1))), 2))
gr sub_gr y x
1 1 a 0.37851909 0.1
2 1 a 0.33305165 0.2
3 1 a 0.22478005 0.3
4 1 a 0.09677654 0.4
5 1 a 0.07060651 0.5
6 1 b 0.41999445 -0.1
7 1 b 0.35356301 -0.2
8 1 b 0.33274398 -0.3
9 1 b 0.20451400 -0.4
10 1 b 0.03714828 -0.5
11 2 a 0.37851909 0.1
12 2 a 0.33305165 0.2
13 2 a 0.22478005 0.3
14 2 a 0.09677654 0.4
15 2 a 0.07060651 0.5
16 2 b 0.41999445 -0.1
17 2 b 0.35356301 -0.2
18 2 b 0.33274398 -0.3
19 2 b 0.20451400 -0.4
20 2 b 0.03714828 -0.5
library(dplyr)
Here is what I tried,
df%>%
group_by(gr, sub_gr)%>%
slice(if(any(sub_gr=="a")) {row_number()==1:3} else {row_number()==4:n()})
Warning messages:
1: In 1:5 == 1:3 :
longer object length is not a multiple of shorter object length
2: In 1:5 == 4:5L :
longer object length is not a multiple of shorter object length
3: In 1:5 == 1:3 :
longer object length is not a multiple of shorter object length
4: In 1:5 == 4:5L :
longer object length is not a multiple of shorter object length
thanks for your help in advance!
There are probably more elegant solutions, but I think the following works. I set seed for reproducibility.
set.seed(123)
df <- data.frame(gr = rep(seq(1, 2), each = 10), sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE), sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(c(seq(0.1, 0.5, 0.1), rev(seq(-0.5, -0.1, 0.1))), 2))
df %>%
group_by(gr, sub_gr) %>%
filter((sub_gr %in% "a" & row_number() %in% 1:3) |
(sub_gr %in% "b" & row_number() %in% (n() - 1):n())) %>%
ungroup()
# # A tibble: 10 x 4
# gr sub_gr y x
# <int> <fctr> <dbl> <dbl>
# 1 1 a 0.47023364 0.1
# 2 1 a 0.44150870 0.2
# 3 1 a 0.39415257 0.3
# 4 1 b 0.22830737 -0.4
# 5 1 b 0.02277825 -0.5
# 6 2 a 0.47023364 0.1
# 7 2 a 0.44150870 0.2
# 8 2 a 0.39415257 0.3
# 9 2 b 0.22830737 -0.4
# 10 2 b 0.02277825 -0.5
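The warnings in the question come from row_number() == 1:3, which compares a length-5 vector with a length-3 vector element by element (recycling the shorter one) instead of testing membership; that is why the answer above uses %in%. The same rule can be sketched in base R without any packages by computing each row's position within its group; the helper names pos and size are my own:

```r
set.seed(123)
df <- data.frame(
  gr = rep(1:2, each = 10),
  sub_gr = rep(rep(c("a", "b"), each = 5), 2),
  y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE),
            sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
  x = rep(c(seq(0.1, 0.5, 0.1), rev(seq(-0.5, -0.1, 0.1))), 2)
)

idx  <- seq_len(nrow(df))
pos  <- ave(idx, df$gr, df$sub_gr, FUN = seq_along)  # row position within its group
size <- ave(idx, df$gr, df$sub_gr, FUN = length)     # number of rows in that group

# top three rows for "a", bottom two rows for "b"
keep <- (df$sub_gr == "a" & pos <= 3) | (df$sub_gr == "b" & pos > size - 2)
result <- df[keep, ]
result
```

Because the rows are selected with a logical vector, the result keeps the original row order.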
library(tidyverse)
# create a custom function to take the head or tail based on your rule
cond_slice <- function(x) {
if (unique(x$sub_gr) == "a") {
head(x, 3)
} else {
tail(x, 2)
}
}
# create a column to split by and then map across the subsets
result <- df %>%
unite(split_by, gr, sub_gr, remove = FALSE) %>%
split(.$split_by) %>%
map(cond_slice) %>%
bind_rows() %>%
select(-split_by)

Tab Delimited to Square Matrix

I have a tab delimited file like
A B 0.5
A C 0.75
B D 0.2
And I want to convert it to a square matrix, like
  A B   C    D
A 0 0.5 0.75 0
B   0   0    0.2
C       0    0
D            0
How can I go about it in R?
Thanks,
If you have the data in a data frame with the following column names:
Var1 Var2 value
you can use
xtabs(value ~ Var1 + Var2, data = df)
See the plyr package for some more general data reshaping functions also.
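On its own, xtabs only creates rows and columns for labels that actually occur in Var1 and Var2 respectively, so the table for this data would not be square. A minimal sketch, assuming the goal is a square matrix over all labels, that gives both columns the same factor levels first:

```r
df <- read.table(text = "Var1 Var2 value
A B 0.5
A C 0.75
B D 0.2", header = TRUE, stringsAsFactors = FALSE)

# every label that appears in either column
lev <- sort(unique(c(df$Var1, df$Var2)))
df$Var1 <- factor(df$Var1, levels = lev)
df$Var2 <- factor(df$Var2, levels = lev)

# unused levels are kept by default, so the result is 4 x 4
m <- xtabs(value ~ Var1 + Var2, data = df)
m
```

Pairs that are not listed in the file end up as 0, since xtabs sums the values in each cell.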
Another approach (not as elegant as JoFrhwld's)
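df <- read.table(text = "
Var1 Var2 value
A B 0.5
A C 0.75
B D 0.2
", header = TRUE, stringsAsFactors = FALSE)
# collect every label that appears in either column
lev <- unique(c(df$Var1, df$Var2))
# square matrix of zeros with the labels as row and column names
A <- matrix(0, nrow = length(lev), ncol = length(lev), dimnames = list(lev, lev))
# fill in the listed pairs; every other cell stays 0
A[cbind(df$Var1, df$Var2)] <- df$value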
> A
A B C D
A 0 0.5 0.75 0.0
B 0 0.0 0.00 0.2
C 0 0.0 0.00 0.0
D 0 0.0 0.00 0.0
I'm guessing this is a weighted adjacency matrix for a graph. If so, you might be interested in the igraph package, to read the data as a weighted edge list.
