Pairwise subtraction of columns in a dataframe in R

Is there a way to automate (e.g., with a loop) the subtractions (X2-X1), (X3-X1), (X3-X2) in my data below and add them as three new columns to the data?
m="
id X1 X2 X3
A 1 0 4
B 2 2 2
C 3 4 1"
data <- read.table(text = m, header = TRUE)

This is very similar to this question; we basically just need to change the function that we are using in map2_dfc:
library(tidyverse)
combn(names(data)[-1], 2) %>%
map2_dfc(.x = .[1,], .y = .[2,],
.f = ~transmute(data, !!paste0(.y, "-", .x) := !!sym(.y) - !!sym(.x))) %>%
bind_cols(data, .)
#> id X1 X2 X3 X2-X1 X3-X1 X3-X2
#> 1 A 1 0 4 -1 3 4
#> 2 B 2 2 2 0 0 0
#> 3 C 3 4 1 1 -2 -3

With combn:
dif <- combn(data[-1], 2, \(x) x[, 2] - x[, 1])
colnames(dif) <- combn(names(data)[-1], 2, \(x) paste(x[2], x[1], sep = "-"))
cbind(data, dif)
# id X1 X2 X3 X2-X1 X3-X1 X3-X2
#1 A 1 0 4 -1 3 4
#2 B 2 2 2 0 0 0
#3 C 3 4 1 1 -2 -3
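For readers who want to see the loop the question asks about, here is a minimal base R sketch of the same idea. It assumes the example data from the question; the names pairs, lo and hi are illustrative and not from either answer.

```r
# Rebuild the example data from the question
data <- read.table(text = "id X1 X2 X3
A 1 0 4
B 2 2 2
C 3 4 1", header = TRUE)

# Each column of `pairs` holds one pair of column names:
# (X1, X2), (X1, X3), (X2, X3)
pairs <- combn(names(data)[-1], 2)

for (k in seq_len(ncol(pairs))) {
  lo <- pairs[1, k]  # column to subtract
  hi <- pairs[2, k]  # column to subtract from
  data[[paste0(hi, "-", lo)]] <- data[[hi]] - data[[lo]]
}
data
```

Because the pairs come from combn, the loop picks up any number of X columns without changes.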

Related

nested for loop in R, where the second index counts inside the first one

I have, for example, a dataset like this:
data <- data.frame(matrix(c(1,2,2,3,4,5,5,"a","a","b","a","a","a","b"), nrow = 7, ncol = 2, byrow = F))
X1 X2
1 a
2 a
2 b
3 a
4 a
5 a
5 b
Then I add another variable, "tag", initially set to 0.
data$tag <- 0
X1 X2 tag
1 a 0
2 a 0
2 b 0
3 a 0
4 a 0
5 a 0
5 b 0
I'd like to have "tag" equal to 1 for each row whose X1 value occurs more than once, like:
X1 X2 tag
1 a 0
2 a 1
2 b 1
3 a 0
4 a 0
5 a 1
5 b 1
I used the following code:
for (i in data$X1) {
for (j in 1:length(data$X1)) {
if (j==2) {data$tag[j] <- 1}
}
}
but it doesn't work the way I would like. I'd like the inner loop (j) to run inside the outer one, with j restarting from 1 every time X1 changes.
How can I manage it?
Thanks a lot
You could try ave:
within(
data,
tag <- +(ave(X1, X1, FUN = length) > 1)
)
which gives
X1 X2 tag
1 1 a 0
2 2 a 1
3 2 b 1
4 3 a 0
5 4 a 0
6 5 a 1
7 5 b 1
You can use duplicated from both ends in base R:
data$tag <- as.integer(duplicated(data$X1) |
duplicated(data$X1, fromLast = TRUE))
data
# X1 X2 tag
#1 1 a 0
#2 2 a 1
#3 2 b 1
#4 3 a 0
#5 4 a 0
#6 5 a 1
#7 5 b 1
An option with add_count (the count lands in a helper column n, which is turned into tag and then dropped):
library(dplyr)
data %>%
add_count(X1) %>%
mutate(tag = +(n > 1), n = NULL)
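To make the counting step explicit, here is a minimal base R sketch using table(). Rebuilding the data with a numeric X1 is an assumption made for this example (the matrix() call in the question stores every value as character), and counts is an illustrative name.

```r
# Example data, rebuilt with a numeric X1 (assumption for this sketch)
data <- data.frame(X1 = c(1, 2, 2, 3, 4, 5, 5),
                   X2 = c("a", "a", "b", "a", "a", "a", "b"))

# How often does each X1 value occur?
counts <- table(data$X1)

# Look up the count for every row by name and flag values seen more than once
data$tag <- as.integer(counts[as.character(data$X1)] > 1)
data
```

No nested loop is needed: the lookup by name plays the role of the inner index that restarts whenever X1 changes.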

How to apply a function to a data frame for multiple inputs and create columns with the outputs using dplyr?

Given the following data
data_in <- data.frame(X1 = c(1, 3, 5, 2, 6),
X2 = c(2, 4, 5, 1, 8),
X3 = c(3, 2, 4, 1, 4))
I wrote a function, which takes the data frame, a value (here called distance) and a string (to add a column name) to count the number of values being smaller or equal to the input value.
custom_function <- function(some_data_frame, distance, name) {
some_data_frame %>%
mutate(!!name := rowSums(. <= distance, na.rm = TRUE)) %>%
return()
}
I can apply the function to the data as follows:
data_in %>%
custom_function(., 5, "some_name")
What I would like now is to use a vector of distances and create a column for each distance using my custom function. Let's say for c(1, 3, 5), I would like to get three columns automatically rather than hard-coding them (applying the function manually three times).
There is an easy way to do that with mapply (using the same distances as in @Sotos' answer):
(dst <- c(5, 3, 1, 6, 7, 8))
# [1] 5 3 1 6 7 8
(cnm <- paste('some_name', dst, sep = '_'))
# [1] "some_name_5" "some_name_3" "some_name_1" "some_name_6" "some_name_7" "some_name_8"
data_in[, cnm] <- mapply(function(d) rowSums(data_in <= d, na.rm = T), d = dst)
data_in
# X1 X2 X3 some_name_5 some_name_3 some_name_1 some_name_6 some_name_7 some_name_8
# 1 1 2 3 3 3 1 3 3 3
# 2 3 4 2 3 2 0 3 3 3
# 3 5 5 4 3 0 0 3 3 3
# 4 2 1 1 3 3 2 3 3 3
# 5 6 8 4 1 0 0 2 2 3
You can obtain the same results within tidyverse using purrr::map2:
cbind(
data_in,
purrr::map2(dst, cnm, ~custom_function(data_in, .x, .y))
)
# X1 X2 X3 some_name_5 some_name_3 some_name_1 some_name_6 some_name_7 some_name_8
# 1 1 2 3 3 3 1 3 3 3
# 2 3 4 2 3 2 0 3 3 3
# 3 5 5 4 3 0 0 3 3 3
# 4 2 1 1 3 3 2 3 3 3
# 5 6 8 4 1 0 0 2 2 3
With custom_function() defined as:
custom_function <- function(some_data_frame, distance, name) {
some_data_frame %>%
transmute(!!name := rowSums(. <= distance, na.rm = TRUE))
}
You can use sapply to loop through your vector and cbind at the end, i.e.
cbind.data.frame(data_in,
do.call(cbind.data.frame, sapply(c(5, 3, 1, 6, 7, 8), function(i)
custom_function(data_in, i, paste0('some_name_', i))[ncol(data_in) + 1])))
which gives,
X1 X2 X3 some_name_5 some_name_3 some_name_1 some_name_6 some_name_7 some_name_8
1 1 2 3 3 3 1 3 3 3
2 3 4 2 3 2 0 3 3 3
3 5 5 4 3 0 0 3 3 3
4 2 1 1 3 3 2 3 3 3
5 6 8 4 1 0 0 2 2 3
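For the three distances named in the question, c(1, 3, 5), the same idea can be sketched with vapply, which checks that each result has one value per row. This is an illustration under the question's data, not a replacement for the answers above; distances, counts and result are illustrative names.

```r
data_in <- data.frame(X1 = c(1, 3, 5, 2, 6),
                      X2 = c(2, 4, 5, 1, 8),
                      X3 = c(3, 2, 4, 1, 4))
distances <- c(1, 3, 5)

# One column of row counts per distance; vapply enforces one number per row
counts <- vapply(distances,
                 function(d) rowSums(data_in <= d, na.rm = TRUE),
                 numeric(nrow(data_in)))
colnames(counts) <- paste0("some_name_", distances)

# Bind the new columns to a copy, so the counts never see their own output
result <- cbind(data_in, counts)
result
```

Computing all counts before binding matters: if columns were added to data_in one at a time, later calls to rowSums would also count the newly added columns.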

If/else function on a data frame to create conditional matrix

Here is the data set for reproducibility:
a=c(90.41,37.37,18.98)
b=c(103.39,39.44,51.68)
c=c(83.51,36.41,47.46)
d=c(94.60,38.57,50.22)
e=c(95.04,38.81,50.49)
xx=t(data.frame(a,b,c,d,e))
df=data.frame(xx)
And here is the if/else function I am trying to run on the data frame:
classify=function(df){
if (df>=110) {
class="5"}
else if (df<110 & df>=103){
class="4"}
else if (df<103 & df>=95){
class="3"}
else if (df<95 & df>=76){
class="2"}
else if (df<76){
class="1"}
else {class="none"}
}
However, what I want the if/else function to produce is a new data frame that looks like this:
df
X1 X2 X3
a 2 1 1
b 4 1 1
c 2 1 1
d 2 1 1
e 3 1 1
I am unsure as to how to do this so any help would be super appreciated. I anticipate something is wrong in the if/else function itself but I am quite inexperienced and I don't know how to detect errors in the script that easily. Thank you!
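# right = FALSE gives left-closed intervals [76, 95), [95, 103), ..., matching the >= tests in the question
sapply(df, function(x) {as.numeric(as.character(cut(x, c(-Inf,76,95,103,110,Inf), labels = 1:5, right = FALSE)))})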
X1 X2 X3
[1,] 2 1 1
[2,] 4 1 1
[3,] 2 1 1
[4,] 2 1 1
[5,] 3 1 1
Use cut to set the intervals (its 2nd argument) and the labels (its 3rd argument). It returns a factor, so convert back to numeric if you need numbers. Since you want to run the function over the full dataframe, use sapply or lapply.
You can do this with findInterval. All you have to do is to pass it a non-decreasing vector of break points.
classify <- function(DF, breaks = c(-Inf, 76, 95, 103, 110, Inf)){
f <- function(x, breaks) findInterval(x, breaks)
DF[] <- lapply(DF, f, breaks)
DF
}
classify(df)
# X1 X2 X3
#a 2 1 1
#b 4 1 1
#c 2 1 1
#d 2 1 1
#e 3 1 1
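The two approaches treat values that fall exactly on a break point differently unless cut is told otherwise. The short sketch below, using made-up test values on the boundaries, shows the difference: findInterval uses left-closed intervals by default, while cut uses right-closed intervals by default.

```r
breaks <- c(-Inf, 76, 95, 103, 110, Inf)
x <- c(75.9, 76, 95, 103, 110)  # hypothetical values sitting on the breaks

# Left-closed by default: 76 falls in [76, 95)
by_find <- findInterval(x, breaks)

# Right-closed by default: 76 falls in (-Inf, 76]
by_cut_default <- as.integer(cut(x, breaks, labels = 1:5))

# right = FALSE makes cut agree with findInterval and with the >= tests
by_cut_left <- as.integer(cut(x, breaks, labels = 1:5, right = FALSE))

rbind(by_find, by_cut_default, by_cut_left)
```

For the data in the question no value sits exactly on a break, so both give the same table; the choice only matters at the boundaries.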
Quite a similar approach to your example, using case_when from dplyr:
library(dplyr)
classify <- function(x){
case_when(
x >= 110 ~ "5",
x >= 103 & x < 110 ~ "4",
x >= 95 & x < 103 ~ "3",
x >= 76 & x < 95 ~ "2",
x < 76 ~ "1",
TRUE ~ "none"
)
}
a = c(90.41, 37.37, 18.98)
b = c(103.39, 39.44, 51.68)
c = c(83.51, 36.41, 47.46)
d = c(94.60, 38.57, 50.22)
e = c(95.04, 38.81, 50.49)
df <- data.frame(matrix(c(a, b, c, d, e), ncol = 3, byrow = T))
mutate_all(df, classify)
# X1 X2 X3
#1 2 1 1
#2 4 1 1
#3 2 1 1
#4 2 1 1
#5 3 1 1
If the data contain infinite or missing values, for example:
df
# X1 X2 X3
#1 -Inf 37.37 18.98
#2 103.39 NaN 51.68
#3 83.51 36.41 47.46
#4 94.60 Inf 50.22
#5 95.04 38.81 NA
The results look like this:
mutate_all(df, classify)
# X1 X2 X3
#1 1 1 1
#2 4 none 1
#3 2 1 1
#4 2 5 1
#5 3 1 none
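As for why the original function fails: if() expects a single TRUE or FALSE, whereas df >= 110 is a whole logical matrix. A minimal sketch that keeps the if/else chain but applies it to one number at a time could look like this; classify_one and classes are illustrative names, and the data frame is rebuilt here with the same values as in the question.

```r
# Scalar version of the chain: x is a single number, so if/else is safe
classify_one <- function(x) {
  if (is.na(x)) "none"
  else if (x >= 110) "5"
  else if (x >= 103) "4"
  else if (x >= 95) "3"
  else if (x >= 76) "2"
  else "1"
}

df <- data.frame(X1 = c(90.41, 103.39, 83.51, 94.60, 95.04),
                 X2 = c(37.37, 39.44, 36.41, 38.57, 38.81),
                 X3 = c(18.98, 51.68, 47.46, 50.22, 50.49),
                 row.names = c("a", "b", "c", "d", "e"))

# Apply the scalar function to every cell, keeping the data frame shape
classes <- df
classes[] <- lapply(df, function(col) vapply(col, classify_one, character(1)))
classes
```

This is slower than cut, findInterval or case_when on large data, since it visits each cell individually, but it stays closest to the code in the question.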

using crossprod under specific conditions

I am trying to organise a dataset in a very specific way for my research. However, I am new to R and really struggling, so any assistance would be greatly appreciated.
I am attempting to take the value of the cell at every third column (starting from the first one) and multiply it by the column beside it, but only if there is a negative value in said cell. Following this, I would like to sum the results together and store it in a new column in an external spreadsheet.
So far, the code I have written is as follows:
NegTotal = NULL
p = NULL
for (i in 1:nrow(Datafile))
{for (j in 1:ncol(Datafile))
{if ((j %% 3 == 0) && (Datafile [i,j] < 0)) {
p <- (datafile[i,j] * datafile[i,j+1])
NegTotal <- sum(p) }
else { }
}
}
for (l in seq(along = NegTotal)) {
dim(newColumn)
AsNewData.DataColumn("datafile", GetType(System.String))
NewColumn.DefaultValue = "NegTotal"
table.Columns.Add(newColumn)
}
I am aware that this code is probably completely wrong; this is the first time I've used R and I am not very proficient at computer programming in general.
The current data is arranged as follows:
df <- data.frame(F1 = c( 1, -2, -1), E1 = c(1, 1, 0), Y1 = c(0, 0, 1),
F2 = c(-1, 2, -1), E2 = c(1, 1, 1), Y2 = c(0, 0, 1),
F3 = c(-2, -2, -1), E3 = c(1, 1, 1), Y3 = c(1, 1, 0))
# F1 E1 Y1 F2 E2 Y2 F3 E3 Y3
# 1 1 1 0 -1 1 0 -2 1 1
# 2 -2 1 0 2 1 0 -2 1 1
# 3 -1 0 1 -1 1 1 -1 1 0
Desired Output:
# F1 E1 Y1 F2 E2 Y2 F3 E3 Y3 NegTotal
# 1 1 1 0 -1 1 0 -2 1 1 -3
# 2 -2 1 0 2 1 0 -2 1 1 -4
# 3 -1 0 1 -1 1 1 -1 1 0 -2
So if x_y = F_y * E_y for y = 1, 2, 3, then
NegTotal = x_1 + x_2 + x_3, where a term is included only when F_y < 0.
I hope that all makes sense!
Here's how I would approach this with dplyr and tidyr:
library(dplyr)
library(tidyr)
# Add a respondent column (i.e. row number)
df$respondent <- 1:nrow(df)
df %>%
gather(key, value, -respondent) %>%
separate(key, c("letter", "letter_sub"), sep = 1) %>%
spread(letter, value) %>%
mutate(Neg = ifelse(F < 0, E * F, NA)) %>%
group_by(respondent) %>%
summarise(NegTotal = sum(Neg, na.rm = TRUE))
# Source: local data frame [3 x 2]
#
# respondent NegTotal
# (int) (dbl)
# 1 1 -3
# 2 2 -4
# 3 3 -2
To understand what's going on, I would run the pipeline in pieces. For example, look at the results of the first few functions:
df %>%
gather(key, value, -respondent) %>%
separate(key, c("letter", "letter_sub"), sep = 1) %>%
spread(letter, value)
# respondent letter_sub E F Y
# 1 1 1 1 1 0
# 2 1 2 1 -1 0
# 3 1 3 1 -2 1
# 4 2 1 1 -2 0
# 5 2 2 1 2 0
# 6 2 3 1 -2 1
# 7 3 1 0 -1 1
# 8 3 2 1 -1 1
# 9 3 3 1 -1 0
Getting the data in this form makes it easier to perform the summary tasks.
This code will give you your desired output. However, if your actual dataset is more complex than the example you gave, you may need a more elegant solution.
df$NegTotal<- (pmin(0,df$F1) * df$E1) + (pmin(0,df$F2) * df$E2) + (pmin(0,df$F3) * df$E3)
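If there are more than three F/E pairs, the pmin idea can be extended without writing out each term. The sketch below assumes the columns follow the F1, E1, F2, E2, ... naming pattern shown in the question; f_cols, e_cols, f_mat and e_mat are illustrative names.

```r
df <- data.frame(F1 = c( 1, -2, -1), E1 = c(1, 1, 0), Y1 = c(0, 0, 1),
                 F2 = c(-1,  2, -1), E2 = c(1, 1, 1), Y2 = c(0, 0, 1),
                 F3 = c(-2, -2, -1), E3 = c(1, 1, 1), Y3 = c(1, 1, 0))

# Find every F column and derive the name of its matching E column
f_cols <- grep("^F[0-9]+$", names(df), value = TRUE)
e_cols <- sub("^F", "E", f_cols)

f_mat <- as.matrix(df[f_cols])
e_mat <- as.matrix(df[e_cols])

# pmin(f, 0) zeroes out non-negative F values, so only negative ones contribute
df$NegTotal <- rowSums(pmin(f_mat, 0) * e_mat)
df
```

The element-wise product relies on f_cols and e_cols being in the same order, which the sub() call guarantees.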

How to sum by group an "Origin-Destination" data frame?

I have this kind of data frame:
df<-data.frame(Origin=c(1,1,1,2,2,3,3,3),
Var= c(2,4,1,3,5,6,2,1),
Desti= c(2,2,3,2,1,2,1,3))
I would like to get the sum of Var, for each value of Origin, grouped by Desti (Out.x) and by Origin (In.x). The result would be for df:
Out.1 Out.2 Out.3 In.1 In.2 In.3
1 0 6 1 0 5 2
2 5 3 0 6 3 6
3 2 6 1 1 0 1
Any ideas?
Maybe this helps:
res <- cbind(xtabs(Var~., df), xtabs(Var~Desti+Origin, df))
colnames(res) <- paste(rep(c("Out", "In"), each=3), 1:3, sep=".")
res
# Out.1 Out.2 Out.3 In.1 In.2 In.3
#1 0 6 1 0 5 2
#2 5 3 0 6 3 6
#3 2 6 1 1 0 1
Or, the above can be simplified:
r1 <- xtabs(Var~., df)
res <- cbind(r1, t(r1)) #change the `column names` accordingly
Or using reshape2
library(reshape2)
res1 <- cbind(acast(df, Origin~Desti, value.var='Var', sum),
acast(df, Desti~Origin, value.var='Var', sum))
colnames(res1) <- colnames(res)
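For the simplified xtabs version, the column names can be derived from the table itself rather than hard-coded with 1:3. This is a minimal sketch that assumes, as in the example, that the same set of values appears in both Origin and Desti; out is an illustrative name.

```r
df <- data.frame(Origin = c(1, 1, 1, 2, 2, 3, 3, 3),
                 Var    = c(2, 4, 1, 3, 5, 6, 2, 1),
                 Desti  = c(2, 2, 3, 2, 1, 2, 1, 3))

# Rows are origins, columns are destinations
out <- xtabs(Var ~ Origin + Desti, df)

# The transpose gives, for each row value, the flows arriving from each origin
res <- cbind(out, t(out))
colnames(res) <- c(paste0("Out.", colnames(out)),
                   paste0("In.", rownames(out)))
res
```

If some value occurred only as an origin or only as a destination, out would not be square and the cbind would fail, so the levels would need to be aligned first.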
