R :: dynamic sum across sequence of column names based on another column

Dataset:
sumx is the output column required
id a1 a2 a3 a4 a5 mon sumx
x 1 2 1 0 1 2 4
y 2 3 1 0 3 4 3
z 0 0 2 2 0 1 4
Requirement: (based on mon):
for x: sumx = sum(a2 to a5)
for y: sumx = sum(a4 to a5)
for z: sumx = sum(a1 to a5)
The code I tried gives an error stating that “numerical expression has n elements: only the first used”:
df$sumx <- rowSums(df[c(paste("a", df$mon:5 , sep = ""))])
What I want to achieve is that, based on the mon variable, the new variable sums the sequence of variables (a1 to a5), starting from the column number given in mon and ending at the last variable in the sequence.

You could try a simple for loop:
# Columns 1-5 hold a1..a5; column 6 holds mon
test.dat <- matrix(c(1,2,1,0,1,2,2,3,1,0,3,4,0,0,2,2,0,1), nrow = 3, byrow = TRUE)
sum.vec <- c()
for (i in 1:nrow(test.dat)){
  test.vec <- c()
  # collect the values from column mon up to column 5
  for (j in test.dat[i,6]:5){
    test.vec <- c(test.vec, test.dat[i,j])
  }
  sum.vec[i] = sum(test.vec)
}
# append the sums as a new column
test.dat <- cbind(test.dat, sum.vec)
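The error in the question comes from the colon operator, which only uses the first element when it is given a vector such as df$mon. As an alternative to the loop, here is a minimal vectorised sketch. It assumes the data frame is called df as in the question and that the columns a1 to a5 are numeric; the helper names a and keep are illustrative only.

```r
# Data from the question (sumx is recomputed below)
df <- data.frame(
  id  = c("x", "y", "z"),
  a1  = c(1, 2, 0), a2 = c(2, 3, 0), a3 = c(1, 1, 2),
  a4  = c(0, 0, 2), a5 = c(1, 3, 0),
  mon = c(2, 4, 1)
)

a <- as.matrix(df[paste0("a", 1:5)])  # numeric matrix of a1..a5
keep <- col(a) >= df$mon              # TRUE where the column index >= that row's mon
df$sumx <- rowSums(a * keep)          # masked-out cells contribute 0
df$sumx
```

For the example data this gives 4, 3 and 4, matching the required sumx column. Multiplying by the logical mask avoids building a different column range for every row.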

Related

Finding all sum of 2 power value combination values of a given number in R

R data frame 1:
Index Powervalue
0 1
1 2
2 4
3 8
4 16
5 32
R dataframe 2 :
CombinedValue
20
50
Expected Final Result :
Can we get the output as in the image. If yes please help.
A fellow Stack Overflow user provided the code below. I am looking for a way to separate the comma-separated values into columns of 1s and 0s.
df <- data.frame(sum = c(50, 20, 6))
values_list <- list()
for (i in 1:nrow(df)) {
sum <- df$sum[i]
values <- c()
while (sum > 0) {
value <- 2^floor(log2(sum))
values <- c(values, value)
sum <- sum - value
}
values_list[[i]] <- values
}
df$values <- values_list
Can we fix the columns up to power 31, as shown in the attached image? Where a column matches one of the possible codes, place 1; otherwise place 0 in the remaining columns. Please help.
Here is a function whose output matches the expected output.
toCodes <- function(x) {
  n <- floor(log2(x))
  # include power 0 so that odd values keep their 2^0 = 1 bit
  pow <- rev(seq.int(0L, max(n)))
  # 'y' is the matrix of codes
  y <- t(sapply(x, \(.x) (.x %/% 2^pow) %% 2L))
  # keep only the code columns that are used by at least one value
  i_cols <- apply(y, 2, \(.y) any(.y != 0L))
  colnames(y) <- sprintf("code_%d", 2^pow)
  # comma-separated list of the powers of two present in each value
  possiblecodes <- apply(y, 1, \(p) {
    codes <- 2^pow[as.logical(p)]
    paste(rev(codes), collapse = ",")
  })
  data.frame(combinedvalue = x, possiblecodes, y[, i_cols])
}
x <- c(20L, 50L)
toCodes(x)
#>   combinedvalue possiblecodes code_32 code_16 code_4 code_2
#> 1            20          4,16       0       1      1      0
#> 2            50       2,16,32       1       1      0      1
Created on 2022-12-19 with reprex v2.0.2
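The question also asks for a fixed set of columns up to power 31, whether or not a given power is used. Here is a minimal sketch of that variant. The function name to_fixed_codes and its max_pow argument are hypothetical, and the columns come out in ascending order (code_1 first), unlike the answer above.

```r
# Hypothetical helper: one 0/1 column per power of two from 2^0 to 2^max_pow
to_fixed_codes <- function(x, max_pow = 31L) {
  pow <- 0:max_pow
  # for each value, extract the bit at every power (rows = values after t())
  bits <- t(vapply(x, function(v) (v %/% 2^pow) %% 2, numeric(length(pow))))
  # format() avoids scientific notation in the large column names
  colnames(bits) <- paste0("code_", format(2^pow, scientific = FALSE, trim = TRUE))
  data.frame(combinedvalue = x, bits)
}

res <- to_fixed_codes(c(20, 50))
res[, c("combinedvalue", "code_32", "code_16", "code_4", "code_2")]
```

The arithmetic uses ordinary doubles rather than bitwAnd(), because 2^31 does not fit in R's 32-bit integer type.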

How to assign 1s and 0s to columns if variable in row matches or not match in R

I'm an absolute beginner in coding and R and this is my third week doing it for a project. (for biologists, I'm trying to find the sum of risk alleles for PRS) but I need help with this part
df
x y z
1 t c a
2 a t a
3 g g t
so when code applied:
x y z
1 t 0 0
2 a 0 1
3 g 1 0
I'm trying to make it so that if a value in y or z matches the value of x in the same row, it changes to 1, and if not, to 0.
I started with:
```
for(i in 1:ncol(df)){
df[, i]<-df[df$x == df[,i], df[ ,i]<- 1]
}
```
But got all NA values
In reality, I have 100 columns I have to compare with x in the data frame. Any help is appreciated
An alternative way to do this is by using ifelse() in base R.
df$y <- ifelse(df$y == df$x, 1, 0)
df$z <- ifelse(df$z == df$x, 1, 0)
df
# x y z
#1 t 0 0
#2 a 0 1
#3 g 1 0
Edit to extend this step to all columns efficiently
For example:
df1
# x y z w
#1 t c a t
#2 a t a a
#3 g g t m
To edit many columns efficiently, a better approach is to write a function and apply it to all targeted columns in the data frame. Here is a simple function to do the work:
edit_col <- function(any_col) ifelse(any_col == df1$x, 1, 0)
This function takes a single column, compares its elements with the elements of df1$x, and returns 1 where they match and 0 where they do not. To apply it to all targeted columns, you can use apply(). Because x is not a targeted column in your case, you need to exclude it by indexing with [, -1], since it is the first column in df1.
# Here number 2 indicates columns. Use number 1 for rows.
df1[, -1] <- apply(df1[,-1], 2, edit_col)
df1
# x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
Of course, you can also define a function that edits the whole data frame, so you don't need to call apply() manually.
Here is an example of such a function:
edit_df <- function(any_df){
  # 1 where the column matches any_df$x, 0 otherwise
  edit_col <- function(any_col) ifelse(any_col == any_df$x, 1, 0)
  # Create a vector containing all names of the targeted columns.
  target_col_names <- setdiff(colnames(any_df), "x")
  # drop = FALSE keeps a data frame even when there is only one target column
  any_df[, target_col_names] <- apply(any_df[, target_col_names, drop = FALSE], 2, edit_col)
  return(any_df)
}
Then use the function:
edit_df(df1)
# x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
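With around 100 columns, apply() first converts the data frame to a character matrix. A variant that stays column-wise is sketched below, assuming the reference column is named x as in the example; the name targets is illustrative only.

```r
# Example data from the answer above
df1 <- data.frame(
  x = c("t", "a", "g"),
  y = c("c", "t", "g"),
  z = c("a", "a", "t"),
  w = c("t", "a", "m")
)

# every column except the reference column x
targets <- setdiff(names(df1), "x")

# compare each target column with x, row by row, and store 1/0 integers
df1[targets] <- lapply(df1[targets], function(col) as.integer(col == df1$x))
df1
```

Because the columns are now numeric, rowSums(df1[targets]) would give the per-row count of matches, if that is the quantity needed afterwards.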
A tidyverse approach
library(dplyr)
df <-
tibble(
x = c("t","a","g"),
y = c("c","t","g"),
z = c("a","a","t")
)
df %>%
mutate(
across(
.cols = c(y,z),
.fns = ~if_else(. == x,1,0)
)
)
# A tibble: 3 x 3
x y z
<chr> <dbl> <dbl>
1 t 0 0
2 a 0 1
3 g 1 0

operating between columns and classifying values per groups R

I am trying to obtain percentages, grouping the values by one variable.
For this I used sapply to obtain the percentage of each column relative to another one, but I don't know how to group these values by type (another variable).
x <- data.frame("A" = c(0,0,1,1,1,1,1), "B" = c(0,1,0,1,0,1,1), "C" = c(1,0,1,1,0,0,1),
"type" = c("x","x","x","y","y","y","x"), "yes" = c(0,0,1,1,0,1,1))
x
A B C type yes
1 0 0 1 x 0
2 0 1 0 x 0
3 1 0 1 x 1
4 1 1 1 y 1
5 1 0 0 y 0
6 1 1 0 y 1
7 1 1 1 x 1
I need to obtain the following value (percentage): (A==1 & yes==1) / (A==1), and for this I use the following code:
result <- as.data.frame(sapply(x[,1:3],
function(i) (sum(i & x$yes)/sum(i))*100))
result
sapply(x[, 1:3], function(i) (sum(i & x$yes)/sum(i)) * 100)
A 80
B 75
C 75
Now I need to perform the same calculation but taking the variable "type" into account, that is, obtain the same percentage broken down by type. My expected table is:
type sapply(x[, 1:3], function(i) (sum(i & x$yes)/sum(i)) * 100)
A x 40
A y 40
B x 25
B y 50
C x 50
C y 25
In the example you can see that, for each letter, the percentages sum to the value obtained in the first result; here they are just broken down by type.
thanks a lot.
You can do the following using data.table:
Code
library(data.table)
setDT(df)
cols = c('A', 'B', 'C')
mat = df[yes == 1, lapply(.SD, function(x){
100 * sum(x)/df[, lapply(.SD, sum), .SDcols = cols][[substitute(x)]]
# Here, the numerator is sum(x | yes == 1) for x == columns A, B, C
# If we look at the denominator, it equals sum(x) for x == columns A, B, C
# The reason why we need to apply substitute(x) is because df[, lapply(.SD, sum)]
# generates a list of column sums, i.e. list(A = sum(A), B = sum(B), ...).
# Hence, for each x in the column names we must subset the list above using [[substitute(x)]]
# Ultimately, the operation equals sum(x | yes == 1)/sum(x) for A, B, C.
}), .(type), .SDcols = cols]
# '.(type)' simply means that we apply this for each type group,
# i.e. once for x and once for y, for each ABC column.
# The dot is just shorthand for 'list()'.
# .SDcols assigns the subset that I want to apply my lapply statement onto.
Result
> mat
type A B C
1: x 40 25 50
2: y 40 50 25
Long format (your example)
> melt(mat, id.vars = "type")
type variable value
1: x A 40
2: y A 40
3: x B 25
4: y B 50
5: x C 50
6: y C 25
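For comparison, the same table can be sketched in base R without data.table. This is only an illustration of the calculation, assuming the columns are 0/1 indicators as in the question; the names dat, totals, hits and pct are hypothetical.

```r
dat <- data.frame(
  A = c(0, 0, 1, 1, 1, 1, 1),
  B = c(0, 1, 0, 1, 0, 1, 1),
  C = c(1, 0, 1, 1, 0, 0, 1),
  type = c("x", "x", "x", "y", "y", "y", "x"),
  yes = c(0, 0, 1, 1, 0, 1, 1)
)
cols <- c("A", "B", "C")

# denominator: overall count of 1s per column
totals <- colSums(dat[cols])

# numerator: count of rows where the column and yes are both 1, per type
hits <- rowsum(as.matrix(dat[cols]) * dat$yes, group = dat$type)

# divide each column by its overall total and convert to a percentage
pct <- sweep(hits, 2, totals, "/") * 100
pct
```

The result is a matrix with one row per type and one column per letter; as.data.frame(as.table(pct)) would reshape it to long format.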
Data
df <- data.frame("A" = c(0,0,1,1,1,1,1), "B" = c(0,1,0,1,0,1,1), "C" = c(1,0,1,1,0,0,1),
"type" = c("x","x","x","y","y","y","x"), "yes" = c(0,0,1,1,0,1,1))

Merge all possible combinations of multiple data frames

I would like to merge by columns all the possible pair combinations of these three data frames (i.e. nine combinations)
frame1 = data.frame(a=c(1,2,3), b=c(1,2,3), c=c(1,2,3))
frame2 = data.frame(a=c(2,1,3), b=c(2,1,3), c=c(2,1,3))
frame3 = data.frame(a=c(3,2,1), b=c(3,2,1), c=c(3,2,1))
Each contains the same 3 rows but not in the same order, so I would also like the merge to match rows on the pair of values in columns a and b of the two frames being merged. Example:
a b c
1 1 1
2 2 2
3 3 3
+
a b c
2 2 2
1 1 1
3 3 3
=
a.x b.x c.x a.y b.y c.y
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
I then want to take the absolute difference between each pair of values in columns c.x and c.y of each merged frame and sum these differences to obtain a "score" (which of course would be zero in this example). I would like to store that score in the corresponding cell of an empty 3x3 matrix (i.e., the score of frame1 vs. frame2 should be located in cell [2,1], etc.):
nframes = 3
frames = c(frame1,frame2,frame3)
matrix = matrix(, nrow = nframes, ncol = nframes)
matrix_scores = data.frame(matrix)
for (i in frames){
for (j in frames)
{
x = merge(i, j, by=c("a","b"))
score = sum(abs(x$c.x - x$c.y))
matrix_scores[j,i] <- score
}
}
However, when I run the loop I obtain the following message:
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
Also, I understand that the line
matrix_scores[j,i] <- score
will give an error, too, but I do not know how to express that I want the score to be stored in cell [1,1], for the first iteration of the loop (frame1 vs. frame1).
The resulting matrix should be a 3x3 matrix containing all zeros:
f1 f2 f3
frame1 0 0 0
frame2 0 0 0
frame3 0 0 0
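Note that c(frame1, frame2, frame3) flattens the data frames into a single list of columns, so the loop iterates over columns rather than frames; storing the frames in a list() avoids this. You can do: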
# Put all frames in a list
d <- list(frame1, frame2, frame3)
# get all merge-combinations
gr <- expand.grid(1:length(d), 1:length(d))
# function to merge and get the sum diff:
foo <- function(i, x, gr){
tmp <- merge(x[[gr[i, 1]]], x[[gr[i, 2]]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(sapply(1:nrow(gr), foo, d, gr), length(d), length(d), byrow = T)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
# The scores are laid out as follows:
matrix(apply(gr, 1, paste, collapse="_"), 3, 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] "1_1" "2_1" "3_1"
[2,] "1_2" "2_2" "3_2"
[3,] "1_3" "2_3" "3_3"
# alternative using apply:
# function to merge and get the sum diff:
foo <- function(y, x){
tmp <- merge(x[[ y[1] ]], x[[ y[2] ]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(apply(gr, 1, foo, d), length(d), length(d), byrow = T)
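If row and column labels are wanted, as in the expected output of the question, the same pairwise scoring can be sketched with outer() on a named list. The names frames, pair_score and scores are illustrative only. Here cell [i, j] holds the score of frame i merged with frame j, which is a layout assumption; since the score is symmetric, the transposed layout gives the same values.

```r
frames <- list(
  frame1 = data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3)),
  frame2 = data.frame(a = c(2, 1, 3), b = c(2, 1, 3), c = c(2, 1, 3)),
  frame3 = data.frame(a = c(3, 2, 1), b = c(3, 2, 1), c = c(3, 2, 1))
)

# score for one pair of frames: merge on a and b, then sum |c.x - c.y|
pair_score <- function(i, j) {
  m <- merge(frames[[i]], frames[[j]], by = c("a", "b"))
  sum(abs(m$c.x - m$c.y))
}

# outer() passes whole index vectors, so the scalar function is vectorised first
idx <- seq_along(frames)
scores <- outer(idx, idx, Vectorize(pair_score))
dimnames(scores) <- list(names(frames), names(frames))
scores
```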

Subtract every column from each other in an xts object

I am a newbie learning to code and have an xts object of 1000 rows and 10 columns. I need to subtract every column from each other creating a new xts object keeping the date column. I've tried to use combn but could not get it to create B-A result since it did A-B. What I'm looking for is below.
DATA
           A B C
2010-01-01 1 3 5
2010-01-02 2 4 6
2010-01-03 3 5 2
RESULT
           A-B A-C B-A B-C C-A C-B
2010-01-01  -2  -4   2  -2   4   2
2010-01-02  -2  -4   2  -2   4   2
2010-01-03  -2   1   2   3  -1  -3
We could use outer to get pairwise combinations of the column names, subset the dataset 'xt1' based on the column names, get the difference in a list.
f1 <- Vectorize(function(x,y) list(setNames(xt1[,x]-xt1[,y],
paste(x,y, sep='_'))))
lst <- outer(colnames(xt1), colnames(xt1), FUN = f1)
We Filter out the list elements that have sum=0 i.e. the difference between columns A-A, B-B, and C-C, and cbind to get the expected output.
res <- do.call(cbind,Filter(sum, lst))
res[,order(colnames(res))]
# A_B A_C B_A B_C C_A C_B
#2010-01-01 -2 -4 2 -2 4 2
#2010-01-02 -2 -4 2 -2 4 2
#2010-01-03 -2 1 2 3 -1 -3
data
d1 <- data.frame(A=1:3, B=3:5, C=c(5,6,2))
library(xts)
xt1 <- xts(d1, order.by=as.Date(c('2010-01-01', '2010-01-02', '2010-01-03')))
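One caveat with Filter(sum, lst) is that it would also drop a genuine difference column whose values happen to sum to zero. A sketch that selects the pairs by column name instead is shown below. It runs on a plain matrix with dates as row names so that it needs no packages; that the same column indexing carries over to an xts object is an assumption here, and the names m, pairs and res are illustrative.

```r
# Plain-matrix stand-in for the xts data (dates kept as row names)
m <- matrix(
  c(1, 2, 3,  3, 4, 5,  5, 6, 2), ncol = 3,
  dimnames = list(c("2010-01-01", "2010-01-02", "2010-01-03"), c("A", "B", "C"))
)

# all ordered pairs of column names; 'right' varies fastest
pairs <- expand.grid(right = colnames(m), left = colnames(m),
                     stringsAsFactors = FALSE)
# drop the self-pairs A-A, B-B, C-C by name, not by value
pairs <- pairs[pairs$left != pairs$right, ]

# subtract the selected columns in one step and label the result
res <- m[, pairs$left, drop = FALSE] - m[, pairs$right, drop = FALSE]
colnames(res) <- paste(pairs$left, pairs$right, sep = "-")
res
```

The columns come out in the order A-B, A-C, B-A, B-C, C-A, C-B, so both A-B and B-A are present.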
I built the data using:
x <- zoo::zoo(
data.frame(
A = c(1, 2, 3),
B = c(3, 4, 5),
C = c(5, 6, 2)),
order.by = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03")))
Then I defined a function for creating all possible pairs of two sets:
cross <- function(x, y = x) {
result <- list()
for (a in unique(x)) {
for (b in unique(y)) {
result <- append(result, list(list(left = a, right = b)))
}
}
result
}
To answer your question:
# Build a list of column combinations (every column paired with every column,
# so that both A-B and B-A are produced)
combinations <- cross(names(x))
# Remove any entries where the left equals the right
combinations <- combinations[vapply(combinations, function(x) { x$left != x$right }, logical(1))]
# Build a user friendly list of names
names(combinations) <- vapply(combinations, function(x) { paste0(x$left, "-", x$right) }, character(1))
# Do the actual computation and combine the results into one object
do.call(cbind, lapply(combinations, function(x, data) { data[, x$left, drop = T] - data[, x$right, drop = T] }, data = x))
