Derive multiple columns from multiple columns in R

Consider that we have the data below and would like to derive variables z1, z2, z3 from x1*y1, x2*y2 and x3*y3.
Could you please help me with how I can achieve this in R?
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c('A','B','C','D','E','F')
y1 <- c(1,2,3,4,5,6)
y2 <- c(2,3,4,5,6,7)
y3 <- c(3,4,5,6,7,8)
testa <- data.frame(x1,x2,x3,x4,y1,y2,y3)

Assuming your structure and naming conventions hold, you can select the x and y variables, multiply them together as a group, and then assign the result back to the z columns:
var_i <- 1:3
testa[paste0("z", var_i)] <- testa[paste0("x", var_i)] * testa[paste0("y", var_i)]
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64
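If you would rather not hard-code 1:3, one option is to derive the numeric suffixes from the column names and keep only those that have both an x and a y column. This is a sketch that assumes the xN/yN naming pattern holds throughout; x4 is skipped automatically because it has no y4 partner.

```r
testa <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6), x2 = c(2, 3, 4, 5, 6, 7), x3 = c(3, 4, 5, 6, 7, 8),
  x4 = c("A", "B", "C", "D", "E", "F"),
  y1 = c(1, 2, 3, 4, 5, 6), y2 = c(2, 3, 4, 5, 6, 7), y3 = c(3, 4, 5, 6, 7, 8)
)

# Numeric suffixes of the xN and yN columns (assumes names like "x1", "y1")
x_idx <- sub("^x", "", grep("^x[0-9]+$", names(testa), value = TRUE))
y_idx <- sub("^y", "", grep("^y[0-9]+$", names(testa), value = TRUE))

# Keep only suffixes present on both sides: "1", "2", "3" (x4 has no y4)
var_i <- intersect(x_idx, y_idx)

# Element-wise product of the matched column groups, stored as zN
testa[paste0("z", var_i)] <- testa[paste0("x", var_i)] * testa[paste0("y", var_i)]
testa
```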

If we want to do this automatically, a tidyverse option is
library(dplyr)
library(stringr)
testa <- testa %>%
mutate(across(x1:x3, ~ .x * get(str_replace(cur_column(), "x",
"y")), .names = "{str_replace(.col, 'x', 'z')}"))
Output:
testa
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64

Related

How can I sort a data frame without specific columns in R

I have a table like the one below:
x1 x2 x3 x4
a 3 5 32
b 5 3 10
c 8 22 9
d 12 2 1
e 1 10 13
I want to sort the rows from highest to lowest by looking at the values in each column and row, as follows:
x1 x2 x3 x4
a 3 5 32
c 8 22 9
e 1 10 13
d 12 2 1
b 5 3 10
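The expected output is consistent with ordering the rows by each row's largest value across x2 to x4, in decreasing order (a: 32, c: 22, e: 13, d: 12, b: 10). Assuming that is the intended rule, a base R sketch (the data frame is rebuilt from the table above):

```r
df <- data.frame(
  x1 = c("a", "b", "c", "d", "e"),
  x2 = c(3, 5, 8, 12, 1),
  x3 = c(5, 3, 22, 2, 10),
  x4 = c(32, 10, 9, 1, 13),
  stringsAsFactors = FALSE
)

# Largest value in each row, ignoring the label column x1
row_max <- do.call(pmax, df[-1])

# Sort rows by that maximum, highest first
sorted <- df[order(row_max, decreasing = TRUE), ]
sorted
```

If the intended rule is different (for example, sorting by a specific column), only the vector passed to order() needs to change.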

How to lag multiple specific columns of a data frame in R

I would like to lag multiple specific columns of a data frame in R.
Let's take this generic example. Let's assume I have defined which columns of my dataframe I need to lag:
Lag <- c(0, 1, 0, 1)
Lag.Index <- is.element(Lag, 1)
df <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
My initial dataframe:
x1 x2 x3 x4
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
I would like to compute the following dataframe:
x1 x2 x3 x4
1 1 NA 1 NA
2 2 1 2 1
3 3 2 3 2
4 4 3 4 3
5 5 4 5 4
6 6 5 6 5
7 7 6 7 6
8 8 7 8 7
I know how to do this for a single lagged column, but I have not been able to find an elegant way to do it for multiple lagged columns. Any help is very much appreciated.
You can use purrr's map2_dfc to lag each column by its own amount:
purrr::map2_dfc(df, Lag, dplyr::lag)
# x1 x2 x3 x4
# <int> <int> <int> <int>
#1 1 NA 1 NA
#2 2 1 2 1
#3 3 2 3 2
#4 4 3 4 3
#5 5 4 5 4
#6 6 5 6 5
#7 7 6 7 6
#8 8 7 8 7
Or with data.table:
library(data.table)
setDT(df)[, names(df) := Map(shift, .SD, Lag)]
A data.table option using shift along with Vectorize:
setDT(df)[, Vectorize(shift)(.SD, Lag)]
x1 x2 x3 x4
[1,] 1 NA 1 NA
[2,] 2 1 2 1
[3,] 3 2 3 2
[4,] 4 3 4 3
[5,] 5 4 5 4
[6,] 6 5 6 5
[7,] 7 6 7 6
[8,] 8 7 8 7
Not sure whether this is elegant enough, but I would use dplyr's mutate_at() function to tweak the columns (in current dplyr, mutate_at() is superseded by across()):
library(dplyr)
df %>% mutate_at(.vars = vars(x2, x4), .funs = ~ lag(., default = NA))
We convert Lag to logical, get the corresponding column names and use across() from dplyr:
library(dplyr)
df %>%
mutate(across(names(.)[as.logical(Lag)], lag))
# x1 x2 x3 x4
#1 1 NA 1 NA
#2 2 1 2 1
#3 3 2 3 2
#4 4 3 4 3
#5 5 4 5 4
#6 6 5 6 5
#7 7 6 7 6
#8 8 7 8 7
Or we can do this in base R:
df[as.logical(Lag)] <- rbind(NA, df[-nrow(df), as.logical(Lag)])
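For a dependency-free variant that also handles lags larger than 1, one can map a small helper over the columns and the Lag vector together. lag_vec() below is a hypothetical helper written for this sketch, not a function from any package; it pads the top with NA in the same way dplyr::lag() does.

```r
# Hypothetical helper: shift x down by n positions, padding the top with NA.
# n = 0 returns x unchanged; n >= length(x) returns all NA.
lag_vec <- function(x, n) {
  len <- length(x)
  n_pad <- min(n, len)
  c(rep(NA, n_pad), x[seq_len(len - n_pad)])
}

df <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
Lag <- c(0, 1, 0, 1)

# Pair each column with its own lag amount; df[] keeps the data frame shape
df[] <- Map(lag_vec, df, Lag)
df
```

Because each column gets its own value from Lag, setting an entry to 2 or 3 lags that column further without any other changes.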

How to add row and column to a dataframe of different length?

I have two dataframes of different length:
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))
Headers
x y
1 x1 1
2 x2 2
3 x3 3
4 x4 4
Dataset
H W
1 20 30
2 10 20
3 11 30
4 8 10
5 10 6
I need to turn column 'x' from 'Headers' into column names and column 'y' into the corresponding values, and then bind the result to 'Dataset':
H W x1 x2 x3 x4
20 30 1 2 3 4
10 20 1 2 3 4
11 30 1 2 3 4
8 10 1 2 3 4
10 6 1 2 3 4
Here is the code which I tried:
H <- t(Headers)
Dataset <- cbind(H, Dataset)
names(H) <- NULL
Dataset <- qpcR:::cbind.na(H, Dataset)
Any help will be appreciated. Thanks!
Transpose 'y' and repeat to the desired number of rows. Set column names to 'x'.
cbind(Dataset, `colnames<-`(t(Headers$y)[rep(1, nrow(Dataset)), ], Headers$x))
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
A data.table approach (dcast() expects a data.table, so Headers is converted first):
library(data.table)
cbind(Dataset, dcast(as.data.table(Headers), . ~ x, value.var = "y")[, -1])
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
A tidyverse approach:
library(tidyverse)
Headers %>%
  rownames_to_column() %>%
  spread(x, y) %>%
  summarise(across(everything(), ~ first(na.omit(.x)))) %>%
  cbind(Dataset, .) %>%
  select(-rowname)
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
You could also use base R (setNames() gives the new columns the names from Headers$x):
cbind(Dataset, setNames(data.frame(matrix(rep(Headers$y, each = nrow(Dataset)), nrow = nrow(Dataset))), Headers$x))
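Another base R sketch: assigning a list of length-1 values to new columns lets `[<-` recycle each value down all rows, so no transposing or repeating is needed. It assumes Headers$x holds the desired column names; as.character() guards against x being a factor in older R versions.

```r
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))

# Each list element becomes one new column; length-1 values are recycled
Dataset[as.character(Headers$x)] <- as.list(Headers$y)
Dataset
```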

Deleting rows in a data frame based on the contents of the rows

If I have code like the following:
x1 <- list(1,2,3,4,5,5)
x2 <- list(1,4,7,8)
x3 <- list(5,6)
x4 <- list(1,4,4,5,6,7)
x5 <- list(1,2,3,5,6,9)
x6 <- list(1,4, 6,7,8,7)
myList <- list(x1, x2, x3, x4,x5,x6)
df <- data.frame(t(sapply(myList, function(x){c(x, rep(tail(x, 1),max(lengths(myList)) - length(x)))
})))
Which gives a data frame like this
X1 X2 X3 X4 X5 X6
1 1 2 3 4 5 5
2 1 4 7 8 8 8
3 5 6 6 6 6 6
4 1 4 4 5 6 7
5 1 2 3 5 6 9
6 1 4 6 7 8 7
How could I delete the 2 rows that have the highest values of X6 and the 2 rows that have the lowest values of X6?
Try this (I updated my answer based on your updated sample df):
o <- order(unlist(df[names(df)[ncol(df)]]))
df[-c(head(o, 2), tail(o, 2)),]
# X1 X2 X3 X4 X5 X6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
names(df)[ncol(df)] gives the name of the rightmost column in df.
In base R, using subsetting with [:
#function sort sorts the df$X6 vector which we subset for the two highest and lowest values
mycol <- df[[rev(names(df))[1]]]
df[!mycol %in% c(sort(mycol)[1:2], rev(sort(mycol))[1:2]), ]
# X1 X2 X3 X4 X5 X6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
In base R, a few simple steps are enough to arrive at the desired data:
# Data is:
# X1 X2 X3 X4 X5 X6
#1 1 2 3 4 5 5
#2 1 4 7 8 8 8
#3 5 6 6 6 6 6
#4 1 4 4 5 6 7
#5 1 2 3 5 6 9
#6 1 4 6 7 8 7
#order on X6
df <- df[order(df$X6),]
# > df
# X1 X2 X3 X4 X5 X6
#1 1 2 3 4 5 5
#3 5 6 6 6 6 6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
#2 1 4 7 8 8 8
#5 1 2 3 5 6 9
# Remove the first 2 rows (lowest X6)
df <- tail(df, nrow(df) - 2)
# Remove the last 2 rows (highest X6)
df <- head(df, nrow(df) - 2)
# The result
# > df
# X1 X2 X3 X4 X5 X6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
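The same idea can be wrapped in a small reusable function. This is a sketch under two assumptions: the rows are built from numeric vectors (c() instead of list()) so that X6 is a plain numeric column, and drop_extremes() is a hypothetical helper written for this example. It drops rows by position after ordering, and returns the survivors in their original order.

```r
# Rows built from numeric vectors (assumption: c() rather than list())
myList <- list(
  c(1, 2, 3, 4, 5, 5), c(1, 4, 7, 8), c(5, 6),
  c(1, 4, 4, 5, 6, 7), c(1, 2, 3, 5, 6, 9), c(1, 4, 6, 7, 8, 7)
)
max_len <- max(lengths(myList))

# Pad each vector with its last value, as in the question
padded <- lapply(myList, function(x) c(x, rep(tail(x, 1), max_len - length(x))))
df <- data.frame(do.call(rbind, padded))   # columns are named X1..X6

# Hypothetical helper: drop the k rows with the lowest and the k rows with the
# highest values of `col`, keeping the remaining rows in their original order
drop_extremes <- function(d, col, k = 2) {
  n <- nrow(d)
  if (n <= 2 * k) return(d[0, , drop = FALSE])
  o <- order(d[[col]])          # row positions from lowest to highest
  keep <- o[(k + 1):(n - k)]    # skip the first k and the last k positions
  d[sort(keep), , drop = FALSE]
}

res <- drop_extremes(df, "X6", k = 2)
res
```

Unlike the %in% approach, this works by position, so ties at the boundary do not cause extra rows to be removed: exactly 2 * k rows are dropped.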

Doing rolling custom computations using data.table for multiple columns

I am doing rolling computations on a column, using the code below:
dt <- data.table(x1=1:8,x2=2:10,x3=4:11,x4=6:12)
N = 3L
dt[, y1 := (2*dt$x1[.I] -dt$x1[(.I+N-1L)]), by=1:nrow(dt)]
dt
x1 x2 x3 x4 y1
1: 1 2 4 6 -1
2: 2 3 5 7 0
3: 3 4 6 8 1
4: 4 5 7 9 2
5: 5 6 8 10 3
6: 6 7 9 11 4
7: 7 8 10 12 13
8: 8 9 11 6 NA
9: 1 10 4 7 NA
sdcols=paste0("x",1:4)
How does one use .SDcols to achieve the same result for columns x1 through x4, creating new columns y1 to y4?
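Perhaps we don't need a group-by operation (the source columns are listed explicitly, because after the first step names(dt) also contains y1):
nm1 <- paste0("x", 1:4)
dt[, paste0('y', seq_along(nm1)) := lapply(.SD,
function(x) c((2*shift(x)- shift(x, type = 'lead'))[-1], NA)), .SDcols = nm1]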
dt
# x1 x2 x3 x4 y1 y2 y3 y4
#1: 1 2 4 6 -1 0 2 4
#2: 2 3 5 7 0 1 3 5
#3: 3 4 6 8 1 2 4 6
#4: 4 5 7 9 2 3 5 7
#5: 5 6 8 10 3 4 6 8
#6: 6 7 9 11 4 5 7 16
#7: 7 8 10 12 13 6 16 17
#8: 8 9 11 6 NA NA NA NA
#9: 1 10 4 7 NA NA NA NA
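The answer above encodes the N = 3 window through a lag/lead pair. Since the computation is 2 * x[i] - x[i + N - 1], a sketch that keeps N as a parameter can use a single shift(x, N - 1L, type = "lead") instead. The columns are written out explicitly here to match the 9-row table shown in the question.

```r
library(data.table)

# Same 9-row table as printed in the question
dt <- data.table(
  x1 = c(1:8, 1L),
  x2 = 2:10,
  x3 = c(4:11, 4L),
  x4 = c(6:12, 6:7)
)
N <- 3L
cols <- paste0("x", 1:4)

# y = 2 * x[i] - x[i + N - 1]; the lead leaves NA in the last N - 1 rows
dt[, paste0("y", 1:4) := lapply(.SD, function(x) 2 * x - shift(x, N - 1L, type = "lead")),
   .SDcols = cols]
dt
```

Changing N to another window size then only changes how far ahead the lead looks and how many trailing rows become NA.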
