Replace column with summation value in R

Given a data frame in R
# ID x1 x2 x3 x4
# 1 1 1 1 1 1
# 2 1 1 2 3 4
# 3 2 1 5 6 7
# 4 3 1 8 9 2
I want to replace the columns with their summation value
# ID x1 x2 x3 x4
# 1 1 4 16 19 14
However, trying to set the sum directly replaces all the values with the sum:
for (nm in names(df)) {
  df[nm] = sum(df[nm])
}
# ID x1 x2 x3 x4
# 1 1 4 16 19 14
# 1 2 4 16 19 14
# 1 3 4 16 19 14
# 1 4 4 16 19 14

I believe the ID column is no longer needed. Then simply
colSums(df[, -1])
# x1 x2 x3 x4
# 4 16 19 14
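The loop in the question repeats each sum on every row because assigning a single value to a column recycles it to the column's full length. If a one-row data frame like the desired output is wanted, a minimal sketch could look like the following. The data frame is rebuilt by hand from the question's example, and keeping ID = 1 in the result is an assumption taken from the desired output, not something computed.

```r
# Example data rebuilt from the question
df <- data.frame(
  ID = c(1, 1, 2, 3),
  x1 = c(1, 1, 1, 1),
  x2 = c(1, 2, 5, 8),
  x3 = c(1, 3, 6, 9),
  x4 = c(1, 4, 7, 2)
)

# Sum every column except ID; the result is a named numeric vector
sums <- colSums(df[, -1])

# t() turns the named vector into a 1-row matrix, which data.frame()
# expands into one column per name.
# ID = 1 is an assumption copied from the desired output.
result <- data.frame(ID = 1, t(sums))
result
```

Because the sums are collected into a new object instead of being written back into df, the original four rows stay untouched.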

Related

How can I sort a dataframe without specific columns in R

I have a table like below
x1 x2 x3 x4
a 3 5 32
b 5 3 10
c 8 22 9
d 12 2 1
e 1 10 13
I want to sort from highest to lowest by looking at each column and row as follows
x1 x2 x3 x4
a 3 5 32
c 8 22 9
e 1 10 13
d 12 2 1
b 5 3 10
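
The question does not state the sorting rule explicitly. One reading that reproduces the desired row order is to sort the rows by their largest value across the numeric columns, from highest to lowest (a: 32, c: 22, e: 13, d: 12, b: 10). A minimal sketch under that assumption follows; the table is rebuilt by hand and the names tbl, row_max and sorted are made up for illustration.

```r
# Table rebuilt from the question; x1 holds the row labels
tbl <- data.frame(
  x1 = c("a", "b", "c", "d", "e"),
  x2 = c(3, 5, 8, 12, 1),
  x3 = c(5, 3, 22, 2, 10),
  x4 = c(32, 10, 9, 1, 13),
  stringsAsFactors = FALSE
)

# Largest value in each row, ignoring the label column (assumed sort key)
row_max <- apply(tbl[, -1], 1, max)

# Reorder the rows by that key, highest first
sorted <- tbl[order(row_max, decreasing = TRUE), ]
sorted
```

If a different rule was intended, only the line computing row_max needs to change.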

How to add row and column to a dataframe of different length?

I have two dataframes of different length:
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))
Headers
x y
1 x1 1
2 x2 2
3 x3 3
4 x4 4
Dataset
H W
1 20 30
2 10 20
3 11 30
4 8 10
5 10 6
I need to convert column 'x' from 'Headers' to header, and column 'y' to corresponding values, and then bind to 'Dataset':
H W x1 x2 x3 x4
20 30 1 2 3 4
10 20 1 2 3 4
11 30 1 2 3 4
8 10 1 2 3 4
10 6 1 2 3 4
Here is the code which I tried:
H <- t(Headers)
Dataset <- cbind(H, Dataset)
names(H) <- NULL
Dataset <- qpcR:::cbind.na(H, Dataset)
Any help will be appreciated. Thanks.
Transpose 'y' and repeat to the desired number of rows. Set column names to 'x'.
cbind(Dataset, `colnames<-`(t(Headers$y)[rep(1, nrow(Dataset)), ], Headers$x))
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
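The one-liner above packs three steps into a single call. Spelled out with intermediate variables (the names row_vec, repeated and result are only for illustration), it could be sketched like this. drop = FALSE is an addition here, so that the matrix shape survives even if Dataset had a single row.

```r
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4, stringsAsFactors = FALSE)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))

# Step 1: transpose y into a 1 x 4 matrix
row_vec <- t(Headers$y)

# Step 2: repeat that single row once per row of Dataset
repeated <- row_vec[rep(1, nrow(Dataset)), , drop = FALSE]

# Step 3: use the values of x as column names
colnames(repeated) <- Headers$x

# Bind the new columns to the right of Dataset
result <- cbind(Dataset, repeated)
result
```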
A data.table approach:
library(data.table)
cbind(Dataset, dcast(as.data.table(Headers), . ~ x, value.var = "y")[, -1])
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
A tidyverse approach:
library(tidyverse)
Headers %>%
  rownames_to_column() %>%
  spread(x, y) %>%
  summarise_all(~ first(na.omit(.))) %>%
  cbind(Dataset, .) %>%
  select(-rowname)
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
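You could also use base R, passing dimnames so the new columns are named after Headers$x:
cbind(Dataset, data.frame(matrix(rep(Headers$y, each = nrow(Dataset)), nrow = nrow(Dataset), dimnames = list(NULL, as.character(Headers$x)))))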

Removing a different value from each column of a data frame

I have the following items
A<-data.frame(replicate(5,c(1,2,3,4)))
A= X1 X2 X3 X4 X5
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
B<-c(1,2,3,4,1)
B = 1 2 3 4 1
I want to find a way of removing the first element of B from the first column of A, the second element of B from the second column of A and so on so I obtain the following result
A= X1 X2 X3 X4 X5
2 1 1 1 2
3 3 2 2 3
4 4 4 3 4
Using mapply, we can pass the columns of A and the elements of B in parallel and keep only the values in each column that differ from the corresponding element of B:
mapply(function(x, y) x[x != y], A, B)
# X1 X2 X3 X4 X5
#[1,] 2 1 1 1 2
#[2,] 3 3 2 2 3
#[3,] 4 4 4 3 4
PS - Make sure that ncol(A) and length(B) are the same; otherwise the shorter argument is recycled, which can give unexpected results.
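To make the length check from the PS explicit, a small sketch could guard the call and use Map, which always returns a list. This matters because mapply only simplifies to a matrix when every column loses the same number of values. The names res and out are illustrative, and converting back to a data frame only when the lengths agree is an assumption about what is wanted.

```r
A <- data.frame(replicate(5, c(1, 2, 3, 4)))
B <- c(1, 2, 3, 4, 1)

# Fail early instead of silently recycling B
stopifnot(ncol(A) == length(B))

# Map keeps one list element per column of A
res <- Map(function(col, b) col[col != b], A, B)

# Rebuild a data frame only if all columns ended up the same length
out <- if (length(unique(lengths(res))) == 1) as.data.frame(res) else res
out
```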
A purrr solution:
A<-data.frame(replicate(5,c(1,2,3,4)))
# X1 X2 X3 X4 X5
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
B<-c(1,2,3,4,1)
# [1] 1 2 3 4 1
purrr::map2_df(A, B, ~.x[.x != .y]) # function(x,y) x[x != y]
# # A tibble: 3 x 5
# X1 X2 X3 X4 X5
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2 1 1 1 2
# 2 3 3 2 2 3
# 3 4 4 4 3 4

Deleting rows in a data frame based on the contents of the rows

If I have code like the following:
x1 <- list(1,2,3,4,5,5)
x2 <- list(1,4,7,8)
x3 <- list(5,6)
x4 <- list(1,4,4,5,6,7)
x5 <- list(1,2,3,5,6,9)
x6 <- list(1,4, 6,7,8,7)
myList <- list(x1, x2, x3, x4,x5,x6)
df <- data.frame(t(sapply(myList, function(x) {
  c(x, rep(tail(x, 1), max(lengths(myList)) - length(x)))
})))
Which gives a data frame like this
X1 X2 X3 X4 X5 X6
1 1 2 3 4 5 5
2 1 4 7 8 8 8
3 5 6 6 6 6 6
4 1 4 4 5 6 7
5 1 2 3 5 6 9
6 1 4 6 7 8 7
How could I delete the 2 rows that have the highest values of X6 and the 2 rows that have the lowest values of X6?
Try this (I updated my answer based on your updated sample df):
o <- order(unlist(df[names(df)[ncol(df)]]))
df[-c(head(o, 2), tail(o, 2)),]
# X1 X2 X3 X4 X5 X6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
names(df)[ncol(df)] gives the name of the right most column in df.
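The same idea can be wrapped in a small helper. This is only a sketch: the function name drop_extremes and its arguments col and n are hypothetical, and the example data frame is rebuilt here with plain numeric columns instead of the list columns produced by the question's code.

```r
# Hypothetical helper: drop the n rows with the lowest and the n rows
# with the highest values in one column (default: the rightmost column)
drop_extremes <- function(data, col = names(data)[ncol(data)], n = 2) {
  stopifnot(nrow(data) >= 2 * n)
  vals <- unlist(data[[col]])  # also works if the column is a list
  o <- order(vals)             # row positions from lowest to highest
  data[-c(head(o, n), tail(o, n)), , drop = FALSE]
}

df <- data.frame(
  X1 = c(1, 1, 5, 1, 1, 1),
  X2 = c(2, 4, 6, 4, 2, 4),
  X3 = c(3, 7, 6, 4, 3, 6),
  X4 = c(4, 8, 6, 5, 5, 7),
  X5 = c(5, 8, 6, 6, 6, 8),
  X6 = c(5, 8, 6, 7, 9, 7)
)

kept <- drop_extremes(df)
kept
```

Because order() returns row positions, exactly 2 * n rows are removed even when values are tied.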
In base R, using subsetting with [:
# sort() orders the X6 values; we take the two lowest and the two highest and drop the matching rows
# unlist() guards against X6 being a list column
mycol <- unlist(df[[rev(names(df))[1]]])
df[!mycol %in% c(sort(mycol)[1:2], rev(sort(mycol))[1:2]), ]
# X1 X2 X3 X4 X5 X6
#4 1 4 4 5 6 7
#6 1 4 6 7 8 7
In base R, a few simple steps can be used to arrive at the desired data.
# Data is:
# X1 X2 X3 X4 X5 X6
#1 1 2 3 4 5 5
#2 1 4 7 8 8 8
#3 5 6 6 6 6 6
#4 1 4 4 5 6 7
#5 1 2 3 5 6 9
#6 1 4 6 7 8 7
#order on X6 (unlist() guards against X6 being a list column)
df <- df[order(unlist(df$X6)),]
# > df
# X1 X2 X3 X4 X5 X6
# 1 2 3 4 5 5
# 5 6 6 6 6 6
# 1 4 4 5 6 7
# 1 4 6 7 8 7
# 1 4 7 8 8 8
# 1 2 3 5 6 9
#Remove the first 2 rows (lowest values)
df <- tail(df, nrow(df) - 2)
#Remove the last 2 rows (highest values)
df <- head(df, nrow(df) - 2)
#The result
# > df
# X1 X2 X3 X4 X5 X6
# 1 4 4 5 6 7
# 1 4 6 7 8 7

Compute increase between rows for each same ID

I have a sorted data frame and I would like to compute the increase of x2 within each ID.
The input is already sorted in a certain manner:
ID x2 x3 x4
1 10 11 2
2 100 12 4
1 20 13 10
7 24 3 1
1 30 14 0
3 6 15 1
2 90 15 1
I would like to get:
ID x2 increase x3 x4
1 10 11 2
2 100 12 4
1 20 +100% 13 10
7 24 3 1
1 30 +50% 14 0
3 6 15 1
2 90 -10% 15 1
You could do
df <- read.table(header = TRUE, text = "
ID x2 x3 x4
1 10 11 2
2 100 12 4
1 20 13 10
7 24 3 1
1 30 14 0
3 6 15 1
2 90 15 1")
df$increase <- ave(df$x2, df$ID, FUN = function(x) c(NA, diff(x)/head(x, -1))*100)
df$increase <- ifelse(is.na(df$increase), "", sprintf("%+.0f%%", df$increase))
df
# ID x2 x3 x4 increase
# 1 1 10 11 2
# 2 2 100 12 4
# 3 1 20 13 10 +100%
# 4 7 24 3 1
# 5 1 30 14 0 +50%
# 6 3 6 15 1
# 7 2 90 15 1 -10%
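
To see what the function passed to ave computes, it can be pulled out and given a name. In this sketch, pct_change is a made-up name; the function returns NA for the first row of each ID and the percentage change relative to the previous row of the same ID otherwise. The vectors are copied from the example data.

```r
# Percentage change relative to the previous element; NA for the first one
pct_change <- function(x) c(NA, diff(x) / head(x, -1)) * 100

ID <- c(1, 2, 1, 7, 1, 3, 2)
x2 <- c(10, 100, 20, 24, 30, 6, 90)

# ave() applies pct_change within each ID and returns the results
# in the original row order
inc <- ave(x2, ID, FUN = pct_change)
inc
```

IDs that occur only once (7 and 3 here) get NA, which the answer then prints as an empty string.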
