How can I sort a dataframe without specific columns in R - r

I have a table like the one below:
x1 x2 x3 x4
a 3 5 32
b 5 3 10
c 8 22 9
d 12 2 1
e 1 10 13
I want to sort the rows from highest to lowest, taking every column of each row into account, as follows:
x1 x2 x3 x4
a 3 5 32
c 8 22 9
e 1 10 13
d 12 2 1
b 5 3 10
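No answer is recorded for this question. One reading of the expected output is that the rows are ordered by their largest value across x2 to x4 (32, 22, 13, 12, 10). The following is a minimal base R sketch under that assumption only; the names `df`, `row_max`, and `sorted` are illustrative and not from the question.

```r
df <- data.frame(
  x1 = c("a", "b", "c", "d", "e"),
  x2 = c(3, 5, 8, 12, 1),
  x3 = c(5, 3, 22, 2, 10),
  x4 = c(32, 10, 9, 1, 13),
  stringsAsFactors = FALSE
)

# Largest value in each row, ignoring the label column x1
# (assumption: "highest" means the row-wise maximum).
row_max <- apply(df[-1], 1, max)

# Reorder the rows so the largest row maximum comes first.
sorted <- df[order(row_max, decreasing = TRUE), ]
sorted
```

If the intended rule is different (for example, sorting by x2 first and breaking ties with x3), only the `order()` call would need to change.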

Related

derive multiple columns from multiple columns in r

Consider the data below. I would like to derive the variables z1, z2, z3 from x1*y1, x2*y2 and x3*y3.
Could you please help me achieve this in R?
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c('A','B','C','D','E','F')
y1 <- c(1,2,3,4,5,6)
y2 <- c(2,3,4,5,6,7)
y3 <- c(3,4,5,6,7,8)
testa <- data.frame(x1,x2,x3,x4,y1,y2,y3)
Assuming the integrity of your structure and naming conventions, you can select the x and y variables, multiply them together as a group, and then assign the result back to z.
var_i <- 1:3
testa[paste0("z", var_i)] <- testa[paste0("x", var_i)] * testa[paste0("y", var_i)]
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64
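The answer above hard-codes the suffixes 1:3. As a hedged variation, the suffixes could be read from the column names instead, keeping only those that have both an x and a y column. This sketch uses a smaller, made-up data frame, and `x_idx`, `y_idx`, and `var_i` are illustrative names.

```r
testa <- data.frame(
  x1 = 1:3, x2 = 2:4,
  x4 = c("A", "B", "C"),   # character column with no matching y4
  y1 = 1:3, y2 = 2:4,
  stringsAsFactors = FALSE
)

# Numeric suffixes of the x and y columns.
x_idx <- sub("^x", "", grep("^x[0-9]+$", names(testa), value = TRUE))
y_idx <- sub("^y", "", grep("^y[0-9]+$", names(testa), value = TRUE))

# Keep only suffixes present on both sides (drops x4 here).
var_i <- intersect(x_idx, y_idx)

# Multiply the matched columns position by position and store them as z columns.
testa[paste0("z", var_i)] <- testa[paste0("x", var_i)] * testa[paste0("y", var_i)]
testa
```

The multiplication pairs columns by position, so both selections must list the suffixes in the same order, which `paste0()` over one shared `var_i` guarantees.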
If we want to do this automatically, a tidyverse option is
library(dplyr)
library(stringr)
testa <- testa %>%
  mutate(across(x1:x3,
                ~ .x * get(str_replace(cur_column(), "x", "y")),
                .names = "{str_replace(.col, 'x', 'z')}"))
-output
testa
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64

How to add row and column to a dataframe of different length?

I have two dataframes of different length:
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))
Headers
x y
1 x1 1
2 x2 2
3 x3 3
4 x4 4
Dataset
H W
1 20 30
2 10 20
3 11 30
4 8 10
5 10 6
I need to convert column 'x' from 'Headers' to header, and column 'y' to corresponding values, and then bind to 'Dataset':
H W x1 x2 x3 x4
20 30 1 2 3 4
10 20 1 2 3 4
11 30 1 2 3 4
8 10 1 2 3 4
10 6 1 2 3 4
Here is the code which I tried:
H <- t(Headers)
Dataset <- cbind(H, Dataset)
names(H) <- NULL
Dataset <- qpcR:::cbind.na(H, Dataset)
Any help will be appreciated. Thanks.
Transpose 'y' and repeat to the desired number of rows. Set column names to 'x'.
cbind(Dataset, `colnames<-`(t(Headers$y)[rep(1, nrow(Dataset)), ], Headers$x))
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
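For readers who find the one-liner dense, the same three steps (transpose, repeat, name) can be sketched one at a time. The intermediate names `one_row`, `repeated`, and `result` are illustrative.

```r
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4, stringsAsFactors = FALSE)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))

# Step 1: t() turns the vector 'y' into a 1 x 4 matrix.
one_row <- t(Headers$y)

# Step 2: index row 1 once per row of Dataset to get a 5 x 4 matrix.
repeated <- one_row[rep(1, nrow(Dataset)), , drop = FALSE]

# Step 3: label the columns with 'x', then bind to Dataset.
colnames(repeated) <- Headers$x
result <- cbind(Dataset, repeated)
result
```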
A data.table approach:
library(data.table)
cbind(Dataset, dcast(Headers, . ~ x, value.var = "y")[,-1])
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
A tidyverse approach:
library(tidyverse)
Headers %>%
  rownames_to_column() %>%
  spread(x, y) %>%
  summarise_all(funs(first(na.omit(.)))) %>%
  cbind(Dataset, .) %>%
  select(-rowname)
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
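You could also go with base R (setNames() restores the x1 to x4 column names, which data.frame() would otherwise replace with X1 to X4):
cbind(Dataset, setNames(data.frame(matrix(rep(Headers$y, each = nrow(Dataset)), nrow = nrow(Dataset))), Headers$x))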

Replace column with Summation value in R

Given a data frame in R
# ID x1 x2 x3 x4
# 1 1 1 1 1 1
# 2 1 1 2 3 4
# 3 2 1 5 6 7
# 4 3 1 8 9 2
I want to replace the columns with their summation value
# ID x1 x2 x3 x4
# 1 1 4 16 19 14
However, trying to set the sum directly replaces all the values with the sum:
for (nm in names(df)) {
  df[nm] = sum(df[nm])
}
# ID x1 x2 x3 x4
# 1 1 4 16 19 14
# 1 2 4 16 19 14
# 1 3 4 16 19 14
# 1 4 4 16 19 14
I believe the ID column is no longer needed. Then simply
colSums(df[, -1])
# x1 x2 x3 x4
# 4 16 19 14
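The loop in the question fills every row because assigning a single value to a column recycles it to the column's full length. If a one-row data frame is wanted rather than a named vector, a possible sketch follows; the ID value of 1 is copied from the desired output and is otherwise an assumption, and `totals` is an illustrative name.

```r
df <- data.frame(
  ID = c(1, 1, 2, 3),
  x1 = c(1, 1, 1, 1),
  x2 = c(1, 2, 5, 8),
  x3 = c(1, 3, 6, 9),
  x4 = c(1, 4, 7, 2)
)

# colSums() returns a named vector; t() makes it a 1 x 4 matrix,
# which as.data.frame() turns into one row with the same column names.
totals <- as.data.frame(t(colSums(df[, -1])))

# Prepend the ID shown in the desired output (assumed placeholder value).
totals <- cbind(ID = 1, totals)
totals
```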

Compute increase between rows for each same ID

I have a sorted data frame and I would like to compute the increase of x2 between consecutive rows that share the same ID.
The input is already sorted in a certain manner:
ID x2 x3 x4
1 10 11 2
2 100 12 4
1 20 13 10
7 24 3 1
1 30 14 0
3 6 15 1
2 90 15 1
I would like to get:
ID x2 increase x3 x4
1 10 11 2
2 100 12 4
1 20 +100% 13 10
7 24 3 1
1 30 +50% 14 0
3 6 15 1
2 90 -10% 15 1
You could do
df <- read.table(header = TRUE, text = "
ID x2 x3 x4
1 10 11 2
2 100 12 4
1 20 13 10
7 24 3 1
1 30 14 0
3 6 15 1
2 90 15 1")
df$increase <- ave(df$x2, df$ID, FUN = function(x) c(NA, diff(x)/head(x, -1))*100)
df$increase <- ifelse(is.na(df$increase), "", sprintf("%+.0f%%", df$increase))
df
# ID x2 x3 x4 increase
# 1 1 10 11 2
# 2 2 100 12 4
# 3 1 20 13 10 +100%
# 4 7 24 3 1
# 5 1 30 14 0 +50%
# 6 3 6 15 1
# 7 2 90 15 1 -10%
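To see what the function passed to ave() does, it may help to run it on the x2 values of a single ID. This sketch uses the three values for ID 1 from the data above; `x`, `pct`, and `labels` are illustrative names.

```r
x <- c(10, 20, 30)            # x2 values for ID 1, in row order

step <- diff(x)               # change from each row to the next
base <- head(x, -1)           # the earlier value in each pair

# Percentage change; the first row has no predecessor, hence NA.
pct <- c(NA, step / base) * 100

# Same formatting as the answer: signed, no decimals, blank for NA.
labels <- ifelse(is.na(pct), "", sprintf("%+.0f%%", pct))
labels
```

An ID that appears only once yields a single NA, which is why those rows end up with an empty string.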

Change variable value-- repeated IDs

I have this data set:
id <- c(0,0,1,1,2,2,3,3,4,4)
gender <- c("m","m","f","f","f","f","m","m","m","m")
x1 <-c(1,1,1,1,2,2,3,3,10,10)
x2 <- c(3,7,5,6,9,15,10,15,12,20)
alldata <- data.frame(id,gender,x1,x2)
which looks like:
id gender x1 x2
0 m 1 3
0 m 1 7
1 f 1 5
1 f 1 6
2 f 2 9
2 f 2 15
3 m 3 10
3 m 3 15
4 m 10 12
4 m 10 20
Notice that within each unique id the x1 values are identical, but the x2 values differ. I need to sort the data by id and x2 (from smallest to largest),
and then, for each unique id, set x1 of the second record to x2 of the first record.
The data would look like:
id gender x1 x2
0 m 1 3
0 m 3 7
1 f 1 5
1 f 5 6
2 f 2 9
2 f 9 15
3 m 3 10
3 m 10 15
4 m 10 12
4 m 12 20
I found this easier using data.table:
library(data.table)
dt <- data.table(alldata)
setkey(dt, id, x2)  # sort the data by id, then x2
The next line says: within each id, build x1 from its own first value followed by the values of x2, truncated to the group size (.N).
dt[, x1 := c(x1[1], x2)[1:.N], keyby = id]
dt
id gender x1 x2
1: 0 m 1 3
2: 0 m 3 7
3: 1 f 1 5
4: 1 f 5 6
5: 2 f 2 9
6: 2 f 9 15
7: 3 m 3 10
8: 3 m 10 15
9: 4 m 10 12
10: 4 m 12 20
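The indexing expression c(x1[1], x2)[1:.N] can be tried outside data.table on one group at a time. In this base R sketch, `shift_in` is a hypothetical helper written only for illustration, with the group size taken from the length of x1 in place of .N.

```r
# Hypothetical helper: keep the first x1, then fill the rest from x2,
# cutting the combined vector back to the group's length.
shift_in <- function(x1, x2) {
  n <- length(x1)
  c(x1[1], x2)[seq_len(n)]
}

# A two-row group, as for id 0 in the data above.
two_rows <- shift_in(c(1, 1), c(3, 7))

# A made-up three-row group, to show the pattern extends past two rows.
three_rows <- shift_in(c(2, 2, 2), c(9, 15, 21))
```

Each row after the first receives the x2 value of the row before it, which is why the truncation drops the last x2 value.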
Here's another possible solution, using seq() to select every other record (this assumes exactly two records per id):
alldata <- alldata[order(alldata$id, alldata$x2), ]
alldata$x1[seq(2, length(alldata$x1), 2)] <- alldata$x2[seq(1, length(alldata$x2) - 1, 2)]
Here is a dplyr solution.
library(dplyr)
arrange(alldata,id,x2) %>%
group_by(id) %>%
mutate(x1= c(first(x1), first(x2)))
Source: local data frame [10 x 4]
Groups: id
id gender x1 x2
1 0 m 1 3
2 0 m 3 7
3 1 f 1 5
4 1 f 5 6
5 2 f 2 9
6 2 f 9 15
7 3 m 3 10
8 3 m 10 15
9 4 m 10 12
10 4 m 12 20
A base R option using by():
`rownames<-`(do.call(rbind, by(alldata, alldata$id, function(g) {
  o <- order(g$x2); g$x1[o[2]] <- g$x2[o[1]]; g
})), NULL)
## id gender x1 x2
## 1 0 m 1 3
## 2 0 m 3 7
## 3 1 f 1 5
## 4 1 f 5 6
## 5 2 f 2 9
## 6 2 f 9 15
## 7 3 m 3 10
## 8 3 m 10 15
## 9 4 m 10 12
## 10 4 m 12 20
