Convert data frame common rows to columns - r

Say I have a data frame where one column holds a repeating value (dates, IDs, etc.). Is there a way to convert it into a new data frame with columns instead of repeated rows? Basically, I want to transpose something like this:
col1 col2 col3
1 aa 30
2 aa 40
3 aa 10
1 bb 20
2 bb 12
3 bb 15
1 cc 40
2 cc 31
3 cc 12
Into this:
aa bb cc
1 30 20 40
2 40 12 31
3 10 15 12
Here is some code that makes a sample of the first data frame:
a <- c(rep(1:10, 3))
b <- c(rep("aa", 10), rep("bb", 10), rep("cc", 10))
set.seed(123)
c <- sample(seq(from = 20, to = 50, by = 5), size = 30, replace = TRUE)
d <- data.frame(a,b, c)
I am unsure how to transpose it.

a <- c(rep(1:10, 3))
b <- c(rep("aa", 10), rep("bb", 10), rep("cc", 10))
set.seed(123)
c <- sample(seq(from = 20, to = 50, by = 5), size = 30, replace = TRUE)
d <- data.frame(a,b, c)
# reshape from long to wide: one row per value of a, one column per value of b
e<-reshape(d,idvar='a',timevar='b',direction='wide')
e
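One detail worth knowing about the reshape result: the new columns are named after the value column plus the group, i.e. c.aa, c.bb and c.cc. A minimal sketch, reusing the sample data from the question, that strips this prefix so the names match the desired output:

```r
# Rebuild the sample data from the question
a <- rep(1:10, 3)
b <- rep(c("aa", "bb", "cc"), each = 10)
set.seed(123)
c <- sample(seq(from = 20, to = 50, by = 5), size = 30, replace = TRUE)
d <- data.frame(a, b, c)

# Long to wide: columns come out as a, c.aa, c.bb, c.cc
e <- reshape(d, idvar = "a", timevar = "b", direction = "wide")

# Drop the "c." prefix that reshape() adds to the value columns
names(e) <- sub("^c\\.", "", names(e))
e
```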

This is also a case in which you can use unstack:
unstack(d, c ~ b)
# aa bb cc
# 1 30 50 50
# 2 45 35 40
# 3 30 40 40
# 4 50 40 50
# 5 50 20 40
# 6 20 50 40
# 7 35 25 35
# 8 50 20 40
# 9 35 30 30
# 10 35 50 25

Using your data frame d,
library(tidyr)
> spread(d, key = b, value = c)
a aa bb cc
1 1 30 50 50
2 2 45 35 40
3 3 30 40 40
4 4 50 40 50
5 5 50 20 40
6 6 20 50 40
7 7 35 25 35
8 8 50 20 40
9 9 35 30 30
10 10 35 50 25
Explanation: the argument key = b names the column whose unique entries become the new column names, so spread creates one new column per unique value of b. The argument value = c tells spread to take the values from column c and write them into the corresponding new column.
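In current versions of tidyr, spread is superseded by pivot_wider, which takes the same two pieces of information under different argument names. A minimal sketch, using the small example table from the question rather than the random sample:

```r
library(tidyr)

# The small example table from the question, in long form
d <- data.frame(
  a = rep(1:3, 3),
  b = rep(c("aa", "bb", "cc"), each = 3),
  c = c(30, 40, 10, 20, 12, 15, 40, 31, 12)
)

# names_from plays the role of key, values_from the role of value
wide <- pivot_wider(d, names_from = b, values_from = c)
wide
```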

If every group always has the same number of observations, this is very easy with split followed by as.data.frame:
as.data.frame(split(d$c, d$b))
# aa bb cc
# 1 30 50 50
# 2 45 35 40
# 3 30 40 40
# 4 50 40 50
# 5 50 20 40
# 6 20 50 40
# 7 35 25 35
# 8 50 20 40
# 9 35 30 30
# 10 35 50 25
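If the groups do not all have the same size, as.data.frame will fail on the split result. One possible workaround, sketched here on a small made-up data frame (d_uneven is purely illustrative), is to pad the shorter groups with NA first:

```r
# Illustrative data with unequal group sizes (not from the question)
d_uneven <- data.frame(
  b = c("aa", "aa", "aa", "bb", "bb", "cc"),
  c = c(30, 40, 10, 20, 12, 40)
)

groups <- split(d_uneven$c, d_uneven$b)
max_len <- max(lengths(groups))

# Extending a vector's length pads it with NA
padded <- lapply(groups, function(v) {
  length(v) <- max_len
  v
})

wide <- as.data.frame(padded)
wide
```

Note that rows in this result only line up by position within each group, not by any ID column.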

With split and cbind:
> ll = lapply(split(d, d$b), function(x) x[3])
> dd = do.call(cbind, ll)
> names(dd) = names(ll)
> dd
aa bb cc
1 30 50 50
2 45 35 40
3 30 40 40
4 50 40 50
5 50 20 40
6 20 50 40
7 35 25 35
8 50 20 40
9 35 30 30
10 35 50 25

Related

Operations on multiple columns across many tables

I have two tables (dt1, dt2). dt2 contains the same variables names as dt1.
For each variable in dt1, I would like to multiply it by its corresponding value from dt2.
In the example below, x from dt1 gets multiplied by 4 and y by 7.
What would be a fast way to do this?
Thank you
set.seed(123)
dt1 <- data.frame(x = sample(1:10, 10, TRUE), y = sample(1:10, 10, TRUE) )
dt1
dt2 = data.frame (names = c("x", "y"), values = c(4, 7))
dt2
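A purrr-style approach (this pairs columns by position, so it assumes dt2 lists the names in the same order as the columns of dt1):
library(purrr)  # map2_df and %>%
library(tidyr)  # pivot_wider
map2_df(dt1, dt2 %>% pivot_wider(names_from = names, values_from = values), ~.y * .x)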
# A tibble: 10 x 2
x y
<dbl> <dbl>
1 12 35
2 12 21
3 40 63
4 8 63
5 24 63
6 20 21
7 16 56
8 24 70
9 36 49
10 40 70
You can try sweep, looking up each column of dt1 in dt2 by name:
> sweep(dt1, 2, dt2$values[match(names(dt1), dt2$names)], "*")
x y
1 12 35
2 12 21
3 40 63
4 8 63
5 24 63
6 20 21
7 16 56
8 24 70
9 36 49
10 40 70
or
> dt1[] <- t(t(dt1) * dt2$values[match(names(dt1), dt2$names)])
> dt1
x y
1 12 35
2 12 21
3 40 63
4 8 63
5 24 63
6 20 21
7 16 56
8 24 70
9 36 49
10 40 70
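The name lookup matters when dt2 lists the variables in a different order from the columns of dt1. A minimal base R sketch, using a small made-up dt1 and a deliberately reordered dt2 (the names multipliers and result are illustrative):

```r
dt1 <- data.frame(x = c(3, 3, 10), y = c(5, 3, 9))
# Rows deliberately in a different order than the columns of dt1
dt2 <- data.frame(names = c("y", "x"), values = c(7, 4))

# Named lookup vector: variable name -> multiplier
multipliers <- setNames(dt2$values, dt2$names)

# Multiply each column by the multiplier that carries its name
result <- dt1
result[] <- Map(`*`, dt1, multipliers[names(dt1)])
result
```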

how to subtract the next column by the previous column and create a new column after?

There are questions here on Stack Overflow about how to diff a column by the previous column, but my question is a little different: I want to create a new column from that diff without modifying the existing columns.
Sample data:
dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
DistA = c(10, 8, 15, 22, 15),
DistB = c(15, 35, 40, 33, 20),
DistC = c(20,40,50,45,30),
DistD = c(60,55,55,48,50))
ID DistA DistB DistC DistD
1 1 10 15 20 60
2 2 8 35 40 55
3 3 15 40 50 55
4 4 22 33 45 48
5 5 15 20 30 50
Expected output:
ID DistA DistB DiffB-A DistC DistD Diff D-C
1 1 10 15 05 20 60 40
2 2 8 35 27 40 55 15
3 3 15 40 25 50 55 05
4 4 22 33 11 45 48 03
5 5 15 20 5 30 50 20
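If you want the difference for every pair of columns, you can use split.default to split the data into groups of two columns each and subtract the first column from the second in every group. Note that the new columns are appended at the end of the data frame.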
cols <- ceiling(seq_along(dfData[-1])/2)
new_cols <- tapply(names(dfData[-1]), cols, function(x)
sprintf('diff_%s', paste0(x, collapse = '')))
dfData[new_cols] <- sapply(split.default(dfData[-1], cols), function(x)
x[[2]] - x[[1]])
dfData
# ID DistA DistB DistC DistD diff_DistADistB diff_DistCDistD
#1 1 10 15 20 60 5 40
#2 2 8 35 40 55 27 15
#3 3 15 40 50 55 25 5
#4 4 22 33 45 48 11 3
#5 5 15 20 30 50 5 20
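The expected output in the question places each difference right after its pair of columns. A possible base R sketch that builds the result in that order, assuming the value columns come in consecutive pairs after ID (the diff_ column names and the out variable are illustrative):

```r
dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
                     DistA = c(10, 8, 15, 22, 15),
                     DistB = c(15, 35, 40, 33, 20),
                     DistC = c(20, 40, 50, 45, 30),
                     DistD = c(60, 55, 55, 48, 50))

# Group the non-ID column names into consecutive pairs
value_cols <- names(dfData)[-1]
pairs <- split(value_cols, ceiling(seq_along(value_cols) / 2))

# Start from ID, then add each pair followed by its difference
out <- dfData["ID"]
for (p in pairs) {
  out[p] <- dfData[p]
  out[[paste0("diff_", p[2], "_", p[1])]] <- dfData[[p[2]]] - dfData[[p[1]]]
}
out
```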

Rank system in R, recursive function

I really have no idea what I'm looking for: a loop, a recursive function, or maybe something different.
This is my toy dataset:
ID1 S1 S2 S3
1 10 20 30
2 20 30 40
1 50 60 70
3 20 40 50
1 10 30 10
2 40 20 20
toy$OLD_RANK = find the previous row with the same ID1 and copy the NEW_RANK of that row. If there is no earlier row with the same ID1, use an assigned starting value (10 in this example)
toy$NEW_RANK = OLD_RANK + S1+S2+S3
expected result:
ID1 S1 S2 S3 OLD_RANK NEW_RANK
1 10 20 30 10 70
2 20 30 40 10 100
1 50 60 70 70 250
3 20 40 50 10 120
1 10 30 10 250 300
2 40 20 20 100 180
dataframe for R as requested:
toy <- matrix(c(1,10,20,30,2,20,30,40,1,50,60,70,3,20,40,50,1,10,30,10,2,40,20,20),ncol=4,byrow=TRUE)
colnames(toy) <- c("ID1","S1","S2","S3")
toy <- as.data.frame(toy)
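Read literally, the rule makes NEW_RANK a running total per ID1: the starting value plus the cumulative sum of S1 + S2 + S3 over all rows seen so far for that ID. Under that reading, neither a loop nor recursion is needed. A minimal sketch (start_rank and score are illustrative names):

```r
toy <- data.frame(ID1 = c(1, 2, 1, 3, 1, 2),
                  S1  = c(10, 20, 50, 20, 10, 40),
                  S2  = c(20, 30, 60, 40, 30, 20),
                  S3  = c(30, 40, 70, 50, 10, 20))

start_rank <- 10                      # value used when an ID has no earlier row
score <- toy$S1 + toy$S2 + toy$S3     # per-row increment

# Running total of the score within each ID1, in row order
toy$NEW_RANK <- start_rank + ave(score, toy$ID1, FUN = cumsum)
# The previous rank is the new rank minus this row's own increment
toy$OLD_RANK <- toy$NEW_RANK - score

toy[, c("ID1", "S1", "S2", "S3", "OLD_RANK", "NEW_RANK")]
```

Applying the rule this way, the fifth row gets OLD_RANK 250 (the NEW_RANK of the third row) and NEW_RANK 300.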

selecting middle n rows in R

I have a data.table in R say df.
row.number <- c(1:20)
a <- c(rep("A", 10), rep("B", 10))
b <- c(sample(c(0:100), 20, replace = TRUE))
df <-data.table(row.number,a,b)
df
row.number a b
1 1 A 14
2 2 A 59
3 3 A 39
4 4 A 22
5 5 A 75
6 6 A 89
7 7 A 11
8 8 A 88
9 9 A 22
10 10 A 6
11 11 B 37
12 12 B 42
13 13 B 39
14 14 B 8
15 15 B 74
16 16 B 67
17 17 B 18
18 18 B 12
19 19 B 56
20 20 B 21
I want to take n rows (say 10) from the middle after sorting the records in increasing order of column b.
Use setorder to sort and .N to pick the middle rows (for 20 rows and n = 10 this selects rows 6 to 15):
setorder(df, b)[(.N/2 - 10/2 + 1):(.N/2 + 10/2), ]
row.number a b
1: 11 B 36
2: 5 A 38
3: 8 A 41
4: 18 B 43
5: 1 A 50
6: 12 B 51
7: 15 B 54
8: 3 A 55
9: 20 B 59
10: 4 A 60
You could use the following code
library(data.table)
set.seed(9876) # for reproducibility
# your data
row.number <- c(1:20)
a <- c(rep("A", 10), rep("B", 10))
b <- c(sample(c(0:100), 20, replace = TRUE))
df <- data.table(row.number,a,b)
df
# define how many to select and store in n
n <- 10
# calculate how many to cut off at start and end
n_not <- (nrow(df) - n )/2
# use data.tables setorder to arrange based on column b
setorder(df, b)
# select the rows wanted based on n
df[ (n_not+1):(nrow(df)-n_not), ]
Please let me know whether this is what you want.
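The same idea can be wrapped in a small helper that also works on a plain data frame. This is only a sketch: middle_rows is a hypothetical name, and when the number of rows to drop is odd it arbitrarily drops the extra row from the end.

```r
# Hypothetical helper: return the middle n rows of df after sorting by one column
middle_rows <- function(df, n, by) {
  stopifnot(n >= 1, n <= nrow(df))
  ordered <- df[order(df[[by]]), , drop = FALSE]
  # Rows to skip at the start; an odd leftover row is dropped from the end
  start <- floor((nrow(ordered) - n) / 2) + 1
  ordered[start:(start + n - 1), , drop = FALSE]
}

set.seed(9876)  # for reproducibility
df <- data.frame(row.number = 1:20,
                 a = rep(c("A", "B"), each = 10),
                 b = sample(0:100, 20, replace = TRUE))
middle_rows(df, n = 10, by = "b")
```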

R iterate over a data frame to add a new column with sequential values

Here is my data frame "data.frame"
X Y
1 10 12
2 20 22
3 30 32
Below what I want.
1) add a new column named "New_col"
2) each cell of a given id is a sequence from X-value to Y-value (step of 1).
X Y New_col
1 10 12 10
11
12
2 20 22 20
21
22
3 30 32 30
31
32
Then fill the empty cells
X Y New_col
1 10 12 10
1 10 12 11
1 10 12 12
2 20 22 20
2 20 22 21
2 20 22 22
3 30 32 30
3 30 32 31
3 30 32 32
I tried the following:
New_col<-seq(from = data.frame$X, to = data.frame$Y, by = 1)
The problem is that this code computes the sequence only for the first row.
Then I tried a loop:
for (i in 1: length(data.frame$X))
{
New_col <-seq(from = data.frame$X, to = data.frame$Y, by = 1)
}
This is the error I got:
Error in seq.default(from = data.frame$X, to = data.frame$Y, by = 1) :
'from' must be of length 1
Thank you for your help.
You can use apply:
do.call(rbind, apply(dat, 1, function(x)
data.frame(X = x[1], Y = x[2], New_col = seq(x[1], x[2]))))
where dat is the name of your data frame. You can ignore the warnings.
X Y New_col
1.1 10 12 10
1.2 10 12 11
1.3 10 12 12
2.1 20 22 20
2.2 20 22 21
2.3 20 22 22
3.1 30 32 30
3.2 30 32 31
3.3 30 32 32
This is a good use case for the data.table package (which you would have to install first):
dat = read.table(text=" X Y
1 10 12
2 20 22
3 30 32")
library(data.table)
dt = as.data.table(dat)
Once you've got your data table set up, by makes this operation easy:
dt2 = dt[, list(New_col=seq(X, Y)), by=c("X", "Y")]
# X Y New_col
# 1: 10 12 10
# 2: 10 12 11
# 3: 10 12 12
# 4: 20 22 20
# 5: 20 22 21
# 6: 20 22 22
# 7: 30 32 30
# 8: 30 32 31
# 9: 30 32 32
(The only caveat is that duplicate (X, Y) pairs in your original data frame end up in the same group, so they produce a single sequence rather than one per row.)
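For comparison, here is a base R sketch that expands by row position instead of by (X, Y) value, so duplicate pairs are kept as separate rows. It assumes X <= Y in every row; lens and out are illustrative names.

```r
dat <- data.frame(X = c(10, 20, 30), Y = c(12, 22, 32))

# Number of sequence values each row expands into
lens <- dat$Y - dat$X + 1

# Repeat each row once per sequence value, then attach the sequences
out <- dat[rep(seq_len(nrow(dat)), times = lens), ]
out$New_col <- unlist(Map(seq, dat$X, dat$Y))
rownames(out) <- NULL
out
```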
