Transpose data in R by ID number

I have a question about transposing data in R. Basically, I am looking for an R alternative to SAS's proc transpose by id with prefix = test and proc transpose by id with prefix = score.
My data set looks like the following:
ID test date score
1 4/1/2001 98
1 5/9/2001 65
1 5/23/2001 85
2 3/21/2001 76
2 4/8/2001 58
2 5/22/2001 67
2 6/15/2001 53
3 1/15/2001 46
3 5/30/2001 55
4 1/8/2001 71
4 2/14/2001 95
4 7/15/2001 93
and I would like to transpose it into:
id testdate1 score1 testdate2 score2 testdate3 score3 testdate4 score4
1 4/1/2001 98 5/9/2001 65 5/23/2001 85 . .
2 3/21/2001 76 4/8/2001 58 5/22/2001 67 6/15/2001 53
3 1/15/2001 46 5/30/2001 55 . . . .
4 1/8/2001 71 2/14/2001 95 7/15/2001 93 . .

This is a basic "long" to "wide" reshaping task. In base R, you can use reshape, but only after adding a "time" variable, like this:
mydf$time <- with(mydf, ave(ID, ID, FUN = seq_along))
reshape(mydf, direction = "wide", idvar = "ID", timevar = "time")
# ID test.date.1 score.1 test.date.2 score.2 test.date.3 score.3
# 1 1 4/1/2001 98 5/9/2001 65 5/23/2001 85
# 4 2 3/21/2001 76 4/8/2001 58 5/22/2001 67
# 8 3 1/15/2001 46 5/30/2001 55 <NA> NA
# 10 4 1/8/2001 71 2/14/2001 95 7/15/2001 93
# test.date.4 score.4
# 1 <NA> NA
# 4 6/15/2001 53
# 8 <NA> NA
# 10 <NA> NA
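For a version that can be run from scratch, here is a minimal sketch that types in the data from the question and applies the same two steps. The column name test.date is an assumption (the question shows the header as "test date"), and the object name wide is only for illustration.

```r
# Data from the question, entered by hand. The name test.date is assumed.
mydf <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
  test.date = c("4/1/2001", "5/9/2001", "5/23/2001",
                "3/21/2001", "4/8/2001", "5/22/2001", "6/15/2001",
                "1/15/2001", "5/30/2001",
                "1/8/2001", "2/14/2001", "7/15/2001"),
  score = c(98, 65, 85, 76, 58, 67, 53, 46, 55, 71, 95, 93),
  stringsAsFactors = FALSE
)

# Number the rows within each ID (1, 2, 3, ...) in order of appearance.
mydf$time <- with(mydf, ave(ID, ID, FUN = seq_along))

# One row per ID, one test.date/score pair of columns per value of time.
wide <- reshape(mydf, direction = "wide", idvar = "ID", timevar = "time")
wide
```

IDs with fewer than four tests get NA in the unused columns, which plays the role of the "." in the SAS-style output.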

Related

How do I regroup data?

I am looking to change the structure of my dataframe, but I am not really sure how to do it. I am not even sure how to word the question either.
ID <- c(1,8,6,2,4)
a <- c(111,94,85,76,72)
b <- c(75,37,86,55,62)
dataframe <- data.frame(ID,a,b)
ID a b
1 1 111 75
2 8 94 37
3 6 85 86
4 2 76 55
5 4 72 62
Above is the code with its output. However, I want the output to look like the following. The only way I know to do this is to type it in manually. Is there another way? I have quite a large data set that I would like to change, and doing it manually would take forever.
ID letter value
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62
We may use pivot_longer from tidyr:
library(dplyr)
library(tidyr)
dataframe %>%
  pivot_longer(cols = a:b, names_to = 'letter')
Output:
# A tibble: 10 × 3
ID letter value
<dbl> <chr> <dbl>
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62
A base R option using reshape:
df <- reshape(dataframe, direction = "long",
              v.names = "value",
              varying = 2:3,
              times = names(dataframe)[2:3],
              timevar = "letter",
              idvar = "ID")
df <- df[ order(match(df$ID, dataframe$ID)), ]
row.names(df) <- NULL
Output
ID letter value
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62

Label columns with an ascending number [duplicate]

This question already has answers here:
Make sequential numeric column names prefixed with a letter
(3 answers)
Closed 2 years ago.
I want to label columns with an ascending number. The reason is that in a bigger dataset I want to be able to sort the columns so they end up in the right order.
How do I code this? Thanks!
set.seed(8)
id <- 1:6
diet <- rep(c("A","B"),3)
period <- rep(c(1,2),3)
score1 <- sample(1:100,6)
score2 <- sample(1:100,6)
score3 <- sample(1:100,6)
df <- data.frame(id, diet, period, score1, score2,score3)
df
id diet period score1 score2 score3
1 1 A 1 47 30 44
2 2 B 2 21 93 54
3 3 A 1 79 76 14
4 4 B 2 64 63 90
5 5 A 1 31 44 1
6 6 B 2 69 9 26
It should look like:
x1id x2diet x3period x4score1 x5score2 x6score3
1 1 A 1 47 30 44
2 2 B 2 21 93 54
3 3 A 1 79 76 14
4 4 B 2 64 63 90
5 5 A 1 31 44 1
6 6 B 2 69 9 26
I was thinking something like this, but something is missing....
colnames(wellbeing) <- paste(1:ncol, colnames(wellbeing))
Another option:
colnames(df) <- paste0('x', 1:dim(df)[2], colnames(df))
or, with dplyr:
library(dplyr)
df %>%
  dplyr::rename_all(~ paste0('x', 1:ncol(df), .))
Both methods would yield the same output:
# x1id x2diet x3period x4score1 x5score2 x6score3
#1 1 A 1 96 1 52
#2 2 B 2 52 93 75
#3 3 A 1 55 50 68
#4 4 B 2 79 3 9
#5 5 A 1 12 6 76
#6 6 B 2 42 86 62
You can use:
names(df) <- paste0('x', seq_along(df), names(df))
df
# x1id x2diet x3period x4score1 x5score2 x6score3
#1 1 A 1 96 1 52
#2 2 B 2 52 93 75
#3 3 A 1 55 50 68
#4 4 B 2 79 3 9
#5 5 A 1 12 6 76
#6 6 B 2 42 86 62
Maybe add an underscore?
names(df) <- paste0('x', seq_along(df), "_", names(df))
names(df)
#[1] "x1_id" "x2_diet" "x3_period" "x4_score1" "x5_score2" "x6_score3"
Here is a mapply approach. It returns the new names as a character vector, which still has to be assigned to names(df):
mapply(paste0, paste0("x", 1:ncol(df)), names(df))
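If the goal is to sort columns by name later, note that with ten or more columns a plain number prefix sorts "x10" before "x2". A minimal sketch, assuming zero-padded numbers are acceptable; the sprintf format and the two-digit width are assumptions of this example and not part of the answers above, and the small data frame is made up for illustration.

```r
# Small illustrative data frame (not the one from the question).
df <- data.frame(id = 1:3, diet = c("A", "B", "A"), score1 = c(47, 21, 79))

# Pad the column position to two digits so that alphabetical order matches
# the original column order even beyond nine columns (x01, x02, ..., x10).
names(df) <- sprintf("x%02d_%s", seq_along(df), names(df))
names(df)
#[1] "x01_id"     "x02_diet"   "x03_score1"
```

For data with 100 or more columns, the width in the format string would need to grow accordingly (for example "%03d").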

Replace column values based on column in another dataframe

I would like to replace some column values in a data frame based on a column in another data frame.
This is the head of the first df:
df1
A tibble: 253 x 2
id sum_correct
<int> <dbl>
1 866093 77
2 866097 95
3 866101 37
4 866102 65
5 866103 16
6 866104 72
7 866105 99
8 866106 90
9 866108 74
10 866109 92
Some sum_correct values need to be replaced by the correct values from another data frame, using the id to trigger the replacement:
df2
A tibble: 14 x 2
id sum_correct
<int> <dbl>
1 866103 61
2 866124 79
3 866152 85
4 867101 24
5 867140 76
6 867146 51
7 867152 56
8 867200 50
9 867209 97
10 879657 56
11 879680 61
12 879683 58
13 879693 77
14 881451 57
How can I achieve this in RStudio? Thanks in advance for the help.
You can make an update join using match to find where id matches and remove non matches (NA) with which:
idx <- match(df1$id, df2$id)
idxn <- which(!is.na(idx))
df1$sum_correct[idxn] <- df2$sum_correct[idx[idxn]]
df1
id sum_correct
1 866093 77
2 866097 95
3 866101 37
4 866102 65
5 866103 61
6 866104 72
7 866105 99
8 866106 90
9 866108 74
10 866109 92
You can also do a left_join and then use coalesce:
library(dplyr)
left_join(df1, df2, by = "id", suffix = c("_1", "_2")) %>%
  mutate(sum_correct_final = coalesce(sum_correct_2, sum_correct_1))
The new column sum_correct_final contains the value from df2 if it exists and from df1 if a corresponding entry from df2 does not exist.
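To see the match-based update join on a small scale, here is a self-contained sketch. The two data frames are tiny stand-ins that reuse a few values from the question, and the name hit is only for illustration.

```r
# Small stand-ins for df1 and df2, reusing a few values from the question.
df1 <- data.frame(id = c(866102L, 866103L, 866104L),
                  sum_correct = c(65, 16, 72))
df2 <- data.frame(id = c(866103L, 866124L),
                  sum_correct = c(61, 79))

idx <- match(df1$id, df2$id)   # position of each df1 id in df2, NA if absent
hit <- !is.na(idx)             # rows of df1 that have a correction in df2

# Overwrite only the matched rows; ids found only in df2 are ignored.
df1$sum_correct[hit] <- df2$sum_correct[idx[hit]]
df1
```

Only id 866103 is updated (16 becomes 61). The id 866124 exists only in df2, so it does not add a row to df1.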

Splitting columns of a dataframe to merge a repetitive variable

I normally find an answer in previous questions posted here, but I can't seem to find this one, so here is my maiden question:
I have a dataframe in which one column contains repeated values. I would like to split the other columns so that each value appears only once in the first column, which gives more columns than in the original dataframe.
Example:
df <- data.frame(test = c(rep(1:5,3)), time = sample(1:100,15), score = sample(1:500,15))
The original dataframe has 3 columns and 15 rows.
It would turn into a dataframe with 5 rows and 7 columns: 'test', 'time1', 'time2', 'time3', 'score1', 'score2', 'score3'.
Does anyone have an idea how this could be done?
I think using dcast with rowid from the data.table-package is well suited for this task:
library(data.table)
dcast(setDT(df), test ~ rowid(test), value.var = c('time','score'), sep = '')
The result:
test time1 time2 time3 score1 score2 score3
1: 1 52 3 29 21 131 45
2: 2 79 44 6 119 1 186
3: 3 67 95 39 18 459 121
4: 4 83 50 40 493 466 497
5: 5 46 14 4 465 9 24
Please try this:
df <- data.frame(test = c(rep(1:5,3)), time = sample(1:100,15), score = sample(1:500,15))
df$class <- c(rep('a', 5), rep('b', 5), rep('c', 5))
df <- split(x = df, f = df$class)
binded <- cbind(df[[1]], df[[2]], df[[3]])
binded <- binded[,-c(5,9)]
> binded
test time score class time.1 score.1 class.1 time.2 score.2 class.2
1 1 40 404 a 57 409 b 70 32 c
2 2 5 119 a 32 336 b 93 177 c
3 3 20 345 a 44 91 b 100 42 c
4 4 47 468 a 60 265 b 24 478 c
5 5 16 52 a 38 219 b 3 92 c
Let me know if it works for you!

Mutate dataframes by matching ids in r [duplicate]

This question already has answers here:
Simultaneously merge multiple data.frames in a list
(9 answers)
Closed 7 years ago.
I have three data frames:
df1:
id score1
1 50
2 23
3 40
4 68
5 82
6 38
df2:
id score2
1 33
2 23
4 64
5 12
6 32
df3:
id score3
1 50
2 23
3 40
4 68
5 82
I want to mutate the three scores to a dataframe like this, using NA to denote the missing value
id score1 score2 score3
1 50 33 50
2 23 23 23
3 40 NA 40
4 68 64 68
5 82 12 82
6 38 32 NA
Or like this, deleting the NA values:
id score1 score2 score3
1 50 33 50
2 23 23 23
4 68 64 68
5 82 12 82
However, mutate (in dplyr) does not accept columns of different lengths, so I cannot use it here. How can I do that?
You can try
Reduce(function(...) merge(..., by='id'), list(df1, df2, df3))
# id score1 score2 score3
#1 1 50 33 50
#2 2 23 23 23
#3 4 68 64 68
#4 5 82 12 82
If you have many dataset objects whose names follow the pattern 'df' followed by a number:
Reduce(function(...) merge(..., by='id'), mget(paste0('df',1:3)))
Or instead of paste0('df', 1:3), you can use ls(pattern='df\\d+') as commented by @DavidArenburg
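The merge calls above keep only ids present in all three data frames, which matches the second layout in the question. For the first layout, with NA for missing scores, a minimal sketch is to pass all = TRUE to merge. The data frames are typed in from the question, and the name full is only for illustration.

```r
# The three data frames from the question.
df1 <- data.frame(id = 1:6, score1 = c(50, 23, 40, 68, 82, 38))
df2 <- data.frame(id = c(1L, 2L, 4L, 5L, 6L), score2 = c(33, 23, 64, 12, 32))
df3 <- data.frame(id = 1:5, score3 = c(50, 23, 40, 68, 82))

# all = TRUE turns each pairwise merge into a full outer join, so ids
# missing from one data frame are kept and their scores are filled with NA.
full <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE),
               list(df1, df2, df3))
full

# Dropping incomplete rows afterwards gives the same ids as the inner join.
na.omit(full)
```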
