R - Problems when using pivot_longer function from wide to long dataframes

I have some data with the following features: id, group, sex, datebirth, date1, date2, date3, ctrl1, ctrl2, ctrl3, ab4v1, ab4v2, ab4v3.
What I want is to transform this dataframe into another one with the following columns in long format: id, group, sex, datebirth, version, date, ctrl, ab4.
(NOTE: version will get values 1, 2 or 3).
Usually, I would use the reshape function in R, but I have to use pivot_longer. How could I do this transformation?
I tried things like:
df %>% pivot_longer(cols = -c("id","group","sex","datebirth"),
names_to = c("version",".value"),
names_pattern = "([A-Za-z]+)(\\d+)")
But I get nothing... Any ideas?
Thank you in advance.
This is what I have:
id group sex datebirth date1 date2 date3 ctrl1 ctrl2 ctrl3 ab4v1 ab4v2 ab4v3
1 1 A Male 1975-01-08 2010-10-10 2011-11-12 2011-12-12 183 835 139 745 584 817
2 2 B Male 1998-05-12 2010-10-10 2011-11-12 2011-12-12 172 727 214 793 653 499
3 3 A Male 2005-12-28 2010-10-10 2011-11-23 2011-12-23 157 667 222 664 505 924
4 4 C Female 1957-07-01 2010-10-10 2011-11-25 2011-12-25 186 123 344 584 582 653
This is what I want:
id group sex datebirth version date ctrl ab4
1 1 A Male 1975-01-08 1 2010-10-10 183 745
2 2 B Male 1998-05-12 1 2010-10-10 172 793
3 3 A Male 2005-12-28 1 2010-10-10 157 664
4 4 C Female 1957-07-01 1 2010-10-10 186 584
.........

We need to change the order of names_to: ".value" must come first, because the first part of each column name (date, ctrl, ab4v) is the variable name and the trailing digit is the version. We could use either names_sep or names_pattern. The difference is that names_sep specifies a delimiter. Here the delimiter is the zero-width boundary between a letter ((?<=[A-Za-z])) and a final digit ((?=[0-9]$)), i.e. the position that follows a letter and precedes the last digit. With names_pattern, we instead capture specific sets of characters in groups ((...)). The OP's post used "([A-Za-z]+)(\\d+)", i.e. one or more letters as the first group and digits as the second group.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = date1:ab4v3, names_to = c(".value", "version"),
names_sep = "(?<=[A-Za-z])(?=[0-9]$)")
# A tibble: 12 x 8
# id group sex datebirth version date ctrl ab4v
# <int> <chr> <chr> <chr> <chr> <chr> <int> <int>
# 1 1 A Male 1975-01-08 1 2010-10-10 183 745
# 2 1 A Male 1975-01-08 2 2011-11-12 835 584
# 3 1 A Male 1975-01-08 3 2011-12-12 139 817
# 4 2 B Male 1998-05-12 1 2010-10-10 172 793
# 5 2 B Male 1998-05-12 2 2011-11-12 727 653
# 6 2 B Male 1998-05-12 3 2011-12-12 214 499
# 7 3 A Male 2005-12-28 1 2010-10-10 157 664
# 8 3 A Male 2005-12-28 2 2011-11-23 667 505
# 9 3 A Male 2005-12-28 3 2011-12-23 222 924
#10 4 C Female 1957-07-01 1 2010-10-10 186 584
#11 4 C Female 1957-07-01 2 2011-11-25 123 582
#12 4 C Female 1957-07-01 3 2011-12-25 344 653
data
df <- structure(list(id = 1:4, group = c("A", "B", "A", "C"), sex = c("Male",
"Male", "Male", "Female"), datebirth = c("1975-01-08", "1998-05-12",
"2005-12-28", "1957-07-01"), date1 = c("2010-10-10", "2010-10-10",
"2010-10-10", "2010-10-10"), date2 = c("2011-11-12", "2011-11-12",
"2011-11-23", "2011-11-25"), date3 = c("2011-12-12", "2011-12-12",
"2011-12-23", "2011-12-25"), ctrl1 = c(183L, 172L, 157L, 186L
), ctrl2 = c(835L, 727L, 667L, 123L), ctrl3 = c(139L, 214L, 222L,
344L), ab4v1 = c(745L, 793L, 664L, 584L), ab4v2 = c(584L, 653L,
505L, 582L), ab4v3 = c(817L, 499L, 924L, 653L)), class = "data.frame",
row.names = c("1",
"2", "3", "4"))

The following is ugly but I believe it might work. It's a sequence of pivot_longer statements, taking care of one variable in wide format at a time.
library(dplyr)
library(tidyr)
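fun <- function(X, Var){
  # e.g. "date\\d" matches date1, date2, date3 but not datebirth
  Vard <- paste0(Var, "\\d")
  X %>%
    # keep id, group, sex, datebirth plus this variable's columns
    select(1:4, matches(Vard)) %>%
    pivot_longer(
      cols = matches(Vard),
      names_to = "version",
      values_to = Var
    ) %>%
    # "date1" -> "1"
    mutate(version = sub(Var, "", version))
}
vars <- c("date", "ctrl", "ab4v")
# reshape each variable separately, then merge on the shared columns
Reduce(function(x, y) merge(x, y), lapply(vars, function(v) fun(df, v)))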
# id group sex datebirth version date ctrl ab4v
#1 1 A Male 1975-01-08 1 2010-10-10 183 745
#2 1 A Male 1975-01-08 2 2011-11-12 835 584
#3 1 A Male 1975-01-08 3 2011-12-12 139 817
#4 2 B Male 1998-05-12 1 2010-10-10 172 793
#5 2 B Male 1998-05-12 2 2011-11-12 727 653
#6 2 B Male 1998-05-12 3 2011-12-12 214 499
#7 3 A Male 2005-12-28 1 2010-10-10 157 664
#8 3 A Male 2005-12-28 2 2011-11-23 667 505
#9 3 A Male 2005-12-28 3 2011-12-23 222 924
#10 4 C Female 1957-07-01 1 2010-10-10 186 584
#11 4 C Female 1957-07-01 2 2011-11-25 123 582
#12 4 C Female 1957-07-01 3 2011-12-25 344 653
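For comparison with both answers above, here is a minimal sketch of the names_pattern route, assuming tidyr >= 1.1 (for names_transform) and dplyr for rename(). The pattern in the question is not anchored, so for a name like ab4v1 it presumably matches ab and 4 rather than ab4v and 1; anchoring a single final digit with $ avoids that, and ".value" has to come first in names_to. The rename() to ab4 is only there to match the column name requested in the question; long is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = 1:4,
  group = c("A", "B", "A", "C"),
  sex = c("Male", "Male", "Male", "Female"),
  datebirth = c("1975-01-08", "1998-05-12", "2005-12-28", "1957-07-01"),
  date1 = rep("2010-10-10", 4),
  date2 = c("2011-11-12", "2011-11-12", "2011-11-23", "2011-11-25"),
  date3 = c("2011-12-12", "2011-12-12", "2011-12-23", "2011-12-25"),
  ctrl1 = c(183L, 172L, 157L, 186L),
  ctrl2 = c(835L, 727L, 667L, 123L),
  ctrl3 = c(139L, 214L, 222L, 344L),
  ab4v1 = c(745L, 793L, 664L, 584L),
  ab4v2 = c(584L, 653L, 505L, 582L),
  ab4v3 = c(817L, 499L, 924L, 653L),
  stringsAsFactors = FALSE
)

long <- df %>%
  pivot_longer(
    cols = date1:ab4v3,
    # ".value" first: the prefix (date, ctrl, ab4v) becomes the output column name
    names_to = c(".value", "version"),
    # greedy (.*) leaves only the last digit for the version group
    names_pattern = "(.*)(\\d)$",
    # store version as an integer instead of a character
    names_transform = list(version = as.integer)
  ) %>%
  # match the column name asked for in the question
  rename(ab4 = ab4v)

long
```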

Related

Divide columns by a reference row

I need to divide columns despesatotal and despesamonetaria by the row named Total:
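Let's suppose your data set is df. Because prop.table() divides by the column sums of the remaining rows, this assumes the Total row is the last row and equals the sum of the other rows.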
# 1) Delete the last row
df <- df[-nrow(df),]
# 2) Build the desired data.frame, combining the CNAE names with the column percentages
new.df <- cbind(grup_CNAE = df$grup_CNAE,
100*prop.table(df[,-1],margin = 2))
Finally, rename your columns. Be careful with the matrix and data.frame formats, because mathematical operations sometimes behave differently on each. If you use the dput function to give us a reproducible example, the answer can be more accurate.
Here is a way to get it done. This is not the best way, but I think it is very readable.
Suppose this is your data frame:
mydf = structure(list(grup_CNAE = c("A", "B", "C", "D", "E", "Total"
), despesatotal = c(71, 93, 81, 27, 39, 311), despesamonetaria = c(7,
72, 36, 22, 73, 210)), row.names = c(NA, -6L), class = "data.frame")
mydf
# grup_CNAE despesatotal despesamonetaria
#1 A 71 7
#2 B 93 72
#3 C 81 36
#4 D 27 22
#5 E 39 73
#6 Total 311 210
To divide the despesatotal values by their total, you need to use the total value (311 in this example) as the denominator. Note that the total value is located in the last row, so you can pick it out by indexing the despesatotal column with nrow() as the index.
library(dplyr)
mydf |> mutate(percentage1 = despesatotal/despesatotal[nrow(mydf)],
percentage2 = despesamonetaria/despesamonetaria[nrow(mydf)])
# grup_CNAE despesatotal despesamonetaria percentage1 percentage2
#1 A 71 7 0.22829582 0.03333333
#2 B 93 72 0.29903537 0.34285714
#3 C 81 36 0.26045016 0.17142857
#4 D 27 22 0.08681672 0.10476190
#5 E 39 73 0.12540193 0.34761905
#6 Total 311 210 1.00000000 1.00000000
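If you would rather not load dplyr, here is a base R sketch of the same idea. It looks up the Total row by its label instead of by position, so row order does not matter, and it overwrites every numeric column in place (copy the data first if you need the raw values). is_total and num_cols are illustrative names.

```r
mydf <- data.frame(
  grup_CNAE = c("A", "B", "C", "D", "E", "Total"),
  despesatotal = c(71, 93, 81, 27, 39, 311),
  despesamonetaria = c(7, 72, 36, 22, 73, 210),
  stringsAsFactors = FALSE
)

# Logical index of the reference row, found by label rather than position
is_total <- mydf$grup_CNAE == "Total"

# Which columns to rescale: all numeric ones
num_cols <- vapply(mydf, is.numeric, logical(1))

# Divide each numeric column by its own value in the "Total" row
mydf[num_cols] <- lapply(mydf[num_cols], function(col) col / col[is_total])
mydf
```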
library(tidyverse)
Sample data
# A tibble: 11 x 3
group despesatotal despesamonetaria
<chr> <int> <int>
1 1 198 586
2 2 186 525
3 3 202 563
4 4 300 562
5 5 126 545
6 6 215 529
7 7 183 524
8 8 163 597
9 9 213 592
10 10 175 530
11 Total 1961 5553
df %>%
mutate(percentage_total = despesatotal / last(despesatotal),
percentage_monetaria = despesamonetaria/ last(despesamonetaria)) %>%
slice(-nrow(.))
# A tibble: 10 x 5
group despesatotal despesamonetaria percentage_total percentage_monetaria
<chr> <int> <int> <dbl> <dbl>
1 1 198 586 0.101 0.106
2 2 186 525 0.0948 0.0945
3 3 202 563 0.103 0.101
4 4 300 562 0.153 0.101
5 5 126 545 0.0643 0.0981
6 6 215 529 0.110 0.0953
7 7 183 524 0.0933 0.0944
8 8 163 597 0.0831 0.108
9 9 213 592 0.109 0.107
10 10 175 530 0.0892 0.0954
This is a good place to use dplyr::mutate(across()) to divide all relevant columns by the Total row. Note this is not sensitive to the order of the rows and will apply the manipulation to all numeric columns. You can supply any tidyselect semantics to across() instead if needed in your case.
library(tidyverse)
# make sample data
d <- tibble(grup_CNAE = paste0("Group", 1:12),
despesatotal = sample(1e6:5e7, 12),
despesamonetaria = sample(1e6:5e7, 12)) %>%
add_row(grup_CNAE = "Total", summarize(., across(where(is.numeric), sum)))
# divide numeric columns by value in "Total" row
d %>%
mutate(across(where(is.numeric), ~./.[grup_CNAE == "Total"]))
#> # A tibble: 13 × 3
#> grup_CNAE despesatotal despesamonetaria
#> <chr> <dbl> <dbl>
#> 1 Group1 0.117 0.0204
#> 2 Group2 0.170 0.103
#> 3 Group3 0.0451 0.0837
#> 4 Group4 0.0823 0.114
#> 5 Group5 0.0170 0.0838
#> 6 Group6 0.0174 0.0612
#> 7 Group7 0.163 0.155
#> 8 Group8 0.0352 0.0816
#> 9 Group9 0.0874 0.135
#> 10 Group10 0.113 0.0877
#> 11 Group11 0.0499 0.0495
#> 12 Group12 0.104 0.0251
#> 13 Total 1 1
Created on 2022-11-08 with reprex v2.0.2

Combine dataframes only by mutual rownames

I want to combine about 20 dataframes, with different numbers of rows and columns, only by their mutual rownames. Any rows that are not shared by ALL dataframes are deleted. For example, with two dataframes:
Patient1 Patient64 Patient472
ABC 28 38 0
XYZ 92 11 998
WWE 1 10 282
ICQ 0 76 56
SQL 22 1002 778
combine with
Pat_9 Pat_1 Pat_111
ABC 65 44 874
CBA 3 311 998
WWE 2 1110 282
vVv 2 760 56
GHG 12 1200 778
The result would be
Patient1 Patient64 Patient472 Pat_9 Pat_1 Pat_111
ABC 28 38 0 65 44 874
WWE 1 10 282 2 1110 282
I know how to use rbind and cbind but not for the purpose of joining according to shared rownames.
Try this, replacing the arguments of list() with your data.frames df1, df2, df3, ..., df20 (the shared row names end up in an id column):
l <- lapply(list(df1 , df2 ) , \(x) {x[["id"]] <- rownames(x) ; x})
Reduce(\(x,y) merge(x,y , by = "id") , l)
You can try
merge(d1, d2, by = "row.names")
Row.names Patient1 Patient64 Patient472 Pat_9 Pat_1 Pat_111
1 ABC 28 38 0 65 44 874
2 WWE 1 10 282 2 1110 282
For more than two, you can use the tidyverse:
library(tidyverse)
# d3, ... stand for any further data frames you want to combine
lst(d1, d2, d3) %>%
map(rownames_to_column) %>%
reduce(inner_join, by = "rowname")
You can first move the row names into a column with rownames_to_column(), then use inner_join(), and at the end convert the column back to row names with column_to_rownames(), like this:
df1 <- read.table(text=" Patient1 Patient64 Patient472
ABC 28 38 0
XYZ 92 11 998
WWE 1 10 282
ICQ 0 76 56
SQL 22 1002 778", header = TRUE)
df2 <- read.table(text = " Pat_9 Pat_1 Pat_111
ABC 65 44 874
CBA 3 311 998
WWE 2 1110 282
vVv 2 760 56
GHG 12 1200 778", header = TRUE)
library(dplyr)
library(tibble)
df1 %>%
rownames_to_column() %>%
inner_join(df2 %>% rownames_to_column(), by = "rowname") %>%
column_to_rownames()
#> Patient1 Patient64 Patient472 Pat_9 Pat_1 Pat_111
#> ABC 28 38 0 65 44 874
#> WWE 1 10 282 2 1110 282
Created on 2022-07-20 by the reprex package (v2.0.1)
Option with list of dataframes:
dfs_list <- list(df1, df2)
transform(Reduce(merge, lapply(dfs_list, function(x) data.frame(x, rn = row.names(x)))), row.names=rn, rn=NULL)
#> Patient1 Patient64 Patient472 Pat_9 Pat_1 Pat_111
#> ABC 28 38 0 65 44 874
#> WWE 1 10 282 2 1110 282
Created on 2022-07-20 by the reprex package (v2.0.1)
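Another base R sketch for any number of data frames: compute the row names shared by all of them with Reduce(intersect, ...), subset each data frame to those rows (in the same order), and cbind the pieces, so the row names stay row names throughout. It assumes the column names do not clash across the data frames; dfs, common and combined are illustrative names.

```r
df1 <- read.table(text = "
     Patient1 Patient64 Patient472
ABC        28        38          0
XYZ        92        11        998
WWE         1        10        282
ICQ         0        76         56
SQL        22      1002        778", header = TRUE)

df2 <- read.table(text = "
     Pat_9 Pat_1 Pat_111
ABC     65    44     874
CBA      3   311     998
WWE      2  1110     282
vVv      2   760      56
GHG     12  1200     778", header = TRUE)

# Put all data frames (df1, df2, ..., df20) in one list
dfs <- list(df1, df2)

# Row names present in every data frame, in the order of the first one
common <- Reduce(intersect, lapply(dfs, rownames))

# Subset each data frame to the shared rows and bind the columns side by side
combined <- do.call(cbind, lapply(dfs, function(d) d[common, , drop = FALSE]))
combined
```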

Canonical way to include one id column into all elements of resulting list from split.default

I am splitting a data.frame into a list on the basis of its column names. What I want is to include an id column (id) not just in one item but in all elements of the resulting list.
Presently I am doing it by binding the id column to every item of the list afterwards with map and bind_cols (I can do the same with Map/do.call/mapply etc.). What I want to know is whether there is a canonical way of doing it directly, maybe with an argument of split.default or some other function, thus saving two or three extra steps.
Reproducible example
df <- data.frame(
stringsAsFactors = FALSE,
id = c("A", "B", "C"),
nm1_a = c(928L, 476L, 928L),
nm1_b = c(61L, 362L, 398L),
nm2_a = c(965L, 466L, 369L),
nm2_b = c(240L, 375L, 904L),
nm3_a = c(429L, 730L, 788L),
nm3_b = c(99L, 896L, 540L),
nm3_c = c(463L, 143L, 870L)
)
df
#> id nm1_a nm1_b nm2_a nm2_b nm3_a nm3_b nm3_c
#> 1 A 928 61 965 240 429 99 463
#> 2 B 476 362 466 375 730 896 143
#> 3 C 928 398 369 904 788 540 870
What I am doing presently
library(tidyverse)
split.default(df[-1], gsub('^(nm\\d+).*', '\\1', names(df)[-1])) %>%
map(~ .x %>% bind_cols('id' = df$id, .))
#> $nm1
#> id nm1_a nm1_b
#> 1 A 928 61
#> 2 B 476 362
#> 3 C 928 398
#>
#> $nm2
#> id nm2_a nm2_b
#> 1 A 965 240
#> 2 B 466 375
#> 3 C 369 904
#>
#> $nm3
#> id nm3_a nm3_b nm3_c
#> 1 A 429 99 463
#> 2 B 730 896 143
#> 3 C 788 540 870
What I want is exactly the same output, but is there any way to do it directly or a more canonical way?
Just for a diversity of options, here's what you said you didn't want to do. The pivot / split / pivot method can scale better and adapts beyond keeping an ID based just on column position. It also uses the ID to do the reshaping, so it might be more flexible if you have other operations to do in the intermediate steps and don't know for sure that your row order will stay the same; that's one of the reasons I sometimes avoid binding columns. It also (at least for me) makes more sense to split data based on some variable rather than by groups of columns.
library(tidyr)
df %>%
pivot_longer(-id) %>%
split(stringr::str_extract(.$name, "^nm\\d+")) %>%
purrr::map(pivot_wider, id_cols = id, names_from = name)
#> $nm1
#> # A tibble: 3 x 3
#> id nm1_a nm1_b
#> <chr> <int> <int>
#> 1 A 928 61
#> 2 B 476 362
#> 3 C 928 398
#>
#> $nm2
#> # A tibble: 3 x 3
#> id nm2_a nm2_b
#> <chr> <int> <int>
#> 1 A 965 240
#> 2 B 466 375
#> 3 C 369 904
#>
#> $nm3
#> # A tibble: 3 x 4
#> id nm3_a nm3_b nm3_c
#> <chr> <int> <int> <int>
#> 1 A 429 99 463
#> 2 B 730 896 143
#> 3 C 788 540 870
You can make use of a temporary variable so that the code is cleaner and easier to understand.
common_cols <- 1
tmp <- df[-common_cols]
lapply(split.default(tmp, sub('^(nm\\d+).*', '\\1', names(tmp))),
function(x) cbind(df[common_cols], x))
#$nm1
# id nm1_a nm1_b
#1 A 928 61
#2 B 476 362
#3 C 928 398
#$nm2
# id nm2_a nm2_b
#1 A 965 240
#2 B 466 375
#3 C 369 904
#$nm3
# id nm3_a nm3_b nm3_c
#1 A 429 99 463
#2 B 730 896 143
#3 C 788 540 870
This one needs just two steps, split and replace (note that id ends up as the last column):
Map(`[<-`, split.default(df[-1], substr(names(df)[-1], 1, 3)), 'id', value=df[1])
# $nm1
# nm1_a nm1_b id
# 1 928 61 A
# 2 476 362 B
# 3 928 398 C
#
# $nm2
# nm2_a nm2_b id
# 1 965 240 A
# 2 466 375 B
# 3 369 904 C
#
# $nm3
# nm3_a nm3_b nm3_c id
# 1 429 99 463 A
# 2 730 896 143 B
# 3 788 540 870 C
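A further base R sketch: instead of splitting the data and then re-adding id, split the column names into groups and use each group to subset df together with id, which keeps id as the first column. It assumes the group is everything before the first underscore (a slightly different rule from '^(nm\\d+).*' in the question); value_cols and col_groups are illustrative names.

```r
df <- data.frame(
  id = c("A", "B", "C"),
  nm1_a = c(928L, 476L, 928L),
  nm1_b = c(61L, 362L, 398L),
  nm2_a = c(965L, 466L, 369L),
  nm2_b = c(240L, 375L, 904L),
  nm3_a = c(429L, 730L, 788L),
  nm3_b = c(99L, 896L, 540L),
  nm3_c = c(463L, 143L, 870L),
  stringsAsFactors = FALSE
)

# Every column except the shared id column
value_cols <- setdiff(names(df), "id")

# Group the column *names* by their prefix (text before the first "_")
col_groups <- split(value_cols, sub("_.*$", "", value_cols))

# Subset df once per group, always putting id first
result <- lapply(col_groups, function(cols) df[c("id", cols)])
result
```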

Something is wrong with using pivot_wider and pivot_longer to gather data (Solved: I fixed it myself.)

I used this method successfully before to gather mean and sd results. Then I tried to use the same method to gather my gene-count DEG data with "logFC", "cil", "cir" and "ajustP_value", but it failed: something is wrong with my result.
Just like this:
library(dplyr)
library(tidyr)
data_1<-data.frame(matrix(sample(1:1200,1200,replace = T),48,25))
names(data_1) <- c(paste0("Gene_", 1:25))
rownames(data_1)<-NULL
head(data_1)
A<-paste0(1:48,"_logFC")
data_logFC<-data.frame(A=A,data_1)
#
data_2<-data.frame(matrix(sample(1:1200,1200,replace = T),48,25))
names(data_2) <- c(paste0("Gene_", 1:25))
rownames(data_2)<-NULL
B_L<-paste0(1:48,"_CI.L")
data_CIL<-data.frame(A=B_L,data_2)
data_CIL[1:48,1:6]
#
data_3<-data.frame(matrix(sample(1:1200,1200,replace = T),48,25))
names(data_3) <- c(paste0("Gene_", 1:25))
rownames(data_3)<-NULL
C_R<-paste0(1:48,"_CI.R")
data_CIR<-data.frame(A=C_R,data_3)
data_CIR[1:48,1:6]
#
data_4<-data.frame(matrix(sample(1:1200,1200,replace = T),48,25))
names(data_4) <- c(paste0("Gene_", 1:25))
rownames(data_4)<-NULL
D<-paste0(1:48,"_adj.P.Val")
data_ajustP<-data.frame(A=D,data_4)
data_ajustP[1:48,1:6]
# combine data_logFC data_CIL data_CIR data_ajustP
data <- bind_rows(list(
logFC = data_logFC,
CIL = data_CIL,
CIR =data_CIR,
AJSTP=data_ajustP
), .id = "stat")
data[1:10,1:6]
data_DEG<- data %>%
pivot_longer(-c(stat,A), names_to = "Gene", values_to = "value") %>%pivot_wider(names_from = "stat", values_from = "value")
head(data_DEG,100)
str(data_DEG$CIL)
> head(data_DEG,100)
# A tibble: 100 x 6
A Gene logFC CIL CIR AJSTP
<chr> <chr> <int> <int> <int> <int>
1 1_logFC Gene_1 504 NA NA NA
2 1_logFC Gene_2 100 NA NA NA
3 1_logFC Gene_3 689 NA NA NA
4 1_logFC Gene_4 779 NA NA NA
5 1_logFC Gene_5 397 NA NA NA
6 1_logFC Gene_6 1152 NA NA NA
7 1_logFC Gene_7 780 NA NA NA
8 1_logFC Gene_8 155 NA NA NA
9 1_logFC Gene_9 142 NA NA NA
10 1_logFC Gene_10 1150 NA NA NA
# … with 90 more rows
Why are there so many NAs?
Can somebody help me? Very thankful.
EDIT:
I had confused the real sample grouping of my data, so I reshaped it without the right index.
Here is my corrected method:
data[1:10,1:6]
data<-separate(data,A,c("Name","stat2"),"_")
data<-data[,-3]
data_DEG<- data %>%
pivot_longer(-c(stat,Name), names_to = "Gene", values_to = "value") %>%pivot_wider(names_from = "stat", values_from = "value")
head(data_DEG,10)
tail(data_DEG,10)
> head(data_DEG,10)
# A tibble: 10 x 6
Name Gene logFC CIL CIR AJSTP
<chr> <chr> <int> <int> <int> <int>
1 1 Gene_1 504 1116 774 278
2 1 Gene_2 100 936 448 887
3 1 Gene_3 689 189 718 933
4 1 Gene_4 779 943 690 19
5 1 Gene_5 397 976 40 135
6 1 Gene_6 1152 304 343 647
7 1 Gene_7 780 1076 796 1024
8 1 Gene_8 155 645 469 180
9 1 Gene_9 142 256 889 1047
10 1 Gene_10 1150 976 1194 670
> tail(data_DEG,10)
# A tibble: 10 x 6
Name Gene logFC CIL CIR AJSTP
<chr> <chr> <int> <int> <int> <int>
1 48 Gene_16 448 633 1080 1122
2 48 Gene_17 73 772 14 388
3 48 Gene_18 652 999 699 912
4 48 Gene_19 600 1163 512 241
5 48 Gene_20 428 1119 1142 348
6 48 Gene_21 66 553 240 82
7 48 Gene_22 753 1119 630 117
8 48 Gene_23 1017 305 1120 447
9 48 Gene_24 432 1175 447 670
10 48 Gene_25 482 394 371 696
It's a perfect result!!
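Why the first attempt produced NAs: pivot_wider() builds one output row per unique combination of the id columns (there A and Gene). Because A held labels such as 1_logFC and 1_CI.L, the same sample had a different A value for each statistic, so no output row could collect more than one statistic. A minimal sketch on a small made-up example, assuming dplyr and tidyr; it builds the shared key with sub() instead of separate(), which is just one of several ways to do it:

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the stacked data: two statistics, two samples, two genes
data <- tibble(
  stat   = c("logFC", "logFC", "CIL", "CIL"),
  A      = c("1_logFC", "2_logFC", "1_CI.L", "2_CI.L"),
  Gene_1 = c(10L, 20L, 30L, 40L),
  Gene_2 = c(11L, 21L, 31L, 41L)
)

data_DEG <- data %>%
  # shared key: the sample number before the first "_", identical across statistics
  mutate(Name = sub("_.*$", "", A)) %>%
  select(-A) %>%
  pivot_longer(-c(stat, Name), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = stat, values_from = value)

data_DEG
```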

Pivot/Reshape data in R [duplicate]

This question already has answers here:
Reshape horizontal to to long format using pivot_longer
(3 answers)
Closed 2 years ago.
Thank you all for your answers. I thought I was smarter than I am and hoped I would understand them. I think I messed up the visualisation of my data as well. I have edited my post to better show my sample data. Sorry for the inconvenience, and I truly hope that someone can help me.
I have a question about reshaping my data. The data collected looks as such:
data <- read.table(header=T, text='
pid measurement1 Tdays1 measurement2 Tdays2 measurement3 Tdays3 measurement4 Tdays4
1 1356 1435 1483 1405 1563 1374 NA NA
2 943 1848 1173 1818 1300 1785 NA NA
3 1590 185 NA NA NA NA 1585 294
4 130 72 443 70 NA NA 136 79
4 140 82 NA NA NA NA 756 89
4 220 126 266 124 NA NA 703 128
4 166 159 213 156 476 145 776 166
4 380 189 583 173 NA NA 586 203
4 353 231 510 222 656 217 526 240
4 180 268 NA NA NA NA NA NA
4 NA NA NA NA NA NA 580 278
4 571 334 596 303 816 289 483 371
')
Now I would like it to look something like this:
PID Time Value
1 1435 1356
1 1405 1483
1 1374 1563
2 1848 943
2 1818 1173
2 1785 1300
3 185 1590
... ... ...
How would I get there? I have looked up some things about converting from wide to long format, but they don't seem to do the trick. I am relatively new to RStudio and Stack Overflow (if you couldn't tell already).
Kind regards, and thank you in advance.
Here is a slightly different pivot_longer() version.
library(tidyr)
library(dplyr)
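# dw is the question's original (pre-edit) sample data, laid out like the df shown
# further below: PID, T1, measurement1, T2, measurement2, T3, measurement3
dw %>%
pivot_longer(cols = -PID, names_to = ".value", names_pattern = "(.+)[0-9]")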
# A tibble: 9 x 3
PID T measurement
<dbl> <dbl> <dbl>
1 1 1 100
2 1 4 200
3 1 7 50
4 2 2 150
5 2 5 300
6 2 8 60
7 3 3 120
8 3 6 210
9 3 9 70
The names_to = ".value" argument creates new columns from the column names, based on the names_pattern argument. The names_pattern argument takes a regular expression whose capture groups are matched to the entries of names_to. In this case, here is the breakdown:
(.+)    # match everything; whatever this group captures becomes the ".value" (new column name)
[0-9]   # a single numeric character; tells the pattern that the trailing number
        # is excluded from ".value". If you have multi-digit numbers (e.g. measurement12),
        # "[0-9*]" will not help (it matches one digit or a literal *); use
        # "([^0-9]+)[0-9]+$" instead
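Applying a multi-digit-safe pattern to the edited data in the question could look like the sketch below (dplyr and tidyr assumed; the measurment4 typo is corrected to measurement4, and visit and long_data are illustrative names). It also drops the visits where nothing was recorded, which the desired output in the question leaves out.

```r
library(dplyr)
library(tidyr)

# The edited sample data from the question (measurment4 corrected to measurement4)
data <- read.table(header = TRUE, text = '
pid measurement1 Tdays1 measurement2 Tdays2 measurement3 Tdays3 measurement4 Tdays4
1 1356 1435 1483 1405 1563 1374 NA NA
2 943 1848 1173 1818 1300 1785 NA NA
3 1590 185 NA NA NA NA 1585 294
4 130 72 443 70 NA NA 136 79
4 140 82 NA NA NA NA 756 89
4 220 126 266 124 NA NA 703 128
4 166 159 213 156 476 145 776 166
4 380 189 583 173 NA NA 586 203
4 353 231 510 222 656 217 526 240
4 180 268 NA NA NA NA NA NA
4 NA NA NA NA NA NA 580 278
4 571 334 596 303 816 289 483 371
')

long_data <- data %>%
  pivot_longer(
    cols = -pid,
    names_to = c(".value", "visit"),
    # [^0-9]+ stops before the first digit; [0-9]+$ takes all trailing digits
    names_pattern = "([^0-9]+)([0-9]+)$"
  ) %>%
  # drop the visits that were never recorded
  filter(!is.na(measurement)) %>%
  select(pid, Time = Tdays, Value = measurement)

long_data
```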
In the last edit you asked for a solution that is easy to understand. A very simple approach would be to stack the measurement columns on top of each other and the Tdays columns on top of each other. Although specialty packages make things very concise and elegant, for simplicity we can solve this without additional packages. Standard R has a convenient function aptly named stack, which works like this:
> exp <- data.frame(value1 = 1:5, value2 = 6:10)
> stack(exp)
values ind
1 1 value1
2 2 value1
3 3 value1
4 4 value1
5 5 value1
6 6 value2
7 7 value2
8 8 value2
9 9 value2
10 10 value2
We can stack measurements and Tdays separately and then combine them via cbind:
data <- read.table(header=T, text='
pid measurement1 Tdays1 measurement2 Tdays2 measurement3 Tdays3 measurement4 Tdays4
1 1356 1435 1483 1405 1563 1374 NA NA
2 943 1848 1173 1818 1300 1785 NA NA
3 1590 185 NA NA NA NA 1585 294
4 130 72 443 70 NA NA 136 79
4 140 82 NA NA NA NA 756 89
4 220 126 266 124 NA NA 703 128
4 166 159 213 156 476 145 776 166
4 380 189 583 173 NA NA 586 203
4 353 231 510 222 656 217 526 240
4 180 268 NA NA NA NA NA NA
4 NA NA NA NA NA NA 580 278
4 571 334 596 303 816 289 483 371
')
cbind(stack(data, c(measurement1, measurement2, measurement3, measurement4)),
stack(data, c(Tdays1, Tdays2, Tdays3, Tdays4)))
This keeps measurements and Tdays neatly together but leaves us without pid, which we can add by using rep() to replicate the original pid column 4 times:
result <- cbind(pid = rep(data$pid, 4),
stack(data, c(measurement1, measurement2, measurement3, measurement4)),
stack(data, c(Tdays1, Tdays2, Tdays3, Tdays4)))
The head of the result looks like this:
> head(result)
pid values ind values ind
1 1 1356 measurement1 1435 Tdays1
2 2 943 measurement1 1848 Tdays1
3 3 1590 measurement1 185 Tdays1
4 4 130 measurement1 72 Tdays1
5 4 140 measurement1 82 Tdays1
6 4 220 measurement1 126 Tdays1
This is not the order you expected, so you can sort the data.frame if that is of any concern:
result <- result[order(result$pid), c(1, 4, 2)]
names(result) <- c("pid", "Time", "Value")
leading to the final result
> head(result)
pid Time Value
1 1 1435 1356
13 1 1405 1483
25 1 1374 1563
37 1 NA NA
2 2 1848 943
14 2 1818 1173
tidyverse solution
library(tidyverse)
dw %>%
pivot_longer(-PID) %>%
mutate(name = gsub('^([A-Za-z]+)(\\d+)$', '\\1_\\2', name )) %>%
separate(name, into = c('A', 'B'), sep = '_', convert = T) %>%
pivot_wider(names_from = A, values_from = value)
Gives the following output
# A tibble: 9 x 4
PID B T measurement
<int> <int> <int> <int>
1 1 1 1 100
2 1 2 4 200
3 1 3 7 50
4 2 1 2 150
5 2 2 5 300
6 2 3 8 60
7 3 1 3 120
8 3 2 6 210
9 3 3 9 70
Consider a dataframe df like the following:
PID T1 measurement1 T2 measurement2 T3 measurement3
1 1 100 4 200 7 50
2 2 150 5 300 8 60
3 3 120 6 210 9 70
You can use this solution to get your required dataframe:
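# starting column of each remaining (T, measurement) pair
iters = seq(from = 4, to = ncol(df) - 1, by = 2)
finalDf = df[, c(1, 2, 3)]
for (j in iters) {
  tobind = df[, c(1, j, j + 1)]
  # rbind() matches data frame columns by name, so align the names first
  names(tobind) = names(finalDf)
  finalDf = rbind(finalDf, tobind)
}
finalDf = finalDf[order(finalDf[, 1]), ]
print(finalDf)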
The output of the print statement is this:
PID T1 measurement1
1 1 1 100
4 1 4 200
7 1 7 50
2 2 2 150
5 2 5 300
8 2 8 60
3 3 3 120
6 3 6 210
9 3 9 70
Maybe you can try reshape like below
reshape(
setNames(data, gsub("(\\d+)$", "\\.\\1", names(data))),
direction = "long",
varying = 2:ncol(data)
)
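A base R sketch along the same lines, passing varying to reshape() as an explicit list so that no column-name guessing is needed: each vector in the list becomes one long column, named by v.names in the same order. It assumes the edited data from the question with measurment4 corrected to measurement4; row and visit are illustrative names, and the NA rows are dropped to match the desired output.

```r
# The edited sample data from the question (measurment4 corrected to measurement4)
data <- read.table(header = TRUE, text = '
pid measurement1 Tdays1 measurement2 Tdays2 measurement3 Tdays3 measurement4 Tdays4
1 1356 1435 1483 1405 1563 1374 NA NA
2 943 1848 1173 1818 1300 1785 NA NA
3 1590 185 NA NA NA NA 1585 294
4 130 72 443 70 NA NA 136 79
4 140 82 NA NA NA NA 756 89
4 220 126 266 124 NA NA 703 128
4 166 159 213 156 476 145 776 166
4 380 189 583 173 NA NA 586 203
4 353 231 510 222 656 217 526 240
4 180 268 NA NA NA NA NA NA
4 NA NA NA NA NA NA 580 278
4 571 334 596 303 816 289 483 371
')

long <- reshape(
  data,
  direction = "long",
  # one vector of wide columns per long variable, in visit order
  varying = list(paste0("Tdays", 1:4), paste0("measurement", 1:4)),
  v.names = c("Time", "Value"),
  timevar = "visit",
  times = 1:4,
  # not a column of data, so reshape() creates it as a row counter
  idvar = "row"
)

# Sort by pid and visit, drop unrecorded visits, keep the requested columns
long <- long[order(long$pid, long$visit), ]
long <- long[!is.na(long$Value), c("pid", "Time", "Value")]
rownames(long) <- NULL
head(long)
```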
