This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 2 years ago.
I've got a dataset like this:
dat1 <- read.table(header=TRUE, text="
ID A_1 B_1 C_1 A_2 B_2 C_2
1 1 2 1 5 5 5
2 1 3 3 4 4 1
3 1 3 1 3 2 2
4 2 5 5 3 2 2
5 1 4 1 2 1 3
")
I would like to convert this to a long format, with one column for the ID, one for the system (1 or 2), one for the variable (A, B, or C) and one with the value.
I can't figure out how to do that, I would be very grateful if somebody could help me out.
I already tried the pivot_longer command, but it only gives me three columns: one for the ID, one for the variable names, and one for the value.
Thanks!!
You can use pivot_longer in the following way:
tidyr::pivot_longer(dat1, cols = -ID,
                    names_to = c('variable', 'system'), names_sep = '_')
# ID variable system value
# <int> <chr> <chr> <int>
# 1 1 A 1 1
# 2 1 B 1 2
# 3 1 C 1 1
# 4 1 A 2 5
# 5 1 B 2 5
# 6 1 C 2 5
# 7 2 A 1 1
# 8 2 B 1 3
# 9 2 C 1 3
#10 2 A 2 4
# … with 20 more rows
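If data.table is available, the same reshape can be written with its measure() helper. This is just a sketch; as noted further down this page, measure() requires a relatively new data.table (1.14.3 or later).
library(data.table)
# split each column name like "A_1" at "_" into a 'variable' and a 'system'
# column; the cell contents go into a single 'value' column
melt(as.data.table(dat1), id.vars = "ID",
     measure.vars = measure(variable, system, sep = "_"))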
Related
I have the following wide data in a .csv:
Subj Show1_judgment1 Show1_judgment2 Show1_judgment3 Show1_judgment4 Show2_judgment1 Show2_judgment2 Show2_judgment3 Show2_judgment4
1 3 2 5 6 3 2 5 7
2 1 3 5 4 3 4 1 6
3 1 2 6 2 3 7 2 6
The columns keep going for the same four judgments for a series of 130 different shows.
I want to change this data into long form so that it looks like this:
Subj show judgment1 judgment2 judgment3 judgment4
1 show1 2 5 6 1
1 show2 3 5 4 4
1 show3 2 6 2 5
Usually, I would use base R to subset the columns into their own data frames and then use rbind to combine them into one data frame.
But since there are 130 different shows, doing it that way would be very inefficient. I am relatively new to R, so I can only write very basic for loops, but I think a loop that takes the subject column (the first column) together with each group of four sequential columns would do this much more efficiently.
Can anyone help me create a for loop for this?
Thank you in advance for your help!
No for loop required: this is a wide-to-long transform, usually called "pivoting".
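Both solutions below assume the question's table has already been read into dat; a minimal sketch:
dat <- read.table(header=TRUE, text="
Subj Show1_judgment1 Show1_judgment2 Show1_judgment3 Show1_judgment4 Show2_judgment1 Show2_judgment2 Show2_judgment3 Show2_judgment4
1 3 2 5 6 3 2 5 7
2 1 3 5 4 3 4 1 6
3 1 2 6 2 3 7 2 6
")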
tidyr
tidyr::pivot_longer(dat, -Subj, names_pattern = "(.*)_(.*)", names_to = c("show", ".value"))
# # A tibble: 6 x 6
# Subj show judgment1 judgment2 judgment3 judgment4
# <int> <chr> <int> <int> <int> <int>
# 1 1 Show1 3 2 5 6
# 2 1 Show2 3 2 5 7
# 3 2 Show1 1 3 5 4
# 4 2 Show2 3 4 1 6
# 5 3 Show1 1 2 6 2
# 6 3 Show2 3 7 2 6
data.table
Requires data.table >= 1.14.3, which is relatively new (or install the development version from GitHub).
library(data.table)  # melt()'s measure() interface needs a data.table input
data.table::melt(
  as.data.table(dat), id.vars = "Subj",
  measure.vars = measure(show, value.name, sep = "_"))
This question already has answers here:
Correlation between two dataframes by row
(2 answers)
Closed 2 years ago.
I've got two datasets from the same people and I want to compute a correlation for each person over the two datasets.
Example datasets:
dat1 <- read.table(header=TRUE, text="
ItemX1 ItemX2 ItemX3 ItemX4 ItemX5
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")
dat2 <- read.table(header=TRUE, text="
ItemY1 ItemY2 ItemY3 ItemY4 ItemY5
4 2 1 1 4
4 3 1 2 5
1 5 3 2 2
5 2 4 4 1
5 1 5 2 1
")
Does anybody know how to compute the correlation rowwise for each person and NOT for the whole two datasets?
Thank you!
One possible solution uses {purrr} to iterate over the rows of both data frames and compute the correlation between each row of dat1 and the corresponding row of dat2.
library(purrr)
n_person <- nrow(dat1)
cormat <- purrr::map_df(
  .x = setNames(1:n_person, paste0("person_", 1:n_person)),
  .f = ~ cor(t(dat1[.x, ]), t(dat2[.x, ]))
)
cormat
#> # A tibble: 1 x 5
#> person_1[,"1"] person_2[,"2"] person_3[,"3"] person_4[,"4"] person_5[,"5"]
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.917 0.289 -0.330 0.723 0.913
Created on 2020-11-16 by the reprex package (v0.3.0)
Following the post mentioned by @Ravi, we can transpose the data frames and then calculate the correlations. One additional step, vectorising the cor function, avoids the waste of computing the full correlation matrix only to keep its diagonal. Consider something like this:
tp <- function(x) unname(as.data.frame(t(x)))
Vectorize(cor, c("x", "y"))(tp(dat1), tp(dat2))
Output
[1] 0.9169725 0.2886751 -0.3296902 0.7234780 0.9132660
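The same numbers come out of a plain loop over row indices; a base-R sketch, assuming dat1 and dat2 as defined above:
# unlist() turns each one-row data frame into a numeric vector for cor()
vapply(seq_len(nrow(dat1)),
       function(i) cor(unlist(dat1[i, ]), unlist(dat2[i, ])),
       numeric(1))
[1] 0.9169725 0.2886751 -0.3296902 0.7234780 0.9132660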
This question already has answers here:
Convert data from long format to wide format with multiple measure columns
(6 answers)
Reshape multiple value columns to wide format
(5 answers)
Closed 3 years ago.
How can I get from the first data.frame below to the second one in R? I am new to dplyr/tidyr, so I do not know exactly which functions to use, but I guess it can be done with these packages.
NAME GROUP X1 X2
A G1 1 2
A G2 1 3
A G3 4 3
B G1 3 3
B G2 2 3
B G3 5 4
C G1 4 3
C G2 4 1
C G3 4 3
NAME X1_G1 X2_G1 X1_G2 X2_G2 X1_G3 X2_G3
A 1 2 1 3 4 3
B 3 3 2 3 5 4
C 4 3 4 1 4 3
Thank you for your help!
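The answer below assumes the input has been read into df; a minimal sketch:
df <- read.table(header=TRUE, text="
NAME GROUP X1 X2
A G1 1 2
A G2 1 3
A G3 4 3
B G1 3 3
B G2 2 3
B G3 5 4
C G1 4 3
C G2 4 1
C G3 4 3
")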
library(tidyr) #v1.0.0
pivot_wider(df, names_from = GROUP, values_from = c(X1, X2))
# A tibble: 3 x 7
NAME X1_G1 X1_G2 X1_G3 X2_G1 X2_G2 X2_G3
<chr> <int> <int> <int> <int> <int> <int>
1 A 1 1 4 2 3 3
2 B 3 2 5 3 3 4
3 C 4 4 4 3 1 3
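For comparison, a data.table sketch of the same pivot; dcast() with two value.var columns produces the same cells, though the column order may differ from pivot_wider:
library(data.table)
dcast(as.data.table(df), NAME ~ GROUP, value.var = c("X1", "X2"))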
PS1: Why the downvote? I know this is a dupe, but this library has only been out for a month, so this counts as a new way to solve the problem.
PS2: Dear Moderator, you should clarify before you delete someone's comment.
This question already has answers here:
Fill missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 3 years ago.
I have a data frame of ids with a number column:
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I'd like to create a new data frame in which each id gets every unique nr that appears in df. As you may notice, id 3 has only nr 1, but not 2 and 3. So the result should be:
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)
You can use expand.grid as follows:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
  arrange(id)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
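If you'd rather stay in base R entirely, order() can replace arrange(); a sketch:
result <- expand.grid(id = unique(df$id), nr = unique(df$nr))
result <- result[order(result$id, result$nr), ]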
We can do:
tidyr::expand(df,id,nr)
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
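A data.table sketch does the same with CJ() (cross join), which returns the grid already sorted:
library(data.table)
CJ(id = unique(df$id), nr = unique(df$nr))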
I am trying to clean my data so that repeated observations are kept only for individuals that were measured in my first sampling period. For instance, if my data frame looks like this:
df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4), period = c(1,2,3,1,2,3,2,3,1,3), mass = rnorm(10, 5, 2))
df
ID period mass
1 1 1 3.313674
2 1 2 6.371979
3 1 3 5.449435
4 2 1 4.093022
5 2 2 2.615782
6 2 3 3.622842
7 3 2 4.466666
8 3 3 6.940979
9 4 1 6.226222
10 4 3 4.233397
I would like to keep only the observations for individuals that were measured during period 1. My new data frame would then look like this:
ID period mass
1 1 1 3.313674
2 1 2 6.371979
3 1 3 5.449435
4 2 1 4.093022
5 2 2 2.615782
6 2 3 3.622842
9 4 1 6.226222
10 4 3 4.233397
Using suggestions from this page (Remove all unique rows), I have tried the following command, but it leaves in the observations for individual 3 (which was not measured in period 1).
subset(df, duplicated(ID) | duplicated(ID, fromLast=T))
If you want a base solution, the following should work as well.
> df_new <- df[df$ID %in% df$ID[df$period == 1], ]
> df_new
ID period mass
1 1 1 3.238832
2 1 2 3.428847
3 1 3 1.205347
4 2 1 8.498452
5 2 2 7.523085
6 2 3 3.613678
9 4 1 3.324095
10 4 3 1.932733
You can use dplyr as follows:
library(dplyr)
df %>% group_by(ID) %>% filter(1 %in% period)
#Source: local data frame [8 x 3]
#Groups: ID [3]
# ID period mass
# <dbl> <dbl> <dbl>
#1 1 1 7.622950
#2 1 2 7.960665
#3 1 3 5.045723
#4 2 1 4.366568
#5 2 2 4.400645
#6 2 3 6.088367
#7 4 1 2.282713
#8 4 3 2.461640
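For a one-line base alternative, ave() can flag, per ID, whether any period equals 1; the resulting logical vector subsets df directly. A sketch:
# TRUE for every row whose ID group contains a period-1 observation
df[ave(df$period == 1, df$ID, FUN = any), ]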