How to transform my data frame to make rows columns? - r

I have a data frame with two columns, "Type" and "Stats". I want each type to have one row, with each of its stats in a separate column. For example, my data frame looks something like this:
Column Type has values: A A A A B B B B
Column Stats has values: 15 2 73 12 12 6 52 17
And I want it to look like:
Column Type has values: A B
Column Stat1 has values: 15 12
Column Stat2 has values: 2 6
Column Stat3 has values: 73 52
Column Stat4 has values: 12 17
Not all types have the same number of stats: some types are missing a stat value and others have extra. I tried using t(), but ran into issues. I then tried to combine all the values of Stats into one column and separate them with gsub() and csplit(), but I had trouble combining all the Stats values for each type into one column. Any advice?

We can use pivot_wider after creating a sequence column grouped by 'Type':
library(dplyr)
library(tidyr)
library(stringr) # provides str_c()
df1 %>%
group_by(Type) %>%
mutate(rn = str_c('Stats_', row_number())) %>%
ungroup %>%
pivot_wider(names_from = rn, values_from = Stats)
# A tibble: 2 x 5
# Type Stats_1 Stats_2 Stats_3 Stats_4
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 A 15 2 73 12
#2 B 12 6 52 17
Or using dcast from data.table
library(data.table)
dcast(setDT(df1), Type ~ paste0("Stats_", rowid(Type)), value.var = 'Stats')
Or, as @Onyambu suggested, it can be done in base R with reshape
reshape(transform(df1, time = ave(Stats, Type,
FUN = seq_along)), dir="wide", idvar = "Type", sep = "_")
data
df1 <- data.frame(Type = rep(c("A", "B"), each = 4),
Stats = c(15, 2, 73, 12, 12, 6, 52, 17))
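The question mentions that not every type has the same number of stats. The sketch below reuses the base R reshape idea on a hypothetical data frame (df_uneven is made up for illustration) in which type B has only two stats. It suggests that the missing positions are simply filled with NA.

```r
# Hypothetical data: type A has four stats, type B only two
df_uneven <- data.frame(Type = c("A", "A", "A", "A", "B", "B"),
                        Stats = c(15, 2, 73, 12, 12, 6))

# Number the stats within each type: 1, 2, 3, 4 for A and 1, 2 for B
df_uneven$time <- ave(df_uneven$Stats, df_uneven$Type, FUN = seq_along)

# One row per type; combinations that do not exist become NA
wide <- reshape(df_uneven, direction = "wide", idvar = "Type",
                timevar = "time", sep = "_")
wide
```

The pivot_wider and dcast versions above should behave the same way, since they also fill absent combinations with NA by default.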

Related

Rearranging data according to rater and subject, simultaneously creating new row names

I have a dataset where multiple raters rate multiple subjects.
I'd like to rearrange the data that looks like this:
data <- data.frame(rater=c("A", "B", "C", "A", "B", "C"),
subject=c(1, 1, 1, 2, 2, 2),
measurment1=c(1, 2, 3, 4, 5,6),
measurment2=c(11, 22, 33, 44, 55,66),
measurment3=c(111, 222, 333, 444, 555, 666))
data
# rater subject measurment1 measurment2 measurment3
# 1 A 1 1 11 111
# 2 B 1 2 22 222
# 3 C 1 3 33 333
# 4 A 2 4 44 444
# 5 B 2 5 55 555
# 6 C 2 6 66 666
into data that looks like this:
data_transformed <- data.frame( A = c(1,11,111,4,44,444),
B = c(2,22,222,5,55,555),
C = c(3,33,333,6,66,666)
)
row.names(data_transformed) <- c("measurment1_1", "measurment2_1", "measurment3_1", "measurment1_2", "measurment2_2", "measurment3_2")
data_transformed
# A B C
# measurment1_1 1 2 3
# measurment2_1 11 22 33
# measurment3_1 111 222 333
# measurment1_2 4 5 6
# measurment2_2 44 55 66
# measurment3_2 444 555 666
In the new data frame, the raters (A, B and C) should become the columns. The measurement should become the rows and I'd also like to add the subject number as a suffix to the row-names.
For the rearranging one could probably use the pivot functions, yet I have no idea on how to combine the measurement-variables with the subject number.
Thanks for your help!
We could use pivot_longer, pivot_wider and unite from the tidyr package.
pivot_longer puts the data into a long (vertical) format: it collapses the measurment columns into a single variable.
pivot_wider does the opposite: it spreads a variable into multiple columns, one for each unique value of that variable.
library(tidyr)
data |>
pivot_longer(measurment1:measurment3) |>
pivot_wider(names_from = rater, values_from = value, values_fill = 0 ) |>
unite("measure_subjet",name,subject, remove = TRUE)
Please try the below code where we can accomplish the expected result using pivot_longer, pivot_wider and column_to_rownames.
library(tidyverse)
data_transformed <- data %>%
pivot_longer(c('measurment1', 'measurment2', 'measurment3')) %>%
mutate(rows = paste0(name, '_', subject)) %>%
pivot_wider(id_cols = rows, names_from = rater, values_from = value) %>%
column_to_rownames(var = "rows")
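For comparison, here is a minimal base R sketch of the same rearrangement. It assumes that every subject was rated by the same raters in the same order, which holds for the example data but may not hold in general. The names meas_cols, pieces and result are introduced only for this illustration.

```r
data <- data.frame(rater = c("A", "B", "C", "A", "B", "C"),
                   subject = c(1, 1, 1, 2, 2, 2),
                   measurment1 = c(1, 2, 3, 4, 5, 6),
                   measurment2 = c(11, 22, 33, 44, 55, 66),
                   measurment3 = c(111, 222, 333, 444, 555, 666))

meas_cols <- c("measurment1", "measurment2", "measurment3")

# Handle one subject at a time: transpose so measurements become rows
pieces <- lapply(split(data, data$subject), function(d) {
  m <- t(d[meas_cols])
  colnames(m) <- as.character(d$rater)                 # raters become columns
  rownames(m) <- paste0(meas_cols, "_", d$subject[1])  # add the subject suffix
  m
})

# Stack the per-subject blocks on top of each other
result <- as.data.frame(do.call(rbind, unname(pieces)))
result
```

If raters can be missing for some subjects, the pivot-based answers are the safer choice, because they match values by rater name rather than by position.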

Pivot wider dataframe with difficult structure dplyr

I was working on something I thought would be simple, but maybe today my brain isn't working. My data is like this:
df <- tibble(metric = c('income', 'income_low', 'income_upp', 'n', 'n_low', 'n_upp'),
value = c(120, 100, 140, 10, 8, 12))
metric value
income 120
income_low 100
income_upp 140
n 10
n_low 8
n_upp 12
And I want to pivot_wider so it looks like this:
metric value value_low value_upp
income 120 100 140
n 10 8 12
I'm having trouble separating metrics, because pivot_wider as is, brings a dataframe that's too wide:
df %>% pivot_wider(names_from = 'metric', values_from = value)
How can I achieve this or should I pivot longer after the pivot wider?
Thanks!
I think if you convert metric into a column with "value", "value_upp" and "value_low" values, you can pivot_wider:
df %>%
mutate(param = case_when(str_detect(metric, "upp") ~ "value_upp",
str_detect(metric, "low") ~ "value_low",
TRUE ~ "value"),
metric = str_remove(metric, "_low|_upp")) %>%
pivot_wider(names_from = param, values_from = value)
I like to use separate() when I have text in a column like this. This function splits one column into several wherever a separator occurs in the text.
In particular in this example we would want to use the arguments sep="_" and into = c("metric", "state") to convert into columns with those names.
Then mutate() and pivot_wider() can be used as you had previously specified.
library(tidyverse)
df <- tribble(~metric, ~value,
"income", 120,
"income_low", 100,
"income_upp", 140,
"n", 10,
"n_low", 8,
"n_upp", 12)
df |>
separate(metric, sep = "_", into = c("metric", "state")) |>
mutate(state = ifelse(is.na(state), "value", state)) |>
pivot_wider(id_cols = metric, names_from = state, values_from = value, names_sep = "_")
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [1, 4].
#> # A tibble: 2 × 4
#> metric value low upp
#> <chr> <dbl> <dbl> <dbl>
#> 1 income 120 100 140
#> 2 n 10 8 12
Created on 2022-12-21 with reprex v2.0.2
Note you can use the argument names_glue or names_prefix in pivot_wider() to add the "value" as a prefix to the column names.
A data.table approach (if you can live with the trailing underscore after value_):
library(data.table)
setDT(df)
# create some new columns based on metric
df[, c("first", "second") := tstrsplit(metric, "_")]
# metric value first second
# 1: income 120 income <NA>
# 2: income_low 100 income low
# 3: income_upp 140 income upp
# 4: n 10 n <NA>
# 5: n_low 8 n low
# 6: n_upp 12 n upp
# replace NA with ""
df[is.na(df)] <- ""
# now cast to wide, creating colnames on the fly
dcast(df, first ~ paste0("value_", second), value.var = "value")
# first value_ value_low value_upp
# 1: income 120 100 140
# 2: n 10 8 12
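Splitting on every underscore only works while the metric names themselves contain none. The base R sketch below is one possible way around that, assuming the only suffixes are "_low" and "_upp" at the end of the name. The metric "n_house" is a hypothetical example used to show a name with an underscore, and is_bound, base, column and wide are names made up for this illustration.

```r
df <- data.frame(metric = c("income", "income_low", "income_upp",
                            "n_house", "n_house_low", "n_house_upp"),
                 value = c(120, 100, 140, 10, 8, 12))

# Rows whose metric ends in _low or _upp hold a bound, the rest hold the estimate
is_bound <- grepl("_(low|upp)$", df$metric)

# Strip only the trailing suffix, so "n_house_low" becomes "n_house"
base <- sub("_(low|upp)$", "", df$metric)

# Target column name: "value", "value_low" or "value_upp"
column <- ifelse(is_bound, paste0("value_", sub("^.*_", "", df$metric)), "value")

# Each (base, column) cell holds exactly one number, so sum() just returns it
m <- tapply(df$value, list(base, column), sum)
wide <- data.frame(metric = rownames(m), m, row.names = NULL)
wide
```

Anchoring the pattern with `$` is the design choice that matters here: it leaves underscores elsewhere in the name untouched.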

sum across multiple columns of a data frame based on multiple patterns R

I have a data frame of multiple variables for different years that looks something like this:
df <- data.frame(name=c("name1", "name2", "name3", "name4"),
X1990=c(1,6,8,NA),
X1990.1=c(10,20,NA,2),
X1990.2=c(2,4,6,8),
X1990.3=c(1,NA,3,6),
X1990.4=c(8,7,5,4),
X1991=c(2,6,3,5),
X1991.1=c(NA,20,NA,2),
X1991.2=c(NA,NA,NA,NA),
X1991.3=c(1,NA,3,5),
X1991.4=c(8,9,6,3))
I made this example with only 5 variables per year and only 2 years, but in reality it is a much larger df, with tens of variables for the years 1990 to 2020.
I want to create a new data frame with the sums of all the columns for the same year, so that it looks like this:
df_sum <- data.frame(name=c("name1", "name2", "name3", "name4"),
X1990=c(22, 37, 22, 20),
X1991=c(11,35,12,15))
I was thinking of some loop over rowSums(across(matches('pattern')), na.rm = TRUE), which I found in another question, but so far I have not managed to implement it.
Thanks!
We can reshape to 'long' format with pivot_longer, and get the sum while reshaping back to 'wide'
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "name1") %>%
mutate(name1 = str_remove(name1, "\\.\\d+$")) %>%
pivot_wider(names_from = name1, values_from = value,
values_fn = ~ sum(.x, na.rm = TRUE))
-output
# A tibble: 4 × 3
name X1990 X1991
<chr> <dbl> <dbl>
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
Or in base R, use split.default to split the data into a list of datasets based on the column name pattern, get the rowSums and cbind with the first column
cbind(df[1], sapply(split.default(df[-1],
trimws(names(df)[-1], whitespace = "\\.\\d+")), rowSums, na.rm = TRUE))
name X1990 X1991
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
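The loop over rowSums that the question had in mind can also be written directly. The following is a minimal base R sketch, assuming every year column is named like X1990 or X1990.3 as in the example. The names years, sums and df_sum2 are introduced only for this illustration.

```r
df <- data.frame(name = c("name1", "name2", "name3", "name4"),
                 X1990 = c(1, 6, 8, NA),
                 X1990.1 = c(10, 20, NA, 2),
                 X1990.2 = c(2, 4, 6, 8),
                 X1990.3 = c(1, NA, 3, 6),
                 X1990.4 = c(8, 7, 5, 4),
                 X1991 = c(2, 6, 3, 5),
                 X1991.1 = c(NA, 20, NA, 2),
                 X1991.2 = c(NA, NA, NA, NA),
                 X1991.3 = c(1, NA, 3, 5),
                 X1991.4 = c(8, 9, 6, 3))

# Year prefixes with the ".1", ".2", ... suffix removed: "X1990", "X1991"
years <- unique(sub("\\.\\d+$", "", names(df)[-1]))

# For each year, sum the columns belonging to it row by row, ignoring NA
sums <- sapply(years, function(y) {
  cols <- grepl(paste0("^", y, "(\\.\\d+)?$"), names(df))
  rowSums(df[cols], na.rm = TRUE)
})

df_sum2 <- data.frame(name = df$name, sums)
df_sum2
```

Note that with na.rm = TRUE a row whose values are all NA for a year sums to 0 rather than NA, which is also how the answers above behave.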

pivot_wider() generates new dataframe filled with NULL values and other misprinted values [duplicate]

I am using pivot_wider() in an attempt to transform this dataframe.
subject_id test_name test_result test_unit
12 Spanish 100 print
12 English 99 online
13 Spanish 98 print
13 English 91 print
Into:
subject_id spanish_test english_test
12 100 99
13 98 91
I used pivot_wider with the following code:
test %>%
pivot_wider(id_cols = subject_id,
names_from = Test_Name,
values_from = Test_Unit)
And I got the individual test columns generated, however, they were filled with the units or NULL values. Here is the dataframe for reference:
subject_id <- c(12, 12, 13, 13)
test_name <- c("Spanish", "English", "Spanish", "English")
test_result <- c(100, 99, 98, 91)
test_unit <- c("print", "online", "print", "print")
df <- data.frame(subject_id, test_name, test_result, test_unit)
You can use pivot_wider as -
tidyr::pivot_wider(df,
id_cols = subject_id,
names_from = test_name,
values_from = test_result,
names_glue = '{test_name}_test')
# subject_id Spanish_test English_test
# <dbl> <dbl> <dbl>
#1 12 100 99
#2 13 98 91
An alternative using reshape2 along with dplyr to rename the columns.
library(reshape2)
library(dplyr)
reshape2::dcast(df, subject_id ~ test_name,
value.var = "test_result") %>%
dplyr::rename_at(vars(Spanish:English), list( ~ paste0(., "_test"))) %>%
dplyr::rename_all(tolower) %>%
dplyr::select(subject_id, spanish_test, english_test)
Output
subject_id spanish_test english_test
1 12 100 99
2 13 98 91
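The same result can be sketched in base R with reshape. The key step, as in the answers above, is to take the values from test_result and to leave test_unit out of the reshape entirely. The names long and wide are introduced only for this illustration.

```r
subject_id <- c(12, 12, 13, 13)
test_name <- c("Spanish", "English", "Spanish", "English")
test_result <- c(100, 99, 98, 91)
test_unit <- c("print", "online", "print", "print")
df <- data.frame(subject_id, test_name, test_result, test_unit)

# Keep only the id, the column that supplies names, and the values
long <- df[c("subject_id", "test_name", "test_result")]

# One row per subject, one column per test
wide <- reshape(long, direction = "wide",
                idvar = "subject_id", timevar = "test_name")

# "test_result.Spanish" becomes "spanish_test"
names(wide) <- sub("^test_result\\.(.*)$", "\\1_test", tolower(names(wide)))
wide
```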

How do I use map() to add a grouped index to a column of data frames?

I have data with two measurements on two different groups, with a number of samples for each. A simple version with 6 samples each looks like this:
library(tidyverse)
df <- tibble(group = c(rep("group_A", 12), rep("group_B", 12)),
sample = rep(1:6, 4),
measurement = rep(c(rep("meas_A", 6), rep("meas_B", 6)), 2),
value = round(runif(24, min = 0, max = 60)))
but because the measurements have been repeated in different conditions it is actually a series of similar data frames represented in a list:
df2 <- bind_rows(df,df,df,df) %>%
mutate(condition = c(rep("One", 24), rep("Two", 24),
rep("Three", 24), rep("Four", 24))) %>%
unite(group_meas, group, measurement) %>%
nest(-condition)
Ultimately I'd like to reshape each data frame into a wide format so that vectors of the two measurements for each group can be easily extracted from single columns for statistical comparison. For example:
df %>% unite(group_meas, group, measurement) %>%
spread(group_meas, value)
which can be mapped down the list like so:
df2 %>% mutate(data = map(data, ~spread(.x, group_meas, value)))
My problem arises when a sample has been measured more than once and then spread() does not work because there are
Duplicate identifiers for rows
I figure the best way around this is to add a new index column grouped on the combined group/measurement and this will provide unique row identifiers. This works for a single data frame.
df %>% unite(group_meas, group, measurement) %>%
group_by(group_meas) %>%
mutate(gr_m_index = row_number())
However I cannot scale it to map down a list.
df2 %>% mutate(data = map(data, ~ group_by(.x, group_meas) %>%
mutate(gr_m_index = row_number())))
I think this must be a tidyeval thing as I get the following error suggesting it is looking in the wrong place.
Evaluation error: Column gr_m_index must be length 24 (the number of
rows) or one, not 4.
How do I use map() to add a grouped index to a column of data frames?
As I understand it, based on the error message, row_number() was returning c(1, 2, 3, 4). This is because the number of rows was counted based on df2, rather than the nested data frames.
Either approach below should work:
Approach 1. Define all the transformations to be mapped as a standalone function.
index_spread <- function(data){
return(data %>%
group_by(group_meas) %>%
mutate(gr_m_index = row_number()) %>%
spread(group_meas, value))
}
df2 %>% mutate(data = map(data, index_spread)) %>% unnest(data)
# A tibble: 24 x 7
condition sample gr_m_index group_A_meas_A group_A_meas_B group_B_meas_A group_B_meas_B
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 One 1 1 12 43 39 52
2 One 2 2 11 60 8 20
3 One 3 3 41 23 16 29
4 One 4 4 23 47 23 36
5 One 5 5 46 56 1 30
6 One 6 6 30 13 23 11
7 Two 1 1 12 43 39 52
8 Two 2 2 11 60 8 20
9 Two 3 3 41 23 16 29
10 Two 4 4 23 47 23 36
# ... with 14 more rows
Approach 2. Perform the transformations on df2$data, & assign the list of transformed data frames back to the original.
df2$data <- map(df2$data, ~group_by(.x, group_meas) %>%
mutate(gr_m_index = row_number()) %>%
spread(group_meas, value))
df2 %>% unnest(data)
# (same output as above)
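The idea of applying one per-data-frame function to every element of a list can also be sketched without the tidyverse. The example below uses a plain named list of data frames instead of a nested tibble, and its values are hypothetical. The names frames, add_index and indexed are made up for this illustration.

```r
# Hypothetical list of data frames, one per condition
frames <- list(
  One = data.frame(group_meas = rep(c("group_A_meas_A", "group_A_meas_B"), each = 3),
                   value = c(12, 11, 41, 43, 60, 23)),
  Two = data.frame(group_meas = rep(c("group_A_meas_A", "group_A_meas_B"), times = c(2, 4)),
                   value = c(5, 7, 9, 1, 3, 8))
)

# Add a running index within each group_meas value of a single data frame
add_index <- function(d) {
  d$gr_m_index <- ave(seq_len(nrow(d)), d$group_meas, FUN = seq_along)
  d
}

# Apply the function to each data frame in the list
indexed <- lapply(frames, add_index)
indexed$Two
```

As in Approach 1, keeping the whole transformation inside one function means the grouping is always evaluated against a single data frame rather than against the outer table.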
