How do I pivot groups of columns of related data in R?

I have a data set where each observation has 3 variables: time, success (yes or no), and damage (yes or no).
Each participant tried each of the 3 different methods (A, B, C) 3 times.
The data is organized in the worst way I could imagine: it was collected so that each row is one participant, but I need each row to be a single observation.
The raw data looks something like this:
And I want something like this:
I tried using pivot_longer(), but it seems to work only on individual columns; if I use it here, I just get repeated observations that overlap and end up scrambled.

I think you should do something like this.
I have created an example data frame similar to yours:
library(tidyverse)

df <- data.frame(
  c(1, 2),
  c("M", "F"),
  c(10, 20),
  c(15, 25),
  c(12, 13), c("yes", "no"), c("yes", "no"),
  c(22, 25), c("yes", "no"), c("no", "yes"),
  c(55, 40), c("no", "yes"), c("yes", "no"),
  c(39, 68), c("yes", "no"), c("yes", "no")
)
colnames(df) <-
  c("participant", "Gender", "P. info 1", "P. info 2",
    "time A1", "success A1", "damage A1",
    "time A2", "success A2", "damage A2",
    "time B1", "success B1", "damage B1",
    "time B2", "success B2", "damage B2")
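A few gather/spread manipulations give you the desired output:
df <- df %>%
  # stack every measurement column (all but the first four) into key/value pairs
  gather(key = "Experiment", value = "Value", -c(1:4)) %>%
  # "time A1" -> Measure = "time", Method = "A1"
  separate(col = "Experiment",
           into = c("Measure", "Method")) %>%
  # "A1" -> Method = "A", "# try" = "1"
  separate(col = "Method",
           into = c("Method", "# try"), sep = 1) %>%
  # one column per measure; convert = TRUE turns time back into a number
  spread(key = "Measure", value = "Value", convert = TRUE)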
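gather() and spread() still work, but tidyr marks them as superseded. As a minimal sketch, assuming tidyr 1.0 or later, the same reshape can be written as a single pivot_longer() call using the special ".value" name. The cut-down data frame `wide` and the output column names `method` and `try` are illustrative assumptions, not part of the original answer.

```r
library(tidyr)

# Hypothetical, cut-down version of the example data (one id column only)
wide <- data.frame(
  participant = c(1, 2),
  "time A1" = c(12, 13), "success A1" = c("yes", "no"), "damage A1" = c("yes", "no"),
  "time A2" = c(22, 25), "success A2" = c("yes", "no"), "damage A2" = c("no", "yes"),
  "time B1" = c(55, 40), "success B1" = c("no", "yes"), "damage B1" = c("yes", "no"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Each name has the shape "<measure> <method letter><try number>".
# ".value" tells pivot_longer() that the first capture group names the
# output column, so time, success and damage stay as separate columns.
long <- pivot_longer(
  wide,
  cols = -participant,
  names_to = c(".value", "method", "try"),
  names_pattern = "(.*) ([A-Z])(\\d+)"
)

long
```

Because each measure keeps its own column, `time` stays numeric instead of being coerced to character alongside the yes/no values.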

Related

change ggplot2 y-axis values

Please help me. I have the following data in R: values for three groups of organisms from day 0 to day 7, representing the mean population of each group on each day.
Here is my data:
https://docs.google.com/spreadsheets/d/15-XXT6jOSKZs0FS14FScnHMm0Qd19N-x/edit#gid=377184551
I was trying to follow an example on the following page: https://statisticsglobe.com/plot-all-columns-of-data-frame-in-r, but the graphs I get show the data values on the y-axis, and the plotted lines are joined. I would like separate lines for each of the three groups, and a scale on the y-axis instead of the plotted values. Plotting the values for each group individually also gives me the values themselves on the y-axis instead of a scale. I would like the y-axis to start at the Day 0 values and ascend up to Day 7, unlike the mixed-up order I have right now. The code I used is as follows:
Data and code
library(ggplot2)

growth <- data.frame(
  stringsAsFactors = FALSE,
  day = c("Day 0", "Day 1", "Day 2",
          "Day 3", "Day 4", "Day 5", "Day 6", "Day 7"),
  wild_type = c(6, 9.8, 69.53, 84.67, 99.33, 145.33, 147.33, 121.8),
  t7_cas9 = c(6, 8.57, 68.83, 85.5, 98.25, 144.67, 137.5, 120.5),
  ip6k = c(6, 6.5, 49.67, 56, 70.5, 127.5, 123.67, 111.33)
)
data_ggp <- data.frame(x = growth$day,
                       y = c(growth$wild_type, growth$t7_cas9, growth$ip6k),
                       group = c(rep("Wild_Type", nrow(growth)),
                                 rep("T7_Cas9", nrow(growth)),
                                 rep("IP6K-+", nrow(growth))))
ggp <- ggplot(data_ggp, aes(x, y, col = group, group = 1)) +
  geom_line()
ggp
p1 <- ggp + facet_grid(group ~ .)
p1
However, what I would like to have is:
Are you looking for something like this?
library(tidyverse)
# df is defined under "data:" below
df %>%
  pivot_longer(-Day) %>%
  ggplot(aes(x = Day, y = value, group = name, color = name)) +
  geom_line(size = 1)
And with facets:
library(tidyverse)
df %>%
  pivot_longer(-Day) %>%
  ggplot(aes(x = Day, y = value, group = name, color = name)) +
  geom_line(size = 1) +
  facet_grid(name ~ .)
data:
df <- structure(list(Day = c("Day 0", "Day 1", "Day 2", "Day 3", "Day 4",
"Day 5", "Day 6", "Day 7"), Wild_Type = c(6, 9.8, 69.53, 84.67,
99.33, 145.33, 147.33, 121.8), T7_Cas9 = c(6, 8.57, 68.83, 85.5,
98.25, 144.67, 137.5, 120.5), IP6K = c(6, 6.5, 49.67, 56, 70.5,
127.5, 123.67, 111.33)), class = "data.frame", row.names = c(NA,
-8L))
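If the y-axis shows the raw values as unordered labels rather than a numeric scale, one possible cause is that the value column was read in as text. This is an assumption, since the linked spreadsheet is not reproduced here. The sketch below uses a hypothetical four-day excerpt with the values deliberately stored as strings, converts them to numbers, and fixes the day order with a factor before plotting.

```r
library(tidyr)
library(ggplot2)

# Hypothetical excerpt: the values are stored as text on purpose
growth_txt <- data.frame(
  Day = paste("Day", 0:3),
  Wild_Type = c("6", "9.8", "69.53", "84.67"),
  IP6K = c("6", "6.5", "49.67", "56"),
  stringsAsFactors = FALSE
)

long <- pivot_longer(growth_txt, -Day, names_to = "group", values_to = "value")

# A character y column is treated as discrete; make it numeric
long$value <- as.numeric(long$value)

# Fix the x-axis order explicitly instead of relying on alphabetical order
long$Day <- factor(long$Day, levels = paste("Day", 0:3))

# group = group (not group = 1) draws one line per group
p <- ggplot(long, aes(x = Day, y = value, colour = group, group = group)) +
  geom_line()
p
```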
Try:
scale_y_continuous(breaks = seq(1, 7, 1), limits = c(0, 7), labels = c())
You could play around with the labels argument. I am also not sure about your data, but a transformation (e.g. log) may help to separate the lines better!

Combine or merge values

I would like to combine/merge some of my values so that each pair becomes one value in R. For instance, I want to combine 1+3, 5+6, 10+11, and 12+13. Does anyone know how to do that? :-)
tibble::tibble(
  Educational_level = c(1, 3, 5, 6, 10, 11, 12, 13)
)
This is what I have tried, but it does not merge the factor levels the way I would like when I run the linear regression.
ess7no <- ess7no %>%
mutate(edlvdno = as_factor(edlvdno)) %>%
mutate(edlvdno = recode(edlvdno, "1" = "3" , "5" = "6", "10" = "11", "12" = "13"))
library(dplyr)  # loaded first: tibble() and %>% come from here

df <- tibble(
  Educational_level = c(1, 3, 5, 6, 10, 11, 12, 13),
  Labels = c(
    "Not graduated", "Primary school", "High school", "High school",
    "Bachelor", "Bachelor", "Master", "Master"
  )
)

df %>%
  # give the two lowest levels one shared label
  mutate(Labels = ifelse(Labels %in% c(
    "Not graduated",
    "Primary school"
  ), stringr::str_c("Not graduated", "_", "Primary school"), Labels)) %>%
  group_by(Labels) %>%
  summarise(Educational_level = sum(Educational_level))
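If the goal is only to merge factor levels before fitting a regression, forcats::fct_collapse() is another option. The following is a minimal sketch on the example codes from the question; the merged level names ("1_3", "5_6", and so on) are illustrative assumptions.

```r
library(forcats)

# The education codes from the question, stored as a factor
edu <- factor(c(1, 3, 5, 6, 10, 11, 12, 13))

# Each new level on the left absorbs the old levels listed on the right
edu_merged <- fct_collapse(
  edu,
  "1_3"   = c("1", "3"),
  "5_6"   = c("5", "6"),
  "10_11" = c("10", "11"),
  "12_13" = c("12", "13")
)

table(edu_merged)
```

The collapsed factor can then be used in a model formula in place of the original column.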

Create a new column based on matches in another table, but bring in other columns

I have 2 data frames that look something like this (it is a count table). data1 has a column called "Method_Final" that I would like to add into data2. I want to match it based on ONLY columns Method1, Method2, Method3 (the Col1, Col2, Col3, and Count columns don't need to match, but I want them to be brought into the final dataframe). If there is a match on those 3 columns, take the Method_Final from data1 and put it in data2. If there is no match, then make the value "Not Determined". I have an example of what I'm looking for in the data frame below called data_final.
data1 <- data.frame("Col1" = c("ABC", "ABC", "EFG", "XYZ"), "Col2" = c("AA", "AA",
"AA", "BB"),"Col3" = c("Al", "B", "Al", "Al"), "Method1" =
c("Sample", "Dry", "Sample", "Sample"), "Method2" = c("Blank", "Not Blank", "Blank",
"Not Blank"), "Method3" = c("Yes", "Yes", "No", "No"), "Count" = c(1, 4, 6, 2),
"Method_Final" = c("AAR", "ARG", "PCO", "YRG"))
data2 <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2" =
c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Method1" = c("Sample", "Dry", "Sample", "Sample", "Dry", "Bucket"), "Method2" =
c("Blank", "Not Blank", "Blank", "Not Blank", "Not Blank", "Not Blank"), "Method3" =
c("Yes", "Yes", "Yes", "No", "No", "Yes"), "Count" = c(1, 4, 5, 6, 2, 1))
I would like to create a new data frame that looks like this with what I described above:
data_final <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2"
= c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Method1" = c("Sample", "Dry", "Sample", "Sample", "Dry", "Bucket"), "Method2" =
c("Blank", "Not Blank", "Blank", "Not Blank", "Not Blank", "Not Blank"), "Method3" =
c("Yes", "Yes", "Yes", "No", "No", "Yes"), "Count" = c(1, 4, 5, 6, 2, 1),
"Method_Final" = c("AAR", "ARG", "Not Determined", "YRG", "Not Determined", "Not Determined"))
This should be possible by:
1. Left joining data1 onto data2 on just Method1 to Method3 (only these need to match).
2. Removing the extra Col1 to Col3 and Count columns (I give them the suffix _remove and drop them after the join).
3. Replacing NAs with "Not Determined".
Below is an example. Note that I don't get exactly the same result as your data_final, but I believe I have captured the logic you want.
library(dplyr, warn.conflicts = FALSE)

data2 %>%
  left_join(
    data1,
    by = c("Method1", "Method2", "Method3"),
    keep = FALSE,
    suffix = c("", "_remove")
  ) %>%
  select(-contains("_remove")) %>%
  tidyr::replace_na(
    list("Method_Final" = "Not Determined")
  )
#>   Col1 Col2 Col3 Method1   Method2 Method3 Count   Method_Final
#> 1  ABC   AA   Al  Sample     Blank     Yes     1            AAR
#> 2  ABC   AA    B     Dry Not Blank     Yes     4            ARG
#> 3  ABC   CC    C  Sample     Blank     Yes     5            AAR
#> 4  EFG   AA   Al  Sample Not Blank      No     6            YRG
#> 5  XYZ   BB   Al     Dry Not Blank      No     2 Not Determined
#> 6  XYZ   CC    C  Bucket Not Blank     Yes     1 Not Determined
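One thing to watch with this kind of lookup join: if the same Method1/Method2/Method3 combination appears more than once in the lookup table, left_join() returns one output row per match, so the result can have more rows than data2. A minimal sketch of a guard, using small hypothetical tables (`small1`, `small2`) rather than the data above, is to reduce the lookup to distinct key/value rows first:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical lookup table with a duplicated key (rows 1 and 2)
small1 <- data.frame(
  Method1 = c("Sample", "Sample", "Dry"),
  Method2 = c("Blank", "Blank", "Not Blank"),
  Method3 = c("Yes", "Yes", "Yes"),
  Count = c(1, 7, 4),
  Method_Final = c("AAR", "AAR", "ARG"),
  stringsAsFactors = FALSE
)

# Hypothetical target table
small2 <- data.frame(
  Col1 = c("ABC", "ABC", "XYZ"),
  Method1 = c("Sample", "Dry", "Bucket"),
  Method2 = c("Blank", "Not Blank", "Not Blank"),
  Method3 = c("Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

keys <- c("Method1", "Method2", "Method3")

# Keep only the key columns plus Method_Final, then drop duplicate rows
lookup <- small1 %>%
  select(all_of(keys), Method_Final) %>%
  distinct()

result <- small2 %>%
  left_join(lookup, by = keys) %>%
  mutate(Method_Final = coalesce(Method_Final, "Not Determined"))

result
```

If one key maps to several different Method_Final values, distinct() will not collapse them, and the choice of which value to keep has to be made explicitly.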

Create new column based on matches in another table/merging

I have 2 data frames that look something like this (it is a count table). data1 has a column called "Method_Final" that I would like to add into data2. I want to match it based on ONLY columns Method1, Method2, Method3 (the Col1, Col2, Col3, and Count columns don't need to match, but I want them to be brought into the final dataframe). If there is a match on those 3 columns, take the Method_Final from data1 and put it in data2. If there is no match, then make the value "Not Determined".
Note: data1 has rows that are not in data2. I would only like rows that are in data2 to be in my final table. Any rows that are in data1 that are not in data2 should be removed.
I have an example of what I'm looking for in the data frame below called data_final.
data1 <- data.frame("Col1" = c("ABC", "ABC", "EFG", "XYZ", "ZZZ"), "Col2" = c("AA",
"AA","AA", "BB", "AA"), "Col3" = c("Al", "B", "Al", "Al", "B"), "Method1" =
c("Sample", "Dry", "Sample", "Sample", "Dry"), "Method2" = c("Blank", "Not Blank",
"Blank", "Not Blank", "Not Blank"), "Method3" = c("Yes", "Yes", "No", "No", "No"),
"Count" = c(1, 4, 6, 2, 4), "Method_Final" = c("AAR", "ARG", "PCO", "YRG", "ZYX"))
data2 <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2" =
c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Method1" = c("Sample", "Dry", "Sample", "Sample", "Dry", "Bucket"), "Method2" =
c("Blank", "Not Blank", "Blank", "Not Blank", "Not Blank", "Not Blank"), "Method3" =
c("Yes", "Yes", "Yes", "No", "No", "Yes"), "Count" = c(1, 4, 5, 6, 2, 1))
This is the data set I would like to end up with:
data_final <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2"
= c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Method1" = c("Sample", "Dry", "Sample", "Sample", "Dry", "Bucket"), "Method2" =
c("Blank", "Not Blank", "Blank", "Not Blank", "Not Blank", "Not Blank"), "Method3" =
c("Yes", "Yes", "Yes", "No", "No", "Yes"), "Count" = c(1, 4, 5, 6, 2, 1),
"Method_Final" = c("AAR", "ARG", "AAR", "YRG", "Not Determined", "Not Determined"))
I've tried various joins using dplyr (left_join, right_join, etc.) and I can't figure it out.
Thank you so much!
You can do:
library(tidyverse)

data2 %>%
  left_join(data1 %>%
              select(starts_with('Method')),
            by = paste0('Method', 1:3)) %>%
  mutate(Method_Final = if_else(is.na(Method_Final), 'Not Determined', Method_Final))
which gives:
  Col1 Col2 Col3 Method1   Method2 Method3 Count   Method_Final
1  ABC   AA   Al  Sample     Blank     Yes     1            AAR
2  ABC   AA    B     Dry Not Blank     Yes     4            ARG
3  ABC   CC    C  Sample     Blank     Yes     5            AAR
4  EFG   AA   Al  Sample Not Blank      No     6            YRG
5  XYZ   BB   Al     Dry Not Blank      No     2            ZYX
6  XYZ   CC    C  Bucket Not Blank     Yes     1 Not Determined
Note that this differs from your expected output for the fifth row. Can you please check what the Method_Final value should be here? Since data1 has a value for this combination, it shouldn't be 'Not Determined'.
You can try this. Combine the fields to match into one:
dm1 <- paste(data1$Method1, data1$Method2, data1$Method3, sep="|")
dm2 <- paste(data2$Method1, data2$Method2, data2$Method3, sep="|")
Now match the two:
m <- match(dm2, dm1)
# will return NA where not matching
Get the Method_Final from data1 where it matches:
data2$Method_Final <- as.character(data1$Method_Final[m])
Where NA, make it "Not Determined":
data2$Method_Final[is.na(data2$Method_Final)] <- "Not Determined"
The result is the same as @deschen's above.

Create a table of summary statistics (with p.value) with sub-levels (long list)

I need to run an inferential analysis on a list of 21 countries, comparing results (a numeric variable) between genders. I have already created a long-format dataset with the following variables: Gender, Country, Results (numeric).
I am using gtsummary::tbl_strata and gtsummary::tbl_summary. I could not create a nesting that runs each country individually. Also, the output returns n (%) counts for the countries (table in wide format) and calculates the Results variable overall.
I have put the tabular structure I want below.
I could even generate individual tables and stack them. However, I would like a more rational strategy.
Code
library(tidyverse)
library(gtsummary)
# dataframe
df <-
data.frame(
Country = c("Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3"),
Gender = c("M", "M", "M",
"W", "W", "W",
"M", "M", "M",
"W", "W", "W"),
Results = c(53, 67, 48,
56, 58, 72,
78, 63, 67,
54,49,62))
df
# Table
Table <- df %>%
  select(c('Gender',
           'Country',
           'Results')) %>%
  tbl_strata(
    strata = Country,
    .tbl_fun =
      ~.x %>%
        tbl_summary(by = Gender,
                    missing = "no") %>%
        bold_labels() %>%
        italicize_levels() %>%
        italicize_labels())
Table
Here's how you can get that table:
remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.3.7.9004'
library(tidyverse)
df <-
data.frame(
Country = c("Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3",
"Country 1", "Country 2", "Country 3"),
Gender = c("M", "M", "M",
"W", "W", "W",
"M", "M", "M",
"W", "W", "W"),
Results = c(53, 67, 48,
56, 58, 72,
78, 63, 67,
54,49,62))
theme_gtsummary_mean_sd()
tbl <-
  df %>%
  nest(data = -Country) %>%
  rowwise() %>%
  mutate(
    tbl =
      data %>%
      tbl_summary(
        by = Gender,
        type = Results ~ "continuous",
        statistic = Results ~ "{mean} ± {sd}",
        label = list(Results = Country)
      ) %>%
      add_p() %>%
      modify_header(list(
        label ~ "**Country**",
        all_stat_cols() ~ "**{level}**"
      )) %>%
      list()
  ) %>%
  pull(tbl) %>%
  tbl_stack() %>%
  modify_spanning_header(all_stat_cols() ~ "**Gender**")
Created on 2021-03-05 by the reprex package (v1.0.0)
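As a quick cross-check of the stacked table, the per-country means and standard deviations can be computed directly with dplyr. This is only a sketch for verifying the numbers, not part of the gtsummary workflow above; the name `check` is an illustrative choice, and the data frame is the same example data written more compactly with rep().

```r
library(dplyr, warn.conflicts = FALSE)

# Same example data as above, built with rep()
df <- data.frame(
  Country = rep(c("Country 1", "Country 2", "Country 3"), times = 4),
  Gender = rep(c("M", "W", "M", "W"), each = 3),
  Results = c(53, 67, 48,
              56, 58, 72,
              78, 63, 67,
              54, 49, 62),
  stringsAsFactors = FALSE
)

# One row per Country x Gender cell, matching the cells of the stacked table
check <- df %>%
  group_by(Country, Gender) %>%
  summarise(
    n = n(),
    mean = mean(Results),
    sd = sd(Results),
    .groups = "drop"
  )

check
```

With only two observations per cell in this toy data, any p-value is of little practical meaning; the real 21-country data set would have more observations per group.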

Resources