reformat data frame in R - r

I am new to R.
I need to reformat the following data frame:
`Sample Name` `Target Name` `CT values`
<chr> <chr> <dbl>
1 Sample 1 actin 19.69928
2 Sample 1 Ho-1 27.71864
3 Sample 1 Nrf-2 26.00012
9 Sample 9 Ho-1 25.31180
10 Sample 9 Nrf-2 26.41421
11 Sample 9 C3 26.16980
...
15 Sample 1 actin 19.49202
I want the different 'Target Names' as column names and the individual 'Sample Names' as row names. The table should then display the respective CT values.
Note that there are duplicates: for example, the combination of Sample 1 and "actin" appears twice. The final table should show each such combination only once, with the mean of its CT values.
I guess this is a very basic R data frame manipulation, but as I said, I am quite new to R and still working my way through different tutorials.
Thank you very much in advance!

One way of doing that using the tidyverse ecosystem of packages:
library(tidyverse)
tab <- tribble(
~`Sample Name`, ~`Target Name`, ~ `CT values`,
"Sample 1", "actin", 19.69928,
"Sample 1", "Ho-1", 27.71864,
"Sample 1", "Nrf-2", 26.00012,
"Sample 9", "Ho-1", 25.31180,
"Sample 9", "Nrf-2", 26.41421,
"Sample 9", "C3", 26.16980,
"Sample 1", "actin", 19.49202
)
tab %>%
# calculate the mean of your duplicates
group_by(`Sample Name`, `Target Name`) %>%
summarise(`CT values` = mean(`CT values`)) %>%
# reshape the data
spread(`Target Name`, `CT values`)
#> # A tibble: 2 x 5
#> # Groups: Sample Name [2]
#> `Sample Name` actin C3 `Ho-1` `Nrf-2`
#> * <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Sample 1 19.6 NA 27.7 26.0
#> 2 Sample 9 NA 26.2 25.3 26.4
You can also use data.table for a more concise way of doing this, with its
dcast reshape function:
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
#> The following object is masked from 'package:purrr':
#>
#> transpose
setDT(tab)
dcast(tab, `Sample Name` ~ `Target Name`, fun.aggregate = mean)
#> Using 'CT values' as value column. Use 'value.var' to override
#> Sample Name C3 Ho-1 Nrf-2 actin
#> 1: Sample 1 NaN 27.71864 26.00012 19.59565
#> 2: Sample 9 26.1698 25.31180 26.41421 NaN
Created on 2018-01-13 by the reprex package (v0.1.1.9000).
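In current tidyr releases, spread() is superseded by pivot_wider(), which can average the duplicates while reshaping, so the separate group_by()/summarise() step is not needed. A minimal sketch, assuming a recent tidyr version in which values_fn accepts a single function; the object name wide is only illustrative:

```r
library(tidyr)

# Same values as in the question; check.names = FALSE keeps the spaces in the names
tab <- data.frame(
  `Sample Name` = c("Sample 1", "Sample 1", "Sample 1", "Sample 9",
                    "Sample 9", "Sample 9", "Sample 1"),
  `Target Name` = c("actin", "Ho-1", "Nrf-2", "Ho-1", "Nrf-2", "C3", "actin"),
  `CT values`   = c(19.69928, 27.71864, 26.00012, 25.31180,
                    26.41421, 26.16980, 19.49202),
  check.names = FALSE
)

# values_fn = mean collapses duplicated Sample/Target pairs to their mean;
# combinations that never occur are filled with NA
wide <- pivot_wider(
  tab,
  names_from  = `Target Name`,
  values_from = `CT values`,
  values_fn   = mean
)
wide
```

If real row names are required, as the question asks, tibble::column_to_rownames(wide, "Sample Name") should convert the first column into row names.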

Related

Create date of "X" column, when I have age in days at "X" column and birth date column in R

I'm having some trouble finding out how to do a specific thing in R.
In my dataset, I have a column with the date of birth of participants. I also have a column giving me the age in days at which a disease was diagnosed.
What I want to do is to create a new column showing the date of diagnosis. I'm guessing it's a pretty easy thing to do since I have all the information needed, basically it's birth date + X number of days = Date of diagnosis, but I'm unable to figure out how to do it.
All of my searches give me information on the opposite, going from date to age. So if you're able to help me, it would be much appreciated!
library(tidyverse)
library(lubridate)
df <- tibble(
birth = sample(seq("1950-01-01" %>%
as.Date(),
today(), by = "day"), 10, replace = TRUE),
age = sample(3650:15000, 10, replace = TRUE)
)
df %>%
mutate(diagnosis_date = birth %m+% days(age))
#> # A tibble: 10 x 3
#> birth age diagnosis_date
#> <date> <int> <date>
#> 1 1955-01-16 6684 1973-05-05
#> 2 1958-11-03 6322 1976-02-24
#> 3 2007-02-23 4312 2018-12-14
#> 4 2002-07-11 8681 2026-04-17
#> 5 2021-12-28 11892 2054-07-20
#> 6 2017-07-31 3872 2028-03-07
#> 7 1995-06-30 14549 2035-04-30
#> 8 1955-09-02 12633 1990-04-04
#> 9 1958-10-10 4534 1971-03-10
#> 10 1980-12-05 6893 1999-10-20
Created on 2022-06-30 by the reprex package (v2.0.1)
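Since a Date in R is stored as a number of days, the same result can also be obtained in base R by adding the age column directly to the birth date, without lubridate. A minimal sketch with made-up values (the two rows below are hypothetical, not the asker's data):

```r
# Hypothetical example data: birth dates and age at diagnosis in days
df <- data.frame(
  birth = as.Date(c("2000-01-01", "1990-05-15")),
  age   = c(366L, 10L)
)

# Date + number adds that many days and returns a Date
df$diagnosis_date <- df$birth + df$age
df
```

The birth column must already be of class Date for this to work; a character column would first need as.Date().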

Pivot_longer and Pivot wider syntax

I am looking for ideas on how to write the pivot_longer syntax for my data.
I have searched the internet but cannot find any example similar to my data, where each Metric is also separated into 3 different month columns.
My desired final output has seven columns (region, month, and the five metrics).
How do I formulate the pivot_longer and pivot_wider syntax to clean my data so that I can visualize it?
The tricky part isn't pivot_longer. You first have to clean your Excel spreadsheet, i.e. get rid of empty rows and merge the two header rows containing the names of the variables and the dates.
One approach to achieve your desired result may look like so:
library(readxl)
library(tidyr)
library(janitor)
library(dplyr)
x <- read_excel("data/Employment.xlsx", skip = 3, col_names = FALSE) %>%
# Get rid of empty rows and cols
janitor::remove_empty()
# Make column names
col_names <- data.frame(t(x[1:2,])) %>%
fill(1) %>%
unite(name, 1:2, na.rm = TRUE) %>%
pull(name)
x <- x[-c(1:2),]
names(x) <- col_names
# Convert to long and values to numerics
x %>%
pivot_longer(-Region, names_to = c(".value", "months"), names_sep = "_") %>%
separate(months, into = c("month", "year")) %>%
mutate(across(!c(Region, month, year), as.numeric))
#> # A tibble: 6 × 8
#> Region month year `Total Population … `Labor Force Part… `Employment Rat…
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Philippin… April 2020f 73722. 55.7 82.4
#> 2 Philippin… Janu… 2021p 74733. 60.5 91.3
#> 3 Philippin… April 2021p 74971. 63.2 91.3
#> 4 National … April 2020f 9944. 54.2 87.7
#> 5 National … Janu… 2021p 10051. 57.2 91.2
#> 6 National … April 2021p 10084. 60.1 85.6
#> # … with 2 more variables: Unemployment Rate <dbl>, Underemployment Rate <dbl>
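The key step above is the special ".value" entry in names_to: the part of each column name before the separator becomes an output column, and the part after it becomes a value of the months column. A minimal sketch on a hypothetical miniature of the cleaned sheet (region names, metric values, and the object names toy and long are made up for illustration):

```r
library(tidyr)

# Hypothetical miniature: one column per metric/month pair, joined by "_"
toy <- data.frame(
  Region = c("North", "South"),
  `Employment Rate_April 2020`     = c(82.4, 87.7),
  `Employment Rate_January 2021`   = c(91.3, 91.2),
  `Unemployment Rate_April 2020`   = c(17.6, 12.3),
  `Unemployment Rate_January 2021` = c(8.7, 8.8),
  check.names = FALSE
)

# ".value" keeps the metric name as a column; "months" collects the date part
long <- pivot_longer(
  toy,
  -Region,
  names_to  = c(".value", "months"),
  names_sep = "_"
)

# Split "April 2020" into separate month and year columns
long <- separate(long, months, into = c("month", "year"), sep = " ")
long
```

Each region ends up with one row per month and one column per metric, which is the shape most plotting functions expect.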

Generate codes based on Nominal Variables present in a dataframe

I have a data frame that has 1000 observations and it has this structure below.
Town <- c("TownA", "TownB", "TownC","TownD","Town A", "Town Z")
Ward <- c("Ward B","Ward Z","Ward A","Ward W","Ward X", "Ward ")
DF <- data.frame(Town, Ward)
I have another dataset that contains codes that represent the nominal observations of Town and Ward. The codes are the ones to be used for analysis. For example, Town A has the code 23, Town B has the code 15, Town Z has the code 7. Instead of manually creating a new column and populating the codes based on towns, is there a simpler way to do this in R?
My goal is to mutate a new column that will match the codes with the towns. The dataset has around 200 Towns.
You can create a new code table and then join it to your data:
library(tidyverse)
Town <- c("TownA", "TownB", "TownC","TownD","Town A", "Town Z")
Ward <- c("Ward B","Ward Z","Ward A","Ward W","Ward X", "Ward ")
DF <- data.frame(Town, Ward)
codes <- tribble(
~Town, ~Code,
"TownA", 23,
"TownB", 15,
"Town Z", 7
)
codes
#> # A tibble: 3 × 2
#> Town Code
#> <chr> <dbl>
#> 1 TownA 23
#> 2 TownB 15
#> 3 Town Z 7
DF %>%
left_join(codes)
#> Joining, by = "Town"
#> Town Ward Code
#> 1 TownA Ward B 23
#> 2 TownB Ward Z 15
#> 3 TownC Ward A NA
#> 4 TownD Ward W NA
#> 5 Town A Ward X NA
#> 6 Town Z Ward 7
Created on 2021-09-20 by the reprex package (v2.0.1)
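In the output above, "Town A" gets NA because the join only matches identical strings and the code table spells it "TownA". If the two spellings are meant to be the same town (an assumption; the question does not say so), the keys can be normalised before the lookup. A minimal base R sketch; normalise is a hypothetical helper introduced here:

```r
DF <- data.frame(
  Town = c("TownA", "TownB", "TownC", "TownD", "Town A", "Town Z"),
  Ward = c("Ward B", "Ward Z", "Ward A", "Ward W", "Ward X", "Ward ")
)
codes <- data.frame(
  Town = c("TownA", "TownB", "Town Z"),
  Code = c(23, 15, 7)
)

# Hypothetical helper: drop all whitespace so "Town A" and "TownA" compare equal
normalise <- function(x) gsub("\\s+", "", x)

# match() gives the row of each town in the code table (NA if it is absent)
DF$Code <- codes$Code[match(normalise(DF$Town), normalise(codes$Town))]
DF
```

Towns that have no entry in the code table (TownC and TownD here) still receive NA, which makes missing codes easy to spot.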

Finding the mean of two columns with two different classes/labels

Right now I'm trying to create a data frame that contains the mean of two columns for two separate labels/categories.
However, I don't know how to calculate the mean for two columns; my code just returns the same mean for both winner and opponent/loser.
Currently, I'm using the tidyverse library.
Here is the original data frame:
winner_hand winner_ht winner_ioc winner_age opponent_hand opponent_ht opponent_ioc opponent_age result name
<chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <fct> <chr>
R 178 JPN 29.00479 R NA RUS 22.88569 winner Kei Nishikori
R NA RUS 22.88569 R 188 FRA 33.70568 winner Daniil Medvedev
R 178 JPN 29.00479 R 188 FRA 31.88227 winner Kei Nishikori
R 188 FRA 33.70568 R NA AUS 19.86858 winner Jo Wilfried Tsonga
R NA RUS 22.88569 R 196 CAN 28.01095 winner Daniil Medvedev
R 188 FRA 31.88227 R NA JPN 26.40383 winner Jeremy Chardy
My code:
age_summary <- game_data %>%
group_by(result) %>%
summarize(mean_age = mean(winner_age))
age_summary
Resulting Data frame:
result mean_age
<fct> <dbl>
winner 27.68495
loser 27.68495
If you want summaries from two columns, you need expressions for each column in the call to summarize().
Example with fake data, since your excerpt only has one value for the 'result' column:
library(tidyverse)
dat <- read_csv(
"result, winner_age, opponent_age
A, 5, 10
A, 6, 11
B, 12, 2
B, 13, 1")
dat %>%
group_by(result) %>%
# note: two expressions here:
summarise(mean_winner_age = mean(winner_age),
mean_opponent_age = mean(opponent_age))
output:
# A tibble: 2 x 3
result mean_winner_age mean_opponent_age
<chr> <dbl> <dbl>
1 A 5.5 10.5
2 B 12.5 1.5
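When the same summary is needed for several columns, across() avoids writing one expression per column. A minimal sketch, assuming dplyr 1.0.0 or later and reusing the fake data from above; na.rm = TRUE is an addition here, since the original data contain missing values in some columns:

```r
library(dplyr)

# Same fake data as above, built directly as a data frame
dat <- data.frame(
  result       = c("A", "A", "B", "B"),
  winner_age   = c(5, 6, 12, 13),
  opponent_age = c(10, 11, 2, 1)
)

age_summary <- dat %>%
  group_by(result) %>%
  # apply mean() to both age columns; .names builds mean_winner_age etc.
  summarise(across(c(winner_age, opponent_age),
                   ~ mean(.x, na.rm = TRUE),
                   .names = "mean_{.col}"))
age_summary
```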

Tableau LOD R Equivalent

I'm using a Tableau Fixed LOD function in a report, and was looking for ways to mimic this functionality in R.
Data set looks like:
Soldto<-c("123456","122456","123456","122456","124560","125560")
Shipto<-c("123456","122555","122456","124560","122560","122456")
IssueDate<-as.Date(c("2017-01-01","2017-01-02","2017-01-01","2017-01-02","2017-01-01","2017-01-01"))
Method<-c("Ground","Ground","Ground","Air","Ground","Ground")
Delivery<-c("000123","000456","000123","000345","000456","000555")
df1<-data.frame(Soldto,Shipto,IssueDate,Method,Delivery)
What I'm looking to do is "For each Sold-to/Ship-to/Method count the number of unique delivery IDs".
The intent is to find the number of unique deliveries that could potentially be "aggregated."
In Tableau that function looks like:
{FIXED [Soldto],[Shipto],[IssueDate],[Method]: COUNTD([Delivery])}
Could this be done with aggregate or summarize, as in the example below?
df.new<-ddply(df1,c("Soldto","Shipto","Method"),summarise,
Deliveries = n_distinct(Delivery))
This is fairly easy with dplyr. You are looking for the number of unique deliveries for each combination of soldto, shipto and method, which is just group_by and then summarise:
library(tidyverse)
tbl <- tibble(
soldto = c("123456","122456","123456","122456","124560","125560"),
shipto = c("123456","122555","122456","124560","122560","122456"),
issuedate = as.Date(c("2017-01-01","2017-01-02","2017-01-01","2017-01-02","2017-01-01","2017-01-01")),
method = c("Ground","Ground","Ground","Air","Ground","Ground"),
delivery = c("000123","000456","000123","000345","000456","000555")
)
tbl %>%
group_by(soldto, shipto, method) %>%
summarise(uniques = n_distinct(delivery))
#> # A tibble: 6 x 4
#> # Groups: soldto, shipto [?]
#> soldto shipto method uniques
#> <chr> <chr> <chr> <int>
#> 1 122456 122555 Ground 1
#> 2 122456 124560 Air 1
#> 3 123456 122456 Ground 1
#> 4 123456 123456 Ground 1
#> 5 124560 122560 Ground 1
#> 6 125560 122456 Ground 1
Created on 2018-03-02 by the reprex package (v0.2.0).
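A FIXED LOD expression in Tableau makes the aggregated value available on every row rather than collapsing the data, so a closer equivalent may be group_by() followed by mutate() instead of summarise(). A minimal sketch on the question's data; the seventh row is a hypothetical addition so that one group contains two distinct deliveries, and the name lod is illustrative:

```r
library(dplyr)

# Question's data plus one hypothetical extra row (the last element of each vector)
df1 <- data.frame(
  Soldto   = c("123456", "122456", "123456", "122456", "124560", "125560", "123456"),
  Shipto   = c("123456", "122555", "122456", "124560", "122560", "122456", "123456"),
  Method   = c("Ground", "Ground", "Ground", "Air", "Ground", "Ground", "Ground"),
  Delivery = c("000123", "000456", "000123", "000345", "000456", "000555", "000999")
)

lod <- df1 %>%
  group_by(Soldto, Shipto, Method) %>%
  # mutate() keeps every row and repeats the group-level count on each one
  mutate(Deliveries = n_distinct(Delivery)) %>%
  ungroup()
lod
```

Rows 1 and 7 share the same Soldto/Shipto/Method combination and therefore both show a count of 2, while the row count of the data is unchanged.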
