Wide to long without having an X in front of variables


I have my data in wide format.
What is the easiest way to change it to long format without getting an X in front of the time variables?
Sample data:
structure(list(X1 = c("01/12/2019", "02/12/2019"), `00:30` = c(41.95,
39.689), `01:00` = c(44.96, 40.47), `01:30` = c(42.939, 38.95
), `02:00` = c(43.221, 40.46), `02:30` = c(44.439, 41.97)), class = "data.frame", row.names = c(NA,
-2L), spec = structure(list(cols = list(X1 = structure(list(), class = c("collector_character",
"collector")), `00:30` = structure(list(), class = c("collector_double",
"collector")), `01:00` = structure(list(), class = c("collector_double",
"collector")), `01:30` = structure(list(), class = c("collector_double",
"collector")), `02:00` = structure(list(), class = c("collector_double",
"collector")), `02:30` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))

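With pivot_longer() and pivot_wider() from tidyr (assuming the sample data is stored in dat):
library(tidyr)
dat |>
  pivot_longer(cols = `00:30`:`02:30`, names_to = "time", values_to = "val") |>
  # the dates in X1 become the new column names
  pivot_wider(names_from = "X1", values_from = "val")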
Output:
# A tibble: 5 x 3
time `01/12/2019` `02/12/2019`
<chr> <dbl> <dbl>
1 00:30 42.0 39.7
2 01:00 45.0 40.5
3 01:30 42.9 39.0
4 02:00 43.2 40.5
5 02:30 44.4 42.0

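In this special case, you could transpose the numeric part of your data.frame and then assign the column names. Setting the names afterwards sidesteps data.frame()'s name checking, which would otherwise turn the dates into names like X01.12.2019:
df_new <- data.frame(t(df[, -1]))  # transpose the numeric columns; the times become row names
colnames(df_new) <- df[, 1]        # use the dates from the first column as column names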
This returns a data.frame df_new:
01/12/2019 02/12/2019
00:30 41.950 39.689
01:00 44.960 40.470
01:30 42.939 38.950
02:00 43.221 40.460
02:30 44.439 41.970
Edit (Thanks to jay.sf)
For versions of R >= 4.1, you could use the native pipe:
t(df[, -1]) |>
data.frame() |>
`colnames<-`(df[, 1])
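Where does the X come from? The question does not show how the file was read, so this is an assumption, but a common cause is base R's name checking. read.csv() and data.frame() pass column names through make.names() by default (check.names = TRUE), and that turns "00:30" into "X00.30". The minimal base-R sketch below uses an inline CSV string as a stand-in for the asker's file. It shows the effect, and then reshapes the data to long format while keeping the time labels intact.

```r
# Stand-in for the asker's CSV file (assumption: the data were read with read.csv)
csv_text <- "X1,00:30,01:00,01:30
01/12/2019,41.95,44.96,42.939
02/12/2019,39.689,40.47,38.95"

# Default check.names = TRUE runs make.names(): "00:30" becomes "X00.30"
names(read.csv(text = csv_text))

# check.names = FALSE keeps the time labels as they are
dat <- read.csv(text = csv_text, check.names = FALSE)
names(dat)

# Base-R wide-to-long reshape: one row per date/time pair
long <- data.frame(
  X1   = rep(dat$X1, times = ncol(dat) - 1),
  time = rep(names(dat)[-1], each = nrow(dat)),
  val  = unlist(dat[-1], use.names = FALSE)  # column-major: all dates for 00:30, then 01:00, ...
)
long
```

The same check.names = FALSE argument also exists on data.frame(), so it can be used when building a data frame from a transposed matrix.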

Related

Structure of a for loop

I am learning how to create a function in R, but I am struggling to understand how to write a for loop. My understanding is that the structure is:
for (item in list_items) {
  do_something(item)
}
I would like to write a for loop that replaces every cell equal to 123 with 333. So the item is 123, and the list of items is the data frame columns sec1 through sec4.
Could somebody explain this to me, please? And how could this be included in a function?
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))

We do not need a for loop here:
df[df == 123] <- 333
If we really need a for loop:
for (i in seq_along(df)) {
  # df[[i]] is the i-th column as a plain vector
  df[[i]][df[[i]] == 123] <- 333
}
Output:
df
# A tibble: 3 x 4
sec1 sec2 sec3 sec4
<dbl> <dbl> <dbl> <dbl>
1 1 333 333 1
2 333 1 0 333
3 1 1 0 1
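The question also asks how this can be put inside a function. Below is a minimal sketch. The name replace_value() and its arguments old and new are hypothetical. It loops over the columns as in the answer above and returns the modified copy, because an R function does not change the caller's data frame in place.

```r
# Hypothetical helper: replace every cell equal to `old` with `new`
replace_value <- function(data, old, new) {
  for (col in names(data)) {            # iterate over column names
    hits <- data[[col]] == old          # logical vector for this column
    hits[is.na(hits)] <- FALSE          # leave NA cells untouched
    data[[col]][hits] <- new
  }
  data                                  # return the modified copy
}

df <- data.frame(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1),
                 sec3 = c(123, 0, 0), sec4 = c(1, 123, 1))
df <- replace_value(df, old = 123, new = 333)  # reassign: the input is not changed in place
df
```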

Here's how it would work for one column of your data:
dat <- structure(list(sec1 = c(1, 123, 1),
sec2 = c(123, 1, 1),
sec3 = c(123, 0, 0),
sec4 = c(1, 123, 1)),
spec = structure(list(cols = list(
sec1 = structure(list(),
class = c("collector_double", "collector")),
sec2 = structure(list(),
class = c("collector_double","collector")),
sec3 = structure(list(),
class = c("collector_double", "collector")),
sec4 = structure(list(),
class = c("collector_double","collector"))),
default = structure(list(),
class = c("collector_guess","collector")),
delim = ","), class = "col_spec"),
row.names = c(NA,-3L), class =
c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
for(i in 1:nrow(dat)){
dat$sec1[i] <- ifelse(dat$sec1[i] == 123, 333, dat$sec1[i])
}
dat
#> sec1 sec2 sec3 sec4
#> 1 1 123 123 1
#> 2 333 1 0 123
#> 3 1 1 0 1
Created on 2022-01-31 by the reprex package (v2.0.1)
To replace all of them with for loops, you could use a double loop over columns and rows:
for(j in names(dat)){
for(i in 1:nrow(dat)){
dat[[j]][i] <- ifelse(dat[[j]][i] == 123, 333, dat[[j]][i])
}
}
Of course, as others have identified, you certainly don't need a for loop to accomplish this.

In addition to DaveArmstrong's answer, this would work for all rows and columns:
dat <- structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
for (i in 1:nrow(dat)) {
  for (j in 1:ncol(dat)) {
    # dat[[j]][i] extracts a single value, also when dat is a tibble
    dat[[j]][i] <- ifelse(dat[[j]][i] == 123, 333, dat[[j]][i])
  }
}

How to join combining table values without unique values added to the bottom in R code? Full_join is adding new values to the bottom

I need a chart of accounts to stay in order when new accounts are added or dropped in future years. In accounting, the accounts are sorted by type (for example Asset, Liability, Equity), but the type is not explicit in the dataset. This is an example of the code that puts new "Accounts" from Year2 and Year3 at the bottom:
library(dplyr)
XYZCompany_Consolidated <- XYZCompany_Year1 %>%
  full_join(XYZCompany_Year2, by = "Account") %>%
  full_join(XYZCompany_Year3, by = "Account")
Example: the picture (not included here) gives a simplified example. The orange highlight shows where the new accounts end up, the code I'm using is on the right, and the green highlight shows what I'm trying to achieve.

Perhaps I'm overthinking this problem, but I find it hard to solve. Let's define some data first:
df_year1 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"JaneDoe"), Year_1 = c(100, 1000, 20, 300, 500)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_1 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df_year2 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"Laptop", "JaneDoe"), Year_2 = c(80, 1200, 50, 300, 500, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df_year3 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"Rent", "JaneDoe"), Year_3 = c(80, 1200, 50, 300, 1000, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_3 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
These are similar to the data shown in the OP's picture; for example, df_year1 looks like this:
# A tibble: 5 x 2
Account Year_1
<chr> <dbl>
1 Cash 100
2 Accounts 1000
3 Loan1 20
4 Auto 300
5 JaneDoe 500
Next, we transform the data a little:
library(dplyr)
library(tidyr)
df_y1 <- df_year1 %>%
mutate(Year = 1,
no = row_number()) %>%
rename(value = Year_1)
which returns
# A tibble: 5 x 4
Account value Year no
<chr> <dbl> <dbl> <int>
1 Cash 100 1 1
2 Accounts 1000 1 2
3 Loan1 20 1 3
4 Auto 300 1 4
5 JaneDoe 500 1 5
The new column no stores the account's position within that year's chart, and column Year stores the chart's year. All three data.frames are processed like this, so we get df_y1, df_y2 and df_y3.
Finally, we bind them together, calculate a rank for each account and order the accounts by this rank:
bind_rows(df_y1, df_y2, df_y3) %>%
  mutate(num_years = max(Year)) %>%
  group_by(Account) %>%
  # n() = number of years in which the account appears
  mutate(rank = sum((num_years - n() + 1) * no), .keep = "unused") %>%
  # names_prefix turns the Year values 1, 2, 3 into the column names Year_1, Year_2, Year_3
  pivot_wider(names_from = Year, names_prefix = "Year_") %>%
  arrange(rank) %>%
  select(-rank) %>%
  ungroup()
As a result, we get
# A tibble: 7 x 4
Account Year_1 Year_2 Year_3
<chr> <dbl> <dbl> <dbl>
1 Cash 100 80 80
2 Accounts 1000 1200 1200
3 Loan1 20 50 50
4 Auto 300 300 300
5 Laptop NA 500 NA
6 Rent NA NA 1000
7 JaneDoe 500 0 0
Note
I believe there are better approaches, but at least this works for the example data.
I'm not sure the calculated rank stays stable for other data. Take care.
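For comparison, the same rank idea can be sketched in base R without dplyr or tidyr. This is a re-implementation for illustration, not the answerer's code. The tables are rebuilt from the dput() output above as plain data.frames, and merge() does the full outer join. Ties in the rank (Laptop and Rent both score 15 here) are broken by the order in which the accounts first appear.

```r
df_year1 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "JaneDoe"),
                       Year_1  = c(100, 1000, 20, 300, 500))
df_year2 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "Laptop", "JaneDoe"),
                       Year_2  = c(80, 1200, 50, 300, 500, 0))
df_year3 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "Rent", "JaneDoe"),
                       Year_3  = c(80, 1200, 50, 300, 1000, 0))
years <- list(df_year1, df_year2, df_year3)

# Long table: each account with its position ("no") in each year's chart
long <- do.call(rbind, lapply(years, function(d)
  data.frame(Account = d$Account, no = seq_len(nrow(d)))))

# Same rank as above: sum over years of (num_years - n + 1) * no, per account
num_years <- length(years)
n_years   <- ave(long$no, long$Account, FUN = length)  # years each account appears in
long$w    <- (num_years - n_years + 1) * long$no
acc_rank  <- tapply(long$w, long$Account, sum)

# Full outer join of all years, then order by rank (ties: first appearance)
merged <- Reduce(function(a, b) merge(a, b, by = "Account", all = TRUE), years)
first_seen <- match(merged$Account, unique(long$Account))
merged <- merged[order(acc_rank[merged$Account], first_seen), ]
rownames(merged) <- NULL
merged
```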

Categorizing dataframe based on information in another dataframe

I'm trying to categorize one dataframe based on information in another dataframe. In df1 I have information on the measurement type at a given time (e.g. whether a jar contained wet or dry soil, and whether the treatment was "None" or "ul5"). In df2 I have the measured value X at a given time. I need to know the measurement type for every measured value of X.
I have tried full_join() and fill(), but neither gave me the desired outcome. Any ideas?
Here's df1:
df1 <- structure(list(Jar = c("Soil_dry", "Soil_dry", "soil_wet", "soil_wet",
"Soil_dry", "Soil_dry", "soil_wet"), Treatment = c("None", "None",
"None", "None", "ul5", "ul5", "ul5"), Timestamp = structure(c(1608129063,
1608129122, 1608129126, 1608129136, 1608129189, 1608129242, 1608129252
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(Jar = structure(list(), class = c("collector_character",
"collector")), Treatment = structure(list(), class = c("collector_character",
"collector")), Timestamp = structure(list(format = ""), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
df2:
df2 <- structure(list(X = c(5, 3, 34, 4, 65, 9, 7), Timestamp = structure(c(1608129064,
1608129122, 1608129125, 1608129133, 1608129188, 1608129240, 1608129243
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(X = structure(list(), class = c("collector_double",
"collector")), Timestamp = structure(list(format = ""), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Desired data:
desired_data <- structure(list(X = c(5, 3, 34, 4, 65, 9, 7), Timestamp = structure(c(1608129064,
1608129122, 1608129125, 1608129133, 1608129188, 1608129240, 1608129243
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Jar = c("Soil_dry",
"Soil_dry", "Soil_dry", "soil_wet", "soil_wet", "Soil_dry", "Soil_dry"
), Treatment = c("None", "None", "None", "None", "None", "ul5",
"ul5")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), spec = structure(list(cols = list(
X = structure(list(), class = c("collector_double", "collector"
)), Timestamp = structure(list(format = ""), class = c("collector_datetime",
"collector")), Jar = structure(list(), class = c("collector_character",
"collector")), Treatment = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

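Try data.table's rolling join. roll = "nearest" matches each df2 row to the df1 row with the closest Timestamp in either direction (so X = 34 gets soil_wet). roll = TRUE instead carries forward the most recent df1 row at or before each df2 Timestamp, which reproduces the desired data:
library(data.table)
setDT(df1)
setDT(df2)
df1[df2, roll = "nearest", on = "Timestamp"]  # closest Timestamp, before or after
df1[df2, roll = TRUE, on = "Timestamp"]       # last df1 row at or before each df2 Timestamp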
If we want to make sure that the selected df1 row always has a Timestamp strictly earlier than the Timestamp from df2:
library(dplyr)
tidyr::crossing(df1 %>% rename(Timestamp1 = Timestamp),
                df2 %>% rename(Timestamp2 = Timestamp)) %>%
  mutate(diff = as.numeric(Timestamp2 - Timestamp1)) %>%
  filter(diff > 0) %>%
  # sort by df2 time, then by distance, so slice(1L) picks the closest earlier df1 row
  arrange(Timestamp2, diff) %>%
  group_by(Timestamp2) %>%
  slice(1L) %>%
  ungroup() %>%
  arrange(Timestamp2) %>%
  select(-diff)
# Jar Treatment Timestamp1 X Timestamp2
# <chr> <chr> <dttm> <dbl> <dttm>
#1 Soil_dry None 2020-12-16 14:31:03 5 2020-12-16 14:31:04
#2 Soil_dry None 2020-12-16 14:31:03 3 2020-12-16 14:32:02
#3 Soil_dry None 2020-12-16 14:32:02 34 2020-12-16 14:32:05
#4 soil_wet None 2020-12-16 14:32:06 4 2020-12-16 14:32:13
#5 soil_wet None 2020-12-16 14:32:16 65 2020-12-16 14:33:08
#6 Soil_dry ul5 2020-12-16 14:33:09 9 2020-12-16 14:34:00
#7 Soil_dry ul5 2020-12-16 14:34:02 7 2020-12-16 14:34:03
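The desired data takes, for each df2 row, the most recent df1 row at or before its Timestamp. That lookup can also be sketched in base R with findInterval(), without data.table or tidyr. The sketch assumes df1 is sorted by Timestamp, which findInterval() requires and the sample satisfies. The two tables are rebuilt from the dput() output above as plain data.frames.

```r
df1 <- data.frame(
  Jar = c("Soil_dry", "Soil_dry", "soil_wet", "soil_wet", "Soil_dry", "Soil_dry", "soil_wet"),
  Treatment = c("None", "None", "None", "None", "ul5", "ul5", "ul5"),
  Timestamp = as.POSIXct(c(1608129063, 1608129122, 1608129126, 1608129136,
                           1608129189, 1608129242, 1608129252),
                         origin = "1970-01-01", tz = "UTC"))
df2 <- data.frame(
  X = c(5, 3, 34, 4, 65, 9, 7),
  Timestamp = as.POSIXct(c(1608129064, 1608129122, 1608129125, 1608129133,
                           1608129188, 1608129240, 1608129243),
                         origin = "1970-01-01", tz = "UTC"))

# For each df2 time: index of the last df1 row with Timestamp <= that time
# (findInterval() needs df1$Timestamp sorted ascending)
idx <- findInterval(as.numeric(df2$Timestamp), as.numeric(df1$Timestamp))
# idx == 0 would mean "no earlier df1 row"; that does not happen in this sample
result <- cbind(df2, df1[idx, c("Jar", "Treatment")])
rownames(result) <- NULL
result
```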

How to fetch values based on date range in R

I have two data frames, df1 and df2.
df1 has two record dates (a base date and a follow-up date).
(Scenario 1) First, I need to match Record_Date_1 exactly to Drug_Date. If it matches, the drug name for that date should be filled in (e.g. PID = 345).
(Scenario 2) If the date does not match, I have to fetch the earliest Drug_Date of that PID within a date range, i.e. min(Drug_Date) between Record_Date_1 - 7 days and Record_Date_1 + 45 days.
The sample data and the expected output are given below.
df1:
PID  Record_Date_1  D1   Record_Date_2  D2
123  22-04-1996     5.3  30-10-1996     5.4
234  16-06-1994     6.8  13-12-1994     7.2
345  18-09-2000     7.5  24-02-2001     8.9
456  20-02-2001     8.5  20-08-2001     9.4
df2:
PID  Drug_Date   Drugs
123  23-04-1996  Biguanides
123  28-04-1996  Sulphynureas
123  31-10-1996  SGLT2
234  15-06-1994  Insulin
234  14-12-1994  Biguanides
345  18-09-2000  DPP4-inhibitor
345  24-02-2001  Incretin
456  21-02-2001  Biguanides
456  26-08-2001  Sulphynureas
Expected output:
PID  Record_Date_1  D1   Record_Date_2  D2   Drug_Date1  D1_Drugs        Drug_Date2  D2_Drugs
123  22-04-1996     5.3  30-10-1996     5.4  23-04-1996  Biguanides      31-10-1996  sulphynureas
234  16-06-1994     6.8  13-12-1994     7.2  15-06-1994  Insulin         14-12-1994  Biguanides
345  18-09-2000     7.5  24-02-2001     8.9  18-09-2000  DPP4-inhibitor  24-02-2001  Incretin
456  20-02-2001     8.5  20-08-2001     9.4  21-02-2001  Biguanides      26-08-2001  sulphynureas
If you need any clarification please let me know.
Thanks in advance!

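Consider a function like this:
my_match <- function(x, y) {
  # f(): index of the earliest date in j within [i - 7, i + 45] days
  f <- function(i, j) {
    pos <- which(j >= i - 7L & j <= i + 45L)
    pos[[which.min(j[pos])]]  # errors if no date falls in the window
  }
  x <- as.Date(x, "%d-%m-%Y")
  y <- as.Date(y, "%d-%m-%Y")
  out <- match(x, y)  # scenario 1: exact date match
  # scenario 2: fall back to the window search where there is no exact match
  ifelse(is.na(out), vapply(x, f, integer(1L), y), out)
}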
Then, you can just
df1$Drug_Date1 <- df2$Drug_Date[my_match(df1$Record_Date_1, df2$Drug_Date)]
df1$D1_Drug <- df2$Drugs[my_match(df1$Record_Date_1, df2$Drug_Date)]
df1$Drug_Date2 <- df2$Drug_Date[my_match(df1$Record_Date_2, df2$Drug_Date)]
df1$D2_Drug <- df2$Drugs[my_match(df1$Record_Date_2, df2$Drug_Date)]
Output
> as.data.frame(df1)
PID Record_Date_1 D1 Record_Date_2 D2 Drug_Date1 D1_Drug Drug_Date2 D2_Drug
1 123 22-04-1996 5.3 30-10-1996 5.4 23-04-1996 Biguanides 31-10-1996 SGLT2
2 234 16-06-1994 6.8 13-12-1994 7.2 15-06-1994 Insulin 14-12-1994 Biguanides
3 345 18-09-2000 7.5 24-02-2001 8.9 18-09-2000 DPP4-inhibitor 24-02-2001 Incretin
4 456 20-02-2001 8.5 20-08-2001 9.4 21-02-2001 Biguanides 26-08-2001 Sulphynureas
Data (df1)
structure(list(PID = c(123, 234, 345, 456), Record_Date_1 = c("22-04-1996",
"16-06-1994", "18-09-2000", "20-02-2001"), D1 = c(5.3, 6.8, 7.5,
8.5), Record_Date_2 = c("30-10-1996", "13-12-1994", "24-02-2001",
"20-08-2001"), D2 = c(5.4, 7.2, 8.9, 9.4), Drug_Date1 = c("23-04-1996",
"15-06-1994", "18-09-2000", "21-02-2001"), D1_Drug = c("Biguanides",
"Insulin", "DPP4-inhibitor", "Biguanides"), Drug_Date2 = c("31-10-1996",
"14-12-1994", "24-02-2001", "26-08-2001"), D2_Drug = c("SGLT2",
"Biguanides", "Incretin", "Sulphynureas")), row.names = c(NA,
-4L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), spec = structure(list(
cols = list(PID = structure(list(), class = c("collector_double",
"collector")), Record_Date_1 = structure(list(), class = c("collector_character",
"collector")), D1 = structure(list(), class = c("collector_double",
"collector")), Record_Date_2 = structure(list(), class = c("collector_character",
"collector")), D2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 2), class = "col_spec"))
Data (df2)
structure(list(PID = c(123, 123, 123, 234, 234, 345, 345, 456,
456), Drug_Date = c("23-04-1996", "28-04-1996", "31-10-1996",
"15-06-1994", "14-12-1994", "18-09-2000", "24-02-2001", "21-02-2001",
"26-08-2001"), Drugs = c("Biguanides", "Sulphynureas", "SGLT2",
"Insulin", "Biguanides", "DPP4-inhibitor", "Incretin", "Biguanides",
"Sulphynureas")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), spec = structure(list(cols = list(
PID = structure(list(), class = c("collector_double", "collector"
)), Drug_Date = structure(list(), class = c("collector_character",
"collector")), Drugs = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 2), class = "col_spec"))

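Note that my_match() searches all drug dates, not only those of the same PID. With this sample the matched dates happen to belong to the right PID, so the result is unaffected. A PID-aware variant could be sketched in base R as follows. match_drug() and its lower/upper arguments are hypothetical names, and only the columns needed are rebuilt from the data above.

```r
# Hypothetical PID-aware helper: for each df1 row, the index of the matching drugs row
match_drug <- function(pid, rec_date, drugs, lower = 7, upper = 45) {
  rec  <- as.Date(rec_date, "%d-%m-%Y")
  drug <- as.Date(drugs$Drug_Date, "%d-%m-%Y")
  vapply(seq_along(pid), function(k) {
    same_pid <- which(drugs$PID == pid[k])             # rows of this patient only
    exact <- same_pid[drug[same_pid] == rec[k]]
    if (length(exact) > 0) return(exact[1])             # scenario 1: exact date
    in_win <- same_pid[drug[same_pid] >= rec[k] - lower &
                         drug[same_pid] <= rec[k] + upper]
    if (length(in_win) == 0) return(NA_integer_)        # nothing in the window
    in_win[which.min(drug[in_win])]                     # scenario 2: earliest date
  }, integer(1))
}

df1 <- data.frame(
  PID = c(123, 234, 345, 456),
  Record_Date_1 = c("22-04-1996", "16-06-1994", "18-09-2000", "20-02-2001"),
  Record_Date_2 = c("30-10-1996", "13-12-1994", "24-02-2001", "20-08-2001"))
df2 <- data.frame(
  PID = c(123, 123, 123, 234, 234, 345, 345, 456, 456),
  Drug_Date = c("23-04-1996", "28-04-1996", "31-10-1996", "15-06-1994", "14-12-1994",
                "18-09-2000", "24-02-2001", "21-02-2001", "26-08-2001"),
  Drugs = c("Biguanides", "Sulphynureas", "SGLT2", "Insulin", "Biguanides",
            "DPP4-inhibitor", "Incretin", "Biguanides", "Sulphynureas"))

i1 <- match_drug(df1$PID, df1$Record_Date_1, df2)
i2 <- match_drug(df1$PID, df1$Record_Date_2, df2)
df1$Drug_Date1 <- df2$Drug_Date[i1]
df1$D1_Drug    <- df2$Drugs[i1]
df1$Drug_Date2 <- df2$Drug_Date[i2]
df1$D2_Drug    <- df2$Drugs[i2]
df1
```

Unlike my_match(), this returns NA instead of an error when no drug date of that PID falls in the window.
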
How to use the first value of the "next" group in r?

I am trying to get the first value of the next group in R to estimate a ratio. I have created a group based on the type column in my df. Then I estimated some influence factors using the sample position within the group. Finally, I am trying to estimate a ratio like this: RRF=response/(F1*first(response)+(F2*??????)), where F1*first(response) is the cal in the current group, but I don't know how to refer to the first value of the next group to finish the ratio. Can someone help with this? This is my code and my data:
library(dplyr)
conc_zero_test <- zero_test %>%
gather(gas, response, -time,-type)%>%
group_by(group = cumsum(type == "current_std"),gas)%>%
mutate(X1= row_number()-1, #estimates the position of the sample within the group -1 removes std
F1=1-(X1/n()), #relative factor influence of the cal in the current group
F2=1-F1, #relative factor influence of the cal in the next group
RRF=response/(F1*first(response)+(F2*????))
Data:
zero_test <- structure(list(time = structure(c(1564468200, 1564475400, 1564484400,
1564486200, 1564493400, 1564497000, 1564498800, 1564506000, 1564509600,
1564511400, 1564518600, 1564522200, 1564524000, 1564527600, 1564531200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), type = c("current_std",
"n2", "n2", "current_std", "n2", "-", "current_std", "-", "n2",
"current_std", "n2", "-", "current_std", "-", "-"), ben = c(2293951.5,
12703.1, 6392.7, 1762512.6, 10748.4, 25468.3, 1597679, 24400.4,
6019.4, 1510760.2, 10329.1, 29292.6, 1495942.8, 61227.5, 25379.5
), xyl = c(210975.6, 4482, 2910.8, 127612.4, 3792.6, 10295.7,
113439.1, 10628.8, 2064.3, 107134.3, 3764.1, 10380.6, 107353.6,
23639.1, 10317.4), cym = c(546894.5, 12202.6, 8400.8, 302091.6,
11072.2, 16349.2, 291637.5, 18891.8, 6500.7, 276997.5, 10821.2,
18672, 274149.4, 61379.2, 19254.7), isop = c(397288.2, 0, 0,
239779.9, 0, 1364.8, 199081.5, 1511.2, 0, 179364, 0, 1318.4,
174450.7, 7137.5, 9567), macr = c(221195.8, 0, 0, 138806.3, 0,
0, 116644, 0, 0, 108893.3, 0, 0, 105689, 4325.4, 0), pin = c(50795.3,
0, 0, 28436, 0, 1020.1, 26482.9, 925.2, 0, 27394.1, 0, 989.7,
24344.6, 1414.7, 736.3), tmb = c(9314.5, 0, 0, 5798, 0, 0, 5136.4,
2252.5, 0, 4542.9, 0, 0, 4398.4, 3794.4, 2186.3), tol = c(880567.1,
7430.6, 4225.5, 569616.2, 6091.8, 65642.6, 495780.5, 52129.9,
3226, 456079.6, 5874, 34725.9, 453944.8, 56594.4, 66148.1), mvk = c(169036.8,
0, 0, 108738, 0, 0, 56712.5, 0, 0, 79148.9, 0, 0, 64065, 0, 0
), euc = c(12815.2, 0, 0, 8012.6, 0, 0, 5411.8, 0, 0, 5839.9,
0, 491.7, 5450.7, 1990.8, 500.7)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L), spec = structure(list(
cols = list(time = structure(list(format = ""), class = c("collector_datetime",
"collector")), type = structure(list(), class = c("collector_character",
"collector")), ben = structure(list(), class = c("collector_double",
"collector")), xyl = structure(list(), class = c("collector_double",
"collector")), cym = structure(list(), class = c("collector_double",
"collector")), isop = structure(list(), class = c("collector_double",
"collector")), macr = structure(list(), class = c("collector_double",
"collector")), pin = structure(list(), class = c("collector_double",
"collector")), tmb = structure(list(), class = c("collector_double",
"collector")), tol = structure(list(), class = c("collector_double",
"collector")), mvk = structure(list(), class = c("collector_double",
"collector")), euc = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 2), class = "col_spec"))
Example of expected output:
time type gas response group X1 F1 F2 RRF
<dttm> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 2019-07-30 06:30:00 current_std ben 2293952. 1 0 1 0 1
2 2019-07-30 08:30:00 n2 ben 12703. 1 1 0.667 0.333 0.006005
3 2019-07-30 11:00:00 n2 ben 6393. 1 2 0.333 0.667 0.003962

I would use a self-join to get the first response of the next group:
library(tidyverse)
# the OP's example data (long!)
zero_test <-
structure(
list(
time = structure(
c(
1564468200,
1564475400,
1564484400,
1564486200,
1564493400,
1564497000,
1564498800,
1564506000,
1564509600,
1564511400,
1564518600,
1564522200,
1564524000,
1564527600,
1564531200
),
class = c("POSIXct", "POSIXt"),
tzone = "UTC"
),
type = c(
"current_std",
"n2",
"n2",
"current_std",
"n2",
"-",
"current_std",
"-",
"n2",
"current_std",
"n2",
"-",
"current_std",
"-",
"-"
),
ben = c(
2293951.5,
12703.1,
6392.7,
1762512.6,
10748.4,
25468.3,
1597679,
24400.4,
6019.4,
1510760.2,
10329.1,
29292.6,
1495942.8,
61227.5,
25379.5
),
xyl = c(
210975.6,
4482,
2910.8,
127612.4,
3792.6,
10295.7,
113439.1,
10628.8,
2064.3,
107134.3,
3764.1,
10380.6,
107353.6,
23639.1,
10317.4
),
cym = c(
546894.5,
12202.6,
8400.8,
302091.6,
11072.2,
16349.2,
291637.5,
18891.8,
6500.7,
276997.5,
10821.2,
18672,
274149.4,
61379.2,
19254.7
),
isop = c(
397288.2,
0,
0,
239779.9,
0,
1364.8,
199081.5,
1511.2,
0,
179364,
0,
1318.4,
174450.7,
7137.5,
9567
),
macr = c(
221195.8,
0,
0,
138806.3,
0,
0,
116644,
0,
0,
108893.3,
0,
0,
105689,
4325.4,
0
),
pin = c(
50795.3,
0,
0,
28436,
0,
1020.1,
26482.9,
925.2,
0,
27394.1,
0,
989.7,
24344.6,
1414.7,
736.3
),
tmb = c(
9314.5,
0,
0,
5798,
0,
0,
5136.4,
2252.5,
0,
4542.9,
0,
0,
4398.4,
3794.4,
2186.3
),
tol = c(
880567.1,
7430.6,
4225.5,
569616.2,
6091.8,
65642.6,
495780.5,
52129.9,
3226,
456079.6,
5874,
34725.9,
453944.8,
56594.4,
66148.1
),
mvk = c(169036.8,
0, 0, 108738, 0, 0, 56712.5, 0, 0, 79148.9, 0, 0, 64065, 0, 0),
euc = c(
12815.2,
0,
0,
8012.6,
0,
0,
5411.8,
0,
0,
5839.9,
0,
491.7,
5450.7,
1990.8,
500.7
)
),
class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"),
row.names = c(NA,-15L),
spec = structure(list(
cols = list(
time = structure(list(format = ""), class = c("collector_datetime",
"collector")),
type = structure(list(), class = c("collector_character",
"collector")),
ben = structure(list(), class = c("collector_double",
"collector")),
xyl = structure(list(), class = c("collector_double",
"collector")),
cym = structure(list(), class = c("collector_double",
"collector")),
isop = structure(list(), class = c("collector_double",
"collector")),
macr = structure(list(), class = c("collector_double",
"collector")),
pin = structure(list(), class = c("collector_double",
"collector")),
tmb = structure(list(), class = c("collector_double",
"collector")),
tol = structure(list(), class = c("collector_double",
"collector")),
mvk = structure(list(), class = c("collector_double",
"collector")),
euc = structure(list(), class = c("collector_double",
"collector"))
),
default = structure(list(), class = c("collector_guess",
"collector")),
skip = 2
), class = "col_spec")
)
temp1 <- zero_test %>%
gather(gas, response, -time,-type) %>%
group_by(group = cumsum(type == "current_std"), gas) %>%
mutate(X1= row_number()-1, #estimates the position of the sample within the group -1 removes std
F1=1-(X1/n()), #relative factor influence of the cal in the current group
F2=1-F1,
first_response = first(response)) %>%
ungroup
conc_zero_test <- temp1 %>%
left_join(y = {temp1 %>%
mutate(group = group - 1) %>%
select(gas, group, first_response_next = first_response) %>%
distinct},
by = c("gas", "group")) %>%
mutate(RRF = response / ((F1 * first_response) + (F2 * first_response_next)))
conc_zero_test
#> # A tibble: 150 x 11
#> time type gas response group X1 F1 F2
#> <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2019-07-30 06:30:00 curr… ben 2293952. 1 0 1 0
#> 2 2019-07-30 08:30:00 n2 ben 12703. 1 1 0.667 0.333
#> 3 2019-07-30 11:00:00 n2 ben 6393. 1 2 0.333 0.667
#> 4 2019-07-30 11:30:00 curr… ben 1762513. 2 0 1 0
#> 5 2019-07-30 13:30:00 n2 ben 10748. 2 1 0.667 0.333
#> 6 2019-07-30 14:30:00 - ben 25468. 2 2 0.333 0.667
#> 7 2019-07-30 15:00:00 curr… ben 1597679 3 0 1 0
#> 8 2019-07-30 17:00:00 - ben 24400. 3 1 0.667 0.333
#> 9 2019-07-30 18:00:00 n2 ben 6019. 3 2 0.333 0.667
#> 10 2019-07-30 18:30:00 curr… ben 1510760. 4 0 1 0
#> # … with 140 more rows, and 3 more variables: first_response <dbl>,
#> # first_response_next <dbl>, RRF <dbl>
Created on 2020-08-16 by the reprex package (v0.3.0)
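The first response of the next group can also be looked up in base R without the self-join. Record each group's first row, then match group + 1 against the group numbers. The minimal sketch below covers a single gas (the first seven ben values from the question) and mirrors the question's X1/F1/F2 definitions. The last group has no next group, so its RRF is NA, as in the join-based answer.

```r
# Minimal stand-in: the first seven "ben" rows from the question (one gas only)
d <- data.frame(
  type     = c("current_std", "n2", "n2", "current_std", "n2", "-", "current_std"),
  response = c(2293951.5, 12703.1, 6392.7, 1762512.6, 10748.4, 25468.3, 1597679)
)
d$group <- cumsum(d$type == "current_std")   # a new group starts at each standard

# Row index of each group's first row, and the group number it belongs to
first_rows <- which(!duplicated(d$group))
grp_ids    <- d$group[first_rows]

# First response of the current group and of group + 1 (NA for the last group)
d$first_response      <- d$response[first_rows][match(d$group, grp_ids)]
d$first_response_next <- d$response[first_rows][match(d$group + 1, grp_ids)]

# Position within the group and the weights, as defined in the question
d$X1  <- ave(seq_along(d$group), d$group, FUN = seq_along) - 1
n     <- ave(seq_along(d$group), d$group, FUN = length)
d$F1  <- 1 - d$X1 / n
d$F2  <- 1 - d$F1
d$RRF <- d$response / (d$F1 * d$first_response + d$F2 * d$first_response_next)
d
```

With several gases, the same steps would have to be applied per gas (for example after split() on the long data), since the groups are defined within each gas.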
