Last observation carried forward and last observation carried backward in R - r

I have a dataset looking like this
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
I want an output like this, so that last observations are carried forward (by group), unless there are only NA values before the first filled-in value, in which case I want the last observation carried backward:
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
I have been working with dplyr and na.locf from the zoo package. So far my approach has been this:
df %>%
  group_by(ID) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE)))
However, this only carries the last observation forward. Setting fromLast = TRUE in na.locf carries the last observation backward instead.
But how do I combine the two, so that both are used:
na.locf if there are no NA values before the first filled-in value
na.locf(fromLast = TRUE), meaning last observation carried backward, if there are NA values before the first filled-in value.
Thank you so much in advance!

This should work:
library(tidyverse)
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
# expected result, for comparison
df2 <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
df <- df %>%
  group_by(ID) %>%
  # "downup" fills forward first, then fills any remaining leading NAs backward,
  # so a second fill() call is not needed
  fill(values, .direction = "downup")
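To see what the forward-then-backward fill does, here is a minimal base R sketch of the same logic. The helper names locf and fill_down_up are hypothetical (they are not part of tidyr or zoo), and the sketch assumes each ID is filled independently.

```r
# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  # Position of the most recent non-NA value seen so far (0 if none yet)
  idx <- cummax(seq_along(x) * (!is.na(x)))
  idx[idx == 0] <- NA   # leading NAs have nothing to carry forward
  x[idx]
}

# Hypothetical helper: forward fill, then backward fill whatever is still NA.
fill_down_up <- function(x) rev(locf(rev(locf(x))))

df <- data.frame(ID = c(1,1,1,1,1,2,2,2,3,3),
                 values = c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))

# ave() applies the function within each ID and keeps the original row order
filled <- ave(df$values, df$ID, FUN = fill_down_up)
print(filled)
```

On the example data this gives the values asked for in the question (12, 12, 12, 13, 13, 5, 5, 5, 1, 1). A group that contains only NA values stays NA.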

Related

Using map() function to apply for each element

I need to apply the above to each element with the help of the map() function.
How can I do so?
As dt is of class data.table, you can make a vector of the columns of interest (i.e. your items; below I use grepl on the names), and then apply your weighting function to each of those columns using .SD and .SDcols, with by:
qs = names(dt)[grepl("^q", names(dt))]
dt[, (paste0(qs,"wt")):=lapply(.SD, \(q) 1/(sum(!is.na(q))/.N)),
   .(sex, education_code, age), .SDcols = qs]
As mentioned in the comments, you are missing a dt <- in your dt[, .(ID, education_code, age, sex, item = q1_1)], which makes the column item unavailable in the following line, dt[, no_respond := is.na(item)].
Your weighting scheme is not entirely clear to me. However, assuming you want to do what is done in your code here, I would go with a dplyr solution to iterate over the columns.
# your data without no_respond column and correcting missing value in q2_3
dt <- data.table::data.table(
ID = c(1,2,3,4, 5, 6, 7, 8, 9, 10),
education_code = c(20,50,20,60, 20, 10,5, 12, 12, 12),
age = c(87,67,56,52, 34, 56, 67, 78, 23, 34),
sex = c("F","M","M","M", "F","M","M","M", "M","M"),
q1_1 = c(NA,1,5,3, 1, NA, 3, 4, 5,1),
q1_2 = c(NA,1,5,3, 1, 2, NA, 4, 5,1),
q1_3 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
q1_text = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
q2_1 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
q2_2 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
q2_3 = c(NA,1,5,3, 1, NA, NA, 4, 5,1),
q2_text = c(NA,1,5,3, 1, NA, 3, 4, 5,1))
library(dplyr)
dt %>%
  group_by(sex, education_code, age) %>%   # group the data by sex, education_code, age
  add_count() %>%                          # add a column n with the number of rows in each group
  mutate(across(starts_with("q"),          # for each column starting with "q"
                ~ 1/(sum(!is.na(.))/n),    # create a new column following your weight calculation
                .names = '{.col}_wgt')) %>%  # name the new column with suffix "_wgt"
  ungroup()
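The weight used in both answers is 1 divided by the share of non-missing answers in the group. A small base R sketch on made-up data may make that easier to check by hand; the data frame toy, the column grp and the helper inv_response_rate are hypothetical and only stand in for the real data and grouping columns.

```r
# Toy data (hypothetical): one grouping column and two item columns
toy <- data.frame(grp = c("a", "a", "a", "a", "b", "b"),
                  q1  = c(1, NA, 3, NA, 2, 5),
                  q2  = c(NA, 4, 4, 4, 1, 1))

# Weight = 1 / (share of non-missing answers in the group)
inv_response_rate <- function(q) 1 / (sum(!is.na(q)) / length(q))

# Columns of interest: everything whose name starts with "q"
q_cols <- grep("^q", names(toy), value = TRUE)

for (col in q_cols) {
  # ave() computes the weight per group and repeats it on every row of that group
  toy[[paste0(col, "_wgt")]] <- ave(toy[[col]], toy$grp, FUN = inv_response_rate)
}
print(toy)
```

In group "a" only 2 of 4 rows answered q1, so q1_wgt is 2 there, while group "b" answered fully and gets 1. Note that a group with no answers at all would get a weight of Inf, which you may want to handle separately.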

Returning values from a column based on the last value of another column

I have a dataset like this:
data <- data.frame(Time = c(1,4,6,9,11,13,16, 25, 32, 65),
A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))
What I want to retrieve are the values in Time that correspond to the last numerical value in A, B, and C.
For example, the last numerical values for A, B, and C are 12, 13, and 32 respectively.
So, the Time values that correspond are 65, 25, and 4.
I've tried something like data[which(data$Time== max(data$A)), ], but this doesn't work.
We can multiply the row index with the logical matrix, and get the colMaxs (from matrixStats) to subset the 'Time' column
library(matrixStats)
data$Time[colMaxs((!is.na(data[-1])) * row(data[-1]))]
#[1] 65 25 4
Or using base R, we get the index with which/arr.ind, get the max index using a group by operation (tapply) and use that to extract the 'Time' value
m1 <- which(!is.na(data[-1]), arr.ind = TRUE)
data$Time[tapply(m1[,1], m1[,2], FUN = max)]
#[1] 65 25 4
Or with summarise/across (available in dplyr 1.0.0 and later)
library(dplyr)
data %>%
  summarise(across(A:C, ~ tail(Time[!is.na(.)], 1)))
# A B C
#1 65 25 4
Or using summarise_at, which also works in older versions of dplyr (it is superseded by across)
data %>%
  summarise_at(vars(A:C), ~ tail(Time[!is.na(.)], 1))
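The same idea can also be written column by column in base R, which may be easier to follow step by step. This is only a sketch; it assumes every column has at least one non-NA value (otherwise max(which(...)) has nothing to return), and last_time is just an illustrative name.

```r
data <- data.frame(Time = c(1, 4, 6, 9, 11, 13, 16, 25, 32, 65),
                   A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
                   B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
                   C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))

# For each value column: find the row of its last non-NA entry,
# then return the Time recorded in that row
last_time <- sapply(data[-1], function(col) data$Time[max(which(!is.na(col)))])
print(last_time)
#  A  B  C
# 65 25  4
```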

How can I estimate a function in a group?

I have a data frame with 1530 obs of 6 variables. In this data frame there are 51 assets with 30 obs each. I tried to apply the MACD function to obtain two values, macd and signal, but an error shows up. This is an example:
macdusdt <- filtusdt %>% group_by(symbol) %>% do(tail(., n = 30))
macd1m <- macdusdt %>%
mutate (signals = MACD(macdusdt$lastPrice,
nFast = 12, nSlow = 26, nSig = 9, maType = "EMA", percent = T))
Error: Column signals must be length 30 (the group size) or one, not 3060
I want to apply the MACD function to every asset in the data frame. The database is here: https://www.dropbox.com/s/ww8stgsspqi8tef/macdusdt.xlsx?dl=0
With the data provided, running the code gives this error:
Error in EMA(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, :
n > number of non-NA values in column(s) 1
To prevent that we can do
library(dplyr)
library(TTR)
filtusdt %>%
group_by(symbol) %>%
slice(tail(row_number(), 30)) %>%
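  # only call MACD when the group has enough non-NA prices:
  # the signal line needs nSig values after the nSlow - 1 leading NAs (26 + 9 - 1 = 34)
  mutate(signals = if(sum(!is.na(lastPrice)) >= 34) MACD(lastPrice,
      nFast = 12, nSlow = 26, nSig = 9, maType = "EMA", percent = TRUE) else NA)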
The error may be caused by the subset of the data that was provided.
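To check in advance which assets have enough data, one can count the non-NA prices per symbol. The sketch below uses made-up data and the rule of thumb nSlow + nSig - 1, which assumes the slow EMA leaves nSlow - 1 leading NAs and the signal EMA then needs nSig non-NA values; this is a reading of the error message above rather than something taken from the TTR documentation. The symbols "AAA" and "BBB" and the names min_obs, n_ok and enough are hypothetical.

```r
n_slow  <- 26
n_sig   <- 9
min_obs <- n_slow + n_sig - 1   # 34 under the assumption stated above

# Toy data (hypothetical): "AAA" has 30 prices, "BBB" has 40
toy <- data.frame(symbol    = rep(c("AAA", "BBB"), times = c(30, 40)),
                  lastPrice = c(seq(100, 129), seq(50, 89)))

# Count usable (non-NA) prices per symbol and compare with the minimum
n_ok   <- tapply(!is.na(toy$lastPrice), toy$symbol, sum)
enough <- n_ok >= min_obs
print(enough)
```

Under this rule, a group limited to its last 30 observations never has enough data for nSlow = 26 and nSig = 9, so either more rows per asset or shorter windows would be needed.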

Creating new variable based on specific rows of other two variables in a long formatted dataset

I have a long dataset of emotional responses and I need to create a variable based on specific rows of two other variables, within subjects.
The following data frame includes data for two participants ("person"), each presented with 2 pictures (P1, P2) with 3 repetitions each (R1, R2, R3); this is the "phase" variable. The variable response includes two things: the rating for each presentation (scale -30 to 30) and the emotion experienced per picture.
person <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
block <- c(4, 4, 4, 5, 5, 5, 8, 8, 4, 4, 4, 5, 5, 5, 8, 8)
phase <- c("P1R1", "P1R2", "P1R3", "P2R1","P2R2","P2R3", "Post1", "Post2","P1R1",
"P1R2", "P1R3", "P2R1","P2R2","P2R3", "Post1", "Post2")
response <- c(30, 30, 30, -30, -30, -30, "Happy", "Sad", 28, 27, 25, -23, -24,
-22, "Excited", "Scared")
df <- data.frame(person, block, phase, response)
I need to create a new column that will be based on the block number and give me the emotion per picture.
I would like the new column to be called "postsurvey" and expect it to be as follows:
postsurvey <-c ("Happy", "Happy", "Happy","Sad","Sad", "Sad", NA, NA,
"Excited", "Excited", "Excited", "Scared", "Scared", "Scared", NA, NA)
df <- data.frame(person, block, phase, response, postsurvey)
The code that I used is:
df<-df %>% group_by(person, block) %>%
mutate(postsurvey=if(block==4){response[phase=="Post1"]}
else if (block==5){response[phase=="Post2"]}
else {print("NA")})
I expect each subject to receive the same response for each block number, but what I get is a response that is not grouped by subject and not repeated within the subject by block number, as if there were a single vector of emotions and a person gets emotions that are not theirs.
*In my original data I have 4 pictures per subject with 10 repetitions, so the "else if" code is repeated with more than two conditions.
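One possible way to get the expected column is a lookup instead of a chain of if/else, sketched here in base R. It assumes that the picture number in phase (the 1 in "P1R2") identifies the matching "Post1" row of the same person, rather than relying on the block number; the intermediate names pic, key_wanted and key_have are only for illustration.

```r
person <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
block <- c(4, 4, 4, 5, 5, 5, 8, 8, 4, 4, 4, 5, 5, 5, 8, 8)
phase <- c("P1R1", "P1R2", "P1R3", "P2R1", "P2R2", "P2R3", "Post1", "Post2",
           "P1R1", "P1R2", "P1R3", "P2R1", "P2R2", "P2R3", "Post1", "Post2")
response <- c(30, 30, 30, -30, -30, -30, "Happy", "Sad", 28, 27, 25, -23, -24,
              -22, "Excited", "Scared")
df <- data.frame(person, block, phase, response, stringsAsFactors = FALSE)

# Picture number taken from labels like "P1R2" -> "1"; NA for the "Post" rows
pic <- ifelse(grepl("^P[0-9]+R", df$phase),
              sub("^P([0-9]+)R.*$", "\\1", df$phase),
              NA)

# For each row, the key of the post-survey row we want from the same person
key_wanted <- paste(df$person, paste0("Post", pic))
# The keys that actually exist in the data
key_have <- paste(df$person, df$phase)

# match() returns NA when there is no such row, so "Post" rows get NA
df$postsurvey <- df$response[match(key_wanted, key_have)]
print(df)
```

Because the key combines person and picture, this scales to 4 pictures and 10 repetitions without adding more conditions, as long as the phase labels follow the same pattern.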
