I have a dataset that looks like this:
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
I want the output below. Within each group, the last observation should be carried forward, unless the group starts with NA values before its first filled-in value, in which case that value should be carried backward:
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
I have been working with dplyr and na.locf() from the zoo package. So far my approach has been this:
library(dplyr)
library(zoo)
df %>%
  group_by(ID) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE)))
However, this only carries the last observation forward. Setting fromLast = TRUE in na.locf() carries observations backward instead.
But how do I combine the two, so that:
na.locf() is used if there are no NA values before the first filled-in value, and
na.locf(fromLast = TRUE), i.e. carrying the next observation backward, is used if there are NA values before the first filled-in value?
Thank you so much in advance!
This should work:
library(tidyverse)
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
# expected result
df2 <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
df <- df %>%
  group_by(ID) %>%
  # "downup" first fills downward (LOCF), then fills any remaining
  # leading NAs upward, so a second fill() call is not needed
  fill(values, .direction = "downup") %>%
  ungroup()
all.equal(as.data.frame(df), df2)
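If you would rather not depend on tidyr, the same "downup" logic can be sketched in base R and applied per ID with ave(). fill_downup() is a hypothetical helper, not part of any package, and this sketch assumes values is a numeric column.

```r
# Hypothetical helper: carry the last non-NA value forward and fill any
# leading NAs with the first non-NA value (the "downup" behaviour).
fill_downup <- function(v) {
  # position of each non-NA value, 0 for NA
  idx <- ifelse(is.na(v), 0L, seq_along(v))
  # running maximum = position of the most recent non-NA value so far
  idx <- cummax(idx)
  # leading NAs (still 0) point at the first non-NA value instead;
  # if the whole group is NA this yields NA, so the group stays NA
  idx[idx == 0L] <- which(!is.na(v))[1]
  v[idx]
}

df <- data.frame(ID = c(1,1,1,1,1,2,2,2,3,3),
                 values = c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))

# ave() applies the helper within each ID and puts results back in row order
df$values <- ave(df$values, df$ID, FUN = fill_downup)
df$values
# [1] 12 12 12 13 13  5  5  5  1  1
```

The cummax() trick avoids an explicit loop: each NA simply inherits the index of the most recent observed value.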
I need to use the map() function to apply the above to each element.
How can I do so?
As dt is of class data.table, you can make a vector of the columns of interest (i.e. your items; below I use grepl() on the names) and then apply your weighting function to each of those columns using .SD and .SDcols, grouping with by:
library(data.table)
qs <- names(dt)[grepl("^q", names(dt))]
# \(q) is the lambda shorthand from R >= 4.1; use function(q) on older R
dt[, (paste0(qs, "wt")) := lapply(.SD, \(q) 1 / (sum(!is.na(q)) / .N)),
   by = .(sex, education_code, age), .SDcols = qs]
As mentioned in the comments, you are missing a dt <- before dt[, .(ID, education_code, age, sex, item = q1_1)], so the column item is not available in the following line, dt[, no_respond := is.na(item)].
Your weighting scheme is not entirely clear to me. However, assuming you want to do what your code does, I would use a dplyr solution to iterate over the columns.
library(dplyr)
# your data, without the no_respond column and with the missing value in q2_3 corrected
dt <- data.table::data.table(
  ID = c(1,2,3,4, 5, 6, 7, 8, 9, 10),
  education_code = c(20,50,20,60, 20, 10,5, 12, 12, 12),
  age = c(87,67,56,52, 34, 56, 67, 78, 23, 34),
  sex = c("F","M","M","M", "F","M","M","M", "M","M"),
  q1_1 = c(NA,1,5,3, 1, NA, 3, 4, 5,1),
  q1_2 = c(NA,1,5,3, 1, 2, NA, 4, 5,1),
  q1_3 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q1_text = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_1 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_2 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_3 = c(NA,1,5,3, 1, NA, NA, 4, 5,1),
  q2_text = c(NA,1,5,3, 1, NA, 3, 4, 5,1))
dt %>%
  group_by(sex, education_code, age) %>%     # group by sex, education_code and age
  add_count() %>%                             # add a column n with the number of rows in each group
  mutate(across(starts_with("q"),             # for each column starting with "q"
                ~ 1 / (sum(!is.na(.)) / n),   # compute the weight as in your code
                .names = "{.col}_wgt")) %>%   # name each new column <original name>_wgt
  ungroup()
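To make the weight formula concrete without dplyr or data.table, here is a base R sketch on a subset of the example data. To keep the numbers readable it groups by sex only, which is a simplification and not the grouping used in the question. item_weight() is a hypothetical helper computing group size divided by the number of non-missing answers, which is the same quantity as 1/(sum(!is.na(q))/n).

```r
# Subset of the example data: ID, sex and two item columns
dat <- data.frame(
  ID   = 1:10,
  sex  = c("F","M","M","M","F","M","M","M","M","M"),
  q1_1 = c(NA,1,5,3,1,NA,3,4,5,1),
  q2_3 = c(NA,1,5,3,1,NA,NA,4,5,1))

# Hypothetical helper: rows in the group / non-missing answers in the group
item_weight <- function(q) length(q) / sum(!is.na(q))

qs <- grep("^q", names(dat), value = TRUE)
for (q in qs) {
  # ave() returns the group's weight on every row of that group
  dat[[paste0(q, "_wgt")]] <- ave(dat[[q]], dat$sex, FUN = item_weight)
}
dat[, c("sex", "q1_1_wgt", "q2_3_wgt")]
```

For q1_1, women get weight 2 (2 rows, 1 answer) and men 8/7 (8 rows, 7 answers). A group with no answers at all gets Inf, exactly as with the original formula.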
I have a dataset like this:
data <- data.frame(Time = c(1,4,6,9,11,13,16, 25, 32, 65),
A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))
What I want to retrieve are the values in Time that correspond to the last non-NA value in each of A, B, and C.
For example, the last numerical values for A, B, and C are 12, 13, and 32 respectively.
So, the Time values that correspond are 65, 25, and 4.
I've tried something like data[which(data$Time== max(data$A)), ], but this doesn't work.
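We can multiply the logical matrix of non-NA positions by the row indices, take the column maxima with colMaxs() (from matrixStats) to get the row of the last non-NA value in each column, and use those rows to subset the Time column: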
library(matrixStats)
data$Time[colMaxs((!is.na(data[-1])) * row(data[-1]))]
#[1] 65 25 4
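If matrixStats is not installed, the column-maximum step can be done in base R with apply(). A minimal sketch reusing the question's data; the intermediate matrix m is 0 wherever a value is NA and the row number otherwise.

```r
data <- data.frame(Time = c(1,4,6,9,11,13,16, 25, 32, 65),
                   A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
                   B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
                   C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))

# 0 where the value is NA, otherwise the row number
m <- (!is.na(data[-1])) * row(data[-1])
# largest row number per column = row of the last non-NA value
last_row <- apply(m, 2, max)
data$Time[last_row]
#[1] 65 25  4
```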
Or, using base R, we can get the row/column indices with which(arr.ind = TRUE), take the maximum row index per column with a grouped operation (tapply()), and use that to extract the Time values:
m1 <- which(!is.na(data[-1]), arr.ind = TRUE)
data$Time[tapply(m1[,1], m1[,2], FUN = max)]
#[1] 65 25 4
Or with summarise()/across() (across() is available in dplyr since version 1.0.0):
library(dplyr)
data %>%
summarise(across(A:C, ~ tail(Time[!is.na(.)], 1)))
# A B C
#1 65 25 4
Or, with dplyr versions before 1.0.0, using summarise_at() (now superseded by across()):
data %>%
summarise_at(vars(A:C), ~ tail(Time[!is.na(.)], 1))
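The tail() idea from the dplyr versions also translates directly to base R: for each value column, keep the Time entries where that column is not NA and take the last one. This sketch assumes every column has at least one non-NA value; otherwise tail() returns a zero-length result and sapply() would return a list instead of a vector.

```r
data <- data.frame(Time = c(1,4,6,9,11,13,16, 25, 32, 65),
                   A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
                   B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
                   C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))

# for each of A, B, C: the Time values where the column is observed, last one
last_time <- sapply(data[-1], function(col) tail(data$Time[!is.na(col)], 1))
last_time
```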
I have a data frame with 1530 observations of 6 variables. It contains 51 assets with 30 observations each. I tried to apply the MACD function to obtain two values, macd and signal, but an error shows up. This is an example:
macdusdt <- filtusdt %>% group_by(symbol) %>% do(tail(., n = 30))
macd1m <- macdusdt %>%
mutate (signals = MACD(macdusdt$lastPrice,
nFast = 12, nSlow = 26, nSig = 9, maType = "EMA", percent = T))
Error: Column signals must be length 30 (the group size) or one, not 3060
I want to apply the MACD function to every asset in the data frame. The data is here: https://www.dropbox.com/s/ww8stgsspqi8tef/macdusdt.xlsx?dl=0
With the data provided, applying the code gives this error:
Error in EMA(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, :
n > number of non-NA values in column(s) 1
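To prevent that, we can call MACD() only for groups that have enough non-NA prices:
library(dplyr)
library(TTR)
filtusdt %>%
  group_by(symbol) %>%
  slice(tail(row_number(), 30)) %>%
  # with nSlow = 26 and nSig = 9, MACD() needs at least 26 + 9 - 1 = 34 non-NA prices
  mutate(signals = if (sum(!is.na(lastPrice)) >= 26 + 9 - 1) MACD(lastPrice,
    nFast = 12, nSlow = 26, nSig = 9, maType = "EMA", percent = TRUE) else NA)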
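The error is likely caused by the subset itself: with nSlow = 26, the MACD line has 25 leading NAs, so 30 rows per asset leave only 5 non-NA values for the signal EMA, which needs nSig = 9. Keeping at least 26 + 9 - 1 = 34 rows per asset, or using shorter windows, should avoid it.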
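A hedged sketch of the length requirement behind the EMA() error. The helpers below are hypothetical (not part of TTR) and assume that EMA(x, n) leaves its first n - 1 values NA, so the MACD line starts with nSlow - 1 NAs and its signal EMA then needs at least nSig non-NA values. The inputs seq_len(30) and seq_len(34) are placeholder prices, not the question's data.

```r
# Hypothetical helpers, not part of TTR.
# Assumption: EMA(x, n) leaves the first n - 1 values NA, so the MACD line
# has nSlow - 1 leading NAs and the signal EMA needs nSig non-NA MACD values.
macd_min_obs <- function(nSlow = 26, nSig = 9) nSlow + nSig - 1

has_enough_for_macd <- function(price, nSlow = 26, nSig = 9) {
  sum(!is.na(price)) >= macd_min_obs(nSlow, nSig)
}

macd_min_obs()                      # 34
has_enough_for_macd(seq_len(30))    # FALSE: 30 placeholder prices < 34
has_enough_for_macd(seq_len(34))    # TRUE
```

On the question's data, something like tapply(filtusdt$lastPrice, filtusdt$symbol, has_enough_for_macd) should flag which symbols can be passed to MACD() with these settings.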
I'd like to replace NA elements of a vector with elements from a sequence, for example:
x <- c(1, NA, 5, NA, NA, 2, 12, NA)
replace.seq <- -1:-4 # Can assume length(replace.seq) == sum(is.na(x))
goal <- c(1, -1, 5, -2, -3, 2, 12, -4)
What's an efficient way to do this? I'd prefer to avoid sorting x.
Per @akrun:
x[is.na(x)] <- replace.seq
You can use replace:
x <- replace(x, is.na(x), replace.seq)
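A quick check of both answers against the goal vector from the question. Neither approach sorts x: both fill the selected NA positions from left to right, so they assume, as the question does, that length(replace.seq) equals sum(is.na(x)). With a shorter sequence R would recycle it, warning only when the lengths are not multiples of each other.

```r
x <- c(1, NA, 5, NA, NA, 2, 12, NA)
replace.seq <- -1:-4
goal <- c(1, -1, 5, -2, -3, 2, 12, -4)

# Assignment through a logical index fills the NAs left to right
x1 <- x
x1[is.na(x1)] <- replace.seq

# replace() returns a modified copy and leaves x itself unchanged
x2 <- replace(x, is.na(x), replace.seq)

identical(x1, goal)
identical(x2, goal)
```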