How can I apply a function by group? - r

I have a data frame with 1530 observations of 6 variables: 51 assets with 30 observations each. I tried to apply the MACD function to obtain two values, macd and signal, but an error shows up. This is an example:
macdusdt <- filtusdt %>% group_by(symbol) %>% do(tail(., n = 30))
macd1m <- macdusdt %>%
mutate (signals = MACD(macdusdt$lastPrice,
nFast = 12, nSlow = 26, nSig = 9, maType = "EMA", percent = T))
Error: Column signals must be length 30 (the group size) or one, not 3060
I want to apply the MACD function to every asset in the data frame. The data is here: https://www.dropbox.com/s/ww8stgsspqi8tef/macdusdt.xlsx?dl=0

With the data provided, the code gives this error:
Error in EMA(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, :
n > number of non-NA values in column(s) 1
To prevent that, we can call MACD only for groups that have enough non-NA prices:
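library(dplyr)
library(TTR)
nFast <- 12; nSlow <- 26; nSig <- 9
filtusdt %>%
  group_by(symbol) %>%
  slice(tail(row_number(), 30)) %>%
  # MACD needs at least nSlow + nSig - 1 non-NA prices; otherwise return an NA matrix
  mutate(signals = if (sum(!is.na(lastPrice)) >= nSlow + nSig - 1)
    MACD(lastPrice, nFast = nFast, nSlow = nSlow, nSig = nSig,
         maType = "EMA", percent = TRUE)
  else matrix(NA_real_, n(), 2, dimnames = list(NULL, c("macd", "signal"))))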
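The error is not specific to the subset: MACD with nSlow = 26 and nSig = 9 needs at least 26 + 9 - 1 = 34 non-NA prices per symbol. The slow EMA leaves 25 leading NAs in the macd line, so 30 rows give only 5 macd values, fewer than the 9 the signal EMA needs. Keep at least 34 rows per symbol (e.g. tail(row_number(), 34)) or use shorter periods.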
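To see where the 34-row minimum comes from, here is a base-R sketch with no TTR dependency. The helpers `ema()` and `macd_sketch()` and the toy `prices` data are hypothetical; the EMA is seeded with the simple mean of the first n values, which is assumed (not verified) to match TTR's default, so treat the numbers as illustrative only.

```r
# Hypothetical helper: EMA seeded with the simple mean of the first n non-NA
# values (assumed to mirror TTR's default EMA; this is not TTR's code).
# Assumes any NAs are leading. Returns all NA instead of erroring when
# there are fewer than n non-NA values.
ema <- function(x, n) {
  out <- rep(NA_real_, length(x))
  ok <- which(!is.na(x))
  if (length(ok) < n) return(out)
  start <- ok[n]
  out[start] <- mean(x[ok[seq_len(n)]])
  k <- 2 / (n + 1)
  if (start < length(x)) {
    for (i in (start + 1):length(x)) {
      out[i] <- k * x[i] + (1 - k) * out[i - 1]
    }
  }
  out
}

# Hypothetical MACD in percent, built from the helper above
macd_sketch <- function(price, nFast = 12, nSlow = 26, nSig = 9) {
  macd <- 100 * (ema(price, nFast) / ema(price, nSlow) - 1)
  cbind(macd = macd, signal = ema(macd, nSig))
}

# Toy data: symbol AAA has 30 prices, BBB has 34
set.seed(1)
prices <- data.frame(symbol = rep(c("AAA", "BBB"), times = c(30, 34)),
                     lastPrice = 100 + cumsum(rnorm(64)),
                     stringsAsFactors = FALSE)

# Apply the sketch to each symbol separately, then stack the pieces
res <- do.call(rbind, lapply(split(prices, prices$symbol), function(d) {
  cbind(d, macd_sketch(d$lastPrice))
}))
```

With 30 prices the signal column stays entirely NA; with 34, exactly one signal value appears, on the last row.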

replacing NAs with preceding strings in R [duplicate]

I have a dataset looking like this
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
I want an output like this, so that within each group the last observation is carried forward, except that NA values before the first non-NA value are filled by carrying that value backward:
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
I have been working with dplyr and na.locf from the zoo package. So far my approach has been this:
library(dplyr)
library(zoo)
df %>%
  group_by(ID) %>%
  mutate(values = na.locf(values, na.rm = FALSE))
However, this only does last observation carried forward. The fromLast argument of na.locf does last observation carried backward.
But how do I combine the two, so that:
na.locf() is used if there are no NA values before the first non-NA value;
na.locf(fromLast = TRUE), i.e. last observation carried backward, is used if there are NA values before the first non-NA value.
Thank you so much in advance!
This should work:
library(tidyverse)
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
df2 <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))  # expected output
df <- df %>%
  group_by(ID) %>%
  fill(values, .direction = "downup") %>%  # fill down first, then fill leading NAs upward
  ungroup()
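If you would rather avoid tidyr and zoo, the same down-then-up fill can be sketched in base R. `locf()` and `fill_downup()` are hypothetical helper names, and the sketch assumes the group column is `ID` as in the question.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
                 values = c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))

# Last observation carried forward within one vector
locf <- function(v) {
  idx <- cummax(seq_along(v) * !is.na(v))  # position of the latest non-NA so far (0 = none yet)
  idx[idx == 0L] <- NA                     # leading NAs have nothing to carry
  v[idx]
}

# Fill down first, then fill the remaining leading NAs upward
fill_downup <- function(v) rev(locf(rev(locf(v))))

# Apply per group; ave() returns the results in the original row order
df$values <- ave(df$values, df$ID, FUN = fill_downup)
df
```

Reversing, filling forward and reversing back is what turns the forward fill into a backward one for the leading NAs.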

Calculate the mean of a column and concatenate the comments in the next column

I want to calculate the mean of a column and also concatenate the texts of the second column in the output.
For example, below I want to calculate the mean of C1 and then concatenate all texts in C1T in the next column, if there is more than one text in C1T.
df <- data.frame(A1 = c("class","type","class","type","class","class","class","class","class"),
B1 = c("b2","b3","b3","b1","b3","b3","b3","b2","b1"),
C1=c(6, NA, 1, 6, NA, 1, 6, 6, 2),
C1T=c(NA, "Part of other business", NA, NA, NA, NA, NA, NA, NA),
C2=c(NA, 4, 1, 2, 4, 4, 3, 3, NA),
C2T=c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
C3=c(3, 4, 3, 3, 6, NA, 2, 4, 1),
C3T=c(NA, NA, NA, NA, "two part are available but not in source", NA, NA, NA, NA),
C4=c(5, 5, 2, NA, NA, 6, 4, 1, 2),
C4T=c(NA, NA, NA, NA, NA, NA, NA, "Critical Expert", NA),
C5=c(6, 2, 6, 4, 2, 2, 5, 4, 1),
C5T=c(NA, NA, NA, NA, NA, "most of things are stuck", "weather responsible", NA, NA))
library(dplyr)
var <- "C1"
var1 <- "C1T"
var <- rlang::parse_expr(var)
var1 <- rlang::parse_expr(var1)
df1 <- df %>% filter(A1 == "class")
T1 <- df1 %>% group_by(B1) %>% summarise(mean = round(mean(!!var, na.rm = TRUE), 1))
Comments <- df1 %>% group_by(B1) %>% summarise_at(vars(!!var1), paste0, collapse = " ") %>%
  select(!!var1) %>% unlist() %>% gsub("NA", "", .) %>% stringi::stri_trim_both()
cbind(T1, Comments)
Edited Answer:
var <- "C1"
var1 <- "C1T"
filtercol <- "A1"
filterval <- "class"
groupingvar <- "B1"
var <- rlang::parse_expr(var)
var1 <- rlang::parse_expr(var1)
filtercol <- rlang::parse_expr(filtercol)
groupingvar <- rlang::parse_expr(groupingvar)
library(dplyr)
df1 <- df %>% filter(!!filtercol == filterval)
T1 <- df1 %>% group_by(!!groupingvar) %>% summarise(mean=round(mean(as.numeric(!!var),na.rm = TRUE),1))
Comments <- df1 %>% select(!!groupingvar, !!var1) %>%
group_by(!!groupingvar) %>%
summarise_at(vars(!!var1), paste0, collapse = " ") %>%
select(!!var1) %>% unlist() %>% gsub("NA", "", .) %>%
stringi::stri_trim_both()
T1 <- cbind(T1,Comments)
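For comparison, the same parameterized summary can be sketched in base R with tapply(). Pasting only the non-NA entries avoids the gsub("NA", "", .) step, which would also strip the letters "NA" from inside a real comment. `group_mean_comments()` is a hypothetical helper, and only the columns it needs are recreated here.

```r
df <- data.frame(
  A1 = c("class", "type", "class", "type", "class", "class", "class", "class", "class"),
  B1 = c("b2", "b3", "b3", "b1", "b3", "b3", "b3", "b2", "b1"),
  C1 = c(6, NA, 1, 6, NA, 1, 6, 6, 2),
  C1T = c(NA, "Part of other business", NA, NA, NA, NA, NA, NA, NA),
  C3 = c(3, 4, 3, 3, 6, NA, 2, 4, 1),
  C3T = c(NA, NA, NA, NA, "two part are available but not in source", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-group mean of value_col plus the non-NA texts of
# comment_col pasted together, for rows where filter_col == filter_val
group_mean_comments <- function(data, value_col, comment_col, group_col = "B1",
                                filter_col = "A1", filter_val = "class") {
  d <- data[data[[filter_col]] == filter_val, ]
  g <- d[[group_col]]
  means <- tapply(d[[value_col]], g, function(x) round(mean(x, na.rm = TRUE), 1))
  comments <- tapply(d[[comment_col]], g, function(x) paste(x[!is.na(x)], collapse = " "))
  data.frame(group = names(means), mean = as.vector(means),
             Comments = as.vector(comments), stringsAsFactors = FALSE)
}

group_mean_comments(df, "C1", "C1T")
res <- group_mean_comments(df, "C3", "C3T")
res
```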
Update on OP's request (see comments):
library(dplyr)
library(tidyr)
# helper: first non-NA value of a vector (NA if there is none)
coalesce_by_column <- function(x) {
  x[!is.na(x)][1]
}
df %>%
pivot_longer(
cols = contains("T"),
names_to = "names",
values_to = "values"
) %>%
filter(names == "C1T") %>%
group_by(names) %>%
summarise(Mean = mean(c_across(C1:C5 & where(is.numeric)), na.rm = TRUE),
Comments = coalesce_by_column(values))
Output:
names Mean Comments
<chr> <dbl> <chr>
1 C1T 3.47 Part of other business
First answer:
Use coalesce() to construct the Comments column, and rowwise() with c_across() to calculate the mean per row.
In case you need to group, you can use group_by().
library(dplyr)
df %>%
mutate(Comments = coalesce(C1T, C2T, C3T, C4T, C5T),.keep="unused") %>%
rowwise() %>%
mutate(Mean = mean(c_across(C1:C5 & where(is.numeric)), na.rm = TRUE)) %>%
select(A1, B1, Mean, Comments)
Output:
A1 B1 Mean Comments
<chr> <chr> <dbl> <chr>
1 class b2 5 NA
2 type b3 3.75 Part of other business
3 class b3 2.6 NA
4 type b1 3.75 NA
5 class b3 4 two part are available but not in source
6 class b3 3.25 most of things are stuck
7 class b3 4 weather responsible
8 class b2 3.6 Critical Expert
9 class b1 1.5 NA
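Without dplyr, the row-wise version can be sketched with rowMeans() and apply(). Only the C1 to C5T columns are recreated (with the C4 comment column named C4T), and `first_non_na()` is a hypothetical helper playing the role of coalesce().

```r
df <- data.frame(
  C1 = c(6, NA, 1, 6, NA, 1, 6, 6, 2),
  C1T = c(NA, "Part of other business", NA, NA, NA, NA, NA, NA, NA),
  C2 = c(NA, 4, 1, 2, 4, 4, 3, 3, NA),
  C2T = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
  C3 = c(3, 4, 3, 3, 6, NA, 2, 4, 1),
  C3T = c(NA, NA, NA, NA, "two part are available but not in source", NA, NA, NA, NA),
  C4 = c(5, 5, 2, NA, NA, 6, 4, 1, 2),
  C4T = c(NA, NA, NA, NA, NA, NA, NA, "Critical Expert", NA),
  C5 = c(6, 2, 6, 4, 2, 2, 5, 4, 1),
  C5T = c(NA, NA, NA, NA, NA, "most of things are stuck", "weather responsible", NA, NA),
  stringsAsFactors = FALSE
)

num_cols <- paste0("C", 1:5)        # "C1" ... "C5"
txt_cols <- paste0(num_cols, "T")   # "C1T" ... "C5T"

# Row-wise mean of the numeric columns, ignoring NAs
df$Mean <- rowMeans(df[num_cols], na.rm = TRUE)

# First non-NA comment in each row (NA if the row has none)
first_non_na <- function(x) x[!is.na(x)][1]
df$Comments <- unname(apply(df[txt_cols], 1, first_non_na))
df[c("Mean", "Comments")]
```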

How do I create a data frame with the column name, number of missing values and their percentage?

Missing_Values = data.frame(colSums(is.na(train)))
Missing_Values_per = data.frame(colMeans(is.na(train))) * 100
data.frame(Column_Name = names(train))
I need to create a data frame from these three variables. Could someone help with this?
Try this:
library(tidyverse)
train <- tibble(a = c(NA, 1, 4, NA, NA),
b = c(6, NA, NA, NA, NA))
train %>%
gather(column_name, v) %>%
group_by(column_name) %>%
summarize(missing_values = sum(is.na(v)),
missing_values_per = mean(is.na(v)) * 100)
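Since the question already has the three pieces, they can also be combined directly in base R. `missing_summary` is just an illustrative name, and `train` is the same toy data as above.

```r
train <- data.frame(a = c(NA, 1, 4, NA, NA),
                    b = c(6, NA, NA, NA, NA))

# Combine the three expressions from the question into one data frame
missing_summary <- data.frame(
  Column_Name = names(train),
  Missing_Values = colSums(is.na(train)),
  Missing_Values_per = colMeans(is.na(train)) * 100,
  row.names = NULL,
  stringsAsFactors = FALSE
)
missing_summary
```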

Multiple columns processing and dynamically naming new columns

Values of the same variable were mistakenly entered into multiple columns, e.g. "aaa_1", "aaa_2" and "aaa_3", or "ccc_1", "ccc_2" and "ccc_3". I need to create a single new column for each (e.g. "aaa" or "ccc"). Some variables are currently in a single column ("hhh_1"), but more columns may be added later (hhh_2, etc.).
This is what I got:
aaa_1 <- c(43, 23, 65, NA, 45)
aaa_2 <- c(NA, NA, NA, NA, NA)
aaa_3 <- c(NA, NA, 92, NA, 82)
ccc_1 <- c("fra", NA, "spa", NA, NA)
ccc_2 <- c(NA, NA, NA, "wez", NA)
ccc_3 <- c(NA, "ija", NA, "fda", NA)
ccc_4 <- c(NA, NA, NA, NA, NA)
hhh_1 <- c(183, NA, 198, NA, 182)
dataf1 <- data.frame(aaa_1,aaa_2,aaa_3,ccc_1,ccc_2, ccc_3,ccc_4,hhh_1)
This is what I want:
aaa <- c(43, 23, NA, NA, NA)
ccc <- c("fra", "ija", "spa", NA, NA)
hhh <- c(183, NA, 198, NA, 182)
dataf2 <- data.frame(aaa,ccc,hhh)
A general solution is needed, as there are ~100 variables (e.g. "aaa", "ccc", "hhh", "ttt", "eee", etc.).
Thanks!
This is a base solution, i.e. no packages.
First define get_only which, when given a list, converts it to a data.frame and applies get_only to each row. When given a vector, it returns the single non-NA value in it, or NA if there is not exactly one.
Define root to be the column names without the suffixes.
Convert the data frame to a list of columns, group them by root and apply get_only to each such group.
Finally, convert the resulting list to a data frame.
get_only <- function(x) UseMethod("get_only")
get_only.list <- function(x) apply(data.frame(x), 1, get_only)
get_only.default <- function(x) if (sum(!is.na(x)) == 1) na.omit(x) else NA
root <- sub("_\\d+$", "", names(dataf1))  # strip the numeric suffix, e.g. "aaa_1" -> "aaa"
as.data.frame(lapply(split(as.list(dataf1), root), FUN = get_only))
giving:
  aaa  ccc hhh
1  43  fra 183
2  23  ija  NA
3  NA  spa 198
4  NA <NA>  NA
5  NA <NA> 182
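A vectorized variant of the same idea, sketched under the assumption that suffixes are always an underscore followed by digits. Instead of apply(), which converts each group to a matrix and loops over rows, `collapse_group()` (a hypothetical name) works column by column: it counts the non-NA values per row and keeps the first one only when that count is exactly 1.

```r
dataf1 <- data.frame(
  aaa_1 = c(43, 23, 65, NA, 45), aaa_2 = c(NA, NA, NA, NA, NA),
  aaa_3 = c(NA, NA, 92, NA, 82),
  ccc_1 = c("fra", NA, "spa", NA, NA), ccc_2 = c(NA, NA, NA, "wez", NA),
  ccc_3 = c(NA, "ija", NA, "fda", NA), ccc_4 = c(NA, NA, NA, NA, NA),
  hhh_1 = c(183, NA, 198, NA, 182),
  stringsAsFactors = FALSE
)

# Collapse a list of same-length columns into one: keep the value when exactly
# one column is non-NA in that row, otherwise NA
collapse_group <- function(cols) {
  n_ok <- Reduce(`+`, lapply(cols, function(v) !is.na(v)))        # non-NA count per row
  merged <- Reduce(function(a, b) ifelse(is.na(a), b, a), cols)  # first non-NA per row
  merged[n_ok != 1] <- NA
  merged
}

root <- sub("_\\d+$", "", names(dataf1))  # "aaa_1" -> "aaa"
dataf2 <- as.data.frame(lapply(split(as.list(dataf1), root), collapse_group),
                        stringsAsFactors = FALSE)
dataf2
```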
We may try with splitstackshape
library(splitstackshape)
nm1 <- sub("_\\d+", "", names(dataf1))
tbl <- table(nm1) > 1
merged.stack(dataf1, var.stubs = names(tbl)[tbl], sep="_")
I'm not sure your example is right. For example, in the third row you've got values for both aaa_1 and aaa_3 (age_1 and age_3 below), yet the desired output has NA for that row.
If I've understood what you're trying to do, though, it will be much easier if you reshape columns to rows, fix them and then reshape back again. Try this as a starting point using dplyr and tidyr from the 'tidyverse'.
library(tidyverse)
library(stringr)
age_1 <- c(43, 23, 65, NA, 45)
age_2 <- c(NA, NA, NA, NA, NA)
age_3 <- c(NA, NA, 92, NA, 82)
country_1 <- c("fra", NA, "spa", NA, NA)
country_2 <- c(NA, NA, NA, "wez", NA)
country_3 <- c(NA, "ija", NA, "fda", NA)
country_4 <- c(NA, NA, NA, NA, NA)
hight_1 <- c(183, NA, 198, NA, 182)
dataf1 <- data.frame(age_1,age_2,age_3,country_1,country_2, country_3,country_4,hight_1)
data <- dataf1 %>%
  mutate(row_num = row_number()) %>%                # create a row number to track values
  gather(key, value, -row_num) %>%                  # flatten your data
  drop_na() %>%                                     # drop NA rows
  mutate(key = str_replace(key, "_\\d+$", "")) %>%  # remove the '_x' suffix from names
  group_by(row_num, key) %>%
  top_n(1, value) %>%                               # keep one value per row/variable
  ungroup() %>%
  spread(key, value)                                # pivot back to columns
For your example you need the group_by(), top_n() and ungroup() lines to make it run, because you've got multiple values for the same variable in the same row. If you only have one value (as I think you should?), you can remove these lines. It will be better without them, because spread() will then fail if your data is wrong.
Edit following comment below. This will make any duplicated entries NA.
data <- dataf1 %>%
  mutate(row_num = row_number()) %>%                # create a row number to track values
  gather(key, value, -row_num) %>%                  # flatten your data
  drop_na() %>%                                     # drop NA rows
  mutate(key = str_replace(key, "_\\d+$", "")) %>%  # remove the '_x' suffix from names
  group_by(row_num, key) %>%
  filter(n() == 1) %>%                              # drop row/variable combos with duplicates
  ungroup() %>%
  spread(key, value) %>%                            # pivot back to columns; dropped combos become NA
  complete(row_num = seq_len(nrow(dataf1)))         # restore rows left with no values
