Using `map` instead of `apply` inside dplyr::mutate - r

"Which 4-digit number, multiplied by 4, gives itself reversed?"
Here is an attempt to solve this relatively easy math problem using R:
library(tidyverse)
library(stringi)
expand.grid(replicate(4, 0:9, simplify = FALSE)) %>%
filter(Var1 !=0, Var4 !=0) %>%
transmute(newcol=as.numeric(do.call(paste0,.))) %>%
filter(newcol<2500) %>%
mutate(newcol2=4*newcol) %>%
filter(newcol==stri_reverse(newcol2))
A second approach was this:
expand.grid(replicate(4, 0:9, simplify = FALSE)) %>%
filter(Var1 !=0, Var4 !=0) %>%
transmute(newcol=as.numeric(do.call(paste0,.))) %>%
filter(newcol<2500) %>%
mutate(newcol2=4*newcol) %>%
filter(newcol==apply(.[c("newcol2")],1,function(x) paste0(floor(x/ 10^(0:(nchar(x) - 1))) %% 10,collapse="")))
Can you show me how to use purrr::map instead of apply in the final step?

You can change the last filter call to the following:
filter(newcol == map(newcol2, ~paste0(floor(.x / 10^(0:(nchar(.x) - 1))) %% 10, collapse = "")))
or, since map() returns a list while map_chr() returns a character vector, this:
filter(newcol == map_chr(newcol2, ~paste0(floor(.x / 10^(0:(nchar(.x) - 1))) %% 10, collapse = "")))
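For reference, here is a self-contained sketch of the same search that uses map_chr for the digit reversal. The helper name `reverse_digits` is hypothetical, and the candidate range 1000:2499 is used directly in place of the expand.grid construction (any 4-digit number above 2499 gives a 5-digit product when multiplied by 4).

```r
library(dplyr)
library(purrr)

# Hypothetical helper: floor(x / 10^(0:(k - 1))) %% 10 yields the digits of x
# from least to most significant, so pasting them gives x reversed as a string
reverse_digits <- function(x) {
  paste0(floor(x / 10^(0:(nchar(x) - 1))) %% 10, collapse = "")
}

solution <- data.frame(newcol = 1000:2499) %>%
  mutate(newcol2 = 4 * newcol) %>%
  # map_chr returns a character vector, which == compares against newcol
  filter(newcol == map_chr(newcol2, reverse_digits))

solution
```

The only row kept is newcol = 2178 with newcol2 = 8712.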

Related

Formatting of Data Frames in R

I have a data.frame with the following structure:
If a value in the first column occurs more than once, I need all corresponding entries in column V18 to be combined into one cell.
I applied the following code:
p <- function(v) {
Reduce(f=paste0, x = v)
}
Data %>%
group_by(V1) %>%
summarise(test = p(as.character(V18))) %>%
merge(., M_TEST, by = 'V1') %>%
select(V1, V18, test)
It gives:
What I need is {43,44} instead of 4344.
How can I do this?
Thank you very much for your help!
Sincerely
Try this:
library(dplyr)
library(stringr)
Data %>%
group_by(V1) %>%
summarise(test = p(as.character(V18))) %>%
merge(., M_TEST, by = 'V1') %>%
select(V1, V18, test) %>%
mutate(test = str_remove_all(test, pattern = "NA")) %>%
mutate(test = formatC(as.numeric(test), big.mark=",", big.interval = 2L)) %>%
mutate(test = paste0("{", test, "}"))
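The formatC trick above only produces {43,44} when every V18 value has exactly two digits, because it inserts a comma after every second digit of the pasted number. A sketch of an alternative that joins the values with paste(collapse = ",") avoids that assumption; the toy Data below is hypothetical and stands in for the question's data.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's Data (V1 = key, V18 = values)
Data <- data.frame(V1 = c("a", "a", "b", "b"),
                   V18 = c(43, 44, 7, NA))

out <- Data %>%
  group_by(V1) %>%
  # Drop NAs, join the remaining values with commas, and wrap the result in braces
  summarise(test = paste0("{", paste(na.omit(V18), collapse = ","), "}"))

out$test
# [1] "{43,44}" "{7}"
```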
Edit: for multiple columns, this should work:
Data %>%
group_by(V1) %>%
summarise_at(vars(V2:V18), paste0, collapse="") %>%
mutate_at(vars(V2:V18), str_remove_all, pattern = "NA") %>%
mutate_at(vars(V2:V18), as.numeric) %>%
mutate_at(vars(V2:V18), formatC, big.mark=",", big.interval = 2L)
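The same comma-joining idea can be applied to a range of columns at once with across(), which requires dplyr 1.0.0 or later. This is a sketch on hypothetical toy data with columns V2:V3 in place of V2:V18.

```r
library(dplyr)

# Hypothetical toy data with a range of value columns V2:V3
Data <- data.frame(V1 = c("a", "a", "b"),
                   V2 = c(1, 2, 3),
                   V3 = c(43, 44, NA))

out <- Data %>%
  group_by(V1) %>%
  # For every column in the range, drop NAs and join the values as "{x,y,...}"
  summarise(across(V2:V3, ~ paste0("{", paste(na.omit(.x), collapse = ","), "}")))
```

A group whose values are all NA ends up as "{}".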

Comparing each value of each group to the minimum of the group

I have data grouped by years. I would like to replace every value x where x > (minimum + 2) with NA. The minimum changes each year.
I was thinking of extracting the minimum by year using ddply, but I don't know how to compare each value from each year to that year's minimum...
Thanks a lot.
New <- Ancian %>%
group_by(Years) %>%
mutate_if(New$Data1, ~ replace(., . > (min(., na.rm = TRUE) + 2), NA))
I tried this, but it doesn't work...
I also tried an "if" statement after a group_by, but got no result either...
group_by(Years) %>%
if(New$Data1 > (min(New$Data, na.rm = TRUE) + 2)) {
New$Data1 <- NA }
We can use mutate_if after grouping by 'years':
library(dplyr)
df1update <- df1 %>%
group_by(years) %>%
mutate_if(is.numeric, ~ replace(., . > (min(., na.rm = TRUE) + 2), NA))
If we need to do this for only a single variable:
df1update <- df1 %>%
group_by(years) %>%
mutate(Data1 = replace(Data1, Data1 > (min(Data1, na.rm = TRUE) + 2), NA))
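To see that the minimum really is computed per year, here is a small check on hypothetical toy data using the single-variable version. Because min() is evaluated inside the grouped mutate, each row is compared with its own year's minimum.

```r
library(dplyr)

# Hypothetical toy data: two years, one numeric column
df1 <- data.frame(years = c(2000, 2000, 2000, 2001, 2001, 2001),
                  Data1 = c(1, 2, 5, 10, 11, 13))

df1update <- df1 %>%
  group_by(years) %>%
  # min() is taken within each year, so the threshold is that year's minimum + 2
  mutate(Data1 = replace(Data1, Data1 > (min(Data1, na.rm = TRUE) + 2), NA)) %>%
  ungroup()

df1update$Data1
# [1]  1  2 NA 10 11 NA
```

In 2000 the threshold is 3, so 5 becomes NA; in 2001 it is 12, so 13 becomes NA.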

How to write a loop to repeat an entire block of code in R scripts?

I want to import 15 different datasets and clean them up. The raw dataset names are C1_1, C2_1, C3_1, ..., C15_1.
My code is as follows for the first dataset:
dataC1_1 <- read.delim("C1_1.txt",header = FALSE)
dataC1_1 <- dataC1_1[-1,-c(1,4,8:11)]
dataC1_1 <- na.omit(dataC1_1)
dataC1_1 <- dataC1_1[!(dataC1_1$V3=="Experiment"),]
dataC1_1 <- dataC1_1[!(dataC1_1$V5=="Key: Return"),]
dataC1_1 <- dataC1_1[order(dataC1_1$V6),]
dataC1_1$q_id <- strrep(c("q1","q2","q3","q4"),times = 1)
dataC1_1$response <- dataC1_1$V5 %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric
dataC1_1 <- dataC1_1[,-c(1,3,4)]
dataC1_1 <- setnames(dataC1_1,c("ad_id","rt","q_id","response"))
dataC1_1$id <- rep("C1",length(dataC1_1$q_id))
I have tried many times with while loops and if statements, but I just could not get it to repeat 15 times.
Could anyone help me out?
Thanks!
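Create a function that reads and cleans one dataset, then apply it to each file in a loop:
library(dplyr)
library(purrr)
library(stringr)
# Read and clean one raw file; nm is the file stem (e.g. "C1_1"), id_label the id (e.g. "C1")
f1 <- function(nm, id_label) {
read.delim(paste0(nm, ".txt"), header = FALSE) %>%
slice(-1) %>%
select(-c(1, 4, 8:11)) %>%
na.omit() %>%
filter(V3 != "Experiment", V5 != "Key: Return") %>%
arrange(V6) %>%
mutate(q_id = rep_len(c("q1", "q2", "q3", "q4"), n()),
# assumes each V5 entry holds a single number, as the original code does
response = as.numeric(str_extract(V5, "[0-9]+"))) %>%
select(-c(1, 3, 4)) %>%
set_names(c("ad_id", "rt", "q_id", "response")) %>%
mutate(id = id_label)
}
# Apply f1 to C1_1 ... C15_1, pairing each file with its id C1 ... C15
cleaned <- map2(sprintf("C%d_1", 1:15), str_c("C", 1:15), f1)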
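Since the question asks specifically for a loop, here is a minimal self-contained sketch of the same pattern with a plain for loop. It first writes two small hypothetical tab-delimited files (C1_1.txt and C2_1.txt) to tempdir() so that it runs on its own; the single mutate() stands in for the full cleaning pipeline.

```r
library(dplyr)

# Write two small hypothetical tab-delimited files so the loop has something to read
dir <- tempdir()
for (i in 1:2) {
  write.table(data.frame(V1 = c("x", "y"), V2 = c(i, i * 10)),
              file.path(dir, sprintf("C%d_1.txt", i)),
              sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
}

# Loop over the file stems, clean each file, and keep the results in a named list
results <- list()
for (i in 1:2) {
  nm <- sprintf("C%d_1", i)
  results[[nm]] <- read.delim(file.path(dir, paste0(nm, ".txt")), header = FALSE) %>%
    mutate(id = paste0("C", i))   # stand-in for the full cleaning steps
}

# Stack all cleaned datasets into one data frame
combined <- bind_rows(results)
```

Storing the results in a list, rather than creating dataC1_1, dataC2_1, ... as separate variables, keeps them easy to combine or iterate over afterwards.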

How to count the number of negative values for each observation of a certain variable

I want to calculate, for each observation, the total number of negative values in the previous 10 observations. I used the following code, but it does not work:
funda_addit <- funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(NEG_EARN = rollapply(ni, 10, sum (ni<0), partial=TRUE)) %>%
ungroup()
Specifically, I want to create a new variable "NEG_EARN": the number of negative values of the variable "ni" over the previous 10 observations (10 years in my data). I also tried the following code, but it does not work either:
funda_addit <- funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(NEG_EARN = rollapply(ni, 10, length(which(ni<0)), partial=TRUE)) %>%
ungroup()
You could create a vector cumsum(ni < 0) and then subtract from it a version of that vector lagged by 10 positions:
funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(neg_earn = {cs <- cumsum(ni < 0)
cs - lag(cs, 10, default = 0)})
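The cumsum trick works because cs[i] counts the negatives up to position i, so subtracting the count up to position i - k leaves the negatives among the last k values. A small hand-checkable sketch, using a hypothetical vector and a window of 3 instead of 10, compares it with zoo::rollapplyr. Note it calls dplyr::lag explicitly; stats::lag behaves differently on plain vectors.

```r
library(dplyr)
library(zoo)

# Hypothetical short vector; window of 3 instead of 10
ni <- c(-1, 2, -3, -4, 5)
k <- 3

# Running count of negatives minus the running count k positions earlier
cs <- cumsum(ni < 0)                              # 1 1 2 3 3
neg_cumsum <- cs - dplyr::lag(cs, k, default = 0) # 1 1 2 2 2

# Same quantity via a right-aligned rolling window (partial windows at the start)
neg_roll <- rollapplyr(ni, k, FUN = function(x) sum(x < 0), partial = TRUE)

all(neg_cumsum == neg_roll)
```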
This is equivalent to akrun's answer if you change rollapply to rollapplyr (tested using akrun's example data and his f1 function):
use_cumsum <-
funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(neg_earn = {cs <- cumsum(ni < 0)
cs - lag(cs, 10, default = 0)})
use_rollapply <-
funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(neg_earn = rollapplyr(ni, 10, FUN = f1, partial=TRUE))
all(use_cumsum == use_rollapply)
# [1] TRUE
Instead of passing sum(ni < 0), which is evaluated once on the whole column 'ni', we can pass an anonymous function (or create a named function) as FUN:
library(dplyr)
library(zoo)
f1 <- function(x) sum(x < 0)
funda_addit %>%
group_by(TICKER) %>%
arrange(year) %>%
mutate(NEG_EARN = rollapplyr(ni, 10, FUN = f1, partial=TRUE)) %>%
ungroup()
EDIT: changed rollapply to rollapplyr (based on comments from @IceCreamToucan)
data
set.seed(24)
funda_addit <- data.frame(TICKER = rep(LETTERS[1:3], each = 20),
year = 1921:1940, ni = rnorm(60))

R - dplyr bootstrap issue

I'm having trouble understanding how to use the bootstrap function (from broom) properly with dplyr.
What I want is to generate a bootstrap distribution from two randomly assigned groups and compute the difference in means, like this for example:
library(dplyr)
library(broom)
data(mtcars)
mtcars %>%
mutate(treat = sample(c(0, 1), 32, replace = T)) %>%
group_by(treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
The issue is that I need to repeat this operation 100, 1000, or more times.
Using replicate, I can do this:
frep = function(mtcars) mtcars %>%
mutate(treat = sample(c(0, 1), 32, replace = T)) %>%
group_by(treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
replicate(1000, frep(mtcars = mtcars), simplify = T) %>% unlist()
and get the distribution.
I don't really understand how to use bootstrap here. How should I start?
mtcars %>%
bootstrap(10) %>%
mutate(treat = sample(c(0, 1), 32, replace = T))
mtcars %>%
bootstrap(10) %>%
do(tidy(treat = sample(c(0, 1), 32, replace = T)))
It doesn't really work. Where should I put the bootstrap step in the pipe?
Thanks.
In the do step, we wrap with data.frame and create the 'treat' column; then we can group by 'replicate' and 'treat' to get the summarised output column:
mtcars %>%
bootstrap(10) %>%
do(data.frame(., treat = sample(c(0,1), 32, replace=TRUE))) %>%
group_by(replicate, treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
# or, since 0 occurs first and 1 second, we can also use
#summarise(m = last(m) - first(m))
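If bootstrap() is not available in your broom version (I believe it was later deprecated in favour of the rsample package), the same statistic as the question's frep can be replicated with purrr::map_dbl. This is a sketch; the helper name diff_means is hypothetical, and like the question's code it re-randomizes the group labels rather than resampling rows.

```r
library(dplyr)
library(purrr)

set.seed(1)

# Hypothetical helper: randomly assign treat, return mean(disp | 1) - mean(disp | 0)
diff_means <- function(df) {
  df %>%
    mutate(treat = sample(c(0, 1), n(), replace = TRUE)) %>%
    summarise(m = mean(disp[treat == 1]) - mean(disp[treat == 0])) %>%
    pull(m)
}

# 1000 replicates collected into a numeric vector
dist <- map_dbl(1:1000, ~ diff_means(mtcars))
```

map_dbl returns a plain numeric vector, so no unlist() step is needed.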
