Do tidygraph/igraph have a built-in random forest crawling algorithm?

library(tidygraph)
library(dplyr)
library(tidyr)  # for unnest()
set.seed(1)
play_erdos_renyi(n = 1000, m = 20000) %>%
  mutate(name = cur_group_rows()) -> a
I would like to "crawl" within a tidygraph starting from a set of roots.
For example, I could sample a forest of Galton-Watson Trees with:
a %>% as_tibble() %>% sample_n(10) %>%
  pull(name) -> roots

a %>% group_by(name) %>%
  mutate(crawl_stage = ifelse(name %in% roots, 0, NA),
         parent = NA,
         k = ifelse(crawl_stage == 0,
                    rpois(1, .5),
                    0)
  ) %>% ungroup() -> a

a %>% as_tibble %>%
  filter(crawl_stage == 0,
         k > 0) %>% select(name, ego, k) %>% rowwise() %>%
  mutate(picks = list(sample(ego, k))) %>%
  select(name = picks, parent = name) %>%
  unnest(cols = c(name)) -> stage

a %>% group_by(name) %>%
  mutate(crawl_stage = ifelse(name %in% stage$name, 1, crawl_stage),
         parent = ifelse(name %in% stage$name,
                         stage$parent[stage$name == name],
                         parent),
         k = ifelse(crawl_stage == 1,
                    rpois(1, .5),
                    k)
  ) %>% ungroup() -> a
Then I just have to loop over the stages i until sum(k[crawl_stage == i]) == 0.
But I am curious whether this algorithm is already implemented in these packages. I suspect it is, though perhaps not with the tracking column parent that I set up.
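For comparison, the staged crawl described above can be written as a plain loop over igraph's neighbors(). This is only a sketch under stated assumptions, not a built-in from either package: crawl_forest() and its lambda argument are hypothetical names, and unlike the code above it samples only among neighbours that have not been visited yet, so every node keeps a single parent.

```r
library(igraph)

# Hypothetical helper (not part of igraph or tidygraph): grow a forest of
# Galton-Watson-like trees over graph `g`, starting from `roots`.
crawl_forest <- function(g, roots, lambda = 0.5) {
  n <- vcount(g)
  stage <- rep(NA_integer_, n)   # crawl stage at which each node was reached
  parent <- rep(NA_integer_, n)  # node from which each node was reached
  stage[roots] <- 0L
  frontier <- roots
  i <- 0L
  while (length(frontier) > 0) {
    i <- i + 1L
    nxt <- integer(0)
    for (v in frontier) {
      nb <- as.integer(neighbors(g, v))
      nb <- nb[is.na(stage[nb])]               # assumption: unvisited neighbours only
      k <- min(rpois(1, lambda), length(nb))   # offspring count, capped by availability
      if (k > 0) {
        picks <- nb[sample.int(length(nb), k)]
        stage[picks] <- i
        parent[picks] <- v
        nxt <- c(nxt, picks)
      }
    }
    frontier <- nxt  # the loop stops once a stage produces no offspring
  }
  data.frame(name = seq_len(n), crawl_stage = stage, parent = parent)
}

set.seed(1)
g <- sample_gnm(1000, 20000)
roots <- sample(vcount(g), 10)
res <- crawl_forest(g, roots)
```

The returned data frame has one row per node, so it could be joined back onto the node table of a tidygraph object by name.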

Related

How to make this function dynamic? R function

I have a function in R that generates a table from a data frame, and every time I want to use a different variable (a column name from the data frame) I have to repeat the code. Sometimes it is variablea, sometimes variableb, other times variablec, and so on.
generates_table_variablea <- function(data) { ## how to pass the column = variablea here like this
####### function(data, column = variablea) .. ???
big_data <- data %>%
group_by(a, b, c, d) %>%
mutate(total_categoria_abs = sum(abs(f))) %>%
mutate(volume_negativo = if_else(variablea < 0, f, 0)) %>%
mutate(volume_positivo = if_else(variablea > 0, f, 0)) %>%
mutate(total = sum(volume_positivo) - sum(volume_negativo)) %>%
mutate(e = if_else(variablea < 0, sum(variablea), 0)) %>%
ungroup() %>%
filter (variablea < 0) %>%
group_by(a, b, c, d) %>%
summarise(e = mean(e), vendas = sum(f*-1), frac_vendas = vendas*-1/mean(total_categoria_abs)) %>%
arrange(e) %>%
ungroup()
big_data$frac_vendas <- round(big_data$frac_vendas, digits = 2)
big_data$e <- round(big_data$e, digits = 0)
}
If I want to change this variable, I have to do the following:
generates_table_variableb <- function(data) { ## HERE IT WILL BE function(data, column = variableb)...
big_data <- data %>%
group_by(a, b, c, d) %>%
mutate(total_categoria_abs = sum(abs(f))) %>%
mutate(volume_negativo = if_else(variableb < 0, f, 0)) %>% #### HERE I NEED TO CHANGE ALWAYS TO VARIABLEA, VARIABLEB, VARIABLEC...
mutate(volume_positivo = if_else(variableb > 0, f, 0)) %>%
mutate(total = sum(volume_positivo) - sum(volume_negativo)) %>%
mutate(e = if_else(variableb < 0, sum(variableb), 0)) %>%
ungroup() %>%
filter (variableb < 0) %>%
group_by(a, b, c, d) %>%
summarise(e = mean(e), vendas = sum(f*-1), frac_vendas = vendas*-1/mean(total_categoria_abs)) %>%
arrange(e) %>%
ungroup()
big_data$frac_vendas <- round(big_data$frac_vendas, digits = 2)
big_data$e <- round(big_data$e, digits = 0)
}
Having multiple functions doing the same thing is slowing down my code...
How could this be better? All that I want is to pass this column dynamically.

This is one way:
library(dplyr)
x <- data.frame(v1=1:3, v2=4:6)
f <- function(data, var1){
  data %>% select(!!var1)  # use the `data` argument, not the global `x`
}
f(x, quo(v1))
You can see more explanation in https://adv-r.hadley.nz/quasiquotation.html

I found another way that works too:
generates_table_variablea <- function(dataframe, variable) { ## pass the column name as a string, e.g. "variablea"
  big_data <- dataframe %>%
    group_by(a, b, c, d) %>%
    mutate(total_categoria_abs = sum(abs(f))) %>%
    mutate(volume_negativo = if_else(.data[[variable]] < 0, f, 0)) %>%
    mutate(volume_positivo = if_else(.data[[variable]] > 0, f, 0)) %>%
    mutate(total = sum(volume_positivo) - sum(volume_negativo)) %>%
    mutate(e = if_else(.data[[variable]] < 0, sum(.data[[variable]]), 0)) %>%
    ungroup() %>%
    filter(.data[[variable]] < 0) %>%
    group_by(a, b, c, d) %>%
    summarise(e = mean(e), vendas = sum(f*-1), frac_vendas = vendas*-1/mean(total_categoria_abs)) %>%
    arrange(e) %>%
    ungroup()
  big_data$frac_vendas <- round(big_data$frac_vendas, digits = 2)
  big_data$e <- round(big_data$e, digits = 0)
  big_data  ## return the table itself rather than the value of the last assignment
}
Just replace the variable with .data[[variable]], and you can pass any column name (as a string) into the function.
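Besides !! with quo() and .data[[variable]], dplyr functions also accept the "embrace" operator {{ }}, which lets the caller pass a bare column name. A minimal sketch on made-up data, assuming a reasonably recent dplyr/rlang (sum_negative() and the toy data frame are hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical helper: sum the negative values of whichever column is passed.
sum_negative <- function(data, column) {
  data %>%
    filter({{ column }} < 0) %>%          # {{ }} forwards the bare column name
    summarise(total = sum({{ column }}))
}

# Toy data, invented for illustration.
df <- data.frame(variablea = c(-1, 2, -3), variableb = c(4, -5, 6))

res_a <- sum_negative(df, variablea)
res_b <- sum_negative(df, variableb)
```

With this style the column is written unquoted at the call site, while .data[[variable]] is the better fit when the column name arrives as a string.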

R sf::st_convex_hull() losing sf class on data with multiple groups

I am trying to create a 95% minimum convex polygon using sf in R. My code works fine as long as I only group my data on 1 variable, but when I group on two variables, the output loses its sf class and becomes a grouped_df instead.
Here is a toy dataset as an example
library(dplyr)
set.seed(12)
toy <- tibble::tibble(
ID = rep(c(1,2), each = 10),
year = rep(c(1,2), 10),
lat = runif(20, 1, 10),
long = runif(20, 1, 10)
) %>%
sf::st_as_sf(., coords = c("long", "lat"))
toy %>%
group_by(ID) %>%
summarize(.groups = "keep") %>%
mutate(cent = sf::st_centroid(geometry)) %>%
sf::st_cast(to = "POINT") %>%
mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
filter(dist <= quantile(dist, .95)) %>%
summarize() %>%
sf::st_convex_hull() %>%
class()
This gives the output I want. But when I try to group by two variables, the result loses the sf class.
toy %>%
group_by(ID, year) %>%
summarize(.groups = "keep") %>%
mutate(cent = sf::st_centroid(geometry)) %>%
sf::st_cast(to = "POINT") %>%
mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
filter(dist <= quantile(dist, .95)) %>%
summarize() %>%
sf::st_convex_hull() %>%
class
Is there something in my code keeping me from being able to group on two variables?

It's because the second summarize drops the last grouping level (year) and leaves the result grouped by ID. You need .groups = "keep" there to pass the same grouping through, and then convert back to an sf object. Alternatively, you could create a single grouping variable with mutate(grp = paste0(ID, year)) and do group_by(grp).
toy %>%
  group_by(ID, year) %>%
  summarize(.groups = "keep") %>%
  mutate(cent = sf::st_centroid(geometry)) %>%
  sf::st_cast(to = "POINT") %>%
  mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
  filter(dist <= quantile(dist, .95)) %>%
  summarize(.groups = "keep") %>%
  sf::st_convex_hull() %>%
  sf::st_sf()
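The grouping behaviour behind this can be seen without sf at all. The sketch below uses a plain tibble with made-up values (an assumption for illustration only) to show that summarise() on one grouping variable returns an ungrouped result, while on two grouping variables it leaves the result grouped by the first one:

```r
library(dplyr)

# Toy data, invented for illustration; no spatial columns involved.
df <- tibble(
  ID   = rep(c(1, 2), each = 4),
  year = rep(c(1, 2), 4),
  x    = 1:8
)

# One grouping variable: the only grouping level is dropped.
one <- df %>% group_by(ID) %>% summarise(m = mean(x))

# Two grouping variables: only the last level (year) is dropped.
two <- df %>% group_by(ID, year) %>% summarise(m = mean(x), .groups = "drop_last")

is_grouped_df(one)
is_grouped_df(two)
group_vars(two)
```

Here "drop_last" is spelled out only to make the default for this case explicit and to silence the informational message.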

A question, in a certain context, on the meaning of complete, nesting, and replace

In
my_Orders = PreviousOrders %>%
full_join(select(Other_Orders_Redist, Type, Date, ReValue = Previous_Sales, PreviousWeeks),
by = c('Type', 'Send_Date' = 'Date', 'WeeksPrior')) %>%
mutate(Previous_Sales = ifelse(is.na(ReValue), Previous_Sales, ReValue)) %>%
select(Type, Date = Send_Date, Previous_Sales, PreviousWeeks) %>%
complete(Date = seq.Date(min(HistoricalSales$Date), max(Previous_weeks_orders$Send_Date), by="week"), nesting(Type, PreviousWeeks)) %>%
filter(Date <= FinalWeek + weeks(PreviousWeeks + 1)) %>%
mutate(PreviousWeeks = paste0('Orders_Min_', PreviousWeeks + 1, '_Weeks')) %>%
replace(., is.na(.), 0) %>%
spread(PreviousWeeks, Previous_Sales)
What do the following lines mean? Can someone explain them in the context of the example?
Line #1
complete(Date = seq.Date(min(HistoricalSales$Date), max(Previous_weeks_orders$Send_Date), by="week"), nesting(Type,PreviousWeeks)) %>%
Line #2
replace(., is.na(.), 0) %>%
My guess is that the replace replaces all of the NA in the data frame by 0 but I am not sure. Is that indeed the case?
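A small experiment on made-up data may help with both lines. In this sketch (the orders data frame and all_weeks vector are invented for illustration), complete() adds a row for every weekly Date combined with every (Type, PreviousWeeks) pair that actually occurs in the data, because nesting() restricts the expansion to observed pairs. The new rows get NA in the remaining columns, and replace(., is.na(.), 0) then turns those NA values into 0:

```r
library(dplyr)
library(tidyr)

# Toy data, invented for illustration.
orders <- data.frame(
  Date = as.Date(c("2020-01-06", "2020-01-20")),
  Type = c("A", "B"),
  PreviousWeeks = c(1, 2),
  Previous_Sales = c(10, 20)
)

# Every week between the first and last date (three Mondays here).
all_weeks <- seq.Date(min(orders$Date), max(orders$Date), by = "week")

# 3 dates x 2 observed (Type, PreviousWeeks) pairs = 6 rows;
# the 4 rows that were not in `orders` have Previous_Sales = NA.
completed <- orders %>%
  complete(Date = all_weeks, nesting(Type, PreviousWeeks))

# Replace every NA cell in the data frame with 0.
filled <- completed %>%
  replace(., is.na(.), 0)
```

Without nesting(), that is with complete(Date = all_weeks, Type, PreviousWeeks), the pairs ("A", 2) and ("B", 1) would be generated as well, even though they never occur in the data.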

Trouble using base-R `rep()` with dplyr

library(tidyverse)
x <- c(0, 20, 30, 58)
n <- 100
df <- data_frame(x, n) %>%
distinct() %>%
filter(x >= 0 & x < n) %>%
arrange(x) %>%
bind_rows(data_frame(x = n)) %>%
mutate(lag_x = lag(x)) %>%
mutate(y = x - lag_x) %>%
filter(!is.na(y))
rep(seq_along(df$x), df$y)
The code above works fine. When I try to pipe the last step onto the rest, it breaks. What's wrong with my syntax? I want everything to be one long pipe. Is it impossible because I'd need the whole pipe (the code chunk above) nested inside my rep() call?
library(tidyverse)
x <- c(0, 20, 30, 58)
n <- 100
df <- data_frame(x, n) %>%
distinct() %>%
filter(x >= 0 & x < n) %>%
arrange(x) %>%
bind_rows(data_frame(x = n)) %>%
mutate(lag_x = lag(x)) %>%
mutate(y = x - lag_x) %>%
filter(!is.na(y)) %>%
rep(seq_along(x), y) %>%
print()
Error in function_list[i] : object 'y' not found

Because rep() is called outside the mutate/summarise functions, the columns x and y are not in scope, so we need to pull or extract them from the piped data frame:
data_frame(x, n) %>%
distinct() %>%
filter(x >= 0 & x < n) %>%
arrange(x) %>%
bind_rows(data_frame(x = n)) %>%
mutate(lag_x = lag(x)) %>%
mutate(y = x - lag_x) %>%
filter(!is.na(y)) %>%
{rep(seq_along(.$x), .$y)}
Or it can also be written as
data_frame(x, n) %>%
distinct() %>%
filter(x >= 0 & x < n) %>%
arrange(x) %>%
bind_rows(data_frame(x = n)) %>%
mutate(lag_x = lag(x)) %>%
mutate(y = x - lag_x) %>%
filter(!is.na(y)) %>%
summarise(n = list(rep(row_number(), y))) %>%
pull(n)
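Another option in the same spirit is base R's with(), which evaluates an expression with the data frame's columns in scope. A minimal sketch, assuming a data frame shaped like the result of the pipe above (the values are written out by hand here):

```r
library(dplyr)

# Hand-written stand-in for the data frame produced by the pipe above.
df <- tibble(x = c(20, 30, 58, 100), y = c(20, 10, 28, 42))

# The pipe passes df as the first argument of with(); inside the
# expression, x and y refer to the columns of df.
out <- df %>%
  with(rep(seq_along(x), y))
```

The result is a plain integer vector, just like the rep() call outside the pipe.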

R - dplyr bootstrap issue

I have an issue understanding how to use the dplyr bootstrap function properly.
What I want is to generate a bootstrap distribution from two randomly assigned groups and compute the difference in means, like this for example :
library(dplyr)
library(broom)
data(mtcars)
mtcars %>%
mutate(treat = sample(c(0, 1), 32, replace = T)) %>%
group_by(treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
The issue is that I need to repeat this operation 100, 1000, or more times.
Using replicate, I can do
frep = function(mtcars) mtcars %>%
mutate(treat = sample(c(0, 1), 32, replace = T)) %>%
group_by(treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
replicate(1000, frep(mtcars = mtcars), simplify = T) %>% unlist()
and get the distribution
I don't really get how to use bootstrap here. How should I start?
mtcars %>%
bootstrap(10) %>%
mutate(treat = sample(c(0, 1), 32, replace = T))
mtcars %>%
bootstrap(10) %>%
do(tidy(treat = sample(c(0, 1), 32, replace = T)))
It's not really working. Where should I put the bootstrap pipe?
Thanks.

In the do step, we wrap with data.frame and create the 'treat' column, then we can group by 'replicate' and 'treat' to get the summarised output column
mtcars %>%
bootstrap(10) %>%
do(data.frame(., treat = sample(c(0,1), 32, replace=TRUE))) %>%
group_by(replicate, treat) %>%
summarise(m = mean(disp)) %>%
summarise(m = m[treat == 1] - m[treat == 0])
#or, as 0 occurs first and 1 second, we can also use
#summarise(m = last(m) - first(m))
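In case bootstrap() is not available in the installed broom version, the same replicate-then-summarise idea can be sketched with dplyr alone. This is an assumption-laden alternative, not the approach from the answer: each replicate is a resample of mtcars with replacement via slice_sample() (which needs dplyr 1.0 or later), and n_rep and boot are hypothetical names.

```r
library(dplyr)

set.seed(42)
n_rep <- 100  # number of bootstrap replicates (arbitrary choice)

boot <- bind_rows(lapply(seq_len(n_rep), function(i) {
  mtcars %>%
    slice_sample(prop = 1, replace = TRUE) %>%   # resample rows with replacement
    mutate(replicate = i,
           treat = sample(c(0, 1), n(), replace = TRUE))
})) %>%
  group_by(replicate, treat) %>%
  summarise(m = mean(disp), .groups = "drop_last") %>%
  summarise(m = last(m) - first(m))  # treat == 1 minus treat == 0, per replicate
```

The result has one row per replicate, so boot$m is the simulated distribution of the difference in means.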
