combining groupby() and bi_class - r

The code below returns what I need. However, when I tried to include the group_by() function, I got an error:
Error in bi_class(., IDD_nhmap, x = Zip_Black, y = svi, style = "quantile", :
A logical scalar must be supplied for 'keep_factors'. Please provide either 'TRUE' or 'FALSE'.
# The code before including groupby function
IDD_nhmap <- bi_class(IDD_nhmap, x = Zip_Black, y = svi, style = "quantile", dim = 3)
# The code after including groupby function
IDD_nhmap <- IDD_nhmap %>%
group_by(ProjectID) %>%
bi_class(IDD_nhmap, x = Zip_Black, y = svi, style = "quantile", dim = 3)

TL;DR: remove the IDD_nhmap from your call to bi_class.
In your second use, you are passing the frame to bi_class twice which is incorrect.
In a %>%-pipe, the data as it appears in the pipe is passed as the first argument to the next function; this can be specified (or repeated) by using the . placeholder. Your code therefore is really something like this:
IDD_nhmap <- IDD_nhmap %>%
group_by(., ProjectID) %>%
bi_class(., IDD_nhmap, x = Zip_Black, y = svi, style = "quantile", dim = 3)
For group_by, this makes sense: it expects the first argument to be a frame (equivalent to .data = .) and all remaining unnamed arguments are taken as the symbols for grouping variables.
For bi_class, the . is placed in the first argument (.data = . again), which means your first unnamed argument is interpreted as the next not-yet-used argument. The arguments listed in ?bi_class are:
bi_class(.data, x, y, style, dim = 3, keep_factors = FALSE, dig_lab = 3)
Since you explicitly name x, y, style, and dim, the first unused argument is keep_factors, so your call is effectively:
IDD_nhmap <- IDD_nhmap %>%
group_by(., ProjectID) %>%
bi_class(., keep_factors = IDD_nhmap, x = Zip_Black, y = svi, style = "quantile", dim = 3)
which is obviously not correct. Your first step should be
IDD_nhmap <- IDD_nhmap %>%
group_by(ProjectID) %>%
bi_class(x = Zip_Black, y = svi, style = "quantile", dim = 3)
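The argument matching described above can be reproduced without biscale. This is a minimal sketch: fake_bi_class is a hypothetical stand-in that only copies the formals listed in ?bi_class and reports which formal each supplied argument was matched to.

```r
# Hypothetical stand-in: same formals as bi_class(), but it does no
# classification. It returns the names of the formals that R matched.
fake_bi_class <- function(.data, x, y, style, dim = 3,
                          keep_factors = FALSE, dig_lab = 3) {
  # match.call() returns the call with every argument named by its formal
  names(as.list(match.call())[-1])
}

df <- data.frame(a = 1:3, b = 4:6)

# Frame passed twice, as in the piped call: the second, unnamed copy
# falls into the first formal that is still free (keep_factors)
twice <- fake_bi_class(df, df, x = a, y = b, style = "quantile", dim = 3)

# Frame passed once: keep_factors is left at its default
once <- fake_bi_class(df, x = a, y = b, style = "quantile", dim = 3)

print(twice)
print(once)
```

Named arguments are matched first, and the remaining unnamed ones are then assigned to the unused formals in order, which is why the extra frame ends up in keep_factors.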
However, you are still not likely to get what you are hoping for. While I don't know the bi_class function personally, it does not appear to look for the grouping attributes that dplyr::group_by adds to the data, so the results from this call will likely be the same as your first (ungrouped) call. A hasty attempt at a per-group classification might be:
IDD_nhmap <- IDD_nhmap %>%
  group_by(ProjectID) %>%
  do(bi_class(., x = Zip_Black, y = svi, style = "quantile", dim = 3))
though do is superseded. Untested, perhaps you can try
IDD_nhmap <- IDD_nhmap %>%
  group_by(ProjectID) %>%
  summarize(
    bi = list(bi_class(cur_data(), x = Zip_Black, y = svi, style = "quantile", dim = 3))
  )
to get a nested result (bi will be a list-column); over to you how you intend to utilize this.
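If bi_class does ignore grouping, another option is to split the data by group, classify each piece, and stack the pieces again. The sketch below uses base R only; the toy data are made up, and tertile_class is a hypothetical stand-in for the bi_class call (swapping in function(d) bi_class(d, x = Zip_Black, y = svi, style = "quantile", dim = 3) is untested).

```r
# Toy data: column names mirror the question, values are invented
toy <- data.frame(
  ProjectID = rep(c("A", "B"), each = 6),
  Zip_Black = c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60),
  svi       = c(6, 5, 4, 3, 2, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Hypothetical stand-in for bi_class(): tertile classes of one column,
# computed only from the rows it receives
tertile_class <- function(d) {
  breaks <- quantile(d$Zip_Black, probs = c(0, 1/3, 2/3, 1))
  d$class <- cut(d$Zip_Black, breaks = breaks,
                 include.lowest = TRUE, labels = FALSE)
  d
}

# Split by group, classify each piece separately, then recombine
pieces <- lapply(split(toy, toy$ProjectID), tertile_class)
result <- do.call(rbind, pieces)

print(result)
```

Because the breaks are recomputed inside each piece, every ProjectID gets its own low, middle and high classes instead of sharing one set of breaks across the whole table.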

Related

error with bi_class function: breaks are not unique

Using the bi_class(), I am trying to create mapping classes for a bivariate map. These data will be stored in a new variable named bi_class, which will be added to the given data object.
The code below returns an error of
Error in cut.default(.data[[var]], breaks = classInt::classIntervals(.data[[var]], : 'breaks' are not unique
IDD_nhmap <- IDD_nhmap %>%
group_by(ProjectID) %>%
bi_class(x = race_black, y = svi, style = "quantile", dim = 3) %>%
bi_class(x = race_hisp, y = svi, style = "quantile", dim = 3)

r - passing unquoted variables to plotly formula

I am trying to pass unquoted arguments to plotly(). If I call the column as-is (just the name), it works fine but if I try to pass the column name within a function like paste() it fails. It also works with negative numbers but not positive ones. In dplyr, I'd use curly-curly {{x}} without a problem but plotly() wants formulas to be passed so I'm a bit at a loss.
library(plotly)
library(tidyverse)
fn <- function(text, at_y) {
  mpg |>
    count(class) |>
    plot_ly(x = ~class, y = ~n, type = "bar", color = I("grey")) |>
    add_annotations(
      text = enquo(text), # <---
      y = enquo(at_y), # <---
      showarrow = FALSE
    )
}
# ok ----
fn(text = n, at_y = n)
fn(text = n, at_y = -1)
fn(text = -123, at_y = n)
# not ok ----
# positive integer
fn(text = n, at_y = 30)
#> Error in parent.env(x) : the empty environment has no parent
# used in a function
fn(text = paste("N=", n), at_y = n)
#> Error in paste("N=", n) :
#> cannot coerce type 'closure' to vector of type 'character'
As @MrFlick said in a comment, the rlang constructs used in the tidyverse won't necessarily work in non-tidyverse packages. Here's a version of your function that does work, since it uses base methods to do the non-standard evaluation:
fn <- function(text, at_y) {
  data <- mpg |> count(class)
  at_y <- eval(substitute(at_y), data)
  text <- eval(substitute(text), data)
  data |>
    plot_ly(x = ~class, y = ~n, type = "bar", color = I("grey")) |>
    add_annotations(
      text = text, # <---
      y = at_y, # <---
      showarrow = FALSE
    )
}
You want to evaluate the expressions passed as text and at_y in the context of the tibble mpg |> count(class), and that's something that is done by the two lines calling substitute. This isn't identical to the rlang evaluation, but it's close enough.
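To see the eval(substitute()) pattern in isolation, here is a small base R sketch. eval_in and the counts data frame are hypothetical names made up for illustration; they are not part of plotly or of the function above.

```r
# Evaluate an unquoted expression with a data frame's columns as variables
eval_in <- function(data, expr) {
  # substitute() captures the expression unevaluated; eval() looks names
  # up in `data` first and then in the caller's environment
  eval(substitute(expr), data, parent.frame())
}

# Made-up counts standing in for mpg |> count(class)
counts <- data.frame(class = c("compact", "suv"), n = c(47, 62))

col_value   <- eval_in(counts, n)                # a bare column name
label_value <- eval_in(counts, paste("N=", n))   # an expression using a column
const_value <- eval_in(counts, 30)               # a plain constant

print(col_value)
print(label_value)
print(const_value)
```

All three cases that were mixed in the question (a column, a function call on a column, and a positive constant) go through the same code path here.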

How to scale point size and color at the same time in ggvis?

Considering a data.frame like this:
df <- data.frame(t = rep(seq(from=as.POSIXct('00:15:00',format='%H:%M:%S'),
to=as.POSIXct('24:00:00',format='%H:%M:%S'),by='15 min'),times=2),
y = c(rnorm(96,10,10),rnorm(96,40,5)),
group = factor(rep(1:2,each=96)),
type = factor(rep(1:3,each=64)))
Using ggvis, I want to generate a point-line plot in which the line is grouped by group. Points with type==3 should have size 100, while points with type==1 and type==2 should have size 50. The points should be coloured green, blue and red for type 1, type 2 and type 3 respectively. Here is my ggvis code:
df <- data.frame(df,id=1:nrow(df))
all_values <- function(x) {
if(is.null(x)) return(NULL)
row <- df[df$id == x$id, ]
paste0(names(row), ": ", format(row), collapse = "<br />")
}
ggvis(data=df,x=~t,y=~y,stroke=~group) %>%
layer_points(fill=~type,size=~type, key:=~id, fillOpacity := 0.5,
fillOpacity.hover := 0.8,size.hover := 500) %>%
scale_nominal("size",domain = c(1,2,3), range = c(50,50,100)) %>%
scale_nominal("fill",domain = c(1,2,3), range = c('green','blue','red')) %>%
layer_lines() %>%
add_tooltip(all_values,'click') %>%
add_legend(scales=c("fill","size"), properties = legend_props(legend = list(y = 150))) %>%
set_options(duration = 0) %>%
add_axis(type="x",format="%H:%M")
I get the error Error: length(x) not less than or equal to 2.
Why did this happen and how can I fix it?
It turns out that scale_nominal("size",domain = c(1,2,3), range = c(50,50,100)) should be replaced by scale_nominal("size",domain = c(1,2,3), range = c('50','50','100')).
The error arises because range was given more than two numeric values. The definition of range says: for numeric values, the range can take the form of a two-element array with minimum and maximum values.
For ordinal data, the range may be an array of desired output values, which are mapped to elements in the specified domain. In this case, the values should be given as character strings.
This should resolve your error.

Insert a blank column in dataframe

I would like to insert a blank column in between "Delta = delta" and "Card = vars" in the dataframe below. I would also like to sort the output by the column "Model_Avg_Error" in the dataframe as well.
df = data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, Delta = delta, ,Card = vars, Model_Avg_Error = model_error,
Forecast = forecasts, Delta = delta)
# save
write.csv(df, file = file.path(proj_path, "output.csv"), row.names = F)
This was the error received from above:
Error in data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, :
argument is missing, with no default
You can add your blank column, re-order, and sort using the code below. Note that each result has to be assigned back to df, otherwise it is only printed:
df$blankVar <- NA #blank column
df <- df[c("Card", "blankVar", "Model_Avg_Error", "Forecast", "Delta")] #re-ordering columns by name
df <- df[order(df$Model_Avg_Error),] #sorting by Model_Avg_Error
Here's a general way to add a new, blank column
library(tibble)
# Adds after the second column
iris %>% add_column(new_col = NA, .after = 2)
# Adds after a specific column (in this case, after Sepal.Width)
iris %>% add_column(new_col = NA, .after = "Sepal.Width")
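The same insertion can also be done without tibble. This is a minimal base R sketch on made-up data; the column names Card, Model_Avg_Error and Delta follow the question, but the values are invented.

```r
# Made-up stand-in for the question's data frame
df <- data.frame(
  Card            = c("x", "y", "z"),
  Model_Avg_Error = c(0.3, 0.1, 0.2),
  Delta           = c(5, 6, 7)
)

# Position of the column after which the blank column should go
pos <- match("Model_Avg_Error", names(df))

# Glue together: columns up to pos, the blank column, the remaining columns
df2 <- cbind(df[seq_len(pos)], blank = NA, df[-seq_len(pos)])

# Sort rows by Model_Avg_Error and keep the result
df2 <- df2[order(df2$Model_Avg_Error), ]

print(df2)
# When exporting, write.csv(df2, file, na = "") writes the NA cells as empty
```

The na = "" argument matters for the CSV in the question, since NA would otherwise be written out as the text NA rather than an empty cell.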

How to add uncertain number of string arguments in a UDF using dplyr

I want to pass one or several strings to a function that uses dplyr, but it only takes the first variable in the argument and ignores the others.
library(lazyeval)
plotGenerationFct = function(data,..., targetVariable){
result = data %>% select_(..., targetVariable) %>% group_by_(...) %>% summarise_(mean= interp(~mean(var, na.rm = TRUE), var = as.name(targetVariable)))
return(result)
}
And the expressions below give me the same result
plotGenerationFct(diamonds, c("cut"), targetVariable = "price")
plotGenerationFct(diamonds, c("cut","color"), targetVariable = "price")
plotGenerationFct(diamonds, c("cut","color","clarity"), targetVariable = "price")
The standard evaluation versions of the dplyr functions are not set up to accept vectors as standard parameters. For that, use the .dots= parameter:
plotGenerationFct = function(data, vars, targetVariable){
result = data %>% select_(.dots=c(vars, targetVariable)) %>%
group_by_(.dots=vars) %>%
summarise_(mean= interp(~mean(var, na.rm = TRUE), var = as.name(targetVariable)))
return(result)
}
So these are all the same
select(diamonds, cut, color)
select_(diamonds, "cut", "color")
select_(diamonds, .dots=c("cut", "color"))
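The underscore verbs (select_, group_by_, summarise_) have since been deprecated in dplyr. A rough equivalent for dplyr 1.0 or later might look like the sketch below. It assumes a character vector of grouping columns and a single target column name; mean_by is a hypothetical name, and mtcars is used only so the example needs no extra data package.

```r
library(dplyr)

# Hypothetical helper: mean of `target` within the groups defined by `vars`
mean_by <- function(data, vars, target) {
  data %>%
    select(all_of(c(vars, target))) %>%      # column names given as strings
    group_by(across(all_of(vars))) %>%       # group by every name in vars
    summarise(mean = mean(.data[[target]], na.rm = TRUE), .groups = "drop")
}

by_cyl      <- mean_by(mtcars, "cyl", "mpg")
by_cyl_gear <- mean_by(mtcars, c("cyl", "gear"), "mpg")

print(by_cyl)
print(by_cyl_gear)
```

Here all_of() turns the strings into a column selection, and .data[[target]] looks a column up by name inside summarise, so no lazyeval::interp is needed.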

Resources