library(highcharter)
library(dplyr)
library(viridisLite)
library(forecast)
library(treemap)
data("Groceries", package = "arules")
dfitems <- tbl_df(Groceries@itemInfo)
set.seed(10)
dfitemsg <- dfitems %>%
mutate(category = gsub(" ", "-", level1),
subcategory = gsub(" ", "-", level2)) %>%
group_by(category, subcategory) %>%
summarise(sales = n() ^ 3 ) %>%
ungroup() %>%
sample_n(31)
dfitemsg %>%
hctreemap2(group_vars = c("category","subcategory"),
size_var = "sales")%>%
hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
Pop: {point.value:,.0f}<br>
GNI: {point.colorValue:,.0f}")
The error is the following:
Error in hctreemap2(., group_vars = c("category", "subcategory"), size_var = "sales") : Treemap data uses same label at multiple levels.
I have tried everything I can think of and it still doesn't work. Could someone with experience explain what is happening?
When I tried your code, it also reported that the function was deprecated and recommended data_to_hierarchical instead. It's never quite that simple, though, right? I tried multiple ways to get hctreemap2 to work but couldn't pin down the issue, so I turned to the recommended data_to_hierarchical. That worked without a problem once I figured out the right type, which in hindsight seems kind of obvious.
That being said, this is what I've got:
data_to_hierarchical(data = dfitemsg,
group_vars = c(category,subcategory),
size_var = sales) %>%
hchart(type = "treemap") %>%
hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
Pop: {point.value:,.0f}<br>
GNI: {point.colorValue:,.0f}")
You didn't actually designate a color, so the GNI comes up blank.
Let me know if you run into any issues.
Based on your comment:
I have not found a way to map the color to density, which is what both hctreemap2 and treemap appear to do. The function data_to_hierarchical codes the colors by the first grouping variable, that is, the level 1 variable.
Inadvertently, I did figure out why hctreemap2 would not work. It checks whether any category label is the same as a subcategory label. I didn't go through all of the data, but I know there is a perfumery/perfumery pair. I don't understand why that's a hard stop. If it is a problem for this call, why wouldn't data_to_hierarchical look for the same issue?
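The check described above can be reproduced in base R to see which labels clash. This is only a sketch on a small made-up data frame (the `items` rows are hypothetical, not the Groceries data), and it mirrors the logic of the check rather than calling anything from highcharter:

```r
# Hypothetical toy data: "perfumery" is both a category and a subcategory.
items <- data.frame(
  category    = c("perfumery", "drinks", "drinks"),
  subcategory = c("perfumery", "beer", "wine"),
  stringsAsFactors = FALSE
)

# Unique labels per level, then pooled across levels (same idea as the check).
labels_by_level <- lapply(items[c("category", "subcategory")], unique)
all_labels <- unlist(labels_by_level, use.names = FALSE)

# Non-zero means at least one label is reused across levels.
dup_index <- anyDuplicated(all_labels)

# The labels that would trigger the stop().
clashing <- unique(all_labels[duplicated(all_labels)])
print(clashing)
```

If editing the function feels too invasive, renaming the clashing subcategory labels so they differ from the category labels should, as far as I can tell, also get past the check.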
So, I changed the function. First, I called the function itself.
x = hctreemap2
Then I selected it from the environment pane. Alternatively, you can code View(x).
This view is read-only, but it's easier to read than the console. I copied the function and assigned it to its original name with changes. I removed two pieces of the code; neither affects how the chart is built.
I removed the first line of code in the function:
.Deprecated("data_to_hierarchical")
and this code (about a third of the way down)
if (data %>% select(!!!group_syms) %>% map(unique) %>% unlist() %>%
anyDuplicated()) {
stop("Treemap data uses same label at multiple levels.")
}
This left me to recreate the function with this code:
hctreemap2 <- function (data, group_vars, size_var, color_var = NULL, ...)
{
assertthat::assert_that(is.data.frame(data))
assertthat::assert_that(is.character(group_vars))
assertthat::assert_that(is.character(size_var))
if (!is.null(color_var))
assertthat::assert_that(is.character(color_var))
group_syms <- rlang::syms(group_vars)
size_sym <- rlang::sym(size_var)
color_sym <- rlang::sym(ifelse(is.null(color_var), size_var, color_var))
data <- data %>% mutate_at(group_vars, as.character)
name_cell <- function(..., depth) paste0(list(...),
seq_len(depth),
collapse = "")
data_at_depth <- function(depth) {
data %>%
group_by(!!!group_syms) %>%
summarise(value = sum(!!size_sym), colorValue = sum(!!color_sym)) %>%
ungroup() %>%
mutate(name = !!group_syms[[depth]], level = depth) %>%
mutate_at(group_vars, as.character()) %>% {
if (depth == 1) {
mutate(., id = paste0(name, 1))
}
else {
mutate(.,
parent = pmap_chr(list(!!!group_syms[seq_len(depth) - 1]),
name_cell, depth = depth - 1),
id = paste0(parent, name, depth))
}
}
}
treemap_df <- seq_along(group_vars) %>% map(data_at_depth) %>% bind_rows()
data_list <- treemap_df %>% highcharter::list_parse() %>%
purrr::map(~.[!is.na(.)])
colorVals <- treemap_df %>%
filter(level == length(group_vars)) %>% pull(colorValue)
highchart() %>%
hc_add_series(data = data_list, type = "treemap",
allowDrillToNode = TRUE, ...) %>%
hc_colorAxis(min = min(colorVals), max = max(colorVals), enabled = TRUE)
}
Now your code, as originally written, will work. Doing this does not change the highcharter package itself, so if you think you'll use the function in the future, save its code as well. You will also need the purrr library. Since you already load dplyr (where most conflicts, if any, occur), you could just load tidyverse, which attaches several packages at once, including both dplyr and purrr.
This is what it will look like with set.seed(10):
If you drill down on the largest block:
It looks odd to me, but I'm guessing that's what you were looking for to begin with.
I have a code snippet which I am trying to convert into a function. This function is supposed to look for potential spelling errors in a manual-entry field. The snippet works and you can try it out like this, using the starwars data from the tidyverse package:
require(tidyverse)
require(rlang) # loaded for {{ to force function arguments as well as the with_env() function
require(RecordLinkage) # loaded for the jarowinkler() function
starwars_cleaning <- starwars %>%
add_count(name, name = "Freq_name") %>% # this keeps track of which spelling is more frequent
distinct(name, .keep_all = T) %>% # this prevents duplicated comparisons and self-comparisons
nest_by(homeworld, .key = ".Nest") %>%
mutate(Mapped = list(imap_dfr(.x = .Nest$name,
.f = ~jarowinkler(str1 = .x,
str2 = .Nest$name[-.y]) %>%
list() %>%
tibble(Score_n = ., Match_n = list(.Nest$name[-.y]),
Freq_n = list(.Nest$Freq_name[-.y]))
)))
The function should accept the variable(s) to nest on (ellipses) and the variable to look for potential misspelled matches in as arguments. Right now, it looks like this:
string_matching <- function(.df, .string_col, ...){
.df$.tmp_string <- .df %>% select({{.string_col}})
.df <- .df %>%
add_count(.tmp_string, name = "Freq_name") %>%
distinct(.tmp_string, .keep_all = T) %>%
nest_by(..., .key = ".Nest") %>%
mutate(Mapped_n = list(with_env(env = current_env(), # same error with or without specifying the execution environment for imap
expr = imap_dfr(.x = .Nest$.tmp_string,
.f = ~jarowinkler(str1 = .x,
str2 = .Nest$.tmp_string[-.y]) %>%
list() %>%
tibble(Score_n = ., Match_n = list(.Nest$.tmp_string[-.y]),
Freq_n = list(.Nest$Freq_name[-.y]))
)
))
)
return(.df)
}
starwars %>%
string_matching(name, homeworld)
On the starwars data, it isn't very useful, clearly. And I cut down some of the features of this code to get a MWE--but that's the idea. When I wrap the code up like this in a function, it returns invalid argument to unary operator (apparently caused by the [-.y]). I tried the force() command after reading this post since this problem apparently comes up a lot. Because of the current error and that post, I thought the problem might have to do with the function environment causing imap_dfr() to lose track of the data somehow. I tried to wrap the call to map in with_env() and instruct it to use the function environment rather than its own. I also tried to break up the function by assigning an intermediate object to the global environment so that it could be found in the mapping step of the function:
assign(x = "TEMP", value = .df$.Nest, envir = global_env())
That landed me with the same 'unary operator' error. I'm not sure what to try next. I seem to be going in circles. Any insights into what is causing this problem and how to fix it would be greatly appreciated.
I don't think the post you pointed to is really related here. I don't think your problem is related to execution environment. The problem really is how you've handled passing the variable to your function. When you create your tmp_string, you are calling select() which is returning a tibble rather than the vector of column values. Instead, use pull() to extract those values.
string_matching <- function(.df, .string_col, ...){
.df$.tmp_string <- .df %>% pull({{.string_col}})
.df <- .df %>%
add_count(.tmp_string, name = "Freq_name") %>%
distinct(.tmp_string, .keep_all = T) %>%
nest_by(..., .key = ".Nest") %>%
mutate(Mapped_n = list(with_env(env = current_env(), # with_env() is kept from the question; it is not what fixes the error
expr = imap_dfr(.x = .Nest$.tmp_string,
.f = ~jarowinkler(str1 = .x,
str2 = .Nest$.tmp_string[-.y]) %>%
list() %>%
tibble(Score_n = ., Match_n = list(.Nest$.tmp_string[-.y]),
Freq_n = list(.Nest$Freq_name[-.y]))
)
))
)
return(.df)
}
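To see why select() leads to the unary operator error, here is a small sketch on a made-up tibble (the `people` data is hypothetical). It relies on the documented behaviour of imap(): when .x has names, .y is the name, otherwise it is the position. A one-column tibble has a name, so .y becomes a character string and `[-.y]` fails.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for the nested starwars rows.
people <- tibble(name = c("Luke", "Leia", "Han"))

selected <- people %>% select(name)  # still a tibble (one column)
pulled   <- people %>% pull(name)    # a plain character vector

# .y is an integer position for the unnamed vector...
index_types <- imap_chr(pulled, ~ class(.y))
# ...but the column name (a character) for the tibble.
name_types <- imap_chr(selected, ~ class(.y))

# Negating a character is what raises the error from the question.
err <- tryCatch(-"name", error = function(e) conditionMessage(e))
print(err)
```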
Or you could write your code to avoid the need for that temp column completely
string_matching <- function(.df, .string_col, ...){
col <- rlang::ensym(.string_col)
.df <- .df %>%
add_count(!!col, name = "Freq_name") %>%
distinct(!!col, .keep_all = T) %>%
nest_by(..., .key = ".Nest") %>%
mutate(Mapped_n = list(imap_dfr(.x = .Nest %>% pull(!!col),
.f = ~jarowinkler(str1 = .x,
str2 = (.Nest %>% pull(!!col))[-.y]) %>%
list() %>%
tibble(Score_n = ., Match_n = list((.Nest %>% pull(!!col))[-.y]),
Freq_n = list(.Nest$Freq_name[-.y]))
))
)
return(.df)
}
I'm just starting with R, so this may well be a very simple question, but...
It keeps telling me "Error in a(.) : could not find function "a""
and that it is unable to find Baseline_MAP, even though it is defined earlier,
in mutate(Delta_MAP = Group_MAP - Baseline_MAP,
I've tried changing the name 'a' to something more elaborate, but this makes no difference.
If I try to assign it to a variable
(e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File)%>%
it just tells me:
"Error in a %>% filter(Period == "Baseline") %>% group_by(File) %>% :
could not find function "%>%<-"
I'd really be grateful for any help with this.
a <- read_csv("file.csv")
summary(a)
a %>%
filter(Period == "Baseline") %>%
group_by(File)%>%
summarise(Baseline_MAP = mean(MAP_Mean, na.rm=T),
Baseline_SBP = mean(SBP_Mean, na.rm=T),
Baseline_LaserMc1 = mean(Laser1_Magic, na.rm=T),
Baseline_Laser1 = mean(Laser1_Mean, na.rm=T))%>%
a%>%
filter(Period != "Baseline") %>%
group_by(File)%>%
summarise(Group_MAP = mean(MAP_Mean, na.rm=T),
Group_SBP = mean(SBP_Mean, na.rm=T),
Group_Laser_1Magic = mean(Laser1_Magic, na.rm=T),
Group_Laser_1 = mean(Laser1_Mean, na.rm=T))
a%>%
mutate(Delta_MAP = Group_MAP - Baseline_MAP,
Delta_MAP_Log = log(Group_MAP)-log(Baseline_MAP),
Delta_SBP = Group_SBP - Baseline_SBP,
Delta_SBP_Log = log(Group_SBP)-log(Baseline_SBP),
Delta_Laser1_Magic = Group_Laser_1Magic - Baseline_LaserMc1,
Delta_Laser1_Log = log(Group_Laser_1Magic)-log(Baseline_LaserMc1))
%>% comes from the package "magrittr" and is re-exported by "dplyr". So make sure you load one of them, e.g. library(dplyr).
Next, %>% does not assign the result to a variable. I.e.
a %>% mutate(foo=bar(x))
does not alter a. It will just show the result on the console (and none if you are running the script or calling it from a function).
You might be confusing the pipe-operator with %<>% (found in the package magrittr) which uses the left-hand variable as input for the pipe, and overwrites the variable with the modified result.
Finally, when you write
If I try to assign it to a variable (e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File)%>%)
You are assigning the result from the pipeline to a variable baseline -- this however does not modify the variable-names in the data frames (i.e. the column names).
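As a rough sketch of how the pieces could fit together: store each summary under its own name, join the two by File, and only then compute the differences. The column names follow the question, but the data below is made up, and the inner_join() step is my assumption about what the question is ultimately after.

```r
library(dplyr)

# Hypothetical stand-in for read_csv("file.csv").
a <- tibble(
  File     = c("f1", "f1", "f1", "f2", "f2", "f2"),
  Period   = c("Baseline", "Drug", "Drug", "Baseline", "Drug", "Drug"),
  MAP_Mean = c(80, 90, 100, 70, 75, 85)
)

# Each pipeline result is kept by assigning it to a name.
baseline <- a %>%
  filter(Period == "Baseline") %>%
  group_by(File) %>%
  summarise(Baseline_MAP = mean(MAP_Mean, na.rm = TRUE))

group <- a %>%
  filter(Period != "Baseline") %>%
  group_by(File) %>%
  summarise(Group_MAP = mean(MAP_Mean, na.rm = TRUE))

# Both sets of columns must live in one data frame before mutate() can use them.
deltas <- baseline %>%
  inner_join(group, by = "File") %>%
  mutate(Delta_MAP = Group_MAP - Baseline_MAP)

print(deltas)
```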
I have written a function that returns an object named final. However, when I try to access the object outside of the function, I get an "object not found" error.
I am not sure where I am going wrong, as this seems fairly simple and correct. When I leave out the function and just run the statements, I am able to access the final object. It is only when I try to return the object from the function that I cannot access it.
I am not sure why this is happening.
myfunction <- function(lo,X_train,y_train,X_test,y_test,pred){
loan_number<-as.numeric(testing$lo)
xgb.train = xgb.DMatrix(data=X_train,label=y_train)
xgb.test = xgb.DMatrix(data=X_test,label=y_test)
explainer = buildExplainer(xgb,xgb.train, type="binary", base_score = 0.5, trees = NULL)
pred.breakdown = explainPredictions(xgb, explainer, X_test)
pred.breakdown<-as.data.frame(pred.breakdown)
pred.breakdown <- pred.breakdown %>% do(.[!duplicated(names(.))])
pred_break<-pred.breakdown %>%
#Create an id by row
dplyr::mutate(id=1:n()) %>%
#Reshape
pivot_longer(cols = -id) %>%
#Arrange
arrange(id,-value) %>%
#Filter top 5
group_by(id) %>%
dplyr::mutate(Var=1:n()) %>%
filter(Var<=5) %>%
select(-c(value,Var)) %>%
#Format
dplyr::mutate(Var=paste0('Attribute',1:n())) %>%
pivot_wider(names_from = Var,values_from=name) %>%
ungroup() %>%
select(-id)
pred_break<-as.data.frame(pred_break)
prop_score<-pred
final<-as.data.frame(cbind(loan_number,prop_score,pred_break))
print("final exec")
return(final)
}
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
final<-as.data.frame(final)
I print "final exec" to check whether everything is working. It seems odd that I am not able to access the final object that is passed to the return statement.
R is primarily a functional programming language with lexical scoping. This line:
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
runs your function and returns the VALUE of final, but it's returning to the console. The function's return value needs to be assigned to another variable in order to be used, like @Duck suggests:
final <- myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
This is different than final in your function. That final is inaccessible outside of the function.
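A stripped-down sketch of the same point, using a hypothetical function unrelated to the xgboost code: the name used inside the function is local, and only the returned value survives, under whatever name you assign it to.

```r
# Hypothetical example function; `final` exists only while the function runs.
scale_scores <- function(scores) {
  final <- scores * 2
  return(final)
}

# The value is printed to the console and then discarded.
scale_scores(c(1, 2, 3))

# The value is kept because it is assigned to a name in the calling environment.
result <- scale_scores(c(1, 2, 3))
print(result)
```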
I'm new to R and I don't know all basic concepts yet. The task is to produce a one merged table with multiple response sets. I am trying to do this using expss library and a loop.
This is the code in R without a loop (works fine):
#libraries
#blah, blah...
#path
df.path = "C:/dataset.sav"
#dataset load
df = read_sav(df.path)
#table
table_undropped1 = df %>%
tab_cells(mdset(q20s1i1 %to% q20s1i8)) %>%
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
There are 10 multiple response sets, so I need to create 10 tables in the manner shown above. Then I transpose those tables and merge them. To simplify the code (and learn something new) I decided to produce the tables using a loop. However, nothing works. I looked for a solution, and I think the closest to correct is:
#this generates a message: '1' not found
for(i in 1:10) {
assign(paste0("table_undropped",i),1) = df %>%
tab_cells(mdset(assign(paste0("q20s",i,"i1"),1) %to% assign(paste0("q20s",i,"i8"),1)))
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
}
Still it causes an error described above the code.
Alternatively, an SPSS macro for that would be (published only to better express the problem because I have to avoid SPSS):
define macro1 (x = !tokens (1)
/y = !tokens (1))
!do !i = !x !to !y.
mrsets
/mdgroup name = !concat($SET_,!i)
variables = !concat("q20s",!i,"i1") to !concat("q20s",!i,"i8")
value = 1.
ctables
/table !concat($SET_,!i) [colpct.responses.count pct40.0].
!doend
!enddefine.
*** MACRO CALL.
macro1 x = 1 y = 10.
In other words I am looking for a working substitute of !concat() in R.
%to% is not suited for parametric variable selection. There is a set of special functions for parametric variable selection and assignment. One of them is mdset_t:
for(i in 1:10) {
table_name = paste0("table_undropped",i)
..$table_name = df %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
}
However, it is not good practice to store all tables as separate variables in the global environment. A better approach is to save all tables in a list:
all_tables = lapply(1:10, function(i)
df %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>%
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
)
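On the narrower question of a substitute for !concat(): in base R that role is usually played by paste0(). The sketch below only builds the variable names and names the list elements. It does not call expss, and `set_vars` is a hypothetical helper introduced for illustration.

```r
# Hypothetical helper: the eight variable names of response set i.
set_vars <- function(i) paste0("q20s", i, "i", 1:8)

print(set_vars(3))

# One element per response set, named like the tables in the question,
# so each can be retrieved by name instead of living in the global environment.
all_names <- lapply(1:10, set_vars)
names(all_names) <- paste0("table_undropped", 1:10)

print(all_names[["table_undropped10"]])
```

The same names() assignment can be applied to all_tables, so that a single table is retrieved with all_tables[["table_undropped3"]].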
UPDATE.
Generally speaking, there is no need to merge. You can do all your work with tab_*:
my_big_table = df %>%
tab_total_row_position("none")
for(i in 1:10) {
my_big_table = my_big_table %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
tab_stat_cpct()
}
my_big_table = my_big_table %>%
tab_pivot(stat_position = "inside_columns") # here we say that we need to combine subtables horizontally
I am looking at a dataset from tidytuesday, available here:
video_games <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-30/video_games.csv")
I wrote this code to create a horizontal bar plot, ranked in descending order.
video_games %>%
top_n(10, metascore) %>%
arrange(desc(metascore)) %>%
plot_ly(x = ~metascore, y = ~fct_reorder(game, metascore),
type = "bar") %>%
layout(xaxis = list(title = "Metascore"),
yaxis = list(title = ""))
I want to reuse the code with multiple variables without copying and pasting, so I created a function with 2 parameters for the variables I want to plot. (I left out the layout section. If there is a way to automatically relabel the plot inside the function, that would be cool.)
video_games_ranking_plot <- function(A, B) {
top_n(10, A) %>%
arrange(desc(A)) %>%
plot_ly(x = ~A, y = ~fct_reorder(B, A),
type = "bar")
}
When I run the function
video_games %>%
video_games_ranking_plot(metascore, game)
... I get the error message Error in video_games_ranking_plot(., metascore, game) :
unused argument (game)
Does anyone know why?
The immediate error arises because the pipe passes video_games as the first argument, so your two-parameter function receives three arguments and the last one (game) is unused. Beyond that, you are passing the same bare column names, metascore and game, to calls that expect different kinds of input:
top_n(10, metascore)
arrange(desc(metascore))
plot_ly(x = ~metascore, y = ~fct_reorder(game, metascore))
Passing columns as arguments through a pipe also poses some challenges. I haven't found the time to build a complete solution, but hopefully this will help you on your way:
Plot:
Code:
library(dplyr)
library(forcats)
library(plotly)
# get data
video_games <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-30/video_games.csv")
data <- video_games
# custom function
video_games_ranking_plot <- function(data, topn, col_top, col_ord){
# select and arrange data
df <- data %>% top_n(topn, {{col_top}}) %>% arrange(desc({{col_ord}}))
col_top_name <- deparse(substitute(col_top))
col_ord_name <- deparse(substitute(col_ord))
df2<- df[c(col_top_name, col_ord_name)]
# build plotly plot
p <- plot_ly(x = df2[[col_top_name]], y = df2[[col_ord_name]], type = "bar")
}
plt <- video_games_ranking_plot(data=video_games, topn=5, metascore, game)
plt
There's still an issue with the ~fct_reorder(game, metascore) part.
I had to raise a question myself to even get this far. Take a look at the answer from user Ronak Shah to the post How to pass a dataframe column as an argument in a function using piping? to learn more on how to pass arguments to piping functions.
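One possible way around the remaining fct_reorder() issue is to do the ranking and reordering with dplyr and forcats first, writing the result to fixed column names, so the plotting call no longer needs the user's column names. This is a sketch under that assumption; `rank_top`, `value`, and `label` are hypothetical names, and the toy data replaces the real video_games table.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: keep the top n rows and expose them as `value` / `label`.
rank_top <- function(data, value_col, label_col, n = 10) {
  data %>%
    slice_max({{ value_col }}, n = n, with_ties = FALSE) %>%
    transmute(
      value = {{ value_col }},
      # Order the labels by value so the bars appear ranked.
      label = fct_reorder({{ label_col }}, {{ value_col }})
    )
}

# Made-up data with the same column names as the question.
games <- tibble(
  game      = c("A", "B", "C", "D"),
  metascore = c(70, 95, 80, 60)
)

ranked <- games %>% rank_top(metascore, game, n = 3)
print(ranked)
```

The result can then be handed to plot_ly(ranked, x = ~value, y = ~label, type = "bar"), since the formulas now refer to fixed column names.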
I hope this helps!