R Function defined in gloabl environment not visible in subsequent function - r

I have defined four functions. I have executed the code for each and all four appear in the global environment when I call ls().
The first two are used inside the third and this works as expected. However, when I call the third function from the fourth function I get an error message telling me that curent_month doesn't exist.
(I eliminated all code from the fourth function as the failure occurs at the first statement, so the rest is not relevant.)
I have always understood that any object defined in the global environment is available to any sub-environment (i.e., inside a function).
Can anyone point me in the right direction?
## Function returns the most recent month having billing revenues
current_month_POSIX <- function(x){
## Fetch current month name for use in label below
current_month_POSIX <- x %>%
filter(Year == 2020) %>%
filter(!is.na(Billing)) %>%
select(Month) %>%
unique()%>%
arrange() %>%
tail(1) %>%
unlist() %>%
as_datetime()
return(current_month_POSIX)
}
current_month_name <- function(x){
current_month_name <- x %>%
filter(Year == 2020) %>%
filter(!is.na(Billing)) %>%
select(Month, month_name) %>%
unique()%>%
arrange() %>%
tail(1) %>%
select(month_name) %>%
substr(.,1,3)
return(current_month_name)
}
curent_month <- function(x){
POSIX <- current_month_POSIX(x)
name <- current_month_name(x)
return(list("current_month_name" = name, "current_month_POSIX" = POSIX))
}
### Function to reduce source data to clustered bar chart table
clustered_bar_data <- function(x){
latest_month <- current_month(x)
}

current_month does not exist! You named your function curent_month.

Related

Using a for loop in R to assign value labels

Context: I have a large dataset (CoreData) with an accompanying datafile (CoreValues) that contains the code and values for each variable within the dataset.
Problem: I want to use a loop to assign each variable within the dataset (CoreData) the correct value labels (from the CoreValues data).
What I've tried so far:
I have created a character vector that identifies which variables within my main data (CoreData) have values that need to be added:
Core_VarwithValueLabels<- unique(CoreValues$Abbreviation)
I have tried a for loop using the vector created , to create vectors for both the label and level arguments that feed into the factor() function.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=paste0(i, 'Levels'), labels = paste0(i, 'Labels'))
}
This creates the correct label and level vectors, however, they are not being picked up properly within the factor function.
Question: Can you help me identify how to get my factor function to work within this loop or if there is a more appropriate method?
Sample data:
CoreValues:
example data from CoreValues
CoreData:
example data from CoreData
UPDATE: RESOLVED
I have now resolved this by using the get() function within my factor() function as it uses the strings I've created with paste0() and find the vector of that name.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=get(paste0(i, 'Levels')), labels = get(paste0(i, 'Labels')))
}

Return function not accessible in R

I have made a function and then returning an object named final, However when I try to access the object outside of the function it give me error object not found.
I am not sure where I am getting wrong this seems to be fairly simple and correct, when I try to exclucde the function and just run the statements, I am able to access the final object only when trying to return the object I am not able to do so.
I am not sure why this is happening.
myfunction <- function(lo,X_train,y_train,X_test,y_test,pred){
loan_number<-as.numeric(testing$lo)
xgb.train = xgb.DMatrix(data=X_train,label=y_train)
xgb.test = xgb.DMatrix(data=X_test,label=y_test)
explainer = buildExplainer(xgb,xgb.train, type="binary", base_score = 0.5, trees = NULL)
pred.breakdown = explainPredictions(xgb, explainer, X_test)
pred.breakdown<-as.data.frame(pred.breakdown)
pred.breakdown <- pred.breakdown %>% do(.[!duplicated(names(.))])
pred_break<-pred.breakdown %>%
#Create an id by row
dplyr::mutate(id=1:n()) %>%
#Reshape
pivot_longer(cols = -id) %>%
#Arrange
arrange(id,-value) %>%
#Filter top 5
group_by(id) %>%
dplyr::mutate(Var=1:n()) %>%
filter(Var<=5) %>%
select(-c(value,Var)) %>%
#Format
dplyr::mutate(Var=paste0('Attribute',1:n())) %>%
pivot_wider(names_from = Var,values_from=name) %>%
ungroup() %>%
select(-id)
pred_break<-as.data.frame(pred_break)
prop_score<-pred
final<-as.data.frame(cbind(loan_number,prop_score,pred_break))
print("final exec")
return(final)
}
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
final<-as.data.frame(final)
Printing final exec to check if everything is working or not , Apparently it's weird that I am not able to access the final object which is passed to return statement.
R is primarily a functional programming language with lexical scoping. This line:
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
runs your function and returns the VALUE of final, but it's returning to the console. The function's return needs to be assigned to another variable in order to be used, like #Duck suggests:
final <- myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
This is different than final in your function. That final is inaccessible outside of the function.

Problem with mutate keyword and functions in R

I got a problem with the use of MUTATE, please check the next code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
Where FND function is a FILTER applied to a large file (5GB):
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute FND function for each row of output1 tibble, but it just executes one time.
Never use $ in dplyr pipes, very rarely they are used. You can change your FND function to :
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
FND(mytibble$ndoc) is more suitable for data frames. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column. The symbols %>% are already making sure that only data from the tibble is used. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
This would be theoretically, however I do not know if your function FND will work, maybe try it and if not, give some practical example with data and what you are trying to achieve.

How to properly parse (?) mdsets in expss within a loop?

I'm new to R and I don't know all basic concepts yet. The task is to produce a one merged table with multiple response sets. I am trying to do this using expss library and a loop.
This is the code in R without a loop (works fine):
#libraries
#blah, blah...
#path
df.path = "C:/dataset.sav"
#dataset load
df = read_sav(df.path)
#table
table_undropped1 = df %>%
tab_cells(mdset(q20s1i1 %to% q20s1i8)) %>%
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
There are 10 multiple response sets therefore I need to create 10 tables in a manner shown above. Then I transpose those tables and merge. To simplify the code (and learn something new) I decided to produce tables using a loop. However nothing works. I'd looked for a solution and I think the most close to correct one is:
#this generates a message: '1' not found
for(i in 1:10) {
assign(paste0("table_undropped",i),1) = df %>%
tab_cells(mdset(assign(paste0("q20s",i,"i1"),1) %to% assign(paste0("q20s",i,"i8"),1)))
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
}
Still it causes an error described above the code.
Alternatively, an SPSS macro for that would be (published only to better express the problem because I have to avoid SPSS):
define macro1 (x = !tokens (1)
/y = !tokens (1))
!do !i = !x !to !y.
mrsets
/mdgroup name = !concat($SET_,!i)
variables = !concat("q20s",!i,"i1") to !concat("q20s",!i,"i8")
value = 1.
ctables
/table !concat($SET_,!i) [colpct.responses.count pct40.0].
!doend
!enddefine.
*** MACRO CALL.
macro1 x = 1 y = 10.
In other words I am looking for a working substitute of !concat() in R.
%to% is not suited for parametric variable selection. There is a set of special functions for parametric variable selection and assignment. One of them is mdset_t:
for(i in 1:10) {
table_name = paste0("table_undropped",i)
..$table_name = df %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
}
However, it is not good practice to store all tables as separate variables in the global environment. Better approach is to save all tables in the list:
all_tables = lapply(1:10, function(i)
df %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>%
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
)
UPDATE.
Generally speaking, there is no need to merge. You can do all your work with tab_*:
my_big_table = df %>%
tab_total_row_position("none")
for(i in 1:10) {
my_big_table = my_big_table %>%
tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
tab_stat_cpct()
}
my_big_table = my_big_table %>%
tab_pivot(stat_position = "inside_columns") # here we say that we need combine subtables horizontally

How to resolve 'Don't know how to pluck from a closure' error in R

The code below works if I remove the Sys.sleep() from within the map() function. I tried to research the error ('Don't know how to pluck from a closure') but i haven't found much on that topic.
Does anyone know where I can find documentation on this error, and any help on why it is happening and how to prevent it?
library(rvest)
library(tidyverse)
library(stringr)
# lets assume 3 pages only to do it quickly
page <- (0:18)
# no need to create a list. Just a vector
urls = paste0("https://www.mlssoccer.com/players?page=", page)
# define this function that collects the player's name from a url
get_the_names = function( url){
url %>%
read_html() %>%
html_nodes("a.name_link") %>%
html_text()
}
# map the urls to the function that gets the names
players = map(urls, get_the_names) %>%
# turn into a single character vector
unlist() %>%
# make lower case
tolower() %>%
# replace the `space` to underscore
str_replace_all(" ", "-")
# Now create a vector of player urls
player_urls = paste0("https://www.mlssoccer.com/players/", players )
# define a function that reads the 3rd table of the url
get_the_summary_stats <- function(url){
url %>%
read_html() %>%
html_nodes("table") %>%
html_table() %>% .[[3]]
}
# lets read 3 players only to speed things up [otherwise it takes a significant amount of time to run...]
a_few_players <- player_urls[1:5]
# get the stats
tables = a_few_players %>%
# important step so I can name the rows I get in the table
set_names() %>%
#map the player urls to the function that reads the 3rd table
# note the `safely` wrap around the get_the_summary_stats' function
# since there are players with no stats and causes an error (eg.brenden-aaronson )
# the output will be a list of lists [result and error]
map(., ~{ Sys.sleep(5)
safely(get_the_summary_stats) }) %>%
# collect only the `result` output (the table) INTO A DATA FRAME
# There is also an `error` output
# also, name each row with the players name
map_df("result", .id = "player") %>%
#keep only the player name (remove the www.mls.... part)
mutate(player = str_replace(player, "https://www.mlssoccer.com/players/", "")) %>%
as_tibble()
tables <- tables %>% separate(Match,c("awayTeam","homeTeam"), extra= "drop", fill = "right")
purrr::safely(...) returns a function, so your map(., { Sys.sleep(5); safely(get_the_summary_stats) }) is returning functions, not any data. In R, a "closure" is a function and its enclosing environment.
Tilde notation is a tidyverse-specific method of more-terse anonymous functions. Typically (e.g., with lapply) one would use lapply(mydata, function(x) get_the_summary_stats(x)). In tilde notation, the same thing is written as map(mydata, ~ get_the_summary_stats(.))
So, re-write to:
... %>% map(~ { Sys.sleep(5); safely(get_the_summary_stats)(.); })
From comments by #r2evans

Resources