Expss call returns blank table - r

Attempting to call Expss from within a function. However it returns an empty table.
s1_a<-c("a","b","b")
s1_b<-c("a","a","b")
df<-data.frame(s1_a,s1_b)
multi<-function(v) {
df %>%
tab_cells(mrset_p("v")) %>%
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot()
}
multi("s1_")

In your case you don't need quotes in the mrset_p:
library(expss)
s1_a<-c("a","b","b")
s1_b<-c("a","a","b")
df<-data.frame(s1_a,s1_b)
multi<-function(v) {
df %>%
tab_cells(mrset_p(v)) %>% # no quotes
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot()
}
multi("s1_")

Related

Error in xml_nodeset(NextMethod()) : Expecting an external pointer: [type=NULL] when scraping with RVEST

i am having a problem when trying to scrape some data, i have created a function that is properly working, problems occurs when i run this function for many different code.
require ("rvest")
library("dplyr")
getFin = function(ticker)
{
url= paste0("https://it.finance.yahoo.com/quote/",ticker,
"/key-statistics?p=",ticker)
a <- read_html(url)
tbl= a %>% html_nodes("section") %>% html_nodes("div")%>% html_nodes("table")
misureval = tbl %>% .[1] %>% html_table() %>% as.data.frame()
prezzistorici = tbl %>% .[2] %>% html_table() %>% as.data.frame()
titolistat = tbl %>% .[3] %>% html_table() %>% as.data.frame()
dividendi = tbl %>% .[4] %>% html_table() %>% as.data.frame()
annofiscale = tbl %>% .[5] %>% html_table() %>% as.data.frame()
redditivita = tbl %>% .[6] %>% html_table() %>% as.data.frame()
gestione = tbl %>% .[7] %>% html_table() %>% as.data.frame()
contoeco = tbl %>% .[8] %>% html_table() %>% as.data.frame()
bilancio = tbl %>% .[9] %>% html_table() %>% as.data.frame()
flussi = tbl %>% .[10] %>% html_table() %>% as.data.frame()
info1 = rbind(ticker, misureval, prezzistorici, titolistat, dividendi, annofiscale, redditivita, gestione, contoeco, bilancio, flussi)
}
What i am trying to do is to use
finale <- lapply(codici, getFin)
where codici is linked to many different Ticker which will be used in the function to generate one url at time and scrape data.
I have tried with 50 ticker and the function works properly, however when i increase the number i get this error:
Error in xml_nodeset(NextMethod()) : Expecting an external pointer:
[type=NULL].
i don't know if this may be related to the number of request or something other. i have also tested a non existing ticker and the function still works, problems just arises when the number is large.
Solved problem, i just need to add Sys.sleep in order to reduce the frequency of requests.
the best number in this case is 3, so Sys.sleep(3) at the end of the for cycle.

Creating a function with column name in group_by clause

I am trying to specify a following function where I wall pass a dataset's column name as a name to group_by clause.
counter<-function(df,col_name){
a<-df %>%
group_by(col_name) %>%
count() %>%
arrange(desc(n))
return(a)
}
So if I try for example:
fraud_continent<-counter(fraud,continent_source1)
where fraud is dataset and continent_source1 is the column name from this dataset, the function wont work and the error I get is:
Error: Must group by variables found in .data.
Column col_name is not found.
How do I solve this?
You can use curly curly operator ({{}}).
counter<-function(df,col_name){
a<-df %>%
group_by({{col_name}}) %>%
count() %>%
arrange(desc(n))
return(a)
}
Also you can do this without group_by -
counter<-function(df,col_name){
a<-df %>%
count({{col_name}}) %>%
arrange(desc(n))
return(a)
}
This can be called as -
fraud_continent<-counter(fraud,continent_source1)
We could use ensym with !!
library(dplyr)
counter <- function(df, colname){
df %>%
count(!! rlang::ensym(colname)) %>%
arrange(desc(n))
}
and then it can be called as either
fraud_continent<-counter(fraud,continent_source1)
Or
fraud_continent<-counter(fraud, "continent_source1")
Update:
Thanks to the support of akrun .data[[col_name]] is better:
First answer:
Or you could use df[,col_name]
library(dplyr)
counter<-function(df,col_name){
a<-df %>%
group_by(df[,col_name]) %>%
count() %>%
arrange(desc(n))
return(a)
}
fraud_continent<-counter(fraud,"continent_source1")

How to do a multiple choice crosstable in R

My code...
library(expss)
library(haven)
X4707 <- read_sav("/home/cfmc/4707/data/4707.sav")
X4707 %>%
tab_cells("By phone"=qpd4_1==1,"By email"=qpd4_2==1,"Utility website"=qpd4_3==1,"Roseville Electric notification"=qpd4_4==1,"Social media"=qpd4_5==1,"Text"=qpd4_6==1,"Not sure"=qpd4_8==1) %>%
tab_cols(total(), qf5) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot()
My output looks like this...
I would like for the output to simply contain the text of the code going down the stub (By phone, By email, etc.) without the TRUE, FALSE, etc.
You need to designate that you want multiple response. You have multiple choice with positional coding so you need mdset (m(ultiple) d(ichotomy) set) function:
library(expss)
library(haven)
X4707 <- read_sav("/home/cfmc/4707/data/4707.sav")
X4707 %>%
tab_cells(mdset("By phone"=qpd4_1==1,"By email"=qpd4_2==1,"Utility website"=qpd4_3==1,"Roseville Electric notification"=qpd4_4==1,"Social media"=qpd4_5==1,"Text"=qpd4_6==1,"Not sure"=qpd4_8==1)) %>%
tab_cols(total(), qf5) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot()

R Dplyr top_n does not work when used within function

My dplyr function looks like this
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp argument, as I thought the problem was due to lazyeval
However this is not the case.
Using the function below (no interp for top_n), I get a result, however I do not see the top 5 results as desired.
Reading other stackoverflow posts, I understand that this has to do with ungroup, but not sure how to implement this.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
My solutionn, remove the inverted quotes from QoL and add an additional argument to arrange:
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
If anyone's got a more efficient way, please share

Getting the tidyr::nest() -> purrr:map() workflow to work for special case of no grouping var

I'm trying to write a function that does a split-apply-combine for which the split variable(s) are parameters, and - importantly - a null split is acceptable. For example, running statistics either on subsets of data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering if there is a more elegant way to fix this, without needing the dummy variabe?
Hmm, that behavior is a little surprising to me. A fix is easy though: you just have to make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.

Resources