I'm attempting to call expss from within a function, but it returns an empty table.
library(expss)
s1_a<-c("a","b","b")
s1_b<-c("a","a","b")
df<-data.frame(s1_a,s1_b)
multi<-function(v) {
df %>%
tab_cells(mrset_p("v")) %>%
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot()
}
multi("s1_")
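In your case, you don't need quotes in mrset_p. With "v" in quotes, mrset_p looks for columns matching the literal text "v" instead of the value of the argument v, so nothing is selected: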
library(expss)
s1_a<-c("a","b","b")
s1_b<-c("a","a","b")
df<-data.frame(s1_a,s1_b)
multi<-function(v) {
df %>%
tab_cells(mrset_p(v)) %>% # no quotes
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot()
}
multi("s1_")
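To see why the quoted version returns an empty table, here is a base R illustration. It assumes that mrset_p() selects the columns whose names match its argument as a regular expression (check the expss reference for the exact matching rules); grep() stands in for that matching step:

```r
# Same data frame as in the question
df <- data.frame(s1_a = c("a", "b", "b"), s1_b = c("a", "a", "b"))

v <- "s1_"
# Quoted: the literal pattern "v" matches none of the column names
quoted <- grep("v", names(df), value = TRUE)
# Unquoted: the value of v ("s1_") matches both columns
unquoted <- grep(v, names(df), value = TRUE)
quoted
unquoted
```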
I'm having a problem when trying to scrape some data. I have created a function that works properly; the problem occurs when I run it for many different tickers.
library(rvest)
library("dplyr")
getFin = function(ticker)
{
url= paste0("https://it.finance.yahoo.com/quote/",ticker,
"/key-statistics?p=",ticker)
a <- read_html(url)
tbl= a %>% html_nodes("section") %>% html_nodes("div")%>% html_nodes("table")
misureval = tbl %>% .[1] %>% html_table() %>% as.data.frame()
prezzistorici = tbl %>% .[2] %>% html_table() %>% as.data.frame()
titolistat = tbl %>% .[3] %>% html_table() %>% as.data.frame()
dividendi = tbl %>% .[4] %>% html_table() %>% as.data.frame()
annofiscale = tbl %>% .[5] %>% html_table() %>% as.data.frame()
redditivita = tbl %>% .[6] %>% html_table() %>% as.data.frame()
gestione = tbl %>% .[7] %>% html_table() %>% as.data.frame()
contoeco = tbl %>% .[8] %>% html_table() %>% as.data.frame()
bilancio = tbl %>% .[9] %>% html_table() %>% as.data.frame()
flussi = tbl %>% .[10] %>% html_table() %>% as.data.frame()
info1 = rbind(ticker, misureval, prezzistorici, titolistat, dividendi, annofiscale, redditivita, gestione, contoeco, bilancio, flussi)
}
What I am trying to do is use:
finale <- lapply(codici, getFin)
where codici is a vector of many different tickers; the function uses each one to generate one URL at a time and scrape its data.
I have tried it with 50 tickers and the function works properly; however, when I increase the number, I get this error:
Error in xml_nodeset(NextMethod()) : Expecting an external pointer: [type=NULL].
I don't know whether this is related to the number of requests or something else. I have also tested a nonexistent ticker and the function still works; the problem arises only when the number of tickers is large.
Solved: I just needed to add Sys.sleep() to reduce the frequency of requests.
The best value in this case is 3, so Sys.sleep(3) at the end of each iteration of the for loop.
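The throttled loop described above might look like the following sketch. fetch_all() is a hypothetical helper, not part of rvest: it calls a scraping function such as getFin once per ticker and pauses between requests with Sys.sleep(). As an added assumption beyond the original fix, it also wraps each call in tryCatch() so that one failed page yields NULL instead of stopping the whole run:

```r
# Hypothetical helper: call `fetch` once per ticker, pausing between requests
fetch_all <- function(tickers, fetch, pause = 3) {
  results <- vector("list", length(tickers))
  names(results) <- tickers
  for (i in seq_along(tickers)) {
    # list(...) keeps a NULL result in place; results[[i]] <- NULL would delete the slot
    results[i] <- list(tryCatch(fetch(tickers[[i]]), error = function(e) NULL))
    # Throttle the request rate (no pause needed after the last ticker)
    if (i < length(tickers)) Sys.sleep(pause)
  }
  results
}
```

With the question's objects, usage would be finale <- fetch_all(codici, getFin, pause = 3).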
I am trying to write the following function, to which I pass a dataset's column name to be used in the group_by clause.
counter<-function(df,col_name){
a<-df %>%
group_by(col_name) %>%
count() %>%
arrange(desc(n))
return(a)
}
For example, if I try:
fraud_continent<-counter(fraud,continent_source1)
where fraud is the dataset and continent_source1 is a column in it, the function doesn't work and I get this error:
Error: Must group by variables found in .data.
Column col_name is not found.
How do I solve this?
You can use the curly-curly operator ({{ }}):
counter<-function(df,col_name){
a<-df %>%
group_by({{col_name}}) %>%
count() %>%
arrange(desc(n))
return(a)
}
You can also do this without group_by(), since count() takes the grouping column directly:
counter<-function(df,col_name){
a<-df %>%
count({{col_name}}) %>%
arrange(desc(n))
return(a)
}
This can be called as:
fraud_continent<-counter(fraud,continent_source1)
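A quick check of the count({{ }}) version on a small made-up data frame; toy and its values are hypothetical stand-ins for fraud:

```r
library(dplyr)

counter <- function(df, col_name) {
  df %>%
    count({{ col_name }}) %>%   # {{ }} forwards the bare column name to count()
    arrange(desc(n))
}

# Hypothetical stand-in for the fraud dataset
toy <- data.frame(continent_source1 = c("EU", "AS", "EU"), stringsAsFactors = FALSE)
res <- counter(toy, continent_source1)
res
```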
We could use rlang::ensym() with !!:
library(dplyr)
counter <- function(df, colname){
df %>%
count(!! rlang::ensym(colname)) %>%
arrange(desc(n))
}
It can then be called as either
fraud_continent<-counter(fraud,continent_source1)
Or
fraud_continent<-counter(fraud, "continent_source1")
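To confirm that both call styles give the same result, here is a sketch using a hypothetical toy data frame in place of fraud. ensym() accepts either a bare name or a string and returns a symbol, which !! then injects into count():

```r
library(dplyr)

counter <- function(df, colname) {
  df %>%
    count(!!rlang::ensym(colname)) %>%  # bare name or string both become a symbol
    arrange(desc(n))
}

# Hypothetical stand-in for the fraud dataset
toy <- data.frame(continent_source1 = c("EU", "AS", "EU"), stringsAsFactors = FALSE)
by_name   <- counter(toy, continent_source1)
by_string <- counter(toy, "continent_source1")
identical(by_name, by_string)
```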
Update:
Thanks to akrun's input, .data[[col_name]] is a better approach:
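A minimal sketch of what the .data[[col_name]] version might look like, assuming col_name is passed as a string; toy is a hypothetical stand-in for fraud. The .data pronoun looks the column up inside the data frame rather than in the calling environment:

```r
library(dplyr)

counter <- function(df, col_name) {
  df %>%
    group_by(.data[[col_name]]) %>%  # col_name is a string, looked up in the data
    count() %>%
    ungroup() %>%
    arrange(desc(n))
}

# Hypothetical stand-in for the fraud dataset
toy <- data.frame(continent_source1 = c("EU", "AS", "EU"), stringsAsFactors = FALSE)
res <- counter(toy, "continent_source1")
res
```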
First answer:
Alternatively, you could use df[, col_name]:
library(dplyr)
counter<-function(df,col_name){
a<-df %>%
group_by(df[,col_name]) %>%
count() %>%
arrange(desc(n))
return(a)
}
fraud_continent<-counter(fraud,"continent_source1")
My code:
library(expss)
library(haven)
X4707 <- read_sav("/home/cfmc/4707/data/4707.sav")
X4707 %>%
tab_cells("By phone"=qpd4_1==1,"By email"=qpd4_2==1,"Utility website"=qpd4_3==1,"Roseville Electric notification"=qpd4_4==1,"Social media"=qpd4_5==1,"Text"=qpd4_6==1,"Not sure"=qpd4_8==1) %>%
tab_cols(total(), qf5) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot()
My output lists TRUE, FALSE, etc. under each label.
I would like the output to contain only the label text going down the stub (By phone, By email, etc.), without the TRUE, FALSE, etc.
You need to indicate that the variables form a multiple-response set. You have multiple choice with positional (dichotomy) coding, so you need the mdset (m(ultiple) d(ichotomy) set) function:
library(expss)
library(haven)
X4707 <- read_sav("/home/cfmc/4707/data/4707.sav")
X4707 %>%
tab_cells(mdset("By phone"=qpd4_1==1,"By email"=qpd4_2==1,"Utility website"=qpd4_3==1,"Roseville Electric notification"=qpd4_4==1,"Social media"=qpd4_5==1,"Text"=qpd4_6==1,"Not sure"=qpd4_8==1)) %>%
tab_cols(total(), qf5) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot()
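Conceptually, a dichotomy set turns each column into a "selected / not selected" indicator and reports one row per option. The base R sketch below is only an illustration of that idea with made-up responses, not how expss computes its tables; in particular, the percentage base here is all respondents, which may differ from the base expss uses:

```r
# Hypothetical responses: 1 = option selected, anything else = not selected
resp <- data.frame(qpd4_1 = c(1, 0, 1, 1), qpd4_2 = c(0, 0, 1, 0))
labels <- c(qpd4_1 = "By phone", qpd4_2 = "By email")

# Turn each column into a logical "selected" indicator (the dichotomy)
selected <- sapply(resp, function(x) x %in% 1)
# Percentage of respondents selecting each option: one value per label, no TRUE/FALSE rows
pct <- setNames(100 * colMeans(selected), unname(labels[colnames(selected)]))
pct
```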
My dplyr function looks like this:
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp() call because I thought the problem was due to lazyeval.
However, this is not the case.
Using the function below (without interp for top_n), I get a result, but I do not see the top 5 results as desired.
From other Stack Overflow posts, I understand that this has to do with ungroup, but I'm not sure how to implement it.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
My solution: remove the quotes around QoL in top_n(), drop the ungroup() call, and add globalsegment as an additional argument to arrange():
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
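As a possible alternative, here is a sketch for dplyr 1.0 or later (an assumption about your installed version), where summarise_() and lazyeval::interp() are deprecated: plain summarise() computes the share, and slice_max() replaces top_n(). Because summarise() drops only the last grouping level, the data stay grouped by globalsegment, so the top accounts are picked within each segment, as in the solution above. The n_top parameter and the toy data are made up for illustration:

```r
library(dplyr)

convert_to_top5_df <- function(df, n_top = 5) {
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    # Share of rows labelled 'QoL' per segment/account, rounded to 2 places
    summarise(QoL = round(mean(SVM_LABEL_QOL %in% "QoL"), 2), .groups = "drop_last") %>%
    # Still grouped by globalsegment: keep the top n_top accounts per segment (ties kept)
    slice_max(QoL, n = n_top) %>%
    ungroup() %>%
    arrange(globalsegment, desc(QoL))
}

# Hypothetical data
toy <- data.frame(
  globalsegment = c("A", "A", "A", "A", "B", "B"),
  Account       = c("x", "x", "y", "z", "u", "v"),
  SVM_LABEL_QOL = c("QoL", "Other", "QoL", "Other", "QoL", NA),
  stringsAsFactors = FALSE
)
res <- convert_to_top5_df(toy, n_top = 2)
res
```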
If anyone has a more efficient way, please share.
I'm trying to write a function that performs split-apply-combine where the split variable(s) are parameters and, importantly, a null split is acceptable. For example, it should run statistics either on subsets of the data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering whether there is a more elegant way to fix this without needing the dummy variable.
Hmm, that behavior is a little surprising to me. The fix is easy, though: just make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.
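With current dplyr you may not need nest()/unnest() at all. This sketch swaps in a different technique, a grouped mutate(): it groups only when split variables are supplied (grpvars defaults to an empty character vector rather than NULL, an assumption of this sketch) and computes the residuals within each group. somefun4 is a hypothetical name:

```r
library(dplyr)

somefun4 <- function(df_in, grpvars = character()) {
  # Group only when split variables are given; otherwise use the whole dataset
  if (length(grpvars) > 0) {
    df_in <- group_by(df_in, !!!rlang::syms(grpvars))
  }
  df_in %>%
    # chisq.test() on a vector is a goodness-of-fit test; it returns one residual per value
    mutate(X2.Resid = as.numeric(chisq.test(b)$residuals)) %>%
    ungroup()
}

somedata <- expand.grid(a = 1:3, b = 1:3)
split_res <- somefun4(somedata, "a")  # residuals within each level of a
whole_res <- somefun4(somedata)       # residuals over the entire dataset
```

With counts this small, chisq.test() warns that the approximation may be incorrect; dplyr passes those warnings through.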