I created my first R package, which has three functions that query data in a database and return data frames based on user input. Since the data frames are large, instead of printing them in the console I added View() within my functions to show users the data extracted based on their input.
Code goes like this:
queryData <- function(p, q, r, s, t) {
  d <- DBI::dbGetQuery(conn = con, statement = "SELECT * FROM dataset")
  d <- d %>%
    dplyr::filter(org == p) %>%
    dplyr::filter(exp == q) %>%
    dplyr::filter(dis %like% r) %>%
    dplyr::filter(tis %like% s) %>%
    dplyr::filter(Rx %like% t)
  print(paste("No. of datasets that matched your criteria:", nrow(d)))
  View(d)
}
R CMD check was fine, and I was able to install the package and run the functions. But it gave me an error when I built the vignette for the package.
Here is the error message:
Error: processing vignette 'package_vignette.Rmd' failed with diagnostics:
View() should not be used in examples etc
--- failed re-building 'package_vignette.Rmd'
SUMMARY: processing the following file failed:
'package_vignette.Rmd'
Error: Vignette re-building failed.
Any advice on how to fix this issue?
As the error message says, View() is not meant for R Markdown, which is what package vignettes are written in. The R Markdown Cookbook suggests you can display the data simply by calling the object, or format it as a table with knitr::kable(). If it's too long, you can show just the first part by subsetting it, e.g.
knitr::kable(my_table[1:5, 1:5])
will print only the first 5 rows and columns of the table. There are other packages you can use too (a brief list here), which work differently depending on the desired output format.
Alternatively, you can use paged tables to avoid scrolling:
rmarkdown::paged_table(my_table)
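If you also want to keep the convenience of View() in the package function itself, a minimal sketch (this guard is a suggestion, not part of the cookbook advice) is to call it only in interactive sessions, so vignettes and R CMD check never execute it:
# Show the data viewer only when a user is running R interactively;
# otherwise print a small preview instead.
if (interactive()) {
  View(d)
} else {
  print(head(d))
}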
I'm using the Radlibrary package in R and have used it several times. Now I want to update my data on Facebook Ads, but when running the as_tibble function to convert the data I have in class paginated_adlib_data_response, I'm met with the error message: Error: Problem with `mutate()` input `percentage`. x Column `percentage` not found in `.data`
The last time I used the API and the Radlibrary package was back in May. I don't know if it's the dplyr package that has changed and is now producing the problem, or if Facebook has changed its data format. The problem only arises for demographic and regional data; the 'ad' part of the data still works fine with the as_tibble function.
Does anyone know the answer to this, or perhaps another way of converting the paginated_adlib_data_response into a data.frame or something similar?
My code looks like this:
query_dem <- adlib_build_query(ad_reached_countries = 'DK',
                               ad_active_status = 'ALL',
                               search_page_ids = ID_media$page_id,
                               fields = "demographic_data")
result_dem <- adlib_get_paginated(query_dem, max_gets = 100)
tibble_dem <- as_tibble(result_dem, type = "demographic") # This is where the error is produced
Best,
Mads
I have a .Rmd file that I knit to generate reports. At the beginning of the document there is a code chunk that executes a query on a remote DB and returns some data as output (a typical select query over joins of different tables). The data being generated is essentially fixed: I'm retrieving data from a certain date interval, and the date interval I'm performing the analysis on doesn't change.
Each time I make an appearance-related change in the .Rmd file and knit it, it reruns this query, which takes more than 2 minutes since quite a lot of data is returned. I don't want this to happen, since the base data I'm performing the analysis on doesn't change at all.
How do I ensure this one chunk alone doesn't evaluate every time? I have tried putting eval = FALSE in the chunk options.
However, I get the following error:
Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "function" Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> %>% -> mutate Execution halted
For context, df is the data frame that is returned after executing the query through dbGetQuery(). Right after the chunk that has eval = FALSE, I have another chunk that performs mutate() on df. That is where the error is generated.
The code chunk with eval = FALSE is just a generic SQL execution chunk. Its content goes something like this:
query <- 'select * from table1 join table2'
query2 <- 'select * from table3'
df1 <- dbGetQuery(conn, query)
df2 <- dbGetQuery(conn, query2)
df <- left_join(df1, df2)
The next code chunk, where the error originates, does something like this:
df <- df %>% mutate(newcol = is.na(somecol))
I found an answer on Stack Overflow that somewhat explains my problem, but with no satisfactory solution.
Link to the SO post: error knitting flex dashboard rmarkdown dplyr
I assume that even if you get the error message, you receive it well before the 2-minute mark the chunk would take if it were evaluated, right? So eval = FALSE shouldn't be the problem at all. If you don't need to run the query again and again, I assume you stored the data locally?
The error message itself could pop up because of a package conflict, so maybe try dplyr::mutate. Note also that with eval = FALSE the chunk never creates df, so the name df resolves to stats::df (the F distribution density function), which is why the error complains about an object of class "function".
You could perhaps try with memoise
it'd be something like:
library(memoise)

query <- 'select * from table1 join table2'
query2 <- 'select * from table3'
my.get.query <- memoise(dbGetQuery)  # returns the cached result when called again with the same arguments
df1 <- my.get.query(conn, query)
df2 <- my.get.query(conn, query2)
df <- left_join(df1, df2)
This may not work; it depends entirely on how knitting operates, i.e. whether it starts fresh with a new session or not.
Otherwise, write your own function that caches the results to a file, as in the sketch below.
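A minimal sketch of such a file-based cache (the function name, the cache file names, and the use of DBI::dbGetQuery() here are illustrative assumptions):

library(DBI)

get_query_cached <- function(conn, query, cache_file) {
  # Reuse the result saved on disk if it exists; otherwise run the query and save it
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  res <- DBI::dbGetQuery(conn, query)
  saveRDS(res, cache_file)
  res
}

df1 <- get_query_cached(conn, query, "query1.rds")
df2 <- get_query_cached(conn, query2, "query2.rds")

knitr's built-in chunk option cache = TRUE gets you something similar per chunk, invalidating the cache only when the chunk's code changes.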
I want to make a dataframe available to users of a package, as well as to functions of the package.
Starting from a new package I've used
devtools::use_data_raw("my_data")
This creates the file data-raw/my_data.R, which I edit as
my_data <- data.frame(x = runif(3), y = runif(3))
devtools::use_data(my_data, overwrite = TRUE)
After running the code above, the file data/my_data.rda is created.
According to Hadley Wickham's R Packages, every file under data/ is exported, and if I try
load_all()
my_data
I can see that this is the case. However, if I now try to use the my_data data frame inside a function in the package, say in R/test_my_data.R
test_my_data <- function() {
  my_data
}
and then I run
devtools::check()
I get the infamous "no visible binding for global variable 'my_data'" NOTE.
I appreciate there are already many questions on this topic, but many relate to cases where a tidy evaluation function is used or where data from another package is referred to. Why does R CMD check complain about the example above, and what's the correct way of sorting this out?
I'm aware that the
utils::globalVariables("my_data")
solution will avoid this NOTE, but I'd like to know if there's a proper way of informing R CMD check that my_data actually exists.
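For reference, the usual place for that workaround is a small source file in the package, e.g. (the file name zzz.R is only a convention):

# R/zzz.R
# Suppress the R CMD check NOTE for the package's own dataset
if (getRversion() >= "2.15.1") utils::globalVariables("my_data")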
I am trying to execute the Random Forest algorithm on SparkR, with Spark 1.5.1 installed. I don't have a clear idea of why I am getting the error:
Error: could not find function "includePackage"
Further, even if I use the mapPartitions function in my code, I get an error saying:
Error: could not find function "mapPartitions"
Please find the code below:
rdd <- SparkR:::textFile(sc, "http://localhost:50070/explorer.html#/Datasets/Datasets/iris.csv", 5)
includePackage(sc, randomForest)
rf <- mapPartitions(rdd, function(input) {
  ## my function code for RF
})
This is more of a comment and a cross-question than an answer (I'm not allowed to comment because of reputation), but just to take this further: if we are using the collect method to convert the RDD back to an R data frame, isn't that counterproductive when the data is too large, since it would take too long to process in R?
Also, does this mean that we could use any R package, say markovChain or neuralnet, with the same methodology?
Kindly check the functions that can be used in SparkR: http://spark.apache.org/docs/latest/api/R/index.html
The list doesn't include mapPartitions() or includePackage(); the RDD-level API was made private when SparkR was merged into Spark 1.4, which is why these functions are no longer exported.
# For reading a csv in SparkR
sparkRdf <- read.df(sqlContext, "./nycflights13.csv",
                    "com.databricks.spark.csv", header = "true")
# A possible way to use `randomForest` is to convert the `SparkR` data frame to an `R` data frame
Rdf <- collect(sparkRdf)
# compute as usual in `R` code
install.packages("randomForest")
library(randomForest)
......
# convert back to a SparkR data frame
sparkRdf <- createDataFrame(sqlContext, Rdf)
I'm trying to create a PDF from an R Markdown script that contains this:
test <- group_by(trials, SubjID)
number <- summarise(test, nsubj=n())
sum(number$nsubj != 12)
but when I click on Knit PDF I get the following error:
Error in eval(expr, envir, enclos): could not find function "group_by" Calls:
<Anonymous> ... handle -> withCallingHandlers -> withVisible -> eval -> eval Execution halted
I have dplyr installed, and it works when I run the code in the console, but not when I press Knit PDF.
Knitting runs the document in a fresh R session, so packages loaded in your console are not available there. You need to add the library import below at the top of your R Markdown document:
library(dplyr)
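For example, in a setup chunk near the top of the .Rmd (the chunk label and include = FALSE are conventions, not requirements):

```{r setup, include=FALSE}
library(dplyr)
```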