unexpected symbol in R please see the code - r

I was trying to write a function to parse and merge some data, but R throws an "unexpected symbol" error. I have tried different ways to solve this issue, but it still doesn't work. Please help.
See the code:
aggall = function(df,grp){numcols = sapply(df,class) %in%
c('integer', 'numeric') result = aggregate(df[,numcols],df[grp],mean)
counts = as.data.frame(table(df[grp])) names(counts)[1] =
grp merge(counts, result, sort=FALSE)}
Error: unexpected symbol in "aggall = function(go,grp){numcols = sapply(go,class) %in% c('integer','numeric') results"

You have your whole function on one physical line.
Therefore, when R parses it, it has no way of knowing where one statement ends and the next one begins.
To fix this, either put each statement on its own line or separate the statements with semicolons.
Alternatively, you can have the formatR package tidy the file for you (pretty awesome package). The call and the reformatted function:
install.packages("formatR")
library(formatR)
# older formatR releases; newer ones spell this tidy_source("mySource.R", indent = 5)
tidy.source("mySource.R", reindent.space=5)
aggall = function(df, grp) {
     numcols = sapply(df, class) %in% c("integer", "numeric")
     result = aggregate(df[, numcols], df[grp], mean)
     counts = as.data.frame(table(df[grp]))
     names(counts)[1] = grp
     merge(counts, result, sort = FALSE)
}
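The parsing point above can be checked directly with parse() in base R. This is only a small sketch; the strings one_line, with_semicolon and with_newline are made up for illustration and are not the asker's code.

```r
# Two statements on one line with nothing between them: the parser cannot
# tell where the first one ends.
one_line <- "a = 1 b = 2"
msg <- tryCatch(
  {
    parse(text = one_line)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same statements separated by a semicolon, or by a newline, parse fine.
with_semicolon <- parse(text = "a = 1; b = 2")
with_newline <- parse(text = "a = 1\nb = 2")
length(with_semicolon)  # number of top-level expressions
length(with_newline)
```

Either separator works; separate lines are usually easier to read.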

Related

R: Not able to trycatch error with lapply

I have a table with stocks in R where I want to calculate the 6-month return based on tq_get and the tiingo API. I wanted to use lapply to fill my table, but unfortunately some tickers are not available on tiingo, or may be wrong, which returns an error. With this error the assigned data has fewer rows than the existing data, and lapply does not work. I tried to resolve this with tryCatch, but it's still not working. What is missing?
library(tidyquant)   # tq_get(), tiingo_api_key()
library(lubridate)   # days(), and months() with a numeric argument
today <- Sys.Date()
yesterday <- as.Date(today) - days(1)
before <- as.Date(today) - months(6)
tiingo_api_key('<my API key>')
calculate <- function (x) {
  ((tq_get(x, get = "tiingo", from = yesterday, to = yesterday)$adjusted)/(tq_get(x, get = "tiingo", from = before, to = before)$adjusted)-1)
}
top10[20] <- lapply(top10[1], calculate(x) tryCatch(calculate(x), error=function(e) NA))
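You need to wrap the call in an anonymous function and put tryCatch inside it. lapply expects a function as its second argument; tryCatch then wraps each call to calculate and returns NA when that call fails. This should work.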
# Old version vvvvvv function call in wrong place
top10[20] <- lapply(top10[1], calculate(x) tryCatch(calculate(x), error=function(e) NA))
# Corrected version
top10[20] <- lapply(top10[1], function(x) tryCatch(calculate(x), error=function(e) NA))
EDIT: @rawr already suggested this in a comment, I just saw. I only added a brief explanation of the function.
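The same pattern can be tried without an API key. In this sketch, risky_return and known_returns are hypothetical stand-ins for calculate() and the tiingo data; only the lapply/tryCatch structure is taken from the answer.

```r
# Hypothetical stand-in for calculate(): fails for unknown tickers.
known_returns <- c(AAA = 0.10, BBB = -0.05)
risky_return <- function(ticker) {
  if (!ticker %in% names(known_returns)) {
    stop("ticker not available: ", ticker)
  }
  known_returns[[ticker]]
}

tickers <- c("AAA", "ZZZ", "BBB")

# The anonymous function is what lapply() calls for each element; tryCatch()
# turns an error from that one call into NA instead of aborting the loop.
safe <- lapply(tickers, function(x) tryCatch(risky_return(x), error = function(e) NA))
result <- unlist(safe)
print(result)
```

The failing ticker yields NA in its own position, so the result keeps the same length as the input.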
Alternatively, is_supported_ticker() from the riingo package lets you check the tickers up front, which avoids the error message altogether.
calculate <- function (x) {
  supported = sapply(x, is_supported_ticker, type = "tiingo")
  result = rep(NA, length(x))
  result[supported] =
    (
      tq_get(x[supported], get = "tiingo", from = yesterday, to = yesterday)$adjusted /
      tq_get(x[supported], get = "tiingo", from = before, to = before)$adjusted
    ) - 1
  return(result)
}

Tidyr function gather and paste inside loop

I'm trying to run a loop with multiple dataframes. I'm using the gather function from tidyr and I want to use as argument the index of the loop, i, along with a word, deaths.
I've been trying:
gather(data[[i]], "year", paste("deaths", i, sep="_"), 2:ncol(data[[i]]))
However, every time I try that, it returns "Error: Must supply a symbol or a string as argument".
I read somewhere that tidyr evaluates things in a non-standard way and that the alternative is gather_, which uses standard evaluation.
However, the command
gather_(data[[i]], "year", paste("deaths", i, sep="_"), 2:ncol(data[[i]]))
returns Error: Only strings can be converted to symbols.
However, I thought the paste command was already producing a string.
Does anyone know a fix?
Here is the full error:
"<error>
message: Only strings can be converted to symbols
class: `rlang_error`
backtrace:
-tidyr::gather_(...)
-tidyr:::gather_.data.frame(...)
-rlang::syms(gather_cols)
-rlang:::map(x, sym)
-base::lapply(.x, .f, ...)
-rlang:::FUN(X[[i]], ...)
Call `summary(rlang::last_error())` to see the full backtrace"
The full code:
require(datasus)
require(tidyr)
data_list <- list()
for(i in 1:2){
  data_list[[i]] <- sim_inf10_mun(linha = "Município", coluna = "Ano do Óbito", periodo = c(1996:2016), municipio = "all",
                                  capitulo_cid10 = i)
  data_list[[i]] <- data.frame(data_list[i])
  data_list[[i]] <- data_list[[i]][-1,]
  data_list[[i]] <- data_list[[i]][,-ncol(data_list[[i]])]
  data_list[[i]] <- gather(data_list[[i]], "ano", "deaths_01_i", 2:ncol(data_list[[i]]))
  names(data_list[[i]])[1]<-"cod_mun"
  data_list[[i]] <- transform(data_list[[i]], cod_mun = substr(cod_mun, 1, 6))
  data_list[[i]] <- transform(data_list[[i]], ano = substr(ano, 2, 5))
}
This returns a panel dataset exactly the way I want, with (municipality code, year) identification in rows, and a value. My problem is that the value column is named "deaths_01_i", which is kind of obvious since it is in quotation marks, whereas I wanted the name to change with the loop index. Thus I tried to build it with paste.
I know I can just change the variable name by adding a line names(data_list[[i]])[3]<-paste("deaths_01",i,sep="_"), but the problem got my attention to improving my understanding of the code.
Some words are in Portuguese but they are irrelevant to the problem. I also changed the range of the loop to avoid time issues.
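One possible reading of the backtrace above: the failing call is rlang::syms(gather_cols), so the value being rejected may be the column selection 2:ncol(...), which is a vector of integers, rather than the pasted name. The base-R sketch below only checks the types involved; the data frame toy is made up and the datasus data is not used.

```r
i <- 1
value_name <- paste("deaths", i, sep = "_")
is.character(value_name)        # the pasted column name is a string

# Made-up stand-in for one element of data_list.
toy <- data.frame(cod_mun = c("110001", "110002"), X1996 = c(3, 5), X1997 = c(4, 2))
cols_by_position <- 2:ncol(toy)
is.character(cols_by_position)  # integer positions are not strings

# Translating positions into names gives a character vector, which is what
# a string-based (standard evaluation) interface expects.
cols_by_name <- names(toy)[cols_by_position]
print(cols_by_name)
```

If that reading is right, passing names(data[[i]])[-1] instead of 2:ncol(data[[i]]) as the last argument of gather_ may be worth trying; this has not been run against the original data.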

do.call cannot find function

First of all, sorry that I do not provide a fully reproducible example, but I'm using the devel version of a BioConductor package and the installation is a bit of a pain in the butt.
I am trying to use do.call to invoke functions based on a string I paste together. Like so:
testfunction <- function(){
  print("do.call is working")
}
do.call(paste("test", "function", sep = ""), args = list())
This prints:
"do.call is working"
When I am trying to invoke my function of interest, like so:
for (dataset in unique(metadata[, 1])){
  replace_idx <- which(metadata[, 1] == dataset)
  do.call(paste('curatedMetagenomicData::', dataset, ".genefamilies_relab.stool", sep = ""), args = list())
  if (length(replace_idx) > 0){
    # Do further stuff.
  }
}
I get this error:
Error in
curatedMetagenomicData::ZellerG_2014.genefamilies_relab.stool() :
could not find function
"curatedMetagenomicData::ZellerG_2014.genefamilies_relab.stool"
Invoking the function outside of the loop, without do.call, works.
Could this be an environment problem of some sort? Any help is greatly appreciated.
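A likely explanation, sketched here with the stats package instead of curatedMetagenomicData: when do.call is given a string, it treats the whole string as the name of a function, and "pkg::fun" is not a function name. Looking the function up in the package first and passing the function object avoids that. The use of stats and median is purely illustrative.

```r
# A "pkg::fun" string is not a function name, so the lookup fails.
msg <- tryCatch(
  do.call("stats::median", list(c(1, 5, 9))),
  error = function(e) conditionMessage(e)
)
print(msg)

# Look the function up among the package's exports first, then pass the
# function object (not a string) to do.call().
fun_name <- paste0("med", "ian")             # name built from pieces
fun <- getExportedValue("stats", fun_name)   # same object as stats::median
value <- do.call(fun, list(c(1, 5, 9)))
print(value)
```

Applied to the loop above, the pasted name (without the package prefix) would go into getExportedValue("curatedMetagenomicData", ...), assuming the function is exported by that package.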

Disable assignment via = in R

R allows for assignment via <- and =.
Whereas there are subtle differences between the two assignment operators, there seems to be broad consensus that <- is the better choice, as = is also used as the operator mapping values to arguments, and thus its use may lead to ambiguous statements. The following exemplifies this:
> system.time(x <- rnorm(10))
user system elapsed
0 0 0
> system.time(x = rnorm(10))
Error in system.time(x = rnorm(10)) : unused argument(s) (x = rnorm(10))
In fact, the Google style guide disallows using = for assignment (see comments to this answer for a converse view).
I also almost exclusively use <- as assignment operator. However, the almost in the previous sentence is the reason for this question. When = acts as assignment operator in my code it is always accidental and if it leads to problems these are usually hard to spot.
I would like to know if there is a way to turn off assignment via = and let R throw an error any time = is used for assignment.
Optimally this behavior would only occur for code in the Global Environment, as there may well be code in attached namespaces that uses = for assignment and should not break.
(This question was inspired by a discussion with Jonathan Nelson)
Here's a candidate:
`=` <- function(...) stop("Assignment by = disabled, use <- instead")
# seems to work
a = 1
Error in a = 1 : Assignment by = disabled, use <- instead
# appears not to break named arguments
sum(1:2,na.rm=TRUE)
[1] 3
I'm not sure, but maybe simply overwriting the assignment of = is enough for you. After all, `=` is a name like any other—almost.
> `=` <- function() { }
> a = 3
Error in a = 3 : unused argument(s) (a, 3)
> a <- 3
> data.frame(a = 3)
a
1 3
So any use of = for assignment will result in an error, whereas using it to name arguments remains valid. Its use in functions might go unnoticed unless the line in question actually gets executed.
The lint package (CRAN) has a style check for that, so assuming you have your code in a file, you can run lint against it and it will warn you about those line numbers with = assignments.
Here is a basic example:
temp <- tempfile()
write("foo = function(...) {
good <- 0
bad = 1
sum(..., na.rm = TRUE)
}", file = temp)
library(lint)
lint(file = temp, style = list(styles.assignment.noeq))
# Lint checking: C:\Users\flodel\AppData\Local\Temp\RtmpwF3pZ6\file19ac3b66b81
# Lint: Equal sign assignemnts: found on lines 1, 3
The lint package comes with a few more tests you may find interesting, including:
warns against right assignments
recommends spaces around =
recommends spaces after commas
recommends spaces between infixes (a.k.a. binary operators)
warns against tabs
possibility to warn against a max line width
warns against assignments inside function calls
You can turn on or off any of the pre-defined style checks and you can write your own. However the package is still in its infancy: it comes with a few bugs (https://github.com/halpo/lint) and the documentation is a bit hard to digest. The author is responsive though and slowly making improvements.
If you don't want to break existing code, something like this (printing a warning not an error) might work - you give the warning then assign to the parent.frame using <- (to avoid any recursion)
`=` <- function(...){
  .what <- as.list(match.call())
  .call <- sprintf('%s <- %s', deparse(.what[[2]]), deparse(.what[[3]]))
  mess <- 'Use <- instead of = for assignment '
  if(getOption('warn_assign', default = TRUE)) {
    stop(mess)
  } else {
    warning(mess)
    eval(parse(text = .call), envir = parent.frame())
  }
}
If you set options(warn_assign = FALSE), then = will warn and assign. Anything else will throw an error and not assign.
Examples in use:
# with no option set
z = 1
## Error in z = 1 : Use <- instead of = for assignment
options(warn_assign = TRUE)
z = 1
## Error in z = 1 : Use <- instead of = for assignment
options(warn_assign = FALSE)
z = 1
## Warning message:
## In z = 1 : Use <- instead of = for assignment
Better options
I think formatR or lint and code formatting are better approaches.
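For a quick check without extra packages, the parser's own token table can be inspected. This is a different technique from the lint package's style check, shown only as a sketch; it reuses the small foo example from the lint answer, and the variable names are made up.

```r
code <- "foo = function(...) {
  good <- 0
  bad = 1
  sum(..., na.rm = TRUE)
}"

# keep.source = TRUE is needed so that parse data is recorded.
parsed <- parse(text = code, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# `=` used for assignment is tokenised as EQ_ASSIGN; `=` naming an argument
# in a call (na.rm = TRUE) is EQ_SUB and is left alone.
eq_lines <- sort(unique(tokens$line1[tokens$token == "EQ_ASSIGN"]))
print(eq_lines)
```

Because this only parses the code and never runs it, it also finds = assignments inside functions that are never executed.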

How to use an unknown number of key columns in a data.table

I want to do the same as explained here, i.e. adding missing rows to a data.table. The only additional difficulty I'm facing is that I want the number of key columns, i.e. those columns that are used for the self-join, to be flexible.
Here is a small example that basically repeats what is done in the link mentioned above:
library(data.table)
df <- data.frame(fundID = rep(letters[1:4], each=6),
                 cfType = rep(c("D", "D", "T", "T", "R", "R"), times=4),
                 variable = rep(c(1,3), times=12),
                 value = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
What bothers me is the last line. I want idCols to be flexible (for instance if I use it within a function), so I don't want to type unique(df$fundID), unique(df$cfType) manually. However, I just can't find a workaround for this. All my attempts to automatically split the subset of df into vectors, as needed by CJ, fail with the error message Error in setkeyv(x, cols, verbose = verbose) : Column 'V1' is type 'list' which is not (currently) allowed as a key column type.
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
I also tried building the expression myself:
str <- ""
for (i in idCols) {
  str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
But then I don't know how to use str. All of these fail:
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
Does anyone know a good workaround?
Michael's answer is great. do.call is indeed needed to call CJ flexibly in that way, afaik.
To clarify the expression-building approach, start with your code but remove the df$ parts (they are not needed, and not used in the linked answer, since i is evaluated within the scope of DT):
str <- ""
for (i in idCols) {
  str <- paste0(str, "unique(", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(fundID), unique(cfType), seq(from=min(variable), to=max(variable))"
then it's :
expr <- parse(text=paste0("CJ(",str,")"))
DT[eval(expr),nomatch=NA]
or alternatively build and eval the whole query dynamically :
eval(parse(text=paste0("DT[CJ(",str,"),nomatch=NA]")))
And if this is done a lot then it may be worth creating yourself a helper function :
E = function(...) eval(parse(text=paste0(...)))
to reduce it to :
E("DT[CJ(",str,"),nomatch=NA]")
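One thing to watch with such a helper: eval() inside E evaluates in E's own frame by default, so variables local to the calling function would not be found. The sketch below is a variant with an explicit envir argument (an addition, not part of the helper above), demonstrated with base-R expand.grid as a stand-in for the data.table query; local_df and count_combos are made-up names.

```r
# Variant of the E() helper with an explicit evaluation environment
# (the envir argument is an addition for illustration).
E <- function(..., envir = parent.frame()) {
  eval(parse(text = paste0(...)), envir = envir)
}

# Called inside a function, E() can still see the caller's local variables.
count_combos <- function() {
  local_df <- data.frame(g = c("a", "a", "b"), h = c(1, 2, 2))
  pieces <- paste0("unique(local_df$", c("g", "h"), ")", collapse = ", ")
  nrow(E("expand.grid(", pieces, ")"))
}
n <- count_combos()
print(n)
```

With the original one-line helper, the same call would depend on local_df being visible from the global environment.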
I've never used the data.table package, so forgive me if I miss the mark here, but I think I've got it. There's a lot going on here. Start by reading up on do.call, which lets you call any function with its arguments supplied as a list (each element of the list is matched to the function's arguments by position unless it is explicitly named). Also notice that I had to specify min(df$variable) instead of just min(variable). Read Hadley's page on scoping to get an idea of the issue here.
CJargs <- lapply(df[, idCols], unique)
names(CJargs) <- NULL
CJargs[[length(CJargs) +1]] <- seq(from=min(df$variable), to=max(df$variable))
DT[do.call("CJ", CJargs),nomatch=NA]
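The do.call idea can be tried without data.table. In this sketch, expand.grid is only a base-R stand-in for CJ (it also builds all combinations, but it is not a keyed data.table), and grid_args and full_grid are made-up names; the data frame is the one from the question.

```r
df <- data.frame(fundID = rep(letters[1:4], each = 6),
                 cfType = rep(c("D", "D", "T", "T", "R", "R"), times = 4),
                 variable = rep(c(1, 3), times = 12),
                 value = 1:24)
idCols <- c("fundID", "cfType")

# One list element per key column, however many there are.
grid_args <- lapply(df[, idCols, drop = FALSE], unique)
grid_args[["variable"]] <- seq(from = min(df$variable), to = max(df$variable))

# do.call() spreads the list over the arguments of the called function.
# expand.grid() is only a base-R stand-in here for data.table's CJ().
full_grid <- do.call(expand.grid, c(grid_args, stringsAsFactors = FALSE))
print(dim(full_grid))
```

Changing idCols changes the number of list elements, and the do.call line stays the same.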
