Tidyr function gather and paste inside loop - r

I'm trying to run a loop with multiple dataframes. I'm using the gather function from tidyr and I want to use as argument the index of the loop, i, along with a word, deaths.
I've been trying:
gather(data[[i]], "year", paste("deaths", i, sep="_"), 2:ncol(data[[i]]))
However, every time I try that, it returns "Error: Must supply a symbol or a string as argument".
I read somewhere that tidyr evaluates things in a non-standard way and that the alternative is gather_, which uses standard evaluation.
However, the command
gather_(data[[i]], "year", paste("deaths", i, sep="_"), 2:ncol(data[[i]]))
Returns Error: Only strings can be converted to symbols.
However, I thought the paste command was already resulting in a string.
Does anyone know a fix?
Here is the full error:
"<error>
message: Only strings can be converted to symbols
class: `rlang_error`
backtrace:
-tidyr::gather_(...)
-tidyr:::gather_.data.frame(...)
-rlang::syms(gather_cols)
-rlang:::map(x, sym)
-base::lapply(.x, .f, ...)
-rlang:::FUN(X[[i]], ...)
Call `summary(rlang::last_error())` to see the full backtrace"
The full code:
require(datasus)
require(tidyr)
data_list <- list()
for(i in 1:2){
data_list[[i]] <- sim_inf10_mun(linha = "Município", coluna = "Ano do Óbito", periodo = c(1996:2016), municipio = "all",
capitulo_cid10 = i)
data_list[[i]] <- data.frame(data_list[i])
data_list[[i]] <- data_list[[i]][-1,]
data_list[[i]] <- data_list[[i]][,-ncol(data_list[[i]])]
data_list[[i]] <- gather(data_list[[i]], "ano", "deaths_01_i", 2:ncol(data_list[[i]]))
names(data_list[[i]])[1]<-"cod_mun"
data_list[[i]] <- transform(data_list[[i]], cod_mun = substr(cod_mun, 1, 6))
data_list[[i]] <- transform(data_list[[i]], ano = substr(ano, 2, 5))
}
This returns a panel dataset exactly the way I want, with (municipality code-year) identification in lines, and a value. My problem is that the value (column) name is "deaths_01_i", which is kinda obvious since it is in quotation marks, whereas I wanted it to change with the loop index. Thus I tried to implement it with a paste.
I know I can just change the variable name by adding a line names(data_list[[i]])[3]<-paste("deaths_01",i,sep="_"), but the problem got my attention to improving my understanding of the code.
Some words are in Portuguese but they are irrelevant to the problem. I also changed the range of the loop to avoid time issues.
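A possible fix, sketched under assumptions rather than tested against the datasus data: the backtrace above ends in rlang::syms(gather_cols), which suggests that gather_ was objecting to the integer vector 2:ncol(...) (it expects column names as strings) and not to the result of paste. With the non-standard-evaluation gather, a string held in a variable can be unquoted with !!, and pivot_longer accepts plain strings for the new column names. The data frame df below is a made-up stand-in for data_list[[i]].

```r
library(tidyr)

# Made-up stand-in for one element of data_list in the question.
df <- data.frame(cod_mun = c("110001", "110002"),
                 X1996 = c(3, 5),
                 X1997 = c(4, 6))
i <- 1
value_name <- paste("deaths", i, sep = "_")  # "deaths_1"

# Option 1: unquote the string with !! so gather() uses its value
# as the name of the value column.
long1 <- gather(df, "year", !!value_name, 2:ncol(df))

# Option 2: pivot_longer() takes ordinary strings for the new column names.
long2 <- pivot_longer(df, cols = 2:ncol(df),
                      names_to = "year", values_to = value_name)

names(long1)
names(long2)
```

Inside the loop, value_name would be rebuilt on each iteration, so each data frame gets its own value column name.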

Related

R: created a names vector containing the means of multiple numeric vectors

I have over 20 numeric vectors which consist of a series of values. each vector is distinguished by a letter, e.g. val_a, val_b, val_c etc...
I would like to put the means from each of these vectors into a single named vector. I could of course do this in a laborious manner like so:
obs <- c("val_a" = round(mean(val_a),3),
"val_b" = round(mean(val_b),3),
"val_c" = round(mean(val_c),3))
But with 20 vectors this then becomes tedious to write out, and not to mention an inelegant solution. How can I create the named vector in a more succinct way? I have made an attempt using a for loop, as so:
obs <- c(for (j in 1:20) {
assign(paste("val",letters[j], sep = "_"),
mean(as.name(paste('val',letters[j], sep = '_'))),)
})
In the right hand argument passed to assign, "as.name" is used in order to remove the quotation marks from the output of "paste". So the second argument passed to assign returns a character which has the exact same name as the numeric vector that I want to get the mean of, e.g. val_a. But I get the error message:
Warning messages:
1: In mean.default(as.name(paste("val", letters[j], sep = "_"))) :
argument is not numeric or logical: returning NA
Does anyone know how to accomplish this?
Solution
To build on bouncyball's comment so you have a full answer, you can do this:
sapply(paste('val', letters[1:20], sep='_'), function(x) round(mean(get(x)), 3))
Explanation
For an object in your environment called x, get("x") will return x. See help("get"). Then we can do this for every element of paste('val', letters[1:20], sep='_') using sapply(), or if you like, a loop.
Example
val_a <- rnorm(100)
val_b <- rnorm(100)
val_c <- rnorm(100)
sapply(paste('val', letters[1:3], sep='_'), function(x) round(mean(get(x)), 3))
val_a val_b val_c
-0.09328504 -0.15632654 -0.09759111
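A small variation on the same idea, offered as a sketch: mget() fetches several objects by name at once and returns them as a named list, so the names carry through sapply() automatically. The vectors below are made-up values chosen so the means are easy to check by hand.

```r
# Made-up vectors standing in for val_a, val_b, ...
val_a <- c(1, 2, 3)
val_b <- c(2, 4, 9)
val_c <- c(0.5, 0.25)

# mget() returns a named list of the objects; environment() makes the
# lookup start in the environment where this code is run.
vecs <- mget(paste0("val_", letters[1:3]), envir = environment())

obs <- sapply(vecs, function(v) round(mean(v), 3))
obs
```

If the vectors can be created inside a named list from the start, the get()/mget() step is not needed at all and sapply() can be applied to that list directly.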

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
vals <- dat[[i]]
nm <- names(dat)[[i]]
qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes 2 arguments that vary. So by or tapply or sapply or ... are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
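On the original question of labelling inside by(): the function receives only the rows of one group, so the group's value can be read from the chunk itself. The sketch below uses a made-up two-column data frame in place of file and builds only the titles, leaving out qq.chisq so that it runs with base R alone.

```r
# Made-up data: column 1 holds p-values, column 2 is the grouping column.
file <- data.frame(p = c(0.1, 0.5, 0.9, 0.2), grp = c(1L, 1L, 2L, 2L))

# Each chunk x contains only the rows of one group, so x[1, 2] is that
# group's value; file[1, 2] would always refer to the first row overall.
titles <- by(file, as.factor(file[, 2]), function(x) {
  paste0("Sample_filtered_on_", x[1, 2])
})
print(titles)
```

In the question's code this would mean using x[1,12] instead of file[1,12] in the main argument.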

r error in rep(value, length.out = n) : attempt to replicate an object of type 'closure' [duplicate]

I'm trying to write a function that can apply another function to a number of data.frames at one time. The data.frames are named DATA_1, DATA_2, etc. and the variable 'actioncol' is to indicate the column that has to be changed. This is my code so far:
gsubFUN <- function(name, actioncol, ...){
df.vec <- ls(pattern = paste("name", "*", sep="_"), envir=.GlobalEnv)
for(ii in 1:length(df.vec)){
DATA <- get(df.vec[ii])
DATA[,actioncol] <- gsub(pattern.vec[ii], replace.vec[ii], DATA[,actioncol])
assign(paste(name, ii, sep = "_"),DATA, envir = .GlobalEnv)
}
}
I am aware that my code may be quite messed up, but it does work. Since I would like the outer function to apply other functions (not just gsub) on the data.frames, too, I tried to replace it with a variable:
multiDfFUN <- function(name, actioncol, FUN, ...){
df.vec <- ls(pattern = paste(name, "*", sep="_"), envir=.GlobalEnv)
for(ii in 1:length(df.vec)){
DATA <- get(df.vec[ii])
DATA[,actioncol] <- match.fun(FUN)
assign(paste(name, ii, sep = "_"),DATA, envir = .GlobalEnv)
}
}
multiDfFUN(name="audi", actioncol="color", FUN=gsub, pattern=pattern.vec[ii],
replacement=replace.vec[ii], x=DATA[,actioncol])
However, this now returns an error message:
error in rep(value, length.out = n) :
attempt to replicate an object of type 'closure'
I don't even understand the meaning of this. Searching the web didn't help either. Could the arguments pattern, replacement & x when calling the function be the reason for this? I would be really glad if somebody could enlighten me on this issue or even point me to a simple solution (if there is any).
Many thanks in advance.
This line:
DATA[,actioncol] <- match.fun(FUN)
... is attempting to assign functions (not function names) to items in a dataframe. That's not going to succeed. And then you write:
assign(paste(name, ii, sep = "_"),DATA, envir = .GlobalEnv)
That effort is very contrary to the preferred programming style of R. Assigning to the GlobalEnv from within the function body should only be attempted by people who know what that error message meant. match.fun returns a function, so I imagine you would want to do something like this:
DATA[,actioncol] <- match.fun(FUN)( DATA[,actioncol] )
return(DATA)
And then call it like:
DATAnew <- multiDfFUN(name="audi", actioncol="color", FUN=gsub,
pattern=pattern.vec[ii],
replacement=replace.vec[ii], x=DATA[,actioncol])
Since we have no example data to work with, I will leave this as an untested guess.
Note added in proof:
fortunes::fortune("understand why")
The only people who should use the assign function are those who fully understand
why you should never use the assign function.
-- Greg Snow
R-help (July 2009)
Exams kept me quite busy, that's why I'm just answering now. DWin's suggestions actually helped me get the function to work as intended.
I also took all your warnings about attach into consideration. But as mentioned before, I did this code for an assignment where I was explicitly asked to use it. So this is what I ended up with:
MultiDfFUN <- function(df.vec, col.name, col.new="new", df.name,
FUN, overwrite=F, ...){
df.list <- list(NULL)
for(ii in 1:length(df.vec)){
DATA <- get(df.vec[ii])
DATA[,col.new] <- FUN(DATA[,col.name],...)
if(overwrite == TRUE){
assign(paste(df.name, ii, sep = "_"),DATA, envir = .GlobalEnv)
}else{
df.list[[ii]] <- DATA[,col.new]
}
}
if(overwrite == FALSE) return(df.list)
}
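For comparison, here is a sketch of the same idea without get() or assign(), assuming the data frames can be kept in a named list. The helper name apply_to_column and the example data are made up for illustration. The point from the answer above is visible in the middle line: the function returned by match.fun is called, and its result (not the function itself) is stored in the column.

```r
# Apply FUN to one column of every data frame in a list; extra arguments
# in ... are forwarded to FUN. Returns a new list, no assign() needed.
apply_to_column <- function(dfs, col, FUN, ...) {
  FUN <- match.fun(FUN)             # accepts a function or its name as a string
  lapply(dfs, function(d) {
    d[[col]] <- FUN(d[[col]], ...)  # call the function, then store its result
    d
  })
}

# Made-up example data.
cars <- list(
  audi_1 = data.frame(color = c("redd", "blue"), stringsAsFactors = FALSE),
  audi_2 = data.frame(color = c("redd", "green"), stringsAsFactors = FALSE)
)

# pattern and replacement are matched by name, so the column ends up as x.
fixed <- apply_to_column(cars, "color", gsub, pattern = "dd", replacement = "d")
fixed$audi_1$color
```

Because the result is an ordinary list, the caller decides whether and where to store it, which avoids writing into the global environment from inside the function.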

Substituting variables in a loop?

I am trying to write a loop in R but I think the nomenclature is not correct as it does not create the new objects, here is a simplified example of what I am trying to do:
for i in (1:8) {
List_i <-List
colsToGrab_i <-grep(predefinedRegex_i, colnames(List_i$table))
List_i$table <- List_i$table[,predefinedRegex_i]
}
I have created 'predefinedRegex'es 1:8 which the grep should use to search
The loop creates an object called "List_i" and then fails to find "predefinedRegex_i".
I have tried putting quotes around the "i" and $ in front of the i , also [i] but these do not work.
Any help much appreciated. Thank you.
Using @RyanGrammel's answer below:
#CREATING regular expressions for grabbing sets groups 1 -7 ::::
g_1 <- "DC*"
g_2 <- "BN_._X.*"
g_3 <- "BN_a*"
g_4 <- "BN_b*"
g_5 <- "BN_a_X.*"
g_6 <- "BN_b_X.*"
g_7 <- "BN_._Y.*"
for (i in 1:7) {
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA)
assign(x = paste("Forgrep_", i, sep=""), value = colnames(get(x = paste("tableA_", i, sep=""))))
assign(x = paste("grab_", i, sep=""), value = grep((get(x = paste("g_",i, sep=""))), (get(x = paste("Forgrep_",i, sep="")))))
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA[,get(x = paste("grab_",i, sep=""))])
}
This loop is repeated for each table inside "BigList".
I found I could not extract columnnames from
get(x = paste("BigList_", i, "$tableA", sep=""))
or from
get(x = paste("BigList_", i, "[[2]]", sep=""))
so it was easier to extract the tables first. I will now write a loop to repack the lists up.
Problem
Your syntax is off: you don't seem to understand how exactly R deals with variable names.
for(i in 1:10) name_i <- 1
The above code doesn't assign name_1, name_2,....,name_10. It assigns "name_i" over and over again
To create a list, you call 'list()', not List
creating a variable List_i in a loop doesn't assign List_1, List_2,...,List_8.
It repeatedly assigns an empty list to the name 'List_i'. Think about it; if R names variables in the way you tried to, it'd be equally likely to name your variables L1st_1, L2st_2...See 'Solution' for some valid R code do something similar
'predefinedRegex_i' isn't interpreted as an attempt to get the variable 'predefinedRegex_1', 'predefinedRegex_2', and so on.
However, get(paste0("predefinedRegex_", i)) is interpreted in this way. Just make sure i actually has a value when using this. See below.
Solution:
In general, use this to dynamically assign variables (List_1, List_2,..)
assign(x = paste0("prefix_", i), value = i)
if i is equal to 199, then this code assigns the variable prefix_199 the value 199.
In general, use this to dynamically get the variables you assigned using the above snippet of code.
get(x = paste0("prefix_", i))
if i is equal to 199, then this code gets the variable prefix_199.
That should solve the crux of your problem; if you need any further help feel free to ask for clarification here, or contact me via my Twitter Feed.
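As an alternative sketch, the numbered variables can often be replaced by one vector of patterns and one list of results, which removes the need for assign() and get() altogether. The patterns and the small table below are made up for illustration and are not the ones from the question.

```r
# Made-up patterns and table, standing in for g_1, g_2, ... and BigList$tableA.
patterns <- c("^DC", "^BN_a", "^BN_b")
tableA <- data.frame(DC1 = 1:2, BN_a1 = 3:4, BN_b1 = 5:6, BN_a2 = 7:8)

# One subset of columns per pattern; drop = FALSE keeps a data frame
# even when only a single column matches.
subsets <- lapply(patterns, function(p) {
  tableA[, grep(p, colnames(tableA)), drop = FALSE]
})
names(subsets) <- paste0("tableA_", seq_along(patterns))

colnames(subsets$tableA_2)
```

Each result is then reachable as subsets[[i]] or by name, so later loops can index the list instead of pasting variable names together.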

How to use an unknown number of key columns in a data.table

I want to do the same as explained here, i.e. adding missing rows to a data.table. The only additional difficulty I'm facing is that I want the number of key columns, i.e. those columns that are used for the self-join, to be flexible.
Here is a small example that basically repeats what is done in the link mentioned above:
library(data.table)
df <- data.frame(fundID = rep(letters[1:4], each=6),
cfType = rep(c("D", "D", "T", "T", "R", "R"), times=4),
variable = rep(c(1,3), times=12),
value = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
What bothers me is the last line. I want idCols to be flexible (for instance if I use it within a function), so I don't want to type unique(df$fundID), unique(df$cfType) manually. However, I just don't find any workaround for this. All my attempts to automatically split the subset of df into vectors, as needed by CJ, fail with the error message Error in setkeyv(x, cols, verbose = verbose) : Column 'V1' is type 'list' which is not (currently) allowed as a key column type.
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
I also tried building the expression myself:
str <- ""
for (i in idCols) {
str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
But then I don't know how to use str. This all fails:
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
Does anyone know a good workaround?
Michael's answer is great. do.call is indeed needed to call CJ flexibly in that way, afaik.
To clear up on the expression building approach and starting with your code, but removing the df$ parts (not needed and not done in the linked answer, since i is evaluated within the scope of DT) :
str <- ""
for (i in idCols) {
str <- paste0(str, "unique(", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(fundID), unique(cfType), seq(from=min(variable), to=max(variable))"
then it's :
expr <- parse(text=paste0("CJ(",str,")"))
DT[eval(expr),nomatch=NA]
or alternatively build and eval the whole query dynamically :
eval(parse(text=paste0("DT[CJ(",str,"),nomatch=NA]")))
And if this is done a lot then it may be worth creating yourself a helper function :
E = function(...) eval(parse(text=paste0(...)))
to reduce it to :
E("DT[CJ(",str,"),nomatch=NA]")
I've never used the data.table package, so forgive me if I miss the mark here, but I think I've got it. There's a lot going on here. Start by reading up on do.call, which allows you to evaluate any function in a sort of non-traditional manner where arguments are specified by a supplied list (where each element in the list is positionally matched to the function arguments unless explicitly named). Also notice that I had to specify min(df$variable) instead of just min(variable). Read Hadley's page on scoping to get an idea of the issue here.
CJargs <- lapply(df[, idCols], unique)
names(CJargs) <- NULL
CJargs[[length(CJargs) +1]] <- seq(from=min(df$variable), to=max(df$variable))
DT[do.call("CJ", CJargs),nomatch=NA]
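Since the question mentions using this inside a function, here is one way the do.call approach might be wrapped, as a sketch only. The function name fill_missing and its arguments are made up; it joins with on= instead of relying on a key, which is a swapped-in choice and not what the answers above do.

```r
library(data.table)

# Hypothetical helper: add rows for every combination of the id columns
# and every integer step of seqCol between its min and max.
fill_missing <- function(DT, idCols, seqCol) {
  # One vector of unique values per id column, collected in a list.
  args <- lapply(idCols, function(col) unique(DT[[col]]))
  args[[length(args) + 1]] <- seq(from = min(DT[[seqCol]]), to = max(DT[[seqCol]]))
  # do.call passes the list elements to CJ as separate arguments.
  grid <- do.call(CJ, args)
  setnames(grid, c(idCols, seqCol))
  # Join: rows of grid without a match in DT get NA in the other columns.
  DT[grid, on = c(idCols, seqCol)]
}

df <- data.frame(fundID = rep(letters[1:4], each = 6),
                 cfType = rep(c("D", "D", "T", "T", "R", "R"), times = 4),
                 variable = rep(c(1, 3), times = 12),
                 value = 1:24)
DT <- as.data.table(df)

res <- fill_missing(DT, idCols = c("fundID", "cfType"), seqCol = "variable")
nrow(res)
```

With the example data there are 4 funds, 3 cash-flow types and 3 values of variable, so the result should have 36 rows, 12 of which carry NA in value.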

Resources