I have a bunch of dynamically created regressions stored in a list called regressions. Now I'd like to rename their coefficients efficiently. What I have so far is this loop, which works:
for (i in 1:length(params[,1])){
names(regressions[[i]]$coefficients)[pos] <- paste(params[i,1],".lag",params[i,2],sep="")
}
I've been trying for quite a while to do this more generally with a function, because this is not the only list of regressions I have. However, I could not get anything else to work. Here are a few other attempts, mostly based on lapply:
correctNames <- function(reglist,namevec,pos){
names(reglist[[i]]$coefficients)[pos] <- as.character(namevec)
}
lapply(regressions,correctNames(reglist,namevec,pos),
reglist=regressions,namevec=params[,1],pos=2)
Another try was to write a function with a for loop which also works internally as print shows but does not assign the names globally (where the regressions list is stored).
correctNames <- function(reglist,pos,namevec){
for (i in 1:length(params[,1])){
names(reglist[[i]]$coefficients)[pos] <- paste(namevec,".lag",namevec,sep="")
}
#this test shows that it works inside the function...
print(reglist[[10]])
}
Ah, gimme a break.
There's no "i" inside that first version of "correctNames" function; and you probably don't realize that you are not assigning it to "regressions", only to a copy of the regression object. Try instead:
correctNames <- function(reglist,namevec,pos){
names(reglist$coefficients)[pos] <- as.character(namevec)
return(reglist) }
newregs <- mapply(correctNames,
reglist=regressions,
namevec=as.character(params[,1]),
MoreArgs= list( pos=2))
After seeing the note from Ramnath and noticing that the code did work but was giving flaky names for the "params" I looked at params and saw that it was a factor, and so changed the argument in the mapply call to as.character(params[,1]).
> newregs[1,1]
[[1]]
(Intercept) log(M1)
-5.753758 2.178137
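To keep the result as a plain list rather than the matrix that mapply simplifies to by default, a minimal sketch along the same lines might look like the following. The two lm fits on mtcars, the small params table, and the name rename_coef are hypothetical stand-ins, not the original data or function.

```r
# Hypothetical stand-ins for the regressions list and the params table
regs <- list(lm(mpg ~ wt, data = mtcars), lm(mpg ~ hp, data = mtcars))
params <- data.frame(x = c("wt", "hp"), l = c(0, 1), stringsAsFactors = FALSE)

# Rename one coefficient and return the modified copy of the fit
rename_coef <- function(fit, new_name, pos) {
  names(fit$coefficients)[pos] <- new_name
  fit
}

new_names <- paste0(params$x, ".lag", params$l)

# SIMPLIFY = FALSE keeps the result as a list; assigning it back to regs
# is what makes the change visible outside the function
regs <- mapply(rename_coef, fit = regs, new_name = new_names,
               MoreArgs = list(pos = 2), SIMPLIFY = FALSE)

names(coef(regs[[2]]))
```

The key point is the reassignment: R functions work on copies, so the renamed fits only exist in whatever the call returns.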
If this is a follow up to your earlier question, then here is what I would do
coefs = plyr::ldply(regressions, coef)
coefs = transform(coefs, reg_name = paste(x, '.lag', l, sep = ""))[,-c(1, 2)]
names(coefs) = c('intercept', 'reg_coef', 'reg_name')
This gives you
intercept reg_coef reg_name
1 -5.753758 2.178137 log(M1).lag0
2 7.356434 7.532603 rs.lag0
3 7.198149 8.993312 rl.lag0
4 -5.840754 2.193382 log(M1).lag1
5 7.366914 7.419599 rs.lag1
6 7.211223 8.879969 rl.lag1
7 -5.988306 2.220994 log(M1).lag4
8 7.395494 7.127231 rs.lag4
9 7.246161 8.582998 rl.lag4
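If plyr is not available, a similar table can probably be built in base R. This is only a sketch under assumptions: the fits and labels below are hypothetical placeholders for the regressions list and the pasted names.

```r
# Hypothetical fits and labels standing in for the real regressions
fits <- list(lm(mpg ~ wt, data = mtcars), lm(mpg ~ hp, data = mtcars))
labels <- c("wt.lag0", "hp.lag0")

# Stack the coefficient vectors into a matrix, one row per regression
coefs <- do.call(rbind, lapply(fits, function(f) unname(coef(f))))

coef_table <- data.frame(
  intercept = coefs[, 1],
  reg_coef  = coefs[, 2],
  reg_name  = labels,
  stringsAsFactors = FALSE
)
coef_table
```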
I want to run some calculations in a loop in R.
I tried assign, but it still does not work well.
Can anyone give me a hint about how to set up the variables correctly in R, please?
# My data
data <- read.table(textConnection("
a1 a2
a1 1.00000000 0.4803088
a1 0.48030878 1.0000000
"), header = TRUE)
no <- 2
for (k in 1:no){
paste0("dat.",k) <- aggregate(data[,c(paste0("a",k),paste0("b",k), paste0("b",k))],list(data$id),mean)
paste0("cor.",k) <- cor(paste0("dat.mean.",k),use = "complete.obs")
paste0("cal.",k) <- as.data.frame(paste0("dat.mean.",k))
paste0("lm.",k) <- lm(paste0("a",k) ~ paste0("b",k),data = paste0("lm.cal.",k))
}
I'm not sure which language you are coming from (SAS maybe?) but R is a proper functional programming language and doesn't use things like macros to automate tasks. Here's a more R-like way to approach the problem
no <- 2
results <- lapply(1:no, function(k) {
# use aggregate function to make correlation calculation.
this_dat_mean <- aggregate(data[,c(paste0("y", c("f","p","c"), "_",k))], list(data$id), mean)
this_cor <- cor(this_dat_mean, use = "complete.obs")
#write.table(this_cor, "file_path", row.names=T, col.names=T, quote=F)
# calculate the lm
this_lm_cal <- as.data.frame(this_dat_mean)
this_lm <- lm(reformulate(paste0("yc_",k), paste0("yf_",k)), data = this_lm_cal)
#write.table(this_lm, "file_path2", row.names=T, col.names=T, quote=F)
list(lm=this_lm, cor=this_cor)
})
Notice that we use a function to iterate over the inputs of interest. This function has a bunch of local variables. We can return a list of values that we want to preserve from the function. We can get at them by looking at
results[[1]]$lm
results[[2]]$cor
for example. It's better to create a (possibly named) list of values in R than to create a bunch of similarly named variables.
The lm model isn't a data.frame so you can't use write.table with that. Not sure what the goal was there.
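As a self-contained illustration of the named-list idea, here is a small sketch. The data frame dat, its columns a1, b1, a2, b2, and the "lm." naming scheme are made up for the example and are not the asker's real data.

```r
set.seed(1)
# Hypothetical data: five ids, four observations each
dat <- data.frame(
  id = rep(1:5, each = 4),
  a1 = rnorm(20), b1 = rnorm(20),
  a2 = rnorm(20), b2 = rnorm(20)
)

ks <- 1:2
fits <- lapply(ks, function(k) {
  cols <- paste0(c("a", "b"), k)
  # per-id means of the two columns for this k
  means <- aggregate(dat[, cols], list(id = dat$id), mean)
  # build the formula a<k> ~ b<k> from strings
  lm(reformulate(paste0("b", k), response = paste0("a", k)), data = means)
})

# name the list elements instead of creating lm.1, lm.2, ... as variables
names(fits) <- paste0("lm.", ks)
coef(fits[["lm.2"]])
```

Looking results up by name in one list tends to be easier to loop over later than a set of separately named variables.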
For your use case, I second MrFlick's point and suggest rewriting your code.
However, I sometimes prefer dynamically generated variables myself, and R does allow them, if inconsistently (some things work, some do not), so I would like to briefly explain how:
> k=4
> paste0("lm.", k)
[1] "lm.4"
> paste0("lm.", k) <- 1515
Error in paste0("lm.", k) <- 1515 :
target of assignment expands to non-language object
> assign ( paste0("lm.", k) , 1515 )
> paste0("lm.", k)
[1] "lm.4"
> eval(parse(text = paste0("lm.", k) ))
[1] 1515
> str(eval(parse(text = paste0("lm.", k) )))
num 1515
> str(paste0("lm.", k) )
chr "lm.4"
In summary: every time you use a glued-together variable name, you have to refer to it through eval(parse(text = ...)), or through get(). And remember that <- will not work as the assignment operator here; use assign() instead.
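The same round trip can be written with assign(), get() and mget(), which avoids parsing strings as code. The variable names val.1 to val.3 are invented for this sketch.

```r
# create val.1, val.2, val.3 from pasted names
for (k in 1:3) {
  assign(paste0("val.", k), k^2)
}

# fetch a single variable by its pasted name
second <- get(paste0("val.", 2))

# fetch several at once as a named list
vals <- mget(paste0("val.", 1:3))
names(vals)
```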
Here is my R Script that works just fine:
perc.rank <- function(x) trunc(rank(x)) / length(x) * 100.0
library(dplyr)
setwd("~/R/xyz")
datFm <- read.csv("yellow_point_02.csv")
datFm <- filter(datFm, HRA_ClassHRA_Final != -9999)
quant_cols <- c("CL_GammaRay_Despiked_Spline_MLR", "CT_Density_Despiked_Spline_FinalMerged",
"HRA_PC_1HRA_Final", "HRA_PC_2HRA_Final","HRA_PC_3HRA_Final",
"SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT", "Ultrasonic_DT_Despiked_Spline_MLR")
# add an extra column to datFm to store the quantile value
for (column_name in quant_cols) {
datFm[paste(column_name, "quantile", sep = "_")] <- NA
}
# initialize an empty dataframe with the new column names appended
newDatFm <- datFm[0,]
# get the unique values for the hra classes
hraClassNumV <- sort(unique(datFm$HRA_ClassHRA_Final))
# loop through the vector and create currDatFm and append it to newDatFm
for (i in hraClassNumV) {
currDatFm <- filter(datFm, HRA_ClassHRA_Final == i)
for (column_name in quant_cols) {
currDatFm <- within(currDatFm,
{
CL_GammaRay_Despiked_Spline_MLR_quantile <- perc.rank(currDatFm$CL_GammaRay_Despiked_Spline_MLR)
CT_Density_Despiked_Spline_FinalMerged_quantile <- perc.rank(currDatFm$CT_Density_Despiked_Spline_FinalMerged)
HRA_PC_1HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_1HRA_Final)
HRA_PC_2HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_2HRA_Final)
HRA_PC_3HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_3HRA_Final)
SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT_quantile <- perc.rank(currDatFm$SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT)
Ultrasonic_DT_Despiked_Spline_MLR_quantile <- perc.rank(currDatFm$Ultrasonic_DT_Despiked_Spline_MLR)
}
)
}
newDatFm <- rbind(newDatFm, currDatFm)
}
newDatFm <- newDatFm[order(newDatFm$Core_Depth),]
# head(newDatFm, 10)
write.csv(newDatFm, file = "Ricardo_quantiles.csv")
I have a few questions though. Every R book or video that I have read or watched recommends using the 'apply' family of functions over the classic 'for' loop, stating that apply is much faster.
So the first question is: how would you write it using apply (or tapply or some other apply)?
Second, is this really true though that apply is much faster than for? The csv file 'yellow_point_02.csv' has approx. 2500 rows. This script runs almost instantly on my Macbook Pro which has 16 Gig of memory.
Third, see the 'quant_cols' vector? I created it so that I could write a generic loop (for column_name in quant_cols), but I could not make it work. So I hard-coded the column names suffixed with '_quantile' and called 'perc.rank' many times. Is there a way this could be made dynamic? I tried the 'paste' approach that I have in my script, but that did not work.
On the positive side though, R seems awesome in its ability to cut through the 'Data Wrangling' tasks with very few statements.
Thanks for your time.
I am trying to print the "result" of using table function, but when I tried to use the code here, I got something very strange:
for (i in 1:4){
print (table(paste("group",i,"$", "BMI_obese",sep=""), paste("group",i,"$","A1.1", sep="")))
}
This is the result in R output:
group1$A1.1
group1$BMI_obese 1
group2$A1.1
group2$BMI_obese 1
group3$A1.1
group3$BMI_obese 1
group4$A1.1
group4$BMI_obese 1
But when I type out the statement without typing inside the loop:
table(group2$BMI_obese, group2$A1.1)
I got what I want:
1 2 3 4 5
0 51 20 9 8 0
1 37 20 15 6 4
Does anyone know which part of my for loop code is not correct or can be modified to fit my purpose of printing the loop table result?
Hi all, now I have another problem. I am trying to add an inner loop that takes the column name as an argument, because I would like to loop through multiple columns for each group's data (i.e. for group1, I would like tables of BMI_obese vs A1.1, BMI_obese vs A1.2, ..., BMI_obese vs A1.15). This is my code, but it is not working. I think A1.1, A1.2, ... are not being recognized as columns of group1, group2, group3, group4, and are being treated as strings instead. I am not sure how to fix it:
for (i in 2:4) {
for (j in c("A1.1","A1.2"))
{
print(with(get(paste0("group", i)),table(BMI_obese,j)))
}
}
I keep getting this error message:
Error in table(BMI_obese, j) : all arguments must have the same length
Okay, you are trying to construct a variable name using paste and then do a table. You are simply passing the name of the variable to table, not the variable object itself. For this sort of approach you want to use get()
for (i in 1:4) {
print(with(get(paste0("group", i)), table(BMI_obese, A1.1)))
}
#example saving as a list (using lapply rather than for loop)
group1 <- data.frame(x=LETTERS[1:10], y=(1:10)[sample(10, replace=TRUE)])
group2 <- data.frame(x=LETTERS[1:10], y=(1:10)[sample(10, replace=TRUE)])
result <- lapply(1:2, function(i) with(get(paste0("group", i)), table(x, y)))
#look at first six rows of each:
head(result[[1]])
head(result[[2]])
#example illustrating fetching objects from a string name
data(mtcars)
head(with(get("mtcars"), table(disp, cyl)))
head(with(get("mtcars"), table(disp, "cyl")))
#Error in table(disp, "cyl") : all arguments must have the same length
head(with(get("mtcars"), table(disp, get("cyl"))))
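For the follow-up with column names held in a string, one option that avoids get() on the column is to index with [[ ]]. This is a sketch on made-up data: the two small group data frames and the tables list are hypothetical.

```r
# Hypothetical miniature versions of the group data frames
group1 <- data.frame(BMI_obese = c(0, 1, 0, 1),
                     A1.1 = c(1, 2, 1, 2), A1.2 = c(3, 3, 4, 4))
group2 <- data.frame(BMI_obese = c(1, 1, 0, 0),
                     A1.1 = c(1, 1, 2, 2), A1.2 = c(3, 4, 3, 4))
groups <- list(group1 = group1, group2 = group2)

tables <- list()
for (g in names(groups)) {
  for (j in c("A1.1", "A1.2")) {
    # groups[[g]][[j]] looks the column up by its string name
    tables[[paste(g, j, sep = ".")]] <-
      table(groups[[g]]$BMI_obese, groups[[g]][[j]],
            dnn = c("BMI_obese", j))
  }
}

tables[["group1.A1.1"]]
```

Keeping the data frames in a list also removes the need to paste together object names like group1, group2 and so on.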
You could also use a combination of eval and parse like this:
x1 <- c(sample(10, 100, replace = TRUE))
y1 <- c(sample(10, 100, replace = TRUE))
table(eval(parse(text = paste0("x", 1))),
eval(parse(text = paste0("y", 1))))
But I'd also say it is not the nicest practice to access variables that way...
You are passing the wrong types. See the difference:
table(group2$BMI_obese, group2$A1.1)
and
table(paste(...),paste(...))
So what type does paste return? Certainly some string.
EDIT:
paste(...) was not meant to be syntactically correct but an abbreviation for paste("group",i,"$", "BMI_obese",sep=""), or whatever you paste together.
paste(...) returns a string. If you pass that result to table, you get a table of strings (the unexpected result that you got). What you want is to access the variables or fields whose name is returned by your paste(...). Wrap your paste in eval(parse(text = ...)), like Daniel said, and do it like this:
for (i in 1:4){
print (table(eval(parse(text = paste("group",i,"$", "BMI_obese",sep=""))), eval(parse(text = paste("group",i,"$","A1.1", sep="")))))
}
When I have data.frame objects, I can simply do View(df), and then I get to see the data.frame in a nice table (even if I can't see all of the rows, I still have an idea of what variables my data contains).
But when I have a list object, the same command does not work. And when the list is large, I have no idea what the list looks like.
I've tried head(mylist) but my console simply cannot display all of the information at once. What's an efficient way to look at a large list in R?
Here are a few ways to look at a list:
Look at one element of a list:
myList[[1]]
Look at the head of one element of a list:
head(myList[[1]])
See the elements that are in a list neatly:
summary(myList)
See the structure of a list (more in depth):
str(myList)
Alternatively, as suggested above you could make a custom print method as such:
printList <- function(list) {
for (item in 1:length(list)) {
print(head(list[[item]]))
}
}
The above will print out the head of each item in the list.
I use str to see the structure of any object, especially complex lists.
RStudio shows you the structure when you click the blue arrow next to the object in the data window.
You can also use a package called listviewer
library(listviewer)
jsonedit( myList )
If you have a really large list, you can look at part of it using
str(myList, max.level=1)
(If you don't feel like typing out the second argument, it can be written as max=1 since there are no other arguments that start with max.)
I do this often enough that I have an alias in my .Rprofile for it:
str1 <- function(x, ...) str(x, max.level=1, ...)
And a couple others that limit the printed output (see example(str) for an example of using list.len):
strl <- function(x, len=10L, ...) str(x, list.len=len, ...) # lowercase L in the func name
str1l <- function(x, len=10L, ...) str(x, max.level=1, list.len=len, ...)
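To see what max.level changes, here is a small sketch on a made-up nested list (the name nested and its contents are hypothetical):

```r
nested <- list(a = 1:3, b = list(c = "x", d = list(e = TRUE)))

# full structure, every level expanded
full <- capture.output(str(nested))

# top level only: sub-lists are summarised rather than expanded
top <- capture.output(str(nested, max.level = 1))

cat(top, sep = "\n")
```

capture.output() is used here only so the two outputs can be compared by length; in interactive use you would call str() directly.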
You can check the "head" of each data frame in the list using the lapply family:
lapply(yourList, head)
which will return the "heads" of the elements of your list.
For example:
df1 <- data.frame(x = runif(3), y = runif(3))
df2 <- data.frame(x = runif(3), y = runif(3))
dfs <- list(df1, df2)
lapply(dfs, head)
Returns:
> lapply(dfs, head)
[[1]]
x y
1 0.3149013 0.8418625
2 0.8807581 0.5048528
3 0.2490966 0.2373453
[[2]]
x y
1 0.4132597 0.5762428
2 0.0303704 0.3399696
3 0.9425158 0.5465939
Instead of "head" you can use any function that works on data frames, e.g. names, nrow...
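As a quick sketch of swapping in other functions, assuming a small hypothetical list of data frames called dfs:

```r
dfs <- list(
  first  = data.frame(x = 1:3, y = 4:6),
  second = data.frame(x = 1:5, z = letters[1:5])
)

# one number per element, simplified to a named vector
row_counts <- sapply(dfs, nrow)

# column names of each element, kept as a list
col_names <- lapply(dfs, names)

row_counts
```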
Seeing as you explicitly specify that you want to use View() with a list, this is probably what you are looking for:
View(myList[[x]])
Where x is the number of the list element that you wish to view.
For example:
View(myList[[1]])
will show you the first element of the list in the standard View() format that you will be used to in RStudio.
If you know the name of the list item you wish to view, you can do this:
View(myList[["itemOne"]])
There are several other ways, but these will probably serve you best.
This is a simple edit of giraffehere's excellent answer.
For some lists it is convenient to print the head of only a subset of the nested objects, and to print the name of each slot above the output of head().
Arguments:
#' @param list a list object
#' @param n an integer - the number of objects within the list that you wish to print
#' @param hn an integer - the number of rows you wish head to print
USAGE: printList(mylist, n = 5, hn = 3)
printList <- function(list, n = length(list), hn = 6) {
for (item in 1:n) {
cat("\n", names(list[item]), ":\n")
print(head(list[[item]], hn))
}
}
For numeric lists, output may be more readable if the number of digits is limited to 3, eg:
printList <- function(list, n = length(list), hn = 6) {
for (item in 1:n) {
cat("\n", names(list[item]), ":\n")
print(head(list[[item]], hn), digits = 3)
}
}
I had a similar problem and managed to solve it using as_tibble() on my list (dplyr or tibble packages), then just use View() as usual.
In recent versions of RStudio, you can just use View() (or alternatively click on the little blue arrow beside the object in the Global Environment pane).
For example, if we create a list with:
test_list <- list(
iris,
mtcars
)
Then either of the above methods will open the list in RStudio's object viewer, where each element can be expanded.
I like using as.matrix() on the list and then can use the standard View() command.
It seems possible to assign a vector of functions in R like this:
F <- c(function(){return(0)},function(){return(1)})
so that they can be invoked like this (for example): F[[1]]().
This gave me the impression I could do this:
DF <- data.frame(F=c(function(){return(0)}))
which results in the following error
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "function" to a data.frame
Does this mean it is not possible to put functions into a data frame? Or am I doing something wrong?
No, you cannot directly put a function into a data-frame.
You can, however, define the functions beforehand and put their names in the data frame.
foo <- function(bar) { return( 2 + bar ) }
foo2 <- function(bar) { return( 2 * bar ) }
df <- data.frame(c('foo', 'foo2'), stringsAsFactors = FALSE)
Then use do.call() to use the functions:
do.call(df[1, 1], list(4))
# 6
do.call(df[2, 1], list(4))
# 8
EDIT
The above workaround will work as long as you have a named function.
The issue seems to be that R sees the class of the object as function, looks up the appropriate method for as.data.frame() (i.e. as.data.frame.function()) but can't find it. That causes a call to as.data.frame.default(), which is pretty much a wrapper for a stop() call with the message you reported.
In short, they just seem not to have implemented it for that class.
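The name-lookup workaround can be applied to a whole column at once. In this sketch the functions add_two and double_it and the ops data frame are hypothetical examples.

```r
add_two   <- function(x) 2 + x
double_it <- function(x) 2 * x

# one row per call: the function's name and the argument to pass
ops <- data.frame(fun = c("add_two", "double_it"),
                  arg = c(4, 5),
                  stringsAsFactors = FALSE)

# do.call() accepts a function name as a string and looks the function up
ops$result <- mapply(function(f, a) do.call(f, list(a)),
                     ops$fun, ops$arg, USE.NAMES = FALSE)
ops
```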
While you can't put a function or other object directly into a data.frame, you can make it work if you go via a matrix.
foo <- function() {print("qux")}
m <- matrix(c("bar", foo), nrow=1, ncol=2)
df <- data.frame(m)
df$X2[[1]]()
Yields:
[1] "qux"
And the contents of df look like:
X1 X2
1 bar function () , {, print("qux"), }
Quite why this works while the direct path does not, I don't know. I suspect that doing this in any production code would be a "bad thing".
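Another route that appears to work, without going through a matrix, is a list column added after the data frame exists. This is a sketch and not something the answers above tested; the data frame fdf and its column names are made up.

```r
fdf <- data.frame(label = c("zero", "one"), stringsAsFactors = FALSE)

# assigning a list with one element per row creates a list column
fdf$f <- list(function() 0, function() 1)

# each cell holds a function and can be called directly
fdf$f[[2]]()
```

Printing such a data frame is not pretty, so the same caution about production code probably applies here.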