Why doesn't my function save my assignment? - r

f1 <- function(x) {
zx1 <- sample(1:nrow(zone4[[x]]), nrow(zone4[[x]]), replace=F)
zone4[[x]]$randnums <- zx1
}
f1(1)
## DOESN'T UPDATE zone4[[1]]
zx2 <- sample(1:nrow(zone4[[1]]), nrow(zone4[[1]]), replace=F)
zone4[[1]]$randnums <- zx2
## DOES UPDATE zone4[[1]]
If I define a function f1() as shown above, the object zone4[[x]] is not updated. However, if I run the same commands with 'x' stated explicitly, as in the second snippet above, then zone4[[x]] is updated. Why could this be? I want to know because I want to run iterations of the code. If I write names(zone4[[x]]) inside the definition of f1(), the output tells me that the function did what it was supposed to, but when queried again afterwards, zone4[[x]] appears to be unchanged. Thank you for your help.
The idea is to make random numbers for each subset of a data set for a given year and another variable, zone. The data set was originally a single data frame, but I used the split() function to separate the data according to year and zone, of which there are 4. Maybe there is a better way to assign random numbers to specific subsets of data without using the split() function?

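R functions don't usually have side effects (i.e., they don't change objects in the global environment). Inside f1, the assignment modifies a local copy of zone4, and that copy is discarded when the function returns.
This is a good thing most of the time, as we don't want unintended consequences.
The idiomatic approach is to return the result and assign it to an object (which can have the same name, to overwrite the original).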
f1 <- function(x) {
  zx1 <- sample(1:nrow(zone4[[x]]), nrow(zone4[[x]]), replace=F)
  zone4[[x]]$randnums <- zx1
  # usually a good idea to return the complete object
  # especially when a replacement function (in your case `[[<-`)
  # is the last one called
  return(zone4)
}
zone4 <- f1(1)
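A minimal, self-contained sketch of the same idea, using a made-up two-element list in place of the real zone4 (the toy data and the name add_randnums are assumptions for illustration). The function receives the list as an argument, modifies its own local copy, and the caller decides whether to keep the result:

```r
# Toy stand-in for the real zone4: a list of two small data frames
zone4 <- list(data.frame(id = 1:5), data.frame(id = 1:3))

# Hypothetical helper: takes the list as an argument and returns the modified copy
add_randnums <- function(lst, x) {
  lst[[x]]$randnums <- sample(nrow(lst[[x]]))  # random permutation of 1..nrow
  lst
}

add_randnums(zone4, 1)           # result is discarded, zone4 is unchanged
names(zone4[[1]])                # "id"

zone4 <- add_randnums(zone4, 1)  # assigning the result keeps the change
names(zone4[[1]])                # "id" "randnums"

# To process every element, iterate over the indices and reassign each time
for (i in seq_along(zone4)) zone4 <- add_randnums(zone4, i)
```

Passing the list in as an argument, rather than reading zone4 from the global environment, also makes the function reusable on other lists.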
An alternative would be to use data.table
library(data.table)
zone4 <- lapply(zone4, as.data.table)
f1 <- function(x) {
zone4[[x]][,randnums := sample(.N)]
invisible(NULL)
}
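On the closing question about avoiding split(): one base R option is ave(), which applies a function within groups and returns a vector aligned with the original rows. A sketch under assumed column names (year and zone are taken from the question's description; the toy data frame dat is made up):

```r
set.seed(42)
# Made-up data: 2 years x 2 zones, 3 rows per combination
dat <- data.frame(year = rep(c(2001, 2002), each = 6),
                  zone = rep(rep(c("a", "b"), each = 3), times = 2))

# Within each year/zone group, draw a random permutation of 1..group size
dat$randnums <- ave(seq_len(nrow(dat)), dat$year, dat$zone,
                    FUN = function(i) sample(length(i)))
```

The data stays in a single data frame, so no list bookkeeping is needed afterwards.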

Related

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (<- doesn't work inside lapply). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its output to the console. The <<- assigns the vector runif(...) to the newly created column of the aList object outside the function.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
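As for why get() fails: get() looks up an object by its name, and the string "a[[1]]$DIM" is an R expression rather than the name of any object, so nothing is found. A sketch of a more direct route, using a made-up list in place of the asker's data: let lapply() hand each data frame to the function and pull the column out there.

```r
# Made-up stand-in for the asker's list of data frames with a DIM column
a <- list(data.frame(DIM = c(1, 2, 3)), data.frame(DIM = c(10, 20)))

# Extract the DIM column from each data frame
dims <- lapply(a, function(d) d$DIM)

# Or apply a function to each DIM column directly (mean is just an example)
results <- lapply(a, function(d) mean(d$DIM))
```

This avoids building strings of code altogether, which is usually easier to debug.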

R code to Provide progress on code execution

So, I have a task that needs to be performed on a list of objects (30 total objects). I used to have this coded as individual objects, and would run it one at a time. But after some StackExchange feedback I have moved on to lists. But, when I execute the code so it starts going through the list and completing the desired task, I have no idea what the progress is. I only see the red stopsign in the RStudio GUI, so I have no idea if the computer is hung up, or what object in the list it is currently working on.
Has anyone tried to create some sort of feedback code chunk, where you get some sort of feedback when an object in the list is completed?
Editing for more details
I have this list
sizes <- list(
n1.6 = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
n7.8 = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1),
n9.10 = c(2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1)
)
Which actually has 30 objects but the 3 above are an example
and I have this function
strata_eq <- function(n_vec){
sample_means <-matrix(rep(0,80000), nrow=16, ncol=5000)
for(i in 1:5000){
sample <- strata(data[,c(1,23)], "stream", n_vec, method="srswor")
sample <- data[sample$ID_unit,]
stream_means <- with(sample, tapply(avg.RPD, stream, mean))
sample_means[,i] <- c(unlist(stream_means))
}
return(sample_means)
}
I then would like to pass this function to the list, so that the function is applied to every object in the list.
rates <- lapply(sizes, function(x) strata_eq(x)
)
However, this takes really long, and I was hoping there was some sort of code that would help provide some progress on the execution. Maybe just something that tells me what object is completed in the list, or which one it is working on...?
You could include a print call in your code that reports the time elapsed. That should be enough to estimate the remaining time, provided the function takes about the same time on every object.
At the beginning of your code:
ptm <- proc.time()
In your function:
strata_eq <- function(n_vec){
sample_means <-matrix(rep(0,80000), nrow=16, ncol=5000)
for(i in 1:5000){
sample <- strata(data[,c(1,23)], "stream", n_vec, method="srswor")
sample <- data[sample$ID_unit,]
stream_means <- with(sample, tapply(avg.RPD, stream, mean))
sample_means[,i] <- c(unlist(stream_means))
}
print(proc.time() - ptm)
return(sample_means)
}
Then apply the function to every object in the list as before:
rates <- lapply(sizes, strata_eq)
May I also suggest a few tricks to tighten your function:
Create a table equal to data[,c(1,23)] once, before the function, rather than on every iteration.
Replace the for loop with a *ply* function and convert the result into a matrix afterwards. This is easily achievable using rbindlist, for example.
Which leads to: use the data.table package.
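Another option, sketched here with a placeholder in place of strata_eq (slow_task and the shortened sizes list are made up so the example runs on its own), is to iterate over the indices and report each element as it starts, or to use the text progress bar from base R's utils package:

```r
# Placeholder for a slow function such as strata_eq
slow_task <- function(n_vec) { Sys.sleep(0.1); sum(n_vec) }

# Shortened, made-up version of the sizes list
sizes <- list(n1.6 = rep(1, 16), n7.8 = rep(1, 16), n9.10 = rep(1, 16))

# Option 1: print a message before each element is processed
rates <- lapply(seq_along(sizes), function(i) {
  message("Working on element ", i, " of ", length(sizes),
          " (", names(sizes)[i], ")")
  slow_task(sizes[[i]])
})

# Option 2: a text progress bar that advances after each element
pb <- txtProgressBar(min = 0, max = length(sizes), style = 3)
rates <- lapply(seq_along(sizes), function(i) {
  out <- slow_task(sizes[[i]])
  setTxtProgressBar(pb, i)
  out
})
close(pb)
```

Iterating over seq_along(sizes) instead of sizes itself is what makes the index available for reporting.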

Running the same function multiple times and saving results with different names in workspace

So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it (DATA.NEW) to analyse it. I want to do this multiple times (let's say 10 times). In the end, I want each object (mantel.something) that results from this function to be saved in my workspace with a name that I can relate to each cycle (mantel.something1, mantel.something2...mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.something object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can achieve your objective of repeating this exercise 10 (or more times) without using an explicit for-loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process, then you can use replicate to repeat that process however many times you desire.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold the results of each iteration of this process in a single object. replicate returns an array when it can simplify the results and a list otherwise; passing simplify = FALSE guarantees a list (here, a list of lists). The list then has 10 elements (one for each iteration) and each element is itself a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
require(ade4)
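# simplify = FALSE keeps the result as a list with one element per iteration
results <- replicate(10, sort.song(DATA), simplify = FALSE)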
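To see the shape of such a result without ade4 or the real data, here is a sketch with a made-up stand-in function (one_run is hypothetical and returns plain numbers instead of mantel.rtest results):

```r
set.seed(1)
# Hypothetical stand-in for one iteration of sort.song()
one_run <- function() {
  x <- rnorm(20)
  list(avg = mean(x), spread = sd(x))
}

# simplify = FALSE keeps the result as a plain list with one element per run
runs <- replicate(10, one_run(), simplify = FALSE)

length(runs)        # 10
names(runs[[2]])    # "avg" "spread"
runs[[2]]$avg       # result of the second run

# Collect one statistic across all runs
avgs <- vapply(runs, function(r) r$avg, numeric(1))
```

Each run is addressed by position and each statistic by name, which replaces the numbered global objects the question asked for.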

Applying a set of operations across several data frames in r

I've been learning R for my project and have been unable to google a solution to my current problem.
I have ~ 100 csv files and need to perform an exact set of operations across them. I've read them in as separate objects (which I assume is probably improper r style) but I've been unable to write a function that can loop through them. Each csv is a dataframe that contains information, including a column with dates in decimal year form. I need to create 2 new columns containing year and day of year. I've figured out how to do it manually, but I would like to find a way to automate the process. Here's what I've been doing:
#setup
library(lubridate) #Used to check for leap years
df.00 <- data.frame( site = seq(1:10), date = runif(10,1980,2000 ))
#what I need done
df.00$doy <- NA # make an empty column which I'm going to place the day of the year
df.00$year <- floor(df.00$date) # grabs the year from the date column
df.00$dday <- df.00$date - df.00$year # get the year fraction. intermediate step.
# multiply the fraction year by 365 or 366 if it's a leap year to give me the day of the year
df.00$doy[which(leap_year(df.00$year))] <- round(df.00$dday[which(leap_year(df.00$year))] * 366)
df.00$doy[which(!leap_year(df.00$year))] <- round(df.00$dday[which(!leap_year(df.00$year))] * 365)
The above, while inelegant, does what I would like it to. However, I need to do this to the other data frames, df.01 - df.99. So far I've been unable to place it in a function or for loop. If I place it into a function:
funtest <- function(x) {
x$doy <- NA
}
funtest(df.00) does nothing. Which is what I would expect from my understanding of how functions work in r but if I wrap it up in a for loop:
for(i in c(df.00)) {
i$doy <- NA }
I get "In i$doy <- NA : Coercing LHS to a list" several times, which tells me that the loop isn't treating the dataframe as a single unit but is perhaps looking at each column in the frame.
I would really appreciate some insight on what I should be doing. I feel that I could have solved this easily using bash and awk but I would like to be less incompetent using r
The most efficient and direct way is to use a list.
Put all of your CSVs into one folder.
Grab a list of the files in that folder,
e.g.: files <- dir('path/to/folder', full.names=TRUE)
Iteratively read all those files into a list of data.frames,
e.g.: df.list <- lapply(files, read.csv, <additional args>)
Apply your function iteratively over each data.frame,
e.g.: lapply(df.list, myFunc, <additional args>)
Since your df's are already loaded, and they have nice convenient names, you can grab them easily using the following:
nms <- c(paste0("df.0", 0:9), paste0("df.", 10:99))
df.list <- lapply(nms, get)
Then take everything you have in the #what I need done portion and put inside a function, eg:
myFunc <- function(DF) {
# what you want done to a single DF
return(DF)
}
And then lapply accordingly
df.list <- lapply(df.list, myFunc)
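As a sketch of what myFunc could look like for the decimal-year task in the question: the leap-year test below is written in base R as a stand-in for lubridate's leap_year() so the example runs without extra packages, and the function name add_doy and the toy data frames are made up.

```r
# Hypothetical version of myFunc for the decimal-date task
add_doy <- function(DF) {
  DF$year <- floor(DF$date)                 # calendar year
  frac    <- DF$date - DF$year              # fraction of the year elapsed
  # Base R leap-year rule, standing in for lubridate::leap_year()
  leap    <- (DF$year %% 4 == 0 & DF$year %% 100 != 0) | DF$year %% 400 == 0
  DF$doy  <- round(frac * ifelse(leap, 366, 365))
  DF                                        # return the modified data frame
}

# Two toy data frames in place of df.00 ... df.99
df.list <- list(data.frame(site = 1:3, date = c(1980.5, 1981.25, 2000.0)),
                data.frame(site = 1:2, date = c(1999.75, 1996.1)))
df.list <- lapply(df.list, add_doy)
```

Using ifelse() on the leap-year vector handles both cases in one vectorised step, so the two which() assignments from the question are not needed.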
On a separate note, regarding functions:
The reason your funtest "does nothing" is that it does not return the modified data frame. It does do something: it adds the column to a local copy of x, but that copy is discarded when the function finishes.
You need to include a return(.) statement in the function. Alternatively, the value of the last expression in the function is used as the return value, but in funtest that last expression is an assignment, whose value is the assigned NA (returned invisibly) rather than the data frame. The cleanest option (in my opinion) is to use return(.)
Regarding the for loop over the data.frame:
As you observed, using for (i in someDataFrame) {...} iterates over the columns of the data.frame.
You can iterate over the rows using apply:
apply(myDF, MARGIN=1, function(x) { x["doy"] <- ...; return(x) } ) # each row arrives as a named vector; don't forget to return

Return multiple data frames from function R

I am trying to put together a function that will loop thru a given data frame in blocks and return a new data frame containing stuff calculated from the original. The length of x will be different each time and the actual problem will have more loops in the function. New-ish to R and have not been able to find anything helpful (I don't think using a list will help)
func<-function(x){
tmp # need to declare this here?
for (i in 1:dim(x)[1]){
tmp[i]<-ave(x[i,]) # add things to it
}
return(tmp)
}
df<-cbind(rnorm(10),rnorm(10))
means<-func(df)
This code does not work but I hope it gets across what I want to do. thanks!
Do you mean you want to loop through each row of df and return a data frame with the calculated values?
You may want to look into the apply function:
df <- cbind(rnorm(10),rnorm(10))
# apply(df,1,FUN) does FUN(df[i,])
# e.g. mean of each row:
apply(df,1,mean)
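If an explicit loop is still preferred, the original func mainly needs its result vector created before the loop. A sketch (the question's ave() call is replaced with mean(), on the assumption that a per-row mean was intended):

```r
func <- function(x) {
  tmp <- numeric(nrow(x))        # preallocate one slot per row
  for (i in seq_len(nrow(x))) {
    tmp[i] <- mean(x[i, ])       # fill in the result for row i
  }
  tmp
}

df <- cbind(rnorm(10), rnorm(10))
means <- func(df)

# Same result without a loop
all.equal(means, rowMeans(df))   # TRUE
```

Preallocating with numeric(nrow(x)) answers the "need to declare this here?" comment: tmp has to exist before tmp[i] can be assigned.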
For more complicated looping like performing some operation on a per-factor basis, I strongly recommend package plyr, and function ddply within. Quick example:
library(plyr)
df <- data.frame( gender=c('M','M','F','F'), height=c(183,176,157,168) )
# find mean height *per gender*
ddply(df,.(gender), function(x) c(height=mean(x$height)))
# returns:
#   gender height
# 1      F  162.5
# 2      M  179.5
