How to get the previous name of a data set in R?

Suppose I assign data=Abortion (the Abortion data set from the ltm package). I have a function where one of the inputs is data.
When using the function, I write:
function.name(data=Abortion)
When writing the summary of the results, I want the name of the data set I used; in this case it is Abortion.
How can I get that name back?
More generally, suppose I have an object named abc. If I assign xyz=abc, how can I get the name abc back?

I suggest rethinking your approach. I assume you are trying to loop through different data sets and collect the results. Try the following example:
#dummy data
dat1 <- runif(10)
dat2 <- runif(10)
dat3 <- runif(10)
#my function
myfunc <- function(data) max(data)
#make a list - here the list is built manually; it can also be built automatically, e.g.:
# lapply(list.files(),read.table)
all_dat <- list(dat1,dat2,dat3)
#add names to list
names(all_dat) <- c("dat1","dat2","dat3")
#loop through dat1,2,3
sapply(all_dat,myfunc)
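If the function itself needs to report which data set it was given, one option is to capture the argument expression with deparse(substitute()). The sketch below is only an illustration: summarise_max is a hypothetical name, and it assumes the caller passes a plain variable name. After a plain assignment such as xyz = abc, the object does not record that it came from abc, so the name has to be captured at the point of the call or stored alongside the data (as the named list above does).

```r
# summarise_max() is a hypothetical helper, not part of the original answer
summarise_max <- function(data) {
  # substitute() captures the unevaluated argument; deparse() turns it into text
  data_name <- deparse(substitute(data))
  list(dataset = data_name, result = max(data))
}

dat1 <- c(3, 1, 4)
res <- summarise_max(dat1)
res$dataset  # the name used in the call
res$result   # the computed value
```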

Related

Looping function with left_join over multiple variables

I am working to loop a function that contains a left_join iteratively over a dataframe based on multiple variables in R. The function works when I run it line-by-line over the dataframe, but breaks down in the loop. I need to automate this process because I have to run it hundreds of times, but I am getting errors using foreach and mapply.
A portion of the original data set and the original function is this:
library(tidyverse)
ID <- c(22226820,22226820,22226814,22226814)
ID_US_1 <- c(22226830,22226818,22226816,22226832)
mydf <- data.frame(cbind(ID=as.character(ID),ID_US_1=as.character(ID_US_1)))
ID_key <- c(22226830,22226818,22226818,22226816,22226816,22226832,22226832,22226806,22226806,22226814,22226814,22226804)
ID_key_US <- c(0,22226806,22226814,22226804,22226802,22226840,22226842,22226798,22226796,22226816,22226832,22227684)
mykey <- data.frame(cbind(ID_key=as.character(ID_key),ID_key_US=as.character(ID_key_US)))
myfx <- function(iteration_prior,iteration){
# iteration_prior <- "1"
# iteration <- "2"
varnameprior <- paste0("ID_US","_",iteration_prior)
varname <- paste0("ID_US","_",iteration)
colnames(mykey) <- c(varnameprior,varname)
mydf <-mydf %>%
left_join(x=.,y=mykey,by=varnameprior)
mydf[,ncol(mydf)][is.na(mydf[,ncol(mydf)])] <- 0
mydf[,ncol(mydf)]<-as.character(mydf[,ncol(mydf)])
return(mydf)
}
prior <- c(1,2,3)
current <- c(2,3,4)
mylist <- data.frame(cbind(prior=prior,current=current))
mydf <- myfx(prior[1],current[1])
mydf <- myfx(prior[2],current[2])
This creates my desired output, which is iterative columns of data. ID_US_2 is calculated based on ID_US_1 using the mykey dataframe and ID_US_3 is calculated using ID_US_2 and mykey.
I need to carry out this operation hundreds of times, which means I need to automate the process. I have tried a foreach loop and get the error 'Join columns must be present in data'. I think this means my new output is not being added to the dataframe correctly. I got the same error with mapply.
library(foreach)
foreach(i=prior,j=current) %do% {myfx(i,j)}
I also considered a nested for loop, but was hung up on the multiple variables (and foreach/mapply seem better suited).
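I think that your only issue is that you haven't reassigned mydf in the foreach command. Each call to myfx returns a new data frame, so without the reassignment the next iteration still sees the original mydf, which has no ID_US_2 column to join on. Editing that, you have: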
foreach(i=prior, j=current) %do% {mydf <- myfx(i,j)}
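The same idea can be written as a plain for loop, where the reassignment is explicit. This is a minimal base R sketch under stated assumptions: add_step, the toy df and key, and the use of merge(all.x = TRUE) in place of dplyr's left_join are all illustrative choices, not the asker's code.

```r
# add_step() is a hypothetical helper; merge(all.x = TRUE) stands in for left_join
add_step <- function(df, key, prior, current) {
  prior_col <- paste0("ID_US_", prior)
  new_col <- paste0("ID_US_", current)
  names(key) <- c(prior_col, new_col)
  out <- merge(df, key, by = prior_col, all.x = TRUE, sort = FALSE)
  out[[new_col]][is.na(out[[new_col]])] <- "0"  # unmatched keys become "0"
  out
}

# Toy data (assumed for illustration only)
df <- data.frame(ID = c("a", "b"), ID_US_1 = c("k1", "k2"),
                 stringsAsFactors = FALSE)
key <- data.frame(from = c("k1", "k2", "k3"), to = c("k3", "k9", "k4"),
                  stringsAsFactors = FALSE)

prior <- c(1, 2)
current <- c(2, 3)
for (k in seq_along(prior)) {
  # Reassigning df lets the next iteration see the column just added
  df <- add_step(df, key, prior[k], current[k])
}
df
```

Note that merge may reorder rows and columns, so the result is best inspected by column name rather than by position.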

How to create columns in a data frame with the name of the object repeated using code that doesn't require manual input

I want to know how to create columns in a data frame with the name of the object repeated using code that doesn't require manual input.
For example, I can do this manually using the following code:
# displays df
mtcars
# adds column manually
# ---- NOTE: REQUIRES MANUAL INPUT
mtcars$dataset_name <- c("mtcars")
# gives unique values for mtcars$dataset_name
unique(mtcars$dataset_name)
Is there any way to do this automatically?
Thanks.
We can create a function that takes an object as input and returns it with an added column containing the object's name:
f1 <- function(dat) {
nm1 <- deparse(substitute(dat))
dat$dataset_name <- nm1
dat
}
f1(mtcars)
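One caveat worth keeping in mind: deparse(substitute(dat)) reports the expression used in the call, so when such a function is applied through lapply it typically sees an internal expression rather than the data set name. If the data sets live in a named list, the names can be passed in explicitly instead. A minimal sketch, assuming a named list (tag_all and the toy data frames are hypothetical):

```r
# tag_all() is a hypothetical helper: it adds each list element's name as a column
tag_all <- function(named_list) {
  Map(function(dat, nm) {
    dat$dataset_name <- nm
    dat
  }, named_list, names(named_list))
}

datasets <- list(first = data.frame(x = 1:2), second = data.frame(x = 3:5))
tagged <- tag_all(datasets)
unique(tagged$second$dataset_name)
```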

Why add column data to data frame not working with a function in r

I am curious why the following code doesn't work for adding column data to a data frame.
a <- c(1:3)
b <- c(4:6)
df <- data.frame(a,b) # create a data frame example
add <- function(df, vector){
df[[3]] <- vector
} # create a function to add column data to a data frame
d <- c(7:9) # a new vector to be added to the data frame
add(df,d) # execute the function
If you run the code in R, the new vector is not added to the data frame, and no error is raised either.
R passes parameters to functions by value, not by reference. That means that inside the function you work on a copy of the data.frame df; when the function returns, the modified copy is discarded and the original data.frame outside the function remains unchanged.
This is why @RichScriven proposed storing the return value of your function in the data.frame df again.
Credit goes to @RichScriven, please.
PS: You should use cbind ("column bind") to extend your data.frame regardless of how many columns already exist, and make.names to ensure unique column names:
add <- function(df, vector){
res <- cbind(df, vector)
names(res) <- make.names(names(res), unique = TRUE)
res # return value
}
PS2: You could use a data.table instead of a data.frame; a data.table can be modified by reference (for example with its := operator) rather than being copied.
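To make the copy semantics concrete, here is a small sketch (add_column is a hypothetical name) that returns the modified data frame and shows that the original only changes when the result is assigned back:

```r
# add_column() is a hypothetical variant of the asker's add() that returns its result
add_column <- function(df, vector) {
  df[[ncol(df) + 1]] <- vector  # modifies the local copy only
  df                            # return the modified copy explicitly
}

df <- data.frame(a = 1:3, b = 4:6)
d <- 7:9

invisible(add_column(df, d))  # result discarded: df is untouched
cols_before <- ncol(df)

df <- add_column(df, d)       # result stored: df now has the new column
cols_after <- ncol(df)
```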

R - Subset a Dataframe with a Programmatically built Formula

I'm working with a large data frame that is pulled from a data lake which I need to subset according to multiple different columns and run an analysis on. The basic subsettings come from an external Excel file which I read in and generate all possible combinations of. I want something to loop through each of these columns and subset my data accordingly.
A few of the subsettings follow a similar form to:
data_settings <- data.frame(country = rep(c('DE','RU','US','CA','BR'),6),
transport=rep(c('road','air','sea')),
category = rep(c('A','B')))
And my data lake extract has a form like:
df <- data.frame(country = rep(unique(data_settings$country),6),
transport = rep(unique(data_settings$transport),10),
category = rep(c('A','B'),15),
values = round(runif(30) * 10))
I need to subset the data according to each of the rows in my data_settings data frame, so I built a loop which constructs the formula according to what is in my data_settings data frame.
for(i in 1:nrow(data_settings)){
sub_string <- paste0(names(data_settings[1]), '==', data_settings[i,1])
for(j in 2:ncol(data_settings)){
col <- names(data_settings)[j]
val <- as.character(data_settings[i,j])
sub_string <- paste0(sub_string, ' & ', col," == ","'",val,"'")
}
df_sub <- subset(df, formula(sub_string))
}
This successfully builds my strings which I try to pass to formula or as.formula, but I receive an error at that point. I've tried a few different formulations without any success. In my actual case, there are thousands of combinations with different columns and values to filter against.
Thanks in advance for your help!
Try this:
merge(data_settings, df)
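merge works here because, with no by argument, it joins on the columns the two data frames share and keeps only the rows that match. To get one subset per row of the settings, the same call can be applied row by row. A minimal sketch with made-up toy data (settings and dat are illustrative stand-ins, not the asker's objects):

```r
# Toy stand-ins for data_settings and the data lake extract (assumed values)
settings <- data.frame(country = c("DE", "US"),
                       transport = c("road", "air"),
                       stringsAsFactors = FALSE)
dat <- data.frame(country = c("DE", "DE", "US", "US"),
                  transport = c("road", "air", "air", "sea"),
                  values = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

# One subset per settings row: merge keeps rows of dat matching that row
subsets <- lapply(seq_len(nrow(settings)), function(i) {
  merge(settings[i, , drop = FALSE], dat)
})
subsets[[1]]
```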
I worked with my previous approach a bit more today without using subset, filter, etc. and put this together, which seems to do what I want well enough by filtering successively on each column of the data_settings frame.
for(i in 1:nrow(data_settings)){
df_sub <- df
for(j in 1:ncol(data_settings)){
col <- names(data_settings)[j]
val <- as.character(data_settings[i,j])
df_col <- grep(col, names(df))
df_sub <- df_sub[df_sub[,df_col] == val,]
}
# Run further analysis here...
}
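A variation on the loop above: grep matches by pattern, so a column name that is contained in another column name could select more than one column. Indexing by the exact name avoids that. The sketch below is only an illustration; filter_by_settings and the toy data are hypothetical, and it assumes every settings column exists in the data.

```r
# filter_by_settings() is a hypothetical helper using exact column names
filter_by_settings <- function(dat, settings_row) {
  keep <- rep(TRUE, nrow(dat))
  for (col in names(settings_row)) {
    # Combine one equality test per settings column
    keep <- keep & dat[[col]] == as.character(settings_row[[col]])
  }
  dat[keep, , drop = FALSE]
}

dat <- data.frame(country = c("DE", "DE", "US"),
                  transport = c("road", "air", "road"),
                  values = c(5, 6, 7),
                  stringsAsFactors = FALSE)
settings <- data.frame(country = "DE", transport = "air",
                       stringsAsFactors = FALSE)

df_sub <- filter_by_settings(dat, settings[1, , drop = FALSE])
df_sub
```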

Populating a data frame in R in a loop

I am trying to populate a data frame from within a for loop in R. The names of the columns are generated dynamically within the loop and the value of some of the loop variables is used as the values while populating the data frame. For instance the name of the current column could be some variable name as a string in the loop, and the column can take the value of the current iterator as its value in the data frame.
I tried to create an empty data frame outside the loop, like this:
d = data.frame()
But I can't really do anything with it; the moment I try to populate it, I run into an error:
d[1] = c(1,2)
Error in `[<-.data.frame`(`*tmp*`, 1, value = c(1, 2)) :
replacement has 2 rows, data has 0
What would be a good way to achieve what I am looking to do? Please let me know if I wasn't clear.
It is often preferable to avoid loops and use vectorized functions. If that is not possible, there are two approaches:
1. Preallocate your data.frame. This is not recommended because indexing is slow for data.frames.
2. Use another data structure in the loop and transform it into a data.frame afterwards. A list is very useful here.
Example to illustrate the general approach:
mylist <- list() #create an empty list
for (i in 1:5) {
vec <- numeric(5) #preallocate a numeric vector
for (j in 1:5) { #fill the vector
vec[j] <- i^j
}
mylist[[i]] <- vec #put all vectors in the list
}
df <- do.call("rbind",mylist) #combine all vectors into a matrix
In this example it is not necessary to use a list; you could preallocate a matrix. However, if you do not know how many iterations your loop will need, you should use a list.
Finally here is a vectorized alternative to the example loop:
outer(1:5,1:5,function(i,j) i^j)
As you see it's simpler and also more efficient.
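For the original question, where the column names are generated inside the loop, the list approach can carry the names as well. A minimal sketch, assuming every column has the same length (the var_ prefix and the values are made up for illustration):

```r
cols <- list()  # collect columns here, keyed by their generated names
for (i in 1:3) {
  col_name <- paste0("var_", i)  # name built dynamically inside the loop
  cols[[col_name]] <- rep(i, 2)  # the iterator supplies the values
}
d <- as.data.frame(cols)         # convert once, after the loop
d
```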
You could do it like this:
iterations = 10
variables = 2
output <- matrix(ncol=variables, nrow=iterations)
for(i in 1:iterations){
output[i,] <- runif(2)
}
output
And then turn it into a data.frame:
output <- data.frame(output)
class(output)
What this does:
1. Creates a matrix with rows and columns according to the expected growth.
2. Inserts 2 random numbers into each row of the matrix.
3. Converts the matrix into a data frame after the loop has finished.
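If the column names are known before the loop starts, they can be attached to the preallocated matrix so that the resulting data frame is already labelled. A small sketch under that assumption (var_names and the fill values are hypothetical):

```r
iterations <- 4
var_names <- c("low", "high")  # assumed column names, known in advance

# Preallocate with NA and attach the column names up front
output <- matrix(NA_real_, nrow = iterations, ncol = length(var_names),
                 dimnames = list(NULL, var_names))
for (i in seq_len(iterations)) {
  output[i, ] <- c(i, i * 10)  # fill one row per iteration
}
output_df <- as.data.frame(output)
output_df
```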
This works too:
df = NULL
for (k in 1:10)
{
x = 1
y = 2
z = 3
df = rbind(df, data.frame(x,y,z))
}
The output will look like this (the same row is repeated once per iteration):
df #enter
x y z #col names
1 2 3
Thanks Notable1, this works for me with tidytextr.
The following creates a data frame with the file names in one column and their content in the other:
diretorio <- "D:/base"
arquivos <- list.files(diretorio, pattern = "*.PDF")
quantidade <- length(arquivos)
#
df = NULL
for (k in 1:quantidade) {
nome = arquivos[k]
print(nome)
Sys.sleep(1)
dados = read_pdf(arquivos[k],ocr = T)
print(dados)
Sys.sleep(1)
df = rbind(df, data.frame(nome,dados))
Sys.sleep(1)
}
Encoding(df$text) <- "UTF-8"
I had a case where I needed to use a data frame within a for loop. In this case it was efficient enough; however, keep in mind that the database was small and the iterations in the loop were very simple. Still, the code may be useful for someone with similar conditions.
The purpose of the for loop was to use the raster extract function across five locations (Tokyo, New York, São Paulo, Seoul and Mexico City), each of which had its own raster grids. I had a spatial point database with more than 1000 observations spread over the 5 locations, and I needed to extract information from 10 different raster grids (two grids per location). For the subsequent analysis I needed not only the raster values but also the unique ID of each observation.
Preparing the spatial data included the following tasks:
Import the points shapefile with the readOGR function (rgdal package)
Import raster files with the raster function (raster package)
Stack grids from the same location into one file, with the function stack (raster package)
Here is the for loop code using a data frame:
1. Add stacked rasters per location into a list
raslist <- list(LOC1,LOC2,LOC3,LOC4,LOC5)
2. Create an empty dataframe, this will be the output file
TB <- data.frame(VAR1=double(),VAR2=double(),ID=character())
3. Set up for loop function
L1 <- seq(1,5,1) # the location ID is a numeric variable with values from 1 to 5
for (i in 1:length(L1)) {
dat=subset(points,LOCATION==i) # select corresponding points for location [i]
t=data.frame(extract(raslist[[i]],dat),dat$ID) # run extract function with points & raster stack for location [i]
names(t)=c("VAR1","VAR2","ID")
TB=rbind(TB,t)
}
I was looking for the same, and the following may be useful as well.
a <- vector("list", 1)
for(i in 1:3){a[[i]] <- data.frame(x= rnorm(2), y= runif(2))}
a
rbind(a[[1]], a[[2]], a[[3]])
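Listing the elements by hand stops scaling once the number of pieces is large or unknown. do.call passes every element of the list to rbind in a single call, so the final step does not depend on the list length. A minimal sketch (pieces and its contents are made up for illustration):

```r
pieces <- vector("list", 3)  # one slot per iteration
for (i in seq_along(pieces)) {
  pieces[[i]] <- data.frame(x = i, y = i^2)
}
combined <- do.call(rbind, pieces)  # same as rbind(pieces[[1]], pieces[[2]], pieces[[3]])
combined
```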
