Getting Error "could not find function "assign<-" inside of colnames() - r

I'm using assign() to create new data frames from another data frame, and I then want to name some of the columns in each new data frame. Creating the new data frames with assign() works fine, but when I use assign() inside colnames() it gives the error 'could not find function "assign<-"'.
Here's a snippet of my code (abbreviated, of course):
for(i in 1:value) {
assign(Name[i], Old.Data.Frame[Old.Data.Frame$1 == Index[i]]) #I'm going to call this line of code 'New Data Frame' for brevity
for(j in 1:ncol(New Data Frame)) {
colnames(New Data Frame)[j] = as.character(Old.Data.Frame[3,j])
I do all this assign() stuff because the names in the old data frame constantly change, so I cannot create any concrete variables in my code; only the dimensions of the frame stay the same.
The only error in this code is that R 'could not find function "assign<-"' in colnames(...). I'm flustered because assign() had just worked on the line before. Any help is appreciated, thanks!

You have a vector of variable names in Name, and you assign a value to each one (your code block):
for(i in 1:value) { assign(Name[i], Old.Data.Frame[Old.Data.Frame$1 == Index[i]]) }
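A replacement call such as colnames(x) <- value needs a plain variable as its target. Wrapping assign() (or get()) inside colnames() makes R look for a replacement function named "assign<-" (or "get<-"), which does not exist. Instead, fetch the data frame, rename its columns, and store it back (note I'm separating this code block for debugging purposes):
for(i in 1:value) { tmp <- get(Name[i]); colnames(tmp) <- as.character(Old.Data.Frame[3,]); assign(Name[i], tmp) }
get will retrieve the data (data.frame) assigned to the variable name Name[i] (character), and assign stores the renamed copy back under the same name.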
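As a minimal, self-contained sketch of that fetch, rename, and store-back pattern, here is a version with made-up data. The names old_df, frame_names, index, and new_names are hypothetical stand-ins for the asker's objects, and the row filter on the first column is an assumption about what the abbreviated code intended. It also shows a list-based alternative that avoids assign() altogether.

```r
# Hypothetical stand-in for Old.Data.Frame: column 1 is a group key,
# and row 3 holds the labels to use as column names.
old_df <- data.frame(V1 = c("a", "a", "x", "b"),
                     V2 = c("1", "2", "y", "3"),
                     stringsAsFactors = FALSE)
frame_names <- c("group_a", "group_b")   # plays the role of Name
index       <- c("a", "b")               # plays the role of Index
new_names   <- as.character(unlist(old_df[3, ]))

for (i in seq_along(frame_names)) {
  # Create the new data frame under a name held in a string.
  assign(frame_names[i], old_df[old_df[[1]] == index[i], ])

  # colnames(get(...)) <- value would fail, so copy, rename, store back.
  tmp <- get(frame_names[i])
  colnames(tmp) <- new_names
  assign(frame_names[i], tmp)
}

# Alternative: keep the pieces in a named list instead of separate variables.
pieces <- lapply(index, function(key) {
  piece <- old_df[old_df[[1]] == key, ]
  colnames(piece) <- new_names
  piece
})
names(pieces) <- frame_names
```

With the list version, pieces[["group_a"]] plays the role of the variable group_a, and the whole collection can be looped over without get() or assign().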

Related

How can I use a string input from an R function to name a dataset?

I want to create a very simple function that takes part of a large dataset (df) and creates a new dataset in the global environment with a specified name. The problem is that it seems to name the new dataframe "x" instead of the actual string input. Example:
create_dataset<-function(x,rows,columns) {
name<<-df[rows,columns]
}
create_dataset(x="skildpadde",
rows=690:692,
columns=2:7)
How can I use the input "x" as the dataset name?
Use get():
create_dataset<-function(x,rows,columns) {
get(x)[rows,columns]
}
Or, if you are trying to assign to the name stored in x in the global environment:
create_dataset<-function(x,rows,columns) {
assign(x, df[rows,columns],envir = .GlobalEnv)
}
I'm not sure I understand the use case or rationale behind either of these...
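For illustration, here is a small runnable sketch of the assign() variant, next to an alternative that simply returns the subset. The data in df is made up (the question's df is not shown), and subset_dataset and skildpadde2 are hypothetical names introduced only for this example.

```r
# Hypothetical source data; the question's df is not shown.
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Creates a global variable whose name is the string held in x.
create_dataset <- function(x, rows, columns) {
  assign(x, df[rows, columns], envir = .GlobalEnv)
}
create_dataset(x = "skildpadde", rows = 2:3, columns = 1:2)

# Often simpler: return the subset and let the caller choose the name.
subset_dataset <- function(data, rows, columns) {
  data[rows, columns]
}
skildpadde2 <- subset_dataset(df, rows = 2:3, columns = 1:2)
```

Returning the value keeps the function free of side effects, which is usually easier to test and reuse than writing into the global environment.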

Unable to update data in dataframe

I tried updating data in a data frame, but it is not getting updated.
# Initialize data and data frame here
user_data=read.csv("train_5.csv")
baskets.df=data.frame(Sequence=character(),
Challenge=character(),
countno=integer(),
stringsAsFactors=FALSE)
# Update data in the data frame here
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i]==user_data$challenge_sequence[j]&&user_data$challenge[i]==user_data$challenge[j])
{
writedata(user_data$challenge_sequence[i],user_data$challenge[i])
}
}
}
writedata=function( seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence=seqnn,Challenge=challng,countno=1)
baskets.df=rbind(baskets.df,newRow)
}
# View data here
View(baskets.df)
I've modified your code to what I believe will work. You haven't provided sample data, so I can't verify that it works the way you want. I'm basing my attempt here on a couple of common novice mistakes that I'll do my best to explain.
Your writedata function was written to be a little loose with its scope. When you create a new function, what happens in the function technically happens in its own environment. That is, it first looks for things defined within the function, and any new objects it creates exist only within that environment. R also has this neat (and sometimes tricky) feature where, if it can't find an object in an environment, it looks it up in the parent environment.
The impact this has on your writedata function is that when R looks for baskets.df in the function and can't find it, R then turns to the global environment, finds baskets.df there, and uses it in rbind. However, the result of rbind gets saved to a new baskets.df in the function environment, and does not update the object of the same name in the global environment.
To address this, I added an argument to writedata that is simply named data. We can then use this argument to pass a data frame to the function's environment and do everything locally. A function returns the value of its last evaluated expression, so ending with the bare rbind call (no assignment) makes the function return its result.
Then, in your loop, instead of simply calling writedata, we assign its result back to baskets.df to replace the previous result.
# Define the helper first so it exists before the loop calls it
writedata <- function(data, seqnn, challng)
{
  newRow <- data.frame(Sequence = seqnn,
                       Challenge = challng,
                       countno = 1)
  # The last evaluated expression is the function's return value
  rbind(data, newRow)
}
# nrow() counts rows; length() on a data frame counts columns
for(i in 1:nrow(user_data))
{
  for(j in i:nrow(user_data))
  {
    if(user_data$challenge_sequence[i] == user_data$challenge_sequence[j] &&
       user_data$challenge[i] == user_data$challenge[j])
    {
      baskets.df <- writedata(baskets.df,
                              user_data$challenge_sequence[i],
                              user_data$challenge[i])
    }
  }
}
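The scoping behaviour described above can be seen in isolation with a tiny sketch. The names counter, bump_local, bump, and after_local are hypothetical and exist only for this illustration.

```r
counter <- 0

# Reads the outer counter, but the assignment creates a local copy;
# the outer counter is left untouched.
bump_local <- function() {
  counter <- counter + 1
  invisible(NULL)
}
bump_local()
after_local <- counter   # still 0

# Takes the value as an argument and returns the new value;
# the caller decides where to store it.
bump <- function(x) {
  x + 1
}
counter <- bump(counter) # now 1
```

This is the same change made to writedata: pass the current state in, return the new state, and assign it at the call site.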
I'm not sure what your programming background is, but your loops will be very slow in R because it's an interpreted language. To get around this, many functions are vectorized (which simply means that you give them more than one data point, and they do the looping inside compiled code, where loops are fast).
With that in mind, here's what I believe will be a much faster implementation of your code:
user_data=read.csv("train_5.csv")
# challenge_indices will be a matrix with TRUE at every place "challenge" and "challenge_sequence" is the same
challenge_indices <- outer(user_data$challenge_sequence, user_data$challenge_sequence, "==") &
outer(user_data$challenge, user_data$challenge, "==")
# since you don't want duplicates, get rid of them
challenge_indices[upper.tri(challenge_indices, diag = TRUE)] <- FALSE
# now let's get the indices of interest
index_list <- which(challenge_indices,arr.ind = TRUE)
# now we make the resulting data set all at once
# this is much faster, because it does not require copying the data frame many times - which would be required if you created a new row every time.
baskets.df <- with(user_data, data.frame(
  Sequence = challenge_sequence[index_list[,"row"]],
  Challenge = challenge[index_list[,"row"]]
))
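To see what the outer() approach produces, here is a sketch on a tiny made-up data set standing in for train_5.csv (the values are invented; same_pair is a hypothetical name for the comparison matrix).

```r
# Made-up data standing in for train_5.csv.
user_data <- data.frame(
  challenge_sequence = c(1, 2, 1, 1),
  challenge          = c("A", "B", "A", "C"),
  stringsAsFactors   = FALSE
)

# TRUE where both the sequence and the challenge agree for rows i and j.
same_pair <- outer(user_data$challenge_sequence,
                   user_data$challenge_sequence, "==") &
  outer(user_data$challenge, user_data$challenge, "==")

# Keep each unordered pair once and drop self-matches on the diagonal.
same_pair[upper.tri(same_pair, diag = TRUE)] <- FALSE

# Row/column positions of the remaining matches.
index_list <- which(same_pair, arr.ind = TRUE)

baskets.df <- data.frame(
  Sequence  = user_data$challenge_sequence[index_list[, "row"]],
  Challenge = user_data$challenge[index_list[, "row"]],
  stringsAsFactors = FALSE
)
```

Here only rows 1 and 3 share both values, so baskets.df ends up with a single row. Note that, unlike the nested loop (where j starts at i), this version does not count a row as matching itself.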

Need assistance with understanding R for loop error: unexpected '}' in " }"

Just to give some background first:
I currently have 2 data frames (giraffe, leaf) and both of them share the column 'key', where the elements in the leaf data frame are a subset of giraffe. What I needed to do is compare the two data frames and when there are matching elements in both data frames in the 'key' column, the string 'leaf' will be input into another column (project) in the giraffe data frame inside the same row as the matching 'key' element. I've taken the following approach however it seems I have made a small error somewhere and after searching online, I still don't know what it is:
Truth_vector <- is.element((giraffe[,1]),(leaf[,1])) #returns a vector with 3000 elements, most are FALSE except for where the element inside 'key' is present in both data frames
i=1
for (i in 1:length(giraffe[,1])) {
if Truth_vector[i] == TRUE {
giraffe[i,5] <- 'leaf'
}
i = i+1
}
Error: unexpected '}' in "}"
Edit:
I tried implementing the solution as a function however nothing ends up happening, no error messages get returned either. What I've done is:
Project_assign <- function(prjct) {
Truth_vector <- is.element((giraffe[,1]),(prjct[,1]))
giraffe[which(Truth_vector),5] <- 'prjct'
}
Project_assign(leaf)
Edit: This was because everything was being assigned in the function's own environment, not the global environment. Using assign('giraffe', giraffe, envir = .GlobalEnv) solves this; however, you should try to avoid assign(), so instead I used a for loop over a list of all the data frames.
You have a couple of issues. First, the if condition needs to be in parentheses, and second, you don't need to increment i yourself. This should suffice:
for (i in 1:length(giraffe[,1])) {
if (Truth_vector[i] == TRUE) {
giraffe[i,5] <- 'leaf'
}
}
Of course, this would do it too:
giraffe[which(Truth_vector),5] <- 'leaf'
(assuming Truth_vector is not longer than the number of rows in giraffe)
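The function version from the question's edit can also be made to work without assign() by returning the modified data frame. The sketch below uses made-up data; project_assign and its arguments target, source, and label are hypothetical names, and it assumes the shared column is called key and the column to fill is called project.

```r
# Made-up data; only the key and project columns are modelled here.
giraffe <- data.frame(key = c("k1", "k2", "k3", "k4"),
                      project = NA_character_,
                      stringsAsFactors = FALSE)
leaf <- data.frame(key = c("k2", "k4"), stringsAsFactors = FALSE)

# Returns a modified copy; the caller assigns it back.
project_assign <- function(target, source, label) {
  matches <- target$key %in% source$key   # vectorized membership test
  target$project[matches] <- label
  target
}

giraffe <- project_assign(giraffe, leaf, "leaf")
```

Passing the label as a string also avoids writing the literal 'prjct' into the column, which is what the function in the question's edit would do.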

return list of dataframe from inside a function to apply rbind using do.call()

Below is a trimmed version of my script. I am hoping my function will return me a list per loop iteration, so that I can rbind all the list to form a new data frame, but when I am executing this script, I keep getting the error:
do.call("rbind", listofdfs) : object 'listofdfs' not found
Thank you all for your help.
library(DBI)
library(RPostgreSQL)
drv<- dbDriver("MyDataBase")
con<-dbConnect(drv,dbname="DB_Name",
host="DB_Location",port=number,user="MyName",password= "Password")
dates <- seq(as.Date(as.character(Sys.Date() - 33)), as.Date(as.character(Sys.Date() - 1)), by=1)
my_function<-function(dates){
listofdfs<-list()
for(i in 1:length(dates)){
data<-dbGetQuery(con, sprintf("select X,Y,Z from TABLE where date>=date('%s')", dates[i]))
data$newColumn<-mean(data$X)
listofdfs[[i]]<-data
}
return(listofdfs)
}
df<-do.call("rbind", listofdfs)
Here is a small simplified example; it uses the dates variable from above:
my_list_function<-function(dates){
for(i in 1:length(dates))
{
my_list<-list()
my_list[[i]]<-i
}
return(my_list) }
k<-do.call(rbind,my_list(dates))
View(k)
Now, running
do.call(rbind,my_list(dates))
returns the error could not find function "my_list", while running do.call(rbind,my_list_function(dates)) works but only gives 33.
Thanks again for help.
listofdfs is a variable that is created inside your function, so it is not defined outside the function body.
But because the function returns it, you can access it by calling the function itself:
df<-do.call("rbind", my_function(dates))
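Also, to make your small example work, create my_list before the loop rather than inside it, so it isn't reset to an empty list on every iteration: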
my_list_function<-function(dates){
my_list<-list()
for(i in 1:length(dates))
{
my_list[[i]]<-i
}
return(my_list)
}
k<-do.call(rbind,my_list_function(dates))
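The same pattern can be tried without a database. In the sketch below, fake_query is a hypothetical stand-in for dbGetQuery() that returns a small made-up data frame per date, and lapply() replaces the explicit loop and the pre-allocated list.

```r
dates <- seq(Sys.Date() - 3, Sys.Date() - 1, by = 1)

# Hypothetical stand-in for dbGetQuery(): one small data frame per date.
fake_query <- function(d) {
  data.frame(date = d, X = c(1, 2, 3))
}

# Returns a list with one data frame per date.
my_function <- function(dates) {
  lapply(seq_along(dates), function(i) {
    data <- fake_query(dates[i])
    data$newColumn <- mean(data$X)
    data
  })
}

# Call the function and bind its result; no outside variable is needed.
df <- do.call(rbind, my_function(dates))
```

Because lapply() builds and returns the list itself, there is no list variable to initialise in the wrong place.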

R - create iterable list/dataframe from unique()

I'd like to get the unique elements from a column. That seems straightforward. Both of these work, but I'm not getting the object type I'd like:
userlist <- as.list(somebigdf$username)
userlist <- unique(userlist)
or
userlist <- unique(somebigdf$username)
When I iterate through, I'm not getting the names:
for(i in 1:length(userlist)){
cat(names(userlist[i]), '\n')
}
Returns blank spaces.
for(i in userlist){
cat(i, '\n')
}
Returns integers.
The above function is just an example. I'll be using that but also matching the returned name in an if-else function.
The object types seem to be integers or an extended data.frame with lots of values for each name - which isn't what I want. I would really just like a list of strings something along the lines of userlist = c( the results from unique).
Edit -
This code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}
I'm accepting my own answer. Namely, a working solution - this code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}
If someone at a later date has a better answer that seems more in keeping with the question, I will be happy to accept that as the answer.
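One possible explanation for the integers, offered as an assumption since the question doesn't show the column type, is that username is stored as a factor: cat() on a factor element prints its integer code. If so, converting to character before calling unique() gives the plain vector of strings asked for. The data below is made up.

```r
# Made-up data; username is stored as a factor here on purpose.
somebigdf <- data.frame(username = factor(c("ann", "bob", "ann")))

# Convert to character first to get a plain vector of strings.
userlist <- unique(as.character(somebigdf$username))

# Iterating now yields each name as a string.
for (name in userlist) {
  cat(name, "\n")
}
```

The resulting userlist is an ordinary character vector, so it can also be used directly in comparisons such as name == "ann" inside an if-else.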
