R code does not work when called from function - r

HI i just started learning R and finding this problem to be really interesting where I just run a code directly without wrapping in a function it works but when I place it inside a function it doesn't work, What can be possible reason?
fill_column<-function(colName){
count <- 0
for(i in fg_data$particulars) {
count <- count +1
if(grepl(colName, i) && fg_data$value[count] > 0.0){
fg_data[,colName][count] <- as.numeric(fg_data$value[count])
} else {
fg_data[,colName][count] <- 'NA'
}
}
}
fill_column('volume')
Where I am creating new column named volume it this string exists in particulars column.
I have added a comment where solution given by another question does not work for me, Please look at my comment below.

Finally I got it working but reading another answer on SO, here is the solution:
fill_column <- function(colName){
count <- 0
for(i in fg_data$particulars) {
count <- count +1
if(grepl(colName, i) && fg_data$value[count] > 0.0){
fg_data[,colName][count] <- as.numeric(fg_data$value[count])
} else {
fg_data[,colName][count] <- 'NA'
}
}
return(fg_data)
}
fg_data = fill_column('volume')
Now reason, Usually in any language when we modify global object inside any function it reflects on global object immediately but in R we have to return the modified object from function and then assign it again to global object to see our changes. or another way for doing this is to assign local object from within the function to global context using envir=.GlobalEnv.

Related

Create a series of new folders in for loop in R

I have create a small script that passes a vector through a loop. In this loop I am using an if else statement to check if folder exists and if not to create the folder. However, I am getting error: Error in file.exists(i) : invalid 'file' argument. This has to due with file.exist(). I dont understand why this isnt ok. I check the man using help. Seems like this should be working.
folders<- c("RawData", "Output", "BCV", "DEplots", "DEtables", "PathwayOuts", "VolcanoPLots")
for(i in 1:length(folders)){
if (file.exists(i)){
cat(paste0(i, "already exists"))
} else {
cat(paste0(i, "does not exists"))
dir.create(i)
}
}
You are looping over an index (that is, 1:length(folders) is just the vector 1:7, not the values of the folders vector itself. The easiest solution is to loop over the vector itself:
for (i in folders) {
Or, if you still want to loop over the index:
for (i in 1:length(folders)) {
if (file.exists(folders[i])){
cat(paste0(folders[i], "already exists"))
}
else {
cat(paste0(folders[i], "does not exists"))
dir.create(folders[i])
}
}
A quick tip: if you are debugging a for-loop, the place to start is to add print(i) at the start of the loop. You would have immediately seen the problem: i was an integer, not the first value of the vector.

How to handle variable not defined error in r

Is there a way to evade variable not defined error in r? For example the code below throws variable not defined error.
for(i in length(someList)){
print(someList[[i]])
}
Can I check if a variable exists first before executing the code above. A pseudocode would look like this:
if(someList exists){
for(i in length(someList)){
print(someList[[i]])
}
} else cat("The variable does not exist")
Are you looking for the exists function?
someList <- list(1,2,3)
if (exists("someListNot")){
for(i in length(someListNot)){
print(someListNot[[i]])
}
}
if (exists("someList")){
for(i in 1:length(someList)){
print(someList[[i]])
}
}
You can use the exists("myVariable") function to check existence.
Also make sure your loop is really looping! If you use for (i in length(variable) it will only use the last index of your variable instead of looping over it.
You code could look something like this:
if ( exits("myVariable")){
for(i in seq_len(length(someList))){
print(someList[[i]])
}
}

Unable to update data in dataframe

i tried updating data in dataframe but its unable to get updating
//Initialize data and dataframe here
user_data=read.csv("train_5.csv")
baskets.df=data.frame(Sequence=character(),
Challenge=character(),
countno=integer(),
stringsAsFactors=FALSE)
/Updating data in dataframe here
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i]==user_data$challenge_sequence[j]&&user_data$challenge[i]==user_data$challenge[j])
{
writedata(user_data$challenge_sequence[i],user_data$challenge[i])
}
}
}
writedata=function( seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence=seqnn,Challenge=challng,countno=1)
baskets.df=rbind(baskets.df,newRow)
}
//view data here
View(baskets.df)
I've modified your code to what I believe will work. You haven't provided sample data, so I can't verify that it works the way you want. I'm basing my attempt here on a couple of common novice mistakes that I'll do my best to explain.
Your writedata function was written to be a little loose with it's scope. When you create a new function, what happens in the function technically happens in its own environment. That is, it tries to look for things defined within the function, and then any new objects it creates are created only within that environment. R also has this neat (and sometimes tricky) feature where, if it can't find an object in an environment, it will try to look up to the parent environment.
The impact this has on your writedata function is that when R looks for baskets.df in the function and can't find it, R then turns to the Global Environment, finds baskets.df there, and then uses it in rbind. However, the result of rbind gets saved to a baskets.df in the function environment, and does not update the object of the same name in the global environment.
To address this, I added an argument to writedata that is simply named data. We can then use this argument to pass a data frame to the function's environment and do everything locally. By not making any assignment at the end, we implicitly tell the function to return it's result.
Then, in your loop, instead of simply calling writedata, we assign it's result back to baskets.df to replace the previous result.
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i] == user_data$challenge_sequence[j] &&
user_data$challenge[i] == user_data$challenge[j])
{
baskets.df <- writedata(baskets.df,
user_data$challenge_sequence[i],
user_data$challenge[i])
}
}
}
writedata=function(data, seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence = seqnn,
Challenge = challng,
countno = 1)
rbind(data, newRow)
}
I'm not sure what you're programming background is, but your loops will be very slow in R because it's an interpreted language. To get around this, many functions are vectorized (which simply means that you give them more than one data point, and they do the looping inside compiled code where the loops are fast).
With that in mind, here's what I believe will be a much faster implementation of your code
user_data=read.csv("train_5.csv")
# challenge_indices will be a matrix with TRUE at every place "challenge" and "challenge_sequence" is the same
challenge_indices <- outer(user_data$challenge_sequence, user_data$challenge_sequence, "==") &
outer(user_data$challenge, user_data$challenge, "==")
# since you don't want duplicates, get rid of them
challenge_indices[upper.tri(challenge_indices, diag = TRUE)] <- FALSE
# now let's get the indices of interest
index_list <- which(challenge_indices,arr.ind = TRUE)
# now we make the resulting data set all at once
# this is much faster, because it does not require copying the data frame many times - which would be required if you created a new row every time.
baskets.df <- with(user_data, data.frame(
Sequence = challenge_sequence[index_list[,"row"]],
challenge = challenge[index_list[,"row"]]
)

R: Iterating Over the List

I am trying to implement following algorithm in R:
Iterate(Cell: top)
While (top != null)
Print top.Value
top = top.Next
End While
End Iterate
Basically, given a list, the algorithm should break as soon as it hits 'null' even when the list is not over.
myls<-list('africa','america south','asia','antarctica','australasia',NULL,'europe','america north')
I had to add a for loop for using is.null() function, but following code is disaster and I need your help to fix it.
Cell <- function(top) {
#This algorithm examines every cell in the linked list, so if the list contains N cells,
#it has run time O(N).
for (i in 1:length(top)){
while(is.null(top[[i]]) !=TRUE){
print(top)
top = next(top)
}
}
}
You may run this function using:
Cell(myls)
You were close but there is no need to use for(...) in this
construction.
Cell <- function(top){
i = 1
while(i <= length(top) && !is.null(top[[i]])){
print(top[[i]])
i = i + 1
}
}
As you see I've added one extra condition to the while loop: i <= length(top) this is to make sure you don't go beyond the length of the
list in case there no null items.
However you can use a for loop with this construction:
Cell <- function(top){
for(i in 1:length(top)){
if(is.null(top[[i]])) break
print(top[[i]])
}
}
Alternatively you can use this code without a for/while construction:
myls[1:(which(sapply(myls, is.null))[1]-1)]
Check this out: It runs one by one for all the values in myls and prints them but If it encounters NULL value it breaks.
for (val in myls) {
if (is.null(val)){
break
}
print(val)
}
Let me know in case of any query.

Loop works outside function but in functions it doesn't.

Been going around for hours with this. My 1st question online on R. Trying to creat a function that contains a loop. The function takes a vector that the user submits like in pollutantmean(4:6) and then it loads a bunch of csv files (in the directory mentioned) and binds them. What is strange (to me) is that if I assign the variable id and then run the loop without using a function, it works! When I put it inside a function so that the user can supply the id vector then it does nothing. Can someone help ? thank you!!!
pollutantmean<-function(id=1:332)
{
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
}
Your function is missing a return value. (#Roland)
pollutantmean<-function(id=1:332) {
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
return(allfiles)
}
Edit:
Your mistake was that you did not specify in your function what you want to get out from the function. In R, you create objects inside of function (you could imagine it as different environment) and then specify which object you want it to return.
With my comment about accepting my answer, I meant this: (...To mark an answer as accepted, click on the check mark beside the answer to toggle it from greyed out to filled in...).
Consider even an lapply and do.call which would not need return being last line of function:
pollutantmean <- function(id=1:332) {
id <- str_pad(id,3,pad = "0")
direct_files <- paste0("/Users/ped/Documents/LearningR/", id, ".csv")
# READ FILES INTO LIST AND ROW BIND
allfiles <- do.call(rbind, lapply(direct_files, read.csv))
}
ok, I got it. I was expecting the files that are built to be actually created and show up in the environment of R. But for some reason they don't. But R still does all the calculations. Thanks lot for the replies!!!!
pollutantmean<-function(directory,pollutant,id)
{
#read files
allfiles<-data.frame()
id2<-str_pad(id,3,pad = "0")
direct<-paste("/Users/pedroalbuquerque/Documents/Learning R/",directory,sep="")
for (i in id2) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
#averaging polutants
mean(allfiles[,pollutant],na.rm = TRUE)
}
pollutantmean("specdata","nitrate",23:35)

Resources