Multiple if statements in a for loop in R

I am trying to create a for loop that goes through each URL in (recipe_urls) and if it contains a certain domain, a function I wrote for scraping the recipe off the website will run. I have never tried nesting if statements within a for loop before, so I think that might be the root of the problem.
for (recipe_url in recipe_urls) {
if ("allrecipes.*" %rin% recipe_urls == "TRUE"){
scrape_allrecipes(recipe_url)
}
if("foodnetwork.*" %rin% recipe_urls == "TRUE"){
scrape_allrecipes(recipe_url)
}
}
Note: %rin% is a custom function that returns TRUE or FALSE depending on whether a partial match (e.g., "allrecipes") is present within the URL:
`%rin%` = function (pattern, list) {
vapply(pattern, function (p) any(grepl(p, list)), logical(1L), USE.NAMES = FALSE)
}
Interestingly, the allrecipes one works but duplicates everything twice (I have it set to print in a .txt file). However the foodnetwork does not seem to run.
Individually, this works to return recipes from URL:
recipes <- for (recipe_url in recipe_urls) {
scrape_allrecipes(recipe_url)
}

In my infinite wisdom, I accidentally was calling the same function in the second if statement. Works now, oops!
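For anyone hitting the same symptom: besides the repeated scrape_allrecipes call, both conditions in the question test the whole recipe_urls vector rather than the current recipe_url, so each one is TRUE on every iteration as long as any URL matches. Here is a minimal sketch of a per-URL check. It assumes a second scraper exists; scrape_foodnetwork, the stub bodies and the example URLs are hypothetical stand-ins, not the original code.

```r
# Hypothetical stand-ins for the real scrapers: they only label the URL.
scrape_allrecipes <- function(url) paste("allrecipes:", url)
scrape_foodnetwork <- function(url) paste("foodnetwork:", url)

# Made-up example URLs.
recipe_urls <- c(
  "https://www.allrecipes.com/recipe/1",
  "https://www.foodnetwork.com/recipes/2",
  "https://example.com/other"
)

results <- character(0)
for (recipe_url in recipe_urls) {
  # Test the current URL only, not the whole recipe_urls vector.
  if (grepl("allrecipes", recipe_url, fixed = TRUE)) {
    results <- c(results, scrape_allrecipes(recipe_url))
  } else if (grepl("foodnetwork", recipe_url, fixed = TRUE)) {
    results <- c(results, scrape_foodnetwork(recipe_url))
  }
}

print(results)
```

Using else if means a URL is handled by at most one scraper, and URLs from other domains are skipped.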

Related

Create a series of new folders in for loop in R

I have created a small script that passes a vector through a loop. In this loop I use an if/else statement to check whether a folder exists and, if not, to create it. However, I am getting this error: Error in file.exists(i) : invalid 'file' argument. This has to do with file.exists(). I don't understand why this isn't OK. I checked the manual using help, and it seems like this should work.
folders<- c("RawData", "Output", "BCV", "DEplots", "DEtables", "PathwayOuts", "VolcanoPLots")
for(i in 1:length(folders)){
if (file.exists(i)){
cat(paste0(i, "already exists"))
} else {
cat(paste0(i, "does not exists"))
dir.create(i)
}
}
You are looping over an index (that is, 1:length(folders) is just the vector 1:7), not the values of the folders vector itself. The easiest solution is to loop over the vector itself:
for (i in folders) {
Or, if you still want to loop over the index:
# seq_along(folders) yields 1, 2, ..., 7 and is also safe for an empty vector
for (i in seq_along(folders)) {
  if (file.exists(folders[i])) {
    cat(paste0(folders[i], " already exists\n"))
  } else {
    cat(paste0(folders[i], " does not exist\n"))
    dir.create(folders[i])
  }
}
A quick tip: if you are debugging a for-loop, the place to start is to add print(i) at the start of the loop. You would have immediately seen the problem: i was an integer, not the first value of the vector.
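To put the first suggestion together, here is a small sketch that loops over the folder names themselves. It works inside tempdir() so it does not touch your project; the project_demo base directory and the shortened folder list are assumptions made for the example.

```r
# Assumed base directory inside the session's temp folder (example only).
base <- file.path(tempdir(), "project_demo")
dir.create(base, showWarnings = FALSE)

folders <- c("RawData", "Output", "BCV")

for (folder in folders) {
  path <- file.path(base, folder)
  # 'folder' is a character value here, so the file functions accept it.
  if (dir.exists(path)) {
    cat(path, "already exists\n")
  } else {
    cat(path, "does not exist, creating it\n")
    dir.create(path)
  }
}
```

dir.exists() is used instead of file.exists() because it only returns TRUE for directories.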

Loop works outside function but in functions it doesn't.

Been going around for hours with this. My first question online on R. I am trying to create a function that contains a loop. The function takes a vector that the user submits, as in pollutantmean(4:6), and then loads a bunch of csv files (from the directory mentioned) and binds them. What is strange (to me) is that if I assign the variable id and then run the loop without using a function, it works! When I put it inside a function so that the user can supply the id vector, it does nothing. Can someone help? Thank you!!!
pollutantmean<-function(id=1:332)
{
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
}
Your function is missing a return value. (@Roland)
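library(stringr)  # provides str_pad()

pollutantmean <- function(id = 1:332) {
  # read files
  allfiles <- data.frame()
  id <- str_pad(id, 3, pad = "0")  # 1 becomes "001"
  direct <- "/Users/ped/Documents/LearningR/"
  for (i in id) {
    path <- paste0(direct, i, ".csv")  # direct already ends with "/"
    file <- read.csv(path)
    allfiles <- rbind(allfiles, file)
  }
  return(allfiles)  # hand the combined data frame back to the caller
}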
Edit:
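Your mistake was that you did not specify what the function should return. In R, objects created inside a function live in the function's own environment and are discarded when it finishes, so you have to say which object to return. Without an explicit return(), a function returns the value of its last evaluated expression; here that is the for loop, whose value is NULL.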
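A tiny illustration of the point, independent of the csv files (the function names no_return and with_return are made up for this example):

```r
# The last expression is the for loop, whose value is NULL.
no_return <- function(values) {
  total <- 0
  for (v in values) total <- total + v
}

# Same loop, but the last expression is the object we want back.
with_return <- function(values) {
  total <- 0
  for (v in values) total <- total + v
  total
}

print(is.null(no_return(1:3)))
print(with_return(1:3))
```

The loop in no_return still runs and total is computed, but the value is lost when the function exits.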
Alternatively, consider lapply with do.call, which does not need return() as the last line of the function:
pollutantmean <- function(id = 1:332) {
  id <- str_pad(id, 3, pad = "0")
  direct_files <- paste0("/Users/ped/Documents/LearningR/", id, ".csv")
  # read files into a list and row-bind them; as the last expression,
  # the result is also the function's return value
  do.call(rbind, lapply(direct_files, read.csv))
}
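If you want to try the lapply/do.call pattern without the original data, the sketch below writes three small csv files to tempdir() first. The file contents, the read_monitors name and the use of base sprintf() in place of str_pad() are assumptions made so the example runs on its own.

```r
# Create three tiny example files: 001.csv, 002.csv, 003.csv (made-up data).
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(
    data.frame(ID = k, nitrate = c(k, NA, k + 1)),
    file.path(demo_dir, sprintf("%03d.csv", k)),
    row.names = FALSE
  )
}

# Hypothetical helper: read the requested files and row-bind them.
read_monitors <- function(directory, id) {
  paths <- file.path(directory, sprintf("%03d.csv", id))  # zero-pad to 3 digits
  do.call(rbind, lapply(paths, read.csv))
}

allfiles <- read_monitors(demo_dir, 1:3)
print(nrow(allfiles))
print(mean(allfiles$nitrate, na.rm = TRUE))
```

Because the combined data frame is the last expression of read_monitors, it is returned to the caller and can be assigned or passed straight to mean().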
OK, I got it. I was expecting the objects built inside the function to be created and show up in R's environment, but they don't. R still does all the calculations, though. Thanks a lot for the replies!
pollutantmean<-function(directory,pollutant,id)
{
#read files
allfiles<-data.frame()
id2<-str_pad(id,3,pad = "0")
direct<-paste("/Users/pedroalbuquerque/Documents/Learning R/",directory,sep="")
for (i in id2) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
#averaging pollutants
mean(allfiles[,pollutant],na.rm = TRUE)
}
pollutantmean("specdata","nitrate",23:35)

R code does not work when called from function

Hi, I just started learning R and find this problem really interesting: when I run the code directly, without wrapping it in a function, it works, but when I place it inside a function it doesn't. What could be the reason?
fill_column<-function(colName){
count <- 0
for(i in fg_data$particulars) {
count <- count +1
if(grepl(colName, i) && fg_data$value[count] > 0.0){
fg_data[,colName][count] <- as.numeric(fg_data$value[count])
} else {
fg_data[,colName][count] <- 'NA'
}
}
}
fill_column('volume')
Here I am creating a new column named volume if this string exists in the particulars column.
I have added a comment explaining that the solution given in another question does not work for me; please look at my comment below.
Finally I got it working by reading another answer on SO. Here is the solution:
fill_column <- function(colName){
count <- 0
for(i in fg_data$particulars) {
count <- count +1
if(grepl(colName, i) && fg_data$value[count] > 0.0){
fg_data[,colName][count] <- as.numeric(fg_data$value[count])
} else {
fg_data[,colName][count] <- NA  # a real missing value; the string 'NA' would turn the column into character
}
}
return(fg_data)
}
fg_data = fill_column('volume')
Now the reason: in many languages, modifying a global object inside a function changes the global object immediately. In R, the function works on a local copy, so we have to return the modified object and assign it back to the global object to see our changes. Another way is to assign the local object to the global environment from within the function, using assign() with envir = .GlobalEnv.
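Here is a short sketch of the return-and-reassign pattern. It is a vectorised variant rather than the loop above, and the sample fg_data contents and the extra data argument are assumptions made for illustration.

```r
# Made-up sample data with the two columns the question refers to.
fg_data <- data.frame(
  particulars = c("volume sold", "price", "volume returned"),
  value = c(10, 5, 0),
  stringsAsFactors = FALSE
)

# The data frame is passed in explicitly and the modified copy is returned.
fill_column <- function(data, colName) {
  matches <- grepl(colName, data$particulars, fixed = TRUE) & data$value > 0
  data[[colName]] <- ifelse(matches, as.numeric(data$value), NA_real_)
  data
}

# Calling the function without reassigning leaves the global object untouched.
result <- fill_column(fg_data, "volume")
print("volume" %in% names(fg_data))

# Reassigning the returned value makes the change visible.
fg_data <- fill_column(fg_data, "volume")
print(fg_data$volume)
```

Passing the data frame as an argument, instead of reading the global fg_data inside the function, also makes the function reusable for other data frames.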

data.table and R's ellipsis (i.e. '...'): pass by reference does not seem to work

I'm trying to manipulate a large data.table (~37 MB), but in a special way: for other (unrelated) reasons I have implemented a 'hook'-like structure, meaning that the overall process is:
1) load the data.table from disk
2) fire a certain hook
3) the hook structure looks up this name and checks whether the user (= me :)) has bound a function to this hook; if so, it is called
4) the data is processed further
The functions look like this:
data = readRDS(pathToFile)
data = data.table(data)
fireHook("After_data_read", data, [some other parameters])
some_more_processing(data)
and the region around fireHook looks like
hooksRegistered = list(
"After_data_read" = function(data, ...) {
# do some stuff
}
)
fireHook = function(hookName, ...) {
for (hookNameRegistered in names(hooksRegistered)) {
if (hookName == hookNameRegistered) {
func = .global.hooksRegistered[[hookName]]
func(hookName, ...)
}
}
}
Observe that one needs to cast an object that already is a data.table into one again (otherwise the pass-by-reference does not work); see "Adding new columns to a data.table by-reference within a function not always working" and "Pass by reference bug?".
Problem: this line: func(hookName, ...) takes like forever (> 5 minutes).
The debugger never really gets into the function (so it's not the code in the function that takes a long time), and I've tested it with small data.tables, where it worked. Also, I noticed that the following seems to work:
fireHook = function(hookName, ...) {
args = list(...)
for (hookNameRegistered in names(.global.hooksRegistered)) {
if (hookName == hookNameRegistered) {
func = .global.hooksRegistered[[hookName]]
func(hookName, args)
}
}
}
(notice that I substituted ... by list(...)). To me, it seems as if R is trying to copy the whole table when using .... Is this right/desired? Or am I using it wrong?
regards,
FW

how do you know which functions in R are flagged for debugging?

I've been using debug() more often now, but sometimes I wonder which functions have been flagged for debugging. I know that you can use isdebugged() to find out if a particular function is flagged. But is there a way for R to list all the functions that are being debugged?
This is convoluted, but it works:
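find.debugged.functions <- function(environments = search()) {
  r <- do.call("rbind", lapply(environments, function(environment.name) {
    # accepts a search-path name or position, e.g. ".GlobalEnv" or 1
    env <- as.environment(environment.name)
    return(do.call("rbind", lapply(ls(env), function(x) {
      if (is.function(get(x, envir = env))) {
        # isdebugged() can fail for some objects, so trap the error quietly
        is.d <- try(isdebugged(get(x, envir = env)), silent = TRUE)
        if (!inherits(is.d, "try-error")) {
          return(data.frame(function.name = x, debugged = is.d))
        } else { return(NULL) }
      }
    })))
  }))
  return(r)
}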
You can run it across all your environments like so:
find.debugged.functions()
Or just in your ".GlobalEnv" with this:
> find.debugged.functions(1)
            function.name debugged
1 find.debugged.functions    FALSE
2                    test     TRUE
Here I created a test function which I am debugging.
Unless you wanted to get into something like writing a function to fire everything through isdebugged(), I don't think you can.
In debug.c, the function do_debug is what checks for the DEBUG flag being set on an object. There are only three R functions which call the do_debug C call: debug, undebug and isdebugged.
which(sapply(lsf.str(), isdebugged))
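The one-liner above checks the functions visible where it is called, typically the global environment at the prompt. A slightly more explicit sketch that takes the environment as an argument follows; the debugged_in name and the demo_env setup are made up for this example.

```r
# Hypothetical helper: names of all functions in 'env' flagged by debug().
debugged_in <- function(env = globalenv()) {
  # Keep only the names bound to functions.
  fn_names <- Filter(function(nm) is.function(get(nm, envir = env)), ls(env))
  flags <- vapply(
    fn_names,
    function(nm) isdebugged(get(nm, envir = env)),
    logical(1)
  )
  names(flags)[flags]
}

# Demo: two functions in a scratch environment, one of them flagged.
demo_env <- new.env()
assign("f", function(x) x + 1, envir = demo_env)
assign("g", function(x) x * 2, envir = demo_env)
debug(demo_env$g)

flagged <- debugged_in(demo_env)
print(flagged)

undebug(demo_env$g)  # clean up the flag again
```

Checking the flag does not call the functions, so no browser session is started while listing them.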
