I am new to R and trying to write some code that will loop over all the files in a folder and pull in the data associated with a certain tab (sheet). However, this tab may not be present in all of the files stored in this folder. To handle this, I am using tryCatch, but I am still running into issues.
What else do I need to do so that the loop simply skips a file when the tab is not present, instead of trying to load it in?
This is what I have tried:
for (i in 1:nrow(filesinfolderfull_list)) {
  print(filesinfolder_list[i])
  i_ddolv_temp <- tryCatch(
    { read_excel(filesinfolderfull_list$datafiles[i], sheet = "Display-OLV Reporting", col_names = TRUE, skip = 4) },
    error = function(e) { print("skip") }
  )
  templateDDOLV_df <- bind_rows(templateDDOLV_df, i_ddolv_temp)
}
From your explanation, I gather that some of the Excel files do not have the "Display-OLV Reporting" sheet. Why not look up the sheet names first with excel_sheets and only extract if the sheet is there? Something like:
for (i in 1:nrow(filesinfolderfull_list)) {
  print(filesinfolder_list[i])
  if ("Display-OLV Reporting" %in% excel_sheets(filesinfolderfull_list$datafiles[i])) {
    i_ddolv_temp <- read_excel(filesinfolderfull_list$datafiles[i], sheet = "Display-OLV Reporting", col_names = TRUE, skip = 4)
    templateDDOLV_df <- bind_rows(templateDDOLV_df, i_ddolv_temp)
  }
}
You could use a variable so that each Excel file is not read twice.
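One way to take that suggestion, as a minimal sketch (assuming the same filesinfolderfull_list and templateDDOLV_df objects as in the question):

library(readxl)
library(dplyr)

for (i in 1:nrow(filesinfolderfull_list)) {
  path <- filesinfolderfull_list$datafiles[i]          # keep the path in a variable
  print(path)
  sheets <- excel_sheets(path)                         # look up the sheet names once
  if ("Display-OLV Reporting" %in% sheets) {
    i_ddolv_temp <- read_excel(path, sheet = "Display-OLV Reporting",
                               col_names = TRUE, skip = 4)
    templateDDOLV_df <- bind_rows(templateDDOLV_df, i_ddolv_temp)
  }
}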
I have 500+ .json files from which I am trying to get a specific element. I cannot figure out why I cannot read more than one at a time.
This works:
library(jsonlite)
files <- list.files('~/JSON')
file1 <- fromJSON(readLines('~/JSON/file1.json'), flatten = TRUE)
result <- as.data.frame(file1$element$subdata$data)
However, regardless of using different JSON packages (e.g. RJSONIO), I cannot apply this to the entire contents of files. The error I continue to get is...
Attempt to run the same code as a function over all the contents of the file list:
for (i in files) {
  fromJSON(readLines(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
}
My goal is to loop through all 500+ files and extract the data. Specifically, if a file has the element 'subdata$data', I want to extract that list and put them all into one data frame.
Note: the files are being read as ASCII (Windows OS). This does not have a negative effect on single extractions, but in the loop I get 'invalid character bytes'.
Update 1/25/2019
I ran the following, but it returned errors...
files <- list.files('~/JSON')
out <- lapply(files, function(fn) {
  o <- fromJSON(file(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
})
Error in file(i): object 'i' not found
I also updated the function, but this time got UTF-8 errors...
files <- list.files('~/JSON')
out <- lapply(files, function(i, fn) {
  o <- fromJSON(file(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
})
Error in parse_con(txt,bigint_as_char):
lexical error: invalid bytes in UTF8 string. (right here)------^
Latest Update
I think I found a solution to the crazy 'bytes' problem. When I run readLines on the .json file, I can then apply fromJSON,
e.g.
json <- readLines('~/JSON/file1.json')
jsonread <- fromJSON(json)
jsondf <- as.data.frame(jsonread$element$subdata$data)
# returns a data frame with the correct information
The problem is that I cannot apply readLines to all the files within the JSON folder (path). If I can get help with that, I think I can run:
files <- list.files('~/JSON')
for (i in files) {
  a <- readLines(i)
  o <- fromJSON(file(a), flatten = TRUE)
  as.data.frame(i)$element$subdata
}
Needed Steps
1. Apply readLines to all 500 .json files in the JSON folder.
2. Apply fromJSON to the results of step 1.
3. Create a data.frame that collects the entries whenever a list (from fromJSON) contains $element$subdata$data.
Thoughts?
Solution (Workaround?)
Unfortunately, fromJSON still runs into trouble with the .json files. My guess is that my GET method (httr) is unable to wait/delay and load the 'pretty print', and is thus grabbing the raw .json, which in turn gives odd characters and results in the ubiquitous '------^' error. Nevertheless, I was able to put together a solution; please see below. I want to post it for future folks who may have the same problem with .json files not working nicely with any R JSON package.
# keeping the same 'files' variable as earlier
raw_data <- lapply(files, readLines)
dat <- do.call(rbind, raw_data)
dat2 <- as.data.frame(dat, stringsAsFactors = FALSE)
# check that the json contents were read in
dat2[1, 1]
library(tidyr)
dat3 <- separate_rows(dat2, sep = '')
x <- unlist(raw_data)
x <- gsub('[[:punct:]]', ' ', x)
# identify the elements wanted in the original .json and apply a regex
y <- regmatches(x, regexec('.*SubElement2 *(.*?) *Text.*', x))
for loops never return anything, so you must save any valuable data yourself.
You call as.data.frame(i), which creates a frame with exactly one element, the filename; probably not what you want to keep.
(Minor) Use fromJSON(file(i), ...).
Since you want to capture these into one frame, I suggest something along the lines of:
out <- lapply(files, function(fn) {
  o <- fromJSON(file(fn), flatten = TRUE)
  as.data.frame(o)$element$subdata$data
})
allout <- do.call(rbind.data.frame, out)
### alternatives:
allout <- dplyr::bind_rows(out)
allout <- data.table::rbindlist(out)
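If the UTF-8 errors persist, a hedged variation on the same idea (an untested sketch, assuming the files are UTF-8 text and that $element$subdata$data is a data frame when present) is to read the raw text with an explicit encoding and skip files that fail to parse or that lack the element:

library(jsonlite)

files <- list.files('~/JSON', pattern = "\\.json$", full.names = TRUE)  # full paths so readLines can find the files

out <- lapply(files, function(fn) {
  txt <- readLines(fn, encoding = "UTF-8", warn = FALSE)
  o <- tryCatch(fromJSON(paste(txt, collapse = "\n"), flatten = TRUE),
                error = function(e) NULL)   # skip files that do not parse
  o$element$subdata$data                    # NULL when the element is absent
})
out <- Filter(Negate(is.null), out)         # drop the skipped files
allout <- dplyr::bind_rows(out)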
I have a flexdashboard which is used by multiple users. They read, modify, and write the same (csv) file. I haven't been able to figure out how to do this with a SQL connection, so in the meantime (I need a working app) I would like to use a simple .csv file as a database. This should be fine since the users aren't likely to work on it at the exact same time, and loading and writing the full file is almost instant.
My strategy is therefore:
1. Load the file.
2. Edit (edits are done in rhandsontable, which is converted back to a data frame).
3. Save:
   (a) load the file again (to get the latest data),
   (b) append the edits from the rhandsontable and keep the latest data (indicated by a timestamp),
   (c) write.csv.
I'm thinking I should add something in step (1) that checks whether the file is already in use/open (because another user is at step (3)). So: check if the file is open; if not, continue; otherwise Sys.sleep(3) and try again.
Any ideas about how to do this in R? In Delphi it would be something like:
if fileinuse(filename) then sleep(3) else df<-read.csv
What's the R way?
Edit:
I'm starting with my edited answer as it's more elegant. It uses a shell command to test whether a file is available, as discussed in this question:
How to check in command-line if a given file or directory is locked (used by any process)?
This avoids loading and saving the file, and is therefore more efficient.
# Function to test availability
IsInUse <- function(fName) {
  shell(paste0("( type nul >> ", fName, " ) 2>nul && echo available || echo in use"),
        intern = TRUE) == "in use"
}
# Test availability
IsInUse("test.txt")
Original answer:
Interesting question! I did not find a way to check if a file is in use before trying to write to it. The solution below is far from elegant. It relies on tryCatch, and on reading and writing the file to check whether it is available (which can be quite slow depending on your file size).
# Function to check if the file is in use (relies on reading and writing, which is inefficient)
IsInUse <- function(fName) {
  rData <- read.csv(fName)
  tryCatch(
    {
      write.csv(rData, file = fName, row.names = FALSE)
      return(FALSE)
    },
    error = function(cond) {
      return(TRUE)
    }
  )
}
# Loop to check if the file is in use
while (IsInUse(fName)) {
  print("Still in use")
  Sys.sleep(0.1)
}
# Your action here
I also found the answer to this question useful for making sense of the tryCatch function: How to write trycatch in R.
I'd be interested to see if anyone else has a more elegant suggestion!
Interesting question, indeed! Curious about an elegant solution too...
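For what it's worth, a possibly more elegant route is the filelock package, which takes an OS-level lock on a separate lock file. This is only a sketch (the file names are placeholders, and it assumes the package is installed and that every reader/writer of the csv cooperates by taking the same lock):

library(filelock)

lck <- lock("data.csv.lock", timeout = 10000)  # wait up to 10 seconds for the lock
if (!is.null(lck)) {
  df <- read.csv("data.csv")
  # ... append the edits here ...
  write.csv(df, "data.csv", row.names = FALSE)
  unlock(lck)                                  # release so the next user can write
}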
Just a question... supposing I have the following R file, called script.R:
calculation1 <- function() {
  # multiple calculations go here
}

calculation2 <- function() {
  # multiple calculations go here
}
And I want to load this file from another file in order to use calculation1 and calculation2 multiple times in different cases.
Is that possible? I mean, something like the following.
calculation_script <- some way, function, or method that instantiates script.R into this variable
calculation_script.calculation1()
calculation_script.calculation2()
I need to do this because I need to open script.R file in a Tableau Calculated field.
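For what it's worth, a minimal sketch of one way to get that behaviour (assuming script.R is in the working directory) is to source the file into its own environment and call the functions through it:

# load script.R into a dedicated environment instead of the global one
calculation_script <- new.env()
source("script.R", local = calculation_script)

# call the functions defined in script.R
calculation_script$calculation1()
calculation_script$calculation2()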
I would like to use delayedAssign to load a series of data sets from a set of files only when the data is needed. But since these files will always be in the same directory (which may be moved around), instead of hard-coding the location of each file (which would be tedious to change later if the directory were moved), I would like to simply make a function that accepts the file path of the directory.
loadLayers <- function(filepath) {
  delayedAssign("dataset1", readRDS(file.path(filepath, "experiment1.rds")))
  delayedAssign("dataset2", readRDS(file.path(filepath, "experiment2.rds")))
  delayedAssign("dataset3", readRDS(file.path(filepath, "experiment3.rds")))
  return (list <- (setOne = dataset1, setTwo = dataset2, setThree = dataset3)
}
So instead of loading all the data sets at the start, I'd like to have each data set loaded only when needed (which speeds up the shiny app).
However, I'm having trouble doing this in a function. It works when delayedAssign is not inside a function, but when I put the calls in a function, all the objects in the returned list are simply NULL, and the "promise" to evaluate them when needed doesn't seem to be fulfilled.
What would be the correct way to achieve this?
Thanks.
Your example code doesn't work in R, but even conceptually, you're using delayedAssign and then immediately resolving it by referencing it in return(), so you end up loading everything anyway. To be clear, an assignment binds a symbol to a value in an environment. So for this to make any sense, your function must return the environment, not a list. Alternatively, you can simply use the global environment, and then the function doesn't need to return anything, as you use it for its side effect.
loadLayers <- function(filepath, where=.GlobalEnv) {
delayedAssign("dataset1", readRDS(file.path(filepath, "experiment1.rds")),
assign.env=where)
delayedAssign("dataset2", readRDS(file.path(filepath, "experiment2.rds")),
assign.env=where)
delayedAssign("dataset3", readRDS(file.path(filepath, "experiment3.rds")),
assign.env=where)
where
}
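A quick usage sketch (the directory path is a placeholder; nothing is read from disk until a dataset is first accessed, at which point the promise forces readRDS):

layers <- new.env()
loadLayers("path/to/data", where = layers)

# first access forces the promise and triggers readRDS()
head(layers$dataset1)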
I am not a programmer and have less than a month of experience with R but have been writing simple scripts that read from external CSV files for the past week. The function below, which reads data from a CSV file, was originally more complex but I repeatedly shortened it during troubleshooting until I was left with this:
newfunction <- function(input1, input2) {
  processingobject <- read.csv("processing-file.csv")
  print(head(processingobject))
}
I can print both the head and the entire processingobject within the script without a problem, but after the script ends processingobject no longer exists. It never appears in the RStudio global environment pane. Shouldn't processingobject still exist after the script terminates?
The script runs without displaying any error or warning messages. I tried assigning processingobject to a second variable: processingobject2 <- processingobject but the second variable doesn't exist after the script ends either. I also tried clearing the global environment and restarting RStudio, but that did not work either. If at the prompt after the script I type processingobject I get the message "Error: object 'processingobject' not found". The CSV file itself is perfectly normal as far as I can tell.
Obviously, I must be doing something very stupid. Please help me. Thanks.
You need to use return in your function. Also, reproducible examples are always a good idea; they make it easier for people to help you out.
ncol <- 10
nrow <- 100
x <- matrix(runif(nrow * ncol), nrow, ncol)
write.csv(x, file = "example_data.csv")

newfunction <- function(input1) {
  processingobject <- read.csv("example_data.csv")
  result <- apply(processingobject, 2, function(x) x * input1)  # doing something to the csv with the input
  print(head(result))
  return(result)
}

newcsv <- newfunction(3)