Importing a file only if it has been modified since the last import, then saving to a new object - R

I am trying to create a script that I run about once a week. The goal is that it will go out and check an MS Excel file that a co-worker manages. It then tests whether the date the file was modified is newer than the last time it was imported. If it is newer, it imports the file (I am using the readxl package - WONDERFUL!) into a new object whose name includes the date the original Excel file was last modified. I have everything working except for the assignment of the imported data.frame to a new object that includes the date.
An example of the code I am using is:
First I create an object with a path pointing to the file of interest.
pfdFilePath <- file.path("H:", "3700", "3780", "002-00",
                         "3. Project Information", "Program", "RAH program.xls")
After testing to verify the file has been modified, I have tried a simple assignment ("test" is just an example for simplification):
paste("df-", as.Date(file.info(pfdFilePath)$mtime), sep = "") <- "test"
But that code produces an error:
Error in paste("df-", as.Date(file.info(pfdFilePath)$mtime), sep = "") <- "test" :
target of assignment expands to non-language object
I then try the assign function:
assign(paste("df-", as.Date(file.info(pfdFilePath)$mtime), sep = ""), "test")
Running this code creates an object that looks to be okay, but when I evaluate it, or try using str() or class() I get the following error:
Error in df - df-2016-08-09 :
non-numeric argument to binary operator
I am pretty sure this error has to do with the environment in which I am using assign, but being relatively new to R, I cannot figure it out. I understand that the assign function seems to be frowned upon, but those warnings seem to be centered on for-loops vs. lapply functions. I am not really iterating within a function, though - just creating a dynamically named object whenever I run the script. I can't come up with a better way to do it. If there is another way to do this that doesn't require the assign function, or a better way to use the assign function, I would love to know it.
Thank you in advance, and sorry if this is a duplicate. I have spent the entire evening digging and can't derive what I need.

Abdou provided the key.
assign(paste0("df.", "pfd.", strftime(file.info(pfdFilePath)$mtime, "%Y%m%d")), "test01")
I also converted to the cleaner paste0 function and got rid of the dashes to avoid confusion. Lesson learned.
Works perfectly.
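For completeness, a minimal sketch of the full check-then-import workflow, assuming a lastImport date is stored between runs (lastImport and the use of readxl::read_excel here are illustrative, not from the original question):
library(readxl)
# Date the source workbook was last modified
fileModified <- as.Date(file.info(pfdFilePath)$mtime)
# 'lastImport' is assumed to be saved somewhere between runs (e.g. in an .rds file)
if (fileModified > lastImport) {
  newName <- paste0("df.pfd.", strftime(file.info(pfdFilePath)$mtime, "%Y%m%d"))
  assign(newName, read_excel(pfdFilePath))
  lastImport <- fileModified
}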

Related

How do I create a data frame from the inbuilt data set 'iris'?

I am a beginner at using RStudio and have been working through the exercises outlined as part of our course notes.
We are to work with the 'iris' dataset; however, I haven't been able to successfully save it as a valid data.frame. The best I have done is create an empty data frame in the global environment with 0 obs. of 0 variables.
Here is some of the code I have worked through and the outputs. I am very new to R and am struggling a little with loading and using inbuilt data sets - I am OK with importing and creating, however.
data()
> View(iris)
> iris<-write.csv(iris)
""
> iris
NULL
> str(iris)
NULL
> iris<-data.frame(iris)
> iris<-read.csv(iris.csv)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
object 'iris.csv' not found
> library(datasets)
> data.frame(iris)
data frame with 0 columns and 0 rows
I have tried > write.csv(iris, 'iris.csv') # no luck
First check if iris is already a data.frame by running the following command:
is.data.frame(iris)
If the answer is TRUE, then run the following command to write it to a .csv file:
write.csv(iris, "/location/at/which/you/want/to/save/the/file.csv")
If one wants to save objects as R objects, one has to use save() and the file extension has to be .RData. In your case, you can run the following command:
save(iris, file = '/location/iris.RData')
And you can load an .RData file with the load() function in R. In your case it could be:
load('/location/iris.RData')
Some mistakes that you've made:
In your second code line where you run
> iris<-write.csv(iris)
you've just provided write.csv with its first argument, x, but never specified the second argument it requires, which is file. Also, one should never assign the result of write.csv() to an object with <-, because write.csv() is a function that does not return a value or an object. Another example of a function like write.csv() is library().
So the way your code flows is: you used the wrong syntax by running the following line
> iris<-write.csv(iris)
and hence you got a NULL object. The str() of a NULL object is itself NULL.
Then you created a data.frame object by passing iris as the data object, but since iris had already become NULL, data.frame() of a NULL object is an empty data frame (0 columns and 0 rows). Since there never was an iris.csv file written, R won't be able to read it either.
Also, in your read.csv() call, you passed the file argument as a data object and not as a path. This is why you got the error object 'iris.csv' not found rather than cannot open file 'iris.csv': No such file or directory. To pass it as a path, you should always put the location of your file in quotes, either single or double.
If you ever don't understand how to pass objects to a function, run the command ?function_name, for example ?write.csv, ?library, ?read.csv. This will provide you with documentation on the function, including usage examples.
I hope this helps.
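For reference, a minimal working sequence, assuming you just want iris as a data frame plus a copy on disk (the iris_df name and the iris.csv path in the working directory are only illustrative):
library(datasets)
data(iris)                                          # make the built-in data set available
is.data.frame(iris)                                 # TRUE - it is already a data frame
iris_df <- iris                                     # copy into your own object
write.csv(iris_df, "iris.csv", row.names = FALSE)   # write it to the working directory
iris_from_file <- read.csv("iris.csv")              # read it back in
str(iris_from_file)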

Error in eval(expr, envir, enclos) : object 'score' not found

We have always been an SPSS shop but we're trying to learn R. Been doing some basics. I am trying to do a simple t-test, just to learn that code.
Here's my code, and what's happening:
Code screenshot
I don't get why it says "score" isn't found. It's in the table, and the read.csv code defaults to assuming the first row contains headers. I can't see why it's not "finding" the column for score. I'm sure it's something simple, but would appreciate any help.
You didn't store your imported csv file in a variable. It printed to the console because it had nowhere else to go - it just gets printed and then thrown away. You need to assign it for it to be saved in memory:
my_data_frame <- read.csv("ttest.csv")
Now your data exists in the variable my_data_frame and you can use it by supplying it as the data argument:
t.test(score ~ class, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE, paired = FALSE, data = my_data_frame)
Also, in general, I would recommend using read_csv from the package readr over the default read.csv - it's faster.
Finally, when you ask questions, please provide your code as text, not a picture. You should also provide your data or a toy dataset - you can use the function dput to print code that will create your data, or just provide the csv file or some code that creates toy data.
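As an aside, a self-contained sketch of the whole workflow with toy data (the column names score and class match the question; the values are made up purely for illustration):
# Toy data standing in for ttest.csv: a numeric score and a two-level class
my_data_frame <- data.frame(
  score = c(12, 15, 14, 10, 18, 20, 17, 19),
  class = rep(c("A", "B"), each = 4)
)
# Two-sample t-test of score by class
t.test(score ~ class, data = my_data_frame,
       alternative = "two.sided", var.equal = FALSE)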

rxDataStep in RevoScaleR package crashing

I am trying to create a new factor column on an .xdf data set with the rxDataStep function in RevoScaleR:
rxDataStep(nyc_lab1
, nyc_lab1
, transforms = list(RatecodeID_desc = factor(RatecodeID, levels=RatecodeID_Levels, labels=RatecodeID_Labels))
, overwrite=T
)
where nyc_lab1 is a pointer to a .xdf file. I know that the file is fine because I imported it into a data table and successfully created the new factor column.
However, I get the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERROR: The sample data set for the analysis has no variables.
What could be wrong?
First, RevoScaleR has some warts when it comes to replacing data. In particular, overwriting the input file with the output can sometimes cause rxDataStep to fail for unknown reasons.
Even if it works, you probably shouldn't do it anyway. If there is a mistake in your code, you risk destroying your data. Instead, write to a new file each time, and only delete the old file once you've verified you no longer need it.
Second, any object you reference that isn't part of the dataset itself has to be passed in via the transformObjects argument. See ?rxTransform. Basically, the rx* functions are meant to be portable to distributed computing contexts, where the R session that runs the code isn't necessarily the same as your local session. In this scenario, you can't assume that objects in your global environment will exist in the session where the code executes.
Try something like this:
nyc_lab2 <- RxXdfData("nyc_lab2.xdf")
nyc_lab2 <- rxDataStep(nyc_lab1, nyc_lab2,
transforms=list(
RatecodeID_desc=factor(RatecodeID, levels=.levs, labels=.labs)
),
transformObjects=list(
.levs=RatecodeID_Levels,
.labs=RatecodeID_Labels
)
)
Or, you could use dplyrXdf which will handle all this file management business for you:
nyc_lab2 <- nyc_lab1 %>% factorise(RatecodeID)

R: How far does it go? (Plus venting)

I have an object called defaultPacks, containing the names of packages installed on all the computers I use. Much abbreviated:
defaultPacks <- c(
"AER",
"plyr",
"dplyr"
)
I want to save this object to file in a shared directory all of them can reach. I am using Dropbox for this, with sync always paused when R is running.
save(defaultPacks,
file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
Then I want to load the object and install the packages the names of which are in the object defaultPacks.
SyncPacks <- function(fileString){
defaultPacks <- load(file=fileString)
install.packages(defaultPacks, repos="http://cran.us.r-project.org")
}
SyncPacks(file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
If I do this, I get a warning:
Warning in install.packages: package ‘defaultPacks’ is not available (for R version 3.2.1)
I look at what is in defaultPacks immediately after I load and assign it: the string "defaultPacks". So load seems to be returning just a string rather than an object.
So I go back to my save, and try
save(get(defaultPacks), file.path(etc.))
This gives me a different error:
Error in save(get("defaultPacks"), file = file.path("C:", "Users", "andrewH", :
object ‘get("defaultPacks")’ not found.
Then I tried dynGet() with the same result.
So where before it was treating a symbol as a string, now it is treating a function as a string.
So I try the list option for save:
save(list = defaultPacks, file = file.path(etc))
And get yet another error:
Error in save(list = defaultPacks, file = file.path("C:", "Users", "andrewH", :
objects ‘AER’, ‘plyr’, ‘dplyr’, (etc.) not found
So where before I couldn't get to my character vector, now I am shooting right past it, evaluating defaultPacks to find the strings, and then treating each string as a symbol, and evaluating it to its (nonexistent) object.
So, I want to know how to make this work. But I am asking for something more than that. I have this problem, or an analogous problem, all the time. After several years of using R, I still have it a couple of times a week. I don't know how many steps of evaluation R is going to take on any given occasion. I hand a function an object name, and the function treats it as a string. I hand a function a string, and the R function converts it to a symbol and tries to evaluate it. Here, I don't understand why the save function does not save the object I gave it, and then give it back with load.
I've read the discussions on scoping in ten different R books, from Chambers "Software for Data Analysis" to Wickham's "Advanced R." Twice. Four times in some cases. I know about the four environments of a function, and the difference between the call stack and the chain of environmental parents. And yet, it is clear that I am missing something basic. It is not just that I don't know why save does not take a name in its ... argument and save it as an object (unless the problem is at the load end). I don't know how I can know. The function description says, of the ...s, "the names of the objects to be saved (as symbols or character strings)." So why is it saving a name as a string? Or why is load returning a string, if save saved an object? And how could I predict that?
Experienced R programmers, I know you can tell in advance how a given R function is going to treat one of its arguments. You know how far it will be evaluated. You can make it go as far as you want it to, and then STOP. You don't have to write str()'s into your functions every time you want to figure out what the heck it thinks its arguments mean. How do you do it?
Bloody "R Inferno". It's an understatement.
One way of seeing the problem is to note that the value of defaultPacks changes from before to after these operations.
> fname = tempfile()
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> save(defaultPacks, file=fname)
> defaultPacks = load(fname)
> identical(orig, defaultPacks)
[1] FALSE
The problem starts with an understanding of what save() does. From ?save, the object that is saved is named defaultPacks and it has value c("AER", "plyr", "dplyr"). save() could save multiple objects, each with a name and associated value, so it somehow has to save the name of each object.
load() restores the objects that save() has written, and returns (from ?load) a "character vector of the names of objects created". In this case load() restores (creates in the global environment) the symbol defaultPacks, populates it with the character vector of default packages, and returns the name (i.e., character vector of length 1 "defaultPacks") of the object(s) it has restored. The return value then overwrites the restored value, and we have defaultPacks = "defaultPacks".
install.packages doesn't do anything fancy with its first argument, which from ?install.packages is a "character vector of the names of packages whose current versions should be downloaded". The character vector happens to be the symbol defaultPacks, but the error comes from the value of the symbol, which is the character vector "defaultPacks".
save() and load() more or less have to work the way they do to support multiple objects. On the other hand saveRDS() and readRDS() (ok, why read instead of load?) have a contract to save a single object. The name of the saved object does not need to be stored to be able to recover the values associated with it. So saveRDS(defaultPacks, fname); defaultPacks = readRDS(fname) works, and in particular the value of defaultPacks before and after this series of operations remains unchanged.
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> saveRDS(defaultPacks, fname)
> defaultPacks = readRDS(fname)
> identical(orig, defaultPacks)
[1] TRUE
Without meaning to be too much of a jerk, the answer to the question "Experienced R programmers...how do you do it?" is implied by the ? above -- by carefully reading the manual. Also, there are not that many places in base R where evaluation is non-standard -- formulas and library() are the main culprits -- so recognizing what the problem is not can help to focus on what is actually going on.
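To tie this back to the original SyncPacks function, a minimal sketch using saveRDS/readRDS instead of save/load (the path is the one from the question, but sharedSettings.rds is an assumed new file name):
settingsFile <- file.path("C:", "Users", "andrewH", "Dropbox", "R_PROJ", "sharedSettings.rds")
# Save the character vector itself; no object name is stored
saveRDS(defaultPacks, settingsFile)
SyncPacks <- function(fileString){
  packs <- readRDS(fileString)   # gets the character vector back directly
  install.packages(packs, repos = "http://cran.us.r-project.org")
}
SyncPacks(settingsFile)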

getting the name of a dataframe from loading a .rda file in R

I am trying to load an .rda file in R which was a saved data frame. I do not remember the name of it, though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restarted R with load("al.rda") and I now get the following error
Error: C stack usage is too close to the limit
Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something with the same name in search path position 1, in which case R will probably have warned you with a message about one object masking another.
However, I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If it's under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.
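If you want to see this for yourself, a quick sketch (the file names are just examples) comparing an empty RData file with one that actually contains an object:
save(file = "empty.RData")              # saves no objects (R may warn that nothing was specified)
file.info("empty.RData")$size           # only a few dozen bytes
x <- 1:10
save(x, file = "with_object.RData")
file.info("with_object.RData")$size     # noticeably larger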
I had to reinstall R... somehow my installation was corrupt. The simple command I expected to work,
load("al.rda")
finally worked.
I had a similar issue, and it was solved without reinstalling R. For example, doing
load("al.rda") works fine; however, if you do
a <- load("al.rda") it will not work.
The load function does return the list of variables that it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("a1.rda") # Prints the file size etc...
readBin("a1.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?
I usually use save to save only a single object, and I then use the following utility function to retrieve that object into a given variable name using load, but into a temporary environment to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
  # load into a temporary environment so nothing in the caller is overwritten
  e <- new.env(parent = parent.frame())
  load(fname, e)
  # return the first object that was loaded into that environment
  return(e[[ls(e)[1]]])
}
The function can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.
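A quick usage sketch (foo.rda and the object d are just example names, mirroring the earlier example):
d <- data.frame(a = 11:13, b = letters[1:3])
save(d, file = "foo.rda")
my_df <- load_first_object("foo.rda")   # retrieves d without needing to know its name
str(my_df)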
