Only download a file if it's missing - R

I'm looking to make my code available for others to run, and they need the correct CSV files to run it.
Once they have git cloned my repo, they need to get the data,
so I currently have:
u <- "https://someURL/data/RegularSeasonCompactResults.csv"
download.file(u, "RegularSeasonCompactResults.csv")
data <- read.csv("RegularSeasonCompactResults.csv")  # read.csv, since read.table's defaults won't parse a comma-separated file
However, if the user runs this a second time, it will re-download the file, even though that is not necessary.
This seems like it could be a recurring problem for people, so I'm wondering if there is a built-in solution to this?

Wrap it with if(!file.exists("RegularSeasonCompactResults.csv")){ ... }
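For reference, a minimal sketch of that pattern applied to the question's snippet (same URL and file name as above):

u <- "https://someURL/data/RegularSeasonCompactResults.csv"
f <- "RegularSeasonCompactResults.csv"
if (!file.exists(f)) {  # skip the download if the file is already on disk
  download.file(u, f)
}
data <- read.csv(f)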


write_csv() produces blank cells in MS Excel

This isn't a major issue, but I still thought I would ask.
I've been cleaning some data for a project at work, and there's a point in the process where I save all of the individual files I've cleaned as a CSV in long format. I noticed with some of the files that, if I open them, some cells that SHOULD have data appear blank. If I use the "Clear All Formats" option, the data appears. It reads into R just fine and hasn't caused any issues, but I still think it's weird.
Has anyone else run into this, and if so, was there a way to resolve it without going through each column? The files I'm cleaning start out with all sorts of formatting, so I'm curious if that could be the cause. I thought a CSV doesn't save formats, though, so I'm a little confused.
Again, not the biggest deal but slightly annoying and I'll get questions about it if my colleagues ever take a look at these files.
The data is proprietary, and I'm not exactly sure how I would share it, but I'm using a pretty straightforward write_csv(data, "path.csv").
I think I figured out the solution to this issue, and I wanted to share in case anyone else runs into this.
I'm using a Windows computer, which needed an update. That got me thinking, and I realized I needed to update my version of RStudio as well. I'm not sure what would have caused this issue, but when I re-run those files, it appears to be resolved.

Saving the workspace image in R

When closing RStudio at the end of an R session, I am asked via a dialog box: "Save workspace image to [working directory]?"
What does that mean? If I choose to save the workspace image, where is it saved? I always choose not to save the workspace image; are there any disadvantages to saving it?
I looked on Stack Overflow but did not find posts explaining what the question means; I only found one about how to disable the prompt (with no simple answer): How to disable "Save workspace image?" prompt in R?
What does that mean?
It means that R saves a list of objects in your global environment (i.e. where your normal work happens) into a file. When R next loads, this list is by default restored (at least partially — there are cases where it won’t work).
A consequence is that restarting R does not give you a clean slate. Instead, your workspace is cluttered with existing stuff, which is generally not what you want. People then resort to all kinds of hacks to try to clean their workspace. But none of these hacks are reliable, and none are necessary if you simply don’t save/restore your workspace.
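For illustration, the most common such hack looks like this (the comment notes why it falls short):

rm(list = ls())  # removes objects from the global environment, but does not
                 # detach packages, reset options(), or undo other session state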
If I choose to save the workspace image, where is it saved?
R creates a (hidden) file called .RData in your current working directory.
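For the curious, the same mechanism is exposed as plain base R functions, so you can trigger it manually:

save.image()    # writes the global environment to a .RData file in the working directory
load(".RData")  # restores those objects into the global environment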
I always choose not to save the workspace image; are there any disadvantages to saving it?
The advantage is that, under some circumstances, you avoid recomputing results when you continue your work later. However, there are other, better ways of achieving this. On the flip side, starting R without a clean slate has many disadvantages: Any new analysis you now start won’t be in a clean room, and it won’t be reproducible when executed again.
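One of those better ways, sketched here with some_long_computation() as a hypothetical stand-in for your expensive step, is to persist individual results rather than the whole workspace:

result <- some_long_computation()  # hypothetical expensive step
saveRDS(result, "result.rds")      # save just this one object

# later, in a clean session:
result <- readRDS("result.rds")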
So you are doing the right thing by not saving the workspace! It's one of the rules of creating reproducible R code. For more information, I recommend Jenny Bryan's article on using R with a project-oriented workflow.
But having to manually reject saving the workspace every time is annoying and error-prone. You can disable the dialog box in the RStudio options.
The workspace will include any of your saved objects, e.g. data frames, matrices, functions, etc.
Saving it into your working directory will allow you to load it back in the next time you open RStudio, so you can continue exactly where you left off. There is no real disadvantage if you can recreate everything from your script next time and your script doesn't take long to run.
The only thing I have to add here is that you should seriously consider that some people may be working on ongoing projects, i.e. things that aren't accomplished in one day, and thus must save their workspace image so as to not start from the beginning again.
I think best practice is: it's OK to save your workspace, but your code only really works if you can clear your entire workspace and then rerun it completely with no errors!

Version control with RStudio: how to compare and modify files from multiple users

I am new to the use of version control in RStudio. I would like to know the optimal way to do the following:
I have an R project, and I can give one version of the project to a coworker.
Then he will modify some files (and so will I).
When he gives the folder back to me, is there a simple way to compare two files and decide which lines go into the final version?
The usual workflow for comparing files with RStudio's built-in tools is through version control systems such as Git or Subversion.
Your process is more "I give you the contents of the project folder, change whatever you want and give it back to me", and then you want to know what has changed between your copy and his.
RStudio won't help you in this case. You'd be better off using a merge tool such as Meld or WinMerge, which will spot the differences between files and allow you to copy changes in either direction (from you to him or from him to you).
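If you just want a quick programmatic comparison from within R rather than a dedicated merge tool, one option (an assumption on my part: this uses the waldo package, and the file names are hypothetical) is:

library(waldo)
compare(
  readLines("analysis_mine.R"),      # your copy
  readLines("analysis_coworker.R")   # your coworker's copy
)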

Impossible to remove data

I noticed that even after clearing the environment, clearing the workspace, and uninstalling R, I still can't get rid of old variables: they still show up.
Here is how I launch my database:
# rm(list = ls(all = TRUE))                       # (commented out) clear the global environment
stim <- read.table(file.choose(), header = TRUE)  # pick the file interactively
attach(stim)                                      # put stim's columns on the search path
names(stim)
summary(stim)
str(stim$emotionT2)
names(stim)
I tried removing the attach(stim) line, but then nothing that refers to the newly imported dataset works.
How can I completely clear all data to make sure that I am really testing the newly imported one?
Delete any .RData files in your working directory (or your home directory, if you aren't sure). If you want to be careful, just move or rename them rather than deleting them.
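A minimal sketch of that advice from within R itself, run from the directory in question (.RData.bak is just a hypothetical backup name):

file.exists(".RData")                # is there a saved workspace here?
file.rename(".RData", ".RData.bak")  # the careful option: rename rather than delete
# file.remove(".RData")              # once you are sure you don't need it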

Maintaining the same piece of code in multiple files

I have an unusual environment in a project where I have many files that are each independent standalone scripts. All of the code required by the script must be in the one file and I can't reference outside files with includes etc.
There is a common function in all of these files that does authorization; it is the last function in each file. If this function changes at all (as it does now and then), it has to be changed in all the files, and there are plenty of them.
Initially I was thinking of keeping the authorization function in a separate file and running a batch process that produced the final files by combining the auth file with each of the others. However, this is extremely cumbersome when debugging, because the auth function needs to be in the main file for that purpose. So I'd always be testing and debugging in the folder with the combined files and then have to copy changes back to the uncombined files.
Can anyone think of a way to solve this problem, i.e. to maintain an identical fragment of code in multiple files?
I'm not sure what you mean by "the auth function needs to be in the main file for this purpose", but a typical Unix solution might be to use make(1) and cpp(1) here.
Not sure what environment/editor you're using, but one thing you can do is use pre-build events. Create a start tag/end tag that defines the import region, and then in the pre-build event copy the common code between the tags and compile...
//$start-tag-common-auth
..... code here .....
//$end-tag-common-auth
In your pre-build event, just find those tags, replace what is between them with the imported code, and then finish compiling.
VS supports pre- and post-build events, which can call external processes (like batch files or scripts) but do not directly interact with the environment.
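For what it's worth, here is a hedged R sketch of that tag-replacement step; splice_auth, the scripts/ folder, and common/auth.txt are hypothetical names, and the tags are the ones shown above:

splice_auth <- function(script_path, auth_path,
                        start_tag = "//$start-tag-common-auth",
                        end_tag   = "//$end-tag-common-auth") {
  lines <- readLines(script_path)
  auth  <- readLines(auth_path)
  start <- which(lines == start_tag)  # locate the marker lines
  end   <- which(lines == end_tag)
  stopifnot(length(start) == 1, length(end) == 1, start < end)
  # keep everything outside the markers, replace everything between them
  writeLines(c(lines[1:start], auth, lines[end:length(lines)]), script_path)
}

# run over every standalone script as the pre-build step:
for (f in list.files("scripts", full.names = TRUE)) {
  splice_auth(f, "common/auth.txt")
}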
Instead of keeping the authentication code in a separate file, designate one of your existing scripts as the primary or master script. Use this one to edit/debug/work on the authentication code. Then add a build/batch process like you are talking about that copies the authentication code from the master script into all of the other scripts.
That way you can still debug and work with the master script at any time, you don't have to worry about one more file, and your build/deploy process keeps everything in sync.
You can use a technique like @Priyank Bolia suggested to make it easy to find/replace the required bit of code.
An ugly way I can think of:
Have the original code in all the files, and surround it with markers like:
///To be replaced automatically by the build process to the latest code
String str = "my code copy that can be old";
///Marker end.
This code block can be replaced automatically by the build process, from one common code file.
