How to transfer my files to R Projects, and then to GitHub? - r

I have 3 R scripts:
data1.r
data2.r
graph1.r
The two data scripts run some math and generate two separate data files, which I save in my working directory. I then load these two files in graph1.r and use them to plot the data.
How can I organise and create an R project which contains:
the two data scripts, data1.r and data2.r
another script which uses their output (graph1.r)
the output of graph1.r
I would then like to share all of this on GitHub (I know how to do this part).
Edit -
Here is the data1 script
df1 <- data.frame(x = seq(1,100,1), y=rnorm(100))
save(df1, file = "data1.Rda")
Here is the data2 script
df2 <- data.frame(x = seq(1,100,1), y=rnorm(100))
save(df2, file = "data2.Rda")
Here is the graph1 script
load(file = "data1.Rda")
load(file = "data2.Rda")
library(ggplot2)
ggplot() + geom_point(data = df1, aes(x = x, y = y)) + geom_point(data = df2, aes(x = x, y = y))
Question worded differently -
How would the above need to be executed inside a project?
I have looked at the following tutorials -
https://r4ds.had.co.nz/workflow-projects.html
https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner's-guide/
https://swcarpentry.github.io/r-novice-gapminder/02-project-intro/
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
https://chrisvoncsefalvay.com/2018/08/09/structuring-r-projects/
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

I have broken my answer into three parts:
The question in your title
The reworded question in your text
What I, based on your comments, believe you are actually asking
How to transfer my files to R Projects, and then to GitHub?
From RStudio, just create a new project and move your files to this folder. You can then initialize this folder with git using git init.
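If you prefer to stay inside R for the Git/GitHub step, a minimal sketch using the usethis package (assuming it is installed) does the same thing:
# run from the console of the newly created project
usethis::use_git()     # initialise the project folder as a git repository and make a first commit
usethis::use_github()  # create a matching GitHub repository and push to it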
How would [my included code] need to be executed inside a project?
You don't need to change anything in your example code. If you just place your files in a project folder they will run just fine.
An R project mainly takes care of the following for you:
Working directory (it's always set to the project folder)
File paths (all paths are relative to the project root folder)
Settings (you can set project specific settings)
Further, many external packages are meant to work with projects, making many tasks easier for you. A project is also a very good starting point for sharing your code with Git.
What would be a good workflow for working with multiple scripts in an R project?
One common way of organizing multiple scripts is to make a new script calling the other scripts in order. Typically, I number the scripts so it's easy to see the order to call them. For example, here I would create 00_main.R and include the code:
source("01_data.R")
source("02_data.R")
source("03_graph.R")
Note that I've renamed your scripts to make the order clear.
In your code, you do not need to save the data to pass it between the scripts. The above code would run just fine if you deleted the save() and load() parts: the objects created by each script remain in your global environment, ready for the next script to use.
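For example, the three scripts above could shrink to the following (the same code, minus the save()/load() calls):
# 01_data.R
df1 <- data.frame(x = seq(1, 100, 1), y = rnorm(100))

# 02_data.R
df2 <- data.frame(x = seq(1, 100, 1), y = rnorm(100))

# 03_graph.R
library(ggplot2)
ggplot() +
  geom_point(data = df1, aes(x = x, y = y)) +
  geom_point(data = df2, aes(x = x, y = y))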
If you do need to save your data, I would save it to a folder named data/. The output from your plot I would probably save to outputs/ or plots/.
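A small sketch of that layout, reusing the example code above (the folder names follow the suggestion; ggsave() is one way to write the plot to disk):
dir.create("data", showWarnings = FALSE)
dir.create("plots", showWarnings = FALSE)

save(df1, file = "data/data1.Rda")
save(df2, file = "data/data2.Rda")

library(ggplot2)
p <- ggplot() +
  geom_point(data = df1, aes(x = x, y = y)) +
  geom_point(data = df2, aes(x = x, y = y))
ggsave("plots/graph1.png", p)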
When you get used to working with R, the next step to organize your code is probably to create a package instead of using only a project. You can find all the information you need in this book.

Related

How to read external data from a different server with a golem shiny app?

I am trying to add a txt file that contains, let's say, 50 projects, with paths outside of the package. I am attempting to use these files to build a Shiny app using the golem framework.
My problem is that, as much as I read about golem Shiny apps, I do not understand where to add these txt files so that I can then use them in my Shiny application. NOTE: I want to work with the golem framework, and therefore the answer should be aligned with this request.
This is the txt file (inside it I have 50 projects, with the paths and links that will be used to retrieve the data for the app):
nameproj technology pathwork LinkPublic Access
L3_baseline pooled /projects/gb/gb_screening/analyses_gb/L3_baseline/ kkwf800, kkwf900, etc..
Then I create paths to the data like this:
path_to_data1 = "data/data1.txt"
path_to_data2 = "data/data2.txt"
Then, I create helper functions. These helper functions will be used in the app_server and app_ui modules. Something like the below.
make_path <- function(pathwork, type, ex, subfolder = "") {
  # build the path to a project's sub-folder on the server
  path <- paste0(pathwork, "/proj", type, "/", ex, "/", subfolder, "/")
  return(path)
}

getfiles <- function(screennames, types, pathwork) {
  # collect the existing File.tsv files for all screens
  files <- data.frame()
  for (ind in seq_along(screennames)) {
    hitfile <- file.path(make_path(pathwork, types[ind], screennames[ind], "analysis"), "File.tsv")
    if (file.exists(hitfile)) {
      files <- rbind(files,
                     data.frame(filename = hitfile,
                                screen = paste0(screennames[ind], "-", types[ind])))
    }
  }
  return(files)
}
Can someone direct me to:
how to actually add the txt files containing paths to external data and projects within the golem framework
a clear example of where these files are added within a golem app
NOTE: My datasets are all on private servers within my company, so all of these paths point to those servers, and I have no issues accessing the datasets.
I have solved the issue by simply adding a source file containing only the paths above and running the app. It seems to be working.
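For reference, a more package-like sketch (not the asker's exact solution): since a golem app is an R package, a file placed in inst/extdata/ is installed with the package and can be located at run time with system.file(); recent golem templates also generate an app_sys() helper that wraps the same call. The package name myGolemApp and the file name projects.txt are hypothetical.
projects_file <- system.file("extdata", "projects.txt", package = "myGolemApp")
projects <- read.table(projects_file, header = TRUE, stringsAsFactors = FALSE)
# projects$pathwork then holds the absolute server paths passed to make_path()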

How to combine multiple similar scripts into one in R?

I have 48 scripts used to clean data corresponding to 48 different tests. The cleaning protocols for each test used to be unique and test-specific, but the final project guideline now allows all tests to use the same cleaning protocol, provided they save all output files to the appropriate directory (each test's own folder of results). I'm trying to combine these into one master cleaning script that any team member can use to clean data as more is collected, or to make small changes, given they have the raw data files and a folder for each test (which I would give to them).
Currently I have tried two approaches:
The first is to include all necessary libraries in the body of a master cleaning script, then source() each individual cleaning script. Inside each script, the libraries are require()ed, the appropriate files are read in, and the output files are saved to their correct destination. This method seems to work best, but if the whole script is run, some subtests are successfully cleaned and saved to their correct locations while the rest need to be saved individually -- I'm not sure why.
library(readr)
library(dplyr)
library(data.table)
library(lubridate)
source("~/SF_Cleaning_Protocol.R")
etc.
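A sketch of that master script written as a loop; the second script name is a hypothetical placeholder, and source(..., local = new.env()) is an optional tweak that keeps each script's temporary objects out of the global environment (drop it if your scripts rely on sharing objects):
library(readr)
library(dplyr)
library(data.table)
library(lubridate)

cleaning_scripts <- c("~/SF_Cleaning_Protocol.R",
                      "~/EV_Cleaning_Protocol.R")  # ...one entry per test

for (script in cleaning_scripts) {
  message("Running ", script)
  source(script, local = new.env())
}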
The second is to save the body of the general cleaning script as a function, and then call that function in a series of if statements based on the test one wants to clean.
For example:
if (testname == "SF"){
  setwd("~/SF")
  #read in the csv file
  subtest  <- read_csv()
  path_map <- read_csv()
  SpecIDs  <- read_csv()
  CleaningProtocol(subtest, path_map, SpecIDs)
  write.csv("output1.csv")
  write.csv("output2.csv")
  write.csv("output3.csv")
  write.csv("output4.csv")
} else if (testname == "EV"){
  # etc.
}
The code reads in and prints out files fine if a test is selected individually, but when testname is specified and the script is run as a whole, it ignores the if statements, runs all tests, and fails to print results for any of them.
Is there a better option I haven't tried, or can anyone help me diagnose my issues?
Many thanks.
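For what it's worth, a minimal sketch of the second approach with the per-test repetition factored into one parameterised function; the csv file names, the single output file, and the base directory are hypothetical placeholders, and CleaningProtocol() is assumed to return the cleaned data:
clean_one_test <- function(testname, base_dir = "~") {
  test_dir <- file.path(base_dir, testname)
  # read the raw inputs for this test (file names are placeholders)
  subtest  <- readr::read_csv(file.path(test_dir, "subtest.csv"))
  path_map <- readr::read_csv(file.path(test_dir, "path_map.csv"))
  SpecIDs  <- readr::read_csv(file.path(test_dir, "SpecIDs.csv"))
  # clean and write back into the test's own folder
  cleaned <- CleaningProtocol(subtest, path_map, SpecIDs)
  write.csv(cleaned, file.path(test_dir, "output1.csv"), row.names = FALSE)
  invisible(cleaned)
}

clean_one_test("SF")  # or lapply over all 48 test names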

Sourcing R files housed within an R project and maintaining relative paths

Okay, so I like to use R projects in RStudio for the scripts and data I'm working with. However, let's say I want to source those scripts from another directory... R does not detect the .Rproj file unless the script is called from the directory where it is housed. Is there any way to source an R script that is part of an R project from another directory?
This is relevant as I have a system where I perform analyses and make figures in one directory, but then produce LaTeX documents that use those figures in another directory. I like to be able to source the R scripts that make the figures and save them to the directory where I'm writing in LaTeX.
Here's an MRE:
With an R project already created in a directory (done via RStudio)... let's call it ~/test.
Create some data:
a <- 1:10
dat <- data.frame(a = a, b = a + rnorm(length(a), 10, 2))
save(dat, file = "test.RData")
Place the following script in ~/test. Let's call it test.R.
load("test.RData")
pdf(file = "plot.pdf")
plot(b ~ a, data = dat)
dev.off()
Works great, right? But if we try the following from any other directory R can't figure it out.
cd ~
Rscript ~/test/test.R
Any thoughtful solutions? I suppose it's easy enough to just setwd() in the script that I'm sourcing the original script from, but this sort of defeats the whole purpose of using R projects.
You could use setwd("~/test/") at the beginning of the script and if necessary change it back later on.
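If you would rather not touch the working directory at all, one sketch (assuming the script is run via Rscript, as in the example) is to recover the script's own location from the --file= argument and load project files relative to it, while letting the output land wherever you invoked it from:
args <- commandArgs(trailingOnly = FALSE)
script_path <- sub("^--file=", "", grep("^--file=", args, value = TRUE))
script_dir <- dirname(normalizePath(script_path))

load(file.path(script_dir, "test.RData"))  # data stays relative to the project
pdf(file = "plot.pdf")                     # figure lands in the invoking directory
plot(b ~ a, data = dat)
dev.off()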

I want to know an automated way to load the data set when the files are moved to another computer?

Someone, please guide me. Suppose I choose the location of a data file using file.choose() and load the dataset after that. Also, suppose I have sent the script and data set to a friend of mine by e-mail. When my friend downloads the files and runs the R script, he has to choose the location of the file to run the script. I want to know an automated way to load the data set when the files are moved to another computer.
First, consider having a "project" directory where you have a directory for scripts and one for data. There's a 📦 called rprojroot that has filesystem helpers which will aid you in writing system independent code and will work well if you have a "project" directory. RStudio has a concept of projects & project directories which makes this even easier.
Second, consider using a public or private GitHub for this work (scripts & data). If the data is sensitive, make it a private repo and grant access as you need. If it's not, then it's even easier to share. You'll get data and code version control this way as well.
Third --- as a GitHub alternative --- consider using Keybase shared directories or git spaces. You can grant/remove access to specific individuals and they remain private and secure as well as easy to use.
These solutions will work on any computer without changing the script.
1) use current dir
If you assume the data and script are in the same directory then this will work on any computer provided the user first does a setwd("/my/dir") or starts R in that directory. One invokes the script using source("myscript.R") and the script reads the data using read.table("mydata.dat"). This approach is the simplest, particularly if the script is only going to be used once or a few times and then never used again.
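Spelled out with the example names used above, option (1) is just:
setwd("/my/dir")      # or start R in that directory instead
source("myscript.R")  # where myscript.R reads its data via read.table("mydata.dat")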
2) use R options
A slightly more general approach is to assume that the R option DATADIR (pick any name you like) contains that directory, or the current directory if not defined. In the script write:
datadir <- getOption("DATADIR", ".") # use DATADIR or . if DATADIR not defined
read.table(file.path(datadir, "mydata.dat"))
Then the user can define DATADIR in their R session or in their .Rprofile:
options(DATADIR = "/my/dir")
or not define it at all but setwd to that directory in their R session prior to running the script or start R in that directory.
This might be better than (1) if the script is going to be used over a long period of time and moved around without the data. If you put the options statement in your .Rprofile then it will help remind you where the data is if you don't use the script for a long time and lose track of its location.
3) include data in script
If the script always uses the same data and it is not too large you could include the data in the script. Use dput(DF) where DF is the data frame in order to get the R code corresponding to DF and then just paste that into your script. Here is such a sample script where we used the output of dput(BOD):
DF <- structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
19, 16, 15.6, 19.8)), .Names = c("Time", "demand"), row.names = c(NA,
-6L), class = "data.frame", reference = "A1.4, p. 270")
plot(demand ~ Time, DF)
Of course if you always use the same data you could create a package and include the script and the data.
4) config package
You could use the config package to define a configuration file for your script. That still begs the question of how to find the configuration file, but config can search the current directory and all ancestors (parent dir, grandparent dir, etc.) for the config file, so specification of its location may not be needed.
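A minimal sketch of option (4), assuming a config.yml in the working directory or one of its ancestors that defines a datadir entry:
# config.yml (assumed contents):
# default:
#   datadir: "/my/dir"
datadir <- config::get("datadir")
dat <- read.table(file.path(datadir, "mydata.dat"))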

Where to store .xls file for xlsReadWrite in R

I am relatively new to R and am having some trouble with how to access my data. I have a test.xls file in My Documents. How do I access it from R?
library(xlsReadWrite)
DF1 <- read.xls("test.xls") # read 1st sheet
Set the working directory with:
setwd("C:/Documents and Settings/yourname/My Documents")
This link may be useful: it's a nice tutorial on making a working folder for each project and placing all the relevant files in it. That is one approach.
http://www.dangoldstein.com/flash/Rtutorial2/Rtutorial2.html
Using setwd() is another approach. I use a combination of the two in my work.
