How do I connect to the data that's in my folder without writing the specific path, like "C:\Users\Dima\Desktop\NewData\..."? - r

I am writing a script that requires data stored in a folder on my computer.
But eventually this script will be used on another computer, by another person.
I can't tell them to change all the paths to the data in the script.
How do I connect to the data that's in my folder without writing the specific path,
like "C:\Users\Dima\Desktop\NewData\..."?

The best way of making your code shareable depends on your use case.
As Carl Witthoft pointed out, most code should be encapsulated in functions. These functions can then be packaged into packages and easily redistributed to other people's machines. Writing packages is easier than you think.
For one-off analyses, scripts are appropriate. How you make them user-independent depends on who your users are. If you are sharing the script with colleagues, try to keep your data on a network drive; then the link to the data will be the same for everyone. If you are sharing your script with the world, keep your data on the internet, and the link to the data will be a hyperlink, again the same for everyone.
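For example, reading data directly from a fixed hyperlink might look like this (the URL is a hypothetical placeholder):
my_data <- read.csv("https://example.com/shared/my_data_file.csv")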
If you are sharing your script with a few people who don't have access to a common drive, and you can't put your data on the internet, then some directory manipulation is acceptable.
Change your working directory to the root of where your project files are.
setwd("c:/Users/Dima/My Project")
Then you can reference the location of the data using relative paths.
data_file <- "Data/My data file.csv"
my_data <- read.csv(data_file)
Assuming that you keep the directory structure within your project the same, you only need to change the call to setwd on each machine.
Also note that the special location "~" refers to your user home directory. Try
normalizePath("~")
That way, if you keep your project in that location, you can avoid reference to "Dima" entirely.
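For example, a minimal sketch assuming the project is kept in a "My Project" folder under the home directory:
project_root <- path.expand("~/My Project")  # resolves under each user's own home directory
data_file <- file.path(project_root, "Data", "My data file.csv")
my_data <- read.csv(data_file)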

Related

Relative paths in R: how to avoid my computer being set on fire?

A while back I was reading an article about improving project workflow. The advice was not to use setwd or my computer would burn:
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
I started using the here package and it worked great until I started to schedule scripts using cronR. After asking this question my laptop was again threatened with arson:
If the first line of your #rstats script is wd <- here(), I will come
into your lab and SET YOUR COMPUTER ON FIRE.
Fearing for my laptop's safety I started using the method suggested in the answer to get relative file paths:
wd <- Sys.getenv("HOME")
wd <- file.path(wd, "projects", "my_proj")
This worked for me, but not for the people I was working with who didn't have the same projects directory. So now I'm confused. What is the safest / best way to get relative file paths so that a project can be portable?
There are quite a few options: 1, 2. My requirements are to source functions/scripts and read/write csv files. Perhaps the rprojroot package is the best bet?
Create an RStudio project and then reference all files with relative paths from the project's root folder. That way, all users will open the project and automatically have the correct working directory.
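For example (a sketch; the file names are hypothetical), once the project is open in RStudio, the working directory is the project root, so relative paths resolve the same way on every machine:
my_data <- read.csv("data/my_data.csv")  # a data/ subfolder of the project
source("R/helpers.R")                    # a script referenced from the project root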
There are many ways to organize code and data for use with R. Given that the "arsonist" described in the OP has rejected at least two approaches for locating the project files in an R script, the best next step is to ask the arsonist how s/he performs this function, and adjust your code and file structures accordingly.
UPDATE: Since the "arsonists" appear to be someone who writes on Tidyverse.org (see Tidyverse article in OP) and an answer on SO (see additional links in OP), your computer appears to be relatively safe.
If you are sharing code or executing it with batch processes where the "user" is someone other than you, a useful approach is to place the code, data, and configuration under version control, and develop a runbook to explain how others can retrieve the components and execute them on another computer.
As noted in the comments to the OP, there's nothing wrong with here::here() if its use can be made reliable through documentation in a runbook.
I structure all of my R code into Projects within RStudio, which are organized into a gitrepositories directory. All of the projects can be accessed as subdirectories from the gitrepositories directory. If I need to share a project, I make the project accessible to other users on GitHub.
In my R code I reference external files as subdirectories from the project root directory, such as ./data/gen01.csv.
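For illustration, a minimal sketch of that pattern using the here package, assuming the ./data/gen01.csv layout described above:
library(here)
# here() locates the project root (e.g., the directory containing the .Rproj file or .git folder)
gen01 <- read.csv(here("data", "gen01.csv"))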
There are two parts to this question:
how to load data from a relative path, and
how to load code from a relative path
For most use cases (including when invoking tools from a CRON job or similar) the location of the data should either be specified by the user (via command line arguments, standard input or environment variables) or should be relative to the current working directory (getwd() in R).
… Unless the data is a fixed part of the project itself — more on this below.
Loading code from a path that's relative to other code is simply not supported by base R. For example, source('xyz.r') won't source xyz.r relative to the script that calls it. It will always try to load it from the current working directory, whatever that happens to be, which is pretty much never what you want. And as you've noticed, the 'here' package also doesn't always work.
Base R really only supports loading code from packages. But packages aren't suitable for all types of projects, and R has no built-in solution for those other cases. I recommend using 'box' modules to solve this. 'box' provides a modern module system for R, which means that you can have R projects consisting of multiple code files (and nested sub-projects) without having to wrap them in packages. Loading code from a relative path inside a module is as simple as
box::use(./xyz)
This always works, as you’d expect from a modern module system, and doesn’t require ‘here’ or similar hacks.
OK, back to the point about data that’s bundled with a project itself. If your project is an R package, you’d use system.file() to load that data. However, this once again doesn’t work for non-package projects. But if you use ‘box’ modules to structure your project, you can use box::file() to load data that’s associated with a module.
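For instance, a sketch of a module that reads a data file shipped alongside its own source (the module layout and file names here are hypothetical):
# inside a hypothetical module file, e.g. ./analysis.r
box::use(utils[read.csv])
# box::file() resolves to the directory containing this module, not the working directory
gen01 <- read.csv(file.path(box::file(), "data", "gen01.csv"))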
Packages such as ‘here’ or ‘rprojroot’, while well-intended, are essentially hacks to work around limitations in R’s handling of non-package code. The proper solution is to make non-package code into a first-class citizen of the R world, and ‘box’ does that.
You can check the docs of the RSuite package (https://RSuite.io). It works with a script_path variable that points to the currently running R script; I use it to build relative paths with the file.path command.

Is it possible to define a cross-platform working directory for R?

I am teaching in-person R tutorials with a large number of undergraduate R novices. I am also trying to format my notes on RPubs so that they can easily be used by other people. Nothing derails things faster than people mis-specifying working directories or saving spreadsheet files somewhere other than their working directory.
Is it possible to define a working directory that is universal across platforms? For example, a line of code or a function like
setwd( someplace that is likely to exist on every computer)
This could involve a function that finds some place that usually exists on all computers, such as the desktop, downloads folder, or R directory.
In general your best bet is to go for the user's home area,
setwd("~")
path.expand("~")
Since you are teaching novices, a common problem is that students notice the R package directory ~/R/ and assume that they should put their scripts in this directory, thereby creating odd bugs. To avoid this, I would go for
dir.create("~/RCourse", FALSE)
setwd("~/RCourse")
If you use RStudio, you could get them to create an RStudio project.
In the past, I have come across situations where this doesn't work. For example, some people have their home area on a network drive but can't connect to the internet or get through a firewall.
You asked about
someplace that is likely to exist on every computer
and yes, R works hard to ensure this returns a valid directory: tempdir().
The main danger, though, is that this directory will vanish after the session (unless you override the default behaviour of removing the per-session temporary directory at the end of the session). Until then, it works.
Still, this can be useful. I sometimes use it to write temporary files that I don't want cluttering the current directory, or ~.
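For example (the file name is arbitrary):
tmp_csv <- file.path(tempdir(), "scratch.csv")
write.csv(head(iris), tmp_csv, row.names = FALSE)
read.csv(tmp_csv)  # available for the rest of the session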
Otherwise @csgillespie gave you a good answer pertaining to $HOME, aka ~.

Accessing Excel file from Sharepoint with R

I am trying to write an R script that will access an Excel file stored on my company's SharePoint page so that I can make a few calculations and plot the results. I've tried various ways to do this (download.file, RCurl's getURL(), gdata), but I can't seem to figure out how. The URL is HTTPS, and a username and password should be required. I've gotten the closest with this code:
require(RCurl)
URL <- "https://companyname.sharepoint.com/sites/folder/_layouts/15/WopiFrame.aspx?sourcedoc={2DCC2ED7-1C13-4910-AFAD-4A9ACFF1C797}&file=myfile.xlsx&action=default"
f <- getURL(URL, verbose = TRUE, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE, userpwd = "mylogin:mypw")
This seems to connect (although the username and password don't seem to matter) and returns
> f
[1] "<html><head><title>Object moved</title></head><body>\r\n<h2>Object moved to here.</h2>\r\n</body></html>\r\n"`
However, I'm not sure what to do at this point, or even if I'm on the right track. Any help will be greatly appreciated.
I use
library(readxl)
read_excel('//companySharepointSite/project/.../ExcelFilename.xlsx', 'Sheet1', skip=1)
Note, no https:, and sometimes I have to open the file first (i.e., cut and paste //companySharepointSite/project/.../ExcelFilename.xlsx into my browser's address bar)
I found that other answers did not work for me, perhaps because I am on a Mac, which obviously does not play as well with Microsoft products such as Sharepoint.
I ended up having to split it into two pieces: first download the Excel file to disk, then read that Excel file separately.
library(httr)
library(readxl)
# the URL of your sharepoint file
file_url <- "https://yoursharepointsite/Documents/yourfile.xlsx"
# save the excel file to disk
GET(file_url,
    authenticate(active_directory_username, active_directory_password, "ntlm"),
    write_disk("tempfile.xlsx", overwrite = TRUE))
# save to dataframe
df <- read_excel("tempfile.xlsx")
df
# remove excel file from disk
file.remove("tempfile.xlsx")
This gets the job done, though I'd be interested if anyone knows how to avoid the interim step of writing to disk.
N.B. Depending on your specific machine/network/Sharepoint configuration, you may also be able to just use authenticate(":",":","ntlm") per this answer.
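In that case the call becomes (a sketch that only works on some domain-joined Windows setups):
GET(file_url,
    authenticate(":", ":", "ntlm"),
    write_disk("tempfile.xlsx", overwrite = TRUE))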
I was unable to accomplish this using the hints from the answers above (I tried many approaches found on this site). However, just to highlight the response by @RyanBradley above, and especially the response by @ZS27:
I instead had to use the OneDrive Desktop client (Windows) to allow me to sync the folder to my computer. Newer versions of SharePoint (like that found in MS Teams) have a sync button or feature in the document libraries / folders that interfaces with OneDrive.
This is the functional equivalent of mounting the folder as a network drive, so R interfaces with the file as if it was a part of your file system. Works for me.
You may need to map a network drive to the SharePoint library so that you can connect to it directly. Or if you don't want to map a network drive you could also place a shortcut to the folder in your startup folder.
Example file path:
\\company_sharepoint_site\ssp\site_name\sub_site_name\library_name
Example start up folder location (Windows 10):
C:\Users\USER_NAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
Note that the direction of the slashes ("\" rather than "/") is important, so that your file path is interpreted as a file location rather than a browser location. By placing such a path in a mapped network drive or as a shortcut in your startup folder, your PC should connect to it when it boots.
# Load or install readxl
if (require(readxl) == FALSE) {
  install.packages("readxl")
  if (require(readxl) == FALSE) stop("Unable to install and load readxl")
}
# Define path to data
data_path <- "\\\\company_sharepoint_site\\ssp\\site_name\\sub_site_name\\library_name\\Example.xlsx"
# Pull data
df_employees <- read_xlsx(data_path)
I had a situation exactly like yours: I wanted to access an Excel file on a SharePoint site using R.
I also searched the internet and didn't find anything relevant to my requirement.
Then I tried the following:
I mapped the SharePoint folder as a network drive in my local system.
That let me access the Excel file (on the SharePoint site) from my machine without opening a web browser.
I then copied the network path shown on my system (it is the same as your SharePoint site's address, except that it has no https/http
and starts with "\\", like the following: "\\sharepoint.test.com\folder\path").
Launch RStudio and select the Import Dataset option under the Environment section.
Choose 'From Excel'. The 'Import Excel Data' form will open.
In the File/URL field, paste the network path of the SharePoint folder (copied from your machine).
Click Import, and the Excel file on SharePoint will be imported into R successfully.
Make sure the path does not contain HTML escapes (like %20 and so on), and that backslashes are used as separators in the URL.
While importing the file, enter the folder names exactly as you see them.
For example:
Sharepoint.microsoft.com - SharePoint domain
Department name - name of the folder
Project name - name of the folder
Sample.xlsx - name of the file
So your URL to import the dataset should be:
"\\Sharepoint.microsoft.com\Department name\Project name\Sample.xlsx"
Thank you!
Try using the link in this format:
http://site/_layouts/download.aspx?SourceUrl=url-of-document-in-library
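If that link format works for your site, the file can be fetched directly from R, e.g. (the SourceUrl value is a placeholder):
download.file("http://site/_layouts/download.aspx?SourceUrl=url-of-document-in-library",
              destfile = "myfile.xlsx", mode = "wb")  # "wb" keeps the binary .xlsx intact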
If the above doesn't work, try this syntax (note the slash directions):
"\\gov.sharepoint.com#SSL/DavWWWRoot/sites/SomePath/SomePath/SomePath/SomeFile"
See this for more info about syntax and what's going on:
Connect to a site via SSL/DavWWWRoot not usual URL? Why does this make a difference?

Perforce directory structure

In Perforce, I notice that my workspace is linked to a specific directory (location) on my local hard drive. Is it possible to change the location of this mapping for each file? For example, I have two scripts in two completely different local directories:
C:/File1.pl & D:/File2.pl
I want to map these two scripts under the same folder in Perforce. Is this possible?
The root directory must be the same for all files in a single workspace.
However, you can define multiple workspaces, one which resides on your C:\ disk and one which resides on your D:\ disk.
Generally a single workspace is used for a single project, and all files for a single project are located together in a single area of your workstation. I'm having trouble thinking of a scenario in which you'd want files to be part of a single project, yet stored in various places scattered around your workstation. Can you explain your scenario further?
There are techniques (the SUBST command, using Windows Junction Points, etc.) which can be used to create aliases for files on a different disk, but given what you've described, using multiple workspaces seems like the clearest approach to me.

Using R to save images & .csv's; can I use R to upload them to a website (I use FileZilla to do it manually)?

First I should say that a lot of this is over my head, so I apologize in advance for using incorrect terminology and potentially asking an unclear question. I'm doing my best.
Also, I saw this post; is RCurl the tool I want to use for this task?
Every day for 4 months I'll be analyzing new data and generating .csv files and .png files that need to be uploaded to a website so that other team members can check them. I've (nearly) automated all of the data collection, downloading, analysis, and file saving. The analysis is carried out in R, and R saves the files. Currently I use FileZilla to manually upload the new files to the website. Is there a way to use R to upload the files to the website, so that I don't have to open FileZilla and drag and drop the files?
It'd be nice to run my R-code and walk away, knowing that once it finishes running, the newly saved files will be automatically be put on the website.
Thanks for any help!
You didn't specify which protocol you use to upload your files with FileZilla. I assume it is FTP. If so, you can use the ftpUpload function from RCurl:
library(RCurl)
ftpUpload("yourfile", "ftp://ftp.yourserver.foo/yourfile",
userpwd="username:passwd")
RCurl also has methods for scp, and ftpUpload should also support sftp.
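An sftp upload might then look like this (an untested sketch; it requires libcurl built with SFTP support):
ftpUpload("yourfile", "sftp://ftp.yourserver.foo/yourfile",
          userpwd = "username:passwd")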
