Read .rds file from specific git commit - r

(This is not a duplicate of Reading Rds file from git)
Is there a way of reading an .rds file (or any other file) in R from a specific git commit without having to check out the commit or create a temporary file (not from GitHub, but for example from a bare or non-bare repo saved locally or on a server)?
I tried the following, but not surprisingly it does not work (assuming you have a git repo with some commits and a file a.rds):
b <- readRDS(system("git show 9358:a.rds"))
> Error in readRDS(system("git show 9358:a.rds")) :
invalid 'description' argument

I had the same problem with a text file a few days ago. I solved it by saving the file (in the system call) before reading it. I guess in your case it would be something along the lines of:
# Target file output
output_dest <- "~/path/to/output.rds"
# Output stdout to file (in this case 3 commits before HEAD)
system(sprintf('git show HEAD~3:your_repo/file.rds > %s', output_dest))
readRDS(output_dest)
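If the goal is to avoid the temporary file altogether, one option that may work is to read git's output through a pipe connection. This is only a sketch and is not taken from the answer above: the helper name read_rds_from_commit and its arguments are hypothetical, it assumes git is on the PATH, and it assumes the file was written with saveRDS() (gzcon() decompresses the default gzip format and passes uncompressed input through unchanged).

```r
# Hypothetical helper: read an .rds file as it existed at a given revision.
# `path` is interpreted by git relative to the repository root.
read_rds_from_commit <- function(rev, path, repo = ".") {
  cmd <- sprintf(
    "git -C %s show %s",
    shQuote(repo),
    shQuote(paste0(rev, ":", path))
  )
  # pipe() streams git's stdout in binary mode; gzcon() handles the
  # gzip compression that saveRDS() applies by default.
  con <- gzcon(pipe(cmd, open = "rb"))
  on.exit(close(con))
  readRDS(con)
}

# Example call (revision and file name taken from the question):
# b <- read_rds_from_commit("9358", "a.rds")
```

The original attempt fails because system() returns the command's exit status (or, with intern = TRUE, text lines), neither of which readRDS() can use as a file or connection.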

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.
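A quick way to tell whether a download produced a real archive or an HTML page is to look at the first bytes of the file. The sketch below is an illustration, not part of the answer above; the helper name looks_like_zip is hypothetical, and it only checks for the local-file-header signature that non-empty zip archives start with.

```r
# Hypothetical helper: TRUE if the file starts with the zip
# local-file-header signature "PK\x03\x04".
looks_like_zip <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Example use after downloading:
# if (!looks_like_zip("Shape.zip")) {
#   stop("Shape.zip is not a zip archive; check the URL and mode = \"wb\"")
# }
```

A file that fails this check was most likely saved as a web page (missing ?raw=TRUE) or corrupted by a text-mode download (missing mode = "wb").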

Read open excel file in R

Is there a way to read an open Excel file into R?
When a file is open in Excel, Excel puts a lock on it, so that the reading method in R cannot access the file.
Can you circumvent this lock?
Thanks
Edit: this occurs under windows with original excel.
I do not have a problem opening xlsx files that are already open in Excel either, but if you do, I have a workaround that might work:
path_to_xlsx <- "C:/Some/Path/to/test.xlsx"
temp <- tempdir()
file.copy(path_to_xlsx, to = file.path(temp, "test.xlsx"))
df <- openxlsx::read.xlsx(file.path(temp, "test.xlsx"))
This copies the file (which should not be blocked) to a temporary directory and then loads it from there. Again, I'm not sure whether this is needed, as I do not have the problem you describe.
You could try something like this using the ps package. I've used it on Windows and Mac to read from files that I had downloaded from some web resource and opened in Excel with openxlsx2, but it should work with other packages or programs too.
# get the path to the open file via the ps package
library(ps)
p <- ps()
# get the pid for the current program, in my case Excel on Mac
ppid <- p$pid[grepl("Excel", p$name)]
# get the list of open files for that program
pfiles <- ps_open_files(ps_handle(ppid))
pfile <- pfiles[grepl("\\.xlsx$", pfiles$path), ]
# return the path to the file
sel <- grepl("^(.|[^~].*)\\.xlsx", basename(pfile$path))
path <- pfile$path[sel]
What do you mean by "the reading method in R", and by "cannot access the file" (i.e. what code are you using and what error message do you get exactly)? I'm successfully importing Excel files that are currently open, with something like:
dat <- readxl::read_excel("PATH/TO/FILE.xlsx")
If the file is being edited in Excel, R imports the last saved version.
EDIT: I've now tried it on both Linux and Windows and it still works, at least with version 1.3.1 of 'readxl'.

Set wd in RStudio

I am creating a series of R scripts that will be used by multiple people, meaning that the working directory of files used and stored will differ. There are two folders, one for the R code, called "rcode", and another to store the generated outputs, called "data". These two folders will always be shared in tandem. To accommodate the changing working directory I created a "global" script that has the following lines of code and resides in the "rcode" folder:
source_path = rstudioapi::getActiveDocumentContext()$path
setwd(dirname(source_path))
swd_data <- paste0("..\\data\\")
The first line gets the source path of the global script. The second line makes that script's folder the working directory. The third line defines a relative path to the "data" folder, which sits next to the "rcode" folder in the same parent directory. So to read in a csv file within the "data" folder I write:
old_total_demand <- read.csv(paste0(swd_data, "boerne_total_demand.csv"))
When I use this script on my Windows laptop it works beautifully, but when I use it on my Mac I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '..\data\demand\boerne_total_demand.csv': No such file or directory
Would anyone have any idea why this would be? Thanks in advance for the help.
I'm not sure what systems your collaborators will be using, but you may run into issues due to differences between Windows/Mac/Linux in how paths are written. I suggest you create an R Project (.Rproj) using RStudio, save it in the directory that contains the data and rcode subdirectories, and share the entire project directory.
/Project_dir/MyProject.Rproj
/Project_dir/data/
/Project_dir/rcode/
Then from the R project opened through RStudio you should be able to directly refer to your data by:
data <- read.csv("data/boerne_total_demand.csv")
The working directory will always be where your .Rproj is stored, so you can avoid having to setwd as it causes lots of chaos when sharing and collaborating with others.
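The error in the question comes from the hard-coded backslashes: the backslash is a path separator only on Windows, so on a Mac "..\data\" is treated as part of a file name. A small sketch of a portable alternative, using base R's file.path() (the variable names data_dir and csv_path are only illustrative):

```r
# file.path() joins components with "/", which R accepts on
# Windows, macOS and Linux alike.
data_dir <- file.path("..", "data")
csv_path <- file.path(data_dir, "boerne_total_demand.csv")
print(csv_path)
# [1] "../data/boerne_total_demand.csv"

# old_total_demand <- read.csv(csv_path)
```

Inside an RStudio project the same idea applies with a path relative to the project root, for example file.path("data", "boerne_total_demand.csv").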
I have this code from my current script at hand.
I hope you like it!
path <- dirname(rstudioapi::getActiveDocumentContext()$path)
setwd(path)
swd_path <- paste0(path, "/data/")
if (!dir.exists(swd_path)) {
  dir.create(swd_path)
}
old_total_demand <- read.csv(paste0(swd_path, "boerne_total_demand.csv"))

Why is the working directory overwritten to the directory of the current Rmd file?

I have an .Rproj file called Food_Choices.Rproj that is supposed to set my working directory to ~/Desktop/Food_Choices, a folder containing reproducibility files organized according to the TIER system.
But it's not setting the working directory properly, because when I knit my processing file with code like this
food<-read_csv("Original_Data/food_coded.csv")
#imagine some processing code in between here
write.csv(food, file = "Analysis-Data/analysis_data.csv")
I get this error:
Error: 'Original_Data/food_coded.csv' does not exist in current working directory ('/Users/IdanCarre/Desktop/Food_Choices/Command_Files').
Which is not the project directory, it's the directory of the processing file!
I thought I set the working directory when I opened the files in the context of the R project, but that doesn't seem to be happening anymore (even though my files from a year ago with the same setup still work??)
NOTE: I don't want to use
library(knitr)
opts_knit$set(root.dir = '/Users/IdanCarre/Desktop/Food_Choices')
Because then new users who want to reproduce the results have to go manually insert their own directory into each file they want to run. That's a lot of work they shouldn't have to do.
UPDATE TO COMMENTS:
I used the here package, and that works satisfactorily for read.csv (it throws a data column de-duplication warning but I think it's probably okay for now), but when I write out the processed data file to the analysis data folder, I'm trying to use
write.csv(food, file = here("Analysis-Data", "analysis_data.csv"))
And the error I get is
Error in file(file, ifelse(append, "a", "w")) : cannot open the connection
I get this same problem if I use
write.csv(food, file = "Analysis-Data/analysis_data.csv")
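One possible cause of "cannot open the connection" when writing (an assumption; the post does not confirm it) is that the target folder does not exist at the path being resolved, or is not writable. The sketch below uses a hypothetical helper, write_csv_checked, that creates the output folder if needed before calling write.csv().

```r
# Hypothetical helper: make sure the output directory exists,
# then write the data frame with write.csv().
write_csv_checked <- function(df, path) {
  out_dir <- dirname(path)
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  write.csv(df, file = path)
  invisible(path)
}

# Example call, assuming the here package is installed:
# write_csv_checked(food, here::here("Analysis-Data", "analysis_data.csv"))
```

If the folder already exists, printing the full path passed to write.csv() and checking it by hand is a reasonable next step.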

R error when using untar

I'm running a script with input parameters that are referenced in the code to automate the directory creation, download of file and untar of file. I would be fine with unzip, however this particular file I want to analyze is .tar.gz. I manually unpacked and it was tar.gz, unpacked to .tar file. Would that be the problem?
Full error: Error in untar2(tarfile, files, list, exdir) : unsupported entry type ‘’
Running Windows 10, 64 bit, R set to: [Default] [64-bit] C:\Program Files\R\R-3.2.2
Script notes one solution found (issues, lines 28-31), but I don't really understand it.
I did install 7-zip on my computer, restart and of course restart R:
#DOWNLOADING AND UNZIPPING TAR FILE
#load required packages.
#If there is a load package error, use install.packages("[package]")
library(dplyr)
library(lubridate)
library(XML) # HTML processing
options(stringsAsFactors = FALSE)
#Set directory locations, data file and fetch data file from internet
#enter full url including file name between ' ' marks
mainDir<-"C:/R/BEES/"
subDir<-"C:/R/BEES/Killers"
Fetch<-'http://dds.cr.usgs.gov/pub/data/nationalatlas/afrbeep020_nt00218.tar.gz'
ArchFile<-basename(Fetch)
download.file<-(ArchFile)
#Check for file directories and create if directory if it doesn't exist
if(!file.exists(mainDir)){dir.create(mainDir)}
if(!file.exists(subDir)){dir.create(subDir)}
#set the working directory
setwd(file.path(subDir))
#check if file exists and download if it doesn't exist.
if(!file.exists(ArchFile))
{download.file (url=Fetch,destfile=ArchFile,method='auto')}
#unpack and view file list
untar(path.expand(ArchFile),list=TRUE,exdir=subDir,compressed="gzip")
list.files(subDir)
#Error: Error in untar2(tarfile, files, list, exdir) :
# unsupported entry type ‘’
#Need solution to use tar/untar app
#instructions here: https://stevemosher.wordpress.com/step-10-build/
Appreciate feedback - I've been lurking around StackOverflow for some time to use other people's solutions.
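Two things in the script may be worth checking. With list = TRUE, untar() only returns the archive's file listing and extracts nothing, so list.files(subDir) would not show the contents. And untar2 in the error message is R's internal tar reader; the tar argument of untar() can name an external tar program instead, which may cope with entry types the internal reader rejects. The sketch below builds a small throwaway .tar.gz to show listing versus extracting; all file names are hypothetical, and whether an external tar fixes this particular archive is untested here.

```r
# Build a tiny archive in a temporary folder so the example is self-contained.
work <- tempfile("untar_demo_")
dir.create(work)
old_wd <- setwd(work)
dir.create("payload")
writeLines("hello", file.path("payload", "a.txt"))
tarfile <- file.path(work, "payload.tar.gz")
tar(tarfile, files = file.path("payload", "a.txt"),
    compression = "gzip", tar = "internal")
setwd(old_wd)

# list = TRUE returns the file names only; nothing is written to disk.
listing <- untar(tarfile, list = TRUE, tar = "internal")

# Without list = TRUE the files are extracted into exdir.
out_dir <- file.path(work, "out")
dir.create(out_dir)
untar(tarfile, exdir = out_dir, tar = "internal")
extracted <- list.files(out_dir, recursive = TRUE)

# If the internal reader fails on a real archive, an external tar
# (if one is installed and on the PATH) can be tried instead:
# untar(tarfile, exdir = out_dir, tar = "tar")
```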
