Using read_csv with a path to a file (readr package) - r

I'm having difficulty reading a csv file with the read_csv function from the readr package using the file's path.
My file ("test.csv") is located in the 'data' folder, which sits at the root of my project (the working directory).
wd <- getwd()
data_path <- "data"
file.exists(file.path(wd, data_path, "test.csv")) # Returns TRUE
library(readr)
data.1 <- read_csv(file = file.path(wd, data_path, "test.csv")) # Does not work
The console shows the following error:
Error in withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) :
argument "x" is missing, with no default
However, it works perfectly with the standard read.csv function:
data.1 <- read.csv("data/mockup_data_v1.csv", header = TRUE, sep = ",") # OK
Could you please let me know how to use read_csv from the readr package with a file path as an argument?

As you've already set your working directory, you should be able to just read the file with:
data.1 <- read_csv("data/test.csv")
Because R looks in your working directory by default, you are in effect asking R to look in:
working directory/working directory/data/test.csv
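If you want to double-check which file a relative path actually resolves to, base R's normalizePath() expands it against the current working directory; a quick check, nothing more:
# show the absolute path that the relative path resolves to
normalizePath("data/test.csv", mustWork = FALSE)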

All you need to do is paste the pieces of the path together with "/" as the separator and you should be good to go:
library(readr)
wd <- getwd()
data_path <- "data"
data.1 <- read_csv(paste(wd, data_path, "test.csv", sep = "/"))
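For what it's worth, file.path() builds the same string with the separator handled for you, so if the paste() version works in your session this equivalent form should too:
# same path as paste(wd, data_path, "test.csv", sep = "/")
data.1 <- read_csv(file.path(wd, data_path, "test.csv"))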

Related

Read multiple “.xlsx” files

I am trying to read multiple excel files located in different folders in R.
Here is my solution:
setwd("D:/data")
filename <- list.files(getwd(),full.names = TRUE)
# Four folders "epdata1" "epdata2" "epdata3" "epdata4" were inside the folder "data"
dataname <- list.files(filename,pattern="*.xlsx$",full.names = TRUE)
# Every folder in the folder "data" contains five excel files
datalist <- lapply(dataname,read_xlsx)
Error: `path` does not exist:'D:/data/epidata1/出院舱随访1.xlsx'
But read_xlsx runs successfully on a single file:
read_xlsx("D:/data/epidata1/出院舱随访1.xlsx")
All the folders are present in the "data" folder, so why does R fail to read those excel files?
Your help will be much appreciated!
I don't see any reason why your code shouldn't work. Make sure your folder names are correct: in your comments you write "epdata1", but your error says "epidata1".
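A quick way to check the typo theory is to compare what actually exists on disk against the paths your code builds (dataname comes from the question's code):
# print the folder names exactly as they exist on disk
list.files("D:/data")
# flag any constructed file path that does not exist
dataname[!file.exists(dataname)]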
I tried it with some csv and mixed xlsx files. This is what I would come up with to find the error/typo:
library(readxl)
pp <- function(...) print(paste(...))
main <- function() {
  # find / set up the main data folder
  # you may change this to your needs
  main_dir <- paste0(getwd(), "/data/")
  pp("main data directory:", main_dir)
  pp("Found the following folders:")
  pp(list.files(main_dir, full.names = FALSE))
  data_folders <- list.files(main_dir, full.names = TRUE)
  pp("Found these files in the folders:", list.files(data_folders, full.names = TRUE))
  pp("Filtering .xlsx files:", list.files(data_folders, pattern = "\\.xlsx$", full.names = TRUE))
  files <- list.files(data_folders, pattern = "\\.xlsx$", full.names = TRUE)
  datalist <- lapply(files, read_xlsx)
  print(datalist)
}
main()

R SAS to xlsx conversion script

I am attempting to write a script that allows me to quickly convert a folder of SAS datasets into .xlsx and save them in a different folder. Here is my current code:
require(haven)
require(openxlsx)
setwd(choose.dir())
lapply(list.files(pattern="*.sas7bdat"), function(x) {
openxlsx::write.xlsx(haven::read_sas(x), path = paste0(choose.dir(),x,".xlsx"))
})
I keep getting the following error and I am not sure why:
Error in saveWorkbook(wb = wb, file = file, overwrite = overwrite) :
argument "file" is missing, with no default
Final code (thanks @oliver):
require(haven)
require(openxlsx)
setwd(choose.dir())
lapply(list.files(pattern = "\\.sas7bdat$"), function(x) {
  openxlsx::write.xlsx(haven::read_sas(x),
                       file = paste0(gsub("\\.sas7bdat$", "", basename(x)), ".xlsx"))
})
The problem is that write.xlsx doesn't have a path argument; it uses a file argument instead. This is documented in the function as well, see help("write.xlsx"):
outdir <- choose.dir() # <== choose the output directory only once
lapply(list.files(pattern = "\\.sas7bdat$"), function(x) {
  # obtain the basename of the file, without the SAS extension
  x_basename <- gsub('\\.sas7bdat$', '', basename(x))
  # write the file to Excel: use "file" instead of "path", and file.path()
  # so the directory separator is inserted for us
  openxlsx::write.xlsx(haven::read_sas(x),
                       file = file.path(outdir, paste0(x_basename, ".xlsx")))
})

file.copy not working when called inside function

I'm trying to do a standardized directory setup through a function call. Inside this function I'm using two file.copy calls to copy some files from a self-made package into the working directory of a project.
If I run the code line by line, everything works fine, but if I run the whole function, only the directories get created and no files get copied. Unfortunately the function does not throw any error, so I really do not understand what's going on or where to start troubleshooting.
Maybe one of you can give me a hint where to find the solution.
Abstract (non-working) example:
dir_setup <- function() {
  # list directories which shall be created
  dir_names <- c("dir1", "dir2", "dir3", "dir4")
  # create directories
  lapply(dir_names, function(x) dir.create(path = paste(getwd(), x, sep = '/')))
  # get path of package library
  lib_path <- .libPaths()
  # shorten list to vector of length 1
  if (length(lib_path) > 1) lib_path <- lib_path[1]
  # list files in source
  files <- list.files(paste0(lib_path, "/package/files/dir1"), full.names = TRUE)
  # copy resource files from package directory to working directory
  file.copy(files, paste(getwd(), "dir1", sep = '/'), overwrite = TRUE)
  # list more files
  files2 <- list.files(paste0(lib_path, "/package/files/dir2"), full.names = TRUE)
  # copy more files from package directory to working directory
  file.copy(files2, paste(getwd(), "dir2", sep = '/'), overwrite = TRUE)
}
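One place to start troubleshooting: file.copy() returns a logical vector with one element per file rather than throwing an error on failure, so capturing and printing those return values shows whether the copies silently fail or the source listing is simply empty. A minimal diagnostic sketch, assuming the same hypothetical "package/files" layout as above:
dir_setup_debug <- function() {
  lib_path <- .libPaths()[1]
  # "package/files/dir1" is the placeholder layout from the question
  files <- list.files(paste0(lib_path, "/package/files/dir1"), full.names = TRUE)
  print(paste("files found in source:", length(files))) # 0 => source path is wrong
  copied <- file.copy(files, paste(getwd(), "dir1", sep = '/'), overwrite = TRUE)
  print(paste("copied", sum(copied), "of", length(copied), "files"))
}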

How can I create a data frame in R from a zip file with multiple levels located at a URL?

I have been trying to work this out but I have not been able to do it...
I want to create a data frame with four columns: country-number-year-(content of the .txt file)
There is a .zip file in the following URL:
https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT
The file contains a folder with 49 folders in it, and each of them contains 150 .txt files, give or take.
I first tried to download the zip file with get_dataset, but that did not work:
if (!require("dataverse")) devtools::install_github("iqss/dataverse-client-r")
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
get_dataset("=doi:10.7910/DVN/0TJX8Y/PZUURT", key = "", server = "dataverse.harvard.edu")
"Error in get_dataset("=doi:10.7910/DVN/0TJX8Y/PZUURT", key = "", server = "dataverse.harvard.edu") :
Not Found (HTTP 404)."
Then I tried
temp <- tempfile()
download.file("https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT",temp)
UNGDC <- unzip(temp, "UNGDC+1970-2018.zip")
It worked up to a point... I downloaded the .zip file and then created UNGDC, but nothing more happened, because it only holds the following information:
UNGDC
A connection with
description "/var/folders/nl/ss_qsy090l78_tyycy03x0yh0000gn/T//RtmpTc3lvX/fileab730f392b3:UNGDC+1970-2018.zip"
class "unz"
mode "r"
text "text"
opened "closed"
can read "yes"
can write "yes"
Here I don't know what to do... I have not found relevant information on how to proceed... Can someone please give me some hints, or point me to somewhere I can learn how to do it?
Thanks for your attention and help!
How about this? I used the zip package to unzip, but possibly the base unzip might work as well.
library(zip)
dir.create(temp <- tempfile())
url <- 'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT'
# note: download.file has no exdir argument; the destination is the second argument
download.file(url, paste0(temp, '/PZUURT.zip'), mode = 'wb')
unzip(paste0(temp, '/PZUURT.zip'), exdir = temp)
Note in particular I had to set the mode = 'wb' as I'm on a Windows machine.
I then saw that the unzipped archive had a _MACOSX folder and a Converted sessions folder. Assuming I don't need the MACOSX stuff, I did the following to get just the files I'm interested in:
root_folder <- paste0(temp, '/Converted sessions/')
filelist <- list.files(path = root_folder, pattern = '\\.txt$', recursive = TRUE)
filenames <- basename(filelist)
'filelist' contains the path of each text file relative to root_folder, while 'filenames' has just each file name, which I'll then break up to get the country, the number and the year:
df <- data.frame(t(sapply(strsplit(filenames, '_'),
                          function(x) c(x[1], x[2], substr(x[3], 1, 4)))))
colnames(df) <- c('Country', 'Number', 'Year')
Finally, I can read the text from each of the files and stick it into the data frame as a new Text field:
df$Text <- sapply(paste0(root_folder, filelist), function(x) readChar(x, file.info(x)$size))
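A quick sanity check on the result, assuming the Country_Number_Year file-name pattern above holds for every file:
head(df[, c('Country', 'Number', 'Year')]) # one row per speech
nchar(df$Text[1]) # rough check that the first text actually loaded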

How to use read.csv2.sql to read zip file without unzipping it?

I am trying to read a zip file, without unzipping it in my directory, while using read.csv2.sql for specific row filtering.
The zip file can be downloaded here:
I have tried setting up a file connection to pass to read.csv2.sql, but it seems that it does not accept a file connection as the "file" argument.
I have already installed the sqldf package on my machine.
This is my R code for the issue described:
library(sqldf)
### Name the download file
zipFile <- "Dataset.zip"
### Download it
download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip", zipFile, mode = "wb")
## Set up the zip file path
workingDirectory <- getwd()
zip_dir <- paste0(workingDirectory, "/Dataset.zip")
### Establish a connection to "household_power_consumption.txt" inside the zip file
data_file <- unz(zip_dir, "household_power_consumption.txt")
### Read the file into loaded_df
loaded_df <- read.csv2.sql(data_file, sql = "SELECT * FROM file WHERE Date='01/02/2007' OR Date='02/02/2007'", header = TRUE)
### Error Msg
### -Error in file(file) : invalid 'description' argument
This does not use read.csv2.sql, but as there are only ~2 million records in the file, it should be possible to just download it, read it in using read.csv2 and then subset it in R.
# download file creating zipfile
u <-"https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip"
zipfile <- sub(".*%2F", "", u)
download.file(u, zipfile)
# extract fname from zipfile, read it into DF0 and subset it to DF
fname <- sub("\\.zip$", ".txt", zipfile)
DF0 <- read.csv2(unz(zipfile, fname))
DF0$Date <- as.Date(DF0$Date, format = "%d/%m/%Y")
DF <- subset(DF0, Date == '2007-02-01' | Date == '2007-02-02')
# can optionally free up memory used by DF0
# rm(DF0)
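If you do want to stay with read.csv2.sql, note that it expects a real file path rather than a connection. A hedged workaround sketch: extract the .txt first, but only into tempdir(), so your working directory stays clean (this reuses zipfile from the code above):
library(sqldf)
# extract the .txt into the session's temporary directory
txt_path <- unzip(zipfile, "household_power_consumption.txt", exdir = tempdir())
loaded_df <- read.csv2.sql(txt_path,
                           sql = "SELECT * FROM file WHERE Date='01/02/2007' OR Date='02/02/2007'",
                           header = TRUE)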
