Reading an Excel file into an R dataframe from a zipped folder - r

I have an Excel file (.xls extension) that is inside a zipped folder that I would like to read as a dataframe into R. I loaded the gdata library and set up my working directory to the folder that houses the zipped folder.
When I type in the following syntax:
data_frame1 <- read.xls( unz("./Data/Project1.zip","schools.xls"))
I get the following error messages:
Error in path.expand(xls) : invalid 'path' argument
Error in file.exists(tfn) : invalid 'file' argument
I'm guessing that I'm missing some arguments in the syntax, but I'm not entirely sure what else needs to be included.
Thanks for your help! This R newbie really appreciates it!

Unfortunately, after a quick survey of the xls-reading functions I know, none of them can read from the connection that unz returns (I would love to be proven wrong here). If it were a 'csv' it would work fine. Until such a function is written, you must do the loading in two steps: extract the file, then read it.
To give you a little more control, you can specify which file to unzip as well as the directory to place the files with unzip.
# default exdir is current directory
unzip(zipfile="./Data/Project1.zip", files = "schools.xls", exdir=".")
dataframe_1 <- read.xls("schools.xls")
Sadly, this also means that you must do cleanup afterwards if you don't want the 'xls' file hanging around.
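The extract, read, and clean-up steps can be wrapped in one function. The sketch below is only an illustration: read_from_zip is a hypothetical helper name, and it extracts into a temporary directory so that the cleanup happens automatically when the function exits.

```r
# Hypothetical helper: extract one member of a zip archive into a temporary
# directory, read it with a caller-supplied function, then delete the copy.
read_from_zip <- function(zipfile, member, reader, ...) {
  stopifnot(file.exists(zipfile))
  tmp_dir <- tempfile("unzipped_")
  dir.create(tmp_dir)
  # Remove the extracted copy when the function exits, even after an error
  on.exit(unlink(tmp_dir, recursive = TRUE), add = TRUE)

  extracted <- unzip(zipfile, files = member, exdir = tmp_dir)
  if (length(extracted) == 0) {
    stop("'", member, "' was not extracted from ", zipfile)
  }
  reader(extracted[1], ...)
}

# Usage with the names from the question (requires the gdata package):
# library(gdata)
# data_frame1 <- read_from_zip("./Data/Project1.zip", "schools.xls", read.xls)
```

Passing the reader as an argument keeps the helper independent of any particular Excel package.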

Related

Error in file(file, "rt") : invalid 'description' argument when reading csv files

I have seen posts about this issue before, but none exactly like the issue that I am having. This code has been working for me with previous versions of R. I recently updated my R and R Studio to versions R 4.2.1 and RStudio Desktop 2022.07.1+554, and now I am getting the subject error when I try to read in my data files. The data files all have the same filenames. I point to the top level directory and then the code goes down through the folder structure and pulls out all of the data files to be used by the rest of the program.
Also want to mention that I am not that well versed in R, so I may not be doing everything in the best manner. Any suggestions that anyone can provide would be most appreciated.
Here is my code to select the top level folder, search through those folders and then read the files which is generating the error.
wd <<- choose.dir(caption = "Select top level folder where your data is located")
setwd(wd)
#List the full path and filename of all files in the working directory and sub-directories that starts
#with "DINum" and ends with ".csv"
out_files <- list.files(pattern = "^DINum(.*)csv$", recursive = TRUE)
# initialise list to store csv files
list.data <- NULL
# create a loop to read in data
for (i in 1:length(out_files))
{
list.data[[i]]<-read.csv(out_files[i], check.names = TRUE)
}
I found a solution. I have been using read.csv from base R. I installed the readr package and tried read_csv and it is working fine. Not sure why read.csv no longer works, but from what I read online, read_csv is a better choice anyway for large data files. Thanks to everyone that tried to help me. I appreciate your time!
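The thread does not establish why read.csv started failing. One cause that does produce this exact message is an empty file list: when out_files has length zero, 1:length(out_files) evaluates to c(1, 0), so out_files[1] is NA and file() rejects it. A cancelled choose.dir(), which returns NA, can lead to the same place. The sketch below guards against both; read_all_csv is a hypothetical name, and the pattern is the one from the question with the dot escaped.

```r
# Hypothetical helper: read every matching CSV below a top-level folder
read_all_csv <- function(top_dir, pattern = "^DINum.*\\.csv$") {
  if (is.na(top_dir) || !dir.exists(top_dir)) {
    stop("No valid top-level folder was selected")
  }
  # full.names = TRUE returns usable paths, so no setwd() is needed
  out_files <- list.files(top_dir, pattern = pattern,
                          recursive = TRUE, full.names = TRUE)
  if (length(out_files) == 0) {
    stop("No files matching '", pattern, "' found under ", top_dir)
  }
  # lapply() iterates over the paths themselves, avoiding 1:length() pitfalls
  list_data <- lapply(out_files, read.csv, check.names = TRUE)
  names(list_data) <- out_files
  list_data
}

# Usage on Windows, mirroring the question:
# list.data <- read_all_csv(choose.dir(caption = "Select top level folder"))
```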

Confusion while uploading the csv file in R [duplicate]

I have an excel file that I want to open in R. I tried both of these commands after saving the excel file as a csv file or a text file.
read.table() or read.csv()
I think part of the problem is where the file is located. I have it saved on the desktop. What am I missing here?
Here is the R output
In file(file, "rt") :
cannot open file 'Rtrial.csv': No such file or directory
> help.search("read.csv")
> read.csv("Rtrial.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'Rtrial.csv': No such file or directory
> read.table("tab")
To throw out another option, why not set the working directory (preferably via a script) to the desktop using setwd('C:/John/Desktop') and then read the files using just their file names?
Try
f <- file.choose()
to choose the file interactively and save the name in f.
Then run read.csv on the saved filename
d <- read.csv(f)
Sounds like you just have an issue with the path. Include the full path; if you use backslashes, they need to be escaped: "C:\\folder\\folder\\Desktop\\file.csv" or "C:/folder/folder/Desktop/file.csv".
myfile = read.csv("C:/folder/folder/Desktop/file.csv") # or read.table()
It may also be wise to avoid spaces and symbols in your file names, though I'm fairly certain spaces are OK.
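A quick way to tell a path problem from anything else is to ask R whether it can see the file before reading it. This is a minimal sketch; read_csv_checked is a hypothetical helper, and the folder names are the placeholders used above, not real locations.

```r
# Hypothetical helper: read a CSV only after confirming that the path resolves
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path,
         "\nWorking directory is: ", getwd())
  }
  read.csv(path)
}

# file.path() joins the pieces with forward slashes, so nothing needs escaping
csv_path <- file.path("C:", "folder", "folder", "Desktop", "file.csv")
# myfile <- read_csv_checked(csv_path)
```

If the error message shows an unexpected working directory, a relative file name is being resolved somewhere other than where the file lives.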
I had to combine Maiasaura's and Svun's answers to get it to work: using setwd and escaping all the backslashes. Spaces in the path do not need escaping (in R, '\ ' is an unrecognized escape and raises an error).
setwd('C:\\Users\\firstname lastname\\Desktop\\folder1\\folder2\\folder3')
data = read.csv("file.csv")
data
This solved the issue for me.
Here is one way to do it. It uses the ability of R to construct file paths based on the platform and hence will work on both Mac OS and Windows. Moreover, you don't need to convert your xls file to csv, as there are many R packages that will help you read xls directly (e.g. gdata package).
# get user's home directory (setwd() would return the previous working directory, not HOME)
home = Sys.getenv("HOME");
# construct path to file
fpath = file.path(home, "Desktop", "RTrial.xls");
# load gdata library to read xls files
library(gdata);
# read xls file
Rtrial = read.xls(fpath);
Let me know if this works.
Excel's Save As keeps the file open and locked, so R cannot open it. Close the file in Excel before reading it in R.
Give the full path and escape the backslashes: read.csv("c:\\users\\JoeUser\\Desktop\\JoesData.csv")
In my experience, this error occurs when the data file and the R script end up in different directories because one of them was moved.
Good Practice:
Keep your .r and .csv files in the same directory.
Open your .r file from within its own directory instead of through RStudio's Open File option.
You can also use the Import Dataset option in the Environment pane: click it, install the required packages when prompted, and use this option to read datasets from then on. This avoids the path error.
I also appreciate the above provided answers.
Another way of reading Excel including the new format xlsx could be the package speedR (https://r-forge.r-project.org/projects/speedr/). It is an interactive and visual data importer. Besides importing you can filter(subset) the existing objects from the R workspace.
My issue was very simple, the working directory was not the "Source" directory that was printed when the file ran. To fix this, you can use getwd() and setwd() to get your relative links working, or just use a full path when opening the csv.
print(getwd()) # Where does the code think it is?
setwd("~/Documents") # Where do I want my code to be?
dat = read.csv("~/Documents/Data Visualization/expDataAnalysis/one/ac1_survey.csv") #just make it work!
On macOS this happened to me as well. From the Misc menu in the R toolbar I chose Change Working Directory and selected the directory where the .csv file was saved. When I went back to the command line and typed getwd(), the directory was updated and correct, and read.csv finally worked.
I had the same problem. When I checked the file's properties in File Explorer, it showed the following message:
"Security: This file came from another computer and might be blocked to help protect this computer"
After clicking the "Unblock" button, you can read the file from R without any problem using read.csv(), even if your working directory is not the directory that contains the file.
I just had this problem and I first switched to another directory and then switched back and the problem was fixed.
This worked for me, accessing the data with an absolute path. Use double backslashes in the path.
dataset = read.csv('C:\\Users\\Desktop\\Machine Learning\\Data.csv')
Check whether the file name already has an extension, for example:
abc.csv
If so, remove the .csv extension.
Set the working directory to the folder containing the file (~).
data<-read.csv("abc.csv")
Your data has now been read into the data object.
In my case this very problem was raised by wrong spelling, lower case 'c:' instead of upper case 'C:' in the path. I corrected spelling and problem vanished.
You can add absolute path to the file
heisenberg <- read.csv(file="C:/Users/tiago/Desktop/sample_100000.csv")
If you really want to run something like
heisenberg <- read.csv(file="sample_100000.csv")
then you'll have to change the working directory to the folder that contains the .csv file.

r function unzip error 1 in extracting from zip file

Environment:
Windows 7 OS
RStudio Version 0.99.491
I have been programming in R for about 4 months via the Coursera Data Science curriculum, but I have NEVER been successful in using the unzip function.
I've looked at the forums for hours for potential solutions, syntax problems, undefined arguments, etc., but to no avail. I eventually unzip the contents manually and proceed with the assignment, but I am tired of not knowing why it is not working.
Here are a few examples of the error:
fileName <- "StormData.zip"
unzip(fileName, exdir = mainDir,subDir)
Warning message: In unzip(fileName, exdir = mainDir, subDir) : error 1 in extracting from zip file
unzip(fileName)
Warning message: In unzip(fileName) : error 1 in extracting from zip file
unzip(fileName, "stormdata.csv")
Warning message: In unzip(fileName, "stormdata.csv") : error 1 in extracting from zip file
unzip(fileName, "stormdata.csv", list = TRUE)
Error in unzip(fileName, "stormdata.csv", list = TRUE) : zip file 'StormData.zip' cannot be opened
Any suggestions would be greatly appreciated.
I was getting the same error.
I changed the path --
from :
uzp <- "C:\\Users\\Sharvari\\Downloads\\rprog%2Fdata%2Fspecdata"
to
uzp <- "C:\\Users\\Sharvari\\Downloads\\rprog%2Fdata%2Fspecdata.zip"
and it works fine!
setwd("C:\\Users\\Sharvari\\Downloads")
uzp <- "C:\\Users\\Sharvari\\Downloads\\rprog%2Fdata%2Fspecdata.zip"
unzip(uzp, exdir = "C:\\Users\\Sharvari\\Desktop\\specdata")
I too was getting that error 1 message when trying to unzip a zip file. The glitch in my case was a mismatch between the working directory and the zip file's path.
My case was:
My working directory was "C:/Users/SCOTT/Desktop/Training"
while my zip file was located in "C:/Users/SCOTT/Desktop/Training/house_consumption_data".
I got the error when I tried to execute this:
unzip("house_data.zip")
Possibly your file is in a different folder too.
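A mismatch like this can be caught before extraction by checking that the archive is reachable and listing its contents first. The sketch below assumes nothing beyond base R; safe_unzip is a hypothetical helper name.

```r
# Hypothetical helper: confirm the archive is reachable before extracting it
safe_unzip <- function(zip_path, exdir = dirname(zip_path)) {
  if (!file.exists(zip_path)) {
    stop("Cannot find '", zip_path, "' from working directory ", getwd())
  }
  # list = TRUE only reads the archive's table of contents
  print(unzip(zip_path, list = TRUE))
  unzip(zip_path, exdir = exdir)
}

# Folder and file names taken from the answer above; the path is relative to
# the working directory "C:/Users/SCOTT/Desktop/Training".
# safe_unzip(file.path("house_consumption_data", "house_data.zip"))
```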
I had the same problem trying to download and unzip the same file, for the same course. I have had problems with unzip in the past and was determined to solve it this time.
Eventually the extension of the file turned out to be csv.bz2, and then the post "Extract bz2 file in R" solved my problem.
After downloading the file I was able to read it directly with
stormdata <- read.csv("stormdata.zip")
without using unzip.
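This works because file(), which read.csv uses internally, detects bzip2 (and gzip and xz) compression when opening a file for reading. The following self-contained sketch demonstrates it with a small made-up data frame written to a temporary .csv.bz2 file; none of the names or values come from the course data.

```r
# Write a small, made-up data set as bzip2-compressed CSV
csv_bz2 <- tempfile(fileext = ".csv.bz2")
con <- bzfile(csv_bz2, open = "wt")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently; unzip() is not involved at all
stormdata <- read.csv(csv_bz2)
str(stormdata)
```

unzip() fails on such a file because bzip2 is a different format from zip, whatever the file extension says.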
This error seems to appear whenever openXLS is unable to open the specified file.
It could be a wrong name or a wrong directory, or the file might be encrypted or password protected.
Check your archive's format. This error appears when the file is not really a zip archive: if it is actually a "rar" file, convert it to "zip". The function works only on zip-format files.
I faced the same issue. Make sure that you specify the correct name of the file (get it from the properties of the .zip file) in the following code.
file = read.table(unzip("file_name.csv.zip"), sep = ",", header = TRUE)
In my case, I was passing just file_name.zip and R was throwing the error.
Also, there are two functions for working with zip files in R:
1) unz - opens a connection to a single file inside a zip archive
2) unzip - extracts files from the .zip archive (all of them by default) and returns their paths
I usually prefer unzip.
If you simply swap unz into the code above, R will throw an error again, because unz also needs the name of the file inside the archive.
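The difference between the two functions can be sketched as follows. Both helper names are hypothetical, and the archive and member names in the usage comment are assumptions based on the code in this answer.

```r
# unz(): read one member straight from the archive; nothing is written to disk
read_member_unz <- function(zipfile, member) {
  read.csv(unz(zipfile, member))
}

# unzip(): extract the member to a directory first, then read the extracted file
read_member_unzip <- function(zipfile, member, exdir = tempdir()) {
  path <- unzip(zipfile, files = member, exdir = exdir)
  read.csv(path[1])
}

# Usage, assuming file_name.csv.zip contains a member called file_name.csv:
# df1 <- read_member_unz("file_name.csv.zip", "file_name.csv")
# df2 <- read_member_unzip("file_name.csv.zip", "file_name.csv")
```

unz suits text formats such as csv that can be read from a connection, while unzip is needed when the reader insists on a file path.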
I encountered the same error using install_course_zip with a zip file. I followed all the instructions for the command faithfully but kept getting errors relating to the 'exdir'. I moved the zip file to various directories without success.
I finally used getwd() to get the working directory and then placed the zip file in that directory. I then was able to use the zip file name without having to use any folder structure and this worked. I still have no idea why R would not accept a different directory.
I had a list of files to be unzipped and processed, and I was facing the same error:
"error 1 in extracting from zip file"
Using full directory paths and setting the working directory made the code work:
files <- list.files(path="C:\\Users\\Tejas naik\\Documents\\land", pattern="\\.zip$")
out_dir <- "C:\\Users\\Tejas naik\\Documents\\input"
# unzip() resolves the bare file names against the working directory
setwd("C:\\Users\\Tejas naik\\Documents\\land")
for (i in files) {
  unzip(i, exdir=out_dir)
}
In my case this error came up a bit differently. There was no zip file involved; the file was open in Excel, and that is why the error was popping up.
It's crucial to give the full name (including the path) of the zip-file to the unzip function.
So instead of file.zip, it should be C:\user\name\file.zip.
In case you're using the list.files function, one should set the full.names option to TRUE.
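With full.names = TRUE the loop no longer depends on the working directory. A minimal sketch, in which unzip_all is a hypothetical helper and the directory names in the usage comment are placeholders:

```r
# Hypothetical helper: extract every .zip file found in zip_dir into out_dir
unzip_all <- function(zip_dir, out_dir) {
  # full.names = TRUE prepends zip_dir, so each path is valid from anywhere
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # One character vector of extracted paths per archive
  extracted <- lapply(zips, unzip, exdir = out_dir)
  names(extracted) <- basename(zips)
  extracted
}

# Usage with placeholder directories:
# unzip_all("C:/user/name/archives", "C:/user/name/extracted")
```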
For me the error was fixed after I added a second backslash to escape the one in the file path.
Example:
from
unzip("abc\aaa.zip")
to
unzip("abc\\aaa.zip")
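The reason the single backslash fails is that R treats it as the start of an escape sequence, so "\a" becomes one control character rather than a separator followed by the letter a. The sketch below shows the difference; the raw-string form requires R 4.0.0 or later.

```r
bad  <- "abc\aaa.zip"    # "\a" is parsed as the bell control character
good <- "abc\\aaa.zip"   # "\\" is one literal backslash

nchar(bad)   # 10: the backslash and the following "a" collapsed into one character
nchar(good)  # 11

# Two ways to avoid escaping altogether
fwd <- file.path("abc", "aaa.zip")   # "abc/aaa.zip", also accepted on Windows
raw <- r"(abc\aaa.zip)"              # raw string, R >= 4.0.0
identical(raw, good)                 # TRUE
```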

Convert RDA to csv

I need to convert an rda file to csv. I've tried to load it in R , but I get the following error:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file file 'data/matrix.rda', probable reason 'No such file or directory'
Here is a link to rda file (http://elisacarli.altervista.org/matrix.rda)
Thanks in advance for any suggestion
This appears to be an issue of not having the object you are trying to write to csv in your working environment. Did you load your .RDA file first? I was able to load your .RDA file into my R session and write out the LDH.aap.ave object with write.csv() with no apparent problems.
I recommend you check:
What is in your current working environment? Check with ls(). Presumably, the contents of your .RDA file will not be in here. For cleanliness, maybe you want to clear your working environment first and start fresh? rm(list=ls()) will do the trick for you there.
Your current working directory with getwd()
The location of your .RDA file
Navigate to the appropriate directory if needed with setwd()
Use load("my.RDA")
Check the objects in your current working environment with ls(). I see one object in the attached .RDA file named "LDH.aap.ave"
You can check the structure of that object to make sure it was read in properly. head(), str(), summary() are your friends here.
Write out LDH.aap.ave with write.csv(LDH.aap.ave, file = "myFileName.csv")
For starters, if your data is at that URL, you need to open a connection to the URL and then load the .rda file:
con <- url('http://elisacarli.altervista.org/matrix.rda')
load(con)
close(con)
If you have the file on your computer, then just:
load('[full path to file]/matrix.rda')
This should create an object (called 'matrix' here; run ls() to check the actual name); see what is in it by typing:
matrix
Then you would use this function:
write.csv(matrix,file="mysavefile.csv")
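Because the object names inside an .rda file are not always known in advance, it can help to let load() report them. The sketch below loads into a separate environment and writes each data frame or matrix to its own csv; rda_to_csv is a hypothetical helper, not something from either answer.

```r
# Hypothetical helper: write every tabular object in an .rda file to CSV
rda_to_csv <- function(rda_path, out_dir = dirname(rda_path)) {
  stopifnot(file.exists(rda_path))
  env <- new.env()
  # load() returns the names of the objects it restored
  obj_names <- load(rda_path, envir = env)
  written <- character(0)
  for (nm in obj_names) {
    obj <- get(nm, envir = env)
    if (is.data.frame(obj) || is.matrix(obj)) {
      out <- file.path(out_dir, paste0(nm, ".csv"))
      write.csv(obj, file = out)
      written <- c(written, out)
    }
  }
  written  # paths of the CSV files that were created
}

# Usage, assuming matrix.rda has been downloaded into the data folder:
# rda_to_csv("data/matrix.rda")
```

Loading into a fresh environment keeps the restored objects from overwriting anything in the current workspace.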
