R: data.table fread on zip containing multiple files - r

What is an OS-agnostic solution to use fread on a zip? I can't seem to find one.
Setting up the stage
Let's create two dataframes, write them to disk, and put them in a zip archive (I stole this from: How to zip multiple CSV files in R?)
library(zip)
df1 <- head(mtcars)
df2 <- head(iris)
write.csv(df1, 'df1.csv')
write.csv(df2, 'df2.csv')
zip(zipfile='df.zip', files=list.files(path = getwd(), pattern = "\\.csv$"))
Now I want to read this zip into R
Let's say I want to read df1.csv from the zip.
fread('df.zip/df1.csv')
Error in fread("df.zip/df1.csv") : File 'df.zip/df1.csv' does not
exist or is non-readable
I tried this from fread() of file from archive
fread('unzip -p df.zip/df1.csv')
Null data.table (0 rows and 0 cols)
Warning message:
In fread("unzip -p df.zip/df1.csv") :
File '/var/folders/w5/kqy78qb17v176195dtyyc4pj40000gn/T//RtmpIlNSk8/filee1693cc7f89'
has size 0. Returning a NULL data.table.
I do not understand what it is trying to import, but clearly not my dataframe of interest.
Can you help?
Edit 1
Unzipping first is not really an option. In practice, I work with batches of highly compressible files: usually ~3000 xls files of 1M rows each, 100 GB unzipped / 8 GB zipped. Needless to say, it would be much more comfortable to read directly from the zip!

With unzip installed, this works on my computer. Pass the command through cmd= and give the archive and the file inside it as separate arguments:
fread(cmd = 'unzip -p df.zip df1.csv')
V1 mpg cyl disp hp drat wt qsec vs am gear carb
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
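If an external unzip program cannot be assumed, base R's unz() connection may be a more OS-agnostic route. The following is only a sketch: read_zip_member is a hypothetical helper name, it uses read.csv rather than fread, and the commented fread variant assumes the file fits in memory as text.

```r
# Sketch: read one CSV member of a zip archive without extracting it to disk
# and without calling an external unzip program.
# read_zip_member is a hypothetical helper name, not part of any package.
read_zip_member <- function(zipfile, member, ...) {
  # unz() creates a read-only connection to a single file inside the archive;
  # read.csv() opens the connection and closes it again when it is done.
  read.csv(unz(zipfile, member), ...)
}

# List the files inside an archive (uses R's built-in unzip code):
# unzip("df.zip", list = TRUE)$Name

# Usage with the archive from the question:
# df1 <- read_zip_member("df.zip", "df1.csv", row.names = 1)

# Possible data.table variant (assumption: the member fits in memory as text):
# data.table::fread(text = readLines(unz("df.zip", "df1.csv")))
```

For very large members the fread(cmd = ...) approach above is likely faster, since it avoids holding the raw text in memory.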

Related

How to import multiple .csv files at once? Error: unexpected symbol in "View(X.csv"

I want to import multiple CSV files at once in R without merging them, so that each one stays quickly accessible to run ICC afterwards. I tried the code below. When I double-click a file in the Global Environment pane in RStudio, an error appears in the console: Error: unexpected symbol in "View(X.csv". I tried other methods, but they either did not give the result I wanted or did not solve my problem.
The 1243 CSV files are in the same folder (i.e., Tableau des features).
The files are named 16.CSV, 17.CSV, …, 1257.CSV, 1258.CSV.
All the files have the same structure: a data.frame of 5 columns and 83 rows.
library(data.table)
setwd("/Users/T/Desktop/Rstudio/Tableau des features/")
temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))
It's not quite clear how you'd like the data returned. Assuming you want a single data frame, you can read all the files into a list and combine them with bind_rows from the {dplyr} package.
library(dplyr)
# pattern is a regular expression; full.names = TRUE returns usable paths
files <- list.files(path = "data/", pattern = "\\.csv$", full.names = TRUE)
data <- lapply(files, read.csv)
data <- bind_rows(data)
head(data)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2020-07-28 by the reprex package (v0.3.0)
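If the tables should stay separate rather than merged, a named list is one option. A likely cause of the View error is that assign() creates objects called 16.csv and so on, which are not syntactic R names, so the View(16.csv) call generated by RStudio fails to parse. The sketch below uses a hypothetical temporary folder with three small files standing in for the real one, and the "f" prefix is just an illustrative choice.

```r
# Hypothetical demo folder standing in for "Tableau des features"
demo_dir <- file.path(tempdir(), "features_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (id in c(16, 17, 18)) {
  write.csv(head(iris, 3), file.path(demo_dir, paste0(id, ".csv")),
            row.names = FALSE)
}

# ignore.case = TRUE also matches files ending in .CSV
files <- list.files(demo_dir, pattern = "\\.csv$", ignore.case = TRUE,
                    full.names = TRUE)

# One data frame per file, kept separate in a list
tables <- lapply(files, read.csv)

# Prefix the numeric file names so every element has a syntactic name
names(tables) <- paste0("f", tools::file_path_sans_ext(basename(files)))

# Access a single table by name, e.g. tables$f16 or tables[["f17"]]
```

Keeping the tables in one list also makes it easy to apply the same function to each of them with lapply.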

Save object to RData using eval(parse(text=...))

I am able to write an object to a .csv file using eval(parse(text=...)) but cannot save it to an .RData file. Why is that? Any suggested workarounds?
# Assign value to variable name (in my function this variable name changes)
varName <- "test"
assign(x=varName,value=mtcars)
# Check variable exists
head(test)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Save .csv
write.csv(eval(parse(text = varName)), file=paste0(varName, ".csv")) # works
# Save .RData
save(eval(parse(text = varName)), file=paste0(varName, ".RData")) # doesn't work
> Error in save(eval(parse(text = varName)), file = paste0(varName, ".RData")) :
object ‘eval(parse(text = varName))’ not found
The answer, aside from fortune(106), is to investigate: execute
eval(parse(text = varName))
to see what class the returned object is, and compare that with what save requires.
As the comments and the other answer point out, save expects the names of the objects to save (as symbols or character strings), not an expression that evaluates to the object.
I think the easiest way to do this is to use
save(list = varName, file = paste0(varName, ".RData"))
It saves having to get (or mget) the variable(s), as save effectively does it for you.
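A minimal round trip along these lines, writing to a temporary directory (the rdata_file path is only for illustration):

```r
# Variable name held in a string, as in the question
varName <- "test"
assign(varName, mtcars)

# save() takes the *name* of the object through its list argument
rdata_file <- file.path(tempdir(), paste0(varName, ".RData"))
save(list = varName, file = rdata_file)

# Remove the object, then restore it; load() returns the restored names
rm(list = varName)
restored <- load(rdata_file)
```

After load(), the object is available again under its original name, here test.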

How to recover property from table into dataframe in R? [duplicate]

I think I am missing a fundamental concept about R's data frames.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The names of the cars here. Is this a column? I don't think so, because I am not able to access them via mtcars[,1]. And there is no column name/header for it.
How could I create a data frame like that? How could I use that special column e.g. to describe the data in a plot for example?
They are row names. To access them, use:
rownames(mtcars)
For column names use colnames. To see both row and column names, use:
dimnames(mtcars)
To modify one, for example the first row name:
rownames(mtcars)[1] <- "myNewName"
When a data frame is created with data.frame, the row names default to the numbers 1:n.
mydata <- data.frame(x = 1:5)
Then we can modify them:
rownames(mydata) <- paste0("MyName", 1:5)
Or we can add rownames when creating the data.frame:
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))
Note:
Row names are not very reliable; for example, see this post. (This may be a subjective opinion; I avoid them by moving row names into a column.)
The data.table and dplyr packages prefer not to have them. You can always copy row names into a column:
mydata$myNames <- rownames(mydata)
A shorter one-liner with the data.table package turns the row names into a column.
library(data.table)
setDT(mtcars, keep.rownames = TRUE)[]
head(mtcars)
rn mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This works too using tibble.
library(tibble)
mtcars %>%
rownames_to_column(var="carnames")
How could you create a data frame like that?
You can turn a column into row names with the textshape package; see the example below.
# column to row names
library(textshape)
state_dat <- data.frame(state.name, state.area, state.center, state.division)
column_to_rownames(state_dat)
# use 'state.name' as the row names of the new data frame 'new_state_dat'
new_state_dat<-column_to_rownames(state_dat, 'state.name')
I advise against using row.names() to turn a column into row names.
How could I use that special column e.g. to describe the data in a plot for example?
You can use the superheat package; for more information, see https://rlbarter.github.io/superheat/index.html. It is simpler and more powerful if you use the textshape package instead of row.names() to turn a column into row names.
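For a quick look without extra packages, base graphics can also use the row names as point labels. This is only a sketch on the first six rows of mtcars; the choice of wt and mpg as axes is arbitrary.

```r
# First six rows of the built-in dataset; the car names are the row names
cars <- head(mtcars)
labels <- rownames(cars)

# Scatter plot with some room left above the points for the labels
plot(cars$wt, cars$mpg, xlab = "wt", ylab = "mpg",
     ylim = range(cars$mpg) + c(0, 1))

# Write each car name just above its point
text(cars$wt, cars$mpg, labels = labels, pos = 3, cex = 0.7)
```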

Upload csv file to Google Drive from R

I have a shared folder in google drive. I'll use this link as an example:
https://drive.google.com/drive/u/1/folders/1on07liV24xKCVpcWkOJEu6Ci8Lmcl9hi
I have a script in R like the one below:
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
##Work Directory?
write_csv(mtcars, 'mtcars_dataset.csv')
How do I set my work directory to be this shared folder?
I attempted to use the googledrive package but I could only find a way to access files rather than folders.
You can set your working directory manually by
setwd("C:/Users/*user name*/Google Drive/*shared folder*")
If you are using the googledrive package, there is the function drive_upload; take a look at its documentation.
Basically, you need to specify the file to upload, the destination path on Drive, and a name for the uploaded file.
I hope it's clear!


Resources