Writing and loading ExpressionSets to and from CSV files - r

In R, one can write a Bioconductor ExpressionSet to a CSV file using write.csv. For example, using the standard bladderbatch data available as a Bioconductor package, the following code writes a CSV file to the current working directory:
library("bladderbatch")
data("bladderdata")
write.csv(bladderEset, "bladderEset.csv")
Is there a tool which can read the produced CSV file back into R as an ExpressionSet?
If not, is there an ExpressionSet ↔ CSV serialiser/deserialiser which can both output ExpressionSets as CSV files and read CSV files back in as ExpressionSets?
The reason I'm asking is that I need to interact with ExpressionSets from Python and Java code, and I can easily work with CSV files, but not with ".rda", ".CEL", or other binary files.

If you just want to interact with the data using R and Python, consider saving the ExpressionSet as a feather object.
https://github.com/wesm/feather
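A minimal sketch of that idea (feather stores plain data frames, not S4 objects, so the ExpressionSet has to be flattened first; the file names here are made up):
library(feather)
library(Biobase)
expr_df <- as.data.frame(exprs(bladderEset))
expr_df <- cbind(feature = rownames(expr_df), expr_df)  # feather drops row names, so keep the probe IDs as a column
write_feather(expr_df, "bladderEset_exprs.feather")
write_feather(pData(bladderEset), "bladderEset_pheno.feather")  # sample annotations (row names are lost here too)
Both files can then be read from Python with the feather/pyarrow packages.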

The comment from @Nathan Werth is what I think you are looking for. By calling Biobase::readExpressionSet you can easily read a CSV file back in as an ExpressionSet.
First, write out the CSV file as in your initial code:
library("bladderbatch")
data("bladderdata")
write.csv(bladderEset, "bladderEset.csv")
Then read it back in:
> temp <- Biobase::readExpressionSet("bladderEset.csv")
> class(temp)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

Related

Converting RData to CSV file returns incorrect CSV file

I do not have any expertise in R, and I have to convert RData files to CSV to analyze the data. I followed these links to do this: Converting Rdata files to CSV and "filename.rdata" file Exploring and Converting to CSV. The second option seemed simpler, as I failed to understand the first one. This is what I have tried so far, with the results:
> ddata <- load("input_data.RData")
> print(ddata)
[1] "input_data"
> print(ddata[[1]])
[1] "input_data"
> write.csv(ddata, "test.csv")
From the first link I learnt that we can inspect the RData contents, and when I did str(ddata) I found out that it is a list of size 1. Hence, I checked whether print(ddata[[1]]) would print anything apart from just "input_data". With write.csv I was able to write it to a CSV without any errors, but the CSV file contains just the following two lines:
"","x"
"1","input_data"
Can you please help me understand what I am doing wrong and show me a way to get all the details into a CSV?
The object ddata contains the name of the object(s) that load() created. Try typing the command ls(). That should give you the names of the objects in your environment; one of them should be input_data. That is the object you want. If it is a data frame (check with str(input_data)), you can create the CSV file with
write.csv(input_data, "test.csv")
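If the .RData file contains several objects, a sketch along the same lines (assuming each object of interest is a data frame) loads everything into a fresh environment and writes one CSV per object:
e <- new.env()
load("input_data.RData", envir = e)   # populates e instead of the global environment
for (nm in ls(e)) {
  obj <- get(nm, envir = e)
  if (is.data.frame(obj)) write.csv(obj, paste0(nm, ".csv"))  # one CSV per data frame
}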

Converting *.rds into *.csv file

I am trying to convert an *.rds file into a *.csv file. First, I import the file via data <- readRDS("file.rds"), and then I try to write the CSV file via write.csv(data, file="file.csv").
However, this yields the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘structure("dgCMatrix", package = "Matrix")’ to a data.frame
How can I turn the *.rds file into a *.csv file?
Sparse matrices often cannot be converted directly into a data frame.
This approach can be very resource-intensive, but it may work: convert the sparse matrix to an ordinary (dense) matrix first, and then save that to a CSV.
Try this:
write.csv(as.matrix(data),file="file.csv")
This solution is not efficient and might crash R, so save your work beforehand.
As a general comment, the resulting CSV file will be huge, so it might be more helpful to use more efficient data storage, such as a database engine.
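If as.matrix() on the whole object exhausts memory, a possible workaround (only a sketch; the block size and file name are placeholders) is to densify and write the matrix in row blocks:
library(Matrix)
block <- 10000                        # rows per chunk, tune to available RAM
n <- nrow(data)
for (start in seq(1, n, by = block)) {
  rows <- start:min(start + block - 1, n)
  write.table(as.matrix(data[rows, , drop = FALSE]), "file.csv",
              sep = ",", row.names = FALSE,
              col.names = (start == 1), append = (start > 1))
}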

How to write data into a macro-enabled Excel file (write.xlsx corrupts my document)?

I'm trying to write a table into a macro-enabled Excel file (.xlsm) through R. The write.xlsx (openxlsx) and writeWorksheetToFile (XLConnect) functions don't work.
When I used the openxlsx package, as seen below, the resulting .xlsm files ended up corrupted.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
  write.xlsx(Input_Files[[i]], Inputs[i], sheetName="Input_Sheet")
}
#Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm files
#Inputs[i] are the Excel files into which the tables should be written
Corrupted .xlsm file error message after write.xlsx:
Excel cannot open the file 'xxxxx.xlsm' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
After researching this problem extensively, I found that the XLConnect package offers the writeWorksheetToFile function, which works with .xlsm, albeit after running it a few times it yields an error message that there is no more free space. It also runs for 20+ minutes for tables with approximately 10,000 rows. I tried adding xlcFreeMemory at the beginning of the for loop, but it doesn't solve the issue.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
  xlcFreeMemory()
  writeWorksheetToFile(Inputs[i], Input_Files[[i]], "Input_Sheet")
}
#Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm files
#Inputs[i] are the Excel files into which the tables should be written
Could anyone recommend a way to easily and quickly transfer an R table into an xlsm file without corrupting it?
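One direction worth testing (a sketch, not a verified fix: it assumes openxlsx's loadWorkbook keeps the workbook's VBA project intact when the file is re-saved):
library(openxlsx)
for (i in 1:3){
  wb <- loadWorkbook(Inputs[i])                  # open the existing .xlsm workbook
  if (!("Input_Sheet" %in% names(wb))) addWorksheet(wb, "Input_Sheet")
  writeData(wb, "Input_Sheet", Input_Files[[i]])
  saveWorkbook(wb, Inputs[i], overwrite = TRUE)  # re-save under the same name
}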

Error: Invalid: File is too small to be a well-formed file - error when using feather in R

I'm trying to use feather (v. 0.0.1) in R to read a fairly large (3.5 GB) CSV file with 21178665 rows and 16 columns.
I use the following lines to load the file:
library(feather)
path <- "pp-complete.csv"
df <- read_feather(path)
But I get the following error:
Error: Invalid: File is too small to be a well-formed file
There's no explanation in the documentation of read_feather, so I'm not sure what the problem is. I guess this function expects a different file format, but I'm not sure what that would be.
By the way, I can read the file with read_csv from the readr library, but it takes a while.
The feather file format is distinct from the CSV file format; they are not interchangeable, and the read_feather function cannot read plain CSV files.
If you want to read CSV files quickly, your best bets are probably readr::read_csv or data.table::fread. For large files, it will still usually take a while just to read the data from disk.
After you've loaded the data into R, you can create a file in the feather format with write_feather so you can read it with read_feather the next time.
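A short sketch of that workflow (file names taken from the question):
library(readr)
library(feather)
df <- read_csv("pp-complete.csv")          # slow, but only needed once
write_feather(df, "pp-complete.feather")   # save a feather copy
df <- read_feather("pp-complete.feather")  # fast on every later load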

Read a zipped .csv file in R

I have been trying hard to solve this, but I cannot get my head around how to read zipped .csv files in R. I could first unzip the files and then read them, but since the unzipped data amounts to around 22 GB, I guess it is more practical to handle the files while they are compressed.
I basically have many .csv files, which I compressed one by one into individual .7z files. Each file is named like file1.csv, file2.csv, etc., which zipped became file1.csv.7z, file2.csv.7z, etc., respectively.
If I use the following command:
data <- read.table(unz("substn-20100101.csv.7z", "substn-20100101.csv"), nrows=10, header=T, quote="\"", sep=",")
I get the message:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") : cannot open zip file 'substn-20100101.7z'
Any help would be much appreciated, thank you in advance.
First of all, if your problem is RAM (you said the unzipped data is around 22 GB), using compressed files won't solve it: after read.table, for example, the whole file is loaded into memory. If you are using these files for some kind of modeling, I advise you to look at the ff and bigmemory packages.
Another solution is Revolution R, which has a free academic license. Revolution R provides big-data capabilities, and you can manage these files easily with packages like RevoScaleR.
Yet another solution is Postgres + MADlib + PivotalR. After ingesting the data into Postgres, use the PivotalR package to access it and build models with the MADlib library directly from the R console.
BUT, if you are planning something that can be done with chunks of data (a summary, for example), you can use the iterators package. I will provide a use case to show how this can be done. Get the Airlines data for 1988 and follow this code:
> install.packages('iterators')
> library(iterators)
> con <- bzfile('1988.csv.bz2', 'r')
OK, now you have a connection to your file. Let's create an iterator:
> it <- ireadLines(con, n=1) ## read just one line from the connection (n=1)
Just to test:
> nextElem(it)
and you will see something like:
1 "1988,1,9,6,1348,1331,1458,1435,PI,942,NA,70,64,NA,23,17,SYR,BWI,273,NA,NA,0,NA,0,NA,NA,NA,NA,NA"
> nextElem(it)
and you will see the next line, and so on. Be aware that you are reading one line at a time, so you are not loading the whole file into RAM.
If you want to read line by line until the end of the file, you can use
> tryCatch(expr=nextElem(it), error=function(e) return(FALSE))
for example. When the file ends, it returns a logical FALSE.
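Putting the pieces together, a small sketch of a full pass over the file (the per-line processing is a placeholder):
repeat {
  line <- tryCatch(nextElem(it), error = function(e) FALSE)
  if (identical(line, FALSE)) break      # end of file reached
  fields <- strsplit(line, ",")[[1]]     # process the current line here
}
close(con)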
If I understand the question correctly, at least on Windows you could use the 7-Zip command line.
For the sake of simplicity, put 7za.exe in your R working directory (along with your 7zip files) and create a .bat file with the following text in it:
7za e *.7z -y
...then in R you run the following code:
my_batch <- "your_bat_file_name.bat"
shell.exec(shQuote(my_batch, type = "cmd"))
Then you just read.table()...
It works for me.
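Alternatively, a sketch that skips the .bat file and calls 7-Zip directly from R (it assumes 7za.exe is in the working directory or on the PATH):
system2("7za", args = c("e", "*.7z", "-y"))   # extract every .7z archive
data <- read.table("substn-20100101.csv", nrows = 10,
                   header = TRUE, quote = "\"", sep = ",")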
According to the readr package documentation, readr::read_csv and friends will automatically decompress files ending in .gz, .bz2, .xz, or .zip. Although .7z is not mentioned, perhaps a solution is to switch to one of these compression formats and then use readr (which also offers a number of other benefits). If your data is compressed with zip, your code would be:
library(readr)
data <- read_csv("substn-20100101.csv.zip", n_max=10)
