How to change the object when importing data in R

I have an Excel file containing a column of 10000 numbers that I wish to import into R.
However, no matter the method I use, the resulting object is either a list of 1 or a data frame of 10000 obs. of 1 variable (I have used read.csv on the .csv version of the file and read_xlsx on the .xlsx version). If this is expected, how can I work these objects into ordinary arrays?
I have tried importing the same files into MATLAB, and everything works normally there (the result is immediately an ordinary array).

If it's an Excel file, you might want to try the readxl package:
library(readxl)
dt <- read_excel("your_file_path")

Found an easy method: convert the data to a data frame, then convert it to a numeric matrix:
my_data <- data.frame(my_data)
my_data <- data.matrix(my_data)
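For the original question (a single column of numbers that should end up as an ordinary vector), a minimal sketch, assuming a hypothetical file numbers.xlsx with one numeric column:
library(readxl)

# read the sheet; the result is a tibble (a data frame), not a plain vector
my_data <- read_xlsx("numbers.xlsx")

# extract the first column as an ordinary numeric vector
my_vector <- my_data[[1]]

# or convert the whole data frame to a numeric matrix
my_matrix <- data.matrix(my_data)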

Related

Converting *.rds into *.csv file

I am trying to convert an *.rds file into a *.csv file. First, I import the file via data <- readRDS("file.rds"), and then I try to write the CSV file via write.csv(data, file = "file.csv").
However, this yields the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘structure("dgCMatrix", package = "Matrix")’ to a data.frame
How can I turn the *.rds file into a *.csv file?
Sparse matrices often cannot be converted directly into a data frame.
This approach may be very resource-intensive, but it can work: convert the sparse matrix to a dense matrix first and then save that to a CSV.
Try this:
write.csv(as.matrix(data),file="file.csv")
This solution is not efficient and might crash R, so save your work first.
As a general comment, the resulting CSV file will be huge, so it might be more helpful to use a more efficient storage format, such as a database engine.
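If the dense conversion does not fit in memory, a sketch of an alternative, assuming the object really is a dgCMatrix from the Matrix package: writeMM() stores it in the sparse MatrixMarket text format instead of a CSV.
library(Matrix)

# read the sparse matrix from the .rds file
data <- readRDS("file.rds")

# write it in sparse MatrixMarket format; no dense conversion needed
writeMM(data, file = "file.mtx")

# it can be read back later with readMM("file.mtx")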

Is there a way to read in a large document as a data.frame in R?

I'm trying to use ggplot2 on a large data set stored in a CSV file that I used to open with Excel.
I don't know how to convert this data into a data.frame. In particular, I have a date column in the following format: "2020/04/12:12:00". How can I get R to understand this format?
If it's a csv, you can use:
- the fread function from the data.table package; this will be the fastest way to read your csv
- read_csv or read_csv2 (for ;-delimited documents) from the readr package
If it's a .xls (or .xlsx) document, have a look at the readxl package.
All these functions import your data as a data.frame (with additional classes such as data.table for fread or tibble for read_csv).
Edit
Given your comment, it looks like your file is not an Excel file but a CSV. If you want to convert a column to a date type, assuming your data.table is called df:
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]
Note that you don't need cbind or even to reassign the data.table, because the := operator modifies it by reference.
As the message says, you don't need the extra precision of POSIXlt.
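Putting it together, a minimal sketch, assuming a hypothetical file data.csv whose first column holds the dates:
library(data.table)

# fast CSV import; the result is a data.table (which is also a data.frame)
df <- fread("data.csv")

# parse the first column into POSIXct date-times, by reference
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]

str(df$dates)  # POSIXct, ready for ggplot2 date scales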
Going by the question alone, I would suggest the openxlsx package; it has helped me reduce reading time significantly for large datasets. Three points you may find helpful, based on your question and the comments:
- the read command stays the same as in the xlsx package, but I would suggest using openxlsx::read.xlsx(file_path)
- the arguments are again the same, but in place of sheetIndex the argument is sheet, which takes the sheet number (or name)
- if date columns come in as character, a simple as.Date would work
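A minimal sketch, assuming a hypothetical file big_file.xlsx with a character column named date:
library(openxlsx)

# read the first sheet; the result is a plain data.frame
df <- read.xlsx("big_file.xlsx", sheet = 1)

# convert a character column to Date (the format is an assumption)
df$date <- as.Date(df$date, format = "%Y/%m/%d")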

Problem with exporting data from Excel to R (data not interpreted as numbers)

I have the following problem with exporting data from Excel and importing it into R.
My data in Excel contain "," as the decimal separator, so Excel interprets them as numbers.
But when I get this data into R, it contains "." and R interprets the values as text instead of numbers.
For example,
in Excel I have 12,765 and in R I have 12.765.
Do you have any idea how I can fix this?
I use the following code to import the data file into R:
library(openxlsx)
read.xlsx("pk.xlsx")
I suggest using the nice readxl package from the tidyverse (https://readxl.tidyverse.org/):
library(readxl)
read_excel("pk.xlsx")
You should find everything you need in the tidyverse to import different file formats (readr for csv, tsv, and so on...).
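If a column still arrives as text with comma decimals, a minimal sketch of the usual fix (the column name value is a hypothetical stand-in):
library(readxl)

df <- read_excel("pk.xlsx")

# replace the decimal comma with a period, then coerce to numeric
df$value <- as.numeric(gsub(",", ".", df$value))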

How to preserve empty rows when importing xlsx files into R with read.xlsx?

I'd like to be able to import xlsx files of varying lengths into R. I'm currently using the function read.xlsx from R's xlsx package to import the xlsx files into R, and unfortunately it drops empty rows. Is there a way that I can import every row of an xlsx file up until the last row with content without dropping empty rows?
That package has not been updated on CRAN since 2014 (though it looks like there has been some work in 2017 at https://github.com/dragua/xlsx), so I suggest either readxl or openxlsx:
readxl::read_excel("file_with_blank_row.xlsx")
openxlsx::read.xlsx("file_with_blank_row.xlsx", skipEmptyRows=FALSE)
As noted by r2evans, both readxl and openxlsx have options to turn off the skipping of empty rows. However, regardless of those switches, they will silently drop leading empty rows.
openxlsx doesn't seem to offer a way to alter that behaviour.
readxl has a range parameter that will indeed keep all empty rows. This is necessary if you're hoping to edit the same Excel file in very specific locations.
You need something like readxl::read_excel("path_to_your.xlsx", range = cell_limits(c(1, NA), c(NA, NA))). Using NA for all four values apparently causes the function to revert to the default and drop leading empty rows.
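A runnable sketch of that call, assuming a hypothetical file name; cell_limits() comes from the cellranger package and is re-exported by readxl:
library(readxl)

# anchoring the range at row 1 preserves leading empty rows
my_data <- read_excel("file_with_blank_row.xlsx",
                      range = cell_limits(c(1, NA), c(NA, NA)))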
Try this:
library("readxl")
my_data <- read_xlsx("file_with_blank_row.xlsx")

Using R to write a .mat file not giving the right output?

I had a .csv file that I wanted to read into Octave (I originally tried to use csvread). It was taking too long, so I tried to use R as a workaround: How to read large matrix from a csv efficiently in Octave
This is what I did in R:
forest_test <- read.csv("forest_test.csv")
library(R.matlab)
writeMat("forest_test.mat", forest_test_data = forest_test)
and then I went back to Octave and did this:
forest_test = load('forest_test.mat')
This is not giving me a matrix, but a struct. What am I doing wrong?
To answer your exact question, you are using the load function wrong. You must not assign its output to a variable if you just want the variables in the file to be inserted into the workspace. From Octave's load help text:
If invoked with a single output argument, Octave returns data
instead of inserting variables in the symbol table. If the data
file contains only numbers (TAB- or space-delimited columns), a
matrix of values is returned. Otherwise, 'load' returns a
structure with members corresponding to the names of the variables
in the file.
With examples, following our case:
## inserts all variables in the file in the workspace
load ("forest_test.mat");
## each variable in the file becomes a field in the forest_test struct
forest_test = load ("forest_test.mat");
But still, the link you posted about Octave being slow with CSV files refers to Octave 3.2.4, which is quite an old version. Have you confirmed this is still the case in a recent version (the latest release is 3.8.2)?
There is a function designed to convert data frames to matrices; see ?data.matrix:
forest_test <- data.matrix(read.csv("forest_test.csv"))
library(R.matlab)
writeMat("forest_test.mat", forest_test_data=forest_test)
