Convert columns into rows when importing a .csv in R

I'm looking for an efficient way to use read.csv (or an alternative) to read a .csv file that has hundreds of thousands of columns and essentially a single row.
The file with this structure is exported from MATLAB, which seems to prefer adding millions of rows rather than columns. When I open the file in Excel it does not load completely, so I cannot simply transpose it there.
The following works in R, though still slowly; I'm wondering if there is a better way?
library(data.table)
dfr <- as.data.frame(t(fread('filename.csv')))

If there is only a single row, we can read it with scan and convert it to a data.frame:
data.frame(Col=scan('filename.csv', skip=1, what=numeric(), sep=','))
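If you also want to keep the original column headers as row labels, here is a small extension of the same idea (this sketch assumes the first line of the file holds the names and that they are unique):
nms <- scan('filename.csv', nlines = 1, what = character(), sep = ',')   # header line
vals <- scan('filename.csv', skip = 1, what = numeric(), sep = ',')      # the single data row
data.frame(Col = vals, row.names = nms)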

Related

Import multiple CSVs from the web into one data frame

I want to read several CSV files from the web and save the data into one data frame.
If the files were on my computer this would be very easy, as I have seen, but I don't always want to download the files first.
The example:
"https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv"
These are the CSV files. Maybe it's possible with a function or a loop, but I don't know how.
Maybe you can help me.
Greetings
Reading files from the web is just as easy as reading them from your file system; you can just pass a URL instead of a file path to readr::read_csv() (you tagged your question with readr, so I assume you want to use that).
Assuming your files are in a vector:
files <- c("https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv")
You can use readr::read_csv to read a specific file, and combine them into one data frame with purrr::map_dfr:
df <- purrr::map_dfr(files, readr::read_csv)
This iterates over the contents of files, applies readr::read_csv to each of those elements, and combines them into one data frame, row-wise (hence the dfr suffix).
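If you also want to track which season each row came from, map_dfr's .id argument adds a column taken from the names of the input vector (the season labels below are just assumptions read off the URLs):
names(files) <- c("1819", "1718", "1617", "1516", "1415", "1314", "1213", "1112", "1011")
df <- purrr::map_dfr(files, readr::read_csv, .id = "season")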

Is there a way to read in a large document as a data.frame in R?

I'm trying to use ggplot2 on a large data set stored in a csv file, which I used to read with Excel.
I don't know how to convert this data into a data.frame. In particular, I have a date column in the following format: "2020/04/12:12:00". How can I get R to understand this format?
If it's a csv, you can use:
fread from the data.table package. This will be the fastest way to read your csv.
read_csv or read_csv2 (for ;-delimited documents) from the readr package.
If it's a .xls (or .xlsx) document, have a look at the readxl package.
All these functions import your data as data.frames (with additional classes, like data.table for fread or tibble for read_csv).
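A minimal sketch of the two csv readers side by side, on a hypothetical file data.csv:
library(data.table)
dt <- fread("data.csv")       # returns a data.table (which is also a data.frame)
library(readr)
tb <- read_csv("data.csv")    # returns a tibble (which is also a data.frame)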
Edit
Given your comment, it looks like your file is not an Excel file but a csv. If you want to convert a column to a date type, assuming your data.table is called df:
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]
Note that you don't need to use cbind or even reassign the data.table, because the := operator modifies it by reference.
As the message tells you, you don't need the extra precision of POSIXlt.
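As a quick sanity check that the format string matches the sample timestamp from the question:
as.POSIXct("2020/04/12:12:00", format = "%Y/%m/%d:%H:%M")
# [1] "2020-04-12 12:00:00" (in your local time zone)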
Going by the question alone, I would suggest the openxlsx package; it has helped me reduce the time significantly when reading large datasets. Three points you may find helpful, based on your question and the comments:
The read command stays much the same as in the xlsx package, but I would suggest using openxlsx::read.xlsx(file_path)
The arguments are again much the same, but in place of sheetIndex it is sheet, which accepts a sheet name or number
If date columns come in as character, a simple as.Date() call will convert them
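Putting those points together, a minimal sketch (the file name, sheet number, and date column name here are assumptions, not from the question):
library(openxlsx)
df <- read.xlsx("file.xlsx", sheet = 1)            # sheet instead of sheetIndex
df$date <- as.Date(df$date, format = "%Y-%m-%d")   # hypothetical character column; adjust the format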

read_csv does not separate at commas and does not capture separate rows

I am trying to parse a text log file like the one shown below. I can use the default read.csv to parse it:
test <- read.csv("test.txt", header=FALSE)
It separated all the comma-delimited parts; the result is not perfectly arranged in a data frame, but further manipulation can improve it.
However, I cannot seem to do the same using the readr package:
test <- read_csv("test.txt", col_names=FALSE)
All observations end up in one row, with no separation at the commas.
I am learning this package so any help would be great.
{"dev_id":"f8:f0:05:xx:db:xx","data":[{"dist":[7270,7269,7269,7275,7270,7271,7265,7270,7274,7267,7271,7271,7266,7263,7268,7271,7266,7265,7270,7268,7264,7270,7261,7260]},{"temp":0},{"hum":0},{"vin":448}],"time":4485318,"transmit_time":4495658,"version":"1.0"}
{"dev_id":"f8:xx:05:xx:d9:xx","data":[{"dist":[6869,6868,6867,6871,6866,6867,6863,6865,6868,6869,6868,6860,6865,6866,6870,6861,6865,6868,6866,6864,6866,6866,6865,6872]},{"temp":0},{"hum":0},{"vin":449}],"time":4405316,"transmit_time":4413715,"version":"1.0"}
{"dev_id":"xx:f0:05:e8:da:xx","data":[{"dist":[5775,5775,5777,5772,5777,5770,5779,5773,5776,5777,5772,5768,5782,5772,5765,5770,5770,5767,5767,5777,5766,5763,5773,5776]},{"temp":0},{"hum":0},{"vin":447}],"time":4461316,"transmit_time":4473307,"version":"1.0"}
{"dev_id":"xx:f0:xx:e8:xx:0a","data":[{"dist":[4358,4361,4355,4358,4359,4359,4361,4358,4359,4360,4360,4361,4361,4359,4359,4356,4357,4361,4359,4360,4358,4358,4362,4359]},{"temp":0},{"hum":0},{"vin":424}],"time":5190320,"transmit_time":5198748,"version":"1.0"}
Thanks to @Dave2e for pointing out that this file is in JSON format; I found the way to parse it using ndjson::stream_in.
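For reference, a minimal sketch of that approach (assuming the log lines shown above are saved in test.txt):
library(ndjson)
test <- ndjson::stream_in("test.txt")   # one row per JSON line, nested fields flattened into columns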

How to handle parsing hundreds files in R?

I want to parse last year's weather data, which is recorded in CSV files. Each CSV file contains one day of data, so I have 365 CSV files to parse. What is the best way to handle these files? As far as I know, I need to load all of them into R and bind them into one big data frame, but I don't know whether this is the best solution. What if I have more than one year of data files? Do I need to load all of them into memory? Or is there another way to handle them?
Each file is about 1 MB to 1.5 MB.
The easiest way to do this is to get all the file names with list.files, read the files into a list of data frames, then rbind all the frames together:
#setwd('dirwithallmycsvs')
x <- list.files(pattern = '.+\\.csv$')
out <- lapply(x, read.csv)
out2 <- do.call(rbind, out)
Your output should now be one data frame. You will need to take care that the columns are the same across all your files.
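An alternative sketch using data.table, which reads faster and can tag each row with its source file (handy once you move beyond one year of files):
library(data.table)
files <- list.files(pattern = '.+\\.csv$')
out <- rbindlist(setNames(lapply(files, fread), files), idcol = "file")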

Large csv file fails to fully read in to R data.frame

I am trying to load a fairly large csv file into R. It has about 50 columns and 2 million rows.
My code is pretty basic, and I have used it to open files before but none this large.
mydata <- read.csv('file.csv', header = FALSE, sep=",", stringsAsFactors = FALSE)
The result is that it reads in the data but stops after 1,080,000 rows or so. This is roughly where Excel stops as well. Is there a way to get R to read the whole file in? Why is it stopping around halfway?
Update: (11/30/14)
After speaking with the provider of the data, it was discovered that there may have been a corruption issue with the file. A new file was provided, which is also smaller and loads into R easily.
As, "read.csv()" read up to 1080000 rows, "fread" from library(data.table) should read it with ease. If not, there exists two other options, either try with library(h20) or with "fread" you can use select option to read required columns (or read in two halves, do some cleaning and can merge them back).
You can try using read.table and include the colClasses parameter to specify the types of the individual columns.
With your current code, R will first read all data as strings and then check each column for convertibility, e.g. to a numeric type, which needs more memory than reading it as numeric right away. colClasses will also allow you to ignore columns you might not need.
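A sketch of that approach; the column types below are assumptions and would need to match your 50 columns (use "NULL" as a type to skip a column entirely):
mydata <- read.table('file.csv', header = FALSE, sep = ',',
                     colClasses = c('numeric', 'character', rep('numeric', 48)),
                     stringsAsFactors = FALSE)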
