I want to read a pretty large CSV file from S3 that includes entries like Hawaii 21"" pizza. However, I noticed that if I use fread (which I prefer, as it's faster), entries containing two double quotes change into Hawaii 21"""" pizza. This issue does not occur if I use read.csv.
I noticed the warning message recommends adding quote="" to avoid the issue, but how can I pass that in through the s3read_using function?
I can use gsub to make the extra quotes disappear, but I'm still wondering if there's a direct solution.
Below is my read-in code:
table <- s3read_using(FUN=fread, object='mytable.csv', bucket="mybucket/tables")
table <- s3read_using(FUN=read.csv, object='mytable.csv', bucket="mybucket/tables")
Thanks in advance!
Try:
table <- s3read_using(FUN=fread, quote="", object='mytable.csv', bucket="mybucket/tables")
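This works because s3read_using() forwards any extra arguments to FUN through its ... argument, so quote="" (or any other fread argument, such as sep or colClasses) is handed to fread when it parses the file downloaded from S3.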
I have dug into rlist and purrr and found them quite helpful for working with lists of pre-structured data. I have tried to solve the problems that arose on my own to improve my coding skills - so thanks to the community for helping out! However, I have now reached a dead end:
I want to write code such that we can throw our Excel files (in xlsm format) into a folder and R does the rest.
I import my data using:
library(readxl)  # for read_excel()
vec.files <- list.files(pattern = "\\.xlsm$")
vec.numbers <- gsub("\\.xlsm$", "", vec.files)
list.alldata <- lapply(vec.files, read_excel, sheet = "XYZ")
names(list.alldata) <- vec.numbers
The data we read in is a combination of characters, dates (...).
When I try to use the rlist package, everything works fine until I try to filter on names which in the Excel file were not a fixed entry (e.g. Measurable 1) but a reference to another field (e.g. =Table1!A1, or a named reference).
If I try to call such an element, I get this error:
list.map(list.alldata, NameWhichWasAReferenceToAnotherFieldBefore)
Error in eval(.expr, .evalwith(.data), environment()) :
  object 'Namewhichwasareferencetoanotherfieldbefore' not found
I am quite surprised, because if I call
names(list.alldata[[1]])
I get a vector with the correct entries / names.
As I identified read_excel() as the likely cause, I tried adding col_names = TRUE, but that did not help. Also, col_names = FALSE pulls the correct entries into the dataset.
I assume that exporting the data as a .csv would help, but this is not an option. Can this be easily done by R in a pre-loop?
In my way of working, accessing the data by name is essential and there is no workaround, so I really appreciate your help!
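In case it helps: since you already use purrr, note that map() can extract elements by a plain character string, which avoids evaluating the (possibly non-syntactic) name as an R expression the way list.map() does. A minimal sketch, assuming the name is passed exactly as names(list.alldata[[1]]) reports it:
library(purrr)
# extract that column from every imported sheet by its literal name
col.values <- map(list.alldata, "NameWhichWasAReferenceToAnotherFieldBefore")
# equivalent long form using the formula shorthand
col.values <- map(list.alldata, ~ .x[["NameWhichWasAReferenceToAnotherFieldBefore"]])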
I'd like to access data from GitHub repositories directly from R.
When importing the data, I get this error:
cols(<!DOCTYPE html> = col_character()) 60 parsing failures.
How can I fix that? My code:
data <- read_csv(curl("https://github.com/datatto/AU25-de-Mayo/blob/master/AU_F_Properati_v2.csv"))
The key, as @karthik commented, is to change the URL: replace https://github.com/ with https://raw.githubusercontent.com/ and drop the blob/ part.
i.e. changing:
https://github.com/datatto/AU25-de-Mayo/blob/master/AU_F_Properati_v2.csv
to:
https://raw.githubusercontent.com/datatto/AU25-de-Mayo/master/AU_F_Properati_v2.csv
(carefully compare the URLs and you'll spot the differences)
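If you have to rewrite many such links, a tiny helper can automate the substitution. This is just a sketch, and github_raw is a made-up name, assuming the usual github.com/user/repo/blob/branch/path layout:
github_raw <- function(url) {
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  sub("/blob/", "/", url, fixed = TRUE)  # drop the blob/ segment
}
github_raw("https://github.com/datatto/AU25-de-Mayo/blob/master/AU_F_Properati_v2.csv")
# [1] "https://raw.githubusercontent.com/datatto/AU25-de-Mayo/master/AU_F_Properati_v2.csv"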
Besides that, it seems your .csv file is formatted using ";" as the field separator and "," as the decimal separator; this is common with data in languages such as Spanish, where the comma is reserved as the decimal separator.
To parse the file properly, simply use read.csv2() or read_csv2(), i.e.:
library(tidyverse)
mydata <- read_csv2("https://raw.githubusercontent.com/datatto/AU25-de-Mayo/master/AU_F_Properati_v2.csv")
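(read_csv2() is simply a preset with the semicolon separator and comma decimal mark baked in. If your separators ever differ from that convention, the explicit base-R equivalent would be something like this sketch:)
mydata <- read.csv("https://raw.githubusercontent.com/datatto/AU25-de-Mayo/master/AU_F_Properati_v2.csv",
                   sep = ";", dec = ",")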
I had the same problem and found a short answer here. One way is to use this line of code in R:
readr::read_csv("https://raw.github.com/user/repository/branch/file.name")
I export my CSV file with Python; numbers are wrapped as ="10000000000" in the cells, for example:
name,price
"something expensive",="10000000000",
In order to display such numbers correctly, I prefer to wrap big numbers or strings of digits (such as order IDs) in this format, so someone can open the file directly without reformatting the column.
It displays correctly in Excel or Numbers, but when I import it into R using read.csv, the cells' values show up as =10000000000.
Is there any solution to this?
Thank you
How about:
yourcsv <- read.csv("yourcsv.csv")
yourcsv$price <- gsub("=", "", yourcsv$price)  # assign back to the column, not the whole data frame
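If you also need price as a number rather than a string (an assumption about your use case, and note that if any literal quote characters survive the import you can strip them in the same pass):
yourcsv$price <- as.numeric(gsub('[="]', "", yourcsv$price))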
Also, in my experience read_csv() from the readr package (part of the tidyverse) reads data in much faster than read.csv(), and I think it also has more logic built in for non-ideal cases, so it may be worth trying.
I've got a CSV file that I am reading into an R script using fread. The resulting variable is a vector, which is what I need for the next step in my process. There are values in my CSV file such as 'Energy \nElectricity', and the intention is that these will be labels for a chart, with a line break between (in this case) 'Energy' and 'Electricity' for formatting reasons.
When I manually code the vector as myVec <- c('Energy \nElectricity'), this works fine and the line break is maintained.
When I read the data in using fread, however, the resulting vector is effectively c('Energy \\nElectricity'), i.e. the process has inserted an extra escape character and the formatting is lost.
My questions are as follows:
1. Is there a way to use fread to maintain these line breaks at all?
2. If not, can I format them differently in my csv file?
3. If not, can I use gsub or similar to turn the extra escape character back into a real line break once the file has been read into a vector?
I have tried all manner of ways to implement gsub (and sub), but they either get rid of both escape characters, such as gsub("\\\\", "\\", myVec), which gives [1] "Energy nElectricity", or they throw an error. I think I am missing something obvious. Any help is appreciated.
If nobody comes up with a better solution, this is how you would clean it using gsub:
gsub("\\n", "\n", "Energy \\nElectricity", fixed = TRUE)
The fixed = TRUE option treats the pattern as a literal string, ignoring all regex metacharacters, and is also considerably faster than fixed = FALSE.
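Applied to the vector read in by fread, that would look like this (myVec stands in for whichever column you extracted; a sketch, not tested against your file):
myVec <- gsub("\\n", "\n", myVec, fixed = TRUE)
cat(myVec[1])  # cat() renders the real newline; print() would still show it as \n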
This code works; however, I wonder if there is a more efficient way. I have a CSV file that contains a single column of ticker symbols. I read this CSV into R and apply functions to each ticker using a for loop.
I read in the CSV, then go into the data frame and pull out the character vector that the for loop needs to run properly.
SymbolListDataFrame <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors = FALSE)
SymbolList <- SymbolListDataFrame[[1]]
for (Symbol in SymbolList) {...}
Is there a way to combine the first two lines I have written into one? Maybe read.csv is not the best command for this?
Thank you.
UPDATE
I am using the readLines method suggested by Jake and Bartek. There is a warning, "incomplete final line found on" the csv file, but I ignore it since the data is read in correctly. (That warning just means the file does not end with a newline character.)
SymbolList <- readLines("DJIA.csv")
SymbolList <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors=F)[[1]]
The readLines function is the best solution here.
Please note that the read.csv function is not only for reading files with the .csv extension. It is simply the read.table function with parameters like header and sep set differently. Check the documentation for more info.
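For example, judging by the documented defaults, these two calls behave the same way (a sketch using the DJIA.csv file from the question):
read.csv("DJIA.csv", header = FALSE)
# is equivalent to
read.table("DJIA.csv", header = FALSE, sep = ",", quote = "\"", dec = ".",
           fill = TRUE, comment.char = "")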