I've been searching for similar problems but I can't find anything helpful.
I'm trying to open a portion of a big csv file with
# choosing a certain number of variables from more than 250 available in the file
resources <- c("P13_2_1", "P13_3_1", "P13_2_2", ...)
v <- fread("file.csv", select = resources, header = TRUE, encoding = "UTF-8")
After the file is opened, wherever there should be an NA there is a blank cell. However, when I try to see what's in any of the blank cells, I see this:
v$P13_2_1[2]
[1] "\r"
Similarly, the header of every column looks fine in the RStudio viewer, but when I print them in the console, the same \r is attached.
The problem is present with both read.csv and fread, and I've tried modifying the quote and na.strings arguments.
I would like to get rid of the "\r" and possibly substitute it with NA.
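A possible after-the-fact cleanup, assuming the stray \r is a leftover carriage return from Windows (or mixed) line endings — a sketch in data.table syntax, using resources as defined above:

library(data.table)

v <- fread("file.csv", select = resources, header = TRUE, encoding = "UTF-8")

# strip a trailing carriage return from the column names
setnames(v, sub("\r$", "", names(v)))

# strip a trailing carriage return from every character column, then turn
# the resulting empty strings into NA
char_cols <- names(v)[sapply(v, is.character)]
v[, (char_cols) := lapply(.SD, function(x) {
  x <- sub("\r$", "", x)
  fifelse(x == "", NA_character_, x)
}), .SDcols = char_cols]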
I'm trying to work with a file that is saved as a .csv file but is actually semicolon-delimited. The decimal points are commas.
Example of a row:
SAA1;6,022367813;10,9403136;5,807354922;3,169925001;3,807354922;8,636624621;5,247927513;5,459431619;9,09011242;4,247927513;4,087462841;5,247927513;4,584962501;11,17492568;4,754887502;6,857980995;7,409390936;7,499845887;8,224001674;10,19967234;9,638435914;4,700439718;6,14974712;2,807354922;0;7,348728154;4,700439718;6,820178962;4,700439718;6,044394119;1,584962501;6,044394119;6,375039431;3,807354922;9,087462841;8,74819285;5,614709844;8,330916878;6,62935662;5,169925001;6,442943496;2,321928095;8,312882955;9,240791332;2,807354922;9,06608919;6,539158811;5,64385619;4,584962501;6,700439718;6,108524457;7,539158811;6,658211483;8,982993575;5,285402219;8,744833837
I need to read this data into R and then work with it as numbers where decimal points are "."
Here's what I've tried:
read.csv2("filename.csv", row.names=1, sep=";",dec=",")
This almost worked. Most of the numbers were correctly read in with periods. However, all the numbers in certain columns remained separated by commas. I tried to fix this with:
temp<-sub(",", ".", data)
However, this did not quite work. It truncated several of the numbers and completely corrupted other ones. I have no idea why.
I've also tried opening the file in Sublime Text, where I found and replaced all commas with periods. This again worked for the majority of the data, but several numbers again became corrupted.
I've also tried reading in the file without changing the comma decimals, writing it out with period decimals, and then reading it in again:
temp<-read.csv2("filename.csv", row.names=1, sep=";")
write.csv2(temp, "filename_edited", sep = ";", dec=".", row.names = TRUE, col.names = TRUE)
temp2 <- read.csv2("filename_edited", sep=";", row.names=1)
This also didn't work. (I'm not surprised, I was getting desperate.)
What am I doing wrong? How can I fix it?
A common issue is leading or trailing white space around the numbers (e.g. " 342,5" instead of "342,5"). Have you tried using the strip.white=TRUE parameter, like:
read.csv2("filename.csv", row.names=1, sep=";", strip.white=TRUE)
If you otherwise pre-process the data, trimws() may also be useful in this context.
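If some columns still come through as character with comma decimals even after that, converting them column by column is safer than calling sub() on the whole data frame: sub() coerces a data frame to one deparsed string per column and replaces only the first comma in each, which would explain the truncation and corruption described above. A sketch, assuming the file from the question:

df <- read.csv2("filename.csv", row.names = 1, strip.white = TRUE)
# find the columns that were left as character and convert them to numeric,
# trimming stray white space and swapping the decimal comma for a period
chr_cols <- sapply(df, is.character)
df[chr_cols] <- lapply(df[chr_cols],
                       function(x) as.numeric(sub(",", ".", trimws(x))))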
I have run into some problems while importing a pipe-delimited file. The file is consistently delimited, but something is getting in the way of R reading some of the delimiters while parsing: R reads in 10 columns when there should be 11, even though the appropriate number of pipes is in place.
A very small sample of the data can be found here: https://drive.google.com/file/d/1ek6-H5EWKCaPfDTfB2muqYBjJz1fM3pf/view
dat <- read_delim("~/Desktop/foo.txt", delim = "|", col.names = TRUE)
I've tried playing around with how R treats the quotes: quote = "\"" did nothing to help, and ignoring the quotes with quote = "" made an even bigger mess of the import.
Any thoughts on how to fix the problem?
Feel free to use fread() from the data.table package, as below.
library(data.table)
FOO3<-fread("~/Downloads/foo.txt",sep = "|",fill = T)
Below is the imported dataset I got.
I am trying to read in a file in R, using the following command (in RStudio):
fileRaw <- read.csv(file = "file.csv", header = TRUE, stringsAsFactors = FALSE)
file.csv looks something like this:
However, when it's read into R, I get:
As you can see, LOCATION is changed to ï..LOCATION for seemingly no reason.
I tried adding check.names = FALSE but this only made it worse, as LOCATION is now replaced with ï»¿LOCATION. What gives?
How do I fix this? Why is R/RStudio doing this?
There is a UTF-8 BOM at the beginning of the file. Try reading as UTF-8, or remove the BOM from the file.
The UTF-8 representation of the BOM is the (hexadecimal) byte sequence
0xEF,0xBB,0xBF. A text editor or web browser misinterpreting the text
as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
Edit: looks like using fileEncoding = "UTF-8-BOM" fixes the problem in RStudio.
Using fileEncoding = "UTF-8-BOM" fixed my problem and read the file with no issues.
Using fileEncoding = "UTF-8"/encoding = "UTF-8" did not resolve the issue.
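For reference, here is the call from the question with that fix applied:

fileRaw <- read.csv(file = "file.csv", header = TRUE, stringsAsFactors = FALSE,
                    fileEncoding = "UTF-8-BOM")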
I'm trying to read a giant data frame with cbc.read.table:
my.df <- cbc.read.table("df.csv",sep = ";", header =F)
This is what I get:
Error in cbc.read.table("2012Q2.csv", sep = "|", header = F) :
No rows to read
The working directory is set correctly. In principle it works using read.table, except that it doesn't read in all the lines (there are about two million).
Does anybody have an idea what I can do about this?
SOLUTION:
Hi again, the following thread helped me out:
R: Why does read.table stop reading a file?
The problem was caused by quotation marks, probably because some of them were not closed. I simply used an editor and deleted all double and single quotation marks, as well as all hash marks. It's working now.
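For the record, the same effect can usually be achieved without hand-editing the file by telling read.table to ignore quote and comment characters entirely (a sketch, assuming the file name and separator from the question):

# disable quote and comment handling so unbalanced " or ' characters and
# stray # characters cannot swallow or truncate rows
my.df <- read.table("df.csv", sep = ";", header = FALSE,
                    quote = "", comment.char = "")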
@Anthony: Thanks for your question. I noticed that the problem did not occur in the first three lines, which is how I got the idea that it's an issue with the file. Thanks!
Paul