I have a .txt file with one column consisting of 1040 lines (including a header). However, when loading it into R using the read.table() command, it's showing 1044 lines (including a header).
The snippet of the file looks like
L*H
no
H*L
no
no
no
H*L
no
Might it be an issue with R?
When opened in Excel, it doesn't show any errors either.
EDIT
The problem was that R read a line like L + H* as three separate lines: L, + and H*.
I used
table <- read.table(file.choose(), header=T, encoding="UTF-8", quote="\n")
You can try readLines() to see how many lines there are in your file, and feel free to use read.csv() to import it again and check that it gives the expected result. Sometimes a file is parsed differently because of an extra quote, an extra line break, or something similar.
Possible import steps:
1. Look at your data with a text editor or readLines() to figure out the delimiter and file type.
2. Determine an import method (type read and press tab, and you will see the available import functions; also check out readr).
3. Customize your arguments, for example whether you have a header or whether you want to skip the first n lines.
4. Look at the data again in R with View(head(data)) or View(tail(data)), and decide whether you need to repeat steps 2, 3 and 4.
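The first step above can be sketched roughly as follows. This is only an illustration: the temporary file and its five lines are made up to mimic the question's data, and count.fields() is used to show how many whitespace-separated fields R would see on each line with the default separator.

```r
# Hypothetical stand-in for the real file: a header plus four data lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("label", "L*H", "no", "L + H*", "no"), tmp)

# Number of physical lines in the file, independent of any parsing rules
raw_lines <- readLines(tmp)
n_lines <- length(raw_lines)

# Fields per line under read.table()'s default whitespace separator;
# a line with more than one field is a candidate for being split up
n_fields <- count.fields(tmp)
suspicious <- raw_lines[n_fields > 1]
print(n_lines)
print(suspicious)
```

If the number of rows you get from the import does not match n_lines, the lines listed in suspicious are a good place to start looking.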
Based on the data you have provided, try using sep = "\n". By using sep = "\n" we ensure that each line is read as a single column value. Additionally, quote does not need to be used at all. There is no header in your example data, so I would remove that argument as well.
All that said, the following code should get the job done.
table <- read.table(file.choose(), sep = "\n")
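As a small check of that idea, here is a sketch using inline text instead of file.choose(). The four sample lines are invented for illustration; the point is only that a value containing spaces stays in one row when sep = "\n" is used.

```r
# Made-up sample, including one value that contains spaces
txt <- "L*H\nno\nL + H*\nno"

# Each line is read as a single column value
tab <- read.table(text = txt, sep = "\n", stringsAsFactors = FALSE)

n_rows <- nrow(tab)
third_value <- as.character(tab$V1[3])
print(n_rows)
print(third_value)
```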
Related
I've been searching for similar problems but I can't find anything helpful.
I'm trying to open a portion of a big csv file with
#choosing a certain number of variables from more than 250 available in the file
resources<-c("P13_2_1","P13_3_1","P13_2_2",...)
v <- fread("file.csv", select = resources, header = TRUE, encoding = "UTF-8")
After the file is opened, wherever there should be NA there are blank cells. However, when I try to see what's in any of the blank cells, I see this
v$P13_2_1[2]
[1] "\r"
Similarly, the header of every column seems fine in the viewer of R Studio but when I try to see them in the console, there's the same \r attached.
The problem is present with both read.csv and fread, and I've tried modifying the quote and na.strings arguments.
I would like to get rid of the "\r" and possibly substitute it with NA.
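One possible way to do the clean-up after import, as a minimal sketch only: the toy data frame below is an assumption standing in for the real fread() result, and the approach simply strips a trailing "\r" from the column names and from every character column, then turns the now-empty strings into NA.

```r
# Toy data frame standing in for the imported data (assumed shape)
v <- data.frame(a = c("1", "\r", "3"),
                b = c("x\r", "\r", "z\r"),
                stringsAsFactors = FALSE)
names(v)[2] <- "b\r"  # simulate a header that picked up a trailing \r

# Strip the trailing carriage return from the column names
names(v) <- sub("\r$", "", names(v))

# Strip it from every character column and convert empty strings to NA
v[] <- lapply(v, function(col) {
  if (is.character(col)) {
    col <- sub("\r$", "", col)
    col[col == ""] <- NA
  }
  col
})
print(v)
```

A stray "\r" on every line usually means the file has Windows-style line endings, so converting the line endings before import may be a cleaner fix than patching the values afterwards.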
I'm trying to import the YRBS ASCII .dat file found here to analyze in R, but I'm having trouble importing the file. I followed the recommendations here and here but none seem to work. More specifically, it's still showing up as being one column/variable in R with 14,765 observations.
I've tried using the readLines(), read.table, and read.csv functions but none seem to be separating the columns.
Here are the specific codes I tried:
readLines("D:/Projects/XXH2017_YRBS_Data.dat", n=5)
read.csv("D:/Projects/XXH2017_YRBS_Data.dat", header = FALSE)
read.table("D:/Projects/XXH2017_YRBS_Data.dat", header = FALSE)
readLines and read.csv only provided one column and I got an error message from using read.table that stated that line 1 did not have 23 elements (which I'm assuming is just referring to the missing values?).
The data also starts from line 1 so I cannot use skip = 1 like some have suggested online.
How do I import this file into R so that I can separate the columns?
The file is bulky, so I did not download it.
First, get the Access version of the file, then try the following code.
Compare the result to the Access data.
data<- readr::read_table2("XXH2017_YRBS_Data.dat", col_names = FALSE, na = ".")
The program I am exporting my data from (PowerBI) saves the data as a .csv file, but the first line of the file is sep=, and then the second line of the file has the header (column names).
Sample fake .csv file:
sep=,
Initiative,Actual to Estimate (revised),Hours Logged,Revised Estimate,InitiativeType,Client
FakeInitiative1 ,35 %,320.08,911,Platform,FakeClient1
FakeInitiative2,40 %,161.50,400,Platform,FakeClient2
I'm using this command to read the file:
initData <- read.csv("initData.csv",
row.names=NULL,
header=T,
stringsAsFactors = F)
but I keep getting an error that there are the wrong number of columns (because it thinks the first line tells it the number of columns).
If I do header=F instead then it loads, but then when I do names(initData) <- initData[2,] then the names have spaces and illegal characters and it breaks the rest of my program. Obnoxious.
Does anyone know how to tell R to ignore that first line? I can go into the .csv file in a text editor and just delete the first line manually before I load it each time (if I do that, everything works fine) but I have to export a bunch of files and this is a bit stupid and tedious.
Any help would be much appreciated.
There are many ways to do that. Here's one:
all_content = readLines("initData.csv")
skip_first_line = all_content[-1]
initData <- read.csv(textConnection(skip_first_line),
row.names=NULL,
header=T,
stringsAsFactors = F)
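Another option, sketched here under the assumption that the sep=, line is always the very first line, is to let read.csv() drop it with the skip argument. The temporary file below just reproduces the fake sample from the question.

```r
# Recreate the fake sample file from the question in a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "sep=,",
  "Initiative,Actual to Estimate (revised),Hours Logged,Revised Estimate,InitiativeType,Client",
  "FakeInitiative1 ,35 %,320.08,911,Platform,FakeClient1",
  "FakeInitiative2,40 %,161.50,400,Platform,FakeClient2"
), tmp)

# skip = 1 discards the "sep=," line; the next line is then used as the header
initData <- read.csv(tmp,
                     skip = 1,
                     header = TRUE,
                     stringsAsFactors = FALSE)
print(dim(initData))
print(names(initData))
```

Because the header is parsed by read.csv() itself, the column names are converted to syntactically valid R names, which avoids the spaces and illegal characters you get from assigning initData[2,] by hand.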
Your file could be in a UTF-16 encoding. See hrbrmstr's answer in how to read a UTF-16 file:
The R function read.csv works as the following as stated in the manual: "If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names." That's good. However, when it comes to the function write.csv, I cannot find a way to write the csv file in a similar way. So, if I have a file.txt as below:
Column_1,Column_2
Row_1,2,3
Row_2,4,5
Then when I read it using a = read.csv('file.txt'), the row and column names are Row_x and Column_x as expected. However, when I write the matrix a to a csv file again, then what I get as a result from write.csv(a, 'file2.txt', quote=F) is as below:
,Column_1,Column_2
Row_1,2,3
Row_2,4,5
So, there is a comma in the beginning of this file. And if I would read this file again using a2 = read.csv('file2.txt'), then resulting a2 will not be the same as the previous matrix a. The row names of the matrix a2 will not be Row_x. That's, I do not want a comma in the beginning of the file. How can I get rid of this comma while using write.csv?
The two functions that you have mentioned, read.csv and write.csv, are just specific forms of the more generic functions read.table and write.table.
When I copy your example data into a .csv and try to read it with read.csv, R throws a warning and says that the header line was incomplete. Thus it resorted to special behaviour to fix the error. Because we had an incomplete file, it completed the file by adding an empty element at the top left. R understands that this is a header row, and thus the data appears okay in R, but when we write to a csv, it doesn't understand what is header and what is not. Thus the empty element only appearing in the header row created by R shows up as a regular element. Which you would expect. Basically it made our table into a 3x3 because it can't have a weird number of elements.
You want the extra comma there, because it allows programs to read the column names in the right place. To read the file in again you can do the following, assuming foo.csv is your data. Alternatively, you can fix this by manually adding the column and row names in R, including the missing element, to put everything in place.
To fix the wonky row names, you're going to want to add an extra option specifying which row is the row names (row.names = your_column_number) when you read it back in with the comma correctly in place.
y <- read.csv(file = "foo.csv") #this throws a warning because your input is incorrect
write.csv(y, "foo_out.csv")
x <- read.csv(file = "foo_out.csv", header = T, row.names = 1) #this will read the first column of the written file as the row names.
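To make the round trip concrete, here is a self-contained sketch. The data frame is rebuilt by hand from the question's example and written to a temporary file, so the file name and object names are only illustrative.

```r
# Rebuild the question's example as a data frame with row names
a <- data.frame(Column_1 = c(2, 4),
                Column_2 = c(3, 5),
                row.names = c("Row_1", "Row_2"))

# Write it out; the header line starts with a comma for the row-name column
tmp <- tempfile(fileext = ".csv")
write.csv(a, tmp, quote = FALSE)
header_line <- readLines(tmp)[1]

# Read it back, telling R that column 1 holds the row names
a2 <- read.csv(tmp, header = TRUE, row.names = 1)
print(header_line)
print(a2)
```

With row.names = 1 the leading comma stops being a problem: the row and column names of a2 match those of a.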
Play around with read/write.csv, but it might be worth while to move into the more generic functions read.table and write.table. They offer expanded functionality.
To read a csv in the generic function
y <- read.table(file = "foo.csv", sep = ",", header = TRUE)
This way you can specify the delimiter and easily read in tab-delimited files such as those exported from Excel (sep = "\t") or space-delimited files (sep = " ").
Hope that helps.
I am trying to use read.table() to import this TXT file into R (contains informations about meteorological stations provided by the WMO):
However, when I try to use
tmp <- read.table(file=...,sep=";",header=FALSE)
I get this warning
EOF within quoted string
and only 3514 of the 6702 lines appear in 'tmp'. From a quick look at the text file, I couldn't find any seemingly problematic characters.
As suggested in other threads, I also tried quote="". The EOF warning disappeared, but still only 3514 lines are imported.
Any advice on how I can get read.table() to work for this particular txt file?
It looks like your data actually has 11548 rows. This works:
read.table(url('http://weather.noaa.gov/data/nsd_bbsss.txt'),
sep=';', quote=NULL, comment='', header=FALSE)
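To illustrate why both quote and comment matter here, a small sketch with invented station-like records (the three lines below are made up, not taken from the WMO file). One field contains an apostrophe and another contains a #, which are the characters that read.table() treats as a quote and a comment marker by default.

```r
# Made-up records: an apostrophe in line 1 and a '#' in line 2
txt <- "01;O'Hare;x\n02;Site #2;y\n03;Plain;z"

# Disable quote and comment handling so every line is kept intact
stations <- read.table(text = txt, sep = ";", quote = "", comment.char = "",
                       header = FALSE, colClasses = "character")

n_rows <- nrow(stations)
second_col <- stations$V2
print(n_rows)
print(second_col)
```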
Edit: updated according to @MrFlick's comments below.
The problem is the line endings: R will not recognize "^M" (a carriage return). To load the file, you only need to specify the encoding like this:
read.table("nsd_bbsss.txt",sep=";",header=F,encoding="latin1",quote="",comment='',colClasses=rep("character",14)) -> data
But line 8638 has more than 14 columns, which differs from the other lines and may lead to an error message.