Issues Reading txt files [duplicate] - r

This question already has answers here:
Get "embedded nul(s) found in input" when reading a csv using read.csv()
(7 answers)
Closed 3 years ago.
I want to read a .txt file in R but I keep getting an embedded null error.
I have tried this code:
text_df = read.delim2(testfile, header = TRUE, sep = ',')
The original file ("testfile") looks like this:
UPC,HSY Item Description,Hsy Seasonal Segmentation,Store Nbr,Store Name,Building City,Building State/Prov,Building Postal Code,Store Type,WM Date,SeasonAndYear,OH_Qty,POS_Qty,POS_Sales
"0001070006638","Whprs Rbn Egg 13.75OZ","EAS $2.98 Candy Dish",1,"ROGERS, AR","ROGERS","AR","72756","Supercenter",1/27/2018 12:00:00 AM,"EAS2018",0,0,0.0000
"0001070006638","Whprs Rbn Egg 13.75OZ","EAS $2.98 Candy Dish",1,"ROGERS, AR","ROGERS","AR","72756","Supercenter",1/30/2018 12:00:00 AM,"EAS2018",0,0,0.0000
"0001070006638","Whprs Rbn Egg 13.75OZ","EAS $2.98 Candy Dish",1,"ROGERS, AR","ROGERS","AR","72756","Supercenter",2/2/2018 12:00:00 AM,"EAS2018",0,0,0.0000
I keep getting this error:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input

Try this:
df = read.table(yourFile, quote = '"', sep = ",", header = T)
This should treat the comma inside "ROGERS, AR" as part of the string and not as a separator.
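The quote setting takes care of commas inside quoted fields, but "embedded nulls" warnings often point to a different cause: the file may be stored as UTF-16, where every ASCII character is followed by a zero byte. Here is a minimal sketch of that case. It assumes the file is UTF-16LE (I can't confirm that for the file in the question), and the sample text, `utf16_path` and the shortened column list are made up for illustration.

```r
# Build a small UTF-16LE file (with a byte order mark) so the example is
# self-contained. The contents are a shortened, hypothetical version of the
# data shown in the question.
csv_text <- paste0(
  "UPC,Store Nbr,Store Name\n",
  "\"0001070006638\",1,\"ROGERS, AR\"\n"
)
utf16_path <- tempfile(fileext = ".txt")
bytes <- iconv(csv_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), bytes), utf16_path)

# Declaring the encoding lets R decode the two-byte characters instead of
# seeing a nul after every letter. colClasses keeps the leading zeros of UPC.
text_df <- read.csv(utf16_path, fileEncoding = "UTF-16LE",
                    colClasses = c(UPC = "character"))
print(text_df)
```

If your file is in a different encoding, `fileEncoding` needs to be changed to match.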

Related

problems while reading a .csv file

I'm using a tool called Meltwater to download tweets and news that contain certain keywords. The file seems to be a .csv, but when I try reading it with the following code:
test <- read.csv("C:/Users/Usuario/Downloads/esto_es_una_prueba - Nov 24, 2022 - 11 40 37 AM.csv")
it returns this error:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>D'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
If I open the file with Excel, instead of getting one comma separated column, such as:
Date,Headline,URL
24-Nov-2022 11:38AM,prueba,https://twitter.com/Jessipaola24/statuses/1595788900215230467
I get a regular Excel file, with the values already split into separate columns (the original post showed a screenshot here).
I tried different values of sep = and I tried reading the file as an .xlsx with different libraries. Nothing seems to work. I only manage to open the file in R if I first open it in Excel and save it as a .csv/.xlsx file. But since I have to read many files, I'd like to skip this step.
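The `<ff><fe>` at the start of the error is the byte order mark of a UTF-16LE file, so one thing worth trying is to check for that mark and pass the encoding explicitly. The sketch below is only that, a sketch: `has_utf16le_bom` and `read_export` are hypothetical helper names, and the tab separator is an assumption based on Excel splitting the file into columns. The demo file is generated on the fly so the code runs on its own.

```r
# Return TRUE when the file starts with the UTF-16LE byte order mark (ff fe).
has_utf16le_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 2)
  identical(first_bytes, as.raw(c(0xff, 0xfe)))
}

# Read one export, choosing the reader from the byte order mark.
# Assumption: UTF-16 exports are tab-separated; anything else goes to read.csv.
read_export <- function(path) {
  if (has_utf16le_bom(path)) {
    read.delim(path, fileEncoding = "UTF-16LE")
  } else {
    read.csv(path)
  }
}

# Self-contained demo: write a tiny tab-separated UTF-16LE file with a BOM.
demo_path <- tempfile(fileext = ".csv")
demo_text <- paste0(
  "Date\tHeadline\tURL\n",
  "24-Nov-2022 11:38AM\tprueba\t",
  "https://twitter.com/Jessipaola24/statuses/1595788900215230467\n"
)
demo_bytes <- iconv(demo_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), demo_bytes), demo_path)

# For a whole folder, build `paths` with list.files(..., full.names = TRUE)
# and apply the same reader to every file.
paths <- demo_path
exports <- lapply(paths, read_export)
print(exports[[1]])
```

Because the reader is applied with `lapply`, no manual re-saving in Excel is needed for each file, provided the assumptions above hold for the real exports.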

why i can't import the following dataset from uci

Good afternoon,
Assume we have the following function:
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link <- as.character(link)
  DT <- data.table::fread(link,
                          fill = TRUE,
                          na.strings = "?")
  DT <- DT[-1, ]
  DT <- as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT <- as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}
When I try to import the acute dataset from UCI, I get the following error:
acute=data_preprocessing("https://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
[100%] Downloaded 7276 bytes...
Error in data.table::fread(link, fill = TRUE, na.strings = "?") :
File is encoded in UTF-16, this encoding is not supported by fread(). Please recode the file to UTF-8.
I also tried:
acute=read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
Thank you for your help!
Use read.table with the appropriate encoding instead.
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link <- as.character(link)
  DT <- read.table(link,
                   fileEncoding = "UTF-16",
                   fill = TRUE,
                   na.strings = "?")
  DT <- DT[-1, ]
  DT <- as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT <- as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}
acute=data_preprocessing("https://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data")
head(acute)
    V1 V2  V3  V4  V5  V6  V7
2 35,9 no  no yes yes yes yes
3 35,9 no yes  no  no  no  no
4 36,0 no  no yes yes yes yes
5 36,0 no yes  no  no  no  no
6 36,0 no yes  no  no  no  no
7 36,2 no  no yes yes yes yes
Edit:
To detect the encoding of the data file automatically, you can use the guess_encoding function from the readr package.
data_preprocessing <- function(link, drop_last_column = TRUE) {
  link <- as.character(link)
  enc_guess <- readr::guess_encoding(link)
  # which.max() returns a single row even if two guesses tie on confidence
  enc <- enc_guess$encoding[which.max(enc_guess$confidence)]
  DT <- read.table(link,
                   fileEncoding = enc,
                   fill = TRUE,
                   na.strings = "?")
  DT <- DT[-1, ]
  DT <- as.data.frame(DT)
  if (drop_last_column == TRUE) {
    DT <- as.data.frame(DT)[, -ncol(DT)]
  }
  return(DT)
}

showing error when iam trying to import xlsx file into R

d=read.csv(file.choose())
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'C:\Users\xforce47\Desktop\airbnb .xlsx'
That's because you are trying to read an Excel document with a function meant for CSVs. Try
library(rio)
d <- import(file.choose(), setclass = "tbl")
instead. The setclass argument is optional and only useful if you work with the tidyverse.
Just save the file as .csv and read it.
Set the working directory correctly, then:
x <- read.csv('myfile1.csv')
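An .xlsx file is a zip archive, which is why read.csv sees binary data and complains about embedded nulls. If you are not sure what kind of file you were handed, you can look at its first bytes before picking a reader. This is a rough sketch; `looks_like_zip` is a made-up helper name, and the two temp files only stand in for real data.

```r
# Zip archives (and therefore .xlsx workbooks) start with the bytes 50 4b 03 04,
# i.e. "PK" followed by 0x03 0x04.
looks_like_zip <- function(path) {
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Stand-in files for the demo: one with a zip signature, one plain CSV.
fake_xlsx <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00, 0x00)), fake_xlsx)
plain_csv <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), plain_csv)

print(looks_like_zip(fake_xlsx))
print(looks_like_zip(plain_csv))
```

A file that passes this check should go to an Excel reader rather than read.csv; a file that fails it is at least not a workbook.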

How can I read from a text file in R?

How can I read from a text file? I have the following data in a text file-
A,B,C,D
E,F,G,H
I am trying to choose the file interactively.
read.delim(file.choose(), sep=",")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I wish to simply read the data and store it in a variable.
Just use read.csv:
your_df <- read.csv(file="/path/to/your_file.csv", header=FALSE)
your_df
  V1 V2 V3 V4
1  A  B  C  D
2  E  F  G  H
Setting header=FALSE tells read.csv that your input CSV file does not have a leading header row with column names.
Install and attach data.table, then use fread:
fread(file.choose(), sep = ",")
Your error could be due to encoding issues, so specify the right encoding (fread accepts "unknown", "UTF-8" or "Latin-1"):
fread(file.choose(), sep = ",", encoding = "UTF-8")

R - read.csv add rows when loading the dataset

I'm trying to read a .csv in R. Some rows of one column contain text with a comma inside double quotes: "example, another example"
But R alters the rows (it adds rows) when I try to read it like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This one, which I found by searching the internet, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
This call:
steekproef <- read.csv("steekproef.csv", header = T, sep =",", quote ="\"")
produces this warning:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
Gives data.frame: 1391160 obs. of 29 variables
str(steekproef) gives no error but a
'data.frame': 3103620 obs. of 29 variables:
The dataset has 29 columns and 3019438 rows
I don't think that the problem is caused by the "example, another example":
I created a testfile in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef <- read.csv('steekproef.csv', header = T, sep = ',')
or
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first try:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
gave me this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
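To make the comparison easy to repeat without a file on disk, the same test data can be passed through the `text` argument of read.csv. This is just a sketch of the experiment described above; `csv_text`, `ok` and `bad` are names chosen for the example.

```r
# The test data from above, held in a string instead of a file.
csv_text <- paste(
  "test,num",
  "\"example, another example\",1",
  "\"example, another example\",2",
  "example,3",
  "example,4",
  sep = "\n"
)

# With the double quote declared as the quote character, the comma inside
# the quoted field stays part of the value.
ok <- read.csv(text = csv_text, header = TRUE, quote = "\"", sep = ",")
print(dim(ok))
print(ok$test[1])

# With quoting disabled, the quoted rows split into three fields, so the
# call fails; tryCatch() captures the error instead of stopping the script.
bad <- tryCatch(
  read.csv(text = csv_text, header = TRUE, quote = "", sep = ","),
  error = function(e) e
)
print(conditionMessage(bad))
```

Since the small file imports cleanly with the default quoting, the extra rows in the real dataset more likely come from something else in the data, for example an unbalanced quote, which would also fit the "EOF within quoted string" warning.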
