data <- read.table(
file = "Data/data_2021_03.txt",
header = TRUE,
sep = "\t",
stringsAsFactors = TRUE
)
I have large TXT files that I try to read with R. The problem is that the header contains a hash character (#). R treats this as the start of a comment and stops reading the line at the '#', so I get the error that I have more columns than column names. For one file I replaced the '#' with '' in a text editor, and that works for that one file. But I don't want to change my txt files. So how can I read a txt file that contains a '#', replacing it with '' in R?
Just set the comment character to an empty string (see the last line):
data <- read.table(
file = "Data/data_2021_03.txt",
header = TRUE,
sep = "\t",
stringsAsFactors = TRUE,
comment.char = ""
)
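A quick way to see the effect, using inline text instead of a file (the data here is made up for illustration):

```r
# Made-up inline data: the second header name is "#count"
txt <- "id\t#count\tname\n1\t10\ta\n2\t20\tb"

# Default comment.char = "#": the header line is truncated after "id",
# so 3 data columns meet 1 header name -> "more columns than column names"
bad <- try(read.table(text = txt, header = TRUE, sep = "\t"), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# comment.char = "" turns comment handling off; "#" is read literally
ok <- read.table(text = txt, header = TRUE, sep = "\t", comment.char = "")
names(ok)  # "id" "X.count" "name" (check.names = TRUE sanitizes "#count")
```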
In R, how does one read a file delimited by the vertical bar "|" (ASCII: |), and also convert that delimiter? I need to split on whole values inside the file, so strsplit() does not help me.
I have R code that reads a CSV file, but it still retains the vertical bar "|" character. This file uses "|" as the separator between fields. When I try to read it with read.table() I get a comma "," separating every individual character. I also tried tab_spanner_delim(delim = "|") from dplyr to convert the vertical bar after read.delim("file.csv", sep="|") read the file, but even read.delim() does not work. I am new to special characters in R programming.
read.table(text = gsub("|", ",", readLines("file.csv")))
dat_csv <- read.delim("file.csv", sep="|")
x <- dat_csv %>% tab_spanner_delim(delim = "|")
dput() from read.table(text = gsub("|", ",", readLines("file.csv")))
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,0,|,0,0,:,0,0,|,|,A,M,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\",",
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,6,|,0,0,:,0,0,|,4,.,9,|,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\","
dput() from dat_csv <- read.delim("file.csv", sep="|")
"RD|I|78|010|0012|88101|1|7|105|116|19991220|00:00||AM|6|||||||||||||",
"RD|I|78|010|0012|88101|1|7|105|116|19991226|00:00|4.9||6|||||||||||||"
We can read the data line by line using readLines, remove the unwanted characters at the end of each line using trimws, paste the strings into one string with the newline (\n) character as the collapse argument, and use this string in read.table to read the data as a data frame.
data <- read.table(text = paste0(trimws(readLines('file.csv'),
whitespace = '[", ]'), collapse = '\n'), sep = '|')
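On a small inline example (made-up lines that mimic the quoted, comma-terminated lines in the dput() output above), the approach looks like this:

```r
# Made-up lines mimicking the file: quoted and comma-terminated
lines <- c('"RD|I|78",', '"RD|I|79",')

# Strip quotes, commas and spaces from both ends of each line
clean <- trimws(lines, whitespace = '[", ]')
clean  # "RD|I|78" "RD|I|79"

# Collapse into one newline-separated string and split on "|"
data <- read.table(text = paste0(clean, collapse = '\n'), sep = '|')
dim(data)  # 2 rows, 3 columns
```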
I am trying to download some CSVs from some links. Most of the CSVs are separated by ;, but one or two are separated by ,. Running the following code:
foo <- function(csvURL){
downloadedCSV = read.csv(csvURL, stringsAsFactors = FALSE, fileEncoding = "latin1", sep = ";")
return(downloadedCSV)
}
dat <- purrr::map(links, foo)
Gives me a list of 3 data.frames. Two of them have 2 columns (correctly read with the ; separator) and one of them has 1 column (incorrectly read, because that file uses the , separator).
How can I build into the function something like: if the number of columns == 1, re-read the data, this time using , instead of ;? I tried passing sep = ";|," to the read.csv function but had no luck.
Links data:
links <- c("https://dadesobertes.gva.es/dataset/686fc564-7f2a-4f22-ab4e-0fa104453d47/resource/bebd28d6-0de6-4536-b522-d013301ffd9d/download/covid-19-total-acumulado-de-casos-confirmados-pcr-altas-epidemiologicas-personas-fallecidas-y-da.csv",
"https://dadesobertes.gva.es/dataset/686fc564-7f2a-4f22-ab4e-0fa104453d47/resource/b4b4d90b-08cf-49e4-bef1-5608311ce78a/download/covid-19-total-acumulado-de-casos-confirmados-pcr-altas-epidemiologicas-personas-fallecidas-y-da.csv",
"https://dadesobertes.gva.es/dataset/686fc564-7f2a-4f22-ab4e-0fa104453d47/resource/62990e05-9530-4f2f-ac41-3fad722b8515/download/covid-19-total-acumulado-de-casos-confirmados-pcr-altas-epidemiologicas-personas-fallecidas-y-da.csv"
)
We can also specify sep as an argument:
foo <- function(csvURL, sep){
downloadedCSV = read.csv(csvURL, stringsAsFactors = FALSE,
fileEncoding = "latin1", sep = sep)
return(downloadedCSV)
}
lstdat <- map2(links, c(";", ",", ";"), ~ foo(.x, sep=.y))
Or use fread from data.table, which can detect the delimiter automatically:
foo <- function(csvURL){
downloadedCSV = data.table::fread(csvURL, encoding = "Latin-1")
return(downloadedCSV)
}
dat <- purrr::map(links, foo)
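Alternatively, the "if the number of columns == 1, re-read" idea from the question can be written directly. This is a sketch (the retry logic is illustrative, and the file is fetched twice in the worst case):

```r
foo <- function(csvURL) {
  out <- read.csv(csvURL, stringsAsFactors = FALSE,
                  fileEncoding = "latin1", sep = ";")
  if (ncol(out) == 1) {  # ";" did not split anything: retry with ","
    out <- read.csv(csvURL, stringsAsFactors = FALSE,
                    fileEncoding = "latin1", sep = ",")
  }
  out
}
# dat <- purrr::map(links, foo)
```

Note that a file that genuinely has a single column would also trigger the retry; for these links every file has at least two columns, so the heuristic is safe.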
How do I define "," as the column separator (sep) in read.csv in R?
I have tried read.csv(file = x, header = FALSE, sep = "",""), which doesn't work correctly.
sep can only be a single character, but you can open your file x, e.g. with readLines, and exchange your "," separator, e.g. with \t, by using gsub:
read.table(text=gsub("\",\"", "\t", readLines("x")))
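A self-contained sketch on one inline line (made-up data). The outer quotes at the start and end of each line are stripped as well here, since otherwise read.table() re-interprets them as quoting:

```r
line <- '"a","b","c"'                 # fields separated by ","
tabs <- gsub('","', "\t", line)       # '"a\tb\tc"'
clean <- gsub('^"|"$', "", tabs)      # drop the leftover outer quotes
read.table(text = clean, sep = "\t")  # three columns: a, b, c
```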
I have some strings in one of the columns of my data frame that look like:
bem\\2015\black.rec
When I export the data frame into a text file with the following line:
write.table(data, file = "sample.txt", quote = FALSE, row.names = FALSE, sep = '\t')
Then in the text file the text looks like:
bem\2015BELblack.rec
Do you know an easy way to escape the backslashes when writing the table to a text file, so that they are kept?
The way I have resolved this is by converting backslashes into forward slashes:
dataset <- read_delim(dataset.path, delim = '\t', col_names = TRUE, escape_backslash = FALSE)
dataset$columnN <- str_replace_all(dataset$Levfile, "\\\\", "//")
dataset$columnN <- str_replace_all(dataset$columnN, "//", "/")
write.table(dataset, file = "sample.txt", quote = FALSE, row.names = FALSE, sep = '\t')
This exports the text imported as bem\\2015\black.rec with the required slashes: bem//2015/black.rec
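The same replacement chain can be checked on a single string using base R (no stringr needed); the string literal below encodes the raw text bem\\2015\black.rec:

```r
s <- "bem\\\\2015\\black.rec"                 # raw text: bem\\2015\black.rec
step1 <- gsub("\\\\", "//", s)                 # every single "\" becomes "//"
step2 <- gsub("//", "/", step1, fixed = TRUE)  # collapse "//" back to "/"
step2                                          # "bem//2015/black.rec"
```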