How to read a "|" (vertical bar) delimited external CSV file in R - r

In R, how does one read a file whose delimiter is the vertical bar "|" (ASCII 124), or convert that delimiter to something else? I need to split on the whole numbers inside the file, so strsplit() does not help me.
I have R code that reads the CSV file, but the result still retains the vertical bar "|" character. The file uses "|" as the separator between fields. When I try to read it with read.table() after gsub(), I get a comma "," separating every individual character. I also tried using dplyr with tab_spanner_delim(delim = "|") to convert the vertical bar after reading the file with read.delim("file.csv", sep="|"), but even read.delim() does not work. I am new to handling special characters in R.
read.table(text = gsub("|", ",", readLines("file.csv")))
dat_csv <- read.delim("file.csv", sep="|")
x <- dat_csv %>% tab_spanner_delim(delim = "|")
dput() from read.table(text = gsub("|", ",", readLines("file.csv")))
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,0,|,0,0,:,0,0,|,|,A,M,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\",",
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,6,|,0,0,:,0,0,|,4,.,9,|,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\","
dput() from dat_csv <- read.delim("file.csv", sep="|")
"RD|I|78|010|0012|88101|1|7|105|116|19991220|00:00||AM|6|||||||||||||",
"RD|I|78|010|0012|88101|1|7|105|116|19991226|00:00|4.9||6|||||||||||||"

We can read the data line by line with readLines(), strip the unwanted characters (quotes, commas and spaces) from both ends of each line with trimws(), paste the lines into a single string with the newline character (\n) as the collapse argument, and pass that string to read.table() with sep = '|' to get a data frame.
data <- read.table(text = paste0(trimws(readLines('file.csv'),
                                        whitespace = '[", ]'),
                                 collapse = '\n'), sep = '|')
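The gsub() attempt in the question most likely inserts commas everywhere because "|" is the alternation operator in a regular expression, so the pattern matches the empty string between characters; fixed = TRUE (or escaping it as "\\|") treats it literally. The sketch below uses two shortened sample lines modelled on the dput() output (not the real file) and adds colClasses = "character", an assumption on my part, so that codes such as 010 keep their leading zeros.

```r
# Shortened sample lines modelled on the question's dput() output (illustrative only)
raw_lines <- c(
  '"RD|I|78|010|19991220|00:00||AM|6|||",',
  '"RD|I|78|010|19991226|00:00|4.9||6|||"'
)

# "|" is a regex metacharacter; fixed = TRUE makes gsub() treat it as a literal bar
literal_swap <- gsub("|", ",", "a|b", fixed = TRUE)

# Strip the wrapping quotes, commas and spaces from both ends of every line
clean <- trimws(raw_lines, whitespace = '[", ]')

# Read with "|" as the separator; keep every column as character
dat <- read.table(text = paste(clean, collapse = "\n"),
                  sep = "|", colClasses = "character")
```

The whitespace argument of trimws() requires R 3.6.0 or later.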

Related

R RegEx gsub() Equivalent of "Line Operations>Remove Empty Lines (Containing Blank Characters)" in CSV file

I have a CSV file with several columns: Tweet, date, etc. The spaces in some Tweets are causing blank lines and undesired truncated lines.
What works:
1. Using Notepad++'s function "Line Operations>Remove Empty Lines (Containing Blank Characters)"
2. Search and replace: \r with nothing.
However, I need to do this for a large number of files, and I can't find a regular expression for gsub() in R that does what the Notepad++ function does.
Note that replacing ^[ \t]*$\r?\n with nothing and then \r with nothing, as suggested here, does work in Notepad++, but it does not work with gsub() in R.
I have tried the following code:
tx <- readLines("tweets.csv")
subbed <-gsub(pattern = "^[ \\t]*$\\r?\\n", replace = "", x = tx)
subbed <-gsub(pattern = "\r", replace = "", x = subbed)
writeLines(subbed, "output.csv")
You may use
library(readtext)
tx <- readtext("tweets.csv")
subbed <- gsub("(?m)^\\h*\\R?", "", tx$text, perl=TRUE)
subbed <- gsub("\r", "", subbed, fixed=TRUE)
writeLines(trimws(subbed), "output.csv")
The readtext library reads the file into a single variable, so all line break characters are kept.
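For comparison, here is a minimal base-R sketch, assuming the goal is only to drop blank (whitespace-only) lines and stray \r characters. readLines() returns each line without its terminator, so a pattern that requires \n can never match, which is probably why the gsub() call in the question changes nothing. The temporary file and its contents are made up for illustration.

```r
# Hypothetical input file with blank and whitespace-only lines
tmp <- tempfile(fileext = ".csv")
writeLines(c("tweet,date", "   ", "hello,2020-01-01", "", "world,2020-01-02"), tmp)

tx <- readLines(tmp)

# No element contains "\n", so "^[ \\t]*$\\r?\\n" has nothing to match
has_newline <- any(grepl("\n", tx, fixed = TRUE))

# Drop lines that are empty or contain only spaces/tabs, then remove stray "\r"
kept <- tx[!grepl("^[ \t]*$", tx)]
kept <- gsub("\r", "", kept, fixed = TRUE)

out <- tempfile(fileext = ".csv")
writeLines(kept, out)
```

This only filters lines; it does not rejoin a tweet that was split across several lines.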

"," separator in read.csv - R

How to define "," as the column separator (sep) in read.csv in R?
I have tried read.csv(file=x,header = FALSE,sep = "",""), which does not work.
sep can only be a single character, but you can read your file x with readLines(), replace the "," separator with another character such as \t using gsub(), and pass the result to read.table():
read.table(text=gsub("\",\"", "\t", readLines("x")))
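A small sketch of the same idea on in-memory data, assuming the fields are separated by the literal three-character sequence "," and that no field contains a tab. The sample lines are invented; sep = "\t" is added so that fields containing spaces stay in one column, and fixed = TRUE makes the replacement a plain string match.

```r
# Invented sample lines whose separator is the three characters ","
raw_lines <- c('1","New York","x',
               '2","Los Angeles","y')

# Swap the multi-character separator for a tab, then split on tabs only
dat <- read.table(text = gsub('","', "\t", raw_lines, fixed = TRUE),
                  sep = "\t", stringsAsFactors = FALSE)
```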

Transform dput(remove) data.frame from txt file into R object with commas

I have a txt file (remove.txt) with this kind of data (RGB hex colors):
"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"...
These are colors I don't want in my analysis.
I also have an R object (nacreHEX) holding the same kind of data, which contains both the good colors and the colors I don't want in my analysis. So I use this code to remove them:
nacreHEX <- nacreHEX[!nacreHEX %in% remove]
It works when remove is an R object such as remove <- c("#DDDEE0", "#D8D9DB"...), but it doesn't work when it comes from a txt file and I read it into a data.frame, nor when I try remove2 <- as.vector(t(remove)).
So here is my code:
remove <- read.table("remove.txt", sep=",")
remove2 <-as.vector(t(remove))
nacreHEX <- nacreHEX [! nacreHEX %in% remove2]
head(nacreHEX)
With this, there are no commas after as.vector(), so maybe that's why it doesn't work.
How can I make an R vector with commas from this kind of data?
What step did I forget?
The problem is that your txt file is separated by ", " (comma plus space), not ",". The spaces end up in your strings:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"', sep = ",")
(rr = as.vector(t(rr)))
# [1] "#DDDEE0" " #D8D9DB" " #F5F6F8" " #C9CBCA"
You can see the leading spaces before the #. We can trim these spaces with trimws().
trimws(rr)
# [1] "#DDDEE0" "#D8D9DB" "#F5F6F8" "#C9CBCA"
Even better, you can use the argument strip.white to have read.table do it for you:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"',
sep = ",", strip.white = TRUE)
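Putting the pieces together, a sketch of the whole filtering step might look like the following. The contents of nacreHEX are invented for illustration, and comment.char = "" is an extra precaution I have added because the values start with #, which is read.table()'s default comment character.

```r
# Stand-in for remove.txt: values separated by ", " (comma plus space)
remove <- read.table(text = '"#DDDEE0", "#D8D9DB", "#C9CBCA"',
                     sep = ",", strip.white = TRUE,
                     comment.char = "", stringsAsFactors = FALSE)

# One-row data frame -> plain character vector
remove2 <- as.vector(t(remove))

# Hypothetical colour vector containing wanted and unwanted values
nacreHEX <- c("#DDDEE0", "#112233", "#C9CBCA", "#445566")

# Keep only the colours that are not listed in remove2
nacreHEX <- nacreHEX[!nacreHEX %in% remove2]
```

The commas printed by dput() are only display syntax; %in% compares the strings themselves, so the leading spaces were the only obstacle.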

Formatting a XLSX file in R into a custom text blob

I want to read an xlsx file and convert its data into one long text string. I want to format this string so that each row is enclosed in parentheses "()" and the values within a row stay comma-separated. For example, if the xlsx file looked like this:
one,two,three
x,x,x
y,y,y
z,z,z
after formatting the string would look like
header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)
How would you accomplish this task with R?
My first instinct was something like this, but I can't figure it out:
library(xlsx)
sheet1 <- read.xlsx("run_info.xlsx",1)
paste("(",sheet1[1,],")")
This works for me:
DF <- read.xlsx("run_info.xlsx",1)
paste0("header(", paste(names(DF), collapse = ","), ")",
paste(paste0("row(", apply(DF, 1, paste, collapse = ","), ")"),
collapse = ""))
# [1] "header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)"
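The same formatting can be tried without an Excel file. The sketch below wraps the logic in a helper, to_blob() (a name made up here), and builds the rows with do.call(paste, ...) rather than apply(), which avoids the space padding that apply()'s matrix conversion can introduce for numeric columns. The data frame mirrors the example sheet from the question.

```r
# In-memory stand-in for the sheet read by read.xlsx()
DF <- data.frame(one = c("x", "y", "z"),
                 two = c("x", "y", "z"),
                 three = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: header(...) followed by one row(...) per data row
to_blob <- function(df) {
  header <- paste0("header(", paste(names(df), collapse = ","), ")")
  # Paste the columns together element-wise, giving one string per row
  rows <- do.call(paste, c(unname(as.list(df)), sep = ","))
  paste0(header, paste(paste0("row(", rows, ")"), collapse = ""))
}

blob <- to_blob(DF)
```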

How to remove trailing blanks or linebreaks from CSV file created with write.table?

I want to write a data frame from R into a CSV file. Consider the following toy example
df <- data.frame(ID = c(1,2,3), X = c("a", "b", "c"), Y = c(1,2,NA))
df[which(is.na(df[,"Y"])), 1]
write.table(t(df), file = "path to CSV/test.csv", col.names = FALSE, sep = ",", quote = FALSE)
The output in test.csv looks as follows
ID,1,2,3
X,a,b,c
Y, 1, 2,NA
At first glance, this is exactly as I need it, BUT what cannot be seen in the code insertion above is that after the NA in the last line, there is another linebreak. When I pass test.csv to a Javascript chart on a website, however, the trailing linebreak causes trouble.
Is there a way to avoid this final linebreak within R?
This is a little convoluted, but obtains your desired result:
zz <- textConnection("foo", "w")
write.table(t(df), file = zz, col.names=F, sep=",", quote=F)
close(zz)
foo
# [1] "ID,1,2,3" "X,a,b,c" "Y, 1, 2,NA"
cat(paste(foo, collapse='\n'), file = 'test.csv', sep='')
You should end up with a file that has newline character after only the first two data rows.
You can use a command-line utility like sed to remove trailing whitespace from each line of a file:
sed 's/[[:space:]]*$//'
Or, you could begin by writing a single row then using append.
An alternative in a similar vein to the answer by @Thomas, but with slightly less typing. Send the output from write.csv to a character vector (capture.output). Concatenate its elements (paste), separating them with line breaks (collapse = "\n"). Write to file with cat.
x <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
cat(paste(x, collapse = "\n"), file = "df.csv")
You may also use format_csv from package readr to create a character vector with line breaks (\n). Remove the last end-of-line \n with substr. Write to file with cat.
library(readr)
x <- format_csv(df)
cat(substr(x, 1, nchar(x) - 1), file = "df.csv")
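To confirm that the final line break is really gone, one option is to inspect the last byte of the written file. The sketch below reuses the capture.output() approach on the toy data frame from the question and writes to a temporary file; the variable names are illustrative.

```r
df <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"), Y = c(1, 2, NA))

out <- tempfile(fileext = ".csv")

# One string per CSV line, joined with "\n" so no newline follows the last line
x <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
cat(paste(x, collapse = "\n"), file = out)

# Read the file back as raw bytes and look at the very last one
bytes <- readBin(out, what = "raw", n = file.size(out))
last_char <- rawToChar(bytes[length(bytes)])
ends_with_newline <- last_char == "\n"
```

Here the last character should be the final "A" of NA rather than a newline.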
