R: read csv as character string

New to R, and I haven't been able to locate an answer to this question. I am using the following to create a new variable that tags each line as containing a word, or not:
a$keywordtag <- (1:nrow(a) %in% c(sapply(needle, grep, a$text, fixed = TRUE)))
The 'needle', the words to search for, is currently defined as:
needle <- c("foo", "x", "y")
However, I want to read the needle in from a csv file. read.csv doesn't seem to have an option to read the values in as character strings, and stringsAsFactors=FALSE doesn't work either. Any suggestions on this?
The csv would be:
a <- read.table(text='
"foo"
"x"
"y"', header=FALSE)

You should have all the text in one string and end each line with a newline character:
(rc <- read.csv(text = paste0(needle, collapse = "\n"), header = FALSE))
V1
1 foo
2 x
3 y
identical(a, rc)
# [1] TRUE
You could also try readLines
read.csv(text = readLines(textConnection(needle)), sep = "\n", header = FALSE)
V1
1 foo
2 x
3 y
In the last line, if needle is actually a file, replace textConnection(needle) with the file name
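If the needle really does live in a file, a minimal sketch of that variant could look like this (a temp file stands in for the real path, which is an assumption here):

```r
# A temp file stands in for the real needle file on disk
tf <- tempfile(fileext = ".csv")
writeLines(c("foo", "x", "y"), tf)

# readLines() takes a path directly, so no textConnection() is needed
read.csv(text = readLines(tf), sep = "\n", header = FALSE)
#    V1
# 1 foo
# 2   x
# 3   y
```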

If stringsAsFactors=FALSE isn't working for you, you might focus on troubleshooting that. The following code should work just fine to read in as character strings:
> needle = read.csv("PathToNeedle\\needle.csv", stringsAsFactors=FALSE, header=FALSE)
> needle[1]
V1
1 foo
2 x
3 y
> typeof(needle[1,1])
[1] "character"
If the csv file you are reading into needle really is just:
"foo"
"x"
"y"
then that's very peculiar. What data frame do you get when you run read.csv? If it simply isn't working, an alternative is to specify the data type directly as follows:
needle = read.csv("PathToNeedle\\needle.csv", colClasses=c('character'), header=FALSE)
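As a self-contained check of the colClasses route (a temp file stands in for needle.csv, which is an assumed name):

```r
# Write a stand-in for the OP's needle.csv
tf <- tempfile(fileext = ".csv")
writeLines(c('"foo"', '"x"', '"y"'), tf)

# colClasses forces the single column to character
needle <- read.csv(tf, colClasses = c("character"), header = FALSE)
typeof(needle[1, 1])
# [1] "character"
```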

Related

Can I get str_extract to take a piece of text in the middle of a file name?

A different program I am using (Raven Pro) results in hundreds of .txt files that include nine variables with headers. I also need the file name that each line is being pulled from.
I am using stringr::str_extract() on names(dat_tab) in order to get a file name into the data frame built with rbindlist. My problem is that I only want a portion of the file name included.
Here's an example of one of my file names -
BIOL10_20201206_180000.wav.Table01.txt
so if I use the pattern "\\d+" to try and get numbers, it only picks up the 10 before the first underscore, but the portion of the file name I need is 20201206_180000
Any help to get around this is appreciated :)
library(data.table)  # rbindlist() comes from data.table, not plyr
myfiles <- list.files(path=folder, pattern="*.txt", full.names = FALSE)
dat_tab <- sapply(myfiles, read.table, header= TRUE, sep = "\t", simplify = FALSE, USE.NAMES = TRUE)
names(dat_tab) <- stringr::str_extract(names(dat_tab), ("\\d+"))
binded1 = rbindlist(dat_tab, idcol = "files", fill = TRUE)
ended up with file name coming in as "10" from the file name "BIOL10_20201206_180000.wav.Table01.txt"
A couple other options:
x <- "BIOL10_20201206_180000.wav.Table01.txt"
#option 1
sub("^.*?_(\\d+_\\d+).*$", "\\1", x, perl = TRUE)
#> [1] "20201206_180000"
#option 2
stringr::str_extract(x, "(?<=_)\\d+_\\d+")
#> [1] "20201206_180000"
You can specify the length:
library(stringr)
str_extract(x, "\\d{8}_\\d{6}")
# "20201206_180000"
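Plugging the fixed-width pattern back into the original naming step might look like this (the file names below are made up to mimic the Raven Pro naming scheme):

```r
library(stringr)

# Hypothetical file names mimicking the OP's naming scheme
files <- c("BIOL10_20201206_180000.wav.Table01.txt",
           "BIOL11_20210315_093000.wav.Table01.txt")
str_extract(files, "\\d{8}_\\d{6}")
# [1] "20201206_180000" "20210315_093000"
```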

Read list of names from CSV into R

I have a text file of names, separated by commas, and I want to read it into R as anything (a data frame or a vector is fine). When I try read.csv, it reads the names in as headers for separate columns, with 0 rows of data. With header=FALSE it still reads them in as separate columns. I could work with this, but what I really want is a single column with one name per row. As it stands, printing the data frame shows all the column headers, which are useless, and then no values. One column of names would be much easier to work with.
Since the OP asked me to, I'll post the comment above as an answer.
It's very simple, and it comes from some practice in reading in sequences of data, numeric or character, using scan.
dat <- scan(file = your_filename, what = 'character', sep = ',')
You can use read.csv and let it read the strings in as headers, then extract the names (using names) and put them into a data.frame:
data.frame(x = names(read.csv("FILE")))
For example:
write.table("qwerty,asdfg,zxcvb,poiuy,lkjhg,mnbvc",
"FILE", col.names = FALSE, row.names = FALSE, quote = FALSE)
data.frame(x = names(read.csv("FILE")))
x
1 qwerty
2 asdfg
3 zxcvb
4 poiuy
5 lkjhg
6 mnbvc
Something like that?
Make some test data:
# test data
list_of_names <- c("qwerty","asdfg","zxcvb","poiuy","lkjhg","mnbvc" )
list_of_names <- paste(list_of_names, collapse = ",")
list_of_names
# write to temp file
tf <- tempfile()
writeLines(list_of_names, tf)
You need this part:
# read from file
line_read <- readLines(tf)
line_read
list_of_names_new <- unlist(strsplit(line_read, ","))

Transform dput(remove) data.frame from txt file into R object with commas

I have a txt file (remove.txt) with this kind of data (RGB hex colors):
"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"...
These are colors I don't want in my analysis.
And I have an R object (nacreHEX) with similar data, containing both the colors I want to keep and the colors I don't want in my analysis. So I use this code to remove the unwanted ones:
nacreHEX <- nacreHEX[!nacreHEX %in% remove]
This works when remove is an R object like remove <- c("#DDDEE0", "#D8D9DB"...), but it doesn't work when remove comes from a txt file that I turn into a data.frame, nor when I try remove2 <- as.vector(t(remove)).
So there is my code:
remove <- read.table("remove.txt", sep=",")
remove2 <-as.vector(t(remove))
nacreHEX <- nacreHEX [! nacreHEX %in% remove2]
head(nacreHEX)
With as.vector there are no commas in the result, so maybe that's why it doesn't work.
How can I make an R vector with commas from this kind of data?
What step did I miss?
The problem is that your txt file is separated by ", " rather than ",". The spaces end up in your strings:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"', sep = ",")
(rr = as.vector(t(rr)))
# [1] "#DDDEE0" " #D8D9DB" " #F5F6F8" " #C9CBCA"
You can see the leading spaces before the #. We can trim these spaces with trimws().
trimws(rr)
# [1] "#DDDEE0" "#D8D9DB" "#F5F6F8" "#C9CBCA"
Even better, you can use the argument strip.white to have read.table do it for you:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"',
sep = ",", strip.white = TRUE)

How to remove trailing blanks or linebreaks from CSV file created with write.table?

I want to write a data frame from R into a CSV file. Consider the following toy example
df <- data.frame(ID = c(1,2,3), X = c("a", "b", "c"), Y = c(1,2,NA))
df[which(is.na(df[,"Y"])), 1]
write.table(t(df), file = "path to CSV/test.csv", col.names = FALSE, sep = ",", quote = FALSE)
The output in test.csv looks as follows:
ID,1,2,3
X,a,b,c
Y, 1, 2,NA
At first glance, this is exactly as I need it, BUT what cannot be seen in the code insertion above is that after the NA in the last line, there is another linebreak. When I pass test.csv to a Javascript chart on a website, however, the trailing linebreak causes trouble.
Is there a way to avoid this final linebreak within R?
This is a little convoluted, but obtains your desired result:
zz <- textConnection("foo", "w")
write.table(t(df), file = zz, col.names=F, sep=",", quote=F)
close(zz)
foo
# [1] "ID,1,2,3" "X,a,b,c" "Y, 1, 2,NA"
cat(paste(foo, collapse='\n'), file = 'test.csv', sep='')
You should end up with a file that has newline character after only the first two data rows.
You can use a command-line utility to remove the final newline from the file; for example, this Perl one-liner chomps the newline off the last line in place:
perl -pi -e 'chomp if eof' test.csv
Or, you could begin by writing a single row then using append.
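A sketch of that row-by-row idea: write all rows except the last with write.table(), then append the final row with cat(), which adds no newline of its own.

```r
df <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"), Y = c(1, 2, NA))
tdf <- t(df)
f <- tempfile(fileext = ".csv")

# All rows except the last get their usual trailing newlines
write.table(tdf[-nrow(tdf), , drop = FALSE], f,
            col.names = FALSE, sep = ",", quote = FALSE)

# The last row is appended without a newline
last <- paste(c(rownames(tdf)[nrow(tdf)], tdf[nrow(tdf), ]), collapse = ",")
cat(last, file = f, append = TRUE)

endsWith(readChar(f, file.size(f)), "\n")
# [1] FALSE
```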
An alternative in a similar vein to the answer by @Thomas, but with slightly less typing. Send the output from write.csv to a character vector (capture.output). Concatenate the strings (paste), separating the elements with linebreaks (collapse = "\n"). Write to file with cat.
x <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
cat(paste(x, collapse = "\n"), file = "df.csv")
You may also use format_csv from package readr to create a character vector with line breaks (\n). Remove the last end-of-line \n with substr. Write to file with cat.
library(readr)
x <- format_csv(df)
cat(substr(x, 1, nchar(x) - 1), file = "df.csv")

Scanning a csv file for a string in R

I would like to be able to scan a csv file row by row in R and exclude the rows that contain the word "target".
The problem is that the data comes from different places and the word "target" can come up in a number of different columns in the data frame.
So I need a line in a function that will look for this string, and if it is not present, then append that row to a new data frame (that I will then write out as a new csv).
Any and all help gratefully received.
Andrie's comment is probably the way most users would approach this, but if you want to do this at the reading in stage, you can try this:
Read in your csv using readLines and make any lines that have the text target blank:
temp = gsub(".*target.*", "", readLines("test.csv"))
Use read.table to convert temp to a data.frame. Since all lines that have the text target are now blank, the default blank.lines.skip=TRUE in read.table should correctly read in the rest of your data as a data.frame.
read.table(text=temp, sep=",", header=TRUE)
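An in-memory round trip of those two steps (the inline text stands in for the real csv file):

```r
# Stand-in for the contents of test.csv
csv_text <- c("a,b,c", "1,target,3", "4,5,6")

# Step 1: blank out any line containing "target"
temp <- gsub(".*target.*", "", csv_text)

# Step 2: blank.lines.skip = TRUE (the default) drops those lines
read.table(text = temp, sep = ",", header = TRUE)
#   a b c
# 1 4 5 6
```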
Use readLines:
lines <- readLines(file)
n.lines <- length(lines)
vec.1 <- rep(0, n.lines)
vec.2 <- rep(0, n.lines)
# more vectors as necessary
counter <- 0
for (i in 1:n.lines){
this.line <- strsplit(lines[i], ",")[[1]]
if ("target" %in% this.line) next
counter <- counter + 1
vec.1[counter] <- this.line[1]
vec.2[counter] <- this.line[2]
# etc.
}
df <- data.frame(vec.1[1:counter], vec.2[1:counter])
You may have to change n.lines slightly and change the indexing of the for loop if your file has headers; two lines would change as follows:
n.lines <- length(lines) - 1
and
for(i in 2:(n.lines+1)){
I would call from.readLines <- readLines(filename) and then just sub-select the rows that don't contain the target string: data <- read.csv(text = from.readLines[-grep('target', from.readLines)], header = F).
A faster way to do it (if your file is huge) would be to run grep -v 'target' original.csv > new.csv on the command line first and then read.csv("new.csv", ...) in R.
But anyway,
> #Without header
> from.readLines <- c('afaf,afasf,target', 'afaf,target,afasf', 'dagdg,asgst,sagga', 'dagdg,dg,sfafgsgg')
> data <- read.csv(text = from.readLines[-grep('target', from.readLines)], header = F)
> print(data)
V1 V2 V3
1 dagdg asgst sagga
2 dagdg dg sfafgsgg
>
> #With header
> from.readLines <- c('var1,var2,var3', 'afaf,afasf,target', 'afaf,target,afasf', 'dagdg,asgst,sagga', 'dagdg,dg,sfafgsgg')
> data <- read.csv(text = from.readLines[-(grep('target', from.readLines[-1]) + 1)])
> print(data)
var1 var2 var3
1 dagdg asgst sagga
2 dagdg dg sfafgsgg
