From csv to txt - r

How can I convert a CSV file to a plain text file?
My CSV file consists of 3 columns, and I only want to write the values of the column "Text" to the text file. I tried to achieve this with the following:
name <- read.csv('c:/Users/bi2/Documents/TextminingRfiles/ScoreOutput/RangersScores.csv', header=T, sep=",")
attach(name)
posText <- name[score > 0,]
write(posText$text, file = "C:/Users/bi2/Documents/TextminingRfiles/ScoreOutput/namePositive.txt", sep="")
This code only copies the row indexes to the text file, not the values of the text column. How can I fix this?
Thanks for your help.

A few extra arguments to write.table will probably get you what you want.
Here's a reproducible example. First, make a CSV with three columns...
write.csv(data.frame(x = sample(26),
y = sample(26),
text = letters),
file = "test.csv")
Now read it into R...
test <- read.csv("test.csv")
Now do your other calculations, then subset to get only the column you want to write to a txt file...
test <- test[ ,which(names(test) == 'text')]
And now write it to a txt file, with no row names, column names or quote marks...
write.table(test, "test.txt",
row.names = FALSE,
quote = FALSE,
col.names = FALSE)
By the way, the attach() call in your question's code is unnecessary and not recommended.
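As a sketch of the alternative, you can refer to the column explicitly instead of attaching. The example data here is invented to stand in for the question's CSV:

```r
# example data standing in for the CSV from the question
name <- data.frame(score = c(-1, 2, 3),
                   text  = c("bad day", "good game", "great win"),
                   stringsAsFactors = FALSE)

# instead of attach(name); name[score > 0, ], refer to the column explicitly:
posText <- name[name$score > 0, ]
# or, equivalently:
posText <- subset(name, score > 0)

posText$text
```

Either form keeps the column lookup unambiguous, which is exactly the problem attach() tends to create.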

Related

How do I read multiple Excel Files into R and then convert them into Fasta files (without concatenating)?

I have a folder with about 80 Excel files that contain the FASTA name in one column and a gene sequence in another column. All files have a different number of rows.
I am trying to automate the process of writing an Excel File into a Fasta by reading in all the files I have in one folder and using the command I have for each file.
I do NOT want to have all the data written into a single Fasta file, but want for example the file "gene1.xlsx" as "gene1.fas", "gene2.xlsx" as "gene2.fas" and so on.
The code I am using to convert a single Excel file into a Fasta file is as follows:
library(readxl)
library(tibble)
X<-read_excel("*name of the file*.xlsx", col_names=FALSE)
D<-do.call(rbind, lapply (seq(nrow(X)), function(i) t(X[i,])))
write.table(D,file = "*name of the file*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)
I guess that I need a for-loop for that, but I am new to programming and everything I have tried just gave me a single empty Fasta file as the output.
The code I used for that is the following:
library(readxl)
library(tibble)
file.list<- list.files(pattern="*.xlsx")
df.list <- lapply(file.list, read_excel)
library(tibble)
for (i in file.list) {
B <-do.call(rbind, lapply (seq(nrow(file.list)), function(i) t(file.list[i,])))
write.table(B,file = "*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)
}
Is there a way to do this?
I appreciate any help!
Clara
If your data looks like this:
A column of names and a column of sequences.
And the fasta format is:
>name1
sequence1
>name2
sequence2
And you have a folder containing 80 excel files in the same format.
You can convert one excel file to fasta like this
library(readxl)
library(stringr)
excelfname <- "seqfile1.xlsx"
# I use explicit column types to make sure that the names are not interpreted as numbers or dates.
seqlist <- read_excel(excelfname,
col_names = c("header", "sequence"),
col_types = c("text", "text"))
fastalist <- paste0(">", seqlist$header, "\n", seqlist$sequence)
fastafname <- str_replace(excelfname, "xlsx", "fas")
writeLines(fastalist, fastafname)
Then for many excel files, we can wrap this code in lapply()
excelfnames <- list.files(pattern = "*.xlsx")
seqlists <- lapply(excelfnames, function(excelfname)
read_excel(excelfname,
col_names = c("header", "sequence"),
col_types = c("text", "text" )))
fastalists <- lapply(seqlists,
function(record) paste0(">", record$header, "\n",
record$sequence))
fastafnames <- str_replace(excelfnames, "xlsx", "fas")
names(fastalists) <- fastafnames
invisible(lapply(names(fastalists), function(fname)
writeLines(fastalists[[fname]], fname)))

How do I extract specific rows from a CSV and format the data in R?

I have a CSV file that contains thousands of lines like this:
1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat
I want to write this to a new CSV file but include only rows which contain '.mp3' or '.avi'. The output file should be just one column and look like this:
"basket/files/legobrick.mp3#1001",
"basket/files/sunshade.avi#4096",
So the first column should be suffixed to the second column and separated by a hash symbol and each line should be quoted and separated by a comma as shown above.
The source CSV file does not contain a header with column names. It's just data.
Can someone tell me how to code this in R?
Edit (following marked answer): This question is not a duplicate because it involves filtering rows and the output code format is completely different requiring different processing methods. The marked answer is also completely different which really backs up my assertion that this is not a duplicate.
You can do it in the following way:
#Read the file with ; as separator
df <- read.csv2(text = text, header = FALSE, stringsAsFactors = FALSE)
#Filter the rows which end with "avi" or "mp3"
inds <- grepl("avi$|mp3$", df$V2)
#Create a new dataframe by pasting those rows with a separator
df1 <- data.frame(new_col = paste(df$V2[inds], df$V1[inds], sep = "#"))
df1
# new_col
#1 basket/files/legobrick.mp3#1001
#2 basket/files/sunshade.avi#4096
#Write the csv
write.csv(df1, "/path/of/file.csv", row.names = FALSE)
Or if you want it as a text file you can do
write.table(df1, "path/test.txt", row.names = FALSE, col.names = FALSE, eol = ",\n")
data
text = "1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat"
See whether the code below helps. It assumes the data has been read into a data frame df with columns ID and file_path:
library(tidyverse)
df %>%
filter(grepl("\\.mp3|\\.avi", file_path)) %>%
mutate(file_path = paste(file_path, ID, sep="#")) %>%
pull(file_path) %>% dput
A data.table answer:
dt <- fread("file.csv")
fwrite(dt[V2 %like% "mp3$|avi$", .(paste0(V2, "#", V1))], "output.csv", col.names = FALSE)

Reading a dat file in R

I am trying to read a dat file with ";" separated. I want to read a specific line that starts with certain characters like "B" and the other line are not the matter of interest. Can anyone guide me.
I have tried using the read_delim, read.table and read.csv2.But since some lines are not of equal length. So, I am getting errors.
file <- read.table(file = '~/file.DAT',header = FALSE, quote = "\"'",dec = ".",numerals = c("no.loss"),sep = ';',text)
I am expecting a r dataframe out of this file which I can write it to a csv file again.
You should be able to do that through readLines
allLines <- readLines('~/file.DAT')
grepB <- function(x) grepl('^B', x)
BLines <- Filter(grepB, allLines)
df <- as.data.frame(do.call(rbind, strsplit(BLines, ";")))
And if your file contains a header row, then you can specify
names(df) <- strsplit(allLines[1], ";")[[1]]
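A self-contained sketch of the same approach, using a temporary file with a few invented sample lines:

```r
# write a small ;-separated file to demonstrate
tf <- tempfile(fileext = ".DAT")
writeLines(c("A;1;2", "B;3;4", "B;5;6", "C;7;8"), tf)

allLines <- readLines(tf)

# keep only the lines starting with "B"
BLines <- allLines[grepl("^B", allLines)]

# split each kept line on ";" and bind the pieces into rows of a data frame
df <- as.data.frame(do.call(rbind, strsplit(BLines, ";")),
                    stringsAsFactors = FALSE)
df
```

The do.call(rbind, ...) step is what turns the list of split pieces into one row per line; without it, as.data.frame would make one column per line instead.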

Read list of names from CSV into R

I have a text file of names, separated by commas, and I want to read it into R (a data frame or a vector is fine). I tried read.csv, and it just reads the names in as headers for separate columns, with 0 rows of data. With header=FALSE it reads them in as separate columns. I could work with this, but what I really want is one column with one row per name. For example, when I print this data frame, it prints all the column headers, which are useless, and then no values. One column of names would be much easier to work with.
Since the OP asked me to, I'll post the comment above as an answer.
It's very simple, and it comes from some practice in reading in sequences of data, numeric or character, using scan.
dat <- scan(file = your_filename, what = 'character', sep = ',')
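For a runnable illustration of that call, with a small comma-separated file written to a temporary path:

```r
# make a one-line, comma-separated file of names
tf <- tempfile(fileext = ".csv")
writeLines("qwerty,asdfg,zxcvb,poiuy", tf)

# scan splits on the separator and returns one character vector, one element per name
dat <- scan(file = tf, what = "character", sep = ",", quiet = TRUE)
dat
# c("qwerty", "asdfg", "zxcvb", "poiuy")
```

This gives exactly the flat sequence of names, which you can wrap in data.frame(x = dat) if you want one column with one row per name.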
You can use read.csv and let it read the string in as headers, then just extract the names (using names()) and put them into a data.frame:
data.frame(x = names(read.csv("FILE")))
For example:
write.table("qwerty,asdfg,zxcvb,poiuy,lkjhg,mnbvc",
"FILE", col.names = FALSE, row.names = FALSE, quote = FALSE)
data.frame(x = names(read.csv("FILE")))
x
1 qwerty
2 asdfg
3 zxcvb
4 poiuy
5 lkjhg
6 mnbvc
Something like that?
Make some test data:
# test data
list_of_names <- c("qwerty","asdfg","zxcvb","poiuy","lkjhg","mnbvc" )
list_of_names <- paste(list_of_names, collapse = ",")
list_of_names
# write to temp file
tf <- tempfile()
writeLines(list_of_names, tf)
You need this part:
# read from file
line_read <- readLines(tf)
line_read
list_of_names_new <- unlist(strsplit(line_read, ","))

using column names when appending data in write.table

I am looping through some data, and appending it to csv file. What I want is to have column names on the top of the file once, and then as it loops to not repeat column names in the middle of file.
If I set col.names=T, it writes the column names again on every loop iteration, so they end up repeated in the middle of the file. If I set col.names=F, there are no column names at all.
How do I do this most efficiently? I feel that this is such a common case that there must be a way to do it, without writing code especially to handle it.
write.table(dd, "data.csv", append=TRUE, col.names=T)
See ?file.exists.
write.table(dd, "data.csv", append=TRUE, col.names=!file.exists("data.csv"))
Thus column names are written only when you are not appending to a file that already exists.
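A minimal sketch of that idea inside a loop (the data frames here are invented for illustration):

```r
f <- tempfile(fileext = ".csv")

for (i in 1:3) {
  dd <- data.frame(step = i, value = i * 10)
  # header is written only on the first pass, while the file does not exist yet
  write.table(dd, f,
              append    = file.exists(f),
              col.names = !file.exists(f),
              row.names = FALSE)
}

res <- read.table(f, header = TRUE)
nrow(res)
# 3
```

Because both append and col.names key off file.exists(f), the first iteration creates the file with a header and every later iteration appends rows only.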
You may or may not also see a problem with the row names being identical, as write.table does not allow identical row names when appending. You could give this a try. In the first write to file, try write.table with row.names = FALSE only. Then, starting from the second write to file, use both col.names = FALSE and row.names = FALSE
Here's the first write to file
> d1 <- data.frame(A = 1:5, B = 1:5) ## example data
> write.table(d1, "file.txt", row.names = FALSE)
We can check it with read.table("file.txt", header = TRUE). Then we can append the same data frame to that file with
> write.table(d1, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE)
And again we can check it with read.table("file.txt", header = TRUE)
So, if you have a list of data frames, say dlst, your code chunk that appends the data frames together might look something like
> dlst <- rep(list(d1), 3) ## list of example data
> write.table(dlst[1], "file.txt", row.names = FALSE)
> invisible(lapply(dlst[-1], write.table, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE))
But as #MrFlick suggests, it would be much better to append the data frames in R, and then send them to file once. This would eliminate many possible errors/problems that could occur while writing to file. If the data is in a list, that could be done with
> dc <- do.call(rbind, dlst)
> write.table(dc, "file.txt")
Try setting the column names of the data frame explicitly with the names() command in R, replacing them with the same names they already have, and then run dbWriteTable with row.names = FALSE. That should resolve the issue.
e.g.
if your data frame df1 has columns as obs, name, age then
names(df1) <- c('obs','name','age')
and then try
dbWriteTable(conn, 'table_name', df1, append = T, row.names = F)
