I am reading a file in R:
data <- read.delim("file.fas", header = TRUE, sep = "\t")
However, after I have done some manipulations on the data, the output format is not the same as the input. It now contains commas "," like this all over.
write.table(x = data, file = "file_1.fas")
How can I avoid this? Should I use a different function to write the file?
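One possible approach, assuming the unwanted characters are the quotation marks and row names that write.table adds by default, is to switch both off and write tab-separated fields so the output matches the tab-delimited input. The data frame and temporary file below are made up for illustration.

```r
# Hypothetical example data standing in for the manipulated file contents
data <- data.frame(id = c("seq1", "seq2"), length = c(120L, 98L))

# Temporary output path used only for this sketch
out_file <- tempfile(fileext = ".fas")

# quote = FALSE drops the quotation marks around strings,
# row.names = FALSE drops the extra column of row numbers,
# sep = "\t" writes tab-separated fields like the input file
write.table(data, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

written <- readLines(out_file)
print(written)
```

The same arguments work with the original file name in place of the temporary path.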
I export my CSV file with Python, and numbers are wrapped as ="10000000000" in the cells, for example:
name,price
"something expensive",="10000000000",
I wrap big numbers and strings of digits, such as order IDs, in this format so that they display correctly and someone can open the file directly without reformatting the column.
This works in Excel or Numbers, but when I import the file into R with read.csv, the cell values show as =10000000000.
Is there any solution to this?
Thank you
How about:
yourcsv <- read.csv("yourcsv.csv")
# Remove the literal "=" and write the result back to the price column only
yourcsv$price <- gsub("=", "", yourcsv$price, fixed = TRUE)
Also, in my experience read_csv() from the readr package (part of the tidyverse) reads data much faster than read.csv(), and I think it also has more logic built in for non-ideal cases, so it may be worth trying.
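A slightly more defensive variant, assuming the price column arrives as character values that may still carry the = prefix and quotation marks, is to strip both and then convert to numeric. The sample vector is made up for illustration.

```r
# Hypothetical values as they might appear after import
price <- c("=10000000000", "=\"42\"", "7")

# Remove every "=" and double-quote character, then convert to numbers
clean_price <- as.numeric(gsub("[=\"]", "", price))
print(clean_price)
```

For identifiers such as order IDs that are longer than about 15 digits, it is probably safer to skip as.numeric() and keep the cleaned values as character, since doubles cannot represent every such integer exactly.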
I am trying to import data in xls format into R, but it reads the header incorrectly. Instead of
X1
R interprets the column name as
`X1 `
which makes writing complicated R syntax impossible.
How can this issue be resolved?
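You can skip the header record and supply your own column names with any of several R packages that read Excel data. Here is an example with readxl::read_excel().
library(readxl)
# skip = 1 drops the header row; col_names = FALSE assigns default names,
# or pass a character vector of names instead to set your own
data <- read_excel("./data/anExcelWorksheet.xlsx",
                   col_names = FALSE,
                   skip = 1)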
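Another option, if the only problem is stray whitespace in the header, is to keep the original names and trim them after import. A minimal sketch, with a made-up data frame standing in for the imported sheet:

```r
# Hypothetical data frame whose column name has a trailing space, as in the question
data <- data.frame(`X1 ` = 1:3, check.names = FALSE)

# trimws() removes leading and trailing whitespace from each name
names(data) <- trimws(names(data))

print(names(data))
```

After this step the column can be referenced as data$X1 without backticks.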
A few days ago I asked how to set a specific column type when using the readr package (see "big integers when reading file with readr in r").
Is there a way to specify the columns by wildcard? In my case, I sometimes have several columns starting with Intensity plus a suffix that depends on the experiment. It is hard to use read_tsv in a function if you do not know upfront which project names were used.
So something like col_types = cols('Intensity.*' = col_double()) would be awesome.
Does anyone have an idea how to get this feature?
EDIT:
Maybe something like this: read the first 2 lines, grep 'Intensity' in the names, and then somehow create this parameter, like cols(Intensity=col_double(), 'Intensity pg'=col_double(), 'Intensity hs'=col_double()).
But I have no idea how to create this parameter value on the fly.
I am adding the answer that solved my question, based on the comment from lukeA:
read_MQtsv <- function(file) {
  require('readr')
  # Read the header plus one data row to get the original column names
  jnk <- read.delim(file, nrows = 1, check.names = FALSE)
  # Columns whose names contain any of these patterns are read as doubles
  matches <- grep('Intensity|LFQ|iBAQ', names(jnk), value = TRUE)
  read_tsv(file,
           col_types = setNames(
             rep(list(col_double()), length(matches)),
             matches))
}
So I adapted the single line from the comment into a new function, which I use when reading the special files produced by a program called MaxQuant.
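The same idea can be sketched without readr, assuming base R's read.delim is acceptable: read the header first, then build a named colClasses vector for the matching columns. The file contents and the function name read_intensity_tsv are invented for this example.

```r
# Hypothetical helper: read a tab-separated file, forcing matching columns to double
read_intensity_tsv <- function(file, pattern = "Intensity|LFQ|iBAQ") {
  # Read the header (plus one data row) to learn the column names
  header <- names(read.delim(file, nrows = 1, check.names = FALSE))
  matches <- grep(pattern, header, value = TRUE)
  # Named colClasses apply only to the named columns; the rest are guessed
  col_classes <- setNames(rep("numeric", length(matches)), matches)
  read.delim(file, colClasses = col_classes, check.names = FALSE)
}

# Made-up sample file with two intensity columns
tsv <- tempfile(fileext = ".tsv")
writeLines(c("Protein\tIntensity pg\tIntensity hs",
             "P1\t10\t20",
             "P2\t30\t40"), tsv)

result <- read_intensity_tsv(tsv)
str(result)
```

Without the colClasses argument, whole-number columns like these would be guessed as integer, which is the situation the explicit double type is meant to avoid.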
I have used the following code to read multiple .csv files in R:
Assembly<-t(read.table("E:\\test\\exp1.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly","f"))[1:4416,"Assembly",drop=FALSE])
Top1<-t(read.table("E:\\test\\exp2.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top1","f"))[1:4416,"Top1",drop=FALSE])
Top3<-t(read.table("E:\\test\\exp3.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top3","f"))[1:4416,"Top3",drop=FALSE])
Top11<-t(read.table("E:\\test\\exp4.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top11","f"))[1:4416,"Top11",drop=FALSE])
Assembly1<-t(read.table("E:\\test\\exp5.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly1","f"))[1:4416,"Assembly1",drop=FALSE])
Area<-t(read.table("E:\\test\\exp6.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Area","f"))[1:4416,"Area",drop=FALSE])
data<-rbind(Assembly,Top1,Top3,Top11,Assembly1,Area)
All of the data is in the folder "test" on the E drive. Is there a simpler way in R to read multiple .csv files with a couple of lines of code, or a function call that replaces the code above?
(Untested code; no working example available) Try this: use the list.files function to generate the file names, then pass colClasses to read.csv to throw away the first 4 columns (and since that vector is recycled, you will also throw away the 6th column):
lapply(list.files("E:\\test", pattern = "^exp[1-6]", full.names = TRUE), read.csv,
       header = FALSE, sep = "|",
       colClasses = c(rep("NULL", 4), "numeric"), nrows = 4416)
If you want the result returned as a data frame, wrap data.frame around it.
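A self-contained sketch of the same pattern, assuming pipe-separated files without a header and that only the fifth column is wanted. The temporary folder, file contents, and the helper name read_fifth are invented so the example can run anywhere.

```r
# Create a temporary folder with three made-up pipe-separated files
dir_path <- file.path(tempdir(), "multi_csv_demo")
dir.create(dir_path, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(sprintf("a|b|c|d|%d|f", i * 10 + 1:2),
             file.path(dir_path, sprintf("exp%d.csv", i)))
}

# full.names = TRUE returns paths that can be read from any working directory
files <- list.files(dir_path, pattern = "^exp[0-9]+\\.csv$", full.names = TRUE)

# Hypothetical helper: keep only the fifth column and return it as a plain vector
read_fifth <- function(path) {
  read.table(path, sep = "|", header = FALSE,
             colClasses = c(rep("NULL", 4), "numeric", "NULL"))[[1]]
}

# One row per file, like the rbind() of transposed columns in the question
combined <- do.call(rbind, lapply(files, read_fifth))
rownames(combined) <- basename(files)
print(combined)
```

Here the sixth column is dropped with an explicit "NULL" entry rather than by relying on recycling, which makes the intent easier to see.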