I have a script as below:
setwd("I:/prep/Coord/RData/test")
#load .csv files
a.files <- grep("^Whirr", dir(), value=TRUE) #pattern matching
b.files <- paste0("Files_", a.files)
for(i in length(a.files)){
a <- read.table(a.files[i], header=T, sep=",", row.names=1) #read files start with Whirr_
b <- read.table(b.files[i], header=T, sep=",", row.names=1) #read files start with Files_
a
b
cr <- as.matrix(a) %*% as.matrix(t(a))
cr
diag(cr)<-0
cr
#write to file
write.csv(cr, paste0("CR_", a.files[i], ".csv"))
}
Basically, what I want to do is compare pairs of files whose names share the same ending, do the calculation, and write the result to a file.
When I tried to print a.files and b.files, the output seemed OK to me. The output is below:
> a.files <- grep("^Whirr", dir(), value=TRUE) #pattern matching
> b.files <- paste0("Files_", a.files, sep="")
Error: could not find function "paste0"
> a.files
[1] "Whirr_127.csv" "Whirr_128.csv"
> b.files
[1] "Files_ Whirr_127.csv" "Files_ Whirr_128.csv"
>
I tried to feed the script multiple files, but I got an error message as below:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'Files_ Whirr_128.csv': No such file or directory
So I tried to use file.choose, but that doesn't work for me either.
I'd appreciate help from the experts.
Change the line:
b.files <- paste0("Files_", a.files)
to:
b.files <- paste("Files_", a.files, sep="")
You are using a version of R that does not have paste0 (I see that code was given to you in an earlier answer). This means you were keeping an earlier version of b.files, perhaps one that had been constructed using paste.
One important lesson about this is that whenever you get an error message about a line, such as Error: could not find function "paste0", that means the line did not happen! You have to fix that error before you post the code, or tell us about the error when you do; otherwise we assume the b.files <- paste0("Files_", a.files) line works.
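For reference, a minimal sketch of the whole loop with that change applied (it also swaps in seq_along, since for(i in length(a.files)) only visits the last index):
a.files <- grep("^Whirr", dir(), value=TRUE)
b.files <- paste("Files_", a.files, sep="")   # no space inserted between the pieces
for(i in seq_along(a.files)){                 # visits every file, not just the last one
  a <- read.table(a.files[i], header=T, sep=",", row.names=1)
  b <- read.table(b.files[i], header=T, sep=",", row.names=1)
  cr <- as.matrix(a) %*% as.matrix(t(a))
  diag(cr) <- 0
  write.csv(cr, paste("CR_", a.files[i], ".csv", sep=""))
}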
I have a directory with 144 files (.bed).
I would like to analyze them with a function from an R package (ChAseR). To avoid a for loop, I decided to use lapply.
# Define path
setwd("/home/works/Project/")
netpath="Table"
bedpath="Data/Naive_CD4_positive_T_Cells_BP/H3K27ac/bed_files"
# PCHiC contact network data loading and preprocessing
pchic=read.table(paste(netpath, "merged_samples_12Apr2015_full.txt", sep="/"), sep="\t", skip=4, header=T)
pchic[,1]=paste('chr', pchic[,1], sep='')
pchic[,6]=paste('chr', pchic[,6], sep='')
Tcellnaive <- pchic[which(pchic$Naive_CD4>=5),c(1,2,3,6,7,8)]
tnet=chaser::make_chromnet(Tcellnaive)
# Load histone modifications data from Chen et al. 2016 (doi: 10.1016/j.cell.2016.10.026)
files <- list.files(path=bedpath, pattern="*.bed", full.names=TRUE, recursive=FALSE)
tnet2 <- lapply(files, chaser::load_features(tnet, files, type="macs2", missingv=0, featnames="H3K27ac"))
I obtained this message:
reading from file Data/Naive_CD4_positive_T_Cells_BP/H3K27ac/bed_files/S01342H2.H3K27ac.ppqt_macs2_wp10.20150819.bed...
reading from file Data/Naive_CD4_positive_T_Cells_BP/H3K27ac/bed_files/S0137XH3.H3K27ac.ppqt_macs2_wp10.20150819.bed...
reading from file Data/Naive_CD4_positive_T_Cells_BP/H3K27ac/bed_files/S013CNH3.H3K27ac.ppqt_macs2_wp10.20150819.bed...
reading from file Data/Naive_CD4_positive_T_Cells_BP/H3K27ac/bed_files/S013DLH2.H3K27ac.ppqt_macs2_wp10.20150819.bed...
Error in file(file, "rt") : invalid 'description' argument
How can I resolve this error? Is it a problem with the path or with the function?
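One thing worth checking, independent of the chaser internals: lapply expects a function as its second argument, but here the call chaser::load_features(tnet, files, ...) is evaluated once, with the whole files vector, and its result (not a function) is what gets handed to lapply. A hedged sketch of the per-file pattern, assuming load_features accepts a single bed path per call:
tnet_list <- lapply(files, function(f) {
  # call load_features once per bed file instead of once with the whole vector
  chaser::load_features(tnet, f, type="macs2", missingv=0, featnames="H3K27ac")
})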
Fresh lettuce here, so don't laugh at my questions:
Say I have a folder containing 40 individual .txt files and I would like to convert them into .csv format,
to end up with a new folder of 40 individual .csv files.
I have seen a similar question posted along with its code; the code did run, but the resulting .csv files were nothing like the original .txt files: all the data were scrambled.
Since I want to keep the header and read all the data/rows in the .txt files, I made some cosmetic changes to the code. It still didn't run and returned the warning "Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/mli/Desktop/All TNFa controls in the training patients ctrl_S1018263__3S_TNFaCHx_333nM+0-1ugml_none.txt': Invalid argument"
My code as below:
directory <- "C:/Users/mli/Desktop/All TNFa controls in the training patients"
ndirectory <- "C:/Users/mli/Desktop/All TNFa controls in the training patients/CSV"
file_name <- list.files(directory, pattern = ".txt")
files.to.read <- paste(directory, file_name, sep="\t")
files.to.write <- paste(ndirectory, paste0(sub(".txt","", file_name),".csv"), sep=",")
for (i in 1:length(files.to.read)) {
temp <- (read.csv(files.to.read[i], sep="\t", header = T))
write.csv(temp, file = files.to.write[i])
}
When you paste your path together with the name of your file (lines 4 and 5 of your code), use "/" so that you obtain a valid path in a character string. The sep value is what the function inserts between the strings it pastes together.
> paste('hello','world',sep=" ")
[1] "hello world"
> paste('hello','world',sep="_")
[1] "hello_world"
This is different from the sep argument of read.csv, which defines the character separating the columns of your csv file.
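A minimal sketch of the path construction with that in mind, reusing the directory, ndirectory and file_name objects from the question (file.path is just a convenience that joins the pieces with "/"):
files.to.read  <- file.path(directory, file_name)
files.to.write <- file.path(ndirectory, paste0(sub("\\.txt$", "", file_name), ".csv"))
for (i in seq_along(files.to.read)) {
  temp <- read.csv(files.to.read[i], sep="\t", header=TRUE)   # the source files are tab-delimited
  write.csv(temp, file = files.to.write[i], row.names = FALSE)
}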
I am trying to read a zip file that has 1 csv file in it.
It works great when I know the csv file name, but when I just try to read the zip file alone, it doesn't work.
Here is an example of where it does works:
zip_file <- "abc.zip"
csv_file <- "abcde.csv"
data <- read.table(unz(zip_file,csv_file), skip = 10, header=T, quote="\"", sep=",")
Here is where it doesn't work when I try to only extract the zip file:
read.table(zip_file, skip = 10, nrows=10, header=T, quote="\"", sep=",")
An error comes up saying:
Error in read.table(attachment_file, skip = 10, nrows = 10, header = T, :
no lines available in input
In addition: Warning messages:
1: In readLines(file, skip) : line 2 appears to contain an embedded nul
2: In readLines(file, skip) : line 3 appears to contain an embedded nul
3: In readLines(file, skip) :
incomplete final line found on
'C:\Users\nickk\AppData\Local\Temp\RtmpIrqdl8\file2c9860d62381'
So this shows there is definitely a csv file present, because it works when I include the csv file name; the error only comes up when I pass just the zip file.
For context, the reason I do not want to hard-code the csv file name is that I need to read this zip file daily and the name of the csv inside changes with no pattern every time. So my goal is to read the zip file without knowing the inner file name.
Thanks!
Why don't you try using unzip to find the filename inside the ZIP archive:
zipdf <- unzip(zip_file, list = TRUE)
# the following lines assume the archive contains only a single file;
# note that R is 1-indexed, so the first entry is Name[1]
csv_file <- zipdf$Name[1]
# the csv still lives inside the archive, so read it through unz()
your_df <- read.table(unz(zip_file, csv_file), skip = 10, nrows=10, header=T, quote="\"", sep=",")
If you are open to data.table, you can try:
data.table::fread(paste('unzip -cq', zip_file), skip = 10)
-c: uncompress to stdout;
-q: suppress messages printed by unzip;
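With more recent data.table versions the shell command is passed explicitly through the cmd argument, a small variation on the same idea:
data.table::fread(cmd = paste('unzip -cq', zip_file), skip = 10)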
I'm a bit stuck with this code... The purpose is to read only the text files from a folder that contains a few different kinds of files, take one column from each, and build a data frame from all the extracted columns (cbind.fill is a hand-made function that adds a new column and fills the "empty" spaces with NA values). Here is the code:
setwd("...folderOfInterest/")
genes_data <- data.frame()
for(i in list.files(pattern = "^GO_.*txt", full.names = TRUE)){
print(i) #this works perfectly, it only prints desired files...
q <- read.table(i, header = TRUE, sep = "\t", quote = NULL)
genes_data <- cbind.fill(genes_data, q[,2])
}
As @Adam B suggested, here is the print(i) output and a screenshot of the folder (folder_screenshot):
[1] "./GO_ALPHA_AMINO_ACID_CATABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_AMINO_ACID_METABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_BETA_T_CELL_ACTIVATION.xls"
[1] "./GO_AMINO_ACID_BETAINE_METABOLIC_PROCESS.xls"
[1] "./GO_AMINO_ACID_IMPORT.xls"
[1] "./GO_AMINO_ACID_TRANSMEMBRANE_TRANSPORT.xls"
[1] "./GO_AMINO_ACID_TRANSPORT.xls"
[1] "./GO_AMINOGLYCAN_BIOSYNTHETIC_PROCESS.xls"
[1] "./GO_ANGIOGENESIS.xls"
[1] "./GO_ANION_TRANSPORT.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls"
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls': No such file or directory
(Note: the files' extension is .xls, but they are really .txt files.)
It prompts this message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION.txt': No such file or directory
Running only q <- read.table(i, header = TRUE, sep = "\t", quote = NULL) on its own produces the same error message.
I think I'm in the correct folder (because print(i) works fine). I've also changed the full.names option and assigned list.files to a variable outside the loop, but nothing seems to work. Please, if anybody has an idea, it'll be welcome!
I've tried it on randomly generated files and it works. You probably do not need to cd into the directory with the data; just give list.files a path argument pointing to your data directory.
GOfls <- list.files("indata", pattern = "^GO_.*\\.txt", full.names = TRUE)
head(GOfls)
[1] "indata/GO_amswylfbgp.txt" "indata/GO_amswylfbgptxt" "indata/GO_apqqqktvir.txt"
[4] "indata/GO_arwudmbzsr.txt" "indata/GO_autljyljgn.txt" "indata/GO_beeqcmnayk.txt"
# lapply -> do.call for reading and binding the data is better approach
gene_data <- do.call('cbind', lapply(GOfls, function(path) read.delim(path)[,2]))
# have a look at the data
dim(gene_data)
[1] 100 100
I have tried to reproduce your problem this way (optional reading):
dir.create("indata")
fls <- lapply(1:100, function(i) data.frame(matrix(rnorm(1000), ncol = 10)))
names(fls) <- replicate(100, paste0("./indata/", "GO_",
paste0(sample(letters, 10, replace = T),
collapse = ""), ".txt"
)
)
lapply(names(fls), function(x) write.table(fls[[x]], x, quote = F, sep = "\t"))
head(dir("indata"))
[1] "GO_acebruujkw.pdf" "GO_amswylfbgp.txt" "GO_amswylfbgptxt" "GO_apqqqktvir.txt"
[5] "GO_arwudmbzsr.txt" "GO_autljyljgn.txt"
# I have added some renamed .txt files (.pdf, .tiff, .gel) to the indata
rm(list = ls())
That's solved! It's a bit strange, but after copying the folder of interest onto the desktop the code works again.
A mate and I saw that the hard disk's activity had stalled, so we thought there might be a problem in the reading process; copying the folder turned out to be the (simple) solution!
Nevertheless, if anybody has an idea that explains this strange situation, I'm sure it'll be useful! Thanks a lot!
EDIT
I've done some tests, and maybe the problem is the folder path, which is too long and crashes the loop.
I think it's because you're searching for .xls files but then trying to open them as .txt files.
In Excel, try saving the files as comma- or tab-delimited text files.
If you want to open Excel files directly, there are a few packages that can do that; try readxl.
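A minimal readxl sketch, in case any of the files turn out to be genuine Excel workbooks (the file name here is just one of the names from the print(i) output above):
library(readxl)
q <- read_excel("GO_ANGIOGENESIS.xls", sheet = 1)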
I am writing an R function that reads a directory full of files and reports the number of completely observed cases in each data file. The function returns a data frame where the first column is the name of the file and the second column is the number of complete cases.
such as,
id nobs
1 108
2 345
...
etc
Here is the function I wrote:
complete <- function(directory, id = 1:332) {
for(i in 1:332) {
path<-paste(directory,"/",id,".csv",sep="")
mydata<-read.csv(path)
#nobs<-nrow(na.omit(mydata))
nobs<-sum(complete.cases(mydata))
i<-i+1
}
completedata<-c(id,nobs)
}
I execute the function:
complete("specdata",id=1:332)
but I'm getting this error:
Error in file(file, "rt") : invalid 'description' argument
I also tried the traceback() function to debug my code and it gives this output:
traceback()
# 4: file(file, "rt") at #6
# 3: read.table(file = file, header = header, sep = sep, quote = quote,
# dec = dec, fill = fill, comment.char = comment.char, ...) at #6
# 2: read.csv(path) at #6
# 1: complete("specdata", id = 1:332)
It's hard to tell without a completely reproducible example, but I suspect your problem is this line:
path<-paste(directory,"/",id,".csv",sep="")
id here is a vector, so path becomes a vector of character strings, and when you call read.csv you're passing it all the paths at once instead of just one. Try changing the above line to
path<-paste(directory,"/",id[i],".csv",sep="")
and see if that works.
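For reference, a minimal sketch of the whole function with that fix applied and the results actually collected into the data frame described in the question (this assumes the files really are named 1.csv, 2.csv, ...; a later answer below points out they may be zero-padded as 001.csv):
complete <- function(directory, id = 1:332) {
  nobs <- integer(length(id))
  for (i in seq_along(id)) {
    path    <- paste(directory, "/", id[i], ".csv", sep="")
    mydata  <- read.csv(path)
    nobs[i] <- sum(complete.cases(mydata))   # count of completely observed rows
  }
  data.frame(id = id, nobs = nobs)
}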
It seems you have a problem with your file path.
You are passing the full vector id =c(1:332) to the file path name.
If your files are named 1.csv, 2.csv, 3.csv, etc..
You can change this line:
path<-paste(directory,"/",id,".csv",sep="")
to
path<-paste(directory,"/",i,".csv",sep="")
and leave out or rework the id input of your function.
Instead of using a for to read the data in, you can try sapply. For example
mydata <- sapply(path, read.csv).
Since path is a vector, sapply will iterate over the vector and apply read.csv to each element. Therefore there is no need for the for loop, and your code will be much cleaner.
From there you will have a matrix with each of your files and their respective information, from which you can extract the observations.
To find the observations, you can do mydata[2,1][[1]]. Remember that the rows will be the variables (the columns of each file) and the matrix columns will be your files.
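A hedged sketch of that idea applied to the counting task, using lapply so each file stays a data frame, which keeps the complete-case count straightforward:
path   <- paste(directory, "/", id, ".csv", sep="")
mydata <- lapply(path, read.csv)                               # one data frame per file
nobs   <- sapply(mydata, function(d) sum(complete.cases(d)))   # complete rows per file
data.frame(id = id, nobs = nobs)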
I am working on the exact same problem. The files in the directory "specdata" are named 001.csv, 002.csv, ..., 099.csv, all the way up to 332.csv.
However, when id = 1 is used, the constructed file name becomes 1.csv, which does not exist in the directory.
Try using this function to get the path for each id:
filepaths <- function (id){
allfiles = list.files(getwd())
file.path(getwd(), allfiles[id])
}
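An alternative sketch, assuming the zero-padded naming described above: format the id directly instead of indexing into list.files().
path   <- file.path(directory, sprintf("%03d.csv", id[i]))   # e.g. id[i] = 1 gives "specdata/001.csv"
mydata <- read.csv(path)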
I ran into the same problem with this line:
Browse[2]> read.csv(list.files(".", "XCMS-annotated-diffreport--.*csv$"), row.names = 1)
Error in file(file, "rt") : invalid 'description' argument
Then I found that there were two different csv files matching in the same path, like this:
Browse[2]> list.files(".", "XCMS-annotated-diffreport--.*csv$")
[1] "XCMS-annotated-diffreport--1-vs-2-Y.csv" "XCMS-annotated-diffreport--1-vs-2.csv"
After I deleted one of the files, it worked again.
In my code the problem was that I had mistyped the name of the file, and the other file wasn't in that directory. So check that all your files are where they should be.
I had this problem because I was trying to run a for loop over a data frame rather than a vector:
ids <- th[th$nobs > threshold,]
for(i in ids) {
this is what the variable "ids" looks like:
id nobs
2 2 1041
154 154 1095
248 248 1005
should have been:
ids <- th[th$nobs > threshold,]
for(i in ids$id) {
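A tiny illustration of the difference (a for loop over a data frame iterates over its columns, not its rows; the values here just mirror the ids shown above):
df <- data.frame(id = c(2, 154, 248), nobs = c(1041, 1095, 1005))
for(i in df)    print(i)   # two iterations: the id column, then the nobs column
for(i in df$id) print(i)   # three iterations: 2, 154, 248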
Change the object id to i, because you are inside a for loop with iteration variable i, i.e. change
path<-paste(directory,"/",id,".csv",sep="")
to
path<-paste(directory,"/",i,".csv",sep="")