merge different files into 1 text file in R


I have two files, one containing text and the other a data frame, and I want to merge them into a single text file. On Linux I can use:
cat file1 file2 > outputfile
Can the same thing be done in R?
file1:
##TITLE=POOLED SAMPLES COLLECTED 05-06/03/2018
##JCAMP-DX=4.24
##DATA TYPE=LINK
#ORIGIN Bentley_FTS SomaCount_FCM 82048
##OWNER=Bentley Instruments Inc
##DATE=2018-03-09
##TIME=15:34:48
##BLOCKS=110
##LAB1=Auto Generated
##LAB2=
##CHANNELNAMES=8
file2:
649.025085449219 0.063037105 0.021338785 -0.00053594 0.008937807 0.03266982
667.675231457819 0.028557044 0.005877694 -0.015043681 0.014945094 0.051547796
686.325377466418 0.021499421 0.017125281 0.043007832 0.04132269 0.027496092
704.975523475018 0.006128653 -0.014599532 -0.000335723 0.020189898 0.024547976
723.625669483618 0.018550962 0.018567896 0.014100821 0.013067127 0.027075281
742.275815492218 0.030145327 0.039745297 0.050556265 0.056621946 0.058416516
760.925961500818 0.040279277 0.01392867 -0.00143011 0.015103153 0.03580305
779.576107509418 0.031955898 0.013671243 0.000861743 0.000641993 0.001747168
Thanks a lot
Phuong

We can use file.append:
file.append("fileMerged.txt", "file1.txt")
file.append("fileMerged.txt", "file2.txt")
Or if files are already imported into R, then write with append:
#import to R
f1 <- readLines("file1.txt")
f2 <- readLines("file2.txt")
# output with append
write(f1, "fileMerged.txt")
write(f2, "fileMerged.txt", append = TRUE)
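A self-contained sketch of the file.append route, in case it helps to try it without touching real data. The temporary folder, the toy header lines and the small data frame are all made up for illustration; only file.append(), writeLines() and write.table() are base R.

```r
# Hypothetical working folder so the sketch does not touch real files
tmp <- tempfile("merge_demo")
dir.create(tmp)
file1 <- file.path(tmp, "file1.txt")
file2 <- file.path(tmp, "file2.txt")

# file1: plain header text; file2: a data frame written without names
writeLines(c("##TITLE=demo", "##BLOCKS=110"), file1)
write.table(data.frame(x = c(649.025, 667.675), y = c(0.063, 0.029)),
            file2, row.names = FALSE, col.names = FALSE)

# Start from an empty output file, because file.append() adds to
# whatever the target already contains
merged <- file.path(tmp, "fileMerged.txt")
file.create(merged)
for (f in c(file1, file2)) {
  file.append(merged, f)
}

readLines(merged)
```

Creating the output file first means that running the script twice does not duplicate the content.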

Related

Read csv files in a loop in R

I am trying to read the CSV files in a folder on my computer using R. This is what my code does:
First, I create a vector with the names of the files matching the pattern CSV. Its elements are of type chr.
Second, I loop over the files and create a data frame in R for each of them.
files = list.files(pattern="*.csv")
> dput(files)
c("USC00020098.csv", "USC00020104.csv", "USC00020170.csv", "USC00020307.csv",
"USC00020406.csv", "USC00020482.csv", "USC00020487.csv", "USC00020490.csv",
"USC00020492.csv", "USC00020494.csv", "USC00020625.csv", "USC00020632.csv",
"USC00020670.csv", "USC00020675.csv", "USC00020678.csv", "USC00020758.csv",
"USC00020808.csv", "USC00020810.csv", "USC00021161.csv", "USC00021193.csv",
"USC00021419.csv", "USC00021614.csv", "USC00021654.csv", "USC00021749.csv",
"USC00022193.csv", "USC00022705.csv", "USC00022927.csv", "USC00023082.csv",
"USC00023185.csv", "USC00023190.csv", "USC00023448.csv", "USC00023498.csv",
"USC00023500.csv", "USC00023501.csv", "USC00023505.csv", "USC00023573.csv",
"USC00023621.csv", "USC00023643.csv", "USC00023828.csv", "USC00023926.csv",
"USC00024069.csv", "USC00024182.csv", "USC00024345.csv", "USC00024391.csv",
"USC00024453.csv", "USC00024508.csv", "USC00025312.csv", "USC00025467.csv",
"USC00025512.csv", "USC00025560.csv", "USC00025635.csv", "USC00025700.csv",
"USC00025765.csv", "USC00025780.csv", "USC00025825.csv", "USC00026037.csv",
"USC00026244.csv", "USC00026246.csv", "USC00026315.csv", "USC00026320.csv",
"USC00026321.csv", "USC00026424.csv", "USC00026476.csv", "USC00026571.csv",
"USC00026603.csv", "USC00026653.csv", "USC00026796.csv", "USC00026840.csv",
"USC00027081.csv", "USC00027131.csv", "USC00027143.csv", "USC00027281.csv",
"USC00027466.csv", "USC00027661.csv", "USC00027708.csv", "USC00027716.csv",
"USC00027720.csv", "USC00027741.csv", "USC00027876.csv", "USC00028112.csv",
"USC00028214.csv", "USC00028273.csv", "USC00028326.csv", "USC00028489.csv",
"USC00028494.csv", "USC00028499.csv", "USC00028500.csv", "USC00028647.csv",
"USC00028649.csv", "USC00028650.csv", "USC00028653.csv", "USC00028904.csv",
"USC00028940.csv", "USC00029015.csv", "USC00029158.csv", "USC00029271.csv",
"USC00029367.csv", "USC00029534.csv", "USC00029622.csv", "USC00029626.csv",
"USW00003192.csv", "USW00023183.csv", "USW00023184.csv", "USW00053019.csv",
"USW00053156.csv", "USW00053160.csv", "USW00093139.csv", "USW00093140.csv"
)
for(i in seq_along(files)){
name <- files[[i]]
y <- read.csv(file=name,header=TRUE)
assign(name,y)
}
However, I am getting the following error
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
first five rows are empty: giving up
When I run for example: t <- read.csv(files[[94]],header=TRUE,sep=",") , it works properly.
Any idea why my loop is not working?
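One way to narrow this down, as a rough sketch rather than a diagnosis: wrap each read.csv() call in tryCatch() so the loop keeps going and records which file fails. The demo folder and its two files are hypothetical, and the "bad" file here is simply empty, so it may not trigger the same message as above. Note that list.files() treats pattern as a regular expression, so the sketch uses the anchored form "\\.csv$".

```r
# Hypothetical demo folder: one readable CSV and one empty file
dir_in <- tempfile("csv_demo")
dir.create(dir_in)
writeLines(c("a,b", "1,2"), file.path(dir_in, "good.csv"))
file.create(file.path(dir_in, "bad.csv"))

files <- list.files(dir_in, pattern = "\\.csv$", full.names = TRUE)

data_list <- list()        # successfully read data frames, by file name
failed <- character(0)     # names of files that raised an error
for (f in files) {
  result <- tryCatch(read.csv(f, header = TRUE), error = function(e) e)
  if (inherits(result, "error")) {
    failed <- c(failed, basename(f))
    message(basename(f), ": ", conditionMessage(result))
  } else {
    data_list[[basename(f)]] <- result
  }
}

failed
```

Keeping the data frames in a named list instead of calling assign() also makes it easier to work with all of them afterwards.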

efficiently read in fasta file and calculate nucleotide frequencies in R

How can I read in a fasta file (~4 Gb) and calculate nucleotide frequencies in a window 4 bp long?
It takes too long to read in the fasta file using
library(ShortRead)
readFasta('myfile.fa')
I have tried to index it using (and there are many of them)
library(Rsamtools)
indexFa('myfile.fa')
fa = FaFile('myfile.fa')
However, I do not know how to access the file in this format.

I would guess that 'slow' to read in a file that size would be a minute; longer than that and something other than software is the problem. Maybe it's appropriate to ask where your file comes from, your operating system, and whether you have manipulated the files (e.g., trying to open them in a text editor) before processing.
If 'too slow' is because you are running out of memory, then reading in chunks might help. With Rsamtools
fa = "my.fasta"
## indexFa(fa) if the index does not already exist
idx = scanFaIndex(fa)
create chunks of index, e.g., into n=10 chunks
chunks = snow::splitIndices(length(idx), 10)
and then process the file
res = lapply(chunks, function(chunk, fa, idx) {
dna = scanFa(fa, idx[chunk])
## ...
}, fa, idx)
Use do.call(c, res) or similar to concatenate the final result, or perhaps use a for loop if you're accumulating a single value. Indexing the fasta file is via a call to the samtools library; using samtools on the command line is also an option, on non-Windows.
An alternative is to use Biostrings::fasta.index() to index the file, then chunk through with that
library(Biostrings)
idx = fasta.index(fa, seqtype="DNA")
chunks = snow::splitIndices(nrow(idx), 10)
res = lapply(chunks, function(chunk, idx) {
  dna = readDNAStringSet(idx[chunk, ])
  ## ...
}, idx)
If each record consists of a single line of DNA sequence, then reading the records in to R, in (even-numbered) chunks via readLines() and processing from there is relatively easy
con = file(fa)
open(con)
## chunkSize must be even: one header line plus one sequence line per record
chunkSize = 10000000
while (TRUE) {
  lines = readLines(con, chunkSize)
  if (length(lines) == 0)
    break
  dna = DNAStringSet(lines[c(FALSE, TRUE)])
  ## ...
}
close(con)
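To make the readLines() pattern concrete, here is a minimal base-R sketch that counts overlapping 4-mers chunk by chunk and accumulates the totals. The toy fasta file and the helper count_kmers() are hypothetical and only meant to show the chunking logic; it assumes one sequence line per record. For real data, Biostrings::oligonucleotideFrequency() with width = 4 on each DNAStringSet chunk is probably the more sensible tool.

```r
# Hypothetical toy input: two records, one sequence line each
fa <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "ACGTTT"), fa)

# Hypothetical helper: count overlapping k-mers in a character vector
count_kmers <- function(seqs, k = 4) {
  kmers <- unlist(lapply(seqs, function(s) {
    n <- nchar(s) - k + 1
    if (n < 1) return(character(0))
    substring(s, seq_len(n), seq_len(n) + k - 1)
  }))
  tab <- table(kmers)
  setNames(as.integer(tab), names(tab))
}

con <- file(fa, open = "r")
chunkSize <- 2             # even: header line + sequence line per record
totals <- integer(0)       # running counts, named by k-mer
while (TRUE) {
  lines <- readLines(con, chunkSize)
  if (length(lines) == 0) break
  counts <- count_kmers(lines[c(FALSE, TRUE)])   # even lines = sequences
  for (nm in names(counts)) {
    prev <- if (nm %in% names(totals)) totals[[nm]] else 0L
    totals[nm] <- prev + counts[[nm]]
  }
}
close(con)

totals
```

Only one chunk is held in memory at a time, so chunkSize can be tuned to whatever the machine can handle.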

Load the Biostrings package and then use the readDNAStringSet() function.
From example("readDNAStringSet"), slightly modified:
library(Biostrings)
# example("readDNAStringSet") #optional
filepath1 <- system.file("extdata", "someORF.fa", package="Biostrings")
head(fasta.seqlengths(filepath1, seqtype="DNA")) # sequence lengths
x1 <- readDNAStringSet(filepath1)
head(x1)

For Loop in R, all in 1 command

I created this random time series:
MM=1584
Z0<-rnorm(MM,8,1.0)#;ts.plot(Z0)
s_1=1.50; p_1=121; p_2=240
s_2=1.25; p_3=361; p_4=480
s_3=1.10; p_5=601; p_6=720
s_4=1.50; p_7=960; p_8=1020
s_5=1.25; p_9=1140; p_10=1320
s_6=1.50; p_11=1369; p_12=1440
a=(Z0[1:p_1-1])
b=(s_1+Z0[p_1:p_2])
c=(Z0[(p_2+1):(p_3-1)])
d=(s_2+Z0[p_3:p_4])
e=(Z0[(p_4+1):(p_5-1)])
f=(s_2+Z0[p_5:p_6])
g=(Z0[(p_6+1):(p_7-1)])
h=(s_3+Z0[p_7:p_8])
i=(Z0[(p_8+1):(p_9-1)])
l=(s_4+Z0[p_9:p_10])
m=(Z0[(p_10+1):(p_11-1)])
n=(s_5+Z0[p_11:p_12])
o=Z0[(p_12+1):MM]
Z=c(a,b,c,d,e,f,g,h,i,l,m,n,o);ts.plot(Z)
abline(v=p_1,col="red");abline(v=p_2,col="red");abline(v=p_3,col="red")
abline(v=p_4,col="red");abline(v=p_5,col="red");abline(v=p_6,col="red")
abline(v=p_7,col="red");abline(v=p_8,col="red");abline(v=p_9,col="red")
abline(v=p_10,col="red");abline(v=p_11,col="red");abline(v=p_12,col="red")
Zm=as.data.frame(Z)
write.csv2(Zm, file="C:/Users/Luca/Dekstop/Zm/Zm1.csv")
I would like to repeat these commands to create 100 series and save them with write.csv2(...Zm"...".csv).
I don't want to change the file names and repeat the commands manually.
I searched other questions for something useful but didn't find it.
The loop only has to change the name of the data frame (Zm) and the file name on each iteration.
I want to repeat the creation of Z0 100 times (Z01, Z02, Z03 ... Z0100), then Z (Z1, Z2, ... Z100) and Zm (Zm1, Zm2, Zm3 ... Zm100), and save them in the folder under new file names (folder/Zm1, Zm2, Zm3, etc.), all in one command with a loop.

I'm not sure why you want to change the name of the data frames, but dynamically changing the name of the file is straightforward.
for (i in 1:100) { ... write.csv2(Zm, file=paste("C:/Users/Luca/Dekstop/Zm/Zm", i, ".csv", sep = "")) }
If you want to keep the created data frames, why not just simply use a list?
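A rough sketch of the list idea, under some assumptions: make_series() is a hypothetical stand-in for the series construction in the question (it applies only the first shift, for brevity), and the output folder is a temporary one rather than the real path.

```r
set.seed(1)                          # only to make the sketch reproducible
out_dir <- tempfile("Zm")            # hypothetical output folder
dir.create(out_dir)

# Hypothetical stand-in for the series construction in the question
make_series <- function(MM = 1584) {
  Z <- rnorm(MM, 8, 1.0)
  Z[121:240] <- Z[121:240] + 1.50    # first shifted segment only
  Z
}

Zm_list <- vector("list", 100)
for (i in 1:100) {
  Zm_list[[i]] <- data.frame(Z = make_series())
  write.csv2(Zm_list[[i]],
             file = file.path(out_dir, paste0("Zm", i, ".csv")))
}
names(Zm_list) <- paste0("Zm", 1:100)
```

Each data frame is then available as Zm_list$Zm1, Zm_list$Zm2 and so on, without creating 100 separate variables.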

incomplete list of csv file imported in R

I need to import a list of 36 csv files, but after running the code I get only 26 of them. Probably, 10 files have format problems. Is there a way in R to detect the 10 files that cannot be imported?

If you have the file names in a vector, you can use the following code:
all <- c("16048.txt", "16062.txt", "16066.txt", "16093.txt", "16095.txt", "16122.txt", "16241.txt", "16360.txt", "16380.txt", "16389.txt", "16510.txt", "16511.txt", "16701.txt", "16729.txt", "16735.txt", "16737.txt", "16761.txt", "16816.txt", "16867.txt", "16876.txt", "16880.txt", "16883.txt", "16884.txt", "16885.txt", "16893.txt", "16904.txt", "16906.txt", "16908.txt", "16929.txt", "16931.txt", "16938.txt", "16943.txt", "16959.txt", "16967.txt", "16968.txt", "16969.txt")
imp <- c("16761.txt", "16959.txt", "16884.txt", "16093.txt", "16883.txt", "16122.txt", "16906.txt", "16737.txt", "16968.txt", "16095.txt", "16062.txt", "16816.txt", "16360.txt", "16893.txt", "16885.txt", "16938.txt", "16048.txt", "16931.txt", "16876.txt", "16511.txt", "16969.txt", "16241.txt", "16967.txt", "16701.txt", "16380.txt", "16510.txt")
Here all is the vector of file names you need and imp is the incomplete set that was actually imported. You can get the missing files with:
missing <- all[!all %in% imp]
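The same comparison can also be written with setdiff(). A small sketch with shortened, made-up vectors; the names wanted, imported and not_imported are hypothetical and chosen because all and missing are also the names of base R functions.

```r
# Shortened toy vectors for illustration
wanted   <- c("16048.txt", "16062.txt", "16066.txt", "16093.txt")
imported <- c("16093.txt", "16048.txt")

# Elements of wanted that do not appear in imported, in wanted's order
not_imported <- setdiff(wanted, imported)
not_imported
```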

mv a file from system command using r

I have a directory which has 5 files named like this
A.abcd (1).txt
B.abcd (1).txt
C.abcd (1).txt
D.abcd (1).txt
E.abcd (1).txt
I want to rename the files so that they become:
A.defg.txt
B.defg.txt
C.defg.txt
D.defg.txt
E.defg.txt
In short I want to change abcd (1) to defg in the files.
I tried to run the system command from the R console.
system("mv A.abcd (1).txt A.defg.txt")
But I have to do this one by one.
Is there any way I can do it in one shot through R?

You can use file.rename() to rename files, and sub() with a regular expression for the text manipulation.
x <- c("A.abcd (1).txt", "B.abcd (1).txt", "C.abcd (1).txt", "D.abcd (1).txt", "E.abcd (1).txt")
newx <- sub("abcd \\(1\\)", "defg", x)
newx
[1] "A.defg.txt" "B.defg.txt" "C.defg.txt" "D.defg.txt" "E.defg.txt"
## The following is untested
file.rename(x, newx)
See ?files for help on this and the other base R file manipulation functions.
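For anyone who wants to try this end to end, here is a sketch that builds the old names from list.files() instead of typing them. The temporary folder and the empty demo files are hypothetical; sub() is called with fixed = TRUE here, which avoids having to escape the parentheses.

```r
# Hypothetical demo folder with five empty files named as in the question
dir_demo <- tempfile("rename_demo")
dir.create(dir_demo)
file.create(file.path(dir_demo, paste0(LETTERS[1:5], ".abcd (1).txt")))

# Pick up the files to rename, then build the new names
x <- list.files(dir_demo, pattern = "abcd \\(1\\)\\.txt$")
newx <- sub("abcd (1)", "defg", x, fixed = TRUE)

# file.rename() is vectorised and returns TRUE/FALSE per file
ok <- file.rename(file.path(dir_demo, x), file.path(dir_demo, newx))

list.files(dir_demo)
```

Checking the returned logical vector is a cheap way to see whether any rename failed.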
