Extract the line with the same content from two files in R


I would like to use the readLines function to read two text files line by line, for example:
69C_t.txt
I would also like to write a simple for loop with a condition that extracts the lines that are identical in both files.
file_t <- "69C_t.txt"
conn_t <- file(file_t, open = "r")
t <- readLines(conn_t)
close(conn_t)
file_b <- "69C_b.txt"
conn_b <- file(file_b, open = "r")
b <- readLines(conn_b)
close(conn_b)
for (i in 1:length(t)){
for (j in 1:length(b)){
if (t[i] == b[j])
write(t[i], file = "overlap.txt")
}
}
However, it seems to write out only the first line.
Can someone please take a look?

A faster approach is to drop the loop and use a vectorized %in% test:
writeLines(t[t %in% b], "overlap.txt")
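A minimal, self-contained sketch of the %in% approach, assuming two small made-up files created in a temporary directory. It uses the names t_lines and b_lines instead of t and b, since t is also R's transpose function. intersect() gives the same result here, but it also drops duplicate lines, whereas %in% keeps them.

```r
# Two small example files with made-up contents.
file_t <- tempfile(fileext = ".txt")
file_b <- tempfile(fileext = ".txt")
writeLines(c("chr1 100", "chr2 200", "chr3 300"), file_t)
writeLines(c("chr3 300", "chr1 100", "chr9 900"), file_b)

t_lines <- readLines(file_t)
b_lines <- readLines(file_b)

# Keep the lines of file_t that also occur anywhere in file_b,
# in file_t's order.
common <- t_lines[t_lines %in% b_lines]

out_file <- tempfile(fileext = ".txt")
writeLines(common, out_file)
readLines(out_file)
# [1] "chr1 100" "chr3 300"
```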

Alternatively, keep the loop but add append = TRUE to the write() call. Without it, every call overwrites overlap.txt, so only one line survives:
write(t[i], file = "overlap.txt", append = TRUE)
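For comparison, a sketch of the loop version with append = TRUE, again on made-up files in a temporary directory. It compares line contents rather than loop indices, and it empties the output file first, because append = TRUE would otherwise keep lines from earlier runs. Note that a line occurring twice in the second file is written twice.

```r
file_t <- tempfile(fileext = ".txt")
file_b <- tempfile(fileext = ".txt")
writeLines(c("a", "b", "c"), file_t)
writeLines(c("c", "x", "a"), file_b)

t_lines <- readLines(file_t)
b_lines <- readLines(file_b)

out_file <- tempfile(fileext = ".txt")
# Start from an empty file so results from earlier runs are not kept.
file.create(out_file)

for (i in seq_along(t_lines)) {
  for (j in seq_along(b_lines)) {
    # Compare the contents of the lines, not their positions.
    if (t_lines[i] == b_lines[j]) {
      write(t_lines[i], file = out_file, append = TRUE)
    }
  }
}
readLines(out_file)
# [1] "a" "c"
```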

Related

How to save RDS files with different names in R?

I want to save the cars data in a loop, each time under a different file name.
for (i in 1:10)
{ outfilename=paste0(i ,".RDS")
saveRDS(cars, file = "/home/outfilename.RDS")}
However, it looks like outfilename is never used: every iteration writes to the same file.

You probably meant something like this:
for (i in 1:10){
outfilename <- paste0("/home/", i ,".RDS")
saveRDS(cars, file = outfilename)
}

Using lapply
lapply(1:10, function(i)
saveRDS(cars, file = file.path('/home', paste0(i, ".RDS"))))
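A runnable variation, assuming you write into a subfolder of tempdir() instead of /home (which is often not writable by ordinary users). sprintf("%02d.RDS", i) zero-pads the counter so the files sort as 01.RDS, 02.RDS, ..., 10.RDS; otherwise it behaves like paste0(i, ".RDS").

```r
out_dir <- file.path(tempdir(), "rds_demo")
dir.create(out_dir, showWarnings = FALSE)

for (i in 1:10) {
  # Build a zero-padded file name from the loop counter.
  outfilename <- file.path(out_dir, sprintf("%02d.RDS", i))
  saveRDS(cars, file = outfilename)
}

# Check that one of the files round-trips.
identical(readRDS(file.path(out_dir, "03.RDS")), cars)
# [1] TRUE
```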

Alternately simulating and writing data to a file

I'm experimenting with R and am trying to alternately simulate data and write it to a file. I tried many variants, for example:
connection<-file("file.txt", open="w")
for (i in 1:2){
X<-runif(3,0,1)
writeLines(as.character(X), con=connection, sep="\n")
}
close(connection)
But what I get is
0.442033957922831
0.0713443560525775
0.950616024667397
0.0807233764789999
0.186026858631521
0.658676357707009
instead of something like
0.442033957922831 0.0713443560525775 0.950616024667397
0.0807233764789999 0.186026858631521 0.658676357707009
Could you explain what I'm doing wrong?

writeLines() writes each element of a character vector on its own line, so the three numbers in X become three lines. Paste the elements of X into a single space-separated string first; sep = "\n" then ends each row:
connection<-file("file.txt", open="w")
for (i in 1:2){
X<-runif(3,0,1)
writeLines(paste(X, collapse=" "), con=connection, sep="\n")
}
close(connection)
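Another option, sketched here with a temporary output file, is base R's write(), which lays a vector out in ncolumns columns separated by spaces and ends each row with a newline. Because write() goes through cat(), the numbers are typically printed with fewer significant digits than as.character() gives (governed by the digits option).

```r
out_file <- tempfile(fileext = ".txt")
connection <- file(out_file, open = "w")
for (i in 1:2) {
  X <- runif(3, 0, 1)
  # One row of three space-separated values per iteration.
  write(X, file = connection, ncolumns = 3)
}
close(connection)
readLines(out_file)
```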

Instead of writing line by line in a for loop, we can build the whole string once and write it to the text file in one go.
We can use replicate to repeat the runif call n times, paste the numbers row-wise, and then paste the rows together, collapsing them with a newline character.
temp <- paste0(apply(t(replicate(2, runif(3,0,1))), 1, paste, collapse = ' '),
collapse = '\n')
connection <- file("file.txt")
writeLines(temp, connection)
close(connection)
Here temp is a single string (a character vector of length one) that looks like this:
temp
#[1] "0.406911700032651 0.416268902365118 0.698520892066881\n0.96398281189613 0.834513065638021 0.655840792460367"
and in the text file it looks like this:
cat(temp)
#0.406911700032651 0.416268902365118 0.698520892066881
#0.96398281189613 0.834513065638021 0.655840792460367
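If the simulated values are kept as a matrix with one row per simulation, write.table() can write everything in one call without any manual pasting. A sketch, assuming the same 2 x 3 layout and a temporary output file:

```r
n <- 2
# replicate() returns a 3 x n matrix; t() turns it into one row per simulation.
sims <- t(replicate(n, runif(3, 0, 1)))

out_file <- tempfile(fileext = ".txt")
# Space-separated values, no row or column labels.
write.table(sims, out_file, row.names = FALSE, col.names = FALSE)
readLines(out_file)
```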

For loop with file names in R

I have a list of files like:
nE_pT_sbj01_e2_2.csv,
nE_pT_sbj02_e2_2.csv,
nE_pT_sbj04_e2_2.csv,
nE_pT_sbj05_e2_2.csv,
nE_pT_sbj09_e2_2.csv,
nE_pT_sbj10_e2_2.csv
As you can see, the file names are identical except for the subject number after 'sbj', and those numbers are not consecutive.
I need to run a for loop, but I would like to keep the original subject number. How can I do this?
I assume I need to replace length(file) with something that keeps the original subject number, but I'm not sure how.
setwd("/path")
file = list.files(pattern="\\.csv$")
for(i in 1:length(file)){
data=read.table(file[i],header=TRUE,sep=",",row.names=NULL)
source("functionE.R")
Output = paste0("e_sbj", i, "_e2.Rdata")
save.image(Output)
}
The code above gives me as output:
e_sbj1_e2.Rdata,e_sbj2_e2.Rdata,e_sbj3_e2.Rdata,
e_sbj4_e2.Rdata,e_sbj5_e2.Rdata,e_sbj6_e2.Rdata.
Instead, I would like to obtain:
e_sbj01_e2.Rdata,e_sbj02_e2.Rdata,e_sbj04_e2.Rdata,
e_sbj05_e2.Rdata,e_sbj09_e2.Rdata,e_sbj10_e2.Rdata.

Loop over the file names themselves instead of their indices: drop the .csv extension, add .Rdata, and replace the prefix, for example:
myFiles <- list.files(pattern = "\\.csv$")
for(i in myFiles){
myDf <- read.csv(i)
outputFile <- paste0(tools::file_path_sans_ext(i), ".Rdata")
outputFile <- gsub("nE_pT_", "e_", outputFile, fixed = TRUE)
save(myDf, file = outputFile)
}
Note: I changed your variable names; try to avoid using function names (such as file) as variable names.
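The loop above produces names such as e_sbj01_e2_2.Rdata, which still carry the trailing _2. If the goal is exactly e_sbj01_e2.Rdata, a single sub() with a capture group can build the new name. The pattern assumes every file follows the nE_pT_sbjNN_e2_2.csv scheme; names that don't match are returned unchanged.

```r
fls <- c("nE_pT_sbj01_e2_2.csv", "nE_pT_sbj02_e2_2.csv", "nE_pT_sbj10_e2_2.csv")
# \\1 refers to the captured subject id, e.g. "sbj01".
out <- sub("^nE_pT_(sbj[0-9]+)_e2_2\\.csv$", "e_\\1_e2.Rdata", fls)
out
# [1] "e_sbj01_e2.Rdata" "e_sbj02_e2.Rdata" "e_sbj10_e2.Rdata"
```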

If you use regular expressions and sprintf (or paste0), you can do it easily without a loop:
fls <- c('nE_pT_sbj01_e2_2.csv', 'nE_pT_sbj02_e2_2.csv', 'nE_pT_sbj04_e2_2.csv', 'nE_pT_sbj05_e2_2.csv', 'nE_pT_sbj09_e2_2.csv', 'nE_pT_sbj10_e2_2.csv')
sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
[1] "e_sbj01_e2.Rdata" "e_sbj02_e2.Rdata" "e_sbj04_e2.Rdata" "e_sbj05_e2.Rdata" "e_sbj09_e2.Rdata" "e_sbj10_e2.Rdata"
You can then pass the whole vector to a function (if it is vectorized), or apply a function to each element with sapply or lapply:
fls_new <- sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
res <- lapply(fls_new,function(x) yourfunction(x))
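To process each file and save it under its new name, Map() can walk the input and output vectors in parallel. A sketch, where process_file is a hypothetical stand-in for your own per-file work (here it only reads the CSV and saves it), and the input files are made up in a temporary folder:

```r
# Hypothetical per-file step: read one CSV and save it under the new name.
process_file <- function(in_file, out_file) {
  myDf <- read.csv(in_file)
  save(myDf, file = out_file)
  out_file
}

work_dir <- file.path(tempdir(), "sbj_demo")
dir.create(work_dir, showWarnings = FALSE)
fls <- file.path(work_dir, c("nE_pT_sbj01_e2_2.csv", "nE_pT_sbj04_e2_2.csv"))
# Made-up example data so the sketch runs on its own.
for (f in fls) write.csv(data.frame(x = 1:3), f, row.names = FALSE)

# Extract "sbj01", "sbj04" from the base names and build the output paths.
ids <- regmatches(basename(fls), regexpr("sbj\\d{2}", basename(fls)))
fls_new <- file.path(work_dir, sprintf("e_%s_e2.Rdata", ids))

# Input file k is saved as output file k.
invisible(Map(process_file, fls, fls_new))
file.exists(fls_new)
```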

If I understood correctly, you only change the extension from .csv to .Rdata, remove the last "_2", and change the prefix from "nE_pT" to "e". If so, this should work:
Output <- sub("_2\\.csv$", ".Rdata", sub("^nE_pT", "e", file[i]))

Specifying consecutive file names and assigning consecutive vectors with counter variable in for loops

I am trying to analyze 10 data sets: for each one I have to import the data, remove some values and plot a histogram. I could do it for each set individually, but a for loop would save a lot of time. I know this code is not correct, but I have no idea how to specify the input file names or how to name each iterated variable in R.
par(mfrow = c(10,1))
for (i in 1:10)
{
freqi <- read.delim("freqspeci.frq", sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
What I want is for the "i" in names such as freqi, freqspeci.frq and freqveci to be replaced by the counter value. Am I just approaching this the wrong way in R? I have read about the assign and paste functions, but honestly I do not understand how to apply them to this particular problem.

You can do it in several ways:
Use list.files() to get all files in a given directory. Its pattern argument also accepts a regular expression.
If the names are consecutive, you can build each file name from the counter:
for (i in 1:10)
{
filename <- sprintf("freqspec%d.frq", i)
freqi <- read.delim(filename, sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
You can also use paste() to create file names:
paste("filename", 1:10, sep='_')
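Instead of creating freqvec1, freqvec2, ... with assign(), a common alternative is to keep the results in a named list. A sketch, assuming the files are called freqspec1.frq, freqspec2.frq, ... and contain an N_CHR column; read_filtered is a hypothetical helper and the example files are made up:

```r
# Hypothetical helper: read one .frq file and drop N_CHR values of 0 and 1.
read_filtered <- function(path) {
  freq <- read.delim(path, sep = "\t", row.names = NULL)
  vec <- freq$N_CHR
  vec[vec != 0 & vec != 1]
}

demo_dir <- file.path(tempdir(), "frq_demo")
dir.create(demo_dir, showWarnings = FALSE)
frq_files <- file.path(demo_dir, sprintf("freqspec%d.frq", 1:3))
# Made-up example files so the sketch runs on its own.
for (k in seq_along(frq_files)) {
  write.table(data.frame(N_CHR = c(0, 1, 2, 2 + k)), frq_files[k],
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# One list element per file, named freqvec1, freqvec2, ...
freqvecs <- lapply(frq_files, read_filtered)
names(freqvecs) <- sprintf("freqvec%d", seq_along(frq_files))
freqvecs$freqvec2
```

To plot them, something like par(mfrow = c(length(freqvecs), 1)) followed by for (v in freqvecs) hist(v) should work.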

You could also save all your data files into an otherwise empty folder and then get the file names like this:
filenames <- dir()
for (i in seq_along(filenames)){
freqi <- read.delim(filenames[i], sep="\t", row.names=NULL)
# ...and then whatever else you want to do with these files
}

Looping through files in R and applying a function

I'm not a very experienced R user. I need to loop through a folder of csv files and apply a function to each one. Then I would like to take the value I get for each one and have R dump them into a new column called "stratindex", which will be in one new csv file.
Here's the function applied to a single file
ctd=read.csv(file.choose(), header=T)
stratindex=function(x){
x=ctd$Density..sigma.t..kg.m.3..
(x[30]-x[1])/29
}
Then I can spit out one value with
stratindex(Density..sigma.t..kg.m.3..)
I tried adapting a file loop that someone posted on this board (Looping through files in R).
Here's my go at putting it together
out.file <- 'strat.csv'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE)
tmp.strat <- function(x)
x=tmp.file(Density..sigma.t..kg.m.3..)
(x[30]-x[1])/29
write(paste0(i, "," tmp.strat), out.file, append=TRUE)
}
What have I done wrong/what is a better approach?

It's easier if you read the file inside the function:
stratindex <- function(file){
ctd <- read.csv(file)
x <- ctd$Density..sigma.t..kg.m.3..
(x[30] - x[1]) / 29
}
Then apply the function to a vector of file names and write the results to a new CSV file:
the.files <- list.files(pattern = "\\.csv$")
index <- sapply(the.files, stratindex)
output <- data.frame(File = the.files, stratindex = index)
write.csv(output, "strat.csv", row.names = FALSE)
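A self-contained version of this answer, assuming made-up CTD files in a temporary folder with 30 density values each. list.files(pattern = "\\.csv$", full.names = TRUE) restricts the loop to CSV files and returns paths that work from any working directory, and vapply() guarantees exactly one numeric value per file. The result is written outside the input folder so it isn't picked up on the next run.

```r
stratindex <- function(file) {
  ctd <- read.csv(file)
  x <- ctd$Density..sigma.t..kg.m.3..
  (x[30] - x[1]) / 29
}

ctd_dir <- file.path(tempdir(), "ctd_demo")
dir.create(ctd_dir, showWarnings = FALSE)
# Made-up casts: cast k has density increasing by k per row.
for (k in 1:3) {
  ctd <- data.frame(Density..sigma.t..kg.m.3.. = seq(0, by = k, length.out = 30))
  write.csv(ctd, file.path(ctd_dir, sprintf("cast%d.csv", k)), row.names = FALSE)
}

the.files <- list.files(ctd_dir, pattern = "\\.csv$", full.names = TRUE)
output <- data.frame(
  File = basename(the.files),
  stratindex = vapply(the.files, stratindex, numeric(1), USE.NAMES = FALSE)
)
write.csv(output, file.path(tempdir(), "strat.csv"), row.names = FALSE)
output
```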
