Read txt files and separate them by the string "TIME" - R

I have a question about reading large txt files and splitting them at the string "TIME".
Each "TIME" block holds the pressure over a spatial area at a particular point in time.
How should I write the reading function so that it recognizes the "TIME" lines and splits the file at them?

I would first create a folder so that I can save the new files in it. Also, I would put the original data file in this folder.
# setwd("....") # Set the working directory as the folder you just created.
I saved the data you provided in "data.txt".
The following lines will split your data (which is in "data.txt" in my computer) into files that have consecutive names, such as "data1.txt", "data2.txt", and so on.
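incon <- file("data.txt", "r")
outcon <- NULL
i <- 0
while (TRUE) {
  line <- readLines(incon, n = 1)
  if (length(line) == 0) {
    break  # end of file reached
  }
  if (grepl("TIME:", line, fixed = TRUE)) {
    # a new "TIME:" block starts: close the previous output file and open the next one
    if (!is.null(outcon)) close(outcon)
    i <- i + 1
    outcon <- file(paste0("data", i, ".txt"), "w")
  }
  # lines before the first "TIME:" line have no output file yet and are skipped
  if (!is.null(outcon)) writeLines(line, outcon)
}
if (!is.null(outcon)) close(outcon)
close(incon)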
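If the whole file fits in memory, the same split can be sketched without reading one line at a time: read all lines once, number the blocks with cumsum() over the marker matches, and write each block after split(). This is a minimal sketch assuming the marker is the literal string "TIME:" and that lines before the first marker can be dropped; the function name split_by_marker is illustrative. For files too large to load at once, the line-by-line loop is the safer choice.

```r
# Split a text file into data1.txt, data2.txt, ... (written next to the input file),
# starting a new chunk at every line that contains `marker`.
split_by_marker <- function(infile, marker = "TIME:", prefix = "data") {
  lines <- readLines(infile)
  # cumsum() increments at every marker line, so each line gets the number of the
  # most recent marker block (0 for lines before the first marker)
  block <- cumsum(grepl(marker, lines, fixed = TRUE))
  keep <- block > 0
  chunks <- split(lines[keep], block[keep])
  # sprintf() returns character(0) when there are no chunks
  outfiles <- file.path(dirname(infile), sprintf("%s%d.txt", prefix, seq_along(chunks)))
  for (k in seq_along(chunks)) writeLines(chunks[[k]], outfiles[k])
  outfiles  # paths of the files that were written
}
```

For example, split_by_marker("data.txt") returns the paths of the new files.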

Related

R conditional loop, multiple lists

I need help stepping through a second loop in R when a test fails in my first loop. Here's the logic:
1. Start with config_list[1] from the list.
2. Download file path_list[1] from the list.
3. Check whether the file passes the test.
4. If it does, download the next file, path_list[1 + 1], and go back to step 3.
5. If it does not, change the config to the next one in the list and go back to step 2 for the failed file.
Here's how far I've gotten:
path_list <- list("path1", "path2", "path3")
config_list <- list("a", "b", "c")
for (con in config_list) {
  con[1]  # set initial config
  for (val in path_list) {
    print(paste(val, "downloaded"))  # download file
    if (val == "path2") {  # check if file passes some test
      con[1 + 1]  # if above test fails change to con[1 + 1]
      print(paste(val, "downloaded"))  # download file again with new config ???
    }
    print(val)
  }
}
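The described logic can be sketched as one loop over the files with an inner repeat that retries the current file with the next config until the test passes. download_file() and passes_test() are hypothetical placeholders (here the test fails only for "path2" with config "a", to mimic the example); replace them with the real download and check. The config index is deliberately not reset between files, so a file that passes keeps the current config for the next file, and the function stops with an error if the configs run out.

```r
# Hypothetical stand-ins: replace with the real download and test logic.
download_file <- function(path, config) paste(path, "downloaded with config", config)
passes_test <- function(result, path, config) !(path == "path2" && config == "a")

process_paths <- function(path_list, config_list) {
  results <- list()
  k <- 1  # index of the current config; it carries over to the following files
  for (path in path_list) {
    repeat {
      if (k > length(config_list)) stop("no config left for ", path)
      result <- download_file(path, config_list[[k]])
      if (passes_test(result, path, config_list[[k]])) break  # go on to the next file
      k <- k + 1  # test failed: switch to the next config and retry this file
    }
    results[[path]] <- result
  }
  results
}
```

With the lists from the question, process_paths(path_list, config_list) downloads "path1" with config "a", retries "path2" with config "b", and keeps "b" for "path3".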

readLines function not recognizing separating character "\t"

My input file is a text file containing many lines of tab-delimited information. Here is one line from the file:
100026 TGACTGCATGACGTACAC NM_006342.1 TACC3
My code is as follows:
constant_source <- 'constants.R'
source(constant_source)
source(classes_file)
processFile = function(filepath) {
  con = file(filepath, "r")
  while ( TRUE ) {
    line = readLines(con, sep="\t")
    print(line)
    if (length(line) == 0 ) {
      break
    }
  }
  close(con)
}
The output, however, is as follows:
100026\tTGACTGCATGACGTACAC\tNM_006342.1\tTACC3
Why is the readLines function not respecting the separation parameter? I have been toying with this for a while and am stuck. Sorry about this; I just started learning R today. If it makes a difference, I am using RStudio.
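readLines() has no sep argument: it always returns one string per line, with the tab characters still inside. A minimal sketch of two ways to get at the fields: split each line afterwards with strsplit(), or read the file as a tab-separated table with read.delim(), whose default sep is "\t". The function names are illustrative, and colClasses = "character" is an assumption that keeps values such as 100026 as text.

```r
# Option 1: read whole lines, then split each one at the tabs.
process_file <- function(filepath) {
  lines <- readLines(filepath)          # one string per line
  strsplit(lines, "\t", fixed = TRUE)   # list: one character vector of fields per line
}

# Option 2: let read.delim() (sep = "\t" by default) build a data frame.
read_table_file <- function(filepath) {
  read.delim(filepath, header = FALSE, stringsAsFactors = FALSE,
             colClasses = "character")
}
```

For the example line, process_file() yields the four fields "100026", "TGACTGCATGACGTACAC", "NM_006342.1" and "TACC3"; read_table_file() puts them in columns V1 to V4.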

Using functions in R to copy the same file from multiple folders and renaming the file based on folder structure

I am new to R and have built a working series of loops that copies and renames a single file from multiple subfolders.
Great, it works! ... but what is this question about, then?
Well, I have learned enough of R to know that most users agree that one should avoid loops and use functions instead whenever possible. This is because functions are a lot faster than loops.
When I apply my loop to my real data, it will have to cycle through more than 10,000 files, so I would like to make it as fast as possible.
Can my loop be expressed by using functions instead?
Alternatively, can the loop be optimized in some way?
Any examples or suggestions on how to achieve any of the above will be much appreciated.
Info on my file-structure:
F:/Study
    1000
        1001
            Gait1
                annotations.txt
            Gait2
                annotations.txt
        1002
            Gait1
                annotations.txt
            Gait2
                annotations.txt
    59000
        59003
            Gait1
                annotations.txt
            Gait2
                annotations.txt
NB: the "Gait" folders contain many other files and directories, but I am only interested in the "annotations.txt" files.
My loop:
for (a in seq(1000, 99000, by = 1000)) {
  if (!dir.exists(paste0("F:/Study/", a))) {
    next()
  }
  for (b in seq(1, 200, by = 1)) {
    if (!dir.exists(paste0("F:/Study/", a, "/", a+b))) {
      next()
    }
    for (c in seq(1, 2, by = 1)) {
      if (!dir.exists(paste0("F:/Study/", a, "/", a+b, "/", "Gait", c))) {
        next()
      }
      file.copy(from = paste0("F:/Study/", a, "/", a+b, "/", "Gait", c, "/annotations.txt"), to = "F:/Study/Annotations", overwrite = TRUE)
      setwd("F:/Study/Annotations")
      file.rename(from = "annotations.txt", to = paste0(a+b, "_Gait", c, "_annotations.txt"))
    }
  }
}
The result is a set of files in my Annotations folder called:
1001_Gait1_annotations
1001_Gait2_annotations
1002_Gait1_annotations
1002_Gait2_annotations
59003_Gait1_annotations
59003_Gait2_annotations
tl;dr Can the loop be expressed using functions? How?
You might try the following (I have assumed that your /Annotations directory already exists). Does that work for you?
# get the full paths of all files named exactly "annotations.txt" in all subfolders
files <- list.files("F:/Study", full.names = TRUE, recursive = TRUE, pattern = "^annotations\\.txt$")
# extract the desired parts for the new filename with a regex:
# \\d+ matches one or more digits, .* matches anything, and $ marks the end of the string;
# "\\2" keeps only the second capturing group (the part in the second pair of parentheses)
filenames_new <- gsub("(.*/)(\\d+/Gait\\d+.*$)", "\\2", files)
# turn "1001/Gait1/annotations.txt" into "1001_Gait1_annotations.txt"
filenames_new <- gsub("/", "_", filenames_new)
# create the new file paths
files_new <- file.path("F:/Study/Annotations", filenames_new)
# copy all files in one vectorized call
file.copy(from = files, to = files_new)
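An alternative sketch builds the new name from the folder names with dirname() and basename() instead of a regex, so it does not depend on the exact shape of the full path. It assumes the layout shown in the question (<subject>/Gait<N>/annotations.txt); new_annotation_name() and copy_annotations() are illustrative names. Both helpers are vectorized, so all files are handled by a single file.copy() call.

```r
# "F:/Study/1000/1001/Gait1/annotations.txt" -> "1001_Gait1_annotations.txt"
new_annotation_name <- function(path) {
  gait_dir    <- dirname(path)       # ".../1001/Gait1"
  subject_dir <- dirname(gait_dir)   # ".../1001"
  paste0(basename(subject_dir), "_", basename(gait_dir), "_", basename(path))
}

# Copy every annotations.txt below `root` into `dest` under its new name.
copy_annotations <- function(root, dest) {
  dir.create(dest, showWarnings = FALSE)  # create the target folder if it is missing
  files <- list.files(root, pattern = "^annotations\\.txt$",
                      recursive = TRUE, full.names = TRUE)
  file.copy(files, file.path(dest, new_annotation_name(files)), overwrite = TRUE)
}
```

For the question's data this would be called as copy_annotations("F:/Study", "F:/Study/Annotations").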

R - load_all() and adding a new function file

I'm new to R and need to add a new function file to an existing package. For programming and testing, I used load_all() (from the devtools package) to load the package's original R files. I wrote my function and saved it in the same directory as the rest of the (original) R files.
But now, when I run load_all(), this function is executed! I have no idea why. What am I doing wrong here?
library(devtools)
calc_curve <- function() {
  CurvePars <- c(0, 0)
  doMIC <- readline("Do you want to calculate MIC? (y/n) \n")
  if (doMIC == "y") {
    CAlb <- readline("Do you want to use C.Albicans/FLC standard curve? (y/n)\n")
    # use the equation we calculated MIC=a*exp(b*Rad). V1=a, V2=b
    if (CAlb == "y") {
      CurvePars = c(7.17, -0.129)
    } else {
      exFile <- readline("Do you have a standard curve file? (y/n)\n ")
      if (exFile == "y") {
        setwd(getwd())
        curveFile <- tcltk::tk_choose.files(caption = "Select the standard curve data file")
        tempPars <- read.table(curveFile, header = FALSE, sep = ",", dec = ".")
        CurvePars = unlist(tempPars)
        # calculate MIC and add to data file
      } else {
        calib <- readline("Do you want to provide data for MIC calibration? (y/n)\n")
        if (calib == "y") {
          MICFile <- tcltk::tk_choose.files(caption = "Select the MIC calibration file")
          MICdata <- read.csv(MICFile, header = FALSE, sep = "\t", dec = ".")
          MIC_length = length(MICdata[[1]])
          R1 <- 0
          MIC <- 0
          for (i in 1:MIC_length) {
            R1[i] <- RAD2.df[MICdata[i, 1], "RAD20"]
            MIC[i] <- MICdata[i, 2]
          }
          fit <- lm(log(MIC) ~ R1)
          A = summary(fit)$coefficients[1]
          CurvePars <- exp(A)
          B = summary(fit)$coefficients[2]
          CurvePars[2] <- B
          CurveFile <- readline("How would you like to name the curve file?\n ")
          # handle the case of a pre-existing file
          #open(File=paste(CurveFile,".csv",sep=""),"w")
          CurveFile = paste(getwd(), "/", CurveFile, ".csv", sep = "")
          write.table(CurvePars, CurveFile, col.names = FALSE, row.names = FALSE)
        } else {
          CurvePars[1] = 0
          CurvePars[2] = 0
        }
      }
    }
  }
  CurvePars
}

How to save all images in a separate folder?

So, I am running the following code:
library(png)         # readPNG(), writePNG()
library(data.table)  # data.table(), setnames(), :=
library(reshape2)    # melt() for a matrix (assumed to be the melt used here)
library(gbm)         # predict() for gbm.mod
# img2vec(), kmeansThreshold(), gbm.mod and best.iter come from the training part (not shown)
dirtyFolder = "Myfolder/test"
filenames = list.files(dirtyFolder, pattern = "\\.png$")  # regex, not a glob
for (f in filenames) {
  print(f)
  imgX = readPNG(file.path(dirtyFolder, f))
  x = data.table(img2vec(imgX), kmeansThreshold(imgX))
  setnames(x, c("raw", "thresholded"))
  yHat = predict(gbm.mod, newdata = x, n.trees = best.iter)
  img = matrix(yHat, nrow(imgX), ncol(imgX))
  img.dt = data.table(melt(img))
  names.dt <- names(img.dt)
  setnames(img.dt, names.dt[1], "X1")
  setnames(img.dt, names.dt[2], "X2")
  Numfile = gsub(".png", "", f, fixed = TRUE)
  img.dt[, id := paste(Numfile, X1, X2, sep = "_")]
  write.table(img.dt[, c("id", "value"), with = FALSE], file = "submission.csv", sep = ",",
              col.names = (f == filenames[1]), row.names = FALSE, quote = FALSE,
              append = (f != filenames[1]))
  # show a sample
  if (f == "4.png") {
    writePNG(imgX, "train_101.png")
    writePNG(img, "train_cleaned_101.png")
  }
}
Basically, it takes noisy images as input and removes the noise from them. This is only the latter part of the code, which applies the model built from a training dataset (not shown here).
Now I cannot figure out how to save the cleaned image for each of the images in the test folder, not just for 4.png. If the input image is named x.png, the output image should be named x_cleaned.png (for example, 4.png becomes 4_cleaned.png) and saved in a separate folder in the same directory. How can I do this?
TL;DR: I just want to save the variable img for each file as number_cleaned.png, where number is the original file name. These new files should be saved in a separate folder.
Alright, so construct the output filename using file.path and a function such as paste or sprintf:
folder_name = 'test'  # output folder; create it first, e.g. dir.create(folder_name)
output_filename_pattern = file.path(folder_name, '%s_cleaned.png')
# strip the final extension, e.g. "4.png" -> "4"
remove_extension = function (filename)
  gsub('\\.[^.]*$', '', filename)
for (f in filenames) {
  # … your code here …
  new_filename = sprintf(output_filename_pattern, remove_extension(f))
  # … save file here …
}
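The output path can also be built with tools::file_path_sans_ext(), which strips the extension without a hand-written regex. A minimal sketch, assuming the cleaned images go into a hypothetical folder named "cleaned" next to the input folder; cleaned_path() is an illustrative name, and the commented lines show where it would sit in the question's loop (png::writePNG() is the function the question already uses).

```r
# Build "<out_dir>/<name>_cleaned.png" from an input file name such as "4.png".
cleaned_path <- function(f, out_dir) {
  stem <- tools::file_path_sans_ext(basename(f))  # "4.png" -> "4"
  file.path(out_dir, paste0(stem, "_cleaned.png"))
}

# Hypothetical output folder: "cleaned", in the same directory as the input folder.
dirtyFolder <- "Myfolder/test"
out_dir <- file.path(dirname(dirtyFolder), "cleaned")  # "Myfolder/cleaned"

# In the question's script:
#   dir.create(out_dir, showWarnings = FALSE)      # once, before the loop
#   png::writePNG(img, cleaned_path(f, out_dir))   # inside the loop, after img is computed
```

For "4.png" this gives "Myfolder/cleaned/4_cleaned.png".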
