How do I read multiple binary files in R? - r

Suppose we have the files file1.bin, file2.bin, ..., file1460.bin in the directory C:\R\Data. We want to read them and loop over them in groups of four: average files 1 to 4, then files 5 to 8, and so on up to 1460, which should give 365 averaged files (1460 / 4).
I tried putting them in a list, but did not know how to write the loop.
How do I read multiple files and manipulate them in R?
I have spent countless hours trying to figure it out. Any help is appreciated.

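# one result per complete group of four list elements
n_groups <- length(yourlist) %/% 4
results <- numeric(n_groups)
for (i in seq_len(n_groups)) {
  idx <- (i * 4 - 3):(i * 4)  # elements 1-4, then 5-8, and so on
  results[i] <- mean(unlist(yourlist[idx]))
}
YMMV with the mean() call, which here pools all values from the four files into a single number, but that structure is how you could loop through the data once it's loaded.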
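For the reading step itself, here is a minimal sketch. It assumes every file holds raw 8-byte doubles and that all files have the same length; the question does not state the binary layout, so `what` and `size` are guesses to adjust. The helper names `read_bin_file` and `average_in_groups` are made up for illustration. It averages element by element, so each group of four files yields one averaged vector.

```r
# Read one binary file in full.
# Assumption: the file contains only 8-byte doubles; change `what`/`size` to match your data.
read_bin_file <- function(path, what = "numeric", size = 8) {
  n <- file.info(path)$size / size   # number of values stored in the file
  readBin(path, what = what, n = n, size = size)
}

# Average the files in consecutive groups (files 1-4, 5-8, ...), element by element.
average_in_groups <- function(paths, group_size = 4) {
  groups <- split(paths, ceiling(seq_along(paths) / group_size))
  lapply(groups, function(g) {
    values <- lapply(g, read_bin_file)
    Reduce(`+`, values) / length(values)
  })
}
```

The paths are best built explicitly, for example with file.path("C:/R/Data", sprintf("file%d.bin", 1:1460)), because list.files() sorts alphabetically and would put file10.bin before file2.bin. Each averaged vector can then be written back out with writeBin().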

Related

How Can I Download and Use a Matrix from Matrix Market?

I am trying to write code to store a matrix to a variable directly from Matrix Market's website. Below is a sample URL that I'd use:
https://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/bcsstruc1/bcsstk01.mtx.gz
The example URL will download a bcsstk01.mtx.gz file. I need to extract the bcsstk01.mtx file. Then I need to use MatrixMarket.mmread() so I can save to a variable.
I first tried saving the downloaded file (or URL location) to a variable A = HTTP.get(), but lack of online resources and lack of knowledge led to no results. Then I used HTTP.download() and got the .mtx.gz file, but I can't unzip it. And finally, MatrixMarket.mmread() cannot read .gz files. So I'm stuck with a downloaded file I can't do anything with unless I manually unzip it.
Using the info from link in the comments and some fiddling, I managed to get the following:
using TranscodingStreams, CodecZlib
using Downloads
stream = PipeBuffer()
openstream = TranscodingStream(GzipDecompressor(), stream)
Downloads.download("https://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/bcsstruc1/bcsstk01.mtx.gz", stream)
for line in eachline(openstream)
println(line)
end
This prints:
%%MatrixMarket matrix coordinate real symmetric
48 48 224
1 1 2.8322685185200e+06
5 1 1.0000000000000e+06
6 1 2.0833333333300e+06
7 1 -3.3333333333300e+03
...
which I suppose is the desired data.

Applying a System Call for ImageJ over a List in R

I am working with a large number of image files within several subdirectories of one parent folder.
I am attempting to run an ImageJ macro to batch-process the images (specifically, I am trying to stitch together a series of images taken on the microscope into single images). Unfortunately, I don't think I can run this as a plain ImageJ macro because the images were taken with varying grid sizes, i.e. some are 2x3, some are 3x3, some are 3x2, etc.
I've written an R script that is able to evaluate the image folders and determine the grid size, now I am trying to feed that information to my ImageJ macro to batch process the folder.
The issue I am running into seems like it should be easy to solve, but I haven't had any luck figuring it out: in R, I have a data.frame that I need to pass to the system command line-by-line with the columns concatenated into a single character string delimited by *'s.
Here's an example from the data.frame I have in R:
                       X xcoord ycoord input
1 4_10249_XY01_Fused_CH2      2      3 /XY01
2 4_10249_XY02_Fused_CH2      2      2 /XY02
3 4_10249_XY03_Fused_CH2      3      3 /XY03
4 4_10249_XY04_Fused_CH2      2      2 /XY04
5 4_10249_XY05_Fused_CH2      2      2 /XY05
6 4_10249_XY06_Fused_CH2      2      3 /XY06
Here's what each row needs to be transformed into so that ImageJ can understand it:
4_10249_XY01_Fused_CH2*2*3*/XY01
4_10249_XY02_Fused_CH2*2*2*/XY02
4_10249_XY03_Fused_CH2*3*3*/XY03
4_10249_XY04_Fused_CH2*2*2*/XY04
4_10249_XY05_Fused_CH2*2*2*/XY05
4_10249_XY06_Fused_CH2*2*3*/XY06
I tried achieving this with a for loop inside of a function that I thought would pass each row into the system command, but the macro only runs for the first line, none of the others.
macro <- function(i) {
for (row in 1:nrow(i)) {
df<-paste(i$X, i$xcoord, i$ycoord, i$input, sep='*')
}
system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args=c('-batch "/Users/All Stitched CH2.ijm"', df))
}
macro(table)
I think this is because the for loop is not maintaining the list-form of the data.frame. How do I concatenate the table by row and maintain the list-structure? I don't know if I'm asking the right question, but hopefully I'm close enough that someone here understands what I'm trying to do.
I appreciate any help or tips you can provide!
Turns out taking a break helps a lot!
I came back to this after lunch and came up with an easy solution (duh!)- I thought I would post it in case anyone comes along later with a similar issue.
I used stringr to combine my datatable by columns, then put them back into list form using as.list. Finally, for feeding the list into my macro, I edited the macro to only contain the system command and then used lapply to apply the macro to my list of inputs. Here is what my code looks like in the end:
library(stringr)
tablecombined<- str_c(table$X, table$xcoord, table$ycoord, table$input, sep = "*")
listylist<-as.list(tablecombined)
macro <- function(i) {
system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args=c('-batch "/Users/All Stitched CH2.ijm"', i))
}
runme<- lapply(listylist, macro)
Note: I am using the system2 command because it can take arguments, which is necessary for me to be able to feed it a series of images to iterate over. I started with the solution posted here: How can I call/execute an imageJ macro with R?
but needed additional flexibility for my specific situation. Hopefully someone may find this useful in the future when running ImageJ Macros from R!
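The same row-wise concatenation can also be done in base R without stringr. The sketch below is only an illustration: `build_macro_args` and `run_macro` are hypothetical helper names, and quoting the arguments with shQuote() is an assumption about what the shell needs (an unquoted * may be expanded by the shell). With dry_run = TRUE nothing is launched, so the command strings can be checked first.

```r
# Join all columns of a data frame row by row, separated by "*".
build_macro_args <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "*"))
}

# Call the macro once per argument string.
# dry_run = TRUE returns the command lines instead of starting ImageJ.
run_macro <- function(arg_strings, imagej, macro_file, dry_run = TRUE) {
  lapply(arg_strings, function(a) {
    args <- c("-batch", shQuote(macro_file), shQuote(a))
    if (dry_run) {
      paste(imagej, paste(args, collapse = " "))
    } else {
      system2(imagej, args = args)
    }
  })
}
```

For the table above, build_macro_args(table) should give strings such as 4_10249_XY01_Fused_CH2*2*3*/XY01, which can then be passed to run_macro() together with the ImageJ executable and the .ijm path from the answer.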

Moving large amounts of files from one large folder to several smaller folders using R

I have over 7,000 .wav files in one folder which need to be split up into groups of 12 and placed into separate smaller folders.
The files correspond to 1-minute recordings taken every 5 minutes, so every 12 files corresponds to 1 hour.
The files are stored on my PC in the working directory: "E:/Audiomoth Files/Winter/Rural/Emma/"
Examples of the file names are as follows:
20210111_000000.wav
20210111_000500.wav
20210111_001000.wav
20210111_001500.wav
20210111_002000.wav
20210111_002500.wav
20210111_003000.wav
20210111_003500.wav
20210111_004000.wav
20210111_004500.wav
20210111_005000.wav
20210111_005500.wav
which would be one hour, then
20210111_010000.wav
20210111_010500.wav
20210111_011000.wav
and so on.
I need the files split into groups of 12 and then I need a new folder to be created in: "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"
With the new folders named 'Hour 1', 'Hour 2' and so on.
What is the exact code I need to do this?
As is probably very obvious I'm a complete beginner with R so if the answer could be spelt out in layman's terms that would be brilliant.
Thank you in advance
Something like this?
I intentionally used copy instead of cut in order to prevent data from being lost. I edited the answer so the files keep their old names. In order to give them new names, replace name in the last line with "Part_", i, ".wav", for example.
# get a list of the paths to all the .wav files (list.files returns them sorted by name)
old_files <- list.files("E:/Audiomoth Files/Winter/Rural/Emma/", pattern = "\\.wav$", full.names = TRUE)
# create the new parent directory
dir.create("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files")
# loop once per group of 12 files; ceiling() also covers a final, incomplete group
for (i in seq_len(ceiling(length(old_files) / 12))) {
  # create a directory for the hour
  directory <- paste("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files", "/Hour_", i, sep = "")
  dir.create(directory)
  # select the files to copy: for i = 1 these are files 1 to 12, for i = 2 files 13 to 24, ...
  # min() stops the last group from running past the end of the list
  filesToCopy <- old_files[(i * 12 - 11):min(i * 12, length(old_files))]
  # for those files run another loop:
  for (file in seq_along(filesToCopy)) {
    # get the name of the file
    name <- basename(filesToCopy[file])
    # copy the file to the current directory
    file.copy(filesToCopy[file], paste(directory, "/", name, sep = ""))
  }
}
When you're not entirely sure, I'd recommend copying the files instead of moving them directly (which is what I hope this script does). You can delete the originals manually later on, after you have checked that everything worked and all the data is where it should be. Otherwise data can be lost through even small errors, which we do not want to happen.
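The same idea can be wrapped in a small function so it is easy to try on a test folder first. This is a sketch, not a drop-in replacement: `copy_in_groups` is a made-up name, it assumes the alphabetical order of the file names matches the recording order (true for names like 20210111_000500.wav), and it names the folders 'Hour 1', 'Hour 2', ... as asked in the question.

```r
# Copy the .wav files from src_dir into dest_dir/Hour 1, dest_dir/Hour 2, ...
# with group_size files per folder. The originals are left untouched.
copy_in_groups <- function(src_dir, dest_dir, group_size = 12) {
  files <- list.files(src_dir, pattern = "\\.wav$", full.names = TRUE)
  # group 1 = files 1..12, group 2 = files 13..24, ...; the last group may be shorter
  groups <- split(files, ceiling(seq_along(files) / group_size))
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  for (i in seq_along(groups)) {
    hour_dir <- file.path(dest_dir, paste("Hour", i))
    dir.create(hour_dir, showWarnings = FALSE)
    file.copy(groups[[i]], hour_dir)  # copies keep their original file names
  }
  invisible(lengths(groups))  # number of files placed in each folder
}
```

Called as copy_in_groups("E:/Audiomoth Files/Winter/Rural/Emma", "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"), it returns (invisibly) how many files went into each folder, which gives a quick check that only the last folder is short.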

Datastage Sequence job- how to process each file at a time if those files are in 7 different folders

DataStage: there are 7 folders under one path, and each folder contains 2 files. For example, the two files have the following format: test_s1_YYYYMMDD.txt and test_s1_YYYYMMDD.done. The paths for these files are:
user/test/test_s1/
user/test/test_s2/
...
user/test/test_s7/
Here s1, s2, ... s7 denote the different folders.
Each folder contains the two files mentioned above, so how can I process each file in a sequence job?
First you need a job to process a file and the filename needs to be a parameter of that job.
For the Sequence level you need two levels: an inner one for the two files within each folder and an outer one for the different directories.
For the inner one you can either build a loop with two iterations or simply add the processing job twice to the sequence (which reduces complexity if there will always be two files).
The outer Sequence is a loop in which you parameterize the path so that the loop counter generates the 1 to 7 suffix of the folder name.
Check out more details on loops here
You can use the loop counter (stage_label.$Counter) to parameterize your job.
How you process the files is an important decision that depends on what you want to do with them. Starting a job (or more) in a sequence for each file can lead to heavy overhead just for starting the jobs. Try loading all files at once in a parallel job using the Sequential File stage.
In the Sequential File Stage, set the appropriate Format. You can also set everything to none to just put each row in one column and process that in a later job. This will make the reading very flexible and forgiving. If your files are all the same structure, define your columns as needed.
To select the files, use File Patterns. In the Options of the Sequential File Stage, choose to have a File Name Column so you can process the filenames in a later job. You might also want to add a Row Number Column.
This method works pretty fast.

R - read html files within a folder, count frequency, and export output

I'm planning to use R to do some simple text mining tasks. Specifically, I would like to do the following:
Automatically read each html file within a folder, then
For each file, do frequency count of some particular words (e.g., "financial constraint" "oil export" etc.), then
Automatically write the output to a .csv file using the following data structure (e.g., file 1 has "financial constraint" appearing 3 times and "oil export" 4 times, etc.):
file_name count_financial_constraint count_oil_export
1 3 4
2 0 3
3 4 0
4 1 2
Can anyone please let me know where I should start? So far I think I've figured out how to clean the html files and do the count, but I'm still not sure how to automate the process (I really need this, as I have around 5 folders containing about 1000 html files each). Thanks!
Try this:
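gethtml <- function(path = ".") {
  # list only the .html files; full.names = TRUE avoids having to setwd()
  files <- list.files(path, pattern = "\\.html$", full.names = TRUE)
  htmlcount <- vector()
  for (i in files) {
    # count_words() is a placeholder: supply your own function that reads
    # one html file and returns its count
    htmlcount[i] <- count_words(i)
  }
  return(sum(htmlcount))
}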
R is not intended for rigorous text parsing. Consequently, the tools for such tasks are limited. If you insist on doing it with R, then you had better get familiar with regular expressions and have a look at this.
However, I highly recommend using Python with the beautifulsoup library, which is specifically designed for this task.
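If staying in R, a rough base-R sketch of the whole pipeline could look like the following. `count_phrases` is a made-up helper, the tag stripping is a crude regular expression rather than a real HTML parser, matching is case-insensitive, and the folder is assumed to contain at least one html file.

```r
# Count how often each phrase occurs in every .htm/.html file of a folder.
count_phrases <- function(path, phrases) {
  files <- list.files(path, pattern = "\\.html?$", full.names = TRUE)
  count_one <- function(file) {
    text <- tolower(paste(readLines(file, warn = FALSE), collapse = " "))
    text <- gsub("<[^>]+>", " ", text)  # crude: drop anything that looks like a tag
    text <- gsub("\\s+", " ", text)     # collapse runs of whitespace
    vapply(phrases, function(p) {
      hits <- gregexpr(tolower(p), text, fixed = TRUE)[[1]]
      sum(hits > 0)                     # gregexpr returns -1 when there is no match
    }, integer(1))
  }
  counts <- do.call(rbind, lapply(files, count_one))
  colnames(counts) <- paste0("count_", gsub(" ", "_", phrases))
  data.frame(file_name = basename(files), counts, row.names = NULL)
}
```

The result can be saved with write.csv(count_phrases("some_folder", c("financial constraint", "oil export")), "counts.csv", row.names = FALSE), where the folder and output names are placeholders. For several folders, call the function once per folder and rbind() the results.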
