How to change randomly the dates of creation and modification of files - r

I need to change the files creation / modification dates for an exercice.
According to http://theautomatic.net/2018/12/18/how-to-change-file-last-modified-date-with-r/ , I can find files info with file.info()function.
I can also change the dates modification of all the files in a folder combining sapply and Sys.setFileTime :
sapply(list.files("path", full.names = TRUE),
function(FILE) Sys.setFileTime(FILE, "1975-01-01"))
How can I also change date of creation ? (ctime <S3: POSIXct>)
What I expect : for the exercice : I want to change randomly the dates of creation and modification of files (in a range of a year, files are not are created at the same day)
What I tried (changing only date of modification) :
sapply(list.files("C:/Users/cariou-w/Nextcloud/sync-uncloud/1920-1921/master/web_scrapping/plans", full.names = TRUE),
function(FILE) Sys.setFileTime(FILE, paste0("2020-11-", sample(days,1))))

There is no generic cross platform way to change file creation time in R - many file systems do not even track it. You can use system or system2 to call whatever the appropriate shell command is for you. E.g. on OS X, touch -t 202001011234 /path/to/my/file

Related

scp_download to download multiple files based on a pattern?

I need to download many files from a server (specifically tectia) ideally using the ssh package. These files all follow the a predictable pattern across multiple sub folders. The filepath is formatted like this
/directory/subfolder/A001/abcde001.csv
Where A001 counts up alongside the last 3 digits of the filename (/A002/abcde002.csv and so on)
In the vignette for scp_download it states that the files parameter may contain wildcards so I have tried to do something like
scp_download(session, "/directory/subfolder/A.*/abcde.*[.]csv", to=tempdir())
and
scp_download(session, "directory/subfolder/A\\d{3}/abcde\\d{3}[.]csv", to=tempdir())
but no matter which combination of patterns or wildcards I can think of (which isn't many) I only get something like
Warning: SSH warning: scp: /directory/subfolder/A\d{3}/abcde\d{3}[.]csv: No such file or directory
What I'm hoping to do is either find a way to do pattern matching here, or to find a way to store tectia directories as a string to be read by scp_download. I've made sure that my session is connected properly and it works without attempting to pattern match, which it does.
I had the same problem. The problem is that when you use * in your pattern it gets escaped when you send it to the server. However, when you request a special file name like this /directory/subfolder/A001/abcde001.csv, it works fine.
Finally I changed my code based on the below steps:
I got the list of files/folders using ls command with ssh_exec_wait function and then store them on a variable.
Download files in the variable separately
session <- ssh_connect("username#ip",passwd="password")
files<-capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
dnc1<- scp_download(session, files[1], to = paste0(getwd(),"/data/"))
dnc2<- scp_download(session, files[2], to = paste0(getwd(),"/data/"))
dnc3<- scp_download(session, files[3], to = paste0(getwd(),"/data/"))
The bottom 3 commands can be done in a loop as this could be hundreds or thousands of records.

Moving large amounts of files from one large folder to several smaller folders using R

I have over 7,000 .wav files in one folder which need to be split up into groups of 12 and placed into separate smaller folders.
The files correspond to 1-minute recordings taken every 5 minutes, so every 12 files corresponds to 1 hour.
The files are stored on my PC in the working directory: "E:/Audiomoth Files/Winter/Rural/Emma/"
Examples of the file names are as follows:
20210111_000000.wav
20210111_000500.wav
20210111_001000.wav
20210111_001500.wav
20210111_002000.wav
20210111_002500.wav
20210111_003000.wav
20210111_003500.wav
20210111_004000.wav
20210111_004500.wav
20210111_005000.wav
20210111_005500.wav
which would be one hour, then
20210111_010000.wav
20210111_010500.wav
20210111_011000.wav
and so on.
I need the files split into groups of 12 and then I need a new folder to be created in: "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"
With the new folders named 'Hour 1', 'Hour 2' and so on.
What is the exact code I need to do this?
As is probably very obvious I'm a complete beginner with R so if the answer could be spelt out in layman's terms that would be brilliant.
Thank you in advance
Something like this?
I intentionally used copy instead of cut in order to prevent data from being lost. I edited the answer so the files will keep their old names. I order to give them new names, replace name in the last line by "Part_", i, ".wav", for example.
# get a list of the paths to all the files
old_files <- list.files("E:/Audiomoth Files/Winter/Rural/Emma/", pattern = "\\.wav$", full.names = TRUE)
# create new directory
dir.create("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files")
# start a loop, repeat as often as there are groups of 12 within the list of files
for(i in 1:(round(length(old_files)/12)+1)){
# create a directory for the hour
directory <- paste("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files", "/Hour_", i, sep = "")
dir.create(directory)
# select the files that are to copy (I guess it will start with 1*12-11 = 1st file
# and end with i*12 = 12th file)
filesToCopy <- old_files[(i*12-11):(i*12)]
# for those files run another loop:
for(file in 1:12){
# get the name of the file
name <- basename(filesToCopy[file])
# copy the file to the current directory
file.copy(filesToCopy[file], paste(directory, "/", name, sep = ""))
}
}
When you're not entirely sure, I'd recommend to copy the files instead of moving them directly (which is what I hope this script here does). You can delete them manually, later on. After you checked that everything worked well and all data is where it should be. Otherwise data can be lost due to even small errors, which we do not want to happen.

Datastage Sequence job- how to process each file at a time if those files are in 7 different folders

DataStage - There are 7 folders in a path and in each folder there are 2 files . for eg : the 2 files are in the folllowing format- filename = test_s1_YYYYMMDD.txt, test_s1_YYYYMMDD.done. The path for these files are user/test/test_s1/
user/test/test_s2/
...
...
..
user/test/test_s7/------here s1,s2...s7 represents the different folders
In these folders the 2 above mentioned files are present , so how can i process each file in a sequence job?
First you need a job to process a file and the filename needs to be a parameter of that job.
For the Sequence level you need two levels - the inner one for the two files within each folder and a outer one for the different directories.
For the inner one you can choose to build a loop with to iterations or simply add the processing job twice to the sequence (which will reduce complexity in case it will always be two files).
The outer Sequence is a loop where you could parameterize the path in a way that the loop counter could be used to generate your 1-7 flexible path addon.
Check out more details on loops here
You can use the loop counter (stage_label.$Counter) to parameterize your job.
Depending on what you want to do with the files, it is an important decision how to process your files. Starting a job (or more) in a sequence for each file can lead to heavy overhead for just starting the jobs. Try loading all files at once in a parallel job using the sequenial file stage.
In the Sequential File Stage, set the appropriate Format. You can also set everything to none to just put each row in one column and process that in a later job. This will make the reading very flexible and forgiving. If your files are all the same structure, define your columns as needed.
To select the files, use File Patterns. In the Options of the Sequential File Stage, choose to have a File Name Column so you can process the filenames in a later job. You might also want to add a Row Number Column.
This method works pretty fast.

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

why does field value comes as 1'000,24 instead of 1,000.24 when the format is >,>>>,>>9.99 in progress 4gl?

we have recently upgraded to oe rdbms 11.3 version from 9.1d. While generating
reports,i found the field value of a field comes as 2'239,00 instead of
2,239.00.I checked the format its >,>>>,>>9.99.
what could be the reason behind this?
The admin installing the database didn't to it's homework and selected wrong default numeric and decimal separator.
However no greater harm done:
Set these startup parameters
-numsep 44 -numdec 46
This is an simplified database startup example with added parameters as above:
proserve /db/db -H dbserver -S dbservice -numsep 44 -numdec 46
When you install Progress you are prompted for the numeric format to use. That information is then written to a file called "startup.pf" which is located in the install directory (C:\Progress\OpenEdge by default on Windows...)
If you picked the wrong numeric format you can edit startup.pf with any text editor. It should look something like this:
#This is a placeholder startup.pf
#You may put any global startup parameters you desire
#in this file. They will be used by ALL Progress modules
#including the client, server, utilities, etc.
#
#The file dlc/prolang/locale.pf provides examples of
#settings for the different options that vary internationally.
#
#The directories under dlc/prolang contain examples of
#startup.pf settings appropriate to each region.
#For example, the file dlc/prolang/ger/german.pf shows
#settings that might be used in Germany.
#The file dlc/prolang/ger/geraus.pf gives example settings
#for German-speaking Austrians.
#
#Copy the file that seems appropriate for your region or language
#over this startup.pf. Edit the file to meet your needs.
#
# e.g. UNIX: cp /dlc/prolang/ger/geraus.pf /dlc/startup.pf
# e.g. DOS, WINDOWS: copy \dlc\prolang\ger\geraus.pf \dlc\startup.pf
#
# You may want to include these same settings in /dlc/ade.pf.
#
#If the directory for your region or language does not exist in
#dlc/prolang, please check that you have ordered AND installed the
#International component. The International component provides
#these directories and files.
#
-cpinternal ISO8859-1
-cpstream ISO8859-1
-cpcoll Basic
-cpcase Basic
-d mdy
-numsep 44
-numdec 46
Changes to the startup.pf file are GLOBAL -- they impact all sessions started on this machine. If you only want to change a single session then you can add the parameters to the command line (or the shortcut icons properties) or to a local .pf file or to an ini file being used by that session.
You can also programmatically override the format in your code by using the SESSION system handle:
assign
session:numeric-decimal-point = "."
session:numeric-separator = ","
.
display 123456.999.
(You might want to consider saving the current values and restoring them if this is a temporary change.)
(You can also use the shorthand session:numeric-format = "american". or "european" for the two most common cases.)

Resources