Validate a character as a file path? - r

What's the best way to determine if a character string is a valid file path? So CheckFilePath( "my*file.csv") would return FALSE (on Windows, * is an invalid character), whereas CheckFilePath( "c:\\users\\blabla\\desktop\\myfile.csv" ) would return TRUE.
Note that a file path can be valid but not exist on disk.

This is the code that save is using to perform that function:
....
else file(file, "wb")
on.exit(close(con))
}
else if (inherits(file, "connection"))
con <- file
else stop("bad file argument")
......

Perhaps file.exists() is what you're after? From the help page:
file.exists returns a logical vector indicating whether the files named by its argument exist.
(Here ‘exists’ is in the sense of the system's stat call: a file will be reported as existing only
if you have the permissions needed by stat. Existence can also be checked by file.access, which
might use different permissions and so obtain a different result.)
Several other functions for querying the computer's file system are available as well, and are also referenced on the help page.
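As a minimal sketch of how these functions could be combined for a path that does not exist yet: the helper name parent_is_writable is hypothetical, and the result is only advisory, since file.access may use different permissions than an actual write attempt.

```r
# Hypothetical helper: TRUE if the parent directory of `path` exists
# and file.access() reports it as writable (mode = 2 tests write permission).
parent_is_writable <- function(path) {
  parent <- dirname(path)
  dir.exists(parent) && file.access(parent, mode = 2) == 0
}

# A target file that has not been created yet
target <- file.path(tempdir(), "myfile.csv")
file.exists(target)         # checks the file itself
parent_is_writable(target)  # checks the directory it would be written to
```

This does not validate the file name itself; a name containing characters the operating system rejects would still pass this check.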

No, there's no way to do this (reliably). I don't see an operating system interface in either Windows or Linux to test this. You would normally try to create the file and get a failure message, or try to read the file and get a 'does not exist' kind of message.
So you should rely on the operating system to let you know if you can do what you want to do to the file (which will usually be read and/or write).
I can't think of a reason other than a quiz ("Enter a valid fully-qualified Windows file path:") to want to know this.
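The try-it-and-see approach described above can be sketched in R. This is a minimal sketch rather than a definitive implementation: can_write_path is a hypothetical helper name, and it assumes that briefly creating an empty file at the target location is acceptable.

```r
# Hypothetical helper: asks the operating system whether `path` can be
# opened for writing, by actually trying to open it.
can_write_path <- function(path) {
  existed <- file.exists(path)
  ok <- tryCatch({
    # Append mode, so an existing file is not truncated by the probe
    con <- suppressWarnings(file(path, open = "ab"))
    close(con)
    TRUE
  }, error = function(e) FALSE)
  # Remove the empty file if the probe itself created it
  if (ok && !existed) unlink(path)
  ok
}

can_write_path(tempfile(fileext = ".csv"))
can_write_path(file.path(tempdir(), "no_such_dir", "myfile.csv"))
```

The result only describes the moment of the call; permissions or the directory itself can change before the real write happens, so the real write should still handle errors.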

I would suggest trying checkPathForOutput function offered by the checkmate package. As stated in the linked documentation, the function:
Check[s] if a file path can be safely be used to create a file and write to it.
Example
checkmate::checkPathForOutput(x = tempfile(pattern = "sample_test_file", fileext = ".tmp"))
# [1] TRUE
checkmate::checkPathForOutput(x = "c:\\users\\blabla\\desktop\\myfile.csv")
# [1] TRUE
Invalid path
The \0 character should not be used in Linux file names [1]:
checkmate::check_path_for_output("my\0file.csv")
# Error: nul character not allowed (line 1)
[1] Not tested on Windows, but the code of checkmate::check_path_for_output suggests that the function should work correctly on MS Windows systems as well.

Related

How to save file into a path containing special characters such as '&'? ('&' which is different from '&' typed in English Keyboard)

I need to write out a file to a certain path that contains a special character in R. The path is something like this: C:/Users/Technology & Innovation/Webscraping files/US_data/data
It works totally fine when I access this path through Python, but I cannot access the same path in R. And I cannot change this path name or remove '&' as this path is used by a lot of people. Does anyone have a good idea on how to solve it?
I found out it is '&' which has a subtle difference from the '&' that we usually type on an English keyboard. Maybe that's the reason causing the problem?
Here is what I have tried:
write.csv(df, 'C:/Users/Technology & Innovation/Webscraping files/US_data/data/file.csv')
write.csv(df, 'C:\\Users\\Technology & Innovation\\Webscraping files\\US_data/data/file.csv')
No matter whether I try to read or write a file, it is not working in my case.
I also tried resetting the working directory path and got the error message:
Error in setwd("C:/Users/Technology & Innovation/Webscraping files/US_data/data") : cannot change working directory
Write it like this:
C:\\Users\\Technology & Innovation\\Webscraping files\\US_data\\data
Also, you can change your current directory.
Changing your current directory will help because you can then write read.csv("filename.csv") or write.csv(name_of_file, "filename.csv") as it is, without mentioning the path.
If you have to write a file using the full path, the data comes first and the quoted path second:
write.csv(name_of_file, "C:\\Users\\Technology & Innovation\\Webscraping files\\US_data\\data\\filename.csv")
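Since the question suspects that the '&' in the folder name is not the ordinary keyboard character, one way to check is to list the non-ASCII characters in the path. This is a minimal sketch; non_ascii_chars is a hypothetical helper name, and the fullwidth ampersand (U+FF06) in the second call is only an assumed example of a look-alike character, not necessarily the one in the original path.

```r
# Hypothetical helper: returns the characters of `path` whose Unicode
# code point lies outside the ASCII range (greater than 127).
non_ascii_chars <- function(path) {
  chars <- strsplit(enc2utf8(path), "")[[1]]
  codes <- vapply(chars, utf8ToInt, integer(1))
  chars[codes > 127]
}

# An ordinary '&' is ASCII (code point 38), so nothing is reported
non_ascii_chars("C:/Users/Technology & Innovation")

# A look-alike such as the fullwidth ampersand is reported
suspect <- non_ascii_chars("C:/Users/Technology \uFF06 Innovation")
utf8ToInt(suspect)
```

If a non-ASCII character turns up, copying the folder name from the output of list.files() on the parent directory, rather than typing it, avoids substituting the wrong character.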

R test if a file exists, and is not a directory

I have an R script that takes a file as input, and I want a general way to know whether the input is a file that exists, and is not a directory.
In Python you would do it this way: How do I check whether a file exists using Python?, but I was struggling to find anything similar in R.
What I'd like is something like below, assuming that the file.txt actually exists:
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
is.file(input.good) # should return TRUE
is.file(input.bad) # should return FALSE
R has something called file.exists(), but this doesn't distinguish files from directories.
There is a dir.exists function in all recent versions of R.
file.exists(f) && !dir.exists(f)
The solution is to use file_test()
This gives shell-style file tests, and can distinguish files from folders.
E.g.
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
file_test("-f", input.good) # returns TRUE
file_test("-f", input.bad) # returns FALSE
From the manual:
Usage
file_test(op, x, y)
Arguments
op: a character string specifying the test to be performed. Unary tests (only x is used) are "-f" (existence and not being a directory), "-d" (existence and directory) and "-x" (executable as a file or searchable as a directory). Binary tests are "-nt" (strictly newer than, using the modification dates) and "-ot" (strictly older than): in both cases the test is false unless both files exist.
x, y: character vectors giving file paths.
You can also use is_file(path) from the fs package.
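For comparison, the base R answers above can be wrapped into the is.file-style helper the question asks for. This is a minimal sketch; the name is_file_base is hypothetical, and it uses the vectorised & rather than && so that it accepts several paths at once.

```r
# Hypothetical helper: TRUE for paths that exist and are not directories.
is_file_base <- function(path) {
  file.exists(path) & !dir.exists(path)
}

# Set up one real file to test against
f <- tempfile(fileext = ".txt")
file.create(f)

is_file_base(f)                     # an existing regular file
is_file_base(tempdir())             # an existing directory
is_file_base(c(f, tempdir()))       # vectorised over several paths
utils::file_test("-f", f)           # the file_test equivalent

unlink(f)
```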

Definition of OSCOMPSTAT values

I've tried to find a table with the definition for each COMPSTAT (related to the tool Control-M workload Automation) return code but without any success.
Can anyone tell me if such a table exists?
Thank you.
It's the return code from whatever task was being executed at that time. By convention, a zero value means 'OK', and anything non-zero means an error of some kind.
Different utilities (i.e. external commands) have different possible return codes, so if the command were SCP then you would look up the code in the SCP documentation, and find that for example, '67' meant 'key exchange failed'.
There is no table that contains the definition of each COMPSTAT return code.
OSCOMPSTAT stands for Control-M Operating System Completion Status.
The value of COMPSTAT is set by the exit code of the command that was called.
Example:
After calling the command [cat file1.txt] the value of COMPSTAT will be:
0 if the file "file1.txt" is found
1 if the file "file1.txt" is not found
After calling the command [ctmfw] the value of COMPSTAT will be:
0 if the specified file is found
7 if the specified file is not found

Tesseract use of number-dawg

I need to specify a numeric pattern. I have already done the normal training.
I created a config file that has the line
user_patterns_suffix user-patterns
and the file user-patterns contains my patterns, for example:
:\d\d\d\d\d\d\d.
:\d\d\d\d\d\d\d\d\d;
!\d\d\d\d\d\d\d\d}
Then I launch tesseract with the config file on a TIFF, and it reports "Error: failed to insert pattern" for the first two patterns. It ultimately behaves as if no patterns had been supplied.
I need to recognize only those patterns, so I tried to train a language with a number-dawg file, but then the tesseract command failed with a segmentation fault.
I used in the number-dawg file its conversion of the above patterns:
: .
: ;
! }
My questions (the Google documentation is not clear to me, and I am not a fluent English speaker):
Where should each patterns file be used? I assume number-dawg is used during training (but I got the segmentation fault, so I couldn't try it) and user-patterns during recognition, when launching Tesseract (but that didn't work). What am I doing wrong?
Do I also need a dictionary when training with number-dawg? My character set contains only digits and punctuation, and any digit sequence is possible, so a dictionary is not feasible. If I do need dictionaries, how should I build them?
Thanks in advance for any help; any hint would be much appreciated.

PROGRESS - Validating a user-input file output path

I've written some PROGRESS code that outputs some data to a user defined file. The data itself isn't important, the output process works fine. It's basically
DEFINE VARIABLE filePath AS CHARACTER NO-UNDO.
/* User types in something like C:\UserAccount\New.txt */
UPDATE filePath.
OUTPUT TO VALUE(filePath).
Which works fine, a txt file is created in the input directory. My question is:
Does Progress have any functionality that would allow me to check if an input file path is valid? (Specifically, if the user has input a valid directory, and if they have permission to create a file in the directory they've chosen)
Any input or feedback would be appreciated.
FILE-INFO
Using the system handle FILE-INFO gives you a lot of information. It also works on directories.
FILE-INFO:FILE-NAME = "c:\temp\test.p".
DISPLAY
    FILE-INFO:FILE-NAME
    FILE-INFO:FILE-CREATE-DATE
    FILE-INFO:FILE-MOD-DATE
    FILE-INFO:FILE-CREATE-TIME
    FILE-INFO:FILE-MOD-TIME
    FILE-INFO:FILE-SIZE
    FILE-INFO:FILE-TYPE
    FILE-INFO:FULL-PATHNAME
    WITH FRAME f1 1 COLUMN SIDE-LABELS.
A simple check for an existing directory with write permission could be something like:
FUNCTION dirOK RETURNS LOGICAL (INPUT pcDir AS CHARACTER):
    FILE-INFO:FILE-NAME = pcDir.
    /* "D" marks a directory, "W" marks write permission */
    IF INDEX(FILE-INFO:FILE-TYPE, "D") > 0
    AND INDEX(FILE-INFO:FILE-TYPE, "W") > 0 THEN
        RETURN TRUE.
    ELSE
        RETURN FALSE.
END FUNCTION.
FILE-INFO:FILE-TYPE will start with a D for directories and an F for plain files. It also includes information about read and write permissions. Check the help for more info. If the file doesn't exist, basically all attributes except FILE-NAME will be empty or unknown (?).
Edit: it seems that FILE-TYPE returns W in some cases even if there are no actual write permissions in that directory, so you might need to handle this through error processing instead.
ERROR PROCESSING
OUTPUT TO VALUE("f:\personal\test.txt").
PUT UNFORMATTED "Test" SKIP.
OUTPUT CLOSE.
CATCH eAnyError AS Progress.Lang.ERROR:
/* Here you could check for specifically error no 98 indicating a problem opening the file */
MESSAGE
"Error message and number retrieved from error object..."
eAnyError:GetMessage(1)
eAnyError:GetMessageNum(1) VIEW-AS ALERT-BOX BUTTONS OK.
END CATCH.
FINALLY:
END FINALLY.
SEARCH
When checking for a single file, the SEARCH function will work. If the file exists, it returns the complete path. It does not, however, work on directories, only on files. If you SEARCH without a complete path, e.g. SEARCH("test.p"), it will search through the directories set in the PROPATH environment variable and return the first matching entry with its complete path. If there's no match, it will return the unknown value (?).
Syntax:
IF SEARCH("c:\temp\test.p") = ? THEN
MESSAGE "No such file" VIEW-AS ALERT-BOX ERROR.
ELSE
MESSAGE "OK" VIEW-AS ALERT-BOX INFORMATION.
SYSTEM-DIALOG GET-FILE character-field has a MUST-EXIST option if you want to use a dialog to get the file name or directory from the user. Example from the manual:
DEFINE VARIABLE procname AS CHARACTER NO-UNDO.
DEFINE VARIABLE OKpressed AS LOGICAL INITIAL TRUE.
Main:
REPEAT:
SYSTEM-DIALOG GET-FILE procname
TITLE "Choose Procedure to Run ..."
FILTERS "Source Files (*.p)" "*.p",
"R-code Files (*.r)" "*.r"
MUST-EXIST
USE-FILENAME
UPDATE OKpressed.
IF OKpressed = TRUE THEN
RUN VALUE(procname).
ELSE
LEAVE Main.
END.
