I have an R script that retrieves CSV files from a daily email in my Outlook and then, depending on whether the date in the email subject is greater than a set date, moves them to a specific folder.
The code splits the subject line to extract the date, which, due to recent changes, can appear in one of two positions in the string.
I have built an if statement that successfully locates the date within the string in either case, but I can't then use a second if statement to check whether the output of the first if statement is greater than the sample date.
Below is the code I am trying to execute (with reproducible data):
# Test data
testLoop <- c("[EXTERNAL] Test Promo Sessions was executed at 28062019 100005",
"[EXTERNAL] Test Promo Sessions was executed at 29062019 100023",
"Test Promo Sessions was executed at 30062019 100007",
"Test Promo Sessions was executed at 01072019 100043",
"Test Promo Sessions was executed at 02072019 100049",
"Test Promo Sessions was executed at 03072019 100001")
# Example date
todaysDateFormatted2 <- '30062019'
# Loop
for(i in testLoop){
if(if(nchar(i) == 51){
strptime(sapply(strsplit(i, "\\s+"), "[", 7),"%d%m%Y")
} else {
strptime(sapply(strsplit(i, "\\s+"), "[", 8),"%d%m%Y")
} > strptime(todaysDateFormatted2,"%d%m%Y")){
print("greater than - move file")
} else {
print("not greater than - do nothing")
}
}
When I execute this code, I get the following output and error, and I'm not sure how to interpret it:
[1] "not greater than - do nothing"
[1] "not greater than - do nothing"
Error in if (if (nchar(i) == 51) { :
argument is not interpretable as logical
In addition: Warning message:
In if (if (nchar(i) == 51) { :
the condition has length > 1 and only the first element will be used
There are several flaws in your code. Using an if statement as the condition of another if is hard to follow, and the result of strptime() is discarded unless you assign it to something; below it is assigned to t in both the if and the else branch. You can then compare t with the parsed todaysDateFormatted2 and print the result for each iteration.
for (i in testLoop) {
if (nchar(i) == 51) {
t <- strptime(sapply(strsplit(i, "\\s+"), "[", 7),"%d%m%Y")
} else {
t <- strptime(sapply(strsplit(i, "\\s+"), "[", 8),"%d%m%Y")
}
if (t > strptime(todaysDateFormatted2,"%d%m%Y")) {
print("greater than - move file")
} else {
print("not greater than - do nothing")
}
}
# [1] "not greater than - do nothing"
# [1] "not greater than - do nothing"
# [1] "not greater than - do nothing"
# [1] "greater than - move file"
# [1] "greater than - move file"
# [1] "greater than - move file"
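The code in the OP fails because of how R parses the nested if(): in if (cond) { A } else { B } > strptime(...), the comparison belongs to the else branch only. When nchar(i) == 51, the inner if() therefore returns the POSIXlt date itself rather than a single TRUE/FALSE, and the outer if() fails as described in the OP.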
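A reduced example shows the parsing behaviour behind the error: R reads if (cond) A else B > C as if (cond) A else (B > C), so the comparison only happens when the else branch is taken. The sketch below uses plain numbers in place of the dates; f and g are illustrative names.

```r
# Without parentheses, "> 10" belongs to the else branch only
f <- function(flag) if (flag) 15 else 7 > 10
print(f(TRUE))   # 15: the if branch returns the value itself, not a comparison
print(f(FALSE))  # FALSE: the else branch evaluates 7 > 10

# Parenthesising the inner if makes the comparison apply to both branches
g <- function(flag) (if (flag) 15 else 7) > 10
print(g(TRUE))   # TRUE
print(g(FALSE))  # FALSE
```

Parenthesising the inner if, as in g, would also make a single combined condition work, although assigning the parsed date to a variable first is easier to read.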
If the intent of the code is to decide whether to move a file based on the date in the email subject, a simpler version of the code can do this. Here, we reduce the levels of nesting by using lapply() and saving the output of the original inner if() clause to an object. We then compare the saved object to the object representing today's date and write a message to the R log.
# Test data
testLoop <- c("[EXTERNAL] Test Promo Sessions was executed at 28062019 100005",
"[EXTERNAL] Test Promo Sessions was executed at 29062019 100023",
"Test Promo Sessions was executed at 30062019 100007",
"Test Promo Sessions was executed at 01072019 100043",
"Test Promo Sessions was executed at 02072019 100049",
"Test Promo Sessions was executed at 03072019 100001")
# Example date
todaysDateFormatted2 <- '30062019'
datesProcessed <- lapply(testLoop,function(x){
if(nchar(x) == 51) y <- strptime(sapply(strsplit(x, "\\s+"), "[", 7),"%d%m%Y")
else y <- strptime(sapply(strsplit(x, "\\s+"), "[", 8),"%d%m%Y")
if(y > strptime(todaysDateFormatted2,"%d%m%Y")) message("greater than - move file")
else message("not greater than - do nothing")
y
})
...and the output:
> datesProcessed <- lapply(testLoop,function(x){
+ if(nchar(x) == 51) y <- strptime(sapply(strsplit(x, "\\s+"), "[", 7),"%d%m%Y")
+ else y <- strptime(sapply(strsplit(x, "\\s+"), "[", 8),"%d%m%Y")
+ if(y > strptime(todaysDateFormatted2,"%d%m%Y")) message("greater than - move file")
+ else message("not greater than - do nothing")
+ y
+ })
not greater than - do nothing
not greater than - do nothing
not greater than - do nothing
greater than - move file
greater than - move file
greater than - move file
>
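Since the only thing that changed between subject formats is the position of the date, another option is to skip counting characters and pull the date out with a regular expression. This is a swapped-in technique, not what either answer uses, and it assumes every subject contains exactly one standalone run of eight digits (the DDMMYYYY date); the time stamp has only six digits, so it is not matched.

```r
testLoop <- c("[EXTERNAL] Test Promo Sessions was executed at 28062019 100005",
              "[EXTERNAL] Test Promo Sessions was executed at 29062019 100023",
              "Test Promo Sessions was executed at 30062019 100007",
              "Test Promo Sessions was executed at 01072019 100043",
              "Test Promo Sessions was executed at 02072019 100049",
              "Test Promo Sessions was executed at 03072019 100001")
todaysDateFormatted2 <- "30062019"

# Locate the first standalone 8-digit run in each subject, wherever it appears
m <- regexpr("\\b\\d{8}\\b", testLoop, perl = TRUE)
# regmatches() silently drops subjects without a match, so fail loudly instead
stopifnot(all(m > 0))
dates <- as.Date(regmatches(testLoop, m), format = "%d%m%Y")

# One logical per subject: TRUE means the file should be moved
moveFile <- dates > as.Date(todaysDateFormatted2, format = "%d%m%Y")
print(moveFile)  # FALSE FALSE FALSE TRUE TRUE TRUE
```

Because the comparison is vectorised, no loop is needed; moveFile can be used to index whatever vector holds the corresponding attachments.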
I'm using parallel::mclapply() with R 4.0.1 and getting the following warning: "scheduled cores ... did not deliver results, all values of the jobs will be affected".
Here is the result of my investigation: inspecting the function's source code, I found that the warning is issued when the vector dr is not all TRUE. This means that, for some cores, the second branch inside the for loop below (is.raw(a)) is never executed. a is the value returned by readChild(); had it returned raw data at least once, that branch would have been executed at least once. So I suspect that readChild() is returning NULL.
According to the documentation, readChild and readChildren return a raw vector with a "pid" attribute if data were available, an integer vector of length one with the process ID if a child terminated, or NULL if the child no longer exists (no children at all for readChildren).
Can you confirm or reject my conclusions? And if they are correct, what are the possible reasons?
while (!all(fin)) {
s <- selectChildren(ac[!fin], -1)
if (is.null(s)) break # no children -> no hope we get anything (should not happen)
if (is.integer(s))
for (ch in s) {
a <- readChild(ch)
if (is.integer(a)) {
core <- which(cp == a)
fin[core] <- TRUE
} else if (is.raw(a)) {
core <- which(cp == attr(a, "pid"))
job.res[[core]] <- ijr <- unserialize(a)
if (inherits(ijr, "try-error"))
has.errors <- c(has.errors, core)
dr[core] <- TRUE
} else if (is.null(a)) {
# the child no longer exists (should not happen)
core <- which(cp == ch)
fin[core] <- TRUE
}
}
}
This warning can occur when a child process dies or crashes, e.g.
> y <- parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Warning message:
In parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x) :
scheduled core 1 did not deliver a result, all values of the job will be affected
> str(y)
List of 2
$ : NULL
$ : int 2
A child process dying completely is of course not good, and it can happen for several reasons. My best guess is that you are parallelizing something that must not be parallelized. Forked processing (as used by mclapply()) is known to be unstable with multi-threaded code, among other things.
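To see which inputs were lost, one can check the result list for NULL slots. A minimal sketch, assuming a Unix-alike (forking is not available on Windows) and that FUN never legitimately returns NULL, since a lost result is indistinguishable from a NULL one. With the default mc.preschedule = TRUE, mclapply() deals the inputs out to the cores in turn, so when the worker handling element 1 dies, element 3, which was scheduled on the same core, is lost as well; this is what "all values of the job will be affected" refers to.

```r
library(parallel)

# Element 1 kills its forked worker, as in the example above
y <- suppressWarnings(
  mclapply(1:4, function(x) if (x == 1) quit("no") else x, mc.cores = 2)
)

# Positions whose job did not deliver a result come back as NULL
lost <- which(vapply(y, is.null, logical(1)))
print(lost)  # 1 3: both elements scheduled on the crashed core
```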
For what it's worth, if you use the future framework instead (disclaimer: I'm the author), you'll get an error message that is a bit more informative, e.g.
> library(future.apply)
> plan(multicore)
> y <- future_lapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Error: Failed to retrieve the result of MulticoreFuture (future_lapply-1) from
the forked worker (on localhost; PID 19959). Post-mortem diagnostic: No process
exists with this PID, i.e. the forked localhost worker is no longer alive.
So I am trying to learn R on my own and am working through an online tutorial. I am trying to code a recursive function that prints the first n terms of the Fibonacci sequence, and I can't get the code to run without the error:
Error in if (nterms <= 0) { : missing value where TRUE/FALSE needed
My code doesn't ask me for input before entering the if/else statement either, which I also think is odd. My code is below; any help is appreciated.
#Define the fibonacci sequence
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
return(n)
} else {
# define the rest of the terms of the sequence using recursion
return(recurse_fibonacci(n-1) + recurse_fibonacci(n-2))
}
}
#Take input from the user
nterms = as.integer(readline(prompt="How many terms? "))
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This is a problem with readline() in non-interactive mode: readline() does not wait for a keypress, and execution immediately continues with the next instruction. The solution below is the one posted in this other SO post.
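The error message follows from that: in non-interactive use readline() returns "" as if RETURN had been pressed, as.integer("") is NA, and NA <= 0 is also NA, which if() cannot use. A minimal sketch that reproduces this without a terminal, with the empty string hard-coded as a stand-in for the readline() call:

```r
# Stand-in for what readline() returns when R is not interactive
answer <- ""
nterms <- as.integer(answer)
print(nterms)       # NA
print(nterms <= 0)  # NA, hence "missing value where TRUE/FALSE needed"

# Testing for NA first avoids the error; && stops before the comparison
valid <- !is.na(nterms) && nterms > 0
if (!valid) print("please enter a positive integer")
```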
Below is a complete answer, with the Fibonacci function slightly modified.
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
n
} else{
# define the rest of the terms of the sequence using recursion
Recall(n - 1) + Recall(n - 2)
}
}
#Take input from the user
cat("How many terms?\n")
repeat{
nterms <- scan("stdin", what = character(), n = 1)
if(nchar(nterms) > 0) break
}
nterms <- as.integer(nterms)
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This code is the contents of the file fib.R. Running it in an Ubuntu 20.04 terminal gives:
rui#rui:~$ Rscript fib.R
How many terms?
8
Read 1 item
[1] "Fibonacci Sequence: "
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
[1] 13
rui#rui:~$
To make it work with Rscript replace
nterms = as.integer(readline(prompt="How many terms? "))
with
cat("How many terms? ")
nterms = as.integer(readLines("stdin", n = 1))
Then you can run it as Rscript fib.R, assuming that the code is in the file fib.R in the current working directory.
Otherwise, execute it with source("fib.R") within an R shell.
Rscript runs R in non-interactive mode, so readline() does not wait for input from the terminal. You can check this with interactive(): under Rscript it returns FALSE, whereas in an R shell (including code run with source()) it returns TRUE.
?readline notes that in non-interactive use the result is as if the response were RETURN, i.e. an empty string, whereas readLines("stdin") explicitly reads from standard input.
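Both approaches can be combined so the same script works when sourced interactively and when run with Rscript. A sketch: parse_terms() and ask_terms() are hypothetical helper names, and the validation rule (a single positive integer) is an assumption based on the question's check.

```r
# Hypothetical helper: turn the raw answer into a positive integer, or NA
parse_terms <- function(answer) {
  # readLines() returns character(0) at end of input;
  # as.integer() turns non-numeric text into NA (warning suppressed)
  n <- suppressWarnings(as.integer(answer))
  if (length(n) == 1 && !is.na(n) && n > 0) n else NA_integer_
}

# Hypothetical helper: prompt in whichever way works for the current session
ask_terms <- function(prompt = "How many terms? ") {
  answer <- if (interactive()) {
    readline(prompt)            # waits for the user only in interactive mode
  } else {
    cat(prompt)
    readLines("stdin", n = 1)   # Rscript: read one line from standard input
  }
  parse_terms(answer)
}
```

The script would then call nterms <- ask_terms() and test is.na(nterms) instead of nterms <= 0.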
The code works fine, but you shouldn't enter it into the terminal as is. My suggestion: put the code into a script file (with the extension .R) and source it (see ?source for help, but it's actually pretty straightforward).
In RStudio you can simply click the Source button.
I would like to delete all of the rows that sit between certain headers in this example text file.
fileConn <- file("sample.txt")
one <- "*Keyword"
two <- "*Node"
three <- "$ Node,X,Y,Z"
four <- "1,639982.78040607,4733827.5104821,0"
five <- "2,639757.59709573,4733830.43494066,0"
six <- "3,639738.81268144,4733834.3619618,0"
seven <- "*End"
writeLines (c(one, two, three, four, five, six, seven), fileConn)
close(fileConn)
sample <- readLines("sample.txt")
What I am looking to do is delete all of the rows/lines between "*Node" and "*End". Since I am dealing with files that have different numbers of rows between these headers, the deletion needs to be based on the headers only. I have no idea how to do this, since I've only ever deleted rows from data frames by row number. Any clues?
Expected output is:
*Keyword
*Node
*End
readLines returns a vector, not a data frame, so we can create the sample input more simply:
sample = c("*Keyword",
"*Node",
"$ Node,X,Y,Z",
"1,639982.78040607,4733827.5104821,0",
"2,639757.59709573,4733830.43494066,0",
"3,639738.81268144,4733834.3619618,0",
"*End")
Find the starting and ending headers, and remove the elements in between with negative indexing:
node = which(sample == "*Node")
end = which(sample == "*End")
result = sample[-seq(from = node + 1, to = end - 1)]
result
# [1] "*Keyword" "*Node" "*End"
This assumes there is a single *Node and a single *End line. It also assumes that there is at least one line to delete. You may want to create a more robust solution with some handling for those special cases, e.g.,
delete_between = function(input, start, end) {
  # locate the headers in the input argument (not in the global `sample`)
  start_index = which(input == start)
  end_index = which(input == end)
  if (length(start_index) == 0 || length(end_index) == 0) {
    warning("No start or end found, returning input as-is")
    return(input)
  }
  if (length(start_index) > 1 || length(end_index) > 1) {
    stop("Multiple starts or ends found.")
  }
  # adjacent headers: nothing to delete
  if (start_index == end_index - 1) {
    return(input)
  }
  return(input[-seq(from = start_index + 1, to = end_index - 1)])
}
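If a file can contain several *Node ... *End blocks, or blocks with nothing inside them, a different technique avoids the special cases: flag the header lines and use running counts (cumsum()) to mark everything strictly between an opening header and its closing header. A sketch: delete_blocks() is a hypothetical name, and it assumes blocks are not nested and every start header is followed by a matching end header.

```r
delete_blocks <- function(input, start, end) {
  opens <- input == start
  closes <- input == end
  # Running depth: 1 from an opening header up to, but not including, its closing header
  depth <- cumsum(opens) - cumsum(closes)
  # Subtract the opening header itself so it is kept; only lines strictly inside are dropped
  inside <- depth - opens > 0
  input[!inside]
}

sample <- c("*Keyword", "*Node", "$ Node,X,Y,Z",
            "1,639982.78040607,4733827.5104821,0",
            "2,639757.59709573,4733830.43494066,0",
            "3,639738.81268144,4733834.3619618,0",
            "*End")
print(delete_blocks(sample, "*Node", "*End"))  # "*Keyword" "*Node" "*End"
```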
I am trying to print a message based on a user's entry.
I am studying for a test, and I want to create a function that, when I type a specific article (a character variable), checks it against a set of vectors and prints a message.
ExpfromUS <- function(x){
x <- readline("Check if your articles could be import or export to US. Entry the type of article that you want to ship: ")
a <- c(x == CBOExUS)
b <- c(x == RQSVExUS)
e <- c(x == NATExUS)
for ( i in length(a == TRUE)){
if (a[i] == TRUE){
print("Ok, but just with Contractual basis only");
break; }
else{ for (i in length(b)){
if (b[i] == TRUE){
print("Ok, but with restrictions of quantity, size or value");
break;}
else{ for (i in length(c)){
if (e[i] == TRUE){
print("Sorry, but we are not able to ship your cargo at this moment");
break;}
else{ print("Please check your entry we could not find this article in our database")
}}
}
}
}
}
}
But it always prints the last message, "Please check your entry we could not find this article in our database". What am I doing wrong? (Sorry, this is a beginner-level question.)
Thanks to everyone who spends their time helping me.
Expanding on my comment: I suspect that the indexing of all the for loops is (part of) the problem. The current indexing produces only one iteration, because length(a == TRUE) returns a single integer, so only the last element of a is ever checked. I suspect you wanted the positions where a is TRUE so you could output a message for that row. The which() function returns the indices of the TRUE values of a logical vector, so perhaps you wanted:
for ( i in which(a) ){
....}
else{ for (i in which(b)){
...}
else{ for (i in which(e)){   # e, not c: c is the base function c()
....}
A further note: when working with logical vectors it is rarely necessary to write == TRUE, and doing so can give unexpected results when the vector contains NAs, since NA == TRUE is NA rather than TRUE or FALSE.
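A short illustration of that point, using a made-up logical vector:

```r
flags <- c(TRUE, FALSE, NA)
print(flags == TRUE)     # TRUE FALSE NA: identical to flags, NA included
print(which(flags))      # 1: which() skips both FALSE and NA
print(isTRUE(flags[3]))  # FALSE: a safe scalar test, whereas if (flags[3] == TRUE) would error
```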
Given the values you have offered for those three vectors, I now think it should have been:
{....
a <- x %in% CBOExUS # the c() is not needed; this returns a logical vector
b <- x %in% RQSVExUS
e <- x %in% NATExUS
.....
The %in% operator tests whether x occurs anywhere in the vector and returns a single TRUE or FALSE for a single x. The == operator compares element by element, so x == CBOExUS gives one TRUE/FALSE per element of CBOExUS, and your loops only ever look at one of them. There may still be other flaws beyond these corrections, but we're still without a [MCVE], so we can't offer tested code.
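Putting these pieces together, the loops can be dropped entirely, since x %in% vec already answers "is x anywhere in vec?". A sketch: the contents of CBOExUS, RQSVExUS and NATExUS were never posted, so the article names below are made up, and check_article() is a hypothetical helper split out so the logic can be tested without readline().

```r
# Made-up article lists standing in for the real (unposted) vectors
CBOExUS  <- c("arms", "chemicals")
RQSVExUS <- c("tobacco", "alcohol")
NATExUS  <- c("ivory")

# Return the message for the first list that contains the article
check_article <- function(x) {
  if (x %in% CBOExUS) {
    "Ok, but just with Contractual basis only"
  } else if (x %in% RQSVExUS) {
    "Ok, but with restrictions of quantity, size or value"
  } else if (x %in% NATExUS) {
    "Sorry, but we are not able to ship your cargo at this moment"
  } else {
    "Please check your entry we could not find this article in our database"
  }
}

ExpfromUS <- function() {
  x <- readline("Check if your articles could be import or export to US. Entry the type of article that you want to ship: ")
  print(check_article(x))
}
```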
I wrote some R code to do a bubble sort. Sometimes it works perfectly without any error message, but sometimes it shows "Error in if (x[i] > x[i + 1]) { : argument is of length zero". Can anyone help me check what's wrong with it? My code is below:
example <- function(x) {
n <- length(x)
repeat {
hasChanged <- FALSE
n <- n - 1
for(i in 1:n) {
if ( x[i] > x[i+1] ) {
temp <- x[i]
x[i] <- x[i+1]
x[i+1] <- temp
hasChanged <- TRUE
cat("The current Vector is", x ,"\n")
}
}
if ( !hasChanged ) break;
}
}
x <-sample(1:10,5)
cat("The original Vector is", x ,"\n")
example(x)
The error occurs because you decrease n on every pass. Depending on the original vector's order (or lack thereof), n can reach 1 while a swap still happens in that pass. In the next pass n becomes 0, 1:n is then c(1, 0), and the loop accesses x[0]. x[0] is an empty vector, so x[0] > x[1] has length zero, which if() rejects.
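The behaviour of 1:n at n = 0 can be checked directly. A minimal illustration with a made-up two-element vector; seq_len(n) is the usual safe alternative because it is empty for n = 0:

```r
n <- 0
print(1:n)          # 1 0: counts down, so the loop visits i = 0
x <- c(3, 1)
print(x[0] > x[1])  # logical(0): what if() rejects as "argument is of length zero"
print(seq_len(n))   # integer(0): a for loop over this runs zero times
```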
With a minimal correction your code will work properly, without giving error messages. Try to replace the line
if ( !hasChanged ) break;
with
if (!hasChanged || n == 1) break
Basically you have two termination criteria: Either nothing has been changed in the previous iteration or n is equal to one. In both cases, a further iteration won't change the vector since it is already ordered.
By the way, in R programming you don't need a semicolon at the end of a command. It is tolerated/ignored by the interpreter, but it clutters the code and is not considered good programming style.
Hope this helps.
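For completeness, here is a variant of the question's function that also returns the sorted vector (the original's last expression is the repeat loop, so it returns NULL) and never indexes x[0]. This is a sketch rather than the minimal fix above: it loops over seq_len(n - 1) and stops once n reaches 1 or a pass makes no swaps. The name bubble_sort is illustrative.

```r
bubble_sort <- function(x) {
  n <- length(x)
  # each pass moves the largest remaining value to position n
  while (n > 1) {
    hasChanged <- FALSE
    for (i in seq_len(n - 1)) {
      if (x[i] > x[i + 1]) {
        x[c(i, i + 1)] <- x[c(i + 1, i)]  # swap the neighbours
        hasChanged <- TRUE
      }
    }
    if (!hasChanged) break  # no swaps in this pass: already sorted
    n <- n - 1              # the last element of this pass is now in place
  }
  x
}

x <- sample(1:10, 5)
cat("The original Vector is", x, "\n")
cat("The sorted Vector is", bubble_sort(x), "\n")
```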