The following R code adds one column to the dataset and returns the data.frame.
library(RevoScaleR)  # provides RxXdfData() and rxDataStep()
xdfAirDemo <- RxXdfData(file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf"))
I added a print() call to check the length of the vector:
f.append <- function(lst){
lst$mod_val_test <- rep(1, length(lst[[1]]))
print(length(lst$mod_val_test))
return(lst)
}
df.Airline <- rxDataStep(inData = xdfAirDemo, transformFunc = f.append)
When I run the rxDataStep call above, the print() call in f.append is executed twice and outputs two values. Can someone help me understand how rxDataStep works?
The output is shown below:
[1] 10
[1] 600000
Rows Read: 600000, Total Rows Processed: 600000, Total Chunk Time: 0.651 seconds
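When you call rxDataStep, it first runs your transform function on the first 10 rows of the data as a test. If this succeeds, it then processes the entire dataset one chunk at a time. In your case all 600,000 rows fit in a single chunk, which is why you see exactly two values: 10 from the test run and 600000 from the real run.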
If you don't want (part of) your code to run during the test, you can check the value of the built-in variable .rxIsTestChunk:
f.append <- function(lst)
{
lst$mod_val_test <- rep(1, length(lst[[1]]))
# don't print anything if this is the test chunk
if(!.rxIsTestChunk)
print(length(lst$mod_val_test))
return(lst)
}
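To make the two-pass behavior concrete, here is a toy model in plain R. It only illustrates the pattern described above and is not RevoScaleR's implementation: run_in_chunks() and the is_test argument are hypothetical, with is_test standing in for .rxIsTestChunk, and the chunk size of 250 is arbitrary.

```r
# Toy model only: mimics "test run on 10 rows, then chunk-by-chunk processing".
# run_in_chunks() is hypothetical and is NOT how RevoScaleR is implemented.
run_in_chunks <- function(df, transform, chunk_size, test_rows = 10) {
  # Test pass on the first few rows; its result is discarded.
  transform(as.list(head(df, test_rows)), is_test = TRUE)
  # Real pass, one chunk at a time.
  starts <- seq(1, nrow(df), by = chunk_size)
  chunks <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1, nrow(df))
    as.data.frame(transform(as.list(df[rows, , drop = FALSE]), is_test = FALSE))
  })
  do.call(rbind, chunks)
}

# Counterpart of f.append; is_test plays the role of .rxIsTestChunk.
f_append <- function(lst, is_test) {
  lst$mod_val_test <- rep(1, length(lst[[1]]))
  if (!is_test) print(length(lst$mod_val_test))  # report real chunks only
  lst
}

demo <- data.frame(x = 1:600)
out <- run_in_chunks(demo, f_append, chunk_size = 250)  # prints 250, 250, 100
```

Without the is_test check, the toy version would also print 10 first, just like the output in the question.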
I am trying to learn how to implement control structures such as for and while loops.
I created a function that simulates betting in a famous Brazilian lotto.
In the lotto, a player bets on 6 unique integers from a 1:60 vector (called your_bet).
The function samples 6 values from the range 1 to 60 ("result") and counts how many values in result match your_bet, printing out:
your_bet
result
total points (out of 6 possible)
one of three possible comments on the outcome of the bet.
Code as follows:
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE)){
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", your_bet), collapse=" "))
print(paste(c("Result:", result), collapse=" "))
print(paste(c("Total points:", total_points), collapse=" "))
if (total_points==6)
print("You are a millionaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
}
```
I then tried to implement a loop that would make the function run over and over until total_points >= a given target (target_points), modifying the function as below.
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
target_points){ #inserted stubborn_until_x_points and target_points arguments
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", your_bet), collapse=" "))
print(paste(c("Result:", result), collapse=" "))
print(paste(c("Total points:", total_points), collapse=" "))
if (total_points==6)
print("You are a millionaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
if (stubborn_until_x_points==TRUE){ #Inserted WHILE loop here
while(total_points < target_points){
LOTTO(your_bet, stubborn_until_x_points=TRUE, target_points)}
}
}
```
This did make the function repeat in a loop, but for some reason it keeps looping even when the condition is met.
Observation: when the condition is met on the first run, it does stop, but once it enters the loop it goes on forever.
I can't find what is wrong. Any ideas?
Thanks in advance.
I haven't dug very deeply into the function, but one obvious reason it would never stop is target_points > sample_n (6 here), since you can never score more points than the number of values drawn.
That case is simple to fix: keep target_points at most 6, or add a sample_n argument (currently 6) and make sure it is at least target_points. Beyond that, I suspect the main problem lies in the recursion. R is rather restrictive when it comes to recursion; consider this simple example:
i <- 0
f <- function(){
i <<- i + 1
f()
}
f()
i
# 896 on my pc
This shows that R cannot handle very deep recursion. The call fails with the rather unhelpful error
Error: C stack usage 7974196 is too close to the limit
To alleviate this, one simply has to remove the recursion ("simply" in italics, because sometimes it is not simple). In this case we just move the while loop so that it contains the main body of the function, and use an if statement to break out early if necessary.
Below is a slightly modified version of the function (note that sample_n and number_range have been added as arguments).
In this function the while loop has been moved to contain the main body, and the result is printed at the end instead (using the variable res to determine the outcome). At the end of the loop, an if(isFALSE(stubborn_until_x_points)) break statement exits early if necessary.
LOTTO <- function(your_bet,
sample_n = 6,
number_range = 1:60,
stubborn_until_x_points = FALSE,
target_points){
if(missing(target_points) || target_points > sample_n)
stop('missing target_points or target_points too large')
total_points <- -Inf # Always smaller than target_points
i <- 0
res <- 0
# If you want a new bet every iteration.
# Add this at the end of the loop, but remove the 'if'
if(missing(your_bet))
your_bet <- sample(number_range, size=sample_n, replace=FALSE)
while(total_points < target_points){
result <- sample(number_range, size=sample_n, replace=FALSE)
logical_vector <- your_bet %in% result
total_points <- sum(logical_vector)
if (total_points==6){
res <- 1
}else if (total_points==5){
res <- 2
}
i <- i + 1
if(isFALSE(stubborn_until_x_points))
break
}
if(res == 1)
cat(res <- 'You\'re a millionaire!\n', sep = '\n')
else if(res == 2)
cat(res <- '5 points, you are rich!\n', sep = '\n')
else
cat(res <- 'Better luck next time.', sep = '\n')
c(result = res, number_of_tries = i)
}
The function is called as before, but now also returns the outcome and the number of attempts it took, as shown below.
LOTTO(target_points = 6, stubborn_until_x_points = TRUE)
You're a millionaire!
#Output:
result number_of_tries
"You're a millionaire!\n" "8297820"
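A related point about the recursive version in the question: every call to LOTTO() has its own local total_points. The while loop calls LOTTO(your_bet, stubborn_until_x_points=TRUE, target_points) but never assigns the result to total_points, so even after a nested call hits the target, the caller's loop condition is still TRUE and it calls LOTTO() again. That fits the observation that the function only stops when the target is hit on the first run. (Assigning the result would also require LOTTO() to return total_points, which the original does not.) A minimal illustration with hypothetical functions:

```r
# Hypothetical functions: each R function call has its own local variables,
# so a nested call cannot update the caller's copy of total_points.
nested_call <- function() {
  total_points <- 6      # the nested call "reaches the target"...
  total_points           # ...and returns its value
}

caller_discards <- function() {
  total_points <- 0
  nested_call()          # return value thrown away, as in the recursive LOTTO()
  total_points           # still 0, so a while condition on it never changes
}

caller_assigns <- function() {
  total_points <- 0
  total_points <- nested_call()  # assigning the result updates the caller's copy
  total_points
}

caller_discards()  # 0
caller_assigns()   # 6
```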
Removing the recursion, moving the body of the function inside the while loop, initializing total_points to -Inf, and adding the break statement were very useful.
Adapting the answer by @Oliver gave exactly what I was looking for:
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
target_points=0){
total_points <- -Inf # always smaller than target_points, so the loop runs at least once
while(total_points < target_points){
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", your_bet), collapse=" "))
print(paste(c("Result:", result), collapse=" "))
print(paste(c("Total points:", total_points), collapse=" "))
if (total_points==6)
print("You are a millionaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
if (isFALSE(stubborn_until_x_points))
break
}
}
His answer, however, returns useful additional results and handles invalid arguments better.
Here is my R script, which works just fine:
perc.rank <- function(x) trunc(rank(x)) / length(x) * 100.0
library(dplyr)
setwd("~/R/xyz")
datFm <- read.csv("yellow_point_02.csv")
datFm <- filter(datFm, HRA_ClassHRA_Final != -9999)
quant_cols <- c("CL_GammaRay_Despiked_Spline_MLR", "CT_Density_Despiked_Spline_FinalMerged",
"HRA_PC_1HRA_Final", "HRA_PC_2HRA_Final","HRA_PC_3HRA_Final",
"SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT", "Ultrasonic_DT_Despiked_Spline_MLR")
# add an extra column to datFm to store the quantile value
for (column_name in quant_cols) {
datFm[paste(column_name, "quantile", sep = "_")] <- NA
}
# initialize an empty dataframe with the new column names appended
newDatFm <- datFm[0,]
# get the unique values for the hra classes
hraClassNumV <- sort(unique(datFm$HRA_ClassHRA_Final))
# loop through the vector and create currDatFm and append it to newDatFm
for (i in hraClassNumV) {
currDatFm <- filter(datFm, HRA_ClassHRA_Final == i)
for (column_name in quant_cols) {
currDatFm <- within(currDatFm,
{
CL_GammaRay_Despiked_Spline_MLR_quantile <- perc.rank(currDatFm$CL_GammaRay_Despiked_Spline_MLR)
CT_Density_Despiked_Spline_FinalMerged_quantile <- perc.rank(currDatFm$CT_Density_Despiked_Spline_FinalMerged)
HRA_PC_1HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_1HRA_Final)
HRA_PC_2HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_2HRA_Final)
HRA_PC_3HRA_Final_quantile <- perc.rank(currDatFm$HRA_PC_3HRA_Final)
SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT_quantile <- perc.rank(currDatFm$SRES_IMGCAL_SHIFT2VL_Slab_SHIFT2CL_DT)
Ultrasonic_DT_Despiked_Spline_MLR_quantile <- perc.rank(currDatFm$Ultrasonic_DT_Despiked_Spline_MLR)
}
)
}
newDatFm <- rbind(newDatFm, currDatFm)
}
newDatFm <- newDatFm[order(newDatFm$Core_Depth),]
# head(newDatFm, 10)
write.csv(newDatFm, file = "Ricardo_quantiles.csv")
I have a few questions, though. Every R book or video I have read or watched recommends using the apply family of functions over the classic for loop, stating that apply is much faster.
So the first question is: how would you write this using apply (or tapply, or some other member of the apply family)?
Second, is it really true that apply is much faster than for? The CSV file 'yellow_point_02.csv' has approx. 2500 rows, and this script runs almost instantly on my MacBook Pro with 16 GB of memory.
Third, see the quant_cols vector? I created it so that I could write a generic loop (for column_name in quant_cols) ..., but I could not make it work. So I hard-coded the column names suffixed with '_quantile' and called perc.rank many times. Is there a way to make this dynamic? I tried the paste approach in my script, but that did not work.
On the positive side though, R seems awesome in its ability to cut through the 'Data Wrangling' tasks with very few statements.
Thanks for your time.
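For the first and third questions, one possible sketch uses base R's ave(), which applies a function within groups and returns the results in the original row order, combined with lapply() over quant_cols. It uses a small made-up data frame with hypothetical columns A and B in place of the real CSV columns. paste() builds the new column names, and assigning a list to datFm[new_names] creates all the columns at once, which removes both the loop over HRA classes and the repeated rbind() calls.

```r
perc.rank <- function(x) trunc(rank(x)) / length(x) * 100.0

# Hypothetical toy data; the real script reads yellow_point_02.csv.
datFm <- data.frame(
  HRA_ClassHRA_Final = c(1, 1, 2, 2, 2),
  Core_Depth         = c(5, 1, 4, 2, 3),
  A                  = c(10, 20, 30, 10, 20),
  B                  = c(1, 2, 3, 4, 5)
)
quant_cols <- c("A", "B")  # stands in for the seven real column names

# One new "<name>_quantile" column per entry of quant_cols, computed within each
# HRA class: ave() applies perc.rank per group and keeps the original row order.
datFm[paste(quant_cols, "quantile", sep = "_")] <- lapply(
  quant_cols,
  function(col) ave(datFm[[col]], datFm$HRA_ClassHRA_Final, FUN = perc.rank)
)

newDatFm <- datFm[order(datFm$Core_Depth), ]
```

With the real data, the same two statements should work once quant_cols holds the original column names.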
I have run into errors with my for loop. The code is as follows:
#finding IDs with >5% replicate variance
#initialize vectors
LS1repvariance = NULL
anomalylist = NULL
#open for loop iterating from 1 to end of dataset
for (i in 1:1523){
#call replicates, which start off as characters
charrep1 = widesubdat[i,2]
charrep2 = widesubdat[i,11]
#convert to numeric
rep1 = as.numeric(charrep1)
rep2 = as.numeric(charrep2)
#calculation
repvariance = (rep1-rep2)/((rep1+rep2)/2)*100
#if statement for anomalous replicates
if (abs(repvariance)>=5)
anomalylist[i]=widesubdat[i,0]
}
The error I get says
Error in if (abs(repvariance) >= 5) anomalylist[i] = widesubdat[i, 0]
: missing value where TRUE/FALSE needed
I think the error is in the iteration, because it defines i as 336L and does not call charrep correctly, but I have no idea why. I've written for loops in Python but never in R, and all of the for loop help pages seem to have the same structure. All of the lines test out okay when I run them outside the for loop.
I've read that if statements also require curly brackets, but IDLE said unexpected "{" when I used them.
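You could also drop the loop and vectorize the calculation:
rep1 <- as.numeric(widesubdat[,2])  # the replicates are stored as character
rep2 <- as.numeric(widesubdat[,11])
pick <- abs(200*(rep1-rep2)/(rep1+rep2))>=5
pick[is.na(pick)] <- FALSE  # rows that could not be converted are not flagged
anomalylist <- widesubdat[,1] # R indexes from 1; widesubdat[i,0] selects no columns
anomalylist[!pick] <- NA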
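As for the error itself: as.numeric() returns NA for a string it cannot parse, the NA propagates through the arithmetic, and if (NA) stops with "missing value where TRUE/FALSE needed". If you prefer to keep the loop, here is a sketch that skips such rows. The toy widesubdat is made up, and its replicates sit in columns 2 and 3 rather than 2 and 11. It also shows that braces around an if body are valid R.

```r
# Hypothetical toy stand-in for widesubdat: ID in column 1, replicates in
# columns 2 and 3 (the real data uses columns 2 and 11), stored as character.
widesubdat <- data.frame(
  id   = c("a", "b", "c", "d"),
  rep1 = c("10", "10", "n/a", "10"),
  rep2 = c("10.2", "12", "10", ""),
  stringsAsFactors = FALSE
)

anomalylist <- character(0)
for (i in seq_len(nrow(widesubdat))) {
  rep1 <- suppressWarnings(as.numeric(widesubdat[i, 2]))  # "n/a" becomes NA
  rep2 <- suppressWarnings(as.numeric(widesubdat[i, 3]))  # "" becomes NA
  repvariance <- (rep1 - rep2) / ((rep1 + rep2) / 2) * 100
  if (is.na(repvariance)) {
    next  # skip rows whose replicates are missing or non-numeric
  }
  if (abs(repvariance) >= 5) {
    anomalylist <- c(anomalylist, widesubdat[i, 1])  # column 1: R indexes from 1
  }
}
anomalylist  # "b"
```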
I am running a for loop over 1:1700 in R, loading different data in each iteration. Some iterations in between fail with an error (possibly because the corresponding data is missing).
Is there a way to skip the iterations that throw an error, so that the loop still completes all 1700 iterations?
I have to run a for loop; there is no other option.
You can use tryCatch within your loop. Here is an example where I loop from 1 to 5 and raise an error for some counter values (created here with stop()); the error is caught, and the loop continues with the other values of the counter.
for( i in 1:5) ## replace 5 by 1700
tryCatch({
if(i %in% c(2,5)) stop("simulated error")
print(i) ## imagine you read a file here, or any more complicated process
}
,error = function(e) print(paste(i,'is error')))
[1] 1
[1] "2 is error"
[1] 3
[1] 4
[1] "5 is error"
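A slightly extended sketch of the same idea that also keeps each iteration's result and records which iterations failed. load_data() is a hypothetical stand-in that fails on every fourth iteration, and n is 10 instead of 1700 to keep the example small.

```r
# load_data() is hypothetical: it simulates missing data on every 4th iteration.
load_data <- function(i) {
  if (i %% 4 == 0) stop("no data for iteration ", i)
  rnorm(5)
}

n <- 10  # 1700 in the question
results <- vector("list", n)
for (i in seq_len(n)) {
  # results[i] <- list(...) (rather than results[[i]] <- ...) stores a NULL
  # in slot i instead of deleting element i from the list.
  results[i] <- list(tryCatch(
    load_data(i),
    error = function(e) {
      message("Skipping iteration ", i, ": ", conditionMessage(e))
      NULL
    }
  ))
}

# The iterations that failed are the ones left as NULL.
failed <- which(vapply(results, is.null, logical(1)))
failed  # 4 8
```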
I use try for such issues. It allows your loop to continue through the cycle of values without stopping at the error message.
Example
make data
set.seed(1)
dat <- vector(mode="list", 1800)
dat
tmp <- sample(1800, 900) # only some elements are filled with data
for(i in seq(tmp)){
dat[[tmp[i]]] <- rnorm(10)
}
dat
loop without try
#stops with an error at the first NULL element
res <- vector(mode="list", length(dat))
for(i in seq(dat)){
res[[i]] <- log(dat[[i]]) # error: log(NULL) is a non-numeric argument to a mathematical function
}
loop with try
#cycles through
res <- vector(mode="list", length(dat))
for(i in seq(dat)){
res[[i]] <- try(log(dat[[i]]), silent = TRUE) # errors are caught, so the loop cycles through
}
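Afterwards, the failed iterations can be identified because try() returns an object of class "try-error" when the expression fails. A small sketch with a made-up four-element dat:

```r
# Toy version of the data above: elements 2 and 4 are NULL, so log() fails there.
dat <- list(c(1, 10), NULL, c(2, 5), NULL)

res <- vector(mode = "list", length(dat))
for (i in seq_along(dat)) {
  res[[i]] <- try(log(dat[[i]]), silent = TRUE)
}

# Indices whose result is a "try-error" object, i.e. the iterations that failed.
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
failed  # 2 4
```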
I'm trying to get a function to run for a specified amount of time; at the moment I'm trying to use the system.time function. I can't figure out how to define a variable that accumulates the function's running time and then use it in a while loop.
timer<-(system.time(simulated_results<-replicate(n=1,simulation(J,10000,FALSE,0.1),simplify="vector"))[3])
print(timer)
while(cumsum(timer)<15){
print(cumsum(timer))
simulated_results<-replicate(n=10000,simulation(J,10000,FALSE,0.1),simplify="vector")
}
I would greatly appreciate any help!!!
If you want to run some code for a specified number of seconds, you can try the following:
start <- as.numeric(Sys.time())
duration <- 5
results <- NULL
while(as.numeric(Sys.time())-start < duration) {
results <- c(results, replicate(...))
}
Of course, you have to change the value of duration (in seconds), and replace replicate(...) with your code.
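If you specifically want the cumulative-timer approach from the question, the key is to add each batch's elapsed time to a running total inside the loop; in the question's code, timer is never updated, so the loop condition never changes. In this sketch, simulation() is a hypothetical stand-in for simulation(J, 10000, FALSE, 0.1), and the batch size of 50 and the 0.2-second duration are arbitrary. The total counts only the time spent inside system.time(), not the overhead of collecting the results.

```r
# simulation() is a hypothetical stand-in for simulation(J, 10000, FALSE, 0.1).
simulation <- function() mean(rnorm(1e4))

duration <- 0.2   # seconds to keep running (15 in the question)
elapsed  <- 0     # cumulative wall-clock time of the timed batches
results  <- numeric(0)
while (elapsed < duration) {
  timing  <- system.time(batch <- replicate(50, simulation()))
  results <- c(results, batch)
  elapsed <- elapsed + timing[["elapsed"]]  # add this batch's elapsed time
}
length(results)
```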
You can use a tryCatch approach for this task, combined with withTimeout() from the R.utils package. For example, consider the following code:
library(R.utils) # provides withTimeout()
fun_test <- function(test_parameter){
result <- 1+test_parameter #some execution
return(result)
}
time_limit <- 10 #seconds
res <- NULL
tryCatch({
res <- withTimeout({
fun_test(test_parameter = 1) # 1 is a placeholder input
}, timeout = time_limit)
}, TimeoutException = function(ex) {
message("Timeout. Skipping.")
})
This runs fun_test with a 10-second limit. If it finishes in time, its result is stored in res; otherwise the evaluation is interrupted, a message is printed, and res stays NULL. For more guidance, see this question:
Time out an R command via something like try()