I'm writing code to download data from UN Comtrade, a UN database. Because the database has a usage limit of 100 requests per hour, I need to pause the program when that limit is reached.
I want to write the code with tryCatch so that it will:
Automatically pause the program every time the error for the maximum limit appears
Rerun the current values of i, j and k if a connection error occurs
My current code still works, but I want to learn how to use tryCatch too.
Also, is there a way to get rid of the for loops? Can the apply family of functions be used here?
Thanks guys
library(foreign)  # for write.dta()

n <- 0
a <- c()
for (i in 1996:2014) {
  for (j in c("0301","0302","0303","0304","0305","0306","0307","0308")) {
    for (k in c("704","116","360","418","458","104","608","702","764")) {
      s2 <- paste(i, j, k, sep = "")
      a <- c(a, s2)
      print(s2)
      n <- n + 1
      if (n <= 100) {
        s1 <- get.Comtrade(r = k, ps = i, rg = "2", cc = j, fmt = "csv", px = "H0")
        Sys.sleep(1)
        s1 <- do.call(rbind.data.frame, s1)
        # paste0() avoids the spaces that paste() would insert into the file name
        write.dta(s1, file = paste0("D:/unTrade/", s2, ".dta"))
      } else {
        # note: the combination that reaches this branch is not downloaded
        print(n)
        print(s2)
        print("reset here")
        n <- 0
        Sys.sleep(3610)
      }
    }
  }
}
I can't really help you with tryCatch(); I don't have the experience myself.
Regarding the for loops, this is one solution (although I think for loops are not that evil in cases like this; vectorization really counts in matrix operations and the like).
# stringsAsFactors = FALSE keeps j and k as character codes;
# as factors, cat() would print their integer codes instead
dat <- expand.grid(i = 1996:1999,
                   j = c("0301","0302","0303","0304","0305","0306","0307","0308"),
                   k = c("704","116","360","418","458","104","608","702","764"),
                   stringsAsFactors = FALSE)
library(dplyr)
dat %>% group_by(i, j, k) %>%
  do({
    cat('s1 <- get.Comtrade(r=', .$k, ', ps=', .$i, ', cc=', .$j, ', rg=\"2\", fmt=\"csv\",px=\"H0\")\n')
    flush.console()
    # return(s1)
  })
From your own code s1 (also) appears to be a data.frame, so in this case, the dplyr do() nicely glues all these data frames together.
HTH
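On the tryCatch part of the question, which is left open above, here is a minimal sketch of a retry wrapper. The names with_retry, max_tries and wait are made up for illustration, and flaky() is only a stand-in for get.Comtrade(); I don't know what the actual Comtrade errors look like, so the sketch retries on any error rather than matching a specific message.

```r
# Hypothetical helper: call fetch() until it succeeds or attempts run out.
with_retry <- function(fetch, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    # On error, tryCatch() returns the condition object instead of stopping
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    Sys.sleep(wait)  # e.g. 3610 seconds if the failure was the hourly limit
  }
  stop("all ", max_tries, " attempts failed")
}

# Stand-in for get.Comtrade(): fails twice, then returns a data frame.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection error")
  data.frame(x = 1)
}

out <- with_retry(flaky)
```

Inside the loops, the call would then look like with_retry(function() get.Comtrade(r=k, ps=i, rg="2", cc=j, fmt="csv", px="H0"), wait = 5), so a failure repeats the current i, j and k instead of moving on.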
I am trying to learn how to implement control structures such as for and while loops.
I created a function that simulates betting in a famous Brazilian lotto.
In the lotto, a player bets on 6 unique integers from a 1:60 vector (called your_bet).
The function samples 6 values from the 1 to 60 universe ("result") and tests how many values in result match your_bet, printing out:
your_bet
result
total points (out of 6 possible)
one of three possible commentaries on the result of the bet.
Code as follows:
```
LOTTO <- function(your_bet = sample(1:60, size = 6, replace = FALSE)) {
  result <- sample(1:60, size = 6, replace = FALSE)
  logical_vector <- (your_bet %in% result)
  total_points <- sum(as.integer(logical_vector))
  # collapse is an argument of paste(), not of print()
  print(paste(c("Your bet:", your_bet), collapse = " "))
  print(paste(c("Result:", result), collapse = " "))
  print(paste("Total points:", total_points))
  if (total_points == 6)
    print("You are a millionaire")
  else if (total_points == 5)
    print("5 points, you are rich!")
  else print("good luck next time")
}
```
I then tried to implement a loop that would make the function run over and over again until total points >= a given target (here target_points), modifying the function as below.
```
LOTTO <- function(your_bet = sample(1:60, size = 6, replace = FALSE), stubborn_until_x_points = FALSE,
                  target_points) { # inserted stubborn_until_x_points and target_points arguments
  result <- sample(1:60, size = 6, replace = FALSE)
  logical_vector <- (your_bet %in% result)
  total_points <- sum(as.integer(logical_vector))
  print(paste(c("Your bet:", your_bet), collapse = " "))
  print(paste(c("Result:", result), collapse = " "))
  print(paste("Total points:", total_points))
  if (total_points == 6)
    print("You are a millionaire")
  else if (total_points == 5)
    print("5 points, you are rich!")
  else print("good luck next time")
  if (stubborn_until_x_points == TRUE) { # inserted WHILE loop here
    while (total_points < target_points) {
      LOTTO(your_bet, stubborn_until_x_points = TRUE, target_points)
    }
  }
}
```
This did make the function repeat in a loop, but for some reason, it keeps looping even when the condition is met.
Observation - I realized that when the condition is met on the first run, it actually stops, but after entering a loop, it goes on forever.
I can't find what is wrong. Any ideas?
Thanks in advance.
I haven't dug very deep into the function, but the only reason the loop should never break is target_points > n_samples (6 here).
That problem is simple to fix: reduce target_points to at most 6, or add an n_samples argument (6 at the moment) and check that it is not smaller than target_points. However, I suspect the main problem lies in the recursion. R is rather restrictive when it comes to recursion. If one tries a simple recursion, for example
i <- 0
f <- function(){
  i <<- i + 1
  f()
}
f()
i
# 896 on my pc
one can see that recursion cannot be used for very deep recursive functions (in R). This throws the rather unhelpful error
Error: C stack usage 7974196 is too close to the limit
To alleviate this, one "simply" has to remove the recursion (in quotes because sometimes it is not simple). In this case we just move the while loop so that it contains the main body of the function, and use an if statement to break early if necessary.
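As a side note on why the recursive version in the question never leaves its while loop: each call has its own local total_points, so a recursive call cannot change the value that the caller's loop condition is testing. A minimal sketch of both points follows; the names callee, caller, draw_until and max_tries are made up for illustration.

```r
# Each function call has its own local variables.
callee <- function() {
  total_points <- 6        # local to callee(); the caller never sees it
  invisible(total_points)
}

caller <- function() {
  total_points <- 0
  callee()                 # does not change caller()'s total_points
  total_points
}

caller_result <- caller()

# Loop-based alternative: the tested variable is reassigned on every pass.
draw_until <- function(target, max_tries = 100000) {
  your_bet <- 1:6
  total_points <- -Inf
  tries <- 0
  while (total_points < target && tries < max_tries) {
    total_points <- sum(your_bet %in% sample(1:60, size = 6))
    tries <- tries + 1
  }
  c(points = total_points, tries = tries)
}

set.seed(1)
one_hit <- draw_until(target = 1)
```

The max_tries cap is only there so that the sketch always terminates.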
Below is a slightly modified version of the function (note that sample_n and number_range have been added as arguments).
In this function the while loop has been moved to contain the main body, and the result is printed at the end instead (using the variable res to figure out the result). At the end of the loop I use an if(isFALSE(stubborn_until_x_points)) break statement to exit early if necessary.
LOTTO <- function(your_bet,
                  sample_n = 6,
                  number_range = 1:60,
                  stubborn_until_x_points = FALSE,
                  target_points){
  if(missing(target_points) || target_points > sample_n)
    stop('missing target_points or target_points too large')
  total_points <- -Inf # Always smaller than target_points
  i <- 0
  res <- 0
  # If you want a new bet every iteration.
  # Add this at the end of the loop, but remove the 'if'
  if(missing(your_bet))
    your_bet <- sample(number_range, size=sample_n, replace=FALSE)
  while(total_points < target_points){
    result <- sample(number_range, size=sample_n, replace=FALSE)
    logical_vector <- your_bet %in% result
    total_points <- sum(logical_vector)
    if (total_points==6){
      res <- 1
    }else if (total_points==5){
      res <- 2
    }
    i <- i + 1
    if(isFALSE(stubborn_until_x_points))
      break
  }
  if(res == 1)
    cat(res <- 'You\'re a millionaire!\n', sep = '\n')
  else if(res == 2)
    cat(res <- '5 points, you are rich!\n', sep = '\n')
  else
    cat(res <- 'Better luck next time.', sep = '\n')
  c(result = res, number_of_tries = i)
}
The function is called as before, but now also returns the number of attempts and the result obtained from trials as shown below.
LOTTO(target_points = 6, stubborn_until_x_points = TRUE)
You're a millionaire!
#Output:
result number_of_tries
"You're a millionaire!\n" "8297820"
Removing the recursion, putting the body of the function inside the while loop, assigning -Inf to the initial total_points and adding the break statement were very useful.
Adapting the answer by @Oliver yielded exactly what I was looking for:
LOTTO <- function(your_bet = sample(1:60, size = 6, replace = FALSE), stubborn_until_x_points = FALSE,
                  target_points = 0) {
  total_points <- -Inf
  while (total_points < target_points) {
    result <- sample(1:60, size = 6, replace = FALSE)
    logical_vector <- (your_bet %in% result)
    total_points <- sum(as.integer(logical_vector))
    print(paste(c("Your bet:", your_bet), collapse = " "))
    print(paste(c("Result:", result), collapse = " "))
    print(paste("Total points:", total_points))
    if (total_points == 6)
      print("You are a millionaire")
    else if (total_points == 5)
      print("5 points, you are rich!")
    else print("good luck next time")
    if (isFALSE(stubborn_until_x_points))
      break
  }
}
His answer, however, yields interesting additional results and better control over invalid argument inputs.
I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
  for (j in 1:20) {
    trialvec[i*j] <- segmentvalues[i]
  }
}
#vector categorisation
vcategorise <- function(categories, data) {
  #categorises a vector into a list of vectors
  #requires plyr and dplyr
  assignment <- list()
  catlength <- length(categories)
  for (i in 1:length(catlength)){
    for (j in 1:length(data)) {
      if (any(contains(categories[i], ignore.case = TRUE,
                       as.vector(data[j])))) {
        assignment[[i]][j] <- data[j]
      }
    }
  }
  return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
You are indexing assignment and then indexing into what you get there. That cannot work here: assignment is an empty list, so assignment[[i]] has nothing to return and [[ fails with "subscript out of bounds". The underlying problem is that you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
  assignment <- vector("list", length = length(data) * length(categories))
  n <- length(data)
  for (i in 1:length(categories)){
    for (j in 1:length(data)) {
      assignment[(i-1)*n + j] <-
        if (any(contains(categories[i],
                         ignore.case = TRUE,
                         as.vector(data[j])))) {
          data[j]
        } else {
          NA
        }
    }
  }
  dim(assignment) <- c(length(data), length(categories))
  assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
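If the goal is simply one vector per category, a loop-free variant may be enough. This is only a sketch under my own assumptions: the function name categorise is made up, base R's grepl() stands in for the contains() call, and the category labels are assumed to contain no regular-expression metacharacters. Because lapply() builds the list itself, nothing has to be allocated up front.

```r
# Smaller dummy data than in the question, built with rep() instead of loops
segmentvalues <- c("1_P", "2_B", "3_R")
trialvec <- rep(segmentvalues, each = 4)

# Hypothetical helper: one list element per category, holding the matching values
categorise <- function(categories, data) {
  out <- lapply(categories, function(ct) {
    data[grepl(ct, data, ignore.case = TRUE)]
  })
  names(out) <- categories
  out
}

result <- categorise(segmentvalues, trialvec)
```

Here result is a named list, so result[["2_B"]] gives the values that matched the second category.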
Thank you very much in advance for helping me out - I am new to R programming, and have got stuck with trying to use user-inputs accepted through one function within another. The second function is a simple pay calculator where the three variables are number of hours, per hour pay rate and the number of times the rate is multiplied once the number of working hours exceeds 180. I have written a first function called enterval through which I am asking the user to enter the above variables. In a second function called salary, I am trying to use enterval to accept the inputs before running the payout calculations. I am getting stuck because the second function, salary, is breaking when I come to an "if" condition, specifying if h > 180. I am sharing my code below. Thanks again for your kind assistance. I searched among previous answers but could not find a specific instance that fully answered my query - apologies if I missed out an appropriate previous response.
The error I am getting on running this code is "Error in h > 180 :
comparison (6) is possible only for atomic and list types"
enterval <- function() {
  h <- (readline("Please enter number of hours: "))
  h <- as.integer(h)
  r <- (readline("Please enter applicable rate: "))
  r <- as.integer(r)
  m <- (readline("Please confirm your multiplier: "))
  m <- as.integer(m)
}
salary <- function () {
  enterval()
  if (h > 180) {
    totalpay <- (180*r) + ((h-180)*r*m)
  }
  else {
    totalpay <- (h*r)
  }
  totalpay
}
I think that what you need is the function to be like this:
enterval <- function() {
  h <- (readline("Please enter number of hours: "))
  h <- as.numeric(h)
  r <- (readline("Please enter applicable rate: "))
  r <- as.numeric(r)
  m <- (readline("Please confirm your multiplier: "))
  m <- as.numeric(m)
  list(h=h, r=r, m=m)
}
salary <- function () {
  inputs <- enterval()
  if (inputs$h > 180) {
    totalpay <- (180*inputs$r) + ((inputs$h-180)*inputs$r*inputs$m)
  }
  else {
    totalpay <- (inputs$h*inputs$r)
  }
  totalpay
}
Output:
> salary()
Please enter number of hours: 5
Please enter applicable rate: 0.5
Please confirm your multiplier: 2
[1] 2.5
In your question, enterval just returned the value stored in m, but even that was not saved anywhere (you did not assign it to a variable inside salary), so it could not be used by salary. An R function returns only the value of its last evaluated expression (or whatever is passed to return(), if used). In the function above I return a list with elements h, r and m.
Then I save that list to inputs, which can be used by salary. Elements in inputs can be accessed using the $ operator.
Also, as a small addition: when you say rate, I believe it is a number between 0 and 1, so I changed as.integer to as.numeric, because as.integer drops the fractional part. Feel free to change that back to as.integer if you do need an integer.
EDIT
Better and probably more advanced way of writing salary:
As per @RichardScriven's comment, a good way to avoid typing all the inputs$* variables is to use list2env like this:
salary <- function () {
  inputs <- enterval()
  # copy h, r and m from the list into this function's environment
  list2env(inputs, environment())
  if (h > 180) {
    totalpay <- (180*r) + ((h-180)*r*m)
  }
  else {
    totalpay <- (h*r)
  }
  totalpay
}
list2env will essentially create variables out of the list elements inside salary's environment, which are immediately accessible without needing to use inputs$*.
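A minimal sketch of the list2env idea that can be run non-interactively: the interactive enterval() is replaced by a plain list, and the function name pay and the sample values are made up for illustration.

```r
# Hypothetical variant of salary() that takes the inputs as an argument
pay <- function(inputs) {
  # create h, r and m as local variables from the list elements
  list2env(inputs, environment())
  if (h > 180) {
    (180 * r) + ((h - 180) * r * m)
  } else {
    h * r
  }
}

overtime <- pay(list(h = 200, r = 10, m = 2))   # 180*10 + 20*10*2
regular  <- pay(list(h = 5, r = 0.5, m = 2))    # 5 * 0.5
```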
Variables assigned in R functions (as in many other programming languages) have limited scope. This means that the m you assign in your function is only available within that function. If you want your variable to be available outside the function, you have two main options:
Return the variable. This is the preferred option; it is much cleaner and is good programming practice for numerous reasons described in many Stack Overflow posts. An important thing to remember is that a function can only return one object.
Use a global assignment. This gives the variable in your function global scope, making it accessible within all functions. The code for this is m <<- 1 as opposed to m <- 1. This isn't recommended for a variety of reasons. See Global variables in R or Global and local variables in R for more on this subject.
Since you can only return one object, you might put all three values into a data frame or a list and return that. Though I would question whether you want the value entry done in a function. Additionally, if user input is central to your goal, R might not be the right language. That being said, the code below accomplishes what you're looking for:
enterval <- function() {
  h <- (readline("Please enter number of hours: "))
  h <- as.integer(h)
  r <- (readline("Please enter applicable rate: "))
  r <- as.integer(r)
  m <- (readline("Please confirm your multiplier: "))
  m <- as.integer(m)
  salaryVariables <- data.frame("hours" = h, "rate" = r, "multiplier" = m)
  return(salaryVariables)
}
salary <- function(salaryInfo) {
  r <- salaryInfo$rate
  h <- salaryInfo$hours
  m <- salaryInfo$multiplier
  if (h > 180) {
    totalpay <- (180*r) + ((h-180)*r*m)
  }
  else {
    totalpay <- (h*r)
  }
  return(totalpay)
}
mySalary <- enterval()
salary(mySalary)
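The scoping point made above can be checked with a small non-interactive sketch. The function names and the variable hours_entered are made up for illustration, and fixed numbers stand in for the readline() input.

```r
# Assigning inside a function does not create the variable for the caller
enter_local <- function() {
  hours_entered <- 200      # local: gone once the function returns
  invisible(NULL)
}

# Returning the values makes them available to whoever calls the function
enter_and_return <- function() {
  list(hours = 200, rate = 10, multiplier = 2)
}

enter_local()
leaked <- exists("hours_entered")   # is the local variable visible out here?
vals <- enter_and_return()
```

After running this, leaked is FALSE while vals$hours, vals$rate and vals$multiplier can be used freely.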
I found the following piece of code here at stackoverflow:
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length((column.D))) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    # drop = FALSE keeps a data.frame even if only one column remains
    x <- x[, !names(x) %in% column.D, drop = FALSE]
  }
  return(x)
}
df <- columnFunction(df)
So I wanted to use it for my own purposes, but it did not work out as planned.
What I am trying to achieve is to use it in a for loop or with lapply so that it works with multiple data.frames. Among other things, I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
  columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
Well, what I additionally need is to loop through that function so that I can iterate through different data.frames.
Last but not least, I would need something like:
for (xyz in l.frames){
  xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have any idea how I could loop through that function, or how I could change the function so that it performs all those steps and can be used in a loop?
I'm quite new to R, so perhaps I'm missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])
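For context, the loop in the question fails because b is the string "d.frame1" rather than the data frame itself, so names(x) has nothing to offer the dialog. Here is a sketch of the lapply approach that runs without a dialog; drop_columns and its cols argument are hypothetical stand-ins for columnFunction and the user's selection.

```r
# Hypothetical, non-interactive stand-in for columnFunction()
drop_columns <- function(x, cols) {
  x[, !names(x) %in% cols, drop = FALSE]
}

# A named list keeps track of which result belongs to which data frame
frames <- list(d.frame1 = iris, d.frame2 = cars)
frames <- lapply(frames, drop_columns, cols = c("Species", "dist"))

remaining <- lapply(frames, names)
```

Keeping the data frames in one named list also takes care of the "saving step": each result is stored back under its own name, e.g. frames$d.frame2.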
This is the first time I have used mclapply to run a script in parallel on multiple processes. I tried the script on my laptop and it worked very well and filled the data frame correctly, but when I run it on my office PC, it stops with this error once the printing ends and it is time to collect the data:
Error: cannot allocate vector of size 80 Kb
library(parallel)  # provides mclapply()

fun <- function(testdf) {
  l <- 12000
  errordf <- data.frame()
  # 1:15000 is assumed here; nrow(15000) is NULL, so 1:nrow(15000) is not a valid range
  errordf <- mclapply(1:15000, function(i) {
    for (ind in 1:nrow(testdf)) {
      if (i >= l/2) {
        testdf[ind,]$X = testdf[ind,]$pos * 2
      } else {
        testdf[ind,]$X = testdf[ind,]$pos/l
      }
    }
    permdf <- testdf
    lapply(1:100, function(j) {
      permdf$X <- sample(permdf$X, nrow(permdf), replace = FALSE)
      fit <- lm(X ~ gx, permdf) # linear regression calculation
      regerror <- sum(residuals(fit)^2)
      data.frame(pc = i, error = regerror)
    })
  }, mc.cores = 3)
  res <- NULL
  tmp <- lapply(errordf, function(ii) {
    tmp <- lapply(ii, function(ij) { # rbind the data and return the dataframe
      res <<- rbind(res, ij)
    })
  })
  return(res)
}
testdf example:
structure(list(ax = c(-0.0242214, 0.19770304, 0.01587302, -0.0374415,
0.05079826, 0.12209738), gx = c(-0.3913043, -0.0242214, -0.4259067,
-0.725, -0.0374415, 0.01587302), pos = c(11222, 13564, 16532,
12543, 12534, 14354)), .Names = c("ax", "gx", "pos"), row.names = c(NA,
-6L), class = "data.frame")
I'm sure that the code is working (that's why I did not include the full code), because I tried it multiple times on my laptop, but when I try it on my office PC it throws this error.
Any help would be appreciated.
Right now you don't use lapply as intended in your last doubly nested lapply loop; you might as well use a for loop instead of combining lapply with a global variable. In addition, you continuously grow res, which is rather memory and time intensive. Normally an lapply loop would not suffer from this problem, but your use of a global variable totally negates the advantage. You seem to have a doubly nested list that you want to rbind. I would definitely not loop over the data structure; I would just use something along the lines of do.call("rbind", data_structure), although it is hard to provide concrete advice without a reproducible example. This solution does not suffer from the continuously growing object you have now.
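To make that concrete, here is a small sketch on made-up data with the same shape as the mclapply result (a list of lists of one-row data frames). The names nested, flat and res are mine, and the sizes are tiny stand-ins for the real ones.

```r
# Made-up stand-in for errordf: 3 outer elements, each a list of 2 data frames
nested <- lapply(1:3, function(i) {
  lapply(1:2, function(j) data.frame(pc = i, error = i * 10 + j))
})

# Flatten one level so we have a plain list of data frames ...
flat <- unlist(nested, recursive = FALSE)

# ... and bind them all in a single call instead of growing res row by row
res <- do.call(rbind, flat)
```

This replaces the two nested lapply calls and the <<- assignment with two lines and builds the result in one step.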