Searching an ordered "list" matching condition when nothing matches the condition, list length = 1 - r

I have a sorted list with 3 columns, and I'm searching to see if the second column matches 2 or 4, then returning the first column's element if so, and putting that into a function.
noOutliers((L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])
When nothing matches the condition, I get:
Error in ((L1LeanList[order(L1LeanList[, 1]), ])[1, ])[(L1LeanList[order(L1LeanList[, :
incorrect number of dimensions
because we effectively have List[List[all FALSE]].
I can't just substitute something like L1LLSorted <- L1LeanList[order(L1LeanList[,1]),]
and use L1LLSorted[,2], since this returns an error when the list has exactly one row.
So now my code would need to look like
noOutliers(ifelse(any((L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4)),0,
(L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])))
which seems a bit ridiculous for the simple thing I'm requesting.
While writing this, I realized that I can put all this error checking into the noOutliers function itself, so the call looks like
noOutliers(L1LeanList,2,2,4), which will look much better, and is a necessity since slightly varying versions of this appear in my code dozens of times. Still, I can't help but wonder whether there's a more elegant way to write the actual function.
For the curious, noOutliers finds the mean of the 30th-70th percentile of the sorted data set like so:
noOutliers<-function(oList)
{
if (length(oList)<=20) return ("insufficient data")
cumSum<-0
iterCount<-0
for(i in round(length(oList)*3/10-.000001):round(length(oList)*7/10+.000001)+1)#adjustments deal with .5->even number rounding r mishandling
{ #and 1-based indexing (ex. for a list 1-10, taking 3-7 cuts off 1,2,8,9,10, imbalanced.)
cumSum<-cumSum+oList[i]
iterCount<-iterCount+1
}
return(cumSum/iterCount)
}

Let's see...
foo <- bar[(bar[,2]==2 | bar[,2]==4),1]
should extract all the first-column values you want. Then run whatever function you want on foo, perhaps with a guard such as "if (length(foo) < 1) {exit, or skip, or something}".
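A minimal sketch of that approach, using a made-up 3-column matrix called bar and a hypothetical helper pick_first_col. It assumes the one-row failure described in the question comes from R dropping a one-row matrix to a plain vector, which drop = FALSE prevents.

```r
# Hypothetical data: column 1 holds the values, column 2 the code to match.
# matrix() fills column by column, so column 2 is c(2, 4, 7).
bar <- matrix(c(5, 3, 9,
                2, 4, 7,
                0, 0, 0),
              ncol = 3)

pick_first_col <- function(m, codes) {
  # drop = FALSE keeps m a matrix even when it has exactly one row
  m <- m[order(m[, 1]), , drop = FALSE]
  m[m[, 2] %in% codes, 1]
}

foo     <- pick_first_col(bar, c(2, 4))                      # several rows
one_row <- pick_first_col(bar[1, , drop = FALSE], c(2, 4))   # exactly one row
none    <- pick_first_col(bar, 99)                           # nothing matches

# Guard before passing the result on to another function
if (length(none) < 1) message("nothing matched; skipping")
```

When nothing matches, the result is a zero-length vector rather than an error, so a single length check is enough before calling something like noOutliers.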

Related

Find the index of the last occurrence of fulfilled criteria in a matrix in r

I have an array (x) in R of size 30x11x10.
x=array(-2:20, c(30,11,10))
Each 'grid' or matrix represents a day of data for a month (30 days represented here). I want to find the index (i,j,k) of when the last occurrence of a number less than 2 occurs. Ideally, I would also like the value returned too. If this was in Matlab, I could just use [i,j,k]=find(x(x<2)) but I don't see an exact equivalent for this in R.
I have looked at 'match' as suggested in other posts here, but it seems to find elements when they are specified, not when a criterion (x<2) is given.
I tried this:
xxx<-match(x,x<2,0) but it returns a long vector of integers that doesn't appear to show what I am looking for.
Then I tried: xxx<-match(x,x[x<2],0), which looks a bit more promising, but still isn't what I want (to be honest, I'm not sure what the output is indexing).
I think I'm probably asking a foolish question here because if I want 3 indices and the value returned, then I should be assigning them to something preemptively right (which I'm not doing)? Can anyone offer any advice?
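One possible approach, sketched here rather than taken from the thread, is which() with arr.ind = TRUE, which returns the subscripts of every element that meets a condition. The sketch assumes "last" means last in R's column-major storage order; the names hits, last_idx and last_val are made up for illustration.

```r
x <- array(-2:20, c(30, 11, 10))

# One row per matching element; the columns are the i, j, k subscripts
hits <- which(x < 2, arr.ind = TRUE)

# Assumption: "last" means last in column-major storage order
last_idx <- hits[nrow(hits), ]

# Indexing an array with a 3-column matrix returns the matching element
last_val <- x[hits[nrow(hits), , drop = FALSE]]
```

If "last" should instead mean the latest day along one particular dimension, the rows of hits could be sorted on that column before taking the final row. If nothing satisfies the condition, hits has zero rows, so check nrow(hits) first.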

Matrix help: Finding average without the zeros

I'm creating a Monte Carlo model using R. My model creates matrices that are filled with either zeros or values that fall within the constraints. I'm running a couple hundred thousand n values through my model, and I want to find the average of the nonzero matrices that I've created. I'm guessing I can do something in the last section.
Thanks for the help!
Code:
n<-252500
PaidLoss_1<-numeric(n)
PaidLoss_2<-numeric(n)
PaidLoss_3<-numeric(n)
PaidLoss_4<-numeric(n)
PaidLoss_5<-numeric(n)
PaidLoss_6<-numeric(n)
PaidLoss_7<-numeric(n)
PaidLoss_8<-numeric(n)
PaidLoss_9<-numeric(n)
for(i in 1:n){
claim_type<-rmultinom(1,1,c(0.00166439057698873, 0.000810856947763742, 0.00183509730283373, 0.000725503584841243, 0.00405428473881871, 0.00725503584841243, 0.0100290201433936, 0.00529190850119495, 0.0103277569136224, 0.0096449300102424, 0.00375554796858996, 0.00806589279617617, 0.00776715602594742, 0.000768180266302492, 0.00405428473881871, 0.00226186411744623, 0.00354216456128371, 0.00277398429498122, 0.000682826903379993))
claim_type<-which(claim_type==1)
claim_Amanda<-runif(1, min=34115, max=2158707.51)
claim_Bob<-runif(1, min=16443, max=413150.50)
claim_Claire<-runif(1, min=30607.50, max=1341330.97)
claim_Doug<-runif(1, min=17554.20, max=969871)
if(claim_type==1){PaidLoss_1[i]<-1*claim_Amanda}
if(claim_type==2){PaidLoss_2[i]<-0*claim_Amanda}
if(claim_type==3){PaidLoss_3[i]<-1* claim_Bob}
if(claim_type==4){PaidLoss_4[i]<-0* claim_Bob}
if(claim_type==5){PaidLoss_5[i]<-1* claim_Claire}
if(claim_type==6){PaidLoss_6[i]<-0* claim_Claire}
}
PaidLoss1<-sum(PaidLoss_1)/2525
PaidLoss3<-sum(PaidLoss_3)/2525
PaidLoss5<-sum(PaidLoss_5)/2525
PaidLoss7<-sum(PaidLoss_7)/2525
partial output of my numeric matrix
First, let me make sure I've wrapped my head around what you want to do: you have several columns -- in your example, PaidLoss_1, ..., PaidLoss_9, which have many entries. Some of these entries are 0, and you'd like to take the average (within each column) of the entries that are not zero. Did I get that right?
If so:
Comment 1: At the very end of your code, you might want to avoid using sum and dividing by a number to get the mean you want. It obviously works, but it opens you up to a risk: if you ever change the value of n at the top, then in the best case scenario you have to edit several lines down below, and in the worst case scenario you forget to do that. So, I'd suggest something more like mean(PaidLoss_1) to get your mean.
Right now, you have n as 252500, and your denominator at the end is 2525, which has the effect of inflating your mean by a factor of 100. Maybe that's what you wanted; if so, I'd recommend mean(PaidLoss_1) * 100 for the same reasons as above.
Comment 2: You can do what you want via subsetting. Take a smaller example as a demonstration:
test <- c(10, 0, 10, 0, 10, 0)
mean(test) # gives 5
test!=0 # a vector of TRUE/FALSE for which are nonzero
test[test!=0] # the subset of test which we found to be nonzero
mean(test[test!=0]) # gives 10, the average of the nonzero entries
The middle three lines are just for demonstration; the only necessary lines to do what you want are the first (to declare the vector) and the last (to get the mean). So your code should be something like PaidLoss1 <- mean(PaidLoss_1[PaidLoss_1 != 0]), or perhaps that times 100.
Comment 3: You might consider organizing your data into a single structure. Instead of typing PaidLoss_1, PaidLoss_2, etc., it might make sense to store all the PaidLoss values in a matrix, with one column per claim type. You could then access elements with [ , ] indexing. This would clean up the code and save a lot of typing; you could also use the apply() family of functions instead of repeating the same command (such as the mean) for each column. A data frame or another structure would work too; having some structure would make your life easier.
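As a rough illustration of Comments 2 and 3 together, the sketch below stores a few made-up losses in a matrix and takes the nonzero mean of every column in one call. The matrix paid_loss, its values, and the helper nonzero_mean are hypothetical and far smaller than the simulated data in the question.

```r
# Hypothetical stand-in for the simulated losses: one column per claim type
paid_loss <- matrix(0, nrow = 6, ncol = 3,
                    dimnames = list(NULL, paste0("PaidLoss_", 1:3)))
paid_loss[c(1, 4), 1] <- c(100, 300)
paid_loss[2, 3] <- 50

# Mean of the nonzero entries of a single vector
nonzero_mean <- function(v) mean(v[v != 0])

# Apply it to every column (MARGIN = 2) at once
col_means <- apply(paid_loss, 2, nonzero_mean)
```

A column that contains only zeros yields NaN, because the mean of an empty vector is undefined, so such columns may need separate handling.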
(And to be super clear, your code is exactly what my code looked like when I first started writing in R. You can decide if it's worth pursuing some of that optimization; it probably just depends how much time you plan to eventually spend in R.)

Cannot figure out how to use IF statement

I want to create a categorical variable for my DB: a "Same_Region" group that includes all the people who live and work in the same Region, and a "Diff_Region" group for those who don't. I tried to use an if statement, but I don't know how to properly say "if the variables Region of residence and Region of work are the same, return...". It's the very first time I've tried to approach R by myself, and I feel a little bit lost.
I tried converting the two variables (made of 2 letters, e.g. "BO") to characters and using the "grep" command, but it led to no results.
Then I tried converting both variables to factors, and nothing much changed.
----In R-----
extractSamepr <- function(RegionOfRes, RegionOfWo){
if(RegionOfRes== RegionOfWo){
return("SamePr")
}
else {
return("DiffPr")
}
SamePr <- NULL
for (i in 1:nrow(Data.Base)) {
SamePr <- c(SamePr, extractSamepr(Data.Base[i, "RegionOfRes", "RegionOfWo"]))
}
The ifelse way proposed in @deepseefan's comment is a standard way of solving this type of problem.
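The comment itself is not reproduced here, so the following is only a guess at what the ifelse approach might look like. The three-row Data.Base is a made-up stand-in for the asker's data; the columns are stored as factors, as in the asker's second attempt.

```r
# Hypothetical stand-in for the asker's Data.Base
Data.Base <- data.frame(
  RegionOfRes = c("BO", "MI", "RM"),
  RegionOfWo  = c("BO", "TO", "RM"),
  stringsAsFactors = TRUE
)

# as.character() avoids the error that == raises when two factors
# have different sets of levels
same <- as.character(Data.Base$RegionOfRes) == as.character(Data.Base$RegionOfWo)

# ifelse() is vectorised, so no loop over the rows is needed
Data.Base$SamePr <- ifelse(same, "SamePr", "DiffPr")
```

Unlike if, which accepts a single TRUE/FALSE, ifelse evaluates the whole column at once.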
Here is another one. It uses the fact that FALSE/TRUE are coded as integers 0/1 to create a logical vector based on equality and then add 1 to that vector, giving a vector of 1/2 values. This result is used in the function's final instruction to index a vector with the two possible outcomes.
extractSamepr <- function(DF){
i <- 1 + (DF[["RegionOfRes"]] == DF[["RegionOfWo"]])
c("DiffPr", "SamePr")[i]
}
Data.Base$SamePr <- extractSamepr(Data.Base)

R code: I get NA as a result when I call my function

When I call my function it shows NA, although it works when I pass different parameters. So my question is: how many results should there be for each call, one or two, and why do I sometimes get NA?
Here is my code:
trsp<-function(x,p,tr,mlo,mhi)
{
mm<-seq(mlo, mhi, length =101)
w<-double(length (mm))
for (k in 1:101)
{
xmm<-sort(abs((x-mm[k]))^p)
w[k]<-sum(xmm[c(1:ceiling(tr*length(x)))])
}
mmw<-cbind(mm, w)
plot(mmw)
mmw[w<-min(w)]
}
dta<-rcauchy(23)
trsp(dta,1,1,0,1)
trsp(dta,2,1,0,1)
trsp(dta,1,0.6,0,1)
trsp(dta,2,0.6,0,1)
trsp(dta,0.5,0.6,0,1)
Let me answer this question step by step.
1) How many results should there be for each call, one or two? It will display only ONE result for each call. When we cbind two vectors, we get a matrix as output. If a matrix is indexed with one subscript instead of two, it is treated as a vector (its columns laid end to end) and a single element of that vector is returned.
2) Why do I sometimes get NA? In the last line, w<-min(w) is parsed as the assignment w <- min(w), not as a comparison, so mmw is indexed by the number min(w). NA appears when that value exceeds the number of elements in mmw, which is 2 * 101 = 202 (101 rows and 2 columns), because the index is then out of range.
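A possible corrected version is sketched below. It assumes the last line was meant to return the row of mmw where w is smallest; that intent is a guess, not something the asker stated. The plot call is left out to keep the sketch short, and the test vector x_demo is made up.

```r
trsp <- function(x, p, tr, mlo, mhi) {
  mm <- seq(mlo, mhi, length.out = 101)
  n_keep <- ceiling(tr * length(x))   # number of smallest deviations to keep

  # Trimmed sum of |x - m|^p for each candidate location m
  w <- vapply(mm,
              function(m) sum(sort(abs(x - m)^p)[seq_len(n_keep)]),
              numeric(1))

  mmw <- cbind(mm, w)
  # Select the whole row where w is minimal, using row and column subscripts
  mmw[which.min(w), ]
}

x_demo <- c(0.2, 0.4, 0.6)           # hypothetical data
res <- trsp(x_demo, 2, 1, 0, 1)      # named vector with elements mm and w
```

With this version every call returns two numbers, the location mm and its criterion value w, and the result no longer depends on how large min(w) happens to be.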

R to ignore NULL values

I have two vectors in R, but some of the values in both are marked as "NULL".
I want R to ignore the "NULL"s but still "acknowledge" their presence because of the indexes (I'm using the intersect and which functions).
I have tried this:
for i in 1:length(vector)
if vector=="NULL"
i=i+1
else
'rest of the code'
Is this a good approach? The algorithm is running, but the vectors are very large.
You should replace "NULL" with NA, which is R's native representation of missing values. Many functions have ways of dealing with NA values, such as the na.rm argument or the na.action option. Also, you shouldn't call your vector 'vector', since that is already the name of a base R function.
yourvector[yourvector == "NULL"] <- NA
Also you shouldn't add 1 to i in your if, just do nothing:
for (i in 1:length(yourvector)) {
if (!is.na(yourvector[i])) {
#rest of the code
}
}
Also, tell us what you want to do. You probably don't need a for loop.
This code contains several errors:
First off, a vector cannot normally contain NULL values at all. Are you maybe using a list?
if vector=="NULL"
you probably mean if (vector[i] == "NULL"). Even so, that’s wrong. You cannot filter for NULL by comparing to the character string "NULL" – those two are fundamentally different things. You need to use the function is.null instead. Or, if you’re working with an actual vector which contains NA values (not NULL, like I said, that’s not possible), something like is.na.
i=i+1
This code makes no sense – leaving it out won’t change the result because the loop is in charge of incrementing i.
Finally, don’t iterate over indices – for (i in 1 : length(x)) is bad style in R. Instead, iterate over the elements directly:
for (x in vector) {
if (! is.na(x)) {
# perform action
}
}
But even this isn't very R-like. Instead, you would do two things:
Use subsetting to get rid of NA values:
vector[! is.na(vector)]
Use one of the *apply functions (for instance, sapply) instead of a loop, and put the loop body into a function:
sapply(vector[! is.na(vector)], function (x) Do something with x)
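Putting the pieces together for the original question, here is a small sketch with two made-up character vectors, a and b, in which missing entries were stored as the string "NULL". It shows one way the positions can be kept while the missing entries are ignored by which and intersect; all the names are hypothetical.

```r
# Hypothetical vectors with missing entries stored as the string "NULL"
a <- c("x", "NULL", "y", "z")
b <- c("y", "q", "NULL", "x")

# Replace the placeholder string with a real missing value
a[a == "NULL"] <- NA
b[b == "NULL"] <- NA

# which() reports positions in the original vector, so indexes are preserved
valid_a <- which(!is.na(a))

# Values present in both vectors, ignoring the missing entries
common <- intersect(a[!is.na(a)], b[!is.na(b)])

# Positions in a of the shared values; NA entries never match
pos_in_a <- which(a %in% common)
```

Because everything here is vectorised, no explicit loop is needed, which matters when the vectors are very large.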

Resources