R to ignore NULL values - r

I have two vectors in R, but some of the values in both are marked as "NULL".
I want R to ignore the "NULL" values but still "acknowledge" their presence so that the indexes stay intact (I'm using the intersect and which functions).
I have tried this:
for i in 1:length(vector)
if vector=="NULL"
i=i+1
else
'rest of the code'
Is this a good approach? The algorithm runs, but the vectors are very large.

You should replace "NULL" with NA, which is R's native representation of missing values. Many functions have ways of dealing with NA values, such as the na.action option... You also shouldn't call your vector 'vector', since that is the name of a base R function.
yourvector[yourvector == "NULL"] <- NA
Also, you shouldn't add 1 to i in your if; just do nothing:
for (i in 1:length(yourvector)) {
if (!is.na(yourvector[i])) {
#rest of the code
}
}
Also, tell us what you want to do. You probably don't need a for loop.
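As a rough sketch of how the loop might be avoided, the "NULL" strings can be turned into NA and then intersect and which applied to whole vectors, so the original positions are preserved. The vectors a and b below are made-up data, not the asker's.

```r
# Hypothetical example vectors; "NULL" is stored as a character string
a <- c("x", "NULL", "y", "z")
b <- c("y", "NULL", "q", "x")

# Replace the "NULL" markers with NA, keeping every element in its position
a[a == "NULL"] <- NA
b[b == "NULL"] <- NA

# Values present in both vectors, ignoring the missing entries
common <- intersect(a[!is.na(a)], b[!is.na(b)])

# Positions in the ORIGINAL vector a whose values also occur in b;
# NA entries are never matched, so they are skipped but still counted
idx_in_a <- which(a %in% common)
```

Because nothing is removed from a, the indexes returned by which still refer to the original vector.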

This code contains several errors:
First off, a vector cannot normally contain NULL values at all. Are you maybe using a list?
if vector=="NULL"
you probably mean if (vector[i] == "NULL"). Even so, that’s wrong. You cannot filter for NULL by comparing to the character string "NULL" – those two are fundamentally different things. You need to use the function is.null instead. Or, if you’re working with an actual vector which contains NA values (not NULL, like I said, that’s not possible), something like is.na.
i=i+1
This code makes no sense – leaving it out won’t change the result because the loop is in charge of incrementing i.
Finally, don’t iterate over indices – for (i in 1 : length(x)) is bad style in R. Instead, iterate over the elements directly:
for (x in vector) {
if (! is.na(x)) {
# perform action
}
}
But even this isn’t very R-like. Instead, you would do two things:
use subsetting to get rid of NA values:
vector[! is.na(vector)]
Use one of the *apply functions (for instance, sapply) instead of a loop, and put the loop body into a function:
sapply(vector[! is.na(vector)], function (x) Do something with x)
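If the data really is a list containing NULL elements, the same idea can be sketched with is.null. The list values and the doubling operation are placeholders chosen only for illustration.

```r
# Hypothetical list in which some elements are genuinely NULL
values <- list(3, NULL, 5, NULL, 8)

# TRUE where the element is NULL; vapply guarantees a logical vector
is_missing <- vapply(values, is.null, logical(1))

# Original positions of the non-NULL elements
present_idx <- which(!is_missing)

# Apply a function (here: doubling, as a stand-in) to the non-NULL elements only
doubled <- vapply(values[!is_missing], function(x) x * 2, numeric(1))
```

Keeping present_idx alongside the result makes it possible to map each output back to its original position.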

Related

Cannot figure out how to use IF statement

I want to create a categorical variable for my database: a "Same_Region" group that includes all the people who live and work in the same region, and a "Diff_Region" group for those who don't. I tried to use an if statement, but I don't know how to properly say "if the variables Region of residence and Region of work are the same, return...". It's the very first time I've tried to approach R by myself, and I feel a little bit lost.
I tried storing the two variables (made of 2 letters, e.g. "BO") as characters and using the "grep" command, but that led to no results.
Then I tried storing both variables as factors, and nothing much changed.
----In R-----
extractSamepr <- function(RegionOfRes, RegionOfWo){
if(RegionOfRes== RegionOfWo){
return("SamePr")
}
else {
return("DiffPr")
}
SamePr <- NULL
for (i in 1:nrow(Data.Base)) {
SamePr <- c(SamePr, extractSamepr(Data.Base[i, "RegionOfRes", "RegionOfWo"]))
}
The ifelse way proposed in @deepseefan's comment is a standard way of solving this type of problem.
Here is another one. It uses the fact that FALSE/TRUE are coded as integers 0/1 to create a logical vector based on equality and then adds 1 to that vector, giving a vector of 1/2 values. This result is used in the function's final instruction to index a vector with the two possible outcomes.
extractSamepr <- function(DF){
i <- 1 + (DF[["RegionOfRes"]] == DF[["RegionOfWo"]])
c("DiffPr", "SamePr")[i]
}
Data.Base$SamePr <- extractSamepr(Data.Base)
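For comparison, the ifelse approach mentioned above might look like the sketch below. The three-row Data.Base is invented for illustration; only the column names come from the question.

```r
# Hypothetical data; only the column names are taken from the question
Data.Base <- data.frame(
  RegionOfRes = c("BO", "MI", "RM"),
  RegionOfWo  = c("BO", "TO", "RM"),
  stringsAsFactors = FALSE
)

# Vectorised comparison: one label per row, no explicit loop.
# as.character() guards against the columns being factors with different levels.
Data.Base$SamePr <- ifelse(
  as.character(Data.Base$RegionOfRes) == as.character(Data.Base$RegionOfWo),
  "SamePr",
  "DiffPr"
)
```

Unlike if, which expects a single TRUE or FALSE, ifelse works element by element over the whole column.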

R function with variable args depending on presence/absence of other args

I've stumbled upon the varargs issue in R two or three times, but the problem I have seems a little trickier than I expected. Here it is:
I have a function that does something with its variables, and I would like to introduce another variable, a kind of flag, that selects how the function works and which parameters it needs: namely, the number and type of inputs depend on a (flag) input.
OK, an example is better:
example = function(x,flag=1,y){
if (flag) return(x)
else return(y)
}
and this works fine.
The point is that in this example you need to specify both x and y every time. Instead, I would like a function that takes only x if flag=1 and only y if flag=0. (In this stupid example they would basically be two distinct functions, but in my actual case I have other (common) arguments on which I do some calculations that both 'parts' of the function need.)
I know that one may specify any value for the unused argument and the result wouldn't change, but I want a function that is immediately readable by the user, and it is cumbersome to have to specify an argument that the function won't use.
Thank you for any help.
What about the following.
example = function(x,flag=1,y){
if (flag && !missing(x)) return(x)
else if(!flag && !missing(y)) return(y)
}
This will check whether the flag is 0 or non-zero, and it will also check whether an argument is missing. You may want to handle the case when neither condition is true, because the function will return NULL in that case.
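One possible way to handle the case where the required argument is missing is to raise an error instead of silently returning NULL. This is only a sketch; the error messages are made up.

```r
example <- function(x, flag = TRUE, y) {
  if (flag) {
    # Only x is needed on this branch; y is never evaluated
    if (missing(x)) stop("'x' is required when flag is TRUE")
    return(x)
  }
  # Only y is needed on this branch; x is never evaluated
  if (missing(y)) stop("'y' is required when flag is FALSE")
  y
}

example(1)                    # uses x only
example(flag = FALSE, y = 2)  # uses y only
```

Because R evaluates arguments lazily, the unused argument can simply be left out of the call.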

R code: I get NA as the result when I call my function

When I call my function it shows NA, although it works when I send different parameters. So my question is: how many results should there be for each call, one or two, and why do I sometimes get NA?
Here is my code:
trsp<-function(x,p,tr,mlo,mhi)
{
mm<-seq(mlo, mhi, length =101)
w<-double(length (mm))
for (k in 1:101)
{
xmm<-sort(abs((x-mm[k]))^p)
w[k]<-sum(xmm[c(1:ceiling(tr*length(x)))])
}
mmw<-cbind(mm, w)
plot(mmw)
mmw[w<-min(w)]
}
dta<-rcauchy(23)
trsp(dta,1,1,0,1)
trsp(dta,2,1,0,1)
trsp(dta,1,0.6,0,1)
trsp(dta,2,0.6,0,1)
trsp(dta,0.5,0.6,0,1)
Let me answer this question step by step.
1) How many results should there be for each call, one or two? Only ONE result is displayed for each call. When we cbind two vectors we get a matrix, and indexing a matrix with one subscript instead of two behaves as if the matrix had been flattened to a vector (column by column) and a single element of that vector accessed.
2) Why do I sometimes get NA? The last line, mmw[w<-min(w)], is parsed as an assignment (w <- min(w)) rather than a comparison, so the value of min(w) is itself used as a numeric index into mmw. NA appears whenever that value exceeds the number of elements in mmw, which is 2 * 101 = 202 here (101 rows of mm and w).
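Assuming the intent was to return the row of mmw where w is smallest, a corrected version might look like the sketch below. The name trsp_fixed is hypothetical, and the plot call is left out to keep the example short.

```r
trsp_fixed <- function(x, p, tr, mlo, mhi) {
  mm <- seq(mlo, mhi, length.out = 101)
  # Number of smallest deviations to keep for the trimmed sum
  n_keep <- ceiling(tr * length(x))
  # Trimmed sum of |x - m|^p for every candidate location m
  w <- vapply(
    mm,
    function(m) sum(sort(abs(x - m)^p)[seq_len(n_keep)]),
    numeric(1)
  )
  mmw <- cbind(mm, w)
  # Select the whole row (both mm and w) at the minimum of w
  mmw[which.min(w), ]
}

set.seed(1)
dta <- rcauchy(23)
res <- trsp_fixed(dta, 2, 0.6, 0, 1)
```

Indexing with a row number and an empty column subscript returns two values per call, the location mm and its criterion w, and never NA.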

find indexes in R by not using `which`

Is there a faster way to search for indices in R than which with %in%?
I have a statement that I need to execute, but it's taking a lot of time.
statement:
total_authors<-paper_author$author_id[which(paper_author$paper_id%in%paper_author$paper_id[which(paper_author$author_id%in%data_authors[i])])]
How can this be done in a faster manner?
Don't call which. R accepts logical vectors as indices, so the call is superfluous.
In light of sgibb's comment, you can keep which if you are sure that you will always get at least one match. (The danger is with negative indexing: if there are no matches, which returns an empty vector, and x[-which(...)] then gives you nothing instead of everything. See Unexpected behavior using -which() in R when the search term is not found.)
Secondly, the code looks a little cleaner if you use with.
Thirdly, I think you want a single index with & rather than a double index.
total_authors <- with(
paper_author,
author_id[paper_id %in% paper_id & author_id %in% data_authors[i]]
)
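If the two-step lookup in the question is what is wanted (first the papers of one author, then every author on those papers), it can also be written with logical indices and no which. The small paper_author table and data_authors vector below are invented for illustration.

```r
# Hypothetical data: which author wrote which paper
paper_author <- data.frame(
  paper_id  = c(1, 1, 2, 2, 3),
  author_id = c(10, 11, 10, 12, 13)
)
data_authors <- c(10, 13)
i <- 1

# Step 1: papers written by the i-th author of interest
papers_of_i <- with(paper_author, paper_id[author_id %in% data_authors[i]])

# Step 2: every author listed on any of those papers
total_authors <- with(paper_author, author_id[paper_id %in% papers_of_i])
```

Splitting the lookup into two named steps keeps the nested meaning of the original statement while avoiding both which calls.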

Searching an ordered "list" matching condition when nothing matches the condition, list length = 1

I have a sorted list with 3 columns. I'm checking whether the second column matches 2 or 4, returning the first column's element if so, and passing that into a function.
noOutliers((L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])
When nothing matches the condition, I get
Error in ((L1LeanList[order(L1LeanList[, 1]), ])[1, ])[(L1LeanList[order(L1LeanList[, :
incorrect number of dimensions
because we effectively have List[List[all false]].
I can't just substitute something like L1LLSorted <- L1LeanList[order(L1LeanList[,1]),]
and use L1LLSorted[,2], since this returns an error when the list is of length exactly 1,
so now my code would need to look like
noOutliers(ifelse(any((L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4)),0,
(L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])))
which seems a bit ridiculous for the simple thing I'm requesting.
While writing this I realized that I can put all this error checking into the noOutliers function itself, so that the call looks like
noOutliers(L1LeanList,2,2,4), which will look much better; that is a necessity, since slightly varying versions of this appear in my code dozens of times. Still, I can't help but wonder if there's a more elegant way to write the actual function.
For the curious, noOutliers finds a mean of the 30th-70th percentile in the sorted data set like so
noOutliers<-function(oList)
{
if (length(oList)<=20) return ("insufficient data")
cumSum<-0
iterCount<-0
for(i in round(length(oList)*3/10-.000001):round(length(oList)*7/10+.000001)+1)#adjustments deal with .5->even number rounding r mishandling
{ #and 1-based indexing (ex. for a list 1-10, taking 3-7 cuts off 1,2,8,9,10, imbalanced.)
cumSum<-cumSum+oList[i]
iterCount<-iterCount+1
}
return(cumSum/iterCount)
}
Let's see...
foo <- bar[(bar[,2]==2 | bar[,2]==4),1]
should extract all the first-column values you want. Then run whatever function you want on foo, perhaps with the caveat "if (length(foo) < 1) then {exit, or skip, or something}".
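The "incorrect number of dimensions" error typically appears because subsetting a one-row matrix drops it to a plain vector. A sketch that avoids this with drop = FALSE is shown below; the helper name first_col_where and the sample matrices are hypothetical.

```r
# Hypothetical helper: sort by column 1, then return the column-1 values
# of the rows whose `col` entry is one of `values`
first_col_where <- function(m, col, values) {
  # drop = FALSE keeps a matrix even when only one row remains
  m <- m[order(m[, 1]), , drop = FALSE]
  m[m[, col] %in% values, 1]
}

# A three-row example and a one-row example with no match
m3 <- cbind(c(3, 1, 2), c(2, 4, 5), c(0, 0, 0))
m1 <- matrix(c(5, 3, 9), nrow = 1)

foo3 <- first_col_where(m3, 2, c(2, 4))
foo1 <- first_col_where(m1, 2, c(2, 4))

# Guard before calling the downstream function
if (length(foo1) < 1) message("nothing matched; skipping")
```

With the empty case reduced to a zero-length vector, the length check suggested above is the only special handling the caller needs.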
