Run a formula based on another cell's value?

I'd like to run this formula
=COUNTIF(N29:N295,">85")-COUNTIF(N29:N295,">95")
but only when another cell has the value "93"
I've tried IF functions but I can't get the syntax correct. Can anyone help?

You would likely benefit from switching to the COUNTIFS function.
Assuming that 'another cell' is Z1, then:
=IF(Z1=93, COUNTIFS(N29:N295,">"&85, N29:N295,"<="&95), "")
If, on the other hand, you meant that the values in column Z had to be 93 then,
=COUNTIFS(N29:N295,">"&85, N29:N295,"<="&95, Z29:Z295, 93)
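The subtraction in the question and the two-condition COUNTIFS count the same thing: values greater than 85 and at most 95. As a quick sanity check, here is that counting logic written in R (the language used in the rest of this page). The sample values are made up and only stand in for N29:N295 and Z29:Z295.

```r
# Hypothetical sample data standing in for N29:N295 and Z29:Z295
n <- c(80, 86, 90, 95, 96, 100, 85, 93)
z <- c(93, 93, 10, 93, 93, 93, 93, 12)

# COUNTIF(N,">85") - COUNTIF(N,">95")
by_subtraction <- sum(n > 85) - sum(n > 95)
# COUNTIFS(N,">"&85, N,"<="&95): both conditions on the same value
by_range <- sum(n > 85 & n <= 95)
# COUNTIFS(..., Z, 93): additionally require the value in the same row of Z to be 93
by_range_and_z <- sum(n > 85 & n <= 95 & z == 93)
```

Both forms count 4 values here; adding the column-Z condition keeps only the 2 rows where Z is 93.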

Related

Subselection of a variable

I have a problem selecting a variable that should contain a certain range of values. I want to split my variable into 3 categories, namely small, medium and big. Some context: I have a variable named obj_hid_woonopp (size in m2) that ranges from 16 to 375, and my dataset is called datalogitvar.
I'm sorry I have no reproducible code, but since I think it's a rather simple question I hope it can be answered nonetheless. The code that I'm using is as follows:
datalogitvar$size_small<- as.numeric(obj_hid_WOONOPP>="15" & obj_hid_WOONOPP<="75" )
datalogitvar$size_medium<- as.numeric(obj_hid_WOONOPP>="76" & obj_hid_WOONOPP<="100" )
datalogitvar$size_large<- as.numeric(obj_hid_WOONOPP>="101")
When I run this, I do get a result, just not the result I'm hoping for. For example, the small category also contains very high numbers. It seems that (since I define "75") it also takes values like "175", because they contain "75". I suspect it reads my data as text and not as numbers. However, I do use as.numeric, so I'm a bit confused. Can someone explain how to make sure I create these 3 variables with the proper ranges? I feel I'm close, but the result is useless so far.
Thank you so much for helping.
For a question like this you can replicate your problem with a publicly available dataset like mtcars.
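Regarding your code:
1) You need to name the dataset on the right-hand side of your code as well, i.e. write DATASET$obj_hid_WOONOPP rather than just obj_hid_WOONOPP.
2) Why are you using quotes around your numeric values? The quotes turn them into strings, so the comparison is done on text, character by character, rather than on numbers. Note also that as.numeric() in your code is applied to the TRUE/FALSE result of the comparison, not to obj_hid_WOONOPP itself.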
I think you want to use something like the code I've written below.
mtcars$mpg_small <- as.numeric(mtcars$mpg >= 15 & mtcars$mpg <= 20)
mtcars$mpg_medium <- as.numeric(mtcars$mpg > 20 & mtcars$mpg <= 25)
mtcars$mpg_large <- as.numeric(mtcars$mpg > 25)
Just to illustrate your problem:
a <- "75"
b <- "175"
a > b
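# [1] TRUE   (as strings, "75" sorts after "175")
a < b
# [1] FALSE
Strings are compared character by character (here "7" against "1"), not by numeric value, so they don't compare as you'd expect.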
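A minimal sketch of the fix, assuming obj_hid_WOONOPP was read in as a character column (the small data frame below is made up for illustration): convert the column to numbers first, then compare against unquoted numbers.

```r
# Made-up example: obj_hid_WOONOPP stored as text, as the question suspects
datalogitvar <- data.frame(obj_hid_WOONOPP = c("16", "75", "90", "175", "375"),
                           stringsAsFactors = FALSE)

# String comparison: "175" and "375" count as <= "75" because "1" and "3" sort before "7"
string_small <- as.numeric(datalogitvar$obj_hid_WOONOPP <= "75")

# Fix: convert the column to numbers once, then compare against unquoted numbers
size <- as.numeric(datalogitvar$obj_hid_WOONOPP)
datalogitvar$size_small  <- as.numeric(size >= 15 & size <= 75)
datalogitvar$size_medium <- as.numeric(size > 75 & size <= 100)
datalogitvar$size_large  <- as.numeric(size > 100)
```

Using > 75 and > 100 instead of >= 76 and >= 101 also covers non-integer sizes such as 75.5, which would otherwise fall into no category.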
Two ideas come to mind, though an example of code would be helpful.
First, look into the documentation for cut(), which can be used to convert a numeric vector into a factor based on cut-points that you set.
Second, as @MrFlick points out, your code could be rewritten so that as.numeric() is first applied to the character vector, converting the strings to numbers, and only THEN are the comparisons (>=, <=) and & performed.
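A minimal sketch of the cut() idea, using the question's cut-offs of 75 and 100 m2 (the sample values are made up):

```r
# Made-up sizes in m2, including the boundary values
size <- c(16, 40, 75, 76, 100, 101, 375)
# right = TRUE (the default) gives the intervals (-Inf, 75], (75, 100], (100, Inf]
size_cat <- cut(size, breaks = c(-Inf, 75, 100, Inf),
                labels = c("small", "medium", "large"))
table(size_cat)
# 0/1 indicator columns can still be derived from the factor if needed
size_small <- as.numeric(size_cat == "small")
```

A single factor keeps the three categories mutually exclusive by construction, so no value can land in two of them or in none.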
To build on @Joe's answer:
mtcars$mpg_small <- (as.numeric(mtcars$mpg) >= 15 &
(as.numeric(mtcars$mpg) <= 20))
Also be careful: if your vector of strings obj_hid_WOONOPP contains values that cannot be coerced to numbers, they will become NA (and R will warn "NAs introduced by coercion").
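A small sketch for spotting such values before categorising; the input strings are made up, and "n/a" stands in for any entry that is not a number.

```r
x <- c("16", "75", "n/a", "175")
# Capture the coercion warning instead of letting it scroll past
msg <- tryCatch(as.numeric(x), warning = function(w) conditionMessage(w))
# Convert quietly, then list the original strings that became NA
size <- suppressWarnings(as.numeric(x))
failed <- x[is.na(size) & !is.na(x)]
```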

How to use cell reference in R

I am trying to use a formula in the current cell that refers to the cell above it, in R. For example:
data$srno = data$srno[offset(-1,0)] + 1
Is there a way to code this in R?
It may be more convenient to use a lag or shift function, which several packages provide.
Here are some different ways of tackling the challenge:
myvector<-1:26
# base version: prepend 0, drop the last element, then add 1
1 + c(0, myvector[-length(myvector)])
# returns an NA for 1st row
1+Hmisc::Lag(myvector)
1L + data.table::shift(myvector, fill=0)
The problem is that the top cell has no cell above it. One approach is to use NA for that cell:
data$srno <- c(NA, data$srno[-length(data$srno)] + 1)
Another approach is to let the bottom cell "wrap around", so that it can be used in the formula for calculating the new value of the top cell. Whether this makes sense depends on your data and formula, but here's how it could be done:
data$srno <- data$srno[c(length(data$srno), 1:(length(data$srno) - 1))] + 1
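Both one-liners above compute the new column in a single vectorised step from the old values of srno. A spreadsheet formula filled down behaves differently: each cell sees the already-updated value in the cell above. If the goal is a running serial number (each row one more than the row above, starting from the first row's value), a minimal sketch with a made-up data frame:

```r
data <- data.frame(srno = c(5, NA, NA, NA))

# One-shot lag: uses the OLD values, so only row 2 gets a number
lagged <- c(NA, data$srno[-length(data$srno)] + 1)

# Spreadsheet-style fill-down: each row uses the updated value above it
filled <- data$srno
for (i in seq_along(filled)[-1]) {
  filled[i] <- filled[i - 1] + 1
}

# Vectorised equivalent of the fill-down: start value plus 0, 1, 2, ...
vectorised <- data$srno[1] + seq_len(nrow(data)) - 1
```

The loop mirrors the spreadsheet most literally; the seq_len() form gives the same result without a loop when the rule is simply "+1 per row".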

How to transform columns of a data frame according to the values in a vector in R?

I am trying to normalize some columns of a data frame so they have the same mean. The solution I am implementing now works, but it feels like there should be a simpler way of doing this.
# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height weight
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for(i in 1:length(factor)){w[,i] <- w[,i] / factor[i]}
#We achieved our goal to have same means in the columns
colMeans(w)
height weight
65 65
I can easily do the same thing using apply, but is there something simpler, like just doing women/factor, that gives the correct answer?
By the way, what is women/factor actually doing? Doing:
colMeans(women/factor)
height weight
49.08646 98.40094
does not give the same result.
You can use mapply too (apply it to the original women, not to the already normalized copy w):
colMeans(mapply("/", women, factor))
Regarding your question about what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length two. When you do women/factor, R recycles factor over the entries of women in column-major order: women[1,1] is divided by factor[1], women[2,1] by factor[2], women[3,1] by factor[1], and so on, continuing into the second column. Because factor is shorter than women, R recycles it over and over again.
You can see, for example, that every second entry of women[, 1]/factor equals the corresponding entry of women[, 1] (because factor[1] equals 1).
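A short check of that recycling description, using the built-in women data (f plays the role of factor, renamed only to avoid masking base::factor):

```r
f <- colMeans(women) / colMeans(women)[1]   # height = 1, weight is about 2.10
res <- women / f
# The same division with the recycling written out: f is repeated over all
# 30 cells in column-major order (all of height first, then all of weight)
manual <- unlist(women) / rep(f, length.out = length(unlist(women)))
```

Because nrow(women) is 15 (odd), the weight column starts on f[2], which is why its first entry is divided by about 2.10 rather than left alone.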
One way of doing this is with sweep. By default this function subtracts a vector of summary statistics along the chosen margin (MARGIN = 2 means one value per column), but you can also specify a different function to perform. In this case, a division:
colMeans(sweep(women, 2, factor, '/'))
Also:
rowMeans(t(women)/factor)
#height weight
#65 65
Regarding your question:
I can easily do the same thing using apply, but is there something simpler, like just doing women/factor, that gives the correct answer? By the way, what is women/factor actually doing?
women/factor ## is similar to
unlist(women)/rep(factor,nrow(women))
What you need is:
unlist(women)/rep(factor, each=nrow(women))
or
women/rep(factor, each=nrow(women))
In my solution (t(women)/factor), I didn't need rep because factor gets recycled as needed.
t(women) ##matrix
as.vector(t(women))/factor #will give same result as above
or just
t(women)/factor #preserve the dimensions for ?rowMeans
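In short, R recycles along columns (column-major order). After t(), each column of the matrix holds one row's height and weight, so factor lines up with the right variable.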
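To confirm that the approaches on this page agree, a sketch that runs each of them on the built-in women data and collects the column means (f stands in for factor):

```r
f <- colMeans(women) / colMeans(women)[1]
n <- nrow(women)

results <- list(
  loop      = { w <- women; for (i in seq_along(f)) w[, i] <- w[, i] / f[i]; w },
  sweep     = sweep(women, 2, f, "/"),
  mapply    = mapply("/", women, f),
  rep_each  = women / rep(f, each = n),
  transpose = t(t(women) / f)
)

# One column of means per approach; every entry should be 65
means <- sapply(results, colMeans)
```

rep(f, each = n) and the double transpose both line each divisor up with its whole column, which is what plain women/f fails to do.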

How to better deal with group by in data.table in this case?

Suppose I have data like:
dt <- data.table(x=1:5, y=c(1,1,2,2,1), y.z=c(1,1,2,2,3))
I'd like to group by y.z. dt is constructed so that, within each distinct y.z group, all values of y are equal. The result I would like is the sum of x and the single unique value of y for each y.z group.
So, there are 2 approaches that meet my needs:
dt[,list(x=sum(x), y=y[1]), by=y.z]
dt[,list(x=sum(x)), by=list(y.z, y)]
# this might have a performance drawback, but I assume it is minor.
Being lazy, I would normally opt for the second way, because it saves some typing when the list of y-like columns is long, i.e. writing list(y.z, y1, y2, y3, ...) instead of y1=y1[1], y2=y2[1], y3=y3[1], ...
However, I am not sure this is good practice. In particular, if there are errors in y so that its values are not all equal within a group, my approach wouldn't trigger any error, so the issue would not be detected automatically.
Is it best to write a custom function like this?
dt[, list(x=sum(x), y=assert.identical(y)), by=y.z]
So if y contains only 1 unique value, it returns a scalar; otherwise it raises an error. However, a custom function is a bit inconvenient to apply, since it requires typing even more than y=y[1].
I run into this dilemma every day, in R as well as in SQL, and neither has a cure. What do people normally do when they face it?
unique.data.table has a by argument, and .SD is just a data.table.
Putting this together allows you to execute something like:
dt[, c(list(x = sum(x)), unique(.SD, by = c("y1", "y2", "y3"))[, list(y1, y2, y3)]), by = y.z]
Note that the by argument of unique must be a character vector of column names. This is different from the by in [.data.table, which also accepts list(...) expressions.
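Following the question's own idea, here is one way the assert.identical helper could look. It is a hypothetical function (not part of data.table) that returns the single unique value or stops. Pairing it with lapply(.SD, ...) and .SDcols keeps the typing short when there are many y-like columns.

```r
library(data.table)

# Hypothetical helper: return the single unique value of v, or stop with an error
assert.identical <- function(v) {
  u <- unique(v)
  if (length(u) != 1L) stop("expected exactly one unique value, got ", length(u))
  u
}

dt <- data.table(x = 1:5, y = c(1, 1, 2, 2, 1), y.z = c(1, 1, 2, 2, 3))

# Spelled out for a single column
res1 <- dt[, list(x = sum(x), y = assert.identical(y)), by = y.z]

# Same check applied to every column listed in ycols (here just "y")
ycols <- c("y")
res2 <- dt[, c(list(x = sum(x)), lapply(.SD, assert.identical)), by = y.z, .SDcols = ycols]

# Inconsistent data: group y.z == 1 has two different y values, so this errors
dt_bad <- data.table(x = 1:3, y = c(1, 2, 2), y.z = c(1, 1, 2))
err <- tryCatch(dt_bad[, list(x = sum(x), y = assert.identical(y)), by = y.z],
                error = function(e) e)
```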

Bandwidth selection using the np package

I'm new to R and having a problem with a very simple task! I have read a few columns of .csv data into R; they contain variables whose values are non-negative integers (the natural numbers plus zero), with some missing values. After trying to use the np (nonparametric) package, I have two problems. First, if I use the simple command bw=npregbw(ydat=y, xdat=x, na.omit), where x and y are column vectors, I get the error "number of regression data and response data do not match". Why do I get this, when I have the same number of elements in each vector?
Second, I would like to treat the data as ordered and tell npregbw this, using the command bw=npregbw(ydat=y, xdat=ordered(x)). When I do that, I get the error that x must be atomic for sort.list. But how can x not be atomic? It is just a vector of natural numbers and NAs.
Any clarifications would be greatly appreciated!
1) You probably have a different number of NAs in y and x.
2) I can't be sure about this without an example. If x is of the following type:
x <- c(3,4,NA,2)
Then ordered(x) should work fine. Please provide an example of your case.
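For point 1), a sketch of one way to line the data up before the call, assuming x and y are plain numeric vectors of the same original length (the values are made up). The npregbw call is only indicated in a comment, since it needs the np package.

```r
# Made-up vectors with a different number of NAs in each
x <- c(3, 4, NA, 2, 5)
y <- c(1.2, NA, 0.7, NA, 1.9)

# Dropping NAs from each vector separately leaves different lengths
n_x <- length(na.omit(x))
n_y <- length(na.omit(y))

# Keep only the rows where both x and y are observed
ok <- complete.cases(x, y)
x_cc <- x[ok]
y_cc <- y[ok]
# These could then be passed on, e.g. npregbw(ydat = y_cc, xdat = x_cc)
```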
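EDIT: Have you tried bw=npregbw(ydat=y, xdat=x) without ordered()? ordered() turns your vector into an ordered factor (see ?ordered). A factor is still an atomic vector (an integer vector with levels and class attributes; see ?factor), so the "must be atomic" error suggests that x itself is not a plain vector.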
EDIT2: So the problem was the way the data were subsetted. Note the difference between the various ways of subsetting: data$x and data[, i] (where i is the column number of x) give you vectors, while data[c("x")] and data[i] give a data frame. Functions like these expect vectors unless they take a data = (your data) argument, in which case they work with column names.
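To see the distinction EDIT2 describes, a small sketch with a made-up data frame; it also shows that an ordered factor built from a genuine vector is itself atomic.

```r
data <- data.frame(x = c(3, 4, NA, 2), y = c(1.5, 2.0, 2.5, NA))

v_dollar <- data$x     # numeric vector
v_index  <- data[, 1]  # numeric vector (a single column is dropped to a vector)
d_name   <- data["x"]  # one-column data frame, not atomic
d_index  <- data[1]    # one-column data frame, not atomic

# ordered() on a real vector gives an ordered factor, which is still atomic
ox <- ordered(v_dollar)
```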
