I want to exclude the rows in which x has values less than or equal to -10, so I wrote this:
newdata <- data[which(data$x> -10), ]
Is this right, or do I need to put -10 in double quotation marks?
Thank you.
(Decided to upgrade this from a comment to an answer.)
Using double quotation marks is not wise: it will mess you up in some quite surprising ways. For example, 1 > "-10" is FALSE (!!) because of the way in which R compares strings.
R's use of <- for assignment may get you in trouble; if you want x<-10 to do the comparison rather than assign the value 10 to x, you need either spaces x < -10 or parentheses (x<(-10)). However, this doesn't arise with the > comparison.
You can always use parentheses if you're worried (x > (-10)); the only drawback is that things get harder to read if you use too many (e.g., data[(which(((data$x)>(-10)))),]).
As pointed out in the comments, R is an interactive environment; if you can't figure something like this out from the documentation or other help sources, you should just try a small example and convince yourself that it works.
For example:
x <- c(-20,-15,-10,-4,0)
x[x>-10]
## -4 0
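A slightly fuller sketch of the same check on a data frame. The data frame and its columns are made up for illustration; only the comparison itself comes from the question.

```r
# Hypothetical data frame standing in for the asker's `data`
data <- data.frame(x = c(-20, -15, -10, -4, 0), y = 1:5)

# Keep rows where x is strictly greater than -10 (drops -20, -15 and -10)
newdata <- data[which(data$x > -10), ]
newdata$x
## -4 0

# The assignment pitfall: spacing decides between comparison and assignment
v <- c(-20, 5)
is_small <- v < -10   # comparison: TRUE FALSE
v<-10                 # assignment: v is now the single value 10
```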
I've been trying to figure out how to mimic a piecewise linear regression model developed in the pricing software Emblem, using R. I did that using @Roland's answer in the post below.
https://stats.stackexchange.com/questions/61805/standard-error-of-slopes-in-piecewise-linear-regression-with-known-breakpoints
So to get the slopes, thanks to @Roland, I used as.numeric((variable < X)) to get the slope of the second segment in the predictor variables.
What is going on here? Why does the "as.numeric" give me the correct answer? I can't find documentation on it and I would like to understand why this works.
It converts a boolean (TRUE / FALSE) value to numeric (1 / 0).
(The R-y name for boolean is "logical": is.logical(TRUE) returns TRUE.)
x < 10 # TRUE if x is less than 10, FALSE if x is 10 or more
as.numeric(x<10) # 1 if x is less than 10, 0 if x is 10 or more
This being said, you don't really need an as.numeric there. What you could do instead is:
# will also work:
mod2 <- lm(y~I((x<9.6)*x)+(x<9.6)+I((x>=9.6)*x)+(x>=9.6)-1)
This version uses the boolean values directly. They are implicitly treated as factors, and within lm a factor with k levels is converted into k-1 dichotomous variables. That's why, if you use the code above, you'll see variable names like x < 9.6TRUE in the lm output.
Then again, as.numeric is technically a hack, and a more transparent way to do it may be something like ifelse(x<9.6,1,0). Hacks are not necessarily bad, though, so you might even prefer a hackier one such as (x<9.6)*1. That won't work directly within a formula, because * has a special meaning in formulas, so you'd have to wrap it in I(): I((x<9.6)*1). I'd say as.numeric looks cleaner.
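To see that the three conversions mentioned above produce the same 0/1 indicator, here is a small sketch. The x values are made up; 9.6 is the breakpoint used in the answer.

```r
x <- c(5, 9.5, 9.6, 12)        # made-up predictor values
below <- x < 9.6               # logical: TRUE TRUE FALSE FALSE

d1 <- as.numeric(below)        # explicit conversion
d2 <- ifelse(below, 1, 0)      # spelled-out version
d3 <- below * 1                # arithmetic coerces TRUE/FALSE to 1/0

d1
## 1 1 0 0
identical(d1, d2) && identical(d1, d3)
## TRUE
```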
I have a problem with selecting a variable that should contain a certain range of values. I want to split my variable into 3 categories, namely small, medium and big. Some context: I have a variable named obj_hid_woonopp, which is the size in m2 and ranges from 16 to 375. My dataset is called datalogitvar.
I'm sorry I have no reproducible code, but since I think it's a rather simple question I hope it can be answered nonetheless. The code that I'm using is as follows:
datalogitvar$size_small<- as.numeric(obj_hid_WOONOPP>="15" & obj_hid_WOONOPP<="75" )
datalogitvar$size_medium<- as.numeric(obj_hid_WOONOPP>="76" & obj_hid_WOONOPP<="100" )
datalogitvar$size_large<- as.numeric(obj_hid_WOONOPP>="101")
When I run this, I do get a result, just not the one I'm hoping for. For example, the small category also contains very high numbers. It seems that (since I define "75") it also takes values of "175" since it contains "75". I've been thinking about it and I feel it reads my data as text and not as numbers. However, I do say as.numeric, so I'm a bit confused. Can someone explain how I can make sure I create these 3 variables with the proper range? I feel I'm close, but the result is useless so far.
Thank you so much for helping.
For a question like this you can replicate your problem with a publicly available dataset like mtcars.
And regarding your code
1) You need to prefix the variable with the dataset name on the right-hand side of your code as well, i.e. DATASET$obj_hid_WOONOPP.
2) Why are you using quotes around your numeric values? These quotes prevent the numbers from being treated as numbers. They are instead treated as string values.
I think you want to use something like the code I've written below.
mtcars$mpg_small <- as.numeric(mtcars$mpg >= 15 & mtcars$mpg <= 20)
mtcars$mpg_medium <- as.numeric(mtcars$mpg > 20 & mtcars$mpg <= 25)
mtcars$mpg_large <- as.numeric(mtcars$mpg > 25)
Just to illustrate your problem:
a <- "75"
b <- "175"
a > b
## TRUE, i.e. "75" > "175"
a < b
## FALSE, i.e. "75" < "175" does not hold
Strings don't compare as you'd expect them to.
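A short sketch of why 175 can end up in the "small" range, and how converting first changes the result. The values are taken from the question. The description of character comparison is meant for digit-only strings like these.

```r
a <- "75"
b <- "175"

# For these digit strings the first characters decide: "7" sorts after "1"
a > b
## TRUE

# Converting first gives the numeric comparison
as.numeric(a) > as.numeric(b)
## FALSE

# A number compared with a string is itself converted to a string,
# so 175 passes both string tests of the "small" category
in_small <- 175 >= "15" & 175 <= "75"
in_small
## TRUE
```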
Two ideas come to mind, though an example of code would be helpful.
First, look into the documentation for cut(), which can be used to convert numeric vector into factors based on cut-points that you set.
Second, as @MrFlick points out, your code could be rewritten so that as.numeric() is first run on the character vector containing the strings you want to convert to numeric values, and only THEN are Boolean comparisons such as > or & performed.
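A minimal sketch of the cut() idea. The sizes are made up, and the break points are an assumption based on the ranges in the question (up to 75, 76 to 100, above 100).

```r
size <- c(16, 75, 76, 100, 101, 375)          # hypothetical sizes in m2

# Intervals are (15, 75], (75, 100], (100, Inf] because right = TRUE by default
size_cat <- cut(size,
                breaks = c(15, 75, 100, Inf),
                labels = c("small", "medium", "large"))
as.character(size_cat)
## "small" "small" "medium" "medium" "large" "large"
```

The result is a single factor rather than three 0/1 columns, which is usually what a model formula wants anyway.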
To build on @Joe's answer:
mtcars$mpg_small <- (as.numeric(mtcars$mpg) >= 15 &
(as.numeric(mtcars$mpg) <= 20))
Also be careful: if your vector of strings obj_hid_WOONOPP contains values that cannot be coerced to numeric, they will become NA.
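To illustrate the NA caveat, a small sketch with made-up input. suppressWarnings() is used here only to keep the coercion warning out of the output.

```r
raw <- c("75", "175", "n/a")                  # hypothetical character input
vals <- suppressWarnings(as.numeric(raw))     # "n/a" cannot be parsed, so it becomes NA
vals
## 75 175 NA

# Comparisons involving NA are NA, so decide how to handle them explicitly
small <- vals >= 15 & vals <= 75
small
## TRUE FALSE NA
```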
I've tried a couple ways of doing this problem but am having trouble with how to write it. I think I did the first three steps correctly, but now I have to fill the vector z with numbers from y that are divisible by four, not divisible by three, and have an odd number of digits. I know that I'm using the print function in the wrong way, I'm just at a loss on what else to use ...
This is different from that other question because I'm not using a while loop.
#Step 1: Generate 1,000,000 random, uniformly distributed numbers between 0
#and 1,000,000,000, and name as a vector x. With a seed of 1.
set.seed(1)
x=runif(1000000, min=0, max=1000000000)
#Step 2: Generate a rounded version of x with the name y
y=round(x,digits=0)
#Step 3: Empty vector named z
z=vector("numeric",length=0)
#Step 4: Create for loop that populates z vector with the numbers from y that are divisible by
#4, not divisible by 3, with an odd number of digits.
for(i in y) {
if(i%%4==0 && i%%3!=0 && nchar(i,type="chars",allowNA=FALSE,keepNA=NA)%%2!=0){
print(z,i)
}
}
NOTE: As per @BenBolker's comment, a loop is an inefficient way to solve your problem here. Generally, in R, try to avoid loops where possible to maximise the efficiency of your code. @SymbolixAU has provided an example of doing so in the comments. Having said that, to help you learn the ins and outs of loops and vectors, here's a solution which only requires a change to one line of your code:
You've got the vector created before the loop, which is a good start. Now, inside your loop, you need to populate that vector. To do so, you've currently got print(z,i), which won't really do much. What you need to do is change the vector itself:
z <- c( z, i )
That should work for you (just replace the print line in your loop).
What's happening here is that we're taking the existing z vector, binding i to the end of it, and making that new vector z again. So every time a value is added, the vector gets a little longer, such that you'll end up with a complete vector.
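Here is the loop with that one-line change, run on a small made-up input so the result is easy to check by hand. One deviation from the question's code, flagged as my own choice: the digit count uses sprintf("%.0f", i), because nchar() applied directly to a number counts the characters of its printed form, which can be scientific notation such as 1e+05.

```r
y <- c(4, 8, 12, 16, 100, 10000)   # small made-up input instead of 1,000,000 values
z <- numeric(0)

for (i in y) {
  n_digits <- nchar(sprintf("%.0f", i))   # digit count without scientific notation
  if (i %% 4 == 0 && i %% 3 != 0 && n_digits %% 2 != 0) {
    z <- c(z, i)                          # grow z by one element
  }
}
z
## 4 8 100 10000
```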
Where you have print, put this instead:
z <- append(z, i)
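For comparison, a loop-free sketch along the lines of the note above about avoiding loops. This is one possible vectorized form, not necessarily the one given in the comments; the digit count via sprintf() is my own assumption about how to count digits safely.

```r
set.seed(1)
x <- runif(1000000, min = 0, max = 1000000000)
y <- round(x)

n_digits <- nchar(sprintf("%.0f", y))                    # digits of each value
keep <- y %% 4 == 0 & y %% 3 != 0 & n_digits %% 2 == 1   # one logical per element
z <- y[keep]                                             # all matches in one step
length(z)
```

Note the single & here: it compares element by element, whereas && in the loop version works on one value at a time.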
If I have a vector a<-c(3, 5, 7, 8)
and run a[1], not surprisingly I get 3,
but if I run a[0] I get numeric(0).
What does this mean?
And what does this do?
How can I use it for normal reasons?
Others have answered what x[0] does, so I thought I'd expand on why it's useful: generating test cases. It's great for making sure that your functions work with unusual data structure variants that users sometimes produce accidentally.
For example, it makes it easy to generate 0 row and 0 column data frames:
mtcars[0, ]
mtcars[, 0]
These can arise when subsetting goes wrong:
mtcars[mtcars$cyl > 10, ]
But in your testing code it's useful to flag that you're doing it deliberately.
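A minimal sketch of such a test. The function mean_mpg is hypothetical and exists only to show a zero-row data frame being passed in on purpose.

```r
# Hypothetical helper: mean mpg of a data frame, NA when there are no rows
mean_mpg <- function(df) {
  if (nrow(df) == 0) return(NA_real_)
  mean(df$mpg)
}

empty <- mtcars[0, ]      # deliberate zero-row input for the test
nrow(empty)
## 0
is.na(mean_mpg(empty))
## TRUE
```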
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors
As the R Language Definition linked above says: "A special case is the zero index, which has null effects: x[0] is an empty vector and otherwise including zeros among positive or negative indices has the same effect as if they were omitted."
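The quoted behaviour can be checked directly with the vector from the question:

```r
a <- c(3, 5, 7, 8)

a[0]            # empty vector of the same type
## numeric(0)
a[c(0, 2)]      # the zero is ignored among positive indices
## 5
a[c(0, -1)]     # and among negative indices
## 5 7 8
length(a[0])
## 0
```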
I am not sure what I am doing wrong here.
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
if(ee[i]==0:1e^-9) stop("singular Matrix")}
Using the eigen value approach, I am trying to determine if the matrix is singular or not. I am attempting to find out if one of the eigen values of the matrix is between 0 and 10^-9. How can I use the if statement (as above) correctly to achieve my goal? Is there any other way to approach this?
What if I want to collect the zero eigenvalues in a vector?
zer <-NULL
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
if(abs(ee[i])<=1e-9)zer <- c(zer,ee[i])}
Can I do that?
@AriBFriedman is quite correct. I can, however, see a couple of other issues:
1e^-9 should be 1e-9.
0:1e-9 returns 0 (: creates a sequence in steps of one from 0 up to at most 1e-9, so it contains just 0). See ?`:` for more details.
Using == with decimals will cause problems due to floating point arithmetic
In the form written, your code checks (individually) whether the elements ee[i] == 0, which is not what you want (nor does it make sense in terms of floating point arithmetic).
You are looking for cases where the eigen value is less than this small number, so use less than (<).
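Two of the points above can be checked at the console. This is only an illustration; the 0.1 + 0.2 example is a standard floating point demonstration, not something from the question.

```r
s <- 0:1e-9            # sequence from 0 in steps of 1 that may not exceed 1e-9
length(s)
## 1

0.1 + 0.2 == 0.3       # exact comparison of decimals
## FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # comparison with a tolerance
## TRUE
```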
What you are looking for is something like
if(any(abs(ee) < 1e-9)) stop('singular matrix')
If you want to get the 0 (or small) eigenvalues, then use which
# this will give the indices (which elements are small)
small_values <- which(abs(ee) < 1e-9)
# and those small values
ee[small_values]
There is no need for the for loop as everything being done is vectorized.
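Putting the vectorized check into a small function might look like the sketch below. The name is_singular and the example matrices are hypothetical; the 1e-9 tolerance is the one from the question, and an absolute tolerance like this may need rescaling for matrices with very large or very small entries.

```r
# Hypothetical helper: TRUE if crossprod(X) has an eigenvalue within tol of zero
is_singular <- function(X, tol = 1e-9) {
  ee <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  any(abs(ee) < tol)
}

X_sing <- cbind(1:3, 2 * (1:3))   # second column is twice the first
X_ok <- diag(2)                   # identity matrix, clearly not singular

is_singular(X_sing)
## TRUE
is_singular(X_ok)
## FALSE
```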
if takes a single argument of length 1.
Try either ifelse or using any() or all() to turn your vector of logicals into a logical vector of length 1.
Here's an example reproducing your data:
X <- matrix(1:10, nrow = 1)
ee <- eigen(crossprod(X))$values
This will test whether any of the values of ee are > 0 AND < 1e-9:
if (any((ee > 0) & (ee < 1e-9))) {stop("singular matrix")}
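One caveat, offered as a sketch: numerically computed eigenvalues of a singular matrix can come out as tiny negative numbers rather than exact zeros. With the made-up values below, the strictly positive test misses the near-zero value, while a test on the absolute value catches it.

```r
ee <- c(385, -1e-15)      # made-up eigenvalues; the second is zero up to rounding

any((ee > 0) & (ee < 1e-9))
## FALSE
any(abs(ee) < 1e-9)
## TRUE
```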