I am trying to code a new variable based on ifs and elses using isTRUE and am having difficulty. I would like to have a condition such as
if (isTRUE(t$a > t$b)) {
t$c <- 0
} else if (isTRUE(t$a < t$b)) {
t$c <- 1
} else {
t$c <- 2
}
Consider the following data:
t<-as.data.frame(c(1:5))
names(t)<-"a"
t$b<-c(5:1)
Running the above code always gives c the value 2, i.e. isTRUE(t$a > t$b) and isTRUE(t$a < t$b) are always FALSE.
Read the help for isTRUE:
‘isTRUE(x)’ is an abbreviation of ‘identical(TRUE, x)’, and so is
true if and only if ‘x’ is a length-one logical vector whose only
element is ‘TRUE’ and which has no attributes (not even names).
This is probably not what you want.
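A minimal sketch of why the condition in the question never fires, assuming the same a and b values held as plain vectors (the name cmp is only for illustration). The comparison returns one logical value per row, so isTRUE() is handed a length-five vector and returns FALSE:

```r
a <- 1:5
b <- 5:1

cmp <- a > b      # element-wise comparison, one result per row
length(cmp)       # 5, not 1
isTRUE(cmp)       # FALSE: cmp is not a single TRUE
isTRUE(cmp[4])    # TRUE: one element, and 4 > 2 holds
```

Both isTRUE() calls in the original if/else chain therefore fail, and the final else branch assigns 2 to the whole column.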
I'm guessing that you want a vector, t$c that is 0 if t$a>t$b, 1 if t$a<t$b and 2 otherwise. In R, we can do that in a single vectorised operation:
Easier setup:
> t = data.frame(a=1:5, b=5:1)
> t
a b
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
Now c is 0 if a>b, 1 if a<b, and 2 otherwise:
> t$c=2-((t$a>t$b)+(t$a!=t$b))
> t
a b c
1 1 5 1
2 2 4 1
3 3 3 2
4 4 2 0
5 5 1 0
Logical operations (>, != etc) operate along vectors, and evaluate numerically to 1 for TRUE and 0 for FALSE. If you try typing parts of my expression for t$c you should learn how this all works together.
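As a rough illustration of "typing parts of the expression", the pieces can be evaluated one at a time. The intermediate names gt, neq and cc are made up for this sketch:

```r
a <- 1:5
b <- 5:1

gt  <- a > b            # FALSE FALSE FALSE  TRUE  TRUE
neq <- a != b           #  TRUE  TRUE FALSE  TRUE  TRUE
gt + neq                # logicals add as 0/1: 1 1 0 2 2
cc <- 2 - (gt + neq)    # 1 1 2 0 0
```

A row with a>b contributes 1 from each test (2 - 2 = 0), a row with a<b contributes only the inequality (2 - 1 = 1), and a tie contributes nothing (2 - 0 = 2).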
If you don't like that tricksy boolean arithmetic, a couple of nested ifelse functions work:
t$c = ifelse(t$b>t$a, 1, ifelse(t$b==t$a,2,0))
This has the advantage of being a bit more readable: if b>a it's 1, otherwise if b==a it's 2, otherwise it's 0. Note how ifelse works, like lots of R functions, on each element of a vector.
Here's an approach using sign() and indexing a vector of desired values:
t <- data.frame(a=1:5,b=5:1);
t;
## a b
## 1 1 5
## 2 2 4
## 3 3 3
## 4 4 2
## 5 5 1
t$c <- c(1,2,0)[sign(t$a-t$b)+2];
t;
## a b c
## 1 1 5 1
## 2 2 4 1
## 3 3 3 2
## 4 4 2 0
## 5 5 1 0
The advantage here is that you can easily change the desired values later, because they're defined explicitly in the indexed vector as (a<b,a==b,a>b). Spacedman's solution to use logical arithmetic is rather brilliant (+1 from me!), but does not easily lend itself to future changes in values.
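To illustrate how easily the values can be swapped, here is a small sketch using text labels instead of 1, 2, 0. The labels and the names idx and res are invented for the example:

```r
a <- 1:5
b <- 5:1

# sign(a - b) is -1, 0 or 1; adding 2 turns it into an index of 1, 2 or 3
idx <- sign(a - b) + 2

# Lookup vector in the order (a < b, a == b, a > b); these labels are made up
labels <- c("less", "equal", "greater")
res <- labels[idx]
```

Only the lookup vector changes; the indexing expression stays the same.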
Use a nested ifelse:
t$c=ifelse(t$a<t$b,1,
ifelse(t$a>t$b,0,2)
)
From the help for ifelse:
?ifelse
S4 method for class 'db.obj':
ifelse(test, yes, no)
Arguments
test
A db.obj object, which has only one column. The column can be casted into boolean values.
yes
A normal value or a db.obj object. It is the returned value when test is TRUE.
no
The returned value when test is FALSE.
Related
Consider an example data frame:
A B C v
5 4 2 3
7 1 3 5
1 2 1 1
Within each row, I want to set an element to 1 if it is greater than or equal to v, and to 0 otherwise. The example data frame would result in the following:
A B C v
1 1 0 3
1 0 0 5
1 1 1 1
How can I do this efficiently? The number of columns will be much higher, and I would like a solution that does not require me to specify the names of the columns individually, and will apply it to all of them (except v) instead.
My solution with a for loop is way too slow.
We can create a logical matrix and coerce to binary
df1[-4] <- +(df1[-4] >= df1$v)
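A self-contained sketch of the same idea on the example data, assuming the threshold column is called v and selecting the other columns by position rather than hard-coding -4 (the names cols and flags are illustrative):

```r
df1 <- data.frame(A = c(5, 7, 1), B = c(4, 1, 2), C = c(2, 3, 1), v = c(3, 5, 1))

cols  <- which(names(df1) != "v")   # every column except the threshold
flags <- df1[cols] >= df1$v         # logical matrix; v is compared down each column
df1[cols] <- +flags                 # unary plus turns TRUE/FALSE into 1/0
```

The comparison is done once for the whole block of columns, so no explicit loop over rows or columns is needed.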
All of the variables are on the same scale in the data.frame 1-5.
Example of data.frame
rpi_invert
A B C D
5 2 4 1
3 5 5 2
1 1 3 4
I would like to change every 5 to 1, every 4 to 2, every 2 to 4, and every 1 to 5.
Example of data.frame after values have been changed.
rpi_invert
A B C D
1 4 2 5
3 1 1 4
5 5 3 2
What I have tried:
for(b in colnames(rpi_invert)){
rpi_invert[[b]][rpi_invert[[b]] == 5] <- 1
rpi_invert[[b]][rpi_invert[[b]] == 4] <- 2
rpi_invert[[b]][rpi_invert[[b]] == 2] <- 4
rpi_invert[[b]][rpi_invert[[b]] == 1] <- 5
}
This will only change the values in the first row and not the second column.
for(b in colnames(rpi_invert)){
rpi_invert <- ifelse(rpi_invert[[b]] == 5,1,
ifelse(rpi_invert[[b]] == 4,2,
ifelse(rpi_invert[[b]] == 2,4,
ifelse(rpi_invert[[b]] == 1,5,rpi_invert[[b]]))))
}
But this gives me the error:
Error in rpi_invert[[b]] : subscript out of bounds
If I try the same methods on an individual column instead of looping through the data.frame, both methods work, so I am not sure what the problem is.
I am sure what I am trying to do can be done more efficiently without a for loop, probably with some type of apply function, but I am not sure how.
Any help will be appreciated. Please let me know if further information is needed.
You can try (if your data.frame is df):
3-(df-3)
# A B C D
#1 1 4 2 5
#2 3 1 1 4
#3 5 5 3 2
or, same but written a bit differently: 6-df
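A short sketch of the reversal on the example data, together with a lookup-vector variant that would also cover mappings that are not a simple reflection (the names inverted, map and looked_up are illustrative). Note that step-by-step replacement undoes itself: once every 5 has become 1, a later "1 becomes 5" step turns it back, which is why a single-pass mapping is safer here.

```r
df <- data.frame(A = c(5, 3, 1), B = c(2, 5, 1), C = c(4, 5, 3), D = c(1, 2, 4))

# Arithmetic reversal of a 1-5 scale
inverted <- 6 - df

# Lookup-vector alternative: position i of `map` holds the new value for old value i
map <- c(5, 4, 3, 2, 1)
looked_up <- df
looked_up[] <- lapply(df, function(col) map[col])
```

Both versions read every value from the original data, so no replacement can interfere with another.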
In R, in a vector, i.e. a 1-dim matrix, I would like to change components with value 3 to value 1, and components with value 4 to value 2. How shall I do that? Thanks!
The idiomatic R way is to use [<-, in the form
x[index] <- result
If you are dealing with integers / factors or character variables, then == will work reliably for the indexing,
x <- rep(1:5,3)
x[x==3] <- 1
x[x==4] <- 2
x
## [1] 1 2 1 2 5 1 2 1 2 5 1 2 1 2 5
The car package has a useful function, recode (a wrapper for [<-), that lets you combine all the recoding in a single call, e.g.
library(car)
x <- rep(1:5,3)
xr <- recode(x, '3=1; 4=2')
x
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
xr
## [1] 1 2 1 2 5 1 2 1 2 5 1 2 1 2 5
Thanks to @joran for mentioning mapvalues from the plyr package, another wrapper for [<-:
library(plyr)
x <- rep(1:5,3)
mapvalues(x, from = c(3,4), to = c(1,2))
plyr::revalue is a wrapper for mapvalues specifically for factor or character variables.
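If neither package is available, a similar single-pass recode can be sketched in base R with match(). The names from, to, pos and hit are chosen for this example and are not part of any package:

```r
x <- rep(1:5, 3)
from <- c(3, 4)
to   <- c(1, 2)

pos <- match(x, from)     # position in `from`, or NA when the value is not recoded
hit <- !is.na(pos)
x[hit] <- to[pos[hit]]    # replace only the matched elements
```

Because every lookup uses the original values, a mapping such as 3 to 4 and 4 to 3 would also work without one step overwriting the other.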
I'm confused about the difference between [1], [1,], [,1], and [[1]] for the data frame type.
As far as I know, [1,] will fetch the first row of a matrix, [,1] will fetch the first column, and [[1]] will fetch the first element of a list.
But I checked the documentation for data.frame, which says
A data frame is a list of variables of the same number of rows with
unique row names
Then I typed in some code to test the usage.
>L3 <- LETTERS[1:3]
>(d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))
x y fac
1 1 1 C
2 1 2 B
3 1 3 C
4 1 4 C
5 1 5 A
6 1 6 B
7 1 7 C
8 1 8 A
9 1 9 A
10 1 10 A
> d[1]
x
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
>d[1,]
x y fac
1 1 1 C
>d[,1]
[1] 1 1 1 1 1 1 1 1 1 1
>d[[1]]
[1] 1 1 1 1 1 1 1 1 1 1
What confuses me is this: [1,] and [,1] are only used with matrices, [[1]] is only used with lists, and [1] is used with vectors, so why are all of them available for data frames?
Could anybody explain the difference between these usages?
In R, operators are not used for one data type only. Operators can be overloaded for whatever data type you like (e.g. also S3/S4 classes).
In fact, that's the case for data.frames.
As data.frames are lists, [i] and [[i]] (and $) show list-like behaviour.
Row and column indices have an intuitive meaning for tables, and data.frames look like tables. That is probably why [i, j] methods were defined for data.frame.
You can even look at the definitions, they are coded in the S3 system (so methodname.class):
> `[.data.frame`
and
> `[[.data.frame`
(the backticks quote the function name, otherwise R would try to use the operator and end up with a syntax error)
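A small sketch of what each form returns, using a cut-down version of the data frame from the question (without the random fac column):

```r
d <- data.frame(x = 1, y = 1:10)

class(d[1])     # "data.frame": list-style [ keeps the container
class(d[[1]])   # "numeric": list-style [[ extracts the column itself
class(d[, 1])   # "numeric": matrix-style column access drops to a vector
class(d[1, ])   # "data.frame": a row spans several columns, so it stays a data frame
```

So d[[1]] and d[, 1] give the same vector, while d[1] gives a one-column data frame.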
I have returned stats on my data using the table command as such:
subject<-c(4,4,2,2,3,3)
correct<-c(0,1,1,1,0,0)
test<-data.frame(subject,correct)
freq_test<-head(table(test$subject,test$correct))
This returns a table which looks like this
0 1
2 0 2
3 2 0
4 1 1
That's great, but the problem is that I would like the first column to be a vector rather than row.names (so that I can code it properly as "subject").
Is there a way to get this column to act in this way?
Just make a new data frame with the row names of freq_test as the first column:
> df<-data.frame(as.numeric(rownames(freq_test)),freq_test)
> colnames(df)[1]="subject"
> df
subject X0 X1
2 2 0 2
3 3 2 0
4 4 1 1
>
Of course, you can rename X0 and X1 to whatever you want by editing colnames(df) as above.
If you want the data in "long" format (useful for some models and plotting, and especially when your tables are more complicated), the table method for the generic function as.data.frame will take care of this for you:
> as.data.frame(table(test))
subject correct Freq
1 2 0 0
2 3 0 2
3 4 0 1
4 2 1 2
5 3 1 0
6 4 1 1
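For the wide layout, one possible sketch converts the table with as.data.frame.matrix() and then moves the row names into a real column. The column names incorrect and correct are chosen here for illustration only:

```r
subject <- c(4, 4, 2, 2, 3, 3)
correct <- c(0, 1, 1, 1, 0, 0)

tab  <- table(subject, correct)
wide <- as.data.frame.matrix(tab)    # one row per subject, one column per outcome

# Move the row names into an ordinary column and drop them as row names
wide <- cbind(subject = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
names(wide) <- c("subject", "incorrect", "correct")   # illustrative names
```

After this, wide$subject is an ordinary numeric vector that can be used like any other column.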
I think you should have used the standard method of construction of a data.frame, which is with name=values pairs:
test <- data.frame( subject=subject, correct=correct)
The first subject will be interpreted as a name to be quoted and the second subject will be interpreted .... i.e., the enclosing environments will be searched for an object named subject and its value will be assigned to the "subject" column of "test".