I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.
If you look at the function (by typing its name at the console) you will see that it is just a simple functionalized version of the [<- function which is described at ?"[". [ is a rather basic function in R so you would be well-advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace) can be logical, numeric or character classed values.
You should "read" the function call as: "within the first argument, use the second argument as an index for placing the values of the third argument into the first". Recycling will occur when there are differing lengths of the second and third arguments:
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
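To make the link with [<- concrete, here is a minimal sketch. It assumes, as described above, that replace() amounts to an indexed assignment performed on a copy of its first argument; the variable names x, y and z are only illustrative.

```r
x <- 1:20

# replace() returns a modified copy of x
y <- replace(x, 10:15, 1:2)

# The same thing done by hand with [<- on a copy
z <- x
z[10:15] <- 1:2   # 1:2 is recycled across the six positions

identical(y, z)      # both approaches agree
identical(x, 1:20)   # the original vector is left untouched
```

Because the result is a copy, you have to assign it back (x <- replace(x, ...)) if you want the change to persist.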
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10
You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)
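Note that a comparison such as x$b == 2 yields NA wherever the column is NA, and NA values in a logical index can cause trouble in assignments, particularly when the replacement has more than one element. A minimal sketch of one defensive option, wrapping the test in which() so that only the TRUE positions are used (the vector b below just mirrors column b of the data frame above):

```r
b <- c(0, NA, 1, 2)

# b == 2 gives FALSE NA FALSE TRUE; which() keeps only the TRUE position
b_fixed <- replace(b, which(b == 2), 333)
b_fixed
```

The NA in the second position is left as it is; only the element equal to 2 is replaced.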
Here are two simple examples
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10
Be aware that in the examples given above the third parameter (value) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you wanted to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 is evaluated once, as a fixed vector, rather than as a reference to the matching rows of the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, in the first row where 'b' equals "X", the value in 'a' will be replaced by 2. The second time, it will be replaced by 4, etc. It will not be replaced by two times the value of 'a' in that particular row.
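One way around this, sketched here under the assumption that a plain data.frame with a fixed column b is an acceptable stand-in for the sampled data, is to subset the replacement values with the same condition as the index. ifelse() is shown as an alternative; the names is_x, naive, doubled and same are illustrative.

```r
tmp <- data.frame(
  a = 1:10,
  b = c("X", "Y", "Y", "X", "Z", "X", "Z", "Y", "X", "Z"),
  stringsAsFactors = FALSE
)
is_x <- tmp$b == "X"

# Naive version: takes the FIRST values of a*2 (2, 4, 6, 8) and warns
naive <- suppressWarnings(replace(tmp$a, is_x, tmp$a * 2))

# Subset the values with the same condition, so each row gets its own a*2
doubled <- replace(tmp$a, is_x, tmp$a[is_x] * 2)

# Equivalent element-wise alternative
same <- ifelse(is_x, tmp$a * 2, tmp$a)
```

Here the rows with "X" are 1, 4, 6 and 9, so doubled holds 2, 8, 12 and 18 in those positions, while naive holds 2, 4, 6 and 8.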
Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
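If a plain character vector is all that is needed, the loop can also be avoided by using the integers directly as an index into letts. This is a small sketch that assumes the values in test are exactly the positions 1 to 3 of letts; res is an illustrative name.

```r
test  <- c(rep(1, 3), rep(2, 2), rep(3, 1))
letts <- c("A", "B", "C")

# Each integer picks the letter at that position
res <- letts[test]
res
```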
I am looking to select 3 values above every Event=x to view in a table. My data is as follows:
Event
1 a
2 b
3 c
4 a
5 x
6 c
7 a
8 b
9 c
10 x
This is what I would like for a return:
Value
1 b
2 c
3 a
4 a
5 b
6 c
Any help would be appreciated!
library(tidyverse)
library(magrittr)
Event <- tibble(Event = c("a","b","c","a","x","c","a","b","c","x"))
Event %<>%
filter(lead(Event,3)=="x" | lead(Event,2)=="x" | lead(Event,1)=="x")
Hope this helps. The lead and lag functions are useful, although when you apply them you should be very careful about how the data is grouped and arranged/sorted. Cheers!
Here's a base R solution
Event = c("a","b","c","a","x","c","a","b","c","x")
xs = which(Event == "x")
lag = sort(c(xs-3,xs-2,xs-1))
Event[lag[lag > 0]]
# [1] "b" "c" "a" "a" "b" "c"
which returns the index of the TRUE values in a logical vector. In this case, which positions are "x". We can then decrement the index by 1, 2 and 3 to get the lagged positions.
Now, which can be dangerous because if there's nothing meeting the condition then it returns a length 0 vector which can wreak havoc. In this case though, it's OK because you'll get a length 0 vector as output.
xs_bad = which(Event == "y")
lag_bad = sort(c(xs_bad - 3,xs_bad - 2,xs_bad - 1))
Event[lag_bad]
# character(0)
which can bite you in another way, too.
Event_bad = c("a","x","a","b","c","x")
Now there's an "x" that's not even 3 positions away from the beginning.
xs_bad = which(Event_bad == "x")
lag_bad = sort(c(xs_bad - 3,xs_bad - 2,xs_bad - 1))
lag_bad
# [1] -1 0 1 3 4 5
Negative indexes can't be mixed with positive indexes, so our last step will fail. Defensively, then, we can change the code to keep only the positive positions.
Event_bad[lag_bad[lag_bad > 0]]
# [1] "a" "a" "b" "c"
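The defensive steps above could be wrapped in a small helper. This is only a sketch: the function name preceding and its arguments target and n are made up for illustration, and it does not try to de-duplicate positions when two matches are closer than n apart.

```r
# Return the (up to) n elements that precede each occurrence of target
preceding <- function(x, target, n = 3) {
  hits <- which(x == target)                      # positions of the target
  idx  <- sort(rep(hits, each = n) - seq_len(n))  # hits - 1, ..., hits - n
  x[idx[idx > 0]]                                 # drop zero/negative positions
}

Event     <- c("a","b","c","a","x","c","a","b","c","x")
Event_bad <- c("a","x","a","b","c","x")

preceding(Event, "x")
preceding(Event_bad, "x")
preceding(Event, "y")   # no match: zero-length result
```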
I'm returning to R after some time, and the following has me stumped:
I'd like to build a list of the positions factor values have in the factor levels list.
Example:
> data = c("a", "b", "a","a","c")
> fdata = factor(data)
> fdata
[1] a b a a c
Levels: a b c
> fdata$lvl_idx <- ????
Such that:
> fdata$lvl_idx
[1] 1 2 1 1 3
Appreciate any hints or tips.
If you convert a factor to integer, you get the position in the levels:
as.integer(fdata)
## [1] 1 2 1 1 3
In certain situations, this is counter-intuitive:
f <- factor(2:4)
f
## [1] 2 3 4
## Levels: 2 3 4
as.integer(f)
## [1] 1 2 3
Also if you silently coerce to integer, for example by using a factor as a vector index:
LETTERS[2:4]
## [1] "B" "C" "D"
LETTERS[f]
## [1] "A" "B" "C"
Converting to character before converting to integer gives the expected values. See ?factor for details.
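A short sketch of that conversion, using the same factor as above. The second form, indexing the converted levels by the factor, is a commonly used variant that gives the same result here; vals and vals2 are illustrative names.

```r
f <- factor(2:4)

as.integer(f)                          # positions in the levels

# Original values: go through character first
vals <- as.integer(as.character(f))

# Variant: convert the levels once, then index by the factor
vals2 <- as.integer(levels(f))[f]
```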
The solution provided years ago by Matthew Lundberg is not robust. It could be that the as.integer() function was defined for a specific S3 type of factors. Imagine someone would create a new factor class to keep operators like >=.
as.myfactor <- function(x, ...) {
structure(as.factor(x), class = c("myfactor", "factor"))
}
# and that someone would create an S3 method for integers - it should
# only remove the operators, which makes sense...
as.integer.myfactor <- function(x, ...) {
as.integer(gsub("(<|=|>)+", "", as.character(x)))
}
Now this no longer works; it just removes the operators:
f <- as.myfactor(">=2")
as.integer(f)
#> [1] 2
But using which() on the levels is robust for any factor whose level index you want to know:
f <- factor(2:4)
which(levels(f) == 2)
#> [1] 1
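The which() call above looks up a single value. To get the level position of every element at once, one option is match() against the levels. This is a sketch that assumes as.character() and levels() have not themselves been overridden for the factor class in question; lvl_idx is the name used in the question.

```r
f <- factor(c("a", "b", "a", "a", "c"))

# Position of each element's label within levels(f)
lvl_idx <- match(as.character(f), levels(f))
lvl_idx
```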
If I want to number all elements in two vectors, where vector 1 gets all odd numbers and vector 2 gets all even numbers, I can do this, assuming the vectors are of length 10.
seq(1, 10, by=2)
[1] 1 3 5 7 9
seq(2, 11, by=2)
[1] 2 4 6 8 10
but if my vector has only one element I will run into problems:
seq(2)
[1] 1 2
so I use:
seq_along(2)
[1] 1
BUT I can't use by= in seq_along(). How do I get the reliability of seq_along with the functionality of seq()?
This example might clear things.
Imagine I have two lists:
list1 <- list(4)
list2 <- list(4)
list1 must get even names along the element of the list.
list2 must get odd names along the element of the list.
I dont know how long the list elements will be.
seq_along(list1[[1]]) # this will know to only give one name but I cant make it even
seq(list2[[1]]) # this know to give 1 name
#and
seq(2, list1[[1]], by=2) # this gives me even but too many names
Here's a function that adds a 'by' argument to seq_along:
seq_along_by = function(x, by=1L, from = 1L) (seq_along(x) - 1L) * by + from
and some test cases
> seq_along_by(integer(), 2L)
integer(0)
> seq_along_by(1, 2L)
[1] 1
> seq_along_by(1:4, 2L)
[1] 1 3 5 7
> seq_along_by(1:4, 2.2)
[1] 1.0 3.2 5.4 7.6
> seq_along_by(1:4, -2.2)
[1] 1.0 -1.2 -3.4 -5.6
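Applied to the lists from the question, the from argument can shift the sequence so that one list gets even numbers and the other odd numbers. A minimal sketch (the function is repeated so the snippet runs on its own; even_names and odd_names are illustrative names):

```r
seq_along_by <- function(x, by = 1L, from = 1L) (seq_along(x) - 1L) * by + from

list1 <- list(4)
list2 <- list(4)

# One element each, so exactly one name each
even_names <- seq_along_by(list1[[1]], by = 2L, from = 2L)
odd_names  <- seq_along_by(list2[[1]], by = 2L)

# With a longer input the pattern continues
seq_along_by(1:3, by = 2L, from = 2L)
```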
one way i just found is:
y <- seq_along(1:20)
y[y %% 2 == 0 ]
[1] 2 4 6 8 10 12 14 16 18 20
y[ !y %% 2 == 0 ]
[1] 1 3 5 7 9 11 13 15 17 19
But this will only work when my vectors are even. Must be able to do better.
I'm not sure what you are trying to do, but if you want to split odd and even elements in a vector, you can do just that:
x <- 1:19
split(x,x%%2)
$`0`
[1] 2 4 6 8 10 12 14 16 18
$`1`
[1] 1 3 5 7 9 11 13 15 17 19
To extract the odd and even numbered elements, use lapply on this list using seq_along to enumerate the element numbers:
x <- rep(c("odd","even"),times=4)
lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
$`0`
[1] "even" "even" "even" "even"
$`1`
[1] "odd" "odd" "odd" "odd"
This can of course be made into a function:
split_oe <- function(x) lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
split_oe(1:10)
$`0`
[1] 2 4 6 8 10
$`1`
[1] 1 3 5 7 9
> split_oe(2)
$`1`
[1] 2
I'm adding another answer to address what may be your intent of the question rather than the question as you've stated it.
Let's assume you have a couple of arrays, A1 and A2, with values, and you want to link an index to those values, so you can say index[n] and get the corresponding value from A1[(n + 1)/2] if n is odd and A2[n/2] if n is even.
We would build a new vector, index, like so:
# Sample arrays
A1 <- sample(LETTERS, 5, replace=TRUE)
A2 <- sample(LETTERS, 5, replace=TRUE)
n_Max <- length(c(A1,A2))
index <- character(n_Max)  # character, since A1 and A2 hold letters
index[seq(1,n_Max,by=2)] <- A1
index[seq(2,n_Max,by=2)] <- A2
Now, index[n] returns A1 values when n is odd, and returns A2 values when n is even. This breaks if length(A2) is not equal to or one less than length(A1).
If I understand correctly, what you really want is to get the 'seq' function to return only odd or even numbers 1..max or 2..max, respectively. You would write that like so:
seq(1, max, by=2) # Odd numbers
seq(2, max, by=2) # Even numbers
Where max is the top number in your series. The only time this will break is if max is less than 2.
Update 1: There seems to be a bit of discussion about what the OP is requesting. If we assume there are two existing vectors to be numbered, we can obtain the total number of vector items using max <- length(c(vector1, vector2)) to obtain the maximum number being used. Then, the indices would be assigned like so:
vector1 <- seq(1, max, by=2)
vector2 <- seq(2, max, by=2)
And this will work for any set EXCEPT when one vector does not have any elements at all.
Update 2: There is one final approach, which you can take if your vectors do not represent all values between 1 and max. This is how it would work:
vector1 <- seq(1, length(vector1) * 2, by=2)
vector2 <- seq(2, length(vector2) * 2, by=2)
This independently assigns the values of vector1 and vector2 according to their own lengths.
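The edge cases mentioned above (a vector with one element, or none at all) can be sidestepped by giving seq() a length.out instead of an end point. This is a sketch with made-up example vectors; odd_idx and even_idx are illustrative names.

```r
vector1 <- c("p", "q", "r")
vector2 <- character(0)   # deliberately empty

# Start and step are fixed; the length comes from the vector itself
odd_idx  <- seq(1, by = 2, length.out = length(vector1))
even_idx <- seq(2, by = 2, length.out = length(vector2))

odd_idx
even_idx   # zero-length, no error
```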
Quick question. Why does the following work in R (correctly assigning the variable value "Hello" to the first element of the vector):
> a <- "Hello"
> b <- c(a, "There")
> b
[1] "Hello" "There"
And this works:
> c <- c("Hello"=1, "There"=2)
> c
Hello There
1 2
But this does not (making the vector element name equal to "a" rather than "Hello"):
> c <- c(a=1, "There"=2)
> c
a There
1 2
Is it possible to make R recognize that I want to use the value of a in the statement c <- c(a=1, "There"=2)?
I am not sure how c() internally creates the names attribute from the named objects. Perhaps it is along the lines of list() and unlist()? Anyway, you can assign the values of the vector first, and the names attribute later, as in the following.
a <- "Hello"
b <- c(1, 2)
names(b) = c(a, "There")
b
# Hello There
# 1 2
Then to access the named elements later:
b[a] <- 3
b
# Hello There
# 3 2
b["Hello"] <- 4
b
# Hello There
# 4 2
b[1] <- 5
b
# Hello There
# 5 2
Edit
If you really wanted to do it all in one line, the following works:
eval(parse(text = paste0("c(",a," = 1, 'there' = 2)")))
# Hello there
# 1 2
However, I think you'll prefer assigning values and names separately to the eval(parse()) approach.
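A one-line alternative that avoids eval(parse()) is setNames(), which is the same assign-values-then-names idea written as a single call. A minimal sketch (v is an illustrative name):

```r
a <- "Hello"

# Values first, names second; a is evaluated, so its value becomes the name
v <- setNames(c(1, 2), c(a, "There"))
v
```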
Assign the values in a named list. Then unlist it. e.g.
lR<-list("a" = 1, "There" = 2 )
v = unlist(lR)
this gives a named vector v
v
a There
1 2
I am using matching operators to grab values that appear in a matrix from a separate data frame. However, the resulting matrix has the values in the order they appear in the data frame, not in the original matrix. Is there any way to preserve the order of the original matrix using the matching operator?
Here is a quick example:
vec=c("b","a","c"); vec
df=data.frame(row.names=letters[1:5],values=1:5); df
df[rownames(df) %in% vec,1]
This produces > [1] 1 2 3 which is the order "a" "b" "c" appears in the data frame. However, I would like to generate >[1] 2 1 3 which is the order they appear in the original vector.
Thanks!
Use match.
df[match(vec, rownames(df)), ]
# [1] 2 1 3
Be aware that if you have duplicate values in either vec or rownames(df), match may not behave as expected.
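A small sketch of what that caveat can look like in practice: match() returns only the first matching position, repeats positions for repeated lookups, and gives NA when there is no match. The example inputs beyond df are made up for illustration.

```r
df <- data.frame(row.names = letters[1:5], values = 1:5)

# A repeated lookup value simply repeats the position
pos <- match(c("b", "a", "c", "b"), rownames(df))

# Duplicates in the table: only the FIRST match is reported
first_only <- match("a", c("x", "a", "a"))

# No match at all gives NA, which would produce an NA row when indexing
pos_na <- match(c("b", "z"), rownames(df))
```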
Edit:
I just realized that row name indexing will solve your issue a bit more simply and elegantly:
df[vec, ]
# [1] 2 1 3
Use match (and get rid of the NA values for elements in either vector for those that don't match in the other):
Filter(function(x) !is.na(x), match(rownames(df), vec))
Since row name indexing also works on vectors, we can take this one step further and define:
'%ino%' <- function(x, table) {
xSeq <- seq(along = x)
names(xSeq) <- x
Out <- xSeq[as.character(table)]
Out[!is.na(Out)]
}
We now have the desired result:
df[rownames(df) %ino% vec, 1]
[1] 2 1 3
Inside the function, names() does an auto convert to character and table is changed with as.character(), so this also works correctly when the inputs to %ino% are numbers:
LETTERS[1:26 %in% 4:1]
[1] "A" "B" "C" "D"
LETTERS[1:26 %ino% 4:1]
[1] "D" "C" "B" "A"
Following %in%, missing values are removed:
LETTERS[1:26 %in% 3:-5]
[1] "A" "B" "C"
LETTERS[1:26 %ino% 3:-5]
[1] "C" "B" "A"
With %in% the logical sequence is repeated along the dimension of the object being subsetted; this is not the case with %ino%:
data.frame(letters, LETTERS)[1:5 %in% 3:-5,]
letters LETTERS
1 a A
2 b B
3 c C
6 f F
7 g G
8 h H
11 k K
12 l L
13 m M
16 p P
17 q Q
18 r R
21 u U
22 v V
23 w W
26 z Z
data.frame(letters, LETTERS)[1:5 %ino% 3:-5,]
letters LETTERS
3 c C
2 b B
1 a A