R version of Vlookup with multiple matches per cell

R version of Vlookup with multiple matches per cell - r

I have a vector with numbers and a lookup table. I want the numbers replaced by the description from the lookup table.
This is easy when vectors are straight forward like this example:
> variable <- sample(1:5, 10, replace=T)
> variable
[1] 5 4 5 3 2 3 2 3 5 2
>
> lookup <- data.frame(var = 1:5, description=LETTERS[1:5])
> lookup
var description
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
>
> with(lookup, description[match(variable, var)])
[1] E D E C B C B C E B
Levels: A B C D E
However, when single elements of a vector contain multiple outcomes, I get in trouble:
variable <- c("1", "2^3", "1^5", "4", "4")
I would like the vector returned to give:
c("A", "B^C", "A^E", "D", "D")

If you have only one character match and replacement you can use chartr
chartr(paste0(lookup$var, collapse = ""),
paste0(lookup$description, collapse = ""), variable)
#[1] "A" "B^C" "A^E" "D" "D"
chartr basically tells that replace
paste0(lookup$var, collapse = "")
#[1] "12345"
with
paste0(lookup$description, collapse = "")
#[1] "ABCDE"
It is also useful since it does not change or return NA for characters which do not match.

As mentioned in the comments, there are a couple of steps needed to achieve the desired output. The following splits your variable, indexes the results against the description variable and then uses paste to collapse multiple elements.
sapply(strsplit(variable, "\\^"), function(x) paste0(lookup$description[as.numeric(x)], collapse = "^"))
[1] "A" "B^C" "A^E" "D" "D"

You can use scan to parse text into numeric, which can then be used as an index to pick items which can then be collapsed together. Add quiet=TRUE to suppress "Read" messages.
sapply(variable, function(t) {
paste( lookup$description[ scan(text=t, sep="^")], collapse="^")} )
Read 1 item
Read 2 items
Read 2 items
Read 1 item
Read 1 item
1 2^3 1^5 4 4
"A" "B^C" "A^E" "D" "D"

Related

Storing unique values of each column (of a df) in list

It is straight forward to obtain unique values of a column using unique. However, I am looking to do the same but for multiple columns in a dataframe and store them in a list, all using base R. Importantly, it is not combinations I need but simply unique values for each individual column. I currently have the below:
# dummy data
df = data.frame(a = LETTERS[1:4]
,b = 1:4)
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols)
{
x = unique(i)
unique_values_by_col[[i]] = x
}
The problem comes when displaying unique_values_by_col as it shows as empty. I believe the problem is i is being passed to the loop as a text not a variable.
Any help would be greatly appreciated. Thank you.

Why not avoid the for loop altogether using lapply:
lapply(df, unique)
Resulting in:
> $a
> [1] A B C D
> Levels: A B C D
> $b
> [1] 1 2 3 4

Or you have also apply that is specifically done to be run on column or line:
apply(df,2,unique)
result:
> apply(df,2,unique)
a b
[1,] "A" "1"
[2,] "B" "2"
[3,] "C" "3"
[4,] "D" "4"
thought if you want a list lapply return you a list so may be better

Your for loop is almost right, just needs one fix to work:
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
x = unique(df[[i]])
unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
#
# $b
# [1] 1 2 3 4
i is just a character, the name of a column within df so unique(i) doesn't make sense.
Anyhow, the most standard way for this task is lapply() as shown by demirev.

Could this be what you're trying to do?
Map(unique,df)
Result:
$a
[1] A B C D
Levels: A B C D
$b
[1] 1 2 3 4

Find indices of vector elements in a list

I have this toy character vector:
a = c("a","b","c","d","e","d,e","f")
in which some elements are concatenated with a comma (e.g. "d,e")
and a list that contains the unique elements of that vector, where in case of comma concatenated elements I do not keep their individual components.
So this is the list:
l = list("a","b","c","d,e","f")
I am looking for an efficient way to obtain the indices of the elements of a in the l list. For elements of a that are represented by the comma concatenated elements in l it should return the indices of the these comma concatenated elements in l.
So the output of this function would be:
c(1,2,3,4,4,4,5)
As you can see it returns index 4 for a elements: "d", "e", and "d,e"

I would make your search vector into a set of regular expressions, by substituting the comma with a pipe. Add names to the search vector too, according to its position in the list.
L <- setNames(lapply(l, gsub, pattern = ",", replacement = "|"), seq_along(l))
Then you can do:
lapply(L, function(x) grep(x, a, value = TRUE))
# $`1`
# [1] "a"
#
# $`2`
# [1] "b"
#
# $`3`
# [1] "c"
#
# $`4`
# [1] "d" "e" "d,e"
#
# $`5`
# [1] "f"
The names are important, because you can now use stack to get what you are looking for.
stack(lapply(L, function(x) grep(x, a, value = TRUE)))
# values ind
# 1 a 1
# 2 b 2
# 3 c 3
# 4 d 4
# 5 e 4
# 6 d,e 4
# 7 f 5

You could use a strategy with factors. First, find the index for each element in your list with
l <- list("a","b","c","d,e","f")
idxtr <- Map(function(x) unique(c(x, strsplit(x, ",")[[1]])), unlist(l))
This build a list for each item in l along with all possible matches for each element. Then we take the vector a and create a factor with those levels, and then reassign based on the list we just build
a <- c("a","b","c","d","e","d,e","f")
a <- factor(a, levels=unlist(idxtr));
levels(a) <- idxtr
as.numeric(a)
# [1] 1 2 3 4 4 4 5
finally, to get the index, we use as.numeric on the factor

"replace" function examples

I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.

If you look at the function (by typing it's name at the console) you will see that it is just a simple functionalized version of the [<- function which is described at ?"[". [ is a rather basic function to R so you would be well-advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace can be logical, numeric or character classed values. Recycling will occur when there are differing lengths of the second and third arguments:
You should "read" the function call as" "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10

You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)

Here's two simple examples
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10

Be aware that the third parameter (value) in the examples given above: the value is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create somthing like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you want wanted to do, was to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 will defined as a fixed vector rather than a reference to the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, the first row where 'b' equals "X", the value in 'a' will be placed by 2. The second time, it will be replaced by 4, etc ... it will not be replaced by two-times-the-value-of-a in that particular row.

Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C

R - preserve order when using matching operators (%in%)

I am using matching operators to grab values that appear in a matrix from a separate data frame. However, the resulting matrix has the values in the order they appear in the data frame, not in the original matrix. Is there any way to preserve the order of the original matrix using the matching operator?
Here is a quick example:
vec=c("b","a","c"); vec
df=data.frame(row.names=letters[1:5],values=1:5); df
df[rownames(df) %in% vec,1]
This produces > [1] 1 2 3 which is the order "a" "b" "c" appears in the data frame. However, I would like to generate >[1] 2 1 3 which is the order they appear in the original vector.
Thanks!

Use match.
df[match(vec, rownames(df)), ]
# [1] 2 1 3
Be aware that if you have duplicate values in either vec or rownames(df), match may not behave as expected.
Edit:
I just realized that row name indexing will solve your issue a bit more simply and elegantly:
df[vec, ]
# [1] 2 1 3

Use match (and get rid of the NA values for elements in either vector for those that don't match in the other):
Filter(function(x) !is.na(x), match(rownames(df), vec))

Since row name indexing also works on vectors, we can take this one step further and define:
'%ino%' <- function(x, table) {
xSeq <- seq(along = x)
names(xSeq) <- x
Out <- xSeq[as.character(table)]
Out[!is.na(Out)]
}
We now have the desired result:
df[rownames(df) %ino% vec, 1]
[1] 2 1 3
Inside the function, names() does an auto convert to character and table is changed with as.character(), so this also works correctly when the inputs to %ino% are numbers:
LETTERS[1:26 %in% 4:1]
[1] "A" "B" "C" "D"
LETTERS[1:26 %ino% 4:1]
[1] "D" "C" "B" "A"
Following %in%, missing values are removed:
LETTERS[1:26 %in% 3:-5]
[1] "A" "B" "C"
LETTERS[1:26 %ino% 3:-5]
[1] "C" "B" "A"
With %in% the logical sequence is repeated along the dimension of the object being subsetted, this is not the case with %ino%:
data.frame(letters, LETTERS)[1:5 %in% 3:-5,]
letters LETTERS
1 a A
2 b B
3 c C
6 f F
7 g G
8 h H
11 k K
12 l L
13 m M
16 p P
17 q Q
18 r R
21 u U
22 v V
23 w W
26 z Z
data.frame(letters, LETTERS)[1:5 %ino% 3:-5,]
letters LETTERS
3 c C
2 b B
1 a A

How can I remove an element from a list?

I have a list and I want to remove a single element from it. How can I do this?
I've tried looking up what I think the obvious names for this function would be in the reference manual and I haven't found anything appropriate.

If you don't want to modify the list in-place (e.g. for passing the list with an element removed to a function), you can use indexing: negative indices mean "don't include this element".
x <- list("a", "b", "c", "d", "e"); # example list
x[-2]; # without 2nd element
x[-c(2, 3)]; # without 2nd and 3rd
Also, logical index vectors are useful:
x[x != "b"]; # without elements that are "b"
This works with dataframes, too:
df <- data.frame(number = 1:5, name = letters[1:5])
df[df$name != "b", ]; # rows without "b"
df[df$number %% 2 == 1, ] # rows with odd numbers only

I don't know R at all, but a bit of creative googling led me here: http://tolstoy.newcastle.edu.au/R/help/05/04/1919.html
The key quote from there:
I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me
myList[[5]] <- NULL
will remove the 5th element and then "close up" the hole caused by deletion of that element. That suffles the index values, So I have to be careful in dropping elements. I must work from the back of the list to the front.
A response to that post later in the thread states:
For deleting an element of a list, see R FAQ 7.1
And the relevant section of the R FAQ says:
... Do not set x[i] or x[[i]] to NULL, because this will remove the corresponding component from the list.
Which seems to tell you (in a somewhat backwards way) how to remove an element.

I would like to add that if it's a named list you can simply use within.
l <- list(a = 1, b = 2)
> within(l, rm(a))
$b
[1] 2
So you can overwrite the original list
l <- within(l, rm(a))
to remove element named a from list l.

Here is how the remove the last element of a list in R:
x <- list("a", "b", "c", "d", "e")
x[length(x)] <- NULL
If x might be a vector then you would need to create a new object:
x <- c("a", "b", "c", "d", "e")
x <- x[-length(x)]
Work for lists and vectors

Removing Null elements from a list in single line :
x=x[-(which(sapply(x,is.null),arr.ind=TRUE))]
Cheers

If you have a named list and want to remove a specific element you can try:
lst <- list(a = 1:4, b = 4:8, c = 8:10)
if("b" %in% names(lst)) lst <- lst[ - which(names(lst) == "b")]
This will make a list lst with elements a, b, c. The second line removes element b after it checks that it exists (to avoid the problem #hjv mentioned).
or better:
lst$b <- NULL
This way it is not a problem to try to delete a non-existent element (e.g. lst$g <- NULL)

Use - (Negative sign) along with position of element, example if 3rd element is to be removed use it as your_list[-3]
Input
my_list <- list(a = 3, b = 3, c = 4, d = "Hello", e = NA)
my_list
# $`a`
# [1] 3
# $b
# [1] 3
# $c
# [1] 4
# $d
# [1] "Hello"
# $e
# [1] NA
Remove single element from list
my_list[-3]
# $`a`
# [1] 3
# $b
# [1] 3
# $d
# [1] "Hello"
# $e
[1] NA
Remove multiple elements from list
my_list[c(-1,-3,-2)]
# $`d`
# [1] "Hello"
# $e
# [1] NA
my_list[c(-3:-5)]
# $`a`
# [1] 3
# $b
# [1] 3
my_list[-seq(1:2)]
# $`c`
# [1] 4
# $d
# [1] "Hello"
# $e
# [1] NA

There's the rlist package (http://cran.r-project.org/web/packages/rlist/index.html) to deal with various kinds of list operations.
Example (http://cran.r-project.org/web/packages/rlist/vignettes/Filtering.html):
library(rlist)
devs <-
list(
p1=list(name="Ken",age=24,
interest=c("reading","music","movies"),
lang=list(r=2,csharp=4,python=3)),
p2=list(name="James",age=25,
interest=c("sports","music"),
lang=list(r=3,java=2,cpp=5)),
p3=list(name="Penny",age=24,
interest=c("movies","reading"),
lang=list(r=1,cpp=4,python=2)))
list.remove(devs, c("p1","p2"))
Results in:
# $p3
# $p3$name
# [1] "Penny"
#
# $p3$age
# [1] 24
#
# $p3$interest
# [1] "movies" "reading"
#
# $p3$lang
# $p3$lang$r
# [1] 1
#
# $p3$lang$cpp
# [1] 4
#
# $p3$lang$python
# [1] 2

Don't know if you still need an answer to this but I found from my limited (3 weeks worth of self-teaching R) experience with R that, using the NULL assignment is actually wrong or sub-optimal especially if you're dynamically updating a list in something like a for-loop.
To be more precise, using
myList[[5]] <- NULL
will throw the error
myList[[5]] <- NULL : replacement has length zero
or
more elements supplied than there are to replace
What I found to work more consistently is
myList <- myList[[-5]]

Just wanted to quickly add (because I didn't see it in any of the answers) that, for a named list, you can also do l["name"] <- NULL. For example:
l <- list(a = 1, b = 2, cc = 3)
l['b'] <- NULL

In the case of named lists I find those helper functions useful
member <- function(list,names){
## return the elements of the list with the input names
member..names <- names(list)
index <- which(member..names %in% names)
list[index]
}
exclude <- function(list,names){
## return the elements of the list not belonging to names
member..names <- names(list)
index <- which(!(member..names %in% names))
list[index]
}
aa <- structure(list(a = 1:10, b = 4:5, fruits = c("apple", "orange"
)), .Names = c("a", "b", "fruits"))
> aa
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
## $b
## [1] 4 5
## $fruits
## [1] "apple" "orange"
> member(aa,"fruits")
## $fruits
## [1] "apple" "orange"
> exclude(aa,"fruits")
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
## $b
## [1] 4 5

Using lapply and grep:
lst <- list(a = 1:4, b = 4:8, c = 8:10)
# say you want to remove a and c
toremove<-c("a","c")
lstnew<-lst[-unlist(lapply(toremove, function(x) grep(x, names(lst)) ) ) ]
#or
pattern<-"a|c"
lstnew<-lst[-grep(pattern, names(lst))]

You can also negatively index from a list using the extract function of the magrittr package to remove a list item.
a <- seq(1,5)
b <- seq(2,6)
c <- seq(3,7)
l <- list(a,b,c)
library(magrittr)
extract(l,-1) #simple one-function method
[[1]]
[1] 2 3 4 5 6
[[2]]
[1] 3 4 5 6 7

There are a few options in the purrr package that haven't been mentioned:
pluck and assign_in work well with nested values and you can access it using a combination of names and/or indices:
library(purrr)
l <- list("a" = 1:2, "b" = 3:4, "d" = list("e" = 5:6, "f" = 7:8))
# select values (by name and/or index)
all.equal(pluck(l, "d", "e"), pluck(l, 3, "e"), pluck(l, 3, 1))
[1] TRUE
# or if element location stored in a vector use !!!
pluck(l, !!! as.list(c("d", "e")))
[1] 5 6
# remove values (modifies in place)
pluck(l, "d", "e") <- NULL
# assign_in to remove values with name and/or index (does not modify in place)
assign_in(l, list("d", 1), NULL)
$a
[1] 1 2
$b
[1] 3 4
$d
$d$f
[1] 7 8
Or you can remove values using modify_list by assigning zap() or NULL:
all.equal(list_modify(l, a = zap()), list_modify(l, a = NULL))
[1] TRUE
You can remove or keep elements using a predicate function with discard and keep:
# remove numeric elements
discard(l, is.numeric)
$d
$d$e
[1] 5 6
$d$f
[1] 7 8
# keep numeric elements
keep(l, is.numeric)
$a
[1] 1 2
$b
[1] 3 4

Here is a simple solution that can be done using base R. It removes the number 5 from the original list of numbers. You can use the same method to remove whatever element you want from a list.
#the original list
original_list = c(1:10)
#the list element to remove
remove = 5
#the new list (which will not contain whatever the `remove` variable equals)
new_list = c()
#go through all the elements in the list and add them to the new list if they don't equal the `remove` variable
counter = 1
for (n in original_list){
if (n != ){
new_list[[counter]] = n
counter = counter + 1
}
}
The new_list variable no longer contains 5.
new_list
# [1] 1 2 3 4 6 7 8 9 10

How about this? Again, using indices
> m <- c(1:5)
> m
[1] 1 2 3 4 5
> m[1:length(m)-1]
[1] 1 2 3 4
or
> m[-(length(m))]
[1] 1 2 3 4

You can use which.
x<-c(1:5)
x
#[1] 1 2 3 4 5
x<-x[-which(x==4)]
x
#[1] 1 2 3 5

if you'd like to avoid numeric indices, you can use
a <- setdiff(names(a),c("name1", ..., "namen"))
to delete names namea...namen from a. this works for lists
> l <- list(a=1,b=2)
> l[setdiff(names(l),"a")]
$b
[1] 2
as well as for vectors
> v <- c(a=1,b=2)
> v[setdiff(names(v),"a")]
b
2

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R version of Vlookup with multiple matches per cell - r

Related

Storing unique values of each column (of a df) in list

Find indices of vector elements in a list

"replace" function examples

R - preserve order when using matching operators (%in%)

How can I remove an element from a list?

Categories

Resources