R: Map a column to a column using a key-value list

In R, I want to use a key-value list to convert a column of keys to values. It's similar to "How to map a column through a dictionary in R", but I want to use a list, not a data.frame.
I've tried indexing the list with a vector of keys:
d = list('a'=1, 'b'=2, 'c'=3)
d[c('a', 'a', 'c', 'b')] # I want this to return c(1,1,3,2) but it doesn't
However, the above returns a list:
list('a'=1, 'a'=1, 'c'=3, 'b'=2)

unlist is useful in this situation: it flattens the list returned by the subset into an atomic vector.
unlist(d[c('a', 'a', 'c', 'b')], use.names=FALSE)
#[1] 1 1 3 2
Another option is stack, which returns the keys and values as columns of a data.frame. Subsetting the values column (the first one) gives
stack(d[c('a', 'a', 'c', 'b')])[, 1]
#[1] 1 1 3 2
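As a minimal sketch of the same idea, the list can be flattened once into a named vector and then indexed by key. The names lookup, keys and values are illustrative and not from the original answer; the sketch assumes every list element has length one.

```r
d <- list(a = 1, b = 2, c = 3)
keys <- c("a", "a", "c", "b")

# Flatten the list once into a named numeric vector
lookup <- unlist(d)

# Index by name, then drop the names to get plain values
values <- unname(lookup[keys])

# A key that is not in the lookup maps to NA
missing_value <- unname(lookup["z"])
```

Building the lookup once is convenient when the same mapping is applied to several columns.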

Related

Convert simple data.table to named vector

I need to convert a simple data.table to a named vector.
Let's say I have this data.table:
a <- data.table(v1 = c('a', 'b', 'c'), v2 = c(1,2,3))
and I want to get the following named vector
b <- c(1, 2, 3)
names(b) <- c('a', 'b', 'c')
Is there a simple way to do it?
Using setNames() in j:
a[, setNames(v2, v1)]
# a b c
# 1 2 3
We can use split and unlist to get it as a named vector. Note that split orders the result by the sorted values of v1, not by row order.
unlist(split(a$v2, a$v1))
#a b c
#1 2 3
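The two approaches differ in ordering, which the sketch below illustrates. It uses a plain data.frame rather than a data.table so it runs without extra packages (an assumption made for illustration), and the rows are deliberately unsorted.

```r
# Rows deliberately not in alphabetical order
a <- data.frame(v1 = c("c", "a", "b"), v2 = c(3, 1, 2))

# setNames keeps the row order of the table
by_row <- setNames(a$v2, a$v1)

# split groups by the sorted values of v1, so the result is reordered
by_split <- unlist(split(a$v2, a$v1))
```

If v1 contained repeated keys, unlist would append a numeric suffix to the duplicated names, so the split approach suits unique keys best.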

Frequent Sequential Patterns

What would be the best way to get the sequential pattern for such data in R?
The idea is to get the frequency of letters in processes 1, 2, and 3. Is there a GSP function that can do that? Any insight or tutorial is appreciated.
You can combine apply and table (provided you have read your data into R):
dat <- data.frame(process1 = c('A', 'B', 'A', 'A', 'C'), process2 = c('B', 'C', 'B', 'B', 'A'), process3 = c('C', 'C', 'A', 'B', 'B'))
apply(dat, 2, table)
#  process1 process2 process3
#A        3        1        1
#B        1        3        2
#C        1        1        2
apply iterates over the columns of dat (this is what the argument 2 refers to) and applies table to each, which counts each unique element. See the help pages for the *apply family of functions for more info.
d.b's solution, lapply(dat, table), does the same thing but returns a list rather than a matrix.
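apply can only return a matrix when every column yields a table of the same length. A sketch of one way to guarantee that is to count against a shared set of levels; the variable names lv and counts, and the modified process2 column that lacks the letter A, are assumptions made for illustration.

```r
dat <- data.frame(process1 = c("A", "B", "A", "A", "C"),
                  process2 = c("B", "C", "B", "B", "B"),
                  stringsAsFactors = FALSE)

# All letters that occur anywhere in the data
lv <- sort(unique(unlist(dat)))

# Count each column against the same levels, so absent letters count as 0
counts <- sapply(dat, function(x) table(factor(x, levels = lv)))
```

Here counts is a matrix with one row per letter and one column per process, including a zero for A in process2.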

index vector by value in R

Say I have two character vectors
vec <- c('A', 'B', 'C', 'D', 'E')
pat <- c('D', 'B', 'A')
How do I get the indexes in vec of the values in pat, in the order they appear in pat?
I can try
which(vec %in% pat)
but this returns them in the wrong order: 1 2 4. I want 4 2 1.
I have tried different ways to solve this problem before and always found that the easiest is the one mentioned in @DavidArenburg's comment:
match(pat, vec)
# [1] 4 2 1
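A short sketch of how match behaves when a value in pat is absent from vec: it returns NA at that position, which can be filtered out afterwards. The value 'Z' and the names idx and found are added here for illustration.

```r
vec <- c("A", "B", "C", "D", "E")
pat <- c("D", "B", "Z", "A")   # "Z" does not occur in vec

# Position in vec of each element of pat; NA where there is no match
idx <- match(pat, vec)

# Keep only the positions that were found
found <- idx[!is.na(idx)]
```

match also returns only the first position when a value occurs more than once in vec.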

SQL attribute FROM and WHERE in R data.frames

How can I select data from R data.frames using SQL-like features?
Let's say I have the following data.frame:
Names Numbers
A 1
B 2
C 3
How can I select the number 2 using the strings "B" and "Numbers" rather than data[2,2]? I would like to use something like data["B", "Numbers"], but it doesn't work.
You can use [ or subset with data.frames. Note that [ has a drop argument that defaults to TRUE, which coerces the result to an atomic vector when a single value or column is returned.
DF <- data.frame(Names = LETTERS[1:3], Numbers = 1:3)
subset(DF, Names == 'B', select = Numbers)
##   Numbers
## 2       2
DF[DF$Names == 'B', 'Numbers']
## [1] 2
DF[DF$Names == 'B', 'Numbers', drop = FALSE]
##   Numbers
## 2       2
I like data.table. FAQ 2.16 describes the similarities between SQL and data.table syntax:
library(data.table)
DT <- data.table(DF)
DT[Names == 'B', Numbers]
## [1] 2
# using keys
setkey(DT,Names)
DT['B'][,list(Numbers)]
##    Numbers
## 1:       2
There is also sqldf, which lets you run SQL against data.frames:
library(sqldf)
sqldf("select Numbers from DF where Names = 'B'")
##   Numbers
## 1       2
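The syntax the question asked for, data["B", "Numbers"], can also be made to work in base R by using the Names column as row names. This is a sketch that assumes the values in Names are unique; the variable val is illustrative.

```r
DF <- data.frame(Names = c("A", "B", "C"), Numbers = 1:3)

# Row names must be unique; they then act as a character row index
rownames(DF) <- DF$Names

# Select by row name and column name
val <- DF["B", "Numbers"]
```

With row names set, val holds the single value from row "B" of the Numbers column.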

Combine vector and data.frame matching column values and vector values

I have
vetor <- c(1,2,3)
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
I need a data.frame output that matches each vector value to a specific id, resulting in:
  id vector1
1  a       1
2  b       2
3  a       1
4  c       3
5  a       1
Here are two approaches I often use for similar situations:
vetor <- c(1,2,3)
key <- data.frame(vetor=vetor, mat=c('a', 'b', 'c'))
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
data$vector1 <- key[match(data$id, key$mat), 'vetor']
# or with merge (note: merge sorts the result by id, so the row order changes)
merge(data, key, by.x = "id", by.y = "mat")
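The two approaches differ in row order, as the following sketch shows. It reuses the key and data objects from the answer; the name merged is added for illustration.

```r
key <- data.frame(vetor = c(1, 2, 3), mat = c("a", "b", "c"))
data <- data.frame(id = c("a", "b", "a", "c", "a"))

# match keeps the original row order of data
data$vector1 <- key$vetor[match(data$id, key$mat)]

# merge sorts the result by the join column by default
merged <- merge(data["id"], key, by.x = "id", by.y = "mat")
```

When the original row order matters, the match version is usually the simpler choice.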
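So you want one unique integer for each distinct value in the id column?
This is called a factor in R. Convert the id column to a factor, then use as.numeric to get its integer codes:
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
data$vector1 <- as.numeric(factor(data$id))
This works because factor() stores each distinct value as an integer code. Before R 4.0.0, data.frame() converted strings to factors by default, so data$id was already a factor; from 4.0.0 on it is a character column, and calling as.numeric on it directly returns NA.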
Here's an answer I found that follows the "mathematical.coffee" tip:
vector1 <- c('b','a','a','c','a','a') # 3 elements to be labeled: a, b and c
labels <- factor(vector1, labels= c('char a', 'char b', 'char c') )
data.frame(vector1, labels)
The one thing to watch is that factor(vector1, ...) sorts the distinct values of vector1 to form the levels, so the labels must be supplied in that sorted order.
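One way to avoid depending on the implicit sort order is to pass levels explicitly alongside labels. This is a sketch of that variation, not part of the original answer; the name codes is illustrative.

```r
vector1 <- c("b", "a", "a", "c", "a", "a")

# Pair each level with its label explicitly, so the mapping is visible
labels <- factor(vector1,
                 levels = c("a", "b", "c"),
                 labels = c("char a", "char b", "char c"))

# Integer codes follow the order given in levels
codes <- as.integer(factor(vector1, levels = c("a", "b", "c")))
```

Stating the levels also makes it easy to choose a non-alphabetical order when one is needed.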
