Converting letter vector into numeric vector - r

If I want to convert the letter vector c("A","B","C") into c(10,20,30), what function could I use?
Sorry for asking a question that seems to be trivial. I am a self-taught beginner and I am still getting familiar with the functions.
Edit:
Here is why I am asking such a strange question. The background:
A standard deck of playing cards can be created in R as a data frame with the following command (suits: D = Diamond, C = Club, H = Heart, S = Spade; ranks: 11 = Jack, 12 = Queen, 13 = King, 14 = Ace):
deck <- data.frame(
suit = rep(c("D","C","H","S"), 13),
rank = rep(2:14, 4)
)
A poker hand is a set of five playing cards. Sample a poker hand using the data frame deck and name it hand.
hand<-deck[sample(nrow(deck),5),]
hand
A flush is a hand that contains five cards all of the same suit. Create a logical value named is.flush which is TRUE if and only if hand is a flush.
is.flush<-length(unique(hand[,1]))==1
is.flush
And here is where the problem starts:
"A straight is a hand that contains five cards of sequential rank. Note that both A K Q J 10 and 5 4 3 2 A are considered to be straights, but Q K A 2 3 is not. Create a logical value named is.straight which is TRUE if and only if the hand is a straight."
Hint: The all() function could be useful.
Here is my attempt. I can set:
y <- read.table(text = "
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10")
apply(y, 1, function(x) all(diff(sort(x[ x != 2 ])) == 1))
This gives a TRUE/FALSE value for each row.
But I cannot pass letters to the function above, so I am stuck here: I have to convert the letters to numbers (unless there is a smarter way).
P.S.
The background code I have so far:
deck <- data.frame(
suit = rep(c("D","C","H","S"), 13),
rank = rep(2:14, 4)
)
deck
hand<-deck[sample(nrow(deck),5),]
hand
is.flush<-length(unique(hand[,1]))==1
is.flush

Sounds like you want case_when inside a custom function
library(tidyverse)
my_func <- function(letter) {
case_when(letter == 'A' ~ 10,
letter == 'B' ~ 20,
letter == 'C' ~ 30,
TRUE ~ 0)
}
my_func(c("A","B","C"))
Will give you
[1] 10 20 30

If you want to map each letter to an arbitrary output value, you can use a named vector as a dictionary, for example:
dictionary <- (1:26) * 10 # 10, 20, 30 .. 260
names(dictionary) <- LETTERS # built-in vector of uppercase letters
dictionary
A B C D E F G H I J K L M N O P Q R S T U V
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220
W X Y Z
230 240 250 260
You can then use letters to index the dictionary and return the mapped value:
test <- c("B", "L", "A")
dictionary[test]
B L A
20 120 10
The function that actually performs the mapping here is the [ operator; see the Extract documentation (?Extract).
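A base R alternative to the named vector is match(), sketched below. The helper name lookup and the choice of 0 for unknown letters are assumptions for illustration (mirroring the TRUE ~ 0 fallback in the case_when answer); indexing a named vector instead returns NA for letters that are not in the dictionary.

```r
# Hypothetical helper: map each element of x to the value at the same
# position as its match in keys; unmatched elements get `default`.
keys   <- c("A", "B", "C")
values <- c(10, 20, 30)

lookup <- function(x, keys, values, default = 0) {
  out <- values[match(x, keys)]  # NA where x is not found in keys
  out[is.na(out)] <- default     # replace unmatched entries
  out
}

lookup(c("A", "B", "C"), keys, values)
# [1] 10 20 30
lookup(c("C", "Z"), keys, values)
# [1] 30  0
```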

You could do:
library(tidyverse)
x <- c("A","B","C")
recode(x, A = 10, B = 20, C = 30, .default = 0)
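For the original is.straight exercise, a letter-to-number conversion may not be needed at all: the rank column of deck is already numeric (2 to 14, with 14 = Ace). The sketch below assumes that coding and uses all(), as the hint suggests; the helper name straight_check is hypothetical. The ace-low straight (5 4 3 2 A) is handled as a special case because the ace is stored as 14.

```r
# Deck from the question: ranks 2..14, where 14 = Ace.
deck <- data.frame(
  suit = rep(c("D", "C", "H", "S"), 13),
  rank = rep(2:14, 4)
)

# Hypothetical helper: TRUE if the five ranks form a straight.
straight_check <- function(ranks) {
  r <- sort(ranks)
  # Ordinary straight: five consecutive ranks, e.g. 10 J Q K A = 10:14
  consecutive <- all(diff(r) == 1)
  # Ace-low straight: A 2 3 4 5, where the ace is stored as 14
  ace_low <- all(r == c(2, 3, 4, 5, 14))
  consecutive || ace_low
}

hand <- deck[sample(nrow(deck), 5), ]
is.straight <- straight_check(hand$rank)
is.straight
```

Wrap-around hands such as Q K A 2 3 sort to 2 3 12 13 14, whose differences are not all 1, so they are correctly rejected.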


Conditioned vector creation based on another vector

I'm a new user in R. Considering the following vector example <- c (15 1 1 1 7 8 8 9 5 9 5), I would like to create two additional vectors, the first with the numbers that are not repeated and the second with only the repeated numbers, something like:
example1 <- c (15, 7)
example2 <- c (1, 8, 9, 5)
Thank you for your support.
Using example as shown reproducibly in the Note at the end, dups is formed from the duplicated elements and singles is the rest. This always gives two vectors (one will be zero length if there are no duplicates or if there are no singles), and it uses the numeric values directly without converting them to character.
dups <- unique(example[duplicated(example)])
singles <- setdiff(example, dups)
dups
## [1] 1 8 9 5
singles
## [1] 15 7
Note
The input shown in the question was not valid R syntax so we provide the input reproducibly here:
example <- scan(text = "15 1 1 1 7 8 8 9 5 9 5", quiet = TRUE)
You can count the appearances of each value using table:
example <- c(15,1,1,1,7,8,8,9,5,9,5)
tt <- table(example)
The names of the table are the counted values (stored as character strings, hence as.numeric), so you can write:
repeatedValues <- as.numeric(names(tt)[tt>1])
uniqueValues <- as.numeric(names(tt))[tt==1]
Here's a one-liner using rle that puts the resultant vectors in a list:
split(rle(sort(example))$values, rle(sort(example))$lengths < 2)
#> $`FALSE`
#> [1] 1 5 8 9
#> $`TRUE`
#> [1] 7 15

Combine data.frames of different dimensions creating duplicates where needed /r dplyr

I am looking for a way to combine two tables of different dimensions by ID. The final table should duplicate some values as needed to match the rows of the other table.
Here is a random example:
IDx = c("a", "b", "c", "d")
sex = c("M", "F", "M", "F")
IDy = c("a", "a", "b", "c", "d", "d")
status = c("single", "children", "single", "children", "single", "children")
salary = c(30, 80, 50, 40, 30, 80)
x = data.frame(IDx, sex)
y = data.frame(IDy, status, salary)
Here is x:
IDx sex
1 a M
2 b F
3 c M
4 d F
Here is y:
IDy status salary
1 a single 30
2 a children 80
3 b single 50
4 c children 40
5 d single 30
6 d children 80
I am looking for this:
IDy sex status salary
1 a M single 30
2 a M children 80
3 b F single 50
4 c M children 40
5 d F single 30
6 d F children 80
Basically, sex should be matched to fit the needs of table y. All values in both tables should be used; the actual table is much larger. Not all IDs will need to be duplicated.
This should be fairly simple, but I cannot find a good answer anywhere online.
Note that I don't want NAs to be introduced.
I am new to R, and since I have been focusing on dplyr, an answer using it would help. It might be simple in base R, too.
UPDATE
The bolded sentences above may be misleading with respect to the final answer. Sorry, this has been a confusing case: I realised it should include one extra column that complicates things, but more on that later.
First, I tried to see what is happening in my actual table and to find which suggested answer fits my needs. I removed any problematic columns to get the following result. So, I checked this:
dim(x)
> [1] 231 2
dim(y)
> [1] 199 8
# left_join joins matching rows from y to x
suchait <- left_join(x, y, by= c("IDx" = "IDy"))
# inner_join retains only rows in both sets
jdobres <- inner_join(x, y, by = c("IDx" = "IDy"))
dim(suchait) # actual table used
> [1] 225 9
dim(jdobres)
> [1] 219 9
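But why, and where, do they differ?
This shows the 6 rows that appear in suchait's table but not in jdobres's. The difference comes from the join type: left_join keeps every row of x, filling the columns from y with NA when an ID has no match, whereas inner_join keeps only the IDs present in both tables.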
setdiff(suchait, jdobres )
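To find the rows that cause the difference directly, anti_join() returns the rows of x that have no match in y. The small data frames below (x_demo and y_demo, with an unmatched ID "e") are hypothetical, made up only to illustrate the behaviour.

```r
library(dplyr)

# Hypothetical data: ID "e" exists in x_demo but has no partner in y_demo
x_demo <- data.frame(IDx = c("a", "b", "e"), sex = c("M", "F", "M"))
y_demo <- data.frame(IDy = c("a", "a", "b"),
                     status = c("single", "children", "single"),
                     salary = c(30, 80, 50))

left_demo  <- left_join(x_demo, y_demo, by = c("IDx" = "IDy"))  # keeps "e" with NA status/salary
inner_demo <- inner_join(x_demo, y_demo, by = c("IDx" = "IDy")) # drops "e"

# Rows of x_demo without a match in y_demo: the source of the extra NA rows
missing_demo <- anti_join(x_demo, y_demo, by = c("IDx" = "IDy"))
missing_demo
```

If no NAs should be introduced, inner_join (or dropping the anti_join rows beforehand) is the safer choice.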
Using dplyr:
library(dplyr)
df <- left_join(x, y, by = c("IDx" = "IDy"))
Your result would be:
IDx sex status salary
1 a M single 30
2 a M children 80
3 b F single 50
4 c M children 40
5 d F single 30
6 d F children 80
Or you could do:
df <- left_join(y, x, by = c("IDy" = "IDx"))
It would give:
IDy status salary sex
1 a single 30 M
2 a children 80 M
3 b single 50 F
4 c children 40 M
5 d single 30 F
6 d children 80 F
You can also reorder your columns to get it exactly the way you wanted:
df <- df[, c("IDy", "sex", "status", "salary")]
result:
IDy sex status salary
1 a M single 30
2 a M children 80
3 b F single 50
4 c M children 40
5 d F single 30
6 d F children 80
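If you prefer to stay entirely within dplyr, relocate() (available in dplyr 1.0.0 and later) can replace the base R column reordering. A minimal sketch, rebuilding x and y from the question:

```r
library(dplyr)

x <- data.frame(IDx = c("a", "b", "c", "d"), sex = c("M", "F", "M", "F"))
y <- data.frame(IDy = c("a", "a", "b", "c", "d", "d"),
                status = c("single", "children", "single", "children", "single", "children"),
                salary = c(30, 80, 50, 40, 30, 80))

df <- y %>%
  left_join(x, by = c("IDy" = "IDx")) %>%  # one row per row of y; sex repeated as needed
  relocate(sex, .after = IDy)              # move sex next to the ID column
df
```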

Fastest way to find nearest value in vector

I have two integer/POSIXct vectors:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
Now my resulting vector c should contain for each element of vector a the nearest element of b:
c <- c(4,4,4,4,4,6,6,...)
I tried it with apply and which.min(abs(a - b)) but it's very very slow.
Is there any more clever way to solve this? Is there a data.table solution?
As it is presented in this link you can do either:
which(abs(x - your.number) == min(abs(x - your.number)))
or
which.min(abs(x - your.number))
where x is your vector and your.number is the value. If you have a matrix or data.frame, first convert it to a numeric vector (for example with as.vector or unlist) and then apply this to the result.
For example:
x <- 1:100
your.number <- 21.5
which(abs(x - your.number) == min(abs(x - your.number)))
would output:
[1] 21 22
Update: Based on the very kind comment of hendy I have added the following to make it more clear:
Note that the results above (i.e. 21 and 22) are the indexes of the items (this is how which() works in R), so if you want the actual values, you have to use these indexes to extract them. Here is another example:
x <- seq(from = 100, to = 10, by = -5)
x
[1] 100 95 90 85 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10
Now let's find the number closest to 42:
your.number <- 42
target.index <- which(abs(x - your.number) == min(abs(x - your.number)))
x[target.index]
which would output the "value" we are looking for from the x vector:
[1] 40
Not quite sure how it will behave with your volume but cut is quite fast.
The idea is to cut your vector a at the midpoints between the elements of b.
Note that I am assuming the elements in b are strictly increasing!
Something like this:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
cuts <- c(-Inf, b[-1]-diff(b)/2, Inf)
# Will yield: c(-Inf, 5, 8, 13, Inf)
cut(a, breaks=cuts, labels=b)
# [1] 4 4 4 4 4 6 6 6 10 10 10 10 10 16 16
# Levels: 4 6 10 16
This is even faster using a lower-level function like findInterval (which, again, assumes that breakpoints are non-decreasing).
findInterval(a, cuts)
[1] 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4
So of course you can do something like:
index = findInterval(a, cuts)
b[index]
# [1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
Note that you can choose what happens to elements of a that are equidistant to an element of b by passing the relevant arguments to cut (or findInterval), see their help page.
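The cut/findInterval idea can be wrapped in a small function, sketched below. The name nearest is hypothetical; it sorts and de-duplicates b first, so b need not be increasing on input, and it assumes numeric inputs (convert POSIXct values with as.numeric() first). Because findInterval() intervals are closed on the left by default, a value exactly halfway between two elements of b goes to the larger one.

```r
# Hypothetical helper: for each element of a, the nearest element of b.
nearest <- function(a, b) {
  b <- sort(unique(b))                       # findInterval needs sorted breakpoints
  cuts <- c(-Inf, b[-1] - diff(b) / 2, Inf)  # midpoints between neighbours of b
  b[findInterval(a, cuts)]
}

a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
b <- c(16, 4, 10, 6)  # deliberately unsorted
nearest(a, b)
# [1]  4  4  4  4  6  6  6 10 10 10 10 10 16 16 16
```

Sorting b costs O(m log m) and each lookup is a binary search, so this should scale to vectors with millions of elements.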
library(data.table)
a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[,merge:=Value]
b=data.table(Value=c(4,6,10,16))
b[,merge:=Value]
setkeyv(a,c('merge'))
setkeyv(b,c('merge'))
Merge_a_b=b[a,roll='nearest']
When joining two data tables, data.table's roll = 'nearest' option matches each row of the table inside the brackets (here a) to the row of the outer table (here b) whose key is nearest, so the Value column of the result holds, for each element of a, the nearest element of b. The resulting data table has one row per row of the table inside the brackets (a). As usual, the join requires a common key.
For those who would be satisfied with the slow solution:
sapply(a, function(a, b) {b[which.min(abs(a-b))]}, b)
Here might be a simple base R option, using max.col + outer:
b[max.col(-abs(outer(a,b,"-")))]
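which gives
> b[max.col(-abs(outer(a,b,"-")))]
[1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
Note that max.col breaks ties at random by default (ties.method = "random"), so a value exactly halfway between two elements of b, such as 5 here, may map to either neighbour; pass ties.method = "first" or "last" for reproducible results. Also, outer builds a length(a) by length(b) matrix, so this only suits small inputs, not vectors with millions of elements.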
Late to the party, but there is now a function from the DescTools package called Closest which does almost exactly what you want (it just doesn't handle multiple values at once).
To get around this, we can lapply over your vector a and find the closest value for each element.
library(DescTools)
lapply(a, function(i) Closest(x = b, a = i))
You might notice that some elements of the result contain two values. This is because Closest returns both values when the value you are testing lies exactly halfway between two (e.g. 3 is exactly between 1 and 5, so both 1 and 5 would be returned).
To get around this, put either min or max around the result:
lapply(a, function(i) min(Closest(x = b, a = i)))
lapply(a, function(i) max(Closest(x = b, a = i)))
Then unlist the result to get a plain vector :)

R code for repeating value into column

I am new to R.
I have a list of repeating codes (numeric or categorical) from an Excel file. I need to add another column of values (random values are fine) in which every occurrence of the same code gets the same value.
Codes Value
1 122
1 122
2 155
2 155
2 155
4 101
4 101
5 251
5 251
Thank you.
We can use match:
n <- length(code0 <- unique(code))
value <- sample(4 * n, n)[match(code, code0)]
or factor:
n <- length(unique(code))
value <- sample(4 * n, n)[factor(code)]
The random integers generated are between 1 and 4 * n. The number 4 is arbitrary; you can also put 100.
Example
set.seed(0); code <- rep(1:5, sample(5))
code
# [1] 1 1 1 1 1 2 2 3 3 3 3 4 4 4 5
n <- length(code0 <- unique(code))
sample(4 * n, n)[match(code, code0)]
# [1] 5 5 5 5 5 18 18 19 19 19 19 12 12 12 11
Comment
The above is the most general treatment: it does not assume that code is sorted or that it takes consecutive values.
If code is sorted (no matter what value it takes), we can also use rle:
if (!is.unsorted(code)) {
n <- length(k <- rle(code)$lengths)
value <- rep.int(sample(4 * n, n), k)
}
If code takes consecutive values 1, 2, ..., n (but not necessarily sorted), we can skip match or factor and do:
n <- max(code)
value <- sample(4 * n, n)[code]
Also note that if code is categorical rather than numeric, the match and factor methods still work.
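Applied to a data frame like the one in the question, the match approach looks like the sketch below. The names df, Codes and Value are assumptions modelled on the question's table, and sample(100, ...) is an arbitrary range for the random values.

```r
set.seed(1)  # for reproducible random values
df <- data.frame(Codes = c(1, 1, 2, 2, 2, 4, 4, 5, 5))

codes0 <- unique(df$Codes)                  # distinct codes, in first-seen order
draws  <- sample(100, length(codes0))       # one distinct random integer per code
df$Value <- draws[match(df$Codes, codes0)]  # same code -> same value
df
```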
Alternatively, you could do the following, which is perhaps more intuitive for a beginner:
data <- data.frame('a' = c(122,122,155,155,155,101,101,251,251))
duplicates <- unique(data)
duplicates[, 'b'] <- rnorm(nrow(duplicates))
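data <- merge(data, duplicates, by='a')
Note that merge() sorts the result by the key column by default (sort = TRUE), so the rows may come back in a different order than the input.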

Repeat an argument in function

I have a list l and an integer n. I would like to pass l n-times to expand.grid.
Is there a better way than writing expand.grid(l, l, ..., l) with n times l?
The function rep seems to do what you want.
n <- 3 #number of repetitions
x <- list(seq(1,5))
expand.grid(rep(x,n)) #gives a data.frame of 125 rows and 3 columns
x2 <- list(a = seq(1,5), b = seq(6, 10))
expand.grid(rep(x2,n)) #gives a data.frame of 15625 rows and 6 columns
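An equivalent and common idiom is do.call(), which passes each element of the repeated list to expand.grid as a separate argument, exactly as if you had typed expand.grid(v, v, v). A short sketch comparing the two forms (v and n are illustrative):

```r
v <- 1:5
n <- 3

g_list <- expand.grid(rep(list(v), n))           # single list argument, unpacked by expand.grid
g_call <- do.call(expand.grid, rep(list(v), n))  # n separate arguments
dim(g_call)
# [1] 125   3
```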
If the solution by @Phann doesn't fit your situation, you can try the following "evil trio" (eval, parse, paste) solution:
l <- list(height = seq(60, 80, 5), weight = seq(100, 300, 50), sex = c("male", "female"))
n <- 4
eval(parse(text = paste("expand.grid(",
paste(rep("l", times = n), collapse = ","), ")")))
I think the easiest way to solve the original question is to nest the list using rep.
For example, to expand the same list, n times, use rep to expand the nested list as many times as necessary (n), then use the expanded list as the only argument to expand.grid.
# Example list
l <- list(1, 2, 3)
# Times required
n <- 3
# Expand as many times as needed
m <- rep(list(l), n)
# Expand away
expand.grid(m)
If you want the function to act on the individual elements of the repeated list freely (i.e., with the list members no longer tied to the list structure itself), the following may be useful:
l <- list(1:5, "s") # A list with numerics and characters
n <- 3 # number of repetitions
expand.grid(unlist(rep(l, n))) # the result is:
Var1
1 1
2 2
3 3
4 4
5 5
6 s
7 1
8 2
9 3
10 4
11 5
12 s
13 1
14 2
15 3
16 4
17 5
18 s
