Using letters in function argument - r

Im trying to create a function that returns characteristic symbol to a defined value like this
"a" to 1
"b" to 2
"c" to 3
And where there is only one input argument (one of "a", "b" or "c") in the function. Like this: function(x), for example function("a") returns 1.

We can convert with matching to the default Constant vector letters
f1 <- function(arg1){
match(arg1, letters)
}
f1('a')
#[1] 1
f1('b')
#[1] 2
f1(c('a', 'b', 'c'))
#[1] 1 2 3

letterToNumber <- function(x){
which(x == letters)}
sapply(letters[1:10], letterToNumber)
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10

You can create a dictionary like structure by using a named vector.
f <- function(x)
{
dict <- setNames(seq_along(letters),letters)
unname(dict[x])
}
f("a")
[1] 1
f("g")
[1] 7
f(c("a","z"))
[1] 1 26

This will be faster than other solutions but won't fail if you don't feed a lower case letter :
foo <- function(x) utf8ToInt(x) - 96L
foo("m")
#> [1] 13

Related

R how to merge 2 list [duplicate]

I'm trying to set the default value for a function parameter to a named numeric. Is there a way to create one in a single statement? I checked ?numeric and ?vector but it doesn't seem so. Perhaps I can convert/coerce a matrix or data.frame and achieve the same result in one statement? To be clear, I'm trying to do the following in one shot:
test = c( 1 , 2 )
names( test ) = c( "A" , "B" )
The setNames() function is made for this purpose. As described in Advanced R and ?setNames:
test <- setNames(c(1, 2), c("A", "B"))
How about:
c(A = 1, B = 2)
A B
1 2
...as a side note, the structure function allows you to set ALL attributes, not just names:
structure(1:10, names=letters[1:10], foo="bar", class="myclass")
Which would produce
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
attr(,"foo")
[1] "bar"
attr(,"class")
[1] "myclass"
The convention for naming vector elements is the same as with lists:
newfunc <- function(A=1, B=2) { body} # the parameters are an 'alist' with two items
If instead you wanted this to be a parameter that was a named vector (the sort of function that would handle arguments supplied by apply):
newfunc <- function(params =c(A=1, B=2) ) { body} # a vector wtih two elements
If instead you wanted this to be a parameter that was a named list:
newfunc <- function(params =list(A=1, B=2) ) { body}
# a single parameter (with two elements in a list structure
magrittr offers a nice and clean solution.
result = c(1,2) %>% set_names(c("A", "B"))
print(result)
A B
1 2
You can also use it to transform data.frames into vectors.
df = data.frame(value=1:10, label=letters[1:10])
vec = extract2(df, 'value') %>% set_names(df$label)
vec
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
df
value label
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
To expand upon #joran's answer (I couldn't get this to format correctly as a comment): If the named vector is assigned to a variable, the values of A and B are accessed via subsetting using the [ function. Use the names to subset the vector the same way you might use the index number to subset:
my_vector = c(A = 1, B = 2)
my_vector["A"] # subset by name
# A
# 1
my_vector[1] # subset by index
# A
# 1

Remove elements in a list in R

I want to remove part of the list where it is a complete set of the other part of the list. For example, B intersect A and E intersect C, therefore B and E should be removed.
MyList <- list(A=c(1,2,3,4,5), B=c(3,4,5), C=c(6,7,8,9), E=c(7,8))
MyList
$A
[1] 1 2 3 4 5
$B
[1] 3 4 5
$C
[1] 6 7 8 9
$E
[1] 7 8
MyListUnique <- RemoveSubElements(MyList)
MyListUnique
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
Any ideas ? Any know function to do it ?
As long as your data is not too huge, you can use an approach like the following:
# preparation
MyList <- MyList[order(lengths(MyList))]
idx <- vector("list", length(MyList))
# loop through list and compare with other (longer) list elements
for(i in seq_along(MyList)) {
idx[[i]] <- any(sapply(MyList[-seq_len(i)], function(x) all(MyList[[i]] %in% x)))
}
# subset the list
MyList[!unlist(idx)]
#$C
#[1] 6 7 8 9
#
#$A
#[1] 1 2 3 4 5
Similar to the other answer, but hopefully clearer, using a helper function and 2 sapplys.
#helper function to determine a proper subset - shortcuts to avoid setdiff calculation if they are equal
is.proper.subset <- function(x,y) !setequal(x,y) && length(setdiff(x,y))==0
#double loop over the list to find elements which are proper subsets of other elements
idx <- sapply(MyList, function(x) any(sapply(MyList, function(y) is.proper.subset(x,y))))
#filter out those that are proper subsets
MyList[!idx]
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9

Extract values from list of lists with R

I have list of lists similar to this sample:
z <- list(list(num1=list((list(tab1=list(list(a=1, b=2, c=5), list(a=3, b=4), list(d=4,e=7)))))),list(num2=list((list(tab2=list(list(a=1, b=2), list(a=3, b=4)))))))
I would like to extract the figures out of the last list of lists names:
Desired output list (since 1 list entries are shorter) or as dataframe with columns corresponding to main list:
[1] a b c a b d e
[2] a b a b
dataframe:
column1 column2
a a
b b
c a
a b
b ""
d ""
e ""
I have tried various combinations of sapply(z, "[[", c("a","b"...) but failed, since the sublist names varies.
EDIT: Sorry, I needed the actual values not the last node (letters)! Additionally, each numeric value has column name, not set in the example above; it is like this:
[[1]]$num1[[1]]$tab1[[1]]$a
Name
1
So the desired solution are values:
[1]
1 2 5 3 4 4 7
[2]
1 2 3 4
I would actually need the numeric values instead of the letters. If you could adjust your solution to this I would be grateful. Thanks.
Try
lapply(z, function(x) as.numeric(unlist(x)))
## [[1]]
## [1] 1 2 5 3 4 4 7
##
## [[2]]
## [1] 1 2 3 4
z1 <- lapply(z, function(x) names(unlist(x)))
z1 <- lapply(z1, function(x) gsub(".*\\.", "", x))
n <- max(sapply(z1, length))
z1 <- lapply(z1, `length<-`, value = n)
setNames(as.data.frame(z1), paste0("Column", seq_along(z1)))
# Column1 Column2
#1 a a
#2 b b
#3 c a
#4 a b
#5 b <NA>
#6 d <NA>
#7 e <NA>
A bit far-fetched and everything but elegant, here is a way to get what you want :
lista<-unlist(lapply(strsplit(names(unlist(z)),"\\."),function(vec) vec[3]))
names(lista)<-unlist(lapply(strsplit(names(unlist(z)),"\\."),function(vec) vec[1]))
uninames<-unique(names(lista))
res<-sapply(uninames,function(x,vec){vec[names(vec)==x]},lista)
> res
$num1
num1 num1 num1 num1 num1 num1 num1
"a" "b" "c" "a" "b" "d" "e"
$num2
num2 num2 num2 num2
"a" "b" "a" "b"
UPDATE
To get the numbers :
a<-unlist(z)
b<-names(unique(z))
res<-sapply(unique(b),function(name,vec,l_name){vec[l_name==name]},a,b)
>res
$num1
num1.tab1.a num1.tab1.b num1.tab1.c num1.tab1.a num1.tab1.b num1.tab1.d num1.tab1.e
1 2 5 3 4 4 7
$num2
num2.tab2.a num2.tab2.b num2.tab2.a num2.tab2.b
1 2 3 4

R: change data frame structure using values from one variable as new variable

df1 <- data.frame(
name = c("a", "b", "b", "c"),
score = c(1, 1, 2, 1)
)
How can I get a new data frame with variables/columns from df$name and with each 'corresponding' df$score. I figure that its actually a two-step problem:
First I would need to make a list of (in this example) unequal length vectors like this:
$a
[1] 1
$b
[1] 1 2
$c
[1] 1
Second, NAs need to be padded so one get vectors of equal length before making the desired data frame
that would be like:
a b c
1 1 1 1
2 NA 2 NA
I cannot find any simple means to do this - Im sure there must be!
If the solution can be delivered using dplyr it would be fantastic! Thanks!
To split the data:
(s <- split(df1$score, df1$name))
# $a
# [1] 1
#
# $b
# [1] 1 2
#
# $c
# [1] 1
To create the new data frame:
as.data.frame(sapply(s, `length<-`, max(vapply(s, length, 1L))))
# a b c
# 1 1 1 1
# 2 NA 2 NA
Slightly more efficient would be to use vapply in place of sapply
len <- max(vapply(s, length, 1L))
as.data.frame(vapply(s, `length<-`, double(len), len))
# a b c
# 1 1 1 1
# 2 NA 2 NA

Difference between `names(df[1]) <- ` and `names(df)[1] <- `

Consider the following:
df <- data.frame(a = 1, b = 2, c = 3)
names(df[1]) <- "d" ## First method
## a b c
##1 1 2 3
names(df)[1] <- "d" ## Second method
## d b c
##1 1 2 3
Both methods didn't return an error, but the first didn't change the column name, while the second did.
I thought it has something to do with the fact that I'm operating only on a subset of df, but why, for example, the following works fine then?
df[1] <- 2
## a b c
##1 2 2 3
What I think is happening is that replacement into a data frame ignores the attributes of the data frame that is drawn from. I am not 100% sure of this, but the following experiments appear to back it up:
df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7
df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12
df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent
Which produces:
# a b
# 1 10 5
# 2 11 6
# 3 12 7
Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.
In the scenario:
names(df[2]) <- "x"
You can think of the assignment as follows (this is a simplification, see end of post for more detail):
tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7
names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7
df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7
The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.
But in the scenario:
names(df)[2] <- "x"
you can think of the assignment as (again, a simplification):
tmp <- names(df)
# [1] "a" "b"
tmp[2] <- "x"
# [1] "a" "x"
names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7
Notice how we directly assign to names, instead of assigning to df which ignores attributes.
df[2] <- 2
works because we are assigning directly to the values, not the attributes, so there are no problems here.
EDIT: based on some commentary from #AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):
Version 1 names(df[2]) <- "x" translates to:
df <- `[<-`(
df, 2,
value=`names<-`( # `names<-` here returns a re-named one column data frame
`[`(df, 2),
value="x"
) )
Version 2 names(df)[2] <- "x" translates to:
df <- `names<-`(
df,
`[<-`(
names(df), 2, "x"
) )
Also, turns out this is "documented" in R Inferno Section 8.2.34 (Thanks #Frank):
right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2

Resources