Using character values as object names in R

I would like to use the strings in a character vector as the names of character objects, so that, for example, first ends up holding something like "d","e","a","t".
I tried this approach but am clearly missing some function to apply to x[i]:
x <- c("first","second","third") # and so on
for (i in 1:length(x)) {
x[i] <- sample(letters,4)
}
Thanks in advance.

The function you are looking for is assign():
> x <- c("first","second","third")
> for (i in 1:length(x)) {
+ assign(x[i], sample(letters,4))
+ }
>
> ls()
[1] "first" "i" "second" "third" "x"
> first
[1] "t" "d" "u" "j"
> second
[1] "o" "i" "p" "l"
> third
[1] "w" "v" "r" "n"
As an alternative, you could build these vectors as different elements of a list:
> mylist <- list()
> for (i in 1:length(x)) {
+ mylist[[x[i]]] <- sample(letters,4)
+ }
> mylist
$first
[1] "e" "l" "y" "d"
$second
[1] "t" "o" "k" "h"
$third
[1] "g" "x" "p" "b"
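A more compact variant of the list approach, offered as a sketch: lapply() builds the list in one call, setNames() attaches the names, and list2env() can still copy the elements into separate variables if those are really needed. mget() collects such variables back into a named list.

```r
x <- c("first", "second", "third")

# Build one random 4-letter vector per name, then label the list elements.
mylist <- setNames(lapply(x, function(nm) sample(letters, 4)), x)

# Only if separate variables are really needed: copy each list element
# into the global environment under its own name.
invisible(list2env(mylist, envir = globalenv()))

first      # same values as mylist$first
mget(x)    # collects the variables back into a named list
```

Keeping the values in the list is usually easier to work with than many loose variables.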

You don't say what you will be doing with this object. You may get the simplest structure by using a named vector:
x <- c("first","second","third","fourth")
names(x) <- x
x[] <- sample(letters, length(x))
If you do not use the empty brackets x[] on the left-hand side, the whole vector gets replaced and the names are lost. You can now access the values with quoted names:
> x
first second third fourth
"w" "c" "r" "x"
> x["second"]
second
"c"
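The point about the empty brackets can be checked directly. This sketch uses a four-element x so that sample() returns exactly one letter per name; y is a hypothetical copy used only for comparison.

```r
x <- c("first", "second", "third", "fourth")
names(x) <- x
y <- x

# x[] <- ... fills the existing vector in place, so its names survive.
x[] <- sample(letters, length(x))

# y <- ... replaces the whole object, so the names are lost.
y <- sample(letters, length(y))

names(x)        # "first" "second" "third" "fourth"
names(y)        # NULL
x[["second"]]   # the value alone, without its name
```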

Related

R function for repeat loop to create multiple variables from table

Apologies for any poorly formed question - I'm very new to R.
I am looking to create multiple character strings from this data table.
I have created the character string:
coor1 <- R_data[1,8]
I am looking to iterate this for other indices as follows:
coor1 <- data[1,8]
coor2 <- data[2,8]
coor3 <- data[3,8]
coor4 <- data[4,8]
coor5 <- data[5,8] # and so on
I have tried using a for loop but with no success. Any advice would be great.
Thanks very much.
I think you just need to subset the 8th column with $:
data <- data.frame(V1 = rep(NA, 26))
data[, 2:7] <- NA
data[, 8] <- letters
data$V8
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u"
[22] "v" "w" "x" "y" "z"
data[1, 8]
[1] "a"
data[2, 8]
[1] "b"
data[3, 8]
[1] "c"
data[4, 8]
[1] "d"
You can assign it to a variable as well:
coor <- data$V8
And extract a single result with []:
coor[1]
[1] "a"
coor[2]
[1] "b"
This can also be accomplished from the original dataframe:
data$V8[1]
[1] "a"
which is the same as:
data[1, 8]
[1] "a"
What a loop would look like:
coors <- character(nrow(data)) # preallocate storage
for(i in seq_len(nrow(data))){
coors[i] <- data[i, 8]
}
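If the goal really is coor1, coor2, and so on, a named vector usually serves better than separate variables. The sketch below builds a stand-in data frame with the values in column 8 (the question's actual table is not shown) and shows both a named vector and, where separate objects are unavoidable, assign() with names built by paste0().

```r
# Stand-in for the question's table: 26 rows, values in the 8th column.
data <- as.data.frame(matrix(NA, nrow = 26, ncol = 7))
data$V8 <- letters

# One named vector: coor[["coor3"]] plays the role of a variable coor3.
coor <- setNames(data[[8]], paste0("coor", seq_len(nrow(data))))
coor[["coor3"]]

# Separate variables coor1, coor2, ... (harder to work with afterwards).
for (i in seq_len(nrow(data))) {
  assign(paste0("coor", i), data[i, 8])
}
coor5
```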

R: Check if strings in a vector are present in other vectors, and return name of the match

I need something more selective than %in% or match(): code that matches a vector of strings against another vector and returns the names of the matches.
Currently I have the following:
test <- c("country_A", "country_B", "country_C", "country_D", "country_E", "country_F")
rating_3 <- c("country_B", "country_D", "country_G", "country_K")
rating_4 <- c("country_C", "country_E", "country_M", "country_F")
i <- 1
while (i <= 33) {
print(i)
print(test[[i]])
if (grepl(test[[i]], rating_3) == TRUE) {
print(grepl(test[[i]], rating_3)) }
i <- i+1
}
This should check whether each element of test is present in rating_3, but for some reason it returns only the position, the name of the string, and a warning:
[1]
[country_A]
There were 6 warnings (use warnings() to see them)
I need to know why this piece of code fails, but eventually I'd like it to return the name only when the string is inside another vector and, if possible, to test it against several vectors at once, printing the name of the vector in which it fits, something like:
[1]
[String]
[rating_3]
How could I get something like that?
Without a reproducible example, it is hard to determine what exactly you need, but I think this could be done using %in%:
# create reprex
test <- sample(letters,10)
rating_3 <- sample(letters, 20)
print(rating_3[rating_3 %in% test])
# prints the elements of rating_3 that also occur in test
# (at most 10 letters here, since test contains only 10)
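For the named-output part of the question, here is a sketch using the question's own vectors. The candidate vectors are collected in a named list (an assumption; more vectors can be added), and for each element of test the names of the vectors containing it are gathered with %in%. As for the original loop: grepl() returns one logical per element of rating_3, so if() receives a condition of length 4 (a warning in older R, an error since R 4.2.0), and i <= 33 runs past the end of the six-element test shown here. find_membership is a hypothetical helper name.

```r
test <- c("country_A", "country_B", "country_C", "country_D", "country_E", "country_F")
ratings <- list(
  rating_3 = c("country_B", "country_D", "country_G", "country_K"),
  rating_4 = c("country_C", "country_E", "country_M", "country_F")
)

# Hypothetical helper: for each string, name the vector(s) that contain it.
find_membership <- function(x, groups) {
  vapply(x, function(el) {
    hits <- names(groups)[vapply(groups, function(g) el %in% g, logical(1))]
    # NA when the string is in none of the vectors; comma-joined if in several.
    if (length(hits) > 0) paste(hits, collapse = ", ") else NA_character_
  }, character(1))
}

res <- find_membership(test, ratings)
res
res[!is.na(res)]   # only the strings that were found somewhere
```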

Finding midpoint of string in R (mid character of a word)

I'd like to find the midpoint of any word after the following is done to the word:
> x = 'hello'
> y = strsplit(x, '')
> y
[[1]]
[1] "h" "e" "l" "l" "o"
> z = unlist(y)
> z
[1] "h" "e" "l" "l" "o"
This then allows:
> z[1]
[1] "h"
> z[4]
[1] "l"
Without this step, indexing does not give single letters: x is a character vector of length one, so x[1] returns the whole word and x[2] returns NA. For example:
> x = 'hello'
> strsplit(x, '')
[[1]]
[1] "h" "e" "l" "l" "o"
> x[1]
[1] "hello"
> x[2]
[1] NA
Anyway, what I want to do is find the midpoint of words in this format, so that the output would be something like:
"l"
in the case of the word "hello". In this example the word has 5 letters, so a single character can easily be designated as the midpoint, but for a word like "bake" I would like to designate both "a" and "k" together as the midpoint.
Try
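f1 <- function(str1){
N <- nchar(str1)
if(!N%%2){
# even length: take the two middle characters
res <- substr(str1, N/2, (N/2)+1)
}
else{
# odd length: median(sequence(N)) is the middle position (N+1)/2
N1 <- median(sequence(N))
res <- substr(str1, N1, N1)
}
res
}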
f1('bake')
#[1] "ak"
f1('hello')
#[1] "l"
Another option. get_middle assumes the word has already been split into characters, as per your description:
get_middle <- function(x) {
mid <- (length(x) + 1) / 2
x[unique(c(ceiling(mid), floor(mid)))]
}
Then:
words <- c("bake", "hello")
lapply(strsplit(words, ""), get_middle)
Produces:
[[1]]
[1] "k" "a"
[[2]]
[1] "l"
You could try this:
midpoint <- function(word) {
# Split the word into a vector of letters
split <- strsplit(word, "")[[1]]
# Get the number of letters in the word
n <- nchar(word)
# Get the two middle letters for words of even length,
# otherwise get the single middle letter
if (n %% 2 == 0) {
c(split[n/2], split[n/2+1])
} else {
split[ceiling(n/2)]
}
}
In the case of a word of even length, the middle two characters are returned as a vector.
midpoint("hello")
#[1] "l"
midpoint("bake")
#[1] "a" "k"
How about:
mid<-function(str)substr(str,(nchar(str)+1)%/%2,(nchar(str)+2)%/%2)
Or slightly more legibly:
mid2<-function(str){
n1<-nchar(str)+1
substr(str,n1%/%2,(n1+1)%/%2)
}
> mid("bake")
[1] "ak"
> mid("hello")
[1] "l"
This has the advantage that it immediately vectorizes:
> mid(c("bake","hello"))
[1] "ak" "l"
The first version is slower than akrun's solution for long words, but the second version (which calls nchar() only once) is faster; apparently counting characters can be costly for longer strings.
If you want the final product in a list, you can just strsplit the result:
mid3<-function(str)strsplit(mid2(str),"")
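A vectorized alternative uses ifelse(); for odd lengths, substr() truncates the non-integer position nchar(word)/2+1 down to the middle index:
word = c("bake","hello")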
print(nchar(word))
q = ifelse (nchar(word)%%2==0, substr(word,nchar(word)/2,nchar(word)/2+1),substr(word,nchar(word)/2+1,nchar(word)/2+1))
print(q)
[1] 4 5
[1] "ak" "l"

R: Recreate historical membership from a list of changes in membership

Suppose I have the current membership status of a group, i.e. names of members. Additionally, I have data on times when some new member may have been added to the group and / or an old member may have been removed from the group.
The task at hand is to recreate the membership of the group at all these points in time. I've looked around but did not find a ready solution for this problem. Does anybody know an elegant method of doing this?
Reproducible example:
Input:
periods <- 5
indx <- paste0("t-", seq_len(periods))
[1] "t-1" "t-2" "t-3" "t-4" "t-5"
current <- letters[seq_len(10)]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
incoming <- setNames(letters[seq_len(periods) + 5], indx)
incoming[2] <- NA
t-1 t-2 t-3 t-4 t-5
"f" NA "h" "i" "j"
outgoing <- setNames(letters[seq_len(periods) + 10], indx)
outgoing[4] <- NA
t-1 t-2 t-3 t-4 t-5
"k" "l" "m" NA "o"
Output:
$current
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
$`t-1`
[1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k"
$`t-2`
[1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k" "l"
$`t-3`
[1] "a" "b" "c" "d" "e" "g" "i" "j" "k" "l" "m"
$`t-4`
[1] "a" "b" "c" "d" "e" "g" "j" "k" "l" "m"
$`t-5`
[1] "a" "b" "c" "d" "e" "g" "k" "l" "m" "o"
Disclaimer: I've written a solution for this which I will be posting as my answer to the question. The intent is to document this problem and a possible solution and to elicit other ingenious and / or existing solutions or improvements.
The function create_mem_ts (membership timeseries) will generate the desired output as posted in the question.
create_mem_ts <- function (ctime, added, removed, current) {
# Create a time-series of membership of a set.
# Inputs:
## ctime: Time of changes in set.
## An atomic vector of a time-series class or otherwise,
##
## interpretable as a time-series in descending order (e.g.
## `t-1`, `t-2`, `t-3`, etc.).
##
## Is an index of when the changes in membership happened in time.
## Allows repeats but no NAs.
## added: Member(s) added to the set.
## An atomic vector or a list of the same length as ctime.
##
## If an atomic vector, represents exactly one member added at
## the corresponding ctime.
##
## If a list, represents multiple members added at corresponding
## ctime.
## removed: Member(s) removed from the set.
## An atomic vector or a list of the same length as ctime.
##
## If an atomic vector, represents exactly one member removed at
## the corresponding ctime.
##
## If a list, represents multiple members removed at the
## corresponding ctime.
## current: Current membership of the set.
## An atomic vector listing the current membership of the set.
# Output:
## A list of the same length as ctime named by values in ctime (coerced to
## character by the appropriate method).
stopifnot(is.atomic(ctime),
is.atomic(added) || is.list(added),
is.atomic(removed) || is.list(removed))
if (any(is.na(ctime))) stop("NAs not allowed in the ctime.")
stopifnot(length(ctime) == length(added),
length(added) == length(removed))
if (any(duplicated(ctime))) {
ctime.u <- unique(ctime)
ctime.f <- factor(ctime, levels=as.character(ctime.u))
added <- split(added, ctime.f)
removed <- split(removed, ctime.f)
} else {
ctime.u <- ctime
}
out <- setNames(vector(mode="list", length=length(ctime.u) + 1),
c("current", as.character(ctime.u)))
out[["current"]] <- current
for (i in 2:length(out))
out[[i]] <- union(setdiff(out[[i - 1]], added[[i - 1]]),
na.omit(removed[[i - 1]]))
attr(out, "index") <- ctime.u
out
}
Moreover, if ctime is of a valid time-series class (for example Date), the output of create_mem_ts can be used to look up the membership at any time stamp within the range of ctime with the function memship_at:
memship_at <- function (mem_ts, at) {
idx <- attr(mem_ts, "index")
stopifnot(inherits(at, class(idx)))
# idx is in descending order; mem_ts[[1]] is the current membership and
# mem_ts[[k + 1]] the membership just before the k-th change, so skip one
# state for every change that happens at or after `at`.
mem_ts[[1 + sum(idx >= at)]]
}
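Since the question also invites other approaches, here is a sketch that expresses the same backward walk as create_mem_ts with Reduce(accumulate = TRUE), assuming at most one member added and one removed per period, as in the example inputs. rebuild_membership is a hypothetical name, not an existing function.

```r
periods <- 5
indx <- paste0("t-", seq_len(periods))
current <- letters[seq_len(10)]
incoming <- setNames(letters[seq_len(periods) + 5], indx)
incoming[2] <- NA
outgoing <- setNames(letters[seq_len(periods) + 10], indx)
outgoing[4] <- NA

# Undo each period in turn, starting from the current membership:
# whoever joined in period i is dropped, whoever left is restored.
rebuild_membership <- function(current, incoming, outgoing) {
  states <- Reduce(function(members, i) {
    # NA means "no change" for that period, so it is filtered out.
    union(setdiff(members, incoming[i]), outgoing[i][!is.na(outgoing[i])])
  }, seq_along(incoming), accumulate = TRUE, init = current)
  setNames(states, c("current", names(incoming)))
}

rebuild_membership(current, incoming, outgoing)
```

Unlike create_mem_ts, this sketch does not handle repeated time stamps or several members changing in one period; it only shows the accumulate pattern.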

Excluding elements of a vector from another vector, not using setdiff

I have a character vector, and I want to exclude elements from it which are present in a second vector. I don't know how to express the negation in this case while still considering the entire vector:
vector[vector ! %in% vector2]
I can obviously do vector[vector != single_character] but that only works for a single character.
You're close:
vector[!vector %in% vector2]
or, even though you said "not using setdiff"
setdiff(vector, vector2)
A worked example with the negated %in%:
vector1 <- letters[1:4]
set.seed(001)
vector2 <- sample(letters[1:15], 10, replace=TRUE)
vector1
[1] "a" "b" "c" "d"
vector2
[1] "d" "f" "i" "n" "d" "n" "o" "j" "j" "a"
vector2 [!(vector2 %in% vector1)] # elements in vector2 that are not in vector1
[1] "f" "i" "n" "n" "o" "j" "j"
You can define a new operator,
`%ni%` = Negate(`%in%`)
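Usage of the %ni% operator, plus one practical difference worth knowing: setdiff() returns unique values, so duplicates in the first vector are collapsed, whereas the negated %in% keeps them.

```r
`%ni%` <- Negate(`%in%`)

vector1 <- c("blue", "green", "red", "blue")
vector2 <- c("green", "red")

vector1[vector1 %ni% vector2]    # "blue" "blue" (duplicates kept)
vector1[!vector1 %in% vector2]   # same result without the helper
setdiff(vector1, vector2)        # "blue" (duplicates dropped)
```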
The textclean package also offers a function for this:
library(textclean)
# master character vector
vector1 = c("blue", "green", "red")
# vector containing elements to be removed from master vector
vector2 = c("green", "red")
drop_element_fixed(vector1, vector2)
# Output:
# [1] "blue"
