Removing list items based on presence of a sub-list - r

I have a list and I would like to remove any list object with a sublist. In the example below, I would like to remove ob2 and ob5 and keep all other objects.
dat <- list(ob1 = c("a", "b", "c"),
            ob2 = list(a = c("d")),
            ob3 = c("e", "f", "g"),
            ob4 = c("h", "i", "j"),
            ob5 = list(1:3))
Can anyone offer a solution of how to do this?

We can create a condition with sapply (from base R)
dat[!sapply(dat, is.list)]
Or with Filter from base R
Filter(Negate(is.list), dat)
Or with discard
library(purrr)
discard(dat, is.list)
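As a quick check of the approaches above, here is a minimal sketch that uses only base R. It swaps sapply for vapply (a type-stable variant, my substitution rather than part of the answer) and confirms that it agrees with Filter. The result names res_vapply and res_filter are only illustrative.

```r
# Data from the question
dat <- list(ob1 = c("a", "b", "c"),
            ob2 = list(a = c("d")),
            ob3 = c("e", "f", "g"),
            ob4 = c("h", "i", "j"),
            ob5 = list(1:3))

# vapply always returns a logical vector, even if dat is empty
keep <- !vapply(dat, is.list, logical(1))
res_vapply <- dat[keep]

# Same filter expressed with Filter/Negate
res_filter <- Filter(Negate(is.list), dat)

names(res_vapply)
identical(res_vapply, res_filter)
```

Note that is.list only looks at the top level of each element, which is what the question asks for.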

Related

R How to remap letters in a string

I’d be grateful for suggestions as to how to remap letters in strings in a map-specified way.
Suppose, for instance, I want to change all As to Bs, all Bs to Ds, and all Ds to Fs. If I do it like this, it doesn’t do what I want since it applies the transformations successively:
"abc" %>% str_replace_all(c(a = "b", b = "d", d = "f"))
Here’s a way I can do what I want, but it feels a bit clunky.
f <- function (str) str_c( c(a = "b", b = "d", c = "c", d = "f") %>% .[ strsplit(str, "")[[1]] ], collapse = "" )
"abc" %>% map_chr(f)
Better ideas would be much appreciated.
James.
P.S. Forgot to specify. Sometimes I want to replace a letter with multiple letters, e.g., replace all As with the string ZZZ.
P.P.S. Ideally, this would be able to handle vectors of strings too, e.g., c("abc", "gersgaesg", etc.)
We could use chartr in base R
chartr("abc", "bdf", "abbbce")
#[1] "bdddfe"
Or a package solution is mgsub, which can also match and replace strings longer than one character
library(mgsub)
mgsub("abbbce", c("a", "b", "c"), c("b", "d", "f"))
#[1] "bdddfe"
mgsub("abbbce", c("a", "b", "c"), c("ba", "ZZZ", "f"))
#[1] "baZZZZZZZZZfe"
Maybe this is more elegant? It also prints a message when values in `from` aren't found.
library(plyr)
library(tidyverse)
mappings <- c(a = "b", b = "d", d = "f")
str_split("abc", pattern = "") %>%
  unlist() %>%
  mapvalues(from = names(mappings), to = mappings) %>%
  str_c(collapse = "")
# The following `from` values were not present in `x`: d
# [1] "bdc"
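For the P.S. and P.P.S. in the question (multi-letter replacements and vectors of strings), here is a minimal base R sketch of the same lookup idea. The function name remap_letters is hypothetical, and it assumes each key in the mapping is a single character.

```r
# Hypothetical helper: remap each character of each string through a
# named lookup vector, applying all replacements in one pass
remap_letters <- function(strings, mapping) {
  vapply(strsplit(strings, ""), function(chars) {
    hit <- chars %in% names(mapping)   # characters that have a mapping
    chars[hit] <- mapping[chars[hit]]  # look them all up at once
    paste(chars, collapse = "")
  }, character(1))
}

mapping <- c(a = "b", b = "d", d = "f")
remap_letters(c("abc", "dad"), mapping)
remap_letters("abc", c(a = "ZZZ"))
```

Because every character is looked up against the original string, an "a" that becomes "b" is not then turned into "d". Characters without an entry in the mapping are left unchanged.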

Trying to write data into newick format R

I have a dataset with different levels of branches starting from the top level: stock -> mbranch -> sbranch -> lsbranch. I want to be able to visualize these levels of my data into Newick format. I have different language groups within each stock level and would like to make different trees based off of these highest level groups.
For example my data is in the format as follows:
sample= data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
And I am trying to have an output of the newick tree format which would be something like:
tree = "(A(C(H(I(Andrea))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
plot(read.dendrogram(tree))
I'm doing this so later on I can do a distance matrix of the nodes of my outputted tree.
Would the function write.tree be able to analyze data like this and make a tree from this (assuming my actual dataset is much larger)? Or in general, a function that would output the tree format. Thanks
You can use the ape::read.tree() function to read your newick format tree
tree = "(A(C(H(I(Andrea))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
my_tree <- read.tree(text = tree)
plot(my_tree)
You can then use ape::write.tree to save the tree into a newick file:
write.tree(my_tree, file = "my_file_name.tre")
To convert your table into a "phylo" object from ape you can use this function (it might need some adjustments):
## The function
data.frame.to.phylo <- function(sample){
    ## Making an edge table
    edge_table <- rbind(
        ## The root connecting A to B
        rbind(c("root", "A"), c("root", "B")),
        ## All the nodes connecting to the tips
        cbind(sample$stock, sample$name)
    )
    ## Translating the values in the edge table into edge IDs
    ## The order must be tips, root, nodes
    element_names <- c(unique(sample$name), "root", unique(sample$stock))
    element_ids <- seq_along(element_names)
    ## Looping through each ID and name
    for(element in element_ids) {
        edge_table <- ifelse(edge_table == element_names[element], element_ids[element], edge_table)
    }
    ## Make numeric
    edge_table <- apply(edge_table, 2, as.numeric)
    ## Build the phylo object
    phylo_object <- list()
    phylo_object$edge <- edge_table
    phylo_object$tip.label <- unique(sample$name)
    phylo_object$node.label <- c("root", unique(sample$stock))
    phylo_object$Nnode <- length(phylo_object$node.label)
    ## Forcing the class to be "phylo"
    class(phylo_object) <- "phylo"
    return(phylo_object)
}
## The data
sample = data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
## Plotting the data.frame for testing the function
plot(data.frame.to.phylo(sample))
Cheers,
Thomas
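If the goal is only the Newick text with all four levels, a base R sketch along these lines might be a starting point. The function to_newick and its arguments are hypothetical and not part of ape. It writes internal node labels after the closing parenthesis, as standard Newick does, and assumes that rows with NA at a level simply skip that level.

```r
# Hypothetical helper: build a Newick string from hierarchy columns
to_newick <- function(df, levels, tip) {
  build <- function(d, depth) {
    # Past the last level: emit the tip labels
    if (depth > length(levels)) return(paste(d[[tip]], collapse = ","))
    key <- as.character(d[[levels[depth]]])
    parts <- character(0)
    # Rows with no value at this level move straight to the next level
    if (any(is.na(key))) parts <- build(d[is.na(key), , drop = FALSE], depth + 1)
    for (k in unique(key[!is.na(key)])) {
      rows <- d[!is.na(key) & key == k, , drop = FALSE]
      parts <- c(parts, paste0("(", build(rows, depth + 1), ")", k))
    }
    paste(parts, collapse = ",")
  }
  paste0("(", build(df, 1), ");")
}

sample <- data.frame(stock = c("A", "A", "B", "B", "B"),
                     mbranch = c("C", "D", "E", "F", "G"),
                     sbranch = c("H", "O", NA, "K", "L"),
                     lsbranch = c("I", "J", NA, "M", "N"),
                     name = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"),
                     stringsAsFactors = FALSE)
newick <- to_newick(sample, c("stock", "mbranch", "sbranch", "lsbranch"), "name")
cat(newick, "\n")
```

To get one tree per stock, the same function could be applied to each piece of split(sample, sample$stock). The result contains many single-child nodes; whether a given reader accepts them is something to check, and ape provides collapse.singles for removing them from a parsed tree.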

Extract variable names from list or vector in R

Assuming:
aa = c('A','B','C')
bb = c('D','E','F')
list1 = list(aa,bb)
vect1 = c(aa,bb)
Is there a way to extract the variable names as strings ('aa', 'bb') from either list1 or vect1?
Is this information being stored in lists and vectors? If not, what would be the appropriate format?
Thanks in advance!
For what you have done, the answer is no. But if you are willing to make a small change to your code, you can easily get them:
list1 <- list( aa = aa, bb = bb)
Now you can easily access, as strings, the names of the variables from which the list was formed:
names(list1)
aa = c('A','B','C')
bb = c('D','E','F')
list1 = list(aa,bb)
vect1 = c(aa,bb)
The short answer is no. If you look at the results of dput(list1) and dput(vect1), you'll see that these objects don't contain that information any more:
list(c("A", "B", "C"), c("D", "E", "F"))
c("A", "B", "C", "D", "E", "F")
There is one way you can get this information, which is when the expression has been passed to a function:
f <- function(x) {
d <- substitute(x)
n <- sapply(d[-1],deparse)
return(n)
}
f(c(aa,bb))
## [1] "aa" "bb"
However, it would be good to have more context about what you want to do.
You can also get there by adapting vect1 using cbind:
vect1 = cbind(aa,bb)
colnames(vect1)
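Another way to keep the names, sketched here with base R only, is to build the list with mget, which names each element after the variable it was fetched from. The envir argument is spelled out only to make the lookup location explicit.

```r
aa <- c("A", "B", "C")
bb <- c("D", "E", "F")

# mget() returns a list named after the variables it looked up
list1 <- mget(c("aa", "bb"), envir = environment())
names(list1)

# unlist() carries the list names into the vector, with an index appended
vect1 <- unlist(list1)
names(vect1)
```

The vector names come out as aa1, aa2, aa3, bb1, and so on, so the originating variable can be recovered by stripping the trailing digits.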

R, Replicating the rownames in data.frame

I have a data.frame with dimensions [6587, 37], and the row names must repeat after every 18 rows. How can I do this in RStudio?
If your 18 row names are:
mynames <- c("a", "b", "c", "d", "e", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s")
You can get what you want with:
paste0(rep(mynames,length.out=6587),rep(1:366,each=18,length.out=6587))
Or you can build the names differently by pasting other things onto them.
Row names in data.frames have to be unique.
> df <- data.frame(x = 1:2)
> rownames(df) <- c("a", "a")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘a’
You could use make.names to make the names unique, but still carry some repeating information.
> make.names(c("a","a"), unique = TRUE)
[1] "a" "a.1"
The repeated names could then be identified with help from grep.
Or you could add a column to df, or keep a second data.frame, that holds the repeating information.
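The two suggestions above can be combined, as in this minimal sketch: keep the repeating label in an ordinary column and derive unique row names from it with make.unique. The data frame here is a small hypothetical stand-in (40 rows, labels a to r), not the 6587-row data from the question.

```r
labels <- letters[1:18]          # hypothetical set of 18 repeating labels
n_rows <- 40                     # stand-in for the real number of rows
df <- data.frame(x = seq_len(n_rows))

# Repeating label stored as a regular column (duplicates are fine here)
df$label <- rep(labels, length.out = n_rows)

# Unique row names that still show the label: a, ..., r, a_1, ..., r_1, a_2, ...
rownames(df) <- make.unique(df$label, sep = "_")

head(rownames(df), 20)
```

Filtering on the label column (for example df[df$label == "a", ]) then avoids having to pattern-match the row names at all.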

R - combining columns by specific conditions

I currently have a data frame as follows:
groups <- data.frame(name=paste("person",c(1:27),sep=""),
                     assignment1 = c("F","A","B","H", "A", "E", "D", "G", "I", "I", "E", "A", "D", "C", "F", "C", "D", "H", "F", "H", "G", "I", "G", "C", "B", "E", "B"),
                     assignment2 = c("H", "F", "F", "D", "E", "G", "A", "E", "I", "C", "A", "H", "G", "B", "I", "C", "E", "I", "C", "A", "B", "B", "G", "D", "H", "F", "D"),stringsAsFactors = FALSE)
I would like to create a list for each person that contains only the people he or she has already worked with. For example, person1 is in group F for the 1st assignment and group H for the 2nd assignment.
The members of group F on the 1st assignment are {"person1","person15", "person19"}.
The members of group H on the 2nd assignment are {"person1","person12", "person25"}.
I would like to create a vector for person1 like
{"person15", "person19", "person12", "person25"}.
Does anyone know a convenient way to do this in R?
Any help will be appreciated. Thanks in advance.
You could do this:
teammates <- lapply(1:nrow(groups), function(i) {
  assig1 <- subset(groups, assignment1 == groups$assignment1[i])$name
  assig2 <- subset(groups, assignment2 == groups$assignment2[i])$name
  unq_set <- unique(c(assig1, assig2))
  return(setdiff(unq_set, groups$name[i]))
})
This iterates over the row indices and, for each row, a) gets the names of everyone whose assignment1 or assignment2 group matches that row's, b) takes the unique union of these names, and c) returns that set minus the name of the person around whom the group is built.
The output is a list like this:
[[1]]
[1] "person15" "person19" "person12" "person25"
[[2]]
[1] "person5" "person12" "person3" "person26"
[[3]]
[1] "person25" "person27" "person2" "person26"
...and so on
More briefly, the following is equivalent (though the order within list items may differ). It uses the same subsetting logic as @user5219763's answer, but the setdiff part is important because it removes each person from their own list.
teammates <- lapply(1:nrow(groups), function(i) {
  setdiff(
    with(groups, name[assignment1 == assignment1[i] |
                      assignment2 == assignment2[i]]),
    groups$name[i])
})
Here's a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
groups %>%
  gather(var, val, -name) %>%
  unite(comb, var, val) %>%
  left_join(., ., by = 'comb') %>%
  group_by(name.x) %>%
  summarise(out = list(name.y))
The heavy lifting is done by the left_join. Before that, we combine the columns so that we can merge on e.g. assignment1_F. Each person's output includes the person themselves and is not de-duplicated; that is up to you.
However, as @akrun says, if you are doing a lot of this stuff, use igraph.
You could use is.element()
workedWith <- function(index, data = groups) {
  # Note: the result also includes the person at `index`
  data[is.element(data[, 2], data[index, 2]) | is.element(data[, 3], data[index, 3]), 1]
}
lapply(X = seq_len(nrow(groups)), FUN = workedWith)
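A variant of the answers above, sketched with base R only: split the names by group once, then look each person's two groups up. The names by_a1 and by_a2 are illustrative. Using Map with groups$name as the first argument also names the result, so entries can be accessed by person.

```r
groups <- data.frame(
  name = paste0("person", 1:27),
  assignment1 = c("F","A","B","H","A","E","D","G","I","I","E","A","D","C",
                  "F","C","D","H","F","H","G","I","G","C","B","E","B"),
  assignment2 = c("H","F","F","D","E","G","A","E","I","C","A","H","G","B",
                  "I","C","E","I","C","A","B","B","G","D","H","F","D"),
  stringsAsFactors = FALSE)

# Group membership per assignment, computed once
by_a1 <- split(groups$name, groups$assignment1)
by_a2 <- split(groups$name, groups$assignment2)

# For each person: union of both groups, minus the person themselves
teammates <- Map(function(nm, g1, g2) setdiff(union(by_a1[[g1]], by_a2[[g2]]), nm),
                 groups$name, groups$assignment1, groups$assignment2)

teammates$person1
```

This avoids rescanning the whole data frame for every row, which may matter only for much larger inputs.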
