Select list elements based on their names - r

I have a named list of vectors that represent events originated from 2 samples, "A" and "B":
l.temp <- list(
SF1_t_A = c(rep(1:10)),
SF2_t_A = c(rep(9:15)),
SF1_t_B = c(rep(8:12)))
l.temp
$SF1_t_A
[1] 1 2 3 4 5 6 7 8 9 10
$SF2_t_A
[1] 9 10 11 12 13 14 15
$SF1_t_B
[1] 8 9 10 11 12
Now I want to select only the elements of the list that come from either sample "A" or sample "B". I could do it with a loop, but that rather defeats the point of using a list when plyr is available. This, and variations of it, is what I've tried so far:
llply(l.temp , function(l){
if ((unlist(strsplit(names(l), "_"))[3]) == "A"){
return(l)}
})
This is the error I am getting:
Error in unlist(strsplit(names(l), "_")) :
error in evaluating the argument 'x' in selecting a method for function 'unlist':
Error in strsplit(names(l), "_") : non-character argument
Help on what I am doing wrong is appreciated.

You can find the pattern in the names of the list, which gives you an index of which ones:
grep("_A$", names(l.temp))
And then use it to subset:
l.temp[grep("_A$", names(l.temp))]
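As for why the llply() attempt fails: the function receives each element's value (a bare integer vector), not its name, so names(l) is NULL inside the function and strsplit() is handed a non-character argument. Below is a minimal base R sketch of the name-based approach; `keep`, `sample_a` and `by_sample` are illustrative names that are not in the original post.

```r
# Same data as in the question.
l.temp <- list(
  SF1_t_A = 1:10,
  SF2_t_A = 9:15,
  SF1_t_B = 8:12
)

# Logical index: TRUE where the element name ends in "_A".
keep <- grepl("_A$", names(l.temp))
sample_a <- l.temp[keep]
names(sample_a)

# Alternatively, split the whole list by the sample suffix in one step.
# sub() strips everything up to the last underscore, leaving "A" or "B".
by_sample <- split(l.temp, sub(".*_", "", names(l.temp)))
names(by_sample)
```

The split() variant is handy when both samples are needed, since by_sample$A and by_sample$B are then available without repeating the pattern match.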

How to extract outstanding values from an object returned by waldo::compare()?

I'm trying to use a new R package called waldo (see at the tidyverse blog too) that is designed to compare data objects to find differences. The waldo::compare() function returns an object that is, according to the documentation:
a character vector with class "waldo_compare"
The main purpose of this function is to be used within the console, leveraging coloring features to highlight outstanding values that are not equal between data objects. However, while just examining in console is useful, I do want to take those values and act on them (filter them out from the data, etc.). Therefore, I want to programmatically extract the outstanding values. I don't know how.
Example
Generate a vector of length 10:
set.seed(2020)
vec_a <- sample(0:20, size = 10)
## [1] 3 15 13 0 16 11 10 12 6 18
Create a duplicate vector and append an additional value (4) as an 11th element:
vec_b <- vec_a
vec_b[11] <- 4
vec_b <- as.integer(vec_b)
## [1] 3 15 13 0 16 11 10 12 6 18 4
Use waldo::compare() to test the differences between the two vectors
waldo::compare(vec_a, vec_b)
## `old[8:10]`: 12 6 18
## `new[8:11]`: 12 6 18 4
The beauty is that it's highlighted in the console:
But now, how do I extract the different value?
I can try to assign waldo::compare() to an object:
waldo_diff <- waldo::compare(vec_a, vec_b)
And then what? When I try waldo_diff[[1]] I get:
[1] "`old[8:10]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \n`new[8:11]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \033[34m4\033[39m"
and for waldo_diff[[2]] it's even worse:
Error in waldo_diff[3] : subscript out of bounds
Any idea how I could programmatically extract the outstanding values that appear in the "new" vector but not in the "old"?
Disclaimer: I didn't know anything about this package until you posted, so this is far from an authoritative answer. You can't easily extract the differing values from the result of compare(), because it returns an ANSI-formatted string ready for pretty printing. Instead, the workhorses for vectors seem to be the internal functions ses() and ses_context(), which return the indices of the differences between the two objects. The difference between the two seems to be that ses_context() splits the result into a list of non-contiguous differences.
waldo:::ses(vec_a, vec_b)
# A tibble: 1 x 5
     x1    x2 t        y1    y2
  <int> <int> <chr> <int> <int>
1    10    10 a        11    11
The results show that there is an addition in the new vector beginning and ending at position 11.
The following simple function is very limited in scope and assumes that only additions in the new vector are of interest:
new_diff_additions <- function(x, y) {
  res <- waldo:::ses(x, y)
  res <- res[res$t == "a", ] # keep only additions
  if (nrow(res) == 0) {
    return(NULL)
  } else {
    Map(function(start, end) {
      d <- y[start:end]
      `attributes<-`(d, list(start = start, end = end))
    },
    res[["y1"]], res[["y2"]])
  }
}
new_diff_additions(vec_a, vec_b)
[[1]]
[1] 4
attr(,"start")
[1] 11
attr(,"end")
[1] 11
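If all that is needed is a readable, uncoloured version of the text that compare() returned, the ANSI colour codes can be stripped with a regular expression. This is only a sketch in base R: it works on the string shown in the question (copied here as `raw`, a name chosen for illustration) and yields plain text, not the differing values as data, so the ses() approach above remains the better route for programmatic use.

```r
# The string returned by waldo_diff[[1]] in the question, copied verbatim.
raw <- "`old[8:10]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \n`new[8:11]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \033[34m4\033[39m"

# Remove ANSI colour sequences: ESC, "[", digits or semicolons, then "m".
plain <- gsub("\033\\[[0-9;]*m", "", raw)

# One element per displayed line.
lines <- strsplit(plain, "\n", fixed = TRUE)[[1]]
lines
```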
At least for the simple case of comparing two vectors, you’ll be better off
using diffobj::ses_dat() (which is from the package that waldo uses
under the hood) directly:
waldo::compare(1:3, 2:4)
#> `old`: 1 2 3
#> `new`: 2 3 4
diffobj::ses_dat(1:3, 2:4)
#>       op val id.a id.b
#> 1 Delete   1    1   NA
#> 2  Match   2    2   NA
#> 3  Match   3    3   NA
#> 4 Insert   4   NA    3
For completeness, to extract additions you could do e.g.:
extract_additions <- function(x, y) {
  ses <- diffobj::ses_dat(x, y)
  y[ses$id.b[ses$op == "Insert"]]
}
old <- 1:3
new <- 2:4
extract_additions(old, new)
#> [1] 4
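When neither waldo's internals nor diffobj are available, and only the values present in the new vector but absent from the old one matter, plain set operations may be enough. This is a different technique from the edit-script approach above, not a replacement for it: it ignores order and duplicates, so it is only a rough sketch under the assumption that the values are unique. The vectors are retyped from the question's printed output, and `added_values` and `added_positions` are illustrative names.

```r
# Vectors from the question, retyped from the printed output.
vec_a <- c(3L, 15L, 13L, 0L, 16L, 11L, 10L, 12L, 6L, 18L)
vec_b <- c(vec_a, 4L)

# Values that occur in the new vector but not in the old one.
added_values <- setdiff(vec_b, vec_a)

# Positions in the new vector holding values unknown to the old one.
added_positions <- which(!vec_b %in% vec_a)

added_values
added_positions
```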

R turns character strings containing "NA" into missing values (e.g. BDNA3 --> NA) - how to deal with this?

I have been struggling with this in R for two days without finding a solution.
Here is my problem:
I have a list of symbols extracted from one data frame: annotation$"SYMBOL".
I would like to bind it to another data frame, called "matrix", and assign the symbols as row names.
I extracted the column and bound it without problems. However, once this was done, setting the symbols as row names doesn't work, because about 5000 of the 15000 genes are then changed to "NA".
I realized that all the genes with "NA" in their symbol are being treated as missing values.
I tried converting them with as.character(annotation$"SYMBOL"), but that doesn't change anything.
Here:
X=as.character(annotation$"SYMBOL")
summary(X)
Length Class Mode
16978 character character
unique (unlist (lapply (as.character(annotation$"SYMBOL"), function (x) which (is.na (x)))))
[1] 1
Y=na.exclude(X)
summary(Y)
Length Class Mode
9954 character character
U=na.exclude(annotation$"SYMBOL")
Error in `$<-.data.frame`(`*tmp*`, "SYMBOL", value = c("SCYL3", "C1orf112", :
replacement has 9954 rows, data has 16978
I know that all the genes with "NA" in their names are being replaced with NA.
Does anyone have an idea how to get around this?
For example, numbers 11 and 15 in this image are deleted when I use the na.omit() function.
To set your NA values, use df[df == "NA"] <- NA. I tried this on your test dataset and it produced the desired result. You can then call na.omit() on df to remove the rows that now contain NA. You didn't supply working code, so here is an outline of what yours should look like:
df <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
df
   X1       X2
1   1    SCYL3
2   2 C1orf112
3   3      FGR
4   4      CFH
5   5    STPG1
6   6   NIPAL3
7   7      AK2
8   8    KDM1A
9   9    TTC22
10 10     ST7L
11 11  DNAJC11
12 12     FMO3
13 13     E2F2
14 14   CDK11A
15 15     NADK
16 16    CSDE1
17 17    MASP2
df[df == "NA"] <- NA
After this, is.na(df) returns FALSE for every cell, because none of the symbols is exactly "NA". If you later add data that really is NA, you can drop that row with na.omit(df).
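The distinction this relies on is between a symbol that merely contains the letters "NA", the literal string "NA", and a true missing value. A small sketch with made-up data (the vector `symbols` is hypothetical, not the asker's annotation table):

```r
# Two real gene symbols, the literal string "NA", and a true missing value.
symbols <- c("DNAJC11", "NADK", "NA", NA)

# Only the true missing value is NA; containing the letters "NA" is harmless.
before <- is.na(symbols)
before

# Convert the literal string "NA" to a real missing value.
# %in% is used instead of == so the existing NA does not produce an NA index.
symbols[symbols %in% "NA"] <- NA
after <- is.na(symbols)
after

# Dropping missing values keeps both gene symbols.
kept <- symbols[!is.na(symbols)]
kept
```

If symbols such as DNAJC11 or NADK really do come out as NA, the cause is probably elsewhere in the pipeline, for example in how the two tables are matched, and that would be worth checking before removing rows.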

Error: invalid subscript type 'list' in R

I'm having an issue here: I'm writing a function that uses the ellipsis (...) parameter to handle a varying number of arguments. I recreated a similar situation to show the issue I keep bumping into:
> d <- data.frame(alpha=1:3, beta=4:6, gamma=7:9)
> d
  alpha beta gamma
1     1    4     7
2     2    5     8
3     3    6     9
> x <- list("alpha", "beta")
> rowSums(d[,c(x)])
Error in .subset(x, j) : invalid subscript type 'list'
How do I deal with the issue of feeding a list into a subset call?
We need to use c() (concatenate) to create a vector instead of a list:
x <- c("alpha", "beta")
rowSums(d[x])
#[1] 5 7 9
If we are starting from a list, unlist() it to create a vector, because a data.frame is subset by a vector of column names (or column indices) for columns, or of row names (or row indices) for rows:
x <- list("alpha", "beta")
rowSums(d[unlist(x)])
#[1] 5 7 9
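Since the question is about a function that takes its column names through the ellipsis, here is a minimal sketch of how that might look. The function name `sum_columns` is hypothetical; the key step is flattening list(...) with unlist() before subsetting.

```r
# Hypothetical helper: sums the named columns of a data frame row by row.
sum_columns <- function(data, ...) {
  # list(...) captures the extra arguments; unlist() flattens them
  # into a character vector that can be used as a column subscript.
  cols <- unlist(list(...))
  rowSums(data[cols])
}

d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9)

# Column names passed as separate arguments.
result <- sum_columns(d, "alpha", "beta")
result

# A list of names works too, because unlist() flattens it as well.
result_from_list <- sum_columns(d, list("alpha", "beta"))
```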

Assign character for names of vectors in R

I would like to know how to use a character string as the name of a vector in R.
For example:
hk=0.55
paste0("rr",hk)
[1] "rr0.55"
Now I'd like to do:
paste0("rr",hk)<-c(1:10)
Error in paste0("rr", scale) <- c(1:10) :
Target of assignment expands to an object outside language
That is, I want to end up with the vector defined like this:
> rr0.55<-c(1:10)
> rr0.55
[1] 1 2 3 4 5 6 7 8 9 10
Thank you for your help.
Use assign:
assign(paste0("rr",hk), c(1:10))
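A short sketch of the round trip: assign() creates the variable from a string and get() reads it back by the same string. The named-list variant at the end is an alternative that is often easier to manage when many such vectors are generated; `var_name` and `results` are illustrative names.

```r
hk <- 0.55
var_name <- paste0("rr", hk) # "rr0.55"

# Create a variable whose name is the string held in var_name.
assign(var_name, 1:10)

# Read it back by name, or directly now that it exists.
get(var_name)
rr0.55

# Alternative: keep the vectors in a named list instead of separate variables.
results <- list()
results[[var_name]] <- 1:10
names(results)
```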

Example Needed: Change the default print method of an object

I need a bit of help with jargon, and a short piece of example code. Different types of objects have a specific way of outputting themselves when you type the name of the object and hit Enter: an lm object shows a summary of the model, and a vector lists its contents.
I'd like to be able to write my own way of "showing" the contents of a specific type of object. Ideally, I'd like to be able to separate this from existing types of objects.
How would I go about doing this?
Here's an example to get you started. Once you get the basic idea of how S3 methods are dispatched, have a look at any of the print methods returned by methods("print") to see how you can achieve more interesting print styles.
## Define a print method that will be automatically dispatched when print()
## is called on an object of class "myMatrix"
print.myMatrix <- function(x, ...) {
  n <- nrow(x)
  for (i in seq_len(n)) {
    cat(paste("This is row", i, "\t: "))
    cat(x[i, ], "\n")
  }
  invisible(x) # print methods conventionally return their argument invisibly
}
## Make a couple of example matrices
m <- mm <- matrix(1:16, ncol=4)
## Create an object of class "myMatrix".
class(m) <- c("myMatrix", class(m))
## When typed at the command-line, the 'print' part of the read-eval-print loop
## will look at the object's class, and say "hey, I've got a method for you!"
m
# This is row 1 : 1 5 9 13
# This is row 2 : 2 6 10 14
# This is row 3 : 3 7 11 15
# This is row 4 : 4 8 12 16
## Alternatively, you can specify the print method yourself.
print.myMatrix(mm)
# This is row 1 : 1 5 9 13
# This is row 2 : 2 6 10 14
# This is row 3 : 3 7 11 15
# This is row 4 : 4 8 12 16
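To keep a custom display fully separate from existing object types, the same mechanism can be applied to a brand-new S3 class. The sketch below is purely illustrative: the class "reading" and the constructor `new_reading` are invented for this example.

```r
# Hypothetical constructor for a new S3 class called "reading".
new_reading <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "reading")
}

# Print method: dispatched automatically for objects of class "reading".
print.reading <- function(x, ...) {
  cat("<reading> ", x$value, " ", x$unit, "\n", sep = "")
  invisible(x)
}

r <- new_reading(21.5, "C")

# Typing the object's name at the console calls print(), which dispatches
# to print.reading(); capture.output() records what would be shown.
shown <- capture.output(print(r))
shown
```

Because the class is new, no existing object is affected, and removing the class attribute with unclass(r) falls back to the default list display.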
