I have two rather long lists (both are 232000 rows). When trying to run analyses using both, R is giving me an error that some elements in one lists are not in the other (for a particular code to run, both lists need to be exactly the same). I have done the following to try and decipher this:
#In Both
both <- varss %in% varsg
length(both)
#What is in Both
int <- intersect(varss,varsg)
length(int)
#What is different in varss
difs <- setdiff(varss,varsg)
length(difs)
#What is different in varsg
difg <- setdiff(varsg,varss)
length(difg)
I think I have the code right, but my problem is that the results from the code above are not yielding what I need. For instance, for both <- varss %in% varsg I only get a single FALSE. Do both my lists need to be in a specific class in order for this to work? I've tried data.frame, list and character. Not sure whether anything major like a function needs to be applied.
Just to give a little bit more information about my lists, both are a list of SNP names (genetic data)
Edit:
I have loaded these two files as readRDS() and not sure whether this might be causing some problems. When trying to just use varss[1:10,] i get the following info:
[1] rs41531144 rs41323649 exm2263307 rs41528348 exm2216184 rs3901846
[7] exm2216185 exm2216186 exm2216191 exm2216198
232334 Levels: exm1000006 exm1000025 exm1000032 exm1000038 ... rs9990343
I have little experience with RData files, so not sure whether this is a problem or not...
Same happens with using varsg[1:10,] :
[1] exm2268640 exm41 exm1916089 exm44 exm46 exm47
[7] exm51 exm53 exm55 exm56
232334 Levels: exm1000006 exm1000025 exm1000032 exm1000038 ... rs999943
All of the functions you have shown do not play well with lists or data.frames, e.g:
varss <- list(a = 1:8)
varsg <- list(a = 2:9)
both <- varss %in% varsg
both
# [1] FALSE
#What is in Both
int <- intersect(varss,varsg)
int
# list()
#What is different in varss
difs <- setdiff(varss,varsg)
difs
# [[1]]
# [1] 1 2 3 4 5 6 7 8
#What is different in varsg
difg <- setdiff(varsg,varss)
difg
# [[1]]
# [1] 2 3 4 5 6 7 8 9
I suggest you switch to vectors by doing:
varss <- unlist(varss)
varsg <- unlist(varsg)
Related
I am trying to understand how to succinctly implement something like the argument capture/parsing/evaluation mechanism that enables the following behavior with dplyr::tibble() (FKA dplyr::data_frame()):
# `b` finds `a` in previous arg
dplyr::tibble(a=1:5, b=a+1)
## a b
## 1 2
## 2 3
## ...
# `b` can't find `a` bc it doesn't exist yet
dplyr::tibble(b=a+1, a=1:5)
## Error in eval_tidy(xs[[i]], unique_output) : object 'a' not found
With base:: classes like data.frame and list, this isn't possible (maybe bc arguments aren't interpreted sequentially(?) and/or maybe bc they get evaluated in the parent environment(?)):
data.frame(a=1:5, b=a+1)
## Error in data.frame(a = 1:5, b = a + 1) : object 'a' not found
list(a=1:5, b=a+1)
## Error: object 'a' not found
So my question is: what might be a good strategy in base R to write a function list2() that is just like base::list() except that it allows tibble() behavior like list2(a=1:5, b=a+1)??
I'm aware that this is part of what "tidyeval" does, but I am interested in isolating the exact mechanism that makes this trick possible. And I'm aware that one could just say list(a <- 1:5, b <- a+1), but I am looking for a solution that does not use global assignment.
What I've been thinking so far: One inelegant and unsafe way to achieve the desired behavior would be the following -- first parse the arguments into strings, then create an environment, add each element to that environment, put them into a list, and return (suggestions for better ways to parse ... into a named list appreciated!):
list2 <- function(...){
# (gross bc we are converting code to strings and then back again)
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
# (terrible bc commas aren't allowed except to separate args!!!)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# (icky bc all args must have names)
for (arg in argstrings){
eval(parse(text=arg), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
This allows us to derive the basic behavior, but it doesn't generalize well at all (see comments in list2() definition):
list2(a=1:5, b=a+1)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We could introduce hacks to fix little things like producing names when they aren't supplied, e.g. like this:
# (still gross but at least we don't have to supply names for everything)
list3 <- function(...){
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# if a name isn't supplied, create one of the form `v1`, `v2`, ...
ctr <- 0
for (arg in argstrings){
ctr <- ctr+1
if (grepl("^[a-zA-Z_] ?= ?", arg))
eval(parse(text=arg), envir=env)
else
eval(parse(text=paste0("v", ctr, "=", arg)), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
Then instead of this:
# evaluates `a+b-2`, but doesn't include in `env`
list2(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We get this:
list3(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
##
## $v3
## [1] 1 3 5 7 9
But it feels like there will still be problematic edge cases even if we fix the issue with commas, with names, etc.
Anyone have any ideas/suggestions/insights/solutions/etc.??
Many thanks!
The reason data.frame(a=1:5, b=a+1) doesn't work is a scoping issue, not an evaluation order issue.
Arguments to a function are normally evaluated in the calling frame. When you say a+1, you are referring to the variable a in the frame that made the call to data.frame, not the column that you are about to create.
dplyr::data_frame does very non-standard evaluation, so it can mix up frames as you saw. It appears to look first in the frame corresponding to the object that is under construction, and second in the usual place.
One way to use the dplyr semantics with a base function is to do both,
e.g.
do.call(data.frame, as.list(dplyr::data_frame(a = 1:5, b = a+1)))
but this is kind of useless: you can convert a tibble to a dataframe directly, and this can't be used with other base functions, since it forces all arguments to the same length.
To write your list2 function, I'd recommend looking at the source of dplyr::data_frame, and do everything it does except the final conversion to a tibble. It's source is deceptively short:
function (...)
{
xs <- quos(..., .named = TRUE)
as_tibble(lst_quos(xs, expand = TRUE))
}
This is deceptive, because lst_quos is a private function in the tibble package, so you'll need your own copy of that, plus any private functions it calls, etc. Unless of course you don't mind using private functions, then here's your list2:
list2 <- function(...) {
xs <- rlang::quos(..., .named = TRUE)
tibble:::lst_quos(xs, expand = TRUE)
}
This will work until the tibble maintainer chooses to change lst_quos, which he's free to do without warning (since it's private). It wouldn't be acceptable code in a CRAN package because of this fragility.
In the R environment, I have already have some variable, their name:
id_01_r
id_02_l
id_05_l
id_06_r
id_07_l
id_09_1
id_11_l
So, their pattern seems like id_ and follows two figures, then _ and r or l randomly.
Each of them corresponds to one frame but different dim() output.
Also, there are some other variables in the environment, so first I should extract these frames. For this, I'm going to adopt:
> a <- list(ls()[grep("id*",ls())])` #a little sample for just id* I know
But, this function put them as one element, so I don't think it's good way
> length(a) [1] 1
I know how to read them in like below, but now for extact and same processes, I'm so confused.
i_set <- Sys.glob(paths='mypath/////id*.txt')
for (i in i_set) {
assign(substring(i, startx, endx),read.table(file=i,header=F))
}
Here, the key point is I want to do a series of same data processing for each of these frames. But based on these, what can I do instead of one by one?
Thanks your kind consideration.
Here is an example:
id_01_r <- iris
id_02_l <- mtcars
foo <- 42
vars <- grep("^id_\\d{2}_[rl]$", ls(), value = TRUE)
# [1] "id_01_r" "id_02_l"
process_data <- function(df) {
dim(df)
}
processed_data <- lapply(
mget(vars),
process_data
)
# $id_01_r
# [1] 150 5
#
# $id_02_l
# [1] 32 11
I am using the following code in a loop, I am just replicating the part which I am facing the problem in. The entire code is extremely long and I have removed parts which are running fine in between these lines. This is just to explain the problem:
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
assign(paste("numeric_data",j,sep="_"),
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE))
}
}
The problem that I am facing is that instead of assign in the second step, I want to use (eval+paste)
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
eval(as.symbol((paste("numeric_data",j,sep="_"))))<-
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE)
}
}
However R does not accept eval while assigning new variables. I looked at the forum and everywhere assign is suggested to solve the problem. However, if I use assign the loop overwrites my previously generated "numeric_data" instead of adding to it, hence I get output for only one value of i instead of both.
Here is a very basic intro to one of the most fundamental data structures in R. I highly recommend reading more about them in standard documentation sources.
#A list is a (possible named) set of objects
numeric_data <- list(A1 = 1, A2 = 2)
#I can refer to elements by name or by position, e.g. numeric_data[[1]]
> numeric_data[["A1"]]
[1] 1
#I can add elements to a list with a particular name
> numeric_data <- list()
> numeric_data[["A1"]] <- 1
> numeric_data[["A2"]] <- 2
> numeric_data
$A1
[1] 1
$A2
[1] 2
#I can refer to named elements by building the name with paste()
> numeric_data[[paste0("A",1)]]
[1] 1
#I can change all the names at once...
> numeric_data <- setNames(numeric_data,paste0("B",1:2))
> numeric_data
$B1
[1] 1
$B2
[1] 2
#...in multiple ways
> names(numeric_data) <- paste0("C",1:2)
> numeric_data
$C1
[1] 1
$C2
[1] 2
Basically, the lesson is that if you have objects with names with numeric suffixes: object_1, object_2, etc. they should almost always be elements in a single list with names that you can easily construct and refer to.
I have a list which contains list entries, and I need to transpose the structure.
The original structure is rectangular, but the names in the sub-lists do not match.
Here is an example:
ax <- data.frame(a=1,x=2)
ay <- data.frame(a=3,y=4)
bw <- data.frame(b=5,w=6)
bz <- data.frame(b=7,z=8)
before <- list( a=list(x=ax, y=ay), b=list(w=bw, z=bz))
What I want:
after <- list(w.x=list(a=ax, b=bw), y.z=list(a=ay, b=bz))
I do not care about the names of the resultant list (at any level).
Clearly this can be done explicitly:
after <- list(x.w=list(a=before$a$x, b=before$b$w), y.z=list(a=before$a$y, b=before$b$z))
but this is ugly and only works for a 2x2 structure. What's the idiomatic way of doing this?
The following piece of code will create a list with i-th element of every list in before:
lapply(before, "[[", i)
Now you just have to do
n <- length(before[[1]]) # assuming all lists in before have the same length
lapply(1:n, function(i) lapply(before, "[[", i))
and it should give you what you want. It's not very efficient (travels every list many times), and you can probably make it more efficient by keeping pointers to current list elements, so please decide whether this is good enough for you.
The purrr package now makes this process really easy:
library(purrr)
before %>% transpose()
## $x
## $x$a
## a x
## 1 1 2
##
## $x$b
## b w
## 1 5 6
##
##
## $y
## $y$a
## a y
## 1 3 4
##
## $y$b
## b z
## 1 7 8
Here's a different idea - use the fact that data.table can store data.frame's (in fact, given your question, maybe you don't even need to work with lists of lists and could just work with data.table's):
library(data.table)
dt = as.data.table(before)
after = as.list(data.table(t(dt)))
While this is an old question, i found it while searching for the same problem, and the second hit on google had a much more elegant solution in my opinion:
list_of_lists <- list(a=list(x="ax", y="ay"), b=list(w="bw", z="bz"))
new <- do.call(rbind, list_of_lists)
new is now a rectangular structure, a strange object: A list with a dimension attribute. It works with as many elements as you wish, as long as every sublist has the same length. To change it into a more common R-Object, one could for example create a matrix like this:
new.dims <- dim(new)
matrix(new,nrow = new.dims[1])
new.dims needed to be saved, as the matrix() function deletes the attribute of the list. Another way:
new <- do.call(c, new)
dim(new) <- new.dims
You can now for example convert it into a data.frame with as.data.frame() and split it into columns or do column wise operations. Before you do that, you could also change the dim attribute of the matrix, if it fits your needs better.
I found myself with this problem but I needed a solution that kept the names of each element. The solution I came up with should also work when the sub lists are not all the same length.
invertList = function(l){
elemnames = NULL
for (i in seq_along(l)){
elemnames = c(elemnames, names(l[[i]]))
}
elemnames = unique(elemnames)
res = list()
for (i in seq_along(elemnames)){
res[[elemnames[i]]] = list()
for (j in seq_along(l)){
if(exists(elemnames[i], l[[j]], inherits = F)){
res[[i]][[names(l)[j]]] = l[[names(l)[j]]][[elemnames[i]]]
}
}
}
res
}
I am using RStudio (ver. 3.0.2) and new to R.
In my task, would like to create the below variables in RScript and assign a value.
AA_1_11 <- 1
AA_2_22 <- 2
AA_3_33 <- 3
AA_4_44 <- 4
BB_1_11 <- 5
BB_2_22 <- 6
BB_3_33 <- 7
BB_4_44 <- 8
Rather to statically create the 8 lines above. I am intending to dynamically code those so that the codes are useful when the variables expand in future i.e AA,BB, could later extend to CC, DD etc.
Tried the below code:
char_list <- c("AA", "BB")
num_list <- c("1_11", "2_22", "3_33", "4_44")
i=1
char_num_array <-vector()
for (char in char_list) {
for (num in num_list) {
char_num_array[i] <- paste(char,num, sep = '_')
i <- i + 1
}
}
(char_num_array)
for (j in 1:length(char_num_array)){
char_num_array[j] <- j
}
Though I can create all the variables names as a string name, have no idea why the actual variables are not created in RStudio as shown in the picture below.
Hope you can guide me on this.
Update:
Thanks to all for your help. I really wanted to make separate variables so I can monitor the changes in the values on RStudio as the variables will continually be used for other manipulations i.e fitting, refitting etc.
Thanks to JDB. I have modified the codes to below, where i could be replaced with other mathematical equation.
char_list <- c("AA", "BB")
num_list <- c("1_11", "2_22", "3_33", "4_44")
i <- 1
for (char in char_list) {
for (num in num_list) {
assign(paste(char,num, sep = '_'), i)
i <- i + 1
}
}
You don't need a loop for this. The preferred "R way" to do this type of operation is to keep all these variables in a list. Along with keeping the global environment "clean", this makes it easier to refer to the variables, and also makes it easier to perform operations on them all at once later.
For example, for this operation you can do
char_list <- c("AA", "BB")
num_list <- c("1_11", "2_22", "3_33", "4_44")
x <- setNames(
as.list(1:8),
paste(rep(char_list, each=4), num_list, sep="_")
)
Now the variables you want are all stored in x
names(x)
# [1] "AA_1_11" "AA_2_22" "AA_3_33" "AA_4_44" "BB_1_11"
# [6] "BB_2_22" "BB_3_33" "BB_4_44"
and can be accessed by name with $, by name or number with [ and [[, with with(), etc...
x$AA_1_11
# [1] 1
x[[5]]
# [1] 5
with(x, c(AA_1_11, BB_4_44))
# [1] 1 8
There's no point in your code where you're telling R to create those variables. Your first loop DOES succeed in creating a vector list (a string) of the names you desire, but it's not assigning them any values as variables.
R is pretty good at dynamic variable creation, however. Why do you want to assign the same value to a bunch of different variables? Could you create a data frame, for example, with these variables?
array <- data.frame(variables = numeric(length(char_num_array)))
rownames(array) <- char_num_array
array
variables
AA_2_22 0
AA_3_33 0
AA_4_44 0
BB_1_11 0
BB_2_22 0
BB_3_33 0
BB_4_44 0
If you really want to create these as separate variables, you can use assign():
for(item in char_num_array) assign(item, 1)