Make a list from ls(pattern="") [R] - r

The ls(pattern="") function is very useful for me, since my list of objects seems to keep growing and growing. I am curious whether this feature can be made even more useful.
For example, let's say I have 4 objects,
a.c<-1
b.c<-2
c.c<-3
d.c<-4
Now I use the useful ls(pattern="") function
ls(pattern=".c")
Now I try to make a list
list(ls(pattern=".c"))
But it doesn't give me anything useful ("a.c" "b.c" "c.c" "d.c"). I want either of these two outputs
1,2,3,4
OR
a.c, b.c, c.c, d.c

A couple of issues:
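1) The . in ".c" is a regular-expression metacharacter that matches any single character, so you need to "escape" it:
ls(pattern="\\.c")
Otherwise it will return every object whose name contains a c preceded by any character, whether or not that character is a period.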
2) ls returns names of objects as character. To get the value of an object based on its name you need the function get:
lapply(ls(pattern="\\.c"), get)
3) As joran mentioned in the comments, it's much better to keep objects associated with each other in lists:
List.c = list(a.c=1, b.c=2, c.c=3, d.c=4)
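To make points 1) and 2) concrete, here is a minimal sketch. It works in a scratch environment (env, a name chosen only for this example) so that it does not touch your workspace, adds one extra object abc to show what the unescaped pattern picks up, and uses mget, which fetches several objects by name at once and returns them as a named list. The trailing $ in the pattern is an extra assumption that the names should end in ".c".

```r
# Scratch environment so the example does not depend on the global workspace
env <- new.env()
env$a.c <- 1
env$b.c <- 2
env$c.c <- 3
env$d.c <- 4
env$abc <- 99  # no period in the name

# Unescaped: "." matches any character, so "abc" is matched too
loose <- ls(envir = env, pattern = ".c")

# Escaped and anchored: only names that end in a literal ".c"
strict <- ls(envir = env, pattern = "\\.c$")

# mget() looks up each name and returns a named list of the values
values <- mget(strict, envir = env)

print(strict)          # the object names
print(unlist(values))  # the values, labelled by name
```

In an interactive session you would drop the envir arguments, e.g. mget(ls(pattern = "\\.c$")).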

Related

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_df_einstein$age*365.25
combined_df_newton$fu_time<-combined_df_newton$age*365.25
...
combined_df_leibniz$fu_time<-combined_df_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not interpret these strings as variables to be created or referenced. For that, you should look at the functions assign and get.
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, to which the $ operator cannot be applied.
There are many ways to solve your problem, but I recommend that you create a function that performs the necessary operations on each data frame and then returns it. You can then re-use the function for all 7 philosophers/scientists.
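As a rough sketch of that recommendation (one possible arrangement, not the only one): keep the data frames in a single named list instead of in separate combined_df_* variables, and apply the function with lapply. The function name add_fu_time and the placeholder data frames with age = 1 are assumptions made for this example; only three of the names are used to keep it short.

```r
list_names <- c("putnam", "einstein", "newton")

# One named list instead of separate combined_df_* variables
# (placeholder data: each data frame just has age = 1)
combined <- lapply(list_names, function(nm) data.frame(age = 1))
names(combined) <- list_names

# Hypothetical helper: adds the follow-up time in days and returns the data frame
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# Apply the same operation to every data frame in the list
combined <- lapply(combined, add_fu_time)

print(combined$einstein)
```

Individual data frames are then reached as combined$putnam or combined[["putnam"]], so no name has to be copy-pasted from line to line.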

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc, etc
Each of them holds the correct output from the function (I've checked). However, my next step is to combine them into one big vector that holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
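One possible direction, sketched under assumptions: skip the separate test_* variables and collect the results in a named list, then flatten it with unlist. The function fetch_result below is a hypothetical stand-in for the real function, and only three of the identifiers are used to keep the example short.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical stand-in for the real function applied to each identifier
fetch_result <- function(id) nchar(id)

# One result per identifier, kept together in a list
results <- lapply(url_num, fetch_result)
names(results) <- paste0("test_", url_num)

# Flatten the list into a single vector; it follows url_num automatically
combined <- unlist(results)
print(combined)
```

If the test_* variables already exist, unlist(mget(paste0("test_", url_num))) would collect them by name without retyping each one.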

How to edit multi-person objects in R

I find the following behaviour of the R person object rather unexpected:
Let's create a multi-person object:
a = c(person("Huck", "Finn"), person("Tom", "Sawyer"))
Imagine we want to update the given name of one person in the object:
a[[1]]$given <- 'Huckleberry'
Then if we inspect our object, to my surprise we have:
> a
[1] " <> [] ()" "Tom Sawyer"
Where'd Huckleberry Finn go?! (Note that if we try this with just a single person object, it works fine.) Why does this happen?
How can we do the above so that we get the more logical behavior of correcting just the first name?
The syntax you want here is
a <- c(person("Huck", "Finn"), person("Tom", "Sawyer"))
a[1]$given<-"Huckleberry"
a
#[1] "Huckleberry Finn" "Tom Sawyer"
A group of people is still a "person" object, and it has its own special indexing function [.person and concatenation function c.person, so it may behave differently than you were expecting. The problem was that [[ ]] was messing with the underlying hidden list.
Actually, it's interesting because they've overloaded nearly all the indexing methods for person but not [<- or [[<-, and that's really what's causing the error. Up to this point, the two forms behave the same:
`$<-`(`[`(a,1), "given", "Huckleberry") #works
`$<-`(`[[`(a,1), "given", "Huckleberry") #works
but when we get to
`[<-`(a, 1, `$<-`(`[`(a,1), "given", "Huckleberry")) #works
`[[<-`(a, 1, `$<-`(`[[`(a,1), "given", "Huckleberry")) #does not work
we see a difference. The special wrapping/unwrapping that happens during retrieval does not happen during assignment.
So what's going on is that a "person" is always a list of lists. The outer list holds all the people and the inner lists hold the data. You can think of the data like this
x<-list(
list(name="a"),list(name="b")
)
y<-list(
list(name="c")
)
where x is a collection of two people and y is a "single" person. When you do
x[1]<-y
x
you end up with
list(
list(name="c"),list(name="b")
)
since you're replacing a list with a list which is how [ indexing works with lists. But if you try to replace the element at [[1]] with a list of lists, that list will get nested. For example
x[[1]]<-y
x
becomes
x<-list(
list(list(name="c")),list(name="b")
)
And that extra level of nesting is what's confusing R when it goes to print the person in the first position. That first person won't have any named elements at the second level, so when it goes to print, it will return
emptyp <- structure(list(structure(list(), class="person")), class="person")
utils:::format.person(emptyp)
# " <> [] ()"
which gives us the symbols where it's trying to place the name, e-mail address, role, and comment.

How to remove selected R variables without having to type their names

While testing a simulation in R using randomly generated input data, I have found and fixed a few bugs and would now like to re-run the simulation with the same data, but with all intermediate variables removed to ensure it's a clean test.
Is there a way to remove several dozen manually selected variables from the workspace without having to:
a) clobber the entire workspace, e.g. rm(list=ls()), or b) type each variable name, e.g. remove(name1, name2, ...)?
The ideal solution would be to use ls() to inspect the definitions and then pick out the indices of the ones I want to remove, e.g.
ls() # inspect definitions
delme <- c(3,5,7:9,11,13) # names selected for removal
remove(ls()[delme]) # DESIRED SOLUTION -- doesn't quite work this way
(In hindsight, I should have used a fixed seed to generate the random input data, which would allow clearing everything and then re-running the test...)
There is a much simpler and more direct solution:
vars.to.remove <- ls()
vars.to.remove <- vars.to.remove[c(1,2,14:15)]
rm(list = vars.to.remove)
Or, better yet, if you are good about variable naming schemes, you can use the following pattern matching strategy:
E.g. I name all temporary variables with the starting string "Temp."
... so, you can have Temp.Names, Temp.Values, Temp.Whatever
The following produces the list of variables that match this pattern
ls(pattern = "^Temp\\.")
So, you can remove all unneeded variables using ONE line of code, as follows:
rm(list = ls(pattern = "^Temp\\."))
Hope this helps.
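Here is a small self-contained sketch of the pattern-based removal. It runs in a scratch environment (env, a name chosen for this example) so nothing in your own workspace is deleted, and the three objects are made up for illustration.

```r
# Scratch environment with two temporary objects and one to keep
env <- new.env()
assign("Temp.Names", c("a", "b"), envir = env)
assign("Temp.Values", 1:2, envir = env)
assign("keep_me", 42, envir = env)

# Names matching the "Temp." prefix; the period is escaped so it is literal
to_remove <- ls(envir = env, pattern = "^Temp\\.")
print(to_remove)

# rm() accepts a character vector of names through its list argument
rm(list = to_remove, envir = env)
print(ls(envir = env))
```

The index-based version works the same way: subset the character vector returned by ls() and pass the result to rm(list = ...).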
Assad, while I think the actual answer to the question is in the comments, let me suggest this pattern as a broader solution:
rm(list=
Filter(
Negate(is.na), # filter entries corresponding to objects that don't meet function criteria
sapply(
ls(pattern="^a"), # only objects that start with "a"
function(x) if(is.matrix(get(x))) x else NA # return names of matrix objects
) ) )
In this case, I'm removing all matrix objects that start with "a". By modifying the pattern argument and the function used by sapply here, you can get pretty fine control over what you delete, without having to specify many names.
If you are concerned that this could delete something you don't want to delete, you can store the result of the Filter(... operation in a variable, review the contents, and then execute the rm(list=...) command.
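A sketch of that review-then-delete workflow, written with vapply and logical subsetting rather than Filter/Negate (a swapped-in but equivalent way to select the names). The scratch environment env and its three objects are assumptions made for the example.

```r
# Scratch environment: one matrix starting with "a", one non-matrix, one other matrix
env <- new.env()
env$a_mat <- matrix(1:4, nrow = 2)
env$a_vec <- 1:3
env$b_mat <- matrix(0, nrow = 1, ncol = 1)

# Candidate names: everything that starts with "a"
candidates <- ls(envir = env, pattern = "^a")

# TRUE for candidates whose value is a matrix
is_mat <- vapply(candidates, function(nm) is.matrix(get(nm, envir = env)), logical(1))

# Store the names first so they can be reviewed before anything is deleted
doomed <- candidates[is_mat]
print(doomed)

# Only after checking the list, remove the objects
rm(list = doomed, envir = env)
print(ls(envir = env))
```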
Try
eval(parse(text=paste("rm(",paste(ls()[delme],collapse=","),")")))
I had a similar requirement. I pulled all the elements I needed to a list:
varsToPurge = as.list(ls())
I then reassign the few values I wish to keep with new variable names which will not be in the variable varsToPurge. After that I looped through the elements
for (j in seq_along(varsToPurge)){
rm(list = as.character(varsToPurge[j]))
}
Do a little garbage collecting, and you maintain a clean environment as you go through your code.
gc()
You can also use a vector of the indices you wish to keep instead and loop over that, but it won't be as dynamic if you later add rough work that you wish to remove.

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
print (t$col)
}
When I do this, it prints NULL. However, if I do it manually, it works fine -- say a column name is "sample". If I type t$"sample" into the R prompt, it works fine. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating @joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and one can skip quotation marks (e.g. t$foo vs. t[["foo"]]). It also does partial matching, which is very convenient but can under unusual circumstances be dangerous or confusing: e.g. if a list contains an element foolicious, then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).
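A short sketch of the difference, using a small made-up data frame df and a plain list lst (both names are assumptions for this example; the same indexing rules are described above for t):

```r
df <- data.frame(sample = 1:3, other = c("x", "y", "z"))

# Looping over column names: [[ uses the value stored in col
for (col in names(df)) {
  print(df[[col]])
}

col <- "sample"
print(is.null(df$col))  # $ looks for a column literally named "col"
print(df[[col]])        # [[ looks up the column named by the variable

# Partial matching with $ on a list
lst <- list(foolicious = 10)
print(lst$foo)       # partial match retrieves foolicious
print(lst[["foo"]])  # exact matching only, so nothing is found
```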
