Subsetting a List in R where not a value - r

I'm having trouble "filtering" a list in R because I don't have specific parameters. The function I've created evaluates 4000 HTML strings and "decides" whether each one is a valid address:
Tree<-lapply(TreeList,ValURL)
#Returns a list with "Error" or an HTML string in each element (about 4000 elements total).
I want to create a subset of the Tree list with only the elements that are NOT "Error". I'm used to SQL, so it would be something like:
SELECT * FROM Tree WHERE Column1!="Error"
Obviously it's different in R but I can't seem to get it. I've been trying (to no avail):
Tree$"Error"
Help!

Assuming your Tree looks kind of like this:
Tree<-list(
"Error",
"<p>Hello</p>",
"<h1>Heading</h1>",
"Error",
"<strong>Bold</strong>"
)
then this should work:
Tree[Tree != "Error"]
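The comparison above relies on every element being a single character value. If some elements might be longer vectors or other types, a more defensive sketch builds the logical index explicitly with vapply (the names is_error and valid are only for illustration):

```r
Tree <- list("Error", "<p>Hello</p>", "<h1>Heading</h1>", "Error", "<strong>Bold</strong>")

# TRUE for elements that are exactly the string "Error"
is_error <- vapply(Tree, function(x) identical(x, "Error"), logical(1))

# Keep everything that is not "Error"
valid <- Tree[!is_error]
length(valid)
```

Filter(function(x) !identical(x, "Error"), Tree) should give the same result in one call.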

Related

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_df_einstein$age*365.25
combined_df_newton$fu_time<-combined_df_newton$age*365.25
...
combined_df_leibniz$fu_time<-combined_df_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not interpret these strings as variables to be created or referenced. For that, you should look at the function assign.
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, to which the $ operator cannot be applied.
There are many ways to solve your problem, but I recommend that you create a function that performs the necessary operations on each data frame. The function should then return the data frame. You can then re-use the function for all 7 philosophers/scientists.
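As a minimal sketch of that recommendation, assuming each data frame has an age column in years: keep the data frames in one named list rather than in seven separate variables, and apply a helper to each. The helper name add_fu_time, the list name combined, and the placeholder ages are made up for illustration.

```r
list_names <- c("putnam", "einstein", "newton", "kant", "hume", "locke", "leibniz")

# One named list instead of seven separate variables (ages are placeholder values)
combined <- lapply(list_names, function(n) data.frame(age = 1))
names(combined) <- list_names

# Hypothetical helper: adds follow-up time in days to one data frame and returns it
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# Apply the helper to every data frame; names are preserved
combined <- lapply(combined, add_fu_time)
combined$kant$fu_time
```

Because every data frame goes through the same function, there is no name to forget to replace when copying a line.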

Extract items in a list using variable names in R

I'm parsing a JSON using the RJSONIO package.
The parsed item contains nested lists.
Each item in the list can be extracted using something like this:
dat_raw$`12`[[31]]
which correctly returns the string stored at this location (in this example, the '12' refers to the month and [[31]] to day).
"31-12-2021"
I now want to run a for loop to sequentially extract the date for every month. Something like this:
for (m in 1:12) {
print(dat_raw$m[[31]])
}
This, naturally, returns a NULL because there is no $m[[31]] in the list.
Instead, I'd like to extract the objects stored at $`1`[[31]], $`2`[[31]], ... $`12`[[31]].
There must be a relatively easy solution here but I haven't managed to crack it. I'd value some help. Thanks.
EDIT: I've added a screenshot of the list structure I'm trying to extract. The actual JSON object is quite large for a dput() output. Hope this helps
So, to get the date in this list, I'd use something like dat_raw$data$`1`[[1]]$date$gregorian$date.
What I'm trying to do is run a loop to extract multiple items of the list by cycling through $data$`1`[[1]]$..., $data$`2`[[1]]$... ... $data$`12`[[1]]$... using $data$m[[1]]$... in a for loop where m is the month.
Instead of dat_raw$`12`[[31]], you can use dat_raw[[12]][[31]], provided that `12` is the 12th element of the list. So your for loop would be:
for (m in 1:12) {
print(dat_raw[[m]][[31]])
}
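Positional indexing only works if the months happen to be stored in order. If the elements are named "1" to "12", as the backtick syntax in the question suggests, a sketch that looks them up by name converts the loop variable to a character string first. The toy dat_raw below is a made-up stand-in for the parsed JSON, not its real structure.

```r
# Toy stand-in for the parsed JSON: 12 elements named "1".."12", each with 31 entries
dat_raw <- lapply(1:12, function(m) as.list(sprintf("%02d-%02d-2021", 1:31, m)))
names(dat_raw) <- as.character(1:12)

for (m in 1:12) {
  # [[ ]] with a character string looks the element up by name;
  # $ cannot do this because it does not evaluate the variable m
  print(dat_raw[[as.character(m)]][[31]])
}
```

The same idea extends to deeper nesting, since each level can be indexed with [[ ]] and either a name or a position.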

Iterate conditional count across list items R

I am attempting to count all the instances across a list of data frames where a certain variable is over a given value. I have tried to do it as so:
for (name in myList){
nrow(subset(myList[[name]], var >=6))
}
as I found here: http://www.statisticsblog.com/2010/03/r-tip-iterating-over-list/
However, I get the following error:
Error in myList[[name]] : invalid subscript type 'list'
I know that nrow works because I have used it on a specific list item outside of the loop and it succeeded. I can't seem to figure out why the error is arising. The list names are set up as so:
myList$`i.j.k`
with i, j, and k each taking on a different numerical value. I generated the list as so from a data frame read in from a .csv file:
myList <- split(data, f=list(data$i, data$j, data$k))
What is causing the error? Or, is there a better way to do a conditional count across all list elements (there are 2000+ of them, so any non-loop way would be ideal). Thanks!
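I figured it out thanks to the comment from @PoGibas: for (name in myList) iterates over the list elements themselves, not over their names, so name is already a data frame and cannot be used as a subscript.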
Rather than
for (name in myList){
nrow(subset(myList[[name]], var >=6))
}
it should be:
for (name in myList){
nrow(subset(name, var >=6))
}
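Note that inside a for loop the result of nrow() is neither printed nor stored, so the loop above still needs to collect the counts somewhere. A loop-free sketch, using a small made-up data set in place of the CSV, gets the count for each list element with vapply and then sums them:

```r
# Made-up data standing in for the CSV contents
data <- data.frame(i = c(1, 1, 2, 2), j = c(1, 1, 1, 1), k = c(1, 2, 1, 2),
                   var = c(7, 3, 6, 9))
myList <- split(data, f = list(data$i, data$j, data$k))

# Number of rows with var >= 6 in each data frame of the list
counts <- vapply(myList, function(df) sum(df$var >= 6, na.rm = TRUE), integer(1))

# Total across all list elements
total <- sum(counts)
```

If only the grand total is needed, sum(data$var >= 6, na.rm = TRUE) on the unsplit data frame gives it without building the list at all.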

R Loop error using character

I have the below function which inserts a row into a table (new_scores) based upon the attribute that I feed into it (where the attribute represents a table that I select things from):
buildNewScore <- function(x) {
something <- bind_rows(new_scores,x%>%select(ATT,ADJLOGSCORE))
return(something)
}
Which works fine when I define x.
But when I try to create a for loop that feeds the rest of my attributes into the function it falls over as I'm feeding in a character.
attlist <- c('Z','Y','X','W','V','U','T','RT','RO')
record_count <- length(attlist)
for (x in c(1:record_count)){
buildNewScore(attlist[x])
}
I've tried to convert the attribute into other classes but I can't get the loop to use anything I change it to (name, data.frame etc.).
Anyone have any ideas as to where I'm going wrong - is my attlist vector in the wrong format?
Thanks,
Spikelete.

Creating (and saving to) an object with a random name

I have a function which I use repeatedly. One of the things it returns is a plot visualising effects of a model. I want the function to save the plot to an object, but I want the name of the object to have a random component to it. I use the function multiple times and don't want the plots to overwrite. But I could use the unique identifier in its name to reference it later for the writeup.
So I tried a few things, trying to save a simple object under a partially-random name. All of them fail because I put a function to the left of the "<-" sign. I'm not going to give examples, because they are just very, very wrong.
So I'd like to have something like:
NAME(randomNumber) <- "some plot"
Which, after running multiple times in a function (with the actual input on the right of course) would result in objects named randomly like
NAME104, NAME314, NAME235, etc.
Is this at all doable?
Yes, it's doable.
Don't do it.
Make a LIST of objects. You can use the name as the key in the list. Example:
plots = list()
plots[["NAME104"]] = "some plot"
plots[["NAMEXXX"]] = "some other plot"
Why? Because now it's easy to loop over the plots stored in the list. It's also easy to create the list in a loop in the first place, something like:
for(i in 1:100){
data = read.csv(paste0("data",i,".csv")) # paste0 avoids spaces in the file name
name = data$name[1] # get name from column in file
plots[[name]] = plotthing(data)
}
If you really, really want to create a thing with a random name, use assign:
> assign(paste0("NAME",round(runif(1,1,1000))), "hello")
> ls(pattern="^NAME")
[1] "NAME11" "NAME333" "NAME717" "NAME719"
But really, DON'T do that.
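A runnable sketch of the list approach, assuming a hypothetical make_plot() function that stands in for whatever the real function returns (here just a string, where the real code would return a plot object):

```r
# Hypothetical stand-in for the function that builds a plot object
make_plot <- function(id) paste("plot for model", id)

plots <- list()
for (id in c(104, 314, 235)) {
  key <- paste0("NAME", id)   # unique key instead of a uniquely named variable
  plots[[key]] <- make_plot(id)
}

names(plots)          # all stored keys
plots[["NAME314"]]    # retrieve one plot by its key for the write-up
```

The key plays the role of the random name, but everything lives in one object that can be looped over, saved, or passed to another function.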
