I have a nested list, videos. Each sublist has an element "title". I want to remove every sublist x whose x$title contains words like {trailer, highlights, match}. Can some good soul help me solve this?
Here is the Nested List in R
Here is the Sublist
(Sorry for my language) Thanks in advance
Find all the sublists x for which x$title contains any of the forbidden words and remove them.
forbidden <- c("trailer", "highlights", "match")
bad <- sapply(videos, function(x) any(stringr::str_detect(x$title, stringr::regex(forbidden, ignore_case = TRUE))))
videos <- videos[!bad]
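One thing to watch when dropping elements by index: negating which() fails silently when nothing matches, because which() then returns integer(0). A minimal sketch of the difference, using a made-up two-element videos list (the titles are assumptions for illustration only):

```r
# Hypothetical sample data: no title contains a forbidden word
videos <- list(
  list(id = "a", title = "Season recap"),
  list(id = "b", title = "Behind the scenes"))
bad <- c(FALSE, FALSE)

# which(bad) is integer(0), and x[-integer(0)] is x[integer(0)]: an empty list
dropped_all <- videos[-which(bad)]
length(dropped_all)

# Logical negation keeps every element when nothing matches
kept <- videos[!bad]
length(kept)
```

Indexing with the logical vector itself avoids this edge case.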
Making a small sample dataset (See this SO post).
l <- list(
list(id = "a", title = "blabla trailer"),
list(id = "b", title = "keep this one"),
list(id = "c", title = "remove this match"))
To subset list elements based on search patterns, we can use a regular expression to find the matches. We can use | to search for multiple possibilities.
# base R
l[!sapply(l, function(x){grepl("match|highlight|trailer", x$title)})]
# purrr
library(purrr)
l[!map_lgl(l, ~ grepl("match|highlight|trailer", .x$title))]
[[1]]
[[1]]$id
[1] "b"
[[1]]$title
[1] "keep this one"
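If the forbidden words live in a vector, the alternation pattern can be built instead of typed by hand. The following is a sketch only: the sample titles are made up, and ignore.case = TRUE is an assumption about wanting case-insensitive matching.

```r
forbidden <- c("trailer", "highlights", "match")
pattern <- paste(forbidden, collapse = "|")  # "trailer|highlights|match"

# Hypothetical sample data with mixed-case titles
l <- list(
  list(id = "a", title = "Official TRAILER"),
  list(id = "b", title = "keep this one"),
  list(id = "c", title = "Full Match replay"))

# Filter() keeps the elements for which the function returns TRUE
kept <- Filter(function(x) !grepl(pattern, x$title, ignore.case = TRUE), l)
kept_ids <- vapply(kept, function(x) x$id, character(1))
kept_ids
```

Note that a plain substring pattern such as "match" also hits words like "matching"; wrapping the words in \\b word boundaries is one way to tighten that if needed.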
I have some lists:
my_list1 <- list("data" = list(c("a", "b", "c")), "meta" = list(c("a", "b")))
my_list2 <- list("data" = list(c("x", "y", "z")), "meta" = list(c("x", "y")))
I'd like to be able to perform some operations on these lists but I need to use the names of the lists stored in a vector as I'm creating them dynamically from an API call. Such a vector might be:
list_vec <- c("my_list1", "my_list2")
I'm running into problems evaluating the character string in the vector into the name of the list. I know this topic's been covered but the part I'm stuck on specifically is being able to extract just the data sublist when running functions within assign. Essentially a situation like this:
library(purrr)
for(i in seq_along(list_vec)){
assign(list_vec[[i]], map_df(list_vec[[i]][["data"]], unlist))
}
Which would give a result of:
# A tibble: 3 x 1
data
<chr>
1 a
2 b
3 c
I could also do something like:
my_list1$meta <- NULL
with
list_vec[[1]][["meta"]] <- NULL
To reduce the list to just the data sublist, but I can't within dynamically assigned names.
I've also tried wrapping things in eval but can't get that to work.
So specifically I need to evaluate the list's name from a string so I can extract a sublist from it.
We can pass the vector list_vec to mget, which returns a nested list. We use lapply to extract ([[) the data element and use unlist to convert this nested list to a list.
unlist(lapply(mget(list_vec), `[[`, "data"), recursive = FALSE)
Result
#$my_list1
#[1] "a" "b" "c"
#$my_list2
#[1] "x" "y" "z"
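If the goal is to replace each original list with just its data, one possible variation is to keep the extracted vectors in a named list and, only if really needed, write them back with assign. This is a sketch under the assumption that each data sublist holds exactly one character vector, as in the question; data_only and nm are names introduced here for illustration.

```r
my_list1 <- list("data" = list(c("a", "b", "c")), "meta" = list(c("a", "b")))
my_list2 <- list("data" = list(c("x", "y", "z")), "meta" = list(c("x", "y")))
list_vec <- c("my_list1", "my_list2")

# Look the lists up by name; the result is a list named after list_vec
lists <- mget(list_vec)

# Keep only the character vector stored inside each "data" sublist
data_only <- lapply(lists, function(x) x[["data"]][[1]])

# Optional: overwrite the original variables with the reduced versions
for (nm in names(data_only)) {
  assign(nm, data_only[[nm]])
}
```

Working with data_only directly is usually easier than juggling dynamically named variables, since the names travel with the list.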
This is actually a series of questions about referencing character values in R. I will add more bullets if I recall other related questions that I find interesting and relevant to this topic. For simplicity, I'll use some simple made-up examples to explain my questions. Hope this helps:
When building a set of datasets in a for loop, I want to output a series of objects whose names are stored in a vector called name_list = c("a", "b", "c", "d", "e", "f"). In the loop we would define them as
for(i in 1:4){
a <- data[data$Year == 2010,]
b <- unique(data$Name)
c <- summarise(group_by(data,Year,Name), avg = mean(quantity))
...
f <- left_join(data,data1, by = c("Year", "Names"))
}
Is there any function that allows me to use function(name_list[1]) through function(name_list[6]) in place of a through f in the for loop? The same question applies to creating columns from column names stored as strings inside a chunk of code. (as.name and noquote work when just referencing the vector/dataset but don't work when attempting to assign values to the target variable. If possible, could anyone share why this happens?)
When we extract information from SQL or other data sources, we might get several values separated by commas or other delimiters in a single variable. How can we test whether a certain value is among the values separated by commas? See the example below:
1567 %in% c(1567,1456,123)
TRUE
a <- "c(1567,1456,123)"
noquote(a)
c(1567,1456,123)
1567 %in% noquote(a)
FALSE
1567 %in% list(noquote(a))
FALSE
b <- "1567,1456,123"
noquote(b)
1567,1456,123
1567 %in% noquote(strsplit(a,","))
FALSE
1567 %in% list(noquote(strsplit(a,",")))
FALSE
I kind of get why %in% doesn't work here: it seems R is treating 1567,1456,123 as one element. So I used strsplit to separate them, but that still doesn't work. Is there any way to get R to treat the string as a command?
If all you need to do is convert comma-separated lists like "1567,1456,123" into R vectors like c(1567, 1456, 123), you definitely do not need to wrap them in c(...) and try to evaluate them directly as vectors. You should just use strsplit to split the data:
data_str <- "1567,1456,123"
data_vec <- as.integer(strsplit(data_str, ",")[[1]])
stopifnot(1567 %in% data_vec)
Note that strsplit returns a list, because it can also handle character vectors of length greater than one:
stopifnot(
all.equal(
list(c("a", "b"), c("x", "y")),
strsplit(c("a,b", "x,y"), ",")) == TRUE)
which makes it useful for operating on columns of SQL output:
| id | concatenated_field |
|----|--------------------|
| 1 | 5362,395,9000,7 |
| 2 | 319,75624,63 |
(etc.)
d <- data.frame(
id = c(1, 2),
concatenated_field = c("5362,395,9000,7", "319,75624,63"))
d$split_field <- strsplit(d$concatenated_field, ",")
sapply(d, class)
# id concatenated_field split_field
# "numeric" "character" "list"
d$split_field[[1]]
# [1] "5362" "395" "9000" "7"
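To answer the original "is this value among them" question per row, the split column can be tested element by element. A small sketch, where has_395 is a column name made up for this example and the fields are assumed to hold integers only:

```r
d <- data.frame(
  id = c(1, 2),
  concatenated_field = c("5362,395,9000,7", "319,75624,63"),
  stringsAsFactors = FALSE)

# Split each row's string and convert the pieces to integers
split_field <- lapply(strsplit(d$concatenated_field, ","), as.integer)

# One TRUE/FALSE per row: does the row contain 395?
d$has_395 <- vapply(split_field, function(v) 395L %in% v, logical(1))
d$has_395
# [1]  TRUE FALSE
```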
Alternatively, if you're reading in one big stream of comma-separated data, you can use scan:
data_vec <- scan(
what = 0, # arcane way to say "expect numeric input"
sep = ",",
text = "1,2,3,4,5,6,7,8,9,10")
stopifnot(all.equal(data_vec, 1:10) == TRUE)
scan is more heavy-duty than strsplit and can handle more complicated inputs as well, such as data with quoted fields:
weird_data <- scan(what="", sep=",", text='marvin,ruby,"joe,joseph",dean')
print(weird_data)
# [1] "marvin" "ruby" "joe,joseph" "dean"
If you are really really sure you need to be able to accept and evaluate R code passed as an input (this can be VERY DANGEROUS since it means you will be executing arbitrary unverified R code), you can use
r_code_string <- 'list(c("a", "b"), c("x", "y"))'
stopifnot(
all.equal(
list(c("a", "b"), c("x", "y")),
eval(parse(text = r_code_string))) == TRUE)
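parse converts raw text into an unevaluated "expression", which is a representation of R code in the form of a special R object; eval then passes the expression to the interpreter for execution. Note that the code must be supplied through the text argument, because the first argument of parse is a file.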
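The two steps can be looked at separately. A short sketch using the string from the question (only run this on input you trust):

```r
a <- "c(1567,1456,123)"

expr <- parse(text = a)  # nothing is executed yet
class(expr)
# [1] "expression"

value <- eval(expr)      # now the code runs and yields a numeric vector
1567 %in% value
# [1] TRUE
```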
As for noquote, it doesn't do what you think it does. It doesn't actually modify the string, it just adds a flag to the variable so that it will print without quotation marks. You can emulate this behavior with print(..., quote = FALSE).
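A quick way to convince yourself of this is to strip the class again and compare. The variable nq below is just a name picked for this sketch:

```r
a <- "c(1567,1456,123)"
nq <- noquote(a)

class(nq)
# [1] "noquote"

# The underlying data is still the same single string
same_string <- identical(unclass(nq), a)
same_string
# [1] TRUE

length(nq)
# [1] 1
```

So %in% still compares against one string, which is why the tests in the question come back FALSE.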
I would like to use attributes to store variable labels, like Stata does: instead of printing the variable name (e.g. in output tables), I'd rather print an attribute of it (hence I call the attribute name). But how can I access it in a loop?
dummies <- c("a", "b", "c")
attr(dummies, "names") <- c("First letter", "Second letter", "Third letter")
for (dummy in dummies) {
# do something with dummy
# e.g. accessing a variable in a dataframe
# and printing something to a table
print attr(dummies$dummy, "names") # doesn't work
print attr(dummies, "names")$dummy # doesn't work
}
As an alternative approach one can use a matrix:
dummies <- c("a", "b", "c")
names <- c("First letter", "Second letter", "Third letter")
dummies.matrix <- matrix(c(dummies, names), nrow=3)
Then I loop over dummies.matrix:
for (i in 1:nrow(dummies.matrix)) {
print(dummies.matrix[i,1]) # value
print(dummies.matrix[i,2]) # name or label
}
But that's neither convenient nor intuitive.
It looks like you have an indexing problem.
dummies <- c("a", "b", "c")
attr(dummies, "names") <- c("First letter", "Second letter", "Third letter")
for (i in seq_along(dummies)) {
print(dummies[i])
print(attr(dummies[i], "names"))
}
As a style point, be cautious about using indexes like dummy on vectors named dummies. At some point the vector and the index start to blend together, which makes it harder to interpret what the code should be doing.
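Since setting the "names" attribute simply makes dummies a named vector, another option is to loop over the names and look each value up by its label. This is only a sketch of that idea; label is a loop variable chosen here for readability.

```r
dummies <- c("a", "b", "c")
names(dummies) <- c("First letter", "Second letter", "Third letter")

for (label in names(dummies)) {
  # dummies[[label]] returns the bare value without its name
  cat(label, ": ", dummies[[label]], "\n", sep = "")
}
# First letter: a
# Second letter: b
# Third letter: c
```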
I am parsing the left-hand side of an R formula. In my specific case, this can be a variable or object with an index (something like myvariable[[3]]). I would like to access the third sub-object of this object and store it in another object. The following example starts at the point where I have the string of the indexed object, but I need the reference.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
get(mystring) # does not work
eval(as.name(mystring)) # does not work either
I could of course parse the number using regular expressions and use as.numeric to convert it to a real index. But in some cases, there may be named indices, like mystring["second"]. So how can I extract the sub-object?
You can parse and then eval this expression.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
eval(parse(text = mystring))
[1] "b"
It works for named indices too
names(mychars) <- c("first", "second", "third")
eval(parse(text = 'mychars["second"]'))
second
"b"