My select_first function doesn't choose the very first element - r

I'm a beginner with R.
I tried making a function that selects the first element of a vector.
Then I used the function with lapply on my vector.
However, instead of returning the first student and score pair, it returns all the student names.
I understand my function is selecting the first element per group(?) of the vector. But can anyone explain why my function results in the students instead of the first student and score?
Help please. Thanks!
student_score <- c("Philip:70", "Jimmy:80", "Alex:90", "Steve:100")
split_score <- strsplit(student_score, split = ":")
select_first <- function(x) {
x[1]
}
unlist(lapply(split_score, select_first))
#Used unlist() just to make the view of the result simpler
Expected Result: "Philip" "70" /// Actual Result: "Philip" "Jimmy" "Alex" "Steve"

The lapply function is just hiding a for loop over each element of split_score. Here is what split_score contains after the strsplit call:
strsplit(student_score, split = ":") # gives a list
[[1]]
[1] "Philip" "70"
[[2]]
[1] "Jimmy" "80"
[[3]]
[1] "Alex" "90"
[[4]]
[1] "Steve" "100"
So your call to lapply could be translated like this:
for each element in split_score (each element is the pair name-score)
extract the first (x[1])
So at the first iteration of the loop inside lapply, x is c("Philip", "70") and x[1] is "Philip", and so on.
That's why you get the list of names. Note that lapply is just hiding the for loop.
Your lapply is basically doing this:
for (i in 1:4){
split_score[[i]][1]
}
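If the goal is the first name and score pair rather than the first item of every pair, indexing the list itself should be enough. A minimal sketch using the data from the question (the names first_of_each and first_pair are only illustrative):

```r
student_score <- c("Philip:70", "Jimmy:80", "Alex:90", "Steve:100")
split_score <- strsplit(student_score, split = ":")

# lapply applies x[1] inside every pair, so it returns each student's name
first_of_each <- unlist(lapply(split_score, function(x) x[1]))

# Indexing the list itself returns the first pair (name and score)
first_pair <- split_score[[1]]

print(first_of_each)
print(first_pair)
```

Here split_score[[1]] picks one element of the list, while lapply visits all four elements and picks one item from each.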

Related

Name lists in R (not the elements)

I am trying to name a nested list. This would be one of the several lists in my nested list:
paths_list[i]
[[1]]
[[1]]$CLASS
[1] "Signal Transduction (Saccharomyces cerevisiae)"
[[1]]$GENES
[1] "YPR165W"
[[1]]$ORGANISM
[1] "Saccharomyces cerevisiae"
Basically, I want to give the list an ID, for example R-SCE-198203, as its main name (so the name R-SCE-198203 should appear above $CLASS). That is, paths_list[i] should have the name R-SCE-198203.
I want this:
paths_list[i]
[[1]]R-SCE-198203
[[1]]$CLASS
[1] "Signal Transduction (Saccharomyces cerevisiae)"
[[1]]$GENES
[1] "YPR165W"
[[1]]$ORGANISM
[1] "Saccharomyces cerevisiae"
I have searched, and the closest I have found uses lapply, but you end up with this:
setNames(lapply(tabs, setNames, varB), varA)
#$varA1
#$varA1$varB1
#[1] "integer"
#
#$varA1$varB2
#[1] "integer"
# ...
I want to avoid the main ID appearing in every element of the list (I do not want $varA1 repeated every time).
Is that possible?
Thanks in advance
I think your lapply approach is already what you want. The "true" names of the sub-elements do not have the $ signs. The output in the console when you print a full list object shows these signs to help you read the data, however, if you access the individual sub-element via [[]] their names do not have these signs. Maybe the following code example helps. Check the outputs.
a_list <- list("dummy")
names(a_list) <- "dummyname"
a_list <- list(a_list)
names(a_list) <- "name"
a_list
#$name
#$name$dummyname
#[1] "dummy"
names(a_list)
#[1] "name"
a_list[[1]]
#$dummyname
#[1] "dummy"
names(a_list[[1]])
#[1] "dummyname"
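Applied to the list from the question, the same idea might look like the sketch below. It rebuilds a one-element paths_list from the printed output (the real object may hold more entries) and assigns the ID with names():

```r
# Rebuilt from the printed output in the question; the real list may be longer
paths_list <- list(
  list(CLASS = "Signal Transduction (Saccharomyces cerevisiae)",
       GENES = "YPR165W",
       ORGANISM = "Saccharomyces cerevisiae")
)

# Name the outer list only; the inner names are left untouched
names(paths_list) <- "R-SCE-198203"

outer_names <- names(paths_list)
inner_names <- names(paths_list[["R-SCE-198203"]])
genes <- paths_list[["R-SCE-198203"]]$GENES

print(outer_names)
print(inner_names)
print(genes)
```

The ID is stored once, as a name on the outer list. Printing the whole list repeats it in front of each sub-element only as a display path.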

a list of multiple lists of 2 for synonyms

I want to read synonyms from a CSV file, where the first word in each record is the "main" word and the rest of the words in the same record are its synonyms.
I basically want to create a list like the one I would write by hand in R:
synonyms <- list(
list(word="ss", syns=c("yy","yyss")),
list(word="ser", syns=c("sert","sertyy","serty"))
)
This gives me a list as
synonyms
[[1]]
[[1]]$word
[1] "ss"
[[1]]$syns
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$syns
[1] "sert" "sertyy" "serty"
which is essentially a list of lists of "word" and "syns".
How do I go about creating a similar list while reading the words and synonyms from a CSV file?
Any pointers would help! Thanks
This process should return what you want.
# read in data using readLines
myStuff <- readLines(textConnection(temp))
This will return a character vector with one element per line in the file. Note that textConnection is not necessary for reading in files; just supply the file path. Now, split each vector element into a vector of words using strsplit, which returns a list.
myList <- strsplit(myStuff, split=" ")
Now, separate the first element from the remaining elements for each vector within the list.
result <- lapply(myList, function(x) list(word=x[1], synonyms=x[-1]))
This returns the desired result. We use lapply to move through the list items. For each list item, we return a named list where the first element, named word, corresponds to the first element of the vector that is the list item and the remaining elements of this vector are placed in a second list element called synonyms.
result
[[1]]
[[1]]$word
[1] "ss"
[[1]]$synonyms
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$synonyms
[1] "sert" "sertyy" "serty"
[[3]]
[[3]]$word
[1] "at"
[[3]]$synonyms
[1] "ate" "ater" "ates"
[[4]]
[[4]]$word
[1] "late"
[[4]]$synonyms
[1] "lated" "lates" "latee"
data
temp <-
"ss yy yyss
ser sert sertyy serty
at ate ater ates
late lated lates latee"
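The example data above is separated by spaces, while the question mentions a CSV file. A minimal sketch of the same approach for comma-separated records, assuming one record per line and no quoted fields (csv_text is made-up sample data standing in for the real file, and the second element is named syns as in the question):

```r
# Made-up sample standing in for the CSV file
csv_text <- "ss,yy,yyss
ser,sert,sertyy,serty"

con <- textConnection(csv_text)
records <- readLines(con)   # for a real file: readLines("path/to/file.csv")
close(con)

# Split each record on commas and trim stray whitespace around the words
fields <- lapply(strsplit(records, split = ",", fixed = TRUE), trimws)

# First field is the main word, the rest are its synonyms
synonyms <- lapply(fields, function(x) list(word = x[1], syns = x[-1]))

str(synonyms)
```

Reading line by line avoids read.csv padding shorter records, which matters here because each word can have a different number of synonyms.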

How to access variable length lists inside a list in R

When I call strsplit() on a column of a data frame, depending on the results of the strsplit(), I sometimes get one or two "sublists" as a result of splitting. For example,
v <- c("50", "1 h 30 ", "1 h", NA)
split <- strsplit(v, "h")
[[1]]
[1] "50"
[[2]]
[1] "1 " " 30 "
[[3]]
[1] "1 "
[[4]]
[1] NA
I know I can access the individual sublists of split using '[]', and that '[[]]' gives me the contents of those sublists, so I think I understand that. I also know I can access the " 30 " in [[2]] by doing split[[2]][2].
Unfortunately, I don't know how to access this programmatically over the entire column that I have. I am trying to convert the column to numeric data. But that "1 h 30" case is giving me a lot of trouble.
func1 <- function(x){
split.l <- strsplit(x, "h")
len <- lapply(split.l, length)
total <- ifelse(len == 2, as.numeric(split.l[2]) + as.numeric(split.l[1]) * 60, as.numeric(split.l[2]))
return(total)
}
v <- ifelse(grepl("h", v), func1(v), as.numeric(v))
I know len returns the vector of the length of the splits. But when it comes to actually accessing that individual sublist's second element, I simply don't know how to do it properly. This will generate an error because split.l[1] and split.l[2] will only return the first two elements of the entire original dataframe column every time. [[1]] and [[2]] won't work either. I need something like [[i]][1] and [[i]][2]. But I'm trying not to use a for loop and iterate.
To make a long story short, how do I access the inner list elements programmatically?
For reference, I did look at this question, which helped: "apply strsplit to specific column in a data.frame". But I still haven't been able to solve it.
I'm really struggling with lists and list processing in R so any help is appreciated.
A common idiom is lapply(l, `[`, 2), which applied to your example gives:
> lapply(split, `[`, 2)
[[1]]
[1] NA
[[2]]
[1] " 30 "
[[3]]
[1] NA
[[4]]
[1] NA
sapply() will collapse this to a vector if it can.
What is being done is that lapply() takes each component of split in turn (this is the [[i]] bit of your pseudocode), and from each of those we want to extract the nth element. We do this by applying the `[` function with argument n, in this case 2L.
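To make the role of `[` concrete, here is a small sketch showing that calling it as a function is the same as ordinary indexing, and that sapply() collapses the result to a character vector. The list is called parts here (an illustrative name) to avoid masking base::split:

```r
parts <- strsplit(c("50", "1 h 30 ", "1 h", NA), "h")

x <- parts[[2]]
# `[`(x, 2) is just another way of writing x[2]
same <- identical(`[`(x, 2), x[2])

# sapply() simplifies the one-element results into a character vector
second <- sapply(parts, `[`, 2)

print(same)
print(second)
```

Elements with no second item come back as NA rather than raising an error, which is what makes this idiom safe on ragged lists.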
If you want the first element unless there is a second element, in which case take the second, you could just write a wrapper instead of using [ directly:
wrapper <- function(x) {
if(length(x) > 1L) {
x[2L]
} else {
x[1L]
}
}
lapply(split, wrapper)
which gives
> lapply(split, wrapper)
[[1]]
[1] "50"
[[2]]
[1] " 30 "
[[3]]
[1] "1 "
[[4]]
[1] NA
or perhaps
lens <- lengths(split)
out <- lapply(split, `[`, 2L)
ind <- lens == 1L
out[ind] <- lapply(split[ind], `[`, 1L)
out
but that loops over the output from strsplit() twice.
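For the numeric conversion the question is after, one possible sketch pulls out both positions as vectors and combines them without an explicit loop. It assumes that values without an "h" are already in minutes and that the part before "h" is hours, which is my reading of the question rather than something it states outright. The names parts, first, second, has_h, and minutes are illustrative:

```r
v <- c("50", "1 h 30 ", "1 h", NA)

parts <- strsplit(v, "h", fixed = TRUE)

# Extract the first and second piece of every element; missing pieces become NA
first  <- as.numeric(trimws(sapply(parts, `[`, 1L)))
second <- as.numeric(trimws(sapply(parts, `[`, 2L)))

# Only entries that contain "h" have an hours part
has_h <- grepl("h", v, fixed = TRUE)

# hours * 60 + minutes when "h" is present, otherwise the value as-is
minutes <- ifelse(has_h, first * 60 + ifelse(is.na(second), 0, second), first)

print(minutes)
```

Because first and second are plain vectors aligned with v, the [[i]][1] and [[i]][2] access from the question is handled once by sapply(), and the arithmetic stays vectorized.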

R get objects' names from the list of objects

I am trying to get an object's name from the list containing this object. I searched through similar questions and found some suggestions about using deparse(substitute(object)):
> my.list <- list(model.product, model.i, model.add)
> lapply(my.list, function(model) deparse(substitute(model)))
and the result is:
[[1]]
[1] "X[[1L]]"
[[2]]
[1] "X[[2L]]"
[[3]]
[1] "X[[3L]]"
whereas I want to obtain:
[1] "model.product", "model.i", "model.add"
Thank you in advance for being of some help!
You can write your own list() function so it behaves like data.frame(), i.e., uses the un-evaluated arg names as entry names:
List <- function(...) {
names <- as.list(substitute(list(...)))[-1L]
setNames(list(...), names)
}
my.list <- List(model.product, model.i, model.add)
Then you can just access the names via:
names(my.list)
names(my.list) #..............
Oh wait, you didn't actually create names did you? There is actually no "memory" for the list function. It returns a list with the values of its arguments but not from whence they came, unless you add names to the pairlist given as the argument.
You won't be able to extract the information that way once you've created my.list.
The underlying way R works is that expressions are not evaluated until they're needed; using deparse(substitute()) will only work before the expression has been evaluated. So:
deparse(substitute(list(model.product, model.i, model.add)))
should work, while yours doesn't.
To save stuffing around, you could employ mget to collect your free-floating variables into a list with the names included:
one <- two <- three <- 1
result <- mget(c("one","two","three"))
result
#$one
#[1] 1
#
#$two
#[1] 1
#
#$three
#[1] 1
Then you can follow @DWin's suggestion:
names(result)
#[1] "one" "two" "three"
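For completeness, here is a runnable sketch of the custom List() idea from the first answer, with one change of mine: each captured argument is deparsed to a string before being used as a name. The three model objects are stand-ins, since the real models are not shown in the question:

```r
# Stand-in objects; the real model.product, model.i, model.add are not shown
model.product <- 1
model.i <- 2
model.add <- 3

List <- function(...) {
  # Capture the unevaluated arguments and turn each one into a string
  args <- as.list(substitute(list(...)))[-1L]
  arg_names <- vapply(args, deparse, character(1))
  setNames(list(...), arg_names)
}

my.list <- List(model.product, model.i, model.add)
list_names <- names(my.list)

print(list_names)
```

This sketch assumes the arguments are plain variable names. A long expression can deparse to more than one string, in which case vapply() would stop with an error.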

How to extract elements from a list with mixed elements

I have a list in R with the following elements:
[[812]]
[1] "" "668" "12345_s_at" "667" "4.899777748"
[6] "49.53333333" "10.10930207" "1.598228663" "5.087437057"
[[813]]
[1] "" "376" "6789_at" "375" "4.899655078"
[6] "136.3333333" "27.82508792" "2.20223398" "5.087437057"
[[814]]
[1] "" "19265" "12351_s_at" "19264" "4.897730912"
[6] "889.3666667" "181.5874908" "1.846451572" "5.087437057"
I know I can access them with something like list_elem[[814]][3] in case that I want to extract the third element of the position 814.
I need to extract the third element of all the list, for example 12345_s_at, and I want to put them in a vector or list so I can compare their elements to another list later on. Below is my code:
elem<-(c(listdata))
lp<-length(elem)
for (i in 1:lp)
{
newlist<-c(listdata[[i]][3]) ###maybe to put in a vector
print(newlist)
}
When I print the results I get the third element, but like this:
[1] "1417365_a_at"
[1] "1416336_s_at"
[1] "1416044_at"
[1] "1451201_s_at"
so I cannot traverse them with an index like newlist[3], because it returns NA. Where is my mistake?
If you want to extract the third element of each list element you can do:
List <- list(c(1:3), c(4:6), c(7:9))
lapply(List, '[[', 3) # This returns a list with only the third element
unlist(lapply(List, '[[', 3)) # This returns a vector with the third element
Using your example and taking into account @GSee's comment, you can do:
yourList <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
sapply(yourList, '[[', 3)
[1] "12345_s_at" "6789_at" "12351_s_at"
Next time you can provide some data using dput on a portion of your dataset so we can reproduce your problem easily.
With purrr you can extract elements and ensure data type consistency:
library(purrr)
listdata <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
map_chr(listdata, 3)
## [1] "12345_s_at" "6789_at" "12351_s_at"
There are other map_ functions that enforce type consistency as well, and a map_df() which can finally help end the do.call(rbind, …) madness.
In case you wanted to use the code you typed in your question, below is the fix:
listdata <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
v <- character() #creates empty character vector
list_len <- length(listdata)
for(i in 1:list_len)
v <- c(v, listdata[[i]][3]) #fills the vector with list elements (not efficient, but works fine)
print(v)
[1] "12345_s_at" "6789_at" "12351_s_at"
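Since the question also mentions comparing the extracted elements to another list later on, here is a base R sketch of that step. It uses vapply(), which checks that every result is a single string, and a shortened version of the data. The vector other_ids is a hypothetical second set of IDs, not something from the question:

```r
# Shortened version of the example data (first three fields only)
listdata <- list(c("", "668", "12345_s_at"),
                 c("", "376", "6789_at"),
                 c("", "19265", "12351_s_at"))

# vapply() fails loudly if any result is not exactly one character string
ids <- vapply(listdata, `[`, character(1), 3)

# Hypothetical second set of IDs to compare against
other_ids <- c("6789_at", "99999_at")

in_other <- ids %in% other_ids      # logical vector, one entry per list element
shared <- intersect(ids, other_ids) # IDs present in both

print(ids)
print(in_other)
print(shared)
```

Once the IDs are in a plain vector, ids[3] works as expected, which is what the loop in the question could not provide because newlist was overwritten on every iteration.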
