Extract data from a list without using a loop in R

I have a vector v with row positions:
v<-c(10,3,100,50,...)
Using those positions, I want to extract elements from a list while keeping the column fixed. For example, suppose my column number is 2; this is what I am doing:
data<-c()
data<-c(list1[[v]][[2]])
list1 has the data in the following format:
[[34]]
[1] "200_s_at" "483" "1933" "3664"
So, for example, from row 342 I want to extract only the value 1910 (column 2), and then do the same for the other rows in v.
However, I get an error when I try this. Is it possible to do it directly, or do I need a loop that reads the positions in v one by one and fills the data vector, like this:
# algorithm
data <- character(length(v))
for (i in seq_along(v)) {
  pos <- v[i]
  data[i] <- list1[[pos]][[2]]
}
Thanks

You can use sapply as below:
sapply(list1[v], `[`, 2)
However, depending on your data, you might get unexpected output, as explained in the question "Why is `vapply` safer than `sapply`?". For example, what if some of your list items have length < 2? What if some of the list items are not vectors but data.frames? The output class may also differ depending on the class of your list elements (logical, integer, numeric, character). If, for example, you expect all your list items to be character vectors of length >= 2, then it is safer to do:
vapply(list1[v], `[`, character(1), 2)
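where vapply will double-check your assumptions for you and raise an error if a result is not a single character value. Note that `[` returns NA for an out-of-range index, so a character vector that is too short yields NA rather than an error; use `[[` in place of `[` if you want that case to fail as well.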
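As a minimal sketch of the difference, assuming a small made-up list1 and v (illustrative values, not the asker's data): list1[[v]] does not work because `[[` with a vector index descends recursively into the list, whereas list1[v] returns the selected rows as a list that vapply can walk over.

```r
# Illustrative data only (not from the question)
list1 <- list(
  c("100_g_at", "12", "34", "56"),
  c("200_s_at", "483", "1933", "3664"),
  c("300_x_at", "77", "5", "9")
)
v <- c(3, 1)

# list1[v] keeps the chosen rows as a list; `[` with index 2 picks column 2
col2 <- vapply(list1[v], `[`, character(1), 2)
print(col2)

# vapply rejects a result of the wrong type, where sapply would return it silently
bad <- list(c(1, 2, 3))
res <- tryCatch(vapply(bad, `[`, character(1), 2), error = function(e) "error")
print(res)
```

The extracted values are still character strings here; wrap the result in as.numeric() if numbers are needed.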

Related

R: Get column size using index

I'd like to get the size of a column using the index. I tried using the length() function with the column index inside, but it doesn't work:
length(bd[7])
I'm sorry if this is too basic, I'm new to R. Thank you!
bd[7] is still a data.frame with a single column, and length() of a data.frame is the number of columns. We need to extract the column as a vector and then use length. How to extract the column depends on the class: for a data.frame or matrix, bd[,7] drops the dimensions and returns a vector, but that is not the case for a data.table or tibble. However, $ and [[ work for data.frame, data.table and tibble alike:
length(bd[[7]])
Alternatively, NROW works on a one-column data.frame as well as on a vector:
NROW(bd[7])
i.e.
> NROW(1:7)
[1] 7
> NROW(data.frame(col1 = 1:7))
[1] 7
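A small sketch of the three calls side by side, assuming a made-up 7-column data frame named bd (illustrative data, not the asker's):

```r
# Illustrative data frame with 5 rows and 7 columns (not from the question)
bd <- as.data.frame(matrix(1:35, nrow = 5))

n_cols_in_subset <- length(bd[7])   # bd[7] is a one-column data.frame
n_values <- length(bd[[7]])         # bd[[7]] is the column as a plain vector
n_rows <- NROW(bd[7])               # NROW counts rows of a data frame or length of a vector
print(c(n_cols_in_subset, n_values, n_rows))
```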

Break apart nested list into two (or more) lists if the nested list contains a vector of all NAs

I have a nested list (or list of lists) containing vectors of integers. I run this list through a custom function that randomly replaces integers with NA. I would like to "break apart" an internal list into two lists if one of its vectors contains only NAs.
It's probably easier to show what I have and what I want than to explain it in text:
#Example of full list data
a<-list(1,3,c(0,2,0),c(0,0))
b<-list(1,6,c(0,3,2,0,1,0),c(0,0,0,1,0,0),1,2,c(0,0),2,c(0,0))
c<-list(1,0)
d<-list(1,0)
e<-list(1,4,c(2,0,0,0),c(4,1),c(1,0,0,0,0),0)
L.full<-list(a,b,c,d,e)
#Example of list with random positions replaced with NA
f<-list(1,3,c(0,NA,0),c(0,0))
g<-list(1,6,c(0,3,NA,0,NA,0),c(0,NA,0,1,0,0),1,NA,c(0,0),2,c(0,0))
h<-list(1,NA)
i<-list(NA,0)
j<-list(1,NA,c(NA,0,0,0),c(NA,NA),c(1,0,0,NA,0),0)
L.miss<-list(f,g,h,i,j)
#To get what I want, I need to check each list in the list-of-lists for vectors containing all NAs,
#and "break" it into two lists (or more, if multiple vectors in the list contain all NAs)
#In this example:
#"f" should remain complete since no vector in the list contains all NAs
#"g" should be "broken" since the 6th element is a single NA (i.e. all NAs) and has subsequent elements in the list
#"g" should be broken up such that:
g.1<-list(1,6,c(0,3,NA,0,NA,0),c(0,NA,0,1,0,0),1)
g.2<-list(2,c(0,0))
#"h" should remain complete since the NA is at the end and there are no subsequent positions in the list
#"i" should remain complete since the NA is at the beginning and there are no previous positions in the list
#"j" should be broken up since the 2nd and 4th positions contain all NA and have previous/subsequent positions in the list
#"j" should be broken up such that:
j.1<-1
j.2<-c(NA,0,0,0)
j.3<-list(c(1,0,0,NA,0),0)
#In this example, the original list of 5 lists would result in a list of 8 lists/individual vectors, such that:
L.want<-list(f,g.1,g.2,h,i,j.1,j.2,j.3)
I tried quite a few things but I am quite stuck. I thought I may be on to something when I realized a vector of all NAs is a logical, not numeric, so I started coding
#Checking each vector in the nested list for if it is logical
for(i in 1:length(L.miss)){
for (j in 1:length(L.miss[[i]])){
if(is.logical(L.miss[[i]][[j]])){
##I have no idea what to do here to break it apart##
}
}
}
I appreciate any advice or guidance!
One option is to loop over the list with lapply, build a numeric grouping index from the all-NA elements with cumsum, and split on it:
L.split <- lapply(names(L.miss), function(nm) {
split(L.miss[[nm]], cumsum(sapply(L.miss[[nm]], function(x) all(is.na(x)))))
})
From this, if we also need to remove the elements that are all NA:
L.split2 <- lapply(L.split, function(lstA) lapply(lstA,
function(x) Filter(function(y) !all(is.na(y)), x)))
names(L.split2) <- names(L.miss)
data (the names must be assigned before running the code above, since it iterates over names(L.miss))
names(L.miss) <- c('f', 'g', 'h', 'i', 'j')
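To see how the cumsum index drives the split, here is a minimal sketch on the single list g from the question. The variable names all_na, grp and pieces are illustrative, and this variant drops the all-NA separators before splitting rather than filtering afterwards; it is one possible arrangement, not necessarily the exact output the question asks for.

```r
g <- list(1, 6, c(0, 3, NA, 0, NA, 0), c(0, NA, 0, 1, 0, 0), 1, NA, c(0, 0), 2, c(0, 0))

# TRUE where an element consists only of NAs (here: position 6)
all_na <- vapply(g, function(x) all(is.na(x)), logical(1))

# The running count of separators seen so far gives each element a group id
grp <- cumsum(all_na)

# Drop the separators themselves, then split the remaining elements by group
pieces <- split(g[!all_na], grp[!all_na])
print(names(pieces))
print(lengths(pieces))
```

Because a leading or trailing all-NA element never leaves an empty group behind once the separators are dropped, lists such as h and i from the question would stay in one piece under this variant.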

'dictionary' list to data.table columns

I am converting output from an API call to a bibliography database, that returns content in RIS form. I would then like to get a data.table object, with a row for each database item, and a column for each field of the RIS output.
I will explain more about RIS later, but I am stuck in the following:
I would like to get a data.table using something like:
PubDB <- as.data.table(list(TY = "txtTY",TI = "txtTI"))
which returns:
PubDB
TY TI
1: txtTY txtTI
However, what I have is a string (actually a vector of strings returned from API call: PubStr is one element)
PubStr
## [1] "TY = \"txtTY\",TI = \"txtTI\" "
How can I convert this string to the list needed inside the as.data.table command above?
More specifically, following the first steps of my code (resp<-GET(url), rawToChar(resp$content), and as.data.table() after some string manipulation), I have a data table with one row per publication and one column called PubStr that holds a string as above. How can I convert this string into many columns, for each row of the data.table? Note: some rows have more or fewer fields.
I am not familiar with the RIS format, but if the fields in each string are separated by commas, and within each field the column name and its value are separated by an equals sign, then here is a quick and dirty function that uses base R and data.table:
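library(data.table)
RIS_parser_fn <- function(x) {
  # Split each string on "," and each field on "=", stripping non-word characters
  string_parse_list <- lapply(lapply(x,
      function(i) tstrsplit(i, ",")),
    function(j) lapply(tstrsplit(j, "="),
      function(k) t(gsub("\\W", "", k))))
  # Bind names and values per string, promote the first row to column names, then stack
  datatable_format <- rbindlist(lapply(lapply(string_parse_list,
      function(i) data.table(Reduce("rbind", i))),
    function(j) setnames(j, unlist(j[1, ]))[-1]), fill = TRUE)
  return(datatable_format)
}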
The first statement creates a list of lists, each containing two matrices. The outer list has one element per string in the input vector. Each inner list has exactly two matrix elements, with the number of columns equal to the number of fields in that string (as delimited by ','). The first matrix holds the column headers (the text before each '=' sign) and the second holds the corresponding values. The final gsub removes any remaining special characters from the matrices. You may need to modify it if you want non-alphanumeric characters to be kept in the values; there were none in your example.
The second statement converts these lists into one data.table object. Reduce rbinds the two matrices of each inner list, and the result is converted to a data.table, so there is now a single list with one data.table per input string. The "j" lapply function sets the column names from the first row and then removes that row from the data.table. The final rbindlist call combines the data.tables, which may have different numbers of columns; enabling its fill argument lets them be combined, with NA assigned to cells for fields a row does not have.
I added a second string element with one more field to test the code:
PubStr<-c("TY = \"txtTY1\",TI = \"txtTI1\"","TY = \"txtTY2\",TI = \"txtTI2\" ,TF = \"txtTF2\"")
RIS_parser_fn(PubStr)
Returns this:
TY TI TF
1: txtTY1 txtTI1 <NA>
2: txtTY2 txtTI2 txtTF2
Hopefully this will help you out and/or stimulate some ideas for more efficient code. Best of luck!
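For comparison, a base-R-only sketch of the same idea, under the same assumption about the input (comma-separated fields, each of the form NAME = "value"). The helper name parse_pairs is hypothetical, and the approach breaks if a value itself contains a comma or an equals sign.

```r
# Hypothetical helper: turn one 'KEY = "value",KEY = "value"' string into a named list
parse_pairs <- function(s) {
  fields <- strsplit(s, ",", fixed = TRUE)[[1]]              # one entry per field
  parts <- strsplit(fields, "=", fixed = TRUE)               # name / value halves
  keys <- trimws(vapply(parts, `[`, character(1), 1))
  vals <- trimws(vapply(parts, `[`, character(1), 2))
  vals <- gsub('^"|"$', "", vals)                            # strip surrounding quotes
  as.list(setNames(vals, keys))
}

PubStr <- c('TY = "txtTY1",TI = "txtTI1"',
            'TY = "txtTY2",TI = "txtTI2" ,TF = "txtTF2"')
rows <- lapply(PubStr, parse_pairs)
str(rows)
```

Each element of rows is a named list, so with data.table loaded the rows could then be stacked with rbindlist(rows, fill = TRUE).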

R: Removing vector entries from a list of vectors after comparison using operator

I'm trying to remove elements smaller than a given number from the vectors contained in a list. I manage to find exactly which elements in the vector meet my criteria, but somehow I'm failing to select them.
myList <- list(1:7,4:7,5:10)
lapply(myList, function(x)`>`(x ,5))
...
Rmagic
...
desiredoutput <- list(6:7,6:7,6:10)
I'm sure it's something to do with `[` but I can't figure it out and searching for this problem is a nightmare.
We need to subset each vector with the logical index (x >= 6, which for these integers is the same as x > 5):
lapply(myList, function(x) x[x>= 6])
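Spelled out step by step, as a small sketch using the question's myList (the names x, mask and result are illustrative): the comparison produces a logical vector, and passing that vector to `[` keeps only the positions that are TRUE.

```r
myList <- list(1:7, 4:7, 5:10)

# For a single vector: build the logical mask, then subset with it
x <- myList[[1]]
mask <- x > 5
print(x[mask])

# The same subsetting applied to every vector in the list
result <- lapply(myList, function(x) x[x > 5])
str(result)
```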

I have a vector whose elements contain multiple numbers. How do I sum numbers inside each element and create a new vector?

I have a vector of class 'factor' like:
vec <-c("1,1,1,1,1,2","2,1,2","3,3,4")
And I want to get another vector like this:
sumvec <- c(7, 5, 10)
How do I do this? I am using R.
Try this:
> sapply(strsplit(as.character(vec), ","), function(x) sum(as.numeric(x)))
[1] 7 5 10
The basic idea is to split the character vector, extract the numeric values, and calculate the sum. strsplit doesn't work on factors, so if you actually have factors, you'll need to convert them to characters first. Similarly, sum won't work on the resulting characters, so you need to convert that to numeric first.
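The same steps as a sketch that starts from an actual factor, since the vec shown in the question is a plain character vector. Here vapply is swapped in for sapply so the result is guaranteed to be numeric; the name pieces is illustrative.

```r
# A genuine factor, built from the question's example strings
vec <- factor(c("1,1,1,1,1,2", "2,1,2", "3,3,4"))

# Convert to character first, then split each element on commas
pieces <- strsplit(as.character(vec), ",", fixed = TRUE)

# Convert each piece to numeric and sum it; numeric(1) asserts one number per element
sumvec <- vapply(pieces, function(x) sum(as.numeric(x)), numeric(1))
print(sumvec)
```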
