Mapping different args to a list of data frames - r

This might be a job for purrr, or maybe I just need to change the layout of my function or my data.
I've got a function that takes two arguments, which I'm trying to apply across a list of data frames. One argument should be the name of a list element (the book), and the other should be a value stored in that element (a chapter).
Book List - Book title and Chapters
my_list <- list(Book1 = c("ABC", "DEF", "GHI"), Book2 = c("ABB", "BCC"), Book3 = c("AAA", "BBB", "CCC", "DDD"))
Function with arg for list component and value
my_function <- function(Book, Chapter) {
path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
##Would have API call here, but let's just print path
path
}
I can easily call this on an individual item by specifying the arguments in map:
library(purrr)
map(my_list$Book1, function(Chapter) my_function(Chapter = Chapter, Book = "Book1"))
Output:
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
But how do I apply the function to every list element, pairing each Book name with its chapter values?
I'm hoping for something like
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
[[4]]
[1] "www.fake-website.comctp:Book2/ABB"
[[5]]
[1] "www.fake-website.comctp:Book2/BCC"
[[6]]
[1] "www.fake-website.comctp:Book3/AAA"
[[7]]
[1] "www.fake-website.comctp:Book3/BBB"
[[8]]
[1] "www.fake-website.comctp:Book3/CCC"
[[9]]
[1] "www.fake-website.comctp:Book3/DDD"
My function actually isn't simply pasting Books and Chapters, but getting a bunch of info from the API and parsing it.
However, what I need help with is mapping across the list of data frames and pairing the book arg with the chapter arg.

You can use purrr::imap, which passes each list element as the first argument and its name as the second argument to the function:
library(purrr)
imap(my_list, ~ my_function(..2, ..1))
# or imap(my_list, ~ my_function(.y, .x))
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
And if you switch the arguments of your function, Book and Chapter, you can simply do:
my_function <- function(Chapter, Book) {
path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
path
}
imap(my_list, my_function)
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
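If purrr is not available, or if you want the single flat list shown in the question rather than one vector per book, the following base R sketch may help. It reuses my_list and my_function from the question; per_book and flat_paths are names made up for this illustration, and the flattening step is an assumption about the desired shape rather than part of the answer above.

```r
# Sample data and function copied from the question
my_list <- list(
  Book1 = c("ABC", "DEF", "GHI"),
  Book2 = c("ABB", "BCC"),
  Book3 = c("AAA", "BBB", "CCC", "DDD")
)
my_function <- function(Book, Chapter) {
  paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
}

# Map() walks the chapter vectors and the list names in parallel,
# much like imap() does
per_book <- Map(
  function(chapters, book) my_function(Book = book, Chapter = chapters),
  my_list,
  names(my_list)
)

# Collapse the per-book vectors into one list with one path per element
flat_paths <- as.list(unlist(per_book, use.names = FALSE))
length(flat_paths)
flat_paths[[6]]
```

Because paste() is vectorised, each call returns every path for one book at once. If the real function makes one API call per chapter, an inner lapply() over the chapters would be needed instead.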

Related

How to extract values from a list with multiple levels in r

I have a list that looks like this:
[[1]]
[[1]][[1]]
[[1]][[1]]$p1est.z
[1] 2.890829
[[1]][[1]]$p1se.z
[1] 0.1418367
[[1]][[2]]
[[1]][[2]]$p2est.w
[1] 4.947014
[[1]][[2]]$p2se.w
[1] 0.5986682
[[2]]
[[2]][[1]]
[[2]][[1]]$p1est.z
[1] 3.158164
[[2]][[1]]$p1se.z
[1] 0.138770
[[2]][[2]]
[[2]][[2]]$p2est.w
[1] 5.052874
[[2]][[2]]$p2se.w
[1] 0.585608
How can I extract the values of "p1est.z" from both top-level elements? I need to compute their average.
Thanks!
Actually the unlist() function out of the box should probably work here:
output <- unlist(your_list)
output[names(output) == "p1est.z"]
p1est.z p1est.z
2.890829 3.158164
Data:
your_list <- list(
  list(list(p1est.z=2.890829, p1se.z=0.1418367),
       list(p2est.w=4.947014, p2se.w=0.5986682)),
  list(list(p1est.z=3.158164, p1se.z=0.138770),
       list(p2est.w=5.052874, p2se.w=0.585608)))
One way to do this, using Tim Biegeleisen's representation of your data, is to write a function that extracts p1est.z and apply it. Your top-level list has two elements, and in both of them the first sub-element holds p1est.z, so you could do
fn <- function(x) { x[[1]]$p1est.z }
and then apply it
sapply(your_list, fn)
# [1] 2.890829 3.158164
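Since the goal was the average, here is a minimal sketch that extracts the values and takes their mean. It rebuilds the list from the values printed in the question; p1est and p1est_mean are names chosen for this example, and vapply is used in place of sapply only to make the expected return type explicit.

```r
# The list from the question, rebuilt from the printed values
your_list <- list(
  list(list(p1est.z = 2.890829, p1se.z = 0.1418367),
       list(p2est.w = 4.947014, p2se.w = 0.5986682)),
  list(list(p1est.z = 3.158164, p1se.z = 0.138770),
       list(p2est.w = 5.052874, p2se.w = 0.585608))
)

# Pull p1est.z out of the first sub-list of every top-level element;
# numeric(1) makes vapply fail loudly if a value is missing
p1est <- vapply(your_list, function(x) x[[1]]$p1est.z, numeric(1))

# Average of the extracted estimates
p1est_mean <- mean(p1est)
p1est_mean
```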

How can I use two lists to create a Table (Columns and Rows)

I want to write a script in R that allows me to import MSG files and store the information in a table. The fields may vary by course, so the column names are defined based on the first MSG file being imported.
The import and extraction are already working (special thanks to the user "January")
What does not work is filling in the table, which consists of two steps: adding the column names and filling in the rows.
I've tried using unlist to prepare the contents of the lists so that I can add them as columns and rows to a table.
Anmeldung <- gsub("^\\s+", "", Anmeldung) # remove spaces at the beginning and end
Anmeldung <- gsub("\\s+$", "", Anmeldung)
words <- strsplit(Anmeldung, " *[\n\r]+ *")[[1]]
fields <- as.list(words[seq(1, length(words), 2)])
information <- as.list(words[seq(2, length(words), 2)])
resTab1 = data.frame(t(unlist(fields)))
resTab2 = data.frame(t(unlist(information)))
colnames(resTab2) = c(resTab1)
variable.names(resTab2)
When I try to create the table, this error appears:
colnames(resTab2) = c(resTab1)
Error in names(x) <- value :
'names' attribute [22] must be the same length as the vector [21]
This is what the lists fields and information look like:
Fields
> fields
[[1]]
[1] "Anrede"
[[2]]
[1] "Vorname"
[[3]]
[1] "Name"
[[4]]
[1] "Email (für Kontaktaufnahme)"
[[5]]
[1] "Telefon/Mobile (geschäftlich)"
[[6]]
[1] "Telefon/Mobile (privat)"
[[7]]
[1] "Strasse/Nr."
Information:
> information
[[1]]
[1] "Herr"
[[2]]
[1] "James"
[[3]]
[1] "Bond"
[[4]]
[1] "james.bond@email.com"
[[5]]
[1] "007 000 77 07"
[[6]]
[1] "007 000 77 07"
[[7]]
[1] "Lampenstrasse 8"
It looks like you're trying to assign more names than resTab2 has columns: resTab1 holds 22 entries, but resTab2 has only 21. The same error can be reproduced with a small example:
x <- c(1,2)
y <- c("a","b","c")
names(x) <- y
#Error in names(x) <- y :
#'names' attribute [3] must be the same length as the vector [2]
EDIT:
Use unlist to flatten both lists, then use the fields as names:
information <- unlist(information)
fields <- unlist(fields)
names(information) <- fields
information
#OUTPUT
#Anrede 'Herr'
#Vorname 'James'
#Name 'Bond'
#Email (für Kontaktaufnahme) 'james.bond@email.com'
#Telefon/Mobile (geschäftlich) '007 000 77 07'
#Telefon/Mobile (privat) '007 000 77 07'
#Strasse/Nr. 'Lampenstrasse 8'
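To get from the named vector to an actual one-row table, something like the sketch below might work. It uses a shortened copy of the sample values; truncating to the shorter length (with a warning) is only one possible way to handle the 22-versus-21 mismatch and is an assumption here, as are the names n, row_values and resTab.

```r
# Shortened stand-ins for the unlisted fields and information
fields <- c("Anrede", "Vorname", "Name", "Strasse/Nr.")
information <- c("Herr", "James", "Bond", "Lampenstrasse 8")

# Guard against unequal lengths (assumed policy: keep the pairs that exist)
n <- min(length(fields), length(information))
if (length(fields) != length(information)) {
  warning("fields and information differ in length; extra entries dropped")
}

# Name the values by their fields, then turn the vector into a 1-row table
row_values <- setNames(information[seq_len(n)], fields[seq_len(n)])
resTab <- data.frame(t(row_values), check.names = FALSE,
                     stringsAsFactors = FALSE)
resTab
```

check.names = FALSE keeps column names such as "Strasse/Nr." unchanged; without it, data.frame() would rewrite them into syntactically valid names.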

Cannot iterate through a list while I iterate through a list of lists

The input is a list of lists (see below). The list of file names contains as many names as there are lists in the input (name1, name2, name3).
Each name is appended to the path: path/name1 - path/name2 - path/name3
The program iterates through the list of paths while it iterates through the list of lists, and prints each path with its file name. I would expect the output to be path/name1 - path/name2 - path/name3, but I get the output below. Please see OUTPUT after INPUT.
INPUT
[[1]]
[1] "150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt" "160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
[3] "JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt" "JF_160426_Dep2Plas_tryp_Gpep_SIDtarg-(06)_PSMs.txt"
[[2]]
[1] "150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt" "160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
[3] "JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
[[3]]
[1] "150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt" "160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
OUTPUT
I would expect the output to be path/name1 - path/name2 - path/name3:
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/name1.tsv",
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/name2.tsv",
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/name3.tsv".
However I get the output below:
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/name1.tsv"
I cannot understand why I cannot iterate through the list of paths with the file names while iterating through the list of lists. I hope this helps to clarify the problem. Could anyone help with this?
I have checked each statement by printing, and everything works fine except for the output of the code below:
for (i in 1:length(lc)) {
for (j in 1:length(lc[[i]])) { # fetch and read files
if (j==1) {
newFile <- paste(dataFnsDir, lc[[i]][j], sep="/")
newFile <- tryCatch(read.delim(newFile, header = TRUE, sep = '/'), error=function(e) NULL)
newFile<- tryCatch(newFile, error=function(e) data.frame())
print(tmpFn[i])
} else {
newFile <- paste(dataFnsDir, lc[[i]][j], sep="/")
newFile <- tryCatch(read.delim(newFilei, header = TRUE, sep = '/'), error=function(e) NULL)
newFile <- tryCatch(newFile, error=function(e) data.frame())
newFile <- dplyr::bind_rows(newFile, newFile)
print(tmpFn[i])
}
}
}
There's no need to use a nested loop. Try this:
# sample data
dataFnsDir <- "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/"
lc <- list()
lc[[1]] <- c("150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt","160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
, "JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt" ,"JF_160426_Dep2Plas_tryp_Gpep_SIDtarg-(06)_PSMs.txt"
)
lc[[2]] <- c(
"150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt" , "160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt",
"JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
)
lc[[3]] <- c(
"150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt",
"160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
)
# actual code
lc.path.v <- paste0(dataFnsDir,unlist(lc))
# maybe this is what you want?
lc.path.v
#> [1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
#> [2] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
#> [3] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
#> [4] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_tryp_Gpep_SIDtarg-(06)_PSMs.txt"
#> [5] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
#> [6] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
#> [7] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
#> [8] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
#> [9] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
If you want to read all of them and combine them, try this (it may not work, because I don't know what the data looks like):
lc.alldf <- lapply(lc.path.v, read.delim, header = TRUE, sep = "/")
lc.onedf <- dplyr::bind_rows(lc.alldf)
Edit:
code improved, thanks @Onyambu!
If I understand correctly, the OP wants to create 3 new files each from the file names given as character vectors in each list element.
The main issue with OP's code is that newFile is overwritten in each iteration of the nested loops.
Here is what I would do with my preferred tools (untested):
library(data.table) # for fread() and rbindlist()
library(magrittr) # use piping for clarity
lapply(
lc,
function(x) {
filenames <- file.path(dataFnsDir, x)
lapply(filenames, fread) %>%
rbindlist()
}
)
This will return a list of three dataframes (data.tables).
I do not have the OP's input files available but we can simulate the effect for demonstration. If we remove the second call to lapply() we will get a list of 3 elements each containing a character vector of file names with the path prepended.
lapply(
lc,
function(x) {
filenames <- file.path(dataFnsDir, x)
print(filenames)
}
)
[[1]]
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
[2] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
[3] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
[4] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_tryp_Gpep_SIDtarg-(06)_PSMs.txt"
[[2]]
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
[2] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
[3] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
[[3]]
[1] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt"
[2] "/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA/160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
Data
dataFnsDir <-"/home/giuseppa/Development/glycoPipeApp/OUT/openMS/INPUT_DATA"
lc <- list(
c("150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt",
"160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt",
"JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt",
"JF_160426_Dep2Plas_tryp_Gpep_SIDtarg-(06)_PSMs.txt"
),
c(
"150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt" ,
"160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt",
"JF_160426_Dep2Plas_ctryp_Gpep_SIDtargFULL__PSMs.txt"
),
c(
"150413_JF_GPeps_SIDtarg_GPstdMix_Tryp_2runs_v3_PSMs.txt",
"160824_JF_udep_tryp_Hi_SIDdda_FULL_NewParse-(05)_PSMs.txt"
)
)
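Since the original input files are not available, the read-and-combine-per-group idea can be tried on throwaway files. The sketch below uses base R only (read.delim and rbind instead of fread and rbindlist); the file names, the tab-separated layout and the variables tmp_dir, groups and combined are all made up for the demonstration.

```r
# Create a scratch directory with a few small tab-separated files
tmp_dir <- tempfile("demo_")
dir.create(tmp_dir)
groups <- list(c("a.txt", "b.txt"), c("c.txt"))  # hypothetical file groups
for (f in unlist(groups)) {
  write.table(data.frame(source = f, value = 1:2),
              file.path(tmp_dir, f),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# For each group: build full paths, read every file, stack the results.
# Nothing is overwritten because each group returns its own data frame.
combined <- lapply(groups, function(x) {
  paths <- file.path(tmp_dir, x)
  do.call(rbind, lapply(paths, read.delim, stringsAsFactors = FALSE))
})

vapply(combined, nrow, integer(1))

# Remove the scratch files again
unlink(tmp_dir, recursive = TRUE)
```

The result is a list with one combined data frame per group, which mirrors the structure of lc.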

How to use unlist with nested lapply in R [duplicate]

I am working on a difficult function. Giving an example of the real function is very hard, so I have tried to give an example that is very close to my problem. I would like to get the output as a list instead of a list of lists.
Input
x <- list(rnorm(10,2,3), rnorm(10,3,4))
y <- list(rnorm(10,4,5), rnorm(10,5,6))
z <- list(x, y)
xy <- lapply(seq_along(z), function(i) {
lapply(seq_along( z[[i]]), function(j) {
x[[i]][[j]]*z[[i]][[j]]
})
})
unlist(xy)
The Output
xy
[[1]]
[[1]][[1]]
[1] 2.2280230 -4.9779716 4.1359718 10.3939970 -5.2133243 -1.2696787 0.5000506 4.7157700 7.8720780 7.0678141
[[1]][[2]]
[1] -14.950644 -7.263222 -6.586231 9.762505 -4.686088 4.259647 -3.579593 -7.341470 -13.626069 4.979983
[[2]]
[[2]][[1]]
[1] 3.2567110 18.8390907 32.7599898 16.5438238 10.7631826 35.8007750 7.0666637 -9.0148408 -2.5030033 -0.6119803
[[2]][[2]]
[1] 26.766508 9.292216 8.767470 20.690148 20.456934 22.686122 1.981408 1.763479 9.060410 35.391961
expected Output
xy
[[1]]
[1] 2.2280230 -4.9779716 4.1359718 10.3939970 -5.2133243 -1.2696787 0.5000506 4.7157700 7.8720780 7.0678141
[[2]]
[1] -14.950644 -7.263222 -6.586231 9.762505 -4.686088 4.259647 -3.579593 -7.341470 -13.626069 4.979983
[[3]]
[1] 3.2567110 18.8390907 32.7599898 16.5438238 10.7631826 35.8007750 7.0666637 -9.0148408 -2.5030033 -0.6119803
[[4]]
[1] 26.766508 9.292216 8.767470 20.690148 20.456934 22.686122 1.981408 1.763479 9.060410 35.391961
I tried unlist but it gave me a vector.
Use unlist(xy, recursive = FALSE).
This stops unlist from descending into the components of the list, so only the outer level of nesting is removed.
The output is:
[[1]]
[1] 0.27862974 1.47723685 -1.82963782 3.47664717 0.62645954 1.67429065 -0.06359767 -1.21542539 1.65609366 2.65336458
[[2]]
[1] 1.167232 3.318266 5.949589 -18.459982 -5.321955 7.810067 -12.792953 2.723463 9.934529 16.385867
[[3]]
[1] 5.4596367 1.3340797 4.8059125 -0.2578762 1.2808736 2.6462153 -3.6259595 1.4900160 -0.1496829 -0.8140339
[[4]]
[1] 13.130614 2.957532 2.270956 1.015446 -3.254110 -4.939529 1.465290 -3.141455 5.803487 15.114528
You can do the following:
library(purrr)
flatten(xy)
I think this is what you wanted, but let me know if otherwise.
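The difference between the two unlist calls is easier to see with fixed numbers instead of random draws. This is a small sketch with made-up integer data; flat_one_level and fully_flat are illustrative names.

```r
# A list of lists with a known shape: 2 outer elements, 2 vectors each
xy <- list(list(1:3, 4:6), list(7:9, 10:12))

# Remove only the outer level: a flat list of four vectors
flat_one_level <- unlist(xy, recursive = FALSE)
length(flat_one_level)
flat_one_level[[3]]

# The default flattens everything into one atomic vector of 12 values
fully_flat <- unlist(xy)
length(fully_flat)
```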

Nested List Parsing with jsonlite

This is the second time I have run into this recently, so I wanted to ask whether there is a better way to parse dataframes returned from jsonlite when one of the elements is an array, which ends up stored as a list column in the dataframe.
I know that this is part of the power of jsonlite, but I am not sure how to work with this nested structure. In the end, I suppose I could write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors=F)
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't really care whether I parse these lists within the same dataframe, but what are my options for parsing out the data?
I would prefer to take the data out of the lists and parse them into a dataframe that can be "related" to the original record it came from.
Thanks in advance for your help.
Following @hrbmstr's suggestion above, I was able to get what I wanted using tidyr's unnest:
library(dplyr)
library(tidyr)
select(pbp, eventid, aoi) %>% unnest(cols = aoi) %>% head()
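For readers without tidyr, the same long format can be sketched in base R. The pbp below is a tiny made-up stand-in for the API result, with an eventid column and an aoi list column; the ids are copied from the printed output above, while the eventid values and the name long are invented for the example.

```r
# Stand-in data frame with a list column (eventid values are hypothetical)
pbp <- data.frame(eventid = c(1L, 2L))
pbp$aoi <- list(c(8465009L, 8470638L),
                c(8470626L, 8471276L, 8471695L))

# Repeat each eventid once per entry of its aoi vector, then flatten aoi;
# every row of the result can be related back to its record via eventid
long <- data.frame(
  eventid = rep(pbp$eventid, lengths(pbp$aoi)),
  aoi = unlist(pbp$aoi)
)
long
```

Keeping eventid next to each unpacked value is what makes the result joinable back to the original play-by-play records.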
