Subset trials from a list using a map function - r

I am trying to create another list to include all of the trials from 2 out of the 3 variables shown in the picture.
I am trying to learn how to map this.
So far I have:
d2 <- map(d1 ,`[` , c("time_100L_1", "vertical_100L_1"))
but this only brings in the first trial. I need all 14 for time and vertical and force is in the middle of the list.
any suggestions? See picture for list

map(d1, [, c(paste0("time_100L_", 1:14), paste0("vertical_100L_", 1:14)))

Related

R Refer to (part of) data frame using string in R

I have a large data set in which I have to search for specific codes depending on what i want. For example, chemotherapy is coded by ~40 codes, that can appear in any of 40 columns called (diag1, diag2, etc).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What i cannot do, and need your help with, is referring to a data frame (Style$SearchIn[1]) using codes in a data frame as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I cant get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
enter code here
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work
It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as lambdas.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.
Have you tried using quote()
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6

R - split list by defined interval

I have a list of files whose length will always be a multiple of 12. This is a simplified sample:
files <- c("LC82210802013322LGN00_B1.TIF", "LC82210802013322LGN00_B10.TIF",
"LC82210802013322LGN00_B11.TIF", "LC82210802013322LGN00_B2.TIF",
"LC82210802013322LGN00_B3.TIF", "LC82210802013322LGN00_B4.TIF",
"LC82210802013322LGN00_B5.TIF", "LC82210802013322LGN00_B6.TIF",
"LC82210802013322LGN00_B7.TIF", "LC82210802013322LGN00_B8.TIF",
"LC82210802013322LGN00_B9.TIF", "LC82210802013322LGN00_BQA.TIF",
"LC82210802013354LGN00_B1.TIF", "LC82210802013354LGN00_B10.TIF",
"LC82210802013354LGN00_B11.TIF", "LC82210802013354LGN00_B2.TIF",
"LC82210802013354LGN00_B3.TIF", "LC82210802013354LGN00_B4.TIF",
"LC82210802013354LGN00_B5.TIF", "LC82210802013354LGN00_B6.TIF",
"LC82210802013354LGN00_B7.TIF", "LC82210802013354LGN00_B8.TIF",
"LC82210802013354LGN00_B9.TIF", "LC82210802013354LGN00_BQA.TIF",
"LC82210802014021LGN00_B1.TIF", "LC82210802014021LGN00_B10.TIF",
"LC82210802014021LGN00_B11.TIF", "LC82210802014021LGN00_B2.TIF",
"LC82210802014021LGN00_B3.TIF", "LC82210802014021LGN00_B4.TIF",
"LC82210802014021LGN00_B5.TIF", "LC82210802014021LGN00_B6.TIF",
"LC82210802014021LGN00_B7.TIF", "LC82210802014021LGN00_B8.TIF",
"LC82210802014021LGN00_B9.TIF", "LC82210802014021LGN00_BQA.TIF",
"LC82210802014037LGN00_B1.TIF", "LC82210802014037LGN00_B10.TIF",
"LC82210802014037LGN00_B11.TIF", "LC82210802014037LGN00_B2.TIF",
"LC82210802014037LGN00_B3.TIF", "LC82210802014037LGN00_B4.TIF",
"LC82210802014037LGN00_B5.TIF", "LC82210802014037LGN00_B6.TIF",
"LC82210802014037LGN00_B7.TIF", "LC82210802014037LGN00_B8.TIF",
"LC82210802014037LGN00_B9.TIF", "LC82210802014037LGN00_BQA.TIF",
"LC82210802014085LGN00_B1.TIF", "LC82210802014085LGN00_B10.TIF",
"LC82210802014085LGN00_B11.TIF", "LC82210802014085LGN00_B2.TIF",
"LC82210802014085LGN00_B3.TIF", "LC82210802014085LGN00_B4.TIF",
"LC82210802014085LGN00_B5.TIF", "LC82210802014085LGN00_B6.TIF",
"LC82210802014085LGN00_B7.TIF", "LC82210802014085LGN00_B8.TIF",
"LC82210802014085LGN00_B9.TIF", "LC82210802014085LGN00_BQA.TIF"
)
Those files are satellite images. There are always 12 files (or bands) for each single date. In this case, there are five groups (dates) with 12 files each, totalling 60 elements.
What I need to do is to split this list into groups of 12, ideally creating a new variable. Using the sample data provided above, the new variable would have five elements (corresponding to dates), each one containing 12 files:
new<-list()
length(new) <- length(files)/12
# CODE BELOW DOESN'T WORK. I JUST WANT TO SHOW WHAT I NEED TO DO
new[1] <- files[1:12]
new[2] <- files[13:24]
new[3] <- files[25:36]
new[4] <- files[37:48]
new[5] <- files[49:60]
How to find a generic solution for this problem? Generic in the sense that the original list of files will always be a multiple of 12, but not always have lenght of 60 elements - sometimes 72, and sometimes 120.
Thanks in advance for any help.
After some more in depth research I came out with the following solution:
new <- split(files, ceiling(seq_along(files)/12))
which works ok. Any better idea?
Thanks,
Thiago.

How to store trees/nested lists in R?

I have a list of boroughs and a list of localities (like this one). Each locality lies in exactly one borough. What's the best way to store this kind of hierarchical structure in R, considerung that I'd like to have a convenient and readable way of accessing these, and using this list to accumulate data on the locality-level to the borough level.
I've come up with the following:
localities <- list("Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen",
"Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg")
)
But I am not sure if this is the most elegant and accessible way.
If I wanted to assign additional information on the localitiy-level, I could do that by replacing the c(...) by some other call, like rbind(c('0201', '0202'), c("Friedrichshain", "Kreuzberg")) if I wanted to add additional information to the borough-level (like an abbreviated name and a full name for each list), how would I do this?
Edit: For example, I'd like to condense a table like this into a borough-wise version.
Hard to know without having a better view on how you intend to use this, but I would strongly recommend moving away from a nested list structure to a data frame structure:
library(reshape2)
loc.df <- melt(localities)
This is what the molten data looks like:
value L1
1 Mitte Mitte
2 Moabit Mitte
3 Hansaviertel Mitte
4 Tiergarten Mitte
5 Wedding Mitte
6 Gesundbrunnen Mitte
7 Friedrichshain Friedrichshain-Kreuzberg
8 Kreuzberg Friedrichshain-Kreuzberg
You can then use all the standard data frame and other computations:
loc.df$population <- sample(100:500, nrow(loc.df)) # make up population
tapply(loc.df$population, loc.df$L1, mean) # population by borough
gives mean population by Borough:
Friedrichshain-Kreuzberg Mitte
278.5000 383.8333
For more complex calculations you can use data.table and dplyr
You can extract all of this data directly into a data.frame using the XML library.
library(XML)
theurl <- "http://en.wikipedia.org/wiki/Boroughs_and_localities_of_Berlin#List_of_localities"
tables<-readHTMLTable(theurl)
boroughs<-tables[[1]]$Borough
localities<-tables[c(3:14)]
names(localities) <- as.character(boroughs)
all<-do.call("rbind", localities)
#Roland, I think you will find data frames superior to lists for the reasons cited earlier, but also because there is other data on the web page you reference. Loading to a data frame will make it easy to go further if you wish. For example, making comparisons based on population density or other items provided "for free" on the page will be a snap from a data frame.

R storing different columns in different vectors to compute conditional probabilities

I am completely new to R. I tried reading the reference and a couple of good introductions, but I am still quite confused.
I am hoping to do the following:
I have produced a .txt file that looks like the following:
area,energy
1.41155882174e-05,1.0914586287e-11
1.46893363946e-05,5.25011714434e-11
1.39244046855e-05,1.57904991488e-10
1.64155121046e-05,9.0815757601e-12
1.85202830392e-05,8.3207522281e-11
1.5256036289e-05,4.24756620609e-10
1.82107587343e-05,0.0
I have the following command to read the file in R:
tbl <- read.csv("foo.txt",header=TRUE).
producing:
> tbl
area energy
1 1.411559e-05 1.091459e-11
2 1.468934e-05 5.250117e-11
3 1.392440e-05 1.579050e-10
4 1.641551e-05 9.081576e-12
5 1.852028e-05 8.320752e-11
6 1.525604e-05 4.247566e-10
7 1.821076e-05 0.000000e+00
Now I want to store each column in two different vectors, respectively area and energy.
I tried:
area <- c(tbl$first)
energy <- c(tbl$second)
but it does not seem to work.
I need to different vectors (which must include only the numerical data of each column) in order to do so:
> prob(energy, given = area), i.e. the conditional probability P(energy|area).
And then plot it. Can you help me please?
As #Ananda Mahto alluded to, the problem is in the way you are referring to columns.
To 'get' a column of a data frame in R, you have several options:
DataFrameName$ColumnName
DataFrameName[,ColumnNumber]
DataFrameName[["ColumnName"]]
So to get area, you would do:
tbl$area #or
tbl[,1] #or
tbl[["area"]]
With the first option generally being preferred (from what I've seen).
Incidentally, for your 'end goal', you don't need to do any of this:
with(tbl, prob(energy, given = area))
does the trick.

Pasting (or merging) two elements of a column together

I have two sources of clinical procedure billing information that I have added together (with rbind). In each row there is a CPT field and a CPT.description field that supplys a brief explanation. However, the descriptions are slightly different from the two sources. I want to be able to combine them. That way, if different words or abbreviations are used, then I can just do a string search to find what I am looking for.
So lets make up a simplified representation of a data table that I was able to generate.
cpt <- c(23456,23456,10000,44555,44555)
description <- c("tonsillectomy","tonsillectomy in >12 year old","brain transplant","castration","orchidectomy")
cpt.desc <- data.frame(cpt,description)
And here is what I want to get to.
cpt.wanted <- c(23456,10000,44555)
description.wanted <- c("tonsillectomy; tonsillectomy in >12 year old","brain transplant","castration; orchidectomy")
cpt.desc.wanted <- data.frame(cpt.wanted,description.wanted)
I have tried using functions such as unstack and then lapply(list,paste) but that is not pasting the elements of each list. I also tried reshape but there was no categorical variable to differentiate first or second version of description or even in some cases a third. The really annoying part is I had a similar problem a few months or years ago and someone helped me either on stackoverflow or on r-help and for the life of me I cannot find it.
So the underlying problem is, imagine that I have a spreadsheet in front of me. I need to do a vertical merge (paste) of two or maybe even three description cells who have the same CPT code in the adjacent column.
What buzzwords should I have been using to search for a solution to this problem.
Thank you so much for your help.
sapply( sapply(unique(cpt), function(x) grep(x, cpt) ),
# creates sets of index vectors as a list
function(x) paste(description[x], collapse=";") )
# ... and this pastes each set of selected items from "description" vector
[1] "tonsillectomy;tonsillectomy in >12 year old"
[2] "brain transplant"
[3] "castration;orchidectomy"
Here is an approach that uses plyr.
library("plyr")
cpt.desc.wanted <- ddply(cpt.desc, .(cpt), summarise,
description.wanted = paste(unique(description), collapse="; "))
which gives
> cpt.desc.wanted
cpt description.wanted
1 10000 brain transplant
2 23456 tonsillectomy; tonsillectomy in >12 year old
3 44555 castration; orchidectomy

Resources