Adding a suffix to names when storing results in a loop - r

I am making some plots in R in a for-loop and would like to store them using a name to describe the function being plotted, but also which data it came from.
So when I have a list of 2 data sets "x" and "y" and the loop has a structure like this:
x = matrix(
c(1,2,4,5,6,7,8,9),
nrow=3,
ncol=2)
y = matrix(
c(20,40,60,80,100,120,140,160,180),
nrow=3,
ncol=2)
data <- list(x,y)
for (i in data){
??? <- boxplot(i)
}
I would like the ??? to be "name" + (i) + "_" separator. In this case the 2 plots would be called "plot_x" and "plot_y".
I tried some stuff with paste("plot", names(i), sep = "_") but I'm not sure if this is what to use, and where and how to use it in this scenario.

We can create an empty list with the length same as that of the 'data' and then store the corresponding output from the for loop by looping over the sequence of 'data'
out <- vector('list', length(data))
for(i in seq_along(data)) {
out[[i]] <- boxplot(data[[i]])
}
str(out)
#List of 2
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 1 1.5 2 3 4 5 5.5 6 6.5 7
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 0.632 3.368 5.088 6.912
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 20 30 40 50 60 80 90 100 110 120
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 21.8 58.2 81.8 118.2
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
If required, set the names of the list elements with the object names
names(out) <- paste0("plot_", c("x", "y"))
It is better not to create multiple objects in the global environment. Instead as showed above, place the objects in a list

akrun is right, you should try to avoid setting names in the global environment. But if you really have to, you can try this,
> y = matrix(c(20,40,60,80,100,120,140,160,180),ncol=1)
> .GlobalEnv[[paste0("plot_","y")]] <- boxplot(y)
> str(plot_y)
List of 6
$ stats: num [1:5, 1] 20 60 100 140 180
$ n : num 9
$ conf : num [1:2, 1] 57.9 142.1
$ out : num(0)
$ group: num(0)
$ names: chr "1"
You can read up on .GlobalEnv by typing in ?.GlobalEnv, into the R command prompt.

Related

Iterate through a nested list

I am attempting to iterate through a nested list in R, and can't quite get the function/for loop correct.
Sample of my data:
> str(waveforms)
List of 3
$ Sta2_Ev20:List of 7
..$ 1: num [1:10000] 5.88e-05 -2.84e-05 -5.50e-05 7.02e-05 1.90e-06 ...
..$ 2: num [1:10000] 2.61e-05 -2.14e-05 -2.02e-05 2.97e-05 5.94e-06 ...
..$ 3: num [1:10000] 1.08e-05 -4.12e-05 1.95e-05 3.03e-05 -4.55e-05 ...
..$ 4: num [1:10000] 2.45e-05 -1.23e-05 -1.53e-05 2.76e-05 3.07e-06 ...
..$ 5: num [1:10000] 2.29e-05 0.00 5.71e-06 -2.86e-05 5.71e-06 ...
..$ 6: num [1:10000] -1.01e-04 2.37e-05 2.08e-05 -5.93e-06 2.08e-05 ...
..$ 7: num [1:10000] 3.47e-05 -2.75e-05 0.00 1.45e-05 -1.45e-06 ...
$ Sta2_Ev21:List of 34
..$ 1 : num [1:10000] 1.35e-05 -3.46e-05 -3.46e-05 8.65e-05 -2.11e-05 ...
..$ 2 : num [1:10000] 5.68e-05 1.14e-05 -7.38e-05 2.27e-05 4.73e-05 ...
..$ 3 : num [1:10000] 8.21e-06 3.69e-05 -2.46e-05 1.64e-05 -8.21e-06 ...
..$ 4 : num [1:10000] 3.26e-05 -1.34e-05 -1.19e-05 8.90e-06 1.78e-05 ...
..$ 5 : num [1:10000] 2.43e-05 -3.00e-05 1.29e-05 2.86e-06 -1.00e-05 ...
..$ 6 : num [1:10000] -6.87e-06 2.34e-05 -2.34e-05 3.44e-05 -2.20e-05 ...
..$ 7 : num [1:10000] 1.23e-05 -5.75e-05 2.46e-05 1.23e-05 -2.74e-06 ...
..$ 8 : num [1:10000] -2.34e-05 -2.17e-05 1.83e-05 4.17e-05 -4.50e-05 ...
..$ 9 : num [1:10000] 3.34e-05 7.42e-06 -2.04e-05 7.42e-06 0.00 ...
etc...
REPRODUCIBLE DATA
Sta2_Evt1=list(a=runif(10000, min=-12, max=12), b=runif(10000, min=-12, max=12),c=runif(10000, min=-12, max=12))
Sta2_Evt2=list(a=runif(10000, min=-2, max=2), b=runif(10000, min=-2, max=2),c=runif(10000, min=-2, max=2))
...
waveforms=list(Sta2_Evt1,Sta2_Evt2,...))
binsize=5000
And so on. What I need to do it iterate through each list within my list. I tested the data on one of the "Sta#_Evt#" lists. Previously, this code worked:
ch0=list()
for (i in seq_along(Sta2_Evt2)) {
tempobj=head(Sta2_Evt2[[i]],n=binsize)
name <- paste('click',names(Sta2_Evt2)[[i]],sep='')
ch0[[name]] <- tempobj
}
This is simple, just extracting the first 5000 data points from each element. From this new list of elements (ch0), I was able to run multiple scripts to process my data. However, now that I need to expand to include ALL my data, not just the test set I was originally working with, I can't figure out how to run iterations over nested lists (like waveform, above). When I run the code for 'ch0', for instance, over my nested 'waveform' list, it returns the same nested list.
I have tried a few methods: lapply, an additional for loop, llply. I think that maybe writing a function to complete my analysis, and then using llply. However, with this function:
mkChs=function(x,binsize) {for (i in 1:length(x)) {
head(x[[i]],n=binsize)
}}
test=llply(waveforms,mkChs, binsize=5000)
It still does not work. The new list 'test' comes back empty.
I've tried a nest for loop.
ch0=list()
for (i in seq_along(waveforms)) {
a=list(names(waveforms)[[i]])
b=for (j in seq_along(waveforms[i])) {
tempobj=head(waveforms[[i]][[j]],n=binsize)
name <- paste('click',seq_along(waveforms)[[i]][[j]]-1,sep='')
a[[name]] <- tempobj
}
name1 <- names(waveforms)[[i]]
ch0[[name1]] <- b
}
That returns the following:
str(ch0)
List of 3
$ Sta2_Ev20: num [1:5000] 5.88e-05 -2.84e-05 -5.50e-05 7.02e-05 1.90e-06 ...
$ Sta2_Ev21: num [1:5000] 1.35e-05 -3.46e-05 -3.46e-05 8.65e-05 -2.11e-05 ...
$ Sta2_Ev22: num [1:5000] 2.06e-05 3.44e-06 2.06e-05 -3.44e-05 0.00 ...
Not exactly what I am looking for. I'd rather not have a separate list per "Sta#_Evt#" to get this to run properly.
I tried to create a minimal reproducible example which may get close to what you want
waveform <- list("a" = list('1' = c(1,2,3), '2' = c(4,5,6)),
"b" = list('1' = c(7,8,9), '2' = c(10,11,12)))
# arbitrary function
my_fun <- function(vec) {
return(mean(vec))
}
# return list structure
r1 <- lapply(waveform, function (x) {
lapply(x, my_fun)})
# return a two dimensional array
r2 <- sapply(waveform, function (x) {
sapply(x, my_fun)})
str(r1)
# List of 2
# $ a:List of 2
# ..$ 1: num 2
# ..$ 2: num 5
# $ b:List of 2
# ..$ 1: num 8
# ..$ 2: num 11
r2
# a b
# 1 2 8
# 2 5 11
>
I used a nested loop. Turns out my previous loop was missing a pair of parentheses!
ch0=list()
for (i in seq_along(waveforms)) {
a=list()
b=for (j in seq_along(waveforms[[i]])) {
tempobj=head((waveforms[[i]])[[j]],n=binsize)
name <- paste('click',seq_along((waveforms)[[i]])[[j]]-1,sep='')
a[[name]] <- tempobj
}
name1 <- names(waveforms)[[i]]
ch0[[name1]] <- a
}
In the tempobj=head((waveforms[[i]])[[j]],n=binsize) line of the for loop, I had neglected to put parentheses around waveforms[[i]], and again when generating the names.

NA to replace NULL in list/for loop

I am trying to replace NULL values with NAs in a list pulled from an API, but the lengths are different and therefore can't be replaced.
I have tried using the nullToNA function in the toxboot package (found here), but it won't locate the function in R when I try to call it (I don't know if there have been changes to the package which I can't locate or whether it is because the list is not pulled from a MongoDB). I have also tried all the function call checks here . My code is below. Any help?
library(httr)
library(toxboot)
library(RJSONIO)
library(lubridate)
library(xlsx)
library(reshape2)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
comUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3020CO3.M"
indUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3035CO3.M"
apiList <- list(resUrl, comUrl, indUrl)
results <- vector("list", length(apiList))
for(i in length(apiList)){
raw <- GET(url = as.character(apiList[i]))
char <- rawToChar(raw$content)
list <- fromJSON(char)
for (j in length(list$series[[1]]$data)){
if (is.null(list$series[[1]]$data[[j]][[2]])== TRUE)
##nullToNA(list$series[[1]]$data[[j]][[2]])
##list$series[1]$data[[j]][[2]] <- NA
else
next
}
##seriesData <- list$series[[1]]$data
unlistResult <- lapply(list, unlist)
##unlistResult <- lapply(seriesData, unlist)
##unlist2 <- lapply(unlistResult,unlist)
##results[[i]] <- unlistResult
results[[i]] <- unlistResult
}
My hashtags have some of the things that I have tried. But there are a few other methods I haven't tried.
I have seen lapply(list, function(x) ifelse (x == "NULL", NA, x)) but haven't had any luck with that eiter.
Try this:
library(httr)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
x <- GET(resUrl)
y <- content(x)
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : NULL
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
In this first URL, only the first within $series[[1]]$data contained a NULL. BTW: be clear to distinguish between NULL (the literal) and "NULL" (a character string with 4 letters).
Here are some ways (with various data types) to check for NULLs:
is.null(NULL)
# [1] TRUE
length(NULL)
# [1] 0
Simple enough so far, let's try to list with NULLs:
l <- list(NULL, 1)
is.null(l)
# [1] FALSE
sapply(l, is.null)
# [1] TRUE FALSE
length(l)
# [1] 2
lengths(l)
# [1] 0 1
sapply(l, length)
# [1] 0 1
(The "0" lengths indicate NULLs.) I'll use lengths here:
y$series[[1]]$data <- lapply(y$series[[1]]$data, function(z) { z[ lengths(z) == 0 ] <- NA; z; })
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : logi NA
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3

R - low-level file IO

I am trying to read in a file in "flexible data format" using R.
I got the number of bytes I should be reading in (counting from EOF, e.g., I should be reading EOF-32 to EOF bytes in as my data).
I am seeking the equivalences to the fseek and fread from MATLAB in R.
I think you would do better with a different approach (if I've got the right "flexible data format" file format here). You can deal with much of these (horrible) files with basic string functions in R:
library(stringr)
# read in fdf file
l <- readLines("http://rud.is/dl/Fe.fdf")
# some basic cleanup
l <- sub("#.*$", "", l) # remove comments
l <- sub("^=.*$", "", l) # remove comments
l <- gsub("\ +", " ", l) # compress spaces
l <- str_trim(l) # beg/end space trim
l <- grep("^$", l, value=TRUE, invert=TRUE) # ignore blank lines
# start of data blocks
blocks <- which(grepl("^%block", l))
# all "easy"/simple lines
simple <- str_split_fixed(grep("^[[:digit:]%]", l, value=TRUE, invert=TRUE),
"[[:space:]]+", 2)
# "simple" name/val [unit] conversions
convert_vals <- function(simple) {
vals <- simple[,2]
names(vals) <- simple[,1]
lapply(vals, function(v) {
# if logical
if (tolower(v) %in% c("t", "true", ".true.", "f", "false", ".false.")) {
return(as.logical(gsub("\\.", "", v)))
}
# if it's just a number
# i may be missing a numeric fmt char in this horrible format
if (grepl("^[[:digit:]\\.\\+\\-]+$", v)) {
return(as.numeric(v))
}
# if value and unit convert to an actual number with a unit attribute
# or convert it here from the table starting on line 927 of fdf.f
if (grepl("^[[:digit:]]", v) & (!any(is.na(str_locate(v, " "))))) {
vu <- str_split_fixed(v, " ", 2)
x <- as.numeric(vu[,1])
attr(x, "unit") <- vu[,2]
return(x)
}
# handle "1.d-3" and other vals with other if's
# anything not handled is returned
return(v)
})
}
# handle begin/end block "complex" data conversion
convert_blocks <- function(lines) {
block_names <- sub("^%block ", "", grep("^%block", lines, value=TRUE))
lapply(blocks, function(blk_start) {
blk <- lines[blk_start]
blk_info <- str_split_fixed(blk, " ", 2)
blk_end <- which(grepl(sprintf("^%%endblock %s", blk_info[,2]), lines))
# this is overly simplistic since you have to do some conversions, but you know the line
# range of the data values now so you can process them however you need to
read.table(text=lines[(blk_start+1):(blk_end-1)],
header=FALSE, stringsAsFactors=FALSE, fill=TRUE)
}) -> blks
names(blks) <- block_names
return(blks)
}
fdf <- c(convert_vals(simple),
convert_blocks(l))
str(fdf)
Output of the str:
List of 32
$ SystemName : chr "bcc Fe ferro GGA"
$ SystemLabel : chr "Fe"
$ WriteCoorStep : chr ""
$ WriteMullikenPop : num 1
$ NumberOfSpecies : num 1
$ NumberOfAtoms : num 1
$ PAO.EnergyShift : atomic [1:1] 50
..- attr(*, "unit")= chr "meV"
$ PAO.BasisSize : chr "DZP"
$ Fe : num 2
$ LatticeConstant : atomic [1:1] 2.87
..- attr(*, "unit")= chr "Ang"
$ KgridCutoff : atomic [1:1] 15
..- attr(*, "unit")= chr "Ang"
$ xc.functional : chr "GGA"
$ xc.authors : chr "PBE"
$ SpinPolarized : logi TRUE
$ MeshCutoff : atomic [1:1] 150
..- attr(*, "unit")= chr "Ry"
$ MaxSCFIterations : num 40
$ DM.MixingWeight : num 0.1
$ DM.Tolerance : chr "1.d-3"
$ DM.UseSaveDM : logi TRUE
$ DM.NumberPulay : num 3
$ SolutionMethod : chr "diagon"
$ ElectronicTemperature : atomic [1:1] 25
..- attr(*, "unit")= chr "meV"
$ MD.TypeOfRun : chr "cg"
$ MD.NumCGsteps : num 0
$ MD.MaxCGDispl : atomic [1:1] 0.1
..- attr(*, "unit")= chr "Ang"
$ MD.MaxForceTol : atomic [1:1] 0.04
..- attr(*, "unit")= chr "eV/Ang"
$ AtomicCoordinatesFormat : chr "Fractional"
$ ChemicalSpeciesLabel :'data.frame': 1 obs. of 3 variables:
..$ V1: int 1
..$ V2: int 26
..$ V3: chr "Fe"
$ PAO.Basis :'data.frame': 5 obs. of 3 variables:
..$ V1: chr [1:5] "Fe" "0" "6." "2" ...
..$ V2: num [1:5] 2 2 0 2 0
..$ V3: chr [1:5] "" "P" "" "" ...
$ LatticeVectors :'data.frame': 3 obs. of 3 variables:
..$ V1: num [1:3] 0.5 0.5 0.5
..$ V2: num [1:3] 0.5 -0.5 0.5
..$ V3: num [1:3] 0.5 0.5 -0.5
$ BandLines :'data.frame': 5 obs. of 5 variables:
..$ V1: int [1:5] 1 40 28 28 34
..$ V2: num [1:5] 0 2 1 0 1
..$ V3: num [1:5] 0 0 1 0 1
..$ V4: num [1:5] 0 0 0 0 1
..$ V5: chr [1:5] "\\Gamma" "H" "N" "\\Gamma" ...
$ AtomicCoordinatesAndAtomicSpecies:'data.frame': 1 obs. of 4 variables:
..$ V1: num 0
..$ V2: num 0
..$ V3: num 0
..$ V4: int 1
You can see the output (and the file and this code) in this gist since it's easier to copy/past/clone a gist.
You still need to:
deal with unit conversion (but with this grid::unit-like structure that shld be far more straightforward)
swap out the naive read.table with a better "block reader"
deal with file includes (pretty simple, tho, if you add a function or two)
With a bit of tweaking/polish this cld be a new R package, not that I'd ever want a data file in this format ever.

Scoping of variables in aes(...) inside a function in ggplot

Consider this use of ggplot(...) inside a function.
x <- seq(1,10,by=0.1)
df <- data.frame(x,y1=x, y2=cos(2*x)/(1+x))
library(ggplot2)
gg.fun <- function(){
i=2
plot(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
}
if(exists("i")) remove(i)
gg.fun()
# Error in `[.data.frame`(df, , i) : object 'i' not found
i=3
gg.fun() # plots df[,3] vs. x
It looks like ggplot does not recognize the variable i defined inside the function, but does recognize i if it is defined in the global environment. Why is that?
Note that this gives the expected result.
gg.new <- function(){
i=2
plot(ggplot(data.frame(x=df$x,y=df[,i]),aes(x,y)) + geom_line())
}
if(exists("i")) remove(i)
gg.new() # plots df[,2] vs. x
i=3
gg.new() # also plots df[,2] vs. x
Let's return a non-rendered ggplot object to see what's going on:
gg.str <- function() {
i=2
str(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
}
gg.str()
List of 9
$ data :'data.frame': 91 obs. of 3 variables:
..$ x : num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y1: num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y2: num [1:91] -0.208 -0.28 -0.335 -0.373 -0.393 ...
$ layers :List of 1
..$ :Classes 'proto', 'environment' <environment: 0x0000000009886ca0>
$ scales :Reference class 'Scales' [package "ggplot2"] with 1 fields
..$ scales: list()
..and 21 methods, of which 9 are possibly relevant:
.. add, clone, find, get_scales, has_scale, initialize, input, n, non_position_scales
$ mapping :List of 2
..$ x: symbol x
..$ y: language df[, i]
$ theme : list()
$ coordinates:List of 1
..$ limits:List of 2
.. ..$ x: NULL
.. ..$ y: NULL
..- attr(*, "class")= chr [1:2] "cartesian" "coord"
$ facet :List of 1
..$ shrink: logi TRUE
..- attr(*, "class")= chr [1:2] "null" "facet"
$ plot_env :<environment: R_GlobalEnv>
$ labels :List of 2
..$ x: chr "x"
..$ y: chr "df[, i]"
- attr(*, "class")= chr [1:2] "gg" "ggplot"
As we can see, mapping for y is simply an unevaluated expression. Now, when we ask to do the actual plotting, the expression is evaluated within plot_env, which is global. I do not know why it is done so; I believe there are reasons for that.
Here's a demo that can override this behaviour:
gg.envir <- function(envir=environment()) {
i=2
p <- ggplot(df,aes(x=x,y=df[,i]))+geom_line()
p$plot_env <- envir
plot(p)
}
# evaluation in local environment; ok
gg.envir()
# evaluation in global environment (same as default); fails if no i
gg.envir(environment())

R k-means clustering data

in R, I have computed a k-means clustering as follows:
km = (mat2, centers=3)
where mat2 is a matrix of column vectors obtained by combining elements of a set of time series. There are 31 rows
Now that I have my k-means object how can I look at the data associated with a particular point? For example, supposed I clicked on a dot in that belongs to one of the partitions. How can I view this data? Of course what I mean is how to programmatically obtain this data.
I expect that you call kmeans as this:
set.seed(42)
df <- data.frame( row.names = paste0( "obs", 1:100 ),
V1 = rnorm(100),
V2 = rnorm(100),
V3 = rnorm(100) )
km <- kmeans( df, centers = 3 )
If you are unfamiliar with a new function, it's always a good idea to inspect the resulting object using str():
> str(km)
List of 7
$ cluster : Named int [1:100] 1 2 3 3 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:100] "obs1" "obs2" "obs3" "obs4" ...
$ centers : num [1:3, 1:3] 0.65604 -1.09689 0.56428 0.11162 0.00549 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:3] "V1" "V2" "V3"
$ totss : num 291
$ withinss : num [1:3] 43.7 65.7 51.3
$ tot.withinss: num 161
$ betweenss : num 130
$ size : int [1:3] 36 34 30
- attr(*, "class")= chr "kmeans"
As I understood from your question, you are looking for km$cluster, which tells you which observation of your data has been assigned to which cluster. The cluster centers can accordingly be investigated by km$centers.
If you now want to know which observations has been clustered to the third cluster with the center km$centers[3,], you can subset your data.frame (or matrix) by
> rownames(df[ km$cluster == 3, ])
[1] "obs3" "obs4" "obs12" "obs15" "obs16" "obs21" "obs25" "obs27" "obs32" "obs42" "obs43" "obs46" "obs48" "obs54" "obs55" "obs58" "obs61" "obs62" "obs63" "obs66" "obs67" "obs73" "obs76"
[24] "obs77" "obs81" "obs84" "obs86" "obs87" "obs90" "obs94"

Resources