Impute missing values - r

I want to impute some data. I am using the moss data set from the package mvoutlier. The goal is to impute the values < 0.004 in the column Bi. Because the moss data are compositional, I use methods from the package robCompositions. When I try to impute the values, I get an error.
Code:
library(mvoutlier)
library(robCompositions)
data(moss)
attach(moss)
x <- moss[-c(1,2,3)] # copy moss, without its first 3 variables, into x
x$Bi[Bi < 0.004] <- 0 # replace the values below 0.004 with 0
res <- impRZilr(x,dl=c(0,0,0,0,0,0.004,rep(0,25)))
|======= | 10%Error in !all.equal(x[!w], xOrig[!w]) : invalid argument type
I don't know how to handle this error.

library(mvoutlier)
library(robCompositions)
data(moss)
x <- moss[-c(1,2,3)] # copy moss, without its first 3 variables, into x
### Before
head(x$Bi)
## [1] 0.002 0.039 0.012 0.033 0.002 0.052
# Set values below 0.004 to 0
x$Bi[x$Bi < 0.004] <- 0
head(x$Bi)
## [1] 0.000 0.039 0.012 0.033 0.000 0.052
# Imputation
result <- impRZilr(x, dl = rep(0.004, nrow(x)))
res <- data.frame(result$x)
head(res$Bi)
## [1] 0.002515667 0.039000000 0.012000000 0.033000000 0.002836172 0.052000000
As you can see, the values that were set to 0 have been replaced with the values imputed by impRZilr.
EDIT
Here is a description of how to access the results as required in your comments.
# Imputation
# Use the verbose = TRUE option to see how the algorithm is iterating
result <- impRZilr(x, dl = rep(0.004, nrow(x)), verbose = TRUE)
### Results description
str(result)
# List of 7
# $ x : num [1:598, 1:31] 0.016 0.073 0.032 0.118 0.038 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:31] "Ag" "Al" "As" "B" ...
# $ criteria: num 0.0203
# $ iter : num 4
# $ maxit : num 10
# $ wind : logi [1:598, 1:31] FALSE FALSE FALSE FALSE FALSE FALSE ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:598] "1" "2" "3" "4" ...
# .. ..$ : chr [1:31] "U" "Bi" "Th" "Tl" ...
# $ nComp : int [1:4] 4 6 3 5
# $ method : chr "pls"
# - attr(*, "class")= chr "replaced"
# Results data.frame with imputed zeros
res <- data.frame(result$x)
# Index of missing values
index_missing_wind <- data.frame(result$wind)
# Number of iterations
result$iter
# [1] 4
# Method used (you can change this)
result$method
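To look only at the imputed entries, the logical matrix in result$wind can serve as a mask. The sketch below uses small made-up matrices (x_imp and wind are hypothetical stand-ins, not real impRZilr output). It indexes by column name because the str() output above lists the columns of x and wind in different orders; whether the rows line up in the real output is an assumption worth checking.

```r
# Toy stand-ins for result$x and result$wind (all values are made up)
x_imp <- matrix(c(0.016, 0.073, 0.032,
                  0.0025, 0.039, 0.0028),
                ncol = 2, dimnames = list(NULL, c("Ag", "Bi")))
wind <- matrix(c(TRUE, FALSE, TRUE,
                 FALSE, FALSE, FALSE),
               ncol = 2, dimnames = list(as.character(1:3), c("Bi", "Ag")))

# Select the mask column by name, not by position
imputed_rows <- which(wind[, "Bi"])

# Values of Bi in the rows flagged as imputed
imputed_bi <- x_imp[imputed_rows, "Bi"]
imputed_rows
imputed_bi
```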

The OP wrote in an edit:
I managed to solve the problem, this is what I did:
x <- moss[-c(1,2,3)]
x$Bi[x$Bi < 0.004] <- NA
res <- impAll(x)
and the object res contains the imputed matrix.
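The subsetting step relies on a comparison: writing bi < 0.004 tests each element, whereas bi <- 0.004 would assign 0.004 to bi. A minimal sketch on a made-up vector (bi is hypothetical, not the moss data):

```r
# Made-up concentrations; 0.004 plays the role of the detection limit
bi <- c(0.002, 0.039, 0.012, 0.033, 0.002, 0.052)

# Logical mask of the values below the detection limit
below_dl <- bi < 0.004

# Mark those values as missing so an imputation routine can fill them in
bi[below_dl] <- NA
sum(is.na(bi))
```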

Related

How to flatten nested list keeping inner structure

Let's say I have the following list (result), which is a nested list that captures information from a model: parameters (betas) and their standard errors (sd), additionally with some information regarding the global model (method) and the number of observations (n).
I want to flatten the lists betas and sd while distinguishing where each value of x1 and x2 comes from (i.e. if they are from betas or sd).
Consider the following example:
result<- list(n = 100,
method = "tree",
betas = list(x1 = 1.47,
x2 = -2.85),
sd = list(x1 = 0.55,
x2 = 0.25))
str(result)
# List of 4
# $ n : num 100
# $ method : chr "tree"
# $ betas :List of 2
# ..$ x1: num 1.47
# ..$ x2: num -2.85
# $ sd :List of 2
# ..$ x1: num 0.55
# ..$ x2: num 0.25
First attempt: flatten(). [Spoiler(!): I lose track of where each value came from]
## I can't distinguish between betas and sd.
flatten(result)
# $n
# [1] 100
#
# $method
# [1] "tree"
#
# $x1
# [1] 1.47
#
# $x2
# [1] -2.85
#
# $x1
# [1] 0.55
#
# $x2
# [1] 0.25
Second attempt: unlist(). [Spoiler(!): I need a list, not an atomic vector]
# I need a list
unlist(result)
# n method betas.x1 betas.x2 sd.x1 sd.x2
# "100" "tree" "1.47" "-2.85" "0.55" "0.25"
Desired Output.
list(n = 100,
method = "tree",
betas.x1 = 1.47,
betas.x2 = -2.85,
sd.x1 = 0.55,
sd.x2 = 0.25)
# List of 6
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25
as.data.frame will flatten for you. From ?as.data.frame:
Arrays with more than two dimensions are converted to matrices by
'flattening' all dimensions after the first and creating suitable
column labels.
The documentation does a poor job of explaining that this also works on nested lists, not just arrays. (In other words, I don't think the docs discuss this behavior for non-arrays.)
str(as.data.frame(result))
# 'data.frame': 1 obs. of 6 variables:
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25
If you want a list rather than a data frame, just call as.list on it next:
str(as.list(as.data.frame(result)))
# List of 6
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25
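If the intermediate data frame is not wanted, base R's unlist() with recursive = FALSE appears to give the same flat list directly, since it removes only one level of nesting. A minimal sketch, reusing the example list from the question:

```r
result <- list(n = 100,
               method = "tree",
               betas = list(x1 = 1.47, x2 = -2.85),
               sd = list(x1 = 0.55, x2 = 0.25))

# recursive = FALSE flattens one level only, so the result stays a list
# and the inner names are prefixed with the name of their parent element
flat <- unlist(result, recursive = FALSE)
str(flat)
```

Unlike the as.data.frame route, this does not require the elements to have compatible lengths.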

Adding a suffix to names when storing results in a loop

I am making some plots in R in a for-loop and would like to store them using a name to describe the function being plotted, but also which data it came from.
So when I have a list of 2 data sets "x" and "y" and the loop has a structure like this:
x = matrix(
c(1,2,4,5,6,7),
nrow=3,
ncol=2)
y = matrix(
c(20,40,60,80,100,120),
nrow=3,
ncol=2)
data <- list(x,y)
for (i in data){
??? <- boxplot(i)
}
I would like ??? to be a name built from a prefix, a "_" separator, and the name of the data set (i). In this case the 2 plots would be called "plot_x" and "plot_y".
I tried some things with paste("plot", names(i), sep = "_"), but I'm not sure whether this is what to use, or where and how to use it in this scenario.
We can create an empty list with the same length as 'data' and then store the output for each element by looping over the sequence of 'data':
out <- vector('list', length(data))
for(i in seq_along(data)) {
out[[i]] <- boxplot(data[[i]])
}
str(out)
#List of 2
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 1 1.5 2 3 4 5 5.5 6 6.5 7
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 0.632 3.368 5.088 6.912
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 20 30 40 50 60 80 90 100 110 120
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 21.8 58.2 81.8 118.2
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
If required, set the names of the list elements from the object names:
names(out) <- paste0("plot_", c("x", "y"))
It is better not to create multiple objects in the global environment. Instead, as shown above, place the objects in a list.
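If the input list itself is named, the names do not have to be typed a second time. A minimal sketch, assuming the two matrices from the question are stored as list(x = x, y = y); plot = FALSE is used here only so that the statistics are returned without drawing anything:

```r
x <- matrix(c(1, 2, 4, 5, 6, 7), nrow = 3, ncol = 2)
y <- matrix(c(20, 40, 60, 80, 100, 120), nrow = 3, ncol = 2)

# Naming the elements keeps the data set names available inside the list
data <- list(x = x, y = y)

# One boxplot summary per data set; lapply() carries the names over
out <- lapply(data, boxplot, plot = FALSE)

# Build "plot_x", "plot_y" from the names of the input list
names(out) <- paste("plot", names(data), sep = "_")
names(out)
```

An individual result can then be reached as out$plot_x or out[["plot_y"]].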
akrun is right: you should try to avoid creating named objects in the global environment. But if you really have to, you can try this:
> y = matrix(c(20,40,60,80,100,120,140,160,180),ncol=1)
> .GlobalEnv[[paste0("plot_","y")]] <- boxplot(y)
> str(plot_y)
List of 6
$ stats: num [1:5, 1] 20 60 100 140 180
$ n : num 9
$ conf : num [1:2, 1] 57.9 142.1
$ out : num(0)
$ group: num(0)
$ names: chr "1"
You can read up on .GlobalEnv by typing ?.GlobalEnv at the R command prompt.

NA to replace NULL in list/for loop

I am trying to replace NULL values with NAs in a list pulled from an API, but the lengths are different, so the values can't be replaced.
I have tried using the nullToNA function in the toxboot package (found here), but R can't find the function when I try to call it (I don't know whether the package has changed in a way I can't track down, or whether it is because the list is not pulled from a MongoDB). I have also tried all the function call checks here. My code is below. Any help?
library(httr)
library(toxboot)
library(RJSONIO)
library(lubridate)
library(xlsx)
library(reshape2)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
comUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3020CO3.M"
indUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3035CO3.M"
apiList <- list(resUrl, comUrl, indUrl)
results <- vector("list", length(apiList))
for(i in length(apiList)){
raw <- GET(url = as.character(apiList[i]))
char <- rawToChar(raw$content)
list <- fromJSON(char)
for (j in length(list$series[[1]]$data)){
if (is.null(list$series[[1]]$data[[j]][[2]])== TRUE)
##nullToNA(list$series[[1]]$data[[j]][[2]])
##list$series[1]$data[[j]][[2]] <- NA
else
next
}
##seriesData <- list$series[[1]]$data
unlistResult <- lapply(list, unlist)
##unlistResult <- lapply(seriesData, unlist)
##unlist2 <- lapply(unlistResult,unlist)
##results[[i]] <- unlistResult
results[[i]] <- unlistResult
}
The commented-out lines show some of the things that I have tried, but there are a few other methods I haven't tried.
I have seen lapply(list, function(x) ifelse (x == "NULL", NA, x)) but haven't had any luck with that either.
Try this:
library(httr)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
x <- GET(resUrl)
y <- content(x)
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : NULL
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
In this first URL, only the first element within $series[[1]]$data contained a NULL. BTW: be careful to distinguish between NULL (the literal) and "NULL" (a character string with 4 letters).
Here are some ways (with various data types) to check for NULLs:
is.null(NULL)
# [1] TRUE
length(NULL)
# [1] 0
Simple enough so far; now let's try a list with NULLs:
l <- list(NULL, 1)
is.null(l)
# [1] FALSE
sapply(l, is.null)
# [1] TRUE FALSE
length(l)
# [1] 2
lengths(l)
# [1] 0 1
sapply(l, length)
# [1] 0 1
(The "0" lengths indicate NULLs.) I'll use lengths here:
y$series[[1]]$data <- lapply(y$series[[1]]$data, function(z) { z[ lengths(z) == 0 ] <- NA; z; })
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : logi NA
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
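Once the NULLs are NAs, every inner list has two length-one entries, so the series can be turned into a data frame. A minimal sketch on made-up data shaped like y$series[[1]]$data (dat, period and value are hypothetical names, and no API call is made):

```r
# Toy data shaped like y$series[[1]]$data (values are made up)
dat <- list(list("201701", NULL),
            list("201612", 6.48),
            list("201611", 7.42))

# Replace zero-length entries (the NULLs) with NA, as above
dat <- lapply(dat, function(z) { z[lengths(z) == 0] <- NA; z })

# Pull the first and second entry of every inner list into columns
df <- data.frame(
  period = vapply(dat, function(z) z[[1]], character(1)),
  value  = vapply(dat, function(z) as.numeric(z[[2]]), numeric(1))
)
df
```

vapply() is used instead of sapply() so that an unexpected entry (for example one that is still NULL) raises an error rather than silently changing the result type.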

How to get result of package function into a dataframe in r

I am at the learning stage of R.
I am using library(usdm), where I call vifcor(vardata,th=0.4,maxobservations =50000) to find the variables that are not multicollinear. I need to get the result of that call into a structured data frame for further analysis.
Data reading process I am using:
performdata <- read.csv('F:/DGDNDRV_FINAL/OutputTextFiles/data_blk.csv')
# 5:length(names(performdata))-2 evaluates as (5:n) - 2, so this selects columns 3 to n - 2
vardata <- performdata[, 3:(ncol(performdata) - 2)]
Content of the csv file:
pointid grid_code Blocks_line_dst_CHT GrowthCenter_dst_CHT Roads_nationa_dst_CHT Roads_regiona_dst_CHT Settlements_CHT_line_dst_CHT Small_Hat_Bazar_dst_CHT Upazilla_lin_dst_CHT resp
1 6 150 4549.428711 15361.31836 3521.391846 318.9043884 3927.594727 480 1
2 6 127.2792206 4519.557617 15388.68457 3500.24292 342.0526123 3902.883545 480 1
3 2 161.5549469 4484.473145 15391.6377 3436.539063 335.4101868 3844.216553 540 1
My tries:
r<-vifcor(vardata,th=0.2,maxobservations =50000) returns
2 variables from the 6 input variables have collinearity problem:
Roads_regiona_dst_CHT GrowthCenter_dst_CHT
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( Small_Hat_Bazar_dst_CHT ~ Roads_nationa_dst_CHT ): -0.04119076963
max correlation ( Small_Hat_Bazar_dst_CHT ~ Settlements_CHT_line_dst_CHT ): 0.1384278434
---------- VIFs of the remained variables --------
Variables VIF
1 Blocks_line_dst_CHT 1.026743892
2 Roads_nationa_dst_CHT 1.010556752
3 Settlements_CHT_line_dst_CHT 1.038307666
4 Small_Hat_Bazar_dst_CHT 1.026943711
class(r) returns
[1] "VIF"
attr(,"package")
[1] "usdm"
mode(r) returns "S4"
I need Roads_regiona_dst_CHT and GrowthCenter_dst_CHT (the excluded variables) in one data frame and the VIFs of the remaining variables in another data frame.
But nothing I tried worked!
Basically, the returned result is an S4 object, and you can extract its slots with the @ operator:
library(usdm)
example(vifcor) # creates 'v2'
str(v2)
# Formal class 'VIF' [package "usdm"] with 4 slots
# ..@ variables: chr [1:10] "Bio1" "Bio2" "Bio3" "Bio4" ...
# ..@ excluded : chr [1:5] "Bio5" "Bio10" "Bio7" "Bio6" ...
# ..@ corMatrix: num [1:5, 1:5] 1 0.0384 -0.3011 0.0746 0.7102 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:5] "Bio1" "Bio2" "Bio3" "Bio8" ...
# .. .. ..$ : chr [1:5] "Bio1" "Bio2" "Bio3" "Bio8" ...
# ..@ results :'data.frame': 5 obs. of 2 variables:
# .. ..$ Variables: Factor w/ 5 levels "Bio1","Bio2",..: 1 2 3 4 5
# .. ..$ VIF : num [1:5] 2.09 1.37 1.25 1.27 2.31
So you can now extract the excluded and results slots via:
v2@excluded
# [1] "Bio5" "Bio10" "Bio7" "Bio6" "Bio4"
v2@results
# Variables VIF
# 1 Bio1 2.086186
# 2 Bio2 1.370264
# 3 Bio3 1.253408
# 4 Bio8 1.267217
# 5 Bio9 2.309479
You should be able to use the command below to get the information in the 'results' slot into a data frame. You can then split the information out into separate data frames using the usual methods:
df <- r@results
Note that r@results[1:2,2] would give you the VIF for the first two rows.
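The two data frames asked for in the question can be built from the slots. The sketch below does not need usdm: it defines a hypothetical S4 class (ToyVIF, with made-up values) that only mimics the excluded and results slots shown above, to illustrate slot access with @ and slot().

```r
library(methods)

# Hypothetical class that mimics two slots of the usdm 'VIF' object
setClass("ToyVIF", representation(excluded = "character",
                                  results = "data.frame"))

toy <- new("ToyVIF",
           excluded = c("Bio5", "Bio10"),
           results = data.frame(Variables = c("Bio1", "Bio2"),
                                VIF = c(2.09, 1.37)))

# List the available slots
slotNames(toy)

# Excluded variables as their own data frame
excluded_df <- data.frame(excluded = toy@excluded)

# The VIF table is already a data frame; slot() is equivalent to @
vif_df <- slot(toy, "results")
```

With a real vifcor() result r, the same pattern would presumably be data.frame(excluded = r@excluded) and r@results.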

Trying to get subset but showing error : (list) object cannot be coerced to type 'double'

I am performing Data Envelopment Analysis using the Benchmarking package in R.
I tried to take a subset, but it fails with the error quoted in the title.
I saw that similar questions have been asked before, but they didn't help me.
Update: structure and summary of the data set
I am performing DEA for V6 and V7.
I guess you need
Large.Cap$V1[e_crs$eff > 0.85]
Using a reproducible example from ?dea
library(Benchmarking)
x <- matrix(c(100,200,300,500,100,200,600),ncol=1)
y <- matrix(c(75,100,300,400,25,50,400),ncol=1)
Large.Cap <- data.frame(v1 = LETTERS[1:7], v2 = 1:7, stringsAsFactors = TRUE) # factor column, so droplevels() works below
e_crs <- dea(x, y, RTS='crs', ORIENTATION='in')
e_crs
#[1] 0.7500 0.5000 1.0000 0.8000 0.2500 0.2500 0.6667
The e_crs object is a list:
str(e_crs)
#List of 12
# $ eff : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ lambda : num [1:7, 1:7] 0 0 0 0 0 0 0 0 0 0 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:7] "L1" "L2" "L3" "L4" ...
# $ objval : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ RTS : chr "crs"
# $ primal : NULL
# $ dual : NULL
# $ ux : NULL
# $ vy : NULL
# $ gamma :function (x)
# $ ORIENTATION: chr "in"
# $ TRANSPOSE : logi FALSE
# $ param : NULL
# - attr(*, "class")= chr "Farrell"
We extract the 'eff' list element from 'e_crs' to subset the 'v1' column in 'Large.Cap' dataset.
droplevels(Large.Cap$v1[e_crs$eff > 0.85])
#[1] C
#Levels: C
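One plausible way to hit the error from the title is to compare the whole result object with a number instead of its 'eff' element. A minimal sketch with a plain list standing in for the dea() result (e and firms are made-up names, and the exact wording of the error message may differ between R versions):

```r
# Plain list standing in for the object returned by dea()
e <- list(eff = c(0.75, 0.5, 1, 0.8), RTS = "crs")
firms <- c("A", "B", "C", "D")

# Comparing the whole list with a number fails, because R tries to
# coerce the list to a numeric vector first
attempt <- tryCatch(firms[e > 0.85], error = function(err) "error")
attempt

# Comparing the numeric element works
firms[e$eff > 0.85]
```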
