Using R Package NMMAPSlite to get City Environmental vs Mortality Dataset - r
I have several questions for those who have worked with RStudio. I currently need to work with the NMMAPSlite package, but initialising the connection to the remote database that stores the NMMAPS city dataset fails with an error that comes from the package itself.
In short, I need help with either:
resolving the problem with the old NMMAPSlite R package, or
finding the NMMAPS dataset in CSV format
BACKGROUND
As background, I'm using the NMMAPSlite package with the intent of reproducing a paper by Antonio Gasparrini. The code I would like to run is attached at the bottom. It requires:
require(dlnm);
require(NMMAPSlite)
The NMMAPSlite package seems to have been deprecated, so I installed it and its dependencies from the archive. The links needed to get the dependencies for NMMAPSlite and dlnm are listed further below.
PROBLEM
The problem occurs when calling initDB(), which reports that it failed to create a remoteDB instance because the object is invalid. I suspect, though, that the error really comes from the URL not being supported. The NMMAPSlite docs describe the initDB() function. The database initialisation is necessary to read the city dataset.
The following is the error from the R console when running initDB():
creating directory 'NMMAPS' for local storage
Error in validObject(.Object) :
invalid class “remoteDB” object: object needs a 'url' of type 'http://'
In addition: Warning message:
In grep("^http://", URL, fixed = TRUE, perl = TRUE) :
argument 'perl = TRUE' will be ignored
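Judging from the warning in the log, the validity check seems to call grep() with both fixed = TRUE and perl = TRUE. In current R versions fixed = TRUE wins, so "^http://" is searched for as a literal string, caret included, and a perfectly valid URL fails the test. The sketch below reproduces that with base R only. The URL is baseurl from the appendix plus "outcome"; that the package's validity method does exactly this is an assumption inferred from the warning text, not something checked against the package source.

```r
# URL as initDB() would build it (baseurl from the appendix + "outcome")
url <- "http://www.ihapss.jhsph.edu/NMMAPS/v0.1/outcome"

# The call shown in the warning: with fixed = TRUE the pattern is a literal
# string, so "^" is an ordinary character and nothing matches.
literal_hit <- suppressWarnings(
  grep("^http://", url, fixed = TRUE, perl = TRUE)
)

# The same pattern as a regular expression matches the start of the URL.
regex_hit <- grep("^http://", url)

# A prefix test that needs no regular expression at all (R >= 3.3.0).
prefix_ok <- startsWith(url, "http://")

length(literal_hit)  # 0: the check fails although the URL is fine
length(regex_hit)    # 1
prefix_ok            # TRUE
```

If this is the cause, the state of the remote server does not matter at this step: the check itself would have to be patched in the package source before initDB() can get any further.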
QUESTIONS
I know the NMMAPSlite package is deprecated and perhaps too old, but I really want to reproduce Antonio Gasparrini's paper, Distributed lag non-linear models, for my undergraduate thesis project.
Hence:
Is there any way to get the NMMAPS dataset of city environmental data vs. mortality rate? I visited the official NMMAPS database, but the download link is either broken or the server is already down.
Alternatively, is there an equivalent to the NMMAPSlite package in R? I just need to download a city dataset containing humidity, temperature, dew point, CO and ozone (O3) trends, plus deaths/mortality over time, for any particular city over more than 2 years. The most important for me are the mortality rate and the ozone (O3) trend.
As a last resort, could you suggest a dataset similar to the one used in his paper? Something from which I can analyze the time relationship and estimate mortality given environmental and air pollution information.
APPENDIX
Definition of initDB
baseurl = "http://www.ihapss.jhsph.edu/NMMAPS/v0.1"
function (basedir = "NMMAPS")
{
if (!file.exists(basedir))
message(gettextf("creating directory '%s' for local storage",
basedir))
outcome <- new("remoteDB", url = paste(baseurl, "outcome",
sep = "/"), dir = file.path(basedir, "outcome"), name = "outcome")
exposure <- new("remoteDB", url = paste(baseurl, "exposure",
sep = "/"), dir = file.path(basedir, "exposure"), name = "exposure")
Meta <- new("remoteDB", url = paste(baseurl, "Meta", sep = "/"),
dir = file.path(basedir, "Meta"), name = "Meta")
assign("exposure", exposure, .dbEnv)
assign("outcome", outcome, .dbEnv)
assign("Meta", Meta, .dbEnv)
}
Code to run:
The error comes from the initDB() call
require(dlnm);require(NMMAPSlite)
##############################
# LOAD AND PREPARE THE DATASET
##############################
initDB()
data <- readCity("ny", collapseAge = TRUE)
data <- data[,c("city", "date", "dow", "death", "tmpd", "dptp", "rhum", "o3tmean", "o3mtrend", "cotmean", "comtrend")]
# TEMPERATURE: CONVERSION TO CELSIUS
data$temp <- (data$tmpd-32)*5/9
# POLLUTION: O3 AND CO AT LAG-01
data$o3 <- data$o3tmean + data$o3mtrend
data$co <- data$cotmean + data$comtrend
data$o301 <- filter(data$o3,c(1,1)/2,side=1)
data$co01 <- filter(data$co,c(1,1)/2, side=1)
# DEW POINT TEMPERATURE AT LAG 0-1
data$dp01 <- filter(data$dptp,c(1,1)/2,side=1)
##############################
# CROSSBASIS SPECIFICATION
##############################
# FIXING THE KNOTS AT EQUALLY SPACED VALUES
range <- range(data$temp,na.rm=T)
ktemp <- range [1] + (range [2]-range [1])/5*1:4
# CROSSBASIS MATRIX
ns.basis <- crossbasis(data$temp,varknots=ktemp,cenvalue=21, lagdf=5,maxlag=30)
##############################
# MODEL FIT AND PREDICTION
##############################
ns <- glm(death ~ ns.basis + ns (dp01, df=3 ) + dow + o301 + co01 +
ns(date,df=14*7),family=quasipoisson(), data)
ns.pred <- crosspred(ns.basis,ns,at=-16:33)
##############################
# RESULTS AND PLOTS
##############################
# 3-D PLOT (FIGURE 1)
crossplot(ns.pred,label="Temperature")
# SLICES (FIGURE 2, TOP)
percentiles <- round(quantile(data$temp,c(0.001,0.05,0.95,0.999)), 1)
ns.pred <- crosspred(ns.basis,ns,at=c(percentiles,-16:33))
crossplot(ns.pred,"slices",var=percentiles,lag=c(0,5,15,28), label="Temperature")
# OVERALL EFFECT (FIGURE 2, BELOW)
crossplot(ns.pred,"overall",label="Temperature", title="Overall effect of temperature on mortality
New York 1987–2000" )
# RR AT CHOSEN PERCENTILES VERSUS 21C (AND 95%CI)
ns.pred$allRRfit[as.character(percentiles)]
cbind(ns.pred$allRRlow,ns.pred$allRRhigh)[as.character(percentiles),]
##############################
# THE MOVING AVERAGE MODELS UP TO LAG x (DESCRIBED IN SECTION 5.2)
# CAN BE CREATED BY THE CROSSBASIS FUNCTION INCLUDING THE
# ARGUMENTS lagtype="strata", lagdf=1, maxlag=x
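The lag 0-1 averages in the script (o301, co01, dp01) come from the base stats filter() function. A minimal sketch on a made-up vector (not NMMAPS data) shows what the call computes. Note that if dplyr is attached, its filter() masks this one, so the explicit stats:: prefix is the safer spelling.

```r
# Toy daily series (made-up values)
x <- c(10, 20, 30, 40, 50)

# One-sided moving average: each value is the mean of the current and the
# previous day. 'sides' is the full argument name; the script's 'side = 1'
# only works through partial argument matching.
lag01 <- stats::filter(x, c(1, 1) / 2, sides = 1)

as.numeric(lag01)
# NA 15 25 35 45  (the first day has no previous value)
```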
Resources for your context
Distributed lag non-linear models link
Rstudio's NMMAPSlite Package docs pdf download
Rstudio's DLNM Package docs pdf
Duplicate questions from another forum: forum
How to install package from tar/archive: link
Meanwhile, I will contact the author of the package and see if I can get the dataset, preferably in CSV format.
It seems that your code is based on R < 3.0.0. You might find it difficult to reproduce the paper, as current R is typically > 4.0.0. You could try to install the Windows version of the NMMAPS database from the link given by 'Lil', but you will need to install an older version of R (2.9.2).
Or you could stay with the latest version of R and do a simple search on GitHub. If you haven't found the NMMAPS database, you can find out how to deal with it here.
You could try this link, http://www.biostat.jhsph.edu/IHAPSS/data/NMMAPS/R/, to download the package. It also holds the compressed city data, so you can pick New York manually if initDB() does not work.
Related
Error: No tidy method for objects of class dgCMatrix
I'm trying out a package for double machine learning (https://rdrr.io/github/yixinsun1216/crossfit/). When I run the main function dml() on the example data frame, I get the following error:
Error: No tidy method for objects of class dgCMatrix
Looking through the documentation (https://rdrr.io/github/yixinsun1216/crossfit/src/R/dml.R), I can't find anything wrong with how tidy() is used. Does anyone have any idea what could be going wrong here?
R version 4.2.1. I have already tried installing broom.mixed, although broomextra doesn't seem to be available for my R version, and the same problem occurs.
Code used below:
install.packages("remotes")
remotes::install_github("yixinsun1216/crossfit", force = TRUE)
library("remotes")
library("crossfit")
library("broom.mixed")
library("broom")
# Effect of temperature and precipitation on corn yield in the presence of
# time and locational effects
data(corn_yield)
library(magrittr)
dml_yield <- "logcornyield ~ lower + higher + prec_lo + prec_hi | year + fips" %>%
  as.formula() %>%
  dml(corn_yield, "linear", n = 5, ml = "lasso", poly_degree = 3, score = "finite")
FW function in R is not working for my dataset
Hi, I am trying to use the Finlay Wilkinson regression function FW in R for yield stability analysis. I tried running this, but am getting the error below:
Error in FW(y = yield, VAR = genotype, ENV = location, method = "OLS") :
object 'genotype' not found
My data file has the headers 'yield', 'genotype' and 'location' in it. I'm not sure what I am doing wrong here, since I just followed the syntax from the original publication. Please help! My code is below:
crop <- read.csv("barleygt6.csv", header = TRUE)
summary(crop)
install.packages("devtools")
library(devtools)
install.packages("remotes")
remotes::install_github("lian0090/FW")
library(FW)
lm1=FW(y= yield, VAR= genotype, ENV= location, method="OLS")
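The error message suggests that FW() evaluates its arguments as ordinary vectors, so bare column names such as genotype are looked up as standalone objects and not found. The sketch below illustrates that mechanism with base R only. The data values are made up, and count_cells() is a hypothetical stand-in for a function that takes vectors; it is not part of the FW package.

```r
# Made-up data standing in for barleygt6.csv
crop <- data.frame(
  yield    = c(4.1, 3.8, 5.0, 4.6),
  genotype = c("A", "B", "A", "B"),
  location = c("L1", "L1", "L2", "L2")
)

# Hypothetical stand-in for a function that expects vectors, not a data
# frame plus column names
count_cells <- function(y, VAR, ENV) table(VAR, ENV)

# Bare column names are not objects in the workspace, so this fails
bare <- tryCatch(
  count_cells(y = yield, VAR = genotype, ENV = location),
  error = function(e) conditionMessage(e)
)

# Passing the columns explicitly works ...
ok1 <- count_cells(y = crop$yield, VAR = crop$genotype, ENV = crop$location)

# ... as does evaluating the call inside the data frame
ok2 <- with(crop, count_cells(y = yield, VAR = genotype, ENV = location))
```

Under that assumption, the same two spellings (crop$yield etc., or wrapping the call in with(crop, ...)) would be the first thing to try with FW().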
Attempting to save intermediate states when running rmh yields error
I am trying to simulate a multitype point process, saving the intermediate states every 1000 steps in rmhcontrol. However, I can't simulate whenever I specify nsave. As an example, whenever I run the code block below, I get the error:
Error in factor(Cmprop, levels = Ctypes) : object 'Cmprop' not found
The code is:
library(spatstat)
library(optimbase)
num_marks <- length(unique(marks(amacrine)))
iradii <- .1*ones(nrow=num_marks,ncol=num_marks)
MSH1 <- MultiStraussHard(iradii=iradii)
x <- ppm(amacrine, trend =~polynom(x,y,3), interaction=MSH1)
control <- rmhcontrol(nsave=1e3)
rmh(x,control=control)
Thanks for the help!
This is a bug in spatstat versions 1.62-1 and 1.62-2. It has already been fixed in the current development version 1.62-2.006, which you can download from the GitHub repository for spatstat. The next public release on CRAN will be at the end of January 2020.
Please note: the code in the original question generates an error because ones has formal arguments nx, ny rather than nrow, ncol. The following code tests the bug:
library(spatstat)
nm <- length(levels(marks(amacrine)))
ir <- matrix(0.1, nm, nm)
MSH1 <- MultiStraussHard(iradii=ir)
fit <- ppm(amacrine ~ polynom(x,y,3), MSH1)
rmh(fit, nsave=1e3, verbose=FALSE)
Factor Analysis R - different results?
I've run a factor analysis on a dataset in R, using the psych package. Up until about 1 month ago it produced the same output every time, but recently the output is different. When I try running it on older versions of the psych package, it also produces a different output. I'm at a loss for how to diagnose this issue and get the original output back. I don't see how it could be a coding issue, since the results were generated in the past; I'm just struggling to get the same output now. Below is the condensed version of the code.
To download the psych package:
# check if package "psych" is installed. if not, remind the user to install
if(("psych" %in% rownames(installed.packages())) == FALSE){
  stop("Please install package 'psych' by running 'install.packages('psych')'")
}
library(psych) # this package is needed for factor analysis
To run the FA:
n_factor=3
# the variables defined below are used to record the iterative process
LIST_min_in_max_loading_vector <- NULL
LIST_drop_variable <- NULL
min_in_max_loading_vector=0
flag=1
while(TRUE){
  cat("The ",flag," Step is done. \n")
  fa_result<- fa(dat,nfactors=n_factor,rotate = "varimax", cor='poly')
  max_loading_in_each_row <- sapply(1:dim(fa_result$loadings)[1],function(j) max(abs(fa_result$loadings[j,])))
  variable_names=row.names(fa_result$loadings)
  min_in_max_loading_vector <- min(max_loading_in_each_row)
  # Please note that here we have a cut-off value 0.5.
  # This means that, for every variable, the largest absolute loading must be bigger than 0.5
  # it's also the stop condition of our iterative algorithm
  if(min_in_max_loading_vector>0.5){
    break
  }
  min_variable <- variable_names[which(max_loading_in_each_row==min_in_max_loading_vector)]
  cat("The minimum of the maximum absolute loadings is:",min_in_max_loading_vector,"\n")
  drop_index <- which(row.names(fa_result$loadings)==min_variable)
  cat(min_variable," is dropped in this round.\n\n")
  dat <- dat[,-drop_index]
  #record the process of dropping
  LIST_min_in_max_loading_vector[flag]=min_in_max_loading_vector
  LIST_drop_variable[flag] <- min_variable
  print(fa_result$loadings)
  flag=flag+1
}
Can anyone potentially troubleshoot this?
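The selection step inside the loop can be checked in isolation. This minimal sketch uses a made-up loadings matrix (not real psych output) and base R only. It picks the variable whose largest absolute loading is the smallest, which is the one the loop drops while that value is at or below the 0.5 cut-off.

```r
# Made-up loadings: 4 variables x 2 factors
loadings <- matrix(
  c( 0.80,  0.10,
     0.30, -0.45,
     0.05,  0.70,
    -0.60,  0.20),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("v1", "v2", "v3", "v4"), c("F1", "F2"))
)

# Largest absolute loading for each variable (row)
max_abs <- apply(abs(loadings), 1, max)

# Variable whose best loading is the weakest; which.min() returns a single
# index even when there are ties
weakest <- names(which.min(max_abs))

# The loop continues while the weakest variable is at or below the cut-off
keep_going <- min(max_abs) <= 0.5

weakest     # "v2"
keep_going  # TRUE
```

Running the selection on saved loadings from the old and new runs would show at which step the two sequences of dropped variables start to diverge.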
R: Autokrige.cv function in automap package generates NaNs
I’m fairly new to R and I am trying to interpolate temperature measurements that were gathered from different stations across the Netherlands. I have data for about 35 stations that take measurements every 10 minutes, covering a timespan of about two weeks. Accordingly, I figured it would be best to write a loop that takes care of this.
To see how well the interpolation technique works, I want to do a cross validation for every timestamp. To do this I used the autoKrige.cv function from the automap package, and then the compare.cv function from the same package to get an overview of the most important statistics for all timestamps. Besides that, I made sure the cross validation is only done if at least 25 stations registered measurements.
The problem, however, is that my code as shown below works most of the time but gives the following warnings in 4 cases:
1. In sqrt(ret[[var.name]]) : NaNs produced
2. In sqrt(ret[[var.name]]) : NaNs produced
3. In sqrt(ret[[var.name]]) : NaNs produced
4. In sqrt(ret[[var.name]]) : NaNs produced
When I try to use the compare.cv command on the total list including all the cross validations, it gives me the following error:
"Error in quantile.default(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, : missing values and NaN's not allowed if 'na.rm' is FALSE"
I'm wondering what causes the autoKrige.cv function to generate NaNs in the cross validation and, more importantly, how I can remove them from results.cv so that I can use the compare.cv function.
rm(list=ls())
# load packages
require(sp)
require(gstat)
require(ggmap)
require(automap)
require(ggplot2)
#load data (download link provided below)
load("download path")
# https://www.dropbox.com/s/qmi3loub29e55io/meassurements_aug.RDS?dl=0
# make data spatial and assign spatial coordinate system
coordinates(meassurements) = ~x+y
proj4string(meassurements) <- CRS("+init=epsg:4326")
meassurements_df <- as.data.frame(meassurements)
# loop for cross validation
timestamp <- meassurements$import_log_id
results.cv=list()
for (i in unique(timestamp)) {
  x = meassurements_df[which(meassurements$import_log_id == i), ]
  if(sum(!is.na(x$temperature)) > 25){
    results.cv[[paste0(i)]] = autoKrige.cv (temperature ~ 1, meassurements[which(meassurements$import_log_id == i & !is.na(meassurements$temperature)), ])
  }
}
# calculate key statistics (RMSE MAE etc)
compare.cv(results.cv)
Thanks!
I came across the same problem and solved it with the help of remove.duplicates() from package sp, applied to the SpatialPointsDataFrame used for kriging. Prior to that I calculated the mean of the relevant variables in the data frame.
SPDF@data <- SPDF@data %>%
  group_by(varx,vary,varz) %>%
  mutate_at(vars(one_of(relevant_var)),mean,na.rm=TRUE) %>%
  ungroup()
SPDF <- SPDF %>% remove.duplicates()
At the time I was encountering the same problem, the Dropbox link above was not working anymore, so I could not check this specific example.
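The averaging step in the answer above can also be sketched without dplyr or sp. This base R example uses a made-up data frame in which two readings share the same coordinates, which is the situation the answer treats as the source of the NaNs. The column names x, y and temperature follow the question; the values are invented.

```r
# Made-up station readings; the first two rows share the same coordinates
obs <- data.frame(
  x = c(5.1, 5.1, 4.9, 6.0),
  y = c(52.0, 52.0, 51.5, 53.1),
  temperature = c(18.0, 20.0, 17.5, 16.0)
)

# Average the readings at each unique location
obs_unique <- aggregate(temperature ~ x + y, data = obs, FUN = mean)

# 0 means no coordinate pair is repeated any more
anyDuplicated(obs_unique[, c("x", "y")])

nrow(obs_unique)  # 3 locations remain
```

The resulting data frame can then be made spatial with coordinates() as in the question before it is passed to the kriging functions.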