Question about getting counts in the R survey package - r

I'm using the 2018 CBECS data set from the Energy Information Administration (available here: https://www.eia.gov/consumption/commercial/data/2018/xls/2018_public_use_data.csv) and I've set up the sample design according to their user guide. I'm noticing a discrepancy when I use the svyby function as opposed to just the svytotal function and I'm hoping somebody can explain what it is I'm seeing and/or what I'm doing wrong.
Here is the set up for the sample design:
library(survey)
library(spatstat)
library(tidyverse)
cbecs2018 <- read_csv(paste0(getwd(), "/2018_public_use_data.csv"))
samp_wts <- cbecs2018$FINALWT
rep_wts <- cbecs2018[, grepl("^FINALWT", names(cbecs2018))]
rep_wts$FINALWT <- NULL
samp_design <- svrepdesign(weights=samp_wts, repweights=rep_wts,
type="JK2", mse=TRUE, data=cbecs2018)
sqftc <- factor(cbecs2018$SQFTC) # this is a categorical variable classifying buildings by size
When I run svytotal to get a count of buildings by each category in sqftc, I get the output below, which is consistent with what EIA has:
svytotal(~sqftc, samp_design)
total SE
sqftc2 2836939.2 138709.13
sqftc3 1358439.0 78632.96
sqftc4 966092.8 55503.86
sqftc5 396595.4 23727.58
sqftc6 218416.8 11718.72
sqftc7 93085.9 5179.07
sqftc8 39865.5 1993.62
sqftc9 6664.8 620.07
sqftc10 2111.8 255.25
However, when I try to break it out by census region, I get completely different counts by category. For example, instead of showing 2,836,939 buildings in the second sqftc group, the table below makes it look like there are 3,605,529 buildings in the group.
x <- svyby(~sqftc, ~region, samp_design, svytotal)
> sum(x$sqftc2)
[1] 3605529
print(x)
region sqftc2 sqftc3 sqftc4 sqftc5 sqftc6 sqftc7 sqftc8 sqftc9 sqftc10 se1 se2 se3 se4 se5 se6 se7 se8
1 1 679858.4 382470.2 466330.8 383649.9 638936.3 777312.6 918361.9 220786.7 97105.4 70972.33 58987.22 57377.8 41027.49 79224.73 100678.28 104811.7 26387.60
2 2 1142179.1 634697.1 752421.8 762969.8 929830.8 1107860.2 1382698.4 369059.3 149810.3 131036.12 88954.07 102800.3 120901.81 88769.62 118328.83 146119.8 56056.48
3 3 859228.7 456788.7 521518.6 540952.1 779310.4 912930.2 1062321.1 285638.1 100881.7 86845.98 50065.79 56198.4 53630.90 66850.76 68490.26 87545.5 34443.43
4 4 924262.5 499895.4 541658.9 555604.6 820252.5 927657.6 1205995.5 298595.7 96787.1 96106.38 51019.41 58771.1 58782.50 60113.72 85934.54 134417.5 41790.27
se9
1 14502.07
2 39303.04
3 21410.55
4 13725.39
I feel like whatever I'm doing wrong is probably pretty straightforward, but any pointers would be greatly appreciated.

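Maybe review your minimal reproducible example? :-) When I run the code below, the numbers match. The key difference is that SQFTC is converted to a factor inside the design object with update(), instead of living as a separate vector in the global environment, so svyby() subsets it together with the weights when it splits the design by region: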
library(survey)
cbecs2018 <- read.csv("https://www.eia.gov/consumption/commercial/data/2018/xls/2018_public_use_data.csv")
samp_design <-
svrepdesign(
weights = ~ FINALWT ,
repweights = "^FINALWT[0-9]" ,
type = 'JK2' ,
mse = TRUE ,
data = cbecs2018
)
samp_design <- update( samp_design , SQFTC = factor( SQFTC ) )
svytotal(~SQFTC, samp_design)
svyby(~SQFTC,~REGION,samp_design,svytotal)
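A self-contained check of the same idea, using the `api` example data that ships with the survey package instead of the CBECS file. The data set, the recoded variable `stype_f` and the domain variable `sch.wide` are illustrative assumptions, not part of the question. Because the factor is created inside the replicate design with update(), the per-domain totals from svyby() should add back up to the overall svytotal():

```r
library(survey)
data(api)  # example data sets shipped with survey (apistrat, apiclus1, ...)

# Stratified sample, converted to a replicate-weight design so it behaves
# like the svrepdesign() object in the question
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
rstrat <- as.svrepdesign(dstrat)

# Create the categorical variable *inside* the design object, so that
# svyby() subsets it together with the weights
rstrat <- update(rstrat, stype_f = factor(as.integer(stype)))

overall   <- svytotal(~stype_f, rstrat)
by_domain <- svyby(~stype_f, ~sch.wide, rstrat, svytotal)

# Summing each level's domain totals should reproduce the overall totals
domain_sums <- sapply(names(coef(overall)), function(v) sum(by_domain[[v]]))
print(rbind(overall = coef(overall), summed_domains = domain_sums))
```

The same pattern applied to the question would be `samp_design <- update(samp_design, sqftc = factor(SQFTC))` before calling svytotal() or svyby().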


Recursive / Expanding Window forecasts

I am having a small issue with my RStudio code. I will try to replicate my code, but unfortunately there is no easy data for me to show. This is about the forecast package. What I am looking for is somewhat simpler than what is in the manual, but unfortunately I am not able to work around it.
So the issue is with an expanding-window forecast. I have a dependent variable Y and 3 regressors (X). I am trying to build a recursive one-step-ahead forecast for each X.
Here is my code.
library(forecast)
library(zoo)
library(timeDate)
library(xts)
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by=tt)
##########################################################################
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
print(fcast)
# print(fcast)
# newfcast[k+window+1,]=fcast
}
print(newfcast)
return(newfcast)
}
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
{
try({
X=data[,i]
Y=data[,1]
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
StoreMatrix=cbind(StoreMatrix,RecModel)
print(StoreMatrix)
}, silent=T)
}
The commented-out lines (starting with #) were different ways I tried to cross-check my data, and they may not be useful. I have tried many things, but I can't get my head around it. In the end I want a matrix (StoreMatrix) whose first column is the realization and whose other columns hold the corresponding one-step-ahead forecasts.
The main lines where there seems to be an issue are these ones:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Not sure how to solve this. Thank you very much.
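A minimal base-R sketch of the expanding-window idea, assuming a plain `lm(y ~ x)` model on simulated data; `rec_forecast()`, `X`, `y` and `store` are hypothetical names, not the asker's. It differs from the loop in the question in three ways: the regressor value used for the forecast comes from the full series at t + 1 (in the question, `sample[v, 2]` points at a row that is not in `sample`), `predict()` gets a data frame whose column name matches the formula, and the loop stops one step before the end so there is always a next observation to forecast.

```r
# Expanding-window, one-step-ahead forecasts from y ~ x (base R only).
rec_forecast <- function(y, x, window) {
  n <- length(y)
  fc <- rep(NA_real_, n)                         # NA where no forecast is made
  for (t in window:(n - 1)) {
    train <- data.frame(y = y[1:t], x = x[1:t])  # sample grows by one each step
    fit <- lm(y ~ x, data = train)
    # Forecast observation t + 1 with the regressor value observed at t + 1
    fc[t + 1] <- predict(fit, newdata = data.frame(x = x[t + 1]))
  }
  fc
}

# Simulated stand-in for the question's data: y plus three regressors
set.seed(123)
n <- 120
X <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 1 + 2 * X[, "x1"] + rnorm(n, sd = 0.1)

# First column: realizations; remaining columns: one forecast series per regressor
store <- cbind(realized = y,
               sapply(colnames(X), function(v) rec_forecast(y, X[, v], window = 60)))
```

With `window = 60`, the first forecast is for observation 61, matching the initial 60-observation window in the question; rows before that stay `NA`.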

How to visualize the convolution layers and feature maps in mxnet after the CNN has finished training?

I want to plot or visualize the output of each layer of a CNN trained with mxnet in R, like those abstract-art images of what each layer of a neural network can see.
But I don't know how. Please, somebody help me. One way I can think of is to feed the weights and biases back into every step and plot the output of each step. But when I try to pass model$arg.params$convolution0_weight back into mx.symbol.Convolution(), I get
Error in mx.varg.symbol.Convolution(list(...)) :
./base.h:291: Unsupported parameter type object type for argument weight, expect integer, logical, or string.
Can anyone help me?
I came up with one approach, but ran into a difficulty at one step. Here is what I did.
I found all of the trained CNN's parameters in model$arg.params, and to compute with these parameters we can use the mx.nd... functions as below:
# convolution 1 result
conv1_result <- mxnet::mx.nd.Convolution(data = mx.nd.array(train_array), weight = model$arg.params$convolution0_weight, bias = model$arg.params$convolution0_bias, kernel = c(8,8), num_filter = 50)
str(conv1_result)
tanh1_result <- mx.nd.Activation(data = conv1_result, act_type = "sigmoid")
pool1_result <- mx.nd.Pooling(data = tanh1_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))
# convolution 2 result
conv2_result <- mxnet::mx.nd.Convolution(data = pool1_result, weight = model$arg.params$convolution1_weight, bias = model$arg.params$convolution1_bias, kernel = c(5,5), num_filter = 50)
tanh2_result <- mx.nd.Activation(data = conv2_result, act_type = "sigmoid")
pool2_result <- mx.nd.Pooling(data = tanh2_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))
# 1st fully connected layer result
flat_result <- mx.nd.flatten(data = pool2_result)
fcl_1_result <- mx.nd.FullyConnected(data = flat_result, weight = model$arg.params$fullyconnected0_weight, bias = model$arg.params$fullyconnected0_bias, num_hidden = 500)
tanh_3_result <- mx.nd.Activation(data = fcl_1_result, act_type = "tanh")
# 2nd fully connected layer result
fcl_2_result <- mx.nd.FullyConnected(data = tanh_3_result, weight = model$arg.params$fullyconnected1_weight, bias = model$arg.params$fullyconnected1_bias, num_hidden = 100)
But when I got to the mx.nd.FullyConnected() step, I ran out of memory (I have 16 GB of RAM) and R crashed.
So, does anyone know how to feed the input data to mx.nd.FullyConnected() in batches (a batch_size), or any other way to make mx.nd.FullyConnected() run successfully the way mx.model.FeedForward.create() did?
Here is code that can help you achieve what you want. It displays the activations of the 2 convolution layers of LeNet. It takes the MNIST dataset (28x28 grayscale images, downloaded automatically) as input and renders the activations as images.
You can grab outputs from the executor. To see the list of available outputs, use names(executor$ref.outputs).
Each output is available as a matrix with values in the [-1, 1] range, and its dimensions depend on the parameters of the layer. The code displays these matrices as grayscale images where -1 is a white pixel and 1 is a black pixel. (Most of the code is taken from https://github.com/apache/incubator-mxnet/issues/1152 and massaged a little bit.)
The code is self-contained, but I have noticed that if I build the model a second time in the same R session, the output names get different indices, and the later code fails because the expected output names are hard-coded. So if you decide to create the model more than once, you will need to restart the R session.
Hope it helps and you can adjust this example to your case.
library(mxnet)
download.file('https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/data/mnist_csv.zip', destfile = 'mnist_csv.zip')
unzip('mnist_csv.zip', exdir = '.')
train <- read.csv('train.csv', header=TRUE)
data.x <- train[,-1]
data.x <- data.x/255
data.y <- train[,1]
val_ind = 1:100
train.x <- data.x[-val_ind,]
train.x <- t(data.matrix(train.x))
train.y <- data.y[-val_ind]
val.x <- data.x[val_ind,]
val.x <- t(data.matrix(val.x))
val.y <- data.y[val_ind]
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
val.array <- val.x
dim(val.array) <- c(28, 28, 1, ncol(val.x))
# input layer
data <- mx.symbol.Variable('data')
# first convolutional layer
convLayer1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=30)
convAct1 <- mx.symbol.Activation(data=convLayer1, act_type="tanh")
poolLayer1 <- mx.symbol.Pooling(data=convAct1, pool_type="max", kernel=c(2,2), stride=c(2,2))
# second convolutional layer
convLayer2 <- mx.symbol.Convolution(data=poolLayer1, kernel=c(5,5), num_filter=60)
convAct2 <- mx.symbol.Activation(data=convLayer2, act_type="tanh")
poolLayer2 <- mx.symbol.Pooling(data=convAct2, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# big hidden layer
flattenData <- mx.symbol.Flatten(data=poolLayer2)
hiddenLayer <- mx.symbol.FullyConnected(flattenData, num_hidden=500)
hiddenAct <- mx.symbol.Activation(hiddenLayer, act_type="tanh")
# softmax output layer
outLayer <- mx.symbol.FullyConnected(hiddenAct, num_hidden=10)
LeNet1 <- mx.symbol.SoftmaxOutput(outLayer)
# Group some output layers for visual analysis
out <- mx.symbol.Group(c(convAct1, poolLayer1, convAct2, poolLayer2, LeNet1))
# Create an executor
executor <- mx.simple.bind(symbol=out, data=dim(val.array), ctx=mx.cpu())
# Prepare for training the model
mx.set.seed(0)
# Set a logger to keep track of callback data
logger <- mx.metric.logger$new()
# Using cpu by default, but set gpu if your machine has a supported one
devices=mx.cpu(0)
# Train model
model <- mx.model.FeedForward.create(LeNet1, X=train.array, y=train.y,
eval.data=list(data=val.array, label=val.y),
ctx=devices,
num.round=1,
array.batch.size=100,
learning.rate=0.05,
momentum=0.9,
wd=0.00001,
eval.metric=mx.metric.accuracy,
epoch.end.callback=mx.callback.log.train.metric(100, logger))
# Update parameters
mx.exec.update.arg.arrays(executor, model$arg.params, match.name=TRUE)
mx.exec.update.aux.arrays(executor, model$aux.params, match.name=TRUE)
# Select data to use
mx.exec.update.arg.arrays(executor, list(data=mx.nd.array(val.array)), match.name=TRUE)
# Do a forward pass with the current parameters and data
mx.exec.forward(executor, is.train=FALSE)
# List of outputs available.
names(executor$ref.outputs)
# Plot the filters of a sample from validation set
sample_index <- 99 # sample number in the validation set; change it if you want to see other samples
activation0_filter_count <- 30 # number of filters of the "convLayer1" layer
par(mfrow=c(6,5), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
dim(executor$ref.outputs$activation0_output)
for (i in 1:activation0_filter_count) {
outputData <- as.array(executor$ref.outputs$activation0_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
activation1_filter_count <- 60 # number of filters of the "convLayer2" layer
dim(executor$ref.outputs$activation1_output)
par(mfrow=c(6,10), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
for (i in 1:activation1_filter_count) {
outputData <- as.array(executor$ref.outputs$activation1_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
As a result you should see one grid of activation images per convolution layer for the selected validation sample (use the left and right arrows in the RStudio Plots pane to navigate between them).
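The two plotting loops above can be wrapped in a small helper. This is a minimal base-R sketch: `plot_feature_maps()` is a hypothetical name, not part of mxnet, and it assumes the activation has already been converted to a plain 4-D R array (for example with as.array(), as in the loops) laid out as [x, y, filter, sample].

```r
# Plot every feature map of one sample from a 4-D activation array laid out as
# [x, y, filter, sample], the same indexing as [,,i,sample_index] above.
# plot_feature_maps() is a hypothetical helper, not part of mxnet.
plot_feature_maps <- function(act, sample_index, nrow, ncol) {
  stopifnot(length(dim(act)) == 4)
  n_filters <- dim(act)[3]
  stopifnot(nrow * ncol >= n_filters, sample_index <= dim(act)[4])
  old_par <- par(mfrow = c(nrow, ncol), mar = c(0.1, 0.1, 0.1, 0.1))
  on.exit(par(old_par))  # restore the caller's plot layout afterwards
  for (i in seq_len(n_filters)) {
    # Low values are drawn white and high values black, as in the answer
    image(act[, , i, sample_index], xaxt = "n", yaxt = "n",
          col = gray(seq(1, 0, -0.1)))
  }
  invisible(n_filters)
}
```

With the executor from the answer, `plot_feature_maps(as.array(executor$ref.outputs$activation0_output), sample_index, nrow = 6, ncol = 5)` should draw the same grid as the first loop.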

In R, multiple age pyramids using pyramidlattice {Giza}

I am trying to compare more than one age pyramid in a single frame. I looked for an example using mainstream packages like ggplot, but I did not find one.
Do you have any options?
The only package I found that can do this comparison is 'Giza'. It is hard to find other examples for this package; the only one I could find is the following:
library(Giza)
data(EduDat)
data(dictionary)
# select the desired year, country, and education-scenario from EduDat
Years <- c(2010,2030,2050)
Countries <- c("Pakistan","Bangladesh","Indonesia")
Scenarios <- c("GET")
# the male-column needs to be flipped
iEduDat <- subset(EduDat,match(cc,getcode(Countries,dictionary)) & match(yr,Years) & match(scen2,Scenarios))
iEduDat$value[iEduDat$sex == "Male"] <- (-1) * iEduDat$value[iEduDat$sex == "Male"]
agegrs <- paste(seq(15,100,5),seq(19,104,5),sep="-")
agegrs[length(agegrs)] <- "100+"
lattice.options(axis.padding = list(numeric=0))
x <- pyramidlattice(agegr ~ value| factor(sex,levels=c("Male","Female")) *
factor(cc,levels=getcode(Countries,dictionary),labels=Countries) *
factor(yr,levels=Years,labels=Years),
groups=variable,data=iEduDat,layout=c(length(Countries)*2,length(Years)),
type="l",lwd=1,xlab="Population",ylab="Age",main="Population by Highest Level of Education",
strip=TRUE,par.settings = simpleTheme(lwd=3,col=colors()[c(35,76,613,28)]),box.width=1,
scales=list(alternating=3,tick.number=5,relation="same",y=list(at=1:length(4:21),labels=agegrs)),
auto.key=list(text=c("No-edu","Primary","Secondary","Tertiary"),reverse.row=TRUE,
points=FALSE,rectangles=TRUE,space="right",columns=1,border=FALSE,
title="ED-Level",cex.title=1.1,lines.title=2.5,padding.text=1,background="white"),
prepanel=prepanel.default.bwplot2,panel=function(...){
panel.grid(h=length(agegrs),v=5,col="lightgrey",lty=3)
panel.pyramid(...)
})
x # with strips for every factor over each panel
# useOuterStrips(x) # with outer strips, but only in case of two factors
useOuterStrips2(x) # with outer strips in case of three factors
Now I would like to make some modifications. For example, I would like to change the colors of the panels for different years. The most important modification I want is the x-axis limits. I am trying to do something like this (the scales parameter):
scales= list(x = list( relation = "free" , limits = list( c(-85000,85000) , c(-260000,260000) , c(-260000,260000)) ) , y = list(relation="same", at=1:length(agegrs)) )
But this results in an error:
Error in abs(x$x.limits) : non-numeric argument to mathematical function
In addition: Warning message:
In valid.charjust(just) : reached elapsed time limit
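The error message suggests that Giza's prepanel code calls abs() on the x limits and therefore does not accept a list of them. For comparison, here is a minimal sketch in plain lattice (not Giza, and not tested against pyramidlattice) that draws side-by-side pyramids with different x-axis limits per panel: when `scales$x$relation` is "free", `xlim` may be a list with one pair of limits per panel. The data are simulated for illustration only.

```r
library(lattice)

# Simulated example data: population by 10-year age group, sex and year
ages <- paste(seq(0, 80, 10), seq(9, 89, 10), sep = "-")
pop <- expand.grid(age = factor(ages, levels = ages),
                   sex = c("Male", "Female"),
                   year = c(2010, 2050))
set.seed(42)
pop$value <- round(runif(nrow(pop), 1e4, 9e4))
# Flip the male counts so they extend to the left of zero, as in the question
pop$value[pop$sex == "Male"] <- -pop$value[pop$sex == "Male"]

p <- barchart(age ~ value | factor(year), groups = sex, data = pop,
              stack = TRUE, horizontal = TRUE,
              auto.key = list(columns = 2),
              # One pair of x limits per panel; a list is only honoured when
              # the x relation is "free" or "sliced"
              xlim = list(c(-1e5, 1e5), c(-2e5, 2e5)),
              scales = list(x = list(relation = "free")),
              xlab = "Population", ylab = "Age")
print(p)
```

Panel colors per year could then be handled with a custom `panel` function, but that is outside this sketch.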

SMA using R & TTR Package

Afternoon! I'm just starting out with R and learning about data frames, packages, etc. I've read a lot of the posts here but couldn't find an answer.
I have a table I'm accessing with R that has the following fields:
[Symbol],[Date],[Open],[High],[Low],[Close],[Volume]
And, I'm calculating SMAs on the close prices:
sqlQuery <- "Select * from [dbo].[Stock_Data]"
conn <- odbcDriverConnect(connectionString)
dfSMA <- sqlQuery(conn, sqlQuery)
sma20 <- SMA(dfSMA$Close, n = 20)
dfSMA["SMA20"] <- sma20
When I look at the output, it appears to be calculating the SMA without any regard for the symbol. I haven't tried to replicate the calculation, but I suspect it simply uses a rolling window of 20 rows, regardless of date or symbol.
How do I restrict the calculation to a given symbol?
Any help is appreciated - just need to be pointed in the right direction.
Thanks
You're far more likely to get answers if you provide reproducible examples. First, let's replicate your data:
library(quantmod)
symbols <- c("GS", "MS")
getSymbols(symbols)
# Create example data:
dGS <- data.frame("Symbol" = "GS", "Date" = index(GS), coredata(OHLCV(GS)))
names(dGS) <- sub("GS\\.", "", names(dGS))  # drop the "GS." column prefix
dMS <- data.frame("Symbol" = "MS", "Date" = index(MS), coredata(OHLCV(MS)))
names(dMS) <- sub("MS\\.", "", names(dMS))  # drop the "MS." column prefix
dfSMA <- rbind(dGS, dMS)
> head(dfSMA)
Symbol Date Open High Low Close Volume Adjusted
1 GS 2007-01-03 200.60 203.32 197.82 200.72 6494900 178.6391
2 GS 2007-01-04 200.22 200.67 198.07 198.85 6460200 176.9748
3 GS 2007-01-05 198.43 200.00 197.90 199.05 5892900 177.1528
4 GS 2007-01-08 199.05 203.95 198.10 203.73 7851000 181.3180
5 GS 2007-01-09 203.54 204.90 202.00 204.08 7147100 181.6295
6 GS 2007-01-10 203.40 208.44 201.50 208.11 8025700 185.2161
What you want to do is subset your long data object, and then apply the technical indicators to each symbol in isolation. Here is one approach to guide you toward achieving your desired result.
You could do this with a list, building the indicators on an xts object for each symbol rather than on a data.frame as in your example (you can apply the TTR functions to columns of a data.frame, but it is ugly; working with xts objects is much better). Here is a template for how you could do it. The final output, l.data, should be intuitive to work with: each symbol is kept in a separate "container" (element of the list) rather than all symbols being combined in one data.frame, which isn't easy to work with.
make_xts_from_long_df <- function(x) {
# Subset the symbol you desire
res <- dfSMA[dfSMA$Symbol == x, ]
#Create xts, then allow easy merge of technical indicators
x_res <- xts(OHLCV(res), order.by = res$Date)
merge(x_res, SMA(Cl(x_res), n = 20))
}
l.data <- setNames(lapply(symbols, make_xts_from_long_df), symbols)
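An alternative that keeps the long data frame from the question is to compute the moving average separately within each symbol with base R's ave(). This is a minimal sketch: `sma_by_symbol()` is a hypothetical helper, it uses stats::filter() as a stand-in for TTR::SMA() so it runs without TTR, and it assumes the `Symbol`, `Date` and `Close` columns shown above.

```r
# Per-symbol simple moving average on a long data frame, base R only.
# sma_by_symbol() is a hypothetical helper; with TTR installed,
# TTR::SMA(v, n = n) could replace the stats::filter() call.
sma_by_symbol <- function(df, n = 20) {
  # The rolling window must run over each symbol's rows in date order
  df <- df[order(df$Symbol, df$Date), ]
  roll_mean <- function(v) {
    # Too few rows for a single full window: return all NA
    if (length(v) < n) return(rep(NA_real_, length(v)))
    # Trailing mean of the current and previous n - 1 values (NA for the first n - 1)
    as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))
  }
  # ave() applies roll_mean() within each Symbol and returns results in row order
  df[[paste0("SMA", n)]] <- ave(df$Close, df$Symbol, FUN = roll_mean)
  df
}
```

Applied to the question's data this would be `dfSMA <- sma_by_symbol(dfSMA, n = 20)`; the first 19 rows of every symbol are `NA`, so no window ever mixes two symbols.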

R Programming Random Stock Pick

I am stuck on a problem in R.
My aim is to randomly select 2 stocks out of the Swiss Market Index, which consists of 20 stocks.
So far I have solved the random pick of the 2 stocks with the following code:
SMI_components <- cbind("ABB (ABBN.VX)", "ADECCO (ADEN.VX)", "ACTELION (ATLN.VX)", "JULIUS BAER GRP (BAER.VX)", "RICHEMONT (CFR.VX)", "CREDIT SUISSE (CSGN.VX)", "GEBERIT (GEBN.VX)", "GIVAUDAN (GIVN.VX)", "HOLCIM (HOLN.VX)", "NESTLE (NESN.VX)", "NOVARTIS (NOVN.VX)", "TRANSOCEAN (RIGN.VX)", "ROCHE HOLDING (ROG.VX)", "SWISSCOM (SCMN.VX)", "SGS (SGSN.VX)", "SWISS RE (SREN.VX)", "SYNGENTA (SYNN.VX)", "UBS (UBSG.VX)", "SWATCH GROUP (UHR.VX)", "ZURICH INSURANCE GROUP (ZURN.VX)")
for(i in 1:1){
print(sample(SMI_components, 2))
}
How do I continue my code if I want to download the historical data for these two randomly picked stocks?
For example, if the random selection is:
"NOVARTIS (NOVN.VX)" and "ZURICH INSURANCE GROUP (ZURN.VX)"
how do I continue from there to get the equivalent of:
SMI_NOVARTIS <- yahooSeries ("NOVN.VX", from = "2005-01-01", to = "2015-07-30", frequency = "daily")
SMI_ZURICH <- yahooSeries ("ZURN.VX", from = "2005-01-01", to = "2015-07-30", frequency = "daily")
I would really appreciate your help
Regards
print() outputs to the console but doesn't store anything, so the first thing to do is assign the output of sample() to a variable.
my_picks <- sample(SMI_components, 2)
Extract the ticker symbol between the parentheses (courtesy of a commenter):
my_picks <- sub(".*\\((.*)\\).*", "\\1", my_picks)
Then you can use lapply() to call a function (yahooSeries) for each value in my_picks.
series_list <- lapply(my_picks, yahooSeries, from = "2005-01-01", to = "2015-07-30", frequency = "daily")
Then you'll get the output in a list: series_list[[1]] will hold the output of yahooSeries for the first value of my_picks, and series_list[[2]] for the second.
Lastly, I'm not sure why you bothered with the single-iteration for loop, but you don't need it.
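A minimal, offline sketch of the steps above, using a shortened, illustrative subset of the component list. Naming the tickers keeps the company name attached, sample() keeps those names, and lapply() carries them over to the resulting list. The download call is left as a comment because it needs network access and the package that provides yahooSeries().

```r
# Shortened, illustrative subset of the SMI component strings from the question
components <- c("ABB (ABBN.VX)", "NESTLE (NESN.VX)", "NOVARTIS (NOVN.VX)",
                "ZURICH INSURANCE GROUP (ZURN.VX)")

# Keep only the text between the parentheses, i.e. the Yahoo ticker
tickers <- sub(".*\\((.*)\\).*", "\\1", components)
names(tickers) <- components      # remember which company each ticker belongs to

set.seed(1)                       # only so the example is reproducible
my_picks <- sample(tickers, 2)    # sample() keeps the names of the picked elements
print(my_picks)

# lapply() carries the names over, so the downloads can be looked up by company:
# series_list <- lapply(my_picks, yahooSeries, from = "2005-01-01",
#                       to = "2015-07-30", frequency = "daily")
```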
