R noob: running a simple 7-variable CSV through a bvarsv model

My simple code yields the following error:
Error in `dimnames<-.data.frame`(`*tmp*`, value = list(n)) :
invalid 'dimnames' given for data frame
Any help is appreciated.
library(bvarsv)
library(tidyverse)
library(janitor)
library(readxl)
set.seed(1)
test = read_excel("Desktop/test.csv")
bvar.sv.tvp(test, p = 2, tau = 40, nf = 10, pdrift = TRUE, nrep = 50000,
            nburn = 5000, thinfac = 10, itprint = 10000, save.parameters = TRUE,
            k_B = 4, k_A = 4, k_sig = 1, k_Q = 0.01, k_S = 0.1, k_W = 0.01,
            pQ = NULL, pW = NULL, pS = NULL)

Edit:
The documentation specifies:
Y - Matrix of data, where rows represent time and columns are
different variables. Y must have at least two columns.
So when you read in your dataset, time will at first be one of the columns, which means you have to transform the data frame so that the time column becomes the row names. (You may also want to use the lubridate package to parse your time column first.)
tst_df <- read.csv("Desktop/tst.csv", header = TRUE, sep = ",")
# Use the first (time) column as the row names, then drop that column
rownames(tst_df) <- tst_df[, 1]
tst_df[, 1] <- NULL
bvar.sv.tvp(tst_df, ...)  # "..." stands for the remaining arguments from the question
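A small sketch of these preparation steps, run on a made-up data frame standing in for the CSV (the column names and values are hypothetical). It additionally converts the result with as.matrix(), because the documentation describes Y as a matrix; whether bvar.sv.tvp() strictly needs that extra step is an assumption not tested here.

```r
# Made-up stand-in for the CSV: a time column followed by numeric series
tst_df <- data.frame(date = c("2000Q1", "2000Q2", "2000Q3", "2000Q4"),
                     gdp  = c(1.2, 0.8, 1.1, 0.9),
                     infl = c(2.0, 2.1, 1.9, 2.2),
                     rate = c(5.0, 5.25, 5.5, 5.5),
                     stringsAsFactors = FALSE)

# Move the time column into the row names and drop it from the data
rownames(tst_df) <- tst_df[, 1]
tst_df[, 1] <- NULL

# The documentation describes Y as a matrix, so convert the numeric columns
Y <- as.matrix(tst_df)
str(Y)
```

The resulting Y (rows = time, columns = variables, at least two columns) would then be passed as the first argument of bvar.sv.tvp().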
You can also use the usmacro dataset that ships with bvarsv as an example of what the input data of bvar.sv.tvp() should look like:
data(usmacro)
print(usmacro)
Original Post:
I don't know what your CSV looks like, so it is hard to tell what the actual issue is.
But you can try wrapping your data in "as.data.frame(test)" like this:
bvar.sv.tvp(as.data.frame(test), p = 2, tau = 40, nf = 10, pdrift = TRUE, nrep = 50000,
            nburn = 5000, thinfac = 10, itprint = 10000, save.parameters = TRUE,
            k_B = 4, k_A = 4, k_sig = 1, k_Q = 0.01, k_S = 0.1, k_W = 0.01,
            pQ = NULL, pW = NULL, pS = NULL)

Related

prp plot: coloring positive and negative values differently

I am fitting regression trees via the function rpart(). Given my data, I am going to have both positive and negative estimates in nodes. Is there a way to color them differently?
In particular, what I would like to have is a tree whose nodes are shaded in blue for negative values and in red for positive values, where darker colors signal stronger absolute values.
I attach a minimal reproducible example.
library(rpart)
library(rpart.plot)
# Simulating data.
set.seed(1986)
X = matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
# Fitting regression tree.
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# Plotting.
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)
# Repeating, with median(y) != 0.
X = matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# HERE I NEED HELP!
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)
As far as I understand, the box.palette option gave me the result I need in the first setting only because median(y) is close to zero.
In the second setting, however, I am unhappy: I get blue shades for values below median(y) and red shades for values above it. How can I impose zero as the threshold between the two colors?
To be more specific, I would like a command that automatically enforces this two-color scheme in any tree.
OK, I answered my own question. The solution is actually quite simple: if box.palette is a two-color diverging palette (as in my example), pal.thresh sets the threshold between the two colors. In my case:
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3,
    pal.thresh = 0) # HERE THE SOLUTION!
Even though this may not reflect well on me, I will leave the answer here for future users and close the question rather than delete it.
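To see why the threshold matters in the second setting, one can inspect the fitted node values that rpart stores in my.tree$frame$yval. The sketch below needs only rpart (no plotting) and re-simulates data as in the question's second setting; the "above"/"below" labels are a stand-in for the two palette colors, not rpart.plot's own output.

```r
library(rpart)

# Second setting from the question: responses centred around 10, far above zero
set.seed(1986)
X <- matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2)
epsilon <- rnorm(1000, 0, 0.01)
y <- X[, 1] + X[, 2] + epsilon
dta <- data.frame(X, y)
my.tree <- rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)

# Fitted value of every node (internal nodes and leaves)
node_vals <- my.tree$frame$yval

# Side of a two-color palette each node falls on, for two candidate thresholds
side_median <- ifelse(node_vals > median(dta$y), "above", "below")
side_zero   <- ifelse(node_vals > 0, "above", "below")
print(data.frame(node = rownames(my.tree$frame),
                 yval = round(node_vals, 2), side_median, side_zero))
```

Here every node value is positive, so with pal.thresh = 0 all boxes should land on the red side of "BuRd", whereas a threshold near median(y) splits them between blue and red.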

NbClust with one cluster

I have a dataset that I split by specific parameters, and I run the NbClust function on each subset to calculate the optimal number of clusters. Every once in a while there is only one cluster, and NbClust breaks with
Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'
Is there a generic workaround? Thanks!!
Data are attached:
df1 = structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                  0.00712136, 0, 0, -0.00636396, 0, 0.00636396), .Dim = c(5L, 2L))
library(NbClust)
nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
             min.nc = 2, max.nc = 10, method = "kmeans", alphaBeale = 0)
EDIT (8/15/2018): I have found a solution to the problem. It simply skips NbClust when nrow(unique(df1)) < 5, since max.nc must be at least min.nc + 2.
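n_unique = nrow(unique(df1))
# Fallback when there are too few distinct rows: one cluster per distinct row
n_clusters = n_unique
if (n_unique > 4) {
  # Never ask for more clusters than there are distinct rows
  nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
               min.nc = 2, max.nc = min(n_unique, 10), method = "kmeans", alphaBeale = 0)
  # 4th element (Best.partition) holds the cluster label of each row
  n_clusters = max(unlist(nb[4]))
  print(n_clusters)
}
clusters = kmeans(df1, n_clusters)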
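The message in the question can be reproduced with base R alone: sample.int() raises it whenever it is asked for more draws than there are items, without replacement. That NbClust hits it because k-means tries to draw more initial centers than there are rows is an inference from the sample.int(m, k) call in the error, not something verified in NbClust's source. A sketch using the question's df1:

```r
df1 <- structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                   0.00712136, 0, 0, -0.00636396, 0, 0.00636396),
                 .Dim = c(5L, 2L))

n_rows   <- nrow(df1)          # 5 observations
n_unique <- nrow(unique(df1))  # rows 1 and 2 are identical

# Drawing more items than exist, without replacement, gives the same message
msg <- tryCatch(sample.int(n_rows, 10),
                error = function(e) conditionMessage(e))
print(msg)

# Capping the largest candidate k at the number of distinct rows avoids this
max_nc <- min(n_unique, 10)
print(max_nc)
```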

MXNet: sequence length of an LSTM on non-sequence data (R)

My data are not a time series, but they have sequential properties.
Consider one sample:
data1 = matrix(rnorm(10, 0, 1), nrow = 1)
label1 = rnorm(1, 0, 1)
label1 is a function of data1, but the data matrix is not a time series. I assume that the label depends not only on the current sample but also on several older samples, which are naturally ordered in time (not sampled randomly); in other words, the samples depend on one another.
I have a batch of examples, say, 16.
I want to understand how to design an RNN/LSTM model that memorizes all 16 examples in the batch to build its internal state. I am especially confused by the seq_len parameter, which, as I understand it, specifies the length of the time series used as input to the network, which is not my case.
This piece of code (taken from a time-series example) only confuses me further, because I don't see how my task fits in.
library(mxnet)

symbol <- rnn.graph.unroll(seq_len = 5,
                           num_rnn_layer = 1,
                           num_hidden = 50,
                           input_size = NULL,
                           num_embed = NULL,
                           num_decode = 1,
                           masking = FALSE,
                           loss_output = "linear",
                           dropout = 0.2,
                           ignore_label = -1,
                           cell_type = "lstm",
                           output_last_state = FALSE,
                           config = "seq-to-one")
graph.viz(symbol, type = "graph", direction = "LR",
          graph.height.px = 600, graph.width.px = 800)
train.data <- mx.io.arrayiter(
  data = matrix(rnorm(100, 0, 1), ncol = 20),
  label = rnorm(20, 0, 1),
  batch.size = 20,
  shuffle = FALSE
)
Sure, you can treat the samples as time steps and apply an LSTM. Also check out this example, which might be relevant to your case: https://github.com/apache/incubator-mxnet/tree/master/example/multivariate_time_series
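Treating the ordered samples as time steps can be done by cutting them into overlapping windows of seq_len consecutive rows, each labelled with the label of its last row. The base-R sketch below builds such windows; make_windows is a hypothetical helper, and the (features, seq_len, window) layout with the window/batch dimension last is an assumption to check against the input shape your rnn.graph.unroll() symbol expects.

```r
# Hypothetical helper: stack every run of `seq_length` consecutive samples
# (rows, in time order) into one sequence, labelled by its last sample
make_windows <- function(samples, labels, seq_length) {
  n      <- nrow(samples)
  n_feat <- ncol(samples)
  n_win  <- n - seq_length + 1
  stopifnot(n_win >= 1, length(labels) == n)
  x <- array(NA_real_, dim = c(n_feat, seq_length, n_win))
  for (w in seq_len(n_win)) {
    rows <- w:(w + seq_length - 1)
    # Columns of each slice are time steps; rows are features
    x[, , w] <- t(samples[rows, , drop = FALSE])
  }
  list(data = x, label = labels[seq_length:n])
}

set.seed(1)
samples <- matrix(rnorm(16 * 10), nrow = 16)  # 16 ordered samples, 10 features
labels  <- rnorm(16)
win <- make_windows(samples, labels, seq_length = 5)
print(dim(win$data))  # 10 5 12
```

Each slice win$data[, , i] is one sequence of 5 consecutive samples, so seq_len then means "how many past samples the LSTM sees", not the length of an actual time series.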

Interpreting Independent Components using FastICA in R

I've recently conducted an Independent Component Analysis using fastICA in R. I have obtained a matrix of independent components. How do I figure out what variables these components are made of? For example, component #4 (v4) has a value for each of the n = 400 observations. What is this? Which variables were used to create this v4?
This is the code that was used:
ica_new <- fastICA(final, n.comp = 40, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                   method = "C", row.norm = FALSE, maxit = 200, tol = 0.0001, verbose = TRUE)
I found this code, which uses factor analysis (factanal) rather than PCA:
## varimax with normalize = TRUE is the default
fa <- factanal(~., 2, data = swiss)
varimax(loadings(fa), normalize = FALSE)
promax(loadings(fa))
EDIT: Thanks to @Hack-R, I think the code I need would look something like this:
ica_new <- fastICA(final, n.comp = 40, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                   method = "C", row.norm = FALSE, maxit = 200, tol = 0.0001, verbose = TRUE,
                   firstEig = 1, lastEig = nrow(final))
Is this accurate? EDIT: It doesn't run.
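Each component is a linear combination of the centered input variables. As we read the fastICA documentation, the returned list contains the pre-whitening matrix K and the un-mixing matrix W, with the sources S equal to the pre-processed data times K %*% W, so ica_new$K %*% ica_new$W should hold one column of variable weights per component (and ica_new$A the mixing matrix). The R fastICA() function also has no firstEig or lastEig arguments (those are options of the MATLAB FastICA package), which would explain why the edited call does not run. The sketch below does not call fastICA: it uses prcomp() as a stand-in linear decomposition on made-up data and recovers the weights from the scores by least squares, a package-independent way to check which variables drive a component.

```r
# Made-up stand-in data: 400 observations of 6 variables
set.seed(42)
final <- matrix(rnorm(400 * 6), ncol = 6,
                dimnames = list(NULL, paste0("var", 1:6)))
Xc <- scale(final, center = TRUE, scale = FALSE)

# Stand-in linear decomposition (PCA here, not ICA)
pc <- prcomp(final, center = TRUE)
S  <- pc$x  # component scores, one row per observation

# Scores satisfy S = Xc %*% B, so solve for B by least squares
B <- qr.solve(Xc, S)
print(round(B[, 4], 3))  # weight of each variable in component 4
```

For a real fastICA fit, the analogous check would be to compare qr.solve() on the centered data and ica_new$S with ica_new$K %*% ica_new$W.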

Object 'w' not found error in factor analysis with package 'psych'

There are a lot of questions about factor analysis on this site. I have browsed through them, but none seems similar, so hopefully someone can help.
I am running a factor analysis on some survey questions where I expect some latent constructs to emerge. I am running either principal axes or minres and get the same problem, as detailed below.
My dataset contains many discrete variables and a fair number of missing values coded as NA, but even after removing all NAs the problem persists:
minres.out <- factor.minres(r = res, nfactors = 5, residuals=F, rotate = "varimax", n.obs=NA, scores=F, SMC=T, missing=F, min.err=0.001, ,max.iter=50, symmetric=T,warnings=T,fm="minres")
minres.out
minres.out2 <- fa(r = res, nfactors = 5, residuals=F, rotate = "oblimin", n.obs=NA, scores=F, SMC=T, missing=F, impute="median",min.err=0.001, ,max.iter=50, symmetric=T,warnings=T,fm="minres", alpha=0.1, p=0.05,oblique.scores=F, use="pairwise")
minres.out2
The first one uses the deprecated version and gives me a warning, but it works. The second one gives me the following error:
Error in factor.scores(x.matrix, f = Structure, method = scores) :
object 'w' not found
I have no object w in my data, but I do not really understand what this object is meant to be in the first place.
Running traceback() gives me:
3: factor.scores(x.matrix, f = Structure, method = scores)
2: fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
scores = scores, residuals = residuals, SMC = SMC, covar = covar,
missing = FALSE, impute = impute, min.err = min.err, max.iter = max.iter,
symmetric = symmetric, warnings = warnings, fm = fm, alpha = alpha,
oblique.scores = oblique.scores, np.obs = np.obs, use = use,
...)
1: fa(r = res, nfactors = 5, residuals = F, rotate = "oblimin",
n.obs = NA, scores = F, SMC = T, missing = F, impute = "median",
min.err = 0.001, , max.iter = 50, symmetric = T, warnings = T,
fm = "minres", alpha = 0.1, p = 0.05, oblique.scores = F,
use = "pairwise")
Not very enlightening to me.
Any suggestions regarding this w?
I went through the code line by line. The value you pass as scores (here F) is forwarded to factor.scores() as its method argument (see the traceback). factor.scores() goes through a switch statement on that value, and none of the branches matches, so w is never assigned, which causes the failure. The direct fix is to drop scores = F or pass a valid method such as scores = "regression". Alternatively, you could paste the following quick wrapper into your R session, which forwards all of your arguments to psych::fa() but replaces a non-character scores value with "regression", and then run your code again:
fa <- function(r, ..., scores = "regression") {
  # Replace an invalid (non-character) scores value such as FALSE
  if (!is.character(scores)) scores <- "regression"
  psych::fa(r, ..., scores = scores)
}
I had the same error. Mine occurred because I passed "Regression" to scores instead of "regression". So make sure that the value you pass to scores is one of the accepted options.
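The mechanism both answers describe (a switch() that assigns w only when the method name matches) can be reproduced in a few lines. This is a simplified, hypothetical stand-in for the pattern, not psych's actual factor.scores() source:

```r
# Hypothetical stand-in: w is only created when `method` matches a branch
score_weights <- function(method) {
  switch(method,
         regression = { w <- "regression weights" },
         Bartlett   = { w <- "Bartlett weights" })
  w  # if no branch matched, w was never created
}

ok  <- score_weights("regression")
msg <- tryCatch(score_weights("Regression"),
                error = function(e) conditionMessage(e))
print(ok)   # "regression weights"
print(msg)  # "object 'w' not found"
```

Because switch() matches character values exactly and case-sensitively, "Regression" falls through just like any other unsupported value.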
