Variable selection and adding noise data - r

It's my first post and English is not my first language, so please bear with me.
I have searched the forum for my problem, but I am still looking for a suitable answer.
So here is my problem: I'm trying to use the spikeslab package as a variable selection tool for the first time. I have a data set of 1000 examples and 8 variables, but I think I need more variables to evaluate the effectiveness of the package, and I don't know how to add more random variables to my data set.
Is there any command in R that does this? Can you please help me, friends?
I appreciate your input.
Thanks.
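The code I've used:
diabet <- read.csv(data, header = TRUE, sep = ",")  # `data` is assumed to hold the path to the CSV file
diabet
library(spikeslab)
obj <- spikeslab(BS ~ ., data = diabet)  # BS is the response; all other columns are candidate predictors
print(obj)
plot(obj)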
The resulting plot: https://imgur.com/a/NerKn
As you can see, all of my variables are included as top variables.
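One way to do this, sketched below under stated assumptions, is to append columns of pure random noise with rnorm() and cbind(). Because these columns are generated independently of the response, a good selection method should leave them out. The data frame here is a hypothetical stand-in for `diabet` (1000 rows, 8 predictors, response BS), and `add_noise_vars()` is a helper name made up for illustration.

```r
# Hypothetical stand-in for the poster's `diabet` data: 1000 rows,
# 8 predictors x1..x8 and a response BS that depends only on x1 and x2
set.seed(42)
n <- 1000
diabet <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
names(diabet) <- paste0("x", 1:8)
diabet$BS <- 2 * diabet$x1 - diabet$x2 + rnorm(n)

# Hypothetical helper: append k columns of standard-normal noise.
# They are generated independently of BS, so they carry no signal.
add_noise_vars <- function(df, k, prefix = "noise") {
  noise <- matrix(rnorm(nrow(df) * k), nrow = nrow(df), ncol = k)
  colnames(noise) <- paste0(prefix, seq_len(k))
  cbind(df, as.data.frame(noise))
}

diabet_noisy <- add_noise_vars(diabet, k = 20)
dim(diabet_noisy)  # 1000 29

# Then refit as before (requires the spikeslab package):
# library(spikeslab)
# obj <- spikeslab(BS ~ ., data = diabet_noisy)
# print(obj)
```

Rerunning spikeslab on `diabet_noisy` and checking whether any `noise` columns show up among the top variables gives a rough check of the selection; repeating it with different seeds and values of k makes the check less dependent on a single draw.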


Something I don't understand about this plot

I am looking at this code: earlier, v-transformations were applied and VT-ARMA copula models were fitted; now it applies the Shapiro test to the residuals and is supposed to plot 4 graphs:
https://i.stack.imgur.com/gTtBU.png
These 4 plots should come from plot(vtcop, plotoption = 3), etc. I have never used the argument plotoption. I think it belongs to the tscopula package, but I have already searched the help and read the PDF manual for tscopula, and there is no such "plotoption".
Can anyone tell me why R reports an "unused argument" error at this point?
This code is from the paper by Alexander McNeil, "Modelling Volatile Time Series with V-Transforms and Copulas".
Thank you very much. Good day.
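In R, "unused argument" means the function that is actually called has no formal argument with that name and no `...` to absorb it. The sketch below reproduces the error with a made-up function `plot_resid()` (hypothetical, standing in for whichever plot method is dispatched for `vtcop`) and shows how to check a function's formals. For the real object, comparing class(vtcop) and packageVersion("tscopula") with what the paper used is a reasonable first check; a version mismatch is only an assumption here, not a confirmed cause.

```r
# Hypothetical function standing in for the plot method that is actually
# dispatched for `vtcop`; it has no `plotoption` argument and no `...`
plot_resid <- function(x, type = 1) {
  invisible(x)
}

# Passing an undeclared argument reproduces the error from the question
msg <- tryCatch(plot_resid(1:10, plotoption = 3),
                error = function(e) conditionMessage(e))
print(msg)  # "unused argument (plotoption = 3)"

# TRUE if f declares `arg` or has `...` (so the call itself will not fail)
accepts_arg <- function(f, arg) {
  fa <- names(formals(f))
  arg %in% fa || "..." %in% fa
}
accepts_arg(plot_resid, "plotoption")                     # FALSE
accepts_arg(function(x, plotoption = 1) x, "plotoption")  # TRUE
```

For a generic such as plot(), formals(plot) only shows the generic's own arguments (which include `...`), so it is the dispatched method that needs inspecting, for example with getMethod() for S4 classes or getS3method() for S3 classes.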

R: [indicspecies package] multipatt function: extract values from summary.multipatt

I am working with the multipatt function from the 'indicspecies' package and am unable to extract the summary values. Unfortunately, I can't print the whole summary, so I am left with incomplete information for my model. The reason is the huge amount of output the summary has to print (300,000 different species, 3 groups, 6 combinations to compare).
This is what happens when the summary is saved (including the preceding code):
x <- multipatt(data, ...)
sumx <- summary(x)   # prints the summary to the console
sumx
# NULL
str(sumx)
# NULL
So summary() does not behave like a generic summary that returns an object. It seems that the function is based on the older indval function from the 'labdsv' package (which is mentioned in the documentation). I found an archived thread where a similar problem is discussed: http://r.789695.n4.nabble.com/extract-values-from-summary-of-function-indval-of-the-package-labdsv-td4637466.html
but it does not seem to have been resolved (and it is not about exactly the same function, but rather the base function indval).
I was wondering whether anyone has experience with the indicspecies package and knows a way to extract the information from the summary.
It is possible to extract significance and other information from the other data saved in the model object, but it would be nice to get a quick, complete overview of the results.
P.S. I tried
options(max.print=1000000)
but this didn't solve it for me.
I used to capture the summary output for a multipatt object, but I don't any more, because the reported p-values are not corrected for multiple testing. To answer the OP's question, you can capture the summary output using capture.output,
e.g.:
dat.multipatt.summary <- capture.output(summary(dat.multipatt, indvalcomp = TRUE))
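A self-contained sketch of why this works: a summary method that only prints and returns NULL leaves nothing in `sumx`, but capture.output() collects the printed lines as a character vector, which can then be written to a file instead of the console, so max.print no longer matters. `print_report()` and the small table are hypothetical stand-ins for summary.multipatt() and real multipatt results.

```r
# Hypothetical stand-in for summary.multipatt(): it prints a report and
# returns NULL, which is why `sumx <- summary(x)` ends up NULL
print_report <- function(tab) {
  cat("Multilevel pattern analysis\n")
  cat(sprintf("%-10s stat=%.3f p=%.4f\n", tab$species, tab$stat, tab$p.value),
      sep = "")
  invisible(NULL)
}

tab <- data.frame(species = c("sp1", "sp2", "sp3"),
                  stat = c(0.91, 0.45, 0.77),
                  p.value = c(0.005, 0.310, 0.020))

# capture.output() turns everything printed into a character vector
out <- capture.output(res <- print_report(tab))
is.null(res)   # TRUE: the printed text is not in the return value
length(out)    # 4: one header line plus one line per species

# Write the full report to disk instead of the console
report_file <- tempfile(fileext = ".txt")
writeLines(out, report_file)
```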
Again, I do not recommend this. It is very important to correct the p-values for multiple testing, so the summary output isn't actually helpful. To be clear, ?multipatt states:
"sign Data table with results of the best matching pattern, the association value and the degree of statistical significance of the association (i.e. p-values from permutation test). Note that p-values are not corrected for multiple testing."
I have just posted an answer on how to correct the p-values here: https://stats.stackexchange.com/questions/370724/indiscpecies-multipatt-and-overcoming-multi-comparrisons/401277#401277
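A minimal sketch of such a correction with base R's p.adjust(). It assumes, as ?multipatt describes, that `x$sign` holds one row per species with a `p.value` column; the table below is a small made-up mock, not real multipatt output. Benjamini-Hochberg ("BH") is only one choice; "holm" or "bonferroni" are stricter. p.adjust() returns NA for missing p-values and, by default, counts only the non-missing ones.

```r
# Hypothetical mock of the `sign` table of a multipatt result
sign <- data.frame(
  s.A = c(1, 0, 1, 0), s.B = c(0, 1, 1, 0), s.C = c(0, 0, 0, 1),
  index = c(1, 2, 4, 3),
  stat = c(0.91, 0.45, 0.77, 0.60),
  p.value = c(0.001, 0.310, 0.020, 0.045),
  row.names = paste0("sp", 1:4)
)

# Adjust for multiple testing across all species (Benjamini-Hochberg / FDR)
sign$p.adj <- p.adjust(sign$p.value, method = "BH")

# Keep only associations that remain significant after adjustment
sig <- sign[!is.na(sign$p.adj) & sign$p.adj <= 0.05, ]
sig[order(sig$p.adj), c("index", "stat", "p.value", "p.adj")]
```

In this mock, sp4 (raw p = 0.045) drops out after adjustment (adjusted p = 0.06), which is exactly the kind of result the uncorrected summary would overstate.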
I don't have any experience with this package, and since you haven't provided the data, it's difficult to reproduce. But since summary is returning NULL, are you sure your x is computed properly? Check object.size(x), class(x) or something similar to see whether it actually has any content.
Also, instead of printing the whole summary, you can access the components of x directly (with $ for list components, or @ for slots of an S4 object).
If you need further assistance, it would be better to provide at least a small subset or some other sample data so that the community can work with it.
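A sketch of those checks. The object below is a hypothetical mock (an S3 list of class "multipatt" with made-up `comb` and `sign` components), assumed to resemble the real result rather than taken from it. For an S3 list, components are read with $, while @ is for slots of S4 objects; isS4() tells the two apart.

```r
# Hypothetical mock of a multipatt-style result: an S3 list with class "multipatt"
x <- structure(
  list(
    comb = matrix(c(1, 0, 0, 1), nrow = 2),
    sign = data.frame(index = c(1, 2), stat = c(0.8, 0.5),
                      p.value = c(0.01, 0.2), row.names = c("sp1", "sp2"))
  ),
  class = "multipatt"
)

class(x)                             # what kind of object you actually have
isS4(x)                              # FALSE, so use $ rather than @
print(object.size(x), units = "Kb")  # a non-trivial size suggests real content
names(x)                             # the list components
str(x, max.level = 1)                # one-line overview of each component
head(x$sign)                         # the per-species results table
```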

VAR model with variable combination and variation

I searched for an answer to this question but could not find anything.
I want to build a model that predicts barley prices, and for that I came up with 11 variables that may have an impact on the prices. What I tried was building a loop that adds one extra variable from my pool of variables at a time and tries different combinations of them; the output would be a new VAR model for every extra variable or combination, so in a sense it is a combinatorics exercise. After that, I want to run in-sample and out-of-sample tests on each of the models to decide which one is the most appropriate. Unfortunately, I am not very familiar with loops, and I have been told not to use them in R... As I am a beginner in R, my attempts won't help you at all, but if you really need them I am happy to provide them.
Many thanks in advance!
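A minimal sketch, assuming a VAR(1) and one-step-ahead forecasts of price only. combn() enumerates the candidate subsets and vapply() replaces the explicit loop (a plain for loop would work just as well). Each model is scored through the price equation of the VAR, an OLS regression of price_t on the lag-1 values of price and the chosen variables, on a hold-out sample. The data, the variable names x1 to x3 and the helper `var1_price_rmse()` are hypothetical. With 11 candidates there are 2^11 - 1 = 2047 non-empty subsets, so the same code would fit 2047 models. For the full VAR (all equations, lag selection, multi-step forecasts), VAR() and predict() from the vars package are the usual tools.

```r
set.seed(1)
n <- 120
# Hypothetical monthly data: barley price plus three candidate drivers
dat <- data.frame(price = cumsum(rnorm(n)),
                  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
candidates <- c("x1", "x2", "x3")

# All non-empty subsets of the candidates, without an explicit loop
subsets <- unlist(lapply(seq_along(candidates),
                         function(k) combn(candidates, k, simplify = FALSE)),
                  recursive = FALSE)

# Hypothetical helper: out-of-sample RMSE of one-step-ahead price forecasts
# from the price equation of a VAR(1) with the given predictors
var1_price_rmse <- function(predictors, data, n_train) {
  cols <- c("price", predictors)
  y <- data$price[-1]                              # price at time t
  lagged <- data[-nrow(data), cols, drop = FALSE]  # values at time t - 1
  names(lagged) <- paste0(cols, "_l1")
  df <- cbind(y = y, lagged)
  train <- df[seq_len(n_train), ]
  test <- df[-seq_len(n_train), ]
  fit <- lm(y ~ ., data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}

results <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = "+"),
  rmse = vapply(subsets, var1_price_rmse, numeric(1),
                data = dat, n_train = 90)
)
results[order(results$rmse), ]
```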

Need help: I'm stuck installing VGAM and using mlogit. I don't understand the example in the PDF

I need help with two problems.
1. I can't install the VGAM library in RStudio. Does anyone know another package for ordinal logistic regression, or have a solution to my problem?
2. I'm stuck at the first step of using mlogit. My dependent variable is Kategori.Kredit, which has 3 options, and my independent variables are FD, FC, ND, NC and CASA. Please help me solve this problem; I tried reading the example in the PDF but still didn't understand it.
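For problem 1, one alternative for ordinal logistic regression is polr() from the MASS package, which ships with standard R installations as a recommended package. The sketch simulates hypothetical data with the poster's variable names and assumes that the three categories of Kategori.Kredit have a natural order; if they do not, an ordinal model is the wrong tool and a multinomial model (such as mlogit) fits the question better.

```r
library(MASS)  # provides polr(); MASS is a recommended package bundled with R

set.seed(7)
n <- 300
# Hypothetical data using the poster's variable names
d <- data.frame(FD = rnorm(n), FC = rnorm(n), ND = rnorm(n),
                NC = rnorm(n), CASA = rnorm(n))
latent <- 1.0 * d$FD - 0.8 * d$CASA + rlogis(n)
# The response must be an ordered factor; the labels here are made up
d$Kategori.Kredit <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
                         labels = c("low", "medium", "high"),
                         ordered_result = TRUE)

# Proportional-odds (ordinal logistic) regression
fit <- polr(Kategori.Kredit ~ FD + FC + ND + NC + CASA,
            data = d, method = "logistic", Hess = TRUE)
summary(fit)

# Predicted probability of each category for the first few rows
head(predict(fit, type = "probs"))
```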
Please don't post two completely different questions at the same time.
Don't post code as an image; post it as text.
About number 2: it seems that your data is in the wide format, while mlogit needs it in the long format. Use the function mlogit.data to get your data ready. The manual has some good examples of mlogit.data.
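To show what "long format" means here, a base R sketch: each customer is repeated once per alternative, with a logical column marking the chosen one. In practice mlogit.data(data, choice = "Kategori.Kredit", shape = "wide") is meant to build this structure for you (check the mlogit manual for the exact arguments in your version); the data, the alternative labels "A", "B", "C" and the column names `alt` and `choice` below are hypothetical.

```r
# Hypothetical wide data: one row per customer, the chosen category in one column
wide <- data.frame(id = 1:4,
                   Kategori.Kredit = c("A", "C", "B", "A"),
                   FD = c(1.2, 0.4, 2.1, 0.9),
                   CASA = c(10, 25, 7, 13))
alts <- c("A", "B", "C")

# Long format: one row per (customer, alternative) pair.
# merge() with by = NULL returns the Cartesian product of the two tables.
long <- merge(wide, data.frame(alt = alts), by = NULL)
long$choice <- long$Kategori.Kredit == long$alt  # TRUE for the chosen alternative
long <- long[order(long$id, long$alt), ]
long
```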

matching among multiple variables in R

I am a beginner in R, so I am not sure how to title my question; sorry for that. I will try to explain.
My professor gave me a NetCDF atmospheric data file (18.3 MB). This file has 8 dimensions and 8 variables. I have to work with 4 of the variables; each of them (time, site number, urban site, pm10) has 683,016 values. Suppose:
Urban site numbers: [2, 5]
Site numbers: [1, 2, 3, 4, 5, 6]
Time: [1-3-2012, 2-3-2012, ...] (hourly data, 24 values per day)
pm10: [1, 2, 3, 4, 5, 6, ...] (a different value for every hour, with some missing values)
I have to reduce this data set to only the urban sites on 1-3-2012 (in effect, I have to turn this spatio-temporal data into spatial data). I want my final data set to look like this:
time       urban site number   pm10
1-3-2012   2                   1
1-3-2012   2                   2
1-3-2012   2                   3
1-3-2012   5                   NA
1-3-2012   5                   4
1-3-2012   5                   5
As I only know very basic R commands, I can't work out how to solve this problem. I don't even know how to search for an example of this type of problem on the internet.
So please give me some suggestions or links about what I need to learn to solve this problem in R. Please help me out.
I think you're trying to reshape the dataset, but I'm afraid I can't see what your current dataset looks like.
Could you elaborate on what your dataset looks like right now?
There are packages that help with reshaping, such as {reshape} or {plyr}, but I need more detail to suggest which one you should use.
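Assuming the four variables have already been read into equal-length vectors (for example with nc_open() and ncvar_get() from the ncdf4 package), the step described in the question is a plain subset with %in% and a date comparison. The sketch below uses a hypothetical stand-in for that data (6 sites, 2 days of hourly values) and interprets "1-3-2012" as 1 March 2012; the file name and variable names in the comments are assumptions, so check names(nc$var) on the real file.

```r
# Reading would look roughly like this (file and variable names are hypothetical):
# library(ncdf4)
# nc   <- nc_open("file.nc")
# pm10 <- ncvar_get(nc, "pm10")

# Hypothetical stand-in: 6 sites x 48 hourly time stamps, as long vectors
hours <- seq(as.POSIXct("2012-03-01 00:00", tz = "UTC"),
             by = "hour", length.out = 48)
grid <- expand.grid(time = hours, site = 1:6)
set.seed(3)
grid$pm10 <- round(runif(nrow(grid), 10, 80))
grid$pm10[sample(nrow(grid), 10)] <- NA  # some missing values
urban_sites <- c(2, 5)

# Keep only the urban sites on 1 March 2012
day <- as.Date("2012-03-01")
sub <- grid[grid$site %in% urban_sites &
              as.Date(grid$time, tz = "UTC") == day, ]
sub <- sub[order(sub$site, sub$time), c("time", "site", "pm10")]
head(sub)
```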
