(R) Getting "variable lengths differ" for two different variables

I created two variables called "low.income" and "mid.income" from a survey; they are based on the participants' income. Here is what the variables look like:
low.income = 75 95 85 100 85 100 85 90 75 90 65 80 85 90 85 70 95 85 100 95 85 95 90 95 95
mid.income = 95 100 90 90 85 95 100 95 80
But when I try to call aov(low.income ~ mid.income), it gives me:
Error in model.frame.default(formula = low.income ~ mid.income, drop.unused.levels = TRUE) :
variable lengths differ (found for 'mid.income')
So, what should I do?

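That call is not correct: a formula expects a response and a grouping variable of the same length, whereas here each vector holds one group. I think you are looking for t.test, i.e.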
t.test(low.income, mid.income, var.equal = TRUE)
To use the formula method, you have to create a data frame with the level and the income. It should look like this:
data <- data.frame(level = rep(paste0(c("low", "mid"), ".income"), c(25, 9)),
                   income = c(low.income, mid.income))
level income
1 low.income 75
2 low.income 95
3 low.income 85
4 low.income 100
5 low.income 85
6 low.income 100
: : :
29 mid.income 90
30 mid.income 85
31 mid.income 95
32 mid.income 100
33 mid.income 95
34 mid.income 80
Now you could do:
t.test(income~level,data,var.equal = TRUE)
Since you are using aov, here is how to do the same with it:
aov(income~level,data)
These two give the same p-value (with two groups, the ANOVA F statistic is the square of the pooled t statistic). You can run TukeyHSD on the aov fit to confirm that the results agree.
NOTE: ANOVA is normally used when you have more than 2 groups. If you only have 2 groups, run a t.test. Recall that ANOVA is a generalization of the t-test.
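The equivalence claimed above can be checked directly. The following is a minimal sketch using the two vectors from the question; the names income_df, tt, fit and f_table are introduced here only for illustration.

```r
# Data from the question: 25 low-income and 9 mid-income observations
low.income <- c(75, 95, 85, 100, 85, 100, 85, 90, 75, 90, 65, 80, 85,
                90, 85, 70, 95, 85, 100, 95, 85, 95, 90, 95, 95)
mid.income <- c(95, 100, 90, 90, 85, 95, 100, 95, 80)

# Long format: one row per observation, with a grouping column
income_df <- data.frame(
  level  = rep(c("low.income", "mid.income"),
               times = c(length(low.income), length(mid.income))),
  income = c(low.income, mid.income)
)

# Pooled-variance t-test and one-way ANOVA on the same data
tt      <- t.test(income ~ level, data = income_df, var.equal = TRUE)
fit     <- aov(income ~ level, data = income_df)
f_table <- summary(fit)[[1]]

p_t    <- tt$p.value
p_f    <- f_table[["Pr(>F)"]][1]
f_stat <- f_table[["F value"]][1]

all.equal(p_t, p_f)                        # p-values agree
all.equal(unname(tt$statistic)^2, f_stat)  # F equals t squared
```

The agreement only holds with var.equal = TRUE; the default Welch t-test does not pool the variances and will generally give a different p-value.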

Related

Create an optimization model such that selecting start date and end date produces some outcome

Start   21-Oct  28-Oct  4-Nov  11-Nov
22-Apr  90      95      95     95
29-Apr  95      100     100    100
6-May   95      100     100    100
13-May  90      100     100    95
20-May  90      95      95     90
27-May  80      85      85     90
3-Jun   75      80      80     85
The data above shows start dates (rows) and end dates (columns); the values are the outcomes, in percent, for each start/end pair. I want to create an optimization model in R such that selecting a start date and an end date produces an outcome.

The key to this question is how to subset the data frame based on row and column names. Here I designed a function to achieve this. To use this function, you can set start to the start date, end to the end date, and dt to your data frame.
outcome <- function(start, end, dt){
  # Keep the rows whose name matches start and the columns whose name matches end
  out <- dt[rownames(dt) %in% start, colnames(dt) %in% end]
  return(out)
}
# Example:
outcome(start = "29-Apr", end = "28-Oct", dt = dat)
# [1] 100
Or as mentioned in the comment, you can do the following directly.
outcome <- function(start, end, dt){
  # Index directly by row and column name
  out <- dt[start, end]
  return(out)
}
DATA
dat <- read.table(text = "'22-Apr' 90 95 95 95
'29-Apr' 95 100 100 100
'6-May' 95 100 100 100
'13-May' 90 100 100 95
'20-May' 90 95 95 90
'27-May' 80 85 85 90
'3-Jun' 75 80 80 85", header = FALSE, row.names = 1)
names(dat) <- c("21-Oct", "28-Oct", "4-Nov", "11-Nov")
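The question also mentions optimization. One possible reading, which is an assumption and not part of the answer above, is to find the start/end pairs with the highest outcome. A minimal sketch on the same data, where m, best and best_pairs are illustrative names:

```r
# Same data as in the DATA block above
dat <- read.table(text = "'22-Apr' 90 95 95 95
'29-Apr' 95 100 100 100
'6-May' 95 100 100 100
'13-May' 90 100 100 95
'20-May' 90 95 95 90
'27-May' 80 85 85 90
'3-Jun' 75 80 80 85", header = FALSE, row.names = 1)
names(dat) <- c("21-Oct", "28-Oct", "4-Nov", "11-Nov")

# Work on a matrix so that row/column positions are easy to recover
m <- as.matrix(dat)

# Row and column positions of every cell that attains the maximum
best <- which(m == max(m), arr.ind = TRUE)

# Translate positions back into start and end dates
best_pairs <- data.frame(
  start   = rownames(m)[best[, "row"]],
  end     = colnames(m)[best[, "col"]],
  outcome = m[best]
)
best_pairs
```

Several pairs tie for the maximum here, so an additional criterion (for example, the shortest interval) would be needed to pick a single one.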

Variable lengths differ - fligner.test

I have a dataset like the following:
attack defense sp_attack sp_defense speed is_legendary
60 62 63 80 60 0
80 100 123 122 120 0
39 52 43 60 65 0
58 64 58 80 80 0
90 90 85 125 90 1
100 90 125 85 90 1
106 150 70 194 120 1
100 100 100 100 100 1
90 85 75 115 100 1
From this dataset, I want to check whether there is heteroscedasticity between two groups: legendary vs. non-legendary Pokémon. To do that, I first checked the normality of the data for the legendary and non-legendary Pokémon as follows:
# Shapiro test for legendary and non-legendary Pokémon, hp comparison.
shapiro.test(df_net$hp[df_net$is_legendary==0])
shapiro.test(df_net$hp[df_net$is_legendary==1])
I've seen that in both cases the data is not normally distributed. So I decided to carry out a Fligner test as follows:
fligner.test(hp[df_net$is_legendary==0] ~ hp[df_net$is_legendary==1], data = df_net)
However, I obtain the following error:
Error in model.frame.default(formula = hp[df_net$is_legendary == 0] ~ : variable lengths differ (found for 'hp[df_net$is_legendary == 1]')
I guess this is because the number of legendary Pokémon differs from the number of non-legendary ones, but then how can I check for heteroscedasticity between these two groups?

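The correct syntax for fligner.test is a formula with the full response column on the left and the grouping variable on the right, so both have the same length:
fligner.test(x ~ group, data)
In your case the correct syntax would be (e.g. for the variable sp_defense)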
fligner.test(sp_defense ~ is_legendary, data=df_net)
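As a minimal sketch, the call can be tried on the nine rows shown in the question (only the sp_defense and is_legendary columns are reproduced here). The list form at the end is an equivalent way to pass groups of unequal length; res and res2 are illustrative names.

```r
# Subset of the question's sample data: one response column, one grouping column
df_net <- data.frame(
  sp_defense   = c(80, 122, 60, 80, 125, 85, 194, 100, 115),
  is_legendary = c(0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Formula interface: response ~ grouping variable, both of length nrow(df_net)
res <- fligner.test(sp_defense ~ is_legendary, data = df_net)
res

# Equivalent list interface: one vector per group, lengths may differ
res2 <- fligner.test(split(df_net$sp_defense, df_net$is_legendary))
all.equal(unname(res$statistic), unname(res2$statistic))
```

With so few rows the test has little power; the sample is only meant to show the call.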

Converting spectral density values produced by spectrum() in R to values produced by SAS PROC SPECTRA

I am converting SAS programs that demonstrate temporal data analysis into R. I would like to reproduce SAS PROC SPECTRA output using R.
So, my question is, can the spectral density values produced by the spectrum() function in R be converted into the values for spectral density produced by SAS PROC SPECTRA?
DATA AR1_09;
INPUT t U;
OUTPUT;
CARDS;
1 -5.19859
2 4.91364
3 -3.86515
4 4.02932
5 -4.12263
6 3.46548
7 -3.01139
8 3.13753
9 -2.34875
10 2.1531
11 -2.01086
12 1.88911
13 -2.22766
14 1.94077
15 0.1786
16 0.84228
17 -1.51301
18 2.62644
19 -3.44148
20 3.13813
21 -2.34959
22 2.70754
23 -2.54789
24 2.04427
25 -2.34041
26 1.13443
27 -0.11853
28 0.74645
29 0.02448
30 0.57811
31 -1.54715
32 1.05646
33 -0.56458
34 0.6863
35 -0.53347
36 0.60813
37 -1.22044
38 0.13136
39 -0.45568
40 0.13459
41 -0.10892
42 0.46324
43 1.01367
44 -2.44015
45 1.62849
46 1.54928
47 -2.7146
48 2.20448
49 -1.58668
50 1.06419
51 -1.41402
52 1.30755
53 -1.55331
54 1.58191
55 -2.38216
56 1.45702
57 0.79562
58 -0.91078
59 -0.59827
60 1.44958
61 -1.81996
62 -0.05101
63 -0.13188
64 1.34861
65 -1.81912
66 0.73641
67 -0.32049
68 -0.37179
69 2.26288
70 -2.2773
71 0.95193
72 -1.24679
73 0.67123
74 -0.40868
75 1.46308
76 -0.71945
77 1.07481
78 -2.25127
79 1.87573
80 -1.52811
81 1.27772
82 -2.96657
83 3.58684
84 -1.7656
85 2.92004
86 -2.36525
87 2.17087
88 -1.65458
89 0.86588
90 0.19505
91 -2.34264
92 3.51124
93 -3.33501
94 3.13522
95 -1.8957
96 0.93527
97 -0.96551
98 0.08307
99 -0.14018
100 0.48641
;
PROC SPECTRA DATA=AR1_09 OUT=AR1_09PSPEC1 P S WHITETEST;
VAR U;
WEIGHTS 1 2 1;
RUN;
PROC SPECTRA DATA=AR1_09 OUT=AR1_09PSPEC2 S WHITETEST;
VAR U;
WEIGHTS 1 2 3 4 5 4 3 2 1;
RUN;
DATA AR1_09PSPEC12;
SET AR1_09PSPEC1;
n=100;
fre=0.5*n*FREQ/(4*ATAN(1));
P_01=P_01/(16*ATAN(1));
KEEP fre P_01 S_01;
RUN;
DATA AR1_09PSPEC22;
SET AR1_09PSPEC2;
n=100;
fre=0.5*n*FREQ/(4*ATAN(1));
S_02=S_01;
KEEP fre S_02;
DATA AR1_09TRUESPEC;
SET AR1_09PSPEC1;
n=100;
rho=-0.9;
theoreticalS=1.0/(8*ATAN(1)*(1-2*rho*cos(FREQ)+rho*rho));
fre=0.5*n*FREQ/(4*ATAN(1));
KEEP fre theoreticalS;
DATA AR1_09PSPEC;
MERGE AR1_09PSPEC12 AR1_09PSPEC22 AR1_09TRUESPEC;
PROC PRINT DATA=AR1_09PSPEC;
VAR fre P_01 S_01 S_02 theoreticalS;
RUN;
So far, after reading the data into R with read.xlsx(), this is what I've got:
AR1.09 <- as.ts(AR1.09[, 2])
install.packages("forecast")
library(forecast)
Using ma for the moving average smoother of order 2.
MA2AR1.09 <- ma(AR1.09, order = 2)
AR1.09PSPEC1 <- spectrum(na.omit(MA2AR1.09))
Tests for White Noise for Variable U
Box.test (MA2AR1.09)
Box.test (MA2AR1.09, type = "Ljung")
Periodogram from Fourier analysis. The periodogram values produced by SAS appear to be approximately 4 times P (below). This isn't surprising; I have been told that some software produces periodogram values divided by 4π.
n <- length(AR1.09)
FF <- abs(fft(AR1.09) / sqrt(n))^2
P <- (4 / n) * FF[1:((n / 2) + 1)]
f <- (0:(n/2)) / n
plot(f, P, type = "h")
So, the $spec values (the spectral density values produced by R's spectrum() function) are not the same as those produced by SAS PROC SPECTRA. Can I transform my R values into the SAS values?
That's all. Thank you for your time.
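A minimal sketch of how the scaling of R's raw periodogram can be checked, using simulated AR(1) data instead of the series above. With taper = 0 and detrend = FALSE, spectrum() returns |FFT|^2 / n at the frequencies k/n. The SAS steps above divide P_01 by 4π (16*ATAN(1)); assuming, without having verified it against SAS output, that P_01 is (2/n)|FFT|^2, the SAS-scale density would correspond to spec / (2π). Note also that WEIGHTS in PROC SPECTRA smooths the periodogram rather than the series; kernel(c(0.5, 0.25), m = 1) is an assumed R counterpart of WEIGHTS 1 2 1, and the handling of the end points may differ from SAS.

```r
set.seed(1)
n <- 100
# Simulated AR(1) series with rho = -0.9 (stand-in for the question's data)
u <- as.numeric(arima.sim(list(ar = -0.9), n = n))

# Raw periodogram from spectrum(): no taper, no detrending, no plot
sp <- spectrum(u, taper = 0, detrend = FALSE, plot = FALSE)

# The same quantity computed directly from the FFT at frequencies 1/n, ..., 1/2
k   <- 1:(n / 2)
raw <- Mod(fft(u)[k + 1])^2 / n
all.equal(as.numeric(sp$spec), raw)

# Assumed conversion to the scale of the SAS steps (check against the SAS docs)
sas_scale <- as.numeric(sp$spec) / (2 * pi)

# Smooth the periodogram (not the series) with weights 1 2 1, normalised to sum to 1
k121      <- kernel(c(0.5, 0.25), m = 1)
sp_smooth <- spectrum(u, kernel = k121, taper = 0, detrend = FALSE, plot = FALSE)
```

Comparing sas_scale with the S_01 column for the real series would confirm or refute the assumed factor.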

How to use for loop in R

I have a CSV dataset (call it data) as follows:
CLASS CoverageT1 CoverageT2 CoverageT3
Gamma 90 80 75
Gamma 89 72 79
Gamma 92 86 75
Alpha 50 80 67
Alpha 53 78 60
Alpha 58 81 75
I would like to retrieve the unique classes and calculate the average for each coverage column.
What I've done so far is the following:
classes <- subset(data, select = c(CLASS))
unique_classes <- unique(classes)
for(x in unique_classes){
cove <- subset(data, CLASS == x , select=c(CoverageT1:CoverageT3))
average <- colMeans(cove)
print(cove)
}
As a result, I got the following results:
CoverageT1 CoverageT2 CoverageT3
1 90 80 75
3 92 86 75
4 50 80 67
6 58 81 75
I want to retrieve the coverage values for each class and then calculate the average. When I print the retrieved coverage values, I get some rows and the others are missing!
Can someone help me solve this issue?
Thanks

Your code isn't working because, amongst other things, you are assigning to average on each iteration, so the previous value is lost.
There are several ways to do what you are trying to do. This would be my approach:
library(dplyr)
data %>% group_by(CLASS) %>% summarise_all(mean)
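For reference, the original loop can also be repaired in base R. A likely explanation for the missing rows is that unique_classes is a one-column data frame, so the loop runs once with x holding both class names, and CLASS == x recycles them against the six rows, which selects exactly rows 1, 3, 4 and 6. The sketch below iterates over a plain vector of class names and stores each result; averages and result are illustrative names.

```r
data <- read.table(text = "CLASS CoverageT1 CoverageT2 CoverageT3
Gamma 90 80 75
Gamma 89 72 79
Gamma 92 86 75
Alpha 50 80 67
Alpha 53 78 60
Alpha 58 81 75", header = TRUE)

# A character vector of class names, not a one-column data frame
unique_classes <- as.character(unique(data$CLASS))

# Collect one vector of column means per class instead of overwriting
averages <- list()
for (cls in unique_classes) {
  cove <- data[data$CLASS == cls, c("CoverageT1", "CoverageT2", "CoverageT3")]
  averages[[cls]] <- colMeans(cove)
}

# One row per class, one column per coverage variable
result <- do.call(rbind, averages)
result
```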

Another option, using aggregate with the formula first:
aggregate(. ~ CLASS, data = data, FUN = mean)

Taking your idea and wrapping it in by.
xy <- read.table(text = "CLASS CoverageT1 CoverageT2 CoverageT3
Gamma 90 80 75
Gamma 89 72 79
Gamma 92 86 75
Alpha 50 80 67
Alpha 53 78 60
Alpha 58 81 75", header = TRUE)
out <- by(data = xy[, -1], INDICES = list(xy$CLASS), FUN = colMeans)
out <- do.call(rbind, out)
out
CoverageT1 CoverageT2 CoverageT3
Alpha 53.66667 79.66667 67.33333
Gamma 90.33333 79.33333 76.33333

This is how I solved it:
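# coverage is assumed to hold only the coverage columns of data
coverage <- data[, c("CoverageT1", "CoverageT2", "CoverageT3")]
coverage_all <- aggregate(coverage, list(class = data$CLASS), mean)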

Loop Linear Regression

As a beginner in R, I have a probably simple question.
I have a linear regression with this specification:
X1_t = X1_{t-h} + X2_{t-h}
where h is equal to 1, 2, 3, 4, 5.
For example, when h=1 I run this code:
Modelo11 <- dynlm(X1 ~ L(X1,1) + L(X2, 1)-1, data = GDP)
It's a simple regression.
I want to implement a function that gives me the five linear regressions (h = 1, 2, 3, 4 and 5) with and without HAC (heteroscedasticity and autocorrelation consistent) estimation:
I tried this, and it didn't work:
for(h in 1:5){
Modelo1[h] <- dynlm(GDPTrimestralemT ~ L(SpreademT,h) + L(GDPTrimestralemT, h)-1, data = MatrizDadosUS)
coeftest(Modelo1[h], df = Inf, vcov = parzenHAC)
return(list(summary(Modelo1[h])))
}
One of the error messages is:
number of items to replace is not a multiple of replacement length
This is my data.frame:
GDP <- data.frame(data )
GDP
X1 X2
1 0.542952690 0.226341364
2 0.102328393 0.743360185
3 0.166345969 0.186533485
4 1.406733422 1.392420181
5 -0.469811005 -0.114609464
6 -0.509268267 0.687555461
7 1.470439930 0.298655018
8 1.046456428 -1.056387597
9 -0.492462197 -0.530284962
10 -0.516065519 0.645957530
11 0.624638996 1.044731264
12 0.213616470 -1.652979785
13 0.669747432 1.398602289
14 0.552089131 -0.821013792
15 0.452715216 1.420094663
16 -0.892063248 -1.436600779
17 1.429284965 0.559738610
18 0.853740565 -0.898976767
19 0.741864168 1.352012831
20 0.171494650 1.704764705
21 0.422326351 -0.267064235
22 -1.261643503 -2.090694608
23 -1.321086283 -0.273954212
24 0.365226000 1.965167113
25 -0.080888690 -0.594498893
26 -0.183293801 -0.483053404
27 -1.033792032 0.586491772
28 0.718322432 1.776210145
29 -2.822693790 -0.731509917
30 -1.251740437 -1.918124078
31 1.184256949 -0.016548037
32 2.255202675 0.303438286
33 -0.930446147 0.803126180
34 -1.691383225 -0.157839283
35 -1.081643279 -0.006652717
36 1.034162006 -1.970063305
37 -0.716827488 0.306792930
38 0.098471514 0.338333164
39 0.343536547 0.389775011
40 1.442117465 -0.668885360
41 0.095131066 -0.298356861
42 0.222524607 0.291485267
43 -0.499969717 1.308312472
44 0.588162304 0.026539575
45 0.581215173 0.167710855
46 0.629343124 -0.052835206
47 0.811618963 0.716913172
48 1.463610069 -0.356369304
49 -2.000576321 1.226446201
50 1.278233553 0.313606888
51 -0.700373666 0.770273988
52 -1.206455648 0.344628878
53 0.024602262 1.001621886
54 0.858933385 -0.865771777
55 -1.592291995 -0.384908852
56 -0.833758365 -1.184682199
57 -0.281305858 2.070391729
58 -0.122848757 -0.308397782
59 -0.661013984 1.590741535
60 1.887869805 -1.240283364
61 -0.313677463 -1.393252994
62 1.142864110 -1.150916732
63 -0.633380499 -0.223923970
64 -0.158729527 -1.245647224
65 0.928619010 -1.050636078
66 0.424317087 0.593892028
67 1.108704956 -1.792833100
68 -1.338231248 1.138684394
69 -0.647492569 0.181495183
70 0.295906675 -0.101823172
71 -0.079827607 0.825158278
72 0.050353111 -0.448453121
73 0.129068772 0.205619797
74 -0.221450137 0.051349511
75 -1.300967949 1.639063824
76 -0.861963677 1.273104220
77 -1.691001610 0.746514122
78 0.365888734 -0.055308006
79 1.297349754 1.146102001
80 -0.652382297 -1.095031447
81 0.165682952 -0.012926971
82 0.127996446 0.510673745
83 0.338743162 -3.141650682
84 -0.266916587 -2.483389321
85 0.148135154 -1.239997153
86 1.256591385 0.051984536
87 -0.646281986 0.468210275
88 0.180472423 0.393014848
89 0.231892902 -0.545305005
90 -0.709986273 0.104969765
91 1.231712844 -1.703489840
92 0.435378714 0.876505107
93 -1.880394798 -0.885893722
94 1.083580732 0.117560662
95 -0.499072654 -1.039222894
96 1.850756855 -1.308752222
97 1.653952857 0.440405804
98 -1.057618294 -1.611779530
99 -0.021821282 -0.807071503
100 0.682923562 -2.358596342
101 -1.132293845 -1.488806929
102 0.319237353 0.706203968
103 -2.393105781 -1.562111727
104 0.188653972 -0.637073832
105 0.667003685 0.047694037
106 -0.534018861 1.366826933
107 -2.240330371 -0.071797320
108 -0.220633546 1.612879694
109 -0.022442941 1.172582601
110 -1.542418139 0.635161458
111 -0.684128812 -0.334973482
112 0.688849615 0.056557966
113 0.848602803 0.785297518
114 -0.874157558 -0.434518305
115 -0.404999060 -0.078893114
116 0.735896917 1.637873669
117 -0.174398836 0.542952690
118 0.222418628 0.102328393
119 0.419461884 0.166345969
120 -0.042602368 1.406733422
121 2.135670836 -0.469811005
122 1.197644287 -0.509268267
123 0.395951293 1.470439930
124 0.141327444 1.046456428
125 0.691575897 -0.492462197
126 -0.490708151 -0.516065519
127 -0.358903359 0.624638996
128 -0.227550909 0.213616470
129 -0.766692832 0.669747432
130 -0.001690915 0.552089131
131 -1.786701123 0.452715216
132 -1.251495762 -0.892063248
133 1.123462446 1.429284965
134 0.237862653 0.853740565
Thanks.

Your variable Modelo1 is a vector, which cannot store lm objects. If Modelo1 is a list, it should work.
library(dynlm)
# Simulated example data with two series, a and b
df <- data.frame(a = rnorm(50), b = rnorm(50))
# Use a list to hold one fitted model per lag
models <- list()
for (h in 1:5) {
  models[[h]] <- dynlm(a ~ L(a, h) + L(b, h) - 1, data = df)
}
To get a summary you have to access the individual list elements. For example:
summary(models[[1]])
*Edit in response to Richard Scriven's comment
The most efficient way to get all summaries would be:
lapply(models, summary)
This applies the summary function to each element of the list and returns a list with the results.
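The same list pattern can be sketched without dynlm, using base R lm() on manually lagged columns. This is an illustration under stated assumptions, not the poster's method: the data are simulated, and fit_lag, gdp, models and coefs are hypothetical names.

```r
set.seed(42)
# Simulated stand-in for the GDP data frame with columns X1 and X2
gdp <- data.frame(X1 = rnorm(60), X2 = rnorm(60))

# Fit X1_t on X1_{t-h} and X2_{t-h} without an intercept
fit_lag <- function(h, d) {
  idx <- (h + 1):nrow(d)            # rows that have an observation h steps back
  lagged <- data.frame(
    y      = d$X1[idx],
    x1_lag = d$X1[idx - h],
    x2_lag = d$X2[idx - h]
  )
  lm(y ~ x1_lag + x2_lag - 1, data = lagged)
}

# One model per lag, stored in a named list
models <- lapply(1:5, fit_lag, d = gdp)
names(models) <- paste0("h", 1:5)

# Coefficients of all five regressions in one matrix (rows = lags)
coefs <- t(sapply(models, coef))
coefs
```

If the lmtest and sandwich packages are installed, HAC standard errors could then be obtained per model, for example with lapply(models, lmtest::coeftest, vcov. = sandwich::NeweyWest); the choice of NeweyWest here is an assumption, and other HAC estimators from sandwich may suit the original setup better.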
