using chisq.test in R (chi-squared tests)

I am trying to read a csv file, create 3 matrices out of each row, and then apply a chi-squared test using chisq.test(matrix), but the call fails.
It gives me the following error:
Error in sum(x) : invalid 'type' (list) of argument
On the other hand, if I simply create a matrix passing some numbers then it works fine.
I also tried running str on two types of matrices.
For the matrix I create from a row of the csv file, str gives:
List of 12
$ : int 3
$ : int 7
$ : int 3
$ : int 1
$ : int 7
$ : int 3
$ : int 1
$ : int 1
$ : int 1
$ : int 0
$ : int 2
$ : int 0
- attr(*, "dim")= int [1:2] 4 3
For a matrix created directly from some numbers, str gives:
num [1:2, 1:3] 1 2 3 4 5 6
Can someone please tell me what is going on here?

The problem is that your data structure is a list with a dim attribute (a matrix whose cells are list elements), whereas chisq.test() needs a numeric matrix.
One solution is to coerce your data to numeric using as.numeric(), as demonstrated below. Another would be to convert the result of read.csv() to numeric before you create the matrix.
# Recreate data
x <- structure(array(list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)), dim=c(3,4))
str(x)
List of 12
$ : num 1
$ : num 2
$ : num 3
$ : num 4
$ : num 5
$ : num 6
$ : num 7
$ : num 8
$ : num 9
$ : num 10
$ : num 11
$ : num 12
- attr(*, "dim")= int [1:2] 3 4
# Convert to numeric array
x <- array(as.numeric(x), dim=dim(x))
str(x)
num [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
chisq.test(x)
Pearson's Chi-squared test
data: x
X-squared = 0.6156, df = 6, p-value = 0.9961
Warning message:
In chisq.test(x) : Chi-squared approximation may be incorrect
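The second suggestion (converting to numeric before building the matrix) might look like the following minimal sketch. The one-row data frame csv_row and its column names are made up to stand in for a row returned by read.csv(); the counts are the ones shown in the str output in the question.

```r
# Hypothetical stand-in for one row of the CSV: a 1 x 12 data frame (V1..V12)
csv_row <- as.data.frame(t(c(3L, 7L, 3L, 1L, 7L, 3L, 1L, 1L, 1L, 0L, 2L, 0L)))

# Flatten the row to a plain integer vector BEFORE building the matrix
counts <- unlist(csv_row, use.names = FALSE)
m <- matrix(counts, nrow = 4)
str(m)

# The small counts trigger the same approximation warning as above,
# so it is suppressed here
res <- suppressWarnings(chisq.test(m))
res$p.value
```

Because unlist() returns an atomic vector, matrix() produces an ordinary integer matrix and sum(x) inside chisq.test() no longer sees a list.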

Related

How to code Simple returns for multiple columns?

How do I code this formula:
Simple returns = (P_t / P_{t-1}) - 1
I have tried the below, but keep getting the wrong numbers.
stockindices = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/stockInd.csv')
library(tidyverse)
simple_returns <- stockindices %>%
mutate(across(3:ncol(.), ~ ((.x / lag(.x-1))-1)))

You had too many -1's in your expression:
simple_returns <- stockindices %>%
mutate(across( 3:ncol(.), ~ .x / lag(.x)-1))
str(simple_returns)
'data.frame': 3978 obs. of 8 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: chr "1999-04-01" "1999-05-01" "1999-06-01" "1999-07-01" ...
$ DJX : num NA 0.01382 0.025107 -0.000755 0.011068 ...
$ SPX : num NA 0.01358 0.02214 -0.00205 0.00422 ...
$ HKX : num NA 0.00835 0.03465 0.04493 0.00272 ...
$ NKX : num NA -0.01365 0.01781 0.00506 -0.01069 ...
$ DAX : num NA 0.000295 0.036108 -0.022119 0.01308 ...
$ UKX : num NA 0.0134 0.03199 -0.00774 0.00754 ...
You could have bracketed the .x/lag(.x), but it isn't necessary because division has higher precedence than subtraction in R. The default lag interval is 1, so it doesn't need to be passed to lag. If you had wanted returns over two periods instead, it would have been
~ .x/lag(.x, 2) - 1
And as always, it pays to make sure that dplyr's lag is masking stats::lag, which behaves quite differently and doesn't play nicely with the tidyverse.
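To sanity-check the formula without dplyr, here is a small base R sketch. The helper simple_return and the toy prices vector are made up for illustration; the helper shifts the series by hand rather than calling any lag function.

```r
# Hypothetical helper: k-period simple return, P_t / P_{t-k} - 1
simple_return <- function(p, k = 1) {
  stopifnot(k >= 1, k < length(p))
  prev <- c(rep(NA, k), head(p, -k))  # series shifted down by k positions
  p / prev - 1
}

prices <- c(100, 110, 99)             # toy data, not from the CSV
r1 <- simple_return(prices)           # one-period returns
r2 <- simple_return(prices, 2)        # two-period returns
print(r1)
print(r2)

# stats::lag() only changes the time-series attributes of a plain vector,
# not the order of its values, so the ratio is 1 everywhere
wrong <- as.numeric(prices / stats::lag(prices) - 1)
print(wrong)
```

The last line shows why the masking matters: with stats::lag the "returns" come out as zero in every row instead of starting with NA.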

Building a table/dataframe/something exportable from Desc function output in R

I'm definitely a noob, though I have used R for various small tasks for several years.
For the life of me, I cannot figure out how to get the results from the "Desc" function into something I can work with. When I save x <- Desc(mydata), class(x) shows up as "Desc". In RStudio it is listed under Values as "List of 1". When I click on x, the first line says ":List of 25". There is a list of data in this object, but I cannot figure out how to grab any of it.
Clearly I have a severe misunderstanding of the R data structures, but I have been searching for the past 90 minutes to no avail so figured I would reach out.
In short, I just want to pull certain aspects (N, mean, UB, LB, median) of the descriptive statistics provided from the Desc results for multiple datasets and build a little table that I can then work with.
Thanks for the help.

Say you have a dataframe, x, where:
x <- data.frame(i=c(1,2,3),j=c(4,5,6))
You could set:
desc.x <- Desc(x)
And access the info on any given column like:
desc.x$i
desc.x$i$mean
desc.x$j$sd
And any other stats Desc comes up with. The $ is the key here, it's how you access the named fields of the list that Desc returns.
Edit: If you pass a single column (as the asker does), or simply a vector, to Desc, you get back a one-item list. The same principle applies, but the syntax is slightly different. Now you would use:
desc.x <- Desc(df$my.col)
desc.x[[1]]$mean
In the future, the way to attack this is to either look in the environment window in RStudio and play around trying to figure out how to access the fields, check the source code on github or elsewhere, or (best first choice) use str(desc.x), which gives us:
> str(desc.x)
List of 1
$ :List of 25
..$ xname : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i"
..$ label : NULL
..$ class : chr "numeric"
..$ classlabel: chr "numeric"
..$ length : int 3
..$ n : int 3
..$ NAs : int 0
..$ main : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i (numeric)"
..$ unique : int 3
..$ 0s : int 0
..$ mean : num 2
..$ meanSE : num 0.577
..$ quant : Named num [1:9] 1 1.1 1.2 1.5 2 2.5 2.8 2.9 3
.. ..- attr(*, "names")= chr [1:9] "min" ".05" ".10" ".25" ...
..$ range : num 2
..$ sd : num 1
..$ vcoef : num 0.5
..$ mad : num 1.48
..$ IQR : num 1
..$ skew : num 0
..$ kurt : num -2.33
..$ small :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 1 2 3
.. ..$ freq: num [1:3] 1 1 1
..$ large :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 3 2 1
.. ..$ freq: num [1:3] 1 1 1
..$ freq :Classes ‘Freq’ and 'data.frame': 3 obs. of 5 variables:
.. ..$ level : Factor w/ 3 levels "1","2","3": 1 2 3
.. ..$ freq : int [1:3] 1 1 1
.. ..$ perc : num [1:3] 0.333 0.333 0.333
.. ..$ cumfreq: int [1:3] 1 2 3
.. ..$ cumperc: num [1:3] 0.333 0.667 1
..$ maxrows : num 12
..$ x : num [1:3] 1 2 3
- attr(*, "class")= chr "Desc"
"List of 1" means you access it by desc.x[[1]], and below that follow the $s. When you see something like num[1:3] that means it's an atomic vector so you access the first member like var$field$numbers[1]
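To collect a few of these fields from several results into one table, something like the sketch below may work. To keep it runnable without DescTools, fake_desc is a made-up function that builds a plain list shaped like the str output above (a one-item list holding named fields); with real Desc objects the extraction step would use the same [[1]] and field names, assuming they match that output.

```r
# Hypothetical stand-in for Desc(): a one-item list of named statistics
fake_desc <- function(v) {
  list(list(n = length(v), mean = mean(v), sd = sd(v),
            meanSE = sd(v) / sqrt(length(v))))
}

# Results for several data sets, kept in a named list
results <- list(first = fake_desc(c(1, 2, 3)),
                second = fake_desc(c(4, 5, 6, 7)))

# Fields to pull out of each result (names taken from the str output)
pick <- c("n", "mean", "sd", "meanSE")

# One row per data set
tab <- do.call(rbind, lapply(results, function(d) as.data.frame(d[[1]][pick])))
print(tab)
```

The resulting data frame can then be written out with write.csv() or processed further like any other table.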

I would like to merge one data frame with each vector in a list of vectors. Output should be a list of data frames

str(list) # the list
List of 11
$ : int [1:62850] 1013128473 1010310348 1048245573 1034384956 1041152164 1044038741 1018034270 1028472668 1028965885 1009487677 ...
$ : int [1:76934] 1013175201 1008463364 1016595579 1015077603 1036297925 1033985605 1004670509 1002708962 1035740487 1033948421 ...
$ : int [1:63141] 1023522277 1028419750 1035072196 1015895913 1044665345 1045384789 1003817549 1007103029 1034294940 1048731747 ...
$ : int [1:66286] 1004375117 1015143512 1013554405 1029388459 1042758662 1002010773 1014659880 1010136990 1042787992 1034111995 ...
$ : int [1:59295] 1026598712 1046781801 1047773468 1029647490 1000445831 1004654396 1026574333 1028210894 1031396631 1017077460 ...
$ : int [1:39513] 1008628321 1031342452 1036618138 1025299916 1059540334 1044636981 1025831775 1020671796 1016064196 1000573822 ...
$ : int [1:52616] 1007104357 1035072196 1045300736 1013342439 1021471188 1014648594 1047521123 1006283327 1018237501 1052887674 ...
$ : int [1:53865] 1043482304 1006375883 1065831792 1025658285 1025898360 1042188555 1010986410 1036297925 1016468595 1042017564 ...
$ : int [1:74030] 1049026709 1076616323 1013343981 1009441716 1004974596 1032515221 1059905172 1011514112 1005423064 1006931636 ...
$ : int [1:62171] 1024128835 1006168791 1003374715 1042188555 1016219766 1002708962 1035781234 1039706286 1011430434 1055809196 ...
$ : int [1:66560] 1020967137 1029327077 1026256246 1046334023 1035156221 1017504075 1035065786 1043426434 1034294940 1019105475 ...
str(df) # the data frame
'data.frame': 3727518 obs. of 5 variables:
$ A: int 10001676 10001676 10002575 10002990 10003466 10005485 10005736 10005949 10006562 10007119 ...
$ 1: int 1020565642 1020565642 1008628321 1038358741 1045031612 1025102185 1011873328 1002079752 1028579827 1026598712 ...
$ 2: Factor w/ 2 levels "ÇäËì","ÐßÑ": 2 2 2 2 2 2 2 2 2 2 ...
$ 3: int 1 4 1 1 1 1 20 1 1 1 ...
$ 4: int 64 64 66 63 69 59 84 83 65 64 ...
I want to merge each vector in the list with the data frame by "A".
What I tried was:
for(n in 1:length(list))
{
newlist[[n]] <- merge(df, list[[n]], by.x = "A")
}
Error in merge.data.frame(rd_info, newengagementspermonth[[n]], by.x = "NEWNINUMBER") :
'by.x' and 'by.y' specify different numbers of columns
The input is a list of 11 vectors and a data frame. The output should be a list of 11 data frames, each with a number of rows equal to the length of the corresponding vector.

You could do something like this. First, explicitly transform each object in the list into a data.frame. Then, merge it with df. You need to specify by.x and by.y since the data.frames do not have the same names.
newlist <- lapply(lapply(list,as.data.frame),function(x) merge(x,df,by.x="X[[i]]",by.y="A",all.x=TRUE))
With sample data:
list <- list(1:8,1:10,2:15)
df <- data.frame(A=1:15,
b=rnorm(15))
output
str(newlist)
List of 3
$ :'data.frame': 8 obs. of 2 variables:
..$ X[[i]]: int [1:8] 1 2 3 4 5 6 7 8
..$ b : num [1:8] 0.0127 0.2082 -0.271 0.421 -0.538 ...
$ :'data.frame': 10 obs. of 2 variables:
..$ X[[i]]: int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ b : num [1:10] 0.0127 0.2082 -0.271 0.421 -0.538 ...
$ :'data.frame': 14 obs. of 2 variables:
..$ X[[i]]: int [1:14] 2 3 4 5 6 7 8 9 10 11 ...
..$ b : num [1:14] 0.208 -0.271 0.421 -0.538 0.506 ...
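The column name "X[[i]]" is whatever as.data.frame happens to generate inside lapply, so it may be safer to name the key column explicitly. A minimal sketch of that variant, using the same sample data (the name ids_list is introduced here only to avoid shadowing the base function list):

```r
ids_list <- list(1:8, 1:10, 2:15)
set.seed(1)
df <- data.frame(A = 1:15, b = rnorm(15))

# Wrap each vector in a one-column data frame whose column is named "A",
# so both sides of the merge share the key name
newlist <- lapply(ids_list, function(v) {
  merge(data.frame(A = v), df, by = "A", all.x = TRUE)
})

sapply(newlist, nrow)
```

With all.x = TRUE every element of the vector is kept, so each resulting data frame has as many rows as its vector, provided the key values in df are unique.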

All N Combinations of All Subsets

Given a vector of elements, I would like to obtain a list of all possible n-length combinations of subsets of elements. For example, given the (simplest) sequence 1:2, I would like to obtain a list object of the form
{ {{1},{1}}, {{1},{2}}, {{2},{2}}, {{1},{1,2}}, {{2},{1,2}}, {{1,2},{1,2}} }
when n=2.
I was able to generate a list of all non-empty subsets using the following:
listOfAllSubsets <- function (s) {
  n <- length(s)
  unlist(lapply(1:n, function (n) {
    combn(s, n, simplify=FALSE)
  }), recursive=FALSE)
}
However, I'm not sure the best way to proceed from here. Essentially, I want a Cartesian product of this list with itself (for n=2).
Any suggestions? A non-iterative solution would be preferable (i.e., no for loops).

It is easier to start with a Cartesian product of the indices. Then duplication can be avoided by making sure the tuple of indices is sorted.
combosn <- function(items,n) {
  i <- seq_along(items)
  idx <- do.call(expand.grid,rep(list(i),n))
  idx <- idx[!apply(idx,1,is.unsorted),]
  apply(idx,1,function(x) items[x])
}
ss<-listOfAllSubsets(1:2)
str(combosn(ss,2))
List of 6
$ :List of 2
..$ : int 1
..$ : int 1
$ :List of 2
..$ : int 1
..$ : int 2
$ :List of 2
..$ : int 2
..$ : int 2
$ :List of 2
..$ : int 1
..$ : int [1:2] 1 2
$ :List of 2
..$ : int 2
..$ : int [1:2] 1 2
$ :List of 2
..$ : int [1:2] 1 2
..$ : int [1:2] 1 2
Or, for n=3,
str(combosn(ss,3))
List of 10
$ :List of 3
..$ : int 1
..$ : int 1
..$ : int 1
$ :List of 3
..$ : int 1
..$ : int 1
..$ : int 2
$ :List of 3
..$ : int 1
..$ : int 2
..$ : int 2
$ :List of 3
..$ : int 2
..$ : int 2
..$ : int 2
$ :List of 3
..$ : int 1
..$ : int 1
..$ : int [1:2] 1 2
$ :List of 3
..$ : int 1
..$ : int 2
..$ : int [1:2] 1 2
$ :List of 3
..$ : int 2
..$ : int 2
..$ : int [1:2] 1 2
$ :List of 3
..$ : int 1
..$ : int [1:2] 1 2
..$ : int [1:2] 1 2
$ :List of 3
..$ : int 2
..$ : int [1:2] 1 2
..$ : int [1:2] 1 2
$ :List of 3
..$ : int [1:2] 1 2
..$ : int [1:2] 1 2
..$ : int [1:2] 1 2
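As a quick check on the sizes above: keeping only the sorted index tuples leaves one tuple per multiset, so with k items the count should be choose(k + n - 1, n). The sketch below counts the surviving tuples directly; count_sorted_tuples is a helper name made up for this check.

```r
# Hypothetical helper: number of non-decreasing n-tuples of indices 1..k
count_sorted_tuples <- function(k, n) {
  idx <- do.call(expand.grid, rep(list(seq_len(k)), n))
  sum(!apply(idx, 1, is.unsorted))
}

# 1:2 has 3 non-empty subsets, so k = 3
count_sorted_tuples(3, 2)   # compare with choose(4, 2)
count_sorted_tuples(3, 3)   # compare with choose(5, 3)
```

These agree with the "List of 6" and "List of 10" results shown for n=2 and n=3.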

This is what I would do, with, e.g., s=1:2:
1) Represent subsets with a 0/1 matrix for each element's membership.
subsets = as.matrix(do.call(expand.grid,replicate(length(s),0:1,simplify=FALSE)))
which gives
Var1 Var2
[1,] 0 0
[2,] 1 0
[3,] 0 1
[4,] 1 1
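Here, the first row is the empty subset; the second, {1}; the third, {2}; and the fourth, {1,2}. To get the subset itself, use mysubset = s[subsets[row,]==1], where row is the row of the subset you want. (The comparison turns the 0/1 row into a logical index; indexing with the raw 0/1 numbers would select by position instead.)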
2) Represent pairs of subsets as pairs of rows of the matrix:
pairs <- expand.grid(Row1=1:nrow(subsets),Row2=1:nrow(subsets))
which gives
Row1 Row2
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 4 2
9 1 3
10 2 3
11 3 3
12 4 3
13 1 4
14 2 4
15 3 4
16 4 4
Here, the fourteenth row corresponds to the second and fourth rows of subsets, so {1} & {1,2}. This assumes the order of the pair matters (which is implicit in taking the Cartesian product). To recover the subsets, use mypairosubsets=lapply(pairs[p,],function(r) s[subsets[r,]==1]) where p is the row of the pair you want.
Expanding beyond pairs to the P(s)^n case (where P(s) is the power set of s) would look like
setsosets = as.matrix(do.call(expand.grid,replicate(n,1:nrow(subsets),simplify=FALSE)))
Here, each row will have a vector of numbers. Each number corresponds to a row in the subsets matrix.
Making copies of the elements of s is probably not necessary for whatever you are doing after this. However, you could do it from here by using lapply(1:nrow(pairs),function(p)lapply(pairs[p,],function(r) s[subsets[r,]==1])), which starts like...
[[1]]
[[1]]$Row1
integer(0)
[[1]]$Row2
integer(0)
[[2]]
[[2]]$Row1
[1] 1
[[2]]$Row2
integer(0)
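A small runnable sketch of the membership-matrix idea for s=1:2 follows. It converts each 0/1 row to a logical index before subsetting, since a numeric index of 0s and 1s would pick elements by position; get_subset is a helper name introduced here for illustration.

```r
s <- 1:2

# One row per subset, one 0/1 membership column per element of s
subsets <- as.matrix(do.call(expand.grid,
                             replicate(length(s), 0:1, simplify = FALSE)))

# Hypothetical helper: elements of s flagged with 1 in the given row
get_subset <- function(row) s[subsets[row, ] == 1]

# Ordered pairs of subsets, as pairs of row numbers
pairs <- expand.grid(Row1 = 1:nrow(subsets), Row2 = 1:nrow(subsets))

# Row 14 pairs subset 2 with subset 4, i.e. {1} and {1,2}
pair14 <- lapply(pairs[14, ], get_subset)
str(pair14)
```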

allSubsets <- function(n,                    # size of initial set
                       m,                    # number of subsets
                       includeEmpty = FALSE) # should the empty set be considered a subset?
{
  # m can't exceed the number of possible subsets
  if (includeEmpty)
    stopifnot(m <= 2^n)
  else
    stopifnot(m <= 2^n - 1)
  # choose which m of the available subsets go into each combination
  # (as vectors of subset indices)
  if (includeEmpty) {
    ll <- split(t(combn(2^n, m)), seq(choose(2^n, m)))
  } else
    ll <- split(t(combn(2^n - 1, m)), seq(choose(2^n - 1, m)))
  # get the subsets
  subsets <- apply(do.call(expand.grid, rep(list(c(FALSE, TRUE)), n)),
                   1, which)
  # remove the empty subset if desired
  if (!includeEmpty)
    subsets <- subsets[-1]
  # convert the subsets to vectors
  subsets <- lapply(subsets, as.vector)
  # return the list of subsets
  apply(t(mapply('[', list(subsets), ll)), 1, function(x) x)
}
# returns a list where each element is a list of length 2 with
# subsets of the initial set of length 4
x = allSubsets(4,2,F)

Trouble converting list into factor in R

I am having problems creating a boxplot of my data, because one of my variables is in the form of a list.
I am trying to create a boxplot:
boxplot(dist~species, data=out)
and received the following error:
Error in model.frame.default(formula = dist ~ species, data = out) :
invalid type (list) for variable 'species'
I have been unsuccessful in forcing 'species' into the form of a factor:
out[species]<- as.factor(out[[out$species]])
and receive the following error:
Error in .subset2(x, i, exact = exact) : invalid subscript type 'list'
How can I convert my 'species' column into a factor which I can then use to create a boxplot? Thanks.
EDIT:
str(out)
'data.frame': 4570 obs. of 6 variables:
$ GridRef : chr "NT73" "NT80" "NT85" "NT86" ...
$ pred : num 154 71 81 85 73 99 113 157 92 85 ...
$ pred_bin : int 0 0 0 0 0 0 0 0 0 0 ...
$ dist : num 20000 10000 9842 14144 22361 ...
$ years_since_1990: chr "21" "16" "21" "20" ...
$ species :List of 4570
..$ : chr "C.splendens"
..$ : chr "C.splendens"
..$ : chr "C.splendens"
.. [list output truncated]

It's hard to imagine how you got the data into this form in the first place, but it looks like
out <- transform(out,species=unlist(species))
should solve your problem.
set.seed(101)
f <- as.list(sample(letters[1:5],replace=TRUE,size=100))
## need I() to make a wonky data frame ...
d <- data.frame(y=runif(100),f=I(f))
## 'data.frame': 100 obs. of 2 variables:
## $ y: num 0.125 0.0233 0.3919 0.8596 0.7183 ...
## $ f:List of 100
## ..$ : chr "b"
## ..$ : chr "a"
boxplot(y~f,data=d) ## invalid type (list) ...
d2 <- transform(d,f=unlist(f))
boxplot(y~f,data=d2)
