I am using EcoTest.sample to compare rarefaction curves for 19 vegetation plots on two soil types (alluvial and canyon). The code below produces the following
warning (more than 50 times): "In cor(x > 0) : the standard deviation is zero".
The test still produces all the expected output. Should I be concerned about the warnings? Is it a result of my relatively small sample size?
rawdata<-read.table(text="Plot SiteType sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11 sp12 sp13 sp14 sp15 sp16 sp17 sp18 sp19 sp20 sp21 sp22 sp23 sp24 sp25 sp26 sp27 sp28 sp29 sp30 sp31 sp32 sp33 sp34 sp35
2 canyon 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0
3 alluvial 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
5 alluvial 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
6 alluvial 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0
7 alluvial 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
8 alluvial 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0
10 alluvial 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 1 0 0
11 canyon 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0
12 canyon 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
13 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0
14 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
15 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0
16 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
17 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
18 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
19 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0
20 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
22 alluvial 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 0
23 alluvial 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0
", header=T)
library(rareNMtests)  # provides EcoTest.sample
data<-rawdata[,-1]
rownames(data)<-rawdata[,1]
test.data<-EcoTest.sample(data[,-1], by=data$SiteType, MARGIN=1, trace=F)
EDIT: You may need to set the diversity index explicitly through q. For instance, with q=2 (the inverse Simpson index) I cannot reproduce your warning. As it stands you are using q=0, the species richness. Perhaps there is nothing to do other than use a different index. I am not familiar with the factors that should guide the choice of index. I have read a little here: http://www.tiem.utk.edu/~gross/bioed/bealsmodules/shannonDI.html and found this paper, which I did not read in detail: https://dx.doi.org/10.1002%2Fece3.1155
Using Simpson's index: No warnings.
test.data<-EcoTest.sample(data[,-1], by=data$SiteType, MARGIN=1, trace=F,q=2)
Sample-based method
P(Obs <= null) = 0.205
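For readers unfamiliar with q: it is the order of the Hill number used as the diversity measure, so q=0 gives species richness and q=2 gives the inverse Simpson index. A minimal base-R sketch of that relationship follows. hill_number is a hypothetical helper written only for illustration (it is not a function from the package) and the counts are made up.

```r
# Hypothetical helper: Hill number of order q for a vector of species counts.
hill_number <- function(counts, q) {
  p <- counts[counts > 0] / sum(counts)   # relative abundances of species present
  if (q == 1) {
    return(exp(-sum(p * log(p))))         # limit q -> 1: exponential of Shannon entropy
  }
  sum(p^q)^(1 / (1 - q))
}

even   <- c(10, 10, 10, 10)   # made-up community, four equally common species
uneven <- c(37, 1, 1, 1)      # made-up community, one dominant species

hill_number(even, 0)     # richness: 4
hill_number(even, 2)     # inverse Simpson: 4
hill_number(uneven, 0)   # richness is still 4
hill_number(uneven, 2)   # inverse Simpson drops to about 1.17
```

Richness only counts which species are present, whereas the inverse Simpson index weights species by abundance, which is why the two orders can behave differently on presence/absence data.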
As stated in this answer on SE, a standard deviation of zero changes the nature of the distribution. Any test you perform that depends on a normal distribution is therefore likely to be unreliable, and the p-values obtained from, say, a t-test may not be meaningful.
When standard deviation is zero, your Gaussian (normal) PDF turns into Dirac delta function. You can't simply plug zero standard deviation into the conventional expression. For instance, if the PDF is plugged into some kind of numerical integration, this won't work. (Aksakal on SE)
https://stats.stackexchange.com/questions/233834/what-is-the-normal-distribution-when-standard-deviation-is-zero
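The warning itself is easy to reproduce in isolation. The sketch below is an assumption about the mechanism, not a trace of the package internals: it calls cor() on a made-up presence/absence matrix in which one species occurs in every plot, so that column has zero variance.

```r
# Made-up presence/absence matrix: 4 plots (rows) x 3 species (columns).
pa <- cbind(sp_a = c(1, 1, 1, 1),   # present in every plot: zero variance
            sp_b = c(0, 1, 0, 1),
            sp_c = c(1, 0, 0, 1))

# Collect warnings instead of printing them, so they can be inspected.
msgs <- character(0)
r <- withCallingHandlers(
  cor(pa > 0),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                 # the "standard deviation is zero" warning
r["sp_a", "sp_b"]    # NA: a correlation with a constant column is undefined
```

With few plots per group, resampled subsets in which a species is present everywhere (or nowhere) are likely, which would explain why the warning repeats many times.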
I use library(ergm) and library(igraph) and generate a ERGM network. But I want the adjacency matrix of that network. I am unable to find any function which can produce that.
library(ergm)
library(igraph)
g.use <- network(16,density=0.1,directed=FALSE)
#
# Starting from this network let's draw 3 realizations
# of a edges and 2-star network
#
g.sim <- simulate(~edges+kstar(2), nsim=3, coef=c(-1.8,0.03),
basis=g.use, control=control.simulate(
MCMC.burnin=1000,
MCMC.interval=100))
#g.sim[[3]]
summary(g.sim)
Is it possible to find the adjacency matrix from g.sim? and how?
The ergm package uses the network package, not the igraph package. You should keep everything in network and not load igraph, as the two packages have some conflicting functions with the same names.
In your case, you simulate 3 graphs thus you should have 3 adjacency matrices. The code is as below:
library(ergm)
g.use <- network(16,density=0.1,directed=FALSE)
g.sim <- simulate(~edges+kstar(2), nsim=3, coef=c(-1.8,0.03),
basis=g.use, control=control.simulate(
MCMC.burnin=1000,
MCMC.interval=100))
The code you want:
lapply(g.sim, as.matrix)
[[1]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0
3 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1
4 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
5 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
6 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
7 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1
8 0 1 0 0 0 0 0 0 0 1 1 1 1 0 1 0
9 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
10 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0
11 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0
12 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
13 0 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1
14 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
16 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0
[[2]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0
4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
6 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1
7 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
8 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
9 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
11 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0
12 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1
13 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
15 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
16 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0
[[3]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0
3 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0
4 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
5 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 1 0 0 0 0 1 0 1 0 0 0 1 0 1 0
7 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
10 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1
11 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
12 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0
13 1 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0
14 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0
15 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 1
16 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0
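Once you have the matrices, a few base-R checks confirm that they look like adjacency matrices of an undirected network. This is only a sketch: adj is a small made-up matrix standing in for one element of lapply(g.sim, as.matrix).

```r
# Made-up 4-node undirected adjacency matrix (stand-in for as.matrix(g.sim[[1]])).
adj <- matrix(c(0, 1, 0, 0,
                1, 0, 1, 1,
                0, 1, 0, 0,
                0, 1, 0, 0), nrow = 4, byrow = TRUE)

isSymmetric(adj)                      # an undirected network gives a symmetric matrix
n_edges <- sum(adj[upper.tri(adj)])   # each edge is counted once in the upper triangle
degree  <- rowSums(adj)               # number of ties per node

# Edge list (row, col) taken from the upper triangle only
edge_list <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)
```

The same checks can be applied to each simulated network by wrapping them in lapply() over the list of matrices.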
I have a series of text files in a folder called "Disintegration T1" which look like this:
> 1.txt
0 0 0 0 1
1 0 0 0 1
0 1 0 0 1
0 0 0 0 0
1 1 1 1 0
> 2.txt
0 1 1 0 1
0 0 1 1 1
1 1 0 1 1
1 1 1 0 1
0 0 0 0 1
> 3.txt
0 1 1 1
1 0 0 0
0 0 0 0
1 0 0 0
The files are all either 4x4 or 5x5. They must be read in as matrices, as the data are for social network analyses. My goal is to automate the process of putting these matrices into a larger matrix, so that they sit one after another along its diagonal, with 0s filling all remaining cells. In this case the final result would look like:
> mega_matrix
0 0 0 0 1 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 1 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 1 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0
Thank you!
You want bdiag from the Matrix package:
library(Matrix)
bdiag(matrix1, matrix2, matrix3)
And to do the whole directory (thanks to @user20650 in the comments), with the working directory set to the folder that holds the files:
bdiag(lapply(dir(), function(x){as.matrix(read.table(x))}))
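A slightly more defensive sketch of the same idea, assuming the files are named 1.txt, 2.txt, ... as in the question. The temporary folder and the writeLines() calls only recreate the example data so that the snippet runs on its own. Sorting by the numeric part of the file name is my addition, because a plain alphabetical listing would put 10.txt before 2.txt.

```r
library(Matrix)

# Recreate the example files in a temporary stand-in for "Disintegration T1".
folder <- file.path(tempdir(), "Disintegration T1")
dir.create(folder, showWarnings = FALSE)
writeLines(c("0 0 0 0 1", "1 0 0 0 1", "0 1 0 0 1", "0 0 0 0 0", "1 1 1 1 0"),
           file.path(folder, "1.txt"))
writeLines(c("0 1 1 0 1", "0 0 1 1 1", "1 1 0 1 1", "1 1 1 0 1", "0 0 0 0 1"),
           file.path(folder, "2.txt"))
writeLines(c("0 1 1 1", "1 0 0 0", "0 0 0 0", "1 0 0 0"),
           file.path(folder, "3.txt"))

# List only the .txt files and order them by the number in the file name.
files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
files <- files[order(as.numeric(sub("\\.txt$", "", basename(files))))]

# Read each file as a matrix and place the blocks along the diagonal.
blocks <- lapply(files, function(f) as.matrix(read.table(f)))
mega_matrix <- as.matrix(bdiag(blocks))   # convert the sparse result to a dense matrix
dim(mega_matrix)
```

The as.matrix() call at the end is optional. bdiag() returns a sparse matrix, which is usually the better choice if there are many blocks.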
Here is how to reproduce my problem. I want to create a 3D array
> g=array(0,dim=c(3,31,31))
> dim(g)
[1] 3 31 31
> dim(g[1,,])
[1] 31 31
This is x with dimension 31 by 31
> dim(x)
[1] 31 31
> x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
2 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
3 2 1 NA 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
4 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
7 0 0 0 0 0 1 NA 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 1 1 0 0 0 1 0 0 NA 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
11 0 0 0 0 0 0 0 0 2 0 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 1 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 1 1 1 0 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
16 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
17 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 NA 0 0 0 0 0 0 1 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 NA 0 1 1 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 NA 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 NA 0 0 1 0 0 0 0
25 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0
26 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 NA 0 1 0 0
28 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0
29 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0
30 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0
31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
when I try to assign x to the first 'section' of g using
> g[1,,] = x
The array structure of g is totally changed, as now it becomes:
> dim(g)
NULL
> head(g)
[[1]]
[1] NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
[[2]]
[1] 0
[[3]]
[1] 0
[[4]]
[1] 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[[5]]
[1] 0
[[6]]
[1] 0
This is totally different from what I expected. I am just trying to put a matrix into g[1,,], and dim(g) should still be 3 by 31 by 31. Am I wrong? What did I do wrong?
Thanks to Pascal's comments below I've amended my answer, though I've left the dimensionality changed to 31x31x3 for perhaps easier understanding. The problem is the way that the data are converted from your data.frame object before storing in your array. I think by converting first to a matrix you should get what you're looking for:
g <- array(0,dim=c(31,31,3))
m <- matrix(1, 31, 31)
x <- as.data.frame(m)
## Storing x as it is will result in g becoming a list...
#g[,,1] <- x
## Converting the data.frame to a matrix will result in
## g remaining an array:
g[,,1] <- as.matrix(x)
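The same fix applies if you keep your original 3 x 31 x 31 layout. A minimal sketch, with a data.frame of ones standing in for your x:

```r
g <- array(0, dim = c(3, 31, 31))
x <- as.data.frame(matrix(1, 31, 31))   # stand-in for the 31 x 31 data

# A data.frame is a list of columns, so assigning it directly
# coerces the whole array to a list.
g_bad <- g
g_bad[1, , ] <- x
is.list(g_bad)

# Converting to a numeric matrix first keeps g a numeric array.
g[1, , ] <- as.matrix(x)
dim(g)
is.numeric(g)
```

Checking class(x) or is.data.frame(x) before the assignment is a quick way to catch this situation.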
Using R I have a table, lets say 'locations'
head(locations, n=10)
apillar fender fwheel fdoor compart rdoor rwheel boot
1 0 0 0 0 0 0 0 1
2 0 0 0 1 0 0 0 0
3 0 0 0 0 1 0 0 0
4 0 1 0 0 0 0 0 0
5 1 0 1 0 0 0 0 0
6 1 0 0 1 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 1 0 0 0
9 0 0 0 1 0 0 0 0
10 0 0 0 0 0 1 0 0
Now I want to create a new variable "cat" which groups the impacts into location categories.
I have been using if, else if and else, but I cannot get it to work.
The command is:
cat <- lapply(locations, function(x) if (apillar|fender|fwheel == 1)print("front") else if (fdoor|compart|rdoor == 1)print("middle") else if(rwheel|boot ==1)print("rear") else print("NA")
such that cat should read rear, middle, middle, middle, front etc
When vectors of TRUE or FALSE statements are involved, I usually prefer not to work with if to avoid loops. I find conditional referencing to be more elegant in this case. See below.
locations <- read.table(header=TRUE, text=
"apillar fender fwheel fdoor compart rdoor rwheel boot
1 0 0 0 0 0 0 0 1
2 0 0 0 1 0 0 0 0
3 0 0 0 0 1 0 0 0
4 0 1 0 0 0 0 0 0
5 1 0 1 0 0 0 0 0
6 1 0 0 1 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 1 0 0 0
9 0 0 0 1 0 0 0 0
10 0 0 0 0 0 1 0 0")
locations$cat <- NA
within(locations,{
cat[apillar|fender|fwheel] <- "front"
cat[fdoor|compart|rdoor] <- "middle"
cat[rwheel|boot] <- "rear"
})
Result:
apillar fender fwheel fdoor compart rdoor rwheel boot cat
1 0 0 0 0 0 0 0 1 rear
2 0 0 0 1 0 0 0 0 middle
3 0 0 0 0 1 0 0 0 middle
4 0 1 0 0 0 0 0 0 front
5 1 0 1 0 0 0 0 0 front
6 1 0 0 1 0 0 0 0 middle
7 0 0 0 0 0 0 0 0 <NA>
8 0 0 0 0 1 0 0 0 middle
9 0 0 0 1 0 0 0 0 middle
10 0 0 0 0 0 1 0 0 middle
Cheers!
Corrected your own code:
locations$cat= with(locations, ifelse(apillar|fender|fwheel, "front", ifelse(fdoor|compart|rdoor,"middle",ifelse(rwheel|boot, "rear", "NA"))) )
> locations
apillar fender fwheel fdoor compart rdoor rwheel boot cat
1 0 0 0 0 0 0 0 1 rear
2 0 0 0 1 0 0 0 0 middle
3 0 0 0 0 1 0 0 0 middle
4 0 1 0 0 0 0 0 0 front
5 1 0 1 0 0 0 0 0 front
6 1 0 0 1 0 0 0 0 front
7 0 0 0 0 0 0 0 0 NA
8 0 0 0 0 1 0 0 0 middle
9 0 0 0 1 0 0 0 0 middle
10 0 0 0 0 0 1 0 0 middle
>
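Note that the two answers disagree on row 6 (apillar and fdoor are both 1): the within() version assigns in sequence, so the later category overwrites the earlier one ("middle"), while the nested ifelse() stops at the first match ("front"). A sketch that makes the precedence explicit and returns a real missing value rather than the string "NA" follows. The front > middle > rear ordering is an assumption, so swap it if your rule differs.

```r
locations <- read.table(header = TRUE, text =
"apillar fender fwheel fdoor compart rdoor rwheel boot
1 0 0 0 0 0 0 0 1
2 0 0 0 1 0 0 0 0
3 0 0 0 0 1 0 0 0
4 0 1 0 0 0 0 0 0
5 1 0 1 0 0 0 0 0
6 1 0 0 1 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 1 0 0 0
9 0 0 0 1 0 0 0 0
10 0 0 0 0 0 1 0 0")

# One logical vector per category
front  <- with(locations, apillar | fender | fwheel)
middle <- with(locations, fdoor | compart | rdoor)
rear   <- with(locations, rwheel | boot)

# First match wins (front, then middle, then rear); rows with no impact get NA
locations$cat <- ifelse(front, "front",
                 ifelse(middle, "middle",
                 ifelse(rear, "rear", NA_character_)))
locations$cat
```

Using NA_character_ means is.na(locations$cat) works as expected, which it would not for the literal string "NA".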
I want to create a histogram from my data set of the frequency of students who have had broken bones. The values are either 0 or 1.
I.E:
[1] 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
[38] 1 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1
[112] 1 1 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0
[149] 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0
[186] 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0
[223] 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1
[260] 1 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
[297] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1
[334] 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0
[371] 0 0 0 0 0 0 0 0 0 0 0 0
However, the scale on the x axis of the graph has increments of 0.2. I just want either 0 or 1, as the data is categorical. Would anyone please kindly tell me how to rectify this?
What you need is a combination of assigning the appropriate values to the breaks argument and the xaxp argument in ?hist. Consider:
# this just gives me your data:
my.data <- "
0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1
1 1 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0
0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1
0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0"
# scan() splits on any whitespace, including the line breaks,
# so values at the end of one line and the start of the next are not merged:
my.data <- scan(text=my.data)
hist(my.data, breaks=c(-.5, .5, 1.5), xaxp=c(0,1,1))
breaks is used to define exactly 2 bins, and xaxp is used to change the number and placement of the tick marks on the x axis (for more on how xaxp works, see this excellent answer: "R, change the spacing of tick marks on the axis of a plot?").
On a different note, it is not clear how informative a histogram is for data like this (or perhaps ever; see assessing-approximate-distribution-of-data-based-on-a-histogram on stats.SE). You might just as well try:
> table(my.data)
my.data
0 1
296 86
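If you still want a picture, a bar plot of those counts treats the values as categories, so the axis shows only the two labels. A minimal sketch with a short made-up vector (the "no"/"yes" labels and the axis titles are my own choice):

```r
broken <- c(0, 0, 1, 0, 1, 0, 0, 0)   # made-up values, not the real data

# Treat the 0/1 codes as a two-level categorical variable
broken_f <- factor(broken, levels = c(0, 1), labels = c("no", "yes"))
counts <- table(broken_f)

# One bar per category, labelled "no" and "yes"
barplot(counts, xlab = "Broken bone", ylab = "Number of students")
```

Replacing broken with your own vector gives one bar for 0 and one for 1, with no intermediate tick marks.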