R hist right/left clump binning - r

I have a data set of 15,000 real values from 0 to 100. My data set is HEAVILY clumped on the left (most values are small). I'm trying to get the following bins: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, >10. What I have done so far is create the following:
breakvector = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100)
and have run:
hist(datavector, breaks=breakvector, xlim=c(0, 13))
However, it seems that this results in a histogram where data greater than 13 aren't shown. Does anyone have an idea how to get R to bin all the rest of the data in the last bin? Thanks in advance.

How about this
datavector <- c(sample(1:9, 40, replace=TRUE), sample(10:100, 20, replace=TRUE))
breakvector <- c(0:11)
hist(ifelse(datavector>10,11,datavector), breaks=breakvector, xlim=c(0, 13), xaxt="n")
axis(1, at=1:11-.5, labels=c(1:10, ">10"))
Rather than adjusting the breaks, I just throw all the values >10 into a bin for 11. Then I update the axis labels accordingly.
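The sample data in the answer above are integers, while the question describes real values. Below is a minimal sketch of the same clamping idea for real-valued data. The simulated `datavector` (drawn with `rexp`) and the range labels on the axis are assumptions made for illustration, not part of the original posts.

```r
# Simulated stand-in for the asker's data: 15,000 real values in [0, 100],
# mostly small (assumption: an exponential draw capped at 100).
set.seed(42)
datavector <- pmin(rexp(15000, rate = 0.5), 100)

# Move every value above 10 into the middle of an extra bin (10, 11].
clamped <- ifelse(datavector > 10, 10.5, datavector)

# Unit-width bins from 0 to 11; the last one now holds everything > 10.
h <- hist(clamped, breaks = 0:11, xaxt = "n",
          main = "Values above 10 pooled into the last bin", xlab = "value")

# Label each bar at its midpoint with the range it covers.
axis(1, at = 0:10 + 0.5, labels = c(paste0(0:9, "-", 1:10), ">10"))

# Sanity check: no observation is dropped, and the last bin counts all > 10.
stopifnot(sum(h$counts) == length(datavector),
          h$counts[11] == sum(datavector > 10))
```

Because the bars all have width 1, the heights stay comparable; with the original breaks ending at 100, `hist` would switch to densities and draw the last bin 90 units wide.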

Related

How might you determine how well distributed a set of data is?

I have two datasets, each distributing 90 data points into groups (rows): one into 2 groups and one into 4. I would like to determine which of the two distributes the data better, and plot the result to see this visually. A better distribution is one in which each group holds a similar (or the same) amount of data. For example, in the "Grouped into 2" result, the second group contains larger values in every column than the first group, so one of the 2 groups holds most of the data, which means the data are not well distributed between the 2 groups.
I am quite new to R, so I am unsure how to go about this. I would appreciate any insight into what approach could be used.
# Grouped into 4
Values <- matrix(c(1, 6, 3, 6, 6, 8,
                   3, 3, 5, 3, 3, 3,
                   6, 7, 6, 7, 5, 4,
                   9, 4, 4, 5, 5, 3), nrow = 4, ncol = 6, byrow = TRUE)
# Grouped into 2
Values <- matrix(c(3, 6, 4, 3, 4, 6,
                   12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
You can do this with some basic statistics, using hypothesis testing i.e. testing whether the two groups are statistically different or not. The stats package in R has a lot of tests that you can try and use, each with its own assumptions. Here is one:
# Making the matrix
values <- matrix(c(3, 6, 4, 3, 4, 6,
                   12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
# Conducting t-test
t.test(values[1, ], values[2, ], paired = FALSE)
Will give you this:
Welch Two Sample t-test
data: values[1, ] and values[2, ]
t = -7.9279, df = 9.945, p-value = 1.318e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.328203 -4.671797
sample estimates:
mean of x mean of y
4.333333 10.833333
The mean of values[1, ] is smaller than the mean of values[2, ], and with a p-value of 1.3e-05 the difference is statistically significant, so the two groups are not balanced.
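The t-test compares group means, and only for two groups at a time. As a complementary sketch (not taken from the answer above), one could compare how evenly the totals are spread across groups using the coefficient of variation of the row sums, where a value closer to 0 means a more even split. The names `grouped4`, `grouped2` and `evenness` are hypothetical, and this measure is only one of several reasonable choices.

```r
# The two groupings from the question, one row per group.
grouped4 <- matrix(c(1, 6, 3, 6, 6, 8,
                     3, 3, 5, 3, 3, 3,
                     6, 7, 6, 7, 5, 4,
                     9, 4, 4, 5, 5, 3), nrow = 4, ncol = 6, byrow = TRUE)
grouped2 <- matrix(c(3, 6, 4, 3, 4, 6,
                     12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)

# Coefficient of variation of the group totals: sd / mean of the row sums.
evenness <- function(m) {
  totals <- rowSums(m)
  sd(totals) / mean(totals)
}

cv <- c("4 groups" = evenness(grouped4), "2 groups" = evenness(grouped2))
print(round(cv, 3))

# Visual comparison: the shorter bar is the more evenly distributed grouping.
barplot(cv, ylab = "CV of group totals (lower = more even)")
```

For these two matrices the 4-group split gives the lower value, which matches the observation in the question that one of the 2 groups holds most of the data.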

Histogram with R

I have a vector of the number of calls made on each day of a certain month.
callsperDayforMonth <- c(3, 1, 2, 1, 1, 3, 9, 1, 4, 2, 6, 4, 9, 13, 15, 2, 5, 5, 2, 7, 3, 0, 1, 2, 7, 1, 8, 6, 9, 4)
I also have a vector of factors which spans the range of the "callsperDayforMonth" vector.
"0-2" "3-5" "6-8" "9-11" "12-14" "16+"
I need to create a histogram with the factors on the horizontal axis.
How can this be done?
The hist command has an argument breaks that can be a vector of the breakpoints to be used. That should do what you want.
Or you could use table and cut to do the counts yourself and create a barplot from the result.
For example:
library(ggplot2)
# Intervals are right-closed, so 16 falls in "15-16" and the last bin starts at 17.
cuts <- cut(callsperDayforMonth,
            breaks = c(-Inf, 2, 5, 8, 11, 14, 16, Inf),
            labels = c("0-2", "3-5", "6-8", "9-11", "12-14", "15-16", "17+"))
df <- data.frame(cuts, callsperDayforMonth)
ggplot(df, aes(x=cuts)) + geom_bar(stat = "count")
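The `table` and `cut` route mentioned above can also be sketched in base R, without ggplot2. The label "15+" for the open-ended last bin is an assumption here (the question lists "16+", which would leave 15 unassigned).

```r
callsperDayforMonth <- c(3, 1, 2, 1, 1, 3, 9, 1, 4, 2, 6, 4, 9, 13, 15,
                         2, 5, 5, 2, 7, 3, 0, 1, 2, 7, 1, 8, 6, 9, 4)

# cut() assigns each day to a right-closed interval, e.g. (2, 5] is "3-5".
bins <- cut(callsperDayforMonth,
            breaks = c(-Inf, 2, 5, 8, 11, 14, Inf),
            labels = c("0-2", "3-5", "6-8", "9-11", "12-14", "15+"))

# table() counts the days per bin, keeping empty bins as zero.
counts <- table(bins)
print(counts)

# One bar per bin, with the bin labels on the horizontal axis.
barplot(counts, xlab = "Calls per day", ylab = "Number of days")
```

Unlike `hist`, `barplot` draws equal-width bars for categories, so it suits bins that are labelled as factors.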

Subsample a matrix by selecting locations with specific values within a matrix in R

I have to use R instead of MATLAB and I'm new to it.
I have a large array of data repeating like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
I need to find the locations where the values equal 1, 4, 7 or 10, and create a sample using those locations.
In this case it will be position(=corresponding value) 1(=1) 4(=4) 7(=7) 10(=10) 11(=1) 14(=4) 17(=7) 20(=10) and so on.
In MATLAB it would be y = find(ismember(x, [1, 4, 7, 10])).
Please help! Thanks, Pavel
Something like this?
foo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bar <- c(1, 4, 7, 10)
which(foo %in% bar)
#> [1] 1 4 7 10 11 14 17 20
@nicola, feel free to post this as your own answer and take the credit for it; I am simply trying to close answered questions.
The %in% operator is what you want. For example,
# data in x
targets <- c(1, 4, 7, 10)
locations <- x %in% targets
# locations is a logical vector you can then use:
y <- x[locations]
There'll be an extra step or two if you wanted the row and column indices of the locations, but it's not clear if you do. (Note, the logicals will be in column order).
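A minimal sketch of that extra step, assuming the data sit in a matrix and that row and column indices are wanted. The 4 x 5 matrix `x` below is a made-up example; `%in%` drops the matrix dimensions, so they are restored before calling `which` with `arr.ind = TRUE`.

```r
# Hypothetical example data: the repeating 1..10 pattern in a 4 x 5 matrix,
# filled column by column.
x <- matrix(rep(1:10, times = 2), nrow = 4)
targets <- c(1, 4, 7, 10)

# %in% returns a plain logical vector (in column order), so put dims back.
hits <- x %in% targets
dim(hits) <- dim(x)

# Linear positions, the equivalent of MATLAB's find(ismember(...)).
pos <- which(hits)

# Row and column indices of the same locations.
idx <- which(hits, arr.ind = TRUE)
print(idx)

# Either form can be used to pull out the matching values.
y <- x[idx]
```

Indexing a matrix with the two-column result of `which(..., arr.ind = TRUE)` returns the same values as indexing with the logical matrix or the linear positions.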

highcharts display all dates for all points on x-axis

I have a spline graph with date-time values on the x-axis. I want the date-time labels to be generated automatically, but also displayed for every point on the x-axis. Take the demo, for example: I want every point to be shown with its date on the x-axis. How do I do this? The demo is found here . I have not tried anything because I do not know how to, which is why I am asking here. I still do not know why the site is complaining and not letting me post.
You can use tickPositions like in the example: http://jsfiddle.net/z5P8d/
tickPositions: [
    Date.UTC(1970, 9, 27), Date.UTC(1970, 9, 26), Date.UTC(1970, 11, 1),
    Date.UTC(1970, 11, 11), Date.UTC(1970, 11, 25), Date.UTC(1971, 0, 8),
    Date.UTC(1971, 0, 15), Date.UTC(1971, 1, 1), Date.UTC(1971, 1, 8),
    Date.UTC(1971, 1, 21), Date.UTC(1971, 2, 12), Date.UTC(1971, 2, 25),
    Date.UTC(1971, 3, 4), Date.UTC(1971, 3, 9), Date.UTC(1971, 3, 13),
    Date.UTC(1971, 3, 19), Date.UTC(1971, 4, 25), Date.UTC(1971, 4, 31),
    Date.UTC(1971, 5, 7)
],

How to create a curvilinear line segment with loess and lines using R

I am trying to fit a curved line segment to a dataset. While I can create the line it is always connected back to the starting point. I can't figure out how to get rid of this. I would really appreciate any help. Here is the code
mscF25=c(-12.94382785, -11.0281518, -9.186403952, -7.691576905, -6.470229134, -5.43000796, -4.559074508, -12.87271022, -10.0646268, -6.796208225, -4.433351598, -2.928135666, -1.979265556, -1.38936463, -11.05819006, -7.785838826, -5.297330858, -3.674159165, -2.64702678, -1.980973252, -1.533714976, -11.83971039, -9.168353808, -6.89192172, -5.23424594, -4.033326594, -3.148798626, -2.480469911)
bscF25=c(4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10)
df25 <- data.frame(bscF25,mscF25)
plot(mscF25 ~ bscF25, data = df25)
ls25 <- loess(mscF25 ~ bscF25, data = df25, span = 3)
lines(df25$bscF25, ls25$fitted)
You might try the scatter.smooth function: "Plot and add a smooth curve computed by loess to a scatter plot"
scatter.smooth(x = df25$bscF25, y = df25$mscF25, span = 3)
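If you would rather keep the original loess and lines calls, here is a sketch of one likely cause and fix: lines() joins the points in the order they are given, and bscF25 runs from 4 to 10 four times, so the curve jumps back to the start after every seventh point. Sorting by the x values before drawing avoids that. The variable `ord` is a name introduced here for illustration, and `rep(4:10, times = 4)` is just a shorter way of writing the same bscF25 values.

```r
mscF25 <- c(-12.94382785, -11.0281518, -9.186403952, -7.691576905,
            -6.470229134, -5.43000796, -4.559074508, -12.87271022,
            -10.0646268, -6.796208225, -4.433351598, -2.928135666,
            -1.979265556, -1.38936463, -11.05819006, -7.785838826,
            -5.297330858, -3.674159165, -2.64702678, -1.980973252,
            -1.533714976, -11.83971039, -9.168353808, -6.89192172,
            -5.23424594, -4.033326594, -3.148798626, -2.480469911)
bscF25 <- rep(4:10, times = 4)  # 4, 5, ..., 10 repeated four times
df25 <- data.frame(bscF25, mscF25)

plot(mscF25 ~ bscF25, data = df25)
ls25 <- loess(mscF25 ~ bscF25, data = df25, span = 3)

# Draw the fitted values in increasing order of x so the line never doubles back.
ord <- order(df25$bscF25)
lines(df25$bscF25[ord], ls25$fitted[ord])
```

The fitted values depend only on x, so the repeated x values overlap on the curve and the result is a single smooth line through the scatter.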
