I'm new to R and have this question. As mentioned in the title, I have a distribution of die rolls reported by students. In this task, each student is given a six-sided die (faces 1-6) and asked to throw it in private. The data are plotted as in the picture.
However, I wonder whether I can use these data to simulate the situation where the students are given a ten-sided die (faces 1-10) instead. How can I achieve this in R?
OK, second attempt, in case you want to use your existing six-sided die data. I use the sn package to fit a skew-normal distribution to your existing data, then scale it to represent a ten-sided die and make it discrete using round().
First I will simulate your data:
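set.seed(9999)
n = 112                              # number of students (42 + 70)
a = rnorm(42, 3, 1)                  # smaller group centred on 3
b = rnorm(70, 5, 0.5)                # larger group centred on 5
dat = round(c(a, b))                 # make the values discrete
dat[!(dat %in% 1:6)] = NA            # values outside 1-6 are not valid faces
dat = dat[complete.cases(dat)]       # drop the invalid values
hist(dat, breaks = seq(0.5, 6.5, 1), col = rgb(0, 0, 1, 0.25))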
If you prefer, just set dat to your existing data.
Now parameterise the distribution using the sn package (you can try fitting other distributions if you prefer).
require(sn)
cp.est = sn.mple(y=dat,opt.method = "nlminb")$cp
dp.est = cp2dp(cp.est,family="SN")
##example to sample from the distribution and compare to existing
sim = rsn(n, xi=dp.est[1], omega=dp.est[2], alpha=dp.est[3])
sim = round(sim)
sim[!(sim %in% 1:6)] = NA
hist(sim,breaks = seq(0.5, 6.5,1), col = rgb(1,0,0,0.25), add=T)
Now scale the distribution to represent a ten-sided die.
sim = rsn(n, xi=dp.est[1], omega=dp.est[2], alpha=dp.est[3])/6*10
sim <- round(sim)
sim[!(sim %in% 1:10)] = NA
hist(sim,breaks = seq(0.5, 10.5,1), col = rgb(0,1,0,0.25))
To simulate 112 students rolling a ten-sided die and plot the results in a histogram:
n=112
res = sample(1:10, size = n, replace = T)
hist(res)
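Because die faces are discrete, a bar plot of the counts can be easier to read than hist() with its default breaks. A minimal sketch, assuming a fair ten-sided die; the names faces and counts are illustrative and not part of the answer above:

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 112
faces <- 1:10
res <- sample(faces, size = n, replace = TRUE)

# Tabulate with explicit levels so faces that never occur still show as 0
counts <- table(factor(res, levels = faces))
barplot(counts, xlab = "Reported face", ylab = "Number of students")
```

Fixing the factor levels keeps all ten bars on the axis even when a face is never rolled.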
I want to draw a graph of the mean of each variable.
It's a statistics graph, and for this I made a list of the grades.
I used the list() function to collect each score and the lapply() function to compute the average of each.
But when I tried barplot() with beside = T, the graph was not drawn properly.
I don't know whether the graph format is wrong or what kind of mistake I made.
Code
list01 <-list(Sci=df_01$Sci,Eng = df_01$Eng,Math = df_01$Math)
C01_gd = lapply(list01,mean)
as.matrix(C01_gd)
barplot(as.matrix(C01_gd) ,border="white",beside = T)
Here's a tentative solution (tentative, as you have not provided a sample of your data).
Some reproducible data for illustration:
set.seed(12)
df <- data.frame(
Sci = sample(1:6, 100, replace = T),
Eng = sample(1:6, 100, replace = T),
Math = sample(1:6, 100, replace = T)
)
The calculation of the means could not be simpler using apply:
means <- apply(df, 2, mean)
And drawing the barplot is not rocket science either:
barplot(means)
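For a data frame whose columns are all numeric, colMeans() should give the same result as the apply() call. This matters for the question because lapply() returns a list, whereas barplot() expects a numeric vector or matrix. A small sketch with invented scores like those above:

```r
set.seed(12)
df <- data.frame(
  Sci  = sample(1:6, 100, replace = TRUE),
  Eng  = sample(1:6, 100, replace = TRUE),
  Math = sample(1:6, 100, replace = TRUE)
)

means <- colMeans(df)            # named numeric vector: Sci, Eng, Math
barplot(means, border = "white", ylab = "Mean score")
```

With the list from the question, sapply(list01, mean) would likewise return a named numeric vector that barplot() can draw directly.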
I am having some trouble with a statistics homework assignment.
I am required to plot the density and the distribution function in two inline plots for a set of parameters of my choice (at least 4) for the Student, Fisher and chi-squared distributions.
Let's take only the Student distribution as an example.
From what I have found on the internet, I have come up with this:
First, I need to generate some random values.
x <- rnorm( 20, 0, 1 )
Question 1: Do I need to generate 4 of these?
Then I have to plot these values with:
plot(dt( x, df = 1))
plot(pt( x, df = 1))
But how do I do this for four sets of parameters? They should be shown in the same plot.
Is this a good approach?
Please tell me if I'm wrong.
To plot several densities of a certain distribution, you have to first have a support vector, in this case x below.
Then compute the values of the densities with the parameters of your choice.
Then plot them.
In the code that follows, I will plot 4 Student-t pdfs, with degrees of freedom 1 to 4.
x <- seq(-5, 5, by = 0.01) # The support vector
y <- sapply(1:4, function(d) dt(x, df = d))
# Open an empty plot first
plot(1, type = "n", xlim = c(-5, 5), ylim = c(0, 0.5))
for(i in 1:4){
lines(x, y[, i], col = i)
}
Then you can make the graph prettier, by adding a main title, changing the axis titles, etc.
If you want other distributions, such as the F or Chi-squared, you will use x strictly positive, for instance x <- seq(0.0001, 10, by = 0.01).
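To cover the distribution function as well, the same pattern can be repeated with pt() in place of dt(). The following is a sketch under the assumption that "two inline plots" means two panels side by side; matplot() stands in for the explicit loop, and the degrees of freedom 1 to 4 are just an example choice:

```r
x    <- seq(-5, 5, by = 0.01)                     # the support vector
dfs  <- 1:4                                       # example degrees of freedom
dens <- sapply(dfs, function(d) dt(x, df = d))    # one column per df
cdf  <- sapply(dfs, function(d) pt(x, df = d))

op <- par(mfrow = c(1, 2))                        # two panels in one row
matplot(x, dens, type = "l", lty = 1, col = dfs,
        ylab = "Density", main = "Student-t pdf")
legend("topright", legend = paste("df =", dfs), col = dfs, lty = 1)
matplot(x, cdf, type = "l", lty = 1, col = dfs,
        ylab = "Probability", main = "Student-t cdf")
par(op)                                           # restore the previous layout
```

For the chi-squared distribution, the same sketch should work with dchisq() and pchisq() and a strictly positive support vector.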
I'm a beginner in R and I followed this tutorial on K-means clustering. However, I'm trying to run this algorithm on real data. I chose : http://exoplanet.eu/catalog/
I have loaded data :
d <- read.csv2(
"exoplanet.eu_catalog.csv",
header = TRUE,
sep = ","
)
With this code :
plot(
x = log(as.numeric(as.character(d$semi_major_axis))),
y = log(as.numeric(as.character(d$mass))),
xlab = "Star-exoplanet distance (log(UA))",
ylab = "Mass of exoplanets (log(M[Jupiter]))"
)
I have the following graphic :
I'd like to run the K-means clustering algorithm on this graphic to show three clusters with colors but I don't know how to proceed in R. I suppose I have to begin with :
y = log(as.numeric(as.character(d$mass)))
y <- y[!is.na(y)]
x = log(as.numeric(as.character(d$semi_major_axis)))
x <- x[!is.na(x)]
But I don't know how to format the data into a matrix in order to run kmeans(matrix, 3, nstart = 20). Any clue, please?
Since you read your file using
d <- read.csv2("exoplanet.eu_catalog.csv",
header = TRUE,
sep = ",")
Your data is in a data frame, and you need to convert it to a matrix.
Use this code to convert a data frame into a matrix:
inMatrixForm <- data.matrix(d)
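To cluster only the two plotted variables, one possibility is to build a two-column matrix and drop incomplete rows jointly, so that each x stays paired with its y. This is a sketch only: the data frame d below is simulated as a stand-in for the catalogue (its values are invented), and the column names semi_major_axis and mass are taken from the question.

```r
set.seed(42)
# Simulated stand-in for the catalogue; real values will differ
d <- data.frame(
  semi_major_axis = c(rlnorm(150, meanlog = -2), NA, rlnorm(149, meanlog = 1)),
  mass            = c(rlnorm(150, meanlog = 0), rlnorm(149, meanlog = 1), NA)
)

m <- cbind(
  x = log(as.numeric(as.character(d$semi_major_axis))),
  y = log(as.numeric(as.character(d$mass)))
)
m <- m[rowSums(!is.finite(m)) == 0, ]   # drop rows with NA, NaN or infinite values

km <- kmeans(m, centers = 3, nstart = 20)
plot(m, col = km$cluster, pch = 19,
     xlab = "log(semi-major axis)", ylab = "log(mass)")
points(km$centers, pch = 4, cex = 2, lwd = 2)   # mark the cluster centres
```

Filtering x and y separately, as in the question, can leave vectors of different lengths whose elements no longer refer to the same planet, which is why the rows are filtered together here.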
Let's say I generate 9 groups of data in a list data and plot them each with a for loop. I could use *apply here too, whichever you prefer.
data = list()
layout(mat = matrix(1:9, nrow = 3))
for(i in 1:9){
data[[i]] = rnorm(n = 100, mean = i, sd = 1)
plot(data[[i]])
}
After creating all the data, I want to decide which one is best:
best_data = which.min(sapply(data, sd))
Now I want to highlight that best data on the plot to distinguish it. Is there a plotting function that lets me go back to a specified sub-plot in the active device and add an element (maybe a title)?
I know I could make a second for loop: for loop 1 generates the data, then I assess which is best, then for loop 2 creates the plots, but this seems less efficient and more verbose.
Does such a plotting function exist for base R graphics?
@rawr's answer is simple and easy. But I thought I'd point out another option that allows you to select the "best" data set before you plot, in case you want more flexibility to plot the "best" data set differently from the rest.
For example:
# Create the data
data = lapply(1:9, function(i) rnorm(n = 100, mean = i, sd = 1))
par(mar=c(4,4,1,1))
layout(mat = matrix(1:9, nrow = 3))
rng = range(data)
# Plot each data set
lapply(1:9, function(i) {
  # Select the data set with the lowest SD
  best = which.min(sapply(data, sd))
  # Highlight the data set with the lowest SD using red "+" symbols
  plot(data[[i]], col=ifelse(best==i,"red","black"), pch=ifelse(best==i, 3, 1), ylim=rng)
})
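On the original question of returning to an earlier panel: par(mfg = c(row, col)) selects which figure in an mfrow/mfcol array is current. The sketch below assumes par(mfcol = ...) instead of layout(), since that is the case the par() documentation describes, and it marks the lowest-SD panel after all plots have been drawn. The name pos is illustrative.

```r
set.seed(1)
par(mfcol = c(3, 3), mar = c(4, 4, 2, 1))   # 3 x 3 grid, filled column by column
data <- vector("list", 9)
for (i in 1:9) {
  data[[i]] <- rnorm(n = 100, mean = i, sd = 1)
  plot(data[[i]])
}

best <- which.min(sapply(data, sd))

# Convert the plot index to its (row, column) position in a column-filled grid
pos <- c((best - 1) %% 3 + 1, (best - 1) %/% 3 + 1)
par(mfg = pos)                               # make that panel current again
title(main = "best", col.main = "red")
box(which = "figure", col = "red")
```

Elements drawn in the margins, such as a title or a frame, are the safest additions here; adding points or lines would depend on which panel's user coordinates are active.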
I have a large data set which I would like to make a 3D surface from. I would like the x-axis to be the date, the y-axis to be the time (24h) and the z-axis (height) to be a value I have ($). I am a beginner with R, so the simpler the better!
http://www.quantmod.com/examples/chartSeries3d/ has a nice example, but the code is way too complicated for my skill level!
Any help would be much appreciated - anything I have researched so far needs to have the data sorted, which is not suitable I think.
Several options present themselves, persp() and wireframe(), the latter in package lattice.
First some dummy data:
set.seed(3)
dat <- data.frame(Dates = rep(seq(Sys.Date(), Sys.Date() + 9, by = 1),
each = 24),
Times = rep(0:23, times = 10),
Value = rep(c(0:12,11:1), times = 10) + rnorm(240))
persp() needs the data as the x and y grid locations and a matrix z of observations.
new.dates <- with(dat, sort(unique(Dates)))
new.times <- with(dat, sort(unique(Times)))
new.values <- with(dat, matrix(Value, nrow = 10, ncol = 24, byrow = TRUE))
and can be plotted using:
persp(new.dates, new.times, new.values, ticktype = "detailed", r = 10,
theta = 35, scale = FALSE)
The facets can be coloured using the col argument. You could do a lot worse than study the code for chartSeries3d0() at the page you linked to. Most of the code is just drawing proper axes as neither persp() nor wireframe() handle Date objects easily.
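The matrix() call above relies on the rows being ordered by date and then by time. If the data are not sorted, one option is tapply(), which places each value by its date and time regardless of row order. A sketch using dummy data like the above, shuffled on purpose; the fixed start date and the names shuffled and z are illustrative assumptions:

```r
set.seed(3)
dat <- data.frame(Dates = rep(as.Date("2024-01-01") + 0:9, each = 24),
                  Times = rep(0:23, times = 10),
                  Value = rep(c(0:12, 11:1), times = 10) + rnorm(240))
shuffled <- dat[sample(nrow(dat)), ]          # deliberately unsorted copy

# Rows are dates, columns are times; mean() collapses any duplicate cells
z <- with(shuffled, tapply(Value, list(Dates, Times), mean))

persp(x = seq_len(nrow(z)), y = as.numeric(colnames(z)), z = z,
      xlab = "Day index", ylab = "Hour", zlab = "Value",
      theta = 35, ticktype = "detailed")
```

The x axis uses a plain day index here to sidestep the Date handling discussed above; rownames(z) still holds the actual dates for labelling.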
As for wireframe(), we can use:
require(lattice)
wireframe(Value ~ as.numeric(Dates) + Times, data = dat, drape = TRUE)
You'll need to do a bit of work to sort out the axis labelling, as wireframe() doesn't work with objects of class "Date" at the moment (hence the cast to numeric).