Plotting CDF of a dataset in R?

I am not really sure about the difference between a CDF (Cumulative Distribution Function) and an ECDF (Empirical Cumulative Distribution Function), but I usually use a CDF plot to make observations about my data.
I have been using R recently and am trying to find out how to plot a CDF and a CCDF (Complementary CDF) of my data. All I could find was that R has ecdf, but I am not sure whether this is what I am looking for. Plotting an ECDF is as simple as:
plot.ecdf(data)
Does anyone know how to plot a CDF and CCDF of a dataset using R?

A CDF usually has a closed form, which is available when you know or assume a distribution. An ECDF, on the other hand, is 'empirical' because it is computed from your data. I just answered a question about using ecdf() and Hmisc's Ecdf() here the other day.
More generally, you can search here using terms such as
[r] ecdf
in the search box to look for 'ecdf' within the R tag. At rseek.org, little comes up for 'ccdf'. Is that maybe just the same as one minus the ECDF? If so, Ecdf() in Hmisc can do it.
I hope this helps; if not, please rephrase your question, as it is not quite clear exactly what you are looking for. Both ecdf() and Ecdf() are feature-rich, so make sure to read their help pages.
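If the CCDF is taken to mean one minus the ECDF, as suggested above, a minimal base R sketch could look like the following. The sample x is made-up data used only for illustration, and the names Fn, xs and ccdf are hypothetical.

```r
# Made-up sample data; replace with your own numeric vector
set.seed(1)
x <- rexp(200)

# ecdf() returns a step function that can be plotted and evaluated
Fn <- ecdf(x)
plot(Fn, main = "ECDF", xlab = "x", ylab = "P(X <= x)")

# Complementary ECDF, assuming CCDF = 1 - ECDF, evaluated at the observed values
xs   <- sort(unique(x))
ccdf <- 1 - Fn(xs)
plot(xs, ccdf, type = "s", main = "CCDF", xlab = "x", ylab = "P(X > x)")
```

The CCDF starts just below 1 at the smallest observation and drops to 0 at the largest one.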

Related

Generate data from an arbitrary multivariate continuous density function

I am trying to sample from a multivariate distribution given by a (quite complex, but continuous) density function in R. For the univariate case I used AbscontDistribution from the distr package, but I cannot make it work for the multivariate case.
I tried finding an appropriate package for this problem online, but cannot find one.
Any ideas?
Thanks! :)

ggadjustedcurves, fun="cumhaz" does not work

In the 'survminer' package I have been able to construct adjusted curves from a Cox model, but this only shows me the survival function. When I pass "events" or "cumhaz" to the fun= option, I still get the same survival function. I found this link
https://github.com/kassambara/survminer/issues/287
Does anyone have any suggestions?
I took the advice of Chung30916 in the comment chain and used the following code
plotdata2 <- plotdata %>%
  mutate(cumhaz = 1 - surv)
to make a cumulative incidence curve but, forgive my inexperience, how do I proceed? Do I just plot the graph in ggplot2 by strata (2 groups in my case), with time on the x axis and cumhaz on the y axis?
Thanks
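A rough sketch of the plot described in the question, assuming the curve data has the columns time, variable (the strata) and surv. Those column names and the toy values below are assumptions, not confirmed output of ggadjustedcurves(). Note that 1 - surv is the cumulative incidence, while the cumulative hazard is -log(surv), so both are computed here under separate names.

```r
library(ggplot2)

# Hypothetical stand-in for the adjusted-curve data;
# the column names time, variable and surv are assumptions
plotdata <- data.frame(
  time     = rep(c(0, 10, 20, 30), times = 2),
  variable = rep(c("group1", "group2"), each = 4),
  surv     = c(1, 0.9, 0.75, 0.6, 1, 0.8, 0.6, 0.45)
)

plotdata$cuminc <- 1 - plotdata$surv     # cumulative incidence
plotdata$cumhaz <- -log(plotdata$surv)   # cumulative hazard, H(t) = -log S(t)

# One step curve per stratum; swap y = cuminc for y = cumhaz as needed
p <- ggplot(plotdata, aes(x = time, y = cuminc, colour = variable)) +
  geom_step() +
  labs(x = "Time", y = "Cumulative incidence", colour = "Strata")
print(p)
```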

CCF - general problems

I am working on my bachelor thesis, where I want to look into the lagged cross-correlation of a timeseries of search query volumes (=x) to the price of bitcoin (=y).
I have already created several ccf plots using the "ccf" function in R.
See picture:
I saw in the description of R's acf-function that ccf only works with one y and one x series. I was wondering if someone knows a way to put several of those plots into one, especially since I can categorize positively correlated and negatively correlated ones.
I was also wondering about the dashed blue line: it represents the confidence bound, but at what level? 0.05? 0.01?
These are two questions in one.
1. question: combine plots
This question has been asked before. Please look it up:
Combining plots created by R base, lattice, and ggplot2
Combine plots in R
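As one possible way to put several cross-correlations into a single plot, the values can be computed with plot = FALSE and then overlaid. This is only a sketch on simulated data; the series y, the list queries and the names q1 and q2 are made up for illustration.

```r
# Simulated stand-ins for the price series (y) and two search-volume series
set.seed(1)
n <- 120
y <- rnorm(n)
queries <- list(q1 = y + rnorm(n), q2 = -y + rnorm(n))

# Compute each cross-correlation without plotting
res  <- lapply(queries, function(q) ccf(q, y, lag.max = 10, plot = FALSE))
lags <- drop(res[[1]]$lag)                    # lags -10 ... 10
vals <- sapply(res, function(r) drop(r$acf))  # one column per query series

# Overlay all cross-correlation functions in one plot
matplot(lags, vals, type = "l", lty = 1, col = seq_along(res),
        xlab = "Lag", ylab = "CCF")
abline(h = 0, col = "grey")
legend("topright", legend = names(res), lty = 1, col = seq_along(res))
```

Alternatively, par(mfrow = c(1, 2)) before calling ccf() twice places the standard plots side by side.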
2. question: confidence intervals in ccf-plot:
The plot gives you the confidence intervals. The manual advises caution with these, even though ci.type = "white" is the default setting. This default simply draws bounds based on the quantiles of a standard normal distribution. It does not take the statistical properties of your data into account. In my opinion it is altogether useless. The manual recommends ci.type = "ma", but that only works for autocorrelations. If you try using it with cross-correlations, you will get a warning saying "can use ci.type=‘ma’ only if first lag is 0". When computing autocorrelations, the function starts at lag zero, so the first lag is 0. ccf shifts the sequences from -k to +k, so its first lag is never zero.
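On the level of the dashed lines: plot.acf has an argument ci with default 0.95, so the default "white" bounds correspond to a 95% level. A minimal sketch of how those bounds can be reproduced by hand, using simulated series x and y as placeholders:

```r
# Simulated placeholder series
set.seed(42)
x <- rnorm(100)
y <- rnorm(100)

# Cross-correlation without plotting; n.used holds the number of observations
cc <- ccf(x, y, plot = FALSE)

# Default "white" bounds: normal quantile divided by sqrt(n)
ci    <- 0.95
bound <- qnorm((1 + ci) / 2) / sqrt(cc$n.used)
c(-bound, bound)

# A different level can be requested when plotting, e.g. 99%
plot(cc, ci = 0.99)
```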
Further support
I hope it is not against the code of conduct to offer further support.
The ccf function has some peculiarities that aren't well explained in the manual. Since I had trouble with ccf myself, I wrote it all down here for everybody.
Because I wanted meaningful confidence intervals, I developed an improved version of 'ccf' myself (link to repository in case anyone is interested). The ccf object returned by the new function is compatible with the output of stats::ccf() but contains more information. Additional functions make it more useful.

R identifying type of frequency distribution

I am interested in frequency distributions that are not normally distributed.
If I have a frequency distribution table that is not normally distributed, is there a function or package that will identify the type of distribution for me?
You can use the fitdistr function (in the MASS package) and check for yourself whether you find a 'fitting' distribution. However, I suggest that you plot the data first and see what it looks like. Fitting blindly is generally not recommended, because different parameter choices can make one distribution look like another. Once you have found a suitable distribution, you should test it against the data.
Edit: For instance, a normal distribution may look like a Poisson distribution. Fitting is in my opinion only useful if you have enough observations. Otherwise, just draw values from your data if you need to.
You can always check whether a distribution is adequate for your data with a QQ plot. If your data is dynamic, I would suggest using the ECDF (Empirical Cumulative Distribution Function), which gives you a more precise picture of the distribution as your data grows. You can compute the ECDF in R with the ecdf() function.
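The fitting and QQ-plot steps mentioned above might be sketched as follows. The data x is simulated from a gamma distribution purely for illustration, and the two candidate distributions are arbitrary choices, not a recommendation.

```r
library(MASS)

# Simulated example data; replace with your own positive-valued sample
set.seed(1)
x <- rgamma(500, shape = 2, rate = 1)

# Fit two candidate distributions by maximum likelihood
fit_gamma <- fitdistr(x, "gamma")
fit_lnorm <- fitdistr(x, "lognormal")

# Compare the candidates by AIC (lower is better)
aic <- c(gamma = AIC(fit_gamma), lognormal = AIC(fit_lnorm))
aic

# QQ plot of the data against the fitted gamma distribution
theo <- qgamma(ppoints(length(x)),
               shape = fit_gamma$estimate["shape"],
               rate  = fit_gamma$estimate["rate"])
qqplot(theo, x, xlab = "Fitted gamma quantiles", ylab = "Sample quantiles")
abline(0, 1)
```

Points that stay close to the diagonal suggest the candidate is adequate; systematic curvature suggests trying another distribution.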

Integrate nonparametric curve in R

Just a warning: I started using R a day ago, so my apologies if anything seems idiotically simple.
Right now I'm trying to have R take in a .txt file with accelerometer data of an impact and calculate the Head Injury Criterion (HIC) for it. The HIC requires that the curve from the data be integrated over a certain interval.
The equation is at the link below. I tried to insert it here as an image, but it would not let me. Apparently I need some reputation points before it will let me do that.
a(t) is the acceleration curve.
So far I have not had an issue generating a suitable curve in R to match the data. The loess function worked quite well and is exactly the kind of thing I was looking for; I just have no idea how to integrate it. As far as I can tell, loess is a non-parametric regression, so there is no way to determine the equation of the curve itself. Is there a way to integrate it, though?
If not, is there another way to accomplish this task using a different function?
Any help or insightful comments would be very much appreciated.
Thanks in advance,
Wes
One more question though, James: how can I get just the number, without the text and error, when using the integrate() function?
You can use the predict function on your loess model to create a function to use with integrate.
# using the inbuilt dataset "pressure"
plot(pressure,type="l")
# create loess object and prediction function
l <- loess(pressure~temperature,pressure)
f <- function(x) predict(l,newdata=x)
# perform integration
integrate(f,0,360)
40176.5 with absolute error < 4.6
And to extract the value alone:
integrate(f,0,360)$value
[1] 40176.5
