R: Grouping data in a plot - r

I have a time series and I have reduced this serie with a transformation. For example
The original time serie:
T=(12,13,14,20,65,78,85,35)
and transformed one is:
T'=(17.22009 27.96722 111.16376 71.33732)
and now I want to have such a diagram, in its x-axis I have 8 values but for each 2 values one value from T' . I can do sme thing like this in R:
but in the second Plot I want to extend the the diagram on 8 values too

Assuming T' is called Tc in R you fix the lower one by
plot(0:length(Tc)*2, c(Tc, tail(Tc,1)), type="s")
The additional element added by tail is needed for drawing the last segment, from 6 to 8.
Update
If you just want to stretch the second plot to go between 1 and 8, you can do
plot(seq(1, 2*length(Tc), length.out=length(Tc)+1), c(Tc, tail(Tc,1)), type="s")
However, I take it that each value of the second plot corresponds to two values of the upper plot, so perhaps the best way to visualize it then would be
barplot(Tc, width=2, space=0)
lines(seq(Tb)-.5, Tb, type="b", lwd=2)

Related

Reducing number of datapoints when plotting in loglog scale in Gnuplot

I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
LogLogPlot of my datapoints
Text file with the datapoints
Datapoints on the x axis are equally spaced, but because of the logscale they get very dense on the right part of the graph, and as a result the output file (I finally export it in .tex) gets very large.
In linear scale, I would simply use the option every to reduce the number of points which get plotted. Is there a similar option for loglogscale, such that the plotted points appear equally spaced?
I am aware of a similar question which was raised a few years ago, but in my opinion the solution is unsatisfactory: plotted points are not equally spaced along the x-axis. I think this is a really unsophisticated problem which deserves a clearer solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500000 point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data to be plotted, you might consider the following script.
s = 0.1 ### sampling interval in log scale
### (try 0.05 for more detail)
c = log10(0.01) ### a parameter used in sampler(x)
### which should be initialized by
### smaller value than any x in log scale
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 on x-axis in log scale. It makes use of the property that points whose x value is evaluated as NaN in using are not drawn.

I wonder is histogram is appropriate for this kind of data and how to make graph in R Studio that i want

Hi ~ I'm try to make graph which has sample mean on x-axis and
relative frequency(?) on y-axis
to make sure i will give example!
for example when i pick 1sample from c(1,2,3,4,5)
the possible result will be 1 and 2 and 3 and 4 and5
in that case the relative frequency is 1/5 each !!
so in this case my graph will show 1,2,3,4,5 on x-axis
0.2 for y -axis (because they are same in 1/5)
and if i pick 2sample from c(1,2,3,4,5) case would be
(1,2) and (1,3), (1,4), (1,5) (2,3)..... and so on (total 10cases)
so sample mean would be (1+2)/2=1.5 .. (1+3)/2=2 .... etc
so in this case x value will be 1.5, 2 ... etc and y value will
1/10 1/10 ...
so, My question is, is histogram is appropriate for this graph??
i want to plot which have sample mean on x -axis, relative frequency on y-axis
and make a line that connect a dot
sorry for too long question
thanks for reading!!
Yes, it's entirely appropriate to plot a histogram of sample means. This is an example of a sampling distribution.
To do this, you would create an object that contains the sample means, and then just plot a histogram of that object as you would with any other histogram. The value of the sample means would be on the x axis, and frequency or relative frequency on the y axis. You would have to choose an appropriate bin number and breaks vector for your purpose, but it's the same as any other histogram.

R - locate intersection of two curves

There are a number of questions in this forum on locating intersections between a fitted model and some raw data. However, in my case, I am in an early stage project where I am still evaluating data.
To begin with, I have created a data frame that contains a ratio value whose ideal value should be 1.0. I have plotted the data frame and also used abline() function to plot a horizontal line at y=1.0. This horizontal line and the plot of ratios intersect at some point.
plot(a$TIME.STAMP, a$PROCESS.RATIO,
xlab='Time (5s)',
ylab='Process ratio',
col='darkolivegreen',
type='l')
abline(h=1.0,col='red')
My aim is to locate the intersection point, say x and draw two vertical lines at x±k, as abline(v=x-k) and abline(v=x+k) where, k is certain band of tolerance.
Applying a grid on the plot is not really an option because this plot will be a part of a multi-panel plot. And, because ratio data is very tightly laid out, the plot will not be too readable. Finally, the x±k will be quite valuable in my discussions with the domain experts.
Can you please guide me how to achieve this?
Here are two solutions. The first one uses locator() and will be useful if you do not have too many charts to produce:
x <- 1:5
y <- log(1:5)
df1 <-data.frame(x= 1:5,y=log(1:5))
k <-0.5
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
locator()
By clicking on the intersection (and stopping the locator top left of the chart), you will get the intersection:
> locator()
$x
[1] 2.765327
$y
[1] 1.002495
You would then add abline(v=2.765327).
If you need a more programmable way of finding the intersection, we will have to estimate the function of your data. Unfortunately, you haven’t provided us with PROCESS.RATIO, so we can only guess what your data looks like. Hopefully, the data is smooth. Here’s a solution that should work with nonlinear data. As you can see in the previous chart, all R does is draw a line between the dots. So, we have to fit a curve in there. Here I’m fitting the data with a polynomial of order 2. If your data is less linear, you can try increasing the order (2 here). If your data is linear, use a simple lm.
fit <-lm(y~poly(x,2))
newx <-data.frame(x=seq(0,5,0.01))
fitline = predict(fit, newdata=newx)
est <-data.frame(newx,fitline)
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
lines(est, col="blue",lwd=2)
Using this fitted curve, we can then find the closest point to y=1. Once we have that point, we can draw vertical lines at the intersection and at +/-k.
cross <-est[which.min(abs(1-est$fitline)),] #find closest to 1
plot(df1,type="o",lwd=2)
abline(h=1)
abline(v=cross[1], col="green")
abline(v=cross[1]-k, col="purple")
abline(v=cross[1]+k, col="purple")

R - draw line between two specific values in plot in R

I need to draw a line between two specific values from a plot in R. That's what I want. If it is possible to draw a line between those two consecutive values which the difference between values is higher than 3. Else, draw it knowing the values from the dataset. Also, I would like to add a number under or above the line. Thanks.
Here the link where you can find the image "ImageR.png"
https://www.dropbox.com/sh/blnr3jvius8f3eh/AACOhqyzZGiDHAOPmyE__873a?dl=0
Something like this should do it. You may have to play with pos and offset in text to get it to look good on your data.
x <- rnorm(20, sd=3)
d <- diff(x)
plot(x)
for (i in which(d>3)) {
lines(c(i,i+1), x[i:(i+1)])
text(i+.5, mean(x[i:(i+1)]), round(d[i],1), pos=2)
}

R question about plotting probability/density histogram the right way

I have a following matrix [500,2], so we have 500 rows and 2 columns, the left one gives us the index of X observations, and the right one gives the probability with which this X comes true, so - a typical probability density relationship.
So, my question is, how to plot the histogram the right way, so that the x-axis is the x-index, and the y-axis is the density(0.01-1.00). The bandwidth of the estimator is 0.33.
Thanks in advance!
the end of the whole data looks like this: just for a little orientation
[490,] 2.338260830 0.04858685
[491,] 2.347839477 0.04797310
[492,] 2.357418125 0.04736149
[493,] 2.366996772 0.04675206
[494,] 2.376575419 0.04614482
[495,] 2.386154067 0.04553980
[496,] 2.395732714 0.04493702
[497,] 2.405311361 0.04433653
[498,] 2.414890008 0.04373835
[499,] 2.424468656 0.04314252
[500,] 2.434047303 0.04254907
#everyone,
yes, I have made the estimation before, so.. the bandwith is what I mentioned, the data is ordered from low to high values, so respecively the probability at the beginning is 0,22, at the peak about 0,48, at the end 0,15.
The line with the density is plotted like a charm but I have to do in addition is to plot a histogram! So, how I can do this, ordering the blocks properly(ho the data to be splitted in boxes etc..)
Any suggestions?
Here is a part of the data AFTER the estimation, all values are discrete, so I assume histogram can be created.., hopefully.
[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817
Best regards,
appreciate the fast responses!(bow)
What will do the job is to create a histogram just for the indexes, grouping them in a way x25/x50 each, for instance...and compute the average probability for each 25 or 50/100/150/200/250 etc as boxes..?
Assuming the rows are in order from lowest to highest value of x, as they appear to be, you can use the default plot command, the only change you need is the type:
plot(your.data, type = 'l')
EDIT:
Ok, I'm not sure this is better than the density plot, but it can be done:
x = dnorm(seq(-1, 1, length = 500))
x.bins = rep(1:50, each = 10)
bars = aggregate(x, by = list(x.bins), FUN = sum)[,2]
barplot(bars)
In your case, replace x with the probabilities from the second column of your matrix.
EDIT2:
On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function adding them together as I have done is incorrect. Mathematically I don't think you can produce the binned probability for a range using only a few points from within that range.
Assuming M is the matrix. wouldn't this just be :
plot(x=M[ , 1], y = M[ , 2] )
You have already done the density estimation since this is not the original data.

Resources