Plot points for every 15 minutes - r

I have a text file of floating-point numbers, each representing a time in seconds. I want to plot the number of occurrences in every 15-minute interval. A sample of my file:
0.128766
2.888977
25.087900
102.787657
400.654768
879.090874
903.786754
1367.098789
1456.678567
1786.564569
1909.567567
For the first 900 seconds (15 minutes), there are 6 occurrences. I want to plot that point on the y-axis first. Then from 900 to 1800 seconds (the next 15 minutes), there are 4 occurrences, so I want to plot 4 on the y-axis next. This should go on for the rest of the data.
I know the basic plot() function, but I don't know how to plot counts for every 15 minutes. If there is a link that explains this, please point me to it.

Use findInterval():
counts <- table(findInterval(x, seq(0, max(x), 900)))
counts
1 2 3
6 4 1
It's easy to plot:
plot(counts)

To build on Andrie's answer: you can use plot(counts, type = 'p') to plot points or plot(counts, type = 'l') to plot a connected line. If you want to plot a smooth curve for the counts, you would need to fit a model, for example with ?lm or ?nls.
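The two answers above can be combined into one self-contained sketch. It uses cut() instead of findInterval() so that intervals with no occurrences still appear with a count of 0; that substitution, the variable names width and breaks, and the axis labels are assumptions made for illustration, not part of the original answers. In practice x would be read from the file, for example with scan().

```r
# Sample times in seconds, copied from the question
x <- c(0.128766, 2.888977, 25.087900, 102.787657, 400.654768, 879.090874,
       903.786754, 1367.098789, 1456.678567, 1786.564569, 1909.567567)

width <- 900  # 15 minutes in seconds

# Breaks at 0, 900, 1800, ... up to the first multiple of 900 that covers max(x)
breaks <- seq(0, ceiling(max(x) / width) * width, by = width)

# Left-closed intervals [0, 900), [900, 1800), ...; empty intervals keep a count of 0
counts <- table(cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE))

# Points joined by lines, one point per 15-minute interval
plot(as.vector(counts), type = "b", xaxt = "n",
     xlab = "15-minute interval (seconds)", ylab = "Occurrences")
axis(1, at = seq_along(counts), labels = names(counts))
```

For the sample data this gives the same counts as the findInterval() answer (6, 4 and 1).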

Related

Is there a way to get around the inability to add measures as constant lines for Power BI scatter plots?

So I'd like to use a calculated or referenced value from another table as a y constant line in Power BI. I know there's no default way of doing it but I was wondering if there was a workaround. I have this:
And I want this:
The key is how do I get it to reference a value in another table or a calculated column as that constant line, since adding a measure isn't a feature right now. Thank you.
You can add another chart (Line and Stacked Column Chart) and include your metric in the line values. Remove the background and turn off the X and Y axes.
I suggest using an R or Python visual.
In that case, make sure to define the measure for the horizontal line so that it ignores the filter context of the X-axis dimension (use ALLSELECTED).
With this table (Tabell):
X Y Z
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
And this measure: Measure = CALCULATE(AVERAGE(Tabell[X]);ALLSELECTED(Tabell))+1
A scatter plot with a constant line that always sits at the average of X plus 1 can be created with this Python script:
# dataset = pandas.DataFrame(X, Y, Z)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# zorder draws the dots above the grid
ax.scatter(dataset.Y, dataset.X, s=dataset.Z*100, c='grey', zorder=10)
# draw a constant line between (0, Measure) and (10, Measure)
ax.plot([0, 10], [max(dataset.Measure), max(dataset.Measure)], c='r', linestyle='--')
# set the x-axis limits so the horizontal line doesn't decide the chart size
ax.set_xlim([0, 6])
# set the y-axis ticks yourself
ax.set_yticks(range(0, 6, 1))
# display a thin grid
plt.grid(True, lw=.3)
plt.show()
Result:
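Since the answer mentions that an R visual would also work, here is a rough R counterpart of the Python script. Power BI passes the selected fields to an R visual as a data frame named dataset; the stand-in data frame below is only there so the sketch runs outside Power BI, and its Measure column is assumed to hold the value of the DAX measure above (average of X plus 1, which is 4 for this table).

```r
# Stand-in for the data frame Power BI provides to an R visual (assumption for this sketch)
dataset <- data.frame(X = 1:5, Y = 1:5, Z = 1:5, Measure = 4)

# The measure is repeated on every row, so any single value gives the line height
line_y <- max(dataset$Measure)

# Scatter plot with point size driven by Z
plot(dataset$Y, dataset$X, cex = dataset$Z, pch = 19, col = "grey",
     xlim = c(0, 6), ylim = c(0, 6), xlab = "Y", ylab = "X")
grid(lwd = 0.3)

# Dashed horizontal constant line at the measure value
abline(h = line_y, col = "red", lty = 2)
```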

Identifying data points amongst background noise for binned data R

Not sure whether this should go on Cross Validated or not, but we'll see. I recently obtained data from an instrument (masses of compounds from 0 to 630), which I binned into bins of width 0.025 before plotting the histogram shown below:
I want to identify the bins that have a high frequency and stand out against the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curve on top of the points that have almost blurred together into a black lump, and then selecting the bins that rise above that curve for further investigation; that's what I'm trying to do. I plotted a kernel density estimate to see whether I could overlay it on my histogram and use it to identify the points that lie above it. However, the density plot doesn't help, because the density values are far too low (see the second plot). Does anyone have recommendations for how to solve this problem? The blue line is the overlaid density estimate and the red line is the ideal solution (I need a way of automating this in R).
The data below is only a small part of my dataset, so it's not a good representation of my plot (which contains about 300,000 points). Because my bins are quite narrow (0.025), the data is spread over a very large number of bins (about 25,000 in total).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you have a vector bin.densities that holds the density for each bin, a simple way to find outliers would be:
Look at a window around each bin, say +/- 50 bins:
current.bin <- 1
window.size <- 50
# clamp the window to valid indices; compute the ends first because `:` binds tighter than + and -
window.start <- max(1, current.bin - window.size)
window.end <- min(length(bin.densities), current.bin + window.size)
window <- bin.densities[window.start:window.end]
Find the lower 5% and upper 95% quantile values (or really any values you think work):
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
Then say that the current bin is an outlier if it falls outside your quantile range:
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
# final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with the window size and the quantile percentages until the results look reasonable. Again, this is not exactly complex math, but hopefully it helps.
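The windowed-quantile idea above can be applied to every bin in one pass. The sketch below is one possible way to do that, not the answerer's code: the helper name flag_high_bins, its parameters, and the synthetic Poisson background with two injected peaks are all assumptions made for illustration. It flags only unusually high bins, since that is what the question asks for.

```r
# Hypothetical helper: TRUE for each bin whose count exceeds the chosen
# quantile of the counts in a window of +/- window.size bins around it
flag_high_bins <- function(counts, window.size = 50, prob = 0.95) {
  n <- length(counts)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - window.size)   # clamp the window at the left edge
    hi <- min(n, i + window.size)   # clamp the window at the right edge
    counts[i] > quantile(counts[lo:hi], prob, names = FALSE)
  }, logical(1))
}

# Synthetic example data (not from the question): noisy background plus two peaks
set.seed(1)
counts <- rpois(500, lambda = 5)
counts[c(100, 300)] <- 60

flagged <- which(flag_high_bins(counts))
```

With real data, counts could come from hist(df$values, breaks = bin, plot = FALSE)$counts. Note that a fixed quantile will flag some of the background bins as well, so the window size and probability still need tuning, as the answer says.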

plot more than one confidence interval in one plot at a particular order[SAS]

I need to plot more than one confidence interval in one plot at a particular order.
For example, my data is:
N Est. Lower Upper
1 5 3 6
2 1 0 4
3 3 0 7
I use the following command to plot:
proc sgplot data=confidence;
scatter y=N x=est. / xerrorlower=lower xerrorupper=upper
markerattrs=(symbol=circlefilled size=9);
run;
SAS always plots the confidence intervals in order of N, from 1 to 3. However, I need to show the trend in est., i.e. the order I need is N=2 first, followed by N=3 and N=1, corresponding to est. = 1, 3, 5. Even after sorting by est., SAS still does the same thing. I know I can sort and add a new order variable to my data to get the result I want, but I still want the final plot to show the correct N so that I can tell which confidence interval is which. Thanks.
You can request a discrete vertical axis, and specify the ordering method using the yaxis statement:
yaxis discreteorder = data type = discrete;
This will tell SAS to ignore the values in N and display them based on the order in which they are read from the dataset. You will have to sort your data in advance.

Connecting grouped dots/points on a scatter plot based on distance

I have 2 sets of depth point measurements, for example:
> a
depth value
1 2 2
2 4 3
3 6 4
4 8 5
5 16 40
6 18 45
7 20 58
> b
depth value
1 10 10
2 12 20
3 14 35
I want to show both groups in one figure plotted with depth and with different symbols as you can see here
plot(a$value, a$depth, type='b', col='green', pch=15)
points(b$value, b$depth, type='b', col='red', pch=14)
The plot seems okay, but the annoying part is that the green symbols are all connected (though I do want connecting lines). I want a connection only where a group has continuous data points at 2 m intervals, i.e. the symbols should be connected with a line from 2 to 8 m (green), then the group B symbols should be connected from 10 to 14 m (red), and then the group A symbols should be connected again (green). In other words, I do NOT want to see a connection between the 8 m sample and the 16 m sample for group A.
An easy solution may be to divide group A into two parts (say, A-shallow and A-deep) and then plot A-shallow, B, and A-deep separately. But this is completely impractical because I have thousands of data points in hundreds of groups, i.e. I have to produce many depth profiles. Therefore, there has to be a programmatic way to make sure dots are NOT connected beyond a prescribed depth interval (e.g. 2 m in this case) for a particular group of samples. Any ideas?
If plot or lines encounters an NA value, it will automatically break the line. Using that, we can insert NA values for the missing measurements in your data, and that fixes the problem. One way is this:
rng<-range(range(a$depth), range(b$depth))
rng<-seq(rng[1], rng[2], by=2)
aa<-rep(NA, length(rng))
aa[match(a$depth, rng)]<-a$value
bb<-rep(NA, length(rng))
bb[match(b$depth, rng)]<-b$value
plot(aa, rng, type='b', col='green', pch=15)
points(bb, rng, type='b', col='red', pch=14)
Which produces
Note that this code assumes that all depth measurements are evenly divisible by 2.
I'm not sure if you really have separate data.frames for all of your groups, but there may be better ways to fill in missing values depending on your real data structure.
We can use the fact that lines will put breaks in where there is an NA, as MrFlick suggests. There might be a simpler way, though:
#Merge the two sets together
all = merge(a,b,by='depth', all=T)
#Plot the lines
plot(all$value.x, all$depth, type='b', col='green', pch=15)
points(all$value.y, all$depth, type='b', col='red', pch=14)
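Both answers rely on NA values breaking the line. For hundreds of groups, a variant that does not depend on an evenly spaced depth grid, or on another group filling the gap, is to insert an NA row wherever two consecutive depths within one group are further apart than the allowed interval. The helper name break_at_gaps and its max.gap parameter are hypothetical; this is a sketch of that idea, not code from either answer.

```r
# Data from the question
a <- data.frame(depth = c(2, 4, 6, 8, 16, 18, 20), value = c(2, 3, 4, 5, 40, 45, 58))
b <- data.frame(depth = c(10, 12, 14), value = c(10, 20, 35))

# Hypothetical helper: return d with an all-NA row inserted wherever
# consecutive depths differ by more than max.gap
break_at_gaps <- function(d, max.gap = 2) {
  d <- d[order(d$depth), ]
  # number of NA rows that must precede each original row
  shift <- cumsum(c(FALSE, diff(d$depth) > max.gap))
  pos <- seq_len(nrow(d)) + shift
  n.out <- nrow(d) + max(shift)
  out <- data.frame(depth = rep(NA_real_, n.out), value = rep(NA_real_, n.out))
  out$depth[pos] <- d$depth
  out$value[pos] <- d$value
  out
}

aa <- break_at_gaps(a)
bb <- break_at_gaps(b)

# The NA row makes type = 'b' stop the green line between 8 m and 16 m
plot(aa$value, aa$depth, type = 'b', col = 'green', pch = 15)
points(bb$value, bb$depth, type = 'b', col = 'red', pch = 14)
```

With many groups, the same helper could be applied to each group in turn, for example via lapply() over split(data, data$group), assuming the data has such a group column.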

Box plot in r: 3 time points, 2 treatments with 2 factors on the same graph

I would like to show a simple box plot of abundance at 3 time points (0, 7 and 28). I want to split the plots by treatment (i.e. CO2 level/temperature), nested within each time point. Essentially, I want 2 box plots per time point, one for each of the 2 treatments. I was going to use an overlay, but because I have 2 box plots for each time point I am finding it tricky to work out the correct code.
Thanks
