Instantaneous velocity in RStudio - r

(RStudio) Suppose I have a data set like this:
# Circle X Y
1 A 21 8
2 A 32 17
3 A 23 32
4 B 22 4
5 B 43 12
6 C 12 4
.....
I need to find the instantaneous velocity of each circle at each time frame.
Line 1 is the starting point, so its velocity is 0. The formula I want to apply to each circle's (X, Y) coordinates is sqrt(((x2-x1)^2 + (y2-y1)^2)/2)), where x2 and x1 come from consecutive lines (e.g. line 1 & line 2, line 2 & line 3). The final result I want is as below:
# Circle X Y Instant velocity
1 A 21 8 0
2 A 32 17 sqrt(((32-21)^2 + (17-8)^2)/2))
3 A 23 32 sqrt(((23-32)^2 + (32-17)^2)/2))
4 B 22 4 0
5 B 43 12 sqrt(((43-22)^2 + (12-4)^2)/2))
6 C 12 4 0
.....
Could anyone help me achieve this in RStudio?

You have one more ) than ( in your formula, which leaves me a bit unsure where the /2 goes, but assuming it divides the whole sum, something like this should work (please verify the syntax against your intended formula):
library(dplyr)

your_data %>%
  group_by(Circle) %>%
  mutate(
    # distance from the previous row within each circle; lag() is NA on the
    # first row of each group, so coalesce() turns that into 0
    instant_velocity = coalesce(sqrt(((X - lag(X))^2 + (Y - lag(Y))^2) / 2), 0)
  )
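For a quick check, here is the sample data from the question entered as a data frame (your_data stands in for whatever your real data set is called):
your_data <- data.frame(
  Circle = c("A", "A", "A", "B", "B", "C"),
  X = c(21, 32, 23, 22, 43, 12),
  Y = c(8, 17, 32, 4, 12, 4)
)
# running the pipeline above gives sqrt(((32-21)^2 + (17-8)^2)/2) ~ 10.05 for
# row 2 of circle A, and 0 for the first row of each circle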

Related

R: Find out which observations are located in each "bar" of the histogram

I am working with the R programming language. Suppose I have the following data:
a = rnorm(1000,10,1)
b = rnorm(200,3,1)
c = rnorm(200,13,1)
d = c(a,b,c)
index <- 1:1400
my_data = data.frame(index,d)
I can make the following histograms of the same data by adjusting the "bin" length (via the "breaks" option):
hist(my_data$d, breaks = 10, main = "Histogram #1, Breaks = 10")
hist(my_data$d, breaks = 100, main = "Histogram #2, Breaks = 100")
hist(my_data$d, breaks = 5, main = "Histogram #3, Breaks = 5")
My Question: In each one of these histograms there are a different number of "bars" (i.e. bins). For example, in the first histogram there are 8 bars and in the third histogram there are 4 bars. For each one of these histograms, is there a way to find out which observations (from the original file "d") are located in each bar?
Right now, I am trying to manually do this, e.g. (for histogram #3)
histogram3_bar1 <- my_data[which(my_data$d < 5 & my_data$d > 0), ]
histogram3_bar2 <- my_data[which(my_data$d < 10 & my_data$d > 5), ]
histogram3_bar3 <- my_data[which(my_data$d < 15 & my_data$d > 10), ]
histogram3_bar4 <- my_data[which(my_data$d < 20 & my_data$d > 15), ]
head(histogram3_bar1)
index d
1001 1001 4.156393
1002 1002 3.358958
1003 1003 1.605904
1004 1004 3.603535
1006 1006 2.943456
1007 1007 1.586542
But is there a more "efficient" way to do this?
Thanks!
hist itself can provide the solution to the question's problem of finding out which data points fall in which intervals: it returns a list whose first member, breaks, holds the bin boundaries.
First, make the problem reproducible by setting the RNG seed.
set.seed(2021)
a = rnorm(1000,10,1)
b = rnorm(200,3,1)
c = rnorm(200,13,1)
d = c(a,b,c)
Now, save the return value of hist and use findInterval to get the bin each data point falls in.
h1 <- hist(d, breaks = 10)
f1 <- findInterval(d, h1$breaks)
h1$breaks
# [1] -2 0 2 4 6 8 10 12 14 16
head(f1)
#[1] 6 7 7 7 7 6
The first six observations fall in intervals 6 and 7, whose end points are 8, 10 and 12, as can be seen by indexing d by f1:
head(d[f1])
#[1] 8.07743 10.26174 10.26174 10.26174 10.26174 8.07743
As for whether the intervals given by end points 8, 10 and 12 are left- or right-closed, see help("findInterval").
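If you need the bin assignment to follow hist()'s default right-closed intervals exactly, findInterval can be told to use (a, b] bins; this is only an aside, since for this data the default assignment already reproduces the counts checked below.
# left.open = TRUE makes the bins (a, b]; rightmost.closed = TRUE then closes
# the leftmost bin, matching hist()'s right = TRUE, include.lowest = TRUE
f1_right <- findInterval(d, h1$breaks, left.open = TRUE, rightmost.closed = TRUE)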
As a final check, table the values returned by findInterval and see if they match the histogram's counts.
table(f1)
#f1
# 1 2 3 4 5 6 7 8 9
# 2 34 130 34 17 478 512 169 24
h1$counts
#[1] 2 34 130 34 17 478 512 169 24
To attach the bin number and its end points to each data point, build the following data frame:
bins <- data.frame(bin = f1, min = h1$breaks[f1], max = h1$breaks[f1 + 1L])
head(bins)
# bin min max
#1 6 8 10
#2 7 10 12
#3 7 10 12
#4 7 10 12
#5 7 10 12
#6 6 8 10
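If you also want the actual observations in each bar, not just their bin numbers, one simple extension of the above is to split the data by the findInterval result:
# list with one element per histogram bar, holding the d values in that bar
obs_by_bar <- split(d, f1)
lengths(obs_by_bar)      # same counts as h1$counts
head(obs_by_bar[["7"]])  # observations in bin 7, between breaks 10 and 12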

How to measure distances between certain pairs of (pixel) coordinates in R?

I have a dataset of 22 point coordinates (the points represent landmarks on a photo of a fish, lateral view).
I would like to measure 24 distances between these points (24 different measurements), for example the distance between point 1 and point 5, and so on.
And I would like to turn this into a loop (it will always measure the same set of 24 distances; I have 2000 such lists of coordinates on which I have to measure these 24 distances).
I tried the dist() function (see below), but it gave me all possible distances between all points.
getwd()
setwd("C:/Users/jakub/merania")
LCmeasure <- read.csv("LC_meranie2.csv", sep = ";", dec = ",", header = T)
LCmeasure
head(LCmeasure)
names(LCmeasure)
> LCmeasure
point x y
1 1 1724.00000 1747.00000
2 2 1864.00000 1637.00000
3 3 1862.00000 1760.00000
4 4 2004.00000 1757.00000
5 5 2077.00000 1533.00000
6 6 2134.00000 1933.00000
7 7 2293.00000 1699.00000
8 8 2282.00000 1588.00000
9 9 2728.00000 1576.00000
10 10 2922.00000 1440.00000
11 11 3018.00000 1990.00000
12 12 3282.00000 1927.00000
13 13 3435.00000 1462.00000
14 14 3629.00000 1548.00000
15 15 3948.00000 1826.00000
16 16 3935.00000 1571.00000
17 17 4463.00000 1700.00000
18 18 4661.00000 1978.00000
19 19 4671.00000 1445.00000
20 20 4101.00000 1699.00000
21 21 2203.00000 2806.00000
22 22 4772.00000 2788.00000
df <- data.frame(LCmeasure)
df
library(tidyverse)
dist(df[, -1])
Points <- data.frame(p1=c(1,1,1,3,4,5,1,1,1,7,10,10,11,12,12,14,15,11,13,7,20,20,20,1), p2=c(8,2,3,4,8,6,11,10,13,10,13,11,13,13,20,20,16,12,14,9,18,17,19,20))
Points
Dists <- Points %>% rowwise() %>% mutate(dist = dist(filter(LCmeasure, point %in% c(p1, p2))))
Dists
Now I need to tell R to measure only those specific 24 distances, for example between point 1 and point 5, then between point 2 and point 10, and so on.
And I need to make a loop out of it (it will always be the same set of 24 distances).
Here is my solution to your problem:
Generate a new data frame with your desired pairs of points and then use dplyr to compute the distance for each pair:
library(tidyverse)
Points <- data.frame(p1 = c(1, 2, 4, 5, 6), p2 = c(5, 10, 14, 15, 17))
Dists <- Points %>%
  rowwise() %>%
  # keep only the x and y columns so the point id does not enter the distance
  mutate(dist = dist(filter(LCmeasure, point %in% c(p1, p2))[, c("x", "y")]))
> Dists
     p1    p2  dist
  <dbl> <dbl> <dbl>
1     1     5  413.
2     2    10 1076.
3     4    14 1638.
4     5    15 1894.
5     6    17 2341.
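To repeat the same 24 measurements over all 2000 coordinate files, one option is to wrap the calculation in a function and map it over the file names. This is only a sketch: the folder path, the file pattern, and the read.csv() arguments are assumptions copied from the question and will need adapting.
library(tidyverse)

# the 24 point pairs listed in the question
Points <- data.frame(
  p1 = c(1,1,1,3,4,5,1,1,1,7,10,10,11,12,12,14,15,11,13,7,20,20,20,1),
  p2 = c(8,2,3,4,8,6,11,10,13,10,13,11,13,13,20,20,16,12,14,9,18,17,19,20)
)

measure_one_file <- function(path) {
  coords <- read.csv(path, sep = ";", dec = ",", header = TRUE)
  Points %>%
    rowwise() %>%
    # use only the x and y columns so the point id does not enter the distance
    mutate(dist = as.numeric(dist(filter(coords, point %in% c(p1, p2))[, c("x", "y")])),
           file = basename(path))
}

# assumed folder and naming of the 2000 coordinate files
files <- list.files("C:/Users/jakub/merania", pattern = "\\.csv$", full.names = TRUE)
all_dists <- map_dfr(files, measure_one_file)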

Tidying Time Intervals for Plotting Histogram in R

I'm doing some cluster analysis on the MLTobs data from the LifeTables package and have come across a tricky problem with the Year variable in the mlt.mx.info data frame. Year contains the period over which each life table was taken, stored as intervals. Here's a table of the data:
1751-1754 1755-1759 1760-1764 1765-1769 1770-1774 1775-1779 1780-1784 1785-1789 1790-1794
1 1 1 1 1 1 1 1 1
1795-1799 1800-1804 1805-1809 1810-1814 1815-1819 1816-1819 1820-1824 1825-1829 1830-1834
1 1 1 1 1 2 3 3 3
1835-1839 1838-1839 1840-1844 1841-1844 1845-1849 1846-1849 1850-1854 1855-1859 1860-1864
4 1 5 3 8 1 10 11 11
1865-1869 1870-1874 1872-1874 1875-1879 1876-1879 1878-1879 1880-1884 1885-1889 1890-1894
11 11 1 12 2 1 15 15 15
1895-1899 1900-1904 1905-1909 1908-1909 1910-1914 1915-1919 1920-1924 1921-1924 1922-1924
15 15 15 1 16 16 16 2 1
1925-1929 1930-1934 1933-1934 1935-1939 1937-1939 1940-1944 1945-1949 1947-1949 1948-1949
19 19 1 20 1 22 22 3 1
1950-1954 1955-1959 1956-1959 1958-1959 1960-1964 1965-1969 1970-1974 1975-1979 1980-1984
30 30 2 1 40 40 41 41 41
1983-1984 1985-1989 1990-1994 1991-1994 1992-1994 1995-1999 2000-2003 2000-2004 2005-2006
1 42 42 1 1 44 3 41 22
2005-2007
14
As you can see, some of the intervals sit within other intervals; thankfully none of them overlap. I want to simplify the intervals so that intervals such as 1992-1994 and 1991-1994 all go into 1990-1994.
An idea might be to take the modulo of each interval's end points and sort them into their new intervals that way, but I'm unsure how to do this with the interval data type. If anyone has any ideas I'd really appreciate the help. Ultimately I want to create a histogram or barplot to illustrate this nicely.
If I understand your problem, you'll want something like this:
library(dplyr)

# lower bounds of the 5-year bins: 1750, 1755, ..., 2010
bottom <- seq(1750, 2010, 5)

new_df <- mlt.mx.info %>%
  arrange(Year) %>%
  # end year of each interval, e.g. "1991-1994" -> 1994
  mutate(year2 = as.numeric(substr(Year, 6, 9))) %>%
  # label each row with the 5-year bin its end year falls into
  mutate(new_year = paste0(bottom[findInterval(year2, bottom)], "-",
                           bottom[findInterval(year2, bottom) + 1] - 1))

View(new_df)
So what this does: it creates 5-year bins and outputs a new column (new_year) labelling each row with the bin it falls into. So everything from 1750-1754 will correspond to a new value of "1750-1754" (in string form; the original is an integer type, and I'm not sure how to avoid that). Does this do what you want? Double-check the results, but it looks right to me.
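Since the final goal is a histogram or barplot, a short follow-up sketch (assuming the new_df built above) could count the life tables per simplified interval:
library(ggplot2)
ggplot(new_df, aes(x = new_year)) +
  geom_bar() +
  labs(x = "Period", y = "Number of life tables") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))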

approx() without duplicates?

I am using approx() to interpolate values.
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)
df <- cbind.data.frame(x,y)
> df
x y
1 1 3
2 2 8
3 3 2
4 4 6
5 5 8
6 6 2
7 7 4
8 8 7
9 9 9
10 10 9
11 11 1
12 12 3
13 13 1
14 14 9
15 15 6
16 16 2
17 17 8
18 18 7
19 19 6
20 20 2
interpolated <- approx(x=df$x, y=df$y, method="linear", n=5)
gets me this:
interpolated
$x
[1] 1.00 5.75 10.50 15.25 20.00
$y
[1] 3.0 3.5 5.0 5.0 2.0
Now, the first and last values are duplicates of my real data. Is there any way to prevent this, or is it something I don't understand properly about approx()?
You may want to specify xout to avoid this. For instance, if you want to always exclude the first and the last points, here's how you can do that:
specify_xout <- function(x, n) {
  # n equally spaced points strictly inside the range of x
  seq(from = min(x), to = max(x), length.out = n + 2)[-c(1, n + 2)]
}
plot(df$x, df$y)
points(approx(df$x, df$y, xout=specify_xout(df$x, 5)), pch = "*", col = "red")
This does not prevent an interpolation point from coinciding with an existing data point somewhere in the middle, which can still happen.
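For the example data the generated xout values are strictly interior, so neither endpoint of the original series is reproduced:
specify_xout(df$x, 5)
# [1]  4.166667  7.333333 10.500000 13.666667 16.833333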
approx will fit through all your original data points if you give it a chance (change n=5 to xout=df$x to see this). Interpolation is the process of generating y values at unobserved values of x, but it should agree with the data at x values that have been observed.
With method="linear", approx 'draws' straight segments joining your original coordinates exactly, and so returns exactly the y values you supplied at integer x. You only see 'new' y values because n=5 makes every x other than the first and last a non-integer (and therefore not one of your input values), so it gets interpolated.
If you want observed values not to be exactly reproduced, then maybe add some noise via rnorm ?
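If that is really what you want, here is a minimal sketch of the noise idea (the sd value is an arbitrary choice for illustration):
set.seed(1)
noisy <- approx(x = df$x, y = df$y + rnorm(length(df$y), sd = 0.2),
                method = "linear", n = 5)
noisy$y  # the first and last values no longer reproduce the original y exactly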

ggplot2 is plotting a line strangely

I am trying to plot the time series x_t = A + (-1)^t B.
To do this I am using the following code. The problem is that the ggplot output is wrong.
require(ggplot2)
set.seed(42)
N <- 2
A <- sample(1:20, N)
B <- rnorm(N)
X <- c(A + B, A - B)
dat <- sapply(1:N, function(n) X[rep(c(n, N + n), 20)], simplify = FALSE)
dat <- data.frame(t = rep(1:20, N), w = rep(A, each = 20), val = do.call(c, dat))
ggplot(data = dat, aes(x = t, y = val, color = factor(w))) +
  geom_line() +
  facet_grid(w ~ ., scale = "free")
Looking at the head of dat, everything looks right:
> head(dat)
t w val
1 1 12 10.5533
2 2 12 13.4467
3 3 12 10.5533
4 4 12 13.4467
5 5 12 10.5533
6 6 12 13.4467
So the lower (blue) line should only have values 10.5533 and 13.4467. But it also takes different values. What is wrong in my code?
Thanks in advance for any help
You really should be more careful before asserting that something is "wrong". The way you are creating dat, the rows are not ordered by dat$t, so head(...) does not display the extra values:
head(dat[order(dat$w,dat$t),],10)
# t w val
# 21 1 18 18.43530
# 61 1 18 18.36313
# 22 2 18 19.56470
# 62 2 18 17.63687
# 23 3 18 18.43530
# 63 3 18 18.36313
# 24 4 18 19.56470
# 64 4 18 17.63687
# 25 5 18 18.43530
# 65 5 18 18.36313
Note the row numbers.
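A quick way to see where the extra values come from is to count how many rows share each (w, t) combination; every pair occurs twice, so geom_line() joins two different val values at each t, producing the zig-zag:
table(dat$w, dat$t)[, 1:6]  # every cell is 2: two rows per (w, t) pair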
