Find zero crossing in R

If I have the following data:
df <- structure(list(x = c(1.63145539094563, 1.67548187017034, 1.71950834939504,
1.76353482861975, 1.80756130784445, 1.85158778706915, 1.89561426629386,
1.93964074551856, 1.98366722474327, 2.02769370396797, 2.07172018319267,
2.11574666241738, 2.15977314164208, 2.20379962086679, 2.24782610009149,
2.2918525793162, 2.3358790585409, 2.3799055377656, 2.42393201699031,
2.46795849621501, 2.51198497543972, 2.55601145466442, 2.60003793388912,
2.64406441311383, 2.68809089233853, 2.73211737156324, 2.77614385078794,
2.82017033001265, 2.86419680923735, 2.90822328846205, 2.95224976768676,
2.99627624691146, 3.04030272613617, 3.08432920536087, 3.12835568458557,
3.17238216381028, 3.21640864303498, 3.26043512225969, 3.30446160148439,
3.3484880807091, 3.3925145599338, 3.4365410391585, 3.48056751838321,
3.52459399760791, 3.56862047683262, 3.61264695605732, 3.65667343528202,
3.70069991450673, 3.74472639373143, 3.78875287295614), y = c(24.144973858154,
18.6408277478876, 21.9174270206615, 22.8017876727379, 20.9766270378248,
18.604384256745, 18.4805250429826, 15.8436744335752, 13.6357170277296,
11.6228806771368, 9.4065868126964, 6.81644596802601, 4.41187500831424,
4.31911614349431, 0.678259284890563, -1.18632719250877, -2.32986407762089,
-3.84480566043122, -5.24738510499144, -5.20160089844013, -5.42094587600499,
-5.39886757202858, -5.26753920575326, -4.68727963638973, -2.73267203102102,
0.296905237887623, 2.45725152489283, 5.12102449689086, 7.13986218237411,
10.2044876281093, 14.4358946463429, 19.0643081865458, 22.8920445618834,
26.7229418763085, 31.3776791707576, 36.19058349817, 41.2843224331918,
46.3396522631345, 51.4321502764393, 56.4080998038294, 61.5215778808583,
66.6845421308734, 71.3912749310486, 76.0856977880158, 80.7039319129457,
84.4095953723555, 88.0163019647757, 89.918078622734, 91.6341473685881,
94.0404562451352)), class = c("tbl_df", "tbl", "data.frame"), .Names = c("x",
"y"), row.names = c(NA, -50L))
How do I find the exact x value where y == 0? I tried interpolation, but it does not necessarily give me a y value equal to zero. Does anyone know of a function to find zero crossings?

Firstly, one can define a corresponding (linearly) interpolated function with
approxfun(df$x, df$y)
whose graph can be drawn with
curve(approxfun(df$x, df$y)(x), min(df$x), max(df$x))
Those zero crossings can then be seen as the roots of this function. Base R has uniroot, but it looks for a single root, while in your case there are two. Hence, one option is the rootSolve package, as in
library(rootSolve)
uniroot.all(approxfun(df$x, df$y), interval = range(df$x))
# [1] 2.263841 2.727803
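If you prefer to stay in base R, here is a minimal sketch (assuming none of the observed y values is exactly zero): locate the sign changes in y and call uniroot() on each bracketing interval. The results should agree with uniroot.all() above up to the solver tolerance.
f <- approxfun(df$x, df$y)                 # the same linear interpolant as above
crossings <- which(diff(sign(df$y)) != 0)  # indices where y changes sign
sapply(crossings, function(i) uniroot(f, lower = df$x[i], upper = df$x[i + 1])$root)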

Related

Use lag(x,1) or lag(x,-1) for dynamic regression?

I have a simple yet somehow confusing question about dynamic regressions and lagged independent variables. I have 3 time series and I want to study the effect of 3 independent variables (namely PSVI, NSVI, and BTC_Ret) from the previous week on the current week's bitcoin log returns. I want to analyse, for example, whether a negative change in PSVI (Positive Sentiment Index) from the previous week can tell us something about the direction of the BTC returns in the following week.
I came across the lag function, which can do exactly that.
If I understand the function correctly, I would use lag in combination with the dyn$lm function from the dyn package to get the results I want.
My code would then look as follows:
test1 <- dyn$lm(BTC_Ret~lag(PSVI,1)+lag(NSVI,1)+lag(BTC_Ret,1))
summary(test1)
Am I right to assume that I need to use lag(x,1) and not lag(x,-1)?
And should I use dyn$lm to study the effect or is there a better way to do all of this?
My data looks as follows:
structure(c(0.151825062532955, -0.179352391776254, -0.171610266403897,
0.0159227765884022, -0.353420091085592, -0.0179223189753976,
0.260710954985742, -0.0878045204765083, 0.17494222283881, -0.183889954532262,
-0.15249960475038, 0.0325479482522972, -0.216135243885031, 0.0258548317723122,
0.170469815313808, 0.0552681180119521, 0.0676987678252168, 0.0247151614282206,
-0.101373110320685, -0.0244444101458825, -0.363995910827583,
-0.819549195465083, -0.311532754839479, -0.661660753934884, -0.036159476713393,
-0.0116417252109642, -0.219357256430676, -0.386169350367107,
-0.468384245564164, 0.226420789220966, -0.2366560332375, 0.2425676656972,
-0.351430535471613, -0.287492079068963, 0.548071569094531, -0.228973857164721,
-0.139490538928287, 0.247548840497568, -0.361502742177194, 0.0604938285432965,
0.619445016304069, 0.0947076213861557, -0.887137767470338, 0.0485516007581502,
0.0429273907756451, -0.701341407090506, 0.34191134646093, -0.428167056300805,
-0.298917079322128, 0.517537828051947, 0.0474069010338689, -0.118044838446349,
-0.414289228784203, 0.143198527419672, 0.0733053148180489, 0.0131259707878403,
-0.106103445964187, 0.107827719520595, -0.604074345624302, 0.444400965939648
), .Dim = c(20L, 3L), .Dimnames = list(NULL, c("BTC_Ret", "PSVI",
"NSVI")), .Tsp = c(2018, 2018.36538461538, 52), class = c("mts",
"ts", "matrix"))
Many thanks!
Assuming tt is defined as in the Note at the end (copied from the question), we use the following.
The ts class is normally used with R's lag. The -1 there means move the series forward by one period, so that the previous value lines up with the current row. There is more information in ?lag.
Do not use dplyr's lag, which does not work with the ts class and, furthermore, uses the opposite convention. If you want to load dplyr, use library(dplyr, exclude = c("filter", "lag")) to ensure that you are using R's lag.
library(dyn)
test1 <- dyn$lm(BTC_Ret ~ lag(PSVI,-1) + lag(NSVI,-1) + lag(BTC_Ret,-1), tt)
These alternatives also work:
Lag <- function(x, k = 1) lag(x, -k)
test2 <- dyn$lm(BTC_Ret ~ Lag(PSVI) + Lag(NSVI) + Lag(BTC_Ret), tt)
test3 <- dyn$lm(BTC_Ret ~ lag(tt, -1), tt)
Note
tt <- structure(c(0.151825062532955, -0.179352391776254, -0.171610266403897, 0.0159227765884022, -0.353420091085592, -0.0179223189753976, 0.260710954985742, -0.0878045204765083, 0.17494222283881, -0.183889954532262, -0.15249960475038, 0.0325479482522972, -0.216135243885031, 0.0258548317723122, 0.170469815313808, 0.0552681180119521, 0.0676987678252168, 0.0247151614282206, -0.101373110320685, -0.0244444101458825, -0.363995910827583, -0.819549195465083, -0.311532754839479, -0.661660753934884, -0.036159476713393, -0.0116417252109642, -0.219357256430676, -0.386169350367107, -0.468384245564164, 0.226420789220966, -0.2366560332375, 0.2425676656972, -0.351430535471613, -0.287492079068963, 0.548071569094531, -0.228973857164721, -0.139490538928287, 0.247548840497568, -0.361502742177194, 0.0604938285432965, 0.619445016304069, 0.0947076213861557, -0.887137767470338, 0.0485516007581502, 0.0429273907756451, -0.701341407090506, 0.34191134646093, -0.428167056300805, -0.298917079322128, 0.517537828051947, 0.0474069010338689, -0.118044838446349, -0.414289228784203, 0.143198527419672, 0.0733053148180489, 0.0131259707878403, -0.106103445964187, 0.107827719520595, -0.604074345624302, 0.444400965939648 ), .Dim = c(20L, 3L), .Dimnames = list(NULL, c("BTC_Ret", "PSVI", "NSVI")), .Tsp = c(2018, 2018.36538461538, 52), class = c("mts", "ts", "matrix"))
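As a quick, hedged illustration of the sign convention (not part of the original answer), a toy ts series shows that lag(x, -1) lines the previous value up with the current row once the series are merged:
x <- ts(1:5)
cbind(current = x, previous = lag(x, -1))
# the 'previous' column is NA for the first time point and then holds the prior value of x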

Rolling Sample standard deviation in R

I wanted to get the standard deviation of the 3 previous rows of the data, the present row, and the 3 rows after.
This is my attempt:
mutate(ming_STDDEV_SAMP = zoo::rollapply(ming_f, list(c(-3:3)), sd, fill = 0)) %>%
Result:
ming_f        ming_STDDEV_SAMP
4.235279667   0.222740262
4.265353      0.463348209
4.350810667   0.442607461
3.864739333   0.375839159
3.935632333   0.213821765
3.802632333   0.243294783
3.718387667   0.051625808
4.288542333   0.242010836
4.134689      0.198929941
3.799883667   0.112733475
This is what I expected:
ming_f        ming_STDDEV_SAMP
4.235279667   0.225532646
4.265353      0.212776157
4.350810667   0.23658801
3.864739333   0.253399417
3.935632333   0.26144862
3.802632333   0.246259684
3.718387667   0.20514358
4.288542333   0.208578409
4.134689      0.208615874
3.799883667   0.233948429
It doesn't match your expected output exactly (the shorter windows at the edges give slightly different values), but perhaps this is what you need:
zoo::rollapply(quux$ming_f, 7, FUN=sd, partial=TRUE)
(It also works replacing 7 with list(-3:3).)
This expression isn't really different from your sample code, but the output is correct. Perhaps your original frame has a group_by still applied?
Data
quux <- structure(list(ming_f = c(4.235279667, 4.265353, 4.350810667, 3.864739333, 3.935632333, 3.802632333, 3.718387667, 4.288542333, 4.134689, 3.799883667), ming_STDDEV_SAMP = c(0.225532646, 0.212776157, 0.23658801, 0.253399417, 0.26144862, 0.246259684, 0.20514358, 0.208578409, 0.208615874, 0.233948429)), class = "data.frame", row.names = c(NA, -10L))
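For completeness, a hedged sketch of how that call might sit inside a dplyr pipeline; ungroup() only matters if your real frame is still grouped (the quux frame above is not), and roll_sd is just an illustrative column name:
library(dplyr)
quux %>%
  ungroup() %>%
  mutate(roll_sd = zoo::rollapply(ming_f, 7, FUN = sd, partial = TRUE))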

get rows from dataframe matching element of a list

Here are one data frame/tibble and one character vector (this vector is one column of a tibble):
df1 <- structure(list(Twitter_name = c("CHESHIREKlD", "JellyComons",
"kirmiziburunlu", "erkekdeyimleri", "herosFrance", "IkishanShah"
), Declared_followers = c(60500L, 43100L, 31617L, 27852L, 26312L,
16021L), Real_followers = c(60241, 43054, 31073, 27853, 25736,
15856), Twitter_Id = c("783866366", "1424086592", "2367932244",
"3352977681", "2580703352", "521094407")), .Names = c("Twitter_name",
"Declared_followers", "Real_followers", "Twitter_Id"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
myId <- c("867211097882804224", "868806957133688832", "549124465","822580282452754432",
"109344546", "482666188", "61716107", "3642392237", "595318933",
"833365943044628480", "1045015087", "859830740669800448", "860562940059045888",
"2854457294", "871784135983067136", "866922354554814464", "4839343547",
"849451474572759040", "872084673526214656", "794841530053853184")
N.B.: df1 has been shortened here; the actual data has 128 observations.
I am looking to test all row elements of df1$Twitter_Id and see if they are in myId. I can run this:
> match(myId[1], df1$Twitter_Id)
but:
it stops at the first occurrence
I need to apply the match() function to all elements of myId.
I can't find a clean and simple way to do this using lapply() or other functions from the dplyr/tidyverse packages.
Thank you for your help.
EDIT: I need to be more explicit about the whole real case.
myTw <- structure(list(id_str = c("893445199661330433", "893116842558050304",
"892739336466305024", "892401780105019393", "892401594272296963",
"892365572486430720", "891964139756818432")), .Names = "id_str", row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
These are tweet IDs. What I am looking for is to find out which Twitter users have retweeted them. To do this, I use the retweeters() function from the twitteR package.
library(twitteR)
MyRtw <- retweeters(myTw$id_str[1])
MyRtw <- c("889135428028084224", "867211097882804224", "868806957133688832",
"549124465", "822580282452754432", "109344546", "482666188",
"61716107", "3642392237", "595318933", "833365943044628480",
"1045015087", "859830740669800448", "860562940059045888", "2854457294",
"871784135983067136", "866922354554814464", "4839343547", "849451474572759040",
"872084673526214656")
This is a vector of Twitter user IDs.
Now, finally, I want to see which users from df1$Twitter_Id have retweeted myTw[1].
You can use the '%in%' operator.
Edit: this is probably what you want. Here I used the data posted in your original question (before the edit).
matchVector <- NULL
for (id in df1$Twitter_Id) {
  matchCounter <- sum(myId %in% id)
  matchVector <- c(matchVector, matchCounter)
}
df1$numberOfMatches <- matchVector
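Since %in% is vectorised, the loop can usually be collapsed into a single expression. This is a hedged alternative, assuming the ids in myId are unique (so each per-row count is 0 or 1):
df1$numberOfMatches <- as.integer(df1$Twitter_Id %in% myId)
df1[df1$Twitter_Id %in% myId, ]   # or keep only the matching rows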

How to determine if a point is above or below a line connecting points in R?

Objective:
Given a set of data, how can I determine if a new data point is above or below the line connecting the points.
For example, how can I determine if the red point shown is above or below the line (without visual inspection)?
I'd like to fit an exact line to the points, essentially joining them, and I need the fit so that I can use any point on the line as a comparator.
Current attempts:
So far I've tried fitting various splines to the data, but it is still a bit too smooth. I'm really looking for an exact fit to the data, sharp corners and all.
I've tried a natural spline (as well as smooth.spline), but can't quite get the fit exact/sharp enough:
plot(df$lowerx, df$lowery, type='b', xlab='x', ylab='y', pch=21, bg='steel blue')
myspline <- splinefun(df$lowerx, df$lowery, method='natural')
curve(myspline, add=T, from = 0, to=140, n = 100, col='green')
I think once I get the fit right it will be straightforward to use it to figure out whether points are above or below the line (e.g. using predict or the function itself), but I need help with the fit.
Also would welcome another approach entirely.
Data:
df <- structure(list(lowerx = c(11.791, 18.073, 23.833, 35.875, 39.638, 44.153, 59.206, 71.498, 83.289, 95.091, 119.676, 131.467, 143.76), lowery = c(5.205, 5.89, 6.233, 9.041, 10, 10.342, 12.603, 13.493, 14.658, 15.274, 15.89, 15.616, 15.342)), .Names = c("lowerx", "lowery"), class = "data.frame", row.names = c(NA, -13L))
The R function approxfun will create a function that does a linear interpolation.
> F <- approxfun(x=df$lowerx, y=df$lowery)
> F(80) > 13
[1] TRUE
I used the data you offered and tested my best guess at the coordinates of the "red point", (80, 13): the result above says that 13 is less than the interpolated value, so that point is below the line, while (80, 15) is above it:
> F(80) > 15
[1] FALSE
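A small hedged helper built on the same idea (point_is_above is just an illustrative name): it returns TRUE when a point lies above the piecewise-linear interpolant, FALSE when it lies below, and NA outside the range of the data.
F <- approxfun(x = df$lowerx, y = df$lowery)
point_is_above <- function(x, y) y > F(x)
point_is_above(80, 13)   # FALSE -- below the line
point_is_above(80, 15)   # TRUE  -- above the line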
This post shows how to do the trick: How to tell whether a point is to the right or left side of a line.
If position is +1 the point is above, if it is -1 the point is below, and if it is 0 it lies directly on the line. No fitting is required; you just need to know which two points span the line segment to compare against.
applied to your example:
df <- structure(list(lowerx = c(11.791, 18.073, 23.833, 35.875, 39.638, 44.153, 59.206, 71.498, 83.289, 95.091, 119.676, 131.467, 143.76), lowery = c(5.205, 5.89, 6.233, 9.041, 10, 10.342, 12.603, 13.493, 14.658, 15.274, 15.89, 15.616, 15.342)), .Names = c("lowerx", "lowery"), class = "data.frame", row.names = c(NA, -13L))
X <- 79
Y <- 13
xIndex2 <- which(df$lowerx > X)[1]
xIndex1 <- xIndex2 - 1
Ax <- df$lowerx[xIndex1]
Ay <- df$lowery[xIndex1]
Bx <- df$lowerx[xIndex2]
By <- df$lowery[xIndex2]
position <- sign((Bx - Ax) * (Y - Ay) - (By - Ay) * (X - Ax))
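For the example point (79, 13) this evaluates to -1, i.e. the point lies below the segment joining the two neighbouring data points, which agrees with the interpolation answer above:
position
# [1] -1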

what is the "class" parameter in structure()?

I am trying to use the structure() function to create a data frame in R.
I saw something like this
structure(mydataframe, class="data.frame")
Where did class come from? I saw someone using it, but it is not listed in the R documentation.
Is this something programmers learned in another language and carried over? And yet it works. I am very confused.
Edit: I realized dput() is what actually creates output that looks like this. I've got it figured out, cheers!
You probably saw someone using dput. dput is used to post (usually short) data. But normally you would not create a data frame like that. You would normally create it with the data.frame function. See below
> example_df <- data.frame(x=rnorm(3),y=rnorm(3))
> example_df
x y
1 0.2411880 0.6660809
2 -0.5222567 -0.2512656
3 0.3824853 -1.8420050
> dput(example_df)
structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
Then, if someone wants to "copy" your data.frame, he just has to run the following:
> copied_df <- structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
+ ), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
+ )), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
I put "copy" in quotes because note the following:
> identical(example_df,copied_df)
[1] FALSE
> all.equal(example_df,copied_df)
[1] TRUE
identical yields FALSE because, when you post your dput output, the numbers often get rounded to a limited number of digits, so the copied values differ very slightly from the originals.
'class' is not a specific argument to the structure function - that's why you didn't find it in the help file.
structure takes an object and then any number of name/value pairs and sets them as attributes on the object. In this case, class was such an attribute. You can try this to add fictional 'foo' and 'bar' attributes to a vector:
x <- structure(1:3, foo=42, bar='hello')
attributes(x)
#$foo
#[1] 42
#
#$bar
#[1] "hello"
And as Joshua Ulrich and Xu Wang mentioned, you should not create a data.frame like that.
I'm scratching my head, wondering what "R documentation" would not have said something about "class". It's a very basic component of the language and of how functions get applied. You should type this and read:
?class
?methods
