Indexing Issues in R

I've been trying to plot the difference between two sets of information (the residuals). Both sets of data have similar (yet different) characteristics, and both data sets go from 0 to the same X value. The only inconsistency is that they are indexed differently, so while the first graph reaches X in A steps, the second reaches X in B steps. Thus, I cannot simply subtract the dependent variable values of one data frame from the other. I am speaking in very general terms, so I've provided a simple example. I want to plot the residuals between two data sets that look like this:
data1 <- data.frame(x1=c(1,2,3,4,5,6), y1=c(10,5,7,3,2,4))
data2 <- data.frame(x2=c(1,3,6), y2=c(1,3,2))
plot(y1 ~ x1, data = data1, type = 'l', lty = 1, col = 'blue', xlim = c(1,6), ylim = c(0,10))
lines(y2 ~ x2, data = data2, lty = 1, col = 'red')
So I guess my question is:
How can I plot the residuals of two data sets (like the above) that are indexed differently? Is there a function that will compute the residuals between the two data sets?
EDIT 1: The example was faulty; Spacedman helped me rectify it.

If a linear interpolation is good enough, you can use approx to interpolate both data sets at a common set of X coordinates, e.g.:
> xout = sort(unique(c(seq(1,6,len=100),data1$x1,data2$x2))) # include data coords (untested)
> d1 = approx(data1$x1,data1$y1,xout)
> d2 = approx(data2$x2,data2$y2,xout)
> plot(xout,d1$y-d2$y,type="l")
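If you only need the residuals at the x positions of the coarser data set (rather than on a dense grid), the same idea can be applied directly; a small sketch, assuming the approach above:
res_x <- data2$x2
res <- approx(data1$x1, data1$y1, xout = res_x)$y - data2$y2  # interpolate data1 at data2's x values
plot(res_x, res, type = "b", ylab = "residual")
abline(h = 0, lty = 2)  # reference line at zero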

Related

non-linear 2d object transformation by horizontal axis

How can such a non-linear transformation be done?
Here is the code to draw it:
my.sin <- function(ve,a,f,p) a*sin(f*ve+p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1+s2+10+1:100
par(mfrow=c(1,2),mar=rep(2,4))
plot(s,t="l",main = "input") ; abline(h=seq(10,120,by = 5),col=8)
plot(s*7,t="l",main = "output")
abline(h=cumsum(s)/10*2,col=8)
Don't look at the vector, don't look at the values; only look at the horizontal grid. Only the grid matters.
#### UPDATE ####
I see that my question is not clear to many people; I apologize for that.
Here are examples of transformations along the vertical axis only; maybe now it will be clearer what I want.
#### UPDATE 2 ####
Thanks for your answer, this looks like what I need, but I have a few more questions if I may.
To clarify why I need this: I want to compare vectors that are non-linearly distorted along the horizontal axis. Maybe there are already ready-made tools for this?
You mentioned that there are many ways to do such non-linear transformations; can you name a few of the best ones for my case?
How can I make the function f() more non-linear, so that it consists of, for example, not one sinusoid but 10 or more? The figure shows that the distortion is quite simple; it corresponds to one sinusoid.
And how can the function f be varied using different combinations of sinusoids?
set.seed(126)
par(mar = rep(2, 4),mfrow=c(1,3))
s <- cumsum(rnorm(100))
r <- range(s)
gridlines <- seq(r[1]*2, r[2]*2, by = 0.2)
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
f <- function(x) 2 * sin(x)/2 + x
plot(s, t = "l", main = "input+new greed")
abline(h = f(gridlines), col = 8)
plot(f(s), t = "l", main = "output")
abline(h = f(gridlines), col = 8)
If I understand you correctly, you wish to map the vector s from the regular spacing defined in the first image to the irregular spacing implied by the second plot.
Unfortunately, your mapping is not well-defined, since there is no clear correspondence between the horizontal lines in the first image and the second image. There are in fact an infinite number of ways to map the first space to the second.
We can alter your example a little to make it more rigorous.
If we start with your function and your data:
my.sin <- function(ve, a, f, p) a * sin(f * ve + p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1 + s2 + 10 + 1:100
Let us also create a vector of gridlines that we will draw on the first plot:
gridlines <- seq(10, 120, by = 2.5)
Now we can recreate your first plot:
par(mar = rep(2, 4))
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
Now, suppose we have a function that maps our y axis values to a different value:
f <- function(x) 2 * sin(x/5) + x
If we apply this to our gridlines, we have something similar to your second image:
plot(s, t = "l", main = "input")
abline(h = f(gridlines), col = 8)
Now, what we want to do here is effectively transform our curve so that it is stretched or compressed in such a way that it crosses the new gridlines at the same points at which it crossed the gridlines in the original image. To do this, we simply apply our mapping function to s. We can check the correspondence to the original gridlines by plotting our new curve with a transformed axis:
plot(f(s), t = "l", main = "output", yaxt = "n")
axis(2, at = f(20 * 1:6), labels = 20 * 1:6)
abline(h = f(gridlines), col = 8)
It may be possible to create a mapping function using the cumsum(s)/10 * 2 that you have in your original example, but it is not clear how you want this to correspond to the original y axis values.
Response to edits
It's not clear what you mean by comparing two vectors. If one is a non-linear deformation of the other, then presumably you want to find the underlying function that produces the deformation. It is possible to create a function that applies the deformation empirically simply by doing f <- approxfun(untransformed_vector, transformed_vector).
I didn't say there were many ways of doing non-linear transformations. What I meant is that in your original example there is no correspondence between the grid lines in the first picture and the second picture, so there are infinitely many choices for which gridlines in the first picture correspond to which gridlines in the second picture. There is therefore an infinite choice of mapping functions that could be specified.
The function f can be as complicated as you like, but in this scenario it should at least be strictly increasing, so that any value of the function's output can be mapped back to a single value of its input. For example, function(x) x + sin(x)/4 + cos(3*(x + 2))/5 would be a complex but ever-increasing sinusoidal function.
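To illustrate the approxfun idea mentioned above, here is a minimal sketch. It assumes you have the untransformed curve and its transformed counterpart sampled at the same indices; for illustration it reuses s and f from the example, although in practice f would be unknown.
# Recover an empirical y-axis mapping from a pair of curves (sketch only)
untransformed <- s          # curve on the original scale (from the example above)
transformed   <- f(s)       # the same curve after the deformation
g <- approxfun(untransformed, transformed, rule = 2)  # empirical mapping
# Apply the recovered mapping to new values on the original scale
new_vals <- seq(min(s), max(s), length.out = 50)
plot(new_vals, g(new_vals), type = "l", main = "recovered mapping")
lines(new_vals, f(new_vals), col = 2, lty = 2)  # true mapping, for comparison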

How to smooth a curve in R?

location_difference <- c(0, 0.5, 1, 1.5, 2)
Power <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
plot(location_difference, Power)  # note: the two vectors have different lengths (5 vs 6), so this errors
The author of the paper said he smoothed the curve using a weighted moving average with weight vector w = (0.25, 0.5, 0.25), but he did not explain how he did this or which function he used. I am really confused.
Up front, as @MartinWettstein cautions, be careful about when you smooth data and what you do with it (what you infer from it). Having said that, a simple weighted moving average might look like this.
# replacement data
x <- seq(0, 2, len=5)
y <- c(0, 0.02, 0.65, 1, 1)
# smoothed
ysm <- zoo::rollapply(c(NA, y, NA), 3,
                      function(a) Hmisc::wtd.mean(a, c(0.25, 0.5, 0.25), na.rm = TRUE),
                      partial = FALSE)
# plot
plot(x, y, type = "b", pch = 16)
lines(x, ysm, col = "red")
Notes:
the zoo:: package provides a rolling window (3-wide here), calling the function once for indices 1-3, then again for indices 2-4, then 3-5, 4-6, etc.
with rolling-window operations, realize that they can be center-aligned (the default of zoo::rollapply) or left/right aligned. There are some good explanations here: How to calculate 7-day moving average in R?
I surround the y data with NAs so that I can mimic a partial window. Normally with rolling-window ops, if k=3, then the resulting vector is length(y) - (k-1) long. I'm inferring that you want to include data on the ends, so the first smoothed data point would be effectively (0.5*0 + 0.25*0.02)/0.75, the second smoothed data point (0.25*0 + 0.5*0.02 + 0.25*0.65)/1, and the last smoothed data point (0.25*1 + 0.5*1)/0.75. That is, omitting the 0.25 times a missing data point. That's a guess and can easily be adjusted based on your real needs.
I'm using Hmisc::wtd.mean, though it is trivial to write this weighted-mean function yourself.
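For example, a hand-rolled equivalent might look like this (just a sketch; the NA handling mirrors na.rm = TRUE by dropping missing values and their weights):
wmean <- function(a, w = c(0.25, 0.5, 0.25)) {
  keep <- !is.na(a)                      # drop NAs and their weights
  sum(a[keep] * w[keep]) / sum(w[keep])  # renormalise the remaining weights
}
ysm2 <- zoo::rollapply(c(NA, y, NA), 3, wmean)
all.equal(ysm, ysm2)  # should match the Hmisc-based result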
This is suggestive only, and not meant to be authoritative. Just to help you begin exploring your smoothing processes.

Assigning colours to data sets

I've got the following code, which generates a random data set and the graph below:
x1=abs(rnorm(200))
x2=abs(rnorm(200))-7*x1^2
plot(x1,x2)
My goal is to separate the data so that the first 100 points are blue and the remaining 100 points are red in a data.frame. So I have two quick questions,
1) How do I separate the data so that, as I move along x1, the first 100 points are blue and the others are red? I've added an image below for clarification; mind my artistic talent with the snipping tool.
2) After the colours are assigned, is a simple z = data.frame(x1, x2, colours) enough to get the data into a data set on which I can run some basic machine learning tools, such as SVM, bagging and boosting?
Cheers for the help.
set.seed(42)
dat <- data.frame(x1 = abs(rnorm(200)))
dat$x2 <- abs(rnorm(200)) - 7*dat$x1^2
dat$col <- ifelse(rank(dat$x1) <= 100, "blue", "red")
plot(x2 ~ x1, data = dat, col = col)
# also: plot(dat$x1, dat$x2, col = dat$col)
The "first 100" is subjective depending on your needs and the context of the data. One might also want the euclidean distance from origin (pythagorean), manhattan distance, or some other valuation. Or x1 <= mean(x1) or x1 <= median(x1). Lots of ways, this is just one way, where we use ifelse to differentiate/assign.

R Statistics Distributions Plotting

I am having some trouble with a Statistics homework assignment.
I am required to graphically represent the density and the distribution function in two side-by-side plots, for a set of parameters of my choice (there must be a minimum of 4), for the Student's t, Fisher (F) and Chi-squared distributions.
Let's take only the example of the Student's t distribution.
From what I have searched on the internet, I have come with this:
First, I need to generate some random values.
x <- rnorm( 20, 0, 1 )
Question 1: Do I need to generate 4 of these?
Then I have to plot these values with:
plot(dt( x, df = 1))
plot(pt( x, df = 1))
But how do I do this for four sets of parameters? They should be represented in the same plot.
Is this a good approach, given what I have come up with so far?
Please tell me if I'm wrong.
To plot several densities of a certain distribution, you have to first have a support vector, in this case x below.
Then compute the values of the densities with the parameters of your choice.
Then plot them.
In the code that follows, I will plot 4 Student-t pdfs, with degrees of freedom 1 to 4.
x <- seq(-5, 5, by = 0.01) # The support vector
y <- sapply(1:4, function(d) dt(x, df = d))
# Open an empty plot first
plot(1, type = "n", xlim = c(-5, 5), ylim = c(0, 0.5))
for (i in 1:4) {
  lines(x, y[, i], col = i)
}
Then you can make the graph prettier, by adding a main title, changing the axis titles, etc.
If you want other distributions, such as the F or Chi-squared, you will need a strictly positive support, for instance x <- seq(0.0001, 10, by = 0.01).
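The distribution functions follow the same pattern; a minimal sketch, swapping dt for pt and widening ylim to c(0, 1). If you need the density and CDF side by side, par(mfrow = c(1, 2)) gives the two inline plots.
y_cdf <- sapply(1:4, function(d) pt(x, df = d))  # CDFs for df = 1..4
plot(1, type = "n", xlim = c(-5, 5), ylim = c(0, 1), xlab = "x", ylab = "F(x)")
for (i in 1:4) {
  lines(x, y_cdf[, i], col = i)
}
legend("topleft", legend = paste("df =", 1:4), col = 1:4, lty = 1)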

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now, as the more cunning of you might have guessed, these are the absolute values of some model's coefficients.
What I need is a histogram whose axes are:
x will show the coefficients, which are 19 in total, along with their names.
y will show the value of each coefficient (as breaks?), with ylim set according to the min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
(Intercept) LON LAT ME RAT
1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
DS DSA DSI DRNS DREW
-1.856908e-04 1.352052e-05 4.811291e-05 -1.055744e-02 -2.756525e-04
ASPNS ASPEW SI CUR W_180_270
-2.202706e-01 -4.199914e-02 4.684091e-02 -8.634340e-01 -2.479175e-02
W_0_360 W_90_180 W_0_180 NDVI
2.409628e-01 5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find it hard to use the breaks argument efficiently.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot? For example:
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients:
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
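A hedged sketch of such error bars in base R, using arrows() on the scatter plot above (se is a hypothetical vector of standard errors, not part of the original data):
se <- rep(1, 19)  # placeholder standard errors, for illustration only
arrows(1:19, y - se, 1:19, y + se, angle = 90, code = 3, length = 0.05)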
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
coef <- lm(mpg ~ ., data = mtcars)$coef
library(lattice)
barchart(coef, col = 'lightblue', horizontal = FALSE,
         ylim = range(coef), xlab = '',
         scales = list(y = list(labels = coef),
                       x = list(labels = names(coef))))
A base R dotchart might be good too,
dotchart(coef, pch = 19, xlab = 'value')
text(coef, seq_along(coef), labels = round(coef, 3), pos = 2)

Resources