R data.table causes lines() to draw weird stuff - r

I want to draw a linear regression line y = m*x+b with x coming from a column in a data.table and m and b fixed. When I execute this program:
library(data.table)
dt = data.table(KEY_COLUMN = c("a","c","d","e","b"),
x = c(29.34224, 26.77573, 25.45568, 26.27839, 28.22389)
)
x = dt$x
m = -0.1211562
b = 63.09729
plot(c(25,30), c(58,61))
lines(x, m*x + b, col="red")
setkeyv(dt, "KEY_COLUMN")
then I get this weird picture:
which cannot be the true picture of the data because hey, I'm drawing a line y = mx+b!
Even more awkwardly, when I remove the command setkeyv(dt, "KEY_COLUMN"), which runs AFTER the line drawing, everything works and I get a line. And if that is not enough: when I leave the 'bad' command setkeyv(dt, "KEY_COLUMN") in place but insert a browser() right after the lines command, everything works as expected and I get a line...
This is a 'quantum' error: whenever you want to see the error, it goes away... only in the situation where you cannot really observe the error, it is there. Am I stupid/overlooking something really simple here or what is going on?
Cheers,
FW

As you know, data.table modifies by reference. I can reproduce this when I source the code all at once. If I source it line by line I get the expected result. Thus, I assume that this happens:
The first argument to lines is x, which still points to the same memory as the x column of the data.table: since x is never modified, R never copies it. The second argument is not shared with the table, because evaluating m*x + b creates a new, independent vector.
Plotting is slow, so the last line of code is evaluated (and the key is set) before the C code for plotting has actually produced the plot. Setting the key reorders the data.table in memory, and x still refers to that memory. The line is therefore drawn from the reordered x values paired with y values that were computed in the original order.
I can get the expected result if I force a copy:
library(data.table)
dt = data.table(KEY_COLUMN = c("a","c","d","e","b"),
x = c(29.34224, 26.77573, 25.45568, 26.27839, 28.22389)
)
x = copy(dt$x)
#alternatively modifying x works: x[1] <- x[1]
m = -0.1211562
b = 63.09729
plot(c(25,30), c(58,61))
lines(x, m*x + b, col="red")
setkeyv(dt, "KEY_COLUMN")
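To check the sharing assumption without involving the graphics device at all, a minimal sketch like the following may help. The names x_shared and x_copied are made up for illustration, and whether x_shared really follows the reordering depends on data.table sorting the column in place, which is what the explanation above assumes.

```r
library(data.table)

dt <- data.table(KEY_COLUMN = c("a", "c", "d", "e", "b"),
                 x = c(29.34224, 26.77573, 25.45568, 26.27839, 28.22389))

x_shared <- dt$x        # plain assignment: may still point at the column's memory
x_copied <- copy(dt$x)  # explicit copy: independent of the data.table

setkeyv(dt, "KEY_COLUMN")  # sorts the rows of dt by reference

# The explicit copy keeps the original row order ...
print(x_copied)
# ... while this comparison shows whether the plain assignment followed the reordering
print(identical(x_shared, dt$x))
```

If the second print shows TRUE, x_shared was reordered together with the table, which would explain the zigzag in the plot.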

Related

How to plot the function 4(x)^2 = ((y)^2/(1-y))?

I want to plot the function
4(x)^2 = ((y)^2/(1-y));
how can I plot this?
--> 4*(x) = ((y^2)*(1-y)^-1)^0.5;
4*(x) = ((y^2)*(1-y)^-1)^0.5;
^^
Error: syntax error, unexpected =, expecting end of file
Since Scilab 6.1.0, plotimplicit() does it:
plotimplicit "4*x^2 = y^2/(1-y)"
xgrid()
It can't get much simpler than that. Result:
Well, you have to first create a function and for that you have to express one variable in terms of the other.
function x = f(y)
    // positive branch of 4*x^2 = y^2/(1-y); real-valued only for y < 1
    x = sqrt(y.^2 ./ (1 - y)) / 2;
endfunction
Then you need to generate the input data (i.e., the points at which you want to evaluate the function). The expression is only real-valued for y < 1, so stay below 1:
ydata = linspace(-10, 0.99)
Now you push your input point through the function to get your output points
xdata = f(ydata)
Then, you can plot the pairs of x and y using:
plot(xdata, ydata)
Or even easier, without the intermediate step of generating the output data, you can simply do:
plot(f(ydata), ydata)
BTW. I find it strange that the function you are trying to plot is x in terms of y, usually, x is the input variable, but I hope you know what you are trying to accomplish.
Reference: https://www.scilab.org/tutorials/getting-started/plotting
Take care that y must be in ]-inf, 1[
y = linspace(-10, 0.99999, 1000);
x = sqrt(y.^2 ./ (1-y))/2;
clf; plot(y,x), plot(y,-x)
If x is a solution, -x is also a solution.

Julia Plotting: delete and modify existing lines

Two questions in one: Given a line plotted in Julia, how can I
1. delete it from the plot and legend (without clearing the whole plot)
2. change its properties (such as color, thickness, opacity)
As a concrete example in the code below, how can I 1. delete previous regression lines OR 2. change their opacity to 0.1?
using Plots; gr()
f = x->.3x+.2
g = x->f(x)+.2*randn()
x = rand(2)
y = g.(x)
plt = scatter(x,y,c=:orange)
plot!(0:.1:1, f, ylim=(0,1), c=:green, alpha=.3, linewidth=10)
anim = Animation()
for i=1:200
r = rand()
x_new, y_new = r, g(r)
push!(plt, x_new, y_new)
push!(x, x_new)
push!(y, y_new)
A = hcat(fill(1., size(x)), x)
coefs = A\y
plot!(0:.1:1, x->coefs[2]*x+coefs[1], c=:blue) # plot new regression line
# 1. delete previous line
# 2. set alpha of previous line to .1
frame(anim)
end
gif(anim, "regression.gif", fps=5)
I tried combinations of delete, pop! and remove but without success.
A related question in Python can be found here: How to remove lines in a Matplotlib plot
Here is a fun and illustrative example of how you can use pop!() to undo plotting in Julia using Makie. Note that you will see this goes back in the reverse order that everything was plotted (think, like adding and removing from a stack), so deleteat!(scene.plots, ind) will still be necessary to remove a plot at a specific index.
using Makie
x = range(0, stop = 2pi, length = 80)
f1(x) = sin.(x)
f2(x) = exp.(-x) .* cos.(2pi*x)
y1 = f1(x)
y2 = f2(x)
scene = lines(x, y1, color = :blue)
scatter!(scene, x, y1, color = :red, markersize = 0.1)
lines!(scene, x, y2, color = :black)
scatter!(scene, x, y2, color = :green, marker = :utriangle, markersize = 0.1)
display(scene)
sleep(10)
pop!(scene.plots)
display(scene)
sleep(10)
pop!(scene.plots)
display(scene)
You can see the images above that show how the plot progressively gets undone using pop!(). The key idea with respect to sleep() is that if we were not using it (you can test this yourself by running the code with it removed), the first and only image shown on the screen would be the final image above, because of the render time.
You can see if you run this code that the window renders and then sleeps for 10 seconds (in order to give it time to render) and then uses pop!() to step back through the plot.
Docs for sleep()
I have to say that I don't know what the formal way is to accomplish them.
There is a cheating method.
plt.series_list stores all the plots (line, scatter...).
If you have 200 lines in the plot, then length(plt.series_list) will be 200.
plt.series_list[1].plotattributes returns a dictionary containing the attributes of the first line (or scatter plot, depending on the order).
One of the attributes is :linealpha, and we can use it to change the transparency of a line or to make it disappear.
# your code ...
plot!(0:.1:1, x->coefs[2]*x+coefs[1], c=:blue) # plot new regression line
# modify the alpha value of the previous line
if i > 1
plt.series_list[end-1][:linealpha] = 0.1
end
# make the previous line invisible
if i > 2
plt.series_list[end-2][:linealpha] = 0.0
end
frame(anim)
# your code ...
You cannot do that with the Plots package. Even the "cheating" method in the answer by Pei Huang will end up with the whole frame getting redrawn.
You can do this with Makie, though - in fact the ability to interactively change plots was one of the reasons for creating that package (point 1 here http://makie.juliaplots.org/dev/why-makie.html)
Not sure about the other popular plotting packages for Julia.

How does the curve function in R work? - Example of curve function

How does the following code work? I got the example while reading the R help page ?curve, but I have not understood it.
for(ll in c("", "x", "y", "xy"))
curve(log(1+x), 1, 100, log = ll,
sub = paste("log= '", ll, "'", sep = ""))
In particular, I am accustomed to numeric values in a for-loop, as in
for(ll in 1:10)
But what is the following command saying:
for(ll in c("","x","y","xy"))
c("","x","y","xy") looks like a string vector? How does it work inside the curve function: in log(1+x), what is x (the string "x" from c("","x","y","xy")?), and what does log=ll do?
Apparently, there are no answers on Stack Overflow about how the curve function in R works, and especially about the log argument, so this might be a good chance to delve into it a bit more (I liked the question, btw):
First of all the easy part:
c("","x","y","xy") is a string vector or more formally a character vector.
for(ll in c("","x","y","xy")) will start a loop of 4 iterations and each time ll will be '','x','y','xy' respectively. Unfortunately, the way this example is built you will only see the last one plotted which is for ll = 'xy'.
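To see all four iterations instead of only the last one, one option is to split the plotting device into a grid before the loop. This is a minimal sketch: the 2 x 2 layout via par(mfrow = ...) and the bookkeeping vector seen are additions for illustration, not part of the help page example.

```r
seen <- character(0)                 # records the value of ll in each iteration

old_par <- par(mfrow = c(2, 2))      # 2 x 2 grid: one panel per iteration
for (ll in c("", "x", "y", "xy")) {
  # ll is a character string here; it is passed on as the log argument
  curve(log(1 + x), 1, 100, log = ll,
        sub = paste("log= '", ll, "'", sep = ""))
  seen <- c(seen, ll)
}
par(old_par)                         # restore the previous layout

print(seen)
```

Each panel then shows the same function with a different combination of log-scaled axes.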
Let's dive into the source code of the curve function to answer the rest:
First of all, what does the x represent in log(1+x)?
log(1+x) is a function. x represents a vector of numbers that gets created inside the curve function in the following part (from source code):
x <- exp(seq.int(log(from), log(to), length.out = n)) # if the log argument contains 'x'
x <- seq.int(from, to, length.out = n)                # otherwise
# in our case from and to are 1 and 100 respectively
As long as the n argument is the default the x vector will contain 101 elements. Obviously the x in log(1+x) is totally different to the 'x' in the log argument.
As for y, it is always created as (from source code):
y <- eval(expr, envir = ll, enclos = parent.frame()) # expr is log(1+x) here; this ll is a list inside curve that holds the x vector, not the loop variable ll
# i.e. you get a y value for each x value of the x vector which was calculated just previously
Second, what is the purpose of the log argument?
The log argument decides which of the axes will be drawn on a log scale: the x-axis if the log argument is 'x', the y-axis if it is 'y', both axes if it is 'xy', and neither if it is ''.
It needs to be mentioned here that the log scaling of either axis is done by the plot function that is called inside curve; in that respect the curve function is only a wrapper for the plot function.
This is also why, if the log argument contains 'x' (see above), the x vector is built as the exponential of an evenly spaced sequence between log(from) and log(to): the points then end up evenly spaced once the plot function draws the x-axis on a log scale.
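The two steps described above can be imitated outside of curve with a short sketch. The variable names x_log, x_lin and expr are chosen here for illustration; only the seq.int and eval calls mirror the lines quoted from the source code.

```r
from <- 1; to <- 100; n <- 101

# x values when the log argument contains "x": evenly spaced on the log scale
x_log <- exp(seq.int(log(from), log(to), length.out = n))
# x values otherwise: evenly spaced on the linear scale
x_lin <- seq.int(from, to, length.out = n)

# curve() evaluates the expression with x bound to that numeric vector
expr <- quote(log(1 + x))
y <- eval(expr, envir = list(x = x_lin), enclos = parent.frame())

print(length(x_lin))            # one x value per requested point
print(range(diff(log(x_log))))  # step size on the log scale
```

So the x inside log(1+x) is a plain numeric vector, and it has nothing to do with the string 'x' passed as the log argument.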
P.S. the source code for the curve function can be seen by typing graphics::curve on the console.
I hope this makes a bit of sense now!

Reinitializing variables in R and having them update globally

I'm not sure how to pose this question with the right lingo and the related questions weren't about the same thing. I wanted to plot a function and noticed that R wasn't updating the plot with my change in a coefficient.
a <- 2
x <- seq(-1, 1, by=0.1)
y <- 1/(1+exp(-a*x))
plot(x,y)
a <- 4
plot(x,y) # no change
y <- 1/(1+exp(-a*x)) # redefine function
plot(x,y) # now it updates
Just in case I didn't know what I was doing, I followed the syntax on this R basic plotting tutorial. The only difference was the use of = instead of <- for assignment of y = 1/(1+exp(-a*x)). The result was the same.
I've actually never just plotted a function with R, so this was the first time I experienced this. It makes me wonder if I've seen bad results in other areas if re-defined variables aren't propagated to functions or objects initialized with the initial value.
1) Am I doing something wrong and there is a way to have variables sort of dynamically assigned so that functions take into account the current value vs. the value it had when they were created?
2) If not, is there a common way R programmers work around this when tweaking variable assignments and making sure everything else is properly updated?
You are not, in fact, plotting a function. Instead, you are plotting two vectors. Since you haven't updated the values of the vector before calling the next plot, you get two identical plots.
To plot a function directly, you need to use the curve() function:
f <- function(x, a)1/(1+exp(-a*x))
Plot:
curve(f(x, 1), -1, 1, 100)
curve(f(x, 4), -1, 1, 100)
R is not Excel, or MathCAD, or any other application that might lead you to believe that changing an object's value will update other vectors that happened to use that value at some time in the past. When you did this
a <- 4
plot(x,y) # no change
There was no change in 'x' or 'y'.
Try this:
curve( 1/(1+exp(-a*x)) )
a <- 10
curve( 1/(1+exp(-a*x)) )
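The difference between a stored vector and a function can be made explicit with a small sketch. The names y_before, logistic and logistic_global are invented for this example.

```r
a <- 2
x <- seq(-1, 1, by = 0.1)
y <- 1 / (1 + exp(-a * x))   # y holds numbers computed with a = 2
y_before <- y

a <- 4
print(identical(y, y_before))  # y did not change when a changed

# A function recomputes the values each time it is called
logistic <- function(x, a) 1 / (1 + exp(-a * x))
y_new <- logistic(x, a)

# A function can also look up the global a at call time
logistic_global <- function(x) 1 / (1 + exp(-a * x))
a <- 10
y_global <- logistic_global(x)  # uses a = 10, the value at the time of the call
```

Passing a as an argument is usually easier to follow than relying on the global lookup, but both recompute the values on every call.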

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them as lines (line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
## firstpoint is the y value of the first point in the series
lastpt <- firstpoint
lastptind <- 1
addpoint <- function(nextpt, ...){
pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
points(pts, ... )
lastpt <<- nextpt
lastptind <<- lastptind + 1
}
return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0,y0, x1,y1
npups <- length(pupil_data$PupilLeft)
# segments() adds to an existing plot, so call plot() first
segments(1:(npups-1), pupil_data$PupilLeft[-npups],  # the starting points
         2:npups, pupil_data$PupilLeft[-1])          # the ending points
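For comparison, here is a self-contained sketch that sets up the empty plot from the question and then adds the data both ways. The data are typed in from the question; the axis labels, colours, line type and helper names x0, y0, x1, y1 are choices made for this example only.

```r
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437, 3.632779,
               3.614373, 3.605981, 3.595985, 3.630766, 3.590724, 3.626535,
               3.62386, 3.619688, 3.595711, 3.627841, 3.623596, 3.650569,
               3.64876)
n <- length(PupilLeft)

# Empty plot first: lines() and segments() only add to an existing plot
plot(NA, NA, xlim = c(1, n), ylim = range(PupilLeft),
     xlab = "Index", ylab = "PupilLeft")

# One call draws the whole polyline ...
lines(seq_len(n), PupilLeft, col = "red")

# ... which covers the same n - 1 pieces as the segments() approach
x0 <- 1:(n - 1); y0 <- PupilLeft[-n]   # starting points
x1 <- 2:n;       y1 <- PupilLeft[-1]   # ending points
segments(x0, y0, x1, y1, col = "blue", lty = 2)
```

Both calls are vectorized, so no loop is needed in either case.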
