Problems with interpolating using Impute.interp in Julia

I have a problem with interpolation. The code below works for the rest of the data, except for the interval between data points 40903 and 40997, where Lat[40903] = 12.4461 and Lat[40997] = 12.4460. The data points in between are missing and have to be interpolated to be useful. Is that possible somehow? The interpolation gives me -0.0 for most of the needed values.
using CSV
using DataFrames
using Impute
data = CSV.read("data.csv", delim=",", header=5, datarow=6, silencewarnings=true)
Lat = Impute.interp(data[:,8]) |> Impute.locf() |> Impute.nocb()

If you know the exact bounds of a section with missing values, and can use linear interpolation to fill, you should be able to generate a Range object for each column with missing values from the top and bottom of that section's values and the number of points in that section. For example, if the bottom X in the column is 1.5 and the top X is 1.3 and the number of missing values between is 5,
LinRange(1.5, 1.3, 5 + 2)[2:6]
will give the values you need. Impute.jl should do this as well.

Related

Determining the mean center and standard distance of a dataset in R

I have a dataset called mypoints and I have created a polygon and plotted the points as below:
mypoints=read.csv("d:\\data\\venus.csv",header = T)
mypoints
minx=min(mypoints[,1])
maxx=max(mypoints[,1])
miny=min(mypoints[,2])
maxy=max(mypoints[,2])
mypolygon=cbind(c(minx,maxx,maxx,minx),c(miny,miny,maxy,maxy))
plot(mypoints)
polygon(mypolygon)
I now want to write a function that calculates both the mean center and standard distance for mypoints. I then need to plot the standard distance as a circle centered on the mean center of all points with the radius equal to the standard distance. Note that the last expression evaluated in a function becomes the return value, the result of invoking the function.
So far:
#I think this how I calculate the mean center for x and y:
x1=sum(mypoints[,1])/length(mypoints[,1])
y1=sum(mypoints[,2])/length(mypoints[,2])
#This is the formula I was shown for standard distance:
sd.mypoints=sqrt(sum(x1+y1)/n)
#This is the formula I was shown for creating the circle:
symbols(sd.mypoints[1],sd.mypoints[2],sd.mypoints$sd,add=T,inches=F)
#This is the error that I get when I run the circle formula:
Error in sd.mypoints$sd : $ operator is invalid for atomic vectors
I have found it easier to find the Nearest Neighbor, do KDE, Ghat, and Fhat for this dataset than to figure this out. I am sure there is an easy solution for this but I just can't seem to get it. Third class in R and it has been a lot of fun up to this point.
You have the line
symbols(sd.mypoints[1],sd.mypoints[2],sd.mypoints$sd,add=T,inches=F)
in your code. As said in the comments, sd.mypoints is not a data.frame, so subsetting it with sd.mypoints$sd causes the error you see.
From the documentation of symbols, which you can access with ?symbols, you'll see that to draw circles you must supply the circles argument, so that the function knows which kind of figure to draw.
EDIT:
Also, please notice that the x and y coordinates you pass to symbols are not the ones you already calculated (x1 and y1). So you need to replace that line with:
symbols(x1, y1,circles = sd.mypoints,add=T,inches=F)
Notice the use of x1 and y1. I can see the plot now.
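To tie this back to the original request (one function returning both values), here is a minimal sketch. It assumes the usual definition of standard distance, the square root of the mean squared distance of the points from the mean center, which is not the formula shown in the question. The names pts and mean_center_sd are made up for illustration, and the random points stand in for mypoints.

```r
# Hypothetical sample points; in the question these would be mypoints[, 1:2]
set.seed(42)
pts <- cbind(x = runif(30), y = runif(30))

# Returns the mean center and the standard distance of a two-column matrix.
# Assumed definition: sqrt(sum((x - cx)^2 + (y - cy)^2) / n)
mean_center_sd <- function(p) {
  cx <- mean(p[, 1])
  cy <- mean(p[, 2])
  sd_dist <- sqrt(sum((p[, 1] - cx)^2 + (p[, 2] - cy)^2) / nrow(p))
  # The last expression evaluated is the return value
  list(center = c(x = cx, y = cy), sd = sd_dist)
}

res <- mean_center_sd(pts)

# asp = 1 keeps the circle round on screen
plot(pts, asp = 1)
symbols(res$center[["x"]], res$center[["y"]], circles = res$sd,
        add = TRUE, inches = FALSE)
```

Because the function returns a list, res$sd is valid here, unlike sd.mypoints$sd on an atomic vector.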

How to plot minimum, maximum, and mean in r

I've been reading how to plot points in r, but can't find anything that matches my problem. My data is a matrix; the rows start with a column called 'site' and it is followed by three columns containing the parameters: minimum, mean, and maximum. There are four rows in the matrix, corresponding to 4 sites.
What I want is a graph that has the 4 sites on the x-axis and the three data points (min, mean max) above each site, connected by a line. The mean would be represented by a circle, while the min and max by a cross bar. Each of the means would be connected by a line. My output would look like a boxplot without the boxes and with a line connecting the means.
Can anyone help me? It seems like a simple problem but I'm stumped.
Define a random matrix:
set.seed(1)
n_sites <- 4
myMatrix <- cbind(t(replicate(n_sites,sort(rnorm(3)))),1:n_sites)
dimnames(myMatrix) <- list(paste("Site",1:n_sites),c("Min","Mean","Max","n"))
Plot:
plot(c(1,n_sites),range(myMatrix),type="n",xlab="",ylab="",xaxt="n",las=1)
axis(1,1:n_sites,rownames(myMatrix))
arrows(x0=1:n_sites,y0=myMatrix[,"Min"],x1=1:n_sites,y1=myMatrix[,"Max"],angle=90,code=3,length=0.1)
points(1:n_sites,myMatrix[,"Mean"],bg="white",pch=21,type="o")
text(1:n_sites,myMatrix[,"Max"],myMatrix[,"n"],pos=3)
I like using arrows() in cases like this.

How to cleanly use interpolation between points to generate a mean in R

I am having trouble writing code that will cleanly produce a mean (specifically a weighted average) based on a simple plot of points using interpolation.
For example:
ex=c(1,2,3,4,5)
why=c(2,5,9,15,24)
This shows the kind of information I am working with.
plot(ex, why, type="o")
At this point, I want to actually have each point "binned" so the lines between them are straight. To do this, I have been adding points to the x values manually in excel as (x+0.01).
This is the new output:
why=c(2,2,5,5,9,9,15,15,24,24)
ex=c(1,2,2.01,3,3.01,4,4.01,5,5.01,6)
plot(ex, why, type="o")
So this is where my question comes into play. I have to do this many times and do not want to generate a ton of new vectors and objects. To get a weighted average, I have been interpolating y values at x increments of 0.01 into a new object. I am then able to go into this new object and get a mean when a point falls between the actual ex values, i.e.
mean(newy[1:245])
Because I made new y values for 100 increments of x that (basically) follow a straight line, I am getting a weighted average here for x= 1 to 2.45.
Is there an easier and more elegant way to embed the interpolate code into the mean code so I could just say "average of interpolated y for nonreal x to nonreal x?"
It doesn't do exactly what you want, but you should consider the stepfun function -- this creates a step function out of two series.
plot(stepfun(ex[-1], why))
stepfun is handy because it gives you a function defined over that interval, so you can easily interpolate just by evaluating anywhere. The downside to it is that it is not strictly defined on the range given (which is why we have to cut off the first value in ex).
Based on your second plotting example, I think you are probably looking for this:
library(ggplot2)
qplot(ex, why, geom="step")
Or if you want the line to go vertical first, you can use:
qplot(ex, why, geom="step", direction = "vh")
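For the averaging part of the question, a sketch along these lines may avoid building the 0.01-increment vector at all. It assumes the "binned" reading from the question, where y holds its value from each x until the next one, and computes the width-weighted mean of the step heights between two arbitrary x values. The names step_y and step_mean are made up for illustration.

```r
ex  <- c(1, 2, 3, 4, 5)
why <- c(2, 5, 9, 15, 24)

# Step function: y keeps the value of the nearest ex to the left.
# rule = 2 extends the first/last value outside the range of ex.
step_y <- approxfun(ex, why, method = "constant", f = 0, rule = 2)

# Width-weighted mean of the step function between a and b (a < b)
step_mean <- function(a, b) {
  # Split [a, b] at every ex value that falls strictly inside it
  knots   <- c(a, ex[ex > a & ex < b], b)
  widths  <- diff(knots)
  heights <- step_y(head(knots, -1))  # height at the left end of each piece
  sum(widths * heights) / (b - a)
}

step_mean(1, 2.45)  # (1 * 2 + 0.45 * 5) / 1.45
```

The result is exact for the step shape rather than an approximation on a 0.01 grid, so it will differ slightly from mean(newy[1:245]).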

How to count line segment occurrences by pixel in R?

I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0,140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indicator of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there a straightforward way to do this in R? Otherwise, would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as Python or MATLAB?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted, e.g.:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")

R question about plotting probability/density histogram the right way

I have the following matrix [500,2], i.e. 500 rows and 2 columns. The left column gives the index of the X observations, and the right one gives the probability with which this X occurs, so it is a typical probability density relationship.
So, my question is how to plot the histogram the right way, so that the x-axis is the x-index and the y-axis is the density (0.01-1.00). The bandwidth of the estimator is 0.33.
Thanks in advance!
The end of the whole data looks like this, just for a little orientation:
[490,] 2.338260830 0.04858685
[491,] 2.347839477 0.04797310
[492,] 2.357418125 0.04736149
[493,] 2.366996772 0.04675206
[494,] 2.376575419 0.04614482
[495,] 2.386154067 0.04553980
[496,] 2.395732714 0.04493702
[497,] 2.405311361 0.04433653
[498,] 2.414890008 0.04373835
[499,] 2.424468656 0.04314252
[500,] 2.434047303 0.04254907
@everyone,
yes, I have made the estimation before. The bandwidth is what I mentioned, and the data is ordered from low to high values, so respectively the probability at the beginning is 0.22, at the peak about 0.48, and at the end 0.15.
The density line plots like a charm, but in addition I have to plot a histogram. So how can I do this, ordering the blocks properly (how should the data be split into bins, etc.)?
Any suggestions?
Here is a part of the data AFTER the estimation. All values are discrete, so I assume a histogram can be created, hopefully.
[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817
Best regards,
appreciate the fast responses!(bow)
Would it do the job to create a histogram over the indexes only, grouping them into bins of, for instance, 25 or 50 each, and computing the average probability for each bin (25, 50, 100, 150, 200, 250, etc.) to use as the boxes?
Assuming the rows are in order from lowest to highest value of x, as they appear to be, you can use the default plot command, the only change you need is the type:
plot(your.data, type = 'l')
EDIT:
Ok, I'm not sure this is better than the density plot, but it can be done:
x = dnorm(seq(-1, 1, length = 500))
x.bins = rep(1:50, each = 10)
bars = aggregate(x, by = list(x.bins), FUN = sum)[,2]
barplot(bars)
In your case, replace x with the probabilities from the second column of your matrix.
EDIT2:
On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function, adding them together as I have done is incorrect. Mathematically I don't think you can produce the binned probability for a range using only a few points from within that range.
Assuming M is the matrix, wouldn't this just be:
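If the second column holds density values on an ordered grid of x, one possible alternative is the asker's own idea of averaging within bins rather than summing. The sketch below does that, so each bar's height times its width only approximates the probability of that bin. The matrix M is a made-up stand-in for the real 500 x 2 data, and the choice of 25 bins is arbitrary.

```r
# Hypothetical stand-in for the 500 x 2 matrix: x values and estimated density
x <- seq(-3, 3, length.out = 500)
M <- cbind(x, dnorm(x))

n_bins <- 25
breaks <- seq(min(M[, 1]), max(M[, 1]), length.out = n_bins + 1)

# Assign each row to a bin by its x value, then average the density per bin
bin <- cut(M[, 1], breaks = breaks, include.lowest = TRUE)
bin_density <- tapply(M[, 2], bin, mean)

# Bars drawn edge to edge, labelled by the left edge of each bin
barplot(bin_density, space = 0,
        names.arg = round(head(breaks, -1), 2), las = 2)

# Rough check: bar areas should sum to about the probability mass covered
sum(bin_density * diff(breaks))
```

Averaging keeps the y-axis on the density scale, so the bars can be compared directly with the density line.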
plot(x=M[ , 1], y = M[ , 2] )
You have already done the density estimation since this is not the original data.
