Adding two kernel density objects in R? - r

Suppose we have two objects created using the density() function. Is there a way to add these two objects to get another density (or similar) object?
For example:
A = rnorm(100)
B = rnorm(1000)
dA = density(A)
dB = density(B)
dC = density(c(A, B))
Is there a way to get the dC object from the dA and dB objects? Some kind of sum operation?

A return from density is a list with these parts:
> str(dA)
List of 7
$ x : num [1:512] -3.67 -3.66 -3.65 -3.64 -3.63 ...
$ y : num [1:512] 0.00209 0.00222 0.00237 0.00252 0.00268 ...
$ bw : num 0.536
$ n : int 4
$ call : language density.default(x = A)
$ data.name: chr "A"
$ has.na : logi FALSE
- attr(*, "class")= chr "density"
Note that the original data isn't in there, so we can't retrieve it and simply do something like dAB = density(c(dA$data, dB$data)).
The x and y components form the curve of the density, which you can plot with plot(dA$x, dA$y). You might think all you need to do is add the y values from two density objects but there's no guarantee they'll be at the same x points.
So maybe you think you can interpolate one to the same x points and then add the y values. But that won't integrate to 1 like a proper density ought to, so what you should do is scale dA$y and dB$y according to the fraction of points in each component density - which you can get from the dA$n component.
If you don't understand that last point, consider the following two densities, one from 1000 points and one from 500:
dA = density(runif(1000))
dB = density(runif(500)+10)
The first is a uniform between 0 and 1, the second a uniform between 10 and 11. The height of both uniforms is 1, and their ranges don't overlap, so if you added them you'd get two steps of equal height. But the density of their union:
dAB = density(c(runif(1000), runif(500)+10))
is a density with twice as much mass between 0 and 1 as between 10 and 11. When adding densities taken from samples you need to weight by the sample size.
So if you can interpolate them to the same x values, and then sum the y values scaled according to the n values as weights, you can get something that would approximate density(c(A,B)).
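The recipe above can be sketched in a few lines of base R. This is only a minimal sketch: combine_density() and trapz() are hypothetical helper names, the common grid size is an arbitrary choice, and each curve is assumed to be zero outside its own x range.

```r
# Combine two density() results on a common grid, weighting by sample size.
# combine_density() is a hypothetical helper, not part of base R.
combine_density <- function(d1, d2, n_grid = 1024) {
  # common x grid spanning both curves
  x <- seq(min(d1$x, d2$x), max(d1$x, d2$x), length.out = n_grid)
  # linear interpolation; outside a curve's own range its density is taken as 0
  y1 <- approx(d1$x, d1$y, xout = x, yleft = 0, yright = 0)$y
  y2 <- approx(d2$x, d2$y, xout = x, yleft = 0, yright = 0)$y
  # weights are the fractions of points contributed by each sample
  w1 <- d1$n / (d1$n + d2$n)
  list(x = x, y = w1 * y1 + (1 - w1) * y2, n = d1$n + d2$n)
}

set.seed(1)
dA <- density(runif(1000))
dB <- density(runif(500) + 10)
dAB <- combine_density(dA, dB)

# trapezoidal integral, used here only to check the result
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
total_mass <- trapz(dAB$x, dAB$y)             # should be close to 1
left <- dAB$x < 5
left_mass <- trapz(dAB$x[left], dAB$y[left])  # should be close to 2/3
```

The result will not be identical to density(c(A, B)), because each input curve was smoothed with its own bandwidth, whereas a density of the pooled data uses a single one.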

Related

gnuplot 2D contour with multicolumn tabular dataset

How to turn the following tabular dataset into a simple 2D density plot to show a loc-number distribution?
I am new to gnuplot. I worked through a tutorial, and a simple x,y plot with multiple columns of data comes out fine. Then I tried this answer, but I ran into the errors listed below even though the x values are defined. I am guessing my data set is fundamentally lacking something. What am I not doing right here, and how can I get a simple 2D contour from the data below?
Updating based on recommended suggestions while OP aim remains intact.
Following is the input sample data used. File is single-space delimited. x = x, y=y, z1 = locid (1 to n) or z2=loctype (scuba, shower, swimming, restrooms, sushi, cafe, restaurant, etc)
input data :
ametype amename X(1000) Y(1000) km-to-carpark
Scuba SCUB1 10.72 49.01
Scuba SCUB2 13.88 47.32
Scuba SCUB3 14.58 46.46
Scuba SCUB4 14.52 48.23
Scuba SCUB5 13.05 47.23
Scuba SCUB6 12.21 47.95
Scuba SCUB7 12.66 46.19
Cafe CAFE1 13.97 47.45
Cafe CAFE4 31.63 30.3
Playground PARK2 31.57 30.2
Playground PARK1 27.51 31.87
Cafe CAFE5 67.71 109.09
Scuba SCUB8 68.58 109.54
Scuba SCUB9 67.14 109.99
Cafe CAFE2 13.83 46.24
SUSHI SUSH1 79.59 41.22
SUSHI SUSHI2 73.81 54.14
SUSHI SUSHI3 72.87 55.47
SUSHI SUSHI4 75.05 56.51
RESTROOM RESTR1 74.1 56.05
RESTROOM RESTR2 74.96 57.9
RESTROOM RESTR3 75.06 55.59
RESTAURANT RESTAU1 76.57 56.33
RESTAURANT RESTAU1 76.95 55.1
RESTAURANT RESTAU2 77.75 54.69
RESTAURANT RESTAU2 76.15 54.34
code tried for a different dataset where x,y weren't coordinates;
set view map
set contour
set isosample 250, 250
set cntrparam level incremental 1, 0.1
set palette rgbformulae 33,13,10
splot 'data.dat' with lines nosurface
#splot for [col=1:10] 'data.dat' u ($1):(column(col) > 2 ? 1/0 : column(col)):3
errors:
1) All points x value undefined
2) Tabular output of this 3D plot style not implemented
updated:
a) increased data points
c) a possible chicken scratch to give simple impression.
Expecting a distribution density map like this.
This is an interesting plotting challenge.
The input data format is also straightforward, but needs some processing until the desired contour lines can be plotted with gnuplot.
Comments:
The data is all in one file. Data entries for the types can be in random order; no sorting is necessary.
The example below creates some random test data with the types "Cafe, Scuba, Sushi" and 20 entries of each. Skip this part if you want to use your own file.
The remaining lines of the script know nothing about the content of the test data file (i.e. how many types, type names, coordinates, etc.); everything is determined automatically.
Create a unique list of types. The list will be in the order of first occurrence.
Define a grid (here dx=0.2, dy=0.2, i.e. reasonable values within the data range) and count, for each grid point, the occurrences of each type within a certain radius (here: 0.5). Calculate the density by dividing the count by the area of the circle.
For each type, create the contour lines by plotting to a file indexed by a two-digit number. So far, I don't know how one would easily write this into indexed datablocks to avoid files on disk.
Finally, plot the contour line files and the original data points, using a filter to get the right color.
One thing which I haven't figured out yet is set cntrparam level 2: I would like to have exactly 2 contour lines per type, but it seems gnuplot still uses the option set cntrparam level auto 2 and adjusts the number of levels itself.
As you can imagine this graph will probably look pretty confusing with 10 or more types.
For sure, there is room for improvement and no guarantee that there are no bugs in this script. Look at it as a starting point for further optimization. Suggestions for improvements are welcome!
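Independently of the gnuplot script that follows, the counting step described in the comments (count the points within a radius of each grid node, then divide by the circle area) can be sketched in R. This is only an illustration under assumptions: radius_density() is a hypothetical helper and the three sample points are made up.

```r
# Count points within `radius` of each grid node and convert to a density.
# radius_density() and the sample points are hypothetical, for illustration.
radius_density <- function(px, py, gx, gy, radius) {
  grid <- expand.grid(x = gx, y = gy)
  counts <- vapply(seq_len(nrow(grid)), function(k) {
    # squared distance from grid node k to every point
    as.numeric(sum((px - grid$x[k])^2 + (py - grid$y[k])^2 <= radius^2))
  }, numeric(1))
  grid$density <- counts / (pi * radius^2)  # points per unit area
  grid
}

px <- c(0, 0.1, 3)
py <- c(0, 0.1, 3)
dens <- radius_density(px, py,
                       gx = seq(0, 3, by = 0.5),
                       gy = seq(0, 3, by = 0.5),
                       radius = 0.5)
```

Each row of dens holds one grid node and its density, which is the same x, y, d layout the script prints into its $Densities datablock.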
Script:
### plot density contours from simple x,y location file
reset session
FILE = "SO73244095.dat"
# create some random test data
myTypes = "Cafe Scuba Sushi"
set print FILE
do for [p=1:words(myTypes)] {
    a = word(myTypes,p)
    x0 = rand(0)*5
    y0 = rand(0)*5
    do for [i=1:20] {
        print sprintf("%s %s%d %.3g %.3g",a,a,i,invnorm(rand(0))+x0,invnorm(rand(0))+y0)
    }
}
set print
# create a unique list of types
# and extract min, max data
addToList(list,col) = list.(_s='"'.strcol(col).'"', strstrt(list,_s)>0 ? '' : _s)
myTypes = ''
myType(i) = word(myTypes,i)
stats FILE u (myTypes=addToList(myTypes,1),$3):4 name "DATA" nooutput
Nt = words(myTypes)
print sprintf("%d types found: %s",Nt,myTypes)
# get densities for each type
dx = 0.2 # adjust the grid as you like...
dy = 0.2 # ... time for graph creation will increase with finer grid
Radius = 0.5 # adjust radius to a reasonable value
Nx = ceil((DATA_max_x-DATA_min_x)/dx)
Ny = ceil((DATA_max_y-DATA_min_y)/dy)
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
print "Please wait..."
set print $Densities
do for [nt=1:Nt] {
    do for [ny=0:Ny] {
        do for [nx=0:Nx] {
            c = 0
            x = DATA_min_x+nx*dx
            y = DATA_min_y+ny*dy
            stats FILE u (Dist(x,y,$3,$4)<=Radius && (strcol(1) eq word(myTypes,nt)) ? c=c+1 : 0) nooutput
            d = c / (pi * Radius**2) # density per unit area
            print sprintf("%g %g %g",x,y,d)
        }
        print "" # empty line
    }
    print ""; print "" # two empty lines
}
set print
# get contour lines via splot into files
myContFile(n) = sprintf("%s.cont%02d",FILE,n)
unset surface
set contour
set cntrparam cubicspline levels 2 # cubicspline for "nice" round curves
do for [nt=1:Nt] {
    set table myContFile(nt)
    splot $Densities u 1:2:3 index nt-1
    unset table
}
# set size ratio -1 # uncomment if equal x,y scale is important
set grid x,y
set key out noautotitle
set xrange[:] noextend
set yrange[:] noextend
set colorsequence classic
myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN
plot for [i=1:Nt] myContFile(i) u 1:2 w l lc i, \
     for [i=1:Nt] FILE u 3:(myFilter(4,1,myType(i))) w p pt 7 lc i ti myType(i)
### end of script
Result: (a few random examples)

When plotting phylogeny using plotBranchbyTrait, the colour intensity which indicates a trait value does not correspond with values in my data

I am trying to plot a phylogenetic tree with branch colours that correspond to a continuous trait of the species. I am using the function plotBranchbyTrait from the phytools package in R, but the colour intensity that indicates the trait value (the percentage of seeds in the diet, in my case) does not match my data for every species. Is plotBranchbyTrait actually using the species' trait values to set the colour, or is it colouring by the inverse of the trait value, as the plotted colour intensities seem to suggest? Any other suggestions for doing this would be appreciated.
Here is an example of my code:
birdtree <- read.nexus("output.nex") # my pruned phylogeny tree downloaded from BirdTree.org
rootedtree <- maxCladeCred(birdtree,tree = TRUE, rooted = TRUE) # here you root the tree
diet_averages <- read.csv("species diet averages.csv",header=TRUE, na.strings=c("", "n/a")) # my data showing percent of seed in diet
str(diet_averages)
'data.frame': 123 obs. of 2 variables:
$ binomial: chr "Accipiter_cooperii" "Accipiter_gentilis" "Accipiter_melanoleucus" "Acridotheres_tristis" ...
$ seeds : num 0 0 0 17.6 0 0 0 38.4 0 0 ...
plotBranchbyTrait(rootedtree, mydata$seeds, palette = "rainbow")
I have the same issue. 'plotBranchbyTrait' has a problematic mapping routine that mixes up nodes and edges when mapping all edges AND tips.
This is a workaround, which ignores the edge data (creating new edge values with 'plotBranchbyTrait'):
MapData<-mydata$seeds
MapData<-MapData[1:length(rootedtree$tip.label)] # assuming that the rows 1:x correspond to the taxa in the tree
names(MapData)<-rootedtree$tip.label
plotBranchbyTrait(rootedtree, MapData, palette = "rainbow", mode="tips")
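The workaround above relies on the rows of the trait table being in the same order as the tip labels. A hedged alternative is to align the two by name with match(), sketched here in base R only. The objects tip_labels and diet are hypothetical stand-ins for rootedtree$tip.label and the trait table, and the tip order is made up; the species names and values come from the str() output in the question.

```r
# Hypothetical stand-ins: `tip_labels` plays the role of rootedtree$tip.label,
# `diet` the role of the trait table (values from the str() output above).
tip_labels <- c("Acridotheres_tristis", "Accipiter_cooperii", "Accipiter_gentilis")
diet <- data.frame(
  binomial = c("Accipiter_cooperii", "Accipiter_gentilis",
               "Accipiter_melanoleucus", "Acridotheres_tristis"),
  seeds = c(0, 0, 0, 17.6)
)

# position of each tip label in the trait table
idx <- match(tip_labels, diet$binomial)
stopifnot(!anyNA(idx))  # every tip must have a trait value

# named trait vector in tip-label order
MapData <- setNames(diet$seeds[idx], tip_labels)
```

A vector built this way would then be passed to plotBranchbyTrait with mode="tips", as in the workaround.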
After considerable searching I also found the root of the problem. When mapping a complete data set, including edge data, 'plotBranchbyTrait' internally reorders the input vector according to rootedtree$edge.
You can search the tree to identify the corresponding node/tip for every species Specieslist[i] that you derived from your nexus tree:
PhyloNode[i]<-findMRCA(rootedtree, tips=Specieslist[i], type="node")
and derive which row in the matrix rootedtree$edge has this node/tip as offspring:
EdgeRow[i]<-which(rootedtree$edge[,2]==PhyloNode[i])
Please mind that i=1 needs to be excluded from this reference table, because the Nexus root does not exist in a phylo tree.
Once you identified the edge of every node, you can reorder your data
MapData<-mydata$seeds
names(MapData)<-EdgeRow
MapData<-MapData[order(EdgeRow)] # the original line used ParentRow, which is never defined; EdgeRow appears to be intended
plotBranchbyTrait(rootedtree, MapData, palette = "rainbow", mode="edges")

Area estimation using image in R

So I'm looking for a way to estimate the area of a region using only an image of the map. The reason I'm doing this is that I want to calculate the area that would be lost under a certain rise in sea level, and I can't find any metadata for that, only the maps (in image formats). Here is the link to such a map:
(source: cresis.ku.edu)
What I have in mind is to convert this image to grayscale using the EBImage package and then use pixel intensity as a criterion to count the pixels that represent potentially threatened area.
My question: is this possible? How can we use the pixel intensity as a criterion? And are there any other approaches to this problem?
Also, if there is a way to get access to the metadata used to plot such a map that I'm not aware of, please tell me.
Thank you everyone.
Edit:
Thanks to hrbrmstr I was able to read the grid data into R using the rgdal package. To calculate the area I tried to use the rgeos package, but the dataset from CRESIS doesn't include a shapefile. So how can we define the polygon and calculate the area?
Sorry if this question seems silly. This is the first time I've ever dealt with spatial data and analysis.
The data is in the ESRI files:
library(rgdal)
grid_file_1m <- new("GDALReadOnlyDataset", "/full/path/to/inund1/w001001.adf")
grid_1m <- asSGDF_GROD(grid_file_1m, output.dim=c(1000, 1000))
plot(grid_1m, bg="black")
grid_1m_df <- as.data.frame(grid_1m)
str(grid_1m_df)
## 'data.frame': 2081 obs. of 3 variables:
## $ band1: int 1 1 1 1 1 1 1 1 1 1 ...
## $ x : num -77.2 -76.9 -76.5 -76.1 -75.8 ...
## $ y : num 83.1 83.1 83.1 83.1 83.1 ...
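Once the grid is in a data frame like this, one way to turn cells into an area is to sum the surface area of the selected cells. The following is a rough sketch under several assumptions that the source does not confirm: the grid is regular in longitude/latitude, band1 == 1 marks the cells of interest, the cell size (dlon, dlat) is known, and the Earth is treated as a sphere of radius 6371 km. cell_area_km2() is a hypothetical helper and the three rows are a made-up excerpt in the same layout as grid_1m_df.

```r
# Area of a lon/lat cell centred at latitude `lat` on a sphere:
# R^2 * dlon * (sin(lat_hi) - sin(lat_lo)), with angles in radians.
cell_area_km2 <- function(lat, dlon, dlat, radius_km = 6371) {
  to_rad <- pi / 180
  lat_lo <- (lat - dlat / 2) * to_rad
  lat_hi <- (lat + dlat / 2) * to_rad
  radius_km^2 * (dlon * to_rad) * (sin(lat_hi) - sin(lat_lo))
}

# hypothetical three-cell excerpt with the same columns as grid_1m_df
cells <- data.frame(band1 = c(1L, 1L, 0L),
                    x = c(-77.2, -76.9, -76.5),
                    y = c(83.1, 83.1, 83.1))
dlon <- 0.36  # assumed cell width in degrees
dlat <- 0.18  # assumed cell height in degrees

selected <- cells$band1 == 1  # assumption: 1 marks the cells of interest
area_km2 <- sum(cell_area_km2(cells$y[selected], dlon, dlat))
```

Weighting by latitude matters here because cells near the poles cover far less ground than cells of the same angular size near the equator, so simply counting cells or pixels would overstate high-latitude area.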

Plot a thin plate spline using scatterplot3d

Splines are still fairly new to me.
I am trying to figure out how to create a three dimensional plot of a thin plate spline, similar to the visualizations which appear on pages 24-25 of Introduction to Statistical Learning (http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf). I'm working in scatterplot3d, and for the sake of easily reproducible data, let's use the 'trees' dataset in lieu of my actual data.
Setting the initial plot is trivial:
data(trees)
attach(trees)
s3d <- scatterplot3d(Girth, Height, Volume,
type = "n", grid = FALSE, angle = 70,
zlab = 'volume',
xlab = 'girth',
ylab = 'height',
main = "TREES") # blank 3d plot
I use the Tps function from the fields library to create the spline:
my.spline <- Tps(cbind(Girth, Height), Volume)
And I can begin to represent the spline visually:
for(i in nrow(my.spline$x):1) # for every girth . . .
s3d$points3d(my.spline$x[,1], rep(my.spline$x[i,2], times=nrow(my.spline$x)), # repeat every height . . .
my.spline$y, type='l') # and match these values to a predicted volume
But when I try to complete the spline by cross hatching lines along the height axis, the results become problematic:
for(i in nrow(my.spline$x):1) # for every height . . .
s3d$points3d(rep(my.spline$x[i,1], times=nrow(my.spline$x)), my.spline$x[,2], # repeat every girth . . .
my.spline$y, type='l') # and match these values to a predicted volume
And the more that I look at the resulting plot, the less certain I am that I'm even using the right data from my.spline.
Please note that this project uses scatterplot3d for other visualizations, so I am wedded to this package as the result of preexisting team choices. Any help will be greatly appreciated.
I don't think you are getting the predicted Tps. That requires using predict.Tps
require(fields)
require(scatterplot3d)
data(trees)
attach(trees) # this worries me. I generally use data in dataframe form.
s3d <- scatterplot3d(Girth, Height, Volume,
type = "n", grid = FALSE, angle = 70,
zlab = 'volume',
xlab = 'girth',
ylab = 'height',
main = "TREES") # blank 3d plot
my.spline <- Tps(cbind(Girth, Height), Volume) # the fit from the question
grid<- make.surface.grid( list( girth=seq( 8,22), height= seq( 60,90) ))
surf <- predict(my.spline, grid)
str(surf)
# num [1:465, 1] 5.07 8.67 12.16 15.6 19.1 ...
str(grid)
# int [1:465, 1:2] 8 9 10 11 12 13 14 15 16 17 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "girth" "height"
# - attr(*, "grid.list")=List of 2
# ..$ girth : int [1:15] 8 9 10 11 12 13 14 15 16 17 ...
# ..$ height: int [1:31] 60 61 62 63 64 65 66 67 68 69 ...
s3d$points3d(grid[,1],grid[,2],surf, cex=.2, col="blue")
You can add back the predicted points. This gives a better idea of x-y regions where there is "support" for the estimated surface:
s3d$points3d(my.spline$x[,1], my.spline$x[,2],
predict(my.spline) ,col="red")
There is no surface3d function in the scatterplot3d package. (I just searched the Rhelp archives to see if I was missing something, but the graphics experts have always said that you would need to use lattice::wireframe, graphics::persp, or the 'rgl' package functions.) Since you have made a commitment to scatterplot3d, I think the easiest transition would not be to those but to the much more capable base-graphics package named plot3D. It is capable of many variations and makes quite beautiful surfaces with its surf3D function.
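For the graphics::persp route mentioned above, the grid predictions have to be reshaped into a matrix first. The sketch below shows that reshaping with base R only. Note the swapped-in model: lm() is used purely as a stand-in so the example runs without the fields package, and it is not a thin plate spline. With a Tps fit, the predictions from predict(my.spline, grid) would take the place of predict(fit, ...).

```r
# Grid over the same ranges as in the answer; Girth varies fastest in expand.grid
girth  <- seq(8, 22)
height <- seq(60, 90)
grid <- expand.grid(Girth = girth, Height = height)

# Stand-in model (NOT a thin plate spline), used only to get predictions
fit <- lm(Volume ~ Girth * Height, data = trees)

# Column-major fill: z[i, j] is the prediction at girth[i], height[j]
z <- matrix(predict(fit, newdata = grid),
            nrow = length(girth), ncol = length(height))

persp(girth, height, z, theta = 40, phi = 20,
      xlab = "girth", ylab = "height", zlab = "volume",
      ticktype = "detailed")
```

The key point is the fill order: because the first grid variable varies fastest, the rows of z index girth and the columns index height, which is what persp expects for its x and y arguments.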

Octave 3D mesh, data from file

I have a big file with 3 columns: density, dimension, value.
example:
10 0.3 200
10 0.4 300
20 0.3 250
20 0.4 320
I am trying to draw a 3d plot - mesh with mesh() function in octave, like this:
data = load ("file.txt");
mesh(data(:,1), data (:,2), data (:,3));
Problem I have is , I always get error:
rows (z) must be the same as length (y), columns (z) must be the same as length (x).
It worked with function plot3(), but I would like a mesh kind of plot.
The problem is that mesh(X,Y,Z) expects X and Y to be either vectors of unique points or matrices of the kind generated by [X,Y] = meshgrid(x,y), where x and y contain only the unique points, with Z a matrix of matching size. Your data basically already defines the meshgrid, but it is difficult to get it out.
I suggest using reshape as:
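m = numel(unique(data(:,2))); % number of unique values in column 2 (Y)
n = numel(unique(data(:,1))); % number of unique values in column 1 (X)
X = reshape(data(:,1),m,n);
Y = reshape(data(:,2),m,n); % might be reshape(data(:,2),n,m)
Z = reshape(data(:,3),m,n);
mesh(X,Y,Z);
In this case the assumption is that you have m unique values in Y and n unique values in X, and that the rows of the file are ordered so that Y varies fastest within each X value, as in your example. You may have to transpose these in your call to mesh as mesh(X',Y',Z) or something like that.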
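To see what the reshape does to the four example rows, here is the same column-major reshaping sketched in R, where matrix() fills column by column just as reshape does in Octave. The variable dat is a hypothetical in-memory copy of the example file.

```r
# The four example rows: density, dimension, value
dat <- matrix(c(10, 0.3, 200,
                10, 0.4, 300,
                20, 0.3, 250,
                20, 0.4, 320), ncol = 3, byrow = TRUE)

m <- length(unique(dat[, 2]))  # unique values in column 2 (Y)
n <- length(unique(dat[, 1]))  # unique values in column 1 (X)

# Column-major fill: each column holds one X value, each row one Y value
X <- matrix(dat[, 1], m, n)
Y <- matrix(dat[, 2], m, n)
Z <- matrix(dat[, 3], m, n)
```

Here each column of X is constant and each row of Y is constant, which is the meshgrid layout; if the file were sorted the other way round, the m and n arguments would need to be swapped and the result transposed.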
