I have taken photos of a bird nesting area and marked the position of each bird on the photo. The resulting data is a list of X and Y positions, which I converted from pixels to meters.
I want to count how many birds fall in each 1 m² square. I was able to get what I was looking for graphically with geom_bin2d, but I would like to extract the count for each square.
Are there any functions that would do this, or methods to extract the data from geom_bin2d?
Thank you very much!
I have found a few functions (density, bkde2D), but they compute kernel density estimates, which don't seem to give the same values as geom_bin2d.
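One way to get the per-square counts without going through the plot is to assign each point to a 1 m cell with floor() and cross-tabulate. This is only a sketch in base R: the data frame birds and its columns are hypothetical stand-ins for the real coordinates, and the 10 m by 5 m extent is an assumption. The counts will only match geom_bin2d if its bins are aligned to the same 1 m breaks.

```r
# Hypothetical example data: bird positions in meters (not the real data)
set.seed(1)
birds <- data.frame(x = runif(200, 0, 10), y = runif(200, 0, 5))

# Index of the 1 m x 1 m cell each bird falls in: [0,1) -> 0, [1,2) -> 1, ...
birds$cell_x <- floor(birds$x)
birds$cell_y <- floor(birds$y)

# Cross-tabulate; fixing the factor levels keeps empty squares as zeros
counts <- table(factor(birds$cell_x, levels = 0:9),
                factor(birds$cell_y, levels = 0:4))

# Long format: one row per square with its count in column n
counts_long <- as.data.frame(counts, responseName = "n")
names(counts_long)[1:2] <- c("cell_x", "cell_y")
```

counts is a 10 x 5 table (x cells in rows, y cells in columns), and counts_long holds the same numbers with one row per square.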
Related
I want to find the expected value of y given x, based on some data. I want really good estimates of the mean y-value at a few particular x-values, but I don't want/need to fit something parametric or do a regression.
Instead, I want to take my observations, bin a bunch of them where I have a lot of x-values in a small range of X, and compute the mean of y.
Is there a clever way to select, say, 6 non-overlapping regions of high density from my vector of x observations?
If so, I'll take the center of each region, grab a bunch of the closest x's (maybe 100 in my real data), and compute the associated mean(y).
Here's some example data:
# pick points for high density regions
#xobs<-runif(900)
clustery_obs<-function(x){rnorm(40,x,0.2)}
under_x<-runif(11)
xobs<-sapply(under_x, clustery_obs)
xobs<-xobs[0<xobs&xobs<1]
yfun<-function(x){rnorm(1, mean=(10*x)^2-(30*x)+3, sd=6)}
yobs<-sapply(xobs, yfun)
plot(xobs, yobs)
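One possible approach, sketched under assumptions: estimate the density of x with density(), treat the local maxima of the estimate as centers of dense regions, keep the six highest, and average y over the k nearest observations to each center. The value k = 20 and the helper mean_y_near are illustrative choices, not part of the question. With the default bandwidth there may be fewer than six peaks, and the neighbor sets of two close peaks can overlap, so this does not by itself guarantee non-overlapping regions.

```r
set.seed(42)
# Example data, generated the same way as in the question
under_x <- runif(11)
xobs <- as.vector(sapply(under_x, function(m) rnorm(40, m, 0.2)))
xobs <- xobs[0 < xobs & xobs < 1]
yobs <- sapply(xobs, function(x) rnorm(1, mean = (10 * x)^2 - (30 * x) + 3, sd = 6))

# Kernel density estimate of x; a local maximum is a grid point where the
# slope changes from positive to negative
d <- density(xobs)
is_peak <- diff(sign(diff(d$y))) == -2
peak_x <- d$x[which(is_peak) + 1]
peak_height <- d$y[which(is_peak) + 1]

# Keep at most the 6 highest peaks as region centers
n_centers <- min(6, length(peak_x))
centers <- peak_x[order(peak_height, decreasing = TRUE)][seq_len(n_centers)]

# Mean of y over the k observations whose x is closest to a center
k <- 20  # assumed neighborhood size
mean_y_near <- function(center) {
  nearest <- order(abs(xobs - center))[seq_len(k)]
  mean(yobs[nearest])
}
result <- data.frame(center = centers, mean_y = sapply(centers, mean_y_near))
```

A smaller bandwidth, for example density(xobs, adjust = 0.5), yields more and narrower peaks if six are needed.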
I've just started learning R and was wondering: say I have the dataset quakes and I want to generate the probability histogram of quakes near Fiji. Would the code simply be hist(quakes$lat,freq=F)?
A histogram shows the frequency (or, with freq = F, the density) of the values that fall in each bin. You need a numeric vector as the x argument for hist(). There is no flat variable in quakes, but there is a lat variable. hist(quakes$lat, freq = F) would show the following:
This shows the north/south geographical distribution of earthquakes, centering around -20, and, since it is approximately normal (with a left skew) suggests that there is a mechanism for earthquake generation that centers around a specific latitude.
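To see what freq = F changes, it can help to look at the object that hist() returns. The sketch below uses the built-in quakes data; the variable names h and area and the plot labels are just illustrative. It checks that the bar areas (density times bin width) sum to 1, which is what makes the plot a probability histogram.

```r
# Compute the histogram without drawing it, to inspect the bins
h <- hist(quakes$lat, plot = FALSE)

# With freq = FALSE the bar heights are densities, so the bar areas
# (height times bin width) add up to 1
area <- sum(h$density * diff(h$breaks))

# Draw the probability histogram itself
hist(quakes$lat, freq = FALSE,
     main = "Latitude of earthquakes near Fiji", xlab = "Latitude")
```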
The best way to learn is to try. If you wonder if that would be the way to do it, try it.
You might also want to look at this tutorial on creating kernel density plots with ggplot.
I have a time series dataset with spatial data (x,y coordinates). Each point is static in location, but its value varies over time, ie. each point has its own unique function. I want to assign these functions as a mark, so I can plot the point pattern with each individual time series as a plotting symbol.
This is an exploratory step to eventually perform some spatial functional data analysis.
As an example, I want something like Figure 2 published in this article:
*Delicado, P., R. Giraldo, C. Comas, and J. Mateu. 2010. Spatial Functional Data: Some Recent Contributions. Environmetrics 21:224-239.
I'm having trouble posting an image of the figure
1) Working in R with ggplot2, I can plot a line of change in quant of each id over time:
(Fake example dataset, where x and y are Cartesian coordinates, id is an individual observation, and quant is the value for each id in each year):
library(ggplot2)
x<-c(1,1,1,2,2,2,3,3,3)
y<-c(1,1,1,2,2,2,3,3,3)
year<-c(1,2,3,1,2,3,1,2,3)
id<-c("a","a","a","b","b","b","c","c","c")
quant<-c(5,2,4,2,4,2,4,4,6)
allData<-data.frame(x,y,year,id,quant)
ggplot(allData,aes(x=year,y=quant, group=id))+geom_line()
2) Or I can plot the geographic point pattern of id:
ggplot(allData,aes(x=x,y=y,color=id))+geom_point()
I want to plot the graph from (2), but use the line plots from (1) as the point symbols (marks). Any suggestions?
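One way to approximate this, as a sketch rather than a finished solution: rescale year and quant into small offsets around each point's (x, y) location and draw one short line per id at that location. The names rescale01, glyph_size, gx and gy are made up for this example, and base graphics are used for the drawing. The same gx and gy columns could instead be passed to ggplot2 with geom_path() and group = id.

```r
# Same fake data as above
allData <- data.frame(
  x = rep(1:3, each = 3), y = rep(1:3, each = 3),
  year = rep(1:3, times = 3),
  id = rep(c("a", "b", "c"), each = 3),
  quant = c(5, 2, 4, 2, 4, 2, 4, 4, 6)
)

# Rescale a vector to [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Width and height of each small time series, in map units (assumed value)
glyph_size <- 0.4

# Glyph coordinates: location plus a centered, rescaled time/value offset.
# quant is rescaled over all ids so the glyphs share one vertical scale.
allData$gx <- allData$x + (rescale01(allData$year) - 0.5) * glyph_size
allData$gy <- allData$y + (rescale01(allData$quant) - 0.5) * glyph_size

# Empty map, then one line per id drawn at its location
plot(allData$x, allData$y, type = "n", xlab = "x", ylab = "y",
     xlim = range(allData$gx), ylim = range(allData$gy))
for (one_id in unique(allData$id)) {
  s <- allData[allData$id == one_id, ]
  s <- s[order(s$year), ]
  lines(s$gx, s$gy)
}
```

glyph_size should stay smaller than the spacing between neighboring points so the small plots do not run into each other.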
I want to generate a plot in which I'm capable of visualising the distribution of a given variable in space.
Let's say I want to know how the values of frequency bandwidth (Hz) of a bird's song are spread in space.
I tried with scatterplot3d, but I think this is not right.
x<-a vector with easting coordinates
y<-a vector with northing coordinates
z<-a vector with the bandwidth values (in Hz)
Then I do:
scatterplot3d(x,y,z)
Should "z" be a coordinate or can I use it as a vector of values of a given variable?
Thanks in advance!
I did it with scatterplot3d.
scatterplot3d(Hypo$Easting,Hypo$Northing,Hypo$Song_Dur,angle=118,color="red",pch=16,highlight.3d=TRUE,xlab="Easting",ylab="Northing",type="h",lwd=1)
Here Hypo$Easting is x, Hypo$Northing is y, and Hypo$Song_Dur is z (the variable of interest).
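As an alternative view, the third variable does not have to be a spatial coordinate: it can be mapped to color on an ordinary 2D map of the points. The sketch below uses base graphics and simulated data; easting, northing and bandwidth are hypothetical stand-ins for the real vectors, and the choice of 5 color classes is arbitrary.

```r
# Hypothetical data: 30 recording locations with a bandwidth value (Hz)
set.seed(7)
easting <- runif(30, 0, 100)
northing <- runif(30, 0, 100)
bandwidth <- runif(30, 500, 3000)

# Split the variable into 5 equal-width classes (integer codes 1 to 5)
n_col <- 5
col_class <- cut(bandwidth, breaks = n_col, labels = FALSE)

# One color per class; reversed so that higher values are drawn in red
palette_z <- rev(heat.colors(n_col))

plot(easting, northing, pch = 16, cex = 1.5, col = palette_z[col_class],
     xlab = "Easting", ylab = "Northing")
```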
I need to plot a Lorenz curve of a cumulative variable as a function of the number of observations, with both axes displayed as percentages. For example, say the observations are buyers and the y variable is the amount they bought, with buyers already ranked in descending order. I want a plot that says "The top 10% of buyers purchased 90% of the total bought". My dataset has a couple of million observations.
What is the best way to do this? Sub-questions:
If I need to add two variables for the quantiles of total observations and total $ bought (so as to use them to plot), what is the object that returns the row number? I tried:
user_quantile <- row(df)/nrow(df)
but I get a matrix of identical columns (user_quantile.1, user_quantile.2) of which I only need one column.
Is there instead any way to skip adding percentages as variables and only have them for axes values?
The plot has far more points than I need to draw the line. What is the best approach to minimize the computational effort and still get a nice graph?
Thanks.
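A minimal sketch for these sub-questions, on simulated data: seq_len(nrow(df)) gives the row number as a plain vector, cumsum() gives the running total, and the curve can be drawn from a thinned subset of rows. The column names, the log-normal amounts, and the choice of about 1000 plotted points are all assumptions made for the example. Because buyers are ranked in descending order, the curve lies above the diagonal, unlike a conventional Lorenz curve, which ranks in ascending order.

```r
# Simulated purchases, already sorted in descending order as in the question
set.seed(123)
df <- data.frame(amount = sort(rlnorm(1e5, meanlog = 3, sdlog = 1.5),
                               decreasing = TRUE))

# Share of buyers up to each row, and share of the total they bought
df$buyer_share <- seq_len(nrow(df)) / nrow(df)
df$amount_share <- cumsum(df$amount) / sum(df$amount)

# Keep about 1000 evenly spaced rows; plenty to draw a smooth line
keep <- unique(round(seq(1, nrow(df), length.out = 1000)))
thin <- df[keep, ]

# Multiply by 100 only when plotting, so the axes read as percentages
plot(c(0, thin$buyer_share) * 100, c(0, thin$amount_share) * 100,
     type = "l", xlab = "% of buyers", ylab = "% of total bought")
abline(0, 1, lty = 2)  # line of equality
```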
You may want to acquaint yourself with the excellent RSeek search engine for R content. One quick query for Lorentz curve (and Lorenz curve) led to these packages:
ineq: Measuring inequality, concentration, and poverty
reldist: Relative Distribution Methods
GeoXp: Interactive exploratory spatial data analysis
lawstat: An R package for biostatistics, public policy and law
all of which seem to supply a Lorenz curve function.
In order to get the plot done you need first to arrange the raw data.
1) You can use the cut2() function from the Hmisc package to cut the data in quantiles. Check the documentation, it's not hard. It's similar to the cut() from the base package.
2) After using the cut2() function with the income data, you need to compute the frequency of each decile. Use table() for that. Then calculate percentages of income for each decile.
3) Now you should have a very small table with the following columns:
Decile, cumulative % of total income.
Add another column with the 45 degree line. Just add a constant cumulative % of income.
finaltable$cumulative_equality_line = seq(0.1, 1, by = 0.1)
4) You can use base graphics or ggplot2 for plotting. I guess you can do it with the info of step 3 or perhaps check out specific plotting questions.
I'll have to do it soon, but I already have the final table. I'll post the code for plotting once I do it.
Good luck!
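Steps 1 to 3 could look roughly like this. Note the substitutions: the sketch uses base cut() with quantile() breaks instead of Hmisc's cut2(), and tapply() to sum income per decile instead of table(). The income vector is simulated, and the column names other than cumulative_equality_line are made up for the example.

```r
# Simulated income data (an assumption; replace with the real variable)
set.seed(99)
income <- rlnorm(5000, meanlog = 10, sdlog = 1)

# Step 1: cut the data into deciles, lowest incomes in decile 1
breaks <- quantile(income, probs = seq(0, 1, by = 0.1))
decile <- cut(income, breaks = breaks, include.lowest = TRUE, labels = 1:10)

# Step 2: total income per decile
income_by_decile <- tapply(income, decile, sum)

# Step 3: small table with cumulative shares and the 45 degree line
finaltable <- data.frame(
  decile = 1:10,
  cumulative_income_share = as.numeric(cumsum(income_by_decile) / sum(income))
)
finaltable$cumulative_equality_line <- seq(0.1, 1, by = 0.1)

# Step 4: plot the curve against the line of equality
plot(finaltable$cumulative_equality_line, finaltable$cumulative_income_share,
     type = "b", xlab = "Cumulative share of population",
     ylab = "Cumulative share of income")
lines(finaltable$cumulative_equality_line,
      finaltable$cumulative_equality_line, lty = 2)
```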