I'm afraid this is a slightly abstract question, but is there a way to use a metric other than 'count' in hexbin? My code to produce a hexbin chart would be something like:
library(hexbin)
hexbinplot(latitude~longitude,data=X)
Given that my output is essentially a map, I want to replace 'count' and make the colours of the hexbins relate to a third variable at the aggregate level. For example, if I were looking at demographic data, the third variable might be the aggregate or average consumer spending power of all the people belonging to the geographical region defined by each hexbin.
I understand that this would be possible by conditioning on a third variable in two panels in the normal lattice/trellis way, but I'm not sure it is possible to have the chart represent something other than 'count'. Thanks for any help.
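One possible approach, sketched below under the assumption that X has a numeric column such as spend: the hexbin package's hexTapply() can aggregate a third variable per hexagon cell, and ggplot2's stat_summary_hex() produces a ready-made plot of such an aggregate. This is a sketch rather than a drop-in replacement for hexbinplot():
library(hexbin)
library(ggplot2)

# Aggregate a third variable per hexagon with hexTapply()
# (IDs = TRUE is needed so each point remembers which cell it fell into)
hb <- hexbin(X$longitude, X$latitude, IDs = TRUE)
cell_mean <- hexTapply(hb, X$spend, mean)   # mean spend per hexbin cell

# Or let ggplot2 do the binning and the aggregation in one step
ggplot(X, aes(longitude, latitude, z = spend)) +
  stat_summary_hex(fun = mean, bins = 30)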
I am attempting to create a heatmap using a data set that has only one value per coordinate, with that value being a continuous variable. All of the examples I have found using leaflet.extras::addHeatmap() use data that can have multiple values per coordinate, and create the heatmap based on the density of counts in an area. There doesn't seem to be a way to pass a weight instead.
My ultimate goal is to have something that looks like a raster based on these values:
However I don't want to use a raster due to the pixelation along the coasts.
When I pass the data to addHeatmap() and include the argument intensity = ~my_weighted_value, I get something like this:
And at increased zoom levels, it just ends up being a bunch of circles:
What is the proper way to take weighted spatial data and add a heatmap that looks like the raster?
Try scaling my_weighted_value back, by a factor such as 0.00001. Your weighted values appear to be exceeding the heatmap's maximum intensity.
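A minimal sketch of that idea, assuming a data frame pts with columns lon, lat and my_weighted_value (rescaling to the 0-1 range rather than using a fixed factor):
library(leaflet)
library(leaflet.extras)

# Rescale the intensity so it stays within the heatmap's maximum (here max = 1)
leaflet(pts) %>%
  addTiles() %>%
  addHeatmap(lng = ~lon, lat = ~lat,
             intensity = ~my_weighted_value / max(my_weighted_value),
             max = 1, radius = 15, blur = 20)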
I have different time-series corresponding to different individuals and their location within a building (a categorical variable -- more like a room name).
I would like to study the similarity in movement of different individuals by something like cross-recurrence plots, where the two time-series correspond to the two axes and the actual points correspond to the presence/absence of individuals in the same room.
Has anyone tried doing such plots in R or while using ggplot? Any help would be great!
I haven't used this routine myself; I have only used the d2 dimension and the Lyapunov exponent for EEG data. But the TISEAN package (RTisean in your case) has a routine, 'recurr', that returns that specific plot.
This link has a nice wrap-up of tutorials and links.
Edited:
In this link you can find a nice example of an application of a recurrence plot.
You can access the return values of recurr (and of similar functions from other packages) by putting $ after the returned object, just as you would with a data frame,
and you can then use them inside the ggplot() call with the appropriate aes() mapping.
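If all you need is the categorical cross-recurrence plot itself, it can also be built directly with ggplot2. A minimal sketch with made-up room sequences (rooms_a and rooms_b are hypothetical):
library(ggplot2)

# Hypothetical room sequences for two individuals over the same time steps
rooms_a <- c("kitchen", "hall", "office", "office", "hall")
rooms_b <- c("hall", "hall", "office", "kitchen", "hall")

# Cross-recurrence: mark (i, j) when individual A at time i is in the
# same room as individual B at time j
crp <- expand.grid(i = seq_along(rooms_a), j = seq_along(rooms_b))
crp$same_room <- rooms_a[crp$i] == rooms_b[crp$j]

ggplot(crp[crp$same_room, ], aes(i, j)) +
  geom_tile() +
  labs(x = "Time (individual A)", y = "Time (individual B)")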
I am a novice at R and am experimenting with it as an alternative for data visualisation.
I am having trouble creating a stacked bar chart.
I have tried the reshape2 package with the melt function and successfully produced one, but I had to explicitly create a dataset containing JUST the x-axis variable and the variables that I want stacked.
It seems extremely counter-intuitive to me that we can't visualise data from a left to right sense (x-axis constant, y variables summed and overlapping).
Is there an alternate method, where I could simply perform a ggplot with the logic of:
ggplot(data=dataset, aes(x=Time, y1=var1, y2=var2, y3=var3.....)) +
geom_bar(stat="identity",position="stack")
where y1, y2, y3 are the variables I want stacked, but do not have corresponding flags for me to use a "fill=flag" type?
I basically want to work off one large master dataset and export multiple analyses without having to excessively isolate and melt each subset.
In general a stacked bar chart is used to distinguish between variations within a single category of data. For example if you had a bar chart showing the population of three species of migratory fowl that inhabit one specific marsh.
The bars might be mallard ducks, mute swans and Canada geese. Each would have a single whole bar.
The stacking would come in when you looked at these with respect to a trait or quality they might share that you were comparing, such as the number that migrate versus those that overwinter locally. The population of each type of fowl would be split into two stacks within its bar: Canada geese that migrate, Canada geese that don't, and so on.
It is not really meant to bring together disparate traits into a stack.
So, if you have data that separates out categories of the same population, then reshaping it so that the individual types sit in one column and the factor you are differentiating by sits in another (again, all in a single column) is the right move.
If you need to keep it in the wide format for some reason, you could probably construct the stacks with something like y = (x$var1 + x$var2 + x$var3), but depending on the data that might fail miserably. The best thing to do is reshape so that the quantity you are counting is in one column and you compare those members across some other column with stacks.
If you need to use the data in another format later, create a temporary table, plot it, and then remove() it and run gc() after graphing to get your memory back.
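To make the reshaping concrete, here is a minimal sketch using reshape2::melt with hypothetical column names; note that measure.vars lets you melt straight from the master dataset without first creating a cut-down copy:
library(reshape2)
library(ggplot2)

# Hypothetical wide master data: one row per Time, one column per series to stack
wide <- data.frame(Time = 1:3, var1 = c(2, 3, 1), var2 = c(1, 4, 2), var3 = c(3, 1, 2))

# Melt only the columns you want stacked; other columns in the master data are ignored
long <- melt(wide, id.vars = "Time",
             measure.vars = c("var1", "var2", "var3"),
             variable.name = "series", value.name = "value")

ggplot(long, aes(x = Time, y = value, fill = series)) +
  geom_bar(stat = "identity", position = "stack")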
I have two values I wish to plot against each other in Tableau. They are two totals aggregated around the same date. I can get them to the point where they are plotted on a dual axis against the date, like so:
but any attempt to plot them against each other for correlation has come to nothing. I've tried a simple conversion to a scatterplot, using calculated fields, and using a cross tab with subtotals and attempting to plot only the subtotals against each other, all of which have failed. I could do it in Excel, but I have to do it in Tableau.
I have consulted the official Tableau 9.0 guide, google and existing questions on Stack Overflow all to no avail. If I was doing this in BOXI, I could just select the columns and chart them. How do I do the equivalent visualisation in Tableau?
You aren't clear about what type of chart you want to make.
Do you want a scatter plot? If so, put one measure on the row shelf, the other measure on the column shelf, and one or more dimensions (such as your date) on the detail shelf to define how finely to aggregate the data. Check the aggregation functions you use (SUM, AVG) and the aggregation level for your date fields (YEAR, MONTH ...) as desired. You probably want to use the second block of date aggregations on the menu unless you want to group all January data together regardless of year.
If you want a connected scatter plot, set the mark type from automatic to line and move the date field from the detail to the path shelf. You might also then want to put the date on size, color or legend to visually show the direction of time on the line. You might need to change that field to attribute in some cases to avoid creating multiple lines.
Tableau is fantastic once you learn how it works and get a strong understanding of how choices about treating fields as dimensions or measures, or as discrete or continuous, impact its behavior. If you skim over those details, you can still make beautiful charts by following recipes, mimicking examples (and asking Stack Overflow), but Tableau's behavior will seem mysterious and arbitrary.
If you take some time to learn the fundamentals about how Tableau works, it will repay your time investment. I recommend Joshua Milligan's book Learning Tableau for a good way to start, along with the training videos on the Tableau website.
Using the basic plot function (plot.intervals.lmList) from an lme model (called meef1), I produced a massive graph of boxplots. My vector v2andv3commoditycombined has 98 levels.
plot(meef1, v2andv3commoditycombined~resid(.))
I would like to split the plot by the grouping values of my variable v2andv3commoditycombined, so that I can graph them separately, order them, or exclude some. I'm not sure if there is code to do this or if I have to extract information from the lme output. If that is the case, I'm not sure what to extract to create the boxplots, as extracting the residuals returns only one value for each level. If this is impossible, any advice on how to space out the commodity names would be equally helpful.
Thank you.
For each level of v2andv3commoditycombined, what exactly would you like your Y axis and your X axis to be? Since you're splitting the plots by v2andv3commoditycombined, you obviously can't also use that as one of your axes.
Let's pretend you just want to do the traditional thing: residuals on the Y axis and fitted values on the X axis, in a separate plot for each of the 98 levels. You can change the code to plot whatever it is you actually want to plot.
As per ?plot.lme, you would do something like this:
plot(meef1,resid(.,type='pearson',level=1)~fitted(.,level=1)|v2andv3commoditycombined);
Make sure you stretch out your plot window beforehand so that it's nice and big, otherwise you might get an error saying something about margins. The following might produce a better-looking plot:
plot(meef1,resid(.,type='pearson',level=1)~fitted(.,level=1)|v2andv3commoditycombined,pch='.',cex=1.5,abline=0);
Since it wasn't clear from your question, I went ahead and assumed you're interested in the individual-level residuals (i.e. how much each data point differs from the value predicted given its random effects), and that you have one level of nesting in your random formula. If you want population residuals (i.e. how much each data point differs from the average predicted value), change both instances of level to level=0. If you have K levels of nesting, change them to level=K and good luck.
I also assumed you wanted standardized residuals (because you can use the convenient rule of thumb that absolute values greater than 3 are possible outliers, regardless of what scale the original data are on). If not, see ?residuals.lme for other valid options for the type argument.
Oh, and the name of your variables suggests that you're looking at some sort of financial time series. If so, have a look at ACF(meef1) to see if there is a lot of autocorrelation. If there is, you could remedy it by instead fitting a model where the response (Y) variable is the diff(...) of the original variable. If you're seeing really skewed residuals, you might consider log-transforming your response variable before taking the diff.
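A quick sketch of that autocorrelation check with nlme (assuming meef1 is an lme fit, as above):
library(nlme)

# Autocorrelation of the standardized residuals, with approximate critical bands
plot(ACF(meef1, resType = "pearson"), alpha = 0.05)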