Community,
I have a question regarding the STM package for R and hope that you can help me find an answer.
Figure 7 of the vignette shows the prevalence of topic 7 over time. Is it possible to draw the same graph with two further lines, one for liberal and one for conservative?
Liberal and conservative are the levels of the variable "rating".
The plot in figure 7 shows the proportion of a single topic (in this case topic 7) from January 2008 to December 2008, i.e. how that topic is distributed over a specific time frame. You can plot other topics in this graph, so yes, you can add more lines, but you cannot add a variable, like liberal/conservative, to plot in this graph.
Also, you might want to have a look at this -- it would be useful to add a reproducible example for clarity
How to make a great R reproducible example
At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R-function where you can plot every column of a dataframe against every other column like in:
For the dependency between issue.age and duration this plot is actually interesting, because you can clearly see that high issue ages come with shorter policy durations (there is a maximum age for each policy). However, the plots involving the issue year iss.year are much less "visual". In fact you can't see anything from them. I would like to see at one glance whether the distribution of issue ages has changed over the different issue years, something like
where you could see immediately that the average age of newly issued policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?
Try the ggpairs() function from the GGally package. It can visualize columns of all different types, and gives you a lot of control over what is shown in each panel.
For example, here is a snippet from the vignette linked to above:
library(GGally)
data(tips, package = "reshape")
ggpairs(tips)
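For the issue-year case in the question, one dataset-independent trick is to turn columns with only a few distinct values into factors before plotting, so that they are drawn as groups rather than as a numeric axis. The following is only a minimal base R sketch under stated assumptions: the data frame `policies`, its values, the helper `to_factor_if_few()` and the threshold `max_levels` are all hypothetical and not part of GGally.

```r
# Hypothetical portfolio data (made up for illustration only)
policies <- data.frame(
  iss.year  = rep(2014:2016, each = 4),
  issue.age = c(30, 35, 40, 45,  35, 40, 45, 50,  40, 45, 50, 55)
)

# Hypothetical helper: convert every column with at most `max_levels`
# distinct values into a factor, leave all other columns unchanged
to_factor_if_few <- function(df, max_levels = 10) {
  few <- vapply(df, function(col) length(unique(col)) <= max_levels, logical(1))
  df[few] <- lapply(df[few], factor)
  df
}

p2 <- to_factor_if_few(policies, max_levels = 5)

# With a factor on the right-hand side this draws one box per issue year
boxplot(issue.age ~ iss.year, data = p2)

# Pivot-table style summary: mean issue age per issue year
mean_age <- tapply(p2$issue.age, p2$iss.year, mean)
```

The same converted data frame can then be passed to a pairs-style plotting function, so nothing has to be customized per dataset apart from the threshold.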
I recently started to work with a huge dataset provided by a medical emergency
service. I have about 25,000 spatial points of incidents.
I am searching books and internet for quite some time and am getting more and more confused about what to do and how to do it.
The points are, of course, very clustered. I calculated the K, L and G functions
for them, and they confirm serious clustering.
I also have a population point dataset, with one point for every citizen, that is clustered in a similar way to the incidents dataset (incidents happen to people, so there is a strong link between these two datasets).
I want to compare these two datasets to figure out whether they are similarly
distributed. I want to know whether there are places where there are more
incidents relative to the population. In other words, I want to use the population dataset to explain the intensity and then figure out whether the incident dataset corresponds to that intensity. The assumption is that incidents should occur randomly with respect to the population.
I want to get a plot of the region showing where there are more or fewer incidents than expected if the incidents were happening to people at random.
How would you do it with R?
Should I use Kest or Kinhom to calculate the K function?
I read the description, but still don't understand what the basic difference
between them is.
I tried using Kcross, but as far as I can tell, one of the two datasets used
should be CSR (complete spatial randomness).
I also found Kcross.inhom. Should I use that one for my data?
How can I get a plot (image) of incident deviations regarding population?
I hope I asked clearly.
Thank you for your time to read my question and
even more thanks if you can answer any of my questions.
Best regards!
Jernej
I do not have time to answer all your questions in full, but here are some pointers.
DISCLAIMER: I am a coauthor of the spatstat package and the book Spatial Point Patterns: Methodology and Applications with R so I have a preference for using these (and I genuinely believe these are the best tools for your problem).
Conceptual issue: How big is your study region and does it make sense to treat the points as distributed everywhere in the region or are they confined to be on the road network?
For now I will assume they are distributed anywhere.
A simple approach would be to estimate the population density using density.ppp and then fit a Poisson model to the incidents with the population density as the intensity using ppm. This would probably be a reasonable null model, and if it fits the data well you can basically say that incidents happen "completely at random in space when controlling for the uneven population density". More information on density.ppp and ppm is in chapters 6 and 9 of [1], respectively, and of course in the spatstat help files.
If you use summary statistics like the K/L/G/F/J-functions you should always use the inhom versions to take the population density into account. This is covered in chapter 7 of [1].
It could also be interesting to look at the relative risk (relrisk) after combining all your points into a marked point pattern with two types (background and incidents). See chapter 14 of [1].
Unfortunately, only chapters 3, 7 and 9 of [1] are available as free sample chapters, but I hope you have access to the book at your library or have the option of buying it.
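The pointers above could be put together roughly as follows. This is only a sketch on simulated data, not a worked analysis: the window `W`, both simulated patterns, the intensity functions and the bandwidth `sigma = 0.1` are assumptions made up for illustration, and the bandwidth in particular would have to be chosen for the real data.

```r
library(spatstat)

set.seed(42)
W <- square(1)  # hypothetical study region (unit square)

# Hypothetical population: denser towards the left edge of the region
pop <- rpoispp(function(x, y) 3000 * exp(-2 * x), win = W)
# Hypothetical incidents: intensity proportional to the population intensity
incidents <- rpoispp(function(x, y) 150 * exp(-2 * x), win = W)

# Kernel estimate of the population density (sigma is an arbitrary guess)
popdens <- density(pop, sigma = 0.1, positive = TRUE)

# Null model: incident intensity proportional to population density
fit <- ppm(incidents ~ offset(log(popdens)))

# Inhomogeneous K-function, using the fitted intensity
K <- Kinhom(incidents, lambda = predict(fit))
plot(K)

# Relative risk: two-type marked pattern (background vs. incidents)
X <- superimpose(background = pop, incidents = incidents)
rr <- relrisk(X, sigma = 0.1)
plot(rr)  # spatially varying probability that a point is an incident
```

In the relative risk image, areas with values clearly above or below the overall share of incidents are the candidates for "more or fewer incidents than expected".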
I'm still a novice in R. I have read quite a few posts and discussions on how to filter out frequency domains in a time series, but none of them quite matched my problem.
I would like to ask for your suggestions about the following:
I calculated the wavelet coherence for two annually measured time series. Looking at the wavelet coherence PSD graph:
The purple line (i.e. the 8 year period) represents the border under which I would like to filter out the frequency domain, not in the PSD but in the original input data.
I thought about using the butter function from the signal package, but it was overcomplicated for my purposes.
Thus I approached the problem with the bwfilter function of the mFilter package, to pass through only the part of the data above the 8 year period, which corresponds to 2.37E-7 Hz.
library(mFilter)
name="dta OAK.resid Tair "
adat=read.table(file=paste(name,".csv", sep=""), sep=";", header=T)
dta=adat$ya
highpass <- bwfilter(dta, freq=8,drift=FALSE)
plot(highpass)
However, the results do not seem to be correct: too much seems to be filtered out of the data, and the trend follows the original time series too closely.
Do you have any idea what may have gone wrong? The measurement unit maybe?
Any help is appreciated and if any additional details are needed I am happy to provide them!
Thank you!
The data can be found here
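Regarding the unit: for annually sampled data it can be easier to think of the cutoff in cycles per year (1/8 for an 8 year period) than in Hz. The following base R sketch splits a series at that cutoff with a plain FFT mask. It is not what bwfilter does (a Butterworth filter has a smooth roll-off, this mask is a hard cut), and the series `x` with its two components is a hypothetical example, not the data from the question.

```r
# Hypothetical annual series: a 16-year cycle plus a 4-year cycle
n    <- 64
yr   <- seq_len(n)
slow <- sin(2 * pi * yr / 16)
fast <- 0.5 * sin(2 * pi * yr / 4)
x    <- slow + fast

cutoff_period <- 8  # years

# The same cutoff expressed in Hz, for reference (roughly 4e-9 Hz)
cutoff_hz <- 1 / (cutoff_period * 365.25 * 24 * 3600)

# Frequency of each FFT bin in cycles per year, folded at the Nyquist frequency
freqs <- (0:(n - 1)) / n
freqs <- pmin(freqs, 1 - freqs)

# Keep only the bins with periods longer than the cutoff (low-pass part)
keep_low <- freqs < 1 / cutoff_period
low  <- Re(fft(fft(x) * keep_low, inverse = TRUE)) / n
high <- x - low  # everything with a period of 8 years or shorter

plot(yr, x, type = "l", col = "grey")
lines(yr, low, col = "blue")
lines(yr, high, col = "red")
```

Because both test cycles fall exactly on FFT bins here, `low` recovers the 16-year component and `high` the 4-year component; real data will not separate this cleanly.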
I have a problem with clustering time series in R.
I googled a lot and found nothing that fits my problem.
I have made an STL decomposition of several time series.
The trend component is in a matrix with 64 columns, one for every series.
Now I want to cluster these series into similar groups, taking into account both the curve shapes and the time shift. I found some functions that cover one of these aspects but not both.
First I tried to calculate a distance matrix with the DTW distance, so I
found clusters that are based on the values and allow for the time shift, but not on the shape of the time series. After this I tried some correlation-based clustering, but then the time shift
was not recognized and the results do not meet my requirements.
Is there a function that could cover my problem, or do I have to build something
on my own? I am thankful for every kind of help; after two days of tutorials and examples I am totally uninspired. I hope I have explained the problem well enough.
I attached a picture showing some example time series.
There you can see the problem. The two series in the middle are assigned to one cluster,
although the upper one and the bottom one each have the same shape as one of the middle ones.
Have you tried the R package dtwclust?
https://cran.r-project.org/web/packages/dtwclust/index.html
(I'm just starting to explore this package, but it seems like it covers a lot of aspects of time series clustering and it has lots of good references.)
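To illustrate the idea of a distance that looks at shape while tolerating a time shift, here is a small base R sketch. It is not how dtwclust implements its distances; it simply uses one minus the maximum cross-correlation over a range of lags. The four example series, the helper `shift_dist()` and the choice `lag.max = 20` are hypothetical.

```r
# Hypothetical data: two shapes, each appearing twice with a time shift
tt   <- 1:100
bump <- function(center) exp(-((tt - center) / 8)^2)
series <- cbind(
  a1 = bump(30),
  a2 = bump(45),                     # same bump, 15 steps later
  b1 = sin(2 * pi * tt / 25),
  b2 = sin(2 * pi * (tt - 6) / 25)   # same wave, 6 steps later
)

# Hypothetical distance: 1 - maximum cross-correlation within +/- lag.max
shift_dist <- function(x, y, lag.max = 20) {
  1 - max(ccf(x, y, lag.max = lag.max, plot = FALSE)$acf)
}

# Pairwise distance matrix over all columns
n <- ncol(series)
d <- matrix(0, n, n, dimnames = list(colnames(series), colnames(series)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- shift_dist(series[, i], series[, j])
  }
}

# Hierarchical clustering on that distance, cut into two groups
hc     <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)
```

Since the correlation is scale-free, series with the same shape but different levels or amplitudes end up close together, which may or may not be what you want for trend components.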
You can use the kml package. It is designed specifically for longitudinal data. Its help page contains the following example:
library(kml)
### Generation of some data
cld1 <- generateArtificialLongData(25)
### We suspect 3, 4 or 6 clusters, we want 3 redrawing.
### We want to "see" what happen (so printCal and printTraj are TRUE)
kml(cld1,c(3,4,6),3,toPlot='both')
### 4 seems to be the best. We want 7 more redrawing.
### We don't want to see again, we want to get the result as fast as possible.
kml(cld1,4,10)
Example cluster
I just read the "Mining time series data" PDF by Ratanamahatana, Lin, Gunopulos and Keogh. Does anyone know how to visualize time series clusters in R like in Figure 1.7?
You can visualize hundreds of time series sequences with sparklines. If you also want the hierarchical ordering, you could achieve that in 2 steps.
Sort your data.frame of time series sequences by their multi-level clusters. (This assumes that you have computed the cluster hierarchy for each series.)
Download and install the sparkTable package in your R setup. Now plot the sparklines for your time series sequences. Take a look at this Inside-R page for SparkEPS.
This answer on statExchange is exactly what you need for the plotting part, so I am not reproducing the same example here.
Hope that helps.
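If you would rather stay in base R, the two steps can be sketched roughly like this: cluster the series, then draw each one as a small line plot in dendrogram order next to the tree. The six example series are hypothetical, and the rows only line up approximately with the dendrogram leaves, so treat it as a starting point rather than a reproduction of the figure.

```r
set.seed(1)
idx <- 1:50
# Hypothetical data: three shapes, with two noisy copies of each
series <- cbind(
  s1 = sin(idx / 5), s2 = sin(idx / 5) + rnorm(50, sd = 0.1),
  s3 = cos(idx / 3), s4 = cos(idx / 3) + rnorm(50, sd = 0.1),
  s5 = idx / 50,     s6 = idx / 50 + rnorm(50, sd = 0.1)
)

# Step 1: hierarchical clustering of the series (one series per column)
hc <- hclust(dist(t(series)))
n  <- ncol(series)

# Step 2: dendrogram on the left, one small line plot per series on the right
layout(cbind(rep(1, n), 2:(n + 1)), widths = c(1, 2))
par(mar = c(2, 1, 1, 3))
plot(as.dendrogram(hc), horiz = TRUE)
# A horizontal dendrogram draws its leaves bottom to top, so reverse the order
for (i in rev(hc$order)) {
  plot(series[, i], type = "l", axes = FALSE, xlab = "", ylab = "")
}
layout(1)
```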
This figure was most likely made with a drawing program, not with data mining software.
Nobody would run cluster analysis on 6 observations like this. It's easier to look at them visually and do it manually than figuring out how to have a program visualize it this way.