How to plot separate lines taken from a loop in R - r

So I have created a dataframe with data coming from a loop that runs several migration rates over a period of time and outputs the number of people in several categories.
However, when I try plotting this, the output is drawn as a single line instead of several separate ones for the different migration rates.
What would be the easiest way to separate these out?
Thanks!
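A common fix is to reshape the results so that each migration rate is its own series: either one column per rate (base R matplot()) or a long format with a rate column mapped to ggplot2's group/colour aesthetics. A minimal base-R sketch, where the rates and the toy growth formula are made up to stand in for the actual loop:

```r
# Hypothetical stand-in for the question's loop: one population
# trajectory per migration rate (rates and formula are illustrative).
years <- 2000:2010
rates <- c(0.01, 0.05, 0.10)

# Store each run as a column of a matrix instead of one long vector
pop <- sapply(rates, function(r) 1000 * (1 + r)^(years - 2000))
colnames(pop) <- paste0("rate_", rates)

# matplot draws one line per column rather than a single concatenated line
matplot(years, pop, type = "l", lty = 1, col = 1:3,
        xlab = "Year", ylab = "Population")
legend("topleft", legend = colnames(pop), lty = 1, col = 1:3)
```

With ggplot2 the equivalent is to keep the results in long format with a `rate` column and add `aes(group = rate, colour = rate)` so each rate is drawn as its own line.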

Related

Is there a way to plot a variable that can take multiple levels at once with ggplot

I have a data set of events I went to where one variable/column is the person that went with me. Sometimes it is one person, sometimes a group of multiple people.
I'm having trouble plotting this with ggplot. I've tried reformatting the data, but either the list of people is displayed as one cohesive string without the individual people in it, or only the first person in the group is shown.
Is it possible to somehow format data in a way that one variable can take multiple levels at once? Or some package in ggplot that allows that?
Thanks a lot :)
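One way to let a single column carry multiple levels is to split it into one row per event/person pair (long format); ggplot can then count each person separately with geom_bar(). A base-R sketch with hypothetical data (tidyr::separate_rows() does the same split in one call):

```r
# Hypothetical data: one row per event, companions comma-separated
events <- data.frame(
  event  = c("concert", "dinner", "movie"),
  people = c("Ann", "Ann,Bob", "Bob,Cara"),
  stringsAsFactors = FALSE
)

# Expand to one row per event/person pair so each person is its own level
long <- do.call(rbind, lapply(seq_len(nrow(events)), function(i) {
  data.frame(event  = events$event[i],
             person = strsplit(events$people[i], ",")[[1]],
             stringsAsFactors = FALSE)
}))

long
# ggplot(long, aes(person)) + geom_bar() now counts each person,
# including people who appear in several groups
```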

R package for visualizing qualitative data -- coding events that occur over time

I'm fairly new to R, so I was hoping I could be pointed in the right direction.
I am looking for a package that allows me to construct graphs of some qualitative data.
The qualitative data consists of certain coded events that happen during 30-minute chunks of time. The original data are from recorded videos, and the different events that occur have different codes. Ideally, the graphs would be a horizontal band divided into appropriately sized smaller chunks based on how long/when the different coded events occur during the 30-minute period. The idea is to provide a visual of what happens during various 30-minute chunks of time for different individuals. I am open to other types of visuals if they also seem appropriate! Thanks!
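The horizontal-band idea can be sketched in base R with rect(): one band per individual, subdivided by the start/end times of each coded event. All names, codes, and times below are illustrative; ggplot2's geom_rect() gives the same picture with less manual bookkeeping:

```r
# Hypothetical coded events for two individuals in a 30-minute session
seg <- data.frame(
  person = c("P1", "P1", "P1", "P2", "P2"),
  start  = c(0, 10, 22, 0, 15),   # minutes into the session
  end    = c(10, 22, 30, 15, 30),
  code   = c("A", "B", "A", "C", "A")
)

# One horizontal band per person, subdivided and coloured by event code
people <- unique(seg$person)
plot(NULL, xlim = c(0, 30), ylim = c(0.5, length(people) + 0.5),
     xlab = "Minutes", ylab = "", yaxt = "n")
axis(2, at = seq_along(people), labels = people, las = 1)
cols <- setNames(rainbow(length(unique(seg$code))), unique(seg$code))
with(seg, rect(start, match(person, people) - 0.4,
               end,   match(person, people) + 0.4,
               col = cols[code], border = NA))
legend("top", legend = names(cols), fill = cols, horiz = TRUE, bty = "n")
```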

Is there a function to detect individual outliers in longitudinal data in R?

I have a dataset of 5,000 records, and each of those records consists of a series of continuous measurements collected at various times over a decade. Each of the measurements was originally entered manually and, as might be expected, there are a number of errors that need to be corrected.
Typically the incorrect data change by >50% from point to point, while correct data change by at most 10% at any one time. If I visualize the data individually, these errors are very obvious in an X/Y plot with time on the X-axis.
It's not feasible to graph each of these individually, and I'm trying to figure out if there's a faster way to automate and flag the data that are obviously in error and need to be corrected/removed.
Does anyone have any experience with a problem like this?
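Since the errors change by >50% from point to point, a simple automated flag is the relative change between consecutive measurements within each record. A base-R sketch on hypothetical data (note that a single spike flags both the jump up and the jump back down, so the spike itself is the point between two flags):

```r
# Hypothetical longitudinal data, one id per record; the value 400 is a
# deliberate entry error (>50% jump from its neighbours)
dat <- data.frame(
  id    = rep(c("a", "b"), each = 5),
  time  = rep(1:5, 2),
  value = c(100, 105, 400, 102, 98,   50, 52, 51, 53, 50)
)

# TRUE where the relative change from the previous point exceeds 50%
flag_jumps <- function(x, threshold = 0.5) {
  change <- c(NA, abs(diff(x)) / head(x, -1))
  !is.na(change) & change > threshold
}

dat$flag <- unlist(tapply(dat$value, dat$id, flag_jumps))
dat[dat$flag, ]   # only record "a" is flagged, around the bad value
```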

R plot data.frame to get more effective overview of data

At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R function where you can plot every column of a dataframe against every other column, as in plot(data.frame).
For the dependency between issue.age and duration this plot is actually interesting, because you can clearly see that high issue ages come with shorter policy durations (there is a maximum age for each policy). However, the plots involving the issue year iss.year are much less "visual"; in fact you can't see anything from them. I would like to see at one glance whether the distribution of issue ages has changed over the different issue years, something like a chart
where you could see immediately that the average age of newly issued policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?
Try the ggpairs() function from the GGally package. It has a lot of capability for visualizing columns of all different types, and provides a lot of control over what to visualize.
For example, here is a snippet from the GGally vignette:
data(tips, package = "reshape")
ggpairs(tips)

Shiny - Efficient way to use ggplot2 (boxplot) & a 'reactive' subset function

I have a dataset with >1000K rows and 5 columns (material & price being the relevant columns).
I have written a 'reactive' Shiny app which uses ggplot2 to create a boxplot of the price of the various materials.
e.g. the user selects 4-5 materials from a list and Shiny creates a boxplot of the price of each material:
Price spread of: Made of Cotton, Made of Paper, Made of Wood
It also creates a boxplot of the price spread of the combination of all the selected materials, e.g.:
Price spread of: Made of Cotton & Paper & Wood
It is working relatively quickly for the sample dataset (~5000 rows) but I am worried about scaling it effectively.
The dataset is static, so I looked at the following solutions:
1. Calculate the quartile ranges of the various materials (data <- summary(data)) and then use googleVis to create a candlestick chart. However, I run into problems when trying to calculate the material-combination plot: there are over 100 materials, so calculating all the possible combinations offline is not feasible.
2. Calculate the quartile ranges of the various materials (data <- summary(data)) and then create a matrix which stores the row number of the summary data (min, median, max, 1st & 3rd quartile) for each material. I can then use some rough calculations to establish the summary() data for the material-combination plot, and then plot using googleVis. However, I have little experience with this type of calculation in Shiny.
Can anyone suggest the most robust and scalable way to calculate & boxplot reactive subsets using Shiny?
I understand this is a question related to method rather than code, but I am new to the capabilities of R, am still digesting the different class capabilities, and don't want to 'miss a trick', so to speak.
As always thanks!
Please see below for methods reviewed.
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
http://arxiv.org/ftp/arxiv/papers/1203/1203.4157.pdf
Conditionally subsetting and calculating a new variable in dataframe in shiny
If you really have a dataset with more than 1000K rows, i.e. 1M, it is probably in a flat file or a database. You can always do some precalculations, store the result in a database table, and have the Shiny app query that table instead of loading everything into R every time someone opens the app.
I have built several Shiny apps for internal use, and the lesson I have learned is: before you build your app, think carefully about how to minimize the calculations done in R while still delivering the information to the app user. Some of our data is 10 billion+ rows and a Hive query takes more than an hour, so I ended up precalculating the results and putting the job on crontab to update the result table every midnight.
I would prefer maybe your method 2, or storing the precalculation in a MySQL database (perhaps with a Python script updating the table once a day if you need some near-real-time features later).
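The precalculation idea (method 2) can be sketched in base R alone: boxplot(..., plot = FALSE) computes the five boxplot statistics per material once, offline; at request time the app subsets the stored statistics and redraws with bxp(), never touching the raw rows. The data and material names below are made up:

```r
# Hypothetical raw data standing in for the 1M-row table
set.seed(1)
raw <- data.frame(
  material = sample(c("Cotton", "Paper", "Wood"), 10000, replace = TRUE),
  price    = rlnorm(10000, meanlog = 3)
)

# One-off precalculation: keep only the boxplot stats, not the rows
stats <- boxplot(price ~ material, data = raw, plot = FALSE)

# At request time (e.g. inside a Shiny reactive): subset the stored
# stats to the user's selection and draw -- no quantile work on raw data
keep <- stats$names %in% c("Cotton", "Wood")
sub  <- list(stats = stats$stats[, keep, drop = FALSE],
             n     = stats$n[keep],
             conf  = stats$conf[, keep, drop = FALSE],
             out   = numeric(0), group = numeric(0),
             names = stats$names[keep])
bxp(sub)
```

The same split works with a database backend: store `stats$stats` per material in a table, and let the reactive expression read and subset that table.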
