Visualizing time series data using the zoo package - r

I am loading time series data using the read.zoo function. I noticed that when a series is loaded with the zoo package it isn't displayed as a data frame, and when clicked on it is displayed as shown in the picture.
One cannot discern what the data look like from this, whereas data pulled in with read.csv/read.table are labelled as a data.frame and displayed neatly when clicked on. I know I can simply use the View(data) command, but this is cumbersome; I am sorry to be picky, but it would be nice to simply click on the data and have it displayed with the appropriate columns and rows.
I also noticed that when I generate new variables from the data set, they are never attached to the data set they were created from, so I must use the data = merge(data, newvariable) command to combine them with the initial data.
Are there any techniques that can be employed to fix these two issues?
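For reference, here is a minimal sketch of the workflow being described (the file name, column names and the series z are illustrative assumptions, not from the original post):
library(zoo)

# Load a time series; read.zoo passes header/sep on to read.table.
z <- read.zoo("prices.csv", header = TRUE, sep = ",", format = "%Y-%m-%d")

# A derived variable is a separate zoo object, so it has to be merged back:
ret <- diff(log(z$close))
z   <- merge(z, ret)

# A zoo object is not a data.frame; to browse it as rows and columns,
# convert it first (fortify.zoo keeps the index as a column):
View(fortify.zoo(z))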

Related

Using `data()` for time series objects in R

I apologise if this question has been asked already (I haven't been able to find it). I was under the impression that I could access datasets in R using data(), for example, from the datasets package. However, this doesn't work for time series objects. Are there other examples where this is not the case? (And why?)
data("ldeaths") # no dice
ts("ldeaths") # works
(However, this works for data("austres"), which is also a time-series object).
The data function is designed to load package data sets and all their attributes, time series or otherwise.
I think the issue you're having is that there is no stand-alone data set called ldeaths in the datasets package. ldeaths does exist as one of three data sets within the UKLungDeaths data set; the other two are fdeaths and mdeaths.
The following should lazily load all data sets.
data(UKLungDeaths)
Then, typing ldeaths in the console or using it as an argument in some function will load it.
str(ldeaths)
While it is uncommon for package authors to include multiple objects in one data set, it does happen. This line from the data function documentation gives one a 'heads up' about this:
"For each given data set, the first two types (‘.R’ or ‘.r’, and ‘.RData’ or ‘.rda’ files) can create several variables in the load environment, which might all be named differently from the data set"
That is the case here, as while there are three time series objects contained in the data set, not one of them is named UKLungDeaths.
This happens when the package author uses the save function to write multiple R objects to a single external file. In the wild, I've seen folks use the save function to bundle a description file with the data set, although this would not be the proper way to document something in a full-on package. If you're really curious, go read the documentation on the save function.
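For illustration, this is roughly how several objects can end up bundled in one data set file (the file name here is made up):
# Save three separate series into a single .rda file; loading it (or calling
# data(UKLungDeaths) inside a package) creates ldeaths, fdeaths and mdeaths.
save(ldeaths, fdeaths, mdeaths, file = "UKLungDeaths.rda")

load("UKLungDeaths.rda")
str(ldeaths)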
Justin

Is shiny a good solution to display a computationally intensive fixed big dataset?

Here is my problem:
I have a big dataset in R, an object of ~500 MB, that I plot with ggplot2.
There are 20 million numeric values to plot along an integer axis, each associated with a 5-level factor for the colour aesthetic.
I would like to set up a web app where users could visualize this dataset, using different filters based on the factor to display either all the data at once or, for example, the subset corresponding to one level of the factor.
The problem is that rendering the plot takes a long time (~10 minutes).
Solution 1: The best one for the user would be a Shiny UI. But is there a way to have the plot somehow pre-rendered, through ggplot2 or Shiny tricks, so it can be displayed quickly?
Solution 2: Without Shiny, I would produce the different plots of the dataset in advance and build a UI to let users browse the resulting pictures. If I do that I will have to restrict the possible ways of displaying the data.
Looking forward to advice and discussion.
Ideally, you shouldn't really need to plot anything this big. If you're getting the data from a database, then just write a sequence of queries that will aggregate the data on the DB side and drag very little data over to output in Shiny. Seems to be a bad design on your part.
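As a rough illustration of the pre-aggregation idea, done here in plain R with dplyr rather than with database-side queries (the data frame and column names are placeholders):
library(ggplot2)
library(dplyr)

# Stand-in for the real 20-million-row object.
big_df <- data.frame(
  idx = sample(1:1000, 1e5, replace = TRUE),
  grp = factor(sample(letters[1:5], 1e5, replace = TRUE))
)

# Aggregate first (counts per x-bin and factor level), then plot the small summary.
agg <- big_df |>
  mutate(bin = cut(idx, breaks = 50)) |>
  count(bin, grp)

ggplot(agg, aes(bin, n, fill = grp)) + geom_col()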
That being said, the author of highcharter package did work on implementing boost.js module to help with plotting millions of points. https://rpubs.com/jbkunst/highcharter-boost.
Also have a look at the bigvis package, which allows 'Exploratory data analysis for large datasets (10-100 million observations)' and was built by Hadley Wickham: https://github.com/hadley/bigvis. There is a nice presentation about the package at this meetup.
Consider the following procedure:
With ggplot2 you can produce an R object:
plot_2_save <- ggplot()
This object can be saved with
saveRDS(plot_2_save, "file.rds")
and in the Shiny server.R you can load it again:
plot_from_data <- readRDS("path/.../file.rds")
I used this setup for some kind of text classification with a really (really) huge svm model implemented as an application on shiny-server.
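Putting the pieces together, a minimal sketch of that setup (the data, file paths and input names are assumptions, not from the original answer):
library(ggplot2)
library(shiny)

## Run once, offline: build and save one plot object per factor level.
big_df <- data.frame(
  idx   = 1:1000,
  value = rnorm(1000),
  grp   = factor(sample(letters[1:5], 1000, replace = TRUE))
)
dir.create("app_data", showWarnings = FALSE)
for (lvl in levels(big_df$grp)) {
  p <- ggplot(subset(big_df, grp == lvl), aes(idx, value, colour = grp)) +
    geom_point()
  saveRDS(p, file.path("app_data", paste0("plot_", lvl, ".rds")))
}

## In the Shiny app: only read the pre-built object and print it.
ui <- fluidPage(
  selectInput("level", "Factor level", letters[1:5]),
  plotOutput("bigplot")
)
server <- function(input, output, session) {
  output$bigplot <- renderPlot({
    readRDS(file.path("app_data", paste0("plot_", input$level, ".rds")))
  })
}
# shinyApp(ui, server)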

R plot data.frame to get more effective overview of data

At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R function where you can plot every column of a data frame against every other column, as in plot(data.frame):
For the dependency between issue.age and duration this plot is actually interesting, because you can clearly see that high issue ages come with shorter policy durations (there is a maximum age for each policy). However, the plots involving the issue year iss.year are much less "visual"; in fact you can't see anything from them. I would like to see at a glance whether the distribution of issue ages has changed over the different issue years, something like
where you could see immediately that the average age of newly issued policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?
Try the ggpairs() function from the GGally library. It has a lot of capability for visualizing columns of all different types, and provides a lot of control over what to visualize.
For example, here is a snippet from the package vignette:
data(tips, package = "reshape")
ggpairs(tips)
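For a bit more control, ggpairs also takes a column selection and an aesthetic mapping; for example (a sketch using columns from the tips data):
library(GGally)
library(ggplot2)

data(tips, package = "reshape")
# Restrict the plot matrix to a few columns and colour by a factor.
ggpairs(tips,
        columns = c("total_bill", "tip", "sex", "day"),
        mapping = aes(colour = sex))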

reaching max.print on R

I just found a bunch of weather data that I would like to play around with in glmnet in R. First I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R, or just in the file itself? I've tried opening the file in Excel without success. Thanks!
Try using frequency tables; you can group by segments.
str(), summary(), table(), pairs(), plot(), etc. There are several libraries (such as descr) which facilitate analyzing numerical and factor variables. Let me know if you need help with any.
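A few quick options for getting past the printing limit (the object name wx is made up):
# Raise the console printing limit (the default is 99999 entries).
options(max.print = 1e6)

# Or skip printing and inspect the structure / a slice instead.
str(wx)
head(wx, 100)

# Or write the data out and open the file in another tool.
write.csv(wx, "weather_raw.csv", row.names = FALSE)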

Tableau to R connection - script_real returning rounded fraction numbers

I'm pretty new to Tableau but have a lot of experience with R. Every time I use SCRIPT_REAL to call an R function based on Tableau aggregates, I get back a number that seems to be the closest fraction approximation. For example, if raw R gives me .741312, Tableau will spit out .777778, and so on. Does anyone have any experience with this issue?
I'm pretty sure this is an aggregation issue.
From the Tableau and R Integration post by Jonathan Drummey on their community site:
Using Every Row of Data - Disaggregated Data

For accurate results for the R functions, sometimes those R functions need to be called with every row in the underlying data. There are two solutions to this:

Disaggregate the measures using Analysis->Aggregate Measures->Off. This doesn't actually cause the measures to stop their aggregations; instead it tells Tableau to return every row in the data without aggregating by the dimensions on the view (which gives the wanted effect). Using this with R scripts can get the desired results, but can cause problems for views that we want to have R work on the non-aggregated data and then display the data with some level of aggregation.

The second solution deals with this situation: add a dimension such as a unique Row ID to the view, and set the Compute Using (addressing) of the R script to be along that dimension. If we're doing some sort of aggregation with R, then we might need to reduce the number of values returned by filtering them out with something like:

IF FIRST()==0 THEN SCRIPT_REAL('insert R script here') END

If we need to then perform additional aggregations on that data, we can do so with table calculations with the appropriate Compute Usings that take into account the increased level of detail in the view.
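As an R-only illustration of why the numbers change (hypothetical values, nothing to do with Tableau itself): the same statistic computed on every underlying row differs from the one computed on marks that have already been aggregated.
df <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 10, 20, 30),
  y = c(2, 3, 5, 18, 35, 66)
)

cor(df$x, df$y)   # row-level result, what "raw R" returns

agg <- aggregate(cbind(x, y) ~ group, data = df, FUN = sum)
cor(agg$x, agg$y) # result after collapsing to one mark per group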
