How to import data and create a scatter plot in R?

I have 334 records, with two columns:
Column (1): Resolution
Column (2): Number of images with a specific resolution
How can I make a scatter plot in R with this data? Is there a way to import the records, since entering 334 records by hand would be time-consuming?

If you do not know how to get data into R or how to create a scatterplot, it sounds like you are very new to R.
You might want to use a program that lends a hand.
RStudio has a Workspace > Import Dataset menu; I'd recommend RStudio, particularly if you are very new to R.
Rcmdr also has GUI options for getting data into R.
As always, Quick-R provides a helpful starting point:
importing data
scatterplots
More generally, it sounds like you need to spend some time with some introductory instructional material on R. Here are my suggested starting points.
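For readers who prefer to skip the GUI, here is a minimal sketch of the same task in base R. The file contents and the column names `resolution` and `n_images` are made up for illustration; in practice `csv_path` would point at your own file with the 334 records.

```r
# Write a tiny example CSV so the sketch is self-contained.
# The column names and values are hypothetical stand-ins for the real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "resolution,n_images",
  "72,15",
  "96,40",
  "150,22",
  "300,8"
), csv_path)

# Import the records into a data frame.
images <- read.csv(csv_path)

# Scatter plot: resolution on the x axis, image count on the y axis.
plot(images$resolution, images$n_images,
     xlab = "Resolution",
     ylab = "Number of images",
     main = "Images per resolution")
```

If the file uses a different separator, `read.csv` accepts a `sep` argument, and `read.table` covers more unusual layouts.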

Related

R shiny data limitation

I have a question about data size limits in R Shiny. I am working on updating a previous project. The original data is around 5MB, and the program resamples the data to obtain an estimate of future values. I am now updating the program to make it more general, and I am trying to import 300MB of data. However, the Shiny app crashes. I have used R to handle larger data before, but I am not sure whether Shiny has any limit on data size. Does anyone have any idea about this? Thanks.

Is shiny a good solution to display a computationally intensive fixed big dataset?

Here is my problem:
I have a big dataset in R, an object of ~500MB, that I plot with ggplot2.
There are 20 million numeric values to plot along an integer axis, associated with a 5-level factor used for the color aesthetic.
I would like to set up a web app where users can visualize this dataset, using filters based on the factor to display either all the data at once or, for example, the subset corresponding to one level of the factor.
The problem is that rendering the plot takes several minutes (~10 minutes).
Solution 1: The best one for the user would be a Shiny UI. But is there a way to have the plot prebuilt, using ggplot2 or Shiny tricks, so that it can be displayed quickly?
Solution 2: Without Shiny, I would render different plots of the dataset in advance and build my own UI to let users browse the resulting pictures. If I do that, I will have to restrict the ways in which the data can be displayed.
Looking forward to your advice and discussion.
Ideally, you shouldn't need to plot anything this big really. If you're getting the data from a database then just write a sequence of queries that will aggregate the data on the DB side and drag very little data to output in shiny. Seems to be a bad design on your part.
That being said, the author of the highcharter package did work on implementing the boost.js module to help with plotting millions of points: https://rpubs.com/jbkunst/highcharter-boost
Also have a look at the bigvis package, which allows 'Exploratory data analysis for large datasets (10-100 million observations)' and was built by Hadley Wickham: https://github.com/hadley/bigvis. There is a nice presentation about the package at this meetup.
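The "aggregate before plotting" idea can be sketched in base R without any extra package. This is only an illustration under assumptions: the simulated data, the column names `x`, `y` and `grp`, and the choice of 200 bins are all hypothetical, and it is not how bigvis or highcharter work internally.

```r
# Simulate a smaller stand-in for the real 20-million-row dataset.
set.seed(1)
n <- 1e5
d <- data.frame(
  x   = sample.int(1e6, n, replace = TRUE),               # integer axis
  y   = rnorm(n),                                         # numeric values
  grp = factor(sample(letters[1:5], n, replace = TRUE))   # 5-level factor
)

# Assign each point to one of n_bins equal-width bins along x.
n_bins <- 200
d$bin <- cut(d$x, breaks = n_bins, labels = FALSE)

# Reduce to one summary value per (bin, factor level) pair.
small <- aggregate(y ~ bin + grp, data = d, FUN = mean)

# 'small' has at most n_bins * 5 rows, which plots almost instantly.
dim(small)
```

The summary table is what the app would plot or filter by factor level, so the expensive pass over the raw data happens once, outside the interactive session.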
Consider the following procedure:
With ggplot2 you can produce an R object.
plot_2_save <- ggplot()
An object can be saved with
saveRDS(object, "file.rds")
and in the Shiny server.R you can load it again:
plot_from_data <- readRDS("path/.../file.rds")
I used this setup for a kind of text classification with a really (really) huge SVM model, implemented as an application on shiny-server.
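The save-and-reload round trip described above can be sketched as follows. To keep the example free of package dependencies, a plain list stands in for the ggplot object (`saveRDS` accepts a ggplot object in the same way); the object contents and the temporary file location are assumptions for illustration.

```r
# Stand-in for an expensive-to-build object such as a ggplot or a fitted model.
precomputed <- list(
  title = "Precomputed result",
  data  = data.frame(x = 1:3, y = c(2, 4, 6))
)

# Offline step: serialize the object once.
rds_path <- file.path(tempdir(), "file.rds")
saveRDS(precomputed, rds_path)

# App step (e.g. at the top of server.R): load it instead of rebuilding it.
restored <- readRDS(rds_path)

# Check that the restored object matches the original.
all.equal(precomputed, restored)
```

Note that reloading a saved ggplot object avoids rebuilding it, but drawing it still has to happen when it is printed, so this mainly helps when constructing the object is the slow part.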

R monthly plot with GGPlot2 in RMarkdown

I have a CSV file with the following data:
RegistrationDate;User_Id;Items
RegistrationDate has format like '22.05.2014 14:25'
Is there any easy way to connect CSV data to an R Markdown script? All of the examples I've seen use randomly generated data, which is a poor fit for reproducible research.
I need to create 2 plots with ggplot2:
a plot of user count per month
a plot of items collected per month
I've looked at a lot of graphs that come close to this, but haven't found one that fits. It looks like I don't understand something about R plotting :(.
What do you mean when you say "connect" CSV data to an R Markdown script?
Do you mean reading the data?
data <- read.csv("directory/name.csv", sep=";")
Or do you mean recording somewhere that this data is related to this analysis? If the latter, you can check the archivist package, hosted on GitHub, which provides a set of tools for archiving datasets and figures. Its page explains how to install the package.
In question 2, you will need an extra column that specifies the month.
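A minimal sketch of adding the month column and building the two monthly summaries, assuming the semicolon-separated layout and date format given in the question. The three sample rows are invented, and counting distinct `User_Id` values is one possible reading of "users count per month".

```r
# Small sample file in the layout from the question (rows are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "RegistrationDate;User_Id;Items",
  "22.05.2014 14:25;1;3",
  "23.05.2014 09:10;2;5",
  "02.06.2014 18:00;3;2"
), csv_path)

reg <- read.csv(csv_path, sep = ";", stringsAsFactors = FALSE)

# Parse 'dd.mm.yyyy HH:MM' and derive a year-month label.
reg$when  <- as.POSIXct(reg$RegistrationDate,
                        format = "%d.%m.%Y %H:%M", tz = "UTC")
reg$month <- format(reg$when, "%Y-%m")

# Distinct users and total items per month.
users_per_month <- aggregate(User_Id ~ month, data = reg,
                             FUN = function(u) length(unique(u)))
items_per_month <- aggregate(Items ~ month, data = reg, FUN = sum)

# Plot with ggplot2 if it is installed (one bar per month).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(items_per_month,
                       ggplot2::aes(x = month, y = Items)) +
    ggplot2::geom_col()
  print(p)
}
```

In an R Markdown document this code would sit inside an ordinary R chunk, with the path pointing at the real CSV file instead of the temporary one.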

Shiny - Efficient way to use ggplot2(boxplot) & a 'reactive' subset function

I have a dataset with more than 1000K rows and 5 columns (material and price being the relevant columns).
I have written a 'reactive' Shiny app which uses ggplot2 to create a boxplot of the price of the various materials.
E.g. the user selects 4-5 materials from a list and then Shiny creates a boxplot of the price of each material:
Price spread of: Made of Cotton, Made of Paper, Made of Wood
It also creates a material-combination plot showing the price spread for the combination of all the selected materials,
e.g. a boxplot of
Price spread of: Made of Cotton & Paper & Wood
It is working relatively quickly for the sample dataset (~5000 rows) but I am worried about scaling it effectively.
The dataset is static, so I have looked at the following solutions:
Calculate the quartile ranges of the various materials (data <- summary(data)) and then use googleVis to create a candlestick chart. However, I run into problems when trying to calculate the material-combination plot: there are over 100 materials, so calculating all the possible combinations offline is not feasible.
Calculate the quartile ranges of the various materials (data <- summary(data)) and then create a matrix which stores the row number of the summary data (min, median, max, 1st and 3rd quartile) for each material. I can then use some rough calculations to establish the summary() data for the material-combination plot, and then plot using googleVis. However, I have little experience with this type of calculation using Shiny.
Can anyone suggest the most robust and scalable way to calculate & boxplot reactive subsets using Shiny?
I understand this is a question about method rather than code, but I am new to the capabilities of R and am still digesting the different class capabilities, and I don't want to 'miss a trick', so to speak.
As always thanks!
Please see below for methods reviewed.
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
http://arxiv.org/ftp/arxiv/papers/1203/1203.4157.pdf
Conditionally subsetting and calculating a new variable in dataframe in shiny
If you really have a dataset with more than 1000K rows, which is 1M, it is probably in a flat file or in a database. You can always do some precalculations, store the result in a database table, and have the Shiny app query that table instead of loading everything into R every time someone opens your app.
I have built several Shiny apps for internal use, and the lesson I have learned is this: before you build your app, think carefully about how to minimize the calculations R has to do while still delivering the information to the app user. Some of our data is 10 billion+, and a Hive query takes more than 1 hour. I ended up precalculating the results and using a crontab entry to update the result table every midnight.
I would probably prefer your method 2, or storing the precalculated results in a MySQL database. (Maybe a Python script could update the table once a day if you need some real-time feature later.)
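The precalculation idea can be sketched in base R: compute the boxplot statistics once, store them, and let the app draw from the stored summary with `bxp()` instead of touching the raw rows. The simulated prices, the material names, and the file location are assumptions for illustration.

```r
# Simulated stand-in for the large, static price table.
set.seed(42)
prices <- data.frame(
  material = rep(c("Cotton", "Paper", "Wood"), each = 200),
  price    = c(rnorm(200, 20, 3), rnorm(200, 5, 1), rnorm(200, 50, 10))
)

# Offline step: compute five-number summaries per material without plotting.
box_stats <- boxplot(price ~ material, data = prices, plot = FALSE)

# Store the small summary object for the app to read.
stats_path <- file.path(tempdir(), "box_stats.rds")
saveRDS(box_stats, stats_path)

# App step: draw the boxplot directly from the stored statistics.
bxp(readRDS(stats_path), ylab = "Price")
```

This covers the per-material boxes only. The quartiles of a pooled combination of materials cannot be derived exactly from the per-material quartiles, so the combination plot would still need either the raw rows for the selected materials or an approximation.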

Binning NMR data in R

I've imported NMR spectra into R as a .csv file (the first column represents the ppm values, the others the signal intensity for various spectra) and I would like to bin the data, let's say, turn every 5 points into one. Any suggestions?
Cheers,
Marcelo
Marcelo, you can look at ChemoSpec on GitHub here: https://github.com/bryanhanson/ChemoSpec
The function binBuck will do what you ask. There is a fairly complete vignette available once you have the package installed.
To use ChemoSpec, you may have to import your data set differently than you apparently currently have it, or if you have the skills you can modify what you have now. Again, the vignette explains how ChemoSpec stores the data.
Let me know if you need further assistance. Bryan
I know it's an old question, but it can be useful for other users.
You can use the "binning" function from the "prospectr" package in R. You can set "bins" to the number of points in the final spectrum, or "bin.size" to the number of original points that go into each bin.
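For a version with no package dependency, here is a minimal base R sketch that averages every 5 consecutive points. The toy data frame and its column names (`ppm`, `s1`, `s2`) are assumptions, and averaging is only one possible way to combine the points in a bin.

```r
# Toy spectra: first column ppm, remaining columns signal intensities.
spectra <- data.frame(
  ppm = seq(10, 0.5, length.out = 20),
  s1  = 1:20,
  s2  = (1:20) * 2
)

# Group rows 1-5 into bin 0, rows 6-10 into bin 1, and so on.
bin_size <- 5
bin_id <- (seq_len(nrow(spectra)) - 1) %/% bin_size

# Average ppm and every intensity column within each bin.
binned <- aggregate(spectra, by = list(bin = bin_id), FUN = mean)

binned
```

If the number of rows is not a multiple of `bin_size`, the last bin simply averages fewer points.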
