How can I create a histogram with unequal intervals in R

I'm having trouble creating a histogram with unequal breaks in R.
I tried to follow the answers already posted on this site, but I'm a complete beginner at coding and am currently using the options that R offers. I need to create this histogram in this specific program because it's for my college assignment.
What I've done so far: I import the data from an .xlsx file, click on "Graphs" and "Histogram"... and then select "Density" as my y-axis. However, I can't select my intervals.
I really don't know what to do. Here's the matrix:
Here are the translated column titles:
Intervals of BEFi Values
Number of patients at Session 1
Number of patients at Session 5
Number of patients at Session 8
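Since the matrix holds counts per interval rather than raw values, one possible approach is to compute the densities by hand and draw the bars with their true widths. The sketch below is only an illustration: the interval edges and patient counts are made-up placeholders, not the asker's data, and `breaks`, `counts`, `widths` and `density` are hypothetical names.

```r
# Hypothetical interval edges and patient counts for one session
# (placeholders, not the data from the question)
breaks <- c(0, 10, 20, 40, 80)
counts <- c(5, 12, 20, 3)

# Density = share of patients in the interval divided by the interval width,
# so that the bar areas add up to 1
widths  <- diff(breaks)
density <- counts / (sum(counts) * widths)

# Draw bars with their real widths and no gaps between them
labels <- paste(breaks[-length(breaks)], breaks[-1], sep = "-")
barplot(density, width = widths, space = 0, names.arg = labels,
        xlab = "BEFi interval", ylab = "Density")
```

If the raw BEFi values were available instead of counts, `hist(x, breaks = breaks, freq = FALSE)` would draw the same kind of plot directly, provided the breaks span the range of `x`.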

Related

Plot all pairs of variables in R data frame based on column type [duplicate]

I’m fairly sure I saw a package that did this, but I cannot find its name in my notes.
This package produces a plot for each pair of variables in a data frame, but chooses the plot based on the columns’ types. So, two numeric variables would produce a scatterplot. A numeric y and categorical x would produce side-by-side box plots. Like that. It’s this multiple column type ability that distinguishes it from the packages I can find by Googling.
Perhaps I should say that I’m certain I saw it, and didn’t see a bunch of surrounding code with loops or purrr calls iterating over the data, so I’m guessing there was a package that did it.
You're probably thinking of GGally::ggpairs:
library(GGally)
ggpairs(iris)

R newbie- is there a way to separate or filter out items listed in a single cell for plotting purposes?

Problem
R and Stack Overflow newbie here, so please be patient with me. I am currently working on a data.frame that will act as a summary of various modeling approaches used to predict either fall events or fall rates within an in-patient setting, based on a range of hospital, environmental and individual-level variables.
My data is in long format and some studies have several rows (I have created a row for each model type, and some studies built more than one). For some columns (e.g., Model performance) I have multiple entries separated by commas (e.g., C-statistic, Hosmer-Lemeshow test, likelihood ratio, and so forth). My question is: is there a way to separate these so I can create a barplot in ggplot2 that shows the prevalence of the different methods, with one bar per statistic/test type and the height of the bar being the number of times it occurs in the data frame? At the moment this does not work, because some bars have a label that contains all of the values (e.g., "C-statistic, Hosmer-Lemeshow test, likelihood ratio"), which means several bars can contain "C-statistic", for example, because the lists differ slightly.
Screenshots and code
I have attached a screenshot of my data.frame below. The column I refer to is "Statistic.reported"
Screenshot of data frame:
I have also attached an image of what happens when I create a basic barplot with the following code:
Bar <- ggplot(Modelling.Data, aes(x=Statistic.reported)) +geom_bar()+ theme_classic()
Image of plot using current basic code:
Things I have tried
I have tried using the tidyr function separate_rows. My code for this was as follows:
separate_rows(Modelling.Data,Modelling.Data$Statistic.reported, sep = ",")
From this I got an error that said "Can't subset columns that don't exist".
Hopefully, this makes sense, but I'm really new to all of this so if you need anything else please tell me. Any tips or advice would be hugely appreciated! Apologies in advance for my complete lack of knowledge.
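A likely cause of the error is that `separate_rows()` expects column names, so passing `Modelling.Data$Statistic.reported` makes it treat the cell values as names of columns to look up. A minimal sketch of the alternative, assuming the tidyr and ggplot2 packages are installed, is shown below. The small data frame is a made-up stand-in for the real `Modelling.Data`, and `Stats.Long` is a hypothetical name.

```r
library(tidyr)
library(ggplot2)

# Made-up stand-in for the real Modelling.Data
Modelling.Data <- data.frame(
  Study = c("A", "B", "C"),
  Statistic.reported = c(
    "C-statistic, Hosmer-Lemeshow test, likelihood ratio",
    "C-statistic, likelihood ratio",
    "Hosmer-Lemeshow test, C-statistic"
  ),
  stringsAsFactors = FALSE
)

# Pass the bare column name; the regular expression also removes
# any spaces that follow each comma
Stats.Long <- separate_rows(Modelling.Data, Statistic.reported, sep = ",\\s*")

# One bar per statistic, bar height = number of rows reporting it
Bar <- ggplot(Stats.Long, aes(x = Statistic.reported)) +
  geom_bar() +
  theme_classic()
```

Printing `Bar` draws the plot. Note that labels which differ only in capitalisation or spelling would still end up in separate bars, so the entries may need cleaning first.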

Set up and running function for multiple observations and variables

I have a question about setting up a function and applying it to some multivariate data.
My data file is set up in Excel with each variable on its own sheet and each trajectory as a row of data (100 trajectories in total). The values in each row, across 365 columns, are the measurements of the respective variable over time (daily measurements over 1 year).
I’ve done some analysis of 1 trajectory by setting up my data manually in a separate Excel file, where I’ve got 16 columns containing the separate variables and 365 rows containing the data from each daily measurement. I’ve imported this into R as ‘Traj1’ and set up the function as follows:
T1 <- Traj1[,1:16]
multi.fun <- function(T1) {c(summary(T1),sd(T1), skewness(T1), kurtosis(T1), shapiro.test(T1))}
However, I need to do this for 100 trajectories, and repeating it by hand is extremely inefficient (in both R and Excel time).
I’m not sure how best to set this up in R given my initial Excel file layout, or how the function should be written so that I can run it as a batch and export the output to a new Excel file.
Sorry, I am new to programming in general and haven’t had much experience in dealing with large data sets. Any help is really appreciated.
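One way to batch this, sketched below under several assumptions, is to keep the one-sheet-per-variable layout and apply a summary function to every row of every sheet. The workbook is simulated here with random numbers; in practice each sheet could be read with something like `readxl::read_excel()`, which is an assumption about the tooling and is not part of the original question. The names `sheets`, `describe` and `results` are hypothetical, and skewness and kurtosis are computed from plain moment formulas, which may differ slightly from whichever package the original function relied on.

```r
set.seed(1)
n_traj <- 100
n_days <- 365
var_names <- paste0("Var", 1:16)

# Simulated stand-in for the workbook: one 100 x 365 matrix per variable
sheets <- lapply(var_names, function(v) {
  matrix(rnorm(n_traj * n_days), nrow = n_traj)
})
names(sheets) <- var_names

# Summary statistics for a single numeric vector (one trajectory, one variable)
describe <- function(x) {
  m  <- mean(x)
  s2 <- mean((x - m)^2)
  c(mean      = m,
    sd        = sd(x),
    skewness  = mean((x - m)^3) / s2^1.5,
    kurtosis  = mean((x - m)^4) / s2^2,
    shapiro_p = shapiro.test(x)$p.value)
}

# Apply describe() to every row of every sheet and stack the results
results <- do.call(rbind, lapply(var_names, function(v) {
  stats <- t(apply(sheets[[v]], 1, describe))
  data.frame(variable = v, trajectory = seq_len(n_traj), stats)
}))

# Export for use in Excel (uncomment to write the file)
# write.csv(results, "trajectory_summary.csv", row.names = FALSE)
```

The result has one row per trajectory and variable, so it can be filtered or reshaped afterwards without rerunning the statistics.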

Shiny - Efficient way to use ggplot2(boxplot) & a 'reactive' subset function

I have a dataset with > 1000K rows and 5 columns (material and price being the relevant columns).
I have written a 'reactive' Shiny app which uses ggplot2 to create a boxplot of the price of the various materials.
E.g. the user selects 4-5 materials from a list and then Shiny creates a boxplot of the price of each material:
Price spread of: Made of Cotton, Made of Paper, Made of Wood
It also creates a combination plot showing the price spread for the combination of all the selected materials,
e.g. a boxplot of
Price spread of: Made of Cotton & Paper & Wood
It is working relatively quickly for the sample dataset (~5000 rows) but I am worried about scaling it effectively.
The dataset is static, so I looked at the following solutions:
1. Calculate the quartile ranges of the various materials (data <- summary(data)) and then use googleVis to create a candlestick chart. However, I run into problems when trying to calculate the material combination plot: there are over 100 materials, so calculating all the possible combinations offline is not feasible.
2. Calculate the quartile ranges of the various materials (data <- summary(data)) and then create a matrix which stores the row number of the summary data (min, median, max, 1st and 3rd quartile) for each material. I can then use some rough calculations to establish the summary() data for the material combination plot, and then plot using googleVis. However, I have little experience with this type of calculation using Shiny.
Can anyone suggest the most robust and scalable way to calculate & boxplot reactive subsets using Shiny?
I understand this is a question about method rather than code, but I am new to the capabilities of R and am still digesting the different class capabilities, and I don't want to 'miss a trick', so to speak.
As always thanks!
Please see below for methods reviewed.
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
http://arxiv.org/ftp/arxiv/papers/1203/1203.4157.pdf
Conditionally subsetting and calculating a new variable in dataframe in shiny
If you really have a dataset with more than 1000K rows (that is, 1M), it is probably in a flat file or in a database. You can always do some precalculation, store the result in a database table, and have the Shiny app query that table instead of loading everything into R every time someone opens the app.
I have built several Shiny apps for internal use, and the lesson I have learned is that before you build your app you need to think carefully about how to minimize the calculations R has to do while still delivering the information to the app user. Some of our data is 10 billion+ and a Hive query takes more than 1 hour, so I ended up precalculating the result and using a crontab entry to update the result table every midnight.
I would prefer, maybe, your method 2, or storing the precalculation in a MySQL database (maybe with a Python script that updates the table once a day, if you need some real-time feature later).
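The precalculation idea can be sketched in base R as follows. This is an illustration under assumptions, not the answerer's actual setup: the price table is simulated, and `prices`, `box_stats` and `plot_selected` are hypothetical names. The five-number summaries are computed once with `boxplot(..., plot = FALSE)` and then drawn with `bxp()` for whichever materials are selected, so the full table is never touched per request. Outliers are deliberately left out of the stored result, and the combined "Cotton & Paper & Wood" plot is not covered, because its summary cannot be derived exactly from the per-material quartiles.

```r
set.seed(42)

# Simulated stand-in for the static price table
prices <- data.frame(
  material = sample(c("Cotton", "Paper", "Wood", "Steel"), 5000, replace = TRUE),
  price    = rlnorm(5000, meanlog = 3, sdlog = 0.5)
)

# Done once (for example at app start-up or in a nightly job):
# five-number summary per material, without drawing anything
box_stats <- boxplot(price ~ material, data = prices, plot = FALSE)

# Done per request: draw boxes for the selected materials
# from the stored summaries only
plot_selected <- function(stats, selected) {
  keep <- stats$names %in% selected
  bxp(list(stats = stats$stats[, keep, drop = FALSE],
           n     = stats$n[keep],
           conf  = stats$conf[, keep, drop = FALSE],
           out   = numeric(0),   # outliers are not stored in this sketch
           group = numeric(0),
           names = stats$names[keep]),
      ylab = "Price")
}

plot_selected(box_stats, c("Cotton", "Paper", "Wood"))
```

In a Shiny app, the call to `plot_selected()` would sit inside `renderPlot()`, with the selected materials coming from an input control.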

Multiple Bar plot in one graphs in R [duplicate]

Hi,
I'm a beginner to R.
I need to create a graph like
http://i.stack.imgur.com/az56z.jpg
I don't know how to produce it for my entire dataset. The basic idea is that some exon IDs have more than one subgroup, and I need to plot all the values as bars within each exon ID.
How can I do that in R?
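I had to use R in my stats class last semester. For the future, googling "r-code" yields better results; I know that searching for just "r" always gives annoying results.
If you load your dataset into a variable, say:
# read.csv() is part of base R, so no extra package is needed
dataset <- read.csv('blahh.csv')
# barplot() needs a numeric vector or matrix, not a data frame;
# this assumes every column of the file is numeric
barplot(as.matrix(dataset), beside = TRUE, main = "blahh",
        xlab = "blahh")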
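For the specific case of several subgroup values within each exon ID, a grouped bar plot is one option. The sketch below assumes a long table with one row per exon and subgroup; the data and the column names `exon`, `subgroup` and `value` are invented for illustration and are not taken from the question.

```r
# Invented example data: one row per exon ID and subgroup
expr <- data.frame(
  exon     = c("E1", "E1", "E1", "E2", "E2", "E3"),
  subgroup = c("a", "b", "c", "a", "b", "a"),
  value    = c(2.1, 3.4, 1.2, 4.0, 2.5, 3.3)
)

# Reshape to a subgroup x exon table; missing combinations become 0
heights <- xtabs(value ~ subgroup + exon, data = expr)

# beside = TRUE places the subgroup bars next to each other within each exon
barplot(heights, beside = TRUE, legend.text = rownames(heights),
        xlab = "Exon ID", ylab = "Value")
```

Exons with fewer subgroups simply show zero-height bars for the missing ones.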
