How to graph multiple lines in a single plot with ggplot2? - r

I'm working on an R Shiny program that can take any csv file and output graphs of it. The user who uploads the csv has some guidelines on how the data should look, but I don't want them to be too strict.
I'm currently trying to use ggplot2 to graph multiple lines of the same dataset on one plot for comparison.
The data I am currently uploading looks like this (simplified, as the data has over 1000 rows):
Date Hamburgers Salads Sodas Fries
12-01 4 4 3 2
12-02 1 7 3 9
12-03 22 24 45 34
12-04 23 44 46 22
I'm trying to output a graph that has the time on the X-axis (the user chooses this via a sidebar, as he can choose any axis, but time makes the most sense here). For the Y axis, I want 4 lines, colored differently, plotting each variable over time.
I have all of the 'user taking in input and choosing which columns to graph' implemented, but for simplicity's sake, we can assume that for the most part, this has been hard coded (so Y variable will actually be input$y, etc in my implementation)
The portion of my code where I try to graph the data is:
output$plotLine <- renderPlot({
  p <- ggplot(data, aes_string(x=X, y=Y), environment = environment())
  p <- p + geom_point(size = 3)
  p <- p + geom_line(aes(group=1))
  print(p)
})
This plots one of the lines, but I have no idea how to plot the others on the same plot. I've read about using 'group' in the aes function, but this depends on having a classifier in the dataset, which this one currently does not have.
I have also looked into the melt() function from the reshape2 package but am not sure how it would help me (both for the multiple line problem and the greater sense of this project, so that the user doesn't have to abide by strict rules for upload format of the csv).
Any help would be much appreciated!

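Assuming the x-axis variable (Date) is stored in selectedxaxis, the selected products in selectedproducts, and the loaded data in data:
library(reshape2)
library(ggplot2)
selectedxaxis = "Date"
selectedproducts = c("Sodas", "Salads")
# keep only the x-axis column and the selected product columns
widedata = subset(data, select = c(selectedxaxis, selectedproducts))
# reshape to long format: one row per (Date, Product) pair
longdata = melt(widedata, id.vars=selectedxaxis, variable.name='Product', value.name='Count')
# group=Product keeps one line per product even when Date is read as text
ggplot(longdata) + geom_line(aes(Date, Count, color=Product, group=Product))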
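As a worked illustration, the same reshape-and-plot steps can be run on the four sample rows from the question. This is only a sketch: the data frame is typed in by hand, and looking up the x-axis column by name with the `.data[[...]]` pronoun (instead of hard-coding Date) is an assumption about how the Shiny input would be wired in. It needs a reasonably recent ggplot2.

```r
library(reshape2)
library(ggplot2)

# Sample rows copied from the question
data <- data.frame(
  Date       = c("12-01", "12-02", "12-03", "12-04"),
  Hamburgers = c(4, 1, 22, 23),
  Salads     = c(4, 7, 24, 44),
  Sodas      = c(3, 3, 45, 46),
  Fries      = c(2, 9, 34, 22)
)

# In the Shiny app these two values would come from input$...
selectedxaxis    <- "Date"
selectedproducts <- c("Sodas", "Salads")

# Keep only the chosen columns, then go from wide to long format
widedata <- data[, c(selectedxaxis, selectedproducts)]
longdata <- melt(widedata, id.vars = selectedxaxis,
                 variable.name = "Product", value.name = "Count")

# .data[[selectedxaxis]] picks the x column by name;
# group = Product gives one line per product even though Date is text
p <- ggplot(longdata, aes(x = .data[[selectedxaxis]], y = Count,
                          color = Product, group = Product)) +
  geom_line() +
  geom_point(size = 3) +
  labs(x = selectedxaxis)
print(p)
```

Because the product names end up as values of a single Product column, adding or removing a product only changes selectedproducts; the plotting code stays the same.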

Related

Making a line graph with certain X + Y values expressed differently with lines of 33 user IDs in R

I'm trying to put ActivityDate on the X Axis, and Calories on the Y Axis, relating to how 33 different users ranged in their calorie burnings daily. I'm new to ggplot and visualizations as you can tell, so I'd appreciate the most basic solution that I can understand. Thank you so much.
I tried several iterations of this code, and none of them turned out quite right. Here are a couple of my attempts:
##first and foremost:
install.packages("tidyverse")
install.packages("here")
library(tidyverse)
library(here)
Attempt 1 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=Id, color=ActivityDate))
##Not probably the best for stakeholders, but if I could maybe have the bars a little closer together that might help, so I tried to identify the unique IDs. Perhaps the reason why they are so small is that they appear in long number format, and are not sequential, so it could be adding the extra space and making the bars so small because of the spaces of empty sequential numbers.
Attempt 2 Bar Graph
UId <- unique("Id")
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=UId, color=ActivityDate))
##Facepalm, definitely not what I was looking for at all, but that was my effort to solve the above problem.
Attempt 3 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=ActivityDate, fill=Id)) + theme(axis.text.x = element_text(angle=45))
##The fill function does not work, and on the y-axis if you will, I don't know what "count" is referring to in this case, so could be useful except for those two issues.
##Finally, I switch to a line graph
Attempt 4 Line Graph
ggplot(data=trimmed_dactivity) + geom_line(mapping=aes(x=ActivityDate, y=Calories)) + theme(axis.text.x = element_text(angle=45))
##Now what I get is separate lines going up and down, and what I want is 33 separate lines representing unique Id numbers to travel along the x axis for time, and rise in the y axis for calories. Of course I'm not sure how to do that...
Any help with what I'm missing on this journey here?
what I want is 33 separate lines representing unique Id numbers…
It sounds like you want a spaghetti plot. To make one, map Id to color (or to group if you don’t want each id to be colored differently).
library(ggplot2)
ggplot(fakedata, aes(ActivityDate, Calories)) +
  geom_line(aes(color = factor(Id)), show.legend = FALSE)
Example data:
set.seed(13)
fakedata <- expand.grid(
  Id = 1:33,
  ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))
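The group variant mentioned above could look like the sketch below. It reuses the simulated data from this answer; the grey and red colors and the choice to highlight Id 1 are arbitrary illustrations, not part of the original answer.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(13)
fakedata <- expand.grid(
  Id = 1:33,
  ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))

# group = Id draws one line per user without assigning 33 colors
p <- ggplot(fakedata, aes(ActivityDate, Calories, group = Id)) +
  geom_line(color = "grey60", alpha = 0.6) +
  # overlay a single user (Id 1, chosen arbitrarily) in a contrasting color
  geom_line(data = subset(fakedata, Id == 1), color = "firebrick")
print(p)
```

Without group (or color) mapped to Id, geom_line() connects all 330 points as one series, which is what produces the up-and-down zigzag described in the question.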

R: creating a likert scale barplot

I'm new to R and feeling a bit lost ... I'm working on a dataset which contains 7 point-likert-scale answers.
My data looks like this for example:
My goal is to create a barplot which displays the likert scale on the x-lab and frequency on y-lab.
What I understood so far is that I first have to transform my data into a frequency table. For this I used a code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, only using base-R
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
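One plausible explanation for the all-zero table in the question, offered as an assumption because the original data is not shown: if data is a whole data frame rather than a single column, factor() turns each column into one long string, nothing matches the levels 1 to 7, and every count comes out as 0. A small sketch with a hypothetical one-column data frame (the column name Valenz is taken from the other answer in this thread):

```r
# Hypothetical data frame standing in for the asker's data
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# Passing the whole data frame: no value matches levels 1..7, so all counts are 0
wrong <- table(factor(dataset, levels = 1:7))
print(wrong)

# Passing the column itself gives the expected frequencies
right <- table(factor(dataset$Valenz, levels = 1:7))
print(right)

barplot(right, xlab = "Likert score", ylab = "Frequency")
```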
Lorena!
First, there's no need to apply factor() or table() to the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options, hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset$Valenz)
hist() expects a numeric vector, so dataset$Valenz tells it to take the Valenz column from dataset.
If you only want to know the frequency, without having to inform it in some elegant way, that oughta do it (:
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with the scores on the x-axis and the number of times each score occurred on the y-axis.
ggplot(dataset, aes(x = Valenz)) +
  geom_histogram(bins = 7, color = "Black", fill = "White") +
  labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
  theme_minimal()
ggplot() takes the value of your dataframe, then aes() specifies you want Valenz to be in the x-axis.
geom_histogram() gives you a histogram with "bins = 7" (7 options, since it's a likert scale), and the bars with "color = 'Black'" and "fill = 'White'".
labs() specifies the labels that appear beneath the x-axis (x = "Name of my variable") and beside the y-axis (y = "Count of 'Variable'").
theme_minimal() makes the plot look cooler.
I hope I helped you in some way, Lorena. (:
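Since Likert answers are categories rather than continuous measurements, a bar chart of a factor is a possible alternative to the histogram. The sketch below is a swapped-in technique, not the approach of the answer above; the data frame is a hypothetical stand-in and the axis labels are placeholders.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# Treat the scores as categories and declare all seven possible answers
dataset$Valenz <- factor(dataset$Valenz, levels = 1:7)

p <- ggplot(dataset, aes(x = Valenz)) +
  geom_bar(color = "black", fill = "white") +
  scale_x_discrete(drop = FALSE) +  # keep unused scores (1, 2, 7) on the axis
  labs(x = "Likert score", y = "Count") +
  theme_minimal()
print(p)
```

With drop = FALSE the axis always shows all seven answer options, so plots for different items stay comparable even when some scores were never chosen.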

Easy way to view multiple Y variables against same X

I want to visualize many time series at once. I am new at R, and have spent about 6 hours searching the web and reading about how to tackle this relatively simple problem. My dataset has five time points arranged as rows, and 100 columns. I can easily plot any column against the time points with qplot(time, var2, geom="line"). But I want to learn how to do this for a flexible number of columns, and how to print 6 to 12 of the individual graphs on one page.
Here I learned about the multiplot function, got that to work in terms of layout.
What I am stuck on is how to get the list of variables into a FOR statement so I can have one statement to plot all the variables against the same five time points.
This is what I am playing with. It makes 9 plots, 3 columns wide, but I do not know how to get all my variables into the array for yvars.
for (i in 1:9) {
  p1 = qplot(symbol,yvar, geom ="smooth", main = i)
  plots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = plots, cols = 3)
Stupidly on my part right now it makes 9 identical plots. So how do I create the list so the above will cycle through all my columns and make those plots?
First, melt all your data using the reshape2 package:
library(reshape2)
library(ggplot2)
datm <- melt(your.original.data.frame, id.vars = "time")
Now plot it using facets:
qplot(time, value, data = datm, facets= variable ~ ., geom="point")
Let me know if this works. If you could, please upload your data, it would help tremendously.
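For the "6 to 12 graphs on one page" part of the question, the same melt step can be combined with facet_wrap(), which lays the panels out in a grid. This is a sketch on made-up data: the column names var1 to var9 and the random values are assumptions, since the original data set was not posted.

```r
library(reshape2)
library(ggplot2)

# Hypothetical data: 5 time points and 9 series named var1..var9
set.seed(1)
wide <- data.frame(time = 1:5, matrix(rnorm(5 * 9), nrow = 5))
names(wide)[-1] <- paste0("var", 1:9)

# Long format: one row per (time, variable) pair
datm <- melt(wide, id.vars = "time")

# One small panel per variable, 3 panels per row
p <- ggplot(datm, aes(time, value)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 3)
print(p)
```

To plot only some of the columns, subset wide before melting; no loop over variable names is needed.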

Print five point summary values(Min,Q1,Median,Q3,Max) on the boxplot

I am trying to create a simple boxplot with all the labels. I have a dataset about the number of customer visits. It has two columns: Customer ID and AvgVisits.
custID AvgVisits
1 10
2 4
3 12
I want a simple boxplot that is horizontally oriented and displays the five summary points on the graph, with nice colors and axes. I am able to add the heading and make it horizontally oriented, but I am unable to show the summary numbers on the graph itself.
@Henrik's link seems to answer your question. This answer may also be helpful in terms of applying annotation to multiple boxplots on the same graph.
For completeness:
boxplot() will calculate the numbers to plot (the same as fivenum() when there are no outliers), which you can verify by storing the result:
AvgVisits <- c(10,4,12)
b1 <- boxplot(AvgVisits)
b1$stats == fivenum(AvgVisits)
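The comparison above holds for these three values, but the two can differ when the data contains outliers, because the whiskers stop at the most extreme non-outlying points. A short sketch using boxplot.stats(), the function boxplot() relies on for these numbers; the second vector is made up for illustration.

```r
# Without outliers the two agree
AvgVisits <- c(10, 4, 12)
print(all(boxplot.stats(AvgVisits)$stats == fivenum(AvgVisits)))

# Hypothetical data with one extreme value
visits <- c(1, 2, 3, 4, 100)
five   <- fivenum(visits)        # the maximum is 100
bstats <- boxplot.stats(visits)  # the upper whisker stops at 4

print(five)
print(bstats$stats)
print(bstats$out)                # 100 is reported separately as an outlier
```

If the true minimum and maximum must appear on the plot, label the values from fivenum() rather than the stored stats.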
Here's a solution with ggplot2 which you may find appealing. Change the values of aes(x=) to move the position up/down (as the coordinates are already flipped).
library(ggplot2)
# plain vector of the five statistics stored by boxplot()
stats <- as.vector(b1$stats)
q1 <- qplot(x=1, stats, geom = "boxplot")
# opts() and theme_blank() were removed from ggplot2;
# theme() and element_blank() replace them
q1 + coord_flip() +
  geom_text(aes(x=1.1,y=stats,label=stats)) +
  theme(
    axis.text.x=element_blank(),
    axis.text.y=element_blank(),
    axis.title.x=element_blank(),
    axis.title.y=element_blank()
  )
Giving:
Use the text() command, with the format text(location, "print this text", pos). pos should be one of the following: 1=below, 2=left, 3=above, 4=right. If you need further assistance please include the code you have so far. More here: http://www.statmethods.net/advgraphs/axes.html
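A base-R sketch of the text() approach for the three sample values from the question. The color, title, box width (boxwex) and label height (y = 1.35) are hand-picked assumptions and may need adjusting for other data.

```r
AvgVisits <- c(10, 4, 12)

# Draw a horizontal boxplot and keep the computed statistics
b1 <- boxplot(AvgVisits, horizontal = TRUE, col = "lightblue",
              boxwex = 0.4, main = "Average visits per customer",
              xlab = "AvgVisits")

# With horizontal = TRUE the data runs along x and the single box sits at y = 1;
# y = 1.35 is a hand-picked height meant to place the labels above the box
stats <- as.vector(b1$stats)
text(x = stats, y = 1.35, labels = stats)
```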

Why do geom_line() and geom_freqpoly() give back different graphs?

I am trying to get my head around ggplot2, which creates beautiful graphs as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing plots for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown (which seems to be missing completely in this graph)).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a different colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a different colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot, when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), as the breaks are off. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and goes until 39.5.
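The reason alternative 1 keeps its lines at zero can be reproduced on a toy data set. In this sketch (the cities and types are made up), table() enumerates every city/type combination, including those that never occur, so geom_line() has a point to draw for each city.

```r
library(ggplot2)

# Made-up transactions: only "Residential" occurs in every city
sales <- data.frame(
  city = c("A", "A", "B", "B", "C"),
  type = c("Condo", "Residential", "Residential", "Residential", "Residential")
)

# table() keeps combinations that never occur, with a count of 0
counts <- as.data.frame(table(city = sales$city, type = sales$type),
                        responseName = "qty")
print(counts)

# One line per type; the Condo line drops to 0 for cities B and C
p <- ggplot(counts, aes(x = city, y = qty, group = type, color = type)) +
  geom_line()
print(p)
```

Counting first and plotting second keeps the zero rows explicit in the data, at the cost of one extra step compared with letting a stat do the counting.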
