How do I do a Barplot of already tabled data? - r
I have input data with very many observations per day.
I can make the barplot I want with 1 day by using table().
If I try to load more than 1 day and call table() I run out of memory.
I'm trying to table() each day and concatenate the tables into totals I can then barplot later. But I just cannot work out how to take the already tabled data and barplot each day as a stacked column.
After looping and consolidating I end up with something like this: 2 days of observations. (the Freq column is the default from the previous table() calls)
What is the best way to do a stacked barplot when my data ends up like this?
> data.frame(CLIENT=c("Mr Fluffy","Peppa Pig","Mr Fluffy","Dr Who"), Freq=c(18414000,9000000,7000000,15000000), DAY=c("2011-11-03","2011-11-03","2011-11-04","2011-11-04"))
CLIENT Freq DAY
1 Mr Fluffy 18414000 2011-11-03
2 Peppa Pig 9000000 2011-11-03
3 Mr Fluffy 7000000 2011-11-04
4 Dr Who 15000000 2011-11-04
>
> # What should I put here?
I'm assuming that you are using base graphics since you mention barplot. Here is an approach using that:
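# dat is the data frame shown in the question
wide <- reshape(dat, idvar="CLIENT", timevar="DAY", direction="wide")
# client/day combinations with no observations come out as NA; set them to 0,
# otherwise the stacked segments above an NA are not drawn
wide[is.na(wide)] <- 0
barplot(as.matrix(wide[-1]), beside=FALSE)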
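As a possible alternative in base R, xtabs() can sum an existing Freq column directly into a CLIENT by DAY matrix, filling absent combinations with 0. This is only a sketch: the name dat and the axis labels are assumptions, and the data is the sample from the question.

```r
# Sample data from the question; the name `dat` is an assumption.
dat <- data.frame(
  CLIENT = c("Mr Fluffy", "Peppa Pig", "Mr Fluffy", "Dr Who"),
  Freq   = c(18414000, 9000000, 7000000, 15000000),
  DAY    = c("2011-11-03", "2011-11-03", "2011-11-04", "2011-11-04")
)

# Sum Freq for every CLIENT/DAY pair; pairs that never occur become 0.
counts <- xtabs(Freq ~ CLIENT + DAY, data = dat)

# One stacked column per day, one segment per client.
barplot(counts, legend.text = TRUE, xlab = "Day", ylab = "Observations")
```

Because xtabs() adds up Freq, the per-day tables can be stacked with rbind() first and tabulated once at the end.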
Alternatively, using ggplot2:
library("ggplot2")
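ggplot(dat, aes(x=DAY, y=Freq)) +
  # the heights are already counted, so use stat="identity" rather than the default count
  geom_bar(aes(fill=CLIENT), stat="identity", position="stack")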
Try ggplot2:
ggplot(df,aes(DAY,fill=CLIENT,weight=Freq))+geom_bar()
Shamelessly ripped from here:
http://had.co.nz/ggplot2/geom_bar.html
Related
ggplot2 stacked bar graph using rows as datapoints [duplicate]
This question already has an answer here: Grouping & Visualizing cumulative features in R (1 answer)
I have a set of data that I would like to plot like this:
Now this is plotted using LibreOffice Calc in Ubuntu. I have tried to do this in R using the following code:
ggplot(DATA, aes(x="Samples", y="Count", fill=factor(Sample1)))+geom_bar(stat="identity")
This does not give me a stacked bar graph for each sample, but rather one single graph. I have had a similar question, that used a different dataframe, that was answered here. However, in this problem I don't have just one sample, but information for at least three. In LibreOffice Calc or Excel I can choose the stacked bar graph option and then choose to use rows as the data series. How can I achieve a similar graph in ggplot2?
Here is the dataframe/object for which I am trying to produce the graph:
Aminoacid Sequence,Sample1,Sample2,Sample3
Sequence 1,16,10,33
Sequence 2,2,2,7
Sequence 3,1,1,6
Sequence 4,4,1,1
Sequence 5,1,2,4
Sequence 6,4,3,14
Sequence 7,2,2,2
Sequence 8,8,5,12
Sequence 9,1,3,17
Sequence 10,7,1,4
Sequence 11,1,1,1
Sequence 12,1,1,2
Sequence 13,1,1,1
Sequence 14,1,2,2
Sequence 15,5,4,7
Sequence 16,3,1,8
Sequence 17,7,5,20
Sequence 18,3,3,21
Sequence 19,2,1,5
Sequence 20,1,1,1
Sequence 21,2,2,5
Sequence 22,1,1,3
Sequence 23,4,2,9
Sequence 24,2,1,1
Sequence 25,4,4,3
Sequence 26,4,1,3
I copied the content of a .csv file, is that reproducible enough? It worked for me to just use read.csv(.file) in R.
Edit: Thank you for redirecting me to another post with a very similar problem, I did not find that before. That post brought me a lot closer to the solution.
I had to change the code just a little to fit my problem, but here is the solution:
library(reshape2)
library(ggplot2)
df <- read.csv("example.csv")
df2 <- melt(df, id="Aminoacid.Sequence")
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence))+geom_bar(stat="identity")
Using variable on the x-axis makes a bar for each sample (Sample1-Sample3 in the example). Using y=value puts the value in each cell for that sample on the y-axis. And most importantly, using fill=Aminoacid.Sequence stacks the values for each sequence on top of each other, giving me the same graph as seen in the screenshot above! Thank you for your help!
Try something along the following lines:
library(reshape2)
df <- melt(DATA) # you probably need to adjust the id.vars here...
ggplot(df, aes(x=variable, y=value)) + geom_bar(stat="identity")
Note that you need to adjust the ggplot and the melt code somewhat, but since you haven't provided sample data, no one can provide the actual code necessary. The above provides the basic approach on how to deal with these multiple columns representing your samples, though. melt will "stack" the columns on top of each other, and create a column with the old variable name. This you can then use as x for ggplot. Note that if you have other data in the data frame as well, melt will also stack these. For that reason you will need to adjust the commands to fit your data.
Edit: using your data:
library(reshape2)
library(ggplot2)
### reading your data:
# df <- read.table(file="clipboard", header=T, sep=",")
df2 <- melt(df)
head(df2)
  Aminoacid.Sequence variable value
1         Sequence 1   preDLI    16
2         Sequence 2   preDLI     2
3         Sequence 3   preDLI     1
4         Sequence 4   preDLI     4
5         Sequence 5   preDLI     1
6         Sequence 6   preDLI     4
This can be used as in:
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence)) + geom_bar(stat="identity")
I am sure you want to change some details about the graph, such as the colors etc., but this should answer your initial question.
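For readers without reshape2, the same wide-to-long step can be sketched in base R with stack(). The data frame name wide and the three-row subset of the posted CSV are assumptions made to keep the example short; stack() names its output columns values and ind rather than value and variable.

```r
# First three rows of the posted data; `wide` is a hypothetical name.
wide <- data.frame(
  Aminoacid.Sequence = c("Sequence 1", "Sequence 2", "Sequence 3"),
  Sample1 = c(16, 2, 1),
  Sample2 = c(10, 2, 1),
  Sample3 = c(33, 7, 6)
)

# stack() puts the sample columns on top of each other:
# `values` holds the counts, `ind` holds the original column name.
samples <- stack(wide[c("Sample1", "Sample2", "Sample3")])
long <- data.frame(
  Aminoacid.Sequence = rep(wide$Aminoacid.Sequence, times = 3),
  samples
)

# One stacked column per sample, one segment per sequence.
barplot(xtabs(values ~ Aminoacid.Sequence + ind, data = long), legend.text = TRUE)
```

The long data frame has one row per sequence and sample, which is the shape both geom_bar(stat="identity") and xtabs() expect.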
How to generate a plot for reported values and missing values in R - timeseries
Hi, I am using R to analyze my data. I have time-series data in the following format:
dates       ID
2008-02-12  3
2008-03-12  3
2008-05-12  3
2008-09-12  3
2008-02-12  8
2008-04-12  6
I would like to create a plot with dates on the x axis and ID on the y axis, such that it draws a point if an ID is reported for that date and nothing if there is no data for it. In the original dataset I only have an ID if the value is reported on that date. For example, for 2008-02-12 there is no data reported for ID 6, hence it is missing in my dataset.
I was able to get all the dates with the unique(df$dates) function, but I don't know enough about R data structures to loop through the data, build a 1/0 matrix for all IDs, and then plot it. I would be grateful if you could help me with the code or give me some pointers on an effective way to approach this problem. Thanks in advance.
It seems you want something like a scatter-plot:
# input data
DF <- read.csv(
text=
'Year,ID
2008-02-12,3
2008-03-12,3
2008-05-12,3
2008-09-12,3
2008-02-12,8
2008-04-12,6',
colClasses=c('character','integer'))
# convert first column from characters to dates
DF$Year <- as.POSIXct(DF$Year,format='%Y-%m-%d',tz='GMT')
# scatter plot
plot(x=DF$Year,y=DF$ID,type='p',xlab='Date',ylab='ID',
     main='Reported Values',pch=19,col='red')
Result:
But this approach has a problem. For example, if you have unique(ids) = c(1,2,1000), the space on the y axis between id=2 and id=1000 will be very big (the same holds for the dates on the x axis). Maybe you want a sort of id-by-date "map", like the following:
# input data
DF <- read.csv(
text=
'Year,ID
2008-02-12,3
2008-03-12,3
2008-05-12,3
2008-09-12,3
2008-02-12,8
2008-04-12,6',
colClasses=c('character','integer'))
dates <- as.factor(DF$Year)
ids <- as.factor(DF$ID)
plot(x=as.integer(dates),y=as.integer(ids),type="p",
     xlim=c(0.5,length(levels(dates))+0.5),
     ylim=c(0.5,length(levels(ids))+0.5),
     xaxs="i", yaxs="i",
     xaxt="n",yaxt="n",main="Reported Values",
     xlab="Date",ylab="ID",pch=19,col='red')
axis(1,at=1:length(levels(dates)),labels=levels(dates))
axis(2,at=1:length(levels(ids)),labels=levels(ids))
# add grid
abline(v=(1:(length(levels(dates))-1))+0.5,col="Gray80",lty=2)
abline(h=(1:(length(levels(ids))-1))+0.5,col="Gray80",lty=2)
Result:
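The question also asks for a 1/0 matrix of IDs against dates. A minimal sketch of one way to build it, assuming the sample data above, is to cross-tabulate with table() and test for non-zero counts; the name presence and the choice of image() for drawing are assumptions, not part of the answer above.

```r
# Sample data from the question.
DF <- data.frame(
  dates = c("2008-02-12", "2008-03-12", "2008-05-12",
            "2008-09-12", "2008-02-12", "2008-04-12"),
  ID    = c(3, 3, 3, 3, 8, 6)
)

# Rows are IDs, columns are dates; 1 where a value was reported, 0 otherwise.
presence <- (unclass(table(ID = DF$ID, date = DF$dates)) > 0) * 1L

# Draw the matrix as a grid: image() wants z indexed as [x, y], hence t().
image(x = seq_len(ncol(presence)), y = seq_len(nrow(presence)),
      z = t(presence), axes = FALSE, xlab = "Date", ylab = "ID")
axis(1, at = seq_len(ncol(presence)), labels = colnames(presence))
axis(2, at = seq_len(nrow(presence)), labels = rownames(presence))
```

No explicit loop is needed: table() counts every ID/date pair, including the pairs that never occur.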
Choose which factor levels to plot
I'm wondering what the best way is to eliminate certain factors from a plot in ggplot. I have data that looks something like:
Group  Time  Freq
A      1     5000
B      1     70
C      1     60
...
I'm then using geom_path to plot how these frequencies change over time. For all of the time periods, group A has far more observations than the other groups, so I'd like to create some graphs that do not include group A. I'm wondering what the best way to do that is. Is there something I can pass to ggplot?
Simplest thing is to filter the dataframe:
df[df$Group != "A",]
Or
subset(df, Group != "A")
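A small sketch of the filtering step, assuming a data frame shaped like the one in the question (the values beyond the first three rows are made up for illustration). If Group is a factor, the level "A" survives the filter even though no rows use it; droplevels() removes it, which matters mostly for functions such as table() or levels().

```r
# Illustrative data; the Time == 2 rows are invented for the example.
df <- data.frame(
  Group = factor(c("A", "B", "C", "A", "B", "C")),
  Time  = c(1, 1, 1, 2, 2, 2),
  Freq  = c(5000, 70, 60, 5200, 80, 65)
)

# Keep everything except group A, then discard the now-unused level.
small <- droplevels(df[df$Group != "A", ])

levels(small$Group)
```

The filtered data frame can then be passed to ggplot() in place of the full one.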
Correlation for subsets at a time
Maybe this question has already been posted, but I couldn't find anything that helps me. I have a data frame, which is a time series of 40 years with 4 columns: the first is the year (numbers), the second is the month (numbers from 1 to 12), and the third and fourth are the precipitation for place1 and place2. I would like to make a correlation analysis using cor() for the precipitation of place1 and place2, but would like to do it for every 5 years at a time. Also, in the series I have NA values. Is there a way of doing this? Here's some sample data:
year<-rep(1940:1959, each = 12)
month<-rep(1:12,20)
place1<-c(14.7,26.3,10.2,132.4,286.3,158.2,72,99.5,217.6,267.9,80.3,NA,38.9,20.9,29.1,312.2,110.1,245,163.2,38.3,251.3,95.3,89.4,13.5,13.3,49.1,26.9,105.6,188.7,186.1,140.5,241.6,143.2,156.9,37.4,29.8,19.6,27.3,80.7,102.9,222.5,88.4,59.1,107.3,119.5,451.2,52.2,0,14.3,7.9,55.4,31.1,152.2,190.7,251,200.2,158.7,93,44.3,40.3,18.6,15.2,11.4,110.3,377.9,42.3,68.2,289.5,219.7,133.2,114.4,115.2,15.3,14,86.7,66.1,204.1,33.9,51.8,83,238.8,231.4,70.6,41.7,99.5,176.4,1.3,63,238.2,48.6,82.6,66.9,257,141.4,14.5,35.5,28.6,32.5,1.3,50.7,300.8,74.1,110.9,64.8,128,309.9,71.1,22.6,2.5,2.3,57.6,24.4,171.9,91,116.3,224.3,123.5,149.1,17.8,26,62.8,47.1,9.6,38.1,72.2,141.2,52.2,110.7,246.6,330.5,8.6,38.6,57.5,26.7,0,210,601.2,79.4,166.2,128.8,133.5,81.8,42,30.4,12.5,20.3,27.7,191.6,223.6,63.5,175.3,42.3,277.9,60.9,26.5,9.7,59.7,9.4,40.5,70.1,307.1,163.5,230,51.8,160.4,115.9,54.4,25.3,15.3,67.6,77.9,108.8,283.5,297.2,99.9,103.4,277.4,474.6,91.8,23.9,43.4,12.7,3,179.5,259.4,154.3,201.1,363.3,253.7,257.9,38.2,71.3,29.5,95.1,128.2,36.7,137.8,182.6,85.8,23.6,48.7,218.1,30.4,42.3,35,43.9,30,58.2,139.2,99,39.6,13.9,152.6,117.6,39,25.9,169.6,31.2,63.1,124.2,377.4,279.8,168.2,100,191.9,108.6,55.2,27.7,16,8.1,5.6,75.7,38.8,131.7,131,135.9,97.4,188.9,304.8,34.6)
place2<-c(5.4,18,0,19,111.5,30.6,39.2,178.8,77.3,292.5,28,21,45.9,31.5,16.5,54.9,117.8,270.2,131.6,45.5,248.6,55.5,32.5,16.3,42.9,18,19.4,112.4,77,315.8,71.9,201.8,37.3,84.8,25.4,10.6,31.3,12.1,54.1,112.4,122.4,44.4,55.6,160.3,81,257.1,65.8,3.8,11.9,10.7,16.5,51.9,81.4,142,321.5,251.7,144.4,97.6,3,1.8,11.1,16.6,13.9,41.7,218,55.7,50.6,159.8,94,57.9,48.1,121.8,8.6,3.3,64.2,21.8,169.8,55.9,26.4,79,77.5,75.5,67.1,41.9,40.9,132.4,37.3,93.7,67.1,128,52.6,17.2,184.9,97.6,4.3,15.2,21.1,39.9,1.5,53.3,89.4,43,97.7,55.1,232.3,27.9,118.2,5.1,0,4.3,66.1,9.2,122.1,191.4,81.1,80.4,79.8,112.9,51.5,13.9,14,21.3,42,16.7,261.1,287,26.1,134.1,106.3,205.1,29.5,1.5,5.9,14.5,1,219.1,451.3,107,213.6,48.2,92.4,105.2,11.5,6.9,3,13.7,44.5,61.2,99.3,95.7,193.4,13.2,217.1,87.8,11.2,3,75.7,5.3,0,31.1,167.8,198.2,42.2,121.6,180.2,121.9,31.3,22.8,31.9,25.5,69.9,19.4,109.6,179.2,73.2,198.6,425,612.1,26.8,3,71.4,34.9,7.1,8.8,69.8,227.7,86.6,88.7,126,195.4,13.5,36.6,1,80.5,23.4,24.1,31.4,139.5,68.6,53.6,40,232.9,77,32.2,21.1,23.1,9.1,15.3,48.6,140.2,50.8,55.8,59.6,46.2,10.2,18.3,105.9,11.1,0,46.3,307.7,110.2,294.2,200.5,74.3,147.9,30.9,31,67.9,15.8,30.1,56.1,128,25.9,119.2,41.1,56.2,235.4,22.9,10.8)
data<-data.frame(year,month,place1,place2)
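# right=FALSE gives the groups [1940,1945), [1945,1950), ...; with the default
# right=TRUE, 1940 falls outside the first interval and its group becomes NA
data$year.group <- cut(data$year,seq(1940,1960,by=5),right=FALSE)
lapply(unique(data$year.group), function(x) with(data[data$year.group==x,], cor(place1,place2,use='pairwise.complete.obs')))
Alternatively, if you want to extend this to multiple columns, try this (place3 stands for any additional column; it is not in the sample data):
lapply(unique(data$year.group), function(x) cor(data[data$year.group==x,c('place1','place2','place3')], use='pairwise.complete.obs'))
(and change the use option as appropriate to what you want)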
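The same grouping can also be sketched with split() and sapply(), which returns a vector named after each 5-year group. The data below is randomly generated stand-in data, not the precipitation series from the question, and the names d and cors are assumptions.

```r
# Stand-in data with the same layout as the question (values are random).
set.seed(1)
d <- data.frame(
  year   = rep(1940:1959, each = 12),
  month  = rep(1:12, 20),
  place1 = runif(240, 0, 300)
)
d$place2 <- 0.5 * d$place1 + runif(240, 0, 50)
d$place1[12] <- NA  # one missing value, as in the original series

# Half-open intervals [1940,1945), [1945,1950), ... so 1940 is included.
d$year.group <- cut(d$year, breaks = seq(1940, 1960, by = 5), right = FALSE)

# One correlation per group, ignoring rows where either place is NA.
cors <- sapply(split(d, d$year.group), function(g) {
  cor(g$place1, g$place2, use = "complete.obs")
})
cors
```

split() cuts the data frame into one piece per factor level, so the subsetting with == inside lapply() is no longer needed.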
R preserving row order ggplot2 geom_tile
I am trying to plot some categorical data and this answer is very close to what I am trying to do; however, in my case I have dates in place of countries, as seen in this example. How can I create the plot with the original row order from the data.frame? It appears that even though the factors are in the same order in dat and melt.data, they are not ordered sequentially on the y axis in the plot. Here is a reproducible example:
library(reshape)
library(ggplot2)
dat <- data.frame(dates=c("01/01/2002", "09/15/2003", "05/31/2012"), Germany = c(0,1,0), Italy = c(1,0,0))
melt.data<-melt(dat, id.vars="dates", variable_name="country")
qplot(data=melt.data, x=country, y=dates, fill=factor(value), geom="tile")
Your problem is that the dates column is stored as a character string. See str(dat) for the structure of the data. By adding
dat$dates <- as.Date(dat$dates,"%m/%d/%Y")
after loading dat, you can get the dates in the original order.
Your problem is that dat$dates is a factor, and by default R has sorted the levels lexicographically. R does not know they are dates. So
levels(dat$dates)
## [1] "01/01/2002" "05/31/2012" "09/15/2003"
and therefore
order(dat$dates)
## [1] 1 3 2
If you want R to treat these as dates, then you can convert them to a Date column:
dat$dates <- as.Date(as.character(dat$dates), format = '%m/%d/%Y')
# now
order(dat$dates)
## [1] 1 2 3
Which is what you want.
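The effect of the conversion can be checked with a short sketch on the data from the question. The names before, after and row.order are assumptions; the last step shows a different technique, a factor whose levels follow the row order, for cases where the rows are not already in date order.

```r
dat <- data.frame(
  dates   = c("01/01/2002", "09/15/2003", "05/31/2012"),
  Germany = c(0, 1, 0),
  Italy   = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# As text, "05/31/2012" sorts before "09/15/2003".
before <- order(dat$dates)

# Keep the labels in row order as factor levels (alternative to Date conversion).
row.order <- factor(dat$dates, levels = unique(dat$dates))

# As Date values, the rows sort chronologically.
dat$dates <- as.Date(dat$dates, format = "%m/%d/%Y")
after <- order(dat$dates)
```

With a Date column the axis is continuous and spaced by time, while the factor keeps one evenly spaced slot per row.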