I have a series of records of events (in this case deaths). They are now in a data frame, with one column containing the date as class Date and other columns containing details as factors (e.g. where the death occurred). The records are sorted into date order.
I am trying to use ggplot to plot a time series of the number of deaths, but I get various error messages with different approaches. I thought the minimum code to create a bar chart of the number of deaths by date would be something like:
F1 <- ggplot(DeathsSorted.df, aes('Date of death'))
F1 + geom_bar()
But all that produces is a greyed out block with no bars.
What's worse is that this code seemed to work before I updated to the latest version of R Studio and R.
This works fine:
library(ggplot2)
df <- data.frame(date = as.Date(c("2017-09-08", "2017-09-09", "2017-09-08",
                                  "2017-09-10", "2017-09-08", "2017-09-10",
                                  "2017-09-01", "2017-09-11")))
F1 <- ggplot(df, aes(x = date))
F1 + geom_bar()
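You just had a typo: the column name should not be between quotes. aes('Date of death') maps x to a constant string rather than to your column. If the column name contains spaces, wrap it in backticks instead: aes(x = `Date of death`).
Next time, post some fake data (like my df) so people can help you more easily.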
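If you would rather avoid ggplot2, or want days with no deaths to appear as zero-height bars in the time series, base R can count deaths per day directly. This is a minimal sketch that rebuilds the toy df above so it runs on its own; forcing the factor levels to cover every day in the range is an addition of mine, not something from the question.

```r
# Toy data: one row per death, as in the df above.
df <- data.frame(date = as.Date(c("2017-09-08", "2017-09-09", "2017-09-08",
                                  "2017-09-10", "2017-09-08", "2017-09-10",
                                  "2017-09-01", "2017-09-11")))

# Every calendar day between the first and last death, as character levels,
# so that days with no deaths still get a (zero) count.
all_days <- as.character(seq(min(df$date), max(df$date), by = "day"))
counts <- table(factor(as.character(df$date), levels = all_days))

# One bar per day; las = 2 turns the date labels vertical.
barplot(counts, las = 2, ylab = "Number of deaths")
```

Without the explicit levels, table() would only list the dates that actually occur, and the gaps between them would disappear from the plot.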
I feel pretty dumb for not figuring this out on my own, but I figured I'd just ask at this point.
This is the data I'm using: www.google.org/flutrends/about/data/flu/us/data.txt, saved as a .csv in Excel.
At the moment the data has 162 columns, and I want to melt it so I can group values by region name and create boxplots for all areas side by side. Unfortunately, melt orders the variable names alphabetically, and I want to keep the original column order. I'm unsure how to do this; from what I've been able to find on the topic so far, I imagine it has something to do with factor levels. This is the code I'm currently using (data1 is the result of read.csv):
data1 <- read.csv('http://www.google.org/flutrends/about/data/flu/us/data.txt', skip = 10)
gr_data1 <- reshape2::melt(data1[-1]) #Group data for all US by area (variable) and flu trend (value)
I tried running this and then checking names(data1):
names(data1) <- factor(names(data1), levels = unique(names(data1)))
But, as you can see below, it didn't change the order.
This is what I'm running to make the plot:
library('ggplot2')
ggplot(na.omit(gr_data1), aes(x = variable, y = value)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
Ideally, United States would be first and Baton Rouge LA last.
This is what the output looks like with the column order changed.
This is what the data looks like. There are 162 columns, including Date.
This is what the code does to the data -> i.stack.imgur.com/CWMbu.png
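Since the question suspects factor levels, here is a minimal base-R sketch of that idea. A discrete axis (in ggplot2 and in boxplot()) is drawn in the order of the levels of the factor mapped to it, so the levels of the melted variable column need to be set, not the names of data1. The region names and numbers below are made up for illustration; with the real data the equivalent line would be gr_data1$variable <- factor(gr_data1$variable, levels = names(data1)[-1]).

```r
# Made-up stand-in for data1: Date first, then regions in file order,
# deliberately not alphabetical.
data1 <- data.frame(
  Date = as.Date("2015-01-04") + 7 * 0:2,
  United.States = c(1200, 1350, 1500),
  Alabama = c(900, 950, 1000),
  Baton.Rouge..LA = c(700, 650, 800)
)

regions <- names(data1)[-1]  # original column order, without Date

# Long format in base R: one row per (region, week) pair.
gr_data1 <- data.frame(
  variable = rep(regions, each = nrow(data1)),
  value = unlist(data1[-1], use.names = FALSE)
)

# The key step: factor levels follow the original column order.
gr_data1$variable <- factor(gr_data1$variable, levels = regions)

# Boxes are drawn left to right in level order.
boxplot(value ~ variable, data = gr_data1, las = 2)
```

The same factor() call can be applied to the output of reshape2::melt before passing it to ggplot.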
I'm an R beginner attempting to do what I figured (erroneously) would be a beginner-type task: produce a simple plot of means/standard deviations for multiple survey questions (vectors), grouped by a second variable (say, group).
So I am reading variables (say, q1-q10) into R from Stata and have even managed to melt the data following this suggestion.
What I would like is essentially the graph presented in the solution:
However, my data contain missing values (NA), and the NUMBER of missing values varies by question. So when I try to use ggplot to plot the 'melted' data, I get an error saying the vector lengths do not match.
Suppose your variables q1-q10 are separate vectors of the same length; then you should combine them into a data frame df:
df <- data.frame(q1, q2, q3, q4, q5, q6, q7, q8, q9, q10)
Then you can clean it so that you only have complete cases, i.e. only observations without NA:
df <- df[complete.cases(df), ]
Afterwards, you should not have problems with ggplot. Note that this drops every respondent who left any question unanswered.
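An alternative that keeps more data is to melt first and drop NAs value by value, so each question keeps all of its own non-missing answers. This is a base-R sketch with made-up data (two questions and a group column standing in for q1-q10); aggregate() with a formula drops NA rows by default (na.action = na.omit), which is what makes the differing NA counts harmless.

```r
# Made-up survey data: q1 and q2 have different numbers of NAs.
survey <- data.frame(
  group = rep(c("A", "B"), each = 5),
  q1 = c(3, 4, NA, 5, 2, 4, 4, 3, NA, 5),
  q2 = c(1, NA, NA, 2, 3, 5, 4, NA, 4, 5)
)
qs <- c("q1", "q2")

# Long format: one row per (respondent, question).
long <- data.frame(
  group = rep(survey$group, times = length(qs)),
  question = rep(qs, each = nrow(survey)),
  value = unlist(survey[qs], use.names = FALSE)
)

# Mean and SD per question and group; NA values are dropped individually.
means <- aggregate(value ~ question + group, data = long, FUN = mean)
sds <- aggregate(value ~ question + group, data = long, FUN = sd)
stats <- merge(means, sds, by = c("question", "group"),
               suffixes = c(".mean", ".sd"))
stats
```

The stats table has one row per question and group, which could then be plotted with ggplot2, for example with geom_point() plus geom_errorbar(aes(ymin = value.mean - value.sd, ymax = value.mean + value.sd)).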
I have a dataframe and would like to plot the values (aggregated residuals) by column on the same line graph in R. The dataframe has 1000 columns and 323 rows.
I found how to do it one series at a time by using ggplot, but I am having trouble figuring out how to plot all of them without having to do it one at a time. Does anyone have any ideas?
The data looks like this:
http://imgur.com/Ry2eixO
(I didn't have the reputation to post images.)
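This should get you started, assuming your data frame is called df:
library(reshape2)
library(ggplot2)
df$id <- seq_len(nrow(df))  # row index for the x axis; skip if df already has an id column
D <- melt(df, id.vars = "id")
# one line per original column; the legend is hidden because 1000 entries would not fit
ggplot(D, aes(id, value, group = variable, color = variable)) + geom_line() +
  theme(legend.position = "none")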
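If you want to stay in base graphics, matplot() is a swapped-in alternative that needs no reshaping: it draws one line per column of a matrix against the row index. The random-walk data below is a made-up stand-in for the real 323 x 1000 data frame.

```r
# Made-up stand-in: 323 rows, 5 series (the real data has 1000 columns).
set.seed(1)
df <- as.data.frame(apply(matrix(rnorm(323 * 5), nrow = 323), 2, cumsum))

# One line per column, all on the same axes; lty = 1 keeps every line solid.
matplot(as.matrix(df), type = "l", lty = 1, col = rainbow(ncol(df)),
        xlab = "Row", ylab = "Aggregated residual")
```

With 1000 columns the plot will be dense either way; a single semi-transparent colour such as col = rgb(0, 0, 0, 0.1) may make the overall pattern easier to see.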
My dataframe contains three variables:
Row_Number  Sample_ID  Expression_Level
1           hum_449    0.25
2           hum_459    0.35
4           mur_223    0.45
I want to produce histograms of the third column using
hist(dataframe$Expression_Level)
I also want to label some of the bars with a list of the Sample_ID values that correspond to that particular expression level.
I have the desired Sample_IDs stored as a list object and also as a data frame with corresponding Row_Number and Expression_Level values (essentially just a subset of the original data frame). I don't know what to do next or even what to type into a search engine.
I have ggplot2 installed because friends told me it would probably be helpful but I am unfamiliar with it and face the same problem of not knowing what to look for when reading the documentation. Would prefer not to install more packages if possible.
You could use the following to add a label corresponding to the third element of Sample_ID under the third "bar" of a histogram. But this seems like an odd approach, since the bars of a histogram are counts. Might you want barplot instead? The same idea works with barplot, except that barplot() returns the bar midpoints directly, so you would use at = temp[3].
temp <- hist(dataframe$Expression_Level)
# hist() returns a list; $mids holds the x position of each bar's centre
mtext(text = dataframe$Sample_ID[3], side = 1, line = 2, at = temp$mids[3])
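Since the question prefers not to install more packages, here is a base-R sketch that labels every histogram bar with the selected Sample_IDs falling into it, rather than a single bar. It assumes the selected IDs are available as a character vector (wanted, a hypothetical name) and uses made-up values. cut() with the histogram's own breaks assigns each observation to its bar, mirroring hist()'s default right-closed bins (values sitting exactly on a break could differ, because hist() applies a tiny tolerance there).

```r
# Made-up data with the question's column names.
dataframe <- data.frame(
  Sample_ID = c("hum_449", "hum_459", "mur_223", "mur_230", "hum_470"),
  Expression_Level = c(0.12, 0.35, 0.45, 0.40, 0.62),
  stringsAsFactors = FALSE
)
# Hypothetical vector of the IDs to show on the plot.
wanted <- c("hum_449", "mur_223", "mur_230")

# Explicit breaks keep the toy example easy to check; default breaks also work.
h <- hist(dataframe$Expression_Level, breaks = seq(0, 1, by = 0.25), ylim = c(0, 4))

# Bar index of each observation, using hist()'s default bin rules.
bin <- cut(dataframe$Expression_Level, breaks = h$breaks,
           right = TRUE, include.lowest = TRUE, labels = FALSE)

# One label per bar: the wanted IDs in that bar, one per line ("" if none).
keep <- dataframe$Sample_ID %in% wanted
bar_labels <- vapply(seq_along(h$counts), function(i)
  paste(dataframe$Sample_ID[keep & bin == i], collapse = "\n"), character(1))

# pos = 3 places each label just above the top of its bar.
text(h$mids, h$counts, labels = bar_labels, pos = 3, cex = 0.7)
```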
Something like this?
set.seed(1)  # for a reproducible example
# create sample data - you have this already
df <- data.frame(sample_ID = paste0("S-", 1:100),
                 Expression_Level = round(runif(100), 1),
                 stringsAsFactors = FALSE)
# you start here...
labels <- aggregate(sample_ID~Expression_Level,df,c)
labels$lab <- sapply(labels$sample_ID,function(x)paste(unlist(x),collapse="|"))
library(ggplot2)
ggplot(df, aes(x = factor(Expression_Level))) +
  # geom_histogram() needs a continuous x; geom_bar() counts a discrete (factor) x
  geom_bar(fill = "lightgreen", color = "grey50") +
  geom_text(data = labels, aes(y = .1, label = lab), hjust = 0) +
  labs(x = "Expression_Level") +
  coord_flip()
I've got transactional data from a SQL query, which I turn into a data frame. The first column of the df contains UNIX timestamps (format="%Y/%d/%m %H:%M"), which I would like to use to create a base graphics plot, using par, that displays one line plot per date. At the moment I am fumbling around with splitting column 1, comparing each row with the previous one to look for a change, and then assigning a dummy indicator to use in my plot command.
Thanks,
Will
Somewhat hard to answer without any example data but I'll take a shot.
I'm guessing your date looks like this: "2009-03-04 17:45"
It's probably being read as character. You can check the class of each column by running str() on your data frame, e.g. str(df).
Using the stringr package, you can extract the y/d/m part and convert it to class Date like so:
library(stringr)
date="2009-03-04 17:45"
date=as.Date(str_replace_all(str_sub(date,3,10),"-","/"), "%y/%d/%m")
You can then use date as a group in ggplot2 to plot one line per date. You could also create separate panels (one per date) using + facet_wrap(~date) in your ggplot call.
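Because the question asks about base graphics and par, here is a base-R sketch that needs neither stringr nor the row-by-row comparison. The timestamps and the amount column are made up, following the year/day/month format given in the question. as.POSIXct() parses the full timestamp, as.Date() keeps only the day, and split() groups the rows by day, so each day gets its own panel.

```r
# Made-up transactions in the question's "%Y/%d/%m %H:%M" format.
tx <- data.frame(
  stamp = c("2009/03/04 09:15", "2009/03/04 12:40", "2009/03/04 17:45",
            "2009/04/04 10:05", "2009/04/04 14:30"),
  amount = c(10, 25, 15, 30, 20),
  stringsAsFactors = FALSE
)

# Parse the full timestamp, then keep only the calendar date.
tx$time <- as.POSIXct(tx$stamp, format = "%Y/%d/%m %H:%M", tz = "UTC")
tx$day <- as.Date(tx$time)

# split() groups rows by date, replacing the manual change detection.
by_day <- split(tx, tx$day)

# One panel per date, side by side; restore the old par settings afterwards.
op <- par(mfrow = c(1, length(by_day)))
for (d in names(by_day)) {
  plot(amount ~ time, data = by_day[[d]], type = "l", main = d)
}
par(op)
```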
Start by extracting just the date part of your timestamp in the query:
SELECT *,DATE(timestampcolumn) as thedate FROM yourtable;
Convert the date column to a factor:
mydf <- transform(mydf, thedate = as.factor(thedate))
Plot it with, e.g., xyplot (one panel per date; type = "l" draws lines):
library(lattice)
xyplot(varx ~ vary | thedate, data = mydf, type = "l")