How do you order a barplot by magnitude using qplot? [duplicate] - r

This question already has answers here:
Order Bars in ggplot2 bar graph
(16 answers)
Closed 8 years ago.
I use the arrange function to put my data frame in order by deaths, but when I try to do a bargraph of the top 5, they are in alphabetical order. How do I get them into order by value? Do I need to use ggplot?
library(dplyr)
library(ggplot2)
EventsByDeaths <- arrange(SumByEvent, desc(deaths))
> head(EventsByDeaths, 10)
Source: local data frame [10 x 3]
EVTYPE deaths damage
1 TORNADO 4662 2584635.60
2 EXCESSIVE HEAT 1418 53.80
3 HEAT 708 277.00
4 LIGHTNING 569 338956.35
5 FLASH FLOOD 567 759870.68
6 TSTM WIND 474 1090728.50
7 FLOOD 270 358109.37
8 RIP CURRENTS 204 162.00
9 HIGH WIND 197 170981.81
10 HEAT WAVE 172 1269.25
qplot(y=deaths, x=EVTYPE, data=EventsByDeaths[1:5,], geom="bar", stat="identity")

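ggplot2 orders a discrete axis by factor levels (alphabetical by default), not by row order, so arrange() alone has no effect on the plot. You could use the reorder() function to set the levels of EVTYPE by descending deaths: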
EventsByDeaths <- transform(EventsByDeaths, EVTYPE = reorder(EVTYPE, -deaths))
Then your original qplot call should work as desired. Hope this helps!
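As a minimal sketch of the same idea with ggplot() instead of qplot() (recent ggplot2 releases deprecate qplot()): the data frame below is retyped from the first five rows shown in the question, and the name top5 is only illustrative.

```r
library(ggplot2)

# First five rows of the table in the question (retyped by hand)
top5 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "HEAT", "LIGHTNING", "FLASH FLOOD"),
  deaths = c(4662, 1418, 708, 569, 567),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by -deaths,
# i.e. the event with the most deaths becomes the first level
top5$EVTYPE <- reorder(top5$EVTYPE, -top5$deaths)
levels(top5$EVTYPE)

# geom_col() draws bars whose heights are the y values themselves
p <- ggplot(top5, aes(x = EVTYPE, y = deaths)) + geom_col()
print(p)
```

Using reorder(EVTYPE, deaths) without the minus sign would give ascending bars instead.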

Related

Adding new Data rows in R

I am trying to build a data frame so I can generate a Plot with a specific set of data, but I am having trouble getting the data into a table correctly.
So, here is what I have available from a data query:
> head(c, n=10)
EVTYPE FATALITIES INJURIES
834 TORNADO 5633 91346
856 TSTM WIND 504 6957
170 FLOOD 470 6789
130 EXCESSIVE HEAT 1903 6525
464 LIGHTNING 816 5230
275 HEAT 937 2100
427 ICE STORM 89 1975
153 FLASH FLOOD 978 1777
760 THUNDERSTORM WIND 133 1488
244 HAIL 15 1361
I then tried to generate a set of data variables to build a finished data.frame like this:
a <- c(c[1,1], c[1,2], c[1,3])
b <- c(c[6,1], c[4,2] + c[6,2], c[4,3] + c[6,3])
d <- c(c[2,1], c[2,2], c[2,3])
e <- c(c[3,1], c[3,2], c[3,3])
f <- c(c[5,1], c[5,2], c[5,3])
g <- c(c[7,1], c[7,2], c[7,3])
h <- c(c[8,1], c[8,2], c[8,3])
i <- c(c[9,1], c[9,2], c[9,3])
j <- c(c[10,1], c[10,2], c[10,3])
k <- c(c[11,1], c[11,2], c[11,3])
df <- data.frame(a,b,d,e,f,g,h,i,j)
names(df) <- c("Event", "Fatalities","Injuries")
But that is failing miserably. What I am getting is a long string of all the data variables, repeated 10 times. Nice trick, but that is not what I am looking for.
I would like to get a finished data.frame with ten (10) rows of data, like the original, but with my combined data in place. Is that possible?
I am using R version 3.5.3. and the tidyverse library is not available for install on that version.
Any ideas as to how I can generate that data.frame?
If a barplot is what you're after, here's a piece of code to get you that:
First, you need to get the data in the right format (that's probably what you tried to do in df), by column-binding the two numerical variables using cbind and transposing the resulting matrix using t (i.e., turning rows into columns and vice versa):
plotdata <- t(cbind(c$FATALITIES, c$INJURIES))
Then set the layout to your plot, with a wide margin for the x-axis to accommodate your long factor names:
par(mfrow=c(1,1), mar = c(8,3,3,3))
Now you're ready to plot the data; you grab the labels from c$EVTYPE, reduce the label size in cex.names and rotate them with las to avoid overplotting:
barplot(plotdata, beside=TRUE, names.arg = c$EVTYPE, col=c("red","blue"), cex.names = 0.7, las = 3)
(You can add main = to define the heading of your plot.)
That's the barplot you should obtain: [image not reproduced]
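The barplot code does not build the combined data frame the question asks for. The original attempt fails because data.frame(a, b, d, ...) treats each vector as a column, not as a row. A minimal base R sketch follows, assuming the goal is to fold EXCESSIVE HEAT into HEAT and sum the counts (as the vector b in the question suggests). The name events is a stand-in for the asker's c, chosen to avoid masking the c() function, and the values are retyped from the ten rows shown, so the result has nine rows.

```r
# Stand-in for the asker's data, retyped from the head() output in the question
events <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
             "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL"),
  FATALITIES = c(5633, 504, 470, 1903, 816, 937, 89, 978, 133, 15),
  INJURIES = c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361),
  stringsAsFactors = FALSE
)

# Relabel the rows to merge so both share one event name
events$EVTYPE[events$EVTYPE == "EXCESSIVE HEAT"] <- "HEAT"

# Sum both numeric columns within each event type
combined <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
names(combined) <- c("Event", "Fatalities", "Injuries")

# Restore the original ordering by injuries, largest first
combined <- combined[order(-combined$Injuries), ]
combined
```

This uses only base R, so it should also run on R 3.5.3 without the tidyverse.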

Mean Y for individual X values

I have a data set in .dta format with height and weight of baseball players. I want to calculate the mean height for each individual weight value.
From what I've been able to find, I could use dplyr and "group_by", but my R script does not recognize the command, despite having installed and called the package.
Thanks!
Here is an example coded in base R using baseball player height and weight data obtained from the UCLA SOCR MLB HeightsWeights data set.
After cleaning the data (weight is missing for one player), I posted it to GitHub to make it accessible without having to clean it again.
theCSVFile <- "https://raw.githubusercontent.com/lgreski/datasciencedepot/gh-pages/data/baseballPlayers.csv"
# create the target directory first, otherwise download.file() fails
dir.create("./data", showWarnings = FALSE)
download.file(theCSVFile,"./data/baseballPlayers.csv",method="curl")
theData <- read.csv("./data/baseballPlayers.csv",header=TRUE,stringsAsFactors=FALSE)
# mean height for each distinct weight value
aggData <- aggregate(HeightInInches ~ WeightInPounds,data=theData,FUN=mean)
head(aggData)
...and the output is:
> head(aggData)
WeightInPounds HeightInInches
1 150 70.75000
2 155 69.33333
3 156 75.00000
4 160 71.46667
5 163 70.00000
6 164 73.00000
regards,
Len
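For comparison, here is a small self-contained sketch of both routes. The players data frame is made up for illustration and is not the SOCR data. The dplyr call is written with the dplyr:: prefix, which works even if library(dplyr) was not run in the current session (a common reason for group_by not being recognized). Reading the .dta file is shown only as a comment, assuming the haven package is installed; the file name is hypothetical.

```r
# Made-up sample data, for illustration only
players <- data.frame(
  WeightInPounds = c(180, 180, 200, 200, 200, 215),
  HeightInInches = c(72, 74, 73, 75, 77, 76)
)

# To read a Stata file instead (hypothetical file name, needs the haven package):
# players <- haven::read_dta("players.dta")

# Base R: mean height for each distinct weight
byWeight <- aggregate(HeightInInches ~ WeightInPounds, data = players, FUN = mean)
byWeight

# dplyr equivalent, run only if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  byWeightDplyr <- dplyr::summarise(
    dplyr::group_by(players, WeightInPounds),
    HeightInInches = mean(HeightInInches)
  )
  print(byWeightDplyr)
}
```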

R how to plot groups of data using ggplot [duplicate]

This question already has an answer here:
ggplot year by year comparison
(1 answer)
Closed 5 years ago.
I have a data frame with data like
year range count
2011 '0 to 500' 10
2011 '500 to 1000' 100
2012 '0 to 500' 12
2012 '500 to 1000' 50
2013 '0 to 500' 22
2013 '500 to 1000' 75
How can I use ggplot to plot range on the x axis, count on the y axis, and a line for each year?
I don't think this is a duplicate, so I've provided an answer. Your data structure requires some minimal text processing (to extract the from and to values), and probably geom_segment instead of geom_line.
s<-"year;range;count
2011;'0 to 500';10
2011;'500 to 1000';100
2012;'0 to 500';12
2012;'500 to 1000';50
2013;'0 to 500';22
2013;'500 to 1000';75"
d<-read.delim(textConnection(s),header=TRUE,sep=";",strip.white=TRUE)
d$range <- as.character(d$range) # remove factor
d$range <- gsub("'","",d$range) # remove character
# strsplit returns a list, one per line, with two elements, here i'm getting
# each of those elements
d$from<-as.numeric(sapply(strsplit(d$range,' to '),function(X)X[1]))
d$to<-as.numeric(sapply(strsplit(d$range,' to '),function(X)X[2]))
d$year <- as.factor(d$year)
library(ggplot2)
# one horizontal segment per row, spanning from..to at height count
ggplot(d)+geom_segment(aes(x=from,xend=to,y=count,yend=count,col=year))
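If treating range as a categorical x axis is acceptable, a line per year can also be drawn with geom_line(). This is only a sketch of that alternative, not the approach in the answer. The data are retyped from the question into a data frame named ranges (an illustrative name), and the group aesthetic is what tells ggplot2 to connect the points of each year.

```r
library(ggplot2)

# Data retyped from the question; range is kept as an ordered set of labels
ranges <- data.frame(
  year = factor(rep(c(2011, 2012, 2013), each = 2)),
  range = factor(rep(c("0 to 500", "500 to 1000"), times = 3),
                 levels = c("0 to 500", "500 to 1000")),
  count = c(10, 100, 12, 50, 22, 75)
)

# With a discrete x axis, group = year is needed so each year gets its own line
p <- ggplot(ranges, aes(x = range, y = count, colour = year, group = year)) +
  geom_line() +
  geom_point()
print(p)
```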

Using geom_bar() to stack values that add up?

I have a data frame in R which looks like:
Month numFlights numDelays onTime
1 2000-01-01 7520584 1299743 6220841
2 2000-02-01 6949127 1223397 5725730
3 2000-03-01 7808080 1390796 6417284
4 2000-04-01 7534239 1178425 6355814
5 2000-05-01 7720013 1236135 6483878
6 2000-06-01 7727349 1615408 6111941
7 2000-07-01 8000680 1652590 6348090
8 2000-08-01 7990440 1481498 6508942
9 2000-09-01 6811541 875381 5936160
10 2000-10-01 7026150 1046749 5979401
11 2000-11-01 6689783 987175 5702608
12 2000-12-01 6895454 1535196 5360258
What I'm looking to do is create a bar chart where, for each month (on the x axis), the bar height is the number of delayed flights plus the number of flights on time. I tried figuring out how to do that using the example
qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(gear))
but my data isn't formatted the same way as mtcars. And I can't just use an alpha value, because I want to eventually add cancelled flights.
I know that some questions are very similar to this one, but not similar enough for me to work it out (this question is almost the same but I don't know how to manipulate the facet_grid yet.) Any ideas?
Thanks!
You can do this for example:
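library(reshape2)
library(ggplot2)
library(scales)
# long format: one row per Month and category (numDelays / onTime)
dat.m <- melt(dat,id.vars='Month',measure.vars=c('numDelays','onTime'))
ggplot(dat.m) +
  # stat='identity' plots the values as given; the categories stack by default
  geom_bar(aes(Month,value,fill=variable),stat='identity')+
  scale_y_continuous(labels = comma) +
  coord_flip() +
  theme_bw()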
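A self-contained sketch of the same stacking idea, using only the first three months from the question and base R for the reshaping step. The names long, status and flights are illustrative. geom_col() is used here as an equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

# First three rows of the question's data, retyped by hand
dat <- data.frame(
  Month = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  numDelays = c(1299743, 1223397, 1390796),
  onTime = c(6220841, 5725730, 6417284)
)

# Long format: one row per month and category
long <- data.frame(
  Month = rep(dat$Month, times = 2),
  status = rep(c("numDelays", "onTime"), each = nrow(dat)),
  flights = c(dat$numDelays, dat$onTime)
)

# Bars for the same month are stacked, so each bar's total height
# is numDelays + onTime for that month
p <- ggplot(long, aes(x = Month, y = flights, fill = status)) + geom_col()
print(p)
```

A further category such as cancelled flights would simply be one more block of rows in the long data frame.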

Boxplot with ggplot2 in R - returns by month

I have computed monthly returns from a price series. I then build a dataframe as follows:
y.ret_1981 y.ret_1982 y.ret_1983 y.ret_1984 y.ret_1985
1 0.0001015229 0.0030780203 -0.0052233836 0.017128325 -0.002427308
2 0.0005678989 0.0009249838 -0.0023294622 -0.030531971 0.001831160
3 -0.0019040392 -0.0021614791 0.0022451252 -0.003345983 0.005773503
4 -0.0006015118 0.0010695681 0.0052680258 0.008592513 0.009867972
5 0.0052736054 -0.0003181347 -0.0008505673 -0.000623061 -0.012225140
6 0.0014266119 -0.0101045071 -0.0003073150 -0.016084505 -0.005883687
7 -0.0069002733 -0.0078170620 0.0070058676 -0.007870294 -0.010265335
8 -0.0041963258 0.0039905142 0.0134996961 -0.002149331 -0.007860940
9 0.0020778541 -0.0038834826 0.0052289589 0.007271409 -0.005320848
10 0.0030956487 -0.0005027686 -0.0021452210 0.002502301 -0.001890657
11 -0.0032375542 0.0063916686 0.0009331531 0.004679741 0.004338580
12 0.0014882164 0.0039578527 0.0136663415 0.000000000 0.003807668
... where columns are the monthly returns for the years 1981 to 1985 and the rows 1 to 12 are the months of the year.
I would like to plot a boxplot similar to the one below: [image not reproduced]
So what can I do? I would also like my graph to show the names of the months instead of the integers 1 to 12.
Thank you.
First, add a new column month to your original data frame containing month.name (a built-in constant in R) and use it as a factor. It is important to also set levels= inside factor() to ensure that the months are arranged in chronological order rather than alphabetical order.
Then melt this data frame from wide format to long format. In ggplot() use month as x values and value as y.
df$month<-factor(month.name,levels=month.name)
library(reshape2)
library(ggplot2)
# wide to long: one row per month and year column
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()
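To try the steps above without the original returns, here is a sketch with simulated numbers. The values come from rnorm() and are not real returns; the column names mimic those in the question. The reshaping is done in base R so that only ggplot2 is needed.

```r
library(ggplot2)

# Simulated stand-in for the returns table: 12 months x 5 years (not real data)
set.seed(1)
df <- as.data.frame(matrix(rnorm(12 * 5, sd = 0.005), nrow = 12))
names(df) <- paste0("y.ret_", 1981:1985)

# Month names as a factor with chronological levels
df$month <- factor(month.name, levels = month.name)

# Wide to long in base R: stack the five year columns on top of each other
df.long <- data.frame(
  month = rep(df$month, times = 5),
  variable = rep(names(df)[1:5], each = 12),
  value = unlist(df[1:5], use.names = FALSE)
)

# One box per month, each summarising the five yearly values
p <- ggplot(df.long, aes(month, value)) + geom_boxplot()
print(p)
```

If the full month names overlap on the axis, month.abb can be used in place of month.name.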
