Excluding column on chart in ggplot2 - r

I would like to exclude one column from my chart. The CSV I use as my data has a lot of empty cells, so there is a nameless column in my chart that is also the tallest of them all. In my opinion it looks a bit stupid, so I would like to get rid of it.
Here is my chart code:
ggplot(df, aes(Coverage, fill=(Coverage)))+
geom_bar(color="black",fill="brown3")+
theme(text = element_text(size=15),axis.text.x = element_text(angle=90, hjust=1))+
labs(title = "Diagram przedstawiajacy w ktorym miesiacu w kolejnych latach najwieksza liczba dziennikarzy poniosla smierc", x="Panstwo", y="Rok")
And here is what the chart looks like. The first column is the one counting the empty cells.
thank you very much for all the help!

As an alternative to @Roman Luštrik's answer, you can use dplyr to filter your dataset and draw the plot in the same pipeline:
library(dplyr)
library(ggplot2)
df %>% filter(Coverage != "") %>%
# the filtered data arrives through the pipe, so df must not be passed again
ggplot(aes(Coverage, fill=(Coverage)))+
geom_bar(color="black",fill="brown3")+
theme(text = element_text(size=15),axis.text.x = element_text(angle=90, hjust=0.5))+
labs(title = "Diagram przedstawiajacy w ktorym miesiacu w kolejnych latach najwieksza liczba dziennikarzy poniosla smierc", x="Panstwo", y="Rok")
If this does not work for you, please consider providing a reproducible example of your dataset (see: How to make a great R reproducible example).

You will need to remove those entries from your df. You could do something along the lines of df[!(df$Coverage %in% c("levels", "to", "exclude", "here")), ]. If that doesn't work, you may also need to use droplevels().
When you rotate the text, you will also need to offset it a bit. You can do it in theme() using hjust or vjust (I always forget which one). Something along the lines of element_text(angle = 90, hjust = 0.5).
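The subsetting and droplevels() steps described above can be sketched in base R. This is only an illustration under assumptions: the toy df and its values are made up, and it assumes the empty CSV cells were read in as empty strings (the is.na() check covers the case where they became NA instead).

```r
# Hypothetical data: Coverage as a factor whose empty cells were read as ""
df <- data.frame(Coverage = factor(c("", "Local", "", "National", "Local")))

# Keep only rows whose Coverage is neither empty nor missing
keep <- !(df$Coverage %in% c("")) & !is.na(df$Coverage)
clean <- df[keep, , drop = FALSE]

# Subsetting keeps "" as an unused factor level; droplevels() removes it,
# so it can no longer show up as an empty category on the x axis
clean <- droplevels(clean)
table(clean$Coverage)
```

Passing clean instead of df to ggplot() should then leave the nameless bar out.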

Related

ggplot2 interaction use only first character

In a project I present bar plots and use the interaction command to order the groups, as one is a strict subgroup of the other. I would like to avoid printing the whole name of the first group, as this takes up a lot of space. Is there a way to restrict the label to its first character or something like that?
mtcars$name <- rownames(mtcars)
ggplot(data = mtcars, aes(x=interaction(mtcars$cyl, mtcars$name)))+
geom_bar()+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5))
Here, for example, only the number of cylinders is interesting to me; I just use the car name to order them. But the names take up a lot of space. Just having the first letter of the car name would be ideal, so I would like to have 8.A, for example. In my original data the first variable has varying length (not just 1 character, as the number of cylinders has here).
Thanks for any answer,
Regards
You can edit the labels using regular expressions in scale_x_discrete :
library(ggplot2)
ggplot(data = mtcars, aes(x=interaction(mtcars$cyl, mtcars$name)))+
geom_bar()+
xlab('Interaction cyl vs Name') +
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
scale_x_discrete(labels = function(x) sub('(\\..).*', '\\1', x))
Everything inside () is a capture group, which specifies the part of the matched text we want to keep. Here the group captures a literal dot (\\.; . is a special character in regex and needs to be escaped with \\) followed by one more character (.). The trailing .* matches the rest of the label, and the replacement \\1 puts back only the captured part, so the text before the dot stays untouched.
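To see what that labelling function does outside of a plot, it can be applied to a few strings directly. The labels below are made-up examples in the "first.second" form that interaction() produces; the name shorten is only for illustration.

```r
# Same regular expression as in scale_x_discrete() above
shorten <- function(x) sub("(\\..).*", "\\1", x)

# Hypothetical labels; the part before the dot has different lengths
labels <- c("8.AMC Javelin", "4.Fiat 128", "120.Honda Civic")
short <- shorten(labels)
print(short)
```

Because the pattern only starts matching at the first dot, the length of the first variable does not matter.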

Assigning geom_text from a different dataframe labels to a graph

So I'm using Twitter APIs to gather info related to a certain topic, and one of the things I'm visualizing is the popularity of devices.
So far I have this:
https://gyazo.com/441a9ab80b943f9e0c3a36131273844a
The above is generated by this code:
device_types_condensed <- (ggplot(manu_tweets3, aes(x= statusSource_clean , fill = isRetweet)) + geom_bar()
+ theme(panel.background=element_rect(fill='white'),
axis.ticks.x=element_blank(),
axis.text.x=element_blank())
+ theme(axis.ticks.x=element_blank(), axis.text.x = element_text(angle = 25),
axis.text=element_text(size=8))
+ labs(x="", title = "Device Popularity for Tweet or Retweet Usage", y ="No. of Tweets on Device")
)
device_types_condensed
What I want to do is to add text above each bar that reflects the % of tweet activity that device is responsible for.
This means I am not changing the y-axis. The y-axis still reflects the count of tweet, and the number on top of the bar will be what reflects the percentage. So far I already have a table made with that value:
https://i.gyazo.com/5f14d2c1352e8c9c2c5997678ceea3b4.png
What I can't figure out for the life of me is how to select the % labels in the table just above, and then apply them to the ggplot graph based on device type.
Sorry, don't have the rep to post images but I linked the URLs!
You're pretty close. I didn't have access to your exact data so I simplified your problem. You said you had some devices, each with a count of tweets associated with those devices, and that each device had a separate proportion associated with it. You also said these were in two different data.frames.
The most ggplot-ish way to handle this would be to join them together into a single data.frame because both data.frames share a common key: The device. This simplifies the ggplot2 code a touch. First, I'll work up a solution without combining, and then I will end by showing you how to combine your two data.frames together.
I generated data that looked similar to your data like this:
mydf <- data.frame(device = c("A", "B", "C"),
num_tweets = c(100, 200, 50))
prop_df <- data.frame(device = c("A", "B", "C"),
proportion = c(.29, .57, .14))
Without joining them together first, I think you can get what you want with code like this:
ggplot(mydf) +
geom_col(aes(device,
num_tweets)) +
geom_text(data = prop_df,
aes(device,
max(mydf$num_tweets * 1.10),
label = paste0(proportion * 100, "%"))) +
scale_y_continuous(expand = expand_scale(mult = c(0, .1)))
Notice a few things:
I went with a geom_text call to get the percentages to display because I want ggplot2 to handle the x position for me (to match what already gets displayed when we call geom_col right above it) so the bars and percentages match up.
The geom_text call has as its first argument data = prop_df, which tells geom_text not to use the plot's default data.frame, mydf, and to use prop_df instead just for that layer.
In my aes call, I tell ggplot to map device to the x axis and then I hard-coded the y values to 110% of the maximum device count so they will display all at the same height, just above the bars.
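ggplot2, by default, tries to shrink the plot area to match the data you've plotted and I wanted some more breathing room, so I used expand_scale(mult = c(0, .1)) to add 10% of padding above the data in the y direction (and none below). In ggplot2 3.3.0 and later, expand_scale() is deprecated in favour of expansion(), which takes the same mult argument.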
Is this similar to what you were looking for?
I then went ahead and simplified the ggplot call by joining the two data.frames together with dplyr::left_join prior:
library(dplyr)
mydf <- left_join(mydf, prop_df)
ggplot(mydf) +
geom_col(aes(device,
num_tweets)) +
geom_text(aes(device,
max(mydf$num_tweets * 1.10),
label = paste0(proportion * 100, "%"))) +
scale_y_continuous(expand = expand_scale(mult = c(0, .1)))
which is just a bit shorter and doesn't require you to override the data argument in geom_text.
What do you think?
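If the percentages are simply each device's share of all tweets, they could also be derived from the counts instead of being kept in a second table. A minimal base R sketch, assuming the toy mydf from above (the column names proportion and label are illustrative, not from the original post):

```r
# Toy counts, as in the answer above
mydf <- data.frame(device = c("A", "B", "C"),
                   num_tweets = c(100, 200, 50))

# Share of all tweets per device
mydf$proportion <- mydf$num_tweets / sum(mydf$num_tweets)

# Text to show above each bar, rounded to whole percent
mydf$label <- paste0(round(mydf$proportion * 100), "%")
print(mydf)
```

The label column could then be mapped with aes(label = label) in geom_text, with no join needed.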

Changing datastructure to create correct bar graph in ggplot

I would like to make a graph in R, which I managed to make in Excel. It is a bar graph with species on the x-axis and the log number of observations on the y-axis. My current data structure in R is not suitable (I think) to make this graph, but I do not know how to change this (in a smart way).
I have (amongst others) a column 'camera_site' (site 1, site2..), 'species' (agouti, paca..), 'count' (1, 2..), with about 50,000 observations.
I tried making a dataframe with a column 'species' (with 18 species) and a column with 'log(total observation)' for each species (see dataframe). But then I can only make a point graph.
This is how I would like the graph to look:
(image: desired graph made in Excel)
Your data seems to be in the correct format from what I can tell from your screenshot.
The minimum amount of code you would need to get a plot like that would be the following, assuming your data.frame is called df:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col()
Many people intuitively try geom_bar(), but geom_col() is equivalent to geom_bar(stat = "identity"), which you would use if you've pre-computed observations and don't need ggplot to do the counting for you.
But you could probably decorate the plot a bit better with some additions:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col() +
scale_x_discrete(name = "Species") +
scale_y_continuous(name = expression("Log"[10]*" Observations"),
expand = c(0,0,0.1,0)) +
theme(axis.text.x = element_text(angle = 90))
Of course, you could customize the theme any way you like.
Regards
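If you start from the raw observations rather than a pre-computed table, the per-species totals can be built with base R before plotting. This is a sketch under assumptions: the rows below are invented, the column names follow the question's description, and log10 is assumed because the axis label above says Log10.

```r
# Hypothetical raw observations, one row per record
obs <- data.frame(
  camera_site = c("site1", "site1", "site2", "site2", "site2"),
  species     = c("agouti", "paca", "agouti", "agouti", "paca"),
  count       = c(90, 4, 9, 1, 6)
)

# Sum the counts per species, then take the base-10 log of each total
totals <- aggregate(count ~ species, data = obs, FUN = sum)
totals$log_count <- log10(totals$count)
print(totals)
```

A table like totals has one row per species, which is the shape geom_col() expects.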

How to change variable value labels WITHOUT changing the variable name

I've got a bar graph whose variable labels (a couple of them) need changing. In the specific example here, I've got a variable "Sputum.Throat" which refers to samples which could be either sputum or throat swabs, so the label for this value should really read "Sputum/Throat" or even "Sputum or Throat Swab" (this latter would only work if I can wrap the text). So far, no syntax I've tried can pull this off.
Here's my code:
CultPerf <- data.frame(Blood=ForAnalysis$Cult_lastmo_blood, CSF=ForAnalysis$Cult_lastmo_csf, Fecal=ForAnalysis$Cult_lastmo_fecal, Genital=ForAnalysis$Cult_lastmo_genital, `Sputum-Throat`=ForAnalysis$`Cult_lastmo_sput-throat`, Urine=ForAnalysis$Cult_lastm_urine, `Wound-Surgical`=ForAnalysis$`Cult_lastmo_wound-surg`, Other=ForAnalysis$Cult_lastmo_oth)
CP <- data.table::melt(CultPerf, variable.names("Frequency"))
CP$value <- factor(CP$value, levels=c(">100","50-100","25-50","0-25"))
CP$variable <- factor(CP$variable, levels = c("Other","Wound.Surgical","Urine","Sputum.Throat","Genital","Fecal","CSF","Blood"))
ggplot(data=CP)+
geom_bar(aes(x=variable, fill = value), position="dodge", width = 0.9)+
labs(x="Culture Type", y="Number of Labs", title="Number of Cultures Performed Per Month at Study Hospitals", subtitle="n=140")+
coord_flip()+
theme(legend.title = element_blank(),aspect.ratio = 1.25/1,plot.subtitle=element_text(face="italic",hjust=0.5),plot.title=element_text(hjust=0.5))+
guides(fill = guide_legend(reverse = TRUE))
And for reference, here's a copy of the successful plot which it does produce:
As I mentioned, all I want to do is change those labels of the individual values on the Y axis. Any suggestions will be appreciated!
If you just want to change the axis label for that one category, try adding this line:
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat"))
Be sure to add it (+) to your ggplot object.
Using the helpful suggestion from @MrFlick above (with my thanks), I added the following to my ggplot code, which also gave me a word-wrapped label for the second label:
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat", "Wound.Surgical"="Surgical or \n Other Wound"))+
Resultant plot looks like this:
Revised plot
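When many labels follow the same pattern, a function may be more convenient than a named vector, since the labels argument of scale_x_discrete also accepts a function. A small sketch, assuming every dot in the names should become a slash (slash_labels is a hypothetical helper, not part of ggplot2):

```r
# Hypothetical helper: turn the dots that data.frame() put into the names
# back into slashes, leaving labels without a dot unchanged
slash_labels <- function(x) gsub(".", "/", x, fixed = TRUE)

relabelled <- slash_labels(c("Sputum.Throat", "Wound.Surgical", "Urine"))
print(relabelled)
```

It could be supplied as scale_x_discrete(labels = slash_labels); labels that need custom wording are still easier to set with the named vector shown above.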

Label selected percentage values inside stacked bar plot (ggplot2)

I want to put labels of the percentages on my stacked bar plot. However, I only want to label the largest 3 percentages for each bar. I went through a lot of helpful posts on SO (for example: 1, 2, 3), and here is what I've accomplished so far:
library(ggplot2)
groups<-factor(rep(c("1","2","3","4","5","6","Missing"),4))
site<-c(rep("Site1",7),rep("Site2",7),rep("Site3",7),rep("Site4",7))
counts<-c(7554,6982, 6296,16152,6416,2301,0,
20704,10385,22041,27596,4648, 1325,0,
17200, 11950,11836,12303, 2817,911,1,
2580,2620,2828,2839,507,152,2)
tapply(counts,site,sum)
tot<-c(rep(45701,7),rep(86699,7), rep(57018,7), rep(11528,7))
prop<-sprintf("%.1f%%", counts/tot*100)
data<-data.frame(groups,site,counts,prop)
ggplot(data, aes(x=site, y=counts,fill=groups)) + geom_bar()+
stat_bin(geom = "text",aes(y=counts,label = prop),vjust = 1) +
scale_y_continuous(labels = percent)
I wanted to insert my output image here but don't seem to have enough reputation...But the code above should be able to produce the plot.
So how can I label only the largest 3 percentages on each bar? Also, for the legend, is it possible to change the order of the categories? For example, put "Missing" first. This is not a big issue here, but for my real data set, the order of the categories in the legend really bothers me.
I'm new on this site, so if there's anything that's not clear about my question, please let me know and I will fix it. I appreciate any answer/comments! Thank you!
I did this in a sort of hacky manner. It isn't that elegant.
Anyways, I used the plyr package, since the split-apply-combine strategy seemed to be the way to go here.
I recreated your data frame with a variable perc that represents the percentage for each site. Then, for each site, I just kept the 3 largest values for prop and replaced the rest with "".
# I added some variables, and added stringsAsFactors=FALSE
data <- data.frame(groups, site, counts, tot, perc=counts/tot,
prop, stringsAsFactors=FALSE)
# Load plyr
library(plyr)
# Split on the site variable, and keep all the other variables (is there an
# option to keep all variables in the final result?)
data2 <- ddply(data, ~site, summarize,
groups=groups,
counts=counts,
perc=perc,
prop=ifelse(perc %in% sort(perc, decreasing=TRUE)[1:3], prop, ""))
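# I changed some of the plotting parameters
# (geom_col() and geom_text() replace geom_bar() and stat_bin(geom = "text"),
# which current ggplot2 versions reject when y is mapped; percent comes from scales)
ggplot(data2, aes(x=site, y=perc, fill=groups)) + geom_col()+
geom_text(aes(label = prop), position = position_stack(vjust = 0.5)) +
scale_y_continuous(labels = scales::percent)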
EDIT: Looks like your scales are wrong in your original plotting code. It gave me results with 7500000% on the y axis, which seemed a little off to me...
EDIT: I fixed up the code.
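The same "keep only the three largest per site" step can also be sketched without plyr, using ave() from base R. This swaps in a different technique from the ddply call above; the rank column is an illustrative addition and ties are broken by order of appearance.

```r
# Data from the question
groups <- rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4)
site <- rep(c("Site1", "Site2", "Site3", "Site4"), each = 7)
counts <- c(7554, 6982, 6296, 16152, 6416, 2301, 0,
            20704, 10385, 22041, 27596, 4648, 1325, 0,
            17200, 11950, 11836, 12303, 2817, 911, 1,
            2580, 2620, 2828, 2839, 507, 152, 2)
data <- data.frame(groups, site, counts, stringsAsFactors = FALSE)

# Share of each group within its site
data$perc <- data$counts / ave(data$counts, data$site, FUN = sum)

# Rank within each site, largest share first
data$rank <- ave(-data$perc, data$site,
                 FUN = function(p) rank(p, ties.method = "first"))

# Label only the three largest shares per site, blank out the rest
data$prop <- ifelse(data$rank <= 3, sprintf("%.1f%%", data$perc * 100), "")
```

Unlike summarize, this keeps every original column, because the new columns are added to data in place.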
