I want to make a histogram for each column. Each column has three values (Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
The output should be:
12 histograms (because we have 12 rows), with the 3 values of each histogram shown as bars (Y axis = value, X axis = Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
Stuck: when I search the internet, almost everyone reshapes to a "long" data frame. That is not helpful for this example, because then all the numbers end up in a single "value" column, and I want to keep the three "rows" separate.
At the bottom you can find my data. Appreciated!
I tried this (How do I generate a histogram for each column of my table?), but that runs into the "long table" problem. After that I tried Multiple Plots on 1 page in R, which solved how to put multiple graphs on one page.
dput(Plots1)
structure(list(`0-0.5` = c(26.952381, 5.455598, 28.32947), `0.5-1` =
c(29.798635,
25.972696, 32.87372), `1-1.5` = c(32.922764, 41.95935, 41.73577
), `1.5-2` = c(31.844156, 69.883117, 52.25974), `2-2.5` = c(52.931034,
128.672414, 55.65517), `2.5-3` = c(40.7, 110.1, 63.1), `3-3.5` =
c(73.466667,
199.533333, 70.93333), `3.5-4` = c(38.428571, 258.571429, 95),
`4-4.5` = c(47.6, 166.5, 233.4), `4.5- 5` = c(60.846154,
371.730769, 74.61538), `5-5.5` = c(7.333333, 499.833333,
51), `5.5-6` = c(51.6, 325.4, 82.4), `6-6.5` = c(69, 411.5,
134)), class = "data.frame", .Names = c("0-0.5", "0.5-1",
"1-1.5", "1.5-2", "2-2.5", "2.5-3", "3-3.5", "3.5-4", "4-4.5",
"4.5- 5", "5-5.5", "5.5-6", "6-6.5"), row.names = c("Phase_1_Mean",
"Phase_2_Mean", "Phase_3_Mean"))
I am after something like what is shown in this example (which didn't work for me, because it is Python): https://www.google.com/search?rlz=1C1GCEA_enNL765NL765&biw=1366&bih=626&tbm=isch&sa=1&ei=Yqc8XOjMLZDUwQLp9KuYCA&q=multiple+histograms+r&oq=multiple+histograms+r&gs_l=img.3..0i19.4028.7585..7742...1.0..1.412.3355.0j19j1j0j1......0....1..gws-wiz-img.......0j0i67j0i30j0i5i30i19j0i8i30i19j0i5i30j0i8i30j0i30i19.j-1kDXNKZhI#imgrc=L0Lvbn1rplYaEM:
I think you have to reshape to long to make this work, but I don't see why this is a problem. I think this code achieves what you want. Note that there are 13 plots because you have 13 (not 12) columns in the dataframe you posted.
# Load libraries
library(reshape2)
library(ggplot2)
Plots1$ID <- rownames(Plots1) # Add an ID variable
Plots2 <- melt(Plots1, id.vars = "ID") # melt to long format, keeping ID as the identifier
ggplot(Plots2, aes(y = value, x = ID)) + geom_bar(stat = "identity") + facet_wrap(~variable)
Below is the resulting plot. I've kept it basic, but of course you can make it pretty by adding further layers.
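If reshaping really is unwanted, base graphics can draw one bar chart per column straight from the wide data frame. This is only a minimal sketch, not part of the answer above: it uses just the first two columns of the posted data to stay short, and the helper name `bars` and the panel layout are illustrative choices.

```r
# First two columns of the posted data, kept in wide format
Plots1 <- data.frame(
  `0-0.5` = c(26.952381, 5.455598, 28.32947),
  `0.5-1` = c(29.798635, 25.972696, 32.87372),
  check.names = FALSE,
  row.names = c("Phase_1_Mean", "Phase_2_Mean", "Phase_3_Mean")
)

# One named vector per column; the names become the bar labels
bars <- lapply(Plots1, setNames, nm = rownames(Plots1))

op <- par(mfrow = c(1, length(bars)))  # one panel per column
for (col in names(bars)) {
  barplot(bars[[col]], main = col, ylab = "value")
}
par(op)  # restore the previous layout
```

For all 13 columns, something like par(mfrow = c(4, 4)) should give enough panels on one page.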
For a research project I want to plot a historical timeline with the ruling periods of ancient dynasties, separated into male and female rulers.
I'm using R and the vistime package (as suggested here). Generally it works, but the problem is that I can't figure out how to use dates BC, that is, negative years.
A simple example of my code reads as follows:
# devtools::install_github("edgararuiz/gregorian")
# library(gregorian)
library(ggplot2)
library(vistime)
counter.chronology <- data.frame(
namen = c("Ptolemaios V.","Ptolemaios VI.","Kleopatra I.","Kleopatra II.","Kleopatra III."),
geschlecht = c("Männliche Regenten","Männliche Regenten","Weibliche Regentinnen","Weibliche Regentinnen","Weibliche Regentinnen"),
antritt = as.Date(c("0197-01-01","0180-01-01","0194-01-01","0175-01-01","0141-01-01")),
ende = as.Date(c("0180-01-01","0164-01-01","0176-01-01","0164-01-01","0130-01-01")),
stringsAsFactors = FALSE
)
vistime(counter.chronology, col.event = "namen", col.group = "geschlecht", col.start = "antritt", col.end = "ende")
This works fine and produces a plot like this:
plot with wrong (positive) dates
But if I change the date format to negative years -- for example "-0175-01-01" -- the plotting fails and I receive the error message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
I tried the gregorian package and replaced as.Date with as_gregorian, but this seems incompatible with vistime.
Does anyone know an easy solution for this problem? If negative dates are impossible, it would help to reverse the plot so that the x-axis counts down from the highest to the lowest year.
A less important question, but it would be nice to solve as well: years alone would be enough as start and end dates; months and days are unnecessary. But if I enter only the year, for example "-0175" or the positive version "0175", the same error message as above occurs. Therefore I used 01-01 for month and day, which works because the timeline is not that detailed. Solving this would be nice, but it's not a must.
Thanks for all your replies and answers!
Best,
Flo
Edit after Allen response:
Another problem occurred which you couldn't be aware of due to my short code example. Sometimes there are overlapping ruling periods. If I now enter another female reign which overlaps two of the other periods -- here, for example, "Arsinoe" -- it looks awkward:
counter.chronology <- data.frame(
namen = c("Ptolemaios V.", "Ptolemaios VI.",
"Kleopatra I.","Arsinoe", "Kleopatra II.","Kleopatra III."),
geschlecht = rep(c("Männliche Regenten", "Weibliche Regentinnen"),
times = c(2, 4)),
antritt = -c(197, 180, 194, 200, 175, 141),
ende = -c(180, 164, 176, 150, 164, 130)
)
Overlapping periods
Is it possible to let ggplot place this overlapping bar automatically one stage above the others?
If you do this directly in ggplot, it is straightforward to simply use negative years as integer values on a continuous scale:
counter.chronology <- data.frame(
namen = c("Ptolemaios V.", "Ptolemaios VI.",
"Kleopatra I.", "Kleopatra II.","Kleopatra III."),
geschlecht = rep(c("Männliche Regenten", "Weibliche Regentinnen"),
times = c(2, 3)),
antritt = -c(197, 180, 194, 175, 141),
ende = -c(180, 164, 176, 164, 130)
)
library(geomtextpath)
ggplot(counter.chronology, aes(antritt, geschlecht)) +
geom_textsegment(aes(xend = ende, yend = geschlecht, label = namen,
colour = namen), gap = FALSE,
linewidth = 30, textcolour = 'black') +
scale_x_continuous(labels = abs) +
scale_color_brewer(palette = 'Pastel1', guide = 'none') +
theme_minimal(base_size = 16)
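For the overlapping periods raised in the edit, one possible approach is to compute a "lane" for each bar before plotting, so that bars which overlap in time end up on different levels. The sketch below is not from the answer above: `assign_lanes` is a hypothetical helper, and it uses a simple greedy rule (take the first lane that is already free at the bar's start).

```r
# Hypothetical helper: give every interval the lowest lane that is free
# at its start. Bars that merely touch (end == next start) share a lane.
assign_lanes <- function(start, end) {
  lane_end <- numeric(0)            # right edge of the last bar in each lane
  lane <- integer(length(start))
  for (i in order(start)) {         # process bars from earliest start onwards
    free <- which(lane_end <= start[i])
    if (length(free) == 0) {
      lane_end <- c(lane_end, end[i])   # open a new lane
      lane[i] <- length(lane_end)
    } else {
      lane[i] <- free[1]                # reuse the lowest free lane
      lane_end[free[1]] <- end[i]
    }
  }
  lane
}

# Female rulers from the edited example, including the overlapping "Arsinoe"
namen   <- c("Kleopatra I.", "Arsinoe", "Kleopatra II.", "Kleopatra III.")
antritt <- -c(194, 200, 175, 141)
ende    <- -c(176, 150, 164, 130)
lanes   <- assign_lanes(antritt, ende)
```

The resulting `lanes` vector could then be mapped to the y position within each group (for example via facets per geschlecht), so that the overlapping bar sits one level away from the others.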
I found another way using the timevis package. It works well so far, also with overlapping periods:
library(timevis)
counter.chronology <- data.frame(
content = c("Ptolemaios V.","Ptolemaios VI.","Kleopatra I.","Arsinoe","Kleopatra II.","Kleopatra III."),
start = c("-000197-01-01","-000180-01-01","-000194-01-01","-000190-01-01","-000175-01-01","-000141-01-01"),
end = c("-000180-01-01","-000164-01-01","-000160-01-01","-000145-01-01","-000164-01-01","-000130-01-01"),
group = c(1,1,2,2,2,2),
style = c("background-color: #f1d9a4; border-color: black;",
"background-color: #ceaf7a; border-color: black;",
"background-color: #ca9865; border-color: black;")
)
timevis(
counter.chronology,
groups = data.frame(
id = 1:2,
content = c("Regenten", "Regentinnen"),
width = 900
)
)
Working solution
From a style point of view I like the ggplot version better, but I didn't find any solution for the problem with overlapping bars. Using gg_vistime would be my preferred solution, since it combines the tools of vistime and ggplot. Unfortunately, as seen on the vistime GitHub page, there seems to be no practicable solution for using BC dates, except for a complicated workaround: https://github.com/shosaco/vistime/issues/6
For now, I'm going with the timevis solution.
I am trying to plot 16 boxplots, using a for loop. My problem is that the 2nd title is plotted on the first plot, the 3rd title on the second plot, and so forth.
Does anyone have a guess as to what I am doing wrong?
My code is the following:
boxplot(data$distance[data$countryname=="Sweden"]~data$alliance[data$countryname=="Sweden"],title(main = "Sweden"))
boxplot(data$distance[data$countryname=="Norway"]~data$alliance[data$countryname=="Norway"],title(main = "Norway"))
boxplot(data$distance[data$countryname=="Denmark"]~data$alliance[data$countryname=="Denmark"],title(main = "Denmark"))
boxplot(data$distance[data$countryname=="Finland"]~data$alliance[data$countryname=="Finland"],title(main = "Finland"))
boxplot(data$distance[data$countryname=="Iceland"]~data$alliance[data$countryname=="Iceland"],title(main = "Iceland"))
boxplot(data$distance[data$countryname=="Belgium"]~data$alliance[data$countryname=="Belgium"],title(main = "Belgium"))
boxplot(data$distance[data$countryname=="Netherlands"]~data$alliance[data$countryname=="Netherlands"],title(main = "Netherlands"))
boxplot(data$distance[data$countryname=="Luxembourg"]~data$alliance[data$countryname=="Luxembourg"],title(main = "Luxembourg"))
boxplot(data$distance[data$countryname=="France"]~data$alliance[data$countryname=="France"],title(main = "France"))
boxplot(data$distance[data$countryname=="Italy"]~data$alliance[data$countryname=="Italy"],title(main = "Italy"))
boxplot(data$distance[data$countryname=="Spain"]~data$alliance[data$countryname=="Spain"],title(main = "Spain"))
boxplot(data$distance[data$countryname=="Portugal"]~data$alliance[data$countryname=="Portugal"],title(main = "Portugal"))
boxplot(data$distance[data$countryname=="Germany"]~data$alliance[data$countryname=="Germany"],title(main = "Germany"))
boxplot(data$distance[data$countryname=="Austria"]~data$alliance[data$countryname=="Austria"],title(main = "Austria"))
boxplot(data$distance[data$countryname=="Ireland"]~data$alliance[data$countryname=="Ireland"],title(main = "Ireland"))
boxplot(data$distance[data$countryname=="UK"]~data$alliance[data$countryname=="UK"],title(main = "UK"))
I think this could replace all your lines and fix your problem:
for (i in unique(data$countryname)) # unique(): one plot per country, not one per row
  boxplot(distance~alliance, subset(data, countryname==i), main=i)
But that's hard to verify without a reproducible example or some of your data.frame.
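To make this checkable without the original data, here is a small self-contained sketch. The data frame is invented (two countries, two alliance levels, random distances); only the column names are taken from the question.

```r
set.seed(42)
# Hypothetical stand-in for the real data frame
data <- data.frame(
  countryname = rep(c("Sweden", "Norway"), each = 20),
  alliance    = rep(c("A", "B"), times = 20),
  distance    = runif(40),
  stringsAsFactors = FALSE
)

countries <- unique(data$countryname)
op <- par(mfrow = c(1, length(countries)))  # one panel per country
for (i in countries) {
  # main = i puts the title on the plot that is being drawn right now
  boxplot(distance ~ alliance, data = subset(data, countryname == i), main = i)
}
par(op)  # restore the previous layout
```

With the 16 countries from the question, par(mfrow = c(4, 4)) would put all boxplots on one page.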
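Based on the documentation, you should assign a title to your boxplots by making explicit calls to the function title(), rather than passing it as a parameter in the call to boxplot(). In your code, title() is evaluated while boxplot() is still processing its arguments, before the new plot is drawn, so it annotates the plot that is already on the device, which is the previous one. The first two calls to generate your boxplots should look something like the following: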
boxplot(data$distance[data$countryname=="Sweden"]~data$alliance[data$countryname=="Sweden"])
title(main = "Sweden")
boxplot(data$distance[data$countryname=="Norway"]~data$alliance[data$countryname=="Norway"])
title(main = "Norway")
So sorry I'm quite new to R and have been trying to do this by myself but have been struggling.
I'm trying to do some sort of barplot or histogram of the tag 'Amateur' over the years 2007 to 2013 to show how it's changed over time.
The data set was downloaded from: https://sexualitics.github.io/ specifically looking at the hamster.csv
Here is some of the initial preprocessing of the data below.
head(xhamster) # Need to change upload_date into a date column, then add new column containing year
xhamster$upload_date<-as.Date(xhamster$upload_date,format="%d/%m/%Y")
xhamster$Year<-year(ymd(xhamster$upload_date)) #Adds new column containing just the year
xhamster$Year<-as.integer(xhamster$Year) # Changing new Year variable into an integer
head(xhamster) # Check changes made correctly
The filter for the years:
Yr2007<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2007")))
Yr2008<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2008")))
Yr2009<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2009")))
Yr2010<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2010")))
Yr2011<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2011")))
Yr2012<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2012")))
Yr2013<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2013")))
For example, I want to create a plot for the tag 'Amateur' in the data. Here is some of the code I have already done:
Amateur<-grep("Amateur",xhamster$channels)
Amateur_2007<-grep("Amateur", Yr2007$channels)
Amateur_2008<-grep("Amateur", Yr2008$channels)
Amateur_2009<-grep("Amateur", Yr2009$channels)
Amateur_2010<-grep("Amateur", Yr2010$channels)
Amateur_2011<-grep("Amateur", Yr2011$channels)
Amateur_2012<-grep("Amateur", Yr2012$channels)
Amateur_2013<-grep("Amateur", Yr2013$channels)
Amateur_2007 <- length(Amateur_2007)
Amateur_2008 <- length(Amateur_2008)
Amateur_2009 <- length(Amateur_2009)
Amateur_2010 <- length(Amateur_2010)
Amateur_2011 <- length(Amateur_2011)
Amateur_2012 <- length(Amateur_2012)
Amateur_2013 <- length(Amateur_2013)
Plot:
Amateur <- cbind(Amateur_2007, Amateur_2008, Amateur_2009,Amateur_2010, Amateur_2011, Amateur_2012, Amateur_2013)
barplot((Amateur),beside=TRUE,col = c("red","orange"),ylim=c(0,90000))
title(main="Usage of 'Amateur' as a tag from 2007 to 2013")
title(xlab="Amateur")
title(ylab="Frequency")
Plot showing amateur tag over the years
However this isn't exactly a great plot. I'm looking for a way to plot using ggplot ideally and to have the names of each bar to be the year rather than 'Amateur_2010' etc. How do I do this?
An even better bonus if I can add 'nb_views' for each year with this tag usage or something like that.
There are lots of ways to approach this, here is how I would tackle it:
library(tidyverse)
library(lubridate)
library(vroom)
xhamster <- vroom("xhamster.csv")
xhamster$upload_date<-as.Date(xhamster$upload_date,format="%d/%m/%Y")
xhamster$Year <- year(ymd(xhamster$upload_date))
xhamster %>%
filter(Year %in% 2007:2013) %>%
filter(grepl("Amateur", channels)) %>%
ggplot(aes(x = Year)) + # geom_bar() counts the rows per year by default
geom_bar() +
scale_x_continuous(breaks = c(2007:2013),
labels = c(2007:2013)) +
ylab(label = "Count") +
xlab(label = "Amateur") +
labs(title = "Usage of 'Amateur' as a tag from 2007 to 2013",
caption = "Data obtained from https://sexualitics.github.io/ under a CC BY-NC-SA 3.0 license") +
theme_minimal(base_size = 14)
As Jared said, there are lots of ways, but I want to solve it with your way, so that you can internalize the solution better.
I just changed your cbind in the plot:
Amateur <- cbind("2007" = Amateur_2007,"2008" = Amateur_2008,"2009" = Amateur_2009, "2010" =Amateur_2010, "2011" = Amateur_2011, "2012" = Amateur_2012, "2013" = Amateur_2013)
As you can see, you can name the columns directly inside the cbind call like that :)
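The per-year variables can also be avoided altogether by letting table() do the counting, which labels the bars with the year automatically. This is a rough sketch on an invented toy data frame: the column names Year, channels and nb_views come from the question, but the values and the way tags are separated inside channels are assumptions.

```r
# Toy stand-in for the real data (values are made up)
toy <- data.frame(
  Year     = c(2007, 2007, 2008, 2009, 2009, 2009),
  channels = c("Amateur", "Other", "Amateur, Other", "Amateur", "Other", "Amateur"),
  nb_views = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

has_tag <- grepl("Amateur", toy$channels, fixed = TRUE)  # rows carrying the tag
counts  <- table(toy$Year[has_tag])                      # names are the years
views   <- tapply(toy$nb_views[has_tag], toy$Year[has_tag], sum)  # views per year

barplot(counts, xlab = "Year", ylab = "Frequency",
        main = "Usage of 'Amateur' as a tag")
```

The `views` vector has the same year names, so it could be plotted the same way for the nb_views bonus.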
I have a data set that I've successfully read into R. It's a simple data.frame with ONE ROW of data (I'm not sure how many columns, but it's in the hundreds). It was read with column headers, but no row labels. So the data set looks something like this:
df=structure(list(X500000 = 0.0958904109589041, X1500000 = 0.10958904109589, X2500000 = 0.10958904109589, X3500000 = 0.164383561643836, X4500000 = 0.136986301369863, X5500000 = 0.205479452054795, X6500000 = 0.136986301369863, X7500000 = 0.0273972602739726, X8500000 = 0.0821917808219178, X9500000 = 0.178082191780822), .Names = c("X500000", "X1500000", "X2500000", "X3500000", "X4500000", "X5500000", "X6500000", "X7500000", "X8500000", "X9500000"), class = "data.frame", row.names = 79L)
Except that it is MUCH LARGER (I don't know if it matters, but it has around 300 columns going across). I'm trying to plot it so that the X##### labels are on the x axis, and the value of each data point is plotted on the y axis (say like a scatter plot on excel or even a line graph). Doing just plot(df) gives me an extremely bizarre graph that makes no sense to me (a bunch of boxes each with a dot right in the centre and no labels?).
I have a feeling it might work if I were to transform the data frame into a vector by removing the headings and then adding x-axis labels individually afterwards and doing a plot() on the vector, but if there is a way of avoiding that it would be great....
As explained in '?plot', 'x' and 'y' must be two numeric vectors of the same length:
df=structure(list(X500000 = 0.0958904109589041, X1500000 = 0.10958904109589, X2500000 = 0.10958904109589, X3500000 = 0.164383561643836, X4500000 = 0.136986301369863, X5500000 = 0.205479452054795, X6500000 = 0.136986301369863, X7500000 = 0.0273972602739726, X8500000 = 0.0821917808219178, X9500000 = 0.178082191780822), .Names = c("X500000", "X1500000", "X2500000", "X3500000", "X4500000", "X5500000", "X6500000", "X7500000", "X8500000", "X9500000"), class = "data.frame", row.names = 79L)
plot(x=as.numeric(substr(names(df),2,nchar(names(df)))), as.numeric(df), xlab="This is xlab", ylab="This is y")
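The same idea can be written a little more explicitly by building the two vectors first. A short sketch using three of the posted columns; it assumes the "X" prefix is the only non-numeric part of the column names (it is typically added when headers start with a digit), and the axis labels are placeholders.

```r
# Three columns taken from the posted data
df <- data.frame(X500000  = 0.0958904109589041,
                 X1500000 = 0.10958904109589,
                 X2500000 = 0.10958904109589)

x <- as.numeric(sub("^X", "", names(df)))  # drop the leading "X" from the headers
y <- unlist(df[1, ], use.names = FALSE)    # the single row as a plain numeric vector

# type = "b" draws points joined by lines, similar to an Excel line chart
plot(x, y, type = "b", xlab = "Column label", ylab = "Value")
```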
I ran a Pig job on a Hadoop cluster that crunched a bunch of data down into something R can handle to do a cohort analysis. I have the following script, and as of the second to last line I have the data in the format:
> names(data)
[1] "VisitWeek" "ThingAge" "MyMetric"
VisitWeek is a Date. ThingAge and MyMetric are integers.
The data looks like:
2010-02-07 49 12345
The script I have so far is:
# Load ggplot2 for charting
library(ggplot2);
# Our file has headers - column names
data = read.table('weekly_cohorts.tsv',header=TRUE,sep="\t");
# Print the names
names(data)
# Convert to dates
data$VisitWeek = as.Date(data$VisitWeek)
data$ThingCreation = as.Date(data$ThingCreation)
# Fill in the age column
data$ThingAge = as.integer(data$VisitWeek - data$ThingCreation)
# Filter data to thing ages lt 10 weeks (70 days) + a sanity check for gt 0, and drop the creation week column
data = subset(data, data$ThingAge <= 70, c("VisitWeek","ThingAge","MyMetric"))
data = subset(data, data$ThingAge >= 0)
print(ggplot(data, aes(x=VisitWeek, y=MyMetric, fill=ThingAge)) + geom_area())
This last line does not work. I've tried lots of variations, bars, histograms, but as usual R docs defeat me.
I want it to show a standard Excel style stacked area chart - one time series for each ThingAge stacked across the weeks in the x axis, with the date on the y axis. An example of this kind of chart is here: http://upload.wikimedia.org/wikipedia/commons/a/a1/Mk_Zuwanderer.png
I've read the docs here: http://had.co.nz/ggplot2/geom_area.html and http://had.co.nz/ggplot2/geom_histogram.html and this blog http://chartsgraphs.wordpress.com/2008/10/05/r-lattice-plot-beats-excel-stacked-area-trend-chart/ but I can't quite make it work for me.
How can I achieve this?
library(ggplot2)
set.seed(134)
df <- data.frame(
VisitWeek = rep(as.Date(seq(Sys.time(),length.out=5, by="1 day")),3),
ThingAge = rep(1:3, each=5),
MyMetric = sample(100, 15))
ggplot(df, aes(x=VisitWeek, y=MyMetric)) +
geom_area(aes(fill=factor(ThingAge)))
gives me the image below. I suspect your problem lies in correctly specifying the fill mapping for the area plot: fill=factor(ThingAge)
ggplot(data.set, aes(x = Time, y = Value, colour = Type)) +
geom_area(aes(fill = Type), position = 'stack')
You need to give geom_area a fill aesthetic and also stack it (stacking is the default position for geom_area, so stating it explicitly is optional).
found here http://www.mail-archive.com/r-help#r-project.org/msg84857.html
I was able to get my result with this:
I loaded the stackedPlot() function from https://stat.ethz.ch/pipermail/r-help/2005-August/077475.html
The function (not mine, see link) was:
stackedPlot = function(data, time=NULL, col=1:length(data), ...) {
if (is.null(time))
time = 1:length(data[[1]]);
plot(0,0
, xlim = range(time)
, ylim = c(0,max(rowSums(data)))
, t="n"
, ...
);
for (i in length(data):1) {
# The sum up to the current column
prep.data = rowSums(data[1:i]);
# The polygon must have its first and last point on the zero line
prep.y = c(0
, prep.data
, 0
)
prep.x = c(time[1]
, time
, time[length(time)]
)
polygon(prep.x, prep.y
, col=col[i]
, border = NA
);
}
}
Then I reshaped my data to wide format. Then it worked!
wide = reshape(data, idvar="ThingAge", timevar="VisitWeek", direction="wide");
stackedPlot(wide);
Turning integers into factors and using geom_bar rather than geom_area worked for me:
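library(ggplot2)
df<-expand.grid(x=1:10,y=1:6)
df<-cbind(df,val=runif(60))
df$fx<-factor(df$x)
df$fy<-factor(df$y)
# stat='identity' makes geom_bar plot the supplied values instead of counting rows
ggplot(df,aes(x=fy,y=val,fill=fx))+geom_bar(stat='identity')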