I am making a visualization that involves factors, ratios, and countries. There are about 15 factors, and I am trying to use small multiples to create a large grid of plots where both the X and Y axes are the factors, i.e., roughly:
Rows (Y axis, top to bottom): Population, Num of Cars, Num of Houses
Columns (X axis, left to right): Num of Houses, Num of Cars, Population
Each intersection would be a plot of the values for each country (so the plot at the intersection of Num of Cars and Num of Houses would be Num of Houses vs. Num of Cars, etc.). I currently have a data frame with columns for country, factor, and ratio. I've tried a few methods (facet_grid, facet_wrap, etc.) but just can't get any output: when I run the script, a blank screen pops up. I haven't been able to figure out how to google the type of small-multiples plot I'm trying to create. I'm also brand new to R and have been stuck for many hours.
Any advice?
Edited: More information
some sample data:
factor country year ratio
1 LiteracyRate Afghanistan 2000 0.3622047
2 PostSecondarySchoolAgePopulation Afghanistan 2011 0.9272919
3 PrePrimaryEducationSchoolAgePopulation Afghanistan 2012 0.9397506
4 PrimaryEducationSchoolAgePopulation Afghanistan 2009 0.9344603
5 SecondarySchoolAgePopulation Afghanistan 2008 0.9301103
(I have this data for every country, and more factors than shown, also)
Code that has been most successful so far:
try <- read.table(".../temper.csv", header = TRUE, sep = ",")
remr <- ggplot(try, aes(factor, ratio)) + geom_point()
remr + facet_grid(factor ~ factor)
Graph produced: http://www.flickr.com/photos/94273266#N05/11411186776/
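This layout is usually called a scatterplot matrix (or pairs plot), which may help with searching. facet_grid(factor ~ factor) cannot produce it: each row of the data holds only one factor, so every row lands in a diagonal panel and the off-diagonal panels stay empty. Below is a minimal base-R sketch, assuming you want one ratio per country and factor (here the latest year is kept). The data frame `long` and its values are hypothetical stand-ins for the CSV. The sketch reshapes the data to one row per country with reshape() and then calls pairs(). If ggplot2 styling is preferred, GGally::ggpairs() draws the same kind of matrix from the wide data.

```r
# Hypothetical long-format data shaped like the question's (factor, country,
# year, ratio); the values are invented purely for illustration
long <- data.frame(
  factor  = rep(c("LiteracyRate", "PrimaryEducationSchoolAgePopulation",
                  "SecondarySchoolAgePopulation"), times = 4),
  country = rep(c("Afghanistan", "Albania", "Algeria", "Angola"), each = 3),
  year    = 2010,
  ratio   = c(0.36, 0.93, 0.93,  0.96, 0.98, 0.91,
              0.73, 0.97, 0.95,  0.70, 0.86, 0.82)
)

# Assumption: keep one value per country/factor pair (the latest year)
long <- long[order(long$country, long$factor, -long$year), ]
long <- long[!duplicated(long[c("country", "factor")]), ]

# Long -> wide: one row per country, one column per factor
wide <- reshape(long[c("country", "factor", "ratio")],
                idvar = "country", timevar = "factor", direction = "wide")
names(wide) <- sub("^ratio\\.", "", names(wide))

# Scatterplot matrix: each off-diagonal panel plots one factor against
# another, with one point per country
pairs(wide[-1])
```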
I'm fairly new to R and I've been having trouble with a plot.
I'm trying to create a line plot with:
$YEAR on the X axis
$METRIC on the Y axis
a different-colored line for each country (meaning, a total of 3 lines on the same plot)
$COUNTRY is a factor with 3 levels
COUNTRY YEAR METRIC
USA 2000 14.874
USA 2001 15.492
USA 2002 13.091
USA 2003 14.717
CAN 1999 15.031
CAN 2000 14.343
CAN 2001 12.972
CAN 2002 13.216
SWE 1999 14.771
SWE 2000 17.033
SWE 2001 15.932
SWE 2002 14.516
SWE 2003 15.655
When I create the plot with
plot(df$YEAR, df$METRIC, col=df$COUNTRY, type="p")
I get a plot with points for each (x,y) combination and different color for each level of the factor $COUNTRY
However, when I try to get a line for each country, with
plot(df$YEAR, df$METRIC, col=df$COUNTRY, type="l")
I get one continuous line that starts with the four "USA" observations and then jumps back to the first year of the next country ("CAN").
Can anyone explain why this is happening?
Is it possible to create this plot using only built-in (base R) functions?
Thank you in advance for any assistance.
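Here is a basic base-R implementation. With type="l", plot() joins the points in the order the rows appear, so all countries end up on one connected line; drawing each country with its own lines() call avoids that. If your $COUNTRY is already a factor (is.factor(df$COUNTRY)), you can skip creating ctryfctr and change the lines() call to lines(..., col=x$COUNTRY[1]):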
df$ctryfctr <- factor(df$COUNTRY)
plot(NA, xlim=range(df$YEAR), ylim=range(df$METRIC), xlab="YEAR", ylab="METRIC")  # empty plot sized to the data
for (x in split(df, df$COUNTRY)) lines(x$YEAR, x$METRIC, col=x$ctryfctr[1])  # one separate line per country
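For reference, a self-contained version of the base-R approach above, using the question's data, might look like this. The per-country sort by YEAR, the explicit as.integer() colour index, and the legend are additions for illustration, not part of the original answer.

```r
# The question's data rebuilt as a data frame
df <- data.frame(
  COUNTRY = factor(c(rep("USA", 4), rep("CAN", 4), rep("SWE", 5))),
  YEAR    = c(2000:2003, 1999:2002, 1999:2003),
  METRIC  = c(14.874, 15.492, 13.091, 14.717,
              15.031, 14.343, 12.972, 13.216,
              14.771, 17.033, 15.932, 14.516, 15.655)
)

# Empty plotting region sized to the full data range
plot(NA, xlim = range(df$YEAR), ylim = range(df$METRIC),
     xlab = "YEAR", ylab = "METRIC")

# One lines() call per country, so points are never joined across countries;
# as.integer() turns the factor level into a palette colour index
for (x in split(df, df$COUNTRY)) {
  x <- x[order(x$YEAR), ]  # connect points in year order
  lines(x$YEAR, x$METRIC, col = as.integer(x$COUNTRY[1]))
}
legend("topleft", legend = levels(df$COUNTRY),
       col = seq_along(levels(df$COUNTRY)), lty = 1)
```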
Since you seem to mix up some concepts, I thought it would be helpful to clarify things a bit.
R's base graphics are great for quick sketches without prior knowledge, but more complicated plots are easier to define with the ggplot2 package. You can install it with install.packages("ggplot2"). With ggplot2 you can group the lines, as you already tried and as r2evans already pointed out.
library(ggplot2)
ggplot(df) + geom_line(aes(YEAR, METRIC, group=COUNTRY, color=COUNTRY))
So you tell ggplot that you are using df as your data. You define the x and y axes for geom_line inside aes(). With group= you define the grouping variable, and with color= you specify that each line uses a different color.
Hope that you have great time with R and ggplot2!
I would be extremely grateful for some help with R. I would like to plot a data frame of gridded data (like-for-like running down the diagonal, from top left to bottom right). I've seen quite a few examples using ggplot2; however, I simply lack the experience with R needed to manipulate the data structures. I've been programming in LISP and Java for years, yet my head won't get around R :-(
The data looks like this:
tension cluster migraineNoAura migraineAura
tension NA 1.5 6.960453e+00 3.596953
cluster 1.943113e+08 NA NA NA
migraineNoAura 8.462798e+00 NA NA 7.499999
migraineAura 2.833333e+00 NA 7.148313e+07 NA
This is only a small subset; the full data is a 60x60 data frame. Notice the NAs.
I'm hoping for a 60x60 grid, coloured by the value and the x and y labeled using the names from the data frame.
First, you need to reshape your data frame from wide format to long format. The following example uses the tidyverse to do that.
library(tidyverse)
dt2 <- dt %>%
rownames_to_column() %>%
gather(colname, value, -rowname)
head(dt2)
# rowname colname value
# 1 tension tension NA
# 2 cluster tension 1.943113e+08
# 3 migraineNoAura tension 8.462798e+00
# 4 migraineAura tension 2.833333e+00
# 5 tension cluster 1.500000e+00
# 6 cluster cluster NA
Now we are ready to use ggplot2 to plot the heatmap with geom_tile.
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
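If you would rather avoid the tidyverse, here is a base-R sketch of the same reshape plus an image()-based heatmap, rebuilt from the 4x4 subset in the question. One detail worth noting: gather() returns character columns, which ggplot2 orders alphabetically, whereas as.data.frame(as.table(...)) returns factors whose levels follow the original row and column order, so dt2 can be passed to the same ggplot() call without the labels being re-sorted. The log10() colour scale is an assumption, chosen because the values span roughly 1 to 2e8.

```r
# The 4x4 subset printed in the question, rebuilt as a data frame
lbl <- c("tension", "cluster", "migraineNoAura", "migraineAura")
m <- matrix(c(NA,           1.5, 6.960453,     3.596953,
              1.943113e+08, NA,  NA,           NA,
              8.462798,     NA,  NA,           7.499999,
              2.833333,     NA,  7.148313e+07, NA),
            nrow = 4, byrow = TRUE, dimnames = list(lbl, lbl))
dt <- as.data.frame(m)

# Wide -> long in base R; as.table() keeps the original row/column order
# as factor levels
dt2 <- as.data.frame(as.table(as.matrix(dt)))
names(dt2) <- c("rowname", "colname", "value")

# Base-graphics heatmap: image() puts z[i, j] at (x[i], y[j]) with y
# increasing upward, so transpose and flip the rows to put row 1 at the top
# (diagonal runs top left to bottom right). NA cells are left blank.
n <- nrow(m)
image(1:n, 1:n, t(log10(m[n:1, ])), axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:n, labels = colnames(m), las = 2)
axis(2, at = 1:n, labels = rev(rownames(m)), las = 1)
```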
I'm new to R and am using histograms for the first time. I need to construct a histogram to show the frequency of income across all 50 U.S. states plus the District of Columbia.
This is the data given to me:
> data
X.Income. X.No.States.
1 -22.024 5
2 -25.027 13
3 -28.030 16
4 -31.033 9
5 -34.036 4
6 -37.039 2
7 -40.042 2
> hist(data$X.Income, col="red")
But that only produces a histogram of how often each income value appears in the data, not the number of states that have that level of income. How do I account for the number of states at each income level in the chart?
Use a bar plot instead of a histogram, as the histogram expects to calculate the frequencies for you:
library(ggplot2)
# make some data to exercise
income = c(-22.024, -25.027, -28.030, -31.033, -34.036, -37.039,-40.042)
freq = c(5,13,16,9,4,2,2)
df <- data.frame(income, freq)  # data.frame() already names the columns income and freq
# the graph object
p <- ggplot(data=df) +
aes(x=income, y=freq) +
geom_bar(stat="identity", fill="red")
# call the object to view
p
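Two base-R alternatives, sketched with the same values: barplot() draws the precomputed counts directly, and rep() can expand each income value into one entry per state so that hist() does the counting itself. The hist() breaks (3-unit bins, each containing exactly one of the given income values) are an assumption made for this sketch.

```r
income <- c(-22.024, -25.027, -28.030, -31.033, -34.036, -37.039, -40.042)
freq   <- c(5, 13, 16, 9, 4, 2, 2)

# Option 1: bar plot of precomputed counts; heights are used as given
barplot(freq, names.arg = income, col = "red",
        xlab = "Income", ylab = "Number of states")

# Option 2: expand to one value per state (51 in total) so that hist()
# counts them; each 3-unit bin holds one income level
per_state <- rep(income, times = freq)
h <- hist(per_state, breaks = seq(-41.5, -20.5, by = 3), col = "red",
          xlab = "Income", ylab = "Number of states", main = "")
```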
I am using xyplot() from lattice to make a plot that shows temperature change over time alongside count data. I am not sure whether ggplot2 would be better. My data is arranged like this:
Year (1998 1998 1999 2000 2001 2001 2002)
Low (2.777778 8.333330 10.555556 4.444444 26.388889 15.555556 12.500000)
Geese (2 14 10 16 7 10 15)
State (Arkansas California California California California Florida California)
I am stuck at this part of the code:
xyplot(c(geese,low)~year,subset=state=="California", par.settings=bwtheme, auto.key=TRUE)
The plot shows geese and low (temperature) with the same point type, and if I add a line, there is no separation between the two. Any help with this would be awesome.
To plot multiple series on the same plot, use + rather than c() to specify multiple y values. For example:
xyplot(geese + low ~year, subset=state=="California", auto.key=TRUE, type="b")
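That will produce a single panel in which geese and low are drawn as separate series with different plotting styles, each with points joined by lines (type="b"), plus a key identifying the two series.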
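Here is a self-contained sketch of the same call, with the question's data rebuilt as a data frame. The name geese_df is hypothetical, and the column names are lower-cased to match the formula in the question. Passing data= lets xyplot() evaluate both the formula and subset= inside the data frame. Adding outer = TRUE would instead draw geese and low in separate panels that share the year axis.

```r
library(lattice)  # ships with standard R installations

# The question's data rebuilt as a data frame (hypothetical name geese_df)
geese_df <- data.frame(
  year  = c(1998, 1998, 1999, 2000, 2001, 2001, 2002),
  low   = c(2.777778, 8.333330, 10.555556, 4.444444, 26.388889, 15.555556, 12.500000),
  geese = c(2, 14, 10, 16, 7, 10, 15),
  state = c("Arkansas", "California", "California", "California",
            "California", "Florida", "California")
)

# "geese + low" superposes two series in one panel; type = "b" draws
# points and lines, and auto.key adds a legend for the two series
p <- xyplot(geese + low ~ year, data = geese_df,
            subset = state == "California",
            type = "b", auto.key = TRUE)
print(p)
```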
I have a data frame with 4 variables and about 300k rows: a unique account ID, a start date in yyyy-mm-dd format, a start year, and the total number of months to date that the customer has held an account active. A snippet of the data is below (don't let the row numbers confuse you; this is obviously a subset. If more data is necessary, let me know):
> head(ten.by.id)
acct.id start_date strt.yr max_ten
1 155 1998-11-01 1998 175
19 902 2001-09-01 2001 143
39 995 2001-09-01 2001 143
59 1014 2000-10-01 2000 153
78 1017 2000-04-01 2000 160
100 1137 2000-11-01 2000 153
Problem (Why I want to render a faceted plot):
Showing a histogram of the entire dataset across all years renders the following:
Obviously, there are mixed distributions here, but the cause is unknown. First, I thought I'd check for time-domain effects visually. By using facets, I can show a serial histogram of frequency distributions by year, overlaying the KDE plot for each year.
If the multiple distributions were a product of something that occurred over time, I could spot-check relevant shape changes (e.g., unimodal to multimodal). I used the code below to generate this plot:
library(ggplot2)
# Each "+" must end a line: a line that starts with "+" is parsed as a new expression
maxten_time <- ggplot(ten.by.id, aes(max_ten)) +
  geom_histogram(colour = "grey19", fill = "orange", binwidth = 2) +
  scale_y_continuous(breaks = seq(0, 12000, by = 100)) +
  scale_x_continuous(breaks = seq(0, 180, by = 45)) +
  labs(title = "Serial Distribution of Max Length of Tenure for all Customers by Start Date", x = "Max Tenure (months)", y = "# of Customers") +
  facet_grid(. ~ strt.yr) +
  # count * binwidth (2) puts the KDE on the same scale as the histogram bars
  geom_density(aes(y = after_stat(count * 2)), fill = NA, colour = "orange", linewidth = 1)
maxten_time
Which renders the following:
Questions for recreating the faceted plot:
1. What I wish to do is add a horizontal line (or some other single marker) to each facet that indicates the total # of customer starts for each year. Can this be done in a faceted plot?
2. I would like to add an additional axis that spans across the facets to mark the number of months across all years (1 to 175). Am I asking too much of ggplot here (i.e., since each facet is its own plot, would aligning the month markers across all facets even be possible)? I haven't seen any relevant examples of doing something quite like this.
The objective is merely to combine the horizontal lines in each facet and the axis across facets into the entire plot. Any direction would be helpful.
Phillip
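For the first question, a common approach is to put the per-year totals in their own small data frame whose column name matches the facet variable and pass it to geom_hline() through its data argument; ggplot2 then draws each value only in the matching panel. The sketch below assumes ten.by.id has the strt.yr and max_ten columns shown in the question, but builds a hypothetical random stand-in so it can run on its own, and only draws the plot if ggplot2 is installed. For the second question, facet_grid() uses one fixed x scale for all panels by default (scales = "fixed"), so breaks set with scale_x_continuous() are identical in every facet, although ggplot2 does not draw a single continuous axis spanning the facets.

```r
# Hypothetical stand-in for ten.by.id (random values, for illustration only)
set.seed(1)
ten.by.id <- data.frame(
  strt.yr = sample(1998:2001, 500, replace = TRUE),
  max_ten = round(runif(500, 1, 175))
)

# Total customer starts per year; the strt.yr column matches the facet
# variable, so each total is drawn only in its own panel
starts <- aggregate(max_ten ~ strt.yr, data = ten.by.id, FUN = length)
names(starts)[2] <- "n_starts"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ten.by.id, aes(max_ten)) +
    geom_histogram(binwidth = 2, colour = "grey19", fill = "orange") +
    facet_grid(. ~ strt.yr) +
    geom_hline(data = starts, aes(yintercept = n_starts), colour = "blue")
  print(p)
}
```

Because each yearly total is much larger than any single 2-month bin, the y axis stretches to include the lines; geom_text() with the same starts data frame is an alternative if a printed number reads better than a line.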