Apply function to all possible values of a variable - r

I would like to get as many plots as factors/values in a variable.
For example, I would like to plot the variables (v1, v2, v3, v4, v5, v6, v7, v8), which I have defined as a scale, for every value of the variable country. In this case I would get a total of three different plots.
I know how to plot them separately; for example, in this case I would use the following:
basicgraph(Data[Data$country == 1, scale1])
basicgraph(Data[Data$country == 2, scale1])
basicgraph(Data[Data$country == 3, scale1])
I would like my function to plot as many graphs as there are factors/values, without specifying the number of factors/values. I have tried with "apply" but I can't really make it work, so any clue would help.
I have a dataset that looks like:
v1 v2 v3 v4 v5 v6 v7 v8 country
1 NA NA NA NA NA NA NA NA 1
2 5 5 5 5 5 4 5 5 2
3 4 5 3 5 4 5 5 5 3
4 5 5 5 4 2 4 4 5 1
5 4 3 5 4 4 5 4 5 2
6 5 5 5 2 3 4 3 5 3
7 NA NA NA NA NA NA NA NA 1
8 3 5 5 5 4 5 4 4 2
9 4 5 5 4 5 5 4 5 3
10 2 4 4 5 4 5 4 5 1
11 4 5 5 3 4 4 4 5 2
12 4 5 4 4 5 4 4 5 3
13 5 5 4 3 3 5 5 5 1
14 3 5 1 2 3 1 4 5 2
I have defined the scale as:
scale1 <- names(Data) %in% c( "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8")
I have defined a plotting function as:
library(reshape2)
library(ggplot2)
basicgraph <- function(df, title = "", lab = waiver())
{
  # Long format: one row per (variable, value) observation
  y <- melt(df)
  # Frequency of each response value per variable (NA kept as its own value)
  z <- with(y, as.data.frame(table(variable, value, exclude = NULL)))
  z <- z[!is.na(z$variable), ]
  levelss <- levels(z$variable)
  theme_nogrid <- function(base_size = 12, base_family = "")
  {
    theme_bw(base_size = base_size, base_family = base_family) %+replace%
      theme(panel.grid = element_blank()) +
      theme(axis.text.x = element_text(size = base_size * 0.8, lineheight = 0.9,
                                       vjust = 0.5, hjust = 1, angle = 90))
  }
  # stat/position are layer arguments, not aesthetics, so they stay out of aes()
  p <- ggplot(data = z, aes(x = variable, y = value, size = Freq)) +
    geom_point(shape = 20, color = "black", alpha = 0.6) +
    scale_size_continuous(range = c(3, 15)) +
    scale_x_discrete(breaks = levelss, labels = lab) +
    xlab("") +          # add/change the x-axis title
    ylab("Response") +  # add/change the y-axis title
    ggtitle(title) +    # title at the top
    theme_nogrid()
  print(p)  # actually draw the plot (the inner plot function was never called)
}

This is a pretty confusing question and example. I think you want to produce a different graph for each country value? In that case I'd suggest something like this:
library(reshape2)
Data_m <- melt(Data, id.vars="country") # melt the data into 'long' format
f <- function(d) { # function that produces a graph and waits
print(qplot(variable, value, data=d) + ggtitle(unique(d$country)))
readline()
}
library(plyr)
d_ply(Data_m, .(country), f) # produces three separate graphs
The d_ply call splits Data_m into three parts and repeatedly calls f on each, producing a graph of that subset of the data, without knowing anything about the data being graphed.
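If you would rather not depend on plyr, a base-R sketch of the same idea uses split() to get one data frame per country value and Map() to call a plotting function on each piece, so the number of countries never has to be written down. The toy data below is a cut-down version of the question's table, and plot_group() is a hypothetical stand-in for basicgraph(); the ggplot2 part only runs if that package is installed.

```r
# Cut-down version of the question's data: two scale items plus country
Data <- data.frame(
  v1 = c(NA, 5, 4, 5, 4, 5),
  v2 = c(NA, 5, 5, 5, 3, 5),
  country = c(1, 2, 3, 1, 2, 3)
)
scale1 <- names(Data) %in% c("v1", "v2")

# One data frame per distinct value of country, named "1", "2", "3"
groups <- split(Data[, scale1], Data$country)

# Hypothetical stand-in for basicgraph(): draws one plot per group
# (when ggplot2 is available) and returns the group's row count
plot_group <- function(df, title) {
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    long <- stack(df)  # base-R wide-to-long: columns 'values' and 'ind'
    p <- ggplot2::ggplot(long, ggplot2::aes(x = ind, y = values)) +
      ggplot2::geom_count(na.rm = TRUE) +
      ggplot2::ggtitle(paste("country", title))
    print(p)
  }
  nrow(df)
}

# Map() walks the groups and their names in parallel
rows_per_country <- unlist(Map(plot_group, groups, names(groups)))
rows_per_country
```

With the real data, passing function(df, title) basicgraph(df, title = title) to Map() instead of plot_group should give one plot per country, however many countries there are.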

Related

Graph multiple geom_lines with varied timestamped data over multiple days

I have data collected over multiple days, with timestamps that contain information for when food was eaten. Example dataframes:
head(Day3)
==================================================================
Day3.time Day3.Pellet_Count
1 18:05:30 1
2 18:06:03 2
3 18:06:34 3
4 18:06:40 4
5 18:06:52 5
6 18:07:03 6
head(Day4)
==================================================================
Day4.time Day4.Pellet_Count
1 18:00:21 1
2 18:01:34 2
3 18:02:22 3
4 18:03:35 4
5 18:03:54 5
6 18:05:06 6
Given the variability, the timestamps don't line up and therefore aren't matched. I've done a "full join" with merge from all of the data from two of the days, in the following way:
pellets <- merge(Day3, Day4, by = 'time', all=TRUE)
This results in the following:
head(pellets)
==================================================================
pellets.time pellets.Pellet_Count.x pellets.Pellet_Count.y
1 02:40:18 39 NA
2 18:00:21 NA 1
3 18:01:34 NA 2
4 18:02:22 NA 3
5 18:03:35 NA 4
6 18:03:54 NA 5
I would like to plot the Pellet_Count in one line graph from each of the days, but this is making it very difficult to group the data. My approach thus far has been:
pelletday <- ggplot() + geom_line(data=pellets, aes(x=time, y=Pellet_Count.x)) +
geom_line(data=pellets, aes(x=time, y=Pellet_Count.y))
But, I get this error:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I also would like to be able to merge all days (I oftentimes have up to 9 days) and plot it on the same graph.
I believe my goal is to ultimately get the following dataframe output:
==================================================================
pellets.time Pellet_Count Day
1 02:40:18 39 3
2 18:00:21 1 4
3 18:01:34 2 4
4 18:02:22 3 4
5 18:03:35 4 4
6 18:03:54 5 4
and to use this to graph:
ggplot(pellets, aes(time, Pellet_Count, group = Day)) + geom_line()
Any ideas?
There are a couple of issues here.
Firstly, have you tried using rbind() or bind_rows() rather than merge()?
This seems like a more natural fit for what you're trying to do. With a merge or some other join, you are effectively trying to bring new information into your data table; most often you are bringing in new columns.
But here you are really trying to append the days' data together; you're not actually adding a new column.
So this is my attempt at replicating what you're describing above:
library(dplyr)  # provides tibble(), %>%, mutate(), rename() and bind_rows()
Day3 <- tibble(
Day3.time = c('18:05:30', '18:06:03', '18:06:34',
'18:06:40', '18:06:52', '18:07:03'),
Day3.Pellet_Count = c(1, 2, 3, 4, 5, 6)) %>%
mutate(day = '3') %>%
rename(time = Day3.time)
Day4 <- tibble(
Day4.time = c('18:00:21', '18:01:34', '18:02:22',
'18:03:35', '18:03:54', '18:05:06'),
Day4.Pellet_Count = c(1, 2, 3, 4, 5, 6)) %>%
mutate(day = '4') %>%
rename(time = Day4.time)
pellets <- merge(Day3, Day4, by = 'time', all=TRUE)
time Day3.Pellet_Count day.x Day4.Pellet_Count day.y
1 18:00:21 NA <NA> 1 4
2 18:01:34 NA <NA> 2 4
3 18:02:22 NA <NA> 3 4
4 18:03:35 NA <NA> 4 4
5 18:03:54 NA <NA> 5 4
6 18:05:06 NA <NA> 6 4
7 18:05:30 1 3 NA <NA>
8 18:06:03 2 3 NA <NA>
9 18:06:34 3 3 NA <NA>
10 18:06:40 4 3 NA <NA>
11 18:06:52 5 3 NA <NA>
12 18:07:03 6 3 NA <NA>
And here is how you would do it with bind_rows() (rbind() works the same way); this should give you more useful data to work with:
pellets <- bind_rows(Day3 %>%
                       rename(Pellet_Count = Day3.Pellet_Count),
                     Day4 %>%
                       rename(Pellet_Count = Day4.Pellet_Count))
pellets
# A tibble: 12 x 3
time Pellet_Count day
<chr> <dbl> <chr>
1 18:05:30 1 3
2 18:06:03 2 3
3 18:06:34 3 3
4 18:06:40 4 3
5 18:06:52 5 3
6 18:07:03 6 3
7 18:00:21 1 4
8 18:01:34 2 4
9 18:02:22 3 4
10 18:03:35 4 4
11 18:03:54 5 4
12 18:05:06 6 4
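Since you mention having up to 9 days, here is a base-R sketch that generalizes the bind_rows() idea to any number of days: put the day data frames in a named list, give each one the same column names plus a Day column, and stack them with do.call(rbind, ...). It assumes each DayN data frame has exactly two columns, time then pellet count, as in the head() output in the question; standardise() is a hypothetical helper name.

```r
# Two of the question's days (first rows only); add Day5, Day6, ... as needed
Day3 <- data.frame(Day3.time = c("18:05:30", "18:06:03", "18:06:34"),
                   Day3.Pellet_Count = c(1, 2, 3), stringsAsFactors = FALSE)
Day4 <- data.frame(Day4.time = c("18:00:21", "18:01:34"),
                   Day4.Pellet_Count = c(1, 2), stringsAsFactors = FALSE)
days <- list(`3` = Day3, `4` = Day4)

# Hypothetical helper: assumes the columns are (time, pellet count) in that order
standardise <- function(df, day) {
  names(df) <- c("time", "Pellet_Count")
  df$Day <- day
  df
}

# Every day now has the same columns, so the days can simply be stacked
pellets <- do.call(rbind, Map(standardise, days, names(days)))
rownames(pellets) <- NULL
pellets
```

The result has the time / Pellet_Count / Day shape you were aiming for, so the group = Day mapping from the question should then give one line per day.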
Secondly, you probably need to find a way to handle the times. A big problem with your ggplot code is that you are passing character strings where you want to pass date/time data. To get a useful date-time format, I think you'll need to have the date as well.
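One way around needing the date, sketched here as an assumption rather than something the answer proposes, is to convert each clock time to hours elapsed since that day's first pellet. This also copes with sessions that run past midnight (the 02:40:18 row in your merged output suggests some do), provided each day's rows are in chronological order and consecutive pellets are less than 24 hours apart. to_seconds() and elapsed_hours() are hypothetical helpers, and the sample times are made up to include a midnight crossing.

```r
# Made-up long data in the stacked shape (times as "HH:MM:SS" text)
pellets <- data.frame(
  time = c("23:58:10", "23:59:40", "00:00:30", "18:00:21", "18:01:34"),
  Pellet_Count = c(1, 2, 3, 1, 2),
  Day = c("3", "3", "3", "4", "4"),
  stringsAsFactors = FALSE
)

# "HH:MM:SS" -> seconds after midnight
to_seconds <- function(hms) {
  parts <- do.call(rbind, strsplit(hms, ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
}

# Hours since the first pellet; a negative step between consecutive
# readings is taken to mean the clock passed midnight (hence %% 86400)
elapsed_hours <- function(hms) {
  steps <- c(0, diff(to_seconds(hms))) %% 86400
  cumsum(steps) / 3600
}

# Chronological order within each day, then compute elapsed time per day
pellets <- pellets[order(pellets$Day, pellets$Pellet_Count), ]
pellets$hours <- unsplit(lapply(split(pellets$time, pellets$Day), elapsed_hours),
                         pellets$Day)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(pellets, aes(hours, Pellet_Count, colour = Day)) + geom_line())
}
```

Because hours is numeric, the x-axis is continuous and every day starts at 0, which makes the days directly comparable on one graph.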
You first need to convert your data from 'wide' to 'long' format (see example here). After that, you should be able to use ggplot (it looks like you tried to use base R plotting logic with lines here, but that doesn't work with ggplot).
For example:
library(dplyr)
library(tidyr)
library(ggplot2)
pellets %>% gather("day", "count", -pellets.time) %>% na.omit()
All together it will be:
pellets %>%
  rename(Day3 = pellets.Pellet_Count.x, Day4 = pellets.Pellet_Count.y) %>%
  gather("day", "count", -pellets.time) %>%
  na.omit() %>%
  ggplot() + geom_point(aes(x = pellets.time, y = count, col = day))
(I added rename to match your preferred output)

Plotting multiple bar plots on same y-axis but each on separate x-axis in ggplot2 for count data

I have some count variables against which I want to make bar-plots on the same y-axis but I have no grouping variable. Something like the following plot
B <- 25
iter_M1
[1] 5 13 14 11 7 8 10 14 10 5 7 13 10 12 4 5 9 6 5 12 8 8 7 11 9
max_M1 <- max(iter_M1)
count_M1 <- integer(max_M1)
for(i in 1:max_M1)
{
for(j in 1:B)
{
if(iter_M1[j] == i)
count_M1[i] = count_M1[i] +1
}
}
count_M1
[1] 0 0 0 1 4 1 3 3 2 3 2 2 2 2
df <- data.frame(x = 1:max_M1, y = count_M1)
p_M1 <-ggplot(data=df, aes(x=x, y=y)) + geom_bar(stat="identity")
p_M1
This results in a plot like this
and another similar variable
iter_M2
[1] 3 1 3 2 6 3 4 4 3 7 4 2 2 3 4 3 4 4 1 3 7 3 2 4 2
max_M2 <- max( iter_M2)
count_M2 <- integer(max_M2)
for(i in 1:max_M2)
{
for(j in 1:B)
{
if(iter_M2[j] == i)
count_M2[i] = count_M2[i] +1
}
}
count_M2
[1] 2 5 8 7 0 1 2
df1 <- data.frame(x1 = 1:max_M2, y1 = count_M2)
p_M2 <- ggplot(data=df1, aes(x=x1, y=y1)) +
  geom_bar(stat="identity")
p_M2
which results in a second plot as
and similar variables like these. How can I plot this data side by side? Also, with the way I've generated the data, there is no common y-axis for all the x-axes. Are there any suggestions for generating such a plot, or for putting the data in another format, to achieve the required plot?
As suggested in the comments, making a factor (class) is the easiest way, allowing you to facet the plot.
But you seem to explicitly want just the same y-axis. This is achievable with the scale limits: for example, generate a vector with the limits based on the maximum count and then use it in your plots.
ylimits <- c(0, max(c(count_M1, count_M2)))
p_M1 + ylim(ylimits)
p_M2 + ylim(ylimits)
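The factor-and-facet route mentioned above could look like the sketch below. It assumes the goal is one panel per model (M1, M2, ...) with a shared y-axis and separate x-axes; counts_long() is a hypothetical helper, and tabulate() replaces the nested counting loops, since tabulate(x) counts how often each integer from 1 to max(x) occurs.

```r
iter_M1 <- c(5, 13, 14, 11, 7, 8, 10, 14, 10, 5, 7, 13, 10, 12, 4,
             5, 9, 6, 5, 12, 8, 8, 7, 11, 9)
iter_M2 <- c(3, 1, 3, 2, 6, 3, 4, 4, 3, 7, 4, 2, 2, 3, 4,
             3, 4, 4, 1, 3, 7, 3, 2, 4, 2)

# Hypothetical helper: counts per value 1..max, tagged with the model name
counts_long <- function(iter, model) {
  counts <- tabulate(iter)  # same result as the nested for loops
  data.frame(x = seq_along(counts), y = counts, model = model)
}

# One long data frame; 'model' is the grouping variable the question lacked
df_all <- rbind(counts_long(iter_M1, "M1"), counts_long(iter_M2, "M2"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_all, aes(x = x, y = y)) +
    geom_col() +
    facet_wrap(~ model, scales = "free_x")  # own x-axis per panel, shared y-axis
  print(p)
}
```

facet_wrap() keeps the y-axis fixed across panels by default, and scales = "free_x" frees only the x-axes, so adding a third or fourth model is just another counts_long() call inside rbind().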

Plotting several X,Y column pairs as data series, while excluding (0,0) points

I'm trying to plot three data series in a single plot. The X and Y coordinates of each series are in separate columns in my data frame:
   X1 Y1 X2 Y2 X3 Y3
1   0  1  0  2  0  3
2   1  2  1  3  1  4
3   2  3  2  4  2  5
4   3  4  3  5  3  6
5   4  5  4  6  4  7
6   5  6  5  7  5  8
7   6  7  6  8  6  9
8   0  0  7  9  7  8
9   0  0  8  8  0  0
10  0  0  9  7  0  0
Since the trailing (0,0) data points of each series are invalid, only this subset of points should eventually be plotted:
   X1 Y1 X2 Y2 X3 Y3
1   0  1  0  2  0  3
2   1  2  1  3  1  4
3   2  3  2  4  2  5
4   3  4  3  5  3  6
5   4  5  4  6  4  7
6   5  6  5  7  5  8
7   6  7  6  8  6  9
8         7  9  7  8
9         8  8
10        9  7
Additionally, the X-axis of the first series should be inverted:
Even without cleaning up the data frame first, I struggled to plot the column pairs as individual series in ggplot2 (see 'legend').
require(ggplot2)
report <- function(df){
plot = ggplot(data=df, aes(x=-X1, y=Y1, size=3)) + #inverted X-axis of series 1
layer(geom="point") +
geom_point(aes(X2, Y2, colour="red", size=2)) +
geom_point(aes(X3, Y3, colour="blue", size=1)) +
xlab("X") + ylab("Y")
print(plot)
}
X1 = c(0,1,2,3,4,5,6,0,0,0)
Y1 = c(1,2,3,4,5,6,7,0,0,0)
X2 = c(0,1,2,3,4,5,6,7,8,9)
Y2 = c(2,3,4,5,6,7,8,9,8,7)
X3 = c(0,1,2,3,4,5,6,7,0,0)
Y3 = c(3,4,5,6,7,8,9,8,0,0)
df <- data.frame(X1,Y1,X2,Y2,X3,Y3)
colnames(df) <- c("X1","Y1","X2","Y2","X3","Y3")
report(df)
What would be the best way to get rid of the invalid (0,0) data points in each series, and how should I plot them properly?
I think you actually want to transform your data.frame in order to make your ggplot call more concise. Here is the updated version to plot your data correctly using the dplyr package to transform the data.
In response to a comment requesting more information on dplyr: it provides the %>% operator, which simply passes the value on its left into the function on its right as the first argument. This allows for much more readable R code. The mutate() function adds the Series variable, set by hand from the knowledge of which points belong to which series. Then the filter() function removes the (0,0) points, which you indicated were not wanted. You can inspect df after these operations to see the final output. I hope this helps you interpret the code below. Also, here is a link to the dplyr page.
library(dplyr)
library(ggplot2)
df <- rbind.data.frame(
data.frame(X=-X1, Y=Y1),
data.frame(X=X2, Y=Y2),
data.frame(X=X3, Y=Y3))
df <- df %>%
mutate(Series=rep(c('S1', 'S2', 'S3'), each=10)) %>%
filter(!(X == 0 & Y == 0))
png('foo.png')
ggplot(df) + geom_point(aes(x=X, y=Y, color=Series, size=Series))
dev.off()
Also, if you want to manually set the values of color and size, as well as add the lines as in your ideal example plot, here is a more complex ggplot command:
ggplot(df, aes(x=X, y=Y, color=Series, size=Series)) +
geom_point() + geom_line(size=1) + theme_bw() +
scale_color_manual(values=c('black', 'red', 'blue')) +
scale_size_manual(values=seq(4,2,-1))
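If the number of column pairs grows, a base-R alternative to building the long data frame by hand is reshape(), which stacks X1..X3 and Y1..Y3 in one call. This is a sketch that assumes the columns are named X1, Y1, X2, Y2, ... as in the question; the Series labels S1..S3 follow the answer above.

```r
X1 <- c(0, 1, 2, 3, 4, 5, 6, 0, 0, 0); Y1 <- c(1, 2, 3, 4, 5, 6, 7, 0, 0, 0)
X2 <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); Y2 <- c(2, 3, 4, 5, 6, 7, 8, 9, 8, 7)
X3 <- c(0, 1, 2, 3, 4, 5, 6, 7, 0, 0); Y3 <- c(3, 4, 5, 6, 7, 8, 9, 8, 0, 0)
wide <- data.frame(X1, Y1, X2, Y2, X3, Y3)

# Stack the X and Y columns pair by pair; 'Series' records which pair a row came from
long <- reshape(wide, direction = "long",
                varying = list(c("X1", "X2", "X3"), c("Y1", "Y2", "Y3")),
                v.names = c("X", "Y"),
                timevar = "Series", times = c("S1", "S2", "S3"))

long <- long[!(long$X == 0 & long$Y == 0), ]  # drop the padding (0,0) rows
s1 <- long$Series == "S1"
long$X[s1] <- -long$X[s1]                     # invert the x-axis of series 1

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(long, aes(X, Y, colour = Series)) + geom_point() + geom_line())
}
```

Adding a fourth pair only means extending the two vectors in varying and the times labels.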

R: Order of points and lines within geom in ggplot2

I am trying to plot a dataframe in ggplot and am having trouble getting the points and lines to display in the desired order.
The data is split based on the same column (of factors 0 or 1) and I want 0 to plot over 1 for both lines and points (which use data from 4 other separate columns).
I have made a test data frame below to illustrate my point. My real data frame has thousands of points, and I want to plot a number of data frames, so I don't really want to use a workaround like subsetting my data and plotting it as separate layers/geoms.
library(ggplot2)
testdata <- data.frame(Split = c(rep(0,5), rep(1,5)), a = rep(1:5,2),
b = c(7,8,9,10,11,6,8,9,10,12), x = c(1:5, 1:5), y = c(1:3,5,6,1.1,2.1,4.1,5.1,7.1))
testdata$Split <- factor(testdata$Split)
ggplot(data = testdata)+
geom_point(aes(x = x, y = y, colour = Split), size = 4)+
geom_line(aes(x = a, y = b, colour = Split))
testdata$Split <- ordered(testdata$Split, levels = rev(levels(testdata$Split)))
When I run the line of code that reverses the order of my levels, it swaps which of my lines is brought to the front, but not which set of points. Initially both the points and the line for Split = 0 are behind; however, when I reverse the order, the line for Split = 0 is in front (what I want) but the points for Split = 0 remain behind the points for Split = 1.
Any idea what's going on here and how I can get this to work would be appreciated.
Thanks
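After investigating the situation for some time, this is what I found and suggest. In short, I believe the solution is to make sure that 0 ends up with the value 2 in unclass() (i.e. 0 is the last factor level) and that the rows for 0 come last in the data frame.
library(dplyr)  # for arrange() and desc()
library(ggplot2)
foo <- data.frame(split = rep(c("0", "1"), each = 5),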
a = rep(1:5,2),
b = c(7,8,9,10,11,6,8,9,10,12),
x = c(1:5, 1:5),
y = c(1:3,5,6,1.1,2.1,4.1,5.1,7.1),
stringsAsFactors=F)
To start, I sorted the rows so that those with split 0 come last, and then converted split to a factor:
foo <- arrange(foo, desc(split))
foo$split <- as.factor(foo$split)
#> str(foo)
#'data.frame': 10 obs. of 5 variables:
# $ split: Factor w/ 2 levels "0","1": 2 2 2 2 2 1 1 1 1 1
# $ a : int 1 2 3 4 5 1 2 3 4 5
# $ b : num 6 8 9 10 12 7 8 9 10 11
# $ x : int 1 2 3 4 5 1 2 3 4 5
# $ y : num 1.1 2.1 4.1 5.1 7.1 1 2 3 5 6
Note that at this point it is 1, not 0, that has the value 2 in unclass(); what has changed is that the rows for 0 now come last.
#> unclass(foo$split)
# [1] 2 2 2 2 2 1 1 1 1 1
#attr(,"levels")
#[1] "0" "1"
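Now I run the following. q (for points) has the ideal outcome, because points are drawn in row order and the rows for 0 come last. But q2 (for lines) does not, because lines are drawn group by group in factor-level order and 0 is still the first level.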
q <- ggplot(data = foo, aes(x = x, y = y, colour = split))+
geom_point(size = 6)
q2 <- ggplot(data = foo, aes(x = a, y = b, colour = split))+
geom_line()
So I reversed the factor order to see what happens.
### Reorder the factor levels.
foo$split <- ordered(foo$split, rev(levels(foo$split)))
#> str(foo)
#'data.frame': 10 obs. of 5 variables:
#$ split: Ord.factor w/ 2 levels "1"<"0": 1 1 1 1 1 2 2 2 2 2
#$ a : int 1 2 3 4 5 1 2 3 4 5
#$ b : num 6 8 9 10 12 7 8 9 10 11
#$ x : int 1 2 3 4 5 1 2 3 4 5
#$ y : num 1.1 2.1 4.1 5.1 7.1 1 2 3 5 6
#> unclass(foo$split)
#[1] 1 1 1 1 1 2 2 2 2 2
#attr(,"levels")
#[1] "1" "0"
Both q3 and q4 now give the correct outcome: the rows for 0 still come last, and 0 now has the value 2 in unclass().
q3 <- ggplot(data = foo, aes(x = x, y = y, colour = split))+
geom_point(size = 6)
q4 <- ggplot(data = foo, aes(x = a, y = b, colour = split))+
geom_line()
So, here is the final form.
ggplot(data = foo)+
geom_point(aes(x = x, y = y, colour = split), size = 6)+
geom_line(aes(x = a, y = b, colour = split))
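The two orderings the answer ends up relying on can also be set directly in base R, without arrange(). This sketch assumes what the experiments above suggest: points are drawn in row order, while lines are drawn group by group in factor-level order. on_top is a hypothetical variable naming the series that should be drawn last, and factor() is used instead of ordered() so that ggplot2 keeps its default (unordered) discrete colour scale.

```r
foo <- data.frame(split = rep(c("0", "1"), each = 5),
                  a = rep(1:5, 2),
                  b = c(7, 8, 9, 10, 11, 6, 8, 9, 10, 12),
                  x = c(1:5, 1:5),
                  y = c(1:3, 5, 6, 1.1, 2.1, 4.1, 5.1, 7.1),
                  stringsAsFactors = FALSE)

on_top <- "0"  # hypothetical: the series to draw last, i.e. on top

# Row order (affects points): move the rows of the top series to the end.
# order() is stable, so all other rows keep their relative order.
foo <- foo[order(foo$split == on_top), ]

# Level order (affects line groups): make the top series the last level
foo$split <- factor(foo$split,
                    levels = c(setdiff(unique(foo$split), on_top), on_top))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(foo) +
          geom_point(aes(x = x, y = y, colour = split), size = 6) +
          geom_line(aes(x = a, y = b, colour = split)))
}
```

Changing on_top to "1" flips both orderings at once, which is handy when the same plot has to be produced for many data frames.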

Why doesn't qplot plot lines in multiple series for this data file?

It's my first day learning R and ggplot. I've followed some tutorials and would like plots like the one generated by the following command:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
It looks like the figure on this page:
http://www.r-bloggers.com/quick-introduction-to-ggplot2/
I created a small test data file by hand, which looks like this:
site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7
but when I try to read and plot it with:
test <- read.table('test.data')
qplot(temp, humidity, data = test, color=site, geom = c("point", "line"))
the lines on the plot aren't separate series, but link together:
http://imgur.com/weRaX
What am I doing wrong?
Thanks.
You need to tell ggplot2 how to group the data into separate lines. It's not a mind reader! ;)
library(ggplot2)
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7",sep = "",header = TRUE)
qplot(temp, humidity, data = dat, group = site,color=site, geom = c("point", "line"))
Note that you probably also wanted to do color = factor(site) in order to force a discrete color scale, rather than a continuous one.
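For reference, the same plot written with ggplot() instead of qplot() might look like the sketch below. Converting site to a factor gives a discrete colour scale, and because ggplot2 builds its default groups from the discrete variables in the mapping, the lines then split by site even without an explicit group aesthetic; group = site is kept anyway to make the intent explicit.

```r
# Header has one field fewer than the data rows, so the first column
# becomes the row names (as with the original test.data file)
dat <- read.table(text = "
  site temp humidity
1    1    1      3
2    1    2      4.5
3    1   12      8
4    1   14     10
5    2    1      5
6    2    3      9
7    2    4      6
8    2    8      7", header = TRUE)

# A factor makes site discrete: one colour and one line per site
dat$site <- factor(dat$site)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(temp, humidity, colour = site, group = site)) +
    geom_point() +
    geom_line()
  print(p)
}
```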
