How to use ggplot with prop.table(table(x))?

I have a data frame with two categorical variables, like this:
nombre <- c("A","B","C","A","D","F","F","H","I","J")
sexo <- c(rep("man",4),rep("woman",6))
edad <- c (25,14,25,76,12,90,65,45,56,43)
pais <- c(rep("spain",3),rep("italy",4),rep("portugal",3))
data <- data.frame(nombre=nombre,sexo=sexo,edad=edad,pais=pais)
If I use:
prop.table(table(data$sexo,data$pais), margin=1)
I can see the relative frequency of each level; for example, for Italy: man = 0.25, woman = 0.5.
The problem is that when I try to plot prop.table(table(x)), I get something different:
ggplot(as.data.frame(prop.table(table(data),margin=1)), aes(x=pais ,y =Freq, fill=sexo))+geom_bar(stat="identity")
The Y axis runs from 0 to 3 and, for example, the Italy bar shows woman = 2 and man = 2.5.
I don't need that (and I don't know what it is showing). I want the same values I got from the prop.table(table(x)) output.
I think the problem is related to margin=1.
Thank you!

You need to build the same table that you printed, then convert it to a data frame:
tab = prop.table(table(data$sexo,data$pais), margin=1)
tab = as.data.frame(tab)
Then plot:
ggplot(tab,aes(x=Var2,y=Freq,fill=Var1)) + geom_col()
Or simply:
barplot(prop.table(table(data$sexo,data$pais), margin=1))
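One way to see why the original plot went wrong: table(data) cross-tabulates all four columns (nombre, sexo, edad, pais), so the bars stack proportions from many cells rather than the 2 x 3 table printed in the question. A minimal sketch, assuming only sexo and pais matter; naming the table dimensions is my addition and gives sexo/pais columns instead of Var1/Var2:

```r
library(ggplot2)

# Same sexo and pais columns as in the question
data <- data.frame(
  sexo = c(rep("man", 4), rep("woman", 6)),
  pais = c(rep("spain", 3), rep("italy", 4), rep("portugal", 3))
)

# Row proportions: each sexo row sums to 1
tab <- prop.table(table(sexo = data$sexo, pais = data$pais), margin = 1)

# Long format with columns sexo, pais, Freq
tab_df <- as.data.frame(tab)

p <- ggplot(tab_df, aes(x = pais, y = Freq, fill = sexo)) +
  geom_col(position = "dodge")
```

Printing p should draw one bar per sexo within each pais, with heights matching the printed table.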

You're probably looking for something like position = "dodge".
If I run the following on your data:
P <- prop.table(table(data$sexo,data$pais), margin=1)
ggplot(as.data.frame(P), aes(x = Var2, y = Freq, fill = Var1)) +
geom_bar(stat="identity", position = "dodge")
I get the following graph:

Related

paired data for a facet_wrap

Imagine I have data foo below. Each row contains a measurement (y) on a species and each species is paired with another (species.pair). So in the example below, species a is paired with e, b with f, and so on. The number of observations for each species varies. I'd like to plot the density of each species's distribution along with its partner's distribution in its own facet. Below I hand coded this with the column sppPairs. The species are all unique and each has a match in species.pair. I'm unsure of how to make the grouping column sppPairs below. I'm sure there is some clever way to do this with {dplyr} but I can't figure out what to do. Some kind of pasting species to species.pair I imagine? Any help much appreciated.
foo <- data.frame(species = rep(letters[1:8],each=10),
species.pair = rep(letters[c(5:8,1:4)],each=10),
y=rnorm(80))
# species and species pair match exactly
all(unique(foo$species) %in% unique(foo$species.pair))
# what I want
foo$sppPairs <- c(rep("a:e",10),
rep("b:f",10),
rep("c:g",10),
rep("d:h",10),
rep("a:e",10),
rep("b:f",10),
rep("c:g",10),
rep("d:h",10))
p1 <- ggplot(foo,aes(y,fill=species))
p1 <- p1 + geom_density(alpha=0.5)
p1 <- p1 + facet_wrap(~sppPairs)
p1
Yes, you can use apply on the appropriate columns to paste the sorted elements together in the correct order (otherwise a:e is different from e:a and so on, and you end up with 8 groups instead of 4):
library(ggplot2)
foo <- data.frame(species = rep(letters[1:8], each = 10),
species.pair = rep(letters[c(5:8, 1:4)], each = 10),
y = rnorm(80))
foo$sppPairs <- apply(foo[c("species", "species.pair")], 1,
function(x) paste(sort(x), collapse = ":"))
ggplot(foo, aes(y, fill = species)) +
geom_density(alpha = 0.5) +
facet_wrap(~sppPairs)
Created on 2020-10-05 by the reprex package (v0.3.0)
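A vectorised variant of the same idea, as a sketch: pmin() and pmax() pick the alphabetically first and last member of each pair, so no row-wise apply is needed. The as.character() calls are an assumption to cover R versions where data.frame() turns strings into factors.

```r
set.seed(1)
foo <- data.frame(species = rep(letters[1:8], each = 10),
                  species.pair = rep(letters[c(5:8, 1:4)], each = 10),
                  y = rnorm(80))

# Convert to character in case the columns were created as factors
a <- as.character(foo$species)
b <- as.character(foo$species.pair)

# Put the alphabetically smaller label first so a:e and e:a collapse to one key
foo$sppPairs <- paste(pmin(a, b), pmax(a, b), sep = ":")
```

The resulting column can be passed to facet_wrap(~sppPairs) exactly as in the answer above.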

How to read in Unicode code points into R data frame

I have a two-column text file of Unicode code points of interest (Greek symbols in this test, but any set of Unicode characters, generically):
$ cat ut.txt
\u0391 Α
\u0392 Β
\u0393 Γ
\u0394 Δ
\u0395 Ε
\u0396 Ζ
...
I'd like to read this into R, so that I can kick the tires on the typeface I am using to make plots that contain mathematical or other Unicode symbols.
As a minimally-reproducible start, I start by drawing a random sample from this Unicode table:
set.seed(42)
df <- data.frame(date = 1:10 , value = cumsum(runif(10 , max = 10)) )
ut <- read.table("ut.txt", allowEscapes=TRUE)
df$labels <- paste("\\", sample(ut$V1, size=10), sep="")
The head of the data frame looks like this:
date value labels
1 1 9.14806 \\u03A8
2 2 18.51881 \\u03BB
3 3 21.38021 \\u03C4
4 4 29.68469 \\u039C
5 5 36.10214 \\u03A6
6 6 41.29310 \\u03C2
When I plot from the labels column, R writes out the literal string, and not the Unicode character it represents:
library(ggplot2)
p <- ggplot(df, aes(x=date, y=value, label=labels))
p <- p + geom_line()
p <- p + ggtitle("5\u03BCg (\u03C7-squared test)") # control title
p <- p + geom_text()
library(Cairo)
ggsave("test.pdf", device=cairo_pdf)
Here is what the test plot looks like:
What I would like to see are Greek symbols at each point along the line, instead of their literal string equivalents.
How can I read a set of Unicode code points from a text file and use them directly?
Important note: I did test sampling from the second column of ut.txt, which works. However, I am specifically interested in learning what is required to correctly read in the encoded code point equivalent from a file.
Here's one approach using scale_shape_manual. I included the code of how I entered your data so I didn't have to read the text file
set.seed(42)
df <- data.frame(date = 1:10 , value = cumsum(runif(10 , max = 10)) )
df <- df[1:6, ]
## Following line stands in for what you read from `read.table`. In your solution, just use what you got from `read.table`
df$labels <- c("\u03A8", "\u03BB", "\u03C4", "\u039C", "\u03A6", "\u03C2")
library(ggplot2)
p <- ggplot(df, aes(x=date, y=value, shape = labels))
p <- p + geom_line()
p <- p + ggtitle("5\u03BCg (\u03C7-squared test)") # control title
p <- p + geom_point(size = 5) + scale_shape_manual(values = df$labels)
p
It also works with geom_text() (many thanks to @astrofunkswag for the smart way of showing how to include symbols properly):
library(ggplot2)
#Code
ggplot(df, aes(x=date, y=value))+
geom_line()+
ggtitle("5\u03BCg (\u03C7-squared test)")+
geom_text(label=c("\u03A8", "\u03BB", "\u03C4", "\u039C", "\u03A6", "\u03C2"))
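For the original question of turning code-point text from a file into actual characters, here is a minimal sketch. It assumes the column arrives as literal strings such as \u0391 (a backslash, a u, then hex digits); the vector codes is a stand-in for whatever read.table returns from ut.txt. The idea is to strip the prefix, parse the hex digits with strtoi(), and convert with intToUtf8().

```r
# Stand-in for the first column of ut.txt: literal backslash-u text, not characters
codes <- c("\\u0391", "\\u03BB", "\\u03C4")

# Drop an optional backslash, the u, and an optional plus sign, keeping the hex digits
hex <- sub("^\\\\?[uU]\\+?", "", codes)

# Parse the hex digits as integers, then turn each code point into a UTF-8 string
code_points <- strtoi(hex, base = 16L)
chars <- intToUtf8(code_points, multiple = TRUE)
```

The chars vector can then be used as the labels column for geom_text(), provided the output device and font support the glyphs.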

Position dodge with geom_point(), x=continuous, y=factor

I have made a function that can plot the loadings from many factor analyses at once, also when their variables do not overlap perfectly (or at all). It works fine, but sometimes factor loadings are identical across analyses which means that the points get plotted on top of each other.
library(pacman)
p_load(devtools, psych, stringr, plotflow)
source_url("https://raw.githubusercontent.com/Deleetdk/psych2/master/psych2.R")
loadings.plot2 = function(fa.objects, fa.names=NA) {
fa.num = length(fa.objects) #number of fas
#check names are correct or set automatically
if (length(fa.names)==1 & is.na(fa.names)) {
fa.names = str_c("fa.", 1:fa.num)
}
if (length(fa.names) != fa.num) {
stop("Names vector does not match the number of factor analyses.")
}
#merge into df
d = data.frame() #to merge into
for (fa.idx in 1:fa.num) { #loop over fa objects
loads = fa.objects[[fa.idx]]$loadings
rnames = rownames(loads)
loads = as.data.frame(as.vector(loads))
rownames(loads) = rnames
colnames(loads) = fa.names[fa.idx]
d = merge.datasets(d, loads, 1)
}
#reshape to long form
d2 = reshape(d,
varying = 1:fa.num,
direction="long",
ids = rownames(d))
d2$time = as.factor(d2$time)
d2$id = as.factor(d2$id)
colnames(d2)[2] = "fa"
print(d2)
#plot
g = ggplot(reorder_by(id, ~ fa, d2), aes(x=fa, y=id, color=time, group=time)) +
geom_point(position=position_dodge()) +
xlab("Loading") + ylab("Indicator") +
scale_color_discrete(name="Analysis",
labels=fa.names)
return(g)
}
#Some example plots
fa1 = fa(iris[-5])
fa2 = fa(iris[-c(1:50),-5])
fa3 = fa(ability)
fa4 = fa(ability[1:50,])
loadings.plot2(list(fa1,fa1,fa2))
Here I've plotted the same object twice just to show the effect. The plot has no red points because the green ones from fa.2 are on top. Instead, I want them to be dodged on the y-axis. However, position="dodge" with various settings does not appear to make a difference.
position="jitter" does work, but it is random, so it sometimes works poorly and makes the plot chaotic to look at.
How do I make the points dodged on the y-axis?
Apparently, you can only dodge sideways, but there is a workaround. The trick is to swap your x and y, apply position_dodge(), and then add coord_flip().
# x is now the indicator and y the loading; coord_flip() swaps the axes back
g = ggplot(data = reorder_by(id, ~ fa, d2), aes(x=id, y=fa, color=time, group=time)) +
geom_point(position=position_dodge(width = .5)) +
xlab("Indicator") + ylab("Loading") +
scale_color_discrete(name="Analysis",
labels=fa.names) +
coord_flip()
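A self-contained sketch of the same workaround on toy data, since the function above depends on external helpers. The data frame d and its column names are made up for illustration; the two analyses deliberately share identical loadings so that undodged points would overlap.

```r
library(ggplot2)

# Toy data: two analyses with identical loadings for three indicators
d <- data.frame(
  id = rep(c("v1", "v2", "v3"), times = 2),
  fa = rep(c(0.9, 0.7, 0.5), times = 2),
  analysis = rep(c("fa.1", "fa.2"), each = 3)
)

# Dodge along the discrete x axis, then flip so indicators end up on the vertical axis
g <- ggplot(d, aes(x = id, y = fa, colour = analysis, group = analysis)) +
  geom_point(position = position_dodge(width = 0.5)) +
  xlab("Indicator") + ylab("Loading") +
  coord_flip()

# Inspect the positions ggplot computed after dodging
built <- ggplot_build(g)$data[[1]]
```

In built, the x values for the two analyses should differ within each indicator, which is what separates the otherwise overlapping points.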
Possible duplicate
In the linked post, the accepted answer says to use position_jitter() instead of position_dodge(). That worked for me.

ggplot2: add stat_function for particular domain?

I'd like to add a curve to a plot I'm making with ggplot, but I only want the curve to appear for a particular domain.
I've tried various approaches using stat_function:
data <- data.frame(Date = ..., cases = ...)
end_date <- ... ## calculated from a date (e.g., Sys.Date()) minus an offset
start_date <- ... ## end_date - some offset
p1 <- ggplot(data) + aes(x=Date, y=cases) + ... ## data has Date, cases columns
p1 + stat_function(...something..., fun=function(t) ...)
where for something I've tried to put a new, subsetted chunk of data:
data = data[(start_date <= data$Date) & (data$Date <= end_date),] ## no change
and a new aes
aes = aes(xmin = start_date, xmax = end_date)
## error - thinks start_date / end_date don't exist,
## though they are declared earlier
Any suggestions? I've also fiddled around with annotate("path", ...) but nothing concrete there. I feel like this should be something easy, I just don't have my head around the "ggplot way" to make it happen.
It may also be relevant that I'm making these plots in a shiny application, though aside from funny crap w/ data.table, I haven't noticed that affecting anything.
The following seems to work, though it still feels very hacky to me:
data$fit <- ... # evaluate function on Date
relrows <- (start_date <= data$Date) & (data$Date <= end_date)
p1 <- p1 + annotate("line", y=data$fit[relrows], x=data$Date[relrows])
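A less hacky variant of the same idea, as a sketch: give geom_line() its own data argument holding only the rows inside the window. The dates, the case counts, and the function fit_fun below are invented for illustration and stand in for the elided parts of the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 60 days of cumulative counts
data <- data.frame(Date = as.Date("2020-01-01") + 0:59,
                   cases = cumsum(rpois(60, 5)))
end_date <- as.Date("2020-02-20")
start_date <- end_date - 20

# Hypothetical curve, evaluated on every date
fit_fun <- function(t) 5 * as.numeric(t - min(data$Date))
data$fit <- fit_fun(data$Date)

# Keep only the rows inside the domain where the curve should appear
in_window <- subset(data, Date >= start_date & Date <= end_date)

p1 <- ggplot(data, aes(x = Date, y = cases)) +
  geom_point() +
  geom_line(data = in_window, aes(y = fit), colour = "red")
```

Because the layer has its own data, the curve is only drawn between start_date and end_date while the points still cover the full range.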
Try adding a label as a new column in your data frame:
data$newlabel <- NA
data$newlabel[(start_date <= data$Date) & (data$Date <= end_date)] <- "a"
Then add groups to your ggplot (note that each + must end a line, not start one):
p1 <- ggplot(data) +
aes(x=Date, y=cases, group=newlabel, colour=newlabel) +
geom_point() +
stat_smooth(method = "lm", formula = y ~ poly(x,2), size=1)

Plot contours by groups on map with ggmap/ggplot2

So I think I have a pretty simple question, but I can't find the answer anywhere.
I have a lot of data containing catches of lobsters. It all pretty much looks like this.
      Trip.ID                   Latitude Longitude DateTime            ML6               TotalNephropsLandings
16409 OTB_CRU_32-69_0_0DK102831 57.931   9.277     2012-10-04 19:02:00 OTB_CRU_32-69_0_0 0.2188619
16410 OTB_CRU_32-69_0_0DK102831 57.959   9.375     2012-10-04 21:02:00 OTB_CRU_32-69_0_0 0.2188619
16411 OTB_CRU_32-69_0_0DK102831 58.201   10.232    2012-10-04 02:00:00 OTB_CRU_32-69_0_0 0.2188619
16412 OTB_CRU_32-69_0_0DK102831 58.208   10.260    2012-10-04 03:00:00 OTB_CRU_32-69_0_0 0.2188619
16413 OTB_CRU_32-69_0_0DK102831 58.169   10.078    2012-10-03 23:00:00 OTB_CRU_32-69_0_0 0.2188619
16414 OTB_CRU_32-69_0_0DK102831 57.919   9.227     2012-10-04 18:00:00 OTB_CRU_32-69_0_0 0.2188619
What I would like to do is simply make a map with contours around areas based on the "ML6" column, which are different tools used for fishing.
I tried using geom_density2d, which looks like this:
However I really don't want to show density, only where they are present. So basically one line around a group of coordinates that are from the same level in ML6. Could anyone help me with this?
It would also be nice to have the option of filling these in as polygons. But perhaps that could simply be accomplished using "fill=".
If anyone knows how to do this without R, you are also welcome to help, but then I would possibly need more in depth information.
Sorry for not producing more of my data frame...
Of course I should have produced the code I had for the plot, so here it is basically:
#Get map
map <- get_map(location=c(left= 0, bottom=45, right=15 ,top=70), maptype = 'satellite')
ggmap(map, extent="normal") +
geom_density2d(data = df, aes(x=Longitude, y=Latitude, group=ML6, colour=ML6))
There are probably better ways of doing this, but here is my approach; I hope it works with ggmap as well. Since your sample data is too small, I used part of my own data. What you want to do is look into ggplot_build(objectname)$data[[1]]. (It seems that, when you use ggmap, the data is in ggplot_build(objectname)$data[[4]].) For example, create an object like this:
foo <- ggmap(map, extent="normal") +
geom_density2d(data = df, aes(x=Longitude, y=Latitude, group=ML6, colour=ML6))
Then type ggplot_build(foo)$data[[1]]. You should see the data frame that ggplot is using. There will be a column called level. Subset the data to the minimum level value. I am using filter from dplyr. For example:
# [[1]] extracts the data frame itself; [1] would return a one-element list
foo2 <- ggplot_build(foo)$data[[1]]
foo3 <- filter(foo2, level == 0.02)
foo3 now holds the points that let you draw lines on your map: the outermost contour of each group. You would see something like this:
# fill level x y piece group PANEL
#1 #3287BD 0.02 168.3333 -45.22235 1 1-001 1
#2 #3287BD 0.02 168.3149 -45.09596 1 1-001 1
#3 #3287BD 0.02 168.3197 -44.95455 1 1-001 1
Then you would do something like the following. In my case, I do not have a Google map; I have map data for New Zealand, so I draw the country with the first geom_path. The second geom_path is the one you need. Make sure you change lon and lat to x and y as below. This way, I think you get the outlines you want.
# Map data from gadm.org
NZmap <- readOGR(dsn=".",layer="NZL_adm2")
map.df <- fortify(NZmap)
ggplot(NULL)+
geom_path(data = map.df,aes(x = long, y = lat, group=group), colour="grey50") +
geom_path(data = foo3, aes(x = x, y = y,group = group), colour="red")
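The extraction step can be tried without any map data. The sketch below uses random points (an assumption, in place of real coordinates) and selects the outermost contour with min(level) rather than a hard-coded value such as 0.02, which avoids relying on exact floating-point equality with a typed constant.

```r
library(ggplot2)

set.seed(1)
# Hypothetical point cloud standing in for Longitude/Latitude
pts <- data.frame(x = rnorm(200), y = rnorm(200))

p <- ggplot(pts, aes(x, y)) + geom_density_2d()

# Layer data computed by ggplot: one row per contour vertex, with a level column
contours <- ggplot_build(p)$data[[1]]

# Keep only the lowest (outermost) contour level
outer <- contours[contours$level == min(contours$level), ]

# Draw just that outline over the points
p_outline <- ggplot(pts, aes(x, y)) +
  geom_point(alpha = 0.3) +
  geom_path(data = outer, aes(x = x, y = y, group = group), colour = "red")
```

With several groups, the minimum level would need to be taken per group, since each group gets its own set of contour levels.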
UPDATE
Here is another approach, based on my answer to this post. You basically identify the data points needed to draw a polygon around each group. I have some links in that post; please have a look to learn what is happening in the loop. Sorry for being brief, but I think this approach lets you draw all the outlines you want. Keep in mind that the result may not be smooth curves like contours.
library(ggmap)
library(sp)
library(rgdal)
library(data.table)
library(plyr)
library(dplyr)
### This is also from my old answer.
### Set a range
lat <- c(44.49,44.5)
lon <- c(11.33,11.36)
### Get a map
map <- get_map(location = c(lon = mean(lon), lat = mean(lat)), zoom = 14,
maptype = "satellite", source = "google")
### Create pseudo data.
foo <- data.frame(x = runif(50, 11.345, 11.357),
y= runif(50, 44.4924, 44.4978),
group = "one",
stringsAsFactors = FALSE)
foo2 <- data.frame(x = runif(50, 11.331, 11.338),
y= runif(50, 44.4924, 44.4978),
group = "two",
stringsAsFactors = FALSE)
new <- rbind(foo,foo2)
### Loop through and create data points to draw a polygon for each group.
cats <- list()
for(i in unique(new$group)){
foo <- new %>%
filter(group == i) %>%
select(x, y)
ch <- chull(foo)
coords <- foo[c(ch, ch[1]), ]
sp_poly <- SpatialPolygons(list(Polygons(list(Polygon(coords)), ID=1)))
bob <- fortify(sp_poly)
bob$area <- i
cats[[i]] <- bob
}
cathy <- as.data.frame(rbindlist(cats))
ggmap(map) +
geom_path(data = cathy, aes(x = long, y = lat, group = area), colour="red") +
scale_x_continuous(limits = c(11.33, 11.36), expand = c(0, 0)) +
scale_y_continuous(limits = c(44.49, 44.5), expand = c(0, 0))
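If only the hull outlines are needed, the sp round trip can be skipped. This is a minimal sketch with the same kind of pseudo data, assuming plain ggplot2 instead of ggmap: chull() returns the row indices of the hull vertices for each group, and geom_polygon() closes and fills the shape.

```r
library(ggplot2)

set.seed(1)
# Pseudo data: two groups of points, as in the answer above
pts <- rbind(
  data.frame(x = runif(50, 11.345, 11.357), y = runif(50, 44.4924, 44.4978), group = "one"),
  data.frame(x = runif(50, 11.331, 11.338), y = runif(50, 44.4924, 44.4978), group = "two")
)

# For each group, keep only the rows that lie on its convex hull, in hull order
hulls <- do.call(rbind, lapply(split(pts, pts$group), function(g) {
  g[chull(g$x, g$y), ]
}))

p <- ggplot(pts, aes(x, y, colour = group)) +
  geom_point() +
  geom_polygon(data = hulls, aes(fill = group), alpha = 0.2)
```

The same hulls data frame should also work as the data for a geom_polygon() or geom_path() layer added to ggmap(map), though that is untested here.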

Resources