How to make a BW plot to show the data - r

I have extracted seeds from the seed bank under the tree crown and 3 m away from the crown. I have these data for three study sites in two countries, South Australia and Sri Lanka (part of the data is attached). The script I used to draw a BW plot (box-and-whisker plot) with lattice is given below. It produces two separate plots, one per country. I want to improve this graph: show the data for one country (South Australia) on one side of the plot (beneath crown and 3 m away in two colors) and the other country (Sri Lanka) on the other side, using the same two colors for beneath crown and 3 m away.
setwd("E:/Research/Fieldwork SL-data/Seed bank/analysis")
seed.bank <- read.csv(file="seedbank_rev.csv", header=TRUE, sep=',')
attach(seed.bank)
names(seed.bank)
## [1] "seed.no." "location" "study.site" "country"
seed.bank1<-seed.bank[!(country=="Sri Lanka"),]
seed.bank2<-seed.bank[!(country=="South Australia"),]
library("lattice")
bwplot(log(seed.no.) ~ study.site | location, data=seed.bank1, xlab="Study Sites in South Australia", ylab="log(seed number)")
bwplot(log(seed.no.) ~ study.site | location, data=seed.bank2, xlab="Study Sites in Sri Lanka", ylab="log(seed number)")

A simple base R solution to what seems to be your problem is a barplot of counts (since you have categorical variables):
# define dataframe:
df <- data.frame(
location = c(rep("beneath",1), rep("at_distance",4),rep("beneath",4), rep("at_distance",3)),
country = c(rep("SA",6),rep("SL",6))
)
# check structure:
str(df)
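This step shows how the columns are stored. In R versions before 4.0 they are factors by default; from 4.0 on they are already character vectors. table() counts either type, so the conversion below is optional. If you do convert, assign into df[] so that df stays a data frame instead of becoming a plain list:
# convert factors to characters (optional):
df[] <- lapply(df, as.character)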
# make frequency table:
freq_seeds <- table(df$country, df$location)
Now you are ready to plot the data; the key argument to place the corresponding bars side by side is beside=TRUE:
# define plotting region:
par(mfrow=c(1,1), mar=c(4,4,4,4))
# barplot:
barplot(freq_seeds, beside=TRUE, main="Seeds", col=c("blue", "green"))
# draw legend:
legend("topright", c("South Australia", "Sri Lanka"), fill=c("blue", "green"), col=c("blue", "green"))
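If box plots rather than counts are wanted, as in the question, base R's boxplot() can place both countries in one panel. The following is only a sketch: the data are made up, and the column names (seed.no., location, country) are taken from the names(seed.bank) output in the question.

```r
# Made-up example data using the column names from the question
set.seed(1)
seed.bank <- data.frame(
  seed.no. = rpois(40, lambda = 20) + 1,   # +1 keeps log() finite
  location = rep(c("beneath crown", "3 m away"), times = 20),
  country  = rep(c("South Australia", "Sri Lanka"), each = 20)
)

# One box per location within each country; the two colors repeat per country
bp <- boxplot(log(seed.no.) ~ location + country, data = seed.bank,
              col = c("blue", "green"), xaxt = "n",
              xlab = "Country", ylab = "log(seed number)")

# Label each pair of boxes with its country
axis(1, at = c(1.5, 3.5), labels = c("South Australia", "Sri Lanka"))

# Legend order follows the sorted location levels, matching the box colors
legend("topright", legend = levels(factor(seed.bank$location)),
       fill = c("blue", "green"))
```

Within each country the boxes follow the sorted order of location, so the legend is built from the same sorted levels to keep colors and labels aligned.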


us states plot in R with election results

I have plotted a figure of the US states in R.
Here is the very simple code:
library(usmap)
library(ggplot2)
plot_usmap(region = 'states')
And here is the resulting figure:
Figure of US states in R - states are not colored
Furthermore, I have a CSV file containing the names of the US states and a color value for each: red if the state voted Republican, blue if it voted Democratic. These are the first rows of the CSV file:
State     Color
Alabama   #E81B23
Alaska    #E81B23
Arizona   #1405bd
Arkansas  #E81B23
How can I fill the states of my figure based on the colors in the CSV file?
To color the regions specified in the plot_usmap() function, you can provide your data via data= and then set the values= argument to the column in your data used for mapping the colors.
Here's an example with some randomly-generated data. The plot_usmap() is using a dataset that includes the 50 US states + the District of Columbia, so you'll want to make sure they are all in your dataset or you may get some NA labels.
library(usmap)
library(ggplot2)
set.seed(1234)
color_data <- data.frame(
state = c(state.name, "District of Columbia"),
the_colors = sample(c("A", "B"), size=51, replace=TRUE)
)
plot_usmap(
region = "states",
data = color_data,
values = "the_colors",
color="white"
) +
scale_fill_manual(values=c("#E81B23", "#1405bd"))
Note that I think the lines between the states look good in white, which is what color="white" sets. You may also notice that you typically don't store the actual color in the data frame; you map a category and then choose the colors via scale_fill_manual(values=...). In your case the column already holds the colors, so you can use scale_fill_identity() instead.
For your data, just make sure the "State" column in your dataset is renamed "state" and it should work.
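A minimal sketch of the scale_fill_identity() route, assuming the CSV has been read into a data frame and its State column renamed to state. The data frame name votes is hypothetical, and only the four rows shown in the question are included, so the remaining states have no color value.

```r
library(usmap)
library(ggplot2)

# Hypothetical stand-in for the asker's CSV (only the rows shown in the question)
votes <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  Color = c("#E81B23", "#E81B23", "#1405bd", "#E81B23")
)

# The Color column already holds hex codes, so use them as-is
p <- plot_usmap(
  regions = "states",
  data = votes,
  values = "Color",
  color = "white"
) +
  scale_fill_identity()
print(p)
```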

R - Multiple Columns on one single Scatterplot

Would you mind taking a look at this?
https://docs.google.com/spreadsheets/d/14vVWxhaQynPmnAsZHlrkkdeJTt0XlDzHc5JSd4DNF-Y/edit?usp=sharing
I have three variables: the first is Year, from 2000 to 2017; the second is each country's GDP over 2000-2017; and the third is each country's soccer ranking over 2000-2017.
I would like to draw one giant scatter plot with Year (2000-2017) on the x-axis and Rank on a reversed y-axis, running from 200 at the bottom to 1 at the top, with the size of each point varying with GDP.
All I can come up with is plotting a scatter plot for one country only:
rank <- read.csv("Test1.csv", sep=",", header=TRUE)
library(ggplot2)
qplot(Year, Rank , data = rank, size = Aruba)
But I would like to fit all the countries into one scatter plot with the y-axis reversed and, if possible, draw a linear regression through all the points.
Can someone help me on this?
I am not sure how you want the regression done. But here is the graph.
Edit: because there is a country named "Rankmibia", which I have never heard of, selecting by prefix won't work, so I select by position this time.
rank <- read.csv("Test1.csv", sep=",", header=TRUE)
library(tidyr)
library(ggplot2)
library(dplyr)
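# rank columns sit at odd positions from 3 on; stack them into long format
r <- rank %>% select(seq(3, ncol(rank), 2)) %>% gather(id, rank)
# GDP columns sit at even positions; keep Year alongside them
g <- rank %>% select(1, seq(2, ncol(rank), 2)) %>% gather(country, GDP, -Year)
# both long tables share the same row order, so the rank column can be bound on
df <- cbind(g, rank = r$rank)
p <- qplot(Year, rank, data = df, size = GDP, color = country) + scale_y_reverse()
ggsave("fig.png", p, width = 40, height = 20)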
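For the regression the question asks about, one option is a single straight-line fit over all points with geom_smooth(method = "lm"). The sketch below uses made-up long-format data with the same column names as df (Year, country, GDP, rank). The values and the second country are hypothetical, and pooling all countries into one fit is only one reading of the request.

```r
library(ggplot2)

# Made-up long-format data standing in for df (values are hypothetical)
set.seed(42)
df <- data.frame(
  Year    = rep(2000:2017, times = 2),
  country = rep(c("Aruba", "Belgium"), each = 18),
  GDP     = c(runif(18, 1, 3), runif(18, 200, 500)),
  rank    = c(sample(150:200, 18), sample(1:50, 18))
)

# Points sized by GDP, one pooled linear fit, best rank at the top
p <- ggplot(df, aes(x = Year, y = rank)) +
  geom_point(aes(size = GDP, color = country)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  scale_y_reverse()
print(p)
```

Moving color = country into the top-level aes() would instead fit one line per country.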

How to specify colors manually in ggplot2 within a for loop?

I am building some code to plot energy supply vs. energy demand for bird populations with user-defined foraging guilds. I don't know in advance how many guilds a user will specify, and I need to create guild-specific plots. I would like to be able to specify the colors of the lines for energy supply and demand. Here's my data:
guilds <- c("ducks","geese") #I have 2 foraging guilds
dayz <- c(1,2,3,4,5,6) #The plot will span 6 days
guild1_supply <- c(100,120,150,130,110,70) #energy available to guild 1
guild2_supply <- c(70,90,110,120,100,80) #energy available to guild 2
supply_by_guild <- cbind(guild1_supply,guild2_supply)
guild1_demand <- c(10,80,120,130,70,20) #energy demand for guild 1
guild2_demand <- c(5,45,75,60,30,0) #energy demand for guild 2
demand_by_guild <- cbind(guild1_demand,guild2_demand)
setwd("C:/Users/XXX") #Set working directory for figure output
Now I create guild-specific plots in a for loop.
for (i in 1:length(guilds)) { # use a for loop to create data frames
temp <- data.frame(cbind(dayz,supply_by_guild[,i],demand_by_guild[,i]))
names(temp)[2] <- "Food Energy Supply" # rename variable
names(temp)[3] <- "Food Energy Demand" # rename variable
temp <- melt(temp, id.vars = c("dayz")) # melt data frame to long format
names(temp)[2] <- "Legend" # rename variable
names(temp)[3] <- "energy" # rename variable
jpeg(paste("Fig",i+4,".jpg",sep=""),width=2000, height=1000, res=300)
print(ggplot(data=temp, aes(x=dayz, y=energy,group=Legend, color=Legend)) + geom_line() + xlab("Day") + ylab(paste("Energy (",energy_unit,")",sep="")) + ggtitle(paste("Figure ",i+4,". Daily Food Energy Supply and Demand: ",guilds[i],sep="")))
dev.off()
}
The code above works, but when I try to manually specify the colors of the lines by adding + scale_color_manual(values=c("cyan4", "orangered1")), I get the error "non-numeric argument to binary operator." Any help is much appreciated.
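The failing line is not shown, so the cause can only be guessed. One possibility is that the added term ended up inside another call's parentheses (for example inside paste() or ggtitle()), where + is applied to a character string. A sketch that sidesteps this builds the plot as an object, adds the scale, and prints it afterwards. It uses the guild 1 numbers from the question and builds the long format by hand, so it does not depend on melt().

```r
library(ggplot2)

dayz   <- 1:6
supply <- c(100, 120, 150, 130, 110, 70)   # guild 1 supply from the question
demand <- c(10, 80, 120, 130, 70, 20)      # guild 1 demand from the question

# Long format built by hand, one row per day and series
temp <- data.frame(
  dayz   = rep(dayz, times = 2),
  Legend = rep(c("Food Energy Supply", "Food Energy Demand"), each = length(dayz)),
  energy = c(supply, demand)
)

# Build the plot first, add the manual scale, then print
p <- ggplot(temp, aes(x = dayz, y = energy, group = Legend, color = Legend)) +
  geom_line() +
  xlab("Day") +
  scale_color_manual(values = c("Food Energy Supply" = "cyan4",
                                "Food Energy Demand" = "orangered1"))
print(p)
```

Naming the values ties each color to a series regardless of the order of the factor levels.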

Ordering the axis labels in geom_tile

I have a data frame containing order data for each of 20+ products from each of 20+ countries. I have put it in a highlight table using ggplot2 with code similar to this:
require(ggplot2)
require(reshape)
require(scales)
mydf <- data.frame(industry = c('all industries','steel','cars'),
'all regions' = c(250,150,100), americas = c(150,90,60),
europe = c(150,60,40), check.names = FALSE)
mydf
mymelt <- melt(mydf, id.var = c('industry'))
mymelt
ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
geom_tile() + geom_text(aes(fill = mymelt$value, label = mymelt$value))
Which produces a plot like this:
In the real plot, the 450 cell table very nicely shows the 'hotspots' where orders are concentrated. The last refinement I want to implement is to arrange the items on both the x-axis and y-axis in alphabetical order. So in the plot above, the y-axis (variable) would be ordered as all regions, americas, then europe and the x-axis (industry) would be ordered all industries, cars and steel. In fact the x-axis is already ordered alphabetically, but I wouldn't know how to achieve that if it were not already the case.
I feel somewhat embarrassed about having to ask this question, as I know there are many similar ones on SO, but sorting and ordering in R remain my personal bugbear and I cannot get this to work. Although I do try, in all except the simplest cases I get lost in a welter of calls to factor, levels, sort, order and with.
Q. How can I arrange the above highlight table so that both y-axis and x-axis are ordered alphabetically?
EDIT: The answers from smillig and joran below do resolve the question with the test data but with the real data the problem remains: I can't get an alphabetical sort. This leaves me scratching my head as the basic structure of the data frame looks the same. Clearly I have omitted something, but what??
> str(mymelt)
'data.frame': 340 obs. of 3 variables:
$ Industry: chr "Animal and vegetable products" "Food and beverages" "Chemicals" "Plastic and rubber goods" ...
$ variable: Factor w/ 17 levels "Other areas",..: 17 17 17 17 17 17 17 17 17 17 ...
$ value : num 0.000904 0.000515 0.007189 0.007721 0.000274 ...
However, applying the with statement doesn't result in levels with an alphabetical sort.
> with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
[1] USA USA USA
[4] USA USA USA
[7] USA USA USA
[10] USA USA USA
[13] USA USA USA
[16] USA USA USA
[19] USA USA Canada
[22] Canada Canada Canada
[25] Canada Canada Canada
[28] Canada Canada Canada
All the way down to:
[334] Other areas Other areas Other areas
[337] Other areas Other areas Other areas
[340] Other areas
And if you do a levels() it seems to show the same thing:
[1] "Other areas" "Oceania" "Africa"
[4] "Other Non-Eurozone" "UK" "Other Eurozone"
[7] "Holland" "Germany" "Other Asia"
[10] "Middle East" "ASEAN-5" "Singapore"
[13] "HK/China" "Japan" "South Central America"
[16] "Canada" "USA"
That is, the non-reversed version of the above.
The following shot shows what the plot of the real data looks like. As you can see, the x-axis is sorted and the y-axis is not. I'm perplexed. I'm missing something but can't see what it is.
The y-axis on your chart is also already ordered alphabetically, but from the origin. I think you can achieve the order of the axes that you want by using xlim and ylim. For example:
ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
geom_tile() + geom_text(aes(label = value)) +
ylim(rev(levels(mymelt$variable))) + xlim(levels(factor(mymelt$industry)))
will order the y-axis from all regions at the top, followed by americas, and then europe at the bottom (which is reverse alphabetical order, technically). The x-axis is alphabetically ordered from all industries to steel with cars in between.
As smillig says, the default is already to order the axes alphabetically, but the y axis will be ordered from the lower left corner up.
The basic rule with ggplot2 that applies to almost anything that you want in a specific order is:
If you want something to appear in a particular order, you must make the corresponding variable a factor, with the levels sorted in your desired order.
In this case, all you should need to do is this:
mymelt$variable <- with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
which should work regardless of whether you're running R with stringsAsFactors = TRUE or FALSE.
This principle applies to ordering axis labels, ordering bars, ordering segments within bars, ordering facets, etc.
For continuous variables there is a convenient scale_*_reverse() but apparently not for discrete variables, which would be a nice addition, I think.
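In newer ggplot2 releases (3.3.0 and later, as far as I know), a discrete axis can be flipped by passing a function as limits to scale_y_discrete(). A small sketch using the test data from the question, written out directly in long format:

```r
library(ggplot2)

# The question's test data, already in long format
mymelt <- data.frame(
  industry = rep(c("all industries", "steel", "cars"), times = 3),
  variable = rep(c("all regions", "americas", "europe"), each = 3),
  value    = c(250, 150, 100, 150, 90, 60, 150, 60, 40)
)

p <- ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
  geom_tile() +
  geom_text(aes(label = value)) +
  scale_y_discrete(limits = rev)   # reverse the default order of the y-axis
print(p)
```

With character columns the default order is alphabetical from the origin upwards, so reversing it puts "all regions" at the top.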
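Another possibility is to use fct_reorder from the forcats package, which here orders the y-axis by value rather than alphabetically.
library(forcats)
library(dplyr)
library(tidyr)
library(ggplot2)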
mydf %>%
pivot_longer(cols=c('all regions', 'americas', 'europe')) %>%
mutate(name1=fct_reorder(name, value, .desc=FALSE)) %>%
ggplot( aes(x = industry, y = name1, fill = value)) +
geom_tile() + geom_text(aes( label = value))
Maybe a little bit late, but
with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
does not give an alphabetical order, because variable is already a factor and sort() on a factor follows the existing level order, not the alphabet.
You should first convert the variable to character with as.character, like so:
with(mymelt,factor(variable,levels = rev(sort(unique(as.character(variable))))))
This StackOverflow question may also help:
Order data inside a geom_tile
specifically the first answer by Brandon Bertelsen:
"Note it's not an ordered factor, it's a factor in the right order"
It helped me to get the right order of the y-axis in a ggplot2 geom_tile plot.
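The difference can be checked in base R on a small factor. The three region names come from the output above; the factor itself is a made-up miniature of the real column.

```r
# A factor whose levels are NOT in alphabetical order
variable <- factor(c("USA", "Canada", "Other areas", "USA"),
                   levels = c("Other areas", "Canada", "USA"))

# sort() on a factor follows the existing level order
by_levels <- as.character(sort(unique(variable)))

# sorting the character values gives the alphabetical order
by_alphabet <- sort(unique(as.character(variable)))

# rebuild the factor with reversed alphabetical levels, as in the answer
reordered <- factor(variable, levels = rev(by_alphabet))
levels(reordered)
```

Here by_levels keeps "Other areas" first, while by_alphabet starts with "Canada", which is why the as.character() step changes the result.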

R-Graphs: exclude non-relevant values from axis

I have a similar problem. I have a dataset with 22000 values and want to display them properly: one graph per river, with the fish species caught in that river on the y-axis and the number of fish caught per species on the x-axis.
dat<-file[file$RiverName=="Mississippi",]
boxplot(FishCought ~ FishName, data=dat, cex.axis=0.7, horizontal=TRUE, las=2, col="green", xlab="Abundanz [Ind./ha]")
If I do so, the graph shows all FishName values on the y-axis but draws a boxplot only for the fish that were caught in this river. How can I get rid of the fish names that weren't caught in this river (to make the graph better-looking)?
Any suggestions?
I'm assuming that FishCought is actually FishCaught... The syntax would be
boxplot(FishCaught ~ FishName, data =
within(subset(file, RiverName=="Mississippi" & FishCaught > 0),
FishName <- factor(FishName)))
subset(file, RiverName=="Mississippi" & FishCaught > 0) selects only the samples you want.
within(...,FishName <- factor(FishName)) returns a data frame with FishName as a categorical variable in which fish not caught in this river are no longer included as a category (or "factor level" in R parlance).
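An equivalent route, sketched here on made-up data, is base R's droplevels(), which removes unused factor levels from every factor column of the subset. The column names follow the answer; the rows and species are hypothetical.

```r
# Made-up data with the column names used in the answer
file <- data.frame(
  RiverName  = c("Mississippi", "Mississippi", "Nile", "Nile"),
  FishName   = factor(c("Bass", "Carp", "Perch", "Perch")),
  FishCaught = c(3, 5, 2, 4)
)

# Keep one river, then drop the factor levels that no longer occur
dat <- droplevels(subset(file, RiverName == "Mississippi" & FishCaught > 0))
levels(dat$FishName)

# Only the remaining species appear on the axis
boxplot(FishCaught ~ FishName, data = dat, horizontal = TRUE, las = 2)
```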
