Scatterplot for comparing species abundance - r

I have a homework question that states the following:
The file “channel_islands_counts_edit.csv” contains survey data on temperate rocky reef fishes from the Channel Islands, collected at many sites over many years. The data has columns for Year, Date, Site, count, and SpeciesName (broken into adults and juveniles). The version of the data that I’ve given you looks at 16 sites over 27 years, with count data for 27 categories of fish. Imagine we’re interested in whether the abundance of different species are correlated across sites (to get a sense for whether species have similar habitat preferences and/or interact with each other), and whether the across-site correlations are consistent over time. To visualize this, make some code that does the following:
For each year, draw a scatterplot that compares the abundance of Hypsypops rubicundus (adults) and the abundance of Paralabrax clathratus (adults) across sites. Feel free to transform the data for plotting purposes, if you think that helps you see any patterns.
I imported my data set, and ran the following code which is giving me 27 plots, with Site as x and Count as y, but there is no data shown in the plots.
head(channel_islands)
sapply(channel_islands, class)
levels(channel_islands$SpeciesName)
par(mfrow= c(6,5)) # set the plotting area into a 6 row*5 column array
for (i in 1:27) {
  HR11 <- subset(channel_islands, SpeciesName == "Hypsypops rubicundus,adult"[i] & Site == 11)
  PC15 <- subset(channel_islands, SpeciesName == "Paralabrax clathratus,adult"[i] & Site == 15)
  with(HR11, plot(count ~ Site, type = 'b', pch = 19, ylim = c(0, 10), xlim = c(0, 16), col = 'green', main = i))
  with(PC15, plot(count ~ Site, type = 'b', pch = 19, ylim = c(0, 10), xlim = c(0, 16), col = 'blue', main = i))
}
If anyone could help me figure out how to compare species abundance across sites, over 27 years, I would really appreciate it.

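The expression "Hypsypops rubicundus,adult"[i] doesn't do what you want. It indexes a length-one character vector, so it returns the species name when i == 1 and NA for every other value of i. A comparison such as SpeciesName == NA is never TRUE (it evaluates to NA), and subset() drops rows whose condition is NA, so from the second iteration onward you get an empty subset.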
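A quick way to see this behaviour is with a small made-up data frame. The column names mirror the question, but the values here are hypothetical:

```r
species <- "Hypsypops rubicundus,adult"
species[1]  # the species name itself
species[2]  # NA: a length-one vector has no second element

# Hypothetical two-row stand-in for channel_islands
toy <- data.frame(
  SpeciesName = c(species, "Paralabrax clathratus,adult"),
  Site = c(11, 15),
  count = c(3, 7)
)

first  <- subset(toy, SpeciesName == species[1] & Site == 11)
second <- subset(toy, SpeciesName == species[2] & Site == 11)

nrow(first)   # 1
nrow(second)  # 0, because the condition is NA or FALSE for every row
```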
Consider looking into using ggplot2 with facet_grid to quickly make multiple plots without the loop. The R Graphics Cookbook has good examples on using facets.
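As a rough sketch of the faceting idea, the following puts the two species side by side (one row per Year and Site) and draws one panel per year. It uses facet_wrap, a close relative of facet_grid that wraps panels for a single variable. The toy data, the aggregation by sum, and the log1p transform are all assumptions for illustration; the real file may need a different summary (for example a mean across survey dates).

```r
library(ggplot2)

# Hypothetical toy data standing in for channel_islands_counts_edit.csv
set.seed(1)
species <- c("Hypsypops rubicundus,adult", "Paralabrax clathratus,adult")
channel_islands <- expand.grid(
  Year = 1985:1987, Site = 1:16, SpeciesName = species,
  stringsAsFactors = FALSE
)
channel_islands$count <- rpois(nrow(channel_islands), lambda = 5)

# Collapse to one count per Year/Site/species (assumption: summing is appropriate)
totals <- aggregate(count ~ Year + Site + SpeciesName, data = channel_islands, FUN = sum)

# One data frame per species, then join them so each row is a Year/Site pair
hr <- subset(totals, SpeciesName == species[1], select = c(Year, Site, count))
pc <- subset(totals, SpeciesName == species[2], select = c(Year, Site, count))
wide <- merge(hr, pc, by = c("Year", "Site"), suffixes = c("_HR", "_PC"))

# One scatterplot per year; each point is a site
p <- ggplot(wide, aes(x = log1p(count_HR), y = log1p(count_PC))) +
  geom_point() +
  facet_wrap(~ Year) +
  labs(x = "log(1 + H. rubicundus adults)", y = "log(1 + P. clathratus adults)")
print(p)
```

With the real data, only the species filter and the choice of summary function should need adjusting; the 27 years then become 27 panels without an explicit loop.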

Related

Use R to compare metagenomic data

Good evening everyone,
I am not exactly new to R (I have done a course on Coursera), but I haven't really done anything serious with R yet.
Now I have some metagenomic data, split into tibbles: the domains of metagenome 1 in one tibble, those of metagenome 2 in another, and so on, and similarly for phylum, class, order, family, genus, etc. I need to compare the data, for example the genera present in one metagenome against four or five other metagenomes. Can you point me towards libraries and functions with which I can compare data like this?
Example data. The tibbles with genus and family data are even longer, with hundreds of columns.
Archaea  Bacteria  Eukaryota  Viruses  other.sequences  unclassified.sequences
649      423655    4901       64       7                317
Now I understand that I should clean the data by turning the column names into a column (e.g. taxon) using pivot_longer().
But what are some good ways to visualize data like this?
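A minimal sketch of that reshaping step, assuming each metagenome is a one-row table like the example. The first row uses the counts shown in the question; the second metagenome (mg2) and the sample labels are hypothetical, added only so there is something to compare:

```r
library(tidyr)

# Counts for metagenome 1, taken from the example in the question
mg1 <- data.frame(Archaea = 649, Bacteria = 423655, Eukaryota = 4901,
                  Viruses = 64, other.sequences = 7, unclassified.sequences = 317)
# Hypothetical second metagenome, for illustration only
mg2 <- data.frame(Archaea = 120, Bacteria = 300000, Eukaryota = 9000,
                  Viruses = 30, other.sequences = 2, unclassified.sequences = 500)

# Stack the samples, tagging each row with its (hypothetical) sample label
wide <- rbind(data.frame(sample = "mg1", mg1),
              data.frame(sample = "mg2", mg2))

# Turn the taxon columns into a single 'taxon' column
long <- pivot_longer(wide, cols = -sample, names_to = "taxon", values_to = "count")

# Relative abundance within each sample makes samples of different depth comparable
long$prop <- long$count / ave(long$count, long$sample, FUN = sum)
long
```

Once the data are in this long form, a stacked bar chart of prop per sample or a heatmap of taxon by sample are common starting points.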

Is there an R function for finding shared traits among variables?

I have a data set of plants and plant traits. It is a large data set with over 150 plants and over 300 different traits. However I do not have data for all 300 traits for all of the 150 plants. Some plants have data for 100 traits, other plants have data for only 2 or 3 traits.
I have figured out how to isolate which plants have the most trait data, but I can’t figure out how to isolate which traits these plants have in common.
For example, I have 10 plants, numbered 1-10, and each of these 10 plants has data for 75 traits, with trait numbers varying from 1-3000. So each plant has 75 different traits, but with some overlap. I want to find which traits overlap. I want to analyze all of the traits that they share/have in common, so I need to isolate the shared traits.
Is there an easy way to do this in R? It seems like there should be a relatively easy way, but I can’t quite figure it out.
My data set looks something like this, just much larger.
In this example I would want to highlight Traits #1 and #4, because those are the two which have data for all three plants.
I hope this all makes sense. Thanks everyone in advance for your help!
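One possible approach, sketched under the assumption that the data are laid out with one row per plant, one column per trait, and NA wherever a trait was not measured. The data frame below is hypothetical and built so that Traits 1 and 4 are the complete ones, as in the example described above:

```r
# Hypothetical data: rows are plants, columns are traits, NA = not measured
traits <- data.frame(
  Trait1 = c(2.1, 3.4, 1.8),
  Trait2 = c(NA, 5.0, 4.2),
  Trait3 = c(0.7, NA, NA),
  Trait4 = c(10, 12, 9),
  Trait5 = c(NA, NA, 6.5),
  row.names = c("Plant1", "Plant2", "Plant3")
)

# A trait is shared when no plant is missing it
shared <- names(traits)[colSums(is.na(traits)) == 0]
shared  # "Trait1" "Trait4"

# Keep only the shared traits for further analysis
traits_shared <- traits[, shared, drop = FALSE]
```

To restrict the comparison to the best-covered plants, subset the rows first and then apply the same colSums(is.na(...)) check.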

How to use image() function to plot the data in R

I have a clinical dataset and I would like to plot it using the image() function to see if I can spot the different groups within my data.
The structure of this data is a List of 2: 56 samples and 5000 gene expressions.
When I use image(lung), all I see is a plot of orange color, and I do not see any pattern or group standing out.
Basically, there are four types of clinical conditions in the dataset: Colon cancer (13 samples), smallcell (6 samples), etc.
I wanted to see whether, for instance, "smallcell" with 6 samples has its own pattern compared to the rest of the groups/conditions within this dataset.
load(url("https://github.com/hughng92/dataset/raw/master/lung.RData"))
rownames(lung)
image(lung)
This is all I see:
I am wondering whether it would look different if I combined four separate plots, one for each of the 4 conditions in the data set.
Any tip would be great!
I'd suggest looking at the image output after rearranging the data so that like types sit together. Once that is done, I think I can see some group differences in those gene expression profiles. Specifically, the "Normal" category has generally fewer red bands, although there are a couple of places where "Normal" is red and the others are not. I think it is interesting, and not particularly surprising, that there appears to be less variability within the Normal columns (in the image) than there is within each of the tumor types. I have a friend who's a molecular biologist who characterizes tumors as "genetic train wrecks":
table( rownames( lung[order(rownames(lung)), ]))
Carcinoid     Colon    Normal SmallCell
       20        13        17         6
image( lung[order(rownames(lung)), ])
This would give a better indication of the boundaries of the type grouping:
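ordered <- lung[order(rownames(lung)), ]  # put samples of the same type next to each other
counts <- table(rownames(ordered))        # number of samples per type, in sorted order
image(ordered, xaxt = "n")
# image() spreads the rows evenly over [0, 1], so row k sits at (k - 1) / (nrow - 1);
# place each label at the last sample of its group
axis(1, at = (cumsum(counts) - 1) / (nrow(ordered) - 1), labels = names(counts), las = 2)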

Stacked bar chart 4 variables with ggplot

I am very new to this, and I wanted to show the various ways in which I tried to reshape/melt the data. Here is my data in three different variations:
Version 1:
year,type,total,action,perc
2015,v,"1,199,310",crime,42.16
2015,p,"8,024,115",crime,18.24
2015,v,"505,681",arrest,42.16
2015,p,"1,463,213",arrest,18.24
2016,v,"1,250,162",crime,32.85
2016,p,"7,928,530",crime,17.07
2016,v,"410,717",arrest,32.85
2016,p,"1,353,283",arrest,17.07
2017,v,"1,247,321",crime,41.58
2017,p,"7,694,086",crime,16.24
2017,v,"518,617",arrest,41.58
2017,p,"1,249,757",arrest,16.24
Version 2:
year,type,crime,arrest,perc
2015,1,"1,199,310","505,681",42.16
2015,2,"8,024,115","1,463,213",18.24
2016,1,"1,250,162","410,717",32.85
2016,2,"7,928,530","1,353,283",17.07
2017,1,"1,247,321","518,617",41.58
2017,2,"7,694,086","1,249,757",16.24
Version 3:
df <- vpcrimetotal
year,vcrime,varrest,varrestperc,pcrime,parrest,parrestperc
2017,"1,247,321","518,617",0.4158,"7,694,086","1,249,757",0.1624
2016,"1,250,162","410,717",0.3285,"7,928,530","1,353,283",0.1707
2015,"1,199,310","505,681",0.4216,"8,024,115","1,463,213",0.1824
The idea is to show the total number of violent crimes versus property crimes from 1990-2017, with the number of arrests (labeled as a percent) inside each bar based on crime type (property or violent). The preference is to stack all four into one bar per year, with a different color for each.
I found these posts, which helped, but I was still confused about how to fit my data into them: "how to create stacked bar charts for multiple variables with percentages", though I would like it to look more like "Count and Percent Together using Stack Bar in R".
I have tried the code from those posts with these sets of data, but it would probably be confusing if I posted all the different attempts that don't work.
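One way this could be approached with the "Version 1" layout is sketched below. It is only an illustration under a few assumptions: the totals need their thousands separators stripped before they are numeric, the helper columns segment and label are names made up here, and stacking crimes and arrests in the same bar follows the stated preference even though arrests are then added on top of the crime totals.

```r
library(ggplot2)

# "Version 1" of the data, as posted (2015-2017 only)
crime <- read.csv(text = 'year,type,total,action,perc
2015,v,"1,199,310",crime,42.16
2015,p,"8,024,115",crime,18.24
2015,v,"505,681",arrest,42.16
2015,p,"1,463,213",arrest,18.24
2016,v,"1,250,162",crime,32.85
2016,p,"7,928,530",crime,17.07
2016,v,"410,717",arrest,32.85
2016,p,"1,353,283",arrest,17.07
2017,v,"1,247,321",crime,41.58
2017,p,"7,694,086",crime,16.24
2017,v,"518,617",arrest,41.58
2017,p,"1,249,757",arrest,16.24', stringsAsFactors = FALSE)

# Strip the thousands separators so the totals are numeric
crime$total <- as.numeric(gsub(",", "", crime$total))

# Hypothetical helper columns: one stack segment per type/action pair,
# and a percent label shown only on the arrest segments
crime$segment <- paste(crime$type, crime$action)
crime$label <- ifelse(crime$action == "arrest", paste0(crime$perc, "%"), "")

# group is set explicitly so bars and labels are stacked in the same order
p <- ggplot(crime, aes(x = factor(year), y = total, fill = segment, group = segment)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  labs(x = "Year", y = "Count", fill = "Type and action")
print(p)
```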

barplot: selecting data in R

I have a problem building a barplot.
I am working on air traffic in different countries. I would like to get a barplot for each country, with the different airport names on the X axis. The Y axis will show the number of airlines using the airport.
My plan is to write the script for one country and replicate it manually for the others.
In my data, I have the following columns:
Country / airport / destination.
So each row is actually one airline that is using the airport.
Do you have an idea about how to do this?
For now I have this idea:
UK<-traffic[traffic$Country=="UK",]
UK$airport <- as.factor(UK$airport)
countUK<-table(UK$airport)
barplot(countUK)
This is not working: I get a bunch of airports that are not in the UK on the X axis...
Thanks for your help
Answer found:
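You could try dropping the unused factor levels, i.e. add
UK <- droplevels(UK) after the line UK$airport <- as.factor(UK$airport). If airport is already a factor in traffic, subsetting the rows keeps all of its levels, so table() still lists the non-UK airports with a count of zero.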
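A small sketch of the effect, using a made-up traffic data frame (the airport codes and destinations are hypothetical, and the columns are read as factors to reproduce the problem):

```r
# Hypothetical data: one row per airline using an airport
traffic <- data.frame(
  Country = c("UK", "UK", "UK", "FR", "FR"),
  airport = c("LHR", "LHR", "MAN", "CDG", "ORY"),
  destination = c("JFK", "CDG", "DUB", "LHR", "MAD"),
  stringsAsFactors = TRUE
)

UK <- traffic[traffic$Country == "UK", ]
table(UK$airport)   # still lists CDG and ORY, each with a count of 0

UK <- droplevels(UK)          # discard levels that no longer occur
countUK <- table(UK$airport)  # now only LHR and MAN remain
barplot(countUK)
```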
