Good evening everyone,
I am not exactly new to R; I have done a course on Coursera, but I haven't really done anything serious with R yet.
Now I have some metagenomic data, split into tibbles: the domains of metagenome 1 in one tibble, the domains of metagenome 2 in another, and so on, and similarly for phylum, class, order, family, genus, etc. I need to make comparisons of the data, for example comparing the genera present in one metagenome with those in four or five other metagenomes. Can you point me towards libraries and functions with which I can compare data like this?
Example data (the tibbles with genus and family data are even longer, with hundreds of columns):
Archaea  Bacteria  Eukaryota  Viruses  other.sequences  unclassified.sequences
649      423655    4901       64       7                317
Now I understand that I should clean the data to turn the column names into a column (e.g. taxon) using pivot_longer().
But what are some good ways to visualize data similar to this?
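A minimal sketch of that reshaping step, assuming the tidyr and dplyr packages are installed. The object names (mg1, mg2, to_long) are hypothetical, and the counts for mg2 are made up purely for illustration; only the mg1 values come from the example above. It stacks the per-metagenome tibbles into one long table, adds a relative abundance, and lists the taxa found in every metagenome.

```r
library(tidyr)
library(dplyr)

# One-row tibbles, one per metagenome.
# mg1 uses four of the counts from the example; mg2 is invented for illustration.
mg1 <- tibble(Archaea = 649, Bacteria = 423655, Eukaryota = 4901, Viruses = 64)
mg2 <- tibble(Archaea = 120, Bacteria = 98000, Eukaryota = 2300, Fungi = 15)

# Turn the column names into a "taxon" column (hypothetical helper)
to_long <- function(x) {
  pivot_longer(x, cols = everything(), names_to = "taxon", values_to = "count")
}

# Stack all metagenomes; the list names become the "metagenome" column
long <- bind_rows(list(mg1 = to_long(mg1), mg2 = to_long(mg2)),
                  .id = "metagenome")

# Relative abundance within each metagenome, so samples of
# different sequencing depth can be compared
long <- long %>%
  group_by(metagenome) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup()

# Taxa that occur in every metagenome
shared <- Reduce(intersect, split(long$taxon, long$metagenome))
shared
```

Once the data is in this long form, it could be passed to ggplot2, for example with geom_col(position = "fill") and one bar per metagenome, though which plot is most useful will depend on how many taxa there are.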
I have a data set of plants and plant traits. It is a large data set with over 150 plants and over 300 different traits. However I do not have data for all 300 traits for all of the 150 plants. Some plants have data for 100 traits, other plants have data for only 2 or 3 traits.
I have figured out how to isolate which plants have the most trait data, but I can't figure out how to isolate which traits these plants have in common.
For example. I have 10 plants, numbered 1-10, and each of these 10 plants has data for 75 traits, with trait numbers varying from 1-3000. So each plant has 75 different traits, but with some overlap. I want to find which traits overlap. I want to analyze all of the traits that they share/have in common, so I need to isolate the shared traits.
Is there an easy way to do this in R? It seems like there should be a relatively easy way, but I can’t quite figure it out.
My data set looks something like this, just much larger.
In this example I would want to highlight Traits #1 and #4, because those are the two which have data for all three plants.
I hope this all makes sense. Thanks everyone in advance for your help!
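One possible approach, sketched in base R under the assumption that the data is in wide form (one row per plant, one column per trait, NA where a trait was not measured). The data frame below is hypothetical and only mirrors the three-plant example, where traits 1 and 4 are the ones measured for every plant.

```r
# Hypothetical wide table: NA means "no data for this trait"
traits <- data.frame(
  plant  = c("P1", "P2", "P3"),
  trait1 = c(2.1, 3.4, 1.8),
  trait2 = c(NA, 0.5, NA),
  trait3 = c(7, NA, NA),
  trait4 = c(10, 12, 9)
)

# Every column except the plant identifier is a trait
trait_cols <- setdiff(names(traits), "plant")

# A trait is shared when none of the selected plants is missing it
has_all <- colSums(is.na(traits[trait_cols])) == 0
shared_traits <- names(has_all)[has_all]
shared_traits

# Keep only the shared traits for further analysis
traits_shared <- traits[c("plant", shared_traits)]
```

To restrict the comparison to the plants with the most data, subset the rows first (for example traits[traits$plant %in% chosen, ], with chosen being your own vector of plant IDs) and then run the same colSums() check.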
I have a clinical dataset and I would like to plot it using the image() function to see if I can spot the different groups within my data.
The structure of this data is a List of 2: 56 samples and 5000 gene expressions.
When I use image(lung), all I see is a plot of orange color, and I do not see any pattern or group standing out.
Basically, there are four types of clinical conditions in the dataset: Colon cancer (13 samples), smallcell (6 samples), etc.
I wanted to see whether, for instance, "smallcell" with its 6 samples has its own pattern compared to the rest of the groups/conditions within this dataset.
load(url("https://github.com/hughng92/dataset/raw/master/lung.RData"))
rownames(lung)
image(lung)
This is all I see:
I am wondering whether it will look different if I combine four separate plots, one for each of the 4 conditions in the data set.
Any tip would be great!
I'd suggest looking at the image output after rearranging the data so that like types sit together. I think I now see some group differences in those gene expression profiles. Specifically, the "Normal" category has generally fewer red bands, although there are a couple where "Normal" is red and the others are not. I think it is interesting, and not particularly surprising, that there appears to be less variability within the Normal columns (in the image) than there is within each of the tumor types. I have a friend who's a molecular biologist who characterizes tumors as "genetic train wrecks":
table( rownames( lung[order(rownames(lung)), ]))
Carcinoid     Colon    Normal SmallCell
       20        13        17         6
------------------
image( lung[order(rownames(lung)), ])
This would give a better indication of the boundaries of the type grouping:
image( lung[order(rownames(lung)), ], xaxt="n")
axis(1, at=(cumsum( table( rownames( lung[order(rownames(lung)), ])))-1)/56 ,
labels=names(table( rownames( lung[order(rownames(lung)), ]))),las=2)
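As a variation on the same idea, here is a sketch that also draws lines between the groups and puts each label at the middle of its group. It runs on a simulated matrix rather than the real lung data, so the values (and the choice of 50 columns) are assumptions; only the group sizes are taken from the table above. It relies on image() placing the n row cells at evenly spaced centers from 0 to 1.

```r
set.seed(1)

# Simulated stand-in for lung: 56 samples (rows) x 50 genes (columns).
# Group sizes follow the table above; the values are random noise.
labels <- rep(c("Carcinoid", "Colon", "Normal", "SmallCell"),
              times = c(20, 13, 17, 6))
m <- matrix(rnorm(56 * 50), nrow = 56,
            dimnames = list(sample(labels), NULL))

# Sort the samples so that like types sit together
ord  <- order(rownames(m))
runs <- rle(rownames(m)[ord])        # group names and sizes, in plotting order
ends   <- cumsum(runs$lengths)       # last row index of each group
starts <- ends - runs$lengths + 1    # first row index of each group

# image() centers cell i at (i - 1) / (n - 1) on the x axis
n      <- nrow(m)
bounds <- (head(ends, -1) - 0.5) / (n - 1)    # halfway between adjacent groups
mids   <- ((starts + ends) / 2 - 1) / (n - 1) # center of each group

image(m[ord, ], xaxt = "n", yaxt = "n")
abline(v = bounds, lwd = 2)
axis(1, at = mids, labels = runs$values, las = 2)
```

With the real data, replacing m by lung should be enough, assuming the rownames hold the condition labels as in the post.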
I am very new to this, and I wanted to add the various ways in which I tried to reshape/melt the data. Here is my data in three different variations:
Version 1:
year,type,total,action,perc
2015,v,"1,199,310",crime,42.16
2015,p,"8,024,115",crime,18.24
2015,v,"505,681",arrest,42.16
2015,p,"1,463,213",arrest,18.24
2016,v,"1,250,162",crime,32.85
2016,p,"7,928,530",crime,17.07
2016,v,"410,717",arrest,32.85
2016,p,"1,353,283",arrest,17.07
2017,v,"1,247,321",crime,41.58
2017,p,"7,694,086",crime,16.24
2017,v,"518,617",arrest,41.58
2017,p,"1,249,757",arrest,16.24
Version 2:
year,type,crime,arrest,perc
2015,1,"1,199,310","505,681",42.16
2015,2,"8,024,115","1,463,213",18.24
2016,1,"1,250,162","410,717",32.85
2016,2,"7,928,530","1,353,283",17.07
2017,1,"1,247,321","518,617",41.58
2017,2,"7,694,086","1,249,757",16.24
Version 3:
df <- vpcrimetotal
year,vcrime,varrest,varrestperc,pcrime,parrest,parrestperc
2017,"1,247,321","518,617",0.4158,"7,694,086","1,249,757",0.1624
2016,"1,250,162","410,717",0.3285,"7,928,530","1,353,283",0.1707
2015,"1,199,310","505,681",0.4216,"8,024,115","1,463,213",0.1824
The idea is to show the total number of violent crimes versus property crimes from 1990-2017, with the number of arrests (labeled as a percent) inside each bar based on crime type (property or violent). The preference is to stack all four into one bar per year, with a different color for each.
I found these posts, which helped, but I was still confused about how to fit my data into them: "how to create stacked bar charts for multiple variables with percentages", though I would like it to look more like "Count and Percent Together using Stack Bar in R".
I have tried these sets of data with that code, but it would probably be confusing if I posted all the different attempts that didn't work.
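For what it's worth, here is a rough base R sketch of one way the Version 1 layout could be turned into a stacked bar per year. It is only an illustration under some assumptions: the data is read from an inline string instead of a file, the names series, mat, pmat and ycenter are made up here, and the percent label is placed inside the arrest segments only.

```r
# Version 1 of the data, pasted inline for a self-contained example
csv <- 'year,type,total,action,perc
2015,v,"1,199,310",crime,42.16
2015,p,"8,024,115",crime,18.24
2015,v,"505,681",arrest,42.16
2015,p,"1,463,213",arrest,18.24
2016,v,"1,250,162",crime,32.85
2016,p,"7,928,530",crime,17.07
2016,v,"410,717",arrest,32.85
2016,p,"1,353,283",arrest,17.07
2017,v,"1,247,321",crime,41.58
2017,p,"7,694,086",crime,16.24
2017,v,"518,617",arrest,41.58
2017,p,"1,249,757",arrest,16.24'
crime <- read.csv(text = csv, stringsAsFactors = FALSE)

# The totals contain thousands separators, so strip them before converting
crime$total <- as.numeric(gsub(",", "", crime$total))

# One stacked segment per type/action combination
crime$series <- paste(crime$type, crime$action)

# Rows = segments, columns = years (each cell holds exactly one value)
mat  <- tapply(crime$total, list(crime$series, crime$year), sum)
pmat <- tapply(crime$perc,  list(crime$series, crime$year), sum)

# Draw the stacked bars; barplot() returns the x position of each bar
xmid <- barplot(mat, col = c("lightblue", "steelblue", "pink", "firebrick"),
                legend.text = rownames(mat), xlab = "Year", ylab = "Count")

# Vertical center of every segment, then label only the arrest segments
ycenter <- apply(mat, 2, cumsum) - mat / 2
arrest_rows <- grep("arrest", rownames(mat))
text(x = xmid[col(mat)[arrest_rows, ]],
     y = ycenter[arrest_rows, ],
     labels = paste0(pmat[arrest_rows, ], "%"), cex = 0.7)
```

The same long layout would also work with ggplot2 (geom_col() plus geom_text() with position_stack(vjust = 0.5)), if you prefer that route.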
I have a problem building a barplot.
I am working on air traffic in different countries. I would like to get a barplot for each country, with the different airport names on the X axis. The Y axis will show the number of airlines using the airport.
My plan is to write the script for 1 country and to replicate it manually for the others.
In my data, I have the following columns:
Country / airport / destination.
So each row is actually one airline that is using the airport.
Do you have an idea about how to do this?
For now I have this idea:
UK<-traffic[traffic$Country=="UK",]
UK$airport <- as.factor(UK$airport)
countUK<-table(UK$airport)
barplot(countUK)
This is not working: I get a bunch of airports that are not in the UK on the X axis...
Thanks for your help
Answer found:
You could try to drop unused factor levels, i.e.
UK <- droplevels(UK) after the line UK$airport <- as.factor(UK$airport).
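A small sketch of why this helps, using a made-up traffic data frame (the airport codes and rows are hypothetical). If airport is already a factor in the full data, the UK subset keeps every level, and table() then counts the unused ones as zero.

```r
# Hypothetical data: one row per airline using an airport
traffic <- data.frame(
  Country     = c("UK", "UK", "UK", "France", "France"),
  airport     = factor(c("LHR", "LHR", "MAN", "CDG", "ORY")),
  destination = c("JFK", "CDG", "DUB", "LHR", "MAD")
)

UK <- traffic[traffic$Country == "UK", ]
UK$airport <- as.factor(UK$airport)

# The subset still carries the levels of the non-UK airports
levels_before <- levels(UK$airport)

# Drop the factor levels that no longer occur in the subset
UK <- droplevels(UK)
levels_after <- levels(UK$airport)

# Now only UK airports are counted and plotted
countUK <- table(UK$airport)
barplot(countUK, las = 2)
```

Instead of copying the script by hand for each country, the same steps could be wrapped in a loop over split(traffic, traffic$Country).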