How can I create a volcano plot in R using the muma package?

I've been trying to create a volcano plot using the muma package in R, but I've had difficulty importing my data in CSV format. Every time I run the code below, I get the error message shown after it.
If you know an easier way to create a volcano plot using a different package, that would really help me.
Thanks.
explore.data(file="datosFin_A1AT.csv",scaling="Auto", scal = TRUE, normalize = TRUE,
imputation = TRUE, imput="mean")
Error in `[.data.frame`(comp, , 3:ncol(comp)) :
  undefined columns selected
structure(list(Muestra = c("MI-001", "MI-003", "MI-009", "MI-012"), Class = c("Presencia",
"Ausencia", "Presencia", "Ausencia"), Per_Cintura = c(97.6, 92.8, 98.8, 113.4), HDL = c(38, 51, 51, 44), TG = c("195", "76", "160", "128"), ApoB = c(145, 161, 173, 50.9), Glucosa_mg=c(86, 85, 96, 79, 7), LBP = c(443.187, 438.925, 703.752,
540.541), IFABP = c(0.705485, 0.906843, 144.873, 145.884), CLD3 = c(0.2, 501.596, 315.582, 446.307), Acetico = c(NA, 745.654, NA, 105.378), Propionico = c(NA, 682.719, 86.628, 303.139), Butirico = c(NA, 571.421, 265.559, 135.674), Isobutirico = c(286.085, 0.0381631, 0.276992, 0.0467809), prevotella = c(0.12843, 0.07927, 0.22459, 0.01726), Pathogen=c(0.05639, 0.16051, 0.01617, 0.04398), Lachnospiraceae = c(0.24202, 0.73606, 0.67789, 0.62656), Aker_Bacter = c(0.06167, 0.00999, 0.03426, 0.0211), Ruminoco = c(0.33593, 3e-05, 0.01538, 0.01298), TNFa = c(14.16, 35.35, 43.71, 42.99), PCR = c(1.71, 1.84, 3.52, 2.32), IL33 = c(148.7, 207.6, 146.2, 162.6), IL8 = c(157.9, 115.3, NA, NA), IL1b = c(13.68, 12.36, 13.69, 19.06), IL18 = c(231.6, 293.5, 366.2, 298.5)))
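On the "easier way" part of the question: a volcano plot is a scatter plot of log2 fold change against -log10(p-value) for each variable, so it can be sketched in base R without muma. The snippet below is only a minimal sketch under stated assumptions: the `toy` data frame is simulated (it is not the data above), the variable names `var1` to `var5` are made up, and a Welch t-test per variable is assumed to be an acceptable test for your design.

```r
# Hypothetical toy data: 6 samples per class, 5 positive numeric variables.
# This is simulated for illustration and is NOT the asker's data.
set.seed(1)
toy <- data.frame(
  Class = rep(c("Presencia", "Ausencia"), each = 6),
  matrix(rexp(12 * 5, rate = 0.01), nrow = 12,
         dimnames = list(NULL, paste0("var", 1:5)))
)

vars <- setdiff(names(toy), "Class")   # every column except the class label
is_p <- toy$Class == "Presencia"       # logical index of the first class

# One row per variable: log2 ratio of class means and a t-test p-value
volcano <- data.frame(
  variable = vars,
  log2fc = sapply(vars, function(v)
    log2(mean(toy[[v]][is_p], na.rm = TRUE) /
         mean(toy[[v]][!is_p], na.rm = TRUE))),
  p = sapply(vars, function(v)
    t.test(toy[[v]][is_p], toy[[v]][!is_p])$p.value)
)

# Volcano plot: effect size on x, significance on y
plot(volcano$log2fc, -log10(volcano$p),
     xlab = "log2 fold change (Presencia / Ausencia)",
     ylab = "-log10(p-value)", pch = 19)
text(volcano$log2fc, -log10(volcano$p), labels = volcano$variable, pos = 3)
abline(h = -log10(0.05), lty = 2)      # conventional 0.05 threshold
```

With real data, every variable would need to be numeric first (TG is stored as character in the dput above), and a t-test needs at least two non-missing values per class.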

Related

Create mean value plot without missing values count to total

Using a dataframe with missing values:
structure(list(id = c("id1", "test", "rew", "ewt"), total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8), total_frq_4 = c(36, NA, 104, NA)), row.names = c(NA, 4L), class = "data.frame")
How is it possible to create a bar plot with the mean of every column, excluding the id column, without replacing the missing values with 0 but instead leaving out the rows with missing values? For example, for total_frq_3: (24 + 25 + 8) / 3 = 57 / 3 = 19.
You can use the colMeans function and pass it na.rm = TRUE to ignore NA values.
library(ggplot2)
xy <- structure(list(id = c("id1", "test", "rew", "ewt"),
total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8),
total_frq_4 = c(36, NA, 104, NA)),
row.names = c(NA, 4L),
class = "data.frame")
xy.means <- colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE)
xy.means <- as.data.frame(xy.means)
xy.means$total <- rownames(xy.means)
ggplot(xy.means, aes(x = total, y = xy.means)) +
theme_bw() +
geom_col()
Or just use base graphics:
barplot(height = colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE))
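As a quick check of what na.rm = TRUE does, here is a small sketch using the total_frq_3 column from the question. It only illustrates the arithmetic; nothing here is specific to colMeans.

```r
total_frq_3 <- c(24, NA, 25, 8)

# Without na.rm the single missing value makes the whole mean NA
mean(total_frq_3)

# With na.rm = TRUE the NA is dropped and the divisor is the number of
# remaining values: (24 + 25 + 8) / 3 = 19
mean(total_frq_3, na.rm = TRUE)

# The same calculation written out by hand
manual_mean <- sum(total_frq_3, na.rm = TRUE) / sum(!is.na(total_frq_3))
```

Replacing the NA with 0 instead would give 57 / 4 = 14.25, which is the result the question wants to avoid.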

Manually draw boxplot using ggplot

I think my question is very similar to this one, the only difference being that I'd love to use ggplot (and the answer with ggplot was missing a tiny bit of detail). I have data like this:
show<-structure(list(Median = c(20, 39, 21, 52, 45.5, 24, 36, 20, 134,
27, 44, 43), IQR = c(4, 74, 28, 51.5, 73.5, 18, 47.5, 26.5, 189.5,
46, 54, 61), FirstQuartile = c(`25%` = 19, `25%` = 24, `25%` = 12,
`25%` = 30.5, `25%` = 36.5, `25%` = 18, `25%` = 16.5, `25%` = 13,
`25%` = 53.5, `25%` = 15, `25%` = 24.5, `25%` = 27), ThirdQuartile = c(`75%` = 23,
`75%` = 98, `75%` = 40, `75%` = 82, `75%` = 110, `75%` = 36,
`75%` = 64, `75%` = 39.5, `75%` = 243, `75%` = 61, `75%` = 78.5,
`75%` = 88), Group = c("Program Director", "Editor", "Everyone",
"Board Director", "Board Director", "Program Director", "Editor",
"Everyone", "Board Director", "Everyone", "Editor", "Program Director"
), Decade = c("1980's", "1980's", "1980's", "1980's", "1990's",
"1990's", "1990's", "1990's", "2000's", "2000's", "2000's", "2000's"
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))
And I would like to draw a graph like this:
With "group" as the color, instead of "fellowship". The problem is, that graph was drawn from "complete" data (with 800ish rows), and I clearly only have summary data above. I realize it won't be able to draw outliers but that is ok. Any help would be appreciated! I'm specifically struggling with how I would draw the ymin/max and the edges of the notch. Thank you
You can use geom_boxplot() with stat = "identity" and fill in the five boxplot numbers as aesthetics.
library(ggplot2)
# show <- structure(...) # omitted for brevity
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = FirstQuartile - 1.5 * IQR, # optional
ymax = ThirdQuartile + 1.5 * IQR) # optional
)
As pointed out by jpsmith in the comments below, the 1.5 * IQR rule becomes hairy if you don't have the range of the data. However, if you have information about the data extrema or the data domain, you can limit the whiskers as follows:
# Dummy values assuming data is >= 0 up to infinity
show$min <- 0
show$max <- Inf
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = pmax(FirstQuartile - 1.5 * IQR, min),
ymax = pmin(ThirdQuartile + 1.5 * IQR, max))
)

Build curves of populations in function of time

In my work I'm studying many varieties of maize.
I would like to determine the area under the curve during flowering (male and female) for these varieties.
I used the DescTools package and its AUC (area under the curve) function. I converted my dates to a numeric vector. So my script is:
a<-XAUC$Date.flowering.male
b<-XAUC$Date.flowering.female
c<- XAUC$....
Here is my issue: I would like to define c as the population as a function of time. How can I do this?
In this picture: the first graph is what i have and the second is what i would like to have.
and then the end of my script will be:
AUCfemale<-AUC(b,c,method = c("trapezoid"))
AUCmale<-AUC(a,c,method = c("trapezoid"))
Airdiff<-AUCmale-AUCfemale
Data
XAUC <- structure(list(Varietes = c("Abelastone", "Abelastone", "Abelastone", "Abelastone", "Abelastone"), ligne.rep = c(1, 1, 1, 1, 1), Pied = c(1, 2, 3, 6, 7), `Date.floraison.mâle` = c(7.29, 8.02, 8.01, 8.03, 8.04), Date.floraison.femelle = c(8.1, 8.17, 8.11, 8.25, 8.17 ), ASIi = c(12, 15, 10, 22, 13), Hauteur.des.pieds = c(230, 228, 226, 240, 233), Hauteur.des.soies = c(123, 118, 116, 124, 122), Date.floraison.mâle.graph = c(29, 33, 32, 34, 35), Date.floraison.femelle.graph = c(41, 48, 42, 56, 48)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c("6" = 6L, "10" = 10L, "20" = 20L, "21" = 21L, "24" = 24L), class = "omit"))
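One possible reading of "population as a function of time" is the cumulative number of plants that have flowered by each day. The sketch below assumes that reading. It uses the day values from the Date.floraison.mâle.graph and Date.floraison.femelle.graph columns above, and the helpers `cum_curve` and `trapezoid` are hypothetical names written in base R for illustration (they are not part of DescTools).

```r
# Flowering days copied from the *.graph columns of the XAUC data above
male   <- c(29, 33, 32, 34, 35)
female <- c(41, 48, 42, 56, 48)

# Hypothetical helper: cumulative number of plants flowered by each distinct day
cum_curve <- function(days) {
  counts <- table(days)                       # plants flowering on each day
  data.frame(day = as.numeric(names(counts)), # distinct days, sorted
             n   = cumsum(as.vector(counts))) # running total of plants
}

# Hypothetical helper: trapezoid rule written out in base R
trapezoid <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

curve_male   <- cum_curve(male)
curve_female <- cum_curve(female)

auc_male   <- trapezoid(curve_male$day, curve_male$n)
auc_female <- trapezoid(curve_female$day, curve_female$n)
```

With DescTools loaded, AUC(x = curve_male$day, y = curve_male$n, method = "trapezoid") should be the equivalent call. Note that the two curves span different day ranges, so the areas are only comparable if both are evaluated over a common time window.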

Subsetting and plotting data by TimeStamp

I have a data.frame P1 (5000rows x 4cols) and would like to save the subset of data in columns 2,3 and 4 when the time-stamp in column 1 falls into a set range determined by a vector TimeStamp (in seconds).
E.g. put all values in columns 2, 3, and 4 into a new data.frame and call each section of data: Condition.1.P1, Condition.2.P1, etc.
The reason I'd like to label them separately is that I have 35 versions of P1 (P2, P3, P33, etc.) and need to be able to melt them together to plot them.
dput(TimeStamp)
c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dput(head(P1))
structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781,
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70,
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time",
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA,
6L), class = "data.frame")
Do you want to separate the data by the timestamp range and put it in a list? Then this might be what you are looking for:
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dat <- structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781,
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70,
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c ("Time",
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA,
6L), class = "data.frame")
dat$Segment <- cut(dat$Time,c(-Inf,TimeStamp))
split(dat,dat$Segment)
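The cut/split approach can also produce the Condition.1.P1, Condition.2.P1, ... names asked for in the question by passing labels to cut. This is a minimal sketch: the label pattern is taken from the question, while the object name `segments` is made up for illustration.

```r
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dat <- data.frame(
  Time = c(0, 5, 100, 200, 500, 1200),
  SkinTemp = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# One label per interval: c(-Inf, TimeStamp) has 10 breaks, hence 9 intervals
labs <- paste0("Condition.", seq_along(TimeStamp), ".P1")

# Intervals are right-closed by default, e.g. (-Inf, 18], (18, 138], ...
dat$Segment <- cut(dat$Time, breaks = c(-Inf, TimeStamp), labels = labs)

# Named list of data frames, keeping only columns 2 to 4 in each piece
segments <- split(dat[, 2:4], dat$Segment)
names(segments)
```

Intervals that contain no rows still appear in the list as zero-row data frames; pass drop = TRUE to split if you would rather omit them.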
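ts = TimeStamp # Assumption: ts is the TimeStamp vector (it is not defined in the original snippet)
P2 = data.frame(NA, NA, NA, NA) # Create empty data.frame
for (i in seq_along(ts)){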
P3 = data.frame() # Create empty changing data.frame
if (i == 1) {ts1 = 0} else {ts1 = ts[i-1]} #First time stamp starts at 0
ts2 = ts[i]
P3 = subset(P1, P1$Time > ts1 & P1$Time < ts2)[,c(2,3,4)] #Subset the columns and assign to P3
if (nrow(P3) == 0){P3 = data.frame(NA, NA, NA)} #If the subset is empty, assign NA
P3$TimeStamp = paste(ts1,ts2,sep="-") # Append TimeStamp to the P3
colnames(P3) = colnames(P2) #Make sure column names are same to allow rbind
P2 = rbind(P2,P3) #Append P3 to P2
}
P2 = P2[c(2:nrow(P2)),] #Remove the first row (that has NA)
colnames(P2) = c("SkinTemp", "HeartRate", "RespirationRate", "TimeStamp") #Provide column names
rm(P3); rm(i); rm(ts1); rm(ts2) #Cleanup

Ggplot: Same data not plotting a line plot, but making a bar plot (Each group consist of only one observation.)

I'm trying to create a line plot from the following data:
> dput(agdata)
structure(list(date = c("2014-11-30", "2014-12-01", "2014-12-02",
"2014-12-03", "2014-12-04", "2014-12-05", "2014-12-06", "2014-12-07",
"2014-12-14", "2014-12-15", "2014-12-16", "2014-12-17", "2014-12-18"
), A = c(86.3333333333333, 91.1666666666667, 83.4, 83, 86, 94.75,
78, 87, 87, 92, 98.6, 87, 85.3333333333333), B = c(1015.16666666667,
1014.33333333333, 1017.2, 1021, 1017.5, 1021.5, 1029, 1022, 1009,
1012.4, 1014.8, 1011, 1011), C = c(8.55666666666667, 7.145, 7.51,
4.61, 4.335, 3.2625, 6.585, 8.35, 9.09, 6.48, 2.532, 11.74, 11.7933333333333
), D = c(24, 74.6666666666667, 77, 57.5, 82.5, 56.25, 0, 88,
32, 61, 50, 92, 80.6666666666667)), .Names = c("date", "A", "B",
"C", "D"), row.names = c(NA, -13L), class = "data.frame")
I tried this:
ggplot(data = agdata,aes(x = date, y = A)) + geom_line(stat="identity")
and various other parameters, including removing the stat parameter, moving aes to geom_line, and a few others.
I keep getting:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
To check if the data is fine, I tried:
ggplot(data = agdata,aes(x = date, y = A)) + geom_bar(stat="identity")
which works just fine.
Any pointers as to what I'm missing here? I have a feeling it has something to do with a group= parameter in aes() by looking at this, and this, but not sure what.
Ugh, never mind. Figured it out the minute I posted this. It has to be group = 1 in aes().
Feel rockheaded now.
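For anyone landing here later: the date column is stored as character, so ggplot2 treats the x axis as discrete and puts each date in its own group, which leaves geom_line with nothing to connect. A minimal sketch of the fix, using the first four rows of agdata and assuming ggplot2 is installed (the object names `p_group` and `p_date` are only for illustration):

```r
library(ggplot2)

# First four rows of agdata from the question; date is a character column
agdata <- data.frame(
  date = c("2014-11-30", "2014-12-01", "2014-12-02", "2014-12-03"),
  A = c(86.3333333333333, 91.1666666666667, 83.4, 83),
  stringsAsFactors = FALSE
)

# Fix 1: put all points into a single group so they are joined by one line
p_group <- ggplot(agdata, aes(x = date, y = A, group = 1)) +
  geom_line()

# Fix 2: convert to Date so the x axis is continuous; no group needed
p_date <- ggplot(agdata, aes(x = as.Date(date), y = A)) +
  geom_line()
```

The second form also spaces the points by actual date, which matters for this data because there is a gap between 2014-12-07 and 2014-12-14.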
