I have a problem connecting two points with the same y value. My dataset looks like this (I hope the formatting is ok):
attackerip,min,max
125.88.146.123,2016-03-29 17:38:17.949778,2016-03-30 07:28:47.912983
58.218.205.101,2016-04-05 15:53:20.69986,2016-05-12 17:32:08.583255
183.3.202.195,2016-04-05 15:58:27.862509,2016-04-15 18:15:13.117774
58.218.199.166,2016-04-05 16:09:34.448588,2016-04-24 06:02:12.237922
58.218.204.107,2016-04-05 16:57:17.624509,2016-05-31 00:52:44.007908
What I have so far is the following:
mydata = read.csv("timeline.csv", sep=',')
mydata$min <- strptime(as.character(mydata$min), format='%Y-%m-%d %H:%M:%S')
mydata$max <- strptime(as.character(mydata$max), format='%Y-%m-%d %H:%M:%S')
plot(mydata$min, mydata$attackerip, col="red")
points(mydata$max, mydata$attackerip, col="blue")
Which results in:
Now I want to connect the points that share the same y-axis value, but I cannot get lines or abline to work. Thanks in advance!
EDIT: dput of data
dput(mydata)
structure(list(attackerip = structure(c(1L, 5L, 2L, 3L, 4L), .Label = c("125.88.146.123",
"183.3.202.195", "58.218.199.166", "58.218.204.107", "58.218.205.101"
), class = "factor"), min = structure(1:5, .Label = c("2016-03-29 17:38:17.949778",
"2016-04-05 15:53:20.69986", "2016-04-05 15:58:27.862509", "2016-04-05 16:09:34.448588",
"2016-04-05 16:57:17.624509"), class = "factor"), max = structure(c(1L,
4L, 2L, 3L, 5L), .Label = c("2016-03-30 07:28:47.912983", "2016-04-15 18:15:13.117774",
"2016-04-24 06:02:12.237922", "2016-05-12 17:32:08.583255", "2016-05-31 00:52:44.007908"
), class = "factor")), .Names = c("attackerip", "min", "max"), class = "data.frame", row.names = c(NA,
-5L))
Final Edit:
The reason plotting lines did not work was that min and max were stored as timestamps (strptime returns date-time objects). Casting them to numeric values yielded the expected result. Thanks for your help, everyone.
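The fix described in the final edit can be sketched roughly as follows. This is only an illustration, not the asker's actual code: it uses three rows copied from the question, converts the timestamps with as.POSIXct (which stores each one as a single number), and swaps in segments() to join each min/max pair. The helper variable ypos and the UTC time zone are assumptions made for the example.

```r
# Three sample rows copied from the question
mydata <- data.frame(
  attackerip = c("125.88.146.123", "58.218.205.101", "183.3.202.195"),
  min = c("2016-03-29 17:38:17.949778", "2016-04-05 15:53:20.69986",
          "2016-04-05 15:58:27.862509"),
  max = c("2016-03-30 07:28:47.912983", "2016-05-12 17:32:08.583255",
          "2016-04-15 18:15:13.117774"),
  stringsAsFactors = FALSE
)

# POSIXct keeps each timestamp as one number (seconds since the epoch);
# the time zone is an assumption for this sketch
mydata$min <- as.POSIXct(mydata$min, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
mydata$max <- as.POSIXct(mydata$max, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# hypothetical helper: one integer row position per attacker IP
ypos <- as.integer(factor(mydata$attackerip))

plot(mydata$min, ypos, col = "red",
     xlim = range(c(mydata$min, mydata$max)),
     yaxt = "n", xlab = "time", ylab = "")
axis(2, at = ypos, labels = mydata$attackerip, las = 1, cex.axis = 0.6)
points(mydata$max, ypos, col = "blue")

# one horizontal segment per row, from first seen to last seen
segments(as.numeric(mydata$min), ypos, as.numeric(mydata$max), ypos)
```

Because every row already holds both end points, a single vectorised segments() call is enough here and no search for duplicated y values is needed.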
The lines function should work just fine. However, you will need to call it for every pair (or set) of points that share the same y value. Here is a reproducible example:
# get the y values that occur more than once
dupeVals <- unique(y[duplicated(y) | duplicated(y, fromLast=TRUE)])
# put the corresponding indices into a list
dupesList <- lapply(dupeVals, function(i) which(y == i))
# scatter plot
plot(x, y)
# draw the lines using sapply; invisible() hides the list of NULLs it returns
invisible(sapply(dupesList, function(i) lines(x[i], y[i])))
This returns
Data used above (run this first):
set.seed(1234)
x <- sort(5 * runif(30))
y <- sample(25, 30, replace=TRUE)
Since it appears that you have two separate groups for which you would like to draw these lines, the algorithm would be as follows:
for each group (min and max, I believe),
calculate the duplicate values of the y variable;
put the indices of these duplicates into a dupesList (maybe dupesListMin and dupesListMax);
plot the points;
run one sapply call on each dupesList.
I tried to make a basic scatter plot with R using gene expression data.
#import data:
oldmice <- read.table("oldmice.txt", header = TRUE)
youngmice <- read.table("youngmice.txt", header = TRUE)
Both imported files have the same format, but the MGE values differ:
gene MGE
Sox17 -6.74193774617653
Mrpl15 -0.212567471203473
Lypla1 -0.711251006455475
and so on..
I made a basic scatter plot of youngmice$MGE against oldmice$MGE:
plot(oldmice$MGE, youngmice$MGE, main="old vs young mice!!",
xlab="oldmice$MGE ", ylab="youngmice$MGE ", pch=19)
My question is how to color the genes that appear in multiple_gene_list in the plot of oldmice$MGE against youngmice$MGE, so that only the genes from multiple_gene_list are highlighted and labeled.
Here is my multiple_gene_list
multiple_gene_list <- read.table("multiple_gene_list.txt", header = TRUE)
multiple_gene_list <- as.vector(multiple_gene_list )
multiple_gene_list contains:
gene
Six6
Arl2
Tmem74B
Rab9B
Rasgef1B
Ccne1
Apln
Spag7
C17Orf59
Krtap4-4
And my goal is to only label multiple_gene_list in oldmice$MGE, youngmice$MGE.
I also tried the following code but failed!
with(subset(ASC_oldmice_exprs, ASC_oldmice_exprs$gene %in% multiple_gene_list$gene), points(ASC_youngmice_exprs$MGE, pch=20, col="red"))
Thank you!
Let's get some data:
multiple_gene_list =structure(list(gene = structure(c(8L, 2L, 10L, 6L, 7L, 4L, 1L,
9L, 3L, 5L), .Label = c("Apln", "Arl2", "C17Orf59", "Ccne1",
"Krtap4-4", "Rab9B", "Rasgef1B", "Six6", "Spag7", "Tmem74B"),
class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
set.seed(111)
oldmice = data.frame(
gene=c("Six6","Arl2","Tmem74B",letters[1:10]),
MGE=runif(13))
youngmice = data.frame(
gene=c("Six6","Arl2","Tmem74B",letters[1:10]),
MGE=runif(13))
Three genes overlap, so we define the colors like this:
COLS = ifelse(oldmice$gene %in% multiple_gene_list$gene,
"turquoise","orange")
And plot:
plot(oldmice$MGE, youngmice$MGE, main="old vs young mice!!",
xlab="oldmice$MGE ", ylab="youngmice$MGE ", pch=19,col=COLS)
sel = oldmice$gene %in% multiple_gene_list$gene
text(x=oldmice$MGE[sel]+0.01,
y=youngmice$MGE[sel]+0.01,
oldmice$gene[sel])
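The code above relies on oldmice and youngmice listing the genes in the same order. If that is not guaranteed, one option is to align the two tables by gene name first. The following is a minimal sketch under that assumption; the data values, the highlight vector, and the names old, young and both are made up for illustration.

```r
# Hypothetical toy data: same genes, different row order
old   <- data.frame(gene = c("Six6", "Arl2", "a", "b"),
                    MGE  = c(0.1, 0.5, 0.3, 0.9))
young <- data.frame(gene = c("b", "Six6", "a", "Arl2"),
                    MGE  = c(0.2, 0.4, 0.6, 0.8))
highlight <- c("Six6", "Arl2", "Apln")   # genes of interest

# merge() pairs the rows by gene name, regardless of order
both <- merge(old, young, by = "gene", suffixes = c(".old", ".young"))

# TRUE for genes that should be coloured and labelled
sel <- both$gene %in% highlight

plot(both$MGE.old, both$MGE.young, pch = 19,
     col = ifelse(sel, "turquoise", "orange"),
     xlab = "old MGE", ylab = "young MGE")
text(both$MGE.old[sel], both$MGE.young[sel],
     labels = both$gene[sel], pos = 3)
```

Genes that appear in only one of the two tables are dropped by merge() by default, which is usually what a scatter plot of paired values needs.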
I have a simple trellis scatterplot with two panels, male and female. ID is a unique number for each participant, var1 is the total test time, and mean.values is a vector of two numbers (the mean for each gender).
There is no point in including a best-fit line, so what I want is to plot a line at the mean in each panel. The two panels have different means, say male = 1 minute and female = 2 minutes.
xyplot(var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values)
})
At the moment, both trend lines appear in each panel. I want only one in each.
Does anyone know a way to do this?
I have tried a number of different approaches, including the long code for the function addLine, which just doesn't work for me. I just want to specify which panel I'm looking at. I've looked at ?panel.number, but I'm not sure how it works, as I get an error saying that I don't have a current row (current.row(prefix)).
There must be a simple way of doing this?
[EDIT - Here's the actual data i'm using]
I've tried to simplify the DF
library(lattice)
dput(head(DF))
structure(list(ID = 1:6, Var1 = c(2333858, 4220644,
2941774, 2368496, 3165740, 3630300), mean = c(2412976, 2412976,
2412976, 2412976, 2412976, 2412976), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = c(NA, 6L), class = "data.frame")
dput(tail(DF))
structure(list(ID = 161:166, Var1= c(2825246, 3552170,
3688882, 2487760, 3849108, 3085342), mean = c(3689805, 3689805,
3689805, 3689805, 3689805, 3689805), Gender = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = 109:114, class = "data.frame")
The plot I'm using:
xyplot((Var1/1000) ~ ID|Gender, data=DF,
group = Gender,scales=list(x=list(at=NULL)),
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values) })
This draws two lines in each panel.
[EDIT - This is the code that includes the function addLine, which appears in many posts but doesn't seem to work for me]
addLine <- function(a=NULL, b=NULL, v = NULL, h = NULL, ..., once=F) {
  tcL <- trellis.currentLayout()
  k <- 0
  for(i in 1:nrow(tcL))
    for(j in 1:ncol(tcL))
      if (tcL[i,j] > 0) {
        k <- k+1
        trellis.focus("panel", j, i, highlight = FALSE)
        if (once)
          panel.abline(a=a[k], b=b[k], v=v[k], h=h[k], ...)
        else
          panel.abline(a=a, b=b, v=v, h=h, ...)
        trellis.unfocus()
      }
}
then writing after the trellis plot (mean.values being a vector of two numbers, mean for female, mean for male)
addLine(v=(mean.values), once=TRUE)
Update - I managed to do it in ggplot2.
Make the ggplot using facet_wrap then -
hline.data <- data.frame(z = c(2413, 3690), Gender = c("Female","Male"))
This creates a DF of the two means and the Gender, 2x2 DF
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
This adds the lines to the ggplot.
If you just want to plot the mean of the values you are already drawing on the plot, you can skip the mean.values variable and just do
xyplot(Var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(x,y,...) {
panel.xyplot(x,y,...)
panel.abline(h=mean(y))
}
)
With the sample data
DF<-data.frame(
ID=1:10,
Gender=rep(c("M","F"), each=5),
Var1=c(5,6,7,6,5,8,9,10,8,9)
)
this produces a plot with one horizontal mean line in each panel.
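If the means are computed in advance, as with mean.values in the question, another option is to pick the right element inside the panel function. The sketch below is only one way to do this and assumes that mean.values is ordered like the levels of Gender; it uses lattice's packet.number(), which reports which conditioning level the current panel shows.

```r
library(lattice)

DF <- data.frame(
  ID = 1:10,
  Gender = factor(rep(c("M", "F"), each = 5)),
  Var1 = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# one mean per gender, in factor-level order (F first, then M)
mean.values <- tapply(DF$Var1, DF$Gender, mean)

p <- xyplot(Var1 ~ ID | Gender, data = DF,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    # packet.number() is 1 for the first Gender level, 2 for the second
    panel.abline(h = mean.values[packet.number()])
  })
print(p)
```

With a single conditioning variable and the default panel order, panel.number() would give the same index; packet.number() keeps working if the panels are reordered.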
I believe lattice has a specific panel function for this, panel.average().
Try replacing panel.abline(h=mean.values) with panel.average(...).
If that doesn't solve the problem, we might need more information; try using dput() on your data (e.g., dput(DF), or some representative subset).
This may be a very simple question, but I don't see how to answer it.
I have the following reproducible code, where I have two small dataframes that I use to calculate a percentage value based on each column total:
#dataframe x
x <- structure(list(PROV = structure(c(1L, 1L), .Label = "AG", class = "factor"),
APT = structure(1:2, .Label = c("AAA", "BBB"), class = "factor"),
PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L,0L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#dataframe y
y <- structure(list(PROV = structure(c(1L, 1L), .Label = "AQ", class = "factor"),
APT = structure(1:2, .Label = c("CCC", "AAA"), class = "factor"),
PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L,3L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#list z (with x and y)
z <- list(x,y)
#percentage value of x and y based on columns total
round(prop.table(as.matrix(z[[1]][3:5]), margin = 2)*100,1)
round(prop.table(as.matrix(z[[2]][3:5]), margin = 2)*100,1)
As you can see, this works just fine.
Now I want to automate for all the list, but I can't figure out how to get the results. This is my simple code:
#for-loop that is not working
for (i in length(z))
{round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1)}
You have two problems.
First, you have not put a range into your for loop, so you are iterating over a single number. Second, you are not assigning your result anywhere on each iteration.
Use 1:length(z) to define a range. Then assign the results to a variable.
This would work:
my_list <- list()
for (i in 1:length(z)){
my_list[[i]] <- round(prop.table(as.matrix(z[[i]][3:5]),
margin = 2)*100,1)
}
my_list
But it would be more efficient and idiomatic to use lapply:
lapply(1:length(z),
function(x) round(prop.table(as.matrix(z[[x]][3:5]), margin = 2)*100,1))
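As a variation on the lapply call above, the function can also be applied to the list elements directly instead of to their indices. This is a small sketch that rebuilds z with the same numbers as in the question; the name pct is made up for the example.

```r
# Same values as the x and y data frames in the question
z <- list(
  data.frame(PROV = "AG", APT = c("AAA", "BBB"),
             PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L, 0L)),
  data.frame(PROV = "AQ", APT = c("CCC", "AAA"),
             PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L, 3L))
)

# lapply hands each data frame to the function and collects the results
pct <- lapply(z, function(df) {
  round(prop.table(as.matrix(df[3:5]), margin = 2) * 100, 1)
})
pct
```

When an index is really needed, seq_along(z) is a safer range than 1:length(z), because it yields an empty sequence rather than c(1, 0) when the list is empty.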
Setting aside whether a for loop is the best approach, you had two issues. One, your for loop iterates only over 2 (which is length(z)) instead of 1:2. Two, you need to do something with the result of the round(...) call. In this solution, I added a print statement.
for (i in 1:length(z)){
print(round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1))
}
I've searched quite extensively through the forum and the web but couldn't find anyone who has already presented my case, so here is the question:
My goal: how can I extend the example presented here to the case where I have more than one conditioning factor?
I've tried several ways to modify the which.panel variable of the strip.default function, but I couldn't solve my problem.
This is the code I'm using at the moment (with comments):
# install any missing package, then load both
for (pkg in c("plyr", "lattice")) if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
library(plyr)
library(lattice)
# dataframe structure (8 obs. of 6 variables)
data2 <- structure(list(
COD = structure(c(1L, 1L, 1L, 1L, 2L, 2L,2L, 2L),
.Label = c("A", "B"), class = "factor"),
SPEC = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
.Label = c("15/25-(15/06)", "15/26-(22/06)"), class = "factor"),
DATE = structure(c(16589, 16590, 16589, 16590, 16589, 16590, 16589, 16590), class = "Date"),
PM.BDG = c(1111.25, 1111.25, 1141.29, 1141.29, 671.26, 671.26, 707.99, 707.99),
PM = c(1033.14, 1038.4, 1181.48, 1181.48, 616.39, 616.39, 641.55, 641.55),
DELTA.PM = c(-78.12, -72.85, 40.19, 40.19, -54.87, -54.87, -66.44, -66.44)),
.Names = c("COD", "SPEC", "DATE", "PM.BDG", "PM", "DELTA.PM"),
row.names = c(NA, 8L), class = "data.frame")
# create a dataframe with a vector of colors
# based on the value of DELTA.PM for the last
# date available for each combination of COD and SPEC.
# Each color will be used for a specific panel, and it will be
# forestgreen if DELTA.PM is zero or higher, red otherwise.
listaPM <- ddply(data2, .(COD,SPEC), summarize, ifelse(DELTA.PM[DATE=="2015-06-04"]<0, "red", "forestgreen"))
names(listaPM) <- c("COD","SPEC","COLOR")
# set a personalized strip, with bg color based on listaPM$COLOR
# and text based on listaPM$COD and listaPM$SPEC
myStripStylePM <- function(which.panel, factor.levels, ...) {
panel.rect(0, 0, 1, 1,
col = listaPM[which.panel,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[which.panel,1],listaPM[which.panel,2], sep=" - "),
col = "white")}
# prepare a xyplot function to plot that will be used later with dlply.
# Here I want to plot the values of PM.BDG and PM over time (DATE),
# conditioning them on the SPEC (week) and COD (code) factors.
graficoPM <- function(df) {
xyplot (PM.BDG + PM ~ DATE | SPEC + COD,
data=df,
type=c("l","g"),
col=c("black", "red"),
abline=c(h=0,v=0),
strip = myStripStylePM
)}
# create a trellis object that has a list of plots,
# based on different COD (codes)
grafico.PM <- dlply(data2, .(data2$COD), graficoPM)
# graphic output, 1st row should be COD "A",
# 2nd row should be COD "B", each panel is a different SPEC (week)
par(mfrow=c(2,1))
print(grafico.PM[[1]], position=c(0,0.5,1,1), more=TRUE)
print(grafico.PM[[2]], position=c(0,0,1,0.5))
As you can see, the first row of plots is correct: the text of the first strip is "A" (1st COD), the weeks (SPEC) are shown, and the color indicates whether PM is above or below PM.BDG on the last date of the plot.
The 2nd row of plots, however, just repeats the scheme of the first row: COD is always "A", and the bg color of the 2nd strip in the 2nd row is green, even though the red PM line is clearly well below the black PM.BDG line.
Although I'd like to keep my code, I'm pretty sure my goal could be achieved with a different strategy. If you can find a better way to use my dataframe, I'll be happy to study the code and see if it works with my data.
The problem is matching up the current panel data with the listaPM data. Because each of the calls works on a different subset, it's difficult to use which.panel to match up the data sets.
There is an undocumented feature which allows you to get the conditioning variable names to make the matching more robust. Here's how you would use it in your case.
myStripStylePM <- function(which.panel, factor.levels, ...) {
cp <- dimnames(trellis.last.object())
ci <- arrayInd(packet.number(), .dim=sapply(cp, length))
cv <- mapply(function(a,b) a[b], cp, as.vector(ci))
idx<-which(apply(mapply(function(n, v) listaPM[, n] == v, names(cv), cv),1,all))
stopifnot(length(idx)==1)
panel.rect(0, 0, 1, 1,
col = listaPM[idx,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[idx,1],listaPM[idx,2], sep=" - "),
col = "white")
}
When run with the rest of your code, it produces this plot
I have created a pairwise scatterplot showing the relationships between genes (columns of the data frame) across multiple samples (rows of the data frame). The samples belong to two distinct groups, "A" and "B". Since each dot in the plot represents one sample, I need to color the dots by group using two different colors, say group A in "green" and group B in "red". Is it possible to do that?
Any kind of help will be appreciated.
plot(DF[1:6], pch = 21) #command used for plotting, DF is data frame
Sample Data Frame Example:
CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558
GGally has a good function for this as well.
library(GGally)
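# recent GGally versions take the grouping via mapping; older ones accepted color = 'CLASS'
ggpairs(dd, mapping = ggplot2::aes(color = CLASS), columns = 2:ncol(dd))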
It might not be that easy to do with base graphics. You could easily do this with lattice. With this sample data.frame
dd<-structure(list(CLASS = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
CBX3 = c(10.589844, 10.174385, 10.202084, 10.893231, 10.071038,
10.005002, 10.028055, 10.144115, 10.675386, 9.855063, 10.994228,
10.501266), PSPH = c(6.84297, 5.517944, 5.669137, 6.630709,
5.091222, 4.708631, 5.080944, 6.626483, 6.874047, 5.164399,
6.545318, 6.67736), ATP2C1 = c(8.08455, 7.736994, 7.392141,
7.60169, 7.032585, 7.927246, 6.421961, 7.686203, 7.90056,
6.847923, 8.606128, 7.787168), SNX10 = c(8.475023, 9.094834,
7.52227, 7.894177, 8.305581, 7.292527, 7.616856, 7.970934,
7.605519, 8.072608, 8.426329, 8.444976), MMD = c(9.20249,
9.253766, 7.830969, 8.979142, 7.903737, 8.257853, 8.287496,
7.919615, 8.585158, 8.221344, 8.787876, 8.928174), ATP13A3 = c(10.403811,
10.133408, 9.123178, 9.791841, 8.994821, 10.05463, 9.642294,
9.475175, 8.858613, 9.077744, 9.857079, 9.542558)), .Names = c("CLASS",
"CBX3", "PSPH", "ATP2C1", "SNX10", "MMD", "ATP13A3"), class = "data.frame", row.names = c(NA, -12L))
you can do
library(lattice)
splom(~dd[,-1], groups=dd$CLASS)
to get
You can add color to the points by specifying the col argument to plot:
DF <- read.delim(textConnection(
"category CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558"))
plot(DF[2:7],col = ifelse(DF$category == 'A','red','green'))
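With more than two groups, nested ifelse calls get awkward. A named color vector indexed by the group label scales better. The sketch below uses the first four rows and three columns of the sample data and the color assignment requested in the question (A green, B red); the name group_cols is an assumption made for the example.

```r
# First four samples and three genes from the example data
DF <- data.frame(
  category = c("B", "A", "B", "B"),
  CBX3   = c(10.589844, 10.174385, 10.202084, 10.893231),
  PSPH   = c(6.842970, 5.517944, 5.669137, 6.630709),
  ATP2C1 = c(8.084550, 7.736994, 7.392141, 7.601690)
)

# one named color per group; add entries here for further groups
group_cols <- c(A = "green", B = "red")

# look up each sample's color by its group label
cols <- group_cols[as.character(DF$category)]

# plot() on a data frame draws the same scatterplot matrix as pairs()
pairs(DF[-1], col = cols, pch = 19)
```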
A list of valid color values can be obtained by calling colors(). Vectors with a gradient of colors can be created via rainbow(), and just for fun, I use this little function for choosing pretty colors when making a figure.
(Edited per suggestions from @MrFlick)
#' @param n The number of colors to be selected
colorchoose <- function (n = 1, alpha, term = F)
{
cols <- colors()
mod <- ceiling(sqrt(length(cols)))
plot(xlab = "", ylab = "", main = "click for color name",
c(0, mod), c(0, mod), type = "n", axes = F)
s<-seq_along(cols)
dev.hold()
points(s%%mod, s%/%mod, col = cols, pch = 15, cex = 2.4)
dev.flush()
p <- locator(n)
return(cols[round(p$y) * mod + round(p$x)])
}