I want to replace the points in my graph with a line, like in the first picture; the second picture is what I have.
But it's not quite what I'm looking for: I want a smooth line without the points.
I think I have to use predict() for the 1/x curve, but I am not sure how.
Assuming a function of 1/x fits the data well, one can use the lm() function to fit the desired function y = a/x + b and then use the predict() function to estimate the desired points.
If a more complicated nonlinear function is required to fit the data, then nls() may be required.
x<- c(176.01685819061, 21.6704613594849, 19.007554742708, 50.1865574864131, 17.6174002411188, 40.2758022496774, 11.0963214407251, 1249.94375253114, 694.894678288085, 339.786950220117, 42.1452961176151, 220.352895161601, 19.6303352674776, 9.10350287678884, 10.6222946396451, 44.1984352318898, 21.8069112975004, 42.1237630342764, 22.7551891190248, 12.9587850506626, 12.0207189111152, 20.2704921282476, 13.3441156357956, 9.13092569988769, 1781.08346869568, 71.2690023512206, 80.2376892286713, 344.114362037227, 208.830841645638, 91.1778810401913, 2220.0120768657, 41.4820962277111, 16.5730025748281, 32.30173229022, 108.703930214512, 51.6770035143256, 709.071405759588, 87.9618878732223, 10.4198968123037, 34.4951840238729, 57.8603720445067, 72.3289197551429, 30.2366643066749, 23.8696161364716, 270.014690419247, 13.8170113452005, 39.5159584479013, 27.764841260433, 18.0311836472615, 40.5709477295999, 33.1888820958952, 9.03112843931787, 4.63738971549635, 12.7591169313099, 4.7998894219979, 8.93458248803248, 7.33904760386628, 12.0940344070925, 7.17364602165948, 6.514191844409, 9.69911157978057, 6.57874454980745, 7.90556524435596)
y<- c(0.02840637, 0.230728821, 0.2630533, 0.099628272, 0.28381032, 0.12414402, 0.45059978, 0.00400018, 0.00719533500000001, 0.014715103086687, 0.118637201789886, 0.022690875, 0.254707825, 0.54923913, 0.470708088, 0.113126176837872, 0.22928510745, 0.118697847481752, 0.219730100850697, 0.38583864, 0.4159485, 0.24666396693114, 0.374696992776912, 0.547589605297248, 0.00280728, 0.070156727820596, 0.062314855376136, 0.01453005323695, 0.02394282358199, 0.0548378613646, 0.00225224, 0.120533928, 0.301695482, 0.15479046, 0.045996497, 0.096754836, 0.00705147600000001, 0.0568428, 0.47985120103071, 0.14494777, 0.08641493, 0.069128642, 0.165362156, 0.20947132, 0.018517511, 0.36187275779699, 0.126531158458224, 0.180083867690804, 0.277297380904852, 0.1232408972382, 0.15065285976048, 0.55364067, 1.07819275643191, 0.39187665, 1.04169066418176, 0.55962324, 0.68128731, 0.41342697, 0.69699564, 0.76755492, 0.515511133042674, 0.760023430328564, 0.632465844687028)
#data frame for prediction
df <- data.frame(x=sort(x))
# fit model y= a/x + b
model <-lm( y ~ I(1/x))
#summary(model)
#plot model
plot(df$x, predict(model, df), type="l", col="blue")
#optional
points(x, y)
Update - response to comments
x is sorted in the data frame so that the points are plotted in order. If not, the line could go from x=1 to x=100, back to x=10, etc., making a mess. Try removing the sort and see what happens.
The I(1/x) term tells lm() to apply the inverse transform to x first and then perform the least-squares regression.
The predict() call shows up as the axis label because that expression is what was passed to plot(). To change this, assign the output of predict() to a variable with a better name and plot that, or use the ylab= argument.
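As a follow-up to the nls() remark above, here is a minimal sketch on simulated data (the data, seed and starting values are made up for illustration and are not the asker's). Because y = a/x + b is linear in a and b, nls() should land on essentially the same coefficients as lm(); nls() only becomes necessary when a parameter enters nonlinearly, for example an unknown exponent on x.

```r
# Simulated data, assumed for illustration only
set.seed(1)
x <- runif(50, min = 0.05, max = 1)
y <- 2 / x + 1 + rnorm(50, sd = 0.5)

# Linear-model fit of y = a/x + b
fit_lm <- lm(y ~ I(1/x))

# Same curve fitted with nls(); the start values are rough guesses
fit_nls <- nls(y ~ a / x + b, start = list(a = 1, b = 0))

# Smooth prediction grid over the observed range of x
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
plot(x, y)
lines(grid$x, predict(fit_lm, grid), col = "blue", lwd = 2)
lines(grid$x, predict(fit_nls, grid), col = "red", lty = 2)
```

Using an evenly spaced grid instead of the sorted observations keeps the line smooth even where the data are sparse.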
For smoothing, you can fit a linear model as follows:
m <- lm(AM_cost_resorb~I(1/AM_leafP), data=data)
Then extract the predicted values on a new data set that covers the range of the exposure variable.
newx <- seq(min(data$AM_leafP), max(data$AM_leafP), by=0.01)
pr <- predict(m, newdata=data.frame(AM_leafP=newx))
And visualize:
plot(AM_cost_resorb~AM_leafP, data=data, type="p", pch= 15, col="red",ylab="Cost of reabsorbtion (kg C m^-2 yr^-1)", xlab="leaf P before senescence (g P/m2)", ylim=c(0,500), las=1)
lines(newx, y=pr, col="blue", lwd=2)
Data:
data <- structure(list(AM_cost_resorb = c(176.01685819061, 21.6704613594849,
19.007554742708, 50.1865574864131, 17.6174002411188, 40.2758022496774,
11.0963214407251, 1249.94375253114, 694.894678288085, 339.786950220117,
42.1452961176151, 220.352895161601, 19.6303352674776, 9.10350287678884,
10.6222946396451, 44.1984352318898, 21.8069112975004, 42.1237630342764,
22.7551891190248, 12.9587850506626, 12.0207189111152, 20.2704921282476,
13.3441156357956, 9.13092569988769, 1781.08346869568, 71.2690023512206,
80.2376892286713, 344.114362037227, 208.830841645638, 91.1778810401913,
2220.0120768657, 41.4820962277111, 16.5730025748281, 32.30173229022,
108.703930214512, 51.6770035143256, 709.071405759588, 87.9618878732223,
10.4198968123037, 34.4951840238729, 57.8603720445067, 72.3289197551429,
30.2366643066749, 23.8696161364716, 270.014690419247, 13.8170113452005,
39.5159584479013, 27.764841260433, 18.0311836472615, 40.5709477295999,
33.1888820958952, 9.03112843931787, 4.63738971549635, 12.7591169313099,
4.7998894219979, 8.93458248803248, 7.33904760386628, 12.0940344070925,
7.17364602165948, 6.514191844409, 9.69911157978057, 6.57874454980745,
7.90556524435596), AM_leafP = c(0.02840637, 0.230728821, 0.2630533,
0.099628272, 0.28381032, 0.12414402, 0.45059978, 0.00400018,
0.00719533500000001, 0.014715103086687, 0.118637201789886, 0.022690875,
0.254707825, 0.54923913, 0.470708088, 0.113126176837872, 0.22928510745,
0.118697847481752, 0.219730100850697, 0.38583864, 0.4159485,
0.24666396693114, 0.374696992776912, 0.547589605297248, 0.00280728,
0.070156727820596, 0.062314855376136, 0.01453005323695, 0.02394282358199,
0.0548378613646, 0.00225224, 0.120533928, 0.301695482, 0.15479046,
0.045996497, 0.096754836, 0.00705147600000001, 0.0568428, 0.47985120103071,
0.14494777, 0.08641493, 0.069128642, 0.165362156, 0.20947132,
0.018517511, 0.36187275779699, 0.126531158458224, 0.180083867690804,
0.277297380904852, 0.1232408972382, 0.15065285976048, 0.55364067,
1.07819275643191, 0.39187665, 1.04169066418176, 0.55962324, 0.68128731,
0.41342697, 0.69699564, 0.76755492, 0.515511133042674, 0.760023430328564,
0.632465844687028)), class = "data.frame", row.names = c(NA,
-63L))
I am trying to apply hierarchical clustering to time series in order to identify the states with similar behavior in the time series for residential_percent_change_from_baseline. I get the dendrogram, but the labels I get on the x axis are just numbers and I want the state names.
My data looks like this:
Data
And this is some part of my code
data <- dataset
#Convert to factor
cols <- c("country_region_code", "country_region", "sub_region_1", "iso_3166_2_code")
data[cols] <- lapply(data[cols], factor)
sapply(data, class)
data$date <- as.Date(data$date)
summary(data)
#Data preparation
n <- 10
s <- sample(1:100, n)
i <- c(s,0+s, 279+s, 556+s, 833+s, 1110+s, 1387+s, 1664+s, 1941+s, 2218+s, 2495+s, 2772+s, 3049+s, 3326+s, 3603+s, 3880+s, 4157+s, 4434+s, 4711+s, 4988+s, 5265+s, 5542+s, 5819+s, 6096+s, 6373+s, 6650+s, 6927+s, 7204+s, 7481+s, 7758+s, 8035+s, 8312+s, 8589+s, 8866+s)
d <- data[i,3:4]
d$residential <- data[i,11]
d[,2] =NULL
str(d)
pattern <- c(rep('Mexico', n),
rep('Aguascalientes', n),
rep('Baja California',n),
rep('Baja California Sur',n),
rep('Campeche',n),
rep('Coahuila',n),
rep('Colima',n),
rep('Chiapas',n),
rep('Chihuahua',n),
rep('Durango',n),
rep('Guanajuato',n),
rep('Guerrero',n),
rep('Hidalgo',n),
rep('Jalisco',n),
rep('México City',n),
rep('Michoacan',n),
rep('Morelos',n),
rep('Nayarit',n),
rep('Nuevo León',n),
rep('Oaxaca',n),
rep('Puebla',n),
rep('Querétaro',n),
rep('Quintana Roo',n),
rep('San Luis Potosí',n),
rep('Sinaloa',n),
rep('Sonora',n),
rep('Tabasco',n),
rep('Tamaulipas',n),
rep('Tlaxcala',n),
rep('Veracruz',n),
rep('Yucatán',n),
rep('Zacatecas.',n))
d <- data.matrix(d)
distance <- dist(d, method = 'euclidean')
hc <- hclust(distance, method="ward.D")
plot(hc, cex=.7, hang = -1, col='blue', labels=pattern)
I get this dendrogram when I don't specify labels
dendrogram with numeric labels
But when I do I get this error
Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid dendrogram input
I hope somebody can help me; I am a little bit tired of this.
Maybe it will work with an alternative to the base R plot function. Try ggdendroplot. It should display the labels on the axis. You will need ggplot2 for this.
devtools::install_github("NicolasH2/ggdendroplot")
library(ggdendroplot)
library(ggplot2)
ggplot() + geom_dendro(hc)
If you want to modify it (turn it, color it, etc.) check out the github page: https://github.com/NicolasH2/ggdendroplot
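A possible cause of the error in the question, offered as a guess since the data is not shown: plot() for hclust objects expects one label per clustered row. In the code above, i is built from 34 blocks of n indices (s appears twice, as s and 0+s), while pattern has 32 rep() entries, so d would have 340 rows but only 320 labels. The sketch below uses made-up toy data (three hypothetical groups) to show a length check before plotting with base R.

```r
# Toy data, assumed for illustration: 3 hypothetical groups, 4 rows each
set.seed(1)
states <- c("Aguascalientes", "Colima", "Sonora")
n <- 4
d <- matrix(rnorm(length(states) * n), ncol = 1)
labs <- rep(states, each = n)   # one label per row of d

hc <- hclust(dist(d, method = "euclidean"), method = "ward.D")

# The labels vector must have exactly one entry per clustered observation
stopifnot(length(labs) == nrow(d))
plot(hc, labels = labs, cex = 0.7, hang = -1)
```

If the lengths differ in your own data, building the labels from the same index vector used to subset the rows (for example from the sub_region_1 column) avoids keeping two hand-written lists in sync.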
I want to plot the spline effect of a variable called "NO2" on birth weight, but I want 4 graphs, one for each of four quartiles. My current code gives only one graph; could you please help me figure out the problem? You can see the code at the end. model_1_F1_spline is adjusted for different covariates, but my question is about F1_quartile. When I fit the NO2 smooth by F1_quartile, the model includes results for the four quartiles, but I don't know how to extract those results and draw 4 graphs.
Here is a reproducible example:
structure(list(coefficients = structure(c(2779.15322482481, 11.6029323631846,
-109.637722127332, -70.5777182211836, -33.2026137282293, 1.34507275289371,
-104.16616170941, -84.3138020433217, 17.079775791272, 49.2699120523702,
65.7993773354024, 73.9523088264003, 62.1308005103464, 11.8305504033343,
17.2509811135892, 34.167485824927, 37.5379409075558, 39.4891005510156,
2.08045456267659, 95.0617726758795, 159.185162814325, 216.767405256274,
30.4053773772453, 67.9509936017346, 75.9715680793893, 76.0634702947319,
197.304475883704, 346.536371507916, 452.520999581153, 582.904282791219,
646.972345369266, -13.117918823958, -21.2577276011179, -36.4775602045112,
-2.53495678184362, 4.25561833400684, -4.24061504987865, 1.22183358211853,
-17.6781972182122, -13.9465039223737, -24.9221422877004, -26.5305128528655,
2.72740931108257, 17.3508955652218, -4.33132009995294, -11.4103790176564,
48.1115836583216, -23.8853869176324, -11.9906695483978, 0.159117077270929,
3.1823388043623, -30.2233558177321, 22.9158634128136, 1.86241593993877,
-7.46279510854093, -17.7265172939209, 15.6908002520418, 10.7367940888643,
11.9368630460758, 48.0464522543244, -10.5383667390476, 8.84142833076189,
38.6344171322845, -4.18823289724547, 20.9039579936433, -27.1572322476693,
-23.3055121479652, -10.125234127069, -2.3505578660444, -5.59801575548779,
21.0487614265911, -0.113655733751338, 1.4592300415459, -0.395003023852113,
-1.33572259818002, -0.195697887437374, -1.22245366980104, 0.161927450428184,
-8.83284987935688, -11.7655241486702, 10.0814083754381, 4.95053998927621,
0.0512729497898481, -2.47612645668306, -0.324705343736638, -2.73702305143146,
0.367899109531455, -17.8006136959884, -20.7138572162521, 1.66439599003613,
0.991339450831016, -0.094477049206764, -0.333359963322134, -0.0535341357101135,
-0.166135609567417, 0.0263694684353763, -0.790300658406237, -7.88088655871398,
2.30124665956728, 0.526763779856579, -0.729268724581621, -1.64502812073609,
0.245438533444878, -1.68875200672467, 0.471404077584143, -12.0519624220913,
-8.61178665100117), .Names = c("(Intercept)", "M_ethni_cat3FB White",
"M_ethni_cat3USB Black", "M_ethni_cat3FB Black", "M_ethni_cat3USB Hispanic",
"M_ethni_cat3FB Hispanic", "M_ethni_cat3USB Asian", "M_ethni_cat3FB Asian",
"M_Age_Cat1", "M_Age_Cat2", "M_Age_Cat3", "M_Age_Cat4", "M_Age_Cat5",
"M_EDU_Cat1", "M_EDU_Cat2", "M_EDU_Cat3", "M_EDU_Cat4", "M_EDU_Cat5",
"MEDICAID1", "prepregBMI_4cat1", "prepregBMI_4cat2", "prepregBMI_4cat3",
"PNC_RECEIVED1", "Parity_Cat1", "Parity_Cat2", "Parity_Cat3",
"gest_clin38", "gest_clin39", "gest_clin40", "gest_clin41", "gest_clin42",
"concept_year2008", "concept_year2009", "concept_year2010", "conc_season_num2",
"conc_season_num3", "conc_season_num4", "s(UHF34).1", "s(UHF34).2",
"s(UHF34).3", "s(UHF34).4", "s(UHF34).5", "s(UHF34).6", "s(UHF34).7",
"s(UHF34).8", "s(UHF34).9", "s(UHF34).10", "s(UHF34).11", "s(UHF34).12",
"s(UHF34).13", "s(UHF34).14", "s(UHF34).15", "s(UHF34).16", "s(UHF34).17",
"s(UHF34).18", "s(UHF34).19", "s(UHF34).20", "s(UHF34).21", "s(UHF34).22",
"s(UHF34).23", "s(UHF34).24", "s(UHF34).25", "s(UHF34).26", "s(UHF34).27",
"s(UHF34).28", "s(UHF34).29", "s(UHF34).30", "s(UHF34).31", "s(UHF34).32",
"s(UHF34).33", "s(UHF34).34", "s(NO2300_mean_total):F1_quartile1.1",
"s(NO2300_mean_total):F1_quartile1.2", "s(NO2300_mean_total):F1_quartile1.3",
"s(NO2300_mean_total):F1_quartile1.4", "s(NO2300_mean_total):F1_quartile1.5",
"s(NO2300_mean_total):F1_quartile1.6", "s(NO2300_mean_total):F1_quartile1.7",
"s(NO2300_mean_total):F1_quartile1.8", "s(NO2300_mean_total):F1_quartile1.9",
"s(NO2300_mean_total):F1_quartile2.1", "s(NO2300_mean_total):F1_quartile2.2",
"s(NO2300_mean_total):F1_quartile2.3", "s(NO2300_mean_total):F1_quartile2.4",
"s(NO2300_mean_total):F1_quartile2.5", "s(NO2300_mean_total):F1_quartile2.6",
"s(NO2300_mean_total):F1_quartile2.7", "s(NO2300_mean_total):F1_quartile2.8",
"s(NO2300_mean_total):F1_quartile2.9", "s(NO2300_mean_total):F1_quartile3.1",
"s(NO2300_mean_total):F1_quartile3.2", "s(NO2300_mean_total):F1_quartile3.3",
"s(NO2300_mean_total):F1_quartile3.4", "s(NO2300_mean_total):F1_quartile3.5",
"s(NO2300_mean_total):F1_quartile3.6", "s(NO2300_mean_total):F1_quartile3.7",
"s(NO2300_mean_total):F1_quartile3.8", "s(NO2300_mean_total):F1_quartile3.9",
"s(NO2300_mean_total):F1_quartile4.1", "s(NO2300_mean_total):F1_quartile4.2",
"s(NO2300_mean_total):F1_quartile4.3", "s(NO2300_mean_total):F1_quartile4.4",
"s(NO2300_mean_total):F1_quartile4.5", "s(NO2300_mean_total):F1_quartile4.6",
"s(NO2300_mean_total):F1_quartile4.7", "s(NO2300_mean_total):F1_quartile4.8",
"s(NO2300_mean_total):F1_quartile4.9"))), .Names = "coefficients")
Here is what I do:
model_1_F1_spline <- gam(BWGT~ s(UHF34,bs="re") + s(NO2300_mean_total, by=F1_quartile)+M_ethni_cat3 + M_Age_Cat + M_EDU_Cat + MEDICAID +
prepregBMI_4cat + PNC_RECEIVED + Parity_Cat + gest_clin + concept_year + conc_season_num, data=births_stressors, method="REML")
png(filename="plot_factor1_spline.png")
plot(model_1_F1_spline, ylab="Change in birth weight (g)", xlab="NO2")
dev.off()
From the coefficient vector of your fitted GAM, I can infer that F1_quartile is a factor by variable with levels 1, 2, 3, 4, so that you have smooth functions s(NO2300_mean_total):F1_quartile1, s(NO2300_mean_total):F1_quartile2, s(NO2300_mean_total):F1_quartile3 and s(NO2300_mean_total):F1_quartile4.
In this situation, calling plot.gam should give you 5 plots: one being a Q-Q plot of your 34-level random intercept s(UHF34, bs = 're'), and 4 plots for the by smooths.
Your question is mainly regarding the by smooths, so consider the following minimal reproducible example.
dat <- data.frame(y = rnorm(40), x = runif(40), f = gl(4, 10))
library(mgcv)
fit <- gam(y ~ f + s(x, k = 5, by = f), data = dat)
Note that you need to include the by variable as a main effect in the formula too, as a factor by smooth is subject to a centering constraint (if this is unclear, skip it).
Now if you call plot.gam(fit, pages = 1), you will see 4 plots: a smooth s(x) for each level of f.
Note that plot.gam invisibly returns the data used to generate the plots. If you do
oo <- plot.gam(fit, pages = 1)
you will see that oo is a list of 4. For each element, say oo[[1]], $x and $fit give the x-coordinates and y-coordinates of the plot respectively, while $se gives the standard error. $xlab gives the variable name and $ylab gives the smooth function name. These data are sufficient for you to reconstruct the plots drawn by plot.gam.
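A minimal sketch of that reconstruction, assuming the toy data from the example above (the seed and the 2 x 2 panel layout are my additions). I have not checked how $se is scaled relative to the band that plot.gam draws, so compare the result with the original plot before relying on it.

```r
library(mgcv)

# Toy data as in the example above (seed added for reproducibility)
set.seed(1)
dat <- data.frame(y = rnorm(40), x = runif(40), f = gl(4, 10))
fit <- gam(y ~ f + s(x, k = 5, by = f), data = dat)

# plot.gam invisibly returns one list element per smooth
oo <- plot(fit, pages = 1)

# Redraw each by-smooth in its own panel from the returned data
op <- par(mfrow = c(2, 2))
for (sm in oo) {
  fit_vals <- as.numeric(sm$fit)
  band <- as.numeric(sm$se)  # as returned by plot.gam; scaling not verified here
  plot(sm$x, fit_vals, type = "l", xlab = sm$xlab, ylab = sm$ylab,
       ylim = range(fit_vals - band, fit_vals + band))
  lines(sm$x, fit_vals + band, lty = 2)
  lines(sm$x, fit_vals - band, lty = 2)
}
par(op)
```

To get one image file per quartile, as in the question, each loop iteration could be wrapped in png(...) and dev.off() instead of using par(mfrow = ...).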
Can somebody help me convert an 'ashape3d' class object to class 'mesh3d'?
In ashape3d, the triangle and tetrahedron faces are stored in different fields. As I don't think there's a function that can create a mesh3d object from triangles and tetrahedrons simultaneously, I tried the following (pseudocode):
model <- ashape3d(rtorus(1000, 0.5, 2),alpha=0.25)
vert <- model$x[model$vert[,2]==1,]
vert <- cbind(vert,rep(1,nrow(vert)))
tria <- model$triang[model$triang[,4]==1,1:3]
tetr <- model$tetra[model$tetra[,6]==1,1:4]
m3dTria <- tmesh3d(vertices=vert , indices=tria)
m3dTetr <- qmesh3d(vertices=vert , indices=tetr)
m3d <- mergeMeshes(m3dTria,m3dTetr)
plot.ashape3d(model) # works fine
plot3d(m3d) # Error in x$vb[1, x$it] : subscript out of bounds
Does anybody have a better way?
I needed to do this recently and found this unanswered question. The easiest way to figure out what is going on is to look at plot.ashape3d and read the docs for ashape3d. plot.ashape3d only plots triangles.
The rgl package has a generic as.mesh3d function. The code below defines a method for that generic for ashape3d objects.
as.mesh3d.ashape3d <- function(x, ...) {
  if (length(x$alpha) > 1)
    stop("I don't know how to handle ashape3d objects with >1 alpha value")
  iAlpha <- 1
  # from help for ashape3d
  # for each alpha, a value (0, 1, 2 or 3) indicating, respectively, that the
  # triangle is not in the alpha-shape or it is interior, regular or singular
  # (columns 9 to last)
  # Pick the rows for which the triangle is regular or singular
  selrows <- x$triang[, 8 + iAlpha] >= 2
  tr <- x$triang[selrows, c("tr1", "tr2", "tr3")]
  rgl::tmesh3d(
    vertices = t(x$x),
    indices = t(tr),
    homogeneous = FALSE
  )
}
You can try it out on the data above:
library(alphashape3d)
library(rgl)
model <- ashape3d(rtorus(1000, 0.5, 2), alpha = 0.25)
plot(model, edges = FALSE, vertices = FALSE)
model2 <- as.mesh3d(model)
open3d()
shade3d(model2, col = 'red')
I am trying to plot a Venn diagram in an optimized layout (see below) and with the elements themselves as internal labels (not the number of elements in each intersection). I know there are posts for each of these, but none of the solutions allowed me to do both.
I have this:
x <- list()
x$A <- as.character(c("Per_36","Cent","CeM","vDG","LAVL","RSGd"))
x$B <- as.character(c("vCA1","DLE","Per_36","vDG","DIE","Per_35"))
x$C <- as.character(c("vCA1","Cg1","LAVL", "RSGc", "RSGd","Per_35","Per_36"))
x$D <- as.character(c("Por","Cg1","RSGc","LAVL","Per_35","RSGd","Per_36"))
require(VennDiagram)
v0 <- venn.diagram(x, lwd = 3, col = c("red", "green", "orange", "blue"),
fill = c("red", "blue", "green", "orange"), alpha = 0.5, filename = NULL)
grid.draw(v0)
overlaps <- calculate.overlap(x)
overlaps <- rev(overlaps)
for (i in 1:length(overlaps)){
v0[[i+8]]$label <- paste(overlaps[[i]], collapse = "\n")
}
grid.newpage()
grid.draw(v0)
I get the following output:
Regarding the organization of the Venn diagram, I want to do this:
c <- venn(x, simplify = TRUE, small = 0.5, intersections = TRUE)
which I got from the gplots package using the venn function with simplify = TRUE. However, with the venn function I do not seem to be able to replace the counts with the names of the elements. I used intersections = TRUE, which by the description of the argument should work, but it doesn't (although if I look inside the variable c, the info is there).
Logical flag indicating if the returned object should have the attribute
"individuals.in.intersections" featuring for every set a list of individuals
that are assigned to it.
Question: Using the VennDiagram package, is there a way to do exactly the same as the simplify argument does in the venn function from the gplots package?
Question 2: Using the venn function from the gplots package, is there a way to display the names of the elements instead of the element counts, like I did with the venn.diagram function?
Thanks in advance,
Here is my approach, which is by no means a solution but rather a hack.
library(gplots)
# Print a venn and save it to an object
a <- venn(list(letters[1:5], letters[3:8]))
# save the intersections
b <- attr(a, "intersections")
# find the coordinates
s <- seq(0,500,100); abline(h=s); text(s, y=s, x=0)
s <- seq(0,500,50); abline(v=s); text(s, y=0, x=s)
# the hack, destroy the venn to avoid the plotting of the internal numbers
rownames(a) <- letters[1:nrow(a)]
a
plot.venn(a)
# > Error in data[n, 1] : subscript out of bounds
# include the internal labels
text(200,300,paste(b$`01`,collapse = "\n"))
text(200,200,paste(b$`11`,collapse = "\n"))
text(200,100,paste(b$`10`,collapse = "\n"))
It's annoying with multiple Venn diagrams. Otherwise you can save the Venn diagram as an .svg and edit it with Inkscape or similar software, or ask the developer by email.
Edit:
If your plots always look the same, you can check the source code of the venn function (in RStudio by hitting F2), copy and paste the positions for the 4- and 5-circle Venn diagrams, and replace the label calls such as lab("1000", data) with your desired labels.
For 4 circles:
text(35, 250, lab("1000", data))
text(140, 315, lab("0100", data))
text(260, 315, lab("0010", data))
text(365, 250, lab("0001", data))
text(90, 280, lab("1100", data), cex = small)
text(95, 110, lab("1010", data))
text(200, 50, lab("1001", data), cex = small)
text(200, 290, lab("0110", data))
text(300, 110, lab("0101", data))
text(310, 280, lab("0011", data), cex = small)
text(130, 230, lab("1110", data))
text(245, 75, lab("1101", data), cex = small)
text(155, 75, lab("1011", data), cex = small)
text(270, 230, lab("0111", data))
text(200, 150, lab("1111", data))
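The keys such as "1011" in the lab() calls above encode set membership, one digit per set. As a sketch in base R (not taken from gplots, and assuming the digits follow the order of the sets in x), the label text for each region can be built from the question's list like this; the names items, membership, keys, regions and region_labels are my own.

```r
# Sets from the question
x <- list(
  A = c("Per_36", "Cent", "CeM", "vDG", "LAVL", "RSGd"),
  B = c("vCA1", "DLE", "Per_36", "vDG", "DIE", "Per_35"),
  C = c("vCA1", "Cg1", "LAVL", "RSGc", "RSGd", "Per_35", "Per_36"),
  D = c("Por", "Cg1", "RSGc", "LAVL", "Per_35", "RSGd", "Per_36")
)

items <- unique(unlist(x))
# Logical matrix: one row per element, one column per set
membership <- sapply(x, function(s) items %in% s)
# Collapse each row to a key like "1011" (in A, C and D but not in B)
keys <- apply(membership, 1, function(r) paste(as.integer(r), collapse = ""))
# Elements grouped by exclusive region
regions <- split(items, keys)
# One multi-line label per non-empty region, ready to pass to text()
region_labels <- sapply(regions, paste, collapse = "\n")
region_labels[["1011"]]
```

Only non-empty regions appear in regions, so look up a key with key %in% names(regions) before using it in a text() call.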
Edit
Nowadays I would switch to a ggplot solution
ggVennDiagram::ggVennDiagram(x)