I have data that looks similar to the following example, and I'm looking for a way to fit an equation that I can use on other data with a similar profile whose values might be higher or lower.
structure(list(day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184,
185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210,
211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236,
237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249,
250, 251, 252, 253), Count = c(10, 50, 500, 425, 300, 400, 275,
98, 115, 79, 87, 114, 69, 105, 81, 82, 117, 87, 123, 81, 119,
97, 84, 124, 122, 53, 114, 95, 49, 95, 101, 114, 74, 120, 72,
61, 79, 59, 96, 95, 105, 53, 110, 69, 69, 79, 106, 52, 50, 98,
102, 107, 122, 108, 47, 68, 51, 114, 96, 102, 121, 113, 130,
134, 143, 144, 141, 139, 140, 142, 141, 125, 134, 130, 137, 139,
123, 138, 108, 133, 97, 122, 120, 110, 144, 121, 103, 127, 103,
100, 139, 138, 103, 105, 114, 142, 128, 141, 141, 122, 110, 125,
112, 98, 130, 116, 138, 120, 135, 143, 136, 145, 101, 120, 131,
119, 131, 116, 114, 143, 126, 102, 116, 106, 133, 110, 102, 141,
141, 132, 110, 95, 130, 133, 131, 128, 103, 111, 120, 140, 107,
114, 95, 113, 116, 131, 145, 144, 121, 111, 100, 145, 96, 130,
95, 119, 135, 127, 113, 105, 110, 102, 105, 116, 145, 115, 102,
120, 143, 140, 141, 132, 143, 136, 108, 106, 127, 112, 122, 118,
112, 96, 116, 141, 162, 168, 198, 156, 165, 180, 179, 166, 194,
194, 162, 199, 156, 193, 200, 160, 160, 187, 150, 185, 161, 183,
166, 167, 199, 159, 146, 195, 151, 161, 161, 162, 167, 193, 191,
181, 148, 200, 182, 164, 147, 182, 165, 165, 159, 163, 188, 154,
192, 157, 149, 163, 170, 151, 185, 168, 154, 164, 191, 169, 186,
157, 182, 195, 150, 145, 152, 188, 176)), row.names = c(NA, -253L
), class = c("tbl_df", "tbl", "data.frame"))
The red line is an example of what an equation might look like. Very rough drawing.
I think what you may be looking for is a generalized additive model (GAM), which is often used to model nonlinear relationships such as trends over time. I have saved your dput output as data and fit a GAM to it below. First, load the mgcv package, which provides the GAM fit.
#### Load Library ####
library(mgcv)
Then you fit the GAM. This can be a very complex topic, and I advise reading a lot on it, but essentially you fit the regression in much the same way you are probably used to if you have done regression in R before. The only difference is the spline terms you add to the regression, that is, the nonlinear functions that approximate the relationship between x and y. Here I have fit a single cubic regression spline, using the s function for the spline, "day" as the variable, and bs = "cr" for the cubic regression spline basis. I also use REML here, recommended by a lot of GAM experts, to select the smoothing parameter automatically. This can be customized a lot, but for simplicity I leave it alone here.
#### Fit GAM ####
fit <- gam(
Count ~ s(day, bs = "cr"),
method = "REML",
data=data
)
The results can be printed with:
#### Summary ####
summary(fit)
The output is shown below. The intercept is listed as in a typical regression summary. There is also an "Approximate significance of smooth terms" section, which lists some useful metrics for the smoothing term. The edf (effective degrees of freedom) reflects how curvilinear the smooth is, and Ref.df and F are used for the significance test, whose p-value appears at the far right. In this case, the smoothing term is significant. There are also several model metrics at the bottom that are worth a look:
Family: gaussian
Link function: identity
Formula:
Count ~ s(day, bs = "cr")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 133.538 2.504 53.32 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(day) 8.037 8.751 17.93 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.382 Deviance explained = 40.2%
-REML = 1298.7 Scale est. = 1586.9 n = 253
Technically we can write an equation based on this, but the difference with GAMs is that each spline fit sets separate coefficients for each part of the nonlinear trend, so a written-out equation is not entirely useful for nonlinear data (some reasons are given here). For example, if I want all of the coefficients, as I would for a linear equation, I can run coef(fit) and get this long list:
(Intercept) s(day).1 s(day).2 s(day).3 s(day).4 s(day).5
133.537549 -83.413590 -54.926693 -35.398280 -38.849985 -41.564495
s(day).6 s(day).7 s(day).8 s(day).9
-38.991790 9.101440 4.764924 24.764163
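To make the link between these coefficients and an "equation" concrete, here is a minimal sketch on made-up toy data. The names toy, toy_fit and new_days are illustrative and not part of the question. It assumes the same model form as above and shows that a prediction is just the spline basis evaluated at the new days (the "lpmatrix") multiplied by coef(); this is what predict() does for you, so in practice you would call predict() on new data rather than write the equation out by hand.

```r
library(mgcv)

# Toy data standing in for the real series (purely illustrative)
set.seed(1)
toy <- data.frame(day = 1:100)
toy$Count <- 100 + 20 * sin(toy$day / 15) + rnorm(100, sd = 5)

# Same model form as in the answer
toy_fit <- gam(Count ~ s(day, bs = "cr"), method = "REML", data = toy)

# Days at which we want the fitted curve
new_days <- data.frame(day = c(10.5, 50, 99))

# Route 1: let predict() evaluate the fitted curve
pred <- as.numeric(predict(toy_fit, newdata = new_days))

# Route 2: basis functions evaluated at the new days, times the coefficients
X <- predict(toy_fit, newdata = new_days, type = "lpmatrix")
manual <- as.numeric(X %*% coef(toy_fit))

all.equal(pred, manual)
```

Note that a fit like this describes one particular series. For other series that are simply higher or lower, you would refit the same model to each of them rather than reuse these coefficients.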
Plotting the fit with the function below is a much better way to understand the regression:
#### Plot Fit ####
plot(fit)
This shows the fitted spline with its standard error band, along with a rug of lines on the x axis marking the data points. The plot can be customized a lot too, especially with the gratia package, but I leave it as is here. In any case, the interpretation from the plot is far clearer: counts initially drop sharply, then rebound slightly, plateau for some time, then rise again before plateauing again.
Hope that is helpful and I recommend reading a lot on this topic. I have included some links to some really useful primers on the subject below.
Citations
Simpson, 2018: GAMs article. This covers mostly fixed effect versions, which you are probably more likely to use.
Pedersen et al., 2019: GAMMs article. This covers some random effects parts too, which may be difficult to understand unless you know more about mixed models.
This book is also a canonical reference to GAMs that is a lot more comprehensive, but I find it is a difficult read and not the best source for beginners.
I am not sure if I got it right. Are you looking for the code to make this kind of plot?
Install tidyverse (a collection of packages), then add this piece of code:
library(tidyverse)
# 'data' is your data frame (the structure(...) output above); pipe it into ggplot with '%>%':
data %>%
  ggplot(aes(x = day, y = Count)) + geom_line()
My dataset contains 2 variables Y and X. Y was measured every 1.0 seconds.
My Data:
dput(Dataexample)
structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133,
134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172,
173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185,
186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198,
199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211,
212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237,
238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250,
251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,
264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276,
277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289,
290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302,
303, 304, 305, 306), Y = c(71756.2344, 71745.85, 70882.42, 71025.61,
70539.02, 70602.3047, 70811.87, 70514.125, 69998.63, 70531.76,
70424.9141, 70663.51, 70075.375, 69731.0859, 70029.74, 70519.31,
69858.63, 69987.23, 70080.56, 69970.63, 69829.6, 69872.12, 69775.68,
69679.24, 69814.05, 69639.84, 69645.02, 69344.35, 69430.41, 70078.49,
69239.65, 69734.1953, 69736.27, 69549.63, 69506.0859, 69108,
69669.91, 69516.45, 69490.54, 69609.77, 69314.29, 69454.25, 69590.07,
69721.76, 69525.79, 69736.27, 69303.92, 69171.23, 69294.59, 69430.41,
69457.36, 69462.54, 69144.27, 69590.07, 69446.99, 70083.67, 69358.87,
69800.56, 69680.28, 69332.95, 69723.83, 69942.63, 69772.56, 69969.59,
69808.86, 70043.23, 70208.13, 70077.45, 69856.56, 70423.875,
69490.54, 69984.12, 70175.98, 70192.58, 70279.7, 70480.93, 70594,
70792.16, 70234.06, 70165.61, 70249.62, 70564.95, 70403.13, 70444.625,
70426.99, 69907.375, 70327.4141, 70686.3359, 70473.67, 71031.83,
70864.78, 70710.1953, 70691.52, 70703.97, 70826.39, 70708.12,
70595.04, 70946.75, 71319.27, 70977.875, 70475.74, 70612.68,
70680.11, 70527.61, 70461.22, 70877.2344, 70631.35, 70723.68,
70677, 70433.21, 70306.6641, 71246.63, 70375.125, 70416.62, 70150.0547,
70733.0156, 70583.63, 70866.86, 70580.5156, 70433.21, 70377.2,
70114.79, 70347.12, 70613.71, 70576.37, 70599.19, 70407.28, 70581.5547,
70650.02, 71122.11, 70909.4, 70694.63, 71076.45, 70650.02, 71133.52,
70810.83, 71240.41, 70630.31, 71144.94, 71493.63, 71117.95, 71374.28,
71143.9, 70805.64, 71349.375, 71208.2344, 71322.39, 71727.1641,
71060.88, 71546.56, 71569.4, 70984.1, 72032.37, 71573.55, 71787.375,
71469.76, 71398.15, 71683.57, 71709.52, 71637.9, 71556.9453,
71870.4141, 71612.99, 71953.47, 71515.43, 71315.125, 72007.4453,
72021.9844, 71549.68, 72001.22, 71359.75, 71775.95, 72327.23,
71949.31, 71844.47, 71857.96, 72128.9141, 72147.6, 71501.94,
72268.05, 72104, 72217.1641, 72253.51, 72198.48, 72908.78, 72084.27,
72653.29, 72431.06, 72858.92, 72512.0547, 72632.5156, 72700.02,
72335.53, 72713.52, 73065.62, 72818.42, 73004.3359, 72458.06,
73436.48, 73231.82, 73002.26, 73313.89, 73213.125, 72980.4453,
72948.25, 73106.13, 72931.625, 73409.47, 73057.31, 73141.4453,
73218.32, 73216.24, 73273.375, 73701.42, 73486.35, 72574.37,
73229.74, 73576.74, 73195.46, 73697.2656, 73115.48, 73065.62,
73062.5, 73111.32, 73988.23, 73619.3359, 73874.95, 73683.76,
73674.41, 73550.7656, 74166.9844, 73875.99, 74013.17, 74092.16,
73872.875, 74015.25, 73984.07, 73911.33, 73606.87, 74082.8, 73866.64,
74550.53, 74271.95, 73980.95, 74502.71, 74901.92, 74753.25, 74310.4141,
75178.51, 74748.05, 74756.37, 75194.1, 74797.95, 75531.0547,
75549.77, 75293.94, 75378.17, 75457.21, 75676.67, 76087.56, 76141.6641,
76008.5, 76241.55, 76585.96, 76091.73, 76880.4844, 76898.18,
77005.38, 77080.32, 77548.78, 77337.4453, 77000.18, 77448.8359,
76997.0547, 77314.54, 77919.47, 77185.46, 78127.75, 77464.45,
78349.59, 77824.71, 77465.49, 77818.46, 78140.25, 78547.51, 77850.74,
78236.06, 78341.2656, 78104.8359, 78464.17, 77888.23, 78392.3,
78686.0547, 78149.625, 78623.5547, 78672.5156, 78810.03, 78498.55,
78652.72, 78717.31, 78831.91, 78882.96, 78715.23, 78499.5859,
78892.3359, 78372.51)), row.names = c(NA, -306L), class = c("tbl_df",
"tbl", "data.frame"))
I have used ggplot to plot the data and used a loop to calculate the average slope within a moving 60-second-window for the entire duration of the dataset to find the 60 consecutive seconds where the slope is greatest.
Code:
library(readr)
library(ggplot2)
Dataexample<- read_csv("HF-6.csv", skip = 3)
Dataexample<- head(Dataexample, -1)
Dataexample$X <- as.numeric(Dataexample$X)
df <- data.frame(Dataexample)
ggplot(data=df, aes(x=X, y=Y, group=1)) +
geom_line()
slopes <- rep(NA, nrow(Dataexample)-59)
for( i in 1:length(slopes)){
slopes[i] <- lm(Y ~ X, data=Dataexample[i:(i+59), ])$coefficients[2]
}
print(slopes)
which.max(slopes)
max(slopes)
My question is: how can I take the result of my loop, which identifies the 60 consecutive seconds where the slope is highest, and change the color of the line in the plot during those 60 seconds to highlight where the slope is greatest?
This should work:
maxslope_ind <- which.max(slopes)
Dataexample$highlight <- ifelse(Dataexample$X %in% maxslope_ind:(maxslope_ind+59), 1, 0)
library(ggplot2)
ggplot(data=Dataexample, aes(x=X, y=Y, group=1)) +
geom_line(aes(colour=as.factor(highlight)), show.legend=FALSE) +
scale_colour_manual(values=c("black", "red"))
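One caveat: the X %in% ... test above works because X happens to equal the row number in this dataset. A minimal sketch of a variant that flags rows by position instead is given below. It uses a made-up toy series in which X is not the row number (toy, window and start are illustrative names). The resulting highlight column can be mapped to colour exactly as in the ggplot call above.

```r
# Toy series: flat, then a steep linear rise between X = 40 and X = 70, then flat
toy <- data.frame(X = seq(0.5, 100, by = 0.5))   # 200 rows, X is NOT the row number
toy$Y <- 10 * pmin(pmax(toy$X - 40, 0), 30)

window <- 60                                     # number of consecutive rows per window
n_win  <- nrow(toy) - window + 1

# Slope of a linear fit within each moving window
slopes <- vapply(seq_len(n_win), function(i) {
  rows <- i:(i + window - 1)
  coef(lm(Y ~ X, data = toy[rows, ]))[["X"]]
}, numeric(1))

# First row of the steepest window
start <- which.max(slopes)

# Flag rows by position, so it does not matter what values X holds
toy$highlight <- seq_len(nrow(toy)) %in% start:(start + window - 1)

range(toy$X[toy$highlight])
```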
I have a sample data.frame below (subset of a very large cyclic database)
> dput(try)
structure(list(Actuator.Force = c(-402.57388, -400.83463, -402.72595,
-404.24283, -404.07663, -403.83575, -407.55435, -418.7684, -435.86246,
-462.38239, -504.09146, -558.40039, -618.46674, -681.58704, -748.87347,
-814.95032, -880.57739, -946.11627, -1012.9043, -1075.2557, -1141.4972,
-1209.1968, -1272.8707, -1336.021, -1400.5078, -1465.5786, -1528.6499,
-1589.5626, -1654.6541, -1717.825, -1780.0903, -1839.9329, -1902.9841,
-1964.1945, -2025.569, -2085.9578, -2148.239, -2207.5295, -2267.5806,
-2328.6467, -2388.4958, -2447.5298, -2506.7534, -2567.687, -2625.7661,
-2682.866, -2741.3511, -2802.1934, -2858.2546, -2915.1028, -2972.7683,
-3030.8093, -3089.2439, -3145.5701, -3199.8442, -3259.2087, -3315.8582,
-3371.958, -3426.5596, -3484.3855, -3541.2642, -3595.3362, -3650.0208,
-3708.3748, -3763.8076, -3820.0623, -3875.3044, -3932.9504, -3989.6238,
-4047.5957, -4104.8169, -4164.8237, -4223.5444, -4283.3813, -4341.3989,
-4403.166, -4462.1479, -4522.5728, -4584.0186, -4644.7656, -4704.3525,
-4762.6826, -4821.8706, -4878.8818, -4924.1021, -4959.0415, -4985.9517,
-5005.4531, -5017.8027, -5026.0757, -5032.3428, -5036.8042, -5038.9292,
-5039.5361, -5043.021, -5043.0981, -5043.0415, -5042.627, -5014.4199,
-4853.5854, -4566.9771, -4198.7612, -3774.5527, -3317.6958, -2847.5229,
-2364.7585, -1880.9485, -1405.4272, -930.289, -467.04822, -18.867363,
421.17499, 838.86719, 1239.9121, 1626.0669, 1990.6389, 2334.0852,
2655.344, 2962.0227, 3243.7817, 3506.2249, 3744.2622, 3959.8271,
4156.7061, 4324.9048, 4469.229, 4591.6689, 4687.4194, 4764.0801,
4814.6167, 4840.313, 4846.0181, 4826.3135, 4777.6553, 4696.0791,
4583.854, 4442.457, 4272.5254, 4076.7224, 3851.1211, 3603.1853,
3330.7456, 3038.3157, 2724.115, 2386.5476, 2032.5809, 1660.0547,
1268.0084, 859.16675, 432.4075, -14.131592, -479.29309, -955.67108,
-1444.614, -1937.2562, -2437.0085, -2941.8914, -3450.9009, -3959.9597,
-4468.9795, -4981.2549, -5492.6997, -6002.334, -6510.5425, -7016.2432,
-7517.8286, -8013.1348, -8500.4199, -8974.8867, -9439.5479, -9890.5938,
-10326.367, -10744.421, -11147.754, -11534.83, -11902.651, -12248.997,
-12577.919, -12885.458, -13172.309, -13441.554, -13691.502, -13922.634,
-14127.116, -14305.272, -14458.267, -14582.934, -14685.274, -14758.539,
-14806.058, -14830.719, -14836.625, -14822.204, -14773.916, -14700.484,
-14597.968, -14469.834, -14312.099, -14126.422, -13915.136, -13676.505,
-13412.388, -13120.703, -12807.961, -12473.883, -12115.751, -11740.082,
-11342.633, -10929.945, -10502.158, -10062.869, -9611.8271, -9146.6006,
-8673.3545, -8191.7417, -7700.769, -7200.9346, -6695.8809, -6185.2378,
-5670.8711, -5154.9995, -4643.4414, -4135.0015, -3629.2859, -3125.657,
-2626.541, -2134.0662, -1646.4242, -1168.816, -699.63068, -245.34488,
192.7984, 618.76703, 1033.223, 1428.922, 1807.2645, 2165.6274,
2507.6655, 2826.2754, 3120.4724, 3395.2593, 3647.6946, 3879.4983,
4086.3855, 4265.1323, 4421.6831, 4554.3594, 4657.8184, 4736.9561,
4792.6724, 4822.3784, 4830.3091, 4815.9038, 4773.9692, 4706.4736,
4614.8379, 4491.3198, 4337.8892, 4158.002, 3949.3147, 3713.4622,
3453.9114, 3167.8179, 2861.2598, 2536.3259, 2187.3623, 1822.752,
1437.5449, 1034.8208, 617.23962, 183.35637, -270.79733, -738.95618,
-1220.1345, -1710.7787, -2206.1941, -2706.4871, -3210.8625, -3721.0002,
-4233.6387, -4747.7271, -5258.7578, -5771.3071, -6280.7759, -6791.0166,
-7295.0229, -7794.4199, -8287.4189, -8771.6377, -9243.3457, -9702.2559,
-10146.865, -10577.053, -10989.863, -11385.981, -11760.477, -12116.938,
-12456.351, -12772.688, -13071.995), No.Rows = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152,
153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165,
166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178,
179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204,
205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217,
218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243,
244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256,
257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282,
283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295,
296, 297, 298, 299, 300)), row.names = c(NA, 300L), class = "data.frame")
I find the peaks and valleys of the data using:
library(quantmod)
max <- findPeaks(try$Actuator.Force)
min <- findValleys(try$Actuator.Force)
The result is the row numbers of the try data.frame corresponding to the peaks and valleys. What I want is a vector of the Actuator.Force values at the row numbers that the findPeaks and findValleys functions return.
If the min and max values are row indices of the try data frame, you can use them to get a subset of try:
> try[min, ]
Actuator.Force No.Rows
5 -404.0766 5
97 -5043.0415 97
193 -14822.2040 193
> try[max, ]
Actuator.Force No.Rows
3 -402.7260 3
7 -407.5543 7
133 4826.3135 133
253 4815.9038 253
If you want to get only the Actuator.Force values for max and min row index:
> try[min, "Actuator.Force"]
[1] -404.0766 -5043.0415 -14822.2040
> try[max, "Actuator.Force"]
[1] -402.7260 -407.5543 4826.3135 4815.9038
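One thing worth checking in the output above: row 133 holds 4826.3135, while the larger value 4846.0181 sits one row earlier in the data. As far as I understand it, findPeaks and findValleys report the first position after the turning point rather than the turning point itself, so you may need to subtract 1 from the indices to get the actual extreme values. A minimal base R sketch of that behaviour on a made-up vector follows; the sign-change rule is my reading of what quantmod does, not a copy of its code.

```r
# Made-up series with peaks at positions 3 and 7
x <- c(1, 3, 7, 4, 2, 5, 9, 6)

# Positions where the series has just turned downwards: one step AFTER each peak
after_peak <- which(diff(sign(diff(x))) < 0) + 2

# Shift back by one to land on the peaks themselves
true_peak <- after_peak - 1

x[after_peak]   # values one step after each peak
x[true_peak]    # the peak values
```

With the question's objects, that would mean using try[max - 1, "Actuator.Force"] and try[min - 1, "Actuator.Force"], assuming the off-by-one holds for your version of quantmod.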
This is sort of a repost with an update, because the person who replied to the original post was able to solve part of the problem, but we discovered a new issue that needed to be solved and I haven't found other posts addressing this. If this isn't allowed, let me know!
So I have a tibble with four columns and a whole bunch of rows. I'll put the first few rows at the end of the post. The first column is called id, and each row has a unique id. The next column is doy.series and the third column is called smooth.series. Each entry in the doy.series and smooth.series columns is a list. The last column is just called doy, which is an integer.
So what I want is to plot the doy.series against the smooth.series for each row, but plot all of those as lines on the same plot. I'd also like the lines to be colored by the doy. I would like the highest doy values to be red gradually transitioning to the lowest doy values which I want to be blue.
The issue is that the length of the two lists varies slightly row to row (so the doy.series and smooth.series lists for a given row have the same number of elements, but the number of elements varies from row to row). So if I try to do this:
library(tidyverse)
df2 <- df %>%
unnest()
ggplot(df2, aes(x = doy.series, y = smooth.series, color = doy, group = doy)) +
geom_line() +
scale_color_gradient(low = "blue", high = "red")
I get Error: All nested columns must have the same number of elements.
Any ideas of how to solve this?
Sample of data:
df=structure(list(id = c("1", "2", "3"), doy = c(152, 158, 142),
smooth.series = list(c(0.356716711457841, 0.370050893258325,
0.383236999766461, 0.396376974233949, 0.40957275991249, 0.422784291482468,
0.435895856103075, 0.448895925744217, 0.461772972375802,
0.474515467967738, 0.48722268616777, 0.499933470515835, 0.512545647820125,
0.524957044888832, 0.537065488530148, 0.549189274496968,
0.561532939869938, 0.573823673448877, 0.5857886640336, 0.597155100423927,
0.608751798005646, 0.621116663488914, 0.633540522660091,
0.645314201305544, 0.655728525211634, 0.665571086939856,
0.675708836320647, 0.685551635043781, 0.694509344799033,
0.701991827276177, 0.70938842013153, 0.717660871422796, 0.725577658441836,
0.731907258480512, 0.735418148830686, 0.737609381488737,
0.740068708326791, 0.741697656450321, 0.741397752964802,
0.738070524975708, 0.730787113459408, 0.720275348839784,
0.707921792393576, 0.695113005397529, 0.683235549128384,
0.66854065601544, 0.648565682783239, 0.626626377151392, 0.606038486839507,
0.590117759567193, 0.575248354822936, 0.557183338977548,
0.538291820074129, 0.520942906155777, 0.507505705265592,
0.497170423227522, 0.487542218972326, 0.478612630203321,
0.470373194623822, 0.462815449937146, 0.458831683827816,
0.459466542404155, 0.461940101005184, 0.463472434969922,
0.461283619637389, 0.458826926942516, 0.459760482491641,
0.461611642130895, 0.461907761706409, 0.458176197064313,
0.45041527548862, 0.440794234319326, 0.430096794486539, 0.419106676920368,
0.408607602550923, 0.396242242656226, 0.380503346606902,
0.363449752471964, 0.347140298320423, 0.333633822221291,
0.321253095838767, 0.307606088569194, 0.293679435079791,
0.28045977003778, 0.268933728110384, 0.258699817638372, 0.248739536036072,
0.239114001581039, 0.22988433255083, 0.221111647222998, 0.213576575535607,
0.207511807976447, 0.202156553647666, 0.196750021651411,
0.19053142108983, 0.183900129705502, 0.177683584689661, 0.17176308431744,
0.166019926863971, 0.160335410604389, 0.153743267353215,
0.146014563103421, 0.138136597397814, 0.131096669779199,
0.125882079790385, 0.121448622919517, 0.116554575980903,
0.111890960506595, 0.10814879802864, 0.106019110079089, 0.105661169696536,
0.106498694582266, 0.108119373262358, 0.110110894262892,
0.11206094610995, 0.11540233539241, 0.120997725074791, 0.127579588246629,
0.133880397997461, 0.138632627416822, 0.143475963087052,
0.150098990228934, 0.157307529889666, 0.163907403116449,
0.168704430956481, 0.172368187413333, 0.176173416587563,
0.179833694671856, 0.183062597858895, 0.185573702341366,
0.187052218702838, 0.187638597324987, 0.187729274097655,
0.187720684910687, 0.188009265653923, 0.188094043051569,
0.187473296399738, 0.186542340446139, 0.185696489938482,
0.185331059624476, 0.18519846898834, 0.184852749545972, 0.184391634092598,
0.183912855423446, 0.183514146333744, 0.183117312589153,
0.182606699408324, 0.182023848765966, 0.181410302636787,
0.180807602995498, 0.180212334083628, 0.179591966640683,
0.178944372388341, 0.178267423048277, 0.177558990342169,
0.176816945991692, 0.176039161718523, 0.175223509244339,
0.174367860290815), c(0.774610362619149, 0.746412269781788,
0.719913789191898, 0.695420287796062, 0.673237132540861,
0.653273968452586, 0.635200894750251, 0.618963959669522,
0.604509211446066, 0.59178269831555, 0.581143108860635, 0.572741206185169,
0.566211150306591, 0.561187101242345, 0.557303219009872,
0.555232501533965, 0.555534753213423, 0.557674343776691,
0.561115642952215, 0.565323020468442, 0.573372729704498,
0.586895414376992, 0.603187029720595, 0.619543530969977,
0.63326087335981, 0.649100787927623, 0.670935206211495, 0.694725384196925,
0.716432577869408, 0.732018043214443, 0.745048344614319,
0.760240534670835, 0.775281601698759, 0.787858534012854,
0.795658319927888, 0.799986340749408, 0.803308852568436,
0.805054155877942, 0.804650551170897, 0.801526338940272,
0.794278541385548, 0.783000246619136, 0.769363854003397,
0.755041762900692, 0.741706372673384, 0.725475019829043,
0.703589930868272, 0.679410840142977, 0.656297482005064,
0.637609590806439, 0.619522671663228, 0.59768917697572, 0.574684262022063,
0.553083082080407, 0.535460792428901, 0.520724989583801,
0.506208924554847, 0.492126234360422, 0.478690556018906,
0.466115526548682, 0.456163205377972, 0.449307080301381,
0.443827930886861, 0.438006536702364, 0.430123677315843,
0.422161025148951, 0.416341337462143, 0.411307021081574,
0.405700482833399, 0.398164129543771, 0.388509330252805,
0.377833000710694, 0.366705867808713, 0.355698658438137,
0.34538209949024, 0.334476225402477, 0.322019919293152, 0.309062181074817,
0.296652010660021, 0.285838407961315, 0.276447235011443,
0.267512949813937, 0.258897446237528, 0.250462618150946,
0.242070359422922, 0.234129641095434, 0.226902750204945,
0.220031554611916, 0.21315792217681, 0.205923720760086, 0.19852372751621,
0.191343478746417, 0.184310981031329, 0.177354240951568,
0.170401265087756, 0.163192040497278, 0.155702671847874,
0.148215342135701, 0.141012234356916, 0.134375531507676,
0.127758694427545, 0.120727232121655, 0.113731399834516,
0.107221452810637, 0.101647646294526, 0.0965005694542742,
0.091209569652388, 0.0861330394250022, 0.0816293713082512,
0.0780569578382695, 0.075371315752509, 0.0732346851768886,
0.0715655437302434, 0.0702823690314086, 0.0693036386992192,
0.0691863250953481, 0.0701883842483067, 0.0717797692771877,
0.073430433301084, 0.0746103294390886, 0.0761784204310707,
0.0788956524181342, 0.0820849536212236, 0.0850692522612836,
0.0871714765592586, 0.0890972731947091, 0.0916871981841642,
0.094466543754022, 0.0969606021306803, 0.0986946655405372,
0.0998521594454551, 0.100914181761071, 0.101852533394445,
0.102639015252636, 0.103245428242704, 0.103462264899288,
0.103250578499722, 0.102838011065113, 0.102452204616565,
0.102320801175184, 0.102287042959656, 0.102088178523127,
0.101792859388387, 0.10146973707823, 0.101187463115447, 0.100889974984667,
0.100495657145597, 0.100034831583652, 0.0995378202842451,
0.0990349452327908, 0.0985243275768939, 0.097982837008049,
0.0974089183275044, 0.0968010163365085, 0.0961575758363094,
0.0954770416281557, 0.0947578585132956, 0.0939984712929773,
0.0931973247684494), c(0.754994105046569, 0.759262980856892,
0.763248462599852, 0.767062652758686, 0.77081765381663, 0.774472838830307,
0.777902459086919, 0.781090934415562, 0.784022684645335,
0.786682129605334, 0.789179136777192, 0.791558707699299,
0.793707963285894, 0.795514024451214, 0.796864012109498,
0.798059444416928, 0.799307737538445, 0.800354859401417,
0.800946777933214, 0.800829461061205, 0.800535183022801,
0.800489794522664, 0.800279628189041, 0.799491016650182,
0.797710292534332, 0.793727807721641, 0.787531875152631,
0.780505769046113, 0.774032763620899, 0.7694961330958, 0.766814565429895,
0.764644322900798, 0.76247504120512, 0.759796356039472, 0.756097903100465,
0.753105409945246, 0.751812834849642, 0.750612159588286,
0.747895365935809, 0.742054435666847, 0.734349211270437,
0.726724708968583, 0.718600671135361, 0.709396840144846,
0.698532958371117, 0.685958066059214, 0.672217339551, 0.657624725365782,
0.642494170022865, 0.627139620041553, 0.608952937377657,
0.586693889874402, 0.562728282882226, 0.539421921751564,
0.519140611832852, 0.498827185487661, 0.475313450129851,
0.450798673688293, 0.427482124091856, 0.40756306926941, 0.391594783774196,
0.377888811456193, 0.365281824189383, 0.352610493847746,
0.338711492305265, 0.324742577633052, 0.312133619351959,
0.300129456510569, 0.287974928157469, 0.274914873341241,
0.260271155053797, 0.244483391182105, 0.228473053117332,
0.21316161225065, 0.199470539973226, 0.18615371420193, 0.171962954821233,
0.157816646173702, 0.144633172601908, 0.13333091844842, 0.12330701757161,
0.113524600030087, 0.104251044466771, 0.0957537295245833,
0.0883000338464436, 0.0824702507214148, 0.0782344478172879,
0.0749446653450757, 0.0719529435157916, 0.0686113225404485,
0.0661995394292459, 0.0657367472597934, 0.0661841103442199,
0.0665027929946539, 0.0656539595232242, 0.0644681965998428,
0.0641406846621584, 0.0641794786739045, 0.0640926335988149,
0.0633882044006231, 0.0619833956742033, 0.0602788637336122,
0.058507841121357, 0.0569035603799449, 0.0556992540518834,
0.0546528395507013, 0.0534742168963767, 0.0523245925410911,
0.0513651729370255, 0.0507571645363614, 0.0504376622860656,
0.050241048436331, 0.0501744603941783, 0.0502450355666279,
0.0504599113607003, 0.0510283221178971, 0.052012828131863,
0.0532005557378333, 0.0543786312710432, 0.055334181066728,
0.0564245098836434, 0.057962991953209, 0.0596708578196554,
0.0612693380272137, 0.0624796631201147, 0.0635479377810315,
0.0648035693895804, 0.0660927425354069, 0.0672616418081562,
0.0681564517974736, 0.0687850088878122, 0.0692835297177245,
0.0696907692580061, 0.0700454824794523, 0.0703864243528584,
0.0706277104270338, 0.0707098566767007, 0.0707112187115236,
0.0707101521411668, 0.0707850125752948, 0.0708833573425856,
0.0709155836751044, 0.0709043070375961, 0.0708721428948058,
0.0708417067114787, 0.070795102344694, 0.0707055967085618,
0.0705825498533734, 0.0704353218294197, 0.0702732726869918,
0.0700959115439309, 0.0698961835839182, 0.069673602956005,
0.0694276838092427, 0.0691579402926828, 0.0688638865553766,
0.0685450367463756, 0.0682009050147312, 0.0678310055094947
)), doy.series = list(c(55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166,
167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178,
179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202,
203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213), c(55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136,
137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172,
173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184,
185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196,
197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
209, 210, 211, 212, 213), c(55, 56, 57, 58, 59, 60, 61, 62,
63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141,
142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153,
154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165,
166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177,
178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201,
202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213
)), year = c("2000", "2000", "2000"), geometry = structure(list(
structure(c(-164.047259999849, -164.044659999559, -164.044719999628,
-164.038089999654, -164.028189999968, -164.019179999957,
-164.005899999985, -164.004819999643, -164.006060000169,
-164.01439999986, -164.020739999951, 62.9589599997043,
62.9570600002189, 62.9551799998571, 62.9500200002229,
62.9453699998257, 62.9321099998767, 62.9228599995894,
62.9198900002234, 62.9182900001834, 62.9161899995689,
62.9119300000695), .Dim = c(11L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(-163.950299999945,
-163.929679999632, -163.91427000036, -163.903839999616,
-163.892950000142, -163.874760000374, -163.857260000049,
-163.83827000026, -163.831219999803, -163.826049999708,
-163.831939999731, -163.830590000428, -163.822, -163.815322687912,
62.7265500001824, 62.7286899999436, 62.7327399996513,
62.7292899997337, 62.7222099996918, 62.7222000001299,
62.7196300003243, 62.7251300003493, 62.7253400001409,
62.7224699999905, 62.7144400002059, 62.7114699999406,
62.7062799998222, 62.7052201090963), .Dim = c(14L, 2L
), class = c("XY", "LINESTRING", "sfg")), structure(c(-163.815322687912,
-163.798689999744, -163.782269999761, -163.768690000343,
-163.762120000438, -163.757980000177, -163.754040000146,
-163.750479999652, -163.741150000172, -163.731440000256,
-163.727959999854, -163.716170000245, -163.707080000142,
-163.69419999973, -163.687290000333, -163.670841577631,
62.7052201090963, 62.7025800000671, 62.7027099997667,
62.7047399998511, 62.7076500000475, 62.7154200004327,
62.7186199996133, 62.7195300002094, 62.718240000076,
62.7119499995929, 62.7111100004263, 62.7123500000526,
62.71900000005, 62.7184800003518, 62.7155900001783, 62.7051667771758
), .Dim = c(16L, 2L), class = c("XY", "LINESTRING", "sfg"
))), class = c("sfc_LINESTRING", "sfc"), precision = 0, bbox = structure(c(-164.047259999849,
62.7025800000671, -163.670841577631, 62.9589599997043), .Names = c("xmin",
"ymin", "xmax", "ymax"), class = "bbox"), crs = structure(list(
epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), .Names = c("epsg",
"proj4string"), class = "crs"), n_empty = 0L)), .Names = c("id",
"doy", "smooth.series", "doy.series", "year", "geometry"), row.names = c(NA,
3L), class = c("sf", "data.frame"), sf_column = "geometry", agr = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Names = c("id",
"doy", "smooth.series", "doy.series", "year"), .Label = c("constant",
"aggregate", "identity"), class = "factor"))
With the small example you gave, I only added select(-geometry):
library(tidyverse)
df3 <- df %>%
select(-geometry) %>%
unnest(cols = c(smooth.series, doy.series)) # name the list columns explicitly (required by tidyr >= 1.0.0)
df3 %>%
ggplot(aes(x = doy.series, y = smooth.series, color = doy, group = doy)) +
geom_line() +
scale_color_gradient(low = "blue", high = "red")
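To see what the unnest step does on its own, here is a minimal sketch with made-up toy data (the tibble `toy` and its values are hypothetical, not taken from the question). It assumes tidyr 1.0.0 or later and that the two list columns have matching lengths within each row.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one row per series, with two list columns
# whose elements have the same length within each row
toy <- tibble(
  id = c("a", "b"),
  doy.series = list(1:3, 1:2),
  smooth.series = list(c(0.1, 0.2, 0.3), c(0.5, 0.6))
)

# Expand both list columns in parallel: one output row per list element,
# with the id value repeated for each element
long <- unnest(toy, cols = c(doy.series, smooth.series))
long
```

The resulting long table has one row per (id, day) pair, which is the shape ggplot needs for geom_line() with a group aesthetic.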
I have tried for several days to flip a dendrogram so that the last gene appears first in the figure and the first gene appears last. Even when I manage to move leaves around, the internal ordering is not preserved. Here is my script:
cluster.hosts <- read.table("Norm_0_to1_heatmap.txt", header = TRUE, sep="", quote="/", row.names = 1)
# A table with 8 columns and 229 rows corresponding to gene expression
hosts.dist <- dist(cluster.hosts, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
hc <- hclust(hosts.dist, method = "average")
dd <- as.dendrogram(hc)
order.dendrogram(dd)
X11()
par(cex=0.5,font=3)
plot(dd, main="Dendrogram of Syn9 genes")
order.dd <- order.dendrogram(dd) # the numbers in the order give the position of each gene in the original table
# Then I build a vector with the reverse of the order obtained above
y <- c(206, 204, 210, 209, 213, 212, 211, 207, 208, 94, 199, 192, 195, 198, 193, 201, 203, 200, 185, 61, 191, 190, 197, 189, 188, 196, 187, 215, 214, 202, 217, 220, 219, 218, 95, 180, 179, 181, 182, 186, 178, 132, 133, 122, 66, 65, 64, 58, 91, 88, 92, 89, 62, 184, 103, 128, 127, 229, 231, 230, 148, 63, 228, 116, 134, 104, 221, 78, 20, 232, 160, 159, 225, 112, 167, 164, 166, 140, 222, 51, 149, 227, 79, 68, 90, 131, 130, 136, 135, 105, 147, 172, 150, 176, 175, 174, 177, 152, 151, 165, 137, 168, 163, 52, 146, 141, 145, 82, 81, 56, 161, 120, 144, 129, 84, 1, 173, 143, 142, 86, 85, 83, 194, 183, 111, 55, 53, 54, 224, 171, 170, 223, 169, 93, 59, 60, 123, 121, 124, 87, 125, 226, 3, 158, 47, 10, 162, 138, 139, 154, 153, 119, 118, 117, 106, 80, 45, 70, 69, 126, 205, 77, 67, 19, 102, 46, 13, 108, 107, 109, 72, 71, 73, 23, 22, 25, 57, 48, 216, 155, 29, 24, 101, 35, 113, 115, 36, 37, 114, 110, 2, 14, 6, 16, 15, 17, 18, 74, 31, 30, 76, 12, 75, 8, 11, 5, 7, 99, 98, 100, 39, 38, 33, 32, 97, 96, 49, 44, 34, 50, 156, 26, 157, 42, 41, 43, 4, 28, 27, 9, 40, 21)
rx <- reorder(dd, y, agglo.FUN=mean)
order.rx <- order.dendrogram(rx)
write(order.rx, file="order_hosts_rx.txt", sep="\t")
write(labels(rx), file="labels_order_hosts_rx.txt", sep="\t")
X11()
par(cex=0.5)
plot(rx, main="Dendrogram of Syn9 genes")
I guess it has something to do with the heights of the leaves, but I just want to flip the dendrogram...
Thanks in advance!
Miguel
You can use rev(dd); rev.dendrogram simply returns the dendrogram with reversed nodes:
hc <- hclust(dist(USArrests), "ave")
dd <- as.dendrogram(hc)
plot(dd)
plot(rev(dd))
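As a quick check, the sketch below uses the same built-in USArrests data to compare the leaf order before and after rev(). The names `flipped`, `leaf_order`, and `flipped_order` are introduced here only for illustration.

```r
hc <- hclust(dist(USArrests), "ave")
dd <- as.dendrogram(hc)
flipped <- rev(dd)

# Leaf order (row indices into the original table) before and after flipping
leaf_order <- order.dendrogram(dd)
flipped_order <- order.dendrogram(flipped)

# The flipped tree should list the same leaves in exactly the reverse order,
# because rev() reverses the branches at every node, not only at the root
identical(flipped_order, rev(leaf_order))
identical(labels(flipped), rev(labels(dd)))
```

Both comparisons should return TRUE. By contrast, reorder() treats its second argument as per-observation weights and only permutes branches within the constraints of the tree, so passing a reversed order vector there does not, in general, give a mirror image.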