ggplot2 and CSV "inventing" data that isn't in my input - r

I'm attempting to produce an attractive graph of bandwidth data across a number of machines and tests. My attempts seem to work for small manually entered amounts of data, but when I feed the "full" 1773 entries, I get results in my graph that don't seem to exist in the input data.
I believe this is likely because the different tests are each of different duration, but I can't seem to prove this. If I use the following input data as csv (sorry, off-site because of size) I end up with a strange upwards-curve on my geom_smooth line, and additional data points that I can't actually see in my .csv input data. (I have much more data in real life, this is a subset that produces the strange behaviour)
I would expect the first four tries (try01-try04) to flat-line at zero, and try05 to carry on at around 1GBit/sec. Here's my code
library("ggplot2")
library("RColorBrewer")
speed = read.csv(file="data.csv")
svg("all_results.svg",width=24)
ggplot(speed,
aes(x = Second, y = Bandwidth, group=Test, colour=Test)) +
scale_fill_brewer(palette="Paired") +
geom_point() +
geom_smooth()
dev.off()
Here's the image produced
@Gregor seems to be exactly right: the seconds are being interpreted as text, when they should represent the number of seconds since the start of that test.
Here's some example input data - please note the times are not always on a .00 second boundary due to the output of iperf.
structure(list(Machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "valhalla", class = "factor"),
User = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "alice", class = "factor"),
Test = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "try01", class = "factor"),
Second = structure(c(1L, 2L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("0.00-1.00",
"1.00-2.00", "10.00-11.00", "11.00-12.00", "12.00-13.00",
"13.00-14.00", "14.00-15.00", "15.00-16.00", "16.00-17.00",
"17.00-18.00", "18.00-19.00", "19.00-20.00", "2.00-3.00",
"3.00-4.00", "4.00-5.00", "5.00-6.00", "6.00-7.00", "7.00-8.00",
"8.00-9.00", "9.00-10.00"), class = "factor"), Bandwidth = c(937,
943, 944, 943, 943, 943, 943, 944, 658, 943, 944, 943, 944,
644, 943, 943, 943, 944, 943, 943)), row.names = c(NA, 20L
), class = "data.frame")
I'll try casting (or whatever R calls it) those to a float now.

Points have a single x value, not a range of x-values, so we'll separate your Second column into the beginning and end of each interval and plot the points at the beginning. Calling your data dd:
library(tidyr)
library(dplyr)
dd = dd %>%
separate(Second, into = c("sec_start", "sec_end"), sep = "-", remove = FALSE) %>%
mutate(sec_start = as.numeric(sec_start),
sec_end = as.numeric(sec_end))
After that the plotting should go just fine if you put sec_start or sec_end on the x-axis. (Or calculate the middle, whatever you want...)
If you want to visualize the durations, you could use geom_segment and aes(x = sec_start, xend = sec_end, y = Bandwidth, yend = Bandwidth), but since everything is just about the same duration, it doesn't seem like this would add much value.
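To see why the original plot looked wrong, here is a minimal base-R sketch (the four labels are a hand-picked subset of the Second values, not the full data). It shows that factor levels are ordered as text, and it splits the labels without tidyr, as an alternative to the separate() call above:

```r
# Interval labels as they appear in the Second column (hand-picked subset)
second <- factor(c("0.00-1.00", "1.00-2.00", "2.00-3.00", "10.00-11.00"))

# Levels are sorted as text, so "10.00-11.00" lands before "2.00-3.00";
# this is the order ggplot2 uses for a discrete x-axis
levels(second)

# Base-R alternative to tidyr::separate(): split each label on "-"
parts <- strsplit(as.character(second), "-", fixed = TRUE)
sec_start <- as.numeric(vapply(parts, `[`, character(1), 1))
sec_end   <- as.numeric(vapply(parts, `[`, character(1), 2))

sec_start
sec_end
```

Once sec_start is numeric, the x-axis is continuous and geom_smooth fits against real time rather than against level positions.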

Related

how to use capscale {vegan} r

I have used
?capscale
and scoured the internet for answers, but am still not understanding what should be the dependent variable for my dataset if I want to use
capscale()
to analyze my data.
I have NDVI data with some continuous and some categorical variables:
ndvi=structure(list(siteOID = 25840:25939, Elevation = c(1871.92,
1875.38, 1878.28, 1878.54, 1878.33, 1879.2, 1880.51, 1883.78,
1884.6, 1884.85, 1885.46, 1888.72, 1890.94, 1897.19, 1901.95,
1902.47, 1902.81, 1903.49, 1906.62, 1908.73, 1909.4, 1910.65,
1913.44, 1915, 1915.81, 1918.06, 1920.01, 1921.53, 1925.48, 1926.66,
1927.64, 1931.02, 1932.8, 1935.27, 1938.33, 1941.19, 1945.71,
1948.68, 1951.52, 1951.83, 1955.76, 1961.02, 1963.92, 1963.25,
1969.53, 1972.56, 1977.92, 1978.93, 1981.54, 1985.64, 1987.6,
1987.62, 1988.78, 1991.92, 1997.03, 1998.06, 1998.98, 2001.26,
2006.97, 2009.56, 2009.81, 2011.55, 2017.92, 2021.75, 2023.42,
2024.91, 2028.15, 2032.83, 2032.83, 2033.5, 2035.75, 2037.44,
2045.51, 2047.38, 2049.85, 2052.33, 2059.36, 2069.27, 2071.41,
2071.83, 2074.15, 2081.55, 2083.52, 2086.3, 2090.5, 2095.57,
2096.69, 2100.65, 2108.06, 2110.48, 2113.45, 2121.78, 2124.82,
2133.54, 2137.54, 2146.43, 2150.53, 2156.63, 2160.05, 2168.57
), Shape_Area = c(2940.395887, 5105.447128, 3763.362181, 2801.775054,
3854.690283, 4627.01632, 6863.6264, 5452.724569, 3504.284818,
3967.710707, 7004.963815, 3926.00215, 7645.532158, 6306.085153,
3451.101972, 4699.688114, 3880.378241, 4792.898829, 5542.142348,
3674.957345, 3562.897792, 3219.790167, 5369.915585, 3854.684578,
3737.522732, 5190.103216, 5137.457907, 4753.975071, 3605.727759,
4682.430962, 3412.007599, 4955.96479, 0, 5106.057222, 3026.454348,
6814.973732, 5422.439336, 4523.077568, 3092.711952, 2667.1801,
2318.487235, 1623.008863, 2672.648264, 2524.245809, 2164.660806,
3153.921959, 3170.875701, 3755.980623, 4505.277, 3954.724973,
3592.717424, 2877.927426, 3465.37684, 2317.185185, 3249.657309,
2710.26402, 3421.803771, 2556.020604, 3849.407062, 3782.797907,
1950.365079, 3522.797668, 2340.599897, 2451.029503, 3034.109721,
2873.167998, 2278.337947, 2546.02206, 3545.854694, 3514.69201,
2731.819076, 2537.618027, 2116.84627, 2213.553587, 4430.489625,
2648.387315, 4408.844477, 3453.225099, 2457.844425, 3597.718985,
3933.191433, 3280.424579, 2309.053402, 4062.750209, 2755.087578,
3785.974581, 3485.221528, 4698.642524, 3647.400111, 4512.594002,
4509.418612, 3908.621289, 5856.573472, 4084.254238, 4772.464487,
4587.251362, 3275.527576, 3236.108516, 4771.636048, 5241.064376
),slopemean = c(7.012740221, 6.374673005, 6.713881453, 6.219425964,
5.393005565, 5.567550724, 4.557037692, 5.122994391, 5.608577084,
5.054081163, 3.020378928, 3.535192937, 2.910682318, 2.262314184,
1.872473637, 2.04489899, 1.358906129, 1.738190173, 2.190473907,
2.285263883, 1.932403531, 1.318049102, 2.323188104, 2.838744229,
2.5508166, 3.662199524, 2.645026659, 2.691092801, 2.209619006,
2.360828268, 2.83633309, 2.917255029, 3.814524024, 3.594417877,
2.537033654, 2.758014447, 6.487904879, 6.546860137, 6.611400228,
6.548973659, 7.320545057, 7.167488849, 7.486095047, 6.736548642,
6.978404939, 6.209158245, 5.780635711, 5.952286865, 6.21757545,
6.026404989, 8.286706911, 5.013909823, 4.302618208, 5.958519395,
4.735497169, 6.86024694, 5.923437148, 4.814125561, 6.278868822,
6.369820399, 4.211901608, 5.067338774, 7.276210246, 9.342363631,
7.382804547, 7.026542905, 7.386944243, 6.993269548, 4.999933584,
5.386859906, 5.74222567, 6.407413812, 6.220262604, 6.361011563,
7.89187751, 7.504486516, 8.071826326, 7.282463079, 5.730589071,
6.75588336, 5.865557512, 5.567460529, 5.743696501, 6.234486916,
6.672290961, 4.424730467, 3.993647329, 5.934593258, 7.937450668,
8.264807165, 7.39251924, 7.862093222, 6.829388913, 7.447980573,
6.477102849, 6.185640762, 7.760704698, 8.44009344, 8.557933442,
7.60872553), avwidth = c(41.38533, 44.11806, 43.54585, 38.07962,
40.80878, 49.52246, 49.97194, 51.36124, 50.45419, 51.12577, 52.46919,
49.68379, 43.48322, 51.95128, 46.91944, 58.70265, 55.41018, 50.92463,
55.55058, 45.73485, 50.29035, 49.08618, 52.57013, 51.48199, 52.90921,
44.27491, 55.71036, 50.08104, 47.3439, 49.8397, 51.81409, 50.43767,
60.95491, 38.50229, 47.8118, 52.66532, 44.10194, 46.67934, 46.46481,
37.64217, 21.84973, 25.04575, 33.79403, 29.61029, 29.71018, 21.3549,
28.02716, 38.25882, 45.25996, 40.10562, 46.15768, 40.82856, 42.1975,
31.75748, 32.83316, 34.33412, 33.54285, 39.29999, 33.25312, 33.65804,
30.00087, 32.63515, 31.11767, 31.14068, 27.83876, 30.20586, 34.80735,
32.65111, 38.31069, 43.65983, 35.21719, 32.87317, 28.83573, 33.8517,
29.72621, 32.61762, 31.11199, 23.89315, 31.26606, 33.78306, 34.89358,
38.64512, 34.68206, 34.2003, 44.12035, 35.59922, 48.34063, 47.52268,
47.02729, 51.07513, 51.5254, 43.25953, 47.01821, 38.28714, 35.90366,
40.30569, 48.04857, 54.46596, 49.70541, 49.18992), watershed = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("marys", "reese", "wwalker"), class = "factor")), row.names = c(NA,
100L), class = "data.frame")
So, siteOID is each individual site and the categorical variable is watershed.
Now, I am trying to evaluate potential influences of NDVI in multivariate space, by conducting a Canonical Analysis of Principal Coordinates on a Gower similarity matrix of scaled habitat variables using the “capscale” function from the vegan package in R (Oksanen et al. 2013).
ndvi.cap <- capscale(siteOID~ Elevation + Shape_Area + avwidth + slopemean + watershed, ndvi,
dist="bray")
I don't understand how capscale formula is meant to be set up due to the lack of examples with actual explanations of what should be the dependent variable. The example in
?capscale
## Basic Analysis
vare.cap <- capscale(varespec ~ N + P + K + Condition(Al), varechem,
dist="bray")
vare.cap
plot(vare.cap)
anova(vare.cap)
uses two (!) different datasets, which does not make sense to me. Should I be putting the actual NDVI values as the dependent variable and not the site? I do have many NDVI-related variables associated with each site (actual NDVI value, differences between seasons, Sen's slope for trend), but I am not sure if a continuous variable should be listed as the dependent variable or not.
My question is similar to: Alternative example for capscale function in vegan package
but the answer given there did not help me.

Converting factor to integer [duplicate]

This question already has answers here:
R error "sum not meaningful for factors"
(1 answer)
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 3 years ago.
Converting factor to integer from a .csv using RStudio.
Hi, I know this question has been asked frequently but I've been trying to wrap my head around things for an hour with no success.
In my .csv file 'Weighted.average' is a calculation of Weighted.count/count (before conversion), but when I use the file in R it is a factor, despite being completely numeric (with decimal points).
I'm aiming to aggregate the data using Weighted.average's numeric values. But as it is still considered a factor it doesn't work. I'm newish to R so I'm having trouble converting other examples to my own.
Thanks
RENA <- read.csv('RENA.csv')
RENAVG <- aggregate(Weighted.average~Diet+DGRP.Line, data = RENA, FUN = sum)
ggplot(RENAVG, aes(x=DGRP.Line, y=Weighted.average, colour=Diet)) +
geom_point()
Expected to form a dot plot using Weighted.average, error
Error in Summary.factor(c(3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘sum’ not meaningful for factors
occurs. I know it's due to it not being read as an integer, but I'm lost at how to convert.
Thanks
Output from dput
> dput(head(RENA))
structure(list(DGRP.Line = structure(c(19L, 19L, 19L, 19L, 20L,
20L), .Label = c("105a", "105b", "348", "354", "362a", "362b",
"391a", "391b", "392", "397", "405", "486a", "486b", "712", "721",
"737", "757a", "757b", "853", "879"), class = "factor"), Diet = structure(c(1L,
1L, 2L, 2L, 1L, 1L), .Label = c("Control", "Rena"), class = "factor"),
Sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Female",
"Male"), class = "factor"), Count = c(0L, 0L, 0L, 0L, 1L,
0L), Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("16/07/2019",
"17/07/2019", "18/07/2019", "19/07/2019", "20/07/2019", "21/07/2019",
"22/07/2019"), class = "factor"), Day = c(1L, 1L, 1L, 1L,
1L, 1L), Weighted.count = c(0L, 0L, 0L, 0L, 1L, 0L), Weighted.average = structure(c(60L,
59L, 52L, 63L, 44L, 36L), .Label = c("", "#DIV/0!", "1.8",
"1.818181818", "2", "2.275862069", "2.282608696", "2.478873239",
"2.635135135", "2.705882353", "2.824561404", "2.903614458",
"2.911392405", "2.917525773", "3", "3.034090909", "3.038461538",
"3.083333333", "3.119402985", "3.125", "3.154929577", "3.175438596",
"3.1875", "3.220338983", "3.254237288", "3.263157895", "3.314606742",
"3.341463415", "3.35", "3.435483871", "3.5", "3.6", "3.606557377",
"3.666666667", "3.6875", "3.694214876", "3.797619048", "3.813953488",
"3.833333333", "3.875", "3.909090909", "3.916666667", "4.045454545",
"4.047169811", "4.111111111", "4.333333333", "4.40625", "4.444444444",
"4.529411765", "4.617021277", "4.620689655", "4.666666667",
"4.714285714", "4.732283465", "4.821428571", "4.823529412",
"4.846153846", "4.851851852", "4.855263158", "4.884615385",
"4.956521739", "5", "5.115384615", "5.230769231", "5.343283582",
"5.45", "5.464285714", "5.484848485", "5.538461538", "5.551724138",
"5.970588235", "6", "6.2"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
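Weighted.average is read as a factor because the column contains non-numeric entries: the dput output shows the levels "" and "#DIV/0!". Modify your first line (the read.csv call) to specify how each variable should be read during the import, for example by declaring those entries as missing values with the na.strings argument.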
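A minimal sketch of that import fix, assuming the only non-numeric entries are the empty string and "#DIV/0!" seen in the dput levels. The inline CSV text is a made-up stand-in for RENA.csv with just three of its columns:

```r
# Hypothetical stand-in for RENA.csv, including the two problem entries
csv_text <- "DGRP.Line,Diet,Weighted.average
853,Control,4.884615385
853,Rena,#DIV/0!
879,Control,
879,Rena,3.6875"

# Treat the spreadsheet error marker and blanks as NA so the column parses as numeric
RENA <- read.csv(text = csv_text, na.strings = c("", "#DIV/0!"))
str(RENA$Weighted.average)

# The formula interface of aggregate() drops rows with NA by default
RENAVG <- aggregate(Weighted.average ~ Diet + DGRP.Line, data = RENA, FUN = sum)
RENAVG

# If the data is already loaded as a factor, convert via character,
# otherwise you get the internal level codes instead of the values
f <- factor(c("3.5", "#DIV/0!", "6.2"))
suppressWarnings(as.numeric(as.character(f)))
```

With a real file, the same na.strings argument goes into read.csv('RENA.csv', ...).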

Conditional updating coordinate column in dataframe

I am attempting to populate two new, empty columns in a data frame with data from other columns in the same data frame, in different ways depending on whether those columns are populated.
I am trying to populate HIGH_PRCN_LAT and HIGH_PRCN_LON (previously called F_Lat and F_Lon), which represent the final latitudes and longitudes for those rows, based on the values of the other columns in the table.
Case 1: Lat/Lon2 are populated (like in IDs 1 & 2), using the great
circle algorithm a midpoint between them should be calculated and
then placed into F_Lat & F_Lon.
Case 2: Lat/Lon2 are empty, then the values of Lat/Lon1 should be put
into F_Lat and F_Lon (like with IDs 3 & 4).
My code is as follows but doesn't work (see previous versions, removed in an edit).
The preparatory code I am using is as follows:
incidents <- structure(list(id = 1:9, StartDate = structure(c(1L, 3L, 2L,
2L, 2L, 3L, 1L, 3L, 1L), .Label = c("02/02/2000 00:34", "02/09/2000 22:13",
"20/01/2000 14:11"), class = "factor"), EndDate = structure(1:9, .Label = c("02/04/2006 20:46",
"02/04/2006 22:38", "02/04/2006 23:21", "02/04/2006 23:59", "03/04/2006 20:12",
"03/04/2006 23:56", "04/04/2006 00:31", "07/04/2006 06:19", "07/04/2006 07:45"
), class = "factor"), Yr.Period = structure(c(1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), .Label = c("2000 / 1", "2000 / 2", "2000 /3"
), class = "factor"), Description = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "ENGLISH TEXT", class = "factor"),
Location = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L
), .Label = c("Location 1", "Location 1 : Location 2"), class = "factor"),
Location.1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = "Location 1", class = "factor"), Postcode.1 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Postcode 1", class = "factor"),
Location.2 = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
1L), .Label = c("", "Location 2"), class = "factor"), Postcode.2 = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("", "Postcode 2"
), class = "factor"), Section = structure(c(2L, 2L, 3L, 1L,
4L, 4L, 2L, 1L, 4L), .Label = c("East", "North", "South",
"West"), class = "factor"), Weather.Category = structure(c(1L,
2L, 4L, 2L, 2L, 2L, 4L, 1L, 3L), .Label = c("Animals", "Food",
"Humans", "Weather"), class = "factor"), Minutes = c(13L,
55L, 5L, 5L, 5L, 522L, 1L, 11L, 22L), Cost = c(150L, 150L,
150L, 20L, 23L, 32L, 21L, 11L, 23L), Location.1.Lat = c(53.0506727,
53.8721035, 51.0233529, 53.8721035, 53.6988355, 53.4768766,
52.6874562, 51.6638245, 51.4301359), Location.1.Lon = c(-2.9991256,
-2.4004125, -3.0988341, -2.4004125, -1.3031529, -2.2298073,
-1.8023421, -0.3964916, 0.0213837), Location.2.Lat = c(52.7116187,
53.746791, NA, 53.746791, 53.6787167, 53.4527824, 52.5264907,
NA, NA), Location.2.Lon = c(-2.7493169, -2.4777984, NA, -2.4777984,
-1.489026, -2.1247029, -1.4645023, NA, NA)), class = "data.frame", row.names = c(NA, -9L))
#gpsColumns is used as the following line of code is used for several data frames.
gpsColumns <- c("HIGH_PRCN_LAT", "HIGH_PRCN_LON")
incidents [ , gpsColumns] <- NA
#create separate variable(?) containing a list of which rows are complete
ind <- complete.cases(incidents [,17])
#populate rows that have two Lat/Lons with the great circle midpoint of both values
incidents [ind, c("HIGH_PRCN_LON_2","HIGH_PRCN_LAT_2")] <-
with(incidents [ind,,drop=FALSE],
do.call(rbind, geosphere::midPoint(cbind.data.frame(Location.1.Lon, Location.1.Lat), cbind.data.frame(Location.2.Lon, Location.2.Lat))))
#populate rows with one Lat/Lon with those values
incidents[!ind, c("HIGH_PRCN_LAT","HIGH_PRCN_LON")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]
I will use the geosphere::midPoint function based off a recommendation here: http://r.789695.n4.nabble.com/Midpoint-between-coordinates-td2299999.html.
Unfortunately, it doesn't appear that this way of populating the column will work when there are several cases.
The current error that is thrown is:
Error in `$<-.data.frame`(`*tmp*`, F_Lat, value = integer(0)) :
replacement has 0 rows, data has 178012
Edit: also posted to reddit: https://www.reddit.com/r/Rlanguage/comments/bdvavx/conditional_updating_column_in_dataframe/
Edit: Added clarity on the parts of the code I do not understand.
#replaces the F_Lat2/F_Lon2 columns in rows with both sets of input coordinates
dataframe[ind, c("F_Lat2","F_Lon2")] <-
#I am unclear on what this means, specifically what the "with" function does and what "drop=FALSE" does and also why they were used in this case.
with(dataframe[ind,,drop=FALSE],
#I am unclear on what do.call and rbind are doing here, but the second half (geosphere onwards) is binding the Lats and Lons to make coordinates as inputs for the gcIntermediate function.
do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
cbind.data.frame(Lat2, Lon2), n = 1)))
Though your code doesn't work as-written for me, and I cannot calculate the same precise values you expect, I suspect the error you're seeing can be fixed with these steps. (Data is down at the bottom here.)
Pre-populate the empty columns.
Pre-calculate the complete.cases step, it'll save time.
Use cbind.data.frame for inside gcIntermediate.
I'm inferring from
gcIntermediate([dataframe...
^
this is an error in R
that you are binding those columns together, so I'll use cbind.data.frame. (Using cbind itself produced some ignorable warnings from geosphere, so you can use it instead and perhaps suppressWarnings, but that function is a little strong in that it'll mask other warnings as well.)
Also, since it appears you want one intermediate value for each pair of coordinates, I added the gcIntermediate(..., n=1) argument.
The use of do.call(rbind, ...) is because gcIntermediate returns a list, so we need to bring them together.
dataframe$F_Lon2 <- dataframe$F_Lat2 <- NA_real_
ind <- complete.cases(dataframe[,4])
dataframe[ind, c("F_Lat2","F_Lon2")] <-
with(dataframe[ind,,drop=FALSE],
do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
cbind.data.frame(Lat2, Lon2), n = 1)))
dataframe[!ind, c("F_Lat2","F_Lon2")] <- dataframe[!ind, c("Lat1","Lon1")]
dataframe
# ID Lat1 Lon1 Lat2 Lon2 F_Lat F_Lon F_Lat2 F_Lon2
# 1 1 19.05067 -3.999126 92.71332 -6.759169 55.88200 -5.379147 55.78466 -6.709509
# 2 2 58.87210 -1.400413 54.74679 -4.479840 56.80945 -2.940126 56.81230 -2.942029
# 3 3 33.02335 -5.098834 NA NA 33.02335 -5.098834 33.02335 -5.098834
# 4 4 54.87210 -4.400412 NA NA 54.87210 -4.400412 54.87210 -4.400412
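Since the question asks what with(), drop=FALSE and do.call(rbind, ...) are doing, here is a small base-R sketch of each piece in isolation. The toy data frame and its column names are made up for illustration:

```r
toy <- data.frame(a = c(1, NA, 3), b = c(10, 20, 30))

# complete.cases() gives one TRUE/FALSE per row: TRUE where nothing is NA
ind <- complete.cases(toy[, "a"])
ind

# with() evaluates an expression using the data frame's columns as variables,
# so you can write a + b instead of toy$a + toy$b
sums <- with(toy[ind, , drop = FALSE], a + b)
sums

# drop = FALSE keeps the result a data frame even when it would
# otherwise collapse to a plain vector
as_vector <- toy[ind, "a"]
one_col   <- toy[ind, "a", drop = FALSE]
class(as_vector)
class(one_col)

# do.call(rbind, lst) calls rbind(lst[[1]], lst[[2]], ...),
# stacking a list of per-row results into one matrix
stacked <- do.call(rbind, list(c(1, 2), c(3, 4)))
stacked
```

In the answer's code the drop=FALSE is a safety habit: with several columns selected it changes nothing, but it guarantees with() always receives a data frame.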
Update, using your new incidents data and switching to geosphere::midPoint.
Try this:
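incidents$F_Lon2 <- incidents$F_Lat2 <- NA_real_
# rows where the second location has both coordinates
ind <- complete.cases(incidents[, c("Location.2.Lat", "Location.2.Lon")])
# geosphere expects longitude first and returns columns in (lon, lat) order
incidents[ind, c("F_Lon2","F_Lat2")] <-
with(incidents[ind,,drop=FALSE],
geosphere::midPoint(cbind.data.frame(Location.1.Lon,Location.1.Lat),
cbind.data.frame(Location.2.Lon,Location.2.Lat)))
incidents[!ind, c("F_Lat2","F_Lon2")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]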
One (big) difference is that geosphere::gcIntermediate(..., n=1) returns a list of results, whereas geosphere::midPoint(...) (no n=) returns just a matrix, so no rbinding required.
Data:
dataframe <- read.table(header=T, stringsAsFactors=F, text="
ID Lat1 Lon1 Lat2 Lon2 F_Lat F_Lon
1 19.0506727 -3.9991256 92.713318 -6.759169 55.88199535 -5.3791473
2 58.8721035 -1.4004125 54.746791 -4.47984 56.80944725 -2.94012625
3 33.0233529 -5.0988341 NA NA 33.0233529 -5.0988341
4 54.8721035 -4.4004125 NA NA 54.8721035 -4.4004125")
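If you want to check the two-case fill logic without installing geosphere, here is a base-R sketch. Note that sph_midpoint is a hypothetical helper, not part of any package: it uses the spherical great-circle midpoint formula, so its results can differ slightly from geosphere::midPoint. The first test row is made up; the second uses Location 1 of incident 3, which has no second location.

```r
# Hypothetical helper: spherical great-circle midpoint, degrees in, longitude first
sph_midpoint <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  p1 <- lat1 * rad
  p2 <- lat2 * rad
  dl <- (lon2 - lon1) * rad
  bx <- cos(p2) * cos(dl)
  by <- cos(p2) * sin(dl)
  lat <- atan2(sin(p1) + sin(p2), sqrt((cos(p1) + bx)^2 + by^2))
  lon <- lon1 * rad + atan2(by, cos(p1) + bx)
  cbind(lon = lon / rad, lat = lat / rad)
}

pts <- data.frame(lon1 = c(0, -3.0988341), lat1 = c(0, 51.0233529),
                  lon2 = c(90, NA),        lat2 = c(0, NA))

# Case 2 first: default every row to its first location
pts$mid_lon <- pts$lon1
pts$mid_lat <- pts$lat1

# Case 1: overwrite rows that have a complete second location
ind <- complete.cases(pts[, c("lon2", "lat2")])
pts[ind, c("mid_lon", "mid_lat")] <-
  with(pts[ind, , drop = FALSE], sph_midpoint(lon1, lat1, lon2, lat2))

pts
```

Filling the default first and then overwriting the complete rows avoids the separate !ind assignment.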

Error in r.squaredGLMM()

I am constructing GLMMs (using glmer() of "lme4" R package) and sometimes I get an error when estimating R2 values (using r.squaredGLMM() from "MuMIn" package).
The model I am trying to fit is similar to this one:
library(lme4)
lmA <- glmer(x~y+(1|w)+(1|w/k), data = data1, family = binomial(link="logit"))
Then, to estimate R2, I use:
library(MuMIn)
r.squaredGLMM(lmA)
And I get this:
The result is correct only if all data used by the model has not changed since model was fitted. Error in .rsqGLMM(fam = family(x),
varFx = var(fxpred), varRe = varRe, : 'names' attribute [2] must be the same length as the vector [0]
Do you have any idea why this error appears? For instance, If I use only a single random factor (in this case, (1|w)) this error does not appear.
Here is my dataset:
data1 <-
structure(list(w = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 1L), .Label = c("CA", "CB"), class = "factor"), k = structure(c(4L,
4L, 3L, 3L, 3L, 4L, 1L, 3L, 2L, 3L, 2L), .Label = c("CAF01-CAM01",
"CAM01", "CBF01-CBM01", "CBM01"), class = "factor"), x = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), y = c(-0.034973549,
0.671720643, 4.557044729, 5.347170897, 2.634240583, -0.555740207,
4.118277809, 2.599825716, 0.95853864, 4.327804344, 0.057331718
)), .Names = c("w", "k", "x", "y"), class = "data.frame", row.names = c(NA,
-11L))
Any thoughts?
This was a bug that has been fixed in version >= 1.15.8 (soon on CRAN, currently on R-Forge).

Increasing size of circles in ggplot2 graphs [duplicate]

This question already has answers here:
How to increase size of the points in ggplot2, similar to cex in base plots?
(2 answers)
Closed 8 years ago.
I want to increase the scale of circles in ggplot2. I tried something like this aes(size=100*n) but it did not work for me.
df <-
structure(list(Logit = c(-2.9842723737754, 1.49511606166294,
-2.41756623714116, -2.96160412831003, -2.12996384688938, -1.61751836789074,
-0.454353048358851, 0.9284099250287, -0.144082412641708, -2.30422500981431,
-0.658367257547178, 0.082600042011989, -0.318343575566633, -0.717447827238429,
-1.0508122312565, -2.82559465551781, 0.361703788394458, -1.85086010050691,
-0.0916611209129359, -0.740116072703798, 0.0599317965466193,
-0.370764867295404, -0.703703748477917, -0.749040239408657, -2.7575899191217,
-2.51532401980067, 1.38177483433609, 1.47244781619757, -0.205002348239784,
0.135021333740761), PRes = c(-0.661648371860934, 1.63444424896772,
-0.30348016008728, -0.230651042355737, 1.07487559116003, -0.460143991337599,
-0.823052248365889, -0.999903730870253, -0.959022180953211, -0.321344960297977,
-1.40881799070885, -0.674754839222841, 0.239931843185434, -1.81660411888874,
0.830318780187542, -0.24702802619469, 0.692695708496924, -0.40412065378683,
-0.977640032689132, -0.715192962242284, -1.06270128658429, -0.856103053117159,
-0.731162073769824, 1.51334938767359, 4.02946801536109, 3.56902361409375,
0.505952430753934, 0.483660641952208, 1.13712619443209, 0.951889504154342
), n = c(7L, 38L, 1L, 1L, 11L, 1L, 1L, 4L, 1L, 1L, 3L, 9L, 2L,
8L, 2L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L)), .Names = c("Logit", "PRes", "n"), row.names = c(NA, -30L
), class = "data.frame")
library(ggplot2)
ggplot(data=df, mapping=aes(x=Logit, y=PRes, label=rownames(df))) +
geom_point(aes(size=n), shape=1, color="black") +
geom_text() +
theme_bw() +
theme(legend.position="none")
Simply add a scale for size:
+ scale_size_continuous(range = c(10, 15))
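For context, here is where that line goes in the full plot. This is a sketch using four rows shaped like the question's df (values rounded), and it assumes ggplot2 is installed:

```r
library(ggplot2)

# A few rows in the shape of the question's df (values rounded)
df <- data.frame(
  Logit = c(-2.98, 1.50, -2.42, -2.13),
  PRes  = c(-0.66, 1.63, -0.30, 1.07),
  n     = c(7L, 38L, 1L, 11L)
)

p <- ggplot(df, aes(x = Logit, y = PRes, label = rownames(df))) +
  geom_point(aes(size = n), shape = 1, color = "black") +
  geom_text() +
  # range sets the smallest and largest point size after scaling
  scale_size_continuous(range = c(10, 15)) +
  theme_bw() +
  theme(legend.position = "none")

print(p)
```

Multiplying inside aes(), as in size=100*n, has no visible effect because the size scale rescales whatever values it receives into its output range; changing range is what actually makes the circles bigger.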
