I'm looking to create some 3D surface plots from some data in R. I've got a large data frame of many variables, but I'm only interested in displaying three (the distance, the year, and the mortality rate). I've been trying to get plotly's surface plots working for a day or two now, and I can't get anything to display. I can get the other libraries to work. Here's the code I'm working with at the moment:
cold3Dplot <- plot_ly(showscale = FALSE) %>%
  add_surface(z = ~as.matrix(filtered$distance, 2014, filtered$RelativeDeaths.2014), opacity = 0.98) %>%
  add_surface(z = ~as.matrix(filtered$distance, 2015, filtered$RelativeDeaths.2015), opacity = 0.98) %>%
  add_surface(z = ~as.matrix(filtered$distance, 2016, filtered$RelativeDeaths.2016), opacity = 0.98) %>%
  add_surface(z = ~as.matrix(filtered$distance, 2017, filtered$RelativeDeaths.2017), opacity = 0.98)
cold3Dplot
This is what my filtered data frame looks like:
datazone label code shape_area shape_leng stdareakm2 stdareaha hhcnt2011 respop2011 totpop2011
561 S01010318 S12000046S01010318 S01010318 128009.62 2950.044 0.128009 12.800961 304 505 1588
562 S01010319 S12000046S01010319 S01010319 96160.88 2282.814 0.096161 9.616090 407 711 711
599 S01010356 S12000046S01010356 S01010356 286149.89 4874.812 0.286150 28.614989 308 682 682
606 S01010363 S12000046S01010363 S01010363 157714.24 3083.966 0.157714 15.771425 394 758 758
563 S01010320 S12000046S01010320 S01010320 73061.83 1544.579 0.073063 7.306183 437 670 675
709 S01010466 S12000046S01010466 S01010466 529203.94 4393.694 0.529203 52.920395 458 879 879
name Feature.Name RelativeDeaths.2014 RelativeDeaths.2015 RelativeDeaths.2016 RelativeDeaths.2017
561 Ruchill - 04 NA -5 -5 -3 -4
562 Ruchill - 05 NA -6 -5 -6 -6
599 Maryhill West - 03 NA -5 -6 -2 -9
606 Kelvindale - 01 NA -2 -2 -5 -5
563 Ruchill - 06 NA -4 -6 -3 -6
709 Anniesland East - 02 NA -5 -7 -3 0
AverageRelativeDeaths distance lon lat ID OBJECTID Name TotPop2011 ResPop2011 HHCnt2011
561 -4 30.44933 -4.278552 55.88277 24 3813 Ruchill - 04 1588 505 304
562 -6 32.69077 -4.280502 55.88595 23 3814 Ruchill - 05 711 711 407
599 -6 50.42398 -4.287960 55.89284 32 3851 Maryhill West - 03 682 682 308
606 -4 105.39898 -4.309206 55.89306 10 3858 Kelvindale - 01 758 758 394
563 -5 149.88008 -4.282008 55.88644 23 3815 Ruchill - 06 675 670 437
709 -4 154.92867 -4.319477 55.89498 72 3961 Anniesland East - 02 879 879 458
Easting Northing coords.x1 coords.x2 optional
561 257542 667848 -4.278992 55.88265 TRUE
562 257447 668188 -4.280688 55.88567 TRUE
599 256994 669045 -4.288377 55.89323 TRUE
606 255670 668966 -4.309488 55.89213 TRUE
563 257256 668176 -4.283733 55.88551 TRUE
709 254973 669173 -4.320734 55.89378 TRUE
But like I said, I'm only interested in using the Relative Deaths, distance, and year columns from it.
What am I doing wrong exactly? I'd like to get something to display at least, and then I can see whether I'm going in the right direction or not.
Cheers!
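One note on why nothing displays: add_surface expects z to be a numeric matrix of heights over an x/y grid, whereas as.matrix(filtered$distance, 2014, filtered$RelativeDeaths.2014) just coerces the single distance vector to a one-column matrix (the second and third arguments to as.matrix are silently ignored). Below is a minimal sketch of building one surface over a year-by-distance grid instead, assuming the column names printed above; it is untested against the real data:

library(plotly)

## rows of z = data zones ordered by distance, columns of z = years
z <- as.matrix(filtered[order(filtered$distance),
                        c("RelativeDeaths.2014", "RelativeDeaths.2015",
                          "RelativeDeaths.2016", "RelativeDeaths.2017")])

plot_ly(x = 2014:2017,               ## one column of z per year
        y = sort(filtered$distance), ## one row of z per data zone
        z = z,
        showscale = FALSE) %>%
  add_surface(opacity = 0.98)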
Here is what I am trying to get into a bar chart
Daysoutfromelection 2018 2020 2022
-48 1 0 0
-47 1 0 0
-46 6 0 0
-45 29 9 3
-44 55 26 3
-43 90 63 84
-42 120 77 250
-41 141 105 356
-40 163 185 422
-39 187 201 487
-38 193 204 487
-37 208 212 487
-36 238 229 644
-35 265 237 745
-34 286 272 802
-33 309 336 881
-32 335 518 951
-31 342 564 953
-30 356 585 953
-29 389 623 1150
-28 423 705 1264
-27 1374 3262 1771
-26 1698 5347 2225
-25 1974 8006 3818
-24 1986 10649 4896
-23 2006 11011 7313
-22 15285 36989 27733
-21 114431 142780 52288
-20 190945 209455 165335
-19 262430 407002 253878
-18 337650 603601 324661
-17 398354 652945 328549
-16 402512 717160 328748
-15 456827 859288 460281
-14 587248 1045476 594185
-13 737361 1228168 708165
-12 820264 1361718 804466
-11 888008 1482940 894919
-10 954156 1553120 949524
-9 960346 1613982 949552
-8 1016452 1742538 1042852
-7 1132337 1956437 1203929
-6 1234037 2116439 1350648
-5 1392544 2278454 1425982
-4 1487850 2390960 1499179
-3 1514435 2420439 1525419
-2 1514571 2420789 1549176
-1 1528274 2510411 1579877
0 1530403 2810924 1596362
1 1562526 2880302 1865700
2 1563402 2887366 1884284
3 1563403 2887572 1888305
4 1563405 2887579 1888305
5 1563405 2887579 1888308
I am having major trouble getting all three columns to appear. I would like three bars side by side for each x value, one per year, with the y axis showing total votes, so that you can compare which year had the most votes. I only have dplyr, ggplot2, tidyr, stringr, and tidyverse available.
I've tried it like
viz2 <- data2 %>%
  ggplot(aes(x = Daysoutfromelection, y = '2018')) +
  geom_bar()
print(viz2)
But I have also switched the x and y around, tried geom_line, and tried some other approaches. Any help getting this data to appear properly would be greatly appreciated.
You have to reshape your data first using e.g. tidyr::pivot_longer, so that your years become categories of a new column which I name year and the values go into a new column called value. Afterwards you can create your bar chart, where (as you already have a value column) you have to use geom_col. Additionally I added position = "dodge" to put the bars side by side:
library(dplyr)  ## for the pipe
library(ggplot2)
library(tidyr)

dat <- data2 %>%
  tidyr::pivot_longer(-Daysoutfromelection, names_to = "year", values_to = "value")

ggplot(dat, aes(x = Daysoutfromelection, y = value, fill = year)) +
  geom_col(position = "dodge")
However, IMHO a line chart using geom_line is more appropriate for your data:
ggplot(dat, aes(x = Daysoutfromelection, y = value, color = year)) +
  geom_line()
I would like to estimate the parameters of the Gompertz-Makeham distribution, but I haven't been able to get a result.
I would like a method in R, like this Weibull parameter estimation code:
weibull_loglik <- function(parm) {
  gamma <- parm[1]
  lambda <- parm[2]
  loglik <- sum(dweibull(vec, shape = gamma, scale = lambda, log = TRUE))
  return(-loglik)
}

weibull <- nlm(weibull_loglik, p = c(1, 1), hessian = TRUE, iterlim = 100)
weibull$estimate
c <- weibull$estimate[1]; b <- weibull$estimate[2]
My data:
[1] 872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
[16] 20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
[31] 50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
[46] 374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
[61] 894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
[76] 2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
[91] 2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
[106] 12 6 3 2 1 1
summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 32.0 314.0 900.9 1355.0 4054.0
It would be nice to have a reproducible example, but something like:
library(bbmle)
library(eha)
set.seed(101)
vec <- rmakeham(1000, shape = c(2,3), scale = 2)
dmwrap <- function(x, shape1, shape2, scale, log) {
  ## return NA instead of throwing an error for invalid (e.g. negative) parameters
  res <- try(dmakeham(x, c(shape1, shape2), scale, log = log), silent = TRUE)
  if (inherits(res, "try-error")) return(NA)
  res
}

m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 1),
           data = data.frame(y = vec),
           method = "Nelder-Mead")
A few notes on the approach:
- Define a wrapper that (1) takes the shape parameters as separate values and (2) returns NA rather than throwing an error when e.g. the parameters are negative.
- Use Nelder-Mead rather than the default BFGS for robustness.
- The fitdistrplus package might help too.
- If you're going to do a lot of this, it may help to fit the parameters on the log scale (i.e. use parameters logshape1 etc., and use exp(logshape1) etc. in the fitting formula), as in the sketch below.
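For illustration, a log-scale version of the same fit might look like this (a sketch under the simulated-data setup above; the logshape1/logshape2/logscale names are just for this example):

## Sketch: optimizing on the log scale keeps the optimizer away from
## invalid negative parameter values entirely
m1_log <- mle2(y ~ dmwrap(exp(logshape1), exp(logshape2), exp(logscale)),
               start = list(logshape1 = 0, logshape2 = 0, logscale = 0),
               data = data.frame(y = vec),
               method = "Nelder-Mead")
exp(coef(m1_log))  ## back-transform to the original parameter scale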
I had to work a little harder to fit your data; I scaled the variable by 1000 (and found that I could only compute the log-likelihood; the likelihood gave an error that I didn't bother trying to track down). Unfortunately, it doesn't look like a great fit (too many small values).
x <- scan(text = "872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
12 6 3 2 1 1")
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 10000),
           data = data.frame(y = x/1000),
           method = "Nelder-Mead")
cc <- as.list(coef(m1))

png("gm.png")
hist(x, breaks = 25, freq = FALSE)
with(cc,
     curve(exp(dmwrap(x/1000, shape1, shape2, scale, log = TRUE))/1000, add = TRUE))
dev.off()
I am trying to write a function to calculate Williams %R on data in R. Here is my code:
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
  highh = runMax(high, n)
  lowl = runMin(low, n)
  -100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
  ## if wpr crosses down through 0.8, BUY
  ifelse(Lag(williampr) > 0.8 & williampr < 0.8, 1,
         ## if wpr crosses down through 0.2, SELL; otherwise HOLD
         ifelse(Lag(williampr) > 0.2 & williampr < 0.2, -1, 0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have been a good idea to mention in your question that you were using the quantmod package.
There are two things preventing this from working. You didn't inspect the values you were testing: your results in williampr are all negative, and on top of that they are scaled by 100, so 80% comes out as -80, not 0.8. I removed the -100 * factor.
I have done the same thing so many times.
wr = function(high, low, close, n) {
  highh = runMax(high, n)  ## rolling n-period high
  lowl = runMin(low, n)    ## rolling n-period low
  ## fraction between 0 and 1, directly comparable to the 0.8/0.2 thresholds
  (highh - close) / (highh - lowl)
}
That's it. It works now.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060
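Alternatively (a sketch, not part of the fix above): you could keep the -100 * factor so that williampr stays on the conventional Williams %R scale of 0 to -100, and flip the thresholds instead. A fraction of 0.8 corresponds to -80 on that scale and 0.2 corresponds to -20, and the inequalities reverse because of the sign change:

## with williampr on the 0 to -100 scale (original wr with -100 * kept)
tradingSignal = Lag(
  ## %R crossing back up through -80 (leaving oversold): BUY
  ifelse(Lag(williampr) < -80 & williampr > -80, 1,
         ## %R crossing back up through -20: SELL; otherwise HOLD
         ifelse(Lag(williampr) < -20 & williampr > -20, -1, 0)))
tradingSignal[is.na(tradingSignal)] = 0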
I'm trying to pull down a subset of rows in a sqlite database using dplyr. Since slice doesn't work with tbl_sql objects, I'm using the window function row_number. But I get the following error:
Source: sqlite 3.8.6
[/Library/Frameworks/R.framework/Versions/3.2/Resources/library/dplyr/db/nycflights13.sqlite]
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: no such function: ROW_NUMBER
dplyr version 0.4.3.9000, RSQLite version 1.0.0. Reproducible example:
library(dplyr)
library(nycflights13)
flights_sqlite <- tbl(nycflights13_sqlite(), "flights")
filter(flights_sqlite, row_number(month) == 1L) %>% collect()
Probably there's a more efficient and faster way, but head seems to do the job. (The underlying problem is that SQLite only gained window functions such as ROW_NUMBER in version 3.25, so the 3.8.6 build behind your connection cannot translate row_number at the database level.)
To extract the first n rows, for instance the first 10 records:
head(flights_sqlite, 10) %>% collect()
Output:
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160 1089 5 42
4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183 1576 5 44
5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116 762 5 54
6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150 719 5 54
7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158 1065 5 55
8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53 229 5 57
9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140 944 5 57
10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138 733 5 58
To take a percentage of the first rows, for instance the first 10%:
head(flights_sqlite, nrow(flights_sqlite)*0.1) %>% collect()
To subset specific rows, for instance rows 578 and 579:
head(flights_sqlite, nrow(flights_sqlite))[578:579, ] %>% collect()
Output:
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
578 2013 1 1 1701 -9 2026 11 AA N3FUAA 695 JFK AUS 247 1521 17 1
579 2013 1 1 1701 1 1856 16 UA N418UA 689 LGA ORD 144 733 17 1
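If you need an arbitrary window of rows without fetching everything first, one option (a sketch; the database path is hypothetical) is to drop down to DBI and use SQLite's LIMIT/OFFSET, which work even without window-function support:

library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "nycflights13.sqlite")  ## hypothetical path
## rows 578-579 (1-based): skip the first 577 rows, then take 2
rows <- dbGetQuery(con, "SELECT * FROM flights LIMIT 2 OFFSET 577")
dbDisconnect(con)

Note that without an ORDER BY clause SQLite makes no guarantee about row order, so in practice you would sort on a column first for the offsets to be meaningful.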
I'm doing a particular operation quite a bit, and I need help generalizing it.
I have a lot of data that "looks" kind of like this:
> hflights::hflights %>% tbl_df %>%
    mutate(month = Month, carrier = UniqueCarrier) %>%
    group_by(month, carrier) %>%
    summarize(delay = sum(ArrDelay, na.rm = TRUE)) %>%
    dcast(month ~ carrier)
month AA AS B6 CO DL EV F9 FL MQ OO UA US WN XE YV
1 1 18 296 229 27031 1026 1337 851 216 2322 3957 -219 -1068 31701 24248 NA
2 2 461 249 802 15769 1657 730 707 1079 4283 11486 323 -663 36729 27861 -44
3 3 317 476 1037 49061 905 2529 673 1111 2524 12955 1665 -606 28758 50702 -38
4 4 1147 465 518 52086 1856 4483 515 927 5085 17439 1803 -711 47084 69590 260
5 5 1272 56 654 63413 1381 3563 1334 1213 7899 22190 1798 1627 73771 66972 18
6 6 -262 172 504 60042 3736 2618 744 983 4519 21652 6260 2140 40191 66456 49
7 7 -460 112 1241 41300 2868 1628 321 506 1529 23432 2780 497 21200 98484 34
8 8 -1417 59 1659 36106 -949 808 42 -1366 310 11038 3546 -84 6991 33554 34
9 9 -841 -364 -202 24857 1022 -424 151 -747 -1373 4502 1743 248 15592 31846 NA
10 10 215 -112 -45 26437 1082 -1005 277 -537 522 13 1833 -1878 14725 27539 NA
11 11 97 -5 -72 20339 -101 207 180 449 2286 2628 230 -1093 8424 24199 NA
12 12 2287 -242 310 6644 1281 -1082 585 79 2311 5900 -491 -951 12735 65269 NA
There are positive and negative values within the groups; in this case, the groups are month and carrier. I can plot it like this:
> hflights::hflights %>% tbl_df %>%
    mutate(month = Month, carrier = UniqueCarrier) %>%
    group_by(month, carrier) %>%
    summarize(delay = mean(ArrDelay, na.rm = TRUE)) %>%
    ggplot(aes(x = month, y = delay, fill = carrier)) +
    geom_bar(stat = 'identity')
Which gives me an eye-bleedy stacked chart.
It also gives me the message:
Warning message:
Stacking not well defined when ymin != 0
This message is kind of what I'm after. I want to separate positive from negative so that I can see the "gross" amount, and also generate the sum per group and show the "net" amount.
For this dataset, I can do that like so:
> df <- hflights::hflights %>% tbl_df %>%
    mutate(month = Month, carrier = UniqueCarrier) %>%
    group_by(month, carrier) %>%
    summarize(delay = mean(ArrDelay, na.rm = TRUE))
> ggplot(NULL, aes(x = month, y = delay, fill = carrier)) +
    geom_bar(data = df %>% filter(delay > 0), stat = 'identity') +
    geom_bar(data = df %>% filter(delay < 0), stat = 'identity') +
    geom_bar(data = df %>% group_by(month) %>% summarize(delay = sum(delay, na.rm = TRUE)),
             fill = 'black', width = 0.25, alpha = 0.5, stat = 'identity')
Which gives me this chestnut: positive and negative bars stacked separately, with the net per month overlaid in black.
This is much nicer because in September, it doesn't do netting so I get a better sense of the magnitude of the positives and the negatives.
However, the above only works for this dataset. What happens when I have different groups? How do I generalize this?
Adding position = "identity" to geom_bar should get rid of the warning you are getting in your first plot.
The reason for the warning is that, when stacking, ggplot2 treats negative values as bars with negative height, and stacking is not well defined when bars extend below zero.
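Applied to the first plot, that's just (a sketch using the df summarized above):

ggplot(df, aes(x = month, y = delay, fill = carrier)) +
  geom_bar(stat = 'identity', position = 'identity')

As for generalizing the gross/net idea to other groupings, one option (a sketch; plot_gross_net is a made-up helper that assumes you rename your columns to x, y, and fill before calling it) is to wrap the three-layer recipe in a function:

library(dplyr)
library(ggplot2)

plot_gross_net <- function(df) {
  ## df is assumed to have columns x (axis group), y (signed value),
  ## and fill (subgroup); rename with transmute() before calling
  net <- df %>% group_by(x) %>% summarize(y = sum(y, na.rm = TRUE))
  ggplot(NULL, aes(x = x, y = y)) +
    geom_bar(data = filter(df, y > 0), aes(fill = fill), stat = 'identity') +
    geom_bar(data = filter(df, y < 0), aes(fill = fill), stat = 'identity') +
    geom_bar(data = net, fill = 'black', width = 0.25, alpha = 0.5,
             stat = 'identity')
}

## usage with the summarized hflights data from above
df %>%
  ungroup() %>%
  transmute(x = month, y = delay, fill = carrier) %>%
  plot_gross_net()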