R Points in Polygon

Was wondering if you could help me with the following. I am trying to calculate the number of points that fall within each US state polygon. There are 52 states total. The point data and the polygon data are both in the same projection.
I can run the function:
over(Transformed.States, clip.points)
Which returns:
0 1 2 3 4 5 6 7 8 9 10
4718 NA 488 2688 4454 3762 2041 NA 5 NA 3620
11 12 13 14 15 16 17 18 19 20 21
412 3042 2028 3390 2755 4250 3275 2484 466 4255 1
22 23 24 25 26 27 28 29 30 31 32
3238 744 4125 2926 927 495 3541 4640 3039 895 620
33 34 35 36 37 38 39 40 41 42 43
4069 4671 3801 1012 4023 626 1158 4627 217 13 4055
44 45 46 47 48 49 50 51
573 3456 NA 4670 4505 903 4172 4641
However, I want each polygon to be given a value based on the number of points inside it, so that the result can be plotted, e.g.:
plot(points.in.state)
What would be the best function for this, so that I still have the polygon data but with the new points-in-polygon counts attached?
The end goal of this is to make a graduated symbol map for each state based on the values for points in each state.
Thanks!
Jim
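One possible approach, sketched with sp on toy data (assuming Transformed.States is a SpatialPolygonsDataFrame and clip.points a SpatialPoints object, as the question suggests): over() with returnList = TRUE lists, for each polygon, the points that fall inside it, and lengths() turns that into a per-polygon count that can be stored in the polygon attribute table.

```r
library(sp)

# toy stand-ins for the question's Transformed.States / clip.points:
# one unit-square "state" and three points, two of them inside it
poly   <- Polygons(list(Polygon(cbind(c(0, 1, 1, 0), c(0, 0, 1, 1)))), ID = "s1")
states <- SpatialPolygonsDataFrame(SpatialPolygons(list(poly)),
                                   data.frame(name = "s1", row.names = "s1"))
pts <- SpatialPoints(cbind(c(0.2, 0.8, 2), c(0.5, 0.5, 2)))

# over() with returnList = TRUE gives, per polygon, the indices of the
# points that fall inside it; lengths() counts them per polygon
states$points.in.state <- lengths(over(states, pts, returnList = TRUE))

states$points.in.state
# [1] 2
```

Because the counts now live in the polygon attribute table, a graduated map is then one call away, e.g. spplot(states, "points.in.state").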

Related

Putting several rows into one column in R

I am trying to run a time series analysis on the following data set:
Year 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780
Number 101 82 66 35 31 7 20 92 154 125
Year 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790
Number 85 68 38 23 10 24 83 132 131 118
Year 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800
Number 90 67 60 47 41 21 16 6 4 7
Year 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810
Number 14 34 45 43 48 42 28 10 8 2
Year 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820
Number 0 1 5 12 14 35 46 41 30 24
Year 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830
Number 16 7 4 2 8 17 36 50 62 67
Year 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840
Number 71 48 28 8 13 57 122 138 103 86
Year 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850
Number 63 37 24 11 15 40 62 98 124 96
Year 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860
Number 66 64 54 39 21 7 4 23 55 94
Year 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870
Number 96 77 59 44 47 30 16 7 37 74
My problem is that the data is spread across multiple rows. I am trying to make two columns from the data, one for Year and one for Number, so that it is easily readable in R. I have tried
> library(tidyverse)
> sun.df = data.frame(sunspots)
> Year = filter(sun.df, sunspots == "Year")
to isolate the Year data, and it works, but I am unsure of how to then place it in a column.
Any suggestions?
Try this:
library(tidyverse)
df <- read_csv("test.csv",col_names = FALSE)
df
# A tibble: 6 x 4
# X1 X2 X3 X4
# <chr> <dbl> <dbl> <dbl>
# 1 Year 123 124 125
# 2 Number 1 2 3
# 3 Year 126 127 128
# 4 Number 4 5 6
# 5 Year 129 130 131
# 6 Number 7 8 9
# Removing first column and transpose it to get a dataframe of numbers
df_number <- as.data.frame(as.matrix(t(df[,-1])),row.names = FALSE)
df_number
# V1 V2 V3 V4 V5 V6
# 1 123 1 126 4 129 7
# 2 124 2 127 5 130 8
# 3 125 3 128 6 131 9
# Keep the first two column (V1,V2) and assign column names
df_new <- df_number[1:2]
colnames(df_new) <- c("Year","Number")
# Iterate and rbind with subsequent columns (2 by 2) to df_new
for (i in 1:((ncol(df_number) - 2) / 2)) {
  df_mini <- df_number[(i*2+1):(i*2+2)]
  colnames(df_mini) <- c("Year", "Number")
  df_new <- rbind(df_new, df_mini)
}
df_new
# Year Number
# 1 123 1
# 2 124 2
# 3 125 3
# 4 126 4
# 5 127 5
# 6 128 6
# 7 129 7
# 8 130 8
# 9 131 9
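As a side note, the loop can be avoided entirely: after the transpose, the odd-numbered columns of df_number are all Years and the even-numbered ones all Numbers, so one unlist() per group does the job. A sketch using the same toy values as above:

```r
# same layout as df_number above: odd columns are Years, even columns Numbers
df_number <- data.frame(V1 = c(123, 124, 125), V2 = c(1, 2, 3),
                        V3 = c(126, 127, 128), V4 = c(4, 5, 6),
                        V5 = c(129, 130, 131), V6 = c(7, 8, 9))

odd  <- seq(1, ncol(df_number), by = 2)   # Year columns
even <- seq(2, ncol(df_number), by = 2)   # Number columns

# unlist() concatenates the selected columns top to bottom, which is exactly
# the row order the rbind loop produces
df_new <- data.frame(Year   = unlist(df_number[odd],  use.names = FALSE),
                     Number = unlist(df_number[even], use.names = FALSE))

df_new$Year
# [1] 123 124 125 126 127 128 129 130 131
```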

Predicting using an exponential model

I have the following data:
Days Total cases
1 3
2 3
3 5
4 6
5 28
6 30
7 31
8 34
9 39
10 48
11 63
12 70
13 82
14 91
15 107
16 112
17 127
18 146
19 171
20 198
21 258
22 334
23 403
24 497
25 571
26 657
27 730
28 883
29 1024
30 1139
31 1329
32 1635
33 2059
34 2545
35 3105
36 3684
37 4289
38 4778
39 5351
40 5916
41 6729
42 7600
43 8452
44 9210
45 10453
46 11484
47 12370
48 13431
49 14353
50 15724
51 17304
52 18543
53 20080
54 21372
I defined days as 'days' and total cases as 'cases1', and ran the following code:
exp.mod <- lm(log(cases1)~days)
I get a good model with reasonable residuals and p-value.
But when I run the following:
predict(exp.mod, data.frame(days=60))
I get the value 11.66476, which doesn't seem correct.
I need to get the predicted value and also include a predictive plot for the exponential model.
Hope that clarifies the issue.
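One thing worth noting about the reported value: predict() on a model fitted to log(cases1) returns predictions on the log scale, so they need to be exponentiated to get case counts. A minimal sketch with made-up exponential data (the variable names mirror the question):

```r
# made-up data that grows exponentially, standing in for the question's series
days   <- 1:54
cases1 <- round(3 * exp(0.16 * days))

# same model form as the question: linear model on the log of the response
exp.mod <- lm(log(cases1) ~ days)

# predict() returns log(cases), not cases; exponentiate to get a count
log.pred <- predict(exp.mod, data.frame(days = 60))
exp(log.pred)
```

For the question's own data the same back-transform applies: exp(11.66476) is roughly 116,000, which is what the fitted exponential model actually predicts for day 60.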
You should consider ETS models from the forecast package.
Below is an example:
library(fpp2)     # assumed here: the ausair dataset ships with fpp2
library(dplyr)
library(forecast)
ausair %>% ets() %>% forecast() %>% autoplot()
I suggest checking the free book by Prof. Rob J Hyndman and Prof. George Athanasopoulos, the authors of the forecast package.

How to fix "No appropriate likelihood could be inferred" for network meta-analysis in R?

I am currently learning network meta-analysis in R with "gemtc" and "netmeta".
As I try to fit the GLM model for the analysis, I encounter the error message "No appropriate likelihood could be inferred".
My code is:
gemtc_network_numbers <- mtc.network(data.ab = diabetes_data, treatments = treatments)
mtcmodel <- mtc.model(network = gemtc_network_numbers, type = "consistency", factor = 2.5, n.chain = 4, linearModel = "random")
mtcresults <- mtc.run(mtcmodel, n.adapt = 20000, n.iter = 100000, thin = 10, sampler = "rjags")
# View results summary
print(summary(mtcresults))
My data are:
> diabetes_data
study treatment responder samplesize
1 1 1 45 410
2 1 3 70 405
3 1 4 32 202
4 2 1 119 4096
5 2 4 154 3954
6 2 5 302 6766
7 3 2 1 196
8 3 5 8 196
9 4 1 138 2800
10 4 5 200 2826
11 5 3 799 7040
12 5 4 567 7072
13 6 1 337 5183
14 6 3 380 5230
15 7 2 163 2715
16 7 6 202 2721
17 8 1 449 2623
18 8 6 489 2646
19 9 5 29 416
20 9 6 20 424
21 10 4 177 4841
22 10 6 154 4870
23 11 3 86 3297
24 11 5 75 3272
25 12 1 102 2837
26 12 6 155 2883
27 13 4 136 2508
28 13 5 176 2511
29 14 3 665 8078
30 14 4 569 8098
31 15 2 242 4020
32 15 3 320 3979
33 16 3 37 1102
34 16 5 43 1081
35 16 6 34 2213
36 17 3 251 5059
37 17 4 216 5095
38 18 1 335 3432
39 18 6 399 3472
40 19 2 93 2167
41 19 6 115 2175
42 20 5 140 1631
43 20 6 118 1578
44 21 1 93 1970
45 21 3 97 1960
46 21 4 95 1965
47 22 2 690 5087
48 22 4 845 5074
Thanks for your help.
Angel
You have two solutions:
1- Rename your responder variable to "responders" and your samplesize variable to "sampleSize".
or
2- Specify the likelihood explicitly, for example: mtc.model(..., likelihood = "poisson", link = "log").
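For the first option, the fix is just a column rename, since gemtc infers the likelihood from the column names "responders" and "sampleSize" (camel case matters). A sketch on a one-row stand-in for diabetes_data:

```r
# toy frame with the question's original column names
diabetes_data <- data.frame(study = 1, treatment = 1,
                            responder = 45, samplesize = 410)

# rename to the spellings gemtc expects
names(diabetes_data)[names(diabetes_data) == "responder"]  <- "responders"
names(diabetes_data)[names(diabetes_data) == "samplesize"] <- "sampleSize"

names(diabetes_data)
# [1] "study"      "treatment"  "responders" "sampleSize"
```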

R, correlation in p-values

Quite new to R and spending a lot of time solving issues...
I have a big table (named mydata) containing more than 14k columns. This is a short view...
Latitude comp48109 comp48326 comp48827 comp49708 comp48407 comp48912
59.8 21 29 129 440 23 13
59.8 18 23 32 129 19 34
59.8 19 27 63 178 23 27
53.1 21 28 0 0 26 10
53.1 15 21 129 423 25 36
53.1 18 44 44 192 26 42
48.7 14 32 0 0 17 42
48.7 11 26 0 0 20 33
48.7 24 37 0 0 26 20
43.6 34 40 1 3 23 4
43.6 19 28 0 1 26 33
43.6 19 35 0 0 14 3
41.4 22 67 253 1322 15 4
41.4 44 39 0 0 11 14
41.4 24 41 63 174 12 4
39.5 21 45 102 291 12 17
39.5 17 26 69 300 16 79
39.5 13 46 151 526 14 14
Although I managed to get the correlation scores for the first column ("Latitude") against the others with
corrScores <- cor(Latitude, mydata[2:14429])
I need to get a list of the p-values by applying the function cor.test(x, y, ...)$p.value.
How can I do that without getting the error 'x' and 'y' must have the same length?
You can use sapply:
sapply(mydata[-1], function(y) cor.test(mydata$Latitude, y)$p.value)
# comp48109 comp48326 comp48827 comp49708 comp48407 comp48912
# 0.331584624 0.020971913 0.663194866 0.544407919 0.005375973 0.656831836
Here, mydata[-1] means: All columns of mydata except the first one.

Generating Stacked bar plots

I have a dataframe with 3 columns
$x -- at http://pastebin.com/SGrRUJcA
$y -- at http://pastebin.com/fhn7A1rj
$z -- at http://pastebin.com/VmVvdHEE
that I wish to use to generate a stacked barplot. All of these columns hold integer data. The stacked barplot should have the levels along the x-axis and the data for each level along the y-axis. The stacks should then correspond to each of $x, $y and $z.
UPDATE: I now have the following:
counted <- data.frame(table(myDf$x),variable='x')
counted <- rbind(counted,data.frame(table(myDf$y),variable='y'))
counted <- rbind(counted,data.frame(table(myDf$z),variable='z'))
counted <- counted[counted$Var1!=0,] # to get rid of 0th level??
stackedBp <- ggplot(counted,aes(x=Var1,y=Freq,fill=variable))
stackedBp <- stackedBp+geom_bar(stat='identity')+scale_x_discrete('Levels')+scale_y_continuous('Frequency')
stackedBp
which generates the intended stacked barplot (image not shown).
Two issues remain:
the x-axis labeling is not correct. For some reason it goes 46, 47, 53, 54, 38, 40, ... How can I order it naturally?
I also wish to remove the 0th label.
I've tried using +scale_x_discrete(breaks = 0:50, labels = 1:50), but this doesn't work.
NB. axis labeling issue: Dataframe column appears incorrectly sorted
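The odd ordering happens because table() stores its names as character strings, so the Var1 factor levels sort lexicographically rather than numerically. A sketch of one fix, on a toy version of the counted data frame:

```r
# toy version of counted, with the out-of-order labels from the question
counted <- data.frame(Var1 = factor(c("46", "47", "53", "54", "38", "40")),
                      Freq = c(1, 1, 1, 4, 7, 2))

# rebuild the factor with numerically sorted levels so the axis order is natural
lv <- sort(as.numeric(as.character(unique(counted$Var1))))
counted$Var1 <- factor(as.character(counted$Var1), levels = lv)

levels(counted$Var1)
# [1] "38" "40" "46" "47" "53" "54"
```

After filtering out the zero rows, calling droplevels(counted$Var1) should also drop the unused "0" level, which would remove the 0th label from the axis.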
Not completely sure what you want to see... but reading ?barplot says the first argument, height, must be a vector or matrix. So to fix your initial error:
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
barplot(as.matrix(myDf))
If you provide a reproducible example and a more specific description of your desired output you can get a better answer.
Or if I were to guess wildly (and use ggplot)...
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
myDf.counted<- data.frame(table(myDf$x),variable='x')
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$y),variable='y'))
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$z),variable='z'))
ggplot(myDf.counted,aes(x=Var1,y=Freq,fill=variable))+geom_bar(stat='identity')
I'm surprised that didn't blow up in your face. Cross-classifying the joint occurrence of three different vectors, each of length 35204, would often consume many gigabytes of RAM (and would possibly create lots of useless 0's, as you found). Maybe you wanted to examine instead the results of sapply(myDf, table)? That creates three separate tables of counts.
It's a rather irregular result and would need further work to get it into matrix form, but you might want to consider using densityplot to display the comparative distributions, which I think is your goal.
$x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
126 711 1059 2079 3070 2716 2745 3329 2916 2671 2349 2457 2055 1303 892 692
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
559 799 482 299 289 236 156 145 100 95 121 133 60 34 37 13
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
15 12 56 10 4 7 2 14 13 28 30 20 16 62 74 58
49 50
40 15
$y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3069 32 1422 1376 1780 1556 1937 1844 1967 1699 1910 1924 1047 894 975 865
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
635 1002 710 908 979 848 678 908 696 491 417 412 499 411 421 217
32 33 34 35 36 37 39 42 46 47 53 54
265 182 121 47 38 11 2 2 1 1 1 4
$z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
31 202 368 655 825 1246 900 1136 1098 1570 1613 1144 1107 1037 1239 1372
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1306 1085 843 867 813 1057 1213 1020 1210 939 725 644 617 602 739 584
32 33 34 35 36 37 38 39 40 41 42 43
650 733 756 681 684 657 544 416 220 48 7 1
The density plot is really simple to create in lattice:
densityplot( ~x+y+z, myDf)
