I tried converting my data frame into a polygon using the code from a previous post, but I got an error message. I need assistance on how to fix this. Thanks. Below are my data and code:
County MEDIAN_V latitude longitude RACE DRAG AGIP AGIP2 AGIP3
Akpa 18.7 13.637 46.048 3521875 140.1290323 55 19 5
Uopa 17.9 12.85 44.869 3980000 86.71929825 278 6 4
Kaop 15.7 14.283 45.41 6623750 167.6746988 231 66 17
Nguru 14.7 13.916 44.764 3642500 152.256705 87 15 11
Nagima 20.2 14.7666636 43.249999 23545500 121.699 271 287 450
Dagoja 17.2 16.7833302 45.5166646 2316000 135.5187713 114 374 194
AlKoma 20.7 16.7999968 51.7333304 767000 83.38818565 NA NA NA
Ikaka 18.1 15.46833146 43.5404978 5687500 99.86455331 18 29 11
Maru 17.4 15.452 44.2173 10845625 90.98423127 679 424 159
Nko 19.4 16.17 43.89 10693000 109.7594937 126 140 60
Dfor 16.8 14.702 44.336 16587000 120.7656012 74 52 30
Hydr 20.7 16.666664 49.499998 5468000 126.388535 2 5 NA
lami 23 16.17 43.156 10432875 141.3487544 359 326 795
Ntoka 16.9 13.9499962 44.1833326 21614750 134.3637902 153 84 2
Lakoje 20.6 13.244 44.606 4050250 100.5965167 168 108 75
Mbiri 14.6 15.4499982 45.333332 2386625 166.9104478 465 452 502
Masi 18.2 14.633 43.6 4265250 117.16839 6 1 NA
Sukara 20.6 16.94021 43.76393 6162750 66.72009029 974 928 1176
Shakara 18.9 15.174 44.213 10721000 151.284264 585 979 574
Bambam 18.8 14.5499978 46.83333 3017625 142.442623 101 84 134
Erika 17.8 13.506 43.759 23565000 93.59459459 697 728 1034
mydata %>%
  group_by(County) %>%
  summarise(geometry = st_sfc(st_cast(st_multipoint(cbind(longitude, latitude)), 'POLYGON'))) %>%
  st_sf()
After running the above I got an error message:
Error in ClosePol(x) : polygons require at least 4 points
Please can someone help me out with how to fix this.
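A likely cause, judging from the data above: each County has only one coordinate pair, so after group_by(County) every group holds a single point, while a polygon ring needs at least four points (with the first repeated as the last). If the goal is one polygon enclosing all of the points, a minimal sketch (an assumption about the intent, not the only possible fix) is to drop the grouping and take a convex hull:
library(dplyr)
library(sf)
# one polygon around all the points; st_convex_hull() of a MULTIPOINT returns a POLYGON
mydata %>%
  summarise(geometry = st_sfc(st_convex_hull(st_multipoint(cbind(longitude, latitude))))) %>%
  st_sf()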
I have 2 different data.frames. I want to add the grouping$.group column to the phenology data.frame under the conditions given by the grouping data.frame (LEVEL and SPECIES). I have tried the merge() function using by=, but it keeps giving me "Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column". Sorry, this might seem like a very easy thing; I'm a beginner.
> head(phenology1)
YEAR GRADIENT SPECIES ELEVATION SITE TREE_ID CN b_E b_W b_M d_E d_W d_X c_E c_W t_max r_max r_delta_t LEVEL
1 2019 1 Pseudotsuga menziesii 395 B1_D B1_D1 59 119 135.5 143.0 139.0 148.5 165 258.0 284 154 0.7908536 0.4244604 lower
2 2019 1 Pseudotsuga menziesii 395 B1_D B1_D2 69 106 127.0 142.0 177.0 173.0 194 283.0 300 156 0.9807529 0.3898305 lower
3 2019 1 Pseudotsuga menziesii 395 B1_D B1_D3 65 97 125.0 154.5 169.0 174.0 202 266.0 299 167 NA 0.3846154 lower
4 2019 1 Picea abies 405 B1_F B1_F1 68 162 171.5 182.0 106.5 127.5 137 268.5 299 190 NA 0.6384977 lower
5 2019 1 Picea abies 405 B1_F B1_F2 78 139 165.5 176.5 152.0 140.5 167 291.0 306 181 0.9410427 0.5131579 lower
6 2019 1 Picea abies 405 B1_F B1_F3 34 147 177.5 188.0 100.0 97.5 128 247.0 275 187 0.5039245 0.3400000 lower
> grouping
LEVEL SPECIES emmean SE df lower.CL upper.CL .group
lower Pseudotsuga menziesii 107 8.19 12 89.5 125 1
upper Pseudotsuga menziesii 122 8.19 12 103.8 140 12
lower Abies alba 128 8.19 12 110.2 146 12
upper Abies alba 144 8.19 12 126.7 162 12
upper Picea abies 147 8.19 12 129.2 165 2
lower Picea abies 149 8.19 12 131.5 167 2
You can use left_join() from the dplyr package (join phenology1 with only the columns LEVEL, SPECIES, and .group from grouping):
library(dplyr)
phenology1 %>%
left_join(grouping %>% select(LEVEL, SPECIES, .group))
This automatically joins on the column names that appear in both data frames. If you want to set these explicitly, you can add by = c("LEVEL" = "LEVEL", "SPECIES" = "SPECIES"), as shown below.
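For example, the fully explicit version (equivalent to the automatic join above) would be:
library(dplyr)
phenology1 %>%
  left_join(
    grouping %>% select(LEVEL, SPECIES, .group),
    by = c("LEVEL" = "LEVEL", "SPECIES" = "SPECIES")
  )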
Base R, using the match() function (pasting the two key columns together gives a single key to match on):
phenology1$.group <- grouping$.group[match(paste(phenology1$LEVEL, phenology1$SPECIES),
                                           paste(grouping$LEVEL, grouping$SPECIES))]
I want to grab the content at the URL; the original data come as simple columns and rows. I tried readHTMLTable and it's not working. Using web scraping with XPath, how can I get clean data without the '\n...' and keep the data in a data.frame? Is this possible without saving to CSV? Kindly help me improve my code. Thank you.
library(rvest)
library(dplyr)
page <- read_html("http://weather.uwyo.edu/cgi-bin/sounding?region=seasia&TYPE=TEXT%3ALIST&YEAR=2006&MONTH=09&FROM=0100&TO=0100&STNM=48657")
xpath <- '/html/body/pre[1]'
txt <- page %>% html_node(xpath=xpath) %>% html_text()
txt
[1] "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n 839.0 1624 18.8 11.8 64 10.47 225 17 307.0 338.8 308.9\n 828.0 1735 18.0 11.4 65 10.33 ... <truncated>
We can extend your base code and treat the web page as an API endpoint since it takes parameters:
library(httr)
library(rvest)
I use more packages than the two loaded above (via ::), but I don't want to pollute the namespace.
I'd usually end up writing a small, parameterized function, or a small package with a couple of parameterized functions, to encapsulate the logic below (a sketch of one such function follows the parsed readings).
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = "seasia",
TYPE = "TEXT:LIST",
YEAR = "2006",
MONTH = "09",
FROM = "0100",
TO = "0100",
STNM = "48657"
)
) -> res
^^ makes the web page request and gathers the response.
httr::content(res, as="parsed") %>%
html_nodes("pre") -> wx_dat
^^ parses the response into an html_document and pulls out the <pre> nodes.
Now, we extract the readings:
html_text(wx_dat[[1]]) %>% # turn the first <pre> node into text
strsplit("\n") %>% # split it into lines
unlist() %>% # turn it back into a character vector
{ col_names <<- .[3]; . } %>% # pull out the column names (we'll use them later)
.[-(1:5)] %>% # strip off the header
paste0(collapse="\n") -> readings # turn it back into a big text blob
^^ cleaned up the table; we'll use readr::read_table() to parse it. We'll also turn the extracted column names into the actual column names:
readr::read_table(readings, col_names = tolower(unlist(strsplit(trimws(col_names), "[[:space:]]+"))))
## # A tibble: 106 x 11
## pres hght temp dwpt relh mixr drct sknt thta thte thtv
## <dbl> <int> <dbl> <dbl> <int> <dbl> <int> <int> <dbl> <dbl> <dbl>
## 1 1009 16 23.8 22.7 94 17.6 170 2 296. 347. 299.
## 2 1002 78 24.6 21.6 83 16.5 252 4 298. 346. 300.
## 3 1000 96 24.4 21.3 83 16.2 275 4 298. 345. 300.
## 4 962 434 22.9 20 84 15.6 235 10 299. 345 302.
## 5 925 777 21.4 18.7 85 14.9 245 11 301. 345. 304.
## 6 887 1142 20.3 16 76 13.0 255 15 304. 343. 306.
## 7 850 1512 19.2 13.2 68 11.3 230 17 306. 341. 308.
## 8 839 1624 18.8 11.8 64 10.5 225 17 307 339. 309.
## 9 828 1735 18 11.4 65 10.3 220 17 307. 339. 309.
## 10 789 2142 15.1 10 72 9.84 205 16 308. 339. 310.
## # ... with 96 more rows
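As mentioned earlier, this request-and-parse logic could be wrapped in a small parameterized function. A rough sketch (the name get_sounding and its defaults are illustrative assumptions, not an existing API):
get_sounding <- function(stnm, year, month, from = "0100", to = "0100", region = "seasia") {
  res <- httr::GET(
    url = "http://weather.uwyo.edu/cgi-bin/sounding",
    query = list(
      region = region, TYPE = "TEXT:LIST", YEAR = year,
      MONTH = month, FROM = from, TO = to, STNM = stnm
    )
  )
  # return the <pre> nodes, same as the manual GET + content() steps above
  rvest::html_nodes(httr::content(res, as = "parsed"), "pre")
}
wx_dat <- get_sounding(stnm = "48657", year = "2006", month = "09")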
You didn't say you wanted the station metadata, but we can get that too (it's in the second <pre>):
html_text(wx_dat[[2]]) %>%
strsplit("\n") %>%
unlist() %>%
trimws() %>% # get rid of whitespace
.[-1] %>% # blank line removal
strsplit(": ") %>% # separate field and value
lapply(function(x) setNames(as.list(x), c("measure", "value"))) %>% # make each pair a named list
dplyr::bind_rows() -> metadata # turn it into a data frame
metadata
## # A tibble: 30 x 2
## measure value
## <chr> <chr>
## 1 Station identifier WMKD
## 2 Station number 48657
## 3 Observation time 060901/0000
## 4 Station latitude 3.78
## 5 Station longitude 103.21
## 6 Station elevation 16.0
## 7 Showalter index 0.34
## 8 Lifted index -1.40
## 9 LIFT computed using virtual temperature -1.63
## 10 SWEAT index 195.39
## # ... with 20 more rows
Your data is truncated, so I'll work with what I can:
txt <- "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n"
It appears to be fixed-width, with lines compacted into a single string using the \n delimiter, so let's split it up:
strsplit(txt, "\n")
# [[1]]
# [1] ""
# [2] "-----------------------------------------------------------------------------"
# [3] " PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV"
# [4] " hPa m C C % g/kg deg knot K K K "
# [5] "-----------------------------------------------------------------------------"
# [6] " 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3"
# [7] " 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5"
# [8] " 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4"
# [9] " 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1"
# [10] " 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9"
# [11] " 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1"
# [12] " 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3"
It seems that row 1 is empty, and rows 2 and 5 are separator lines that need to be removed. Rows 3-4 appear to be the column header and units, respectively; since R doesn't allow multi-row headers, I'll remove the units row and leave it to you to save it elsewhere if you need it (see the sketch after the parsed output below).
From here, it's a straightforward call (noting the [[1]] for strsplit's returned list):
read.table(text=strsplit(txt, "\n")[[1]][-c(1,2,4,5)], header=TRUE)
# PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
# 1 1009 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3
# 2 1002 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5
# 3 1000 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4
# 4 962 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1
# 5 925 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9
# 6 887 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1
# 7 850 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3
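If you do need the units row (row 4), a small sketch, using the same txt object, to keep it separately before parsing:
lines <- strsplit(txt, "\n")[[1]]
# row 4 is the units line; split it on whitespace to get one unit per column
units <- strsplit(trimws(lines[4]), "[[:space:]]+")[[1]]
units
# [1] "hPa"  "m"    "C"    "C"    "%"    "g/kg" "deg"  "knot" "K"    "K"    "K"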
http://www.aqistudy.cn/historydata/daydata.php?city=%E8%8B%8F%E5%B7%9E&month=201504
This is the website from which I want to read data.
My code is as follows:
library(XML)
fileurl <- "http://www.aqistudy.cn/historydata/daydata.php?city=苏州&month=201404"
doc <- htmlTreeParse(fileurl, useInternalNodes = TRUE, encoding = "utf-8")
rootnode <- xmlRoot(doc)
pollution <- xpathSApply(rootnode, "/td", xmlValue)
But I got a lot of garbled output, and I don't know how to fix the problem.
I'd appreciate any help!
This can be simplified using library(rvest) to read the table directly:
library(rvest)
url <- "http://www.aqistudy.cn/historydata/daydata.php?city=%E8%8B%8F%E5%B7%9E&month=201504"
doc <- read_html(url) %>%
html_table()
doc[[1]]
# 日期 AQI 范围 质量等级 PM2.5 PM10 SO2 CO NO2 O3 排名
# 1 2015-04-01 106 67~144 轻度污染 79.3 105.1 20.2 1.230 89.5 76 308
# 2 2015-04-02 74 31~140 良 48.1 79.7 18.8 1.066 51.5 129 231
# 3 2015-04-03 98 49~136 良 72.9 89.2 16.0 1.323 50.9 62 293
# 4 2015-04-04 92 56~158 良 67.6 78.2 14.3 1.506 57.4 93 262
# 5 2015-04-05 87 42~167 良 63.7 56.1 16.9 1.245 50.8 91 215
# 6 2015-04-06 46 36~56 优 29.1 30.8 10.0 0.817 37.5 98 136
# 7 2015-04-07 45 34~59 优 27.0 42.4 12.0 0.640 36.6 77 143
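If English column names are easier to work with downstream, one option is to rename the headers; the English names below are my own renderings of the Chinese ones (日期 = date, 范围 = range, 质量等级 = quality level, 排名 = rank), not part of the original answer:
df <- doc[[1]]
names(df) <- c("date", "AQI", "range", "quality_level", "PM2.5", "PM10", "SO2", "CO", "NO2", "O3", "rank")
head(df)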
My data follows this sequence:
deptime .count
1 4.5 6285
2 14.5 5901
3 24.5 6002
4 34.5 5401
5 44.5 5080
6 54.5 4567
7 104.5 3162
8 114.5 2784
9 124.5 1950
10 134.5 1800
11 144.5 1630
12 154.5 1076
13 204.5 738
14 214.5 556
15 224.5 544
16 234.5 650
17 244.5 392
18 254.5 309
19 304.5 356
20 314.5 364
My ggplot code:
ggplot(pplot, aes(x=deptime, y=.count)) + geom_bar(stat="identity",fill='#FF9966',width = 5) + labs(x="time", y="count")
output figure
There is a gap after every 100 on the x-axis. Does anyone know how to fix it?
Thank you
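One likely cause (an assumption, since the question doesn't state it): deptime looks like HHMM clock times, and 54.5 jumps straight to 104.5 because minutes only run 0-59, so a numeric x-axis leaves an empty stretch after every hour boundary. A minimal sketch that converts deptime to minutes since midnight before plotting:
library(ggplot2)
# convert HHMM-style times (e.g. 104.5 -> 1 hour 4.5 minutes) to minutes since midnight
pplot$minutes <- (pplot$deptime %/% 100) * 60 + pplot$deptime %% 100
ggplot(pplot, aes(x = minutes, y = .count)) +
  geom_bar(stat = "identity", fill = "#FF9966", width = 5) +
  labs(x = "time (minutes since midnight)", y = "count")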
I have some data taken from an instrument moving through the water; flow data were logged every 0.5 seconds. I need to make a graph showing flow along the path of the instrument, with different colors for a selected range of flow (let's say 0.1-1.5) and all outliers in the same color.
How can I use R to plot the graph?
Here's some of my data:
idNr flow dep
21 0.135714532 3.16
22 0.131061729 3.07
23 0.13299406 3.11
24 0.145316675 3.1
25 6.31297442 3.07
26 0.331310509 3.14
27 0.445034592 3.17
28 0.637348777 3.45
29 0.87382414 4.04
30 1.302061623 5.31
31 1.80235436 6.78
32 1.63399146 8.24
33 1.675284308 9.78
34 1.686855996 11.27
35 1.775785232 12.72
36 1.455096956 14.22
37 1.530298919 15.69
38 1.431618958 17.15
39 1.446519477 18.7
40 1.532840436 20.15
41 1.595988278 21.55
42 1.478074882 23.07
43 1.545724299 24.5
44 1.475195233 26.05
45 1.542920138 27.56
46 1.437394899 29.02
47 1.644596033 30.59
48 2.303170426 31.94
49 1.77077097 33.5
50 1.530637295 34.97
51 1.439630621 36.41
52 1.469339767 37.9
53 1.330177211 39.36
54 1.478639264 40.85
55 1.465239548 42.3
56 1.536369473 43.85
57 1.484045946 45.39
58 1.561566967 46.92
59 1.487268243 48.5
60 1.509553327 50.01
61 1.436443415 51.42
62 1.520071435 52.98
63 1.455013499 54.55
64 1.472445826 56.01
65 1.499715554 57.48
66 1.514167783 58.99
67 1.493601132 60.62
68 1.531585488 62.05
69 1.625335896 63.61
70 1.478178989 65.08
71 1.531871974 66.57
72 1.472113782 68.16
73 1.4799859 69.58
74 1.458177137 71.21
75 1.591356624 72.79
76 1.542120401 74.28
77 1.694959183 75.77
78 1.720245831 77.25
79 1.519968728 78.75
80 1.390172108 80.34
81 1.520286586 81.81
82 1.592769579 83.38
83 1.632539512 84.88
84 1.481495103 86.41
85 1.529086844 87.98
86 1.536760058 89.52
87 1.52298084 91.03
88 1.731281442 92.53
89 1.639074839 94.02
90 1.562987505 95.53
91 1.543290194 97.04
92 1.578430537 98.61
93 1.702396728 100.12
94 1.657955781 101.6
95 1.557872012 103.16
96 1.613944568 104.68
97 1.631505361 106.16
98 1.435526209 107.66
99 1.711407354 109.26
100 1.57266259 110.72
101 1.514305998 112.27
102 1.56082106 113.78
103 1.828251113 115.27
104 1.748255115 116.76
105 1.854233769 118.3
106 1.803737202 119.75
107 1.67996921 121.25
108 1.751109178 122.77
109 1.76849805 124.32
110 1.758307258 125.82
111 1.740444751 127.32
112 1.644748694 128.81
113 1.620253049 130.29
114 1.75889143 131.77
115 1.760015837 133.24
116 1.683797088 134.78
117 1.713609054 136.31
118 1.26352548 137.78
119 1.81112139 139.37
120 1.888694446 140.83
121 1.774687553 142.36
122 1.739557437 143.78
123 1.64517875 145.33
124 1.699596858 146.8
125 1.628577412 148.35
126 1.769012673 149.81
127 1.594415839 151.41
128 1.493148224 152.89
129 1.581041449 154.42
130 1.538720671 155.93
131 1.589420092 157.41
132 1.64016166 158.93
133 1.575397227 160.43
134 1.63183131 162
135 1.75038462 163.44
136 1.434958447 165.01
137 1.74120127 166.5
138 1.748106592 167.98
139 1.813005453 169.46
140 1.541089106 170.98
141 1.556216895 172.56
142 1.660628956 174.08
143 1.693981673 175.61
144 1.67059241 177.09
145 1.66300418 178.66
146 1.652198157 180.17
147 1.709649777 181.65
148 1.745386082 183.15
149 1.385201724 184.62
150 1.468321001 186.15
151 1.627495534 187.58
152 1.678188454 189.03
153 1.810850273 190.55
154 1.585102162 192.01
155 1.652869637 193.48
156 1.593472296 195.03
157 1.846131262 196.53
158 1.442232687 198.03
159 1.279801142 199.57
160 1.803737202 201.06
161 1.794407014 202.61
162 1.456371696 204
163 1.815429315 205.38
164 1.518992563 206.89
165 1.647235482 208.41
166 1.47721908 209.86
167 1.698562049 211.35
168 0.835645183 211.94
169 0.971816361 211.79
170 0.215360462 212.01
171 0.576920795 212.05
172 0.504199289 212.02
173 0.352668234 212.02
174 0.503149022 211.89
175 0.180540198 212
176 0.242642996 211.91
177 0.132911363 211.98
178 0.131504237 212
179 0.131893716 211.96
180 0.132283195 211.9
181 0.132672674 212.02
182 0.919043761 212.02
183 1.705414848 211.94
184 2.491785935 211.9
185 0.187914127 211.97
186 0.137465923 211.98
187 0.490799032 211.85
188 0.695088396 211.6
189 0.283676082 211
190 0.965362936 210.07
191 0.769205485 209.16
192 1.41417407 208.1
193 1.437885765 207.08
194 1.359908615 206.02
195 1.311925665 204.77
196 1.239993728 203.57
197 1.352713698 202.38
198 1.454984487 201.12
199 1.07880741 199.89
200 1.171552813 198.65
201 1.32237999 197.41
202 1.354385018 196.15
203 1.090512744 194.97
204 1.390356612 193.71
205 1.213781005 192.33
206 1.431367612 191.08
207 1.384149391 189.81
208 1.282003839 188.57
209 1.332777243 187.45
210 1.235528323 186.19
211 1.110193788 184.9
212 1.247866678 183.7
213 1.193442015 182.56
214 1.313545026 181.36
215 1.284521151 180.08
216 1.258253835 178.91
217 1.263683914 177.68
218 1.331011134 176.49
219 1.234639947 175.26
220 1.338309201 174.07
221 1.224766564 172.91
222 1.280519417 171.66
223 1.150706617 170.44
224 1.260841259 169.25
225 1.09289038 168.05
226 1.262740102 166.84
227 1.170657422 165.68
228 1.176689196 164.49
229 1.262806038 163.31
230 1.40733017 162.17
231 1.24825008 160.93
232 1.282003839 159.8
233 1.237756113 158.62
234 1.339447054 157.48
235 1.383181118 156.3
236 1.213781005 155.12
237 1.310221259 154.01
238 1.344108343 152.81
239 1.367520862 151.66
240 1.197841162 150.49
241 1.219617617 149.28
242 1.299869434 148.11
243 1.370740857 146.95
244 1.175491535 145.83
245 1.272503374 144.68
246 1.251364988 143.52
247 1.321138129 142.3
248 1.127328278 141.16
249 1.319040773 139.94
250 1.118599826 138.74
251 1.312986113 137.64
252 1.392439005 136.41
253 1.197514718 135.29
254 1.218054343 134.14
255 1.230533733 132.94
256 1.209399639 131.82
257 1.362942371 130.63
258 1.365576729 129.47
259 1.235870422 128.24
260 1.250791647 127.08
261 1.407273491 125.89
262 1.279498015 124.77
263 1.419950309 123.67
264 1.343802584 122.51
265 1.359908615 121.28
266 1.221997022 120.16
267 1.406570776 118.98
268 1.32583614 117.77
269 1.456010534 116.59
270 1.446890774 115.47
271 1.486626504 114.25
272 1.271696732 113.07
273 1.123066893 111.94
Something like below?
# cut the flow data into range and add that into new "group" column
df$group <- cut(df$flow, c(0.1,1.5,max(df$flow)), labels=c("group1","group2"))
# plot the graph
library(ggplot2); library(dplyr)
ggplot(df, aes(x=idNr, y=flow)) +
geom_point(data = df %>% filter(group == "group1"), aes(colour=flow)) +
scale_colour_gradient(low="green", high="red", name="Not Outliers") +
geom_point(data = df %>% filter(group == "group2"), aes(size="Outliers"), colour="blue") +
guides(size = guide_legend("Outliers"))
Edit
Per a comment from the OP: if you use dep as the y-axis, i.e. replace flow with dep in the ggplot code above, you get the corresponding depth-vs-index plot.
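Spelled out, that dep-as-y variant (same df and group column as above) would be:
library(ggplot2); library(dplyr)
ggplot(df, aes(x = idNr, y = dep)) +
  geom_point(data = df %>% filter(group == "group1"), aes(colour = flow)) +
  scale_colour_gradient(low = "green", high = "red", name = "Not Outliers") +
  geom_point(data = df %>% filter(group == "group2"), aes(size = "Outliers"), colour = "blue") +
  guides(size = guide_legend("Outliers"))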