Using R to read HTML but getting garbled output - r
http://www.aqistudy.cn/historydata/daydata.php?city=%E8%8B%8F%E5%B7%9E&month=201504
This is the website from which I want to read data.
My code is as follows,
library(XML)
fileurl <- "http://www.aqistudy.cn/historydata/daydata.php?city=苏州&month=201404"
doc <- htmlTreeParse(fileurl, useInternalNodes = TRUE, encoding = "utf-8")
rootnode <- xmlRoot(doc)
pollution <- xpathSApply(rootnode, "/td", xmlValue)
But the output was full of garbled characters, and I don't know how to fix this problem.
I would appreciate any help!
This can be simplified by using library(rvest) to read the table directly:
library(rvest)
url <- "http://www.aqistudy.cn/historydata/daydata.php?city=%E8%8B%8F%E5%B7%9E&month=201504"
doc <- read_html(url) %>%
html_table()
doc[[1]]
# 日期 AQI 范围 质量等级 PM2.5 PM10 SO2 CO NO2 O3 排名
# 1 2015-04-01 106 67~144 轻度污染 79.3 105.1 20.2 1.230 89.5 76 308
# 2 2015-04-02 74 31~140 良 48.1 79.7 18.8 1.066 51.5 129 231
# 3 2015-04-03 98 49~136 良 72.9 89.2 16.0 1.323 50.9 62 293
# 4 2015-04-04 92 56~158 良 67.6 78.2 14.3 1.506 57.4 93 262
# 5 2015-04-05 87 42~167 良 63.7 56.1 16.9 1.245 50.8 91 215
# 6 2015-04-06 46 36~56 优 29.1 30.8 10.0 0.817 37.5 98 136
# 7 2015-04-07 45 34~59 优 27.0 42.4 12.0 0.640 36.6 77 143
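If you want to build the request URL from the Chinese city name yourself (rather than pasting the percent-encoded form), URLencode() can do the encoding. This is a small sketch, assuming a UTF-8 locale; the city and month values are simply the ones from the question:

library(rvest)

city  <- "苏州"
month <- "201504"
url   <- paste0("http://www.aqistudy.cn/historydata/daydata.php?city=",
                URLencode(city, reserved = TRUE),   # percent-encodes the UTF-8 bytes, e.g. %E8%8B%8F%E5%B7%9E
                "&month=", month)

html_table(read_html(url))[[1]]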
Related
Read in CSV in mixed English and French number format
I would like to read a CSV into R that is quoted and comma-separated (i.e. sep = "," not sep = ";" as read.csv2 defaults to), but that uses the comma inside fields as the decimal separator and periods to separate each group of three digits from the right. An example of a problematic entry is "3.051,00" in the final line of the excerpt from the CSV shown below.

I tried

dat <- read.csv2("path_to_csv.csv", sep = ",", stringsAsFactors = FALSE)

and a variant using read.csv (both are identical except for their defaults, as noted in Difference between read.csv() and read.csv2() in R). Both return improperly-formatted data.frames (e.g. containing 3.051,00). Can I read this comma-separated file in directly with read.table without having to perform text preprocessing?

Excerpt of CSV

praf,pmek,plcg,PIP2,PIP3,p44/42,pakts473,PKA,PKC,P38,pjnk
"26,40","13,20","8,82","18,30","58,80","6,61","17,00","414,00","17,00","44,90","40,00"
"35,90","16,50","12,30","16,80","8,13","18,60","32,50","352,00","3,37","16,50","61,50"
"59,40","44,10","14,60","10,20","13,00","14,90","32,50","403,00","11,40","31,90","19,50"
"62,10","51,90","13,60","30,20","10,60","14,30","37,90","692,00","6,49","25,00","91,40"
"75,00","33,40","1,00","31,60","1,00","19,80","27,60","505,00","18,60","31,10","7,64"
"20,40","15,10","7,99","101,00","35,90","9,14","22,90","400,00","11,70","22,70","6,85"
"47,80","19,60","17,50","33,10","82,00","17,90","35,20","956,00","22,50","43,30","20,00"
"59,90","53,30","11,80","77,70","12,90","11,10","37,90","1.407,00","18,80","29,40","16,80"
"46,60","27,10","12,40","109,00","21,90","21,50","38,20","207,00","11,00","31,30","12,00"
"51,90","21,30","49,10","58,80","10,80","58,80","200,00","3.051,00","15,30","39,20","15,70"

Note: I am aware of the question European and American decimal format for thousands, which is not sufficient. That user preprocesses the file they want to read in, whereas I would like a direct means of reading a CSV of the kind shown into R.
Most of it is resolved with dec=",":

# saved your data to 'file.csv'
out <- read.csv("file.csv", dec=",")
head(out)
#   praf pmek  plcg  PIP2  PIP3 p44.42 pakts473    PKA   PKC  P38  pjnk
# 1 26.4 13.2  8.82  18.3 58.80   6.61     17.0 414,00 17.00 44.9 40.00
# 2 35.9 16.5 12.30  16.8  8.13  18.60     32.5 352,00  3.37 16.5 61.50
# 3 59.4 44.1 14.60  10.2 13.00  14.90     32.5 403,00 11.40 31.9 19.50
# 4 62.1 51.9 13.60  30.2 10.60  14.30     37.9 692,00  6.49 25.0 91.40
# 5 75.0 33.4  1.00  31.6  1.00  19.80     27.6 505,00 18.60 31.1  7.64
# 6 20.4 15.1  7.99 101.0 35.90   9.14     22.9 400,00 11.70 22.7  6.85

Only one column is string:

sapply(out, class)
#      praf      pmek      plcg      PIP2      PIP3    p44.42  pakts473         PKA       PKC       P38      pjnk
# "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "character" "numeric" "numeric" "numeric"

This can be resolved post-read with:

ischr <- sapply(out, is.character)
out[ischr] <- lapply(out[ischr], function(z) as.numeric(gsub(" ", "", chartr(",.", ". ", z))))
out$PKA
# [1]  414  352  403  692  505  400  956 1407  207 3051

If you'd rather read it in without post-processing, you can pipe(.) it, assuming you have sed available[^1]:

out <- read.csv(pipe("sed -E 's/([0-9])[.]([0-9])/\\1\\2/g;s/([0-9]),([0-9])/\\1.\\2/g' < file.csv"))

Notes: sed is generally available on all linux/macos systems, and on windows computers it is included within Rtools.
Like r2evans's comment says, dec = "," takes care of the cases without thousands separators. Then use lapply/gsub to process the other cases, which are still of class "character".

txt <- '
praf,pmek,plcg,PIP2,PIP3,p44/42,pakts473,PKA,PKC,P38,pjnk
"26,40","13,20","8,82","18,30","58,80","6,61","17,00","414,00","17,00","44,90","40,00"
"35,90","16,50","12,30","16,80","8,13","18,60","32,50","352,00","3,37","16,50","61,50"
"59,40","44,10","14,60","10,20","13,00","14,90","32,50","403,00","11,40","31,90","19,50"
"62,10","51,90","13,60","30,20","10,60","14,30","37,90","692,00","6,49","25,00","91,40"
"75,00","33,40","1,00","31,60","1,00","19,80","27,60","505,00","18,60","31,10","7,64"
"20,40","15,10","7,99","101,00","35,90","9,14","22,90","400,00","11,70","22,70","6,85"
"47,80","19,60","17,50","33,10","82,00","17,90","35,20","956,00","22,50","43,30","20,00"
"59,90","53,30","11,80","77,70","12,90","11,10","37,90","1.407,00","18,80","29,40","16,80"
"46,60","27,10","12,40","109,00","21,90","21,50","38,20","207,00","11,00","31,30","12,00"
"51,90","21,30","49,10","58,80","10,80","58,80","200,00","3.051,00","15,30","39,20","15,70"
'

df1 <- read.csv(textConnection(txt), dec = ",")

i <- sapply(df1, is.character)
df1[i] <- lapply(df1[i], \(x) gsub("\\.", "", x))
df1[i] <- lapply(df1[i], \(x) as.numeric(sub(",", ".", x)))

df1
#>    praf pmek  plcg  PIP2  PIP3 p44.42 pakts473  PKA   PKC  P38  pjnk
#> 1  26.4 13.2  8.82  18.3 58.80   6.61     17.0  414 17.00 44.9 40.00
#> 2  35.9 16.5 12.30  16.8  8.13  18.60     32.5  352  3.37 16.5 61.50
#> 3  59.4 44.1 14.60  10.2 13.00  14.90     32.5  403 11.40 31.9 19.50
#> 4  62.1 51.9 13.60  30.2 10.60  14.30     37.9  692  6.49 25.0 91.40
#> 5  75.0 33.4  1.00  31.6  1.00  19.80     27.6  505 18.60 31.1  7.64
#> 6  20.4 15.1  7.99 101.0 35.90   9.14     22.9  400 11.70 22.7  6.85
#> 7  47.8 19.6 17.50  33.1 82.00  17.90     35.2  956 22.50 43.3 20.00
#> 8  59.9 53.3 11.80  77.7 12.90  11.10     37.9 1407 18.80 29.4 16.80
#> 9  46.6 27.1 12.40 109.0 21.90  21.50     38.2  207 11.00 31.3 12.00
#> 10 51.9 21.3 49.10  58.8 10.80  58.80    200.0 3051 15.30 39.2 15.70

Created on 2022-02-07 by the reprex package (v2.0.1)
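If this conversion comes up repeatedly, the post-read cleanup shown in the two answers above can be wrapped in a small helper. A minimal base-R sketch; parse_euro is a made-up name, and reading every column as character first is an assumption so the same rule applies uniformly:

parse_euro <- function(x) {
  # drop the "." thousands separators, then turn the "," decimal mark into "."
  as.numeric(sub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
}

out <- read.csv("file.csv", colClasses = "character")
out[] <- lapply(out, parse_euro)
sapply(out, class)   # all columns should now be numeric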
Convert data frame to polygon
I tried converting my data frame into a polygon using the code from a previous post, but I got an error message. Please I need assistance on how to fix this. Thanks. Below is my code:

County MEDIAN_V latitude longitude RACE DRAG AGIP AGIP2 AGIP3
Akpa 18.7 13.637 46.048 3521875 140.1290323 55 19 5
Uopa 17.9 12.85 44.869 3980000 86.71929825 278 6 4
Kaop 15.7 14.283 45.41 6623750 167.6746988 231 66 17
Nguru 14.7 13.916 44.764 3642500 152.256705 87 15 11
Nagima 20.2 14.7666636 43.249999 23545500 121.699 271 287 450
Dagoja 17.2 16.7833302 45.5166646 2316000 135.5187713 114 374 194
AlKoma 20.7 16.7999968 51.7333304 767000 83.38818565 NA NA NA
Ikaka 18.1 15.46833146 43.5404978 5687500 99.86455331 18 29 11
Maru 17.4 15.452 44.2173 10845625 90.98423127 679 424 159
Nko 19.4 16.17 43.89 10693000 109.7594937 126 140 60
Dfor 16.8 14.702 44.336 16587000 120.7656012 74 52 30
Hydr 20.7 16.666664 49.499998 5468000 126.388535 2 5 NA
lami 23 16.17 43.156 10432875 141.3487544 359 326 795
Ntoka 16.9 13.9499962 44.1833326 21614750 134.3637902 153 84 2
Lakoje 20.6 13.244 44.606 4050250 100.5965167 168 108 75
Mbiri 14.6 15.4499982 45.333332 2386625 166.9104478 465 452 502
Masi 18.2 14.633 43.6 4265250 117.16839 6 1 NA
Sukara 20.6 16.94021 43.76393 6162750 66.72009029 974 928 1176
Shakara 18.9 15.174 44.213 10721000 151.284264 585 979 574
Bambam 18.8 14.5499978 46.83333 3017625 142.442623 101 84 134
Erika 17.8 13.506 43.759 23565000 93.59459459 697 728 1034

mydata %>%
  group_by(County) %>%
  summarise(geometry = st_sfc(st_cast(st_multipoint(cbind(longitude, latitude)), 'POLYGON'))) %>%
  st_sf()

After running the above I got an error message:

Error in ClosePol(x) : polygons require at least 4 points

Please can someone help me out with how to fix this.
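The error comes from the fact that a POLYGON ring needs at least four coordinate pairs (at least three distinct vertices, with the ring closed), but the data shown has only one point per County, so each group has nothing to form a ring from. Once you do have several points per county, one way to build simple polygons without worrying about vertex order is a convex hull per group. This is only a sketch, assuming mydata holds multiple rows per County and that longitude/latitude are WGS84 coordinates:

library(dplyr)
library(sf)

mydata %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%  # points from the coordinate columns (CRS assumed)
  group_by(County) %>%
  summarise(geometry = st_combine(geometry)) %>%                 # one MULTIPOINT per county
  st_convex_hull()                                               # the hull is a POLYGON once >= 3 distinct points exist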
Save content from the web as a data.frame
I want to grab content from the URL below, where the original data comes as simple rows and columns. I tried readHTMLTable and obviously it's not working. Using web scraping with XPath, how do I get clean data without the '\n...' and keep the data in a data.frame? Is this possible without saving to CSV? Kindly help me improve my code. Thank you

library(rvest)
library(dplyr)

page <- read_html("http://weather.uwyo.edu/cgi-bin/sounding?region=seasia&TYPE=TEXT%3ALIST&YEAR=2006&MONTH=09&FROM=0100&TO=0100&STNM=48657")
xpath <- '/html/body/pre[1]'
txt <- page %>%
  html_node(xpath=xpath) %>%
  html_text()

txt
[1] "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n 839.0 1624 18.8 11.8 64 10.47 225 17 307.0 338.8 308.9\n 828.0 1735 18.0 11.4 65 10.33 ... <truncated>
We can extend your base code and treat the web page as an API endpoint, since it takes parameters:

library(httr)
library(rvest)

I use more packages than ^^ below via :: but I don't want to pollute the namespace. I'd usually end up writing a small, parameterized function (or a small package with a couple of parameterized functions) to encapsulate the logic below.

httr::GET(
  url = "http://weather.uwyo.edu/cgi-bin/sounding",
  query = list(
    region = "seasia",
    TYPE = "TEXT:LIST",
    YEAR = "2006",
    MONTH = "09",
    FROM = "0100",
    TO = "0100",
    STNM = "48657"
  )
) -> res

^^ makes the web page request and gathers the response.

httr::content(res, as="parsed") %>%
  html_nodes("pre") -> wx_dat

^^ parses the response into an html_document and pulls out the <pre> nodes.

Now, we extract the readings:

html_text(wx_dat[[1]]) %>%            # turn the first <pre> node into text
  strsplit("\n") %>%                  # split it into lines
  unlist() %>%                        # turn it back into a character vector
  { col_names <<- .[3]; . } %>%       # pull out the column names (we'll use them later)
  .[-(1:5)] %>%                       # strip off the header
  paste0(collapse="\n") -> readings   # turn it back into a big text blob

^^ cleaned up the table and we'll use readr::read_table() to parse it. We'll also turn the extracted column names into the actual column names:

readr::read_table(readings, col_names = tolower(unlist(strsplit(trimws(col_names), " +"))))
## # A tibble: 106 x 11
##     pres  hght  temp  dwpt  relh  mixr  drct  sknt  thta  thte  thtv
##    <dbl> <int> <dbl> <dbl> <int> <dbl> <int> <int> <dbl> <dbl> <dbl>
##  1  1009    16  23.8  22.7    94 17.6    170     2  296.  347.  299.
##  2  1002    78  24.6  21.6    83 16.5    252     4  298.  346.  300.
##  3  1000    96  24.4  21.3    83 16.2    275     4  298.  345.  300.
##  4   962   434  22.9  20      84 15.6    235    10  299.  345   302.
##  5   925   777  21.4  18.7    85 14.9    245    11  301.  345.  304.
##  6   887  1142  20.3  16      76 13.0    255    15  304.  343.  306.
##  7   850  1512  19.2  13.2    68 11.3    230    17  306.  341.  308.
##  8   839  1624  18.8  11.8    64 10.5    225    17  307   339.  309.
##  9   828  1735  18    11.4    65 10.3    220    17  307.  339.  309.
## 10   789  2142  15.1  10      72  9.84   205    16  308.  339.  310.
## # ... with 96 more rows

You didn't say you wanted the station metadata, but we can get that too (it's in the second <pre>):

html_text(wx_dat[[2]]) %>%
  strsplit("\n") %>%
  unlist() %>%
  trimws() %>%                  # get rid of whitespace
  .[-1] %>%                     # blank line removal
  strsplit(": ") %>%            # separate field and value
  lapply(function(x) setNames(as.list(x), c("measure", "value"))) %>%  # make each pair a named list
  dplyr::bind_rows() -> metadata                                       # turn it into a data frame

metadata
## # A tibble: 30 x 2
##    measure                                 value
##    <chr>                                   <chr>
##  1 Station identifier                      WMKD
##  2 Station number                          48657
##  3 Observation time                        060901/0000
##  4 Station latitude                        3.78
##  5 Station longitude                       103.21
##  6 Station elevation                       16.0
##  7 Showalter index                         0.34
##  8 Lifted index                            -1.40
##  9 LIFT computed using virtual temperature -1.63
## 10 SWEAT index                             195.39
## # ... with 20 more rows
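For completeness, the "small, parameterized function" mentioned above might look roughly like this; get_sounding is a made-up name and the defaults simply mirror the query used in this answer:

get_sounding <- function(stnm, year, month, from, to, region = "seasia") {
  res <- httr::GET(
    url = "http://weather.uwyo.edu/cgi-bin/sounding",
    query = list(region = region, TYPE = "TEXT:LIST",
                 YEAR = year, MONTH = month, FROM = from, TO = to, STNM = stnm)
  )
  httr::stop_for_status(res)   # fail early on HTTP errors
  # return the <pre> nodes; downstream parsing stays the same as above
  rvest::html_nodes(httr::content(res, as = "parsed"), "pre")
}

wx_dat <- get_sounding("48657", "2006", "09", "0100", "0100")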
Your data is truncated, so I'll work with what I can:

txt <- "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n"

It appears to be fixed-width, with lines compacted into a single string using the \n delimiter, so let's split it up:

strsplit(txt, "\n")
# [[1]]
#  [1] ""
#  [2] "-----------------------------------------------------------------------------"
#  [3] " PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV"
#  [4] " hPa m C C % g/kg deg knot K K K "
#  [5] "-----------------------------------------------------------------------------"
#  [6] " 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3"
#  [7] " 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5"
#  [8] " 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4"
#  [9] " 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1"
# [10] " 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9"
# [11] " 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1"
# [12] " 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3"

It seems that row 1 is empty, and 2 and 5 are lines that need to be removed. Rows 3-4 appear to be the column header and units, respectively; since R doesn't allow multi-row headers, I'll remove the units, and leave it to you to save them elsewhere if you need them. From here, it's a straight-forward call (noting the [[1]] for strsplit's returned list):

read.table(text=strsplit(txt, "\n")[[1]][-c(1,2,4,5)], header=TRUE)
#   PRES HGHT TEMP DWPT RELH  MIXR DRCT SKNT  THTA  THTE  THTV
# 1 1009   16 23.8 22.7   94 17.56  170    2 296.2 346.9 299.3
# 2 1002   78 24.6 21.6   83 16.51  252    4 297.6 345.6 300.5
# 3 1000   96 24.4 21.3   83 16.23  275    4 297.6 344.8 300.4
# 4  962  434 22.9 20.0   84 15.56  235   10 299.4 345.0 302.1
# 5  925  777 21.4 18.7   85 14.90  245   11 301.2 345.2 303.9
# 6  887 1142 20.3 16.0   76 13.04  255   15 303.7 342.7 306.1
# 7  850 1512 19.2 13.2   68 11.34  230   17 306.2 340.6 308.3
How can I organise and move rows of data based on label matches?
I have raw data shown below. I'm trying to move a row of data that corresponds to a label it matches to a new location in the dataframe.

dat<-read.table(text='RowLabels col1 col2 col3 col4 col5 col6
L 24363.7 25944.9 25646.1 25335.4 23564.2 25411.5
610 411.4 439 437.3 436.9 420.7 516.9
1 86.4 113.9 103.5 113.5 80.3 129
2 102.1 99.5 96.3 100.4 99.5 86
3 109.7 102.2 100.2 112.9 92.3 123.8
4 88.9 87.1 103.6 102.5 93.6 134.1
5 -50.3 -40.2 -72.3 -61.4 -27 -22.7
6 -35.3 -9.3 25.3 -0.3 15.6 -27.3
7 109.9 85.8 80.7 69.3 66.4 94
181920 652.9 729.2 652.1 689.1 612.5 738.4
1 104.3 107.3 103.5 104.2 98.3 110.1
2 103.6 102.6 100.1 103.2 88.8 117.7
3 53.5 99.1 46.7 70.3 53.9 32.5
4 93.5 107.2 98.3 99.3 97.3 121.1
5 96.8 109.3 104 102.2 98.7 112.9
6 103.6 96.9 104.7 104.4 91.5 137.7
7 97.6 106.8 94.8 105.5 84 106.4
181930 732.1 709.6 725.8 729.5 554.5 873.1
1 118.4 98.8 102.3 102 101.9 115.8
2 96.7 103.3 104.6 105.2 81.9 128.7
3 96 98.2 99.4 97.9 69.8 120.6
4 100.7 101 103.6 106.6 59.6 136.2
5 106.1 103.4 104.7 104.8 76.1 131.8
6 105 102.1 103 108.3 81 124.7
7 109.2 102.8 108.2 104.7 84.2 115.3
N 3836.4 4395.8 4227.3 4567.4 4009.9 4434.6
610 88.1 96.3 99.6 92 90 137.6
1 88.1 96.3 99.6 92 90 137.6
181920 113.1 100.6 106.5 104.2 87.3 108.2
1 113.1 100.6 106.5 104.2 87.3 108.2
181930 111.3 99.1 104.5 115.5 103.6 118.8
1 111.3 99.1 104.5 115.5 103.6 118.8
',header=TRUE)

I want to match the values of the three N-prefix labels: 610, 181920 and 181930 with its corresponding L-prefix labels. Basically move that row of data into the L-prefix as a new row, labeled 0 or 8 for example. So, the result for label 610 would look like:

RowLabels col1 col2 col3 col4 col5 col6
610 411.4 439 437.3 436.9 420.7 516.9
1 86.4 113.9 103.5 113.5 80.3 129
2 102.1 99.5 96.3 100.4 99.5 86
3 109.7 102.2 100.2 112.9 92.3 123.8
4 88.9 87.1 103.6 102.5 93.6 134.1
5 -50.3 -40.2 -72.3 -61.4 -27 -22.7
6 -35.3 -9.3 25.3 -0.3 15.6 -27.3
7 109.9 85.8 80.7 69.3 66.4 94
8 88.1 96.3 99.6 92 90 137.6

Is this possible? I tried searching and I found some resources pointing toward dplyr or tidyr or aggregate. But I can't find a good example that matches my case. How to combine rows based on unique values in R? and Aggregate rows by shared values in a variable
library(dplyr)
library(zoo)

df <- dat %>%
  filter(grepl("^\\d+$", RowLabels)) %>%
  mutate(RowLabels_temp = ifelse(grepl("^\\d{3,}$", RowLabels), as.numeric(as.character(RowLabels)), NA)) %>%
  na.locf() %>%
  select(-RowLabels) %>%
  distinct() %>%
  group_by(RowLabels_temp) %>%
  mutate(RowLabels_indexed = row_number() - 1) %>%
  arrange(RowLabels_temp, RowLabels_indexed) %>%
  mutate(RowLabels_indexed = ifelse(RowLabels_indexed == 0, RowLabels_temp, RowLabels_indexed)) %>%
  rename(RowLabels = RowLabels_indexed) %>%
  data.frame()

df <- df %>% select(-RowLabels_temp)
df

Output is

   col1  col2  col3  col4  col5  col6 RowLabels
1 411.4 439.0 437.3 436.9 420.7 516.9       610
2  86.4 113.9 103.5 113.5  80.3 129.0         1
3 102.1  99.5  96.3 100.4  99.5  86.0         2
4 109.7 102.2 100.2 112.9  92.3 123.8         3
5  88.9  87.1 103.6 102.5  93.6 134.1         4
6 -50.3 -40.2 -72.3 -61.4 -27.0 -22.7         5
7 -35.3  -9.3  25.3  -0.3  15.6 -27.3         6
8 109.9  85.8  80.7  69.3  66.4  94.0         7
9  88.1  96.3  99.6  92.0  90.0 137.6         8
...
It sounds like you want to use the match() function, for example:

target <- c(the values of your target order)
df <- df[match(target, df$column_to_reorder), ]
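As a concrete illustration against the question's 610 block (the target values here are made up for the example), RowLabels is the column to reorder and the desired order is the parent label followed by 1-8:

target <- c(610, 1:8)                      # desired label order (illustrative values)
df <- df[match(target, df$RowLabels), ]    # rows rearranged to follow target; unmatched labels become NA rows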
Avoid a scatterplot of a variable against itself in a loop
I have a data frame (tab3) looking like this:

          CWRES ID  AGE  BMI  WGT
3    0.59034000  1 37.5 20.7 64.6
4    1.81300000  1 37.5 20.7 64.6
5    1.42920000  1 37.5 20.7 64.6
6    0.59194000  1 37.5 20.7 64.6
7    0.30886000  1 37.5 20.7 64.6
8   -0.14601000  1 37.5 20.7 64.6
9   -0.19776000  1 37.5 20.7 64.6
10   0.74208000  1 37.5 20.7 64.6
11  -0.69280000  1 37.5 20.7 64.6
38  -2.42900000  1 37.5 20.7 64.6
39  -0.25732000  1 37.5 20.7 64.6
40  -0.49689000  1 37.5 20.7 64.6
41  -0.11556000  1 37.5 20.7 64.6
42   0.91036000  1 37.5 20.7 64.6
43  -0.24766000  1 37.5 20.7 64.6
44  -0.14962000  1 37.5 20.7 64.6
45  -0.45651000  1 37.5 20.7 64.6
48   0.53237000  2 58.5 23.0 53.4
49  -0.53284000  2 58.5 23.0 53.4
50  -0.33086000  2 58.5 23.0 53.4
51  -0.56355000  2 58.5 23.0 53.4
52   0.00883120  2 58.5 23.0 53.4
53  -1.00650000  2 58.5 23.0 53.4
80   0.85810000  2 58.5 23.0 53.4
81  -0.71715000  2 58.5 23.0 53.4
82   0.44346000  2 58.5 23.0 53.4
83   1.09890000  2 58.5 23.0 53.4
84   0.98726000  2 58.5 23.0 53.4
85   0.19667000  2 58.5 23.0 53.4
86  -1.32570000  2 58.5 23.0 53.4
89  -4.56920000  3 43.5 26.7 66.2
90   0.75174000  3 43.5 26.7 66.2
91   0.40935000  3 43.5 26.7 66.2
92   0.18340000  3 43.5 26.7 66.2
93   0.27399000  3 43.5 26.7 66.2
94  -0.23596000  3 43.5 26.7 66.2
95  -1.59460000  3 43.5 26.7 66.2
96  -0.03708900  3 43.5 26.7 66.2
97   0.68750000  3 43.5 26.7 66.2
98  -0.47979000  3 43.5 26.7 66.2
125  2.23200000  3 43.5 26.7 66.2
126  0.90470000  3 43.5 26.7 66.2
127 -0.34493000  3 43.5 26.7 66.2
128 -0.02114400  3 43.5 26.7 66.2
129 -1.08830000  3 43.5 26.7 66.2
130 -0.33937000  3 43.5 26.7 66.2
131  1.19820000  3 43.5 26.7 66.2
132  0.81653000  3 43.5 26.7 66.2
133  1.61810000  3 43.5 26.7 66.2
134  0.42914000  3 43.5 26.7 66.2
135 -1.03150000  3 43.5 26.7 66.2
...

I want to plot the variable CWRES versus ID, AGE, BMI and WGT. To do this I use this code:

library(ggplot2)

plotloop <- function(x, na.rm = TRUE, ...) {
  nm <- names(x)
  for (i in seq_along(nm)) {
    print(ggplot(x, aes_string(x = nm[i], y = nm[1])) + geom_point())
  }
}

plotloop(tab3)

However, it also plots CWRES vs CWRES and I do not want to plot CWRES vs CWRES. What should I do?

Thanks in advance,
Mario
The loop may not be the best way to go about plotting multiple plots with ggplot. Hence I will not try to fix your code but suggest an alternative route. You should first melt your data.frame to transform it to long format, keeping only CWRES as the id variable:

require(reshape2)
mDF <- melt(tab3, id.vars = "CWRES")

Now you can create your plots as follows:

g <- ggplot(mDF, aes(x = CWRES, y = value))
g <- g + geom_point()
g <- g + facet_grid(. ~ variable)
g

This creates a faceted plot with the four scatter plots next to each other. If you really want multiple separate plots, I would proceed as follows (based on the formatting above):

variables <- unique(mDF$variable)
for (v in variables) {
  print(ggplot(mDF[mDF$variable == v, ], aes(x = CWRES, y = value)) + geom_point())
}
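One caveat (my addition, not part of the original answer): ID, AGE, BMI and WGT sit on very different numeric ranges, so a shared fixed axis can squash some panels. With facet_wrap the per-panel axis can be freed, roughly like this:

g <- ggplot(mDF, aes(x = value, y = CWRES)) +   # covariate on x, CWRES on y, as in the question
  geom_point() +
  facet_wrap(~ variable, scales = "free_x")     # each covariate keeps its own x-axis range
g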
Finally I found the solution. This code works great!

for (i in names(tab3)[2:5]) {
  print(ggplot(tab3) +
          geom_point(aes_string(x = i, y = "CWRES")) +
          theme_bw())
}
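A side note (not part of the original post): aes_string() has been deprecated in recent ggplot2 releases, so the same loop can be written with the .data pronoun; a sketch that should produce the same plots:

library(ggplot2)
for (i in names(tab3)[2:5]) {
  print(ggplot(tab3) +
          geom_point(aes(x = .data[[i]], y = CWRES)) +   # tidy-eval pronoun instead of aes_string()
          theme_bw())
}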