using rbind to create dataframe is not working - r

I am trying to write a script that finds specific solutions of the equation 25a+20b=1600, with a in the range 24:60 and b in 20:50.
I need the pairs of a and b that satisfy the equation.
My first problem was how to define a and b with a single decimal place (a=24.0, 24.1, 24.2, etc.), but I got around that by defining a<-c(240:600)/10. So my first question is: is there a more direct way to do that?
Next, I wrote a pair of nested loops, and each time the equation is satisfied I get the pair as a vector. I want to use rbind() to append this vector to a matrix or a data frame, but it does not work, and there is no error or warning: the result just holds the value of the first vector and that's it!
Here is my code. Can someone help me find where the problem is?
solve_ms <- function() {
  index<-1
  sol<-data.frame()
  temp<-vector("numeric")
  a<-c(240:600)/10
  b<-c(200:500)/10
  for (i in 1:length(a)){
    for (j in 1:length(b)) {
      c <- 25*a[i]+20*b[j]
      if(c == 1600) {
        temp<-c(a[i], b[j])
        if(index == 1) {
          sol<-temp
          index<-0
        }
        else rbind(sol,temp)
      }
    }
  }
  return(sol)
}
I found out where the problem in my code is: I was calling rbind without assigning its return value to anything. I had to write {sol<-rbind(sol,temp)}, and then it works.
I will check the other suggestions as well. Thanks.
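For reference, here is a minimal sketch of the loop with the rbind result actually kept. It is not the asker's final code: it collects matches in a list and binds them once at the end, and it swaps the exact == test for a tolerance check (the 1e-8 threshold is an arbitrary assumption) so that floating-point rounding does not hide valid pairs.

```r
solve_ms <- function(tol = 1e-8) {   # tol is an assumed tolerance, not from the question
  a <- (240:600) / 10
  b <- (200:500) / 10
  rows <- list()                     # matches are collected here
  for (ai in a) {
    for (bj in b) {
      if (abs(25 * ai + 20 * bj - 1600) < tol) {
        rows[[length(rows) + 1]] <- c(a = ai, b = bj)
      }
    }
  }
  do.call(rbind, rows)               # one rbind at the end; the result is a matrix
}

sol <- solve_ms()
head(sol)
```

Binding once at the end avoids both the forgotten assignment and the special case for the first match.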

Try this instead:
#define a function
fun <- function(a,b) (25*a+20*b) == 1600
Since floating point precision could be an issue:
#alternative function
fun <- function(a,b,tol=.Machine$double.eps ^ 0.5) abs(25*a+20*b-1600) < tol
#create all possible combinations
paras <- expand.grid(a=c(240:600)/10, b=20:50)
paras[fun(paras$a,paras$b),]
a b
241 48.0 20
594 47.2 21
947 46.4 22
1300 45.6 23
1653 44.8 24
2006 44.0 25
2359 43.2 26
2712 42.4 27
3065 41.6 28
3418 40.8 29
3771 40.0 30
4124 39.2 31
4477 38.4 32
4830 37.6 33
5183 36.8 34
5536 36.0 35
5889 35.2 36
6242 34.4 37
6595 33.6 38
6948 32.8 39
7301 32.0 40
7654 31.2 41
8007 30.4 42
8360 29.6 43
8713 28.8 44
9066 28.0 45
9419 27.2 46
9772 26.4 47
10125 25.6 48
10478 24.8 49
10831 24.0 50
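On the asker's first question (a direct way to get one-decimal steps), base R's seq() with a by argument is one option. The sketch below assumes b should also move in steps of 0.1, as in the question's loop, and reuses the tolerance idea from the alternative function above with an arbitrarily chosen threshold.

```r
# One-decimal grids built directly with seq()
a <- seq(24, 60, by = 0.1)
b <- seq(20, 50, by = 0.1)

# All combinations, filtered with a tolerance instead of ==
grid <- expand.grid(a = a, b = b)
hits <- grid[abs(25 * grid$a + 20 * grid$b - 1600) < 1e-8, ]
head(hits)
```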

If the problem is really this simple, i.e. finding the solutions of a linear equation in two variables, you can rearrange the equation to express b in terms of a, i.e. b = (1600-25*a)/20, compute b for every value of a, and keep only the combinations whose b is one of the allowed values.
e.g.
a = c(240:600)/10
b = 20:50
RESULTS <- data.frame(a, b = (1600 - 25 * a)/20)[((1600 - 25 * a)/20) %in% b, ]
RESULTS
## a b
## 1 24.0 50
## 9 24.8 49
## 17 25.6 48
## 25 26.4 47
## 33 27.2 46
## 41 28.0 45
## 49 28.8 44
## 57 29.6 43
## 65 30.4 42
## 73 31.2 41
## 81 32.0 40
## 97 33.6 38
## 105 34.4 37
## 121 36.0 35
## 137 37.6 33
## 145 38.4 32
## 161 40.0 30
## 177 41.6 28
## 185 42.4 27
## 193 43.2 26
## 201 44.0 25
## 209 44.8 24
## 217 45.6 23
## 225 46.4 22
## 233 47.2 21
## 241 48.0 20
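Note that the output above has 26 rows, while the expand.grid answer earlier lists 31; pairs such as a = 32.8, b = 39 are absent. The likely cause is that %in% compares floating-point values exactly. A minimal sketch of one way around this, assuming it is acceptable to work with a in integer tenths (the name a10 is introduced here for illustration): substituting a = a10/10 into the equation gives b = (640 - a10)/8, so b is a whole number exactly when 640 - a10 is divisible by 8.

```r
a10 <- 240:600                           # a in tenths, kept as integers
keep <- (640L - a10) %% 8L == 0L         # exact integer test, no rounding involved
RESULTS <- data.frame(a = a10[keep] / 10,
                      b = (640L - a10[keep]) / 8)
RESULTS <- RESULTS[RESULTS$b >= 20 & RESULTS$b <= 50, ]
nrow(RESULTS)
```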

Related

Can't load in txt. file

I am trying to read in this data but can't load it successfully. What is the problem?
height <- data.table::fread('http://www.stat.nthu.edu.tw/~swcheng/Teaching/stat5410/data/height.txt')
height
Warning message:
In data.table::fread("http://www.stat.nthu.edu.tw/~swcheng/Teaching/stat5410/data/height.txt") :
Stopped early on line 3. Expected 5 fields but found 3. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<62 65.5 2>>
Its header is pretty weird. You may try this (there might be better options):
x <- read.table(url('http://www.stat.nthu.edu.tw/~swcheng/Teaching/stat5410/data/height.txt'), skip = 2, header = FALSE)
names(x) <- c("Height of Father", "Average Height of Son", "# of Fathers")
x
Height of Father Average Height of Son # of Fathers
1 62 65.5 2
2 63 66.5 6
3 64 66.8 12
4 65 66.8 19
5 66 67.6 27
6 67 67.8 26
7 68 68.6 26
8 69 69.1 26
9 70 69.5 20
10 71 70.6 15
11 72 70.3 8
12 73 72.0 5
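The same skip-the-header approach can be tried without network access. This is a minimal sketch, assuming a file whose first two lines are a wrapped header; the raw text below is a made-up stand-in, not the real file, and the column names are hypothetical syntactic alternatives to the ones above.

```r
# Made-up stand-in for a file with a two-line header (assumption)
raw <- c("Height of   Average Height   No. of",
         "Father      of Son           Fathers",
         "62 65.5 2",
         "63 66.5 6")

x <- read.table(text = raw, skip = 2, header = FALSE,
                col.names = c("father_height", "son_avg_height", "n_fathers"))
x
```

Syntactic names avoid the need for backticks when the columns are used later, e.g. x$son_avg_height.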

Save content in web as data.frame

I want to grab the content at this URL, where the original data come in simple rows and columns. I tried readHTMLTable and it does not work. Using web scraping with XPath, how can I get clean data without the '\n...' and keep it in a data.frame? Is this possible without saving to CSV? Kindly help me improve my code. Thank you.
library(rvest)
library(dplyr)
page <- read_html("http://weather.uwyo.edu/cgi-bin/sounding?region=seasia&TYPE=TEXT%3ALIST&YEAR=2006&MONTH=09&FROM=0100&TO=0100&STNM=48657")
xpath <- '/html/body/pre[1]'
txt <- page %>% html_node(xpath=xpath) %>% html_text()
txt
[1] "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n 839.0 1624 18.8 11.8 64 10.47 225 17 307.0 338.8 308.9\n 828.0 1735 18.0 11.4 65 10.33 ... <truncated>
We can extend your base code and treat the web page as an API endpoint since it takes parameters:
library(httr)
library(rvest)
I use more packages than those two below, but I call them via :: so as not to pollute the namespace.
I'd usually end up writing a small, parameterized function, or a small package with a couple of parameterized functions, to encapsulate the logic below.
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = "seasia",
TYPE = "TEXT:LIST",
YEAR = "2006",
MONTH = "09",
FROM = "0100",
TO = "0100",
STNM = "48657"
)
) -> res
^^ makes the web page request and gathers the response.
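As a rough idea of what a parameterized version could start from, here is a sketch of a helper that only builds the query list. The function name sounding_query and its arguments are hypothetical, not part of httr or of the site; the parameter names and formats are taken from the request above.

```r
# Hypothetical helper: builds the list passed as `query =` to httr::GET()
sounding_query <- function(year, month, from, to = from, stnm, region = "seasia") {
  list(
    region = region,
    TYPE   = "TEXT:LIST",
    YEAR   = sprintf("%04d", as.integer(year)),
    MONTH  = sprintf("%02d", as.integer(month)),  # zero-padded, e.g. "09"
    FROM   = from,
    TO     = to,
    STNM   = as.character(stnm)
  )
}

q <- sounding_query(2006, 9, from = "0100", stnm = 48657)
str(q)
```

The result could then be passed as query = q in a call like the one above.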
httr::content(res, as="parsed") %>%
html_nodes("pre") -> wx_dat
^^ turns it into an html_document
Now, we extract the readings:
html_text(wx_dat[[1]]) %>% # turn the first <pre> node into text
strsplit("\n") %>% # split it into lines
unlist() %>% # turn it back into a character vector
{ col_names <<- .[3]; . } %>% # pull out the column names (we'll use them later)
.[-(1:5)] %>% # strip off the header
paste0(collapse="\n") -> readings # turn it back into a big text blob
^^ cleans up the table so that readr::read_table() can parse it. We'll also turn the extracted column names into the actual column names:
readr::read_table(readings, col_names = tolower(unlist(strsplit(trimws(col_names), " +"))))
## # A tibble: 106 x 11
## pres hght temp dwpt relh mixr drct sknt thta thte thtv
## <dbl> <int> <dbl> <dbl> <int> <dbl> <int> <int> <dbl> <dbl> <dbl>
## 1 1009 16 23.8 22.7 94 17.6 170 2 296. 347. 299.
## 2 1002 78 24.6 21.6 83 16.5 252 4 298. 346. 300.
## 3 1000 96 24.4 21.3 83 16.2 275 4 298. 345. 300.
## 4 962 434 22.9 20 84 15.6 235 10 299. 345 302.
## 5 925 777 21.4 18.7 85 14.9 245 11 301. 345. 304.
## 6 887 1142 20.3 16 76 13.0 255 15 304. 343. 306.
## 7 850 1512 19.2 13.2 68 11.3 230 17 306. 341. 308.
## 8 839 1624 18.8 11.8 64 10.5 225 17 307 339. 309.
## 9 828 1735 18 11.4 65 10.3 220 17 307. 339. 309.
## 10 789 2142 15.1 10 72 9.84 205 16 308. 339. 310.
## # ... with 96 more rows
You didn't say you wanted the station metadata, but we can get that too (it is in the second <pre>):
html_text(wx_dat[[2]]) %>%
strsplit("\n") %>%
unlist() %>%
trimws() %>% # get rid of whitespace
.[-1] %>% # blank line removal
strsplit(": ") %>% # separate field and value
lapply(function(x) setNames(as.list(x), c("measure", "value"))) %>% # make each pair a named list
dplyr::bind_rows() -> metadata # turn it into a data frame
metadata
## # A tibble: 30 x 2
## measure value
## <chr> <chr>
## 1 Station identifier WMKD
## 2 Station number 48657
## 3 Observation time 060901/0000
## 4 Station latitude 3.78
## 5 Station longitude 103.21
## 6 Station elevation 16.0
## 7 Showalter index 0.34
## 8 Lifted index -1.40
## 9 LIFT computed using virtual temperature -1.63
## 10 SWEAT index 195.39
## # ... with 20 more rows
Your data is truncated, so I'll work with what I can:
txt <- "\n-----------------------------------------------------------------------------\n PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV\n hPa m C C % g/kg deg knot K K K \n-----------------------------------------------------------------------------\n 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3\n 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5\n 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4\n 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1\n 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9\n 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1\n 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3\n"
It appears to be fixed-width, with lines compacted into a single string using the \n delimiter, so let's split it up:
strsplit(txt, "\n")
# [[1]]
# [1] ""
# [2] "-----------------------------------------------------------------------------"
# [3] " PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV"
# [4] " hPa m C C % g/kg deg knot K K K "
# [5] "-----------------------------------------------------------------------------"
# [6] " 1009.0 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3"
# [7] " 1002.0 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5"
# [8] " 1000.0 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4"
# [9] " 962.0 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1"
# [10] " 925.0 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9"
# [11] " 887.0 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1"
# [12] " 850.0 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3"
It seems that row 1 is empty, and 2 and 5 are lines that need to be removed. Rows 3-4 appear to be the column header and units, respectively; since R doesn't allow multi-row headers, I'll remove the units, and leave it to you to save them elsewhere if you need them.
From here, it's a straightforward call (noting the [[1]] for strsplit's returned list):
read.table(text=strsplit(txt, "\n")[[1]][-c(1,2,4,5)], header=TRUE)
# PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
# 1 1009 16 23.8 22.7 94 17.56 170 2 296.2 346.9 299.3
# 2 1002 78 24.6 21.6 83 16.51 252 4 297.6 345.6 300.5
# 3 1000 96 24.4 21.3 83 16.23 275 4 297.6 344.8 300.4
# 4 962 434 22.9 20.0 84 15.56 235 10 299.4 345.0 302.1
# 5 925 777 21.4 18.7 85 14.90 245 11 301.2 345.2 303.9
# 6 887 1142 20.3 16.0 76 13.04 255 15 303.7 342.7 306.1
# 7 850 1512 19.2 13.2 68 11.34 230 17 306.2 340.6 308.3
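If the units are needed later, one option is to parse the units row separately and keep it as a named vector. A minimal sketch on a shortened, three-column version of the text (the shortened txt_small is an assumption for brevity; the row positions match the layout described above):

```r
# Shortened stand-in with the same layout: blank line, rule, names, units, rule, data
txt_small <- "\n-----\n PRES HGHT TEMP\n hPa m C \n-----\n 1009.0 16 23.8\n 1002.0 78 24.6\n"

lines <- strsplit(txt_small, "\n")[[1]]
dat <- read.table(text = lines[-c(1, 2, 4, 5)], header = TRUE)

# Row 4 holds the units; name each unit after its column
units <- setNames(strsplit(trimws(lines[4]), " +")[[1]], names(dat))
units[["PRES"]]
```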

R , how to Aggregate data with same date field in an R dataframe

Hi I have an R dataframe that looks like the following:
SURVEY.DATE A B C
1898 2010-05-13 38 34 21
1899 2010-05-13 38 33 21
1897 2010-05-14 37 34 21
1895 2010-05-21 38 29 21
1896 2010-05-21 39 32 21
1894 2010-05-23 39 32 21
I would like to average the rows with the same date so as to have only one averaged observation per day. Ideally I would like to end up with an xts object that would look like:
SURVEY.DATE A B C
1898 2010-05-13 38 33.5 21
1897 2010-05-14 37 34 21
1896 2010-05-21 38.5 30.5 21
1894 2010-05-23 39 32 21
This seems to be a challenge for my newbie R skills... any help or pointers would be appreciated.
You could try
library(dplyr)
res <- df1 %>%
group_by(SURVEY.DATE) %>%
summarise_each(funs(mean)) # deprecated in recent dplyr; summarise(across(everything(), mean)) is the current equivalent
Or
res1 <- aggregate(.~SURVEY.DATE, df1, mean)
and then convert it to xts
library(xts)
xts(res1[-1], order.by= as.Date(res1[,1]))
# A B C
#2010-05-13 38.0 33.5 21
#2010-05-14 37.0 34.0 21
#2010-05-21 38.5 30.5 21
#2010-05-23 39.0 32.0 21
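To try the aggregate() approach end to end, the question's sample rows can be rebuilt by hand. A minimal sketch in base R only; the data frame below is typed in from the question, with the original row names dropped.

```r
# Sample data from the question, re-entered manually
df1 <- data.frame(
  SURVEY.DATE = as.Date(c("2010-05-13", "2010-05-13", "2010-05-14",
                          "2010-05-21", "2010-05-21", "2010-05-23")),
  A = c(38, 38, 37, 38, 39, 39),
  B = c(34, 33, 34, 29, 32, 32),
  C = c(21, 21, 21, 21, 21, 21)
)

# One row per date, each column averaged within the date
res1 <- aggregate(. ~ SURVEY.DATE, data = df1, FUN = mean)
res1
```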
Here's how I'd do this using data.table.
require(data.table)
setDT(df)[, lapply(.SD, mean), by=SURVEY.DATE]
# SURVEY.DATE A B C
# 1: 2010-05-13 38.0 33.5 21
# 2: 2010-05-14 37.0 34.0 21
# 3: 2010-05-21 38.5 30.5 21
# 4: 2010-05-23 39.0 32.0 21
Check the new HTML vignettes if you'd like to learn more.

from ffdf to regular dataframe

Is there a way to transform an ffdf into a normal data frame, assuming that it is small enough to fit in RAM?
For example:
library(ff)
library(ffbase)
data(trees)
Girth <- ff(trees$Girth)
Height <- ff(trees$Height)
Volume <- ff(trees$Volume)
aktiv <- ff(as.factor(sample(0:1,31,replace=T)))
#Create data frame with some added parameters.
data <- ffdf(Girth=Girth,Height=Height,Volume=Volume,aktiv=aktiv)
rm(Girth,Height,Volume,trees,aktiv)
aktiv <- subset.ffdf(data, data$aktiv== "1" )
and then convert aktiv to a data frame and save it as RData
(sadly, the person waiting for the output doesn't want to learn how to work with the ff package, so I have no choice).
Thanks
Just use as.data.frame:
aktiv <- subset(as.data.frame(data), aktiv == 1)
Girth Height Volume aktiv
2 8.6 65 10.3 1
7 11.0 66 15.6 1
9 11.1 80 22.6 1
12 11.4 76 21.0 1
13 11.4 76 21.4 1
15 12.0 75 19.1 1
17 12.9 85 33.8 1
20 13.8 64 24.9 1
21 14.0 78 34.5 1
23 14.5 74 36.3 1
26 17.3 81 55.4 1
27 17.5 82 55.7 1
28 17.9 80 58.3 1
31 20.6 87 77.0 1
From here you can easily use save or write.csv, e.g.:
save(aktiv, file="aktiv.RData")

Aggregating multiple subtotals?

Is there a way to aggregate multiple sub-totals with reshape2? E.g. for the airquality dataset
require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
aqm <- subset(aqm, month %in% 5:6 & day %in% 1:7)
I can make a subtotal column for each month, that has the average for all variables within that month:
dcast(aqm, day ~ month+variable, mean, margins = "variable")
day 5_ozone 5_solar.r 5_wind 5_temp 5_(all) 6_ozone 6_solar.r
1 1 41 190 7.4 67 76.350 NaN 286
2 2 36 118 8.0 72 58.500 NaN 287
3 3 12 149 12.6 74 61.900 NaN 242
4 4 18 313 11.5 62 101.125 NaN 186
5 5 NaN NaN 14.3 56 35.150 NaN 220
6 6 28 NaN 14.9 66 36.300 NaN 264
7 7 23 299 8.6 65 98.900 29 127
6_wind 6_temp 6_(all)
1 8.6 78 124.20000
2 9.7 74 123.56667
3 16.1 67 108.36667
4 9.2 84 93.06667
5 8.6 85 104.53333
6 14.3 79 119.10000
7 9.7 82 61.92500
I can also make a subtotal column for each variable, that has the average for all months within that variable:
dcast(aqm, day ~ variable+month, mean, margins = "month")
day ozone_5 ozone_6 ozone_(all) solar.r_5 solar.r_6 solar.r_(all)
1 1 41 NaN 41 190 286 238.0
2 2 36 NaN 36 118 287 202.5
3 3 12 NaN 12 149 242 195.5
4 4 18 NaN 18 313 186 249.5
5 5 NaN NaN NaN NaN 220 220.0
6 6 28 NaN 28 NaN 264 264.0
7 7 23 29 26 299 127 213.0
wind_5 wind_6 wind_(all) temp_5 temp_6 temp_(all)
1 7.4 8.6 8.00 67 78 72.5
2 8.0 9.7 8.85 72 74 73.0
3 12.6 16.1 14.35 74 67 70.5
4 11.5 9.2 10.35 62 84 73.0
5 14.3 8.6 11.45 56 85 70.5
6 14.9 14.3 14.60 66 79 72.5
7 8.6 9.7 9.15 65 82 73.5
Is there a way to tell reshape2 to calculate both sets of subtotals in one command? This command is close, adding in the grand total, but omits the monthly subtotals:
dcast(aqm, day ~ variable+month, mean, margins = c("variable", "month"))
If I get your question right, you can use
acast(aqm, day ~ variable ~ month, mean, margins = c("variable", "month"))[,,'(all)']
The acast gets you the summary for each day over each variable over each month. The total aggregate "slice" ([,,'(all)']) has a row for each day, with a column for each variable (averaged over all months) and a '(all)' column averaging each day, over all variables over all months.
Is this what you needed?
