I'd like to have the output of an R command shown in a horizontally scrolling box. Reprex:
library(ggplot2movies)
head(movies)
# title year length budget rating votes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa Action Animation Comedy Drama Documentary Romance Short
# 1 $ 1971 121 NA 6.4 348 4.5 4.5 4.5 4.5 14.5 24.5 24.5 14.5 4.5 4.5 0 0 1 1 0 0 0
# 2 $1000 a Touchdown 1939 71 NA 6.0 20 0.0 14.5 4.5 24.5 14.5 14.5 14.5 4.5 4.5 14.5 0 0 1 0 0 0 0
# 3 $21 a Day Once a Month 1941 7 NA 8.2 5 0.0 0.0 0.0 0.0 0.0 24.5 0.0 44.5 24.5 24.5 0 1 0 0 0 0 1
# 4 $40,000 1996 70 NA 8.2 6 14.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 34.5 45.5 0 0 1 0 0 0 0
# 5 $50,000 Climax Show, The 1975 71 NA 3.4 17 24.5 4.5 0.0 14.5 14.5 4.5 0.0 0.0 0.0 24.5 0 0 0 0 0 0 0
# 6 $pent 2000 91 NA 4.3 45 4.5 4.5 4.5 14.5 14.5 14.5 4.5 4.5 14.5 14.5 0 0 0 1 0 0 0
How do I make the output horizontally scrollable on a xaringan slide?
Yihui Xie has pretty much provided the answer on GitHub. I'm just turning it into a working example here. Things to note:
1) One can specify CSS in code chunks in R Markdown, or write a separate CSS file following these guidelines: https://github.com/yihui/xaringan/wiki. I'm assuming this is a one-off, so for simplicity I'm including the CSS in the Rmd file.
2) After setting attributes for the `pre` element, one also needs to set the `width` option of R to a large value, otherwise `head` will wrap the output for you.
---
title: "Horizontal scroll for wide output"
output:
  xaringan::moon_reader:
    css: ["default"]
    nature:
      highlightLines: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{css, echo=FALSE}
pre {
  background: #FFBB33;
  max-width: 100%;
  overflow-x: scroll;
}
```
```{r}
library(ggplot2movies)
op <- options(width = 250) # widen R's output so head() doesn't wrap; otherwise see next slide
head(movies)
options(op) # restore the previous width
```
---
```{r}
head(movies) # default width: the text gets wrapped, though you can still scroll horizontally thanks to the `pre` rule
```
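If you'd rather not set and restore `options()` by hand, knitr can scope the width to a single chunk via the `R.options` chunk option. A minimal sketch, assuming a reasonably recent knitr:

```{r, R.options = list(width = 250)}
head(movies)  # printed at width 250 for this chunk only
```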
I've got a (pretty simple) piece of code to download a table of data:
library(rvest)
link = "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics"
aguada = read_html(link)
stats = aguada %>% html_nodes("tbody")
stats = aguada %>% html_nodes(xpath="/html/body/div[1]/div[6]/div/div/div/div[4]/table") %>% html_table()
my_df <- as.data.frame(stats)
And now I'm trying to do the same, but for the URLs of each player in the same table:
for (i in 1:17){
url_path="/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr[i]/td[1]/a"
jugador[i] = aguada %>% html_nodes(xpath=url_path)%>% html_attr("href")
}
I've tried the code above; it doesn't crash, but it doesn't work as intended either. I want to end up with an array of the URLs (or something like that) so I can then get the stats for each player easily. While we're at it, I'd also like to know whether, instead of writing 1:17 in the for and manually counting the players, there's a way to automate that too, so I can do something like for i in 1:table_length.
You need to initialise the vector jugador before you can assign links into it. Also, inside the quoted string i is just a character; to build a path that changes with each iteration, use paste to concatenate the string pieces with the number i, as shown below (a sketch after the results also shows how to avoid hard-coding 17):
jugador <- vector()
for(i in 1:17){
url_path <- paste("/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr[", i, "]/td[1]/a", sep = "")
jugador[i] <- aguada %>% html_nodes(xpath=url_path)%>% html_attr("href")
}
Result:
> jugador
[1] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/15257?"
[2] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/17101?"
[3] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/17554?"
[4] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/43225?"
[5] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/262286?"
[6] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/623893?"
[7] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/725720?"
[8] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/858052?"
[9] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1645559?"
[10] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1651515?"
[11] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1717089?"
[12] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1924883?"
[13] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1924884?"
[14] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1931124?"
[15] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1950388?"
[16] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1971299?"
[17] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1991297?"
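To avoid hard-coding 1:17, you can count the table's player rows first and loop over that instead. A sketch under the same XPath assumptions (the names filas and n are mine):

filas <- aguada %>%
  html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr")
n <- length(filas)  # number of player rows actually present

jugador <- vector("character", n)
for (i in seq_len(n)) {
  url_path <- paste("/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr[", i, "]/td[1]/a", sep = "")
  jugador[i] <- aguada %>% html_nodes(xpath = url_path) %>% html_attr("href")
}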
Alternatively, the links can be pulled without a loop and attached as the last column of the table:
library(tidyverse)
library(rvest)
page <-
"https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics" %>%
read_html()
df <- page %>%
html_table() %>%
pluck(1) %>%
janitor::clean_names() %>%
mutate(link = page %>%
html_elements("td a") %>%
html_attr("href") %>%
unique())
# A tibble: 17 x 21
jugador p i pts_pr pts as_pr as ro_pr rd_pr rt_pr rt bl_prom bl re_pr re min_pr tc_percent x2p_percent x3p_percent tl_percent link$value
<chr> <int> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 F. MEDINA 22 9 6 131 1.3 29 0.5 0.8 1.3 28 0 0 0.6 13 22 37 55.6 26.8 60 https://hosted.dcd.share~
2 J. SANTISO 23 23 12 277 5.6 128 0.4 2.9 3.3 75 0 0 0.7 15 31 43.1 43.2 43 75 https://hosted.dcd.share~
3 A. ZUVICH 17 1 8.2 139 0.7 11 2 2.9 4.9 83 0.5 8 1.1 19 15.9 59.8 67.1 16.7 76.5 https://hosted.dcd.share~
4 A. YOUNG 15 14 12.5 187 1.3 20 0.4 3.3 3.7 55 0.5 7 0.6 9 30.5 36.2 41.9 32 78.8 https://hosted.dcd.share~
5 E. VARGAS 23 23 16.1 370 1.9 44 3.5 8.4 11.9 273 1.6 37 1.1 25 30.3 53.3 53.5 0 62.6 https://hosted.dcd.share~
6 L. PLANELLS 23 0 3.6 83 1.6 37 0.5 1.1 1.6 37 0.1 2 0.7 17 15.1 35.4 35.1 35.6 90 https://hosted.dcd.share~
7 T. METZGER 11 9 6.8 75 0.6 7 1.7 3.3 5 55 0.4 4 0.5 5 23.1 37 44.2 28.9 40 https://hosted.dcd.share~
8 L. SILVA 19 0 1.1 21 0.1 2 0.2 0.2 0.3 6 0.1 1 0 0 4 35 71.4 15.4 100 https://hosted.dcd.share~
9 J. STOLL 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1.2 0 0 0 0 https://hosted.dcd.share~
10 G. BRUN 4 0 0.8 3 0 0 0.3 0 0.3 1 0 0 0 0 0.6 50 0 50 0 https://hosted.dcd.share~
11 A. GENTILE 3 0 0 0 0 0 0.3 0.3 0.7 2 0 0 0 0 1 0 0 0 0 https://hosted.dcd.share~
12 L. CERMINATO 19 5 8.6 163 1.7 33 1.3 3.6 4.9 93 0.7 14 0.9 17 20.9 44.1 51.9 27.1 57.1 https://hosted.dcd.share~
13 J. ADAMS 8 8 16.6 133 1.9 15 1 2.5 3.5 28 0.3 2 1.9 15 28.9 46.2 53.9 26.7 81.8 https://hosted.dcd.share~
14 K. FULLER 5 5 4.6 23 1.8 9 0.6 0.6 1.2 6 0 0 0.4 2 20.1 17.1 0 28.6 83.3 https://hosted.dcd.share~
15 S. MAC 4 4 12.5 50 2 8 0 3 3 12 0.5 2 1.8 7 29.9 37.8 35.5 42.9 76.9 https://hosted.dcd.share~
16 O. JOHNSON 12 12 15.4 185 3.4 41 1 3.2 4.2 50 0.3 4 0.8 9 31.8 47.3 53.6 34.7 75 https://hosted.dcd.share~
17 G. SOLANO 2 2 15.5 31 6.5 13 0.5 5.5 6 12 0 0 1 2 32.4 41.4 55.6 18.2 71.4 https://hosted.dcd.share~
Inside the string, i is just a regular character; XPath knows nothing about R, so it has no connection to the variables in your session.
However, if you want to select all elements with a given XPath, you don’t need the index at all. That is, the following XPath expression works (I’ve simply removed the [i] part):
/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a
Here’s the corresponding ‘rvest’ code. Note that it uses no loop:
library(rvest)
link = "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics"
aguada = read_html(link)
jugador = aguada %>%
    html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a/@href")
Or, alternatively:
jugador = aguada %>%
html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a") %>%
html_attr("href")
Both return the hrefs. The first solution has a slightly different return type (an xml_nodeset of attribute nodes rather than a character vector), but for most purposes they behave similarly.
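If you use the @href form and want a plain character vector, a small sketch of the conversion (html_text calls xml_text under the hood, which extracts an attribute node's value):

jugador_chr <- jugador %>% html_text()  # attribute nodes -> character vector of hrefs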
I want to do a logistic regression to calculate the probability of a student pursuing a master's degree.
I have a dataset containing many students who have taken certain courses in certain years. Each course also receives a rating (as does the tutor), and this rating is course- and year-specific.
These students may or may not do a master's at the same university. Based on the results a student gets, the rating a course gets, and the number of resits a student does, I want to predict the probability of a student pursuing a master's.
To do so, I want to run a logistic regression, and hence I need to split the data into a training and a validation/test set. However, as you can see, multiple rows can revolve around the same student; e.g. rows 1 to 12 all belong to student 9000006.
The problem when doing a logistic regression now is that the regression sees every row as a separate unit, while in fact the rows are grouped by student.
Programme Resits Student_ID Course_code Academic_year Course_Grade_Binned Graduated Master_Student Course.rating_M Rating.tutor_M Selfstudy_M
1 IB 0 9000006 ABC1198 2013 B TRUE 1 7.5 8.2 14.1
2 IB 0 9000006 ABC1192 2014 B TRUE 1 8.4 8.8 13.0
3 IB 0 9000006 ABC1277 2014 A TRUE 1 6.0 6.4 10.6
4 IB 0 9000006 ABC1448 2013 B TRUE 1 5.7 7.8 14.4
5 IB 0 9000006 ABC1120 2014 B TRUE 1 7.1 7.4 11.2
6 IB 0 9000006 ABC1362 2013 B TRUE 1 6.7 7.5 15.8
7 IB 0 9000006 ABC1213 2013 C TRUE 1 7.7 8.1 11.4
8 IB 0 9000006 ABC1382 2013 B TRUE 1 6.6 7.1 16.3
9 IB 0 9000006 ABC1108 2013 C TRUE 1 7.1 7.6 15.7
10 IB 1 9000006 ABC1329 2014 B TRUE 1 7.5 7.9 10.7
11 IB 0 9000006 ABC1126 2013 B TRUE 1 6.7 7.5 15.3
12 IB 0 9000006 ABC1003 2013 B TRUE 1 7.3 8.5 12.6
13 IB 0 9000014 ABC1309 2014 B TRUE 0 6.9 6.1 12.4
14 IB 0 9000014 ABC1198 2013 A TRUE 0 7.5 8.2 14.1
15 IB 0 9000014 ABC1277 2014 A TRUE 0 6.0 6.4 10.6
16 IB 0 9000014 ABC1448 2013 A TRUE 0 5.7 7.8 14.4
17 IB 0 9000014 ABC1362 2013 B TRUE 0 6.7 7.5 15.8
18 IB 0 9000014 ABC1213 2013 B TRUE 0 7.7 8.1 11.4
19 IB 0 9000014 ABC1152 2014 A TRUE 0 7.0 7.6 12.3
20 IB 0 9000014 ABC1382 2013 A TRUE 0 6.6 7.1 16.3
21 IB 0 9000014 ABC1108 2013 B TRUE 0 7.1 7.6 15.7
22 IB 0 9000014 ABC1455 2014 A TRUE 0 6.7 7.3 11.2
23 IB 0 9000014 ABC1126 2013 B TRUE 0 6.7 7.5 15.3
24 IB 0 9000014 ABC1003 2013 A TRUE 0 7.3 8.5 12.6
25 IB 1 9000028 ABC1213 2014 C TRUE 0 7.8 8.6 10.7
26 IB 0 9000028 ABC1198 2014 B TRUE 0 7.1 8.0 15.5
Does anyone have any tips on how to perform a logistic regression on this kind of data? If you have another suggestion for calculating the probability of a student pursuing a master's, please let me know as well :)
Cheers!
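One common way to respect the grouping when splitting is to sample students rather than rows, so all of a student's courses land in the same partition. A minimal sketch, not a full answer, assuming the data frame is named students and uses the column names shown above:

set.seed(42)

# sample students, not rows, so each student's courses stay together
ids <- unique(students$Student_ID)
train_ids <- sample(ids, size = floor(0.7 * length(ids)))

train <- students[students$Student_ID %in% train_ids, ]
test  <- students[!(students$Student_ID %in% train_ids), ]

# row-level logistic regression on the training set; a mixed model such as
# lme4::glmer(Master_Student ~ ... + (1 | Student_ID), family = binomial)
# would model the within-student grouping explicitly
fit <- glm(Master_Student ~ Resits + Course.rating_M + Rating.tutor_M + Selfstudy_M,
           data = train, family = binomial)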
So, I have the following data.frame, and I want to generate two plots in one graph of yval vs. xval, one per zval and condition cond.
> df
xval yval se zval cond
1 1.0 1.831564e-02 1.831564e-03 0 a
2 1.2 2.705185e-02 2.705185e-03 0 a
3 1.4 3.916390e-02 3.916390e-03 0 a
4 1.6 5.557621e-02 5.557621e-03 0 a
5 1.8 7.730474e-02 7.730474e-03 0 a
6 2.0 1.053992e-01 1.053992e-02 0 a
7 2.2 1.408584e-01 1.408584e-02 0 a
8 2.4 1.845195e-01 1.845195e-02 0 a
9 2.6 2.369278e-01 2.369278e-02 0 a
10 2.8 2.981973e-01 2.981973e-02 0 a
11 3.0 3.678794e-01 3.678794e-02 0 a
12 3.2 4.448581e-01 4.448581e-02 0 a
13 3.4 5.272924e-01 5.272924e-02 0 a
14 3.6 6.126264e-01 6.126264e-02 0 a
15 3.8 6.976763e-01 6.976763e-02 0 a
16 4.0 7.788008e-01 7.788008e-02 0 a
17 4.2 8.521438e-01 8.521438e-02 0 a
18 4.4 9.139312e-01 9.139312e-02 0 a
19 4.6 9.607894e-01 9.607894e-02 0 a
20 4.8 9.900498e-01 9.900498e-02 0 a
21 5.0 1.000000e+00 1.000000e-01 0 a
22 5.2 9.900498e-01 9.900498e-02 0 a
23 5.4 9.607894e-01 9.607894e-02 0 a
24 5.6 9.139312e-01 9.139312e-02 0 a
25 5.8 8.521438e-01 8.521438e-02 0 a
26 6.0 7.788008e-01 7.788008e-02 0 a
27 6.2 6.976763e-01 6.976763e-02 0 a
28 6.4 6.126264e-01 6.126264e-02 0 a
29 6.6 5.272924e-01 5.272924e-02 0 a
30 6.8 4.448581e-01 4.448581e-02 0 a
31 7.0 3.678794e-01 3.678794e-02 0 a
32 7.2 2.981973e-01 2.981973e-02 0 a
33 7.4 2.369278e-01 2.369278e-02 0 a
34 7.6 1.845195e-01 1.845195e-02 0 a
35 7.8 1.408584e-01 1.408584e-02 0 a
36 8.0 1.053992e-01 1.053992e-02 0 a
37 8.2 7.730474e-02 7.730474e-03 0 a
38 8.4 5.557621e-02 5.557621e-03 0 a
39 8.6 3.916390e-02 3.916390e-03 0 a
40 8.8 2.705185e-02 2.705185e-03 0 a
41 9.0 1.831564e-02 1.831564e-03 0 a
42 9.2 1.215518e-02 1.215518e-03 0 a
43 9.4 7.907054e-03 7.907054e-04 0 a
44 9.6 5.041760e-03 5.041760e-04 0 a
45 9.8 3.151112e-03 3.151112e-04 0 a
46 10.0 1.930454e-03 1.930454e-04 0 a
47 1.0 3.726653e-06 7.453306e-07 0 b
48 1.2 9.929504e-06 1.985901e-06 0 b
49 1.4 2.541935e-05 5.083869e-06 0 b
50 1.6 6.252150e-05 1.250430e-05 0 b
51 1.8 1.477484e-04 2.954967e-05 0 b
52 2.0 3.354626e-04 6.709253e-05 0 b
53 2.2 7.318024e-04 1.463605e-04 0 b
54 2.4 1.533811e-03 3.067621e-04 0 b
55 2.6 3.088715e-03 6.177431e-04 0 b
56 2.8 5.976023e-03 1.195205e-03 0 b
57 3.0 1.110900e-02 2.221799e-03 0 b
58 3.2 1.984109e-02 3.968219e-03 0 b
59 3.4 3.404745e-02 6.809491e-03 0 b
60 3.6 5.613476e-02 1.122695e-02 0 b
61 3.8 8.892162e-02 1.778432e-02 0 b
62 4.0 1.353353e-01 2.706706e-02 0 b
63 4.2 1.978987e-01 3.957974e-02 0 b
64 4.4 2.780373e-01 5.560746e-02 0 b
65 4.6 3.753111e-01 7.506222e-02 0 b
66 4.8 4.867523e-01 9.735045e-02 0 b
67 5.0 6.065307e-01 1.213061e-01 0 b
68 5.2 7.261490e-01 1.452298e-01 0 b
69 5.4 8.352702e-01 1.670540e-01 0 b
70 5.6 9.231163e-01 1.846233e-01 0 b
71 5.8 9.801987e-01 1.960397e-01 0 b
72 6.0 1.000000e+00 2.000000e-01 0 b
73 6.2 9.801987e-01 1.960397e-01 0 b
74 6.4 9.231163e-01 1.846233e-01 0 b
75 6.6 8.352702e-01 1.670540e-01 0 b
76 6.8 7.261490e-01 1.452298e-01 0 b
77 7.0 6.065307e-01 1.213061e-01 0 b
78 7.2 4.867523e-01 9.735045e-02 0 b
79 7.4 3.753111e-01 7.506222e-02 0 b
80 7.6 2.780373e-01 5.560746e-02 0 b
81 7.8 1.978987e-01 3.957974e-02 0 b
82 8.0 1.353353e-01 2.706706e-02 0 b
83 8.2 8.892162e-02 1.778432e-02 0 b
84 8.4 5.613476e-02 1.122695e-02 0 b
85 8.6 3.404745e-02 6.809491e-03 0 b
86 8.8 1.984109e-02 3.968219e-03 0 b
87 9.0 1.110900e-02 2.221799e-03 0 b
88 9.2 5.976023e-03 1.195205e-03 0 b
89 9.4 3.088715e-03 6.177431e-04 0 b
90 9.6 1.533811e-03 3.067621e-04 0 b
91 9.8 7.318024e-04 1.463605e-04 0 b
92 10.0 3.354626e-04 6.709253e-05 0 b
93 1.0 6.065307e-01 1.819592e-01 1 a
94 1.2 7.261490e-01 2.178447e-01 1 a
95 1.4 8.352702e-01 2.505811e-01 1 a
96 1.6 9.231163e-01 2.769349e-01 1 a
97 1.8 9.801987e-01 2.940596e-01 1 a
98 2.0 1.000000e+00 3.000000e-01 1 a
99 2.2 9.801987e-01 2.940596e-01 1 a
100 2.4 9.231163e-01 2.769349e-01 1 a
101 2.6 8.352702e-01 2.505811e-01 1 a
102 2.8 7.261490e-01 2.178447e-01 1 a
103 3.0 6.065307e-01 1.819592e-01 1 a
104 3.2 4.867523e-01 1.460257e-01 1 a
105 3.4 3.753111e-01 1.125933e-01 1 a
106 3.6 2.780373e-01 8.341119e-02 1 a
107 3.8 1.978987e-01 5.936961e-02 1 a
108 4.0 1.353353e-01 4.060058e-02 1 a
109 4.2 8.892162e-02 2.667649e-02 1 a
110 4.4 5.613476e-02 1.684043e-02 1 a
111 4.6 3.404745e-02 1.021424e-02 1 a
112 4.8 1.984109e-02 5.952328e-03 1 a
113 5.0 1.110900e-02 3.332699e-03 1 a
114 5.2 5.976023e-03 1.792807e-03 1 a
115 5.4 3.088715e-03 9.266146e-04 1 a
116 5.6 1.533811e-03 4.601432e-04 1 a
117 5.8 7.318024e-04 2.195407e-04 1 a
118 6.0 3.354626e-04 1.006388e-04 1 a
119 6.2 1.477484e-04 4.432451e-05 1 a
120 6.4 6.252150e-05 1.875645e-05 1 a
121 6.6 2.541935e-05 7.625804e-06 1 a
122 6.8 9.929504e-06 2.978851e-06 1 a
123 7.0 3.726653e-06 1.117996e-06 1 a
124 7.2 1.343812e-06 4.031437e-07 1 a
125 7.4 4.655716e-07 1.396715e-07 1 a
126 7.6 1.549753e-07 4.649259e-08 1 a
127 7.8 4.956405e-08 1.486922e-08 1 a
128 8.0 1.522998e-08 4.568994e-09 1 a
129 8.2 4.496349e-09 1.348905e-09 1 a
130 8.4 1.275408e-09 3.826223e-10 1 a
131 8.6 3.475891e-10 1.042767e-10 1 a
132 8.8 9.101471e-11 2.730441e-11 1 a
133 9.0 2.289735e-11 6.869205e-12 1 a
134 9.2 5.534610e-12 1.660383e-12 1 a
135 9.4 1.285337e-12 3.856012e-13 1 a
136 9.6 2.867975e-13 8.603925e-14 1 a
137 9.8 6.148396e-14 1.844519e-14 1 a
138 10.0 1.266417e-14 3.799250e-15 1 a
139 1.0 2.096114e-01 1.676891e-02 1 b
140 1.2 2.664683e-01 2.131746e-02 1 b
141 1.4 3.320399e-01 2.656320e-02 1 b
142 1.6 4.055545e-01 3.244436e-02 1 b
143 1.8 4.855369e-01 3.884295e-02 1 b
144 2.0 5.697828e-01 4.558263e-02 1 b
145 2.2 6.554063e-01 5.243250e-02 1 b
146 2.4 7.389685e-01 5.911748e-02 1 b
147 2.6 8.166865e-01 6.533492e-02 1 b
148 2.8 8.847059e-01 7.077647e-02 1 b
149 3.0 9.394131e-01 7.515305e-02 1 b
150 3.2 9.777512e-01 7.822010e-02 1 b
151 3.4 9.975031e-01 7.980025e-02 1 b
152 3.6 9.975031e-01 7.980025e-02 1 b
153 3.8 9.777512e-01 7.822010e-02 1 b
154 4.0 9.394131e-01 7.515305e-02 1 b
155 4.2 8.847059e-01 7.077647e-02 1 b
156 4.4 8.166865e-01 6.533492e-02 1 b
157 4.6 7.389685e-01 5.911748e-02 1 b
158 4.8 6.554063e-01 5.243250e-02 1 b
159 5.0 5.697828e-01 4.558263e-02 1 b
160 5.2 4.855369e-01 3.884295e-02 1 b
161 5.4 4.055545e-01 3.244436e-02 1 b
162 5.6 3.320399e-01 2.656320e-02 1 b
163 5.8 2.664683e-01 2.131746e-02 1 b
164 6.0 2.096114e-01 1.676891e-02 1 b
165 6.2 1.616212e-01 1.292970e-02 1 b
166 6.4 1.221507e-01 9.772054e-03 1 b
167 6.6 9.049144e-02 7.239315e-03 1 b
168 6.8 6.571027e-02 5.256822e-03 1 b
169 7.0 4.677062e-02 3.741650e-03 1 b
170 7.2 3.263076e-02 2.610460e-03 1 b
171 7.4 2.231491e-02 1.785193e-03 1 b
172 7.6 1.495813e-02 1.196651e-03 1 b
173 7.8 9.828195e-03 7.862556e-04 1 b
174 8.0 6.329715e-03 5.063772e-04 1 b
175 8.2 3.995846e-03 3.196677e-04 1 b
176 8.4 2.472563e-03 1.978050e-04 1 b
177 8.6 1.499685e-03 1.199748e-04 1 b
178 8.8 8.915937e-04 7.132750e-05 1 b
179 9.0 5.195747e-04 4.156597e-05 1 b
180 9.2 2.967858e-04 2.374286e-05 1 b
181 9.4 1.661699e-04 1.329359e-05 1 b
182 9.6 9.119596e-05 7.295677e-06 1 b
183 9.8 4.905836e-05 3.924669e-06 1 b
184 10.0 2.586810e-05 2.069448e-06 1 b
I have used facet_grid to generate this plot, but there is one thing I'm trying to figure out. The right panel is for z=0, and the left is for z=1. I want to move the line legend inside the left panel (for z=1), in the top corner, but I couldn't find the option for that.
And here is the code I used in R to generate the plot:
plot1 <- ggplot(data=df, aes(x=xval, y=yval, group=cond, colour=cond)) +
  geom_smooth(aes(ymin = yval-se, ymax = yval+se, linetype=cond, colour=cond, fill=cond), stat="identity", size=1.1) +
  scale_colour_hue(l=25) +
  ylim(-0.1,1.3) + scale_linetype_manual(values = c('a' = 1,'b' = 2))
plot1 + facet_grid(~ zval, scales="free_y") + theme(strip.text.x = element_blank(),strip.background = element_rect(colour="white", fill="white"))
plot1 <- ggplot(data=df, aes(x=xval, y=yval, group=cond, colour=cond) ) +
geom_smooth(aes(ymin = yval-se, ymax = yval+se,
linetype=cond, colour=cond, fill=cond), stat="identity",
size=1.1) +
scale_colour_hue(l=25) +
ylim(-0.1,1.3) + scale_linetype_manual(values = c('a' = 1,'b' = 2))
The coordinates for legend.position are x- and y-offsets from the bottom-left of the plot, ranging from 0 to 1.
plot1 + facet_grid(~ zval, scales="free_y") +
theme(strip.text.x = element_blank(),
strip.background = element_rect(colour="white", fill="white"),
legend.position=c(.9,.75)
)
Tweak the legend.position values to suit your preference.
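To pin the legend to an exact corner rather than eyeballing offsets, you can combine legend.position with legend.justification, which sets which point of the legend box the coordinates refer to. A sketch:

plot1 + facet_grid(~ zval, scales="free_y") +
  theme(strip.text.x = element_blank(),
        strip.background = element_rect(colour="white", fill="white"),
        legend.position = c(0.98, 0.98),   # near the top-right of the plot area
        legend.justification = c(1, 1))    # anchor the legend's own top-right corner there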
I have large dataset as follows:
Date rain code
2009-04-01 0.0 0
2009-04-02 0.0 0
2009-04-03 0.0 0
2009-04-04 0.7 1
2009-04-05 54.2 1
2009-04-06 0.0 0
2009-04-07 0.0 0
2009-04-08 0.0 0
2009-04-09 0.0 0
2009-04-10 0.0 0
2009-04-11 0.0 0
2009-04-12 5.3 1
2009-04-13 10.1 1
2009-04-14 6.0 1
2009-04-15 8.7 1
2009-04-16 0.0 0
2009-04-17 0.0 0
2009-04-18 0.0 0
2009-04-19 0.0 0
2009-04-20 0.0 0
2009-04-21 0.0 0
2009-04-22 0.0 0
2009-04-23 0.0 0
2009-04-24 0.0 0
2009-04-25 4.3 1
2009-04-26 42.2 1
2009-04-27 45.6 1
2009-04-28 12.6 1
2009-04-29 6.2 1
2009-04-30 1.0 1
I am trying to calculate the sum of consecutive rain values where the code is "1", with a separate sum for each run. For example, I want the sum of the rain values from 2009-04-12 to 2009-04-15. So I need a way to detect each stretch of consecutive code == 1 rows and sum the rain over it.
Any help on the above problem would be greatly appreciated.
One straightforward solution is to use rle. But I suspect there might be more "elegant" solutions out there.
# assuming dd is your data.frame
dd.rle <- rle(dd$code)
# get start pos of each consecutive 1's
start <- (cumsum(dd.rle$lengths) - dd.rle$lengths + 1)[dd.rle$values == 1]
# how long do each 1's extend?
ival <- dd.rle$lengths[dd.rle$values == 1]
# using these two, compute the sum
sapply(seq_along(start), function(idx) {
  sum(dd$rain[start[idx]:(start[idx] + ival[idx] - 1)])
})
# [1] 54.9 30.1 111.9
Edit: An even simpler method with rle and tapply.
dd.rle <- rle(dd$code)
# get the length of each consecutive 1's
ival <- dd.rle$lengths[dd.rle$values == 1]
# using lengths, construct a `factor` with levels = length(ival)
levl <- factor(rep(seq_along(ival), ival))
# use these levels to extract `rain[code == 1]` and compute sum
tapply(dd$rain[dd$code == 1], levl, sum)
# 1 2 3
# 54.9 30.1 111.9
The following is a vectorized way of getting the desired result.
df <- read.table(textConnection("Date rain code\n2009-04-01 0.0 0\n2009-04-02 0.0 0\n2009-04-03 0.0 0\n2009-04-04 0.7 1\n2009-04-05 54.2 1\n2009-04-06 0.0 0\n2009-04-07 0.0 0\n2009-04-08 0.0 0\n2009-04-09 0.0 0\n2009-04-10 0.0 0\n2009-04-11 0.0 0\n2009-04-12 5.3 1\n2009-04-13 10.1 1\n2009-04-14 6.0 1\n2009-04-15 8.7 1\n2009-04-16 0.0 0\n2009-04-17 0.0 0\n2009-04-18 0.0 0\n2009-04-19 0.0 0\n2009-04-20 0.0 0\n2009-04-21 0.0 0\n2009-04-22 0.0 0\n2009-04-23 0.0 0\n2009-04-24 0.0 0\n2009-04-25 4.3 1\n2009-04-26 42.2 1\n2009-04-27 45.6 1\n2009-04-28 12.6 1\n2009-04-29 6.2 1\n2009-04-30 1.0 1"),
header = TRUE)
df$cumsum <- cumsum(df$rain)
df$diff <- c(diff(df$code), 0)
df$result <- rep(NA, nrow(df))
if (nrow(df[df$diff == -1, ]) == nrow(df[df$diff == 1, ])) {
result <- df[df$diff == -1, "cumsum"] - df[df$diff == 1, "cumsum"]
df[df$diff == -1, "result"] <- result
} else {
result <- c(df[df$diff == -1, "cumsum"], df[nrow(df), "cumsum"]) - df[df$diff == 1, "cumsum"]
df[df$diff == -1, "result"] <- result[-length(result)]
df[nrow(df), "result"] <- result[length(result)]
}
df
## Date rain code cumsum diff result
## 1 2009-04-01 0.0 0 0.0 0 NA
## 2 2009-04-02 0.0 0 0.0 0 NA
## 3 2009-04-03 0.0 0 0.0 1 NA
## 4 2009-04-04 0.7 1 0.7 0 NA
## 5 2009-04-05 54.2 1 54.9 -1 54.9
## 6 2009-04-06 0.0 0 54.9 0 NA
## 7 2009-04-07 0.0 0 54.9 0 NA
## 8 2009-04-08 0.0 0 54.9 0 NA
## 9 2009-04-09 0.0 0 54.9 0 NA
## 10 2009-04-10 0.0 0 54.9 0 NA
## 11 2009-04-11 0.0 0 54.9 1 NA
## 12 2009-04-12 5.3 1 60.2 0 NA
## 13 2009-04-13 10.1 1 70.3 0 NA
## 14 2009-04-14 6.0 1 76.3 0 NA
## 15 2009-04-15 8.7 1 85.0 -1 30.1
## 16 2009-04-16 0.0 0 85.0 0 NA
## 17 2009-04-17 0.0 0 85.0 0 NA
## 18 2009-04-18 0.0 0 85.0 0 NA
## 19 2009-04-19 0.0 0 85.0 0 NA
## 20 2009-04-20 0.0 0 85.0 0 NA
## 21 2009-04-21 0.0 0 85.0 0 NA
## 22 2009-04-22 0.0 0 85.0 0 NA
## 23 2009-04-23 0.0 0 85.0 0 NA
## 24 2009-04-24 0.0 0 85.0 1 NA
## 25 2009-04-25 4.3 1 89.3 0 NA
## 26 2009-04-26 42.2 1 131.5 0 NA
## 27 2009-04-27 45.6 1 177.1 0 NA
## 28 2009-04-28 12.6 1 189.7 0 NA
## 29 2009-04-29 6.2 1 195.9 0 NA
## 30 2009-04-30 1.0 1 196.9 0 111.9
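For comparison, a dplyr sketch of the same per-run sums, building a run id by counting code changes (assuming the df built above):

library(dplyr)

df %>%
  mutate(run = cumsum(code != lag(code, default = code[1]))) %>%  # new id each time code flips
  filter(code == 1) %>%
  group_by(run) %>%
  summarise(total_rain = sum(rain))
# expected: 54.9, 30.1, 111.9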
Hi,
I have a 10-year, 5-minute resolution dataset of dust concentration,
and, separately, a 15-year dataset with daily resolution of the synoptic classification.
How can I combine these two datasets? They are not the same length or resolution.
Here is a sample of the data:
> head(synoptic)
date synoptic
1 01/01/1995 8
2 02/01/1995 7
3 03/01/1995 7
4 04/01/1995 20
5 05/01/1995 1
6 06/01/1995 1
> head(beit.shemesh)
X........................ StWd SHT PRE GSR RH Temp WD WS PM10 CO O3
1 NA 64 19.8 0 -2.9 37 15.2 61 2.2 241 0.9 40.6
2 NA 37 20.1 0 1.1 38 15.2 344 2.1 241 0.9 40.3
3 NA 36 20.2 0 0.7 39 15.1 32 1.9 241 0.9 39.4
4 NA 52 20.1 0 0.9 40 14.9 20 2.1 241 0.9 38.7
5 NA 42 19.0 0 0.9 40 14.6 11 2.0 241 0.9 38.7
6 NA 75 19.9 0 0.2 40 14.5 341 1.3 241 0.9 39.1
No2 Nox No SO2 date
1 1.4 2.9 1.5 1.6 31/12/2000 24:00
2 1.7 3.1 1.4 0.9 01/01/2001 00:05
3 2.1 3.5 1.4 1.2 01/01/2001 00:10
4 2.7 4.2 1.5 1.3 01/01/2001 00:15
5 2.3 3.8 1.5 1.4 01/01/2001 00:20
6 2.8 4.3 1.5 1.3 01/01/2001 00:25
Any ideas?
Make an extra column holding the parsed dates, and then merge. To do this, you have to generate a variable in each data frame bearing the same name, hence you first need some renaming. Also make sure that the merge column you use has the same type in both data frames:
beit.shemesh$datetime <- beit.shemesh$date
beit.shemesh$date <- as.Date(beit.shemesh$datetime, format="%d/%m/%Y")
synoptic$date <- as.Date(synoptic$date, format="%d/%m/%Y")
merge(synoptic, beit.shemesh, by="date", all.y=TRUE)
Using all.y=TRUE keeps the beit.shemesh dataset intact. If you also want empty rows for all non-matching rows in synoptic, you could use all=TRUE instead.
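For reference, the dplyr equivalent of that merge (a sketch; left_join keeps every beit.shemesh row, like all.y=TRUE):

library(dplyr)

combined <- beit.shemesh %>%
  left_join(synoptic, by = "date")  # each 5-minute row gets its day's synoptic class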