df <- data.frame(x=1:1000, y=1:1000, z=1:1000)
print(df)
...
330 330 330 330
331 331 331 331
332 332 332 332
333 333 333 333
[ reached 'max' / getOption("max.print") -- omitted 667 rows ]
How can I set the number of rows of a data frame (50, for example) that are printed to the console?
Regards
Convert the data frame to a tibble and use the n argument of print():
library(tibble)
iris_tbl <- as_tibble(iris)
print(iris_tbl)
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with 140 more rows
print(iris_tbl, n = 30)
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
15 5.8 4 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5 3 1.6 0.2 setosa
27 5 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
# ... with 120 more rows
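For the plain data frame in the question, here is a minimal sketch of three ways to print exactly 50 rows. It assumes the tibble package is installed and rebuilds a 1000-row df purely for illustration:

```r
library(tibble)

# Illustrative data frame shaped like the one in the question (1000 rows assumed)
df <- data.frame(x = 1:1000, y = 1:1000, z = 1:1000)

# Base R: print only the first 50 rows
print(head(df, 50))

# Base R: print.data.frame() accepts `max`, a cap on the number of printed
# entries (cells), so 50 rows of a 3-column frame need 50 * ncol(df)
print(df, max = 50 * ncol(df))

# tibble: the `n` argument sets the number of rows directly
print(as_tibble(df), n = 50)
```

If the larger limit should apply to every print() call in the session, options(max.print = ...) changes the global cap instead.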
I am trying to join two dataframes. The smaller is a subset of the larger, with updated values. I wish to keep all rows and columns in the larger dataframe, but overwrite values with the values in the smaller where the row ID and column correspond.
I can't see that any of the normal dplyr or base join operations (left, right, outer, inner) can easily achieve this. I am therefore looking for a join function/operation that can achieve what I want.
df1 <- structure(list(
ID = as.factor(c(1,2,5,6)),
Sepal.Width = c(4.5, 7, 3.2, 3.1),
Petal.Length = c(1.8, 2.4, 3.3, 6.5),
Petal.Width = c(1.2, 7.2, 3.2, 3.2)), row.names = c(NA,
4L), class = "data.frame")
df2 <- cbind(data.frame(ID = as.factor(1:10)), iris[1:10, 1:5])
A data.frame: 4 × 4
ID Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl>
1 1 4.5 1.8 1.2
2 2 7.0 2.4 7.2
3 5 3.2 3.3 3.2
4 6 3.1 6.5 3.2
A data.frame: 10 × 6
ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
I want to merge these into one:
A data.frame: 10 × 6
ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 1 5.1 4.5 1.8 1.2 setosa #<-- Updated rows
2 2 4.9 7.0 2.4 7.2 setosa #<-- Updated rows
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.2 3.3 3.2 setosa #<-- Updated rows
6 6 5.4 3.1 6.5 3.2 setosa #<-- Updated rows
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
# ↑ ↑ ↑
# Updated columns
Have you tried the (relatively) new function rows_update from dplyr? It does exactly this.
library(dplyr)
df2 %>% rows_update(df1, by = 'ID')
# ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 1 5.1 4.5 1.8 1.2 setosa
#2 2 4.9 7.0 2.4 7.2 setosa
#3 3 4.7 3.2 1.3 0.2 setosa
#4 4 4.6 3.1 1.5 0.2 setosa
#5 5 5.0 3.2 3.3 3.2 setosa
#6 6 5.4 3.1 6.5 3.2 setosa
#7 7 4.6 3.4 1.4 0.3 setosa
#8 8 5.0 3.4 1.5 0.2 setosa
#9 9 4.4 2.9 1.4 0.2 setosa
#10 10 4.9 3.1 1.5 0.1 setosa
We can also use {powerjoin}:
library(powerjoin)
power_left_join(df2, df1, by = "ID", conflict = coalesce_yx)
#> ID Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 1 5.1 setosa 4.5 1.8 1.2
#> 2 2 4.9 setosa 7.0 2.4 7.2
#> 3 3 4.7 setosa 3.2 1.3 0.2
#> 4 4 4.6 setosa 3.1 1.5 0.2
#> 5 5 5.0 setosa 3.2 3.3 3.2
#> 6 6 5.4 setosa 3.1 6.5 3.2
#> 7 7 4.6 setosa 3.4 1.4 0.3
#> 8 8 5.0 setosa 3.4 1.5 0.2
#> 9 9 4.4 setosa 2.9 1.4 0.2
#> 10 10 4.9 setosa 3.1 1.5 0.1
It moves the conflicting columns to the end, though.
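For comparison, the same update can be sketched in base R with match(), which keeps the original column order. This is only an illustration under the question's assumptions (every ID in df1 also exists in df2, and IDs are unique); the names idx, cols and updated are made up here:

```r
# Data from the question
df1 <- data.frame(
  ID = as.factor(c(1, 2, 5, 6)),
  Sepal.Width = c(4.5, 7, 3.2, 3.1),
  Petal.Length = c(1.8, 2.4, 3.3, 6.5),
  Petal.Width = c(1.2, 7.2, 3.2, 3.2)
)
df2 <- cbind(data.frame(ID = as.factor(1:10)), iris[1:10, 1:5])

# Row positions in df2 that correspond to each ID in df1
# (match() compares factors by their labels)
idx <- match(df1$ID, df2$ID)

# Columns to overwrite: everything in df1 except the key
cols <- setdiff(names(df1), "ID")

# Work on a copy so df2 itself stays untouched
updated <- df2
updated[idx, cols] <- df1[cols]
updated
```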
Sample df:
library(tidyverse)
iris <- iris[1:10,]
iris$testlag <- NA
iris[[1,"testlag"]] <- 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1 5.1 3.5 1.4 0.2 setosa 5
2 4.9 3.0 1.4 0.2 setosa NA
3 4.7 3.2 1.3 0.2 setosa NA
4 4.6 3.1 1.5 0.2 setosa NA
5 5.0 3.6 1.4 0.2 setosa NA
6 5.4 3.9 1.7 0.4 setosa NA
7 4.6 3.4 1.4 0.3 setosa NA
8 5.0 3.4 1.5 0.2 setosa NA
9 4.4 2.9 1.4 0.2 setosa NA
10 4.9 3.1 1.5 0.1 setosa NA
In the testlag column, I'm interested in using dplyr::lag() to retrieve the previous value and add some column, for example Petal.Length, to it. As I have only one initial value, each subsequent calculation depends on the previous one, so it has to work iteratively. I thought something like mutate would work.
I first tried doing something like this:
iris %>% mutate_at("testlag", ~ lag(.) + Petal.Length)
But this removed the first value, and only gave a valid value for the second row and NAs for the rest. Intuitively I know why it's removing the first value, but I thought the nature of mutate would allow it to work for the rest of the values, so I don't know how to fix that.
Of course, using a base R loop I could do something like:
for (idx in 2:nrow(iris)) {
iris[[idx, "testlag"]] <-
lag(iris$testlag)[idx] + iris[[idx, "Petal.Length"]]
}
But I would prefer to implement this in tidyverse syntax.
Edit: Desired output (from my for loop)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1 5.1 3.5 1.4 0.2 setosa 5.0
2 4.9 3.0 1.4 0.2 setosa 6.4
3 4.7 3.2 1.3 0.2 setosa 7.7
4 4.6 3.1 1.5 0.2 setosa 9.2
5 5.0 3.6 1.4 0.2 setosa 10.6
6 5.4 3.9 1.7 0.4 setosa 12.3
7 4.6 3.4 1.4 0.3 setosa 13.7
8 5.0 3.4 1.5 0.2 setosa 15.2
9 4.4 2.9 1.4 0.2 setosa 16.6
10 4.9 3.1 1.5 0.1 setosa 18.1
Does this work for you?
library(tidyverse)
library("data.table")
iris <- iris[1:10,]
iris$testlag <- NA
iris[[1,"testlag"]] <- 5
iris %>% mutate (testlag = lag(first(testlag) + cumsum(Petal.Length)))
Result:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1 5.1 3.5 1.4 0.2 setosa NA
2 4.9 3.0 1.4 0.2 setosa 6.4
3 4.7 3.2 1.3 0.2 setosa 7.8
4 4.6 3.1 1.5 0.2 setosa 9.1
5 5.0 3.6 1.4 0.2 setosa 10.6
6 5.4 3.9 1.7 0.4 setosa 12.0
7 4.6 3.4 1.4 0.3 setosa 13.7
8 5.0 3.4 1.5 0.2 setosa 15.1
9 4.4 2.9 1.4 0.2 setosa 16.6
10 4.9 3.1 1.5 0.1 setosa 18.0
Since technically there is no N-1 Petal.Length when N = 1, I left the first value of testlag as NA. Do you really need it to be the initial value? If so, this will work:
iris %>% mutate (testlag = lag(first(testlag) + cumsum(Petal.Length), default=first(testlag)))
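Note that the lagged version adds the previous row's Petal.Length, so it differs from the loop output in the question in several rows (7.8 versus 7.7 in row 3, for example). A minimal sketch that appears to reproduce the loop's output is to start the running sum at the second row. The copy df and the name out are introduced here only for illustration:

```r
library(dplyr)

# Work on a copy instead of overwriting the built-in iris
df <- iris[1:10, ]
df$testlag <- NA_real_
df$testlag[1] <- 5

# testlag[i] = testlag[1] + sum(Petal.Length[2:i]); subtracting the first
# Petal.Length removes row 1's own contribution from the cumulative sum
out <- df %>%
  mutate(testlag = first(testlag) + cumsum(Petal.Length) - first(Petal.Length))

out$testlag
```

For recurrences that cannot be written as a cumulative sum, purrr::accumulate() is the usual tidyverse tool for carrying a value from one row to the next.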
The function you're looking for is tidyr::fill
library(tidyverse)
iris <- iris[1:10,]
iris$testlag <- NA
iris[[1,"testlag"]] <- 5
iris %>% fill(testlag, .direction = "down")
# Note the default is 'down', but I included here for completeness
This takes the specified column (testlag in this case), and copies any values in that column to the values below. This also works if you have a value in a subset of the rows: it copies the value down until it reaches a new value, then it picks up with that one.
For example:
library(tidyverse)
iris <- iris[1:10,]
iris$testlag <- NA
iris[[1,"testlag"]] <- 5
iris[[5,"testlag"]] <- 10
Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1 5.1 3.5 1.4 0.2 setosa 5
2 4.9 3.0 1.4 0.2 setosa NA
3 4.7 3.2 1.3 0.2 setosa NA
4 4.6 3.1 1.5 0.2 setosa NA
5 5.0 3.6 1.4 0.2 setosa 10
6 5.4 3.9 1.7 0.4 setosa NA
7 4.6 3.4 1.4 0.3 setosa NA
8 5.0 3.4 1.5 0.2 setosa NA
9 4.4 2.9 1.4 0.2 setosa NA
10 4.9 3.1 1.5 0.1 setosa NA
Applying this function...
iris %>% fill(testlag, .direction = "down")
Gives
Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1 5.1 3.5 1.4 0.2 setosa 5
2 4.9 3.0 1.4 0.2 setosa 5
3 4.7 3.2 1.3 0.2 setosa 5
4 4.6 3.1 1.5 0.2 setosa 5
5 5.0 3.6 1.4 0.2 setosa 10
6 5.4 3.9 1.7 0.4 setosa 10
7 4.6 3.4 1.4 0.3 setosa 10
8 5.0 3.4 1.5 0.2 setosa 10
9 4.4 2.9 1.4 0.2 setosa 10
10 4.9 3.1 1.5 0.1 setosa 10
I have a list consisting of dataframes. The list is created by a function that I cannot control, so each dataframe holds more information than I need. The structure of every dataframe in the list is the same. What I need to do is filter rows by the values of one column and write the result to a new list. The list contains over 1000 dataframes of the same structure.
historical_file[1]
$daily_kl_historical_tageswerte_KL_00001_19370101_19860630_hist
STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG NM VPM PM TMK UPM TXK TNK TGK eor
1 1 1937-01-01 NA NA NA 5 0.0 0 NA 0 6.3 NA NA -0.5 NA 2.5 -1.6 NA eor
2 1 1937-01-02 NA NA NA 5 0.0 0 NA 0 3.0 NA NA 0.3 NA 5.0 -4.0 NA eor
3 1 1937-01-03 NA NA NA 5 0.0 0 NA 0 4.3 NA NA 3.2 NA 5.0 -0.2 NA eor
4 1 1937-01-04 NA NA NA 5 0.0 0 NA 0 8.0 NA NA 0.2 NA 3.8 -0.2 NA eor
5 1 1937-01-05 NA NA NA 5 0.0 0 NA 0 8.0 NA NA 1.4 NA 4.5 -0.7 NA eor
6 1 1937-01-06 NA NA NA 5 5.2 7 NA 0 6.0 NA NA 0.2 NA 2.0 -2.4 NA eor
[ reached 'max' / getOption("max.print") -- omitted 17296 rows ]
$daily_kl_historical_tageswerte_KL_00003_18910101_20110331_hist
STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG NM VPM PM TMK UPM TXK TNK TGK eor
1 3 1891-01-01 NA NA NA 5 0.0 0 NA NA 0.0 4.3 NA -3.6 88 0.5 -5.9 NA eor
2 3 1891-01-02 NA NA NA 5 0.0 0 NA NA 2.7 4.1 NA -2.8 84 0.0 -5.8 NA eor
3 3 1891-01-03 NA NA NA 5 2.5 1 NA NA 3.7 3.9 NA -0.2 69 2.1 -6.2 NA eor
4 3 1891-01-04 NA NA NA 5 8.2 1 NA NA 8.0 6.4 NA 1.8 90 3.7 0.6 NA eor
5 3 1891-01-05 NA NA NA 5 1.9 1 NA NA 7.7 4.7 NA -2.5 87 1.5 -4.2 NA eor
6 3 1891-01-06 NA NA NA 5 2.5 1 NA NA 8.0 3.5 NA -5.8 88 -4.0 -6.9 NA eor
I would like to filter every dataframe by MESS_DATUM. So on an individual dataframe I would do
historical_file_new <- historical_file %>% filter(MESS_DATUM > '2000-07-01')
How to do that on this list?
You pass your filter into lapply, which applies it to each dataframe and returns a new list:
library(dplyr)
l <- list(iris,iris)
lapply(l,function(x) filter(x,Species=="setosa"))
#> [[1]]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> 11 5.4 3.7 1.5 0.2 setosa
#> 12 4.8 3.4 1.6 0.2 setosa
#> 13 4.8 3.0 1.4 0.1 setosa
#> 14 4.3 3.0 1.1 0.1 setosa
#> 15 5.8 4.0 1.2 0.2 setosa
#> 16 5.7 4.4 1.5 0.4 setosa
#> 17 5.4 3.9 1.3 0.4 setosa
#> 18 5.1 3.5 1.4 0.3 setosa
#> 19 5.7 3.8 1.7 0.3 setosa
#> 20 5.1 3.8 1.5 0.3 setosa
#> 21 5.4 3.4 1.7 0.2 setosa
#> 22 5.1 3.7 1.5 0.4 setosa
#> 23 4.6 3.6 1.0 0.2 setosa
#> 24 5.1 3.3 1.7 0.5 setosa
#> 25 4.8 3.4 1.9 0.2 setosa
#> 26 5.0 3.0 1.6 0.2 setosa
#> 27 5.0 3.4 1.6 0.4 setosa
#> 28 5.2 3.5 1.5 0.2 setosa
#> 29 5.2 3.4 1.4 0.2 setosa
#> 30 4.7 3.2 1.6 0.2 setosa
#> 31 4.8 3.1 1.6 0.2 setosa
#> 32 5.4 3.4 1.5 0.4 setosa
#> 33 5.2 4.1 1.5 0.1 setosa
#> 34 5.5 4.2 1.4 0.2 setosa
#> 35 4.9 3.1 1.5 0.2 setosa
#> 36 5.0 3.2 1.2 0.2 setosa
#> 37 5.5 3.5 1.3 0.2 setosa
#> 38 4.9 3.6 1.4 0.1 setosa
#> 39 4.4 3.0 1.3 0.2 setosa
#> 40 5.1 3.4 1.5 0.2 setosa
#> 41 5.0 3.5 1.3 0.3 setosa
#> 42 4.5 2.3 1.3 0.3 setosa
#> 43 4.4 3.2 1.3 0.2 setosa
#> 44 5.0 3.5 1.6 0.6 setosa
#> 45 5.1 3.8 1.9 0.4 setosa
#> 46 4.8 3.0 1.4 0.3 setosa
#> 47 5.1 3.8 1.6 0.2 setosa
#> 48 4.6 3.2 1.4 0.2 setosa
#> 49 5.3 3.7 1.5 0.2 setosa
#> 50 5.0 3.3 1.4 0.2 setosa
#>
#> [[2]]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> 11 5.4 3.7 1.5 0.2 setosa
#> 12 4.8 3.4 1.6 0.2 setosa
#> 13 4.8 3.0 1.4 0.1 setosa
#> 14 4.3 3.0 1.1 0.1 setosa
#> 15 5.8 4.0 1.2 0.2 setosa
#> 16 5.7 4.4 1.5 0.4 setosa
#> 17 5.4 3.9 1.3 0.4 setosa
#> 18 5.1 3.5 1.4 0.3 setosa
#> 19 5.7 3.8 1.7 0.3 setosa
#> 20 5.1 3.8 1.5 0.3 setosa
#> 21 5.4 3.4 1.7 0.2 setosa
#> 22 5.1 3.7 1.5 0.4 setosa
#> 23 4.6 3.6 1.0 0.2 setosa
#> 24 5.1 3.3 1.7 0.5 setosa
#> 25 4.8 3.4 1.9 0.2 setosa
#> 26 5.0 3.0 1.6 0.2 setosa
#> 27 5.0 3.4 1.6 0.4 setosa
#> 28 5.2 3.5 1.5 0.2 setosa
#> 29 5.2 3.4 1.4 0.2 setosa
#> 30 4.7 3.2 1.6 0.2 setosa
#> 31 4.8 3.1 1.6 0.2 setosa
#> 32 5.4 3.4 1.5 0.4 setosa
#> 33 5.2 4.1 1.5 0.1 setosa
#> 34 5.5 4.2 1.4 0.2 setosa
#> 35 4.9 3.1 1.5 0.2 setosa
#> 36 5.0 3.2 1.2 0.2 setosa
#> 37 5.5 3.5 1.3 0.2 setosa
#> 38 4.9 3.6 1.4 0.1 setosa
#> 39 4.4 3.0 1.3 0.2 setosa
#> 40 5.1 3.4 1.5 0.2 setosa
#> 41 5.0 3.5 1.3 0.3 setosa
#> 42 4.5 2.3 1.3 0.3 setosa
#> 43 4.4 3.2 1.3 0.2 setosa
#> 44 5.0 3.5 1.6 0.6 setosa
#> 45 5.1 3.8 1.9 0.4 setosa
#> 46 4.8 3.0 1.4 0.3 setosa
#> 47 5.1 3.8 1.6 0.2 setosa
#> 48 4.6 3.2 1.4 0.2 setosa
#> 49 5.3 3.7 1.5 0.2 setosa
#> 50 5.0 3.3 1.4 0.2 setosa
Created on 2020-04-20 by the reprex package (v0.3.0)
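Applied to the question's date filter, the same pattern might look like the sketch below. The small list is a hypothetical stand-in for historical_file with invented values, and it assumes MESS_DATUM is stored as a Date:

```r
library(dplyr)

# Hypothetical stand-in for historical_file: a named list of data frames
# sharing the MESS_DATUM column (values invented for illustration)
historical_file <- list(
  station_1 = data.frame(
    MESS_DATUM = as.Date(c("1999-12-31", "2000-07-02", "2005-01-01")),
    TMK = c(-0.5, 18.2, 1.4)
  ),
  station_3 = data.frame(
    MESS_DATUM = as.Date(c("1891-01-01", "2001-03-15")),
    TMK = c(-3.6, 4.1)
  )
)

cutoff <- as.Date("2000-07-01")

# lapply() keeps the list names and returns one filtered data frame per element
historical_file_new <- lapply(
  historical_file,
  function(df) filter(df, MESS_DATUM > cutoff)
)

# Rows remaining in each element
sapply(historical_file_new, nrow)
```

purrr::map(historical_file, ~ filter(.x, MESS_DATUM > cutoff)) is an equivalent tidyverse spelling.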
I am in the process of learning the tidyverse and am loving the flow the pipe operator offers. I was wondering: is it possible to split a pipe so that the output from one part of the pipe goes to two separate commands? I have done a little research and have seen nothing suggesting this is possible. The goal is to avoid something like this, where you have to save the first step:
iris_filter <- iris %>%
filter(Sepal.Length <= 5.8)
iris_filter %>%
summarise(n= n())
iris_filter %>%
arrange(Sepal.Length)
Could you instead have the result of filter passed to two separate commands and continue down two distinct pipe paths?
The %T>% operator from the magrittr-package seems to be what you are looking for.
However for that specific problem I would write a custom function which outputs the original data:
library(tidyverse)
custom.function <- function(x) {
summarise(x, n = n()) %>%
print()
return(x)
}
iris %>%
filter(Sepal.Length <= 5.8) %>%
custom.function() %>%
arrange(Sepal.Length)
#> n
#> 1 80
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.3 3.0 1.1 0.1 setosa
#> 2 4.4 2.9 1.4 0.2 setosa
#> 3 4.4 3.0 1.3 0.2 setosa
#> 4 4.4 3.2 1.3 0.2 setosa
#> 5 4.5 2.3 1.3 0.3 setosa
#> 6 4.6 3.1 1.5 0.2 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 4.6 3.6 1.0 0.2 setosa
#> 9 4.6 3.2 1.4 0.2 setosa
#> 10 4.7 3.2 1.3 0.2 setosa
#> 11 4.7 3.2 1.6 0.2 setosa
#> 12 4.8 3.4 1.6 0.2 setosa
#> 13 4.8 3.0 1.4 0.1 setosa
#> 14 4.8 3.4 1.9 0.2 setosa
#> 15 4.8 3.1 1.6 0.2 setosa
#> 16 4.8 3.0 1.4 0.3 setosa
#> 17 4.9 3.0 1.4 0.2 setosa
#> 18 4.9 3.1 1.5 0.1 setosa
#> 19 4.9 3.1 1.5 0.2 setosa
#> 20 4.9 3.6 1.4 0.1 setosa
#> 21 4.9 2.4 3.3 1.0 versicolor
#> 22 4.9 2.5 4.5 1.7 virginica
#> 23 5.0 3.6 1.4 0.2 setosa
#> 24 5.0 3.4 1.5 0.2 setosa
#> 25 5.0 3.0 1.6 0.2 setosa
#> 26 5.0 3.4 1.6 0.4 setosa
#> 27 5.0 3.2 1.2 0.2 setosa
#> 28 5.0 3.5 1.3 0.3 setosa
#> 29 5.0 3.5 1.6 0.6 setosa
#> 30 5.0 3.3 1.4 0.2 setosa
#> 31 5.0 2.0 3.5 1.0 versicolor
#> 32 5.0 2.3 3.3 1.0 versicolor
#> 33 5.1 3.5 1.4 0.2 setosa
#> 34 5.1 3.5 1.4 0.3 setosa
#> 35 5.1 3.8 1.5 0.3 setosa
#> 36 5.1 3.7 1.5 0.4 setosa
#> 37 5.1 3.3 1.7 0.5 setosa
#> 38 5.1 3.4 1.5 0.2 setosa
#> 39 5.1 3.8 1.9 0.4 setosa
#> 40 5.1 3.8 1.6 0.2 setosa
#> 41 5.1 2.5 3.0 1.1 versicolor
#> 42 5.2 3.5 1.5 0.2 setosa
#> 43 5.2 3.4 1.4 0.2 setosa
#> 44 5.2 4.1 1.5 0.1 setosa
#> 45 5.2 2.7 3.9 1.4 versicolor
#> 46 5.3 3.7 1.5 0.2 setosa
#> 47 5.4 3.9 1.7 0.4 setosa
#> 48 5.4 3.7 1.5 0.2 setosa
#> 49 5.4 3.9 1.3 0.4 setosa
#> 50 5.4 3.4 1.7 0.2 setosa
#> 51 5.4 3.4 1.5 0.4 setosa
#> 52 5.4 3.0 4.5 1.5 versicolor
#> 53 5.5 4.2 1.4 0.2 setosa
#> 54 5.5 3.5 1.3 0.2 setosa
#> 55 5.5 2.3 4.0 1.3 versicolor
#> 56 5.5 2.4 3.8 1.1 versicolor
#> 57 5.5 2.4 3.7 1.0 versicolor
#> 58 5.5 2.5 4.0 1.3 versicolor
#> 59 5.5 2.6 4.4 1.2 versicolor
#> 60 5.6 2.9 3.6 1.3 versicolor
#> 61 5.6 3.0 4.5 1.5 versicolor
#> 62 5.6 2.5 3.9 1.1 versicolor
#> 63 5.6 3.0 4.1 1.3 versicolor
#> 64 5.6 2.7 4.2 1.3 versicolor
#> 65 5.6 2.8 4.9 2.0 virginica
#> 66 5.7 4.4 1.5 0.4 setosa
#> 67 5.7 3.8 1.7 0.3 setosa
#> 68 5.7 2.8 4.5 1.3 versicolor
#> 69 5.7 2.6 3.5 1.0 versicolor
#> 70 5.7 3.0 4.2 1.2 versicolor
#> 71 5.7 2.9 4.2 1.3 versicolor
#> 72 5.7 2.8 4.1 1.3 versicolor
#> 73 5.7 2.5 5.0 2.0 virginica
#> 74 5.8 4.0 1.2 0.2 setosa
#> 75 5.8 2.7 4.1 1.0 versicolor
#> 76 5.8 2.7 3.9 1.2 versicolor
#> 77 5.8 2.6 4.0 1.2 versicolor
#> 78 5.8 2.7 5.1 1.9 virginica
#> 79 5.8 2.8 5.1 2.4 virginica
#> 80 5.8 2.7 5.1 1.9 virginica
Created on 2018-11-04 by the reprex package (v0.2.1)
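For reference, a minimal sketch of the %T>% (tee) variant: the operator runs its right-hand side for the side effect only and passes the left-hand side on unchanged. The helper print_count is a hypothetical name introduced here:

```r
library(dplyr)
library(magrittr)  # provides the tee operator %T>%

# Hypothetical helper: prints the row count as a side effect
print_count <- function(d) print(summarise(d, n = n()))

result <- iris %>%
  filter(Sepal.Length <= 5.8) %T>%  # tee: print_count()'s return value is discarded
  print_count() %>%                 # the filtered data continues down the pipe
  arrange(Sepal.Length)

head(result)
```

This does not split the pipe into two independent branches. It only lets one branch produce a side effect, such as printing or plotting, while the main branch continues.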
I don't think this is possible. One workaround is to save the intermediate values in the full dataframe, for example:
iris %>%
add_tally() %>%
filter(Sepal.Length <= 5.8) %>%
arrange(Sepal.Length)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species n
<dbl> <dbl> <dbl> <dbl> <fct> <int>
1 4.3 3 1.1 0.1 setosa 150
2 4.4 2.9 1.4 0.2 setosa 150
3 4.4 3 1.3 0.2 setosa 150
4 4.4 3.2 1.3 0.2 setosa 150
5 4.5 2.3 1.3 0.3 setosa 150
Here you can use functions such as add_tally() or add_count(group1, group2, ...), which are basically equivalents of more verbose mutate(n = n()), and group_by(group1, group2, ..) %>% mutate(n = n()).
You can always use the values stored for further calculations / charts then.
I want to subset a data frame using a function as follows.
calcScore <- function(y){
t <- iris[iris$Species == y,]
return(t)
}
When I call calcScore('setosa'), it gives the output below.
> calcScore('setosa')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
But the dataframe t cannot be accessed after that. Typing t prints the following instead:
> t
standardGeneric for "t" defined from package "base"
function (x)
standardGeneric("t")
<environment: 0x11be807c>
Methods may be defined for arguments: x
Use showMethods("t") for currently available ones.
How can I write a function that subsets the dataframe so that the result is saved and can be accessed later?
You haven't assigned the output to anything. In other words, try something like:
mynewdf <- calcScore('setosa')
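The reason is scoping: the t created inside calcScore is local to the function, so at the console t still refers to base R's transpose function. A short sketch of this, where setosa_df is just an illustrative name:

```r
calcScore <- function(y) {
  # `t` is local to the function and is gone once the function returns
  t <- iris[iris$Species == y, ]
  return(t)
}

# Capture the return value under a name of your choice
setosa_df <- calcScore("setosa")
nrow(setosa_df)
# → 50

# Outside the function, `t` is still base R's transpose function
is.function(t)
# → TRUE
```

Naming the local variable something other than t (for example subset_df) avoids the confusion with the built-in function.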