My line of code is
df1<-rbind(df1,assign(paste(x,"_name_",_Date,sep=""),Result))
Basically
assign(paste(x,"_name_",_Date,sep=""),Result)
is the same as
df2
When I do
df1<-rbind(df1,df2)
it works, but this needs to be dynamic, because the object name changes with each of my weekly updates.
assign() is called for its side effect of binding the value to a name (it returns the value only invisibly), so we can use get() to retrieve the value explicitly from the object-name string:
rbind(df1, {
nm1 <- paste(x,"_name_",_Date,sep="")
assign(nm1, Result)
get(nm1)})
Using a small reproducible example
rbind(head(iris), {
nm1 <- 'newobj'
assign(nm1, tail(iris))
get(nm1)})
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#145 6.7 3.3 5.7 2.5 virginica
#146 6.7 3.0 5.2 2.3 virginica
#147 6.3 2.5 5.0 1.9 virginica
#148 6.5 3.0 5.2 2.0 virginica
#149 6.2 3.4 5.4 2.3 virginica
#150 5.9 3.0 5.1 1.8 virginica
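If the dynamically named object is only created so that it can be looked up again, a named list is a common alternative to assign/get. The sketch below is a minimal illustration under assumptions: x, run_date and Result are hypothetical stand-ins for the question's variables (run_date replaces _Date, which is not a valid R name), and results is a made-up container name.

```r
# Hypothetical stand-ins for the objects in the question
df1 <- head(iris)
Result <- tail(iris)
x <- "weekly"
run_date <- "2024_01_01"

# Keep each week's result in a named list instead of a free-floating object
results <- list()
nm1 <- paste(x, "_name_", run_date, sep = "")
results[[nm1]] <- Result

# Look the result up by its name and append it to the running data frame
df1 <- rbind(df1, results[[nm1]])
dim(df1)
```

The name is still built dynamically, but every weekly result stays reachable through one object, which tends to be easier to loop over than a set of separately named variables.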
I would like to know if there is an elegant and concise way to do conditional filtering with data.table.
My aim is the following:
if condition 1 is met, filter based on condition 2.
For instance, in the case of the iris dataset,
how can I drop the observations among Species=="setosa" where Sepal.Length<5.5, while keeping all observations with Sepal.Length<5.5 for other species?
I know how to do this in steps, but I wonder if there is a better way to do it in a one-liner.
# this is how I would do it in steps.
library(data.table)
library(dplyr) # for full_join()
data("iris")
# first only select observations in setosa I am interested in keeping
iris1<- setDT(iris)[Sepal.Length>=5.5&Species=="setosa"]
# second, drop all of setosa observations.
iris2<- setDT(iris)[Species!="setosa"]
# join data,
iris_final<-full_join(iris1,iris2)
head(iris_final)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 5.8 4.0 1.2 0.2 setosa
2: 5.7 4.4 1.5 0.4 setosa
3: 5.7 3.8 1.7 0.3 setosa
4: 5.5 4.2 1.4 0.2 setosa
5: 5.5 3.5 1.3 0.2 setosa # only keeping setosa with Sepal.Length>=5.5. Note that for other species, Sepal.Length can be <5.5
6: 7.0 3.2 4.7 1.4 versicolor
Is there a more concise and elegant way of doing this?
Is something like the following what you are looking for? It is not very clear what you want.
library(data.table)
dt <- data.table(iris)
dt[Sepal.Length >= 5.5 & Species == "setosa" | Species != "setosa"]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1: 5.8 4.0 1.2 0.2 setosa
#> 2: 5.7 4.4 1.5 0.4 setosa
#> 3: 5.7 3.8 1.7 0.3 setosa
#> 4: 5.5 4.2 1.4 0.2 setosa
#> 5: 5.5 3.5 1.3 0.2 setosa
#> ---
#> 101: 6.7 3.0 5.2 2.3 virginica
#> 102: 6.3 2.5 5.0 1.9 virginica
#> 103: 6.5 3.0 5.2 2.0 virginica
#> 104: 6.2 3.4 5.4 2.3 virginica
#> 105: 5.9 3.0 5.1 1.8 virginica
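You can also negate the condition for the rows you want to drop:
This removes any rows where Species=="setosa" & Sepal.Length<5.5 and keeps everything else. It has to be applied to the full data set, not to the already filtered iris1 from the question. (Adding | Sepal.Length>5.5 is redundant, since those rows already pass the negated condition.)
library(data.table)
as.data.table(iris)[!(Species=="setosa" & Sepal.Length<5.5)]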
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 5.8 4.0 1.2 0.2 setosa
2: 5.7 4.4 1.5 0.4 setosa
3: 5.7 3.8 1.7 0.3 setosa
4: 5.5 4.2 1.4 0.2 setosa
5: 5.5 3.5 1.3 0.2 setosa
---
101: 6.7 3.0 5.2 2.3 virginica
102: 6.3 2.5 5.0 1.9 virginica
103: 6.5 3.0 5.2 2.0 virginica
104: 6.2 3.4 5.4 2.3 virginica
105: 5.9 3.0 5.1 1.8 virginica
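The two filters in the answers above describe the same set of rows: "keep setosa only if Sepal.Length >= 5.5, keep everything else" is the logical negation of "drop setosa with Sepal.Length < 5.5". As a quick sanity check, here is a sketch in base R on plain logical vectors, so that it does not depend on data.table; keep_a and keep_b are names made up for this illustration.

```r
# Condition written as "rows to keep"
keep_a <- with(iris, Sepal.Length >= 5.5 & Species == "setosa" | Species != "setosa")

# Condition written as "not the rows to drop"
keep_b <- with(iris, !(Species == "setosa" & Sepal.Length < 5.5))

# Both formulations select exactly the same rows
identical(keep_a, keep_b)
sum(keep_a)  # number of rows kept
```

Either logical expression can be placed unchanged in the i argument of a data.table, since the column names are resolved inside the table there.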
This is a simplified version of the actual problem I'm dealing with. In this example, I'll be working with four columns, and the actual problem requires working with about 20-30 columns.
Consider the iris dataset. Suppose that I wanted to, for some reason, append new columns which would be equal to double the .Length and the .Width columns. With the following code, this would change the existing columns:
library(dplyr)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
df_iris <- iris %>% mutate(across(matches("(\\.)(Length|Width)"),
function(x) { x * 2 }))
head(df_iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 10.2 7.0 2.8 0.4 setosa
2 9.8 6.0 2.8 0.4 setosa
3 9.4 6.4 2.6 0.4 setosa
4 9.2 6.2 3.0 0.4 setosa
5 10.0 7.2 2.8 0.4 setosa
6 10.8 7.8 3.4 0.8 setosa
However, instead, I would like to have this doubled calculation create NEW columns, say .Length.2 and .Width.2. One way this could be done is the following:
double <- function(x) {
x * 2
}
df_iris <- iris %>%
mutate(Sepal.Length.2 = double(Sepal.Length),
Sepal.Width.2 = double(Sepal.Width),
Petal.Length.2 = double(Petal.Length),
Petal.Width.2 = double(Petal.Width))
head(df_iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.2 Sepal.Width.2 Petal.Length.2 Petal.Width.2
1 5.1 3.5 1.4 0.2 setosa 10.2 7.0 2.8 0.4
2 4.9 3.0 1.4 0.2 setosa 9.8 6.0 2.8 0.4
3 4.7 3.2 1.3 0.2 setosa 9.4 6.4 2.6 0.4
4 4.6 3.1 1.5 0.2 setosa 9.2 6.2 3.0 0.4
5 5.0 3.6 1.4 0.2 setosa 10.0 7.2 2.8 0.4
6 5.4 3.9 1.7 0.4 setosa 10.8 7.8 3.4 0.8
Is there a way to do this in dplyr without relying on superseded/deprecated functions and without having to manually specify each column name?
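We can use across with its .names argument (used with dplyr 1.0.6). Passing double only works if the function from the question is defined, because base R's double() creates a numeric vector of a given length; a lambda avoids the name clash:
library(dplyr)
df_iris <- iris %>%
mutate(across(where(is.numeric), ~ .x * 2, .names = '{.col}.2'))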
-output
head(df_iris, 3)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.2 Sepal.Width.2 Petal.Length.2 Petal.Width.2
1 5.1 3.5 1.4 0.2 setosa 10.2 7.0 2.8 0.4
2 4.9 3.0 1.4 0.2 setosa 9.8 6.0 2.8 0.4
3 4.7 3.2 1.3 0.2 setosa 9.4 6.4 2.6 0.4
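For comparison, the same "apply a function to every numeric column and store the result under a suffixed name" idea can be sketched in base R. This is not the dplyr approach asked for, only an illustration of what the .names pattern produces; num_cols is a helper name introduced here.

```r
df_iris <- iris

# Names of all numeric columns (everything except Species in iris)
num_cols <- names(df_iris)[sapply(df_iris, is.numeric)]

# Double each numeric column and store it as <original name>.2
df_iris[paste0(num_cols, ".2")] <- lapply(df_iris[num_cols], function(x) x * 2)

names(df_iris)
```

The original columns are left untouched and the new ones are appended on the right, in the same order as the columns they were derived from.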
Let's say that I want to locate a pattern in a string and, if the pattern exists, keep only the part of the string up to the pattern (in the example below, up to and including the first "t"). My problem is that if the pattern does not exist, str_locate returns NA, so the final result is NA as well. I want to get the original string back when the pattern does not exist.
library(stringr)
library(dplyr)
unique(iris$Species)
#> [1] setosa versicolor virginica
#> Levels: setosa versicolor virginica
test <- iris %>%
mutate(Species = str_sub(Species, 1, str_locate(Species, "t")[,1] ))
head(test)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 set
#> 2 4.9 3.0 1.4 0.2 set
#> 3 4.7 3.2 1.3 0.2 set
#> 4 4.6 3.1 1.5 0.2 set
#> 5 5.0 3.6 1.4 0.2 set
#> 6 5.4 3.9 1.7 0.4 set
tail(test)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 145 6.7 3.3 5.7 2.5 <NA>
#> 146 6.7 3.0 5.2 2.3 <NA>
#> 147 6.3 2.5 5.0 1.9 <NA>
#> 148 6.5 3.0 5.2 2.0 <NA>
#> 149 6.2 3.4 5.4 2.3 <NA>
#> 150 5.9 3.0 5.1 1.8 <NA>
Created on 2019-07-14 by the reprex package (v0.3.0)
We can use a regex lookbehind with str_remove. If the pattern is not found, the original string is returned unchanged. Here, (?<=t).* matches all characters after the first 't', and if there is a match, those characters are removed.
library(dplyr)
library(stringr)
test <- iris %>%
mutate(Species = str_remove(Species, "(?<=t).*"))
head(test)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 set
#2 4.9 3.0 1.4 0.2 set
#3 4.7 3.2 1.3 0.2 set
#4 4.6 3.1 1.5 0.2 set
#5 5.0 3.6 1.4 0.2 set
#6 5.4 3.9 1.7 0.4 set
tail(test)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#145 6.7 3.3 5.7 2.5 virginica
#146 6.7 3.0 5.2 2.3 virginica
#147 6.3 2.5 5.0 1.9 virginica
#148 6.5 3.0 5.2 2.0 virginica
#149 6.2 3.4 5.4 2.3 virginica
#150 5.9 3.0 5.1 1.8 virginica
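The behaviour of the lookbehind can be checked on the three species names alone. The sketch below uses base R's sub() with perl = TRUE instead of stringr, which is an assumption made here to keep the example free of package dependencies; the regular expression is the one from the answer.

```r
species <- c("setosa", "versicolor", "virginica")

# Remove everything that follows the first "t"; strings without a "t"
# have no match and are returned unchanged
trimmed <- sub("(?<=t).*", "", species, perl = TRUE)
trimmed
```

Only "setosa" contains a "t", so it is shortened to "set", while "versicolor" and "virginica" come back as they were instead of turning into NA.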
Is there a function in base R that shows the first and last rows of a data frame? I know that ropls::strF and printing a data.table object can do this. This is not the same as the topic "Select first and last row from grouped data".
ropls::strF(iris)
#Sepal.Length Sepal.Width ... Petal.Width Species
#numeric numeric ... numeric factor
#nRow nCol size NAs
#150 5 0 Mb 0
#Sepal.Length Sepal.Width ... Petal.Width Species
#1 5.1 3.5 ... 0.2 setosa
#2 4.9 3 ... 0.2 setosa
#... ... ... ... ... ...
#149 6.2 3.4 ... 2.3 virginica
#150 5.9 3 ... 1.8 virginica
library(data.table)
a <- as.data.table(iris)
a
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1: 5.1 3.5 1.4 0.2 setosa
#2: 4.9 3.0 1.4 0.2 setosa
#3: 4.7 3.2 1.3 0.2 setosa
#4: 4.6 3.1 1.5 0.2 setosa
#5: 5.0 3.6 1.4 0.2 setosa
#---
#146: 6.7 3.0 5.2 2.3 virginica
#147: 6.3 2.5 5.0 1.9 virginica
#148: 6.5 3.0 5.2 2.0 virginica
#149: 6.2 3.4 5.4 2.3 virginica
#150: 5.9 3.0 5.1 1.8 virginica
As others said in the comments, there isn't a function in base R to do this, but it's straightforward enough to write a function that binds together the first N rows and last N rows.
head_and_tail <- function(x, n = 1) {
rbind(
head(x, n),
tail(x, n)
)
}
head_and_tail(iris, n = 3)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 148 6.5 3.0 5.2 2.0 virginica
#> 149 6.2 3.4 5.4 2.3 virginica
#> 150 5.9 3.0 5.1 1.8 virginica
Created on 2018-12-22 by the reprex package (v0.2.1)
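One thing to be aware of with this kind of helper: when the data frame has fewer than 2 * n rows, head() and tail() overlap and some rows appear twice. A possible variant that avoids this by working on row indices is sketched below; head_and_tail_unique is a name invented for this illustration.

```r
head_and_tail_unique <- function(x, n = 1) {
  rows <- seq_len(nrow(x))
  # Collect the first and last n row indices, dropping any overlap
  keep <- unique(c(head(rows, n), tail(rows, n)))
  x[keep, , drop = FALSE]
}

head_and_tail_unique(iris, n = 3)           # 6 rows, same as head_and_tail()
head_and_tail_unique(head(iris, 4), n = 3)  # 4 rows, no duplicates
```

For inputs with at least 2 * n rows, this returns the same rows as the rbind version.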
I understand that data.table allows you to do computations by group within a column. For example:
Reproducible example
library(data.table)
as.data.table(iris)[,.SD[which.min(Petal.Width)], by=Species]
generating
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1: setosa 4.9 3.1 1.5 0.1
2: versicolor 4.9 2.4 3.3 1.0
3: virginica 6.1 2.6 5.6 1.4
I want every row where the minimum is met, not just the first, which is easily achieved with a data.frame.
For example, this:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
10 4.9 3.1 1.5 0.1 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
33 5.2 4.1 1.5 0.1 setosa
38 4.9 3.6 1.4 0.1 setosa
58 4.9 2.4 3.3 1.0 versicolor
61 5.0 2.0 3.5 1.0 versicolor
63 6.0 2.2 4.0 1.0 versicolor
68 5.8 2.7 4.1 1.0 versicolor
80 5.7 2.6 3.5 1.0 versicolor
82 5.5 2.4 3.7 1.0 versicolor
94 5.0 2.3 3.3 1.0 versicolor
135 6.1 2.6 5.6 1.4 virginica
What I don't want is just the first instance where the minimum is met.
The desired result would be equivalent to doing something like this using a data.frame:
iris
iris <- as.data.frame(iris) #in case reader does not start new R session
f.min <- function(spec) {
spec.sub <- iris[iris$Species==spec,]
# return every row whose Petal.Width equals the group minimum (ties included)
spec.sub[spec.sub$Petal.Width == min(spec.sub$Petal.Width),]
}
do.call(rbind, lapply(levels(iris$Species), f.min ))
There are some powerful features in data.table which are worth learning, which is why I would like to know the equivalent in data.table.
Try:
as.data.table(iris)[,.SD[which.min(Petal.Width)], by=Species]
This gives you the minimum for each group but does not show ties.
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1: setosa 4.9 3.1 1.5 0.1
2: versicolor 4.9 2.4 3.3 1.0
3: virginica 6.1 2.6 5.6 1.4
A dplyr solution that shows the ties as well would be:
library(dplyr) # also provides %>%
iris %>%
group_by(Species) %>%
filter(rank(Petal.Width, ties.method= "min") == 1)
Source: local data table [13 x 5]
Groups: Species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.1 1.5 0.1 setosa
2 4.8 3.0 1.4 0.1 setosa
3 4.3 3.0 1.1 0.1 setosa
4 5.2 4.1 1.5 0.1 setosa
5 4.9 3.6 1.4 0.1 setosa
6 4.9 2.4 3.3 1.0 versicolor
7 5.0 2.0 3.5 1.0 versicolor
8 6.0 2.2 4.0 1.0 versicolor
9 5.8 2.7 4.1 1.0 versicolor
10 5.7 2.6 3.5 1.0 versicolor
11 5.5 2.4 3.7 1.0 versicolor
12 5.0 2.3 3.3 1.0 versicolor
13 6.1 2.6 5.6 1.4 virginica
The ties.method argument of rank() controls how tied values are ranked; with "min", every row tied for the smallest value gets rank 1 and is therefore kept.
Hope this helps.
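Since the question asks for a data.table equivalent, one way to keep ties there is to compare each value with its group minimum instead of using which.min, which returns only the first position. This is a sketch, assuming the data.table package is installed; dt and ties are names chosen for the example.

```r
library(data.table)

dt <- as.data.table(iris)

# Within each Species, keep every row whose Petal.Width equals the group minimum
ties <- dt[, .SD[Petal.Width == min(Petal.Width)], by = Species]
nrow(ties)
```

This keeps all rows tied for the minimum in each group, matching the 13 rows shown in the dplyr output above.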