How to save output to console and file simultaneously in RStudio server?

I want to see the output of my calculations in the console and save it to a file at the same time. The sink() function does not seem suitable, since it simply redirects the output to a file, while I need it written to both places: the console and the file. Is this possible?

sink() has a split argument: with split = TRUE, the output is sent both to the file and to the output stream (the console). For example:
> sink(file="test.file", split = TRUE)
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> sink()
> x <- read.csv("test.file")
> x
Sepal.Length.Sepal.Width.Petal.Length.Petal.Width.Species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
>
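As a minimal sketch of the same idea in script form (the file name and the use of tempdir() are assumptions for illustration, not part of the answer above): open the sink with split = TRUE, produce some output, close the sink, and read the log back with readLines(). readLines() returns the raw text lines, whereas read.csv() in the transcript above treats each logged line as a single CSV column. Messages and warnings go to a separate stream and are not captured by this call.

```r
# Assumed log location; any writable path works.
log_file <- file.path(tempdir(), "console_log.txt")

sink(file = log_file, split = TRUE)  # from here on, output goes to console AND file
print(head(iris, 3))
cat("done\n")
sink()                               # restore console-only output

# Read the log back as plain text lines.
logged <- readLines(log_file)
```

Calling sink() with no arguments is what closes the diversion; if it is skipped, later output keeps being written to the file.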

Related

Moving columns in a data frame

I'm trying to use the move_columns function from the sjmisc package, but I get different results depending on whether I pass the column indices as literal numbers or as variables holding those numbers. For example, I want to move the column Petal.Width to position 3 (that is, after column 2), but when I use variables it gets moved to the end of the data frame.
> library(sjmisc)
>
> data(iris)
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
>
> index_rec<-4
> index<-2
> one<-move_columns(iris,index_rec,.after=index)
> head(one)
Sepal.Length Sepal.Width Petal.Length Species Petal.Width
1 5.1 3.5 1.4 setosa 0.2
2 4.9 3.0 1.4 setosa 0.2
3 4.7 3.2 1.3 setosa 0.2
4 4.6 3.1 1.5 setosa 0.2
5 5.0 3.6 1.4 setosa 0.2
6 5.4 3.9 1.7 setosa 0.4
>
> two<-move_columns(iris,4,.after=2)
> head(two)
Sepal.Length Sepal.Width Petal.Width Petal.Length Species
1 5.1 3.5 0.2 1.4 setosa
2 4.9 3.0 0.2 1.4 setosa
3 4.7 3.2 0.2 1.3 setosa
4 4.6 3.1 0.2 1.5 setosa
5 5.0 3.6 0.2 1.4 setosa
6 5.4 3.9 0.4 1.7 setosa
The documentation says that if neither .before nor .after is specified, the column is moved to the end of the data frame by default. So is the problem in the first case that I'm not specifying .after? I think it's clearly there...
EDIT
It works with quasi-quotation: unquoting with !! appears to pass the value of index rather than its name.
library(sjmisc)
index_rec<-4
index <- 2
move_columns(iris,4,.after=!!index) %>% head
# Sepal.Length Sepal.Width Petal.Width Petal.Length Species
#1 5.1 3.5 0.2 1.4 setosa
#2 4.9 3.0 0.2 1.4 setosa
#3 4.7 3.2 0.2 1.3 setosa
#4 4.6 3.1 0.2 1.5 setosa
#5 5.0 3.6 0.2 1.4 setosa
#6 5.4 3.9 0.4 1.7 setosa
Earlier Answer
It seems like a bug to me when you pass the number as a variable to the function.
#This works fine
move_columns(iris,4,.after=2) %>% head
# Sepal.Length Sepal.Width Petal.Width Petal.Length Species
#1 5.1 3.5 0.2 1.4 setosa
#2 4.9 3.0 0.2 1.4 setosa
#3 4.7 3.2 0.2 1.3 setosa
#4 4.6 3.1 0.2 1.5 setosa
#5 5.0 3.6 0.2 1.4 setosa
#6 5.4 3.9 0.4 1.7 setosa
#This doesn't
move_columns(iris,4,.after=index) %>% head
# Sepal.Length Sepal.Width Petal.Length Species Petal.Width
#1 5.1 3.5 1.4 setosa 0.2
#2 4.9 3.0 1.4 setosa 0.2
#3 4.7 3.2 1.3 setosa 0.2
#4 4.6 3.1 1.5 setosa 0.2
#5 5.0 3.6 1.4 setosa 0.2
#6 5.4 3.9 1.7 setosa 0.4
Why not use the relocate function from dplyr (added in version 1.0.0)? It does not show this problem and works as expected when passed a variable.
library(dplyr)
relocate(iris, 4, .after=2) %>% head
# Sepal.Length Sepal.Width Petal.Width Petal.Length Species
#1 5.1 3.5 0.2 1.4 setosa
#2 4.9 3.0 0.2 1.4 setosa
#3 4.7 3.2 0.2 1.3 setosa
#4 4.6 3.1 0.2 1.5 setosa
#5 5.0 3.6 0.2 1.4 setosa
#6 5.4 3.9 0.4 1.7 setosa
relocate(iris,index_rec,.after=index) %>% head
# Sepal.Length Sepal.Width Petal.Width Petal.Length Species
#1 5.1 3.5 0.2 1.4 setosa
#2 4.9 3.0 0.2 1.4 setosa
#3 4.7 3.2 0.2 1.3 setosa
#4 4.6 3.1 0.2 1.5 setosa
#5 5.0 3.6 0.2 1.4 setosa
#6 5.4 3.9 0.4 1.7 setosa
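For comparison, the same reordering can be sketched in base R, where indices stored in variables are ordinary values and no non-standard evaluation is involved. The helper name move_col_after is hypothetical, and the sketch assumes that from and after are single, distinct column positions.

```r
# Hypothetical helper: move column `from` so that it sits right after column `after`.
# Assumes `from` and `after` are single, distinct column positions.
move_col_after <- function(df, from, after) {
  rest <- setdiff(seq_along(df), from)   # all positions except the moved column
  pos  <- match(after, rest)             # where `after` sits once `from` is removed
  df[append(rest, from, after = pos)]    # re-insert `from` right behind it
}

index_rec <- 4
index <- 2
moved <- move_col_after(iris, index_rec, index)
names(moved)
```

With index_rec = 4 and index = 2 this yields the column order Sepal.Length, Sepal.Width, Petal.Width, Petal.Length, Species, the same as the literal-number call above.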

How to filter a dataframe with a character vector

I'm trying to filter a data.frame with the filter() function from the dplyr package. The main problem is that I want to supply the conditions as a character vector.
For example
library(dplyr)
conditions <- c("Sepal.Width<3.2","Species==setosa")
DATA <- iris %>%
  filter(conditions) # This doesn't work, of course.
Is there any function that would take
conditions <- c("Sepal.Width<3.2","Species==setosa")
as an input and give me
Sepal.Width<3.2 & Species==setosa
as an output? I thought about using eval(parse(...)) with sapply and maybe paste0() to add the &, but I can't make it work.
Any help would be appreciated.
There are multiple issues. First, the string value in the second condition needs its own quotes inside the quoted condition:
conditions <- c("Sepal.Width < 3.2", "Species == 'setosa'")
Then, you need to specify how the two conditions are combined. Here, I assumed an &. Note that paste() needs collapse (not sep) to join the elements of a single vector into one string; with sep, the conditions stay separate and only the last one ends up being applied. Then you can use eval(parse(...)):
iris %>%
  filter(eval(parse(text = paste(conditions, collapse = " & "))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 4.6 3.1 1.5 0.2 setosa
3 4.4 2.9 1.4 0.2 setosa
4 4.9 3.1 1.5 0.1 setosa
5 4.8 3.0 1.4 0.1 setosa
6 4.3 3.0 1.1 0.1 setosa
7 5.0 3.0 1.6 0.2 setosa
8 4.8 3.1 1.6 0.2 setosa
9 4.9 3.1 1.5 0.2 setosa
10 4.4 3.0 1.3 0.2 setosa
11 4.5 2.3 1.3 0.3 setosa
12 4.8 3.0 1.4 0.3 setosa
On the other hand, I think it is always important to quote @Martin Mächler to warn about the potential problems associated with this approach:
The (possibly) only connection is via parse(text = ....) and all good
R programmers should know that this is rarely an efficient or safe
means to construct expressions (or calls). Rather learn more about
substitute(), quote(), and possibly the power of using
do.call(substitute, ......).
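One way to act on that advice, as a base R sketch rather than a definitive recipe: if the conditions can be written as quoted expressions with quote() instead of as strings, nothing has to be parsed at all. This assumes you control how the conditions are created; conditions that genuinely arrive as text still need some form of parsing.

```r
# Conditions stored as unevaluated expressions instead of strings.
conditions <- list(
  quote(Sepal.Width < 3.2),
  quote(Species == "setosa")
)

# Evaluate each condition with the columns of iris in scope,
# giving one logical vector per condition.
masks <- lapply(conditions, function(cond) eval(cond, iris))

# Combine all logical vectors with & and subset the rows.
keep <- Reduce(`&`, masks)
DATA <- iris[keep, ]
```

Because the subsetting is done with base R indexing, DATA keeps the original row names of iris instead of being renumbered.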
Here is a way:
conditions <- c("Sepal.Width<3.2","Species=='setosa'")
# note the small change here: quotes added around 'setosa'
DATA <- iris %>%
filter(eval(parse(text = paste(conditions, collapse = "&"))))
> DATA
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 4.6 3.1 1.5 0.2 setosa
3 4.4 2.9 1.4 0.2 setosa
4 4.9 3.1 1.5 0.1 setosa
5 4.8 3.0 1.4 0.1 setosa
6 4.3 3.0 1.1 0.1 setosa
7 5.0 3.0 1.6 0.2 setosa
8 4.8 3.1 1.6 0.2 setosa
9 4.9 3.1 1.5 0.2 setosa
10 4.4 3.0 1.3 0.2 setosa
11 4.5 2.3 1.3 0.3 setosa
12 4.8 3.0 1.4 0.3 setosa
A tidyeval way would be to use rlang::parse_exprs().
library(dplyr)
conditions <- c("Sepal.Width < 3.2", "Species == 'setosa'")
iris %>%
filter( !!! rlang::parse_exprs(conditions))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 4.6 3.1 1.5 0.2 setosa
3 4.4 2.9 1.4 0.2 setosa
4 4.9 3.1 1.5 0.1 setosa
5 4.8 3.0 1.4 0.1 setosa
6 4.3 3.0 1.1 0.1 setosa
7 5.0 3.0 1.6 0.2 setosa
8 4.8 3.1 1.6 0.2 setosa
9 4.9 3.1 1.5 0.2 setosa
10 4.4 3.0 1.3 0.2 setosa
11 4.5 2.3 1.3 0.3 setosa
12 4.8 3.0 1.4 0.3 setosa

Duplicating a row in a data-frame n number of times with positional spec

Sample df:
iris_subset <- iris[1:5, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
I'm looking for the best solution where I can duplicate a specific row, n number of times, with the opportunity to state positionally where to insert the duplicate rows.
For example, I want to duplicate row 2 two times, after the original row.
Desired output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.9 3.0 1.4 0.2 setosa
4 4.9 3.0 1.4 0.2 setosa
5 4.7 3.2 1.3 0.2 setosa
6 4.6 3.1 1.5 0.2 setosa
7 5.0 3.6 1.4 0.2 setosa
Sloppily, I can do something like:
iris_subset <- rbind(iris_subset, iris[2,], iris[2,])
iris_subset <- iris_subset[c(1:2, 6:7, 3:5),]
row.names(iris_subset) <- 1:nrow(iris_subset)
But if I want to functionalise this, I need a better way of repeating the row I want duplicated than manually passing in additional arguments n times in rbind or other alternatives, which is incredibly inefficient.
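row_ind = 2        # row to duplicate
repeat_n = 3       # number of copies to insert (the question asks for 2)
place_at_row = 3   # position at which the first copy should appear
inds = append(x = 1:NROW(iris_subset),
              values = rep(row_ind, repeat_n),
              after = place_at_row - 1)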
iris_subset[inds,]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#2.1 4.9 3.0 1.4 0.2 setosa
#2.2 4.9 3.0 1.4 0.2 setosa
#2.3 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
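Since the question asks for something that can be turned into a function, the index-building idea above can be wrapped as follows. This is a sketch: the name duplicate_row and its argument names are hypothetical, and inserting directly after the original row is an assumed default.

```r
# Hypothetical helper: insert `n` copies of row `row` after position `after`.
duplicate_row <- function(df, row, n, after = row) {
  idx <- append(seq_len(nrow(df)), rep(row, n), after = after)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL   # renumber rows 1..nrow(out) instead of 2.1, 2.2, ...
  out
}

iris_subset <- iris[1:5, ]
res <- duplicate_row(iris_subset, row = 2, n = 2)
res
```

With row = 2 and n = 2 the result has seven rows, with rows 2 to 4 identical, which matches the desired output in the question.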
We can use add_row() from tibble (loaded with the tidyverse)
library(tidyverse)
add_row(iris_subset, !!! as.list(iris_subset[rep(2, each = 3),]), .after = 2)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.9 3.0 1.4 0.2 setosa
#4 4.9 3.0 1.4 0.2 setosa
#5 4.9 3.0 1.4 0.2 setosa
#6 4.7 3.2 1.3 0.2 setosa
#7 4.6 3.1 1.5 0.2 setosa
#8 5.0 3.6 1.4 0.2 setosa

Renaming columns based on condition about their names

I would like to add a prefix to my dataset column names only if they already begin with a certain string, and I would like to do it (if possible) using a dplyr pipeline.
Taking the iris dataset as toy example, I was able to get the expected result with base R (with a quite cumbersome line of code):
data("iris")
colnames(iris)[startsWith(colnames(iris), "Sepal")] <- paste0("YAY_", colnames(iris)[startsWith(colnames(iris), "Sepal")])
head(iris)
YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
In this example, the prefix YAY_ has been added to all the column names starting with Sepal. Is there a way to obtain the same result with a dplyr command/pipeline?
An option would be rename_at
library(tidyverse)
iris %>%
rename_at(vars(starts_with("Sepal")), ~ str_c("YAY_", .))
# YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
# ...
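In dplyr 1.0.0 and later, the scoped verbs such as rename_at() are superseded, and rename_with() covers the same case. A minimal sketch, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

# Apply the renaming function only to the columns whose names start with "Sepal".
renamed <- iris %>%
  rename_with(~ paste0("YAY_", .x), .cols = starts_with("Sepal"))

names(renamed)
```

The first argument after the data is the renaming function and .cols selects which columns it is applied to, so the other columns keep their names.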

Using certain plyr functions to calculate more than one thing

Let's say I have the following:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
Is it possible to calculate more than one thing for the first column, such as min, max and mean using a certain plyr function, and doing that in a single call?
Thanks!
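One possible approach, offered as a sketch: plyr provides each(), which combines several functions into a single function that returns a named vector. The data frame below is an assumption: the rows shown in the question match the first ten rows of the numeric columns of iris, so it is rebuilt from there.

```r
library(plyr)

# Assumed reconstruction of the data shown in the question.
df <- head(iris[, 1:4], 10)

# each() bundles min, max and mean into one function; call it on the first column.
stats <- each(min, max, mean)(df$Sepal.Length)
stats
# min = 4.4, max = 5.4, mean = 4.86
```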
