How to convert two vectors into dataframe (wide format) - r

I want to convert two vectors into a wide-format dataframe. The first vector represents the column names and the second vector the values.
Here is my reproducible example:
vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
Now I want to convert these two vectors into a wide-format dataframe:
# A tibble: 1 x 5
Reply Reshare Like Share Search
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 0 4 3
I have found some examples for the long format, but no simple solution for the wide format. Can anyone help me?

You can make a named list (e.g. using setNames), followed by as.data.frame:
df <- as.data.frame(setNames(as.list(vector2), vector1))
Note that it needs to be a list: when converting a named vector into a data.frame, R puts values into separate rows instead of columns.
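To make the difference concrete, here is a minimal base R sketch that converts the same named values both ways and compares the shapes. The names named_vec, long and wide are only illustrative.

```r
vector1 <- c("Reply", "Reshare", "Like", "Share", "Search")
vector2 <- c(2, 1, 0, 4, 3)

# Attach the labels to the values
named_vec <- setNames(vector2, vector1)

# A named vector becomes a single column with one row per element
long <- as.data.frame(named_vec)

# A named list becomes a single row with one column per element
wide <- as.data.frame(as.list(named_vec))

dim(long)
dim(wide)
```

Comparing dim(long) with dim(wide) shows that only the list version ends up with one column per name.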

vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
df <- data.frame(vector1, vector2)
df |> tidyr::pivot_wider(names_from = vector1, values_from = vector2)
#> # A tibble: 1 × 5
#> Reply Reshare Like Share Search
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 0 4 3
Created on 2022-02-08 by the reprex package (v2.0.1)

Yet another solution, based on dplyr::bind_rows:
library(dplyr)
vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
names(vector2) <- vector1
bind_rows(vector2)
#> # A tibble: 1 × 5
#> Reply Reshare Like Share Search
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 0 4 3

We can use map_dfc and set_names
library(purrr)
set_names(map_dfc(vector2, ~.x), vector1)
# A tibble: 1 × 5
Reply Reshare Like Share Search
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 0 4 3

Another possible solution:
library(dplyr)
data.frame(rbind(vector1, vector2)) %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
`rownames<-`(NULL)
Reply Reshare Like Share Search
1 2 1 0 4 3
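One caveat with the rbind approach: rbind builds a matrix, and a matrix holds a single type, so the numbers are coerced to character. A minimal base R sketch of the same idea with a conversion step at the end (type.convert is one possible way to restore the types, not part of the answer above):

```r
vector1 <- c("Reply", "Reshare", "Like", "Share", "Search")
vector2 <- c(2, 1, 0, 4, 3)

# rbind() returns a character matrix, so the numbers become strings
m <- rbind(vector1, vector2)
is.character(m)

# Build the one-row data frame from the value row and name it from the label row
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]
rownames(df) <- NULL

# Convert each column from character back to a numeric type
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```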

Related

Extract the data if the first row of each id is 1 using R

Here, I made a simple data to demonstrate what I want to do.
df<-data.frame(id=c(1,1,1,2,2,2,2,3,3),
date=c(20220311,20220315,20220317,20220514,20220517,20220518,20220519,20220613,20220618),
disease=c(0,1,0,1,1,1,0,1,1))
id stands for a personal id. disease=1 means that the person has a disease; disease=0 means that the person doesn't have a disease. There are 3 people in df. For id 1, the value of disease in the first row is 0. On the other hand, the values of disease in the first two rows for ids 2 and 3 are 1. I want to extract the data for each id whose first row has disease equal to 1.
So, I should extract the data with id 2 and 3. My expected output is
df<-data.frame(id=c(2,2,2,2,3,3),
date=c(20220514,20220517,20220518,20220519,20220613,20220618),
disease=c(1,1,1,0,1,1))
You can group by id and use filter with any to keep every group whose first row (row_number() == 1) meets the condition, like this:
df<-data.frame(id=c(1,1,1,2,2,2,2,3,3),
date=c(20220311,20220315,20220317,20220514,20220517,20220518,20220519,20220613,20220618),
disease=c(0,1,0,1,1,1,0,1,1))
library(dplyr)
df %>%
group_by(id) %>%
filter(any(row_number() == 1 & disease == 1))
#> # A tibble: 6 × 3
#> # Groups: id [2]
#> id date disease
#> <dbl> <dbl> <dbl>
#> 1 2 20220514 1
#> 2 2 20220517 1
#> 3 2 20220518 1
#> 4 2 20220519 0
#> 5 3 20220613 1
#> 6 3 20220618 1
Created on 2022-07-25 by the reprex package (v2.0.1)
If you only want the first row of each id that meets your condition, you can use this:
df<-data.frame(id=c(1,1,1,2,2,2,2,3,3),
date=c(20220311,20220315,20220317,20220514,20220517,20220518,20220519,20220613,20220618),
disease=c(0,1,0,1,1,1,0,1,1))
library(dplyr)
df %>%
group_by(id) %>%
filter(row_number() == 1 & disease == 1)
#> # A tibble: 2 × 3
#> # Groups: id [2]
#> id date disease
#> <dbl> <dbl> <dbl>
#> 1 2 20220514 1
#> 2 3 20220613 1
Created on 2022-07-25 by the reprex package (v2.0.1)
We could also do it like this:
library(dplyr)
df %>%
group_by(id) %>%
filter(first(disease)==1)
id date disease
<dbl> <dbl> <dbl>
1 2 20220514 1
2 2 20220517 1
3 2 20220518 1
4 2 20220519 0
5 3 20220613 1
6 3 20220618 1
In base R you can do:
ids_disease <- df$id[!duplicated(df$id) & df$disease == 1]
df[df$id %in% ids_disease, ]
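The base R version works because !duplicated(df$id) is TRUE only on the first row of each id. A small sketch that spells out that step and cross-checks it against an ave-based alternative (the names is_first, first_disease and result_ave are only illustrative):

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 date = c(20220311, 20220315, 20220317, 20220514, 20220517,
                          20220518, 20220519, 20220613, 20220618),
                 disease = c(0, 1, 0, 1, 1, 1, 0, 1, 1))

# TRUE on the first occurrence of each id, FALSE on the repeats
is_first <- !duplicated(df$id)

# ids whose first row has disease == 1
ids_disease <- df$id[is_first & df$disease == 1]
result <- df[df$id %in% ids_disease, ]

# Alternative: broadcast each id's first disease value to all of its rows
first_disease <- ave(df$disease, df$id, FUN = function(x) x[1])
result_ave <- df[first_disease == 1, ]

result
```

Both routes keep the six rows belonging to ids 2 and 3.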

Filter a tibble with two conditions

I have this tibble:
tibble(id=c(4,4), client=c(5,10), stock=c(NA,10))
# A tibble: 2 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 NA
2 4 10 10
From this I want to keep a row where client == 5 and stock == 10. How would I filter that? My desired outcome would be:
# A tibble: 1 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 10
Not sure about the context of filtering using values from different rows, but see if below operation works for you.
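> library(dplyr)
> library(tidyr)  # fill() comes from tidyr, not dplyr
> df <- tibble(id=c(4,4), client=c(5,10), stock=c(NA,10))
> df %>% fill(stock, .direction = 'up') %>% filter(client == 5 & stock == 10)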
# A tibble: 1 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 10
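If the real data holds more than one id, filling upward without grouping could copy a stock value from one id into another. A hedged sketch, assuming a hypothetical second id (5) that is not in the question, which fills within each id only:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: id 5 is added only to illustrate the grouping
df <- tibble(id = c(4, 4, 5, 5),
             client = c(5, 10, 5, 10),
             stock = c(NA, 10, NA, 20))

result <- df %>%
  group_by(id) %>%                      # fill within each id only
  fill(stock, .direction = "up") %>%    # take the next non-missing value below
  ungroup() %>%
  filter(client == 5, stock == 10)

result
```

Here only the id 4 row survives, because the filled stock for id 5 is 20.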

Changing columns right to left in a tibble

I have a table - read from an excel file, with column names in English and some variables in Hebrew.
As I read the excel file and receive a tibble, the column names don't fit the data.
I use the following code to read the table:
excel_file <- file.path(the file path, the file)
tab_1 <- read_xlsx(excel_file)
tab_1
The result that I'm getting:
# A tibble: 2 x 5
case a b c d
<chr> <dbl> <dbl> <dbl> <dbl>
1 שחור 3 2 1 4
2 אדום 2 5 2 3
>
How can I change the order of the column names? I have looked all over and found no solution.
You can do it by specifying the column indexes
Using the iris dataset as an example
First, change to a tibble
iris2 <- iris %>% as_tibble()
Reverse columns by manually specifying by column index
iris2[,c(5,4,3,2,1)]
Or do the same programmatically
iris2[,ncol(iris2):1]
When the tibble becomes wider (more columns) the answer looks something like this
> tab_1 <- tab_1[,ncol(tab_1):1]
> print(tab_1)
# A tibble: 2 x 19
result q p o n m l k j i h g f e d c b a
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 שחור 2 4 5 6 4 6 2 1 2 5 2 3 4 4 1 2 3
2 אדום 4 3 5 5 6 3 0 3 3 4 5 3 5 3 2 5 2
case
<chr>
1 שחור
2 אדום
changing back the column names of the first part
I use the tidyverse generally for content management. The select function is a clean single line to reverse df columns generally. E.g.,
library(tidyverse)
n <- ncol(mtcars)
mtcars2 <- select(mtcars, c(n:1))
An option is also to use rev
library(dplyr)
mtcars %>%
select(rev(names(.)))
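For completeness, the same reversal can be sketched in base R without any packages. The data frame below is a stand-in for tab_1 with English placeholders instead of the Hebrew values, so its contents are an assumption:

```r
# Stand-in for tab_1 (placeholder values, not the original data)
tab <- data.frame(case = c("black", "red"),
                  a = c(3, 2), b = c(2, 5), c = c(1, 2), d = c(4, 3))

# Reverse the column positions; this also works on a tibble
rev_tab <- tab[rev(seq_along(tab))]

names(rev_tab)
```

Indexing with a single bracket and a vector of positions keeps the result a data frame, so the values move together with their column names.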

When I don't know column names in data.frame, when I use dplyr mutate function

I would like to know how I can use the dplyr mutate function when I don't know the column names. Here is my example code:
library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)
df %>% rowwise() %>% mutate(minimum = min(x,y,z))
Source: local data frame [3 x 5]
Groups: <by row>
# A tibble: 3 x 5
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 2
3 4 7 4 6 4
This code finds the minimum value row-wise. Yes, "df %>% rowwise() %>% mutate(minimum = min(x,y,z))" works because I typed the column names x, y, z. But let's assume that I have a really big data.frame with several hundred columns, and I don't know all of the column names. Or I have multiple data.frames that all have different column names, and I just want to find the minimum value from the 10th column to the 20th column in each row of each data.frame.
In the example data.frame I provided above, let's assume that I don't know the column names, but I just want to get the minimum value from the 2nd column to the 4th column in each row. Of course, this doesn't work, because df[,2] inside mutate refers to the whole column rather than to the current row's value:
df %>% rowwise() %>% mutate(minimum=min(df[,2],df[,3], df[,4]))
Source: local data frame [3 x 5]
Groups: <by row>
# A tibble: 3 x 5
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 1
3 4 7 4 6 1
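The reason every row shows 1 can be checked in base R: df[,2], df[,3] and df[,4] are whole columns, so min sees all nine values at once and returns a single number. A minimal sketch contrasting that with a per-row minimum (global_min and row_min are illustrative names):

```r
df <- data.frame(w = c(2, 3, 4), x = c(1, 2, 7), y = c(1, 5, 4), z = c(3, 2, 6))

# min() over whole columns collapses everything to one value
global_min <- min(df[, 2], df[, 3], df[, 4])

# pmin() compares the columns element by element, giving one value per row
row_min <- do.call(pmin, df[2:4])

global_min
row_min
```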
The two snippets below also don't work.
df %>% rowwise() %>% mutate(average=min(colnames(df)[2], colnames(df)[3], colnames(df)[4]))
df %>% rowwise() %>% mutate(average=min(noquote(colnames(df)[2]), noquote(colnames(df)[3]), noquote(colnames(df)[4])))
I know that I can get the minimum value by using apply or a different method when I don't know the column names. But I would like to know whether the dplyr mutate function can do that without known column names.
Thank you,
With apply:
library(dplyr)
library(purrr)
df %>%
mutate(minimum = apply(df[,2:4], 1, min))
or with pmap_dbl (plain pmap would return a list column instead of a numeric one):
df %>%
mutate(minimum = pmap_dbl(.[2:4], min))
Also with by_row from purrrlyr:
df %>%
purrrlyr::by_row(~min(.[2:4]), .collate = "rows", .to = "minimum")
Output:
# tibble [3 x 5]
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 2
3 4 7 4 6 4
A vectorized option would be pmin. Convert the column names to symbols with syms and splice them in with !!! so that pmin is applied to the values of those columns:
library(dplyr)
df %>%
mutate(minimum = pmin(!!! rlang::syms(names(.)[2:4])))
# w x y z minimum
#1 2 1 1 3 1
#2 3 2 5 2 2
#3 4 7 4 6 4
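In dplyr 1.0.0 and later, c_across is another way to select columns by position inside a rowwise mutate. This is a sketch of that alternative, not part of the answers above:

```r
library(dplyr)

df <- data.frame(w = c(2, 3, 4), x = c(1, 2, 7), y = c(1, 5, 4), z = c(3, 2, 6))

result <- df %>%
  rowwise() %>%                               # evaluate mutate one row at a time
  mutate(minimum = min(c_across(2:4))) %>%    # columns 2 to 4, selected by position
  ungroup()

result
```

Because rowwise evaluates one row at a time, this is likely to be slower than pmin on large data, but it works with any summary function.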
Here is a tidyeval approach along the lines of the suggestion from aosmith. If you don't know the column names, you can make a function that accepts the desired positions as inputs and finds the column names itself. Here, rlang::syms() takes the column names as strings and turns them into symbols, and !!! unquotes and splices the symbols into the function.
library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)
rowwise_min <- function(df, min_cols){
  # Look up the names at the requested positions (also works for a single column)
  cols <- rlang::syms(colnames(df)[min_cols])
  df %>%
    rowwise %>%
    mutate(minimum = min(!!!cols))
}
rowwise_min(df, 2:4)
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#>
#> # A tibble: 3 x 5
#> w x y z minimum
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 1 3 1
#> 2 3 2 5 2 2
#> 3 4 7 4 6 4
rowwise_min(df, c(1, 3))
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#>
#> # A tibble: 3 x 5
#> w x y z minimum
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 1 3 1
#> 2 3 2 5 2 3
#> 3 4 7 4 6 4
Created on 2018-09-04 by the reprex package (v0.2.0).

Dplyr join: NA match to any

Edit: I'm editing this post a little bit to provide a bit more context in case the whole approach was wrong from the start. See "Context" below for an attempt to explain the problem more abstractly.
I have seen the thread where the matching of NAs in tibbles is discussed, and the options are to match them to other NAs, or not to match them to anything: dplyr left_join matching NA
However, I am really looking for the opposite behaviour. Is there a way of having NAs (or whichever missing value for that case) matched to any other value during a join operation? An example below:
library(tidyverse)
# Removed output for brevity
tbl1 <- tibble(subj = 1, run = 1, session=1)
tbl2 <- tibble(subj = c(1, NA, 2), run = c(NA, 1, 2), session=c(NA, NA, 1), outcomedata = c(NA, NA, NA) )
tbl2$outcomedata[2][[1]] <- list(temperature=30)
tbl2$outcomedata[1][[1]] <- list(height=155, weight=80)
tbl2$outcomedata[3][[1]] <- list(temperature=20)
tbl1
#> # A tibble: 1 x 3
#> subj run session
#> <dbl> <dbl> <dbl>
#> 1 1.00 1.00 1.00
tbl2
#> # A tibble: 3 x 4
#> subj run session outcomedata
#> <dbl> <dbl> <dbl> <list>
#> 1 1.00 NA NA <list [2]>
#> 2 NA 1.00 NA <list [1]>
#> 3 2.00 2.00 1.00 <list [1]>
left_join(tbl1, tbl2)
#> Joining, by = c("subj", "run", "session")
#> # A tibble: 1 x 4
#> subj run session outcomedata
#> <dbl> <dbl> <dbl> <list>
#> 1 1.00 1.00 1.00 <NULL>
My desired end result is that I can match the first and the second row of tbl2 to the single row of tbl1, since these rows match on all non-NA attributes. The third row should not match to anything, since it differs on non-NA values. Thus, I am trying to get the final output to be as follows:
#> # A tibble: 2 x 4
#> subj run session outcomedata
#> <dbl> <dbl> <dbl> <list>
#> 1 1.00 1.00 1.00 <list [2]>
#> 2 1.00 1.00 1.00 <list [1]>
Context
Let me provide context in case I am way out here and barking up the wrong tree with the joins and there's an easier alternative. I have a bunch of nested json files (which I instantiate in R as lists), which contain various information that I want to attribute to specific instances in the data. One json might contain information which pertains to all instances in the data for subject 1 (i.e. the first row of tbl2), while another pertains to all instances in the data for run 1 (i.e. the second row of tbl2).
I would like to be able to merge all relevant information for each constellation of parameters in the data (one of which is in tbl1, but the plan is to have them all) in separate lists. My plan has been to try to get everything to match to everything related, and then to use a group_by operation over all parameters (i.e. group_by(subj, run, session)) and merge the lists (my plan was to use rlist::list.merge).
Any help would be massively appreciated!
Here's a tidyverse solution :
tbl2 %>%
split(seq(nrow(.))) %>% # split into one row data frames
map_dfr(~modify_if(.,is.na,~NULL) %>% # remove na columns
inner_join(tbl1,.)) # inner join to table1
# # A tibble: 2 x 4
# subj run session outcomedata
# <dbl> <dbl> <dbl> <list>
# 1 1 1 1 <list [2]>
# 2 1 1 1 <list [1]>
I use inner_join(tbl1,.) instead of inner_join(tbl1) to preserve column order.
And a base R translation :
df_list <- split(tbl2,seq(nrow(tbl2)))
df_list <- lapply(df_list,function(dfi){
merge(tbl1, dfi[!sapply(dfi,is.na)])
})
do.call(rbind,df_list)
# subj run session outcomedata
# 1 1 1 1 155, 80
# 2 1 1 1 30
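The matching rule behind these solutions (a row of tbl2 matches when every non-NA key equals the key in tbl1) can also be written out directly. A base R sketch under simplifying assumptions: plain data frames, a single-row tbl1, and a character column label standing in for the list column outcomedata:

```r
tbl1 <- data.frame(subj = 1, run = 1, session = 1)
tbl2 <- data.frame(subj = c(1, NA, 2), run = c(NA, 1, 2), session = c(NA, NA, 1),
                   label = c("a", "b", "c"))  # stand-in for outcomedata

keys <- c("subj", "run", "session")
target <- unlist(tbl1[1, keys])

# A row matches when each key is either NA (wildcard) or equal to the target
matches_tbl1 <- function(i) {
  vals <- unlist(tbl2[i, keys])
  all(is.na(vals) | vals == target)
}
keep <- vapply(seq_len(nrow(tbl2)), matches_tbl1, logical(1))

# Repeat the tbl1 row once per match and attach the matched payload
result <- cbind(tbl1[rep(1, sum(keep)), keys], label = tbl2$label[keep])
rownames(result) <- NULL
result
```

For several rows in tbl1, the same check would have to be repeated per row, which is what the split-and-join solutions do implicitly.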
Bonus
Two 100% tidyverse approaches using group_by instead of split: one with do, one with nest and map. Note that do is being soft-deprecated, but here it offers more compact and readable syntax:
tbl2 %>%
group_by(n=seq(n())) %>%
do(modify_if(.,is.na,~NULL) %>% # remove na columns
inner_join(tbl1,.)) %>%
ungroup %>%
select(-n)
tbl2 %>%
rowid_to_column("n") %>%
group_by(n) %>%
nest(.key="dfi") %>%
mutate_at("dfi",~map(.,
~ modify_if(.,is.na,~NULL) %>% # remove na columns
inner_join(tbl1,.))) %>%
unnest %>%
select(-n)
