Call columns of one data frame by name from another data frame, row by row - r

This is my first data frame:
set.seed(11)
df1 <- as.data.frame(matrix(rbinom(9*9, 1, 0.5), ncol=9, nrow =9))
colnames(df1) <- paste(rep(c("a","b","c"), each=3), rep(c(1,2,3), 3), sep = "")
This is my second data frame:
factor.1 <- paste(rep(c("a","b"), each=3), rep(c(1,2,3), 2), sep = "")
factor.2 <- rep(paste(rep("c", 3), c(1,2,3), sep = ""), 2)
df2 <- data.frame(factor.1, factor.2, stringsAsFactors = FALSE)
For each row of df2, I want to sum the df1 column named in factor.1 and store the result in a new column of df2. I use dplyr:
library(dplyr)
fun1 <- function(x){sum(df1[, x])}
df2 %>% mutate(value = fun1(factor.1))
But this is what I get:
factor.1 factor.2 value
1 a1 c1 22
2 a2 c2 22
3 a3 c3 22
4 b1 c1 22
5 b2 c2 22
6 b3 c3 22
But what I want is this:
factor.1 factor.2 value
1 a1 c1 4
2 a2 c2 4
3 a3 c3 4
4 b1 c1 1
5 b2 c2 4
6 b3 c3 5

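Is this what you are looking for?
df2 %>% mutate(value = sapply(factor.1, fun1))
Inside mutate(), fun1 receives the whole factor.1 column at once, so df1[, x] selects all six columns and sum() returns one grand total that is recycled to every row. sapply() instead calls fun1 once per element, giving one sum per row.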
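As a vectorized alternative to calling fun1 once per row, a minimal base R sketch can compute every column total once with colSums() and then look the totals up by name. It assumes, as in the question, that every entry of factor.1 is a column name of df1; the name col_totals is illustrative only.

```r
# Rebuild the question's data, seeding before rbinom() so df1 is reproducible
set.seed(11)
df1 <- as.data.frame(matrix(rbinom(9 * 9, 1, 0.5), ncol = 9, nrow = 9))
colnames(df1) <- paste0(rep(c("a", "b", "c"), each = 3), rep(1:3, 3))
df2 <- data.frame(factor.1 = paste0(rep(c("a", "b"), each = 3), rep(1:3, 2)),
                  factor.2 = rep(paste0("c", 1:3), 2),
                  stringsAsFactors = FALSE)

# One total per df1 column, named after the column (a1, a2, ..., c3)
col_totals <- colSums(df1)

# Look up each row's total by the column name stored in factor.1
df2$value <- unname(col_totals[df2$factor.1])
df2
```

Because colSums() runs once over the whole of df1, this avoids a separate sum() call for every row of df2; the values should agree with the sapply() version.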

Related

Create new columns by looping over a vector inside mutate (dplyr)

I have the following dummy data frame called df:
A1 A2 A3 B1 B2 B3 C1 C2 C3
1 1 1 2 2 2 3 3 3
and I would like to sum columns that contain the same letter into a new column (naming it using the corresponding letter).
I would expect this result:
A1 A2 A3 B1 B2 B3 C1 C2 C3 A B C
1 1 1 2 2 2 3 3 3 3 6 9
I know I can achieve this result using mutate from dplyr:
mutate(df,
A = A1 + A2 + A3,
B = B1 + B2 + B3,
C = C1 + C2 + C3)
Is there any way to do it using a vector like letters <- c("A", "B", "C") and looping over that vector inside the mutate function? Something like:
mutate(df,
letters = paste0(letters,"1") + paste0(letters,"2") + paste0(letters,"3") )
One dplyr and purrr solution could be:
library(dplyr)
library(purrr)
bind_cols(df, map_dfc(.x = LETTERS[1:3],
~ df %>%
transmute(!!.x := rowSums(select(., starts_with(.x))))))
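For comparison, a base R sketch of the same idea loops over the prefixes with lapply() and assigns all the row sums as new columns in one step. It assumes each prefix identifies its columns unambiguously through startsWith(); a prefix such as "A" would also catch a column like "AB1" if one existed. The vector name groups is illustrative and avoids masking R's built-in letters.

```r
df <- data.frame(A1 = 1, A2 = 1, A3 = 1,
                 B1 = 2, B2 = 2, B3 = 2,
                 C1 = 3, C2 = 3, C3 = 3)
groups <- c("A", "B", "C")

# For each prefix, sum (row by row) the columns whose names start with it.
# lapply() runs on the original df before the assignment happens, and
# assigning the resulting list to df[groups] creates one new column per prefix.
df[groups] <- lapply(groups, function(g) rowSums(df[startsWith(names(df), g)]))
df
```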
A1 A2 A3 B1 B2 B3 C1 C2 C3 A B C
1 1 1 1 2 2 2 3 3 3 3 6 9

In R is there a way to recode the columns from one data frame with values from another data frame?

I am still relatively new to working in R and I am not sure how to approach this problem. Any help or advice is greatly appreciated!!!
The problem I have is that I am working with two data frames and I need to recode the first data frame with values from the second. The first data frame (df1) contains the data from the respondents to a survey and the other data frame (df2) is the data dictionary for df1.
The data looks like this:
df1 <- data.frame(a = c(1,2,3),
b = c(4,5,6),
c = c(7,8,9))
df2 <- data.frame(columnIndicator = c("a","a","a","b","b","b","c","c","c" ),
df1_value = c(1,2,3,4,5,6,7,8,9),
new_value = c("a1","a2","a3","b1","b2","b3","c1","c2","c3"))
So far I can manually recode df1 to get the expected output by doing this:
df1 <- within(df1,{
a[a==1] <- "a1"
a[a==2] <- "a2"
a[a==3] <- "a3"
b[b==4] <- "b4"
b[b==5] <- "b5"
b[b==6] <- "b6"
c[c==7] <- "c7"
c[c==8] <- "c8"
c[c==9] <- "c9"
})
However my real dataset has about 42 columns that need to be recoded and that method is a little time intensive. Is there another way in R for me to recode the values in df1 with the values in df2?
Thanks!
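You just need to reshape the data a bit: melt df1 to long format, update-join it with df2 on the column name and value, then cast it back to wide format.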
library(data.table)
df1 <- data.frame(a = c(1,2,3),
b = c(4,5,6),
c = c(7,8,9))
df2 <- data.frame(columnIndicator = c("a","a","a","b","b","b","c","c","c" ),
df1_value = c(1,2,3,4,5,6,7,8,9),
new_value = c("a1","a2","a3","b4","b5","b6","c7","c8","c9"),stringsAsFactors = FALSE)
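# convert both data frames to data.tables in place
setDT(df1)
setDT(df2)
# add a row ID so the wide shape can be rebuilt after reshaping
df1[,ID:=.I]
# long format: one row per (ID, column name, value)
ldf1 <- melt(df1,measure.vars = c("a","b","c"),variable.name = "columnIndicator",value.name = "df1_value")
# update join: look up new_value in df2 by column name and original value
ldf1[df2,"new_value":=i.new_value,on=.(columnIndicator,df1_value)]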
ldf1
#> ID columnIndicator df1_value new_value
#> 1: 1 a 1 a1
#> 2: 2 a 2 a2
#> 3: 3 a 3 a3
#> 4: 1 b 4 b4
#> 5: 2 b 5 b5
#> 6: 3 b 6 b6
#> 7: 1 c 7 c7
#> 8: 2 c 8 c8
#> 9: 3 c 9 c9
dcast(ldf1,ID~columnIndicator,value.var = "new_value")
#> ID a b c
#> 1: 1 a1 b4 c7
#> 2: 2 a2 b5 c8
#> 3: 3 a3 b6 c9
Created on 2020-04-18 by the reprex package (v0.3.0)
In base R, we can unlist df1, match the values against df1_value and take the corresponding new_value.
df1[] <- df2$new_value[match(unlist(df1), df2$df1_value)]
df1
# a b c
#1 a1 b1 c1
#2 a2 b2 c2
#3 a3 b3 c3
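The match() answer above looks values up in df1_value without checking columnIndicator, so it only works because every value in this example is unique across columns. If the same code (say 1) could appear in several columns with different meanings, a per-column lookup is safer. A minimal base R sketch using the question's df1 and df2 (the name recoded is illustrative):

```r
df1 <- data.frame(a = c(1, 2, 3),
                  b = c(4, 5, 6),
                  c = c(7, 8, 9))
df2 <- data.frame(columnIndicator = rep(c("a", "b", "c"), each = 3),
                  df1_value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                  new_value = c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"),
                  stringsAsFactors = FALSE)

recoded <- df1
for (col in names(df1)) {
  # keep only the dictionary rows that describe this column
  dict <- df2[df2$columnIndicator == col, ]
  # translate each value of the column through its own dictionary subset
  recoded[[col]] <- dict$new_value[match(df1[[col]], dict$df1_value)]
}
recoded
```

Values that have no entry in the dictionary for their column come out as NA.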
Is this what you are looking for?
library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
df3 <- df1 %>% gather(key = "key", value = "value")
df3 %>% inner_join(df2, by = c("key" = "columnIndicator", "value" = "df1_value"))
Output
key value new_value
1 a 1 a1
2 a 2 a2
3 a 3 a3
4 b 4 b1
5 b 5 b2
6 b 6 b3
7 c 7 c1
8 c 8 c2
9 c 9 c3
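The same long-format join can be sketched in base R without tidyr, assuming the question's df1 and df2: stack() turns df1 into value/column pairs (columns values and ind), and merge() then plays the role of inner_join(). Converting ind from a factor to character first keeps the key type aligned with columnIndicator.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- data.frame(columnIndicator = rep(c("a", "b", "c"), each = 3),
                  df1_value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                  new_value = c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"),
                  stringsAsFactors = FALSE)

# One row per (value, source column); ind holds the column names
long <- stack(df1)
long$ind <- as.character(long$ind)

# Inner join on both the column name and the original value
joined <- merge(long, df2,
                by.x = c("ind", "values"),
                by.y = c("columnIndicator", "df1_value"))
joined
```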

How to subtract a column from the other columns in a data frame

I have a data frame that consists of 1000 rows and 156 columns. I'm trying to subtract the first column from the next 38 columns, then subtract column 39 from the next 38, and so on, but I can't find a way to do it. I'm only using ncdf4 and nothing else. Something like this:
C1 C2 C3 C4 C5 C6 C7 C8
1 2 3 4 5 6 4 5
3 4 6 5 4 3 2 7
And I'd like it to be
C1 C2 C3 C4 C5 C6 C7 C8
0 1 2 3 4 5 3 4
0 1 3 2 1 0 -1 4
The logic would be
First 38 columns - First column
Columns 39:77 - Column 39
and so on.
Solved it by simply doing
{
z[,1:38] <- z[,1:38]-z[,1]
z[,39:77] <-z[,39:77]-z[,39]
z[,78:118] <-z[,78:118]-z[,78]
z[,119:156] <-z[,119:156]-z[,119]
}
Where z is the data frame. It might not be the nicest way, but it did the trick.
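The manual solution above generalizes to any list of block start columns. A minimal sketch, assuming each block runs from its start column up to the column before the next start (the last block runs to the final column); with starts = c(1, 39, 78, 119) this reproduces the ranges 1:38, 39:77, 78:118 and 119:156. The function name subtract_block_start is hypothetical.

```r
subtract_block_start <- function(df, starts) {
  # each block ends just before the next block starts; the last ends at ncol(df)
  ends <- c(starts[-1] - 1, ncol(df))
  for (k in seq_along(starts)) {
    cols <- starts[k]:ends[k]
    # the right-hand side is evaluated before assignment, so the block's
    # first column is subtracted from itself too and becomes 0
    df[cols] <- df[cols] - df[[starts[k]]]
  }
  df
}

# The 8-column example from the question
z <- data.frame(C1 = c(1, 3), C2 = c(2, 4), C3 = c(3, 6), C4 = c(4, 5),
                C5 = c(5, 4), C6 = c(6, 3), C7 = c(4, 2), C8 = c(5, 7))
subtract_block_start(z, starts = 1)
subtract_block_start(z, starts = c(1, 5))
```

For the real 156-column data the call would be subtract_block_start(z, c(1, 39, 78, 119)).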
You can also do the following without any loop:
# sample data frame: 2 rows, 158 columns
df <- data.frame(matrix(data = seq(1, 316), ncol = 158))
# split the data frame into a list of data frames holding columns
# 1 to 38, 39 to 76 and so on (the last block keeps the leftover columns)
blocks <- gl(ceiling(ncol(df) / 38), k = 38, length = ncol(df))
df <- split.default(df, blocks)
# subtract the first column of each block from every column in that block
df <- do.call(cbind, lapply(df, function(f) f - f[, 1]))
colnames(df) <- paste0('C', seq_len(ncol(df)))
print(head(df[, 1:5]))
C1 C2 C3 C4 C5
1 0 2 4 6 8
2 0 2 4 6 8
Here is a user-defined function; you can add more else if branches for the remaining column blocks:
mydiff<-function(df){
mydiff<-df
for(i in 1:ncol(df)){
if(i<=38){
mydiff[,i]<-df[,i]-df[,1]
}
else if(i%in%c(39:77)){
mydiff[,i]<-df[,i]-df[,39]
}
}
mydiff
}
mydiff(df1)
Output:
C1 C2 C3 C4 C5 C6 C7 C8
0 1 2 3 4 5 3 4
0 1 3 2 1 0 -1 4
Benchmark:
system.time(result<-as_tibble(iris2) %>%
select_if(is.numeric) %>%
mydiff())
Result:
user system elapsed
0.02 0.00 0.01
You should consider using the tidyverse to solve this; loading a package into R adds little overhead to your environment and can make your life much easier.
library(tidyverse)
df %>%
mutate_at(.vars = vars(num_range(prefix = 'C', 1:38)), .funs = function(x) x - .$C1) %>%
mutate_at(.vars = vars(num_range(prefix = 'C', 39:77)), .funs = function(x) x - .$C39)
C1 C2 C3 C4 C38 C39 C40 C41 C42 C77
1 0 1 2 3 4 0 1 2 3 4
2 0 0 3 2 4 0 0 3 2 4
Data
df <-
data.frame(
C1 = c(1, 3),
C2 = c(2, 3),
C3 = c(3, 6),
C4 = c(4, 5),
C38 = c(5, 7),
C39 = c(1, 3),
C40 = c(2, 3),
C41 = c(3, 6),
C42 = c(4, 5),
C77 = c(5, 7)
)

Access the first row of each group in a group_by dataset

I have a data frame df1 with columns a, b, c. I want to assign c = 0 to the first row of each group returned by group_by(a, b). I tried something like:
t <- df1 %>% group_by(a,b) %>% filter(row_number(a)==1) %>% mutate(c= 0)
But it reduced the number of rows. The expected output is:
a b c
a1 b1 0
a1 b1 NA
a2 b2 0
a2 b2 NA
You can use seq_along to number the elements in each group from 1 to the total number of elements within that group (2, in this case). Then use ifelse to set the first element of 'c' in each group to 0 and leave the other elements as they are.
library(dplyr)
df %>%
group_by(a, b) %>%
mutate(c = ifelse(seq_along(c) == 1, 0, c))
# A tibble: 4 x 3
# Groups: a, b [2]
# a b c
# <fct> <fct> <dbl>
#1 a1 b1 0.
#2 a1 b1 NA
#3 a2 b2 0.
#4 a2 b2 NA
data
df <- data.frame(a = rep(c("a1", "a2"), each = 2),
b = rep(c("b1", "b2"), each = 2),
c = NA)
df
# a b c
#1 a1 b1 NA
#2 a1 b1 NA
#3 a2 b2 NA
#4 a2 b2 NA
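Without dplyr, a base R sketch can flag the first row of each (a, b) combination with duplicated(), which marks every repeat after the first occurrence in row order. It assumes the same example data as above; first_in_group is an illustrative name.

```r
df <- data.frame(a = rep(c("a1", "a2"), each = 2),
                 b = rep(c("b1", "b2"), each = 2),
                 c = NA)

# TRUE for the first row of each (a, b) combination, in row order
first_in_group <- !duplicated(df[c("a", "b")])
df$c[first_in_group] <- 0
df
```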

Compare rows of a dataset with another dataset in R

I have dataset1 with 1400 rows and 25 columns, and dataset2 with 400 rows and 5 columns. Both datasets have a column called ID. As a small example, I can illustrate them like below:
dataset1:
ID c1 c2 c3 c4
12 m n 5 1/2/2015
5 c x 4 2/3/2015
45 g t 47 4/23/2015
45 j t 3 1/1/2016
61 t y 12 7/3/2015
3 r n 18 3/3/2015
dataset2:
ID a1 a2
45 1 1/1/2015
3 5 2/2/2016
12 12 4/29/2016
(as you can see, the IDs in dataset2 are a subset of the IDs in dataset1)
What I want is: for each row of dataset1, if the value in column ID is equal to a value in the column ID of dataset2, then copy the corresponding value of column a2 from that row of dataset2 into a new column of dataset1, as below:
ID c1 c2 c3 c4 c5
12 m n 5 1/2/2015 4/29/2016
5 c x 4 2/3/2015 NA
45 g t 47 4/23/2015 1/1/2015
45 j t 3 1/1/2016 1/1/2015
61 t y 12 7/3/2015 NA
3 r n 18 3/3/2015 2/2/2016
I appreciate your help.
As #42 mentioned, you can use match.
This is an example with match:
# match the ID of df1 with that of df2
# then returns the index of df2 that
# matches df1
# then subset the a2 column using the above index
# then store in a new column in df1
df1$c5 <- df2$a2[match(df1$ID, df2$ID)]
The output of the above code is below:
> df1
ID c1 c2 c3 c4 c5
1 12 m n 5 01/02/2015 4/29/2016
2 5 c x 4 01/02/2015 <NA>
3 45 g t 47 01/02/2015 01/01/2015
4 45 j t 3 01/02/2015 01/01/2015
5 61 t y 12 01/02/2015 <NA>
6 3 r n 18 01/02/2015 02/02/2016
din's answer is perfect. Another way to think about it is to merge the two data frames.
Data Preparation
ex_data1 <- data.frame(ID = c(12, 5, 45, 45, 61, 3),
c1 = c("m", "c", "g", "j", "t", "r"),
c2 = c("n", "x", "t", "t", "y", "n"),
c3 = c(5, 4, 47, 3, 12, 18),
c4 = c("1/2/2015", "2/3/2015", "4/23/2015",
"1/1/2016", "7/3/2015", "3/3/2015"),
stringsAsFactors = FALSE)
ex_data2 <- data.frame(ID = c(45, 3, 12),
a1 = c(1, 5, 12),
a2 = c("1/1/2015", "2/2/2016", "4/29/2016"), stringsAsFactors = FALSE)
Solution 1: Merge the data using base R
ex_data3 <- ex_data2[, c("ID", "a2")]
names(ex_data3) <- c("ID", "c5")
m_data <- merge(ex_data1, ex_data3, by = "ID", all.x = TRUE)
Solution 2: Merge the data using dplyr
library(dplyr)
m_data <- ex_data1 %>%
left_join(ex_data2, by = "ID") %>%
select(-a1, c5 = a2)
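Note that merge() sorts its result by the key column, so m_data from Solution 1 does not keep dataset1's row order, whereas match() and left_join() do. A minimal base R sketch of restoring the original order with a temporary row index (the column name row_id is hypothetical), using a trimmed version of ex_data1:

```r
ex_data1 <- data.frame(ID = c(12, 5, 45, 45, 61, 3),
                       c1 = c("m", "c", "g", "j", "t", "r"),
                       stringsAsFactors = FALSE)
ex_data2 <- data.frame(ID = c(45, 3, 12),
                       a2 = c("1/1/2015", "2/2/2016", "4/29/2016"),
                       stringsAsFactors = FALSE)

# Remember the original row order before merge() re-sorts by ID
ex_data1$row_id <- seq_len(nrow(ex_data1))
m_data <- merge(ex_data1, ex_data2, by = "ID", all.x = TRUE)

# Restore dataset1's order, rename a2 to c5 and drop the helper column
m_data <- m_data[order(m_data$row_id), ]
names(m_data)[names(m_data) == "a2"] <- "c5"
m_data$row_id <- NULL
rownames(m_data) <- NULL
m_data
```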
