Rename columns using a "relational" dataframe in R

I'm dealing with a large dataframe (over 100 columns) and I need to rename the columns. Let's say the dataframe of interest looks like this:
C D E F G H
1 10 200 50 40 60 10
2 30 400 20 30 30 10
3 20 40 30 30 50 10
I also have a "relational" dataframe with rows matching original column names to the desired new names that looks like this:
Code Name
1 C Cat
2 D Dog
3 E Emu
4 F Fish
5 G Goat
6 H Hog
What I'm looking for is a base or package function that allows me to use this match dataframe to rename the original columns, yielding a final dataframe that looks like this:
Cat Dog Emu Fish Goat Hog
1 10 200 50 40 60 10
2 30 400 20 30 30 10
3 20 40 30 30 50 10
Keep in mind that the real data has 100+ columns, so a solution with as little hand-coding as possible is preferred. Thanks!

It can be done with rename_at (assuming that the columns Code and Name in the relational dataset are of character class, not factor):
library(dplyr)
df1 %>%
rename_at(vars(relational$Code), ~ relational$Name)
Another option is setnames from data.table, which renames the columns by reference:
library(data.table)
setDT(df1)
setnames(df1, relational$Code, relational$Name)
Or in base R
names(df1) <- setNames(relational$Name, relational$Code)[names(df1)]
data
df1 <- structure(list(C = c(10L, 30L, 20L), D = c(200L, 400L, 40L),
E = c(50L, 20L, 30L), F = c(40L, 30L, 30L), G = c(60L, 30L,
50L), H = c(10L, 10L, 10L)), class = "data.frame", row.names = c("1",
"2", "3"))
relational <- structure(list(Code = c("C", "D", "E", "F", "G", "H"), Name = c("Cat",
"Dog", "Emu", "Fish", "Goat", "Hog")), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6"))

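To see why the base R one-liner does not depend on the row order of the lookup table, here is a small base R sketch. The data is trimmed to three columns and the lookup rows are shuffled on purpose for illustration. The named vector built by setNames maps old codes to new names, and indexing it with names(df1) returns the new names in the data frame's own column order.

```r
# Trimmed version of the question's data (illustration only).
df1 <- data.frame(C = c(10L, 30L, 20L), D = c(200L, 400L, 40L), E = c(50L, 20L, 30L))

# Lookup rows deliberately in a different order from the columns.
relational <- data.frame(Code = c("E", "C", "D"),
                         Name = c("Emu", "Cat", "Dog"),
                         stringsAsFactors = FALSE)

# Named vector: names are the old codes, values are the new names.
lookup <- setNames(relational$Name, relational$Code)

# Indexing by the current column names returns the new names in column order.
names(df1) <- unname(lookup[names(df1)])
print(names(df1))
```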
We can use match to find the position of each column name in Code and take the corresponding Name.
names(df1) <- relational$Name[match(names(df1), relational$Code)]
df1
# Cat Dog Emu Fish Goat Hog
#1 10 200 50 40 60 10
#2 30 400 20 30 30 10
#3 20 40 30 30 50 10
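One caveat with the match and setNames lookups: a column whose name is missing from Code gets NA as its new name. Below is a minimal base R sketch that keeps the original name in that case. The extra column X and the unused Fish row are hypothetical additions for illustration.

```r
# Hypothetical data: column X has no entry in the lookup table.
df1 <- data.frame(C = c(10L, 30L, 20L), D = c(200L, 400L, 40L),
                  E = c(50L, 20L, 30L), X = c(1L, 2L, 3L))
relational <- data.frame(Code = c("C", "D", "E", "F"),
                         Name = c("Cat", "Dog", "Emu", "Fish"),
                         stringsAsFactors = FALSE)

# Position of each current column name in the lookup table (NA if absent).
idx <- match(names(df1), relational$Code)

# Use the mapped name where one exists, otherwise keep the original name.
names(df1) <- ifelse(is.na(idx), names(df1), relational$Name[idx])
print(names(df1))
```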

Related

Finding mean for specific rows in a data frame with certain values

I have a data frame like the one below:
a1 a2 a3
1 A x 10
2 AA x 20
3 P w 13
4 R y 45
5 BC m 46
6 AC y 36
7 AD y 19
8 S y 19
9 RK m 30
I want to create a new dataframe from this where, for each distinct value of column a2, the rows with different a1 values are averaged using column a3. For example, for a2 = x, I want the average (10 + 20) / 2 = 15 (rows 1 and 2, using the values of column a3). My original dataset is much larger than this. Can anyone tell me how to do this in R?
Perhaps this helps
library(dplyr)
df1 %>%
group_by(a2) %>%
mutate(Mean = mean(a3[!duplicated(a1)], na.rm = TRUE)) %>%
ungroup
Here is a similar solution using an ifelse statement:
library(dplyr)
df1 %>%
group_by(a2) %>%
mutate(Mean = ifelse(!duplicated(a1), mean(a3, na.rm= TRUE), a3))
a1 a2 a3 Mean
<chr> <chr> <int> <dbl>
1 A x 10 15
2 AA x 20 15
3 P w 13 13
4 R y 45 29.8
5 BC m 46 38
6 AC y 36 29.8
7 AD y 19 29.8
8 S y 19 29.8
9 RK m 30 38
df1 <- structure(list(a1 = c("A", "AA", "P", "R", "BC", "AC", "AD",
"S", "RK"), a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"
), a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"))

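For comparison, here is a base R sketch of the first answer's logic (mean of a3 over the rows with distinct a1 values within each a2 group). It swaps in ave() for group_by/mutate; it is an alternative technique, not the answerers' code.

```r
# Sample data from the question.
df1 <- data.frame(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L),
  stringsAsFactors = FALSE
)

# ave() passes the row indices of each a2 group to FUN and returns one value
# per row, so the group mean is repeated on every row of the group.
df1$Mean <- ave(seq_len(nrow(df1)), df1$a2, FUN = function(i) {
  keep <- i[!duplicated(df1$a1[i])]   # first row of each distinct a1 value
  rep(mean(df1$a3[keep], na.rm = TRUE), length(i))
})
print(df1)
```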
Fill Blank Rows in Dataframe with Sums for each section of data R

I have data that I separated within a dataframe by item description and type. The separations are blank rows, but I would like to fill the blank rows with the sums of the numeric values for each description and, if possible, add another blank row below the sums. Preferably, sections that contain only one row (see desc "a") would not get a sum, but it is not a big deal if they do.
This is an example of what I have now:
desc type xvalue yvalue
1 a z 16 1
2
3 b y 17 2
4 b y 18 3
5
6 c x 19 4
7 c x 20 5
8 c x 21 6
9
10 d x 22 7
11 d x 23 8
12
13 d y 24 9
14 d y 25 10
What I am looking for is output that looks similar to this.
desc type xvalue yvalue
1 a z 16 1
2
3 b y 17 2
4 b y 18 3
5 35 5
6
7 c x 19 4
8 c x 20 5
9 c x 21 6
10 60 15
11
12 d x 22 7
13 d x 23 8
14 45 15
15
16 d y 24 9
17 d y 25 10
18 49 19
I found an answer on how to do this in a column but not in a row: "Adding column of summed data by group with empty rows with R".
I used acylam's dplyr answer to the question "Add blank rows in between existing rows" to create the empty rows. I changed the code slightly to fit my data better, so my code is:
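library(dplyr)
df %>%
split(list(.$desc, .$type), drop = TRUE) %>%  # one data frame per desc/type combination
Map(rbind, ., "") %>%                         # append an empty row to each piece
do.call(rbind, .)                             # stack the pieces back together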
I am hoping I can just add options to the split/Map/do.call(rbind, ...) code above.
Depending on how your data is organized, we could do it this way.
This assumes the empty rows are NAs (if they are blank strings instead, convert them to NA first).
We use group_split() after grouping, which returns a list of data frames, then iterate over that list with purrr::map_df, applying janitor's adorn_totals to each piece:
library(dplyr)
library(janitor)
df %>%
na.omit() %>% # maybe you don't need this line
group_by(desc, type) %>%
group_split() %>%
purrr::map_df(., janitor::adorn_totals)
desc type xvalue yvalue
a z 16 1
Total - 16 1
b y 17 2
b y 18 3
Total - 35 5
c x 19 4
c x 20 5
c x 21 6
Total - 60 15
d x 22 7
d x 23 8
Total - 45 15
d y 24 9
d y 25 10
Total - 49 19
data:
df <- structure(list(desc = c("a", NA, "b", "b", NA, "c", "c", "c",
NA, "d", "d", NA, "d", "d"), type = c("z", NA, "y", "y", NA,
"x", "x", "x", NA, "x", "x", NA, "y", "y"), xvalue = c(16L, NA,
17L, 18L, NA, 19L, 20L, 21L, NA, 22L, 23L, NA, 24L, 25L), yvalue = c(1L,
NA, 2L, 3L, NA, 4L, 5L, 6L, NA, 7L, 8L, NA, 9L, 10L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14"))
Here is a complete version of @TarJae's answer that also adds an empty row after each total and removes the labels janitor adds ("Total" and "-"):
library(dplyr)
library(janitor)
df <- df %>%
na.omit() %>% # maybe you don't need this line
group_by(desc, type) %>%
group_split() %>%
purrr::map_df(., \(x) x %>% janitor::adorn_totals() %>% rbind(NA)) %>%  # \(x) lambda syntax needs R >= 4.1
mutate(
desc = ifelse(desc == "Total", NA, desc),
type = ifelse(type == "-", NA, type)
)
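A base R sketch of the same idea, assuming the blank rows are NA as in the data above: drop the separators, split by desc/type in order of appearance, and append a total row (skipped for single-row groups, as the question prefers) plus an empty NA row to each piece. add_total is a hypothetical helper name, and this is an alternative to, not a copy of, the janitor approach.

```r
# Data from the question; the NA rows are the blank separators.
df <- data.frame(
  desc   = c("a", NA, "b", "b", NA, "c", "c", "c", NA, "d", "d", NA, "d", "d"),
  type   = c("z", NA, "y", "y", NA, "x", "x", "x", NA, "x", "x", NA, "y", "y"),
  xvalue = c(16L, NA, 17L, 18L, NA, 19L, 20L, 21L, NA, 22L, 23L, NA, 24L, 25L),
  yvalue = c(1L, NA, 2L, 3L, NA, 4L, 5L, 6L, NA, 7L, 8L, NA, 9L, 10L),
  stringsAsFactors = FALSE
)

# Drop the existing separator rows, then split by desc/type,
# keeping the groups in their order of appearance.
clean  <- df[!is.na(df$desc), ]
key    <- paste(clean$desc, clean$type)
groups <- split(clean, factor(key, levels = unique(key)))

add_total <- function(g) {
  # Total row only for groups with more than one row (NULL otherwise).
  total <- if (nrow(g) > 1) {
    data.frame(desc = NA, type = NA,
               xvalue = sum(g$xvalue), yvalue = sum(g$yvalue))
  }
  blank <- data.frame(desc = NA, type = NA, xvalue = NA, yvalue = NA)
  rbind(g, total, blank)  # rbind() skips the NULL total
}

out <- do.call(rbind, lapply(groups, add_total))
rownames(out) <- NULL
print(out)
```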

Select rows based on rows of another data.frame

I have the following data.frames:
dt1
Id Mother Weight
1 elly 10
2 bina 20
3 sirce 30
4 tina 30
5 lina 40
and
dt2
Id Mother Weight sex
1 elly 10 M
2 bina 20 F
3 sirce 30 F
I would like to select the rows of dt1 whose Id does not appear in dt2, like this:
new.dt
Id Mother Weight sex
4 tina 30 NA
5 lina 40 NA
Here is one option with anti_join
library(dplyr)
anti_join(dt1 %>%
mutate(sex = NA), dt2, by = 'Id')
# Id Mother Weight sex
#1 4 tina 30 NA
#2 5 lina 40 NA
data
dt1 <- structure(list(Id = 1:5, Mother = c("elly", "bina", "sirce",
"tina", "lina"), Weight = c(10L, 20L, 30L, 30L, 40L)),
class = "data.frame", row.names = c(NA,
-5L))
dt2 <- structure(list(Id = 1:3, Mother = c("elly", "bina", "sirce"),
Weight = c(10L, 20L, 30L), sex = c("M", "F", "F")),
class = "data.frame", row.names = c(NA,
-3L))
Or in base R:
transform(dt1[!dt1$Id %in% dt2$Id,], sex = NA)
# Id Mother Weight sex
#4 4 tina 30 NA
#5 5 lina 40 NA
Or with a full merge, keeping the rows where sex is missing:
d = merge(dt1, dt2, all = TRUE)
d[is.na(d$sex),]
# Id Mother Weight sex
#4 4 tina 30 <NA>
#5 5 lina 40 <NA>

Summarize the lowest values in a Dataframe?

My data frame looks like this:
View(df)
Product Value
a 2
b 4
c 3
d 10
e 15
f 5
g 6
h 4
i 50
j 20
k 35
l 25
m 4
n 6
o 30
p 4
q 40
r 5
s 3
t 40
I want to find the 9 most expensive products and summarize the rest. It should look like this:
Product Value
d 10
e 15
i 50
j 20
k 35
l 25
o 30
q 40
t 40
rest 46
Rest is the sum of the other 11 products.
I tried it with summarise, but it didn't work:
new <- df %>%
group_by(Product)%>%
summarise((Value > 10) = sum(Value)) %>%
ungroup()
We can use dplyr::row_number to rank the observations after using arrange to sort the data by Value in descending order. Then we recode the Product column so that any product outside the top 9 becomes Rest. Finally, we group by the updated Product and take the sum using summarise.
library(dplyr)
dat %>%
arrange(desc(Value)) %>%
mutate(RowNum = row_number(),
Product = ifelse(RowNum <= 9, Product, 'Rest')) %>%
group_by(Product) %>%
summarise(Value = sum(Value))
# A tibble: 10 × 2
Product Value
<chr> <int>
1 d 10
2 e 15
3 i 50
4 j 20
5 k 35
6 l 25
7 o 30
8 q 40
9 Rest 46
10 t 40
data
dat <- structure(list(Product = c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"
), Value = c(2L, 4L, 3L, 10L, 15L, 5L, 6L, 4L, 50L, 20L, 35L,
25L, 4L, 6L, 30L, 4L, 40L, 5L, 3L, 40L)), .Names = c("Product",
"Value"), class = "data.frame", row.names = c(NA, -20L))
Another way with dplyr is to build the result with do. The code is a bit harder to read because you need to refer to columns with .$, but you avoid ifelse/if_else. After arranging by Value in descending order, create two vectors: one with the first nine product names plus "Rest", and one with the first nine values plus the sum of the remaining values. do then turns them directly into a data frame.
df %>%
arrange(desc(Value)) %>%
do(data.frame(Product = c(as.character(.$Product[1:9]), "Rest"),
Value = c(.$Value[1:9], sum(.$Value[10:length(.$Value)]))))
# Product Value
#1 i 50
#2 q 40
#3 t 40
#4 k 35
#5 o 30
#6 l 25
#7 j 20
#8 e 15
#9 d 10
#10 Rest 46
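The same top-9-plus-Rest summary can be sketched in base R without dplyr: sort by Value, keep the first n rows, and append one Rest row holding the sum of the remainder. top_n_plus_rest and its n argument are hypothetical names. Ties keep their original row order because order() leaves unresolved ties in their original order.

```r
# Data from the question.
dat <- data.frame(
  Product = letters[1:20],
  Value = c(2L, 4L, 3L, 10L, 15L, 5L, 6L, 4L, 50L, 20L, 35L,
            25L, 4L, 6L, 30L, 4L, 40L, 5L, 3L, 40L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: n largest rows, then one "Rest" row for everything else.
top_n_plus_rest <- function(d, n = 9) {
  ord  <- d[order(-d$Value), ]              # largest values first
  top  <- head(ord, n)                      # keep the n largest rows
  rest <- data.frame(Product = "Rest",
                     Value = sum(ord$Value[-seq_len(n)]),
                     stringsAsFactors = FALSE)
  out <- rbind(top, rest)
  rownames(out) <- NULL
  out
}

res <- top_n_plus_rest(dat)
print(res)
```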
Here is one option using data.table
library(data.table)
setDT(df)[, i1 := .I][order(-Value)
][-(seq_len(9)), Product := 'rest'
][, .(Value = sum(Value), i1=i1[1L]), Product
][order(Product=='rest', i1)][, i1 := NULL][]
# Product Value
#1: d 10
#2: e 15
#3: i 50
#4: j 20
#5: k 35
#6: l 25
#7: o 30
#8: q 40
#9: t 40
#10: rest 46

Transform a dataframe to use first column values as column names

I have a dataframe with 2 columns:
.id vals
1 A 10
2 B 20
3 C 30
4 A 100
5 B 200
6 C 300
dput(tst_df)
structure(list(.id = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("A",
"B", "C"), class = "factor"), vals = c(10, 20, 30, 100, 200,
300)), .Names = c(".id", "vals"), row.names = c(NA, -6L), class = "data.frame")
Now I want the .id column to become my column names, and the vals to become 2 rows.
Like this:
A B C
10 20 30
100 200 300
Basically, .id is my grouping variable, and I want all values belonging to one group in a single column. I expected something simple like melt and transform, but after many tries I still haven't succeeded. Is anyone familiar with a function that will accomplish this?
You can do this in base R with unstack:
unstack(df, form=vals~.id)
A B C
1 10 20 30
2 100 200 300
The first argument is the data.frame and the second is a formula (values ~ groups) that determines the unstacked structure.
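As a rough illustration of what unstack does with vals ~ .id, this base R sketch builds the same wide layout by hand with split(), which returns one vector of vals per .id level. It assumes every group has the same number of values; when the group sizes differ, unstack returns a list instead of a data frame.

```r
# Data from the question (tst_df).
tst_df <- data.frame(.id = factor(c("A", "B", "C", "A", "B", "C")),
                     vals = c(10, 20, 30, 100, 200, 300))

# unstack(): values on the left of the formula, grouping on the right.
wide <- unstack(tst_df, form = vals ~ .id)

# By hand: split vals by .id, then turn each group into a column.
manual <- as.data.frame(split(tst_df$vals, tst_df$.id))

print(wide)
print(manual)
```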
You can also use tapply:
do.call(cbind, tapply(df$vals, df$.id, I))
# A B C
#[1,] 10 20 30
#[2,] 100 200 300
or wrap it in a data frame:
as.data.frame(do.call(cbind, tapply(df$vals, df$.id, I)))
