Counting the number of occurrences of a combination of values in R

I'm working with data on cases that go through a process consisting of different phases over a period of time. Each case has a unique id number. A process can start in any of several phases and ends in the phase "Finished" (except for processes that are still ongoing). A case can go through the process multiple times.
The data looks similar to this:
library(dplyr)
df1 <- structure(list(id = c("1", "1", "2", "2", "2", "2", "3", "3",
"3", "3", "3", "3", "3", "3", "3", "3"), time = structure(c(17453,
17458, 17453, 17462, 17727, 17735, 17453, 17484, 17568, 17665,
17665, 17709, 17727, 17727, 17757, 17819), class = "Date"), old_fase =
c(NA, "Fase 1", NA, "Fase 1", "Finished", "Fase 1", NA, "Fase 1", "Fase 2A",
"Finished", "Fase 2A", "Fase 2B", "Finished", "Fase 2B", "Fase 1",
"Fase 2A"), new_fase = c("Fase 1", "Finished", "Fase 1", "Finished",
"Fase 1", "Finished", "Fase 1", "Fase 2A", "Finished", "Fase 2A",
"Fase 2B", "Finished", "Fase 2B", "Fase 1", "Fase 2A", "Fase 2B"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -16L))
For my analysis I want to create a new id based on the occurrence of each process per id. Using group_by and mutate on "id" and "new_fase" gives the following incorrect result. This happens because of the first occurrence of "Fase 2B" in row 11.
df1 %>%
group_by(id,new_fase) %>%
mutate(occurrence=row_number())
The correct solution should look like this:
df1 %>%
mutate(occurrence = c(rep(1, 4), 2, 2, rep(1, 3), rep(2, 3), rep(3, 4)))
I tried multiple approaches and read multiple Stack Overflow posts, but I have not been able to figure it out. Any help is appreciated, preferably using a tidyverse solution.

We can use ave from base R
df2$occurrence <- with(df2, ave(seq_along(id), id, fase, FUN = seq_along))
Or with data.table
library(data.table)
setDT(df2)[, occurrence := seq_len(.N), .(id, fase)]

df3<- df1 %>%
group_by(id,fase) %>%
mutate(occurrence=row_number())
df3
# A tibble: 18 x 4
# Groups: id, fase [9]
id fase time occurrence
<dbl> <chr> <date> <int>
1 1 a 2018-01-01 1
2 1 b 2018-01-02 1
3 1 c 2018-01-03 1
4 2 a 2018-01-01 1
5 2 b 2018-01-02 1
6 2 c 2018-01-03 1
7 2 a 2018-01-04 2
8 2 b 2018-01-05 2
9 2 c 2018-01-06 2
10 2 a 2018-01-07 3
11 2 b 2018-01-08 3
12 2 c 2018-01-09 3
13 3 a 2018-01-01 1
14 3 b 2018-01-02 1
15 3 c 2018-01-03 1
16 3 a 2018-01-04 2
17 3 b 2018-01-05 2
18 3 c 2018-01-06 2
all(df2==df3)
[1] TRUE
You break down (group) the df into parts where each part has the same id and phase, and then you simply number the rows in each of these parts. Note this assumes the df is already sorted chronologically, as in your sample data. If this is not true, you'll have to sort it in advance by time.
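The sorting caveat can be illustrated with a minimal base R sketch. The data frame toy and its values are hypothetical and only mimic the id / fase / time layout of the first example; the rows are deliberately out of order, so they are sorted before numbering.

```r
# Hypothetical toy data: rows are NOT in chronological order
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  fase = c("a", "b", "a", "a", "a"),
  time = as.Date(c("2018-01-03", "2018-01-02", "2018-01-01",
                   "2018-01-02", "2018-01-01"))
)

# Sort by id and time first, so that row order reflects chronology
toy <- toy[order(toy$id, toy$time), ]

# Number the rows within each id / fase combination
toy$occurrence <- ave(seq_along(toy$id), toy$id, toy$fase, FUN = seq_along)
toy
```

Without the order() step the numbering would follow the row order of the input rather than the time column.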

I found this temporary solution (thanks to iod's solution on the first example using group_by and mutate).
library(tidyr) # provides fill()
df1 %>%
  filter(is.na(old_fase) | old_fase == "Finished") %>% # marks the beginning of a new process
  group_by(id) %>%
  mutate(occurrence = row_number()) %>%
  select(id, time, occurrence) %>%
  left_join(df1, ., by = c("id", "time")) %>%
  fill(occurrence)
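A shorter variant of the same idea, offered only as a sketch: count the process starts per id with a running sum. It assumes that the rows are sorted chronologically within each id and that a new process always begins on a row where old_fase is NA or "Finished". The data frame cases below is a cut-down stand-in for df1 that keeps only id and old_fase.

```r
library(dplyr)

# Stand-in for df1 from the question: same id and old_fase values,
# other columns omitted for brevity
cases <- tibble(
  id = rep(c("1", "2", "3"), times = c(2, 4, 10)),
  old_fase = c(NA, "Fase 1",
               NA, "Fase 1", "Finished", "Fase 1",
               NA, "Fase 1", "Fase 2A", "Finished", "Fase 2A",
               "Fase 2B", "Finished", "Fase 2B", "Fase 1", "Fase 2A")
)

res <- cases %>%
  group_by(id) %>%
  # TRUE on every row that starts a process; cumsum turns that into a counter
  mutate(occurrence = cumsum(is.na(old_fase) | old_fase == "Finished")) %>%
  ungroup()
res
```

Because the counter is computed directly on each row, no join or fill step is needed.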

Related

Group_by multiple columns and summarise unique column

I have a dataset below
family  type                   inc    name
AA      success                30000  Bill
AA      ERROR                  15000  Bess
CC      Pending                22000  Art
CC      Pending                18000  Amy
AA      Serve not respnding d  25000  Paul
ZZ      Success                50000  Pat
ZZ      Processing             50000  Pat
I want to group by multiple columns. Here is my code:
df<-df1%>%
group_by(Family , type)%>%
summarise(Transaction_count = n(), Face_value = sum(Inc))%>%
mutate(Pct = Transaction_count/sum(Transaction_count))
What I want is that wherever the same Family value repeats, it is shown only once, as in the result in the picture below.
Thank you
You can use duplicated to replace the repeated values with a blank value.
library(dplyr)
df %>%
group_by(family , type)%>%
summarise(Transaction_count = n(), Face_value = sum(inc))%>%
mutate(Pct = Transaction_count/sum(Transaction_count),
family = replace(family, duplicated(family), '')) %>%
ungroup
# family type Transaction_count Face_value Pct
# <chr> <chr> <int> <int> <dbl>
#1 "AA" ERROR 1 15000 0.333
#2 "" Serve not respnding d 1 25000 0.333
#3 "" success 1 30000 0.333
#4 "CC" Pending 2 40000 1
#5 "ZZ" Processing 1 50000 0.5
#6 "" Success 1 50000 0.5
If you only need this for display purposes, you may also look into packages such as formattable or knitr (which provides kable).
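To see the duplicated / replace step in isolation, here is a minimal sketch on a plain character vector. The vector family is a hypothetical stand-in for the summarised family column; the answer applies the same step inside mutate().

```r
# Hypothetical stand-in for the family column after summarise()
family <- c("AA", "AA", "AA", "CC", "ZZ", "ZZ")

# duplicated() flags every value that has already appeared earlier
dup <- duplicated(family)

# replace() overwrites the flagged positions with an empty string
blanked <- replace(family, dup, "")
blanked
```

Only the first row of each run keeps its label, which is the layout requested in the question.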
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(family = c("AA", "AA", "CC", "CC", "AA", "ZZ",
"ZZ"), type = c("success", "ERROR", "Pending", "Pending", "Serve not respnding d",
"Success", "Processing"), inc = c(30000L, 15000L, 22000L, 18000L,
25000L, 50000L, 50000L), name = c("Bill", "Bess", "Art", "Amy",
"Paul", "Pat", "Pat")), row.names = c(NA, -7L), class = "data.frame")

How can I reshape multiple columns of long data to wide?

structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"),
c("Malphite", "Rek'Sai", "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"),
c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))
Output:
# A tibble: 5 x 6
position player champion result kills gameid
* <chr> <chr> <chr> <chr> <chr> <chr>
1 top 369 Malphite 1 7 6079-7578
2 jng Karsa Rek'Sai 1 5 6079-7578
3 mid knight Zoe 1 7 6079-7578
4 bot JackeyLove Aphelios 1 5 6079-7578
5 sup yuyanjia Braum 1 0 6079-7578
My desired output would be:
structure(list(gameid = "6079-7578", result = "1", player_top = "369",
player_jng = "Karsa", player_mid = "knight", player_bot = "JackeyLove",
player_sup = "yuyanjia", champion_top = "Malphite", champion_jng = "Rek'Sai",
champion_mid = "Zoe", champion_bot = "Aphelios", champion_sup = "Braum",
kills_top = "7", kills_jng = "5", kills_mid = "7", kills_bot = "5",
kills_sup = "0"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
which looks like this:
gameid result player_top player_jng player_mid player_bot player_sup champion_top champion_jng champion_mid champion_bot champion_sup
1 6079-7578 1 369 Karsa knight JackeyLove yuyanjia Malphite RekSai Zoe Aphelios Braum
kills_top kills_jng kills_mid kills_bot kills_sup
1 7 5 7 5 0
I know I should use pivot_wider() and something like drop_na, but I don't know how to use pivot_wider() with multiple columns and collapse the rows at the same time. Any help would be appreciated.
You can use pivot_wider() for this, defining the "position" variable as the variable that the new column names come from in names_from and the three variables with values you want to use to fill those columns with as values_from.
By default the names of the multiple values_from variables are pasted onto the front of the new column names. This can be changed, but in this case it matches the naming structure you want.
All other variables in the original dataset will be used as the id_cols in the order that they appear.
library(tidyr)
pivot_wider(dat,
names_from = "position",
values_from = c("player", "champion", "kills"))
#> result gameid player_top player_jng player_mid player_bot player_sup
#> 1 1 6079-7578 369 Karsa knight JackeyLove yuyanjia
#> champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1 Malphite Rek'Sai Zoe Aphelios Braum 7
#> kills_jng kills_mid kills_bot kills_sup
#> 1 5 7 5 0
You can control the order of your id columns in the output by explicitly writing them out via id_cols. Here's an example, matching your desired output.
pivot_wider(dat, id_cols = c("gameid", "result"),
names_from = "position",
values_from = c("player", "champion", "kills"))
#> gameid result player_top player_jng player_mid player_bot player_sup
#> 1 6079-7578 1 369 Karsa knight JackeyLove yuyanjia
#> champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1 Malphite Rek'Sai Zoe Aphelios Braum 7
#> kills_jng kills_mid kills_bot kills_sup
#> 1 5 7 5 0
Created on 2021-06-24 by the reprex package (v2.0.0)
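For a self-contained check of the call above, the following sketch rebuilds the data with named tibble() columns. The name dat is taken from the answer; constructing it this way (rather than via structure()) is an assumption made for readability.

```r
library(tidyr)
library(tibble)

# Same values as in the question, built with named columns
dat <- tibble(
  position = c("top", "jng", "mid", "bot", "sup"),
  player   = c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"),
  champion = c("Malphite", "Rek'Sai", "Zoe", "Aphelios", "Braum"),
  result   = "1",
  kills    = c("7", "5", "7", "5", "0"),
  gameid   = "6079-7578"
)

# One row per gameid / result; one column per value variable and position
wide <- pivot_wider(dat,
                    id_cols = c(gameid, result),
                    names_from = position,
                    values_from = c(player, champion, kills))
names(wide)
```

The five input rows collapse into one because gameid and result are constant across them.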
Using data.table might help here. In dcast() each row will be identified by a unique combo of gameid and result, the columns will be spread by position, and filled with values from the variables listed in value.var.
library(data.table)
library(dplyr)
df <- structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"),
c("Malphite", "Rek'Sai", "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"),
c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))
df2 <- dcast(setDT(df), gameid + result~position, value.var = list('player','champion','kills'))

Insert a Blank Cell into Every "X"th Row in the Same Column

In an excel file, I have the following table with headers as such:
**Date** **Session** **Player** **Pre** **Post** **Distance(m)**
Jan 1 1 Player 1 3 6 1000
Jan 1 1 Player 2 3 7 1500
Jan 1 1 Player 3 4 10 4000
Jan 1 1 Player 4 1 3 600
Jan 2 2 Player 1 2 5 1000
Jan 2 2 Player 2 - - 1750
Jan 2 2 Player 3 5 5 3000
Jan 2 2 Player 4 3 6 1000
Jan 3 3 Player 1 3 5 2500
Jan 3 3 Player 2 3 8 1500
Jan 3 3 Player 3 7 7 2500
Jan 3 3 Player 4 - - -
What I am trying to accomplish is to look at the distance numbers and compare them with the Pre numbers for the following session. So for Player 1, their distance from Session 1 (1000) and their Pre # from Jan 2 (2) should be in the same row.
To do this, after sorting the players by session number, I am trying to find a way to insert an empty cell - in the distance column for each player which acts as a placeholder for what would be session 0. This essentially bumps down the distances to match up with the next day's Pre #.
So after performing that on this data set, the result would look like this:
**Player** **Pre for the following Day** **Distance**
Player 1 3 (S1) - (Session 0 - Does Not Exist) (This value is inserted)
Player 1 2 (S2) 1000(Session 1)
Player 1 3 (S3) 1000(Session 2)
Player 1 - (S4 - Not included in this example) 2500(Session 3)
Player 2 3 (S1) - (S0)
Player 2 - (S2) 1500(S1)
Player 2 3 (S3) 1750(S2)
Player 2 - (S4) 1500(S3)
Player 3 4 (S1) - (S0)
Player 3 5 (S2) 4000(S1)
Player 3 7 (S3) 3000(S2)
Player 3 - (S4) 2500(S3)
Player 4 left out for time/redundancy sake
In this example, session 3 is the last session so the Pre for S4 for all players would just be inserted also as - by default.
So a - needs to be inserted every 4 rows to match each distance and the correct player, and after the last session, create a new row for each player giving - for Pre and Post, and the correct distance.
In my attempt to do this, I have the following code and dataset:
From dput()
structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "2500", "1500",
"2500", "-")), row.names = c(NA, 12L), class = "data.frame")
and my code:
test1 <- data.frame("2020-01-01",1,"Player 1",3,6, "-")
test2 <- data.frame("2020-01-01",4,"Player 1","-","-","2500")
names(test1) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
names(test2) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
new <- rbind(test1, stackEX) #This puts the new row at the top where I want it
#Not sure why this removes dates for other rows though
new <- rbind(new, test2)#This is for Session 4 which does not exist in this example
But using this way does not insert a - cell in the distance column to bump the values down, and instead I am only aware of how to add an entire new row rather than one cell.
This can be solved by joining with a complete set of Player / Session combinations and by shifting Distance:
library(data.table)
setDT(DF)[CJ(Player, Session = 1:4, unique = TRUE), on = .(Player, Session)][
, Distance := shift(Distance)][]
Date Session Player Pre Post Distance
1: 2020-01-01 1 Player 1 3 6 <NA>
2: 2020-01-02 2 Player 1 2 5 1000
3: 2020-01-03 3 Player 1 3 5 1000
4: <NA> 4 Player 1 <NA> <NA> 2500
5: 2020-01-01 1 Player 2 3 7 <NA>
6: 2020-01-02 2 Player 2 - - 1500
7: 2020-01-03 3 Player 2 3 8 1750
8: <NA> 4 Player 2 <NA> <NA> 1500
9: 2020-01-01 1 Player 3 4 10 <NA>
10: 2020-01-02 2 Player 3 5 5 4000
11: 2020-01-03 3 Player 3 7 7 3000
12: <NA> 4 Player 3 <NA> <NA> 2500
13: 2020-01-01 1 Player 4 1 3 <NA>
14: 2020-01-02 2 Player 4 3 6 600
15: 2020-01-03 3 Player 4 - - 1000
16: <NA> 4 Player 4 <NA> <NA> -
The cross join expression
CJ(Player, Session = 1:4, unique = TRUE)
returns all Player / Session combos:
Player Session
1: Player 1 1
2: Player 1 2
3: Player 1 3
4: Player 1 4
5: Player 2 1
6: Player 2 2
7: Player 2 3
8: Player 2 4
9: Player 3 1
10: Player 3 2
11: Player 3 3
12: Player 3 4
13: Player 4 1
14: Player 4 2
15: Player 4 3
16: Player 4 4
The default arguments of shift() are sufficient here: shift(Distance) lags Distance by one and NA is used for filling, i.e., the values in the Distance column are moved down to the next row. So row 4 (Session 4) for Player 1 gets the Distance value of the previous row (Session 3) as requested. The empty row at the top becomes NA. See also help("shift", "data.table").
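Note that we do not need to group here: the whole column is lagged, and because the cross join appends an all-NA Session 4 row for every player, the last real Distance of one player is never carried into the first row of the next player.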
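The effect of the padding rows can be checked on a small example. This is only a sketch on hypothetical toy data (two players, two real sessions each); it compares the ungrouped lag with a lag computed per player.

```r
library(data.table)

# Hypothetical toy data: two players, two real sessions each
toy <- data.table(Player   = rep(c("P1", "P2"), each = 2),
                  Session  = rep(1:2, times = 2),
                  Distance = c("1000", "1200", "1500", "1750"))

# Cross join adds a Session 3 row per player; its Distance is NA
full <- toy[CJ(Player, Session = 1:3, unique = TRUE), on = .(Player, Session)]

# Ungrouped lag: the NA padding rows keep values from leaking across players
full[, lagged := shift(Distance)]

# Grouped lag: what would be required if the padding rows did not exist
full[, lagged_by_player := shift(Distance), by = Player]
full
```

Both columns agree here; without the extra session per player, only the grouped version would be safe.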
Data
DF <- structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "2500",
"1500", "2500", "-")), row.names = c(NA, 12L), class = "data.frame")

Updating a data frame's columns based on other columns

I have a data frame containing a person's stage, as follows (this is only a sample of a very large one):
df = structure(list(DeceasedDate = c(0.283219178082192, 1.12678843226788,
2.02865296803653, 0.892465753424658, NA, 0.88013698630137, NA
), LastClinicalEventMonthEnd = c(0.244862981988838, 1.03637744165398,
10.9464611555048, 0.763698598427194, 3.35011412354135, 0.677397228564181,
3.83687211440893), FirstYStage = c("N/A", "2", "2", "2", "2",
"2", "3.1"), SecondYStage = c("N/A", "N/A", "2", "N/A", "2",
"N/A", "3.1"), ThirdYStage = c("N/A", "N/A", "2", "N/A", "2",
"N/A", "3.1"), FourthYStage = c("N/A", "N/A", "N/A", "N/A", "2",
"N/A", "3.1"), FifthYStage = c("N/A", "N/A", "N/A", "N/A", "N/A",
"N/A", "N/A")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-7L))
The 5 right hand columns are a stage of a person, but do not contain all the information yet. I need to include the information in the first two columns, in which the numbers are in years, as follows:
if the value in column 1 is smaller than a year, FirstYStage should be "Deceased", and so should all the following columns (the person is still dead...); if the value is between 1 and 2, SecondYStage should be "Deceased", and so on.
if the value in column 2 is smaller than a year, SecondYStage should be "EndOfEvents"; if the value is between 1 and 2, ThirdYStage should be "EndOfEvents", and so on.
So the expected output in this case should be:
df_updated = structure(list(DeceasedDate = c(0.283219178082192,
1.12678843226788,
2.02865296803653, 0.892465753424658, NA, 0.88013698630137, NA
), LastClinicalEventMonthEnd = c(0.244862981988838, 1.03637744165398,
10.9464611555048, 0.763698598427194, 3.35011412354135, 0.677397228564181,
3.83687211440893), FirstYStage = c("Deceased", "2", "2", "Deceased",
"2", "Deceased", "3.1"), SecondYStage = c("Deceased", "Deceased",
"2", "Deceased", "2", "Deceased", "3.1"), ThirdYStage = c("Deceased",
"Deceased", "Deceased", "Deceased", "2", "Deceased", "3.1"),
FourthYStage = c("Deceased", "Deceased", "Deceased", "Deceased",
"2", "Deceased", "3.1"), FifthYStage = c("Deceased", "Deceased",
"Deceased", "Deceased", "LastEvent", "Deceased", "LastEvent"
)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"
))
One important point is that "Deceased" should be given priority: if a stage value and "Deceased" clash for the same year, "Deceased" wins.
How do I do this in the most efficient way? At the moment I am using a series of if statements, but I don't think that is the best course of action.
This is what I would do:
1. Reshape from wide to long format
2. Compute years from column names
3. Selectively update the value column
4. Reshape back to wide format
data.table
As I am more fluent in data.table than in dplyr here is the approach implemented in data.table syntax. (Apologies but I will add a dplyr solution if time permits.)
library(data.table)
long <- melt(setDT(df)[, rn := .I], measure.vars = patterns("Stage$"))
long[, year := as.integer(variable)] # column index
long[floor(DeceasedDate) < year, value := "Deceased"]
long[is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < year, value := "EndOfEvents"]
dcast(long, rn + DeceasedDate + LastClinicalEventMonthEnd ~ variable)
rn DeceasedDate LastClinicalEventMonthEnd FirstYStage SecondYStage ThirdYStage FourthYStage FifthYStage
1: 1 0.2832192 0.2448630 Deceased Deceased Deceased Deceased Deceased
2: 2 1.1267884 1.0363774 2 Deceased Deceased Deceased Deceased
3: 3 2.0286530 10.9464612 2 2 Deceased Deceased Deceased
4: 4 0.8924658 0.7636986 Deceased Deceased Deceased Deceased Deceased
5: 5 NA 3.3501141 2 2 2 2 EndOfEvents
6: 6 0.8801370 0.6773972 Deceased Deceased Deceased Deceased Deceased
7: 7 NA 3.8368721 3.1 3.1 3.1 3.1 EndOfEvents
dplyr / tidyr
As promised, here is also a dplyr/tidyr implementation of the same approach:
library(tidyr)
library(dplyr)
df %>%
mutate(rn = row_number()) %>%
gather(key, val, ends_with("Stage"), factor_key = TRUE) %>%
mutate(year = as.integer(key)) %>%
mutate(val = if_else(!is.na(DeceasedDate) & floor(DeceasedDate) < year, "Deceased", val)) %>%
mutate(val = if_else(is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < year, "EndOfEvents", val)) %>%
select(-year) %>%
spread(key, val) %>%
arrange(rn)
DeceasedDate LastClinicalEventMonthEnd rn FirstYStage SecondYStage ThirdYStage FourthYStage FifthYStage
1 0.2832192 0.2448630 1 Deceased Deceased Deceased Deceased Deceased
2 1.1267884 1.0363774 2 2 Deceased Deceased Deceased Deceased
3 2.0286530 10.9464612 3 2 2 Deceased Deceased Deceased
4 0.8924658 0.7636986 4 Deceased Deceased Deceased Deceased Deceased
5 NA 3.3501141 5 2 2 2 2 EndOfEvents
6 0.8801370 0.6773972 6 Deceased Deceased Deceased Deceased Deceased
7 NA 3.8368721 7 3.1 3.1 3.1 3.1 EndOfEvents
or without creating a year column:
df %>%
mutate(rn = row_number()) %>%
gather(key, val, ends_with("Stage"), factor_key = TRUE) %>%
mutate(val = if_else(!is.na(DeceasedDate) & floor(DeceasedDate) < as.integer(key),
"Deceased", val)) %>%
mutate(val = if_else(is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < as.integer(key),
"EndOfEvents", val)) %>%
spread(key, val) %>%
arrange(rn)
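The same logic can also be written with the newer pivot_longer() / pivot_wider() verbs and a single case_when() in place of the two if_else() calls. This is a sketch of that substitution, not the answer's original code; the toy data and the helper vector stage_cols are hypothetical and cover only three stage columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: stage columns in year order
stage_cols <- c("FirstYStage", "SecondYStage", "ThirdYStage")

# Hypothetical toy data with the same layout as df
toy <- tibble(
  DeceasedDate              = c(0.28, 1.13, NA),
  LastClinicalEventMonthEnd = c(0.24, 1.04, 1.35),
  FirstYStage  = c("N/A", "2", "2"),
  SecondYStage = c("N/A", "N/A", "2"),
  ThirdYStage  = c("N/A", "N/A", "N/A")
)

updated <- toy %>%
  mutate(rn = row_number()) %>%
  pivot_longer(all_of(stage_cols), names_to = "key", values_to = "val") %>%
  mutate(
    year = match(key, stage_cols),  # 1 for FirstYStage, 2 for SecondYStage, ...
    val = case_when(
      # "Deceased" is tested first, so it takes priority
      !is.na(DeceasedDate) & floor(DeceasedDate) < year ~ "Deceased",
      is.na(DeceasedDate) &
        floor(LastClinicalEventMonthEnd) + 1 < year ~ "EndOfEvents",
      TRUE ~ val
    )
  ) %>%
  select(-year) %>%
  pivot_wider(names_from = key, values_from = val)
updated
```

Using match() on an explicit vector of column names avoids relying on factor levels for the year index.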

Transposing data frames [duplicate]

Happy Weekends.
I've been trying to replicate the results from this blog post in R. I am looking for a method of transposing the data without using t, preferably using tidyr or reshape. In the example below, metadata is obtained by transposing data.
metadata <- data.frame(colnames(data), t(data[1:4, ]) )
colnames(metadata) <- t(metadata[1,])
metadata <- metadata[-1,]
metadata$Multiplier <- as.numeric(metadata$Multiplier)
Though it achieves what I want, I find it a little clumsy. Is there a more efficient workflow for transposing the data frame?
dput of data
data <- structure(list(Series.Description = c("Unit:", "Multiplier:",
"Currency:", "Unique Identifier: "), Nominal.Broad.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFB_N.M"), Nominal.Major.Currencies.Dollar.Index. = c("Index:_1973_Mar_100",
"1", NA, "H10/H10/JRXWTFN_N.M"), Nominal.Other.Important.Trading.Partners.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFO_N.M"), AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL. = c("Currency:_Per_AUD",
"1", "USD", "H10/H10/RXI$US_N.M.AL"), SPOT.EXCHANGE.RATE...EURO.AREA. = c("Currency:_Per_EUR",
"1", "USD", "H10/H10/RXI$US_N.M.EU"), NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ.. = c("Currency:_Per_NZD",
"1", "USD", "H10/H10/RXI$US_N.M.NZ"), United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk = c("Currency:_Per_GBP",
"0.01", "USD", "H10/H10/RXI$US_N.M.UK"), BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US.. = c("Currency:_Per_USD",
"1", "BRL", "H10/H10/RXI_N.M.BZ"), CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US.. = c("Currency:_Per_USD",
"1", "CAD", "H10/H10/RXI_N.M.CA"), CHINA....SPOT.EXCHANGE.RATE..YUAN.US.. = c("Currency:_Per_USD",
"1", "CNY", "H10/H10/RXI_N.M.CH"), DENMARK....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "DKK", "H10/H10/RXI_N.M.DN"), HONG.KONG....SPOT.EXCHANGE.RATE..HK..US.. = c("Currency:_Per_USD",
"1", "HKD", "H10/H10/RXI_N.M.HK"), INDIA....SPOT.EXCHANGE.RATE..RUPEES.US. = c("Currency:_Per_USD",
"1", "INR", "H10/H10/RXI_N.M.IN"), JAPAN....SPOT.EXCHANGE.RATE..YEA.US.. = c("Currency:_Per_USD",
"1", "JPY", "H10/H10/RXI_N.M.JA"), KOREA....SPOT.EXCHANGE.RATE..WON.US.. = c("Currency:_Per_USD",
"1", "KRW", "H10/H10/RXI_N.M.KO"), Malaysia...Spot.Exchange.Rate..Ringgit.US.. = c("Currency:_Per_USD",
"1", "MYR", "H10/H10/RXI_N.M.MA"), MEXICO....SPOT.EXCHANGE.RATE..PESOS.US.. = c("Currency:_Per_USD",
"1", "MXN", "H10/H10/RXI_N.M.MX"), NORWAY....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "NOK", "H10/H10/RXI_N.M.NO"), SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US.. = c("Currency:_Per_USD",
"1", "SEK", "H10/H10/RXI_N.M.SD"), SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US.. = c("Currency:_Per_USD",
"1", "ZAR", "H10/H10/RXI_N.M.SF"), Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US.. = c("Currency:_Per_USD",
"1", "SGD", "H10/H10/RXI_N.M.SI"), SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US.. = c("Currency:_Per_USD",
"1", "LKR", "H10/H10/RXI_N.M.SL"), SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US.. = c("Currency:_Per_USD",
"1", "CHF", "H10/H10/RXI_N.M.SZ"), TAIWAN....SPOT.EXCHANGE.RATE..NT..US.. = c("Currency:_Per_USD",
"1", "TWD", "H10/H10/RXI_N.M.TA"), THAILAND....SPOT.EXCHANGE.RATE....THAILAND. = c("Currency:_Per_USD",
"1", "THB", "H10/H10/RXI_N.M.TH"), VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.. = c("Currency:_Per_USD",
"1", "VEB", "H10/H10/RXI_N.M.VE")), .Names = c("Series.Description",
"Nominal.Broad.Dollar.Index.", "Nominal.Major.Currencies.Dollar.Index.",
"Nominal.Other.Important.Trading.Partners.Dollar.Index.", "AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL.",
"SPOT.EXCHANGE.RATE...EURO.AREA.", "NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ..",
"United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk",
"BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US..", "CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US..",
"CHINA....SPOT.EXCHANGE.RATE..YUAN.US..", "DENMARK....SPOT.EXCHANGE.RATE..KRONER.US..",
"HONG.KONG....SPOT.EXCHANGE.RATE..HK..US..", "INDIA....SPOT.EXCHANGE.RATE..RUPEES.US.",
"JAPAN....SPOT.EXCHANGE.RATE..YEA.US..", "KOREA....SPOT.EXCHANGE.RATE..WON.US..",
"Malaysia...Spot.Exchange.Rate..Ringgit.US..", "MEXICO....SPOT.EXCHANGE.RATE..PESOS.US..",
"NORWAY....SPOT.EXCHANGE.RATE..KRONER.US..", "SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US..",
"SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US..", "Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US..",
"SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US..", "SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US..",
"TAIWAN....SPOT.EXCHANGE.RATE..NT..US..", "THAILAND....SPOT.EXCHANGE.RATE....THAILAND.",
"VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.."), row.names = c(NA,
4L), class = "data.frame")
Using tidyr, you gather all the columns except the first, and then you spread the gathered columns.
Try:
library(dplyr)
library(tidyr)
data %>%
gather(var, val, 2:ncol(data)) %>%
spread(Series.Description, val)
library(dplyr)
# Omitted data <- structure part ...
Here is something that replicates what's in the main answer, but more generically (e.g., works where Series.Description is not the first column of the result) and using the newer pivot_wider/pivot_longer verbs.
df_transpose <- function(df) {
df %>%
tidyr::pivot_longer(-1) %>%
tidyr::pivot_wider(names_from = 1, values_from = value)
}
df_transpose(data)
#> # A tibble: 26 x 5
#> name `Unit:` `Multiplier:` `Currency:` `Unique Identifi…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.… Index:_19… 1 <NA> H10/H10/JRXWTFB_…
#> 2 Nominal.Major.Currenc… Index:_19… 1 <NA> H10/H10/JRXWTFN_…
#> 3 Nominal.Other.Importa… Index:_19… 1 <NA> H10/H10/JRXWTFO_…
#> 4 AUSTRALIA....SPOT.EXC… Currency:… 1 USD H10/H10/RXI$US_N…
#> 5 SPOT.EXCHANGE.RATE...… Currency:… 1 USD H10/H10/RXI$US_N…
#> 6 NEW.ZEALAND....SPOT.E… Currency:… 1 USD H10/H10/RXI$US_N…
#> 7 United.Kingdom....Spo… Currency:… 0.01 USD H10/H10/RXI$US_N…
#> 8 BRAZIL....SPOT.EXCHAN… Currency:… 1 BRL H10/H10/RXI_N.M.…
#> 9 CANADA....SPOT.EXCHAN… Currency:… 1 CAD H10/H10/RXI_N.M.…
#> 10 CHINA....SPOT.EXCHANG… Currency:… 1 CNY H10/H10/RXI_N.M.…
#> # … with 16 more rows
But note that (like the answer above) the name of the first column is lost. The following retains it (as, I guess, does the spread_(names(data)[1], "val") approach proposed by @jbkunst above).
df_transpose <- function(df) {
first_name <- colnames(df)[1]
temp <-
df %>%
tidyr::pivot_longer(-1) %>%
tidyr::pivot_wider(names_from = 1, values_from = value)
colnames(temp)[1] <- first_name
temp
}
df_transpose(data)
#> # A tibble: 26 x 5
#> Series.Description `Unit:` `Multiplier:` `Currency:` `Unique Identif…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.In… Index:_1… 1 <NA> H10/H10/JRXWTFB…
#> 2 Nominal.Major.Currencie… Index:_1… 1 <NA> H10/H10/JRXWTFN…
#> 3 Nominal.Other.Important… Index:_1… 1 <NA> H10/H10/JRXWTFO…
#> 4 AUSTRALIA....SPOT.EXCHA… Currency… 1 USD H10/H10/RXI$US_…
#> 5 SPOT.EXCHANGE.RATE...EU… Currency… 1 USD H10/H10/RXI$US_…
#> 6 NEW.ZEALAND....SPOT.EXC… Currency… 1 USD H10/H10/RXI$US_…
#> 7 United.Kingdom....Spot.… Currency… 0.01 USD H10/H10/RXI$US_…
#> 8 BRAZIL....SPOT.EXCHANGE… Currency… 1 BRL H10/H10/RXI_N.M…
#> 9 CANADA....SPOT.EXCHANGE… Currency… 1 CAD H10/H10/RXI_N.M…
#> 10 CHINA....SPOT.EXCHANGE.… Currency… 1 CNY H10/H10/RXI_N.M…
#> # … with 16 more rows
Created on 2021-05-30 by the reprex package (v2.0.0)
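The question's own snippet also converts Multiplier to numeric, which the functions above leave as character. A minimal sketch of the full round trip on hypothetical toy data (two metadata rows, two series columns named A and B) could look like this:

```r
library(tidyr)

# Hypothetical toy data in the same shape as the question's data
toy <- data.frame(
  Series.Description = c("Unit:", "Multiplier:"),
  A = c("Index", "1"),
  B = c("USD", "0.01"),
  stringsAsFactors = FALSE
)

# Long format: one row per (metadata field, series) pair
long <- pivot_longer(toy, -Series.Description,
                     names_to = "series", values_to = "val")

# Wide again, now with one column per metadata field
transposed <- pivot_wider(long, names_from = Series.Description,
                          values_from = val)

# Convert the multiplier after transposing, as in the question
transposed$`Multiplier:` <- as.numeric(transposed$`Multiplier:`)
transposed
```

All values pass through a single character column during the reshape, so any numeric conversion has to happen after the final pivot.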

Resources