I have a complex dataset that looks like this:
df1 <- tibble::tribble(~"Canada > London", ~"", ~"Notes", ~"United Kingdom > London", ~"", ~"",
"Restaurant", "Price", "Range", "Restaurant", "Price", "Range",
"Fried beef", "27", "25-30", "Fried beef", "29", "25 - 35",
"Fried potato", "5", "3 - 8", "Fried potato", "8", "3 - 8",
"Bar", "Price", "Range", "Price", "Range", "",
"Beer Lager", "5", "4 - 8", "Beer Lager", "6", "4 - 8",
"Beer Dark", "4", "3 - 7", "Beer Dark", "5", "3 - 7")
Or, for visual representation:
It is long in its parameters (Beer Lager, Beer Dark, ...) and wide in its data inputs (several side-by-side blocks such as Canada > London or United Kingdom > London).
The desired output would be two datasets that should look like this:
The first dataset (the Values):
The second dataset (the Ranges):
Any suggestions would be much appreciated :)
Your data is neither wide nor long but is a messy data table which needs some cleaning to convert it to tidy data. Afterwards you could get your desired tables using tidyr::pivot_wider:
library(dplyr)
library(tidyr)
library(purrr)
tidy_data <- function(.data, cols) {
  .data <- .data[cols]
  place <- names(.data)[[1]]
  .data |>
    rename(product = 1, price = 2, range = 3) |>
    filter(!price %in% c("Price", "Range")) |>
    mutate(place = place)
}
df1_tidy <- purrr::map_dfr(list(1:3, 4:6), tidy_data, .data = df1)
df1_tidy |>
  select(place, product, price) |>
  pivot_wider(names_from = product, values_from = price)
#> # A tibble: 2 × 5
#> place `Fried beef` `Fried potato` `Beer Lager` `Beer Dark`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Canada > London 27 5 5 4
#> 2 United Kingdom > London 29 8 6 5
df1_tidy |>
  select(place, product, range) |>
  pivot_wider(names_from = product, values_from = range, names_glue = "{product} Range")
#> # A tibble: 2 × 5
#> place `Fried beef Range` Fried potato Rang…¹ Beer …² Beer …³
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Canada > London 25-30 3 - 8 4 - 8 3 - 7
#> 2 United Kingdom > London 25 - 35 3 - 8 4 - 8 3 - 7
#> # … with abbreviated variable names ¹`Fried potato Range`, ²`Beer Lager Range`,
#> # ³`Beer Dark Range`
I agree with @stefan. You actually have four tables, or two, depending on how you look at it. Here is an implementation of two functions that start the cleaning and formatting process. The first splits the data frame by row and the second splits the resulting pieces by column. After that it is easier to format, clean, and merge the data frames into one.
library(tidyverse)
df0 = tibble::tribble(~"Canada > London", ~"", ~"Notes", ~"United Kingdom > London", ~"", ~"",
"Restaurant", "Price", "Range", "Restaurant", "Price", "Range",
"Fried beef", "27", "25-30", "Fried beef", "29", "25 - 35",
"Fried potato", "5", "3 - 8", "Fried potato", "8", "3 - 8",
"Bar", "Price", "Range", "Price", "Range", "",
"Beer Lager", "5", "4 - 8", "Beer Lager", "6", "4 - 8",
"Beer Dark", "4", "3 - 7", "Beer Dark", "5", "3 - 7")
split_rows = function(df){
  # rows that start a sub-df: the second column holds the header "Price"
  df_breaks = which(df[[2]] == "Price")
  # list to populate in loop with sub-dfs
  df_list = list()
  for(i in seq_along(df_breaks)){
    # get start of sub-df
    start = df_breaks[i]
    # get end of sub-df
    if(i == length(df_breaks)){
      end = nrow(df) # if it's the last one, set it to the last row of the original df
    } else {
      end = df_breaks[i+1]-1 # else, set it to the next start - 1
    }
    # subset df
    df_temp = df[start:end,]
    # first row as header (converted to a plain character vector)
    colnames(df_temp) = unlist(df_temp[1,], use.names = FALSE)
    df_temp = df_temp[-1,]
    # append to df_list
    df_list = append(df_list,list(df_temp))
  }
  return(df_list)
}
split_cols = function(df_list,second_df_col_start = 4){
  # split every sub-df into its left and right block of columns
  df_list = lapply(df_list, function(df){
    df1 = df[,1:(second_df_col_start-1)]
    df2 = df[,second_df_col_start:ncol(df)]
    return(list(df1,df2))
  })
  return(df_list)
}
output = split_rows(df0) %>%
  split_cols()
output:
[[1]]
[[1]][[1]]
# A tibble: 2 × 3
Restaurant Price Range
<chr> <chr> <chr>
1 Fried beef 27 25-30
2 Fried potato 5 3 - 8
[[1]][[2]]
# A tibble: 2 × 3
Restaurant Price Range
<chr> <chr> <chr>
1 Fried beef 29 25 - 35
2 Fried potato 8 3 - 8
[[2]]
[[2]][[1]]
# A tibble: 2 × 3
Bar Price Range
<chr> <chr> <chr>
1 Beer Lager 5 4 - 8
2 Beer Dark 4 3 - 7
[[2]][[2]]
# A tibble: 2 × 3
Price Range ``
<chr> <chr> <chr>
1 Beer Lager 6 4 - 8
2 Beer Dark 5 3 - 7
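The final merge step mentioned in this answer is not shown. A minimal sketch of how it could look in base R follows, assuming the nested list has the shape printed above. The `output` list here is a hand-built stand-in, and `block()`, `places`, and the column names `product`, `price`, `range`, and `place` are illustrative choices, not part of the original answer.

```r
# Hypothetical stand-in for the nested list returned by split_cols():
# outer level = row blocks (Restaurant, Bar), inner level = places.
block <- function(product, price, range) {
  data.frame(product, price, range, stringsAsFactors = FALSE)
}
output <- list(
  list(block(c("Fried beef", "Fried potato"), c("27", "5"), c("25-30", "3 - 8")),
       block(c("Fried beef", "Fried potato"), c("29", "8"), c("25 - 35", "3 - 8"))),
  list(block(c("Beer Lager", "Beer Dark"), c("5", "4"), c("4 - 8", "3 - 7")),
       block(c("Beer Lager", "Beer Dark"), c("6", "5"), c("4 - 8", "3 - 7")))
)

# Place labels, copied by hand from the original header row (assumption)
places <- c("Canada > London", "United Kingdom > London")

tidy_blocks <- lapply(output, function(pair) {
  Map(function(df, place) {
    df <- as.data.frame(df, stringsAsFactors = FALSE)
    # Rename by position, because the real headers differ between blocks
    names(df) <- c("product", "price", "range")
    df$place <- place
    df
  }, pair, places)
})

# Flatten one level of nesting and stack the four pieces into one data frame
combined <- do.call(rbind, unlist(tidy_blocks, recursive = FALSE))
rownames(combined) <- NULL
combined
```

Renaming by position sidesteps the mislabelled header in the last block. From `combined`, the two wide tables can then be produced with tidyr::pivot_wider as in the other answer.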
Related
I am a beginner in R and I am still learning how to write code. As you can see in my data frame below, I want to check whether eggs in the commodity column has the same unit in all rows.
data frame:
df <- structure(list(commodity = c("eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs"), unit = c("1.8 kg", "900 g",
"810 g", "kg", "kg", "1.8 kg", "900 g", "810 g", "kg", "kg",
"1.8 kg")), class = "data.frame", row.names = c(NA, -11L))
commodity unit
1 eggs 1.8 kg
2 lentils (green) 900 g
3 oil (vegetable) 810 g
4 rice kg
5 sugar (white) kg
6 eggs 1.8 kg
7 lentils (green) 900 g
8 oil (vegetable) 810 g
9 rice kg
10 sugar (white) kg
11 eggs 1.8 kg
I do not know what I should do
One way could be:
First create a column with your units extracting only alphabetic letters, then use distinct():
library(dplyr)
df %>%
mutate(unit1 = gsub("[^a-zA-Z]", "", unit)) %>%
distinct(unit1)
unit1
1 kg
2 g
In base R, we could use
length(unique(trimws(df$unit, whitespace = "[0-9.]+\\s+"))) == 1
[1] FALSE
If it is to check on a subset of elements
with(df, length(unique(trimws(unit[grepl("eggs", commodity)],
whitespace = "[0-9.]+\\s+"))) == 1)
[1] TRUE
If we want to check for all elements
library(dplyr)
library(stringr)
df %>%
group_by(item = str_extract(commodity, "^\\w+(?=\\s*)")) %>%
summarise(isUnitSame = n_distinct(str_extract(unit, "[a-z]+$"))==1)
-output
# A tibble: 5 × 2
item isUnitSame
<chr> <lgl>
1 eggs TRUE
2 lentils TRUE
3 oil TRUE
4 rice TRUE
5 sugar TRUE
Is there a quick way to replace variable names with the content of the first row of a tibble?
So turning something like this:
Subject Q1 Q2 Q3
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
Into this:
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
My dataset has over 100 variables so I'm looking for a way that doesn't involve typing out each old and new variable name.
A possible solution:
df <- structure(list(Subject = c("Subject", "429753", "b952x8", "264062",
"53082m"), Q1 = c("age", "24", "23", "19", "35"), Q2 = c("gender",
"1", "2", "1", "1"), Q3 = c("cue", "man", "mushroom", "night",
"moon")), row.names = c(NA, -5L), class = "data.frame")
names(df) <- df[1,]
df <- df[-1,]
df
#> Subject age gender cue
#> 2 429753 24 1 man
#> 3 b952x8 23 2 mushroom
#> 4 264062 19 1 night
#> 5 53082m 35 1 moon
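Note that after this step every column is still character, and the row names start at 2. A small wrapper could handle both; the sketch below is only one possible way to do it, and the helper name `header_from_first_row()` is made up for illustration. It assumes the columns are character (not factor) and uses base R's type.convert() to re-guess the column types.

```r
# Hypothetical helper (not part of the answer above): promote the first row
# to column names, drop it, and let R re-guess the column types.
header_from_first_row <- function(df) {
  names(df) <- as.character(unlist(df[1, ], use.names = FALSE))
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL
  type.convert(df, as.is = TRUE)
}

# Shortened copy of the example data
raw <- data.frame(
  Subject = c("Subject", "429753", "b952x8"),
  Q1 = c("age", "24", "23"),
  Q2 = c("gender", "1", "2"),
  Q3 = c("cue", "man", "mushroom"),
  stringsAsFactors = FALSE
)

clean <- header_from_first_row(raw)
str(clean)
```

Because nothing is typed by hand, the same call works unchanged for a data frame with 100 or more columns.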
I'm dealing with school data and I need to duplicate rows for schools that serve larger grade spans than just ES or HS. Some sample code to illustrate my problem:
# data
schools <- tibble(name = c("school a", "school b", "school c", "school z"),
type = c("es", "NA", "hs", "es"),
gslo = c("01", "08", "09", "KG"),
gshi = c("12", "12", "12", "05"))
schools
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
Where gslo and gshi are the lowest and highest grades served, respectively. In the United States these would be divided into high schools or middle schools or elementary schools, the type.
Some schools serve more than just elementary grades, but are now only being counted as type == "es".
schools_attempt <- schools %>%
# add row based on condition and change type
# not generalized
rbind(schools %>% filter(gslo == "01", gshi == "12") %>% mutate(type = "hs"))
> schools_attempt
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
This works but is not general. Is it possible to avoid a huge case_when? Note the changed school type classification (es -> hs)
schools_want <- tibble(name = c("school a", "school b", "school c", "school z", "school a"),
type = c("es", "NA", "hs", "es", "hs"),
gslo = c("01", "08", "09", "KG", "01"),
gshi = c("12", "12", "12", "05", "12"))
> schools_want
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
Thanks!
This might suffice as a general approach. If it starts in grade 9+, it's a high school. If it ends before grade 9, it's an elementary. Otherwise, it's both and we can split into two rows.
library(dplyr)
library(tidyr) # needed for separate_rows()
schools %>%
  mutate(across(gslo:gshi, ~if_else(.x == "KG", 0, as.numeric(.x))),
         type2 = case_when(
           gslo >= 9 ~ "hs",
           gshi <= 8 ~ "es",
           TRUE ~ "hs, es"
         )) %>%
  separate_rows(type2)
# A tibble: 6 x 5
name type gslo gshi type2
<chr> <chr> <dbl> <dbl> <chr>
1 school a es 1 12 hs
2 school a es 1 12 es
3 school b NA 8 12 hs
4 school b NA 8 12 es
5 school c hs 9 12 hs
6 school z es 0 5 es
Edit: if you want to preserve the gslo/gshi columns as-is, add .names = "{.col}_num" to the across() call and use gslo_num and gshi_num in the case_when().
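Spelled out, that variant might look like the sketch below. It re-creates the schools tibble from the question so that it runs on its own. The suppressWarnings() call and the explicit sep argument are additions for tidiness, not part of the original answer.

```r
library(dplyr)
library(tidyr)

schools <- tibble(name = c("school a", "school b", "school c", "school z"),
                  type = c("es", "NA", "hs", "es"),
                  gslo = c("01", "08", "09", "KG"),
                  gshi = c("12", "12", "12", "05"))

result <- schools %>%
  mutate(
    # numeric copies go into gslo_num / gshi_num; the originals stay as text
    across(gslo:gshi,
           ~ if_else(.x == "KG", 0, suppressWarnings(as.numeric(.x))),
           .names = "{.col}_num"),
    type2 = case_when(
      gslo_num >= 9 ~ "hs",
      gshi_num <= 8 ~ "es",
      TRUE ~ "hs, es"
    )
  ) %>%
  separate_rows(type2, sep = ", ")

result
```

The original gslo and gshi values (including "KG") are left untouched, and the helper columns can be dropped afterwards with select(-ends_with("_num")) if they are not needed.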
In an excel file, I have the following table with headers as such:
**Date** **Session** **Player** **Pre** **Post** **Distance(m)**
Jan 1 1 Player 1 3 6 1000
Jan 1 1 Player 2 3 7 1500
Jan 1 1 Player 3 4 10 4000
Jan 1 1 Player 4 1 3 600
Jan 2 2 Player 1 2 5 1000
Jan 2 2 Player 2 - - 1750
Jan 2 2 Player 3 5 5 3000
Jan 2 2 Player 4 3 6 1000
Jan 3 3 Player 1 3 5 2500
Jan 3 3 Player 2 3 8 1500
Jan 3 3 Player 3 7 7 2500
Jan 3 3 Player 4 - - -
What I am trying to accomplish is to look at the distance numbers and compare them with the Pre numbers for the following session. So for Player 1, the Session 1 distance (1000) and the Pre value from Jan 2 (2) should be in the same row.
To do this, after sorting the players by session number, I am trying to find a way to insert an empty cell - in the distance column for each player which acts as a placeholder for what would be session 0. This essentially bumps down the distances to match up with the next day's Pre #.
So after performing that on this data set, the result would look like this:
**Player** **Pre for the following Day** **Distance**
Player 1 3 (S1) - (Session 0 - Does Not Exist) (This value is inserted)
Player 1 2 (S2) 1000(Session 1)
Player 1 3 (S3) 1000(Session 2)
Player 1 - (S4 - Not included in this example) 2500(Session 3)
Player 2 3 (S1) - (S0)
Player 2 - (S2) 1500(S1)
Player 2 3 (S3) 1750(S2)
Player 2 - (S4) 1500(S3)
Player 3 4 (S1) - (S0)
Player 3 5 (S2) 4000(S1)
Player 3 7 (S3) 3000(S2)
Player 3 - (S4) 2500(S3)
Player 4 left out for time/redundancy sake
In this example, session 3 is the last session so the Pre for S4 for all players would just be inserted also as - by default.
So a - needs to be inserted every 4 rows to match each distance and the correct player, and after the last session, create a new row for each player giving - for Pre and Post, and the correct distance.
In my attempt to do this, I have the following code and dataset:
From dput()
structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "-", "1500",
"2500", "-")), row.names = c(NA, 12L), class = "data.frame")
and my code:
test1 <- data.frame("2020-01-01",1,"Player 1",3,6, "-")
test2 <- data.frame("2020-01-01",4,"Player 1","-","-","2500")
names(test1) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
names(test2) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
new <- rbind(test1, stackEX) #This puts the new row at the top where I want it
#Not sure why this removes dates for other rows though
new <- rbind(new, test2)#This is for Session 4 which does not exist in this example
But this approach does not insert a - cell in the Distance column to bump the values down; I only know how to add an entire new row, not a single cell.
This can be solved by joining with a complete set of Player / Session combinations and by shifting Distance:
library(data.table)
setDT(DF)[CJ(Player, Session = 1:4, unique = TRUE), on = .(Player, Session)][
, Distance := shift(Distance)][]
Date Session Player Pre Post Distance
1: 2020-01-01 1 Player 1 3 6 <NA>
2: 2020-01-02 2 Player 1 2 5 1000
3: 2020-01-03 3 Player 1 3 5 1000
4: <NA> 4 Player 1 <NA> <NA> 2500
5: 2020-01-01 1 Player 2 3 7 <NA>
6: 2020-01-02 2 Player 2 - - 1500
7: 2020-01-03 3 Player 2 3 8 1750
8: <NA> 4 Player 2 <NA> <NA> 1500
9: 2020-01-01 1 Player 3 4 10 <NA>
10: 2020-01-02 2 Player 3 5 5 4000
11: 2020-01-03 3 Player 3 7 7 3000
12: <NA> 4 Player 3 <NA> <NA> 2500
13: 2020-01-01 1 Player 4 1 3 <NA>
14: 2020-01-02 2 Player 4 3 6 600
15: 2020-01-03 3 Player 4 - - 1000
16: <NA> 4 Player 4 <NA> <NA> -
The cross join expression
CJ(Player, Session = 1:4, unique = TRUE)
returns all Player / Session combos:
Player Session
1: Player 1 1
2: Player 1 2
3: Player 1 3
4: Player 1 4
5: Player 2 1
6: Player 2 2
7: Player 2 3
8: Player 2 4
9: Player 3 1
10: Player 3 2
11: Player 3 3
12: Player 3 4
13: Player 4 1
14: Player 4 2
15: Player 4 3
16: Player 4 4
The default arguments of shift() are sufficient here: shift(Distance) lags Distance by one and NA is used for filling, i.e., the values in the Distance column are moved down to the next row. So row 4 (Session 4) for Player 1 gets the Distance value of the previous row (Session 3) as requested. The empty row at the top becomes NA. See also help("shift", "data.table").
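Note that we do not need to group by Player here, even though the whole column is lagged at once: after the join, each player's block ends with the added Session 4 row, whose Distance is NA, and that NA is what gets shifted into the next player's Session 1 row.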
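If the data were less regular (for example, if some player already had a real Session 4 row), lagging within each player would be the more defensive variant. A minimal sketch with made-up toy data, assuming data.table is installed; the object names `DT` and `full` are illustrative only.

```r
library(data.table)

# Toy data (hypothetical): two players, three sessions each
DT <- data.table(
  Player   = rep(c("Player 1", "Player 2"), times = 3),
  Session  = rep(1:3, each = 2),
  Distance = c("1000", "1500", "1000", "1750", "2500", "1500")
)

# Complete set of Player / Session combinations, including Session 4
full <- DT[CJ(Player = unique(DT$Player), Session = 1:4),
           on = .(Player, Session)]

# Lag Distance separately within each player
full[, Distance := shift(Distance), by = Player]
full[]
```

With `by = Player`, the first row of every player is filled with NA regardless of what the previous player's last row contains.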
Data
DF <- structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "2500",
"1500", "2500", "-")), row.names = c(NA, 12L), class = "data.frame")
Happy Weekends.
I've been trying to replicate the results from this blog post in R. I am looking for a way to transpose the data without using t(), preferably using tidyr or reshape. In the example below, metadata is obtained by transposing data.
metadata <- data.frame(colnames(data), t(data[1:4, ]) )
colnames(metadata) <- t(metadata[1,])
metadata <- metadata[-1,]
metadata$Multiplier <- as.numeric(metadata$Multiplier)
Though it achieves what I want, I find it a little clumsy. Is there a more efficient workflow to transpose the data frame?
dput of data
data <- structure(list(Series.Description = c("Unit:", "Multiplier:",
"Currency:", "Unique Identifier: "), Nominal.Broad.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFB_N.M"), Nominal.Major.Currencies.Dollar.Index. = c("Index:_1973_Mar_100",
"1", NA, "H10/H10/JRXWTFN_N.M"), Nominal.Other.Important.Trading.Partners.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFO_N.M"), AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL. = c("Currency:_Per_AUD",
"1", "USD", "H10/H10/RXI$US_N.M.AL"), SPOT.EXCHANGE.RATE...EURO.AREA. = c("Currency:_Per_EUR",
"1", "USD", "H10/H10/RXI$US_N.M.EU"), NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ.. = c("Currency:_Per_NZD",
"1", "USD", "H10/H10/RXI$US_N.M.NZ"), United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk = c("Currency:_Per_GBP",
"0.01", "USD", "H10/H10/RXI$US_N.M.UK"), BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US.. = c("Currency:_Per_USD",
"1", "BRL", "H10/H10/RXI_N.M.BZ"), CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US.. = c("Currency:_Per_USD",
"1", "CAD", "H10/H10/RXI_N.M.CA"), CHINA....SPOT.EXCHANGE.RATE..YUAN.US.. = c("Currency:_Per_USD",
"1", "CNY", "H10/H10/RXI_N.M.CH"), DENMARK....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "DKK", "H10/H10/RXI_N.M.DN"), HONG.KONG....SPOT.EXCHANGE.RATE..HK..US.. = c("Currency:_Per_USD",
"1", "HKD", "H10/H10/RXI_N.M.HK"), INDIA....SPOT.EXCHANGE.RATE..RUPEES.US. = c("Currency:_Per_USD",
"1", "INR", "H10/H10/RXI_N.M.IN"), JAPAN....SPOT.EXCHANGE.RATE..YEA.US.. = c("Currency:_Per_USD",
"1", "JPY", "H10/H10/RXI_N.M.JA"), KOREA....SPOT.EXCHANGE.RATE..WON.US.. = c("Currency:_Per_USD",
"1", "KRW", "H10/H10/RXI_N.M.KO"), Malaysia...Spot.Exchange.Rate..Ringgit.US.. = c("Currency:_Per_USD",
"1", "MYR", "H10/H10/RXI_N.M.MA"), MEXICO....SPOT.EXCHANGE.RATE..PESOS.US.. = c("Currency:_Per_USD",
"1", "MXN", "H10/H10/RXI_N.M.MX"), NORWAY....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "NOK", "H10/H10/RXI_N.M.NO"), SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US.. = c("Currency:_Per_USD",
"1", "SEK", "H10/H10/RXI_N.M.SD"), SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US.. = c("Currency:_Per_USD",
"1", "ZAR", "H10/H10/RXI_N.M.SF"), Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US.. = c("Currency:_Per_USD",
"1", "SGD", "H10/H10/RXI_N.M.SI"), SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US.. = c("Currency:_Per_USD",
"1", "LKR", "H10/H10/RXI_N.M.SL"), SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US.. = c("Currency:_Per_USD",
"1", "CHF", "H10/H10/RXI_N.M.SZ"), TAIWAN....SPOT.EXCHANGE.RATE..NT..US.. = c("Currency:_Per_USD",
"1", "TWD", "H10/H10/RXI_N.M.TA"), THAILAND....SPOT.EXCHANGE.RATE....THAILAND. = c("Currency:_Per_USD",
"1", "THB", "H10/H10/RXI_N.M.TH"), VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.. = c("Currency:_Per_USD",
"1", "VEB", "H10/H10/RXI_N.M.VE")), .Names = c("Series.Description",
"Nominal.Broad.Dollar.Index.", "Nominal.Major.Currencies.Dollar.Index.",
"Nominal.Other.Important.Trading.Partners.Dollar.Index.", "AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL.",
"SPOT.EXCHANGE.RATE...EURO.AREA.", "NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ..",
"United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk",
"BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US..", "CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US..",
"CHINA....SPOT.EXCHANGE.RATE..YUAN.US..", "DENMARK....SPOT.EXCHANGE.RATE..KRONER.US..",
"HONG.KONG....SPOT.EXCHANGE.RATE..HK..US..", "INDIA....SPOT.EXCHANGE.RATE..RUPEES.US.",
"JAPAN....SPOT.EXCHANGE.RATE..YEA.US..", "KOREA....SPOT.EXCHANGE.RATE..WON.US..",
"Malaysia...Spot.Exchange.Rate..Ringgit.US..", "MEXICO....SPOT.EXCHANGE.RATE..PESOS.US..",
"NORWAY....SPOT.EXCHANGE.RATE..KRONER.US..", "SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US..",
"SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US..", "Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US..",
"SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US..", "SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US..",
"TAIWAN....SPOT.EXCHANGE.RATE..NT..US..", "THAILAND....SPOT.EXCHANGE.RATE....THAILAND.",
"VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.."), row.names = c(NA,
4L), class = "data.frame")
Using tidyr, you gather all the columns except the first, and then you spread the gathered columns.
Try:
library(dplyr)
library(tidyr)
data %>%
gather(var, val, 2:ncol(data)) %>%
spread(Series.Description, val)
library(dplyr)
# Omitted data <- structure part ...
Here is something that replicates what's in the main answer, but more generically (e.g., works where Series.Description is not the first column of the result) and using the newer pivot_wider/pivot_longer verbs.
df_transpose <- function(df) {
  df %>%
    tidyr::pivot_longer(-1) %>%
    tidyr::pivot_wider(names_from = 1, values_from = value)
}
df_transpose(data)
#> # A tibble: 26 x 5
#> name `Unit:` `Multiplier:` `Currency:` `Unique Identifi…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.… Index:_19… 1 <NA> H10/H10/JRXWTFB_…
#> 2 Nominal.Major.Currenc… Index:_19… 1 <NA> H10/H10/JRXWTFN_…
#> 3 Nominal.Other.Importa… Index:_19… 1 <NA> H10/H10/JRXWTFO_…
#> 4 AUSTRALIA....SPOT.EXC… Currency:… 1 USD H10/H10/RXI$US_N…
#> 5 SPOT.EXCHANGE.RATE...… Currency:… 1 USD H10/H10/RXI$US_N…
#> 6 NEW.ZEALAND....SPOT.E… Currency:… 1 USD H10/H10/RXI$US_N…
#> 7 United.Kingdom....Spo… Currency:… 0.01 USD H10/H10/RXI$US_N…
#> 8 BRAZIL....SPOT.EXCHAN… Currency:… 1 BRL H10/H10/RXI_N.M.…
#> 9 CANADA....SPOT.EXCHAN… Currency:… 1 CAD H10/H10/RXI_N.M.…
#> 10 CHINA....SPOT.EXCHANG… Currency:… 1 CNY H10/H10/RXI_N.M.…
#> # … with 16 more rows
But note that (like the answer above) the name of the first column is lost. The following retains it (as, I guess, does the spread_(names(data)[1], "val") approach proposed by @jbkunst above).
df_transpose <- function(df) {
  first_name <- colnames(df)[1]
  temp <-
    df %>%
    tidyr::pivot_longer(-1) %>%
    tidyr::pivot_wider(names_from = 1, values_from = value)
  colnames(temp)[1] <- first_name
  temp
}
df_transpose(data)
#> # A tibble: 26 x 5
#> Series.Description `Unit:` `Multiplier:` `Currency:` `Unique Identif…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.In… Index:_1… 1 <NA> H10/H10/JRXWTFB…
#> 2 Nominal.Major.Currencie… Index:_1… 1 <NA> H10/H10/JRXWTFN…
#> 3 Nominal.Other.Important… Index:_1… 1 <NA> H10/H10/JRXWTFO…
#> 4 AUSTRALIA....SPOT.EXCHA… Currency… 1 USD H10/H10/RXI$US_…
#> 5 SPOT.EXCHANGE.RATE...EU… Currency… 1 USD H10/H10/RXI$US_…
#> 6 NEW.ZEALAND....SPOT.EXC… Currency… 1 USD H10/H10/RXI$US_…
#> 7 United.Kingdom....Spot.… Currency… 0.01 USD H10/H10/RXI$US_…
#> 8 BRAZIL....SPOT.EXCHANGE… Currency… 1 BRL H10/H10/RXI_N.M…
#> 9 CANADA....SPOT.EXCHANGE… Currency… 1 CAD H10/H10/RXI_N.M…
#> 10 CHINA....SPOT.EXCHANGE.… Currency… 1 CNY H10/H10/RXI_N.M…
#> # … with 16 more rows
Created on 2021-05-30 by the reprex package (v2.0.0)