Extract maximum number from text string - r

I am trying to get the maximum value from the Number struck column of a data frame. As you can see, some rows contain a range (e.g. "2 to 10").
Aircraft Number struck
B-757-200 2 to 10
B-737-300 1
B-737-300 1
B-727-200 1
UNKNOWN 1
C-550 1
B-727-200 1
CITATION II 1
DA-2000 1
B-737-500 1
B-737-300 2 to 10
UNKNOWN 2 to 10
HAWKER 800 1
MD-80 11 to 100
B-737-400 1
B-737 1
B-767-300 2 to 10
EMB-120 2 to 10
Data
df <- structure(list(Aircraft = c("B-757-200", "B-737-300", "B-737-300",
"B-727-200", "UNKNOWN", "C-550", "B-727-200", "CITATION II",
"DA-2000", "B-737-500", "B-737-300", "UNKNOWN", "HAWKER 800",
"MD-80", "B-737-400", "B-737", "B-767-300", "EMB-120"), Number.struck = c("2 to 10",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "2 to 10", "2 to 10",
"1", "11 to 100", "1", "1", "2 to 10", "2 to 10")), .Names = c("Aircraft",
"Number.struck"), row.names = c(NA, -18L), class = "data.frame")

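One approach is to strip the letters, split each string on whitespace, convert to numeric, and take the maximum:
# Remove letters ("2 to 10" -> "2  10"), split on runs of whitespace, convert to numbers
res <- as.numeric(unlist(strsplit(gsub("[a-zA-Z]", "", df$Number.struck), "\\s+")))
max(res, na.rm = TRUE)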
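If you need the maximum for each row rather than for the whole column, a minimal base-R sketch is to extract every run of digits with gregexpr() and regmatches() and take the max per string. The helper name max_per_row is hypothetical, and strings without any digits return NA.

```r
# Hypothetical helper: largest integer found in each string (NA if none)
max_per_row <- function(x) {
  # All runs of digits in each string, e.g. "2 to 10" -> c("2", "10")
  nums <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(nums, function(v) if (length(v)) max(as.numeric(v)) else NA_real_,
         numeric(1))
}

max_per_row(c("2 to 10", "1", "11 to 100"))  # returns c(10, 1, 100)
```

With the question's data, max_per_row(df$Number.struck) gives one value per row, and max(max_per_row(df$Number.struck), na.rm = TRUE) gives the overall maximum (100).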


Merging two matrices with merge.Matrix does not return the desired output

I have two matrices provided below:
cf = structure(c("7", "7", "7", "7", "7", "7", "7", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3",
"3", "3", "3", "3", "3", "3", "3", "3", "3", "17", "18", "19",
"20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "0", "1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"), .Dim = c(71L,
2L), .Dimnames = list(NULL, c("d", "h")))
hour_df<-data.frame(
day = as.character(rep(c(1,2,3,4,5,6,7), each = 24)),
hours = as.character(rep(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), times = 7)),
period = rep(c(rep("night",times = 8),rep("day",times = 12),rep("night",times = 4)), times = 7),
tariff_label = rep(c(rep("special feed", times = 8),rep("normal feed", times = 12),rep("special feed", times = 4)), times = 7),
week_period = c(rep("weekend",times = 32),rep("weekday",times = 108),rep("weekend",times = 28))
)
hour_df$tariff_label[hour_df$day %in% c("7","1")]<-"special feed"
hour_df<-as.matrix(hour_df)
I want to merge these matrices on two common columns, i.e. by.x = c("d","h") and by.y = c("day","hours").
With base merge() I get the desired output, which looks like this:
merge(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h period tariff_label week_period
1 1 0 night special feed weekend
2 1 1 night special feed weekend
3 1 10 day special feed weekend
4 1 11 day special feed weekend
5 1 12 day special feed weekend
6 1 13 day special feed weekend
7 1 14 day special feed weekend
8 1 15 day special feed weekend
9 1 16 day special feed weekend
10 1 17 day special feed weekend
11 1 18 day special feed weekend
12 1 19 day special feed weekend
13 1 2 night special feed weekend
14 1 20 night special feed weekend
15 1 21 night special feed weekend
16 1 22 night special feed weekend
17 1 23 night special feed weekend
18 1 3 night special feed weekend
19 1 4 night special feed weekend
20 1 5 night special feed weekend
21 1 6 night special feed weekend
22 1 7 night special feed weekend
23 1 8 day special feed weekend
24 1 9 day special feed weekend
25 2 0 night special feed weekend
26 2 1 night special feed weekend
27 2 10 day normal feed weekday
28 2 11 day normal feed weekday
29 2 12 day normal feed weekday
30 2 13 day normal feed weekday
31 2 14 day normal feed weekday
32 2 15 day normal feed weekday
33 2 16 day normal feed weekday
34 2 17 day normal feed weekday
35 2 18 day normal feed weekday
36 2 19 day normal feed weekday
37 2 2 night special feed weekend
38 2 20 night special feed weekday
39 2 21 night special feed weekday
40 2 22 night special feed weekday
41 2 23 night special feed weekday
42 2 3 night special feed weekend
43 2 4 night special feed weekend
44 2 5 night special feed weekend
45 2 6 night special feed weekend
46 2 7 night special feed weekend
47 2 8 day normal feed weekday
48 2 9 day normal feed weekday
49 3 0 night special feed weekday
50 3 1 night special feed weekday
51 3 10 day normal feed weekday
52 3 11 day normal feed weekday
53 3 12 day normal feed weekday
54 3 13 day normal feed weekday
55 3 14 day normal feed weekday
56 3 15 day normal feed weekday
57 3 2 night special feed weekday
58 3 3 night special feed weekday
59 3 4 night special feed weekday
60 3 5 night special feed weekday
61 3 6 night special feed weekday
62 3 7 night special feed weekday
63 3 8 day normal feed weekday
64 3 9 day normal feed weekday
65 7 17 day special feed weekend
66 7 18 day special feed weekend
67 7 19 day special feed weekend
68 7 20 night special feed weekend
69 7 21 night special feed weekend
70 7 22 night special feed weekend
71 7 23 night special feed weekend
As shown above, the result has 71 rows. I wanted to see whether there is a faster function for merging matrices. I read online that Matrix.utils provides merge.Matrix(), which should be faster than base merge(). However, when I tried it, I got a completely different result:
library(Matrix.utils)
merge.Matrix(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h day hours period tariff_label week_period
"7" "17" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "0" "night" "special feed" "weekend"
"7" "18" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "1" "night" "special feed" "weekend"
I searched online for how it is used, but information on this function seems scarce. I also checked the vignette. Can someone tell me what I am doing wrong, or whether there is a better function for this?
Please Note
I am already aware of dplyr joins and data.table. It is important that both matrices stay matrices and are not converted into another format. In reality, my code performs a join on a list containing thousands of matrices, so it needs to be fast.
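A minimal base-R sketch that stays within matrices (not part of Matrix.utils and not benchmarked here): build one composite key string per row with paste() and look up rows with match(). It assumes both inputs are character matrices with column names and that each key combination occurs at most once in the second matrix, because match() returns only the first hit, so one-to-many joins are not reproduced. Unlike merge(), rows keep the order of the first matrix instead of being sorted by key. The helper name match_merge is hypothetical.

```r
# Hypothetical helper (not from Matrix.utils): inner join of two matrices on
# composite keys using match(); assumes keys in y are unique
match_merge <- function(x, y, by.x, by.y) {
  # One string key per row; "\r" is a separator unlikely to occur in the data
  key_x <- do.call(paste, c(lapply(by.x, function(k) x[, k]), sep = "\r"))
  key_y <- do.call(paste, c(lapply(by.y, function(k) y[, k]), sep = "\r"))
  idx  <- match(key_x, key_y)   # first matching row of y for each row of x
  keep <- !is.na(idx)           # drop rows of x without a match (inner join)
  cbind(x[keep, , drop = FALSE],
        y[idx[keep], setdiff(colnames(y), by.y), drop = FALSE])
}

# Small example
cf_small   <- cbind(d = c("7", "1"), h = c("17", "0"))
hour_small <- cbind(day = c("1", "7"), hours = c("0", "17"),
                    period = c("night", "day"))
match_merge(cf_small, hour_small, by.x = c("d", "h"), by.y = c("day", "hours"))
```

For the question's data, match_merge(cf, hour_df, by.x = c("d", "h"), by.y = c("day", "hours")) should return the same 71 rows as merge(), but in the row order of cf rather than sorted by key.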

Insert a Blank Cell into Every "X"th Row in the Same Column

In an Excel file, I have the following table:
**Date** **Session** **Player** **Pre** **Post** **Distance(m)**
Jan 1 1 Player 1 3 6 1000
Jan 1 1 Player 2 3 7 1500
Jan 1 1 Player 3 4 10 4000
Jan 1 1 Player 4 1 3 600
Jan 2 2 Player 1 2 5 1000
Jan 2 2 Player 2 - - 1750
Jan 2 2 Player 3 5 5 3000
Jan 2 2 Player 4 3 6 1000
Jan 3 3 Player 1 3 5 2500
Jan 3 3 Player 2 3 8 1500
Jan 3 3 Player 3 7 7 2500
Jan 3 3 Player 4 - - -
What I am trying to accomplish is to compare each distance with the Pre number for the following session. For example, for Player 1 in Session 1, their distance (1000) and their Pre number from Jan 2 (2) should be in the same row.
To do this, after sorting each player's rows by session number, I want to insert an empty cell ("-") at the top of the Distance column for each player, as a placeholder for a nonexistent Session 0. This shifts the distances down so that they line up with the next session's Pre number.
After doing this on this data set, the result would look like this:
**Player** **Pre for the following Day** **Distance**
Player 1 3 (S1) - (Session 0 - Does Not Exist) (This value is inserted)
Player 1 2 (S2) 1000(Session 1)
Player 1 3 (S3) 1000(Session 2)
Player 1 - (S4 - Not included in this example) 2500(Session 3)
Player 2 3 (S1) - (S0)
Player 2 - (S2) 1500(S1)
Player 2 3 (S3) 1750(S2)
Player 2 - (S4) 1500(S3)
Player 3 4 (S1) - (S0)
Player 3 5 (S2) 4000(S1)
Player 3 7 (S3) 3000(S2)
Player 3 - (S4) 2500(S3)
(Player 4 omitted for brevity)
In this example, Session 3 is the last session, so the Pre value for S4 would simply be "-" for every player.
So a "-" needs to be inserted every 4 rows so that each distance lines up with the correct player, and after the last session a new row must be created for each player, with "-" for Pre and Post and the correct (carried-down) distance.
In my attempt to do this, I have the following code and dataset:
From dput()
stackEX <- structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "-", "1500",
"2500", "-")), row.names = c(NA, 12L), class = "data.frame")
and my code:
test1 <- data.frame("2020-01-01",1,"Player 1",3,6, "-")
test2 <- data.frame("2020-01-01",4,"Player 1","-","-","2500")
names(test1) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
names(test2) <- c("Date", "Session", "Player", "Pre", "Post", "Distance")
new <- rbind(test1, stackEX) #This puts the new row at the top where I want it
#Not sure why this removes dates for other rows though
new <- rbind(new, test2)#This is for Session 4 which does not exist in this example
However, this does not insert a "-" cell into the Distance column to push the values down; I only know how to add an entire new row, not a single cell.
This can be solved by joining with a complete set of Player / Session combinations and by shifting Distance:
library(data.table)
setDT(DF)[CJ(Player, Session = 1:4, unique = TRUE), on = .(Player, Session)][
, Distance := shift(Distance)][]
Date Session Player Pre Post Distance
1: 2020-01-01 1 Player 1 3 6 <NA>
2: 2020-01-02 2 Player 1 2 5 1000
3: 2020-01-03 3 Player 1 3 5 1000
4: <NA> 4 Player 1 <NA> <NA> 2500
5: 2020-01-01 1 Player 2 3 7 <NA>
6: 2020-01-02 2 Player 2 - - 1500
7: 2020-01-03 3 Player 2 3 8 1750
8: <NA> 4 Player 2 <NA> <NA> 1500
9: 2020-01-01 1 Player 3 4 10 <NA>
10: 2020-01-02 2 Player 3 5 5 4000
11: 2020-01-03 3 Player 3 7 7 3000
12: <NA> 4 Player 3 <NA> <NA> 2500
13: 2020-01-01 1 Player 4 1 3 <NA>
14: 2020-01-02 2 Player 4 3 6 600
15: 2020-01-03 3 Player 4 - - 1000
16: <NA> 4 Player 4 <NA> <NA> -
The cross join expression
CJ(Player, Session = 1:4, unique = TRUE)
returns all Player / Session combos:
Player Session
1: Player 1 1
2: Player 1 2
3: Player 1 3
4: Player 1 4
5: Player 2 1
6: Player 2 2
7: Player 2 3
8: Player 2 4
9: Player 3 1
10: Player 3 2
11: Player 3 3
12: Player 3 4
13: Player 4 1
14: Player 4 2
15: Player 4 3
16: Player 4 4
The default arguments of shift() are sufficient here: shift(Distance) lags Distance by one and NA is used for filling, i.e., the values in the Distance column are moved down to the next row. So row 4 (Session 4) for Player 1 gets the Distance value of the previous row (Session 3) as requested. The empty row at the top becomes NA. See also help("shift", "data.table").
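Note that no grouping by Player is needed here: the join adds a Session 4 row with NA Distance for every player, so lagging the whole column carries that NA into the next player's Session 1 row.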
Data
DF <- structure(list(Date = structure(c(1577836800, 1577836800, 1577836800,
1577836800, 1577923200, 1577923200, 1577923200, 1577923200, 1578009600,
1578009600, 1578009600, 1578009600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Session = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), Player = c("Player 1", "Player 2", "Player 3", "Player 4",
"Player 1", "Player 2", "Player 3", "Player 4", "Player 1", "Player 2",
"Player 3", "Player 4"), Pre = c("3", "3", "4", "1", "2", "-",
"5", "3", "3", "3", "7", "-"), Post = c("6", "7", "10", "3",
"5", "-", "5", "6", "5", "8", "7", "-"), Distance = c("1000",
"1500", "4000", "600", "1000", "1750", "3000", "1000", "2500",
"1500", "2500", "-")), row.names = c(NA, 12L), class = "data.frame")
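If you prefer base R, or want the lag to be grouped explicitly so that it does not depend on every player ending with an empty row, here is a sketch that swaps data.table for expand.grid(), merge() and ave(). The helper name lag_by_player and its sessions argument are hypothetical; missing cells become NA rather than "-".

```r
# Base-R sketch (no data.table): complete the Player/Session grid, then lag
# Distance separately within each player. lag_by_player is a hypothetical name.
lag_by_player <- function(df, sessions) {
  grid <- expand.grid(Player = unique(df$Player), Session = sessions,
                      stringsAsFactors = FALSE)
  # Left join keeps every Player/Session combination; missing cells become NA
  out <- merge(grid, df, by = c("Player", "Session"), all.x = TRUE)
  out <- out[order(out$Player, out$Session), ]
  # ave() applies the one-step lag within each Player group
  out$Distance <- ave(out$Distance, out$Player,
                      FUN = function(v) c(NA, head(v, -1)))
  rownames(out) <- NULL
  out
}

# Small example with two players and two recorded sessions each
toy <- data.frame(Player = c("P1", "P1", "P2", "P2"), Session = c(1, 2, 1, 2),
                  Distance = c("1000", "1500", "600", "700"),
                  stringsAsFactors = FALSE)
lag_by_player(toy, sessions = 1:3)
```

With the DF above, lag_by_player(DF, sessions = 1:4) should give the same Distance values as the data.table version.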

Looping Multiple variables using R

I imported data from a CSV file and want to create a "comparison table" of two prices of a company's stock for each month of 2018:
Table2018<-data.frame("Comparison"= c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price","Total trading days"),
"January18","February18",
"March18","April18","May18","June18","July18"
,"August18","September18","October18",
"November18","December18",stringsAsFactors = FALSE)
I have this set of code (all comparisons):
Table2018[1,2]<-sum(January18$Opening.Price > January18$Adjusted.Closing.Price)
Table2018[1,3]<-sum(February18$Opening.Price > February18$Adjusted.Closing.Price)
Table2018[1,4]<-sum(March18$Opening.Price > March18$Adjusted.Closing.Price)
Table2018[1,5]<-sum(April18$Opening.Price > April18$Adjusted.Closing.Price)
Table2018[1,6]<-sum(May18$Opening.Price > May18$Adjusted.Closing.Price)
Table2018[1,7]<-sum(June18$Opening.Price > June18$Adjusted.Closing.Price)
Table2018[1,8]<-sum(July18$Opening.Price > July18$Adjusted.Closing.Price)
Table2018[1,9]<-sum(August18$Opening.Price > August18$Adjusted.Closing.Price)
Table2018[1,10]<-sum(September18$Opening.Price > September18$Adjusted.Closing.Price)
Table2018[1,11]<-sum(October18$Opening.Price > October18$Adjusted.Closing.Price)
Table2018[1,12]<-sum(November18$Opening.Price > November18$Adjusted.Closing.Price)
Table2018[1,13]<-sum(December18$Opening.Price > December18$Adjusted.Closing.Price)
Here is the final part of the code and my poor-looking table:
Total.trading.days <- c(length(January18$ן..Date),length(February18$ן..Date),length(March18$ן..Date),length(April18$ן..Date),length(May18$ן..Date),length(June18$ן..Date),length(July18$ן..Date),length(August18$ן..Date),length(September18$ן..Date),length(October18$ן..Date),length(November18$ן..Date),length(December18$ן..Date))
#Displaying finished table
for (i in 1:12) {
Table2018[3,i+1]<-Total.trading.days[i]
Table2018[2,i+1]<-Total.trading.days[i]-as.numeric(Table2018[1,i+1])
}
Table2018
Comparison
1 Opening Price bigger than Adjusted Closing Price
2 Opening Pricee smaller than Adjusted Closing Price
3 Total trading days
X.January18. X.February18. X.March18. X.April18.
1 20 17 17 13
2 1 1 1 4
3 21 18 18 17
X.May18. X.June18. X.July18. X.August18.
1 19 18 14 18
2 1 0 2 2
3 20 18 16 20
X.September18. X.October18. X.November18.
1 8 17 19
2 3 2 0
3 11 19 19
X.December18.
1 16
2 4
3 20
dput(head(Table2018))
structure(list(Comparison = c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price", "Total trading days"
), X.January18. = c("20", "1", "21"), X.February18. = c("17",
"1", "18"), X.March18. = c("17", "1", "18"), X.April18. = c("13",
"4", "17"), X.May18. = c("19", "1", "20"), X.June18. = c("18",
"0", "18"), X.July18. = c("14", "2", "16"), X.August18. = c("18",
"2", "20"), X.September18. = c("8", "3", "11"), X.October18. = c("17",
"2", "19"), X.November18. = c("19", "0", "19"), X.December18. = c("16",
"4", "20")), row.names = c(NA, 3L), class = "data.frame")
The main problem is that this is too much code. How can I turn the second part of the code into a nice loop? Do I need one?
Why do the table's column headers have the format X.month.?
I would also appreciate tips on how to present the table more nicely.
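On the X.month. headers: data.frame() names unnamed arguments by deparsing them and passing the result through make.names(), so the string "January18" becomes the column name X.January18.; naming the columns explicitly avoids this. Below is a minimal loop-free sketch, assuming the monthly data frames January18 ... December18 exist and each has Opening.Price and Adjusted.Closing.Price columns. comparison_table is a hypothetical helper. Like the original loop, its second row is the total minus the first row, so it also counts days where the two prices are equal, hence the "not bigger than" label.

```r
# Hypothetical helper: one column per month, three summary rows
comparison_table <- function(months) {
  # Days where the opening price exceeds the adjusted closing price
  bigger <- vapply(months, function(m)
    sum(m$Opening.Price > m$Adjusted.Closing.Price), integer(1))
  total <- vapply(months, nrow, integer(1))   # trading days per month
  data.frame(
    Comparison = c("Opening Price bigger than Adjusted Closing Price",
                   "Opening Price not bigger than Adjusted Closing Price",
                   "Total trading days"),
    # Month names become column names; deparse.level = 0 adds no row labels
    rbind(bigger, total - bigger, total, deparse.level = 0),
    check.names = FALSE, stringsAsFactors = FALSE
  )
}

# Toy data standing in for January18 and February18
January18  <- data.frame(Opening.Price = c(10, 12, 9),
                         Adjusted.Closing.Price = c(9, 13, 8))
February18 <- data.frame(Opening.Price = c(5, 6),
                         Adjusted.Closing.Price = c(4, 5))
comparison_table(mget(c("January18", "February18")))
```

For the real data, comparison_table(mget(paste0(month.name, "18"))) collects all twelve monthly data frames by name and should return integer columns instead of the character ones in Table2018. For nicer printing, knitr::kable() is one option.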

Convert multiple header table to long format

I am reading in an Excel table with multiple header rows, which, when read with read.csv, produces an object like this in R:
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
How do I 'flatten' this into an R table with the columns "Min Age", "Max Age", "Min Duration", "Max Duration", "Area", "Level" and "Price", where "Area" is either "EU" or "US", "Level" is 1, 2 or 3, and "Price" is the corresponding price from the Excel table?
I would use the gather() function from tidyr if there were only one header row, but I can't get it to work with this data. Any ideas?
The output should have 36 rows in total, plus the header.
If you skip the first row, as suggested by akrun, you will presumably end up with data that looks something like this: (with "X"s and ".1"/".2" added automatically by R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
With this data, you can use gather() to collect all columns whose names begin with X into one column (headers) and their values into another (Price). You can then separate the headers into "Level" and "Area". Finally, recode Area and remove the "X" from the levels.
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
Created on 2018-10-12 by the reprex package (v0.2.1)
P.S. You can find lots of spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
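Alternatively, here is a base-R sketch that works directly on table1 from the question, without skipping the first row: it takes the area from the first header row (dropping the ".1"/".2" suffixes R added), the level and ID column names from the second row, and stacks one block of rows per price column. The helper name flatten_two_headers and its n_id argument (the number of leading ID columns, 4 in the question) are hypothetical.

```r
# Hypothetical helper: long format from a table whose first two rows are headers
flatten_two_headers <- function(tbl, n_id = 4) {
  m <- as.matrix(tbl)                              # all cells as character
  area  <- sub("\\..*$", "", unname(m[1, ]))       # "EU.1" -> "EU"
  level <- unname(m[2, ])                          # ID names, then levels
  body  <- m[-(1:2), , drop = FALSE]
  ids   <- body[, seq_len(n_id), drop = FALSE]
  colnames(ids) <- level[seq_len(n_id)]
  # One block of rows per price column, then stack the blocks
  out <- do.call(rbind, lapply(seq(n_id + 1, ncol(m)), function(j) {
    data.frame(ids, Area = area[j], Level = as.integer(level[j]),
               Price = as.numeric(body[, j]),
               check.names = FALSE, stringsAsFactors = FALSE)
  }))
  rownames(out) <- NULL
  out
}

# Small example: two ID columns and three price columns
toy <- as.data.frame(rbind(
  c("X", "X.1", "EU", "EU.1", "US"),
  c("Min Age", "Max Age", "1", "2", "1"),
  c("18", "21", "0.12", "0.32", "0.80")
))
flatten_two_headers(toy, n_id = 2)
```

With the question's data, flatten_two_headers(table1) should return 36 rows with the columns Min Age, Max Age, Min Duration, Max Duration, Area, Level and Price; here Price is converted to numeric while the ID columns stay character.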

Transposing data frames [duplicate]

This question already has answers here:
Transposing a dataframe maintaining the first column as heading
I've been trying to replicate the results from this blog post in R. I am looking for a way to transpose the data without using t(), preferably with tidyr or reshape. In the example below, metadata is obtained by transposing data:
metadata <- data.frame(colnames(data), t(data[1:4, ]) )
colnames(metadata) <- t(metadata[1,])
metadata <- metadata[-1,]
metadata$Multiplier <- as.numeric(metadata$Multiplier)
Although it achieves what I want, I find it a little clumsy. Is there a more efficient workflow for transposing the data frame?
dput of data
data <- structure(list(Series.Description = c("Unit:", "Multiplier:",
"Currency:", "Unique Identifier: "), Nominal.Broad.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFB_N.M"), Nominal.Major.Currencies.Dollar.Index. = c("Index:_1973_Mar_100",
"1", NA, "H10/H10/JRXWTFN_N.M"), Nominal.Other.Important.Trading.Partners.Dollar.Index. = c("Index:_1997_Jan_100",
"1", NA, "H10/H10/JRXWTFO_N.M"), AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL. = c("Currency:_Per_AUD",
"1", "USD", "H10/H10/RXI$US_N.M.AL"), SPOT.EXCHANGE.RATE...EURO.AREA. = c("Currency:_Per_EUR",
"1", "USD", "H10/H10/RXI$US_N.M.EU"), NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ.. = c("Currency:_Per_NZD",
"1", "USD", "H10/H10/RXI$US_N.M.NZ"), United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk = c("Currency:_Per_GBP",
"0.01", "USD", "H10/H10/RXI$US_N.M.UK"), BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US.. = c("Currency:_Per_USD",
"1", "BRL", "H10/H10/RXI_N.M.BZ"), CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US.. = c("Currency:_Per_USD",
"1", "CAD", "H10/H10/RXI_N.M.CA"), CHINA....SPOT.EXCHANGE.RATE..YUAN.US.. = c("Currency:_Per_USD",
"1", "CNY", "H10/H10/RXI_N.M.CH"), DENMARK....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "DKK", "H10/H10/RXI_N.M.DN"), HONG.KONG....SPOT.EXCHANGE.RATE..HK..US.. = c("Currency:_Per_USD",
"1", "HKD", "H10/H10/RXI_N.M.HK"), INDIA....SPOT.EXCHANGE.RATE..RUPEES.US. = c("Currency:_Per_USD",
"1", "INR", "H10/H10/RXI_N.M.IN"), JAPAN....SPOT.EXCHANGE.RATE..YEA.US.. = c("Currency:_Per_USD",
"1", "JPY", "H10/H10/RXI_N.M.JA"), KOREA....SPOT.EXCHANGE.RATE..WON.US.. = c("Currency:_Per_USD",
"1", "KRW", "H10/H10/RXI_N.M.KO"), Malaysia...Spot.Exchange.Rate..Ringgit.US.. = c("Currency:_Per_USD",
"1", "MYR", "H10/H10/RXI_N.M.MA"), MEXICO....SPOT.EXCHANGE.RATE..PESOS.US.. = c("Currency:_Per_USD",
"1", "MXN", "H10/H10/RXI_N.M.MX"), NORWAY....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD",
"1", "NOK", "H10/H10/RXI_N.M.NO"), SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US.. = c("Currency:_Per_USD",
"1", "SEK", "H10/H10/RXI_N.M.SD"), SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US.. = c("Currency:_Per_USD",
"1", "ZAR", "H10/H10/RXI_N.M.SF"), Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US.. = c("Currency:_Per_USD",
"1", "SGD", "H10/H10/RXI_N.M.SI"), SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US.. = c("Currency:_Per_USD",
"1", "LKR", "H10/H10/RXI_N.M.SL"), SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US.. = c("Currency:_Per_USD",
"1", "CHF", "H10/H10/RXI_N.M.SZ"), TAIWAN....SPOT.EXCHANGE.RATE..NT..US.. = c("Currency:_Per_USD",
"1", "TWD", "H10/H10/RXI_N.M.TA"), THAILAND....SPOT.EXCHANGE.RATE....THAILAND. = c("Currency:_Per_USD",
"1", "THB", "H10/H10/RXI_N.M.TH"), VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.. = c("Currency:_Per_USD",
"1", "VEB", "H10/H10/RXI_N.M.VE")), .Names = c("Series.Description",
"Nominal.Broad.Dollar.Index.", "Nominal.Major.Currencies.Dollar.Index.",
"Nominal.Other.Important.Trading.Partners.Dollar.Index.", "AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL.",
"SPOT.EXCHANGE.RATE...EURO.AREA.", "NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ..",
"United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk",
"BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US..", "CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US..",
"CHINA....SPOT.EXCHANGE.RATE..YUAN.US..", "DENMARK....SPOT.EXCHANGE.RATE..KRONER.US..",
"HONG.KONG....SPOT.EXCHANGE.RATE..HK..US..", "INDIA....SPOT.EXCHANGE.RATE..RUPEES.US.",
"JAPAN....SPOT.EXCHANGE.RATE..YEA.US..", "KOREA....SPOT.EXCHANGE.RATE..WON.US..",
"Malaysia...Spot.Exchange.Rate..Ringgit.US..", "MEXICO....SPOT.EXCHANGE.RATE..PESOS.US..",
"NORWAY....SPOT.EXCHANGE.RATE..KRONER.US..", "SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US..",
"SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US..", "Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US..",
"SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US..", "SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US..",
"TAIWAN....SPOT.EXCHANGE.RATE..NT..US..", "THAILAND....SPOT.EXCHANGE.RATE....THAILAND.",
"VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.."), row.names = c(NA,
4L), class = "data.frame")
Using tidyr, you gather all the columns except the first, and then you spread the gathered columns.
Try:
library(dplyr)
library(tidyr)
data %>%
gather(var, val, 2:ncol(data)) %>%
spread(Series.Description, val)
Here is something that replicates what's in the main answer, but more generically (e.g., works where Series.Description is not the first column of the result) and using the newer pivot_wider/pivot_longer verbs.
library(dplyr)
# Omitted data <- structure part ...
df_transpose <- function(df) {
df %>%
tidyr::pivot_longer(-1) %>%
tidyr::pivot_wider(names_from = 1, values_from = value)
}
df_transpose(data)
#> # A tibble: 26 x 5
#> name `Unit:` `Multiplier:` `Currency:` `Unique Identifi…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.… Index:_19… 1 <NA> H10/H10/JRXWTFB_…
#> 2 Nominal.Major.Currenc… Index:_19… 1 <NA> H10/H10/JRXWTFN_…
#> 3 Nominal.Other.Importa… Index:_19… 1 <NA> H10/H10/JRXWTFO_…
#> 4 AUSTRALIA....SPOT.EXC… Currency:… 1 USD H10/H10/RXI$US_N…
#> 5 SPOT.EXCHANGE.RATE...… Currency:… 1 USD H10/H10/RXI$US_N…
#> 6 NEW.ZEALAND....SPOT.E… Currency:… 1 USD H10/H10/RXI$US_N…
#> 7 United.Kingdom....Spo… Currency:… 0.01 USD H10/H10/RXI$US_N…
#> 8 BRAZIL....SPOT.EXCHAN… Currency:… 1 BRL H10/H10/RXI_N.M.…
#> 9 CANADA....SPOT.EXCHAN… Currency:… 1 CAD H10/H10/RXI_N.M.…
#> 10 CHINA....SPOT.EXCHANG… Currency:… 1 CNY H10/H10/RXI_N.M.…
#> # … with 16 more rows
But note that (like the answer above) the name of the first column is lost. The following version retains it (as, I believe, does the spread_(names(data)[1], "val") approach proposed by @jbkunst above).
df_transpose <- function(df) {
first_name <- colnames(df)[1]
temp <-
df %>%
tidyr::pivot_longer(-1) %>%
tidyr::pivot_wider(names_from = 1, values_from = value)
colnames(temp)[1] <- first_name
temp
}
df_transpose(data)
#> # A tibble: 26 x 5
#> Series.Description `Unit:` `Multiplier:` `Currency:` `Unique Identif…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Nominal.Broad.Dollar.In… Index:_1… 1 <NA> H10/H10/JRXWTFB…
#> 2 Nominal.Major.Currencie… Index:_1… 1 <NA> H10/H10/JRXWTFN…
#> 3 Nominal.Other.Important… Index:_1… 1 <NA> H10/H10/JRXWTFO…
#> 4 AUSTRALIA....SPOT.EXCHA… Currency… 1 USD H10/H10/RXI$US_…
#> 5 SPOT.EXCHANGE.RATE...EU… Currency… 1 USD H10/H10/RXI$US_…
#> 6 NEW.ZEALAND....SPOT.EXC… Currency… 1 USD H10/H10/RXI$US_…
#> 7 United.Kingdom....Spot.… Currency… 0.01 USD H10/H10/RXI$US_…
#> 8 BRAZIL....SPOT.EXCHANGE… Currency… 1 BRL H10/H10/RXI_N.M…
#> 9 CANADA....SPOT.EXCHANGE… Currency… 1 CAD H10/H10/RXI_N.M…
#> 10 CHINA....SPOT.EXCHANGE.… Currency… 1 CNY H10/H10/RXI_N.M…
#> # … with 16 more rows
Created on 2021-05-30 by the reprex package (v2.0.0)
