I imported data from a CSV file and wanted to create a "Comparison table" between two prices of a company's stock in 2018:
Table2018<-data.frame("Comparison"= c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price","Total trading days"),
"January18","February18",
"March18","April18","May18","June18","July18"
,"August18","September18","October18",
"November18","December18",stringsAsFactors = FALSE)
I have this set of code (all comparisons):
Table2018[1,2]<-sum(January18$Opening.Price > January18$Adjusted.Closing.Price)
Table2018[1,3]<-sum(February18$Opening.Price > February18$Adjusted.Closing.Price)
Table2018[1,4]<-sum(March18$Opening.Price > March18$Adjusted.Closing.Price)
Table2018[1,5]<-sum(April18$Opening.Price > April18$Adjusted.Closing.Price)
Table2018[1,6]<-sum(May18$Opening.Price > May18$Adjusted.Closing.Price)
Table2018[1,7]<-sum(June18$Opening.Price > June18$Adjusted.Closing.Price)
Table2018[1,8]<-sum(July18$Opening.Price > July18$Adjusted.Closing.Price)
Table2018[1,9]<-sum(August18$Opening.Price > August18$Adjusted.Closing.Price)
Table2018[1,10]<-sum(September18$Opening.Price > September18$Adjusted.Closing.Price)
Table2018[1,11]<-sum(October18$Opening.Price > October18$Adjusted.Closing.Price)
Table2018[1,12]<-sum(November18$Opening.Price > November18$Adjusted.Closing.Price)
Table2018[1,13]<-sum(December18$Opening.Price > December18$Adjusted.Closing.Price)
For those who asked, this is the final part of the code and my poor-looking table:
Total.trading.days <- c(length(January18$ן..Date),length(February18$ן..Date),length(March18$ן..Date),length(April18$ן..Date),length(May18$ן..Date),length(June18$ן..Date),length(July18$ן..Date),length(August18$ן..Date),length(September18$ן..Date),length(October18$ן..Date),length(November18$ן..Date),length(December18$ן..Date))
#Displaying finished table
for (i in 1:12) {
Table2018[3,i+1]<-Total.trading.days[i]
Table2018[2,i+1]<-Total.trading.days[i]-as.numeric(Table2018[1,i+1])
}
Table2018
Comparison
1 Opening Price bigger than Adjusted Closing Price
2 Opening Pricee smaller than Adjusted Closing Price
3 Total trading days
X.January18. X.February18. X.March18. X.April18.
1 20 17 17 13
2 1 1 1 4
3 21 18 18 17
X.May18. X.June18. X.July18. X.August18.
1 19 18 14 18
2 1 0 2 2
3 20 18 16 20
X.September18. X.October18. X.November18.
1 8 17 19
2 3 2 0
3 11 19 19
X.December18.
1 16
2 4
3 20
dput(head(Table2018))
structure(list(Comparison = c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price", "Total trading days"
), X.January18. = c("20", "1", "21"), X.February18. = c("17",
"1", "18"), X.March18. = c("17", "1", "18"), X.April18. = c("13",
"4", "17"), X.May18. = c("19", "1", "20"), X.June18. = c("18",
"0", "18"), X.July18. = c("14", "2", "16"), X.August18. = c("18",
"2", "20"), X.September18. = c("8", "3", "11"), X.October18. = c("17",
"2", "19"), X.November18. = c("19", "0", "19"), X.December18. = c("16",
"4", "20")), row.names = c(NA, 3L), class = "data.frame")
The main problem is that this is too much code. In the second part of the code, how can I write a nice loop? Do I need one?
Why do the table's column headers appear in the format X.month.?
I would also love some tips on how to present my table more nicely.
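A minimal sketch of one way to shorten this, assuming the twelve monthly data frames can be collected in a named list (the list name months and the toy prices below are hypothetical stand-ins, and only two months are shown). The X.month. headers most likely arise because data.frame() treats each unnamed string as a column value and builds a syntactic column name from it; building the table from named vectors avoids that.

```r
# Toy stand-ins for the monthly data frames (hypothetical values).
months <- list(
  January18  = data.frame(Opening.Price = c(10, 12, 11),
                          Adjusted.Closing.Price = c(9, 13, 10)),
  February18 = data.frame(Opening.Price = c(8, 7),
                          Adjusted.Closing.Price = c(8.5, 6))
)

# One pass over the list replaces the twelve hand-written assignments.
bigger <- sapply(months, function(m) sum(m$Opening.Price > m$Adjusted.Closing.Price))
total  <- sapply(months, nrow)

# rbind() on named vectors keeps the month names as column headers
# and keeps the counts numeric instead of character.
Table2018 <- rbind(
  "Opening Price bigger than Adjusted Closing Price" = bigger,
  "Opening Price not bigger than Adjusted Closing Price" = total - bigger,
  "Total trading days" = total
)
Table2018

# Where the X.month. header comes from: an unnamed string becomes a column
# whose name is derived from the quoted text.
names(data.frame("January18"))
```

Note that total - bigger counts days on which the opening price was smaller than or equal to the adjusted closing price, which is also what the original subtraction computes.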
I have two matrices provided below:
cf = structure(c("7", "7", "7", "7", "7", "7", "7", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3",
"3", "3", "3", "3", "3", "3", "3", "3", "3", "17", "18", "19",
"20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "0", "1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"), .Dim = c(71L,
2L), .Dimnames = list(NULL, c("d", "h")))
hour_df<-data.frame(
day = as.character(rep(c(1,2,3,4,5,6,7), each = 24)),
hours = as.character(rep(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), times = 7)),
period = rep(c(rep("night",times = 8),rep("day",times = 12),rep("night",times = 4)), times = 7),
tariff_label = rep(c(rep("special feed", times = 8),rep("normal feed", times = 12),rep("special feed", times = 4)), times = 7),
week_period = c(rep("weekend",times = 32),rep("weekday",times = 108),rep("weekend",times = 28))
)
hour_df$tariff_label[hour_df$day %in% c("7","1")]<-"special feed"
hour_df<-as.matrix(hour_df)
I want to merge these matrices on two common columns in each matrix, e.g., by.x = c("d","h"), by.y = c("day","hours").
If I use the base function merge(), I get my desired output, which looks like this:
merge(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h period tariff_label week_period
1 1 0 night special feed weekend
2 1 1 night special feed weekend
3 1 10 day special feed weekend
4 1 11 day special feed weekend
5 1 12 day special feed weekend
6 1 13 day special feed weekend
7 1 14 day special feed weekend
8 1 15 day special feed weekend
9 1 16 day special feed weekend
10 1 17 day special feed weekend
11 1 18 day special feed weekend
12 1 19 day special feed weekend
13 1 2 night special feed weekend
14 1 20 night special feed weekend
15 1 21 night special feed weekend
16 1 22 night special feed weekend
17 1 23 night special feed weekend
18 1 3 night special feed weekend
19 1 4 night special feed weekend
20 1 5 night special feed weekend
21 1 6 night special feed weekend
22 1 7 night special feed weekend
23 1 8 day special feed weekend
24 1 9 day special feed weekend
25 2 0 night special feed weekend
26 2 1 night special feed weekend
27 2 10 day normal feed weekday
28 2 11 day normal feed weekday
29 2 12 day normal feed weekday
30 2 13 day normal feed weekday
31 2 14 day normal feed weekday
32 2 15 day normal feed weekday
33 2 16 day normal feed weekday
34 2 17 day normal feed weekday
35 2 18 day normal feed weekday
36 2 19 day normal feed weekday
37 2 2 night special feed weekend
38 2 20 night special feed weekday
39 2 21 night special feed weekday
40 2 22 night special feed weekday
41 2 23 night special feed weekday
42 2 3 night special feed weekend
43 2 4 night special feed weekend
44 2 5 night special feed weekend
45 2 6 night special feed weekend
46 2 7 night special feed weekend
47 2 8 day normal feed weekday
48 2 9 day normal feed weekday
49 3 0 night special feed weekday
50 3 1 night special feed weekday
51 3 10 day normal feed weekday
52 3 11 day normal feed weekday
53 3 12 day normal feed weekday
54 3 13 day normal feed weekday
55 3 14 day normal feed weekday
56 3 15 day normal feed weekday
57 3 2 night special feed weekday
58 3 3 night special feed weekday
59 3 4 night special feed weekday
60 3 5 night special feed weekday
61 3 6 night special feed weekday
62 3 7 night special feed weekday
63 3 8 day normal feed weekday
64 3 9 day normal feed weekday
65 7 17 day special feed weekend
66 7 18 day special feed weekend
67 7 19 day special feed weekend
68 7 20 night special feed weekend
69 7 21 night special feed weekend
70 7 22 night special feed weekend
71 7 23 night special feed weekend
As you see above I have 71 rows. I wanted to see if there is a faster function for merging matrices. I saw online that there is a function called merge.Matrix() and it should be faster than base merge. However, when I tried to implement it, I got a completely different result.
library(Matrix.utils)
merge.Matrix(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h day hours period tariff_label week_period
"7" "17" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "0" "night" "special feed" "weekend"
"7" "18" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "1" "night" "special feed" "weekend"
I tried to see online how it is used and more information about it but information on this function seems to be scarce. I also checked out the vignette. Can someone tell me what I am doing wrong or whether there is a better function than this?
Please Note
I am already aware of dplyr joins and data.table. It is important that both of the matrices stay matrices and that they are not changed into some other format. In reality, my code is performing a join from a list that contains thousands of matrices and therefore needs to be quick.
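One possible approach that keeps everything as a character matrix is to build a composite key from the two columns and look it up with match(). The sketch below uses small toy matrices rather than the full data above, and it assumes that each (day, hours) pair occurs only once in the lookup matrix; it preserves the row order of cf instead of sorting as merge() does.

```r
# Toy character matrices with the same column layout as in the question.
cf <- cbind(d = c("7", "1", "1"), h = c("23", "0", "12"))
hour_df <- cbind(
  day    = rep(c("1", "7"), each = 24),
  hours  = rep(as.character(0:23), times = 2),
  period = rep(c(rep("night", 8), rep("day", 12), rep("night", 4)), times = 2)
)

# Composite keys; "|" is assumed not to occur in the key columns.
key_x <- paste(cf[, "d"], cf[, "h"], sep = "|")
key_y <- paste(hour_df[, "day"], hour_df[, "hours"], sep = "|")

# match() returns, for every row of cf, the first matching row of hour_df
# (NA where there is no match).
idx <- match(key_x, key_y)

# Bind the non-key columns of the matched rows; the result is still a matrix.
merged <- cbind(cf, hour_df[idx, "period", drop = FALSE])

# Dropping unmatched rows gives inner-join behaviour.
merged <- merged[!is.na(idx), , drop = FALSE]
merged
```

Because match() is vectorised and nothing is converted to a data frame, this may be quicker than merge() when repeated over many matrices, though that would need to be measured on the real data.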
Is there a quick way to replace variable names with the content of the first row of a tibble?
So turning something like this:
Subject Q1 Q2 Q3
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
Into this:
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
My dataset has over 100 variables so I'm looking for a way that doesn't involve typing out each old and new variable name.
A possible solution:
df <- structure(list(Subject = c("Subject", "429753", "b952x8", "264062",
"53082m"), Q1 = c("age", "24", "23", "19", "35"), Q2 = c("gender",
"1", "2", "1", "1"), Q3 = c("cue", "man", "mushroom", "night",
"moon")), row.names = c(NA, -5L), class = "data.frame")
names(df) <- df[1,]
df <- df[-1,]
df
#> Subject age gender cue
#> 2 429753 24 1 man
#> 3 b952x8 23 2 mushroom
#> 4 264062 19 1 night
#> 5 53082m 35 1 moon
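A small variation on the same idea, as a sketch: unlist() turns the first row into a plain character vector (which may be safer if the object is a tibble), the row names are reset, and type.convert() re-infers the column types. The shortened data frame below is only for illustration.

```r
# Shortened version of the example data; every column is character.
df <- data.frame(Subject = c("Subject", "429753", "b952x8"),
                 Q1 = c("age", "24", "23"),
                 Q2 = c("gender", "1", "2"),
                 stringsAsFactors = FALSE)

# Use the first row as the column names, then drop that row.
names(df) <- unlist(df[1, ], use.names = FALSE)
df <- df[-1, ]

# Optional tidy-up: restart the row names at 1 and convert
# numeric-looking columns from character to numbers.
rownames(df) <- NULL
df <- type.convert(df, as.is = TRUE)
str(df)
```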
structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"),
c("Malphite", "Rek'Sai", "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"),
c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))
Output:
# A tibble: 5 x 6
position player champion result kills gameid
* <chr> <chr> <chr> <chr> <chr> <chr>
1 top 369 Malphite 1 7 6079-7578
2 jng Karsa Rek'Sai 1 5 6079-7578
3 mid knight Zoe 1 7 6079-7578
4 bot JackeyLove Aphelios 1 5 6079-7578
5 sup yuyanjia Braum 1 0 6079-7578
My desired output would be:
structure(list(gameid = "6079-7578", result = "1", player_top = "369",
player_jng = "Karsa", player_mid = "knight", player_bot = "JackeyLove",
player_sup = "yuyanjia", champion_top = "Malphite", champion_jng = "Rek'Sai",
champion_mid = "Zoe", champion_bot = "Aphelios", champion_sup = "Braum",
kills_top = "7", kills_jng = "5", kills_mid = "7", kills_bot = "5",
kills_sup = "0"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
which looks like this:
gameid result player_top player_jng player_mid player_bot player_sup champion_top champion_jng champion_mid champion_bot champion_sup
1 6079-7578 1 369 Karsa knight JackeyLove yuyanjia Malphite RekSai Zoe Aphelios Braum
kills_top kills_jng kills_mid kills_bot kills_sup
1 7 5 7 5 0
I know I should use pivot_wider() and something like drop_na(), but I don't know how to use pivot_wider() with multiple columns and collapse the rows at the same time. Any help would be appreciated.
You can use pivot_wider() for this: pass the position variable to names_from, since the new column names come from it, and pass the three variables whose values should fill those columns to values_from.
By default, the names of the multiple values_from variables are prepended to the new column names. This can be changed, but in this case it matches the naming structure you want.
All other variables in the original dataset will be used as the id_cols, in the order in which they appear.
library(tidyr)
pivot_wider(dat,
names_from = "position",
values_from = c("player", "champion", "kills"))
#> result gameid player_top player_jng player_mid player_bot player_sup
#> 1 1 6079-7578 369 Karsa knight JackeyLove yuyanjia
#> champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1 Malphite Rek'Sai Zoe Aphelios Braum 7
#> kills_jng kills_mid kills_bot kills_sup
#> 1 5 7 5 0
You can control the order of your id columns in the output by explicitly writing them out via id_cols. Here's an example, matching your desired output.
pivot_wider(dat, id_cols = c("gameid", "result"),
names_from = "position",
values_from = c("player", "champion", "kills"))
#> gameid result player_top player_jng player_mid player_bot player_sup
#> 1 6079-7578 1 369 Karsa knight JackeyLove yuyanjia
#> champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1 Malphite Rek'Sai Zoe Aphelios Braum 7
#> kills_jng kills_mid kills_bot kills_sup
#> 1 5 7 5 0
Created on 2021-06-24 by the reprex package (v2.0.0)
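If a different naming scheme were wanted, the names_glue argument of pivot_wider() is one way to change the default prefixing mentioned above. A minimal sketch on a cut-down, hypothetical version of the data (two positions, two value columns), putting the position first:

```r
library(tidyr)

# Cut-down stand-in for the question's data.
dat <- data.frame(
  position = c("top", "jng"),
  player   = c("369", "Karsa"),
  kills    = c("7", "5"),
  gameid   = "6079-7578",
  stringsAsFactors = FALSE
)

# {.value} stands for the name of each values_from column,
# {position} for each level of the names_from column.
wide <- pivot_wider(dat,
                    id_cols     = "gameid",
                    names_from  = "position",
                    values_from = c("player", "kills"),
                    names_glue  = "{position}_{.value}")
names(wide)
```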
Using data.table might help here. In dcast() each row will be identified by a unique combo of gameid and result, the columns will be spread by position, and filled with values from the variables listed in value.var.
library(data.table)
library(dplyr)
df <- structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"),
c("Malphite", "Rek'Sai", "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"),
c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))
# Cast several value columns at once by passing their names as a character vector
df2 <- dcast(setDT(df), gameid + result ~ position, value.var = c('player', 'champion', 'kills'))
We have the following data frame a with something like this:
> a
google_prod Value
1 categoria ML
2 google 120
3 youtube 24
4 categoria AO
5 google 2
6 youtube 0
7 categoria ML
8 google 27
9 youtube 0
10 categoria AO
11 google 5
12 youtube 0
We would like to get this:
categoria google_prod Value
1 ML google 120
2 ML youtube 24
3 AO google 2
4 AO youtube 0
5 ML google 27
6 ML youtube 0
7 AO google 5
8 AO youtube 0
In other words, we want something like spread() (or a similar function) applied to only one value of the google_prod column, in this case 'categoria'.
library(tidyverse)
# getting the data
category <- rep(c("categoria", "google", "youtube"), 4)
value <- c("ML", "120", "24", "AO", "2", "0", "ML", "27", "0", "AO", "5", "0")
df <- tibble(category, value)
df %>%
mutate(helper = rep(1:(nrow(df)/3), each = 3)) %>%
pivot_wider(names_from = category, values_from = value) %>%
select(-helper) %>%
pivot_longer(names_to = "google_prod", values_to = "values", -1)
# # A tibble: 8 x 3
# categoria google_prod values
# <chr> <chr> <chr>
# 1 ML google 120
# 2 ML youtube 24
# 3 AO google 2
# 4 AO youtube 0
# 5 ML google 27
# 6 ML youtube 0
# 7 AO google 5
# 8 AO youtube 0
Here is another idea: create a group with cumsum() and extract the first element of each group.
library(dplyr)
mydf %>%
group_by(grp = cumsum(google_prod == 'categoria')) %>%
mutate(categoria = first(Value)) %>%
slice(-1) %>%
ungroup %>%
select(-grp) %>%
type.convert(as.is = TRUE)
# A tibble: 8 x 3
# google_prod Value categoria
# <chr> <int> <chr>
#1 google 120 ML
#2 youtube 24 ML
#3 google 2 AO
#4 youtube 0 AO
#5 google 27 ML
#6 youtube 0 ML
#7 google 5 AO
#8 youtube 0 AO
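The same cumsum() grouping can also be sketched in base R without any packages. The example below uses a shortened copy of the data, and it assumes that every block starts with a 'categoria' row.

```r
# Shortened copy of the example data (two blocks instead of four).
mydf <- data.frame(
  google_prod = c("categoria", "google", "youtube", "categoria", "google", "youtube"),
  Value       = c("ML", "120", "24", "AO", "2", "0"),
  stringsAsFactors = FALSE
)

# TRUE on the header rows; the running sum numbers the blocks 1, 2, ...
is_header <- mydf$google_prod == "categoria"
grp <- cumsum(is_header)

# Look up each row's category by block number, then drop the header rows.
out <- data.frame(categoria = mydf$Value[is_header][grp], mydf,
                  stringsAsFactors = FALSE)[!is_header, ]
out$Value <- as.integer(out$Value)
rownames(out) <- NULL
out
```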
data
mydf <- structure(list(google_prod = c("categoria", "google", "youtube",
"categoria", "google", "youtube", "categoria", "google", "youtube",
"categoria", "google", "youtube"), Value = c("ML", "120", "24",
"AO", "2", "0", "ML", "27", "0", "AO", "5", "0")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
One idea would be the following. As far as I can see from the pattern, you are targeting the values in Value that consist of capital letters. I located them with grep() and obtained their indices. Using this information, I created a group variable with findInterval(). For each group, I summarized the data: I extracted the capital-letter value and put it in categoria. In a similar way, I created two more columns, which are list columns, so I used unnest() at the end to get the output.
library(tidyverse)
ind <- grep(x = mydf$Value, pattern = "[A-Z]+")
group_by(mydf, group = findInterval(x = 1:n(), vec = ind)) %>%
summarize(categoria = Value[google_prod == "categoria"],
Google_prod = list(google_prod[google_prod != "categoria"]),
Value = list(Value[google_prod != "categoria"])) %>%
unnest(cols = Google_prod:Value)
group categoria Google_prod Value
<int> <chr> <chr> <chr>
1 1 ML google 120
2 1 ML youtube 24
3 2 AO google 2
4 2 AO youtube 0
5 3 ML google 27
6 3 ML youtube 0
7 4 AO google 5
8 4 AO youtube 0
I am trying to get the maximum value from the Number struck column of a data frame. As you can see, some of the rows contain a range. Thanks in advance.
Aircraft Number struck
B-757-200 2 to 10
B-737-300 1
B-737-300 1
B-727-200 1
UNKNOWN 1
C-550 1
B-727-200 1
CITATION II 1
DA-2000 1
B-737-500 1
B-737-300 2 to 10
UNKNOWN 2 to 10
HAWKER 800 1
MD-80 11 to 100
B-737-400 1
B-737 1
B-767-300 2 to 10
EMB-120 2 to 10
Data
df <- structure(list(Aircraft = c("B-757-200", "B-737-300", "B-737-300",
"B-727-200", "UNKNOWN", "C-550", "B-727-200", "CITATION II",
"DA-2000", "B-737-500", "B-737-300", "UNKNOWN", "HAWKER 800",
"MD-80", "B-737-400", "B-737", "B-767-300", "EMB-120"), Number.struck = c("2 to 10",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "2 to 10", "2 to 10",
"1", "11 to 100", "1", "1", "2 to 10", "2 to 10")), .Names = c("Aircraft",
"Number.struck"), row.names = c(NA, -18L), class = "data.frame")
Maybe this will work:
# Strip the letters ("to"), split on whitespace, and convert the pieces to numbers
res <- as.numeric(unlist(strsplit(gsub("[a-zA-Z]", "", df$Number.struck), "\\s+")))
max(res, na.rm = TRUE)
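An alternative sketch that extracts the digits directly instead of deleting the letters, using regmatches() and gregexpr() from base R. It also gives the upper bound for each row, in case that is useful; the short vector below stands in for df$Number.struck.

```r
# A few representative values from the Number struck column.
struck <- c("2 to 10", "1", "11 to 100", "1")

# Pull out every run of digits; the result is a list with one
# character vector per input element.
nums <- regmatches(struck, gregexpr("[0-9]+", struck))

# Upper bound of each row (the single value itself when there is no range).
upper <- sapply(nums, function(x) max(as.numeric(x)))
upper

# Overall maximum across the column.
max(upper)
```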