Not getting the correct output with this R script - r

I have once again thrown myself into learning R. However, I'm not sure whether my data is formatted wrongly or whether I'm missing a key point.
The goal is to compare all samples against each other over time, but getting the code right has proved difficult. I can't get time on the x-axis with the samples matched up and overlapping. I have gone through what feels like 100 videos and web pages and still can't make it work.
Script:
Data2 <- Data3 %>%
gather( key = "test", value = "value", c(-Name))
Data2 %>%
ggplot() +
geom_point(aes(x=value, y=test)) +
ylab("Film type") +
theme(legend.position="none") +
xlab("Time")
Name = c("2% No wash No cure 20gm", "3 % no wash no cure 20 gm", "4 % no wash no cure 20 gm", "2 % no cure just wash 20 gm", "3 % no cure just wash 20gm", "4 % no cure just wash 20 gm", "3 % cure + wash 20 gm", "4%cure+wash 20gm")
Data:
structure(list(Name = c(0, 15, 30, 45, 60, 75, 90, 105, 120,
135, 150, 165, 180), `2% No wash No cure 20gm` = c(0.0499999999999998,
0.0800000000000001, 0.13, 0.23, 0.56, 0.61, 0.54, 0.54, NA, NA,
NA, NA, NA), `3 % no wash no cure 20 gm` = c(0.0200000000000005,
0.04, 0.0700000000000003, 0.350000000000001, 0.42, 0.36, 0.36,
0.350000000000001, NA, NA, NA, NA, NA), `4 % no wash no cure 20 gm` = c(0.0499999999999998,
0.0899999999999999, 0.12, 0.18, 0.655, 0.649999999999999, 0.62,
0.62, NA, NA, NA, NA, NA), `2 % no cure just wash 20 gm` = c(0.04,
0.0699999999999994, 0.0899999999999999, 0.13, 0.44, 0.64, 0.62,
0.739999999999999, NA, NA, NA, NA, NA), `3 % no cure just wash 20gm` = c(0.04,
0.0999999999999996, 0.0800000000000001, 0.0999999999999996, 0.23,
0.6, 0.919999999999999, 1.42, 1.51, 1.64, NA, NA, NA), `4 % no cure just wash 20 gm` = c(0.0499999999999998,
0.0899999999999999, 0.0999999999999996, 0.12, 0.13, 0.13, 0.2,
0.37, 0.62, 0.86, 1.05, 1.23, 0.899999999999999), `3 % cure + wash 20 gm` = c(0.11,
0.16, 0.17, 0.18, 0.19, 0.2, 0.37, 0.819999999999999, 1.34, 1.62,
1.62, 2.02, 1.53), `4%cure+wash 20gm` = c(0.0600000000000005,
0.11, 0.14, 0.16, 0.17, 0.19, 0.26, 0.680000000000001, 0.87,
1.02, 1.12, 1.29, 1.12)), row.names = c(NA, -13L), class = c("tbl_df",
"tbl", "data.frame"))

I'm not sure what your variables mean, but have you considered something like this, with the time column (Name) on the x-axis and one colour per sample?
Data2 %>%
ggplot(aes(x = Name, y = value)) +
geom_point(aes(col = test), alpha = 0.5, position = "jitter")
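
If the aim is one overlapping curve per sample over time, a line plot may read more easily than jittered points. The following is a minimal sketch, not the poster's exact data: Data3 here is a cut-down stand-in built from the first five time points and two of the sample columns (values rounded), and pivot_longer() is used in place of gather(). The axis labels are assumptions.

```r
library(tidyr)
library(ggplot2)

# Cut-down stand-in for Data3: `Name` holds the time points
# (first five rows and two sample columns of the posted data, rounded).
Data3 <- data.frame(
  Name = c(0, 15, 30, 45, 60),
  `2% No wash No cure 20gm` = c(0.05, 0.08, 0.13, 0.23, 0.56),
  `4%cure+wash 20gm` = c(0.06, 0.11, 0.14, 0.16, 0.17),
  check.names = FALSE
)

# Long format: one row per (time, sample) pair.
Data2 <- pivot_longer(Data3, cols = -Name, names_to = "test", values_to = "value")

# Time on the x-axis, measurement on the y-axis, one coloured line per sample.
p <- ggplot(Data2, aes(x = Name, y = value, colour = test)) +
  geom_line() +
  geom_point() +
  labs(x = "Time", y = "Value", colour = "Film type")
p
```

With the full data, rows whose value is NA produce a warning and are simply not drawn, so shorter series end earlier on the time axis.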

Related

How to change the x axis to a time series in ggplot2

I'm trying to replicate the graph provided at https://www.chicagofed.org/research/data/cfnai/current-data, since I will soon need similar-looking graphs for other data sets. I'm almost there, but I can't figure out how to change the x axis to dates when using ggplot2. Specifically, I would like it to show the dates in the Date column. I have tried about a dozen approaches and none of them works. The data for this graph is under indexes on the website. Here's my code and the graph, where dataSet is the data from the website:
library(ggplot2)
library(reshape2)
library(tidyverse)
library(lubridate)
df = data.frame(time = index(dataSet), melt(as.data.frame(dataSet)))
df
str(df)
df$data1.Date = as.Date(as.character(df$data1.Date))
str(df)
replicaPlot1 = ggplot(df, aes(x = time, y = value)) +
geom_area(aes(colour = variable, fill = variable)) +
stat_summary(fun = sum, geom = "line", size = 0.4) +
labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data")
replicaPlot1 + scale_x_continuous(name = "time", breaks = waiver(), labels = waiver(), limits =
df$data1.Date)
replicaPlot1
Any sort of help on this would be very much appreciated!
G:\BOS\Common\R-Projects\Graphs\Replica of Chicago Fed National Acitivty index (PCA)\dataSet
I'm not sure what you intend with data.frame(time = index(dataSet), melt(as.data.frame(dataSet))). When I downloaded the data and read it via readxl::read_excel, I got a tidy tibble with a date(time) column. After reshaping with tidyr::pivot_longer it can be plotted easily, and scale_x_datetime gives a nicely formatted date axis.
Using just the first 20 rows of data, try this:
library(ggplot2)
library(readxl)
library(tidyr)
df <- pivot_longer(df, -Date, names_to = "variable")
ggplot(df, aes(x = Date, y = value)) +
geom_area(aes(colour = variable, fill = variable)) +
stat_summary(fun = sum, geom = "line", size = 0.4) +
labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data") +
scale_x_datetime(name = "time")
#> Warning: Removed 4 rows containing non-finite values (stat_summary).
#> Warning: Removed 4 rows containing missing values (position_stack).
Created on 2021-01-28 by the reprex package (v1.0.0)
DATA
# Data downloaded from https://www.chicagofed.org/~/media/publications/cfnai/cfnai-data-series-xlsx.xlsx?la=en
# df <- readxl::read_excel("cfnai-data-series-xlsx.xlsx")
# dput(head(df, 20))
df <- structure(list(Date = structure(c(
-87004800, -84412800, -81734400,
-79142400, -76464000, -73785600, -71193600, -68515200, -65923200,
-63244800, -60566400, -58060800, -55382400, -52790400, -50112000,
-47520000, -44841600, -42163200, -39571200, -36892800
), tzone = "UTC", class = c(
"POSIXct",
"POSIXt"
)), P_I = c(
-0.26, 0.16, -0.43, -0.09, -0.19, 0.58, -0.05,
0.21, 0.51, 0.33, -0.1, 0.12, 0.07, 0.04, 0.35, 0.04, -0.1, 0.14,
0.05, 0.11
), EU_H = c(
-0.06, -0.09, 0.01, 0.04, 0.1, 0.22, -0.04,
0, 0.32, 0.16, -0.2, 0.34, 0.06, 0.17, 0.17, 0.07, 0.12, 0.12,
0.15, 0.18
), C_H = c(
-0.01, 0.01, -0.05, 0.08, -0.07, -0.01,
0.12, -0.11, 0.1, 0.15, -0.04, 0.04, 0.17, -0.03, 0.05, 0.08,
0.09, 0.05, -0.06, 0.09
), SO_I = c(
-0.01, -0.07, -0.08, 0.02,
-0.16, 0.22, -0.08, -0.07, 0.38, 0.34, -0.13, -0.1, 0.08, -0.07,
0.06, 0.07, 0.12, -0.3, 0.35, 0.14
), CFNAI = c(
-0.34, 0.02, -0.55,
0.04, -0.32, 1, -0.05, 0.03, 1.32, 0.97, -0.46, 0.39, 0.38, 0.11,
0.63, 0.25, 0.22, 0.01, 0.49, 0.52
), CFNAI_MA3 = c(
NA, NA, -0.29,
-0.17, -0.28, 0.24, 0.21, 0.33, 0.43, 0.77, 0.61, 0.3, 0.1, 0.29,
0.37, 0.33, 0.37, 0.16, 0.24, 0.34
), DIFFUSION = c(
NA, NA, -0.17,
-0.14, -0.21, 0.16, 0.11, 0.17, 0.2, 0.5, 0.41, 0.28, 0.2, 0.32,
0.36, 0.32, 0.33, 0.25, 0.31, 0.47
)), row.names = c(NA, -20L), class = c(
"tbl_df",
"tbl", "data.frame"
))

Removing NAs from ggplot x-axis in ggplot2

I would like to get rid of the whole NA block (highlighted here ).
I tried na.omit and na.rm = TRUE unsuccessfully.
Here is the code I used :
library(readxl)
data <- read_excel("Documents/TFB/xlsx_geochimie/solfatara_maj.xlsx")
View(data)
data <- gather(data,FeO:`Fe2O3(T)`,key = "Element",value="Pourcentage")
library(ggplot2)
level_order <- factor(data$Element,levels = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O"))
ggplot(data=data,mapping=aes(x=level_order,y=data$Pourcentage,colour=data$Ech))+geom_point()+geom_line(group=data$Ech) +scale_y_log10()
And here is my original file
https://drive.google.com/file/d/1bZi7fPWebbpodD1LFScoEcWt5Bs-cqhb/view?usp=sharing
If I run your code and look at data that goes into ggplot:
table(data$Element)
   Al2O3      CaO    Fe2O3 Fe2O3(T)      FeO      K2O      LOI     LOI2      MgO      MnO
      12       12       12       12       12       12       12       12       12       12
    Na2O     P2O5     SiO2      SO4     TiO2    Total  Total 2  Total N  Total S
      12       12       12       12       12       12       12       12       12
You have included the Total columns (along with LOI, SO4 and others) in the melted data frame, which I guess was not intended. When you then call factor() on Element, every value that is not listed in levels becomes NA, and that is the NA block on your x-axis.
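
The effect is easy to reproduce in isolation. This is a small base R sketch with a hypothetical subset of element names (element, plot_levels, f and f_clean are illustrative names, not from the original code):

```r
# Element names as they might appear after reshaping (subset, for illustration).
element <- c("SiO2", "FeO", "Total", "Total S", "LOI")

# Only the elements we want on the x-axis.
plot_levels <- c("SiO2", "FeO")

# Values that are not among `levels` are silently converted to NA.
f <- factor(element, levels = plot_levels)
print(f)
table(f, useNA = "ifany")

# Keeping only the wanted elements before building the factor avoids the NAs.
kept <- element[element %in% plot_levels]
f_clean <- factor(kept, levels = plot_levels)
anyNA(f_clean)
```

Filtering before converting to a factor is the same idea as selecting only the wanted columns before reshaping.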
So we can do it from scratch:
data <- read_excel("solfatara_maj.xlsx")
The data:
structure(list(Ech = c("AGN 1A", "AGN 2A", "AGN 3B", "SOL 4B",
"SOL 8Ag", "SOL 8Ab", "SOL 16A", "SOL 16B", "SOL 16C", "SOL 22 A",
"SOL 22D", "SOL 25B"), FeO = c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 0.2,
NA, 0.1, 0.7, 1.3, 2), `Total S` = c(5.96, 45.3, 0.22, 17.3,
NA, NA, NA, NA, NA, NA, 2.37, 0.36), SO4 = c(NA, 6.72, NA, 4.08,
0.06, 0.16, 42.2, 35.2, 37.8, 0.32, 6.57, NA), `Total N` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, 15.2, NA, NA), SiO2 = c(50.2,
31.05, 56.47, 62.14, 61.36, 75.66, 8.41, 21.74, 17.44, 13.52,
19.62, 56.35), Al2O3 = c(15.53, 7.7, 17.56, 4.44, 17.75, 10.92,
31.92, 26.38, 27.66, 0.64, 3.85, 17.28), Fe2O3 = c(0.49, 0.63,
2.06, NA, 1.76, 0.11, 0.64, 0.88, 1.71, NA, 1.32, 2.67), MnO = c(0.01,
0.01, 0.13, 0.01, 0.09, 0.01, 0.01, 0.01, 0.01, 0.005, 0.04,
0.12), MgO = c(0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.04, 0.07,
0.03, 0.02, 1.85, 1.63), CaO = c(0.2, 0.09, 3.34, 0.09, 2.58,
0.57, 0.2, 0.26, 0.15, 0.06, 35.66, 4.79), Na2O = c(0.15, 0.14,
3.23, 0.13, 3.18, 2.04, 0.68, 0.68, 0.55, 0.05, 0.45, 3.11),
K2O = c(4.39, 1.98, 8, 1.26, 8.59, 5.94, 8.2, 6.97, 8.04,
0.2, 0.89, 7.65), TiO2 = c(0.42, 0.27, 0.46, 0.79, 0.55,
0.16, 0.09, 0.22, 0.16, 0.222, 0.34, 0.53), P2O5 = c(0.11,
0.09, 0.18, 0.08, 0.07, 0.07, 0.85, 0.68, 0.62, NA, 0.14,
0.28), LOI = c(27.77, 57.06, 6.13, 29.03, 1.38, 4.92, 42.58,
37.58, 38.76, NA, 26.99, 3.92), LOI2 = c(27.79, 57.15, 6.32,
29.06, 1.57, 4.93, 42.6, 37.59, 38.77, 0.08, 27.13, 4.15),
Total = c(99.52, 99.88, 100.2, 98.25, 99.99, 100.5, 93.81,
95.57, 95.23, 15.25, 92.45, 100.3), `Total 2` = c(99.54,
99.96, 100.3, 98.28, 100.2, 100.6, 93.83, 95.58, 95.24, 15.33,
92.59, 100.6), `Fe2O3(T)` = c(0.71, 1.52, 3.95, 0.27, 3.65,
0.22, 0.87, 0.99, 1.82, 0.61, 2.76, 4.9)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
First we set the plotting level like you did:
plotlvls = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O")
Then we select only these columns plus Ech. Note that I use pivot_longer() because gather() has been superseded by it; we also convert Element to a factor in the same step:
plotdf = data %>% select(all_of(c(plotlvls,"Ech"))) %>%
pivot_longer(-Ech,names_to = "Element",values_to = "Pourcentage") %>%
mutate(Element=factor(Element,levels=plotlvls))
Finally we plot, and there are no NAs:
ggplot(data=plotdf,mapping=aes(x=Element,y=Pourcentage,colour=Ech))+
geom_point()+geom_line(aes(group=Ech)) +scale_y_log10()
1. Create reproducible minimal data
data <- data.frame(Element = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O",NA),
Pourcentage = 1:10,
Ech = c("AGN 1A", "SOL 16"))
2. Set factor levels for variable 'Element'
data$Element <- factor(data$Element,levels = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O"))
3. Remove rows containing NA in the variable 'Element'
data <- data[!is.na(data$Element), ]
4. Plot data using ggplot2 (ggplot2 syntax uses NSE (non-standard evaluation), which means you don't have to pass the variable names as strings or use the $ notation):
ggplot(data=data,aes(x=Element,y=Pourcentage,colour=Ech)) +
geom_point() +
geom_line(aes(group=Ech)) +
scale_y_log10()

How do I make segments (of my probabilities?)

I was wondering whether there is a function that can help me with segmentation. Using mixtools (logisregmixEM), I found an optimum of 3 segments with sizes of 2.5%, 40.3% and 57.2%, and I also obtained posterior probabilities. Is there a way to create three separate segments, with their corresponding observations, based on these probabilities, so that I end up with 3 segments of the sizes given above?
For what it's worth, here is some background on my coefficients and probabilities:
> dput(head(betas))
structure(list(comp1 = c(4.57, 0.08, 0.91, -0.11, 0.09, 0.07),
comp2 = c(2.04, -0.22, 0.19, 0.34, -0.34, -0.01), comp3 = c(0.88,
0.03, 0.42, -0.02, -0.17, -0.01)), row.names = c("beta.0",
"beta.1", "beta.2", "beta.3", "beta.4", "beta.5"), class = "data.frame")
> dput(head(posteriorp))
structure(c(0.06, 0.03, 0, 0.03, 0, 0, 0.61, 0.42, 0.07, 0.41,
0.31, 0.41, 0.33, 0.56, 0.93, 0.56, 0.69, 0.59), .Dim = c(6L,
3L), .Dimnames = list(NULL, c("comp.1", "comp.2", "comp.3")))
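
One common way to turn posterior probabilities into segments is hard assignment: each observation goes to the component with the largest posterior. The sketch below applies that idea in base R to the six posterior rows shown above; segment and rows_by_segment are illustrative names. This is an assumption about what "segments" should mean here, not something mixtools prescribes.

```r
# First six rows of the posterior matrix, copied from the dput() output above.
posteriorp <- matrix(
  c(0.06, 0.03, 0, 0.03, 0, 0,
    0.61, 0.42, 0.07, 0.41, 0.31, 0.41,
    0.33, 0.56, 0.93, 0.56, 0.69, 0.59),
  nrow = 6,
  dimnames = list(NULL, c("comp.1", "comp.2", "comp.3"))
)

# Hard assignment: index of the column with the largest posterior in each row.
segment <- max.col(posteriorp, ties.method = "first")

# Row indices of the observations that fall into each segment.
rows_by_segment <- split(seq_len(nrow(posteriorp)), segment)

# Observed share of each segment under hard assignment.
prop.table(table(segment))
```

The shares obtained this way will generally not match the estimated segment sizes exactly, because those sizes reflect the averaged posterior probabilities rather than counts of hard assignments. A very small component (such as the 2.5% one) may even receive no observations at all.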

for loop a non-parametric test filtered by year and save each year's result as a data frame

I'm trying to create and print a list of data frames that are the result of the Mann-Whitney-Wilcoxon Test.
My code currently runs the Mann-Whitney-Wilcoxon Test on all the observations and compares the two data frames, ORATIOS and KFMARATIOS.
library(tidyverse)
library(devtools)
library(inspectdf)
library(readr)
library(broom)
library(knitr)
library(readxl)
library(skimr)
library(kableExtra)
list_ratio <- grep("RATIO",colnames(ORATIOS), value=TRUE)
MWU_pvalues <- unlist(Map(function(a,b) wilcox.test(a, b)$p.value, ORATIOS[list_ratio], KFMARATIOS[list_ratio]))
MWU_pvalues <- as.data.frame(MWU_pvalues) %>%
rename(`P VALUE` = MWU_pvalues)
MWU_pvalues <- tibble::rownames_to_column(MWU_pvalues, "RATIO") %>%
mutate(`Significance` = if_else(`P VALUE` > 0.05, "",
if_else(`P VALUE` <= 0.05 & `P VALUE` >= 0.01, "\\*",
if_else(`P VALUE` <= 0.01 & `P VALUE` >= 0.001, "**", "***"))))
kable(MWU_pvalues) %>%
kable_styling()
How would I create a for loop or lapply filtering on each year, running the above test, saving each result as a dataframe into a list of dataframes? I'd like to have each dataframe for each year printed using kable in my RMarkdown file.
Sample data:
ORATIOS:
structure(list(YEAR = c(2008, 2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
2016, 2017), FARM = c("D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I"),
`CURRENT RATIO` = c(0.568022785746452, 0.329854720020037,
0.832073159580644, 0.643108790851367, 25.1454874121908, 14.5975395062397,
5.12537888750377, 5.20160770260219, 7.64257374037806, 2.1580962424325,
1.31703632160198, 0.125166573684741, 0.0680923398879462,
0.100452384108057, 0.0998706900125819, 0.0907309088049343,
0.521537398114045, 0.773433351511582, 0.174099653043861,
0.0804425861373205), `WORKING CAPITAL TO GROSS FARMING INCOME` = c(-0.132573843177753,
-0.419436996986394, -0.031444400685141, -0.114022796397208,
1.22962822585944, 0.397841184148093, 0.239623650110705, 0.295681875030473,
0.502930206605254, 0.41862926754376, 0.0513905118422565,
-0.406448322702947, -0.343476652794216, -0.366684678854441,
-0.27321810774102, -0.306827980132377, -0.173010159020099,
-0.140768598200492, -0.367184395657858, -0.888263538055031
), `DEBT TO TOTAL ASSET RATIO` = c(0.0846892634197993, 0.102127561711337,
0.0750728145035032, 0.0797349374471145, 0.0122514875519798,
0.0162967044282012, 0.0165670856047258, 0.0188732833402721,
0.0150968780472965, 0.0275252089477482, 0.1123291162633,
0.151496340475165, 0.0960615511639704, 0.0985641068765839,
0.119816717131179, 0.121164074695269, 0.0970056997272376,
0.139114211255347, 0.0686657852466466, 0.17098484263781),
`DEBT TO FARM ASSET RATIO` = c(0.0935832744841849, 0.114259598684054,
0.0824723632268821, 0.08365143337564, 0.0129689938858425,
0.0191316764222117, 0.0216751963945452, 0.0225358439285237,
0.0167830935834987, 0.030821228954403, 0.140068283663094,
0.203393535891141, 0.133942894025292, 0.137887444914688,
0.17818477721901, 0.182143899668642, 0.141540075268137, 0.212926916788055,
0.0962721755129152, 0.172706971368876), `EQUITY TO ASSET RATIO` = c(0.915310736580201,
0.897872438288663, 0.924927185496497, 0.920265062552885,
0.98774851244802, 0.983703295571799, 0.983432914395274, 0.981126716659728,
0.984903121952704, 0.972474791052252, 0.8876708837367, 0.848503659524835,
0.90393844883603, 0.901435893123416, 0.880183282868821, 0.878835925304732,
0.902994300272762, 0.860885788744653, 0.931334214753353,
0.82901515736219), `DEBT TO EQUITY RATIO` = c(0.0925251502415636,
0.113743954437438, 0.0811661887343104, 0.0866434472975902,
0.0124034482437396, 0.0165666868267717, 0.0168461776723358,
0.0192363361631072, 0.0153282873318188, 0.0283042904566863,
0.126543652970169, 0.178545300040313, 0.106270013503315,
0.109341227289126, 0.13612700838927, 0.137868823072129, 0.107426702137473,
0.161594270778014, 0.0737284040024573, 0.206250562633691),
`RETURN ON FARM ASSETS` = c(0.0170145283510924, -0.00522377886147693,
0.0237250420249203, 0.00257743472229431, 0.0213365859181817,
0.0244609737360482, 0.0279373354305636, 0.0167869242322396,
0.0572363957452595, -0.00273821783417637, 0.0325678749005671,
-0.0532931806283685, 0.024215521265722, -0.0178636730481072,
0.0189254399688753, 0.00211416100547258, -0.00938005681041073,
0.0501921695586829, 0.0215269026374393, -0.0366154070757298
), `RETURN ON ASSETS` = c(0.0566608458884666, 0.0239054711694685,
0.0264084815850861, 0.00576204495548541, 0.179667366138176,
0.0246773695339781, 0.0246552659101915, 0.020526505137709,
0.0551370549195115, -5.05665725060606e-05, 0.0449112877923212,
-0.0284073208306705, 0.0249952584312144, -0.00283565027536605,
0.0360687362998932, 0.0080927754538142, -0.00331579015236834,
0.0457634829675583, 0.0229640648122328, -0.023016837706958
), `RETURN ON EQUITY` = c(0.0168221490501512, -0.00520020437367425,
0.023349291367177, 0.00266962346623839, 0.0204061503508897,
0.0211814836515069, 0.0217131742563291, 0.0143291246913213,
0.0522749822883451, -0.002514608130223, 0.0294232052511338,
-0.0467824450944562, 0.0192125442012039, -0.0141654371518756,
0.0144583817182496, 0.00160025611694793, -0.00711931632857772,
0.0380917883044123, 0.0164860113123938, -0.0437269454184399
), `FARM OPERATING PROFIT MARGIN RATIO` = c(0.113108456739495,
-0.0455472105804567, 0.199838203998892, 0.0234275923606582,
0.158472105656006, 0.183710042172317, 0.190582976791897,
0.124927655425634, 0.45847835351018, -0.0422031337055503,
0.122121670323183, -0.243017854350921, 0.11277681710057,
-0.0790679940692684, 0.076084143213901, 0.00890894198839937,
-0.0450368591167229, 0.204577659697265, 0.13619384495868,
-0.358538500350435), `ASSET TURNOVER RATIO` = c(0.0153974936379558,
-0.00466912018059027, 0.0215963943475807, 0.00245676120615052,
0.0201561446538819, 0.0208362952730876, 0.0213534502396742,
0.0140586870610039, 0.0514857932558134, -0.00244539301601691,
0.0261181226076402, -0.0396950758641658, 0.0173669574034299,
-0.0127692334904846, 0.0127260258857395, 0.00140636256526249,
-0.00642870206654449, 0.0327926792191383, 0.0153539864000432,
-0.0362503005370359), `OPERATING EXPENSE RATIO` = c(0.671535228245263,
0.773166498456329, 0.607985458258, 0.724432447012029, 0.67336000606662,
0.64796797949329, 0.589032574693052, 0.74988495257417, 0.461775664398759,
0.862141471389961, 0.672863504023624, 0.980455882037588,
0.669661413731221, 0.86690216270866, 0.670033358895902, 0.737005445439968,
0.783494244501376, 0.649760819934915, 0.706382908455109,
1.134948535946), `DEPRECIATION EXPENSE RATIO` = c(0.12660532789432,
0.132732814909818, 0.103826844188336, 0.144629676126728,
0.140059287930065, 0.157478624539652, 0.141620283491016,
0.0919194664659044, 0.0583370508964949, 0.133579109920113,
0.150646135557582, 0.183514628711121, 0.146236932328879,
0.16125312788589, 0.191531747619893, 0.197293862401247, 0.193527787561396,
0.0913809290148264, 0.0946887014018637, 0.145522583536315
), `INTEREST EXPENSE RATIO` = c(0.0887509871209225, 0.139647897214309,
0.0883494935547731, 0.107510284500585, 0.028108600347309,
0.0108433537947408, 0.0787641650240354, 0.0332679255342914,
0.0214089311945663, 0.0464825523954769, 0.0543686900956105,
0.0790473436022124, 0.0713248368393299, 0.0509127034747178,
0.0623507502703033, 0.0567917501703862, 0.068014827053951,
0.0542805913529945, 0.0627345451843474, 0.0780673808681226
), `NET FARM INCOME RATIO` = c(0.113108456739495, -0.0455472105804567,
0.199838203998892, 0.0234275923606582, 0.158472105656006,
0.183710042172317, 0.190582976791897, 0.124927655425634,
0.45847835351018, -0.0422031337055503, 0.122121670323183,
-0.243017854350921, 0.11277681710057, -0.0790679940692684,
0.076084143213901, 0.00890894198839937, -0.0450368591167229,
0.204577659697265, 0.13619384495868, -0.358538500350435)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
KFMARATIOS:
structure(list(YEAR = c(2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008
), FARM = c(11407100, 11484600, 11485100, 11495100, 11801800,
11806400, 11820000, 11885400, 11886000, 11897200, 11897300, 12004500,
12004501, 12303001, 12340101, 12398300, 13050001, 13700201, 13705601,
14089100, 14110900, 14130000, 14130002, 14184100, 14192300, 14330302,
14388200, 14783200, 14786200, 15094200, 15096200, 15584200, 15586100,
15682100, 15683100, 15689100, 16507002, 16580000, 16598200, 16601300
), `CURRENT RATIO` = c(-3, 0, 4.57, 15.94, 2.22, 0, 368.69, 1.86,
9.1, 3.45, 2, 0, 1.58, 6.26, 1.97, 1.54, 0, 3.39, 313.09, 5.59,
5.4, 0, 3.6, 5.78, 3.18, 207.1, 2.36, 28.31, 3.4, 3.68, 0.37,
3.5, 5.6, 13.64, 7.05, 0, 2.23, 0.89, 4.4, 1.11), `WORKING CAPITAL TO GROSS FARMING INCOME` = c(0.783990044655886,
0.939342207539837, 0.468883358203084, 0.53708199556795, 0.429230789973027,
0.856616290636639, 0.46085746623408, 0.019246546772549, 1.04338230212655,
0.318770448161572, 0.398058372857175, 0.506978780306214, 0.263816960947357,
0.4960655740923, 0.101962576323424, 0.220623464476751, 1.12676140487953,
0.533690322762107, 0.685276501922026, 0.703540899065169, 0.660869855557338,
0.71777803486123, 0.319578323479609, 0.722736340214157, 0.286630301648443,
0.818610240507597, 0.184477489966846, 0.78148168000963, 0.357891811040315,
0.289159422203956, -0.125641128630768, 0.392321597654173, 0.561996673317676,
0.353452531903466, 0.683345718597063, 0.804567295215173, 0.307398272114796,
-0.375449779668313, 0.186702574682293, -0.55737251721071), `DEBT TO TOTAL ASSET RATIO` = c(0.02,
0.07, 0.27, 0.37, 0.36, 0, 0.07, 0.37, 0.05, 0.33, 0.42, 0.08,
0.24, 0.34, 0.36, 0.51, 0.01, 0.11, 0.1, 0.07, 0.08, 0.01, 0.32,
0.14, 0.4, 0.52, 0.39, 0.06, 0.21, 0.32, 0.43, 0.52, 0.29, 0.12,
0.17, 0.1, 0.15, 0.87, 0.12, 0.69), `DEBT TO FARM ASSET RATIO` = c(0.0210960466847519,
0.0662443993261916, 0.270051570315789, 0.373240578143398, 0.359031265562519,
0, 0.0678176279710153, 0.369000587598404, 0.04831743727994, 0.33065743433488,
0.41680939549244, 0.0851067276205844, 0.245359588845858, 0.337912727823456,
0.356607488633417, 0.508663012923272, 0.0126098421632802, 0.10665178903834,
0.105106247793806, 0.0698908293989529, 0.0818483764283224, 0.00750932570017385,
0.319501072718455, 0.136757510256717, 0.400840648545665, 0.516753083750126,
0.389587948103612, 0.0577299469460252, 0.206521419569117, 0.315261383020663,
0.43256943562472, 0.520491208048298, 0.290288373137576, 0.120229338185664,
0.173192986515349, 0.104536048245734, 0.151997186500475, 0.868552025800098,
0.123958600776313, 0.692195974317741), `EQUITY TO ASSET RATIO` = c(0.98536882817945,
0.944215770167283, 0.736537746555766, 0.729860554651407, 0.642228778874089,
1, 0.94228148558872, 0.630999412401596, 0.95168256272006, 0.66934256566512,
0.592693562701164, 0.914893272379416, 0.813956784138156, 0.688995447780108,
0.725420084109645, 0.545241148972386, 0.988536562104007, 0.900124825958172,
0.90344241855196, 0.930936390469265, 0.92060316189968, 0.992490674299826,
0.758518009863028, 0.881474617998699, 0.600468426703118, 0.553595877267449,
0.667405715763261, 0.942270053053975, 0.842757601135073, 0.708413078986436,
0.56743056437528, 0.533041296742996, 0.743304732269968, 0.88511363093375,
0.831970255984885, 0.904591907651469, 0.876296809602567, 0.131447974199902,
0.890119750534961, 0.307804025682259), `DEBT TO EQUITY RATIO` = c(0.02,
0.07, 0.37, 0.6, 0.56, 0, 0.07, 0.58, 0.05, 0.49, 0.72, 0.09,
0.32, 0.51, 0.55, 1.04, 0.01, 0.12, 0.12, 0.08, 0.09, 0.01, 0.47,
0.16, 0.67, 1.07, 0.64, 0.06, 0.26, 0.46, 0.76, 1.08, 0.41, 0.14,
0.21, 0.12, 0.18, 6.61, 0.14, 2.25), `RETURN ON FARM ASSETS` = c(0.374484329540697,
0.0498819566035984, 0.181954755022922, 0.193161758267218, 0.0473627311001023,
0.327305563029612, 0.603037930741254, -0.0156737997438482, 0.10397858597475,
0.10789191406389, 0.180771277730155, 0.150007797084, 0.174196776278552,
0.120122100767257, 0.298096858936563, 0.0517125227815447, 0.111597414809764,
0.185024421154621, 0.239979711875599, 0.0808784377916965, 0.201436668181771,
0.135024051506645, 0.251851638310215, 0.103285147847268, 0.14207589091784,
0.247675592658745, 0.100067311604358, 0.308209326567443, 0.154555623216289,
0.174464204907127, 0.00457531564104158, 0.098141499884622, 0.251116584438097,
0.153198476415449, 0.183688952743912, 0.0838032420725189, 0.169288085631256,
0.0279120898963428, 0.147329195543669, 0.034801030826966), `RETURN ON ASSETS` = c(0.260063898261748,
0.0581159003954688, 0.186586004612603, 0.144217266907855, 0.0471965084015535,
0.203276288956977, 0.522691591931166, -0.0156737997438482, 0.104160943214225,
0.110451790466256, 0.178360409188664, 0.150089138729099, 0.134029707705111,
0.120565772385725, 0.229528019076799, 0.0697390623585822, 0.10198296142804,
0.192570247620748, 0.245119340816501, 0.115758491252085, 0.195889106965538,
0.138158444053898, 0.231674956423303, 0.0966027636728098, 0.141766843553559,
0.215113054221126, 0.135495862386357, 0.314351616201071, 0.133076845003381,
0.168262801476855, 0.00457531564104158, 0.0986664889666124, 0.242490501823923,
0.152124266735103, 0.201716489655936, 0.0786665142081486, 0.162659186669921,
0.0279454048764536, 0.134992616527726, 0.034801030826966), `RETURN ON EQUITY` = c(0.263580248064511,
0.0444871419402714, 0.241012793134955, 0.191549228659637, 0.0734886226747657,
0.186089113513671, 0.544673844576945, -0.0248396423765173, 0.109257634896201,
0.161190875342999, 0.298045789765326, 0.163962072531003, 0.162274234481587,
0.160460729376603, 0.31640703656353, 0.0847926292565323, 0.102628180483108,
0.192493344561337, 0.244023637469295, 0.0858503015508329, 0.212255623707772,
0.13604566269794, 0.250952374400512, 0.101551944180348, 0.235835707060263,
0.386487527831846, 0.128000474163853, 0.327092350614891, 0.139632557156543,
0.227780755169442, 0.0080632167674627, 0.165179790324242, 0.298742298993181,
0.165391606109475, 0.214205739228479, 0.084552656304169, 0.157224605882577,
0.212343248849882, 0.146717984157146, 0.113062299136044), `FARM OPERATING PROFIT MARGIN RATIO` = c(0.55,
0.18, 0.29, 0.33, 0.12, 0.46, 0.24, -0.1, 0.14, 0.23, 0.2, 0.22,
0.44, 0.25, 0.33, 0.13, 0.36, 0.44, 0.33, 0.05, 0.32, 0.16, 0.52,
0.3, 0.24, 0.35, 0.2, 0.32, 0.38, 0.29, 0.02, 0.24, 0.36, 0.25,
0.4, 0.18, 0.32, -0.01, 0.08, -0.01), `ASSET TURNOVER RATIO` = c(0.64,
0.2, 0.55, 0.58, 0.29, 0.64, 1.88, 0.39, 0.31, 0.34, 0.72, 0.58,
0.38, 0.41, 0.96, 0.38, 0.26, 0.4, 0.62, 0.41, 0.55, 0.67, 0.53,
0.29, 0.51, 0.86, 0.38, 0.94, 0.4, 0.54, 0.65, 0.49, 0.7, 0.49,
0.41, 0.3, 0.47, 0.62, 0.87, 0.79), `OPERATING EXPENSE RATIO` = c(0.29,
0.57, 0.61, 0.52, 0.69, 0.48, 0.57, 0.89, 0.64, 0.57, 0.72, 0.62,
0.45, 0.55, 0.52, 0.69, 0.49, 0.43, 0.5, 0.75, 0.53, 0.69, 0.38,
0.54, 0.6, 0.54, 0.55, 0.56, 0.5, 0.57, 0.87, 0.61, 0.54, 0.63,
0.44, 0.61, 0.56, 0.82, 0.77, 0.83), `DEPRECIATION EXPENSE RATIO` = c(0.08,
0.16, 0.01, 0.05, 0.07, 0.02, 0.03, 0.09, 0.02, 0.06, 0.03, 0.1,
0.04, 0.08, 0.06, 0.1, 0.06, 0.05, 0.03, 0.04, 0.08, 0.09, 0.04,
0.06, 0.05, 0.01, 0.11, 0.05, 0.04, 0.06, 0.05, 0.08, 0.04, 0.03,
0.06, 0.08, 0.01, 0.1, 0.05, 0.04), `INTEREST EXPENSE RATIO` = c(0.01,
0, 0.03, 0.07, 0.08, 0, 0, 0.06, 0, 0.02, 0.04, 0.01, 0.02, 0.06,
0.03, 0.06, 0, 0, 0.03, 0.01, 0.02, 0, 0.06, 0.01, 0.05, 0, 0.07,
0, 0.04, 0.01, 0.08, 0.1, 0.04, 0.02, 0.03, 0.02, 0.04, 0.04,
0.01, 0.09), `NET FARM INCOME RATIO` = c(0.62, 0.27, 0.35, 0.36,
0.16, 0.5, 0.39, -0.04, 0.34, 0.35, 0.22, 0.28, 0.49, 0.31, 0.39,
0.15, 0.45, 0.51, 0.44, 0.2, 0.37, 0.21, 0.52, 0.39, 0.29, 0.45,
0.27, 0.39, 0.43, 0.36, 0.01, 0.21, 0.37, 0.32, 0.47, 0.28, 0.38,
0.05, 0.17, 0.05)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-40L))
My solution is kind of convoluted, but I guess it is never easy to work with list columns:
nested_oratios <- ORATIOS %>%
group_by(YEAR) %>%
nest() %>%
mutate(fake_year = 2008) %>%
ungroup()
nested_kfmaratios <- KFMARATIOS %>%
group_by(YEAR) %>%
nest() %>%
mutate(fake_year = 2008) %>%
ungroup() %>%
select(-YEAR)
nested_comb <- nested_oratios %>%
left_join(nested_kfmaratios,by = c('fake_year'),suffix = c(".oratios", ".kfmaratios")) %>%
select(-fake_year)
logic_pipe <- function(a,b) {
a <- a %>% select(contains('RATIO'))
b <- b %>% select(contains('RATIO'))
MWU_pvalues <- map2(a,b,function(a,b) wilcox.test(a, b)$p.value) %>% unlist()
MWU_pvalues <- as.data.frame(MWU_pvalues) %>%
rename(`P VALUE` = MWU_pvalues)
MWU_pvalues <- tibble::rownames_to_column(MWU_pvalues, "RATIO") %>%
mutate(`Significance` = if_else(`P VALUE` > 0.05, "",
if_else(`P VALUE` <= 0.05 & `P VALUE` >= 0.01, "\\*",
if_else(`P VALUE` <= 0.01 & `P VALUE` >= 0.001, "**", "***"))))
return(MWU_pvalues %>% as_tibble())
}
nested_comb %>%
mutate(result = map2(.x = data.oratios ,.y =data.kfmaratios,logic_pipe))
Consider the apply family: mapply, and by (an object-oriented wrapper around tapply), which subsets a data frame by year and passes each subset to a user-defined function. Note that unlist + Map can be replaced with mapply, since Map is itself a wrapper around mapply that does not simplify its result. The code below uses base R, where transform replaces mutate and ifelse replaces if_else:
proc_df <- function(df) {
yr <- df$YEAR[1]
MWU_pvalues <- mapply(function(a,b) wilcox.test(a, b)$p.value,
subset(ORATIOS, YEAR==yr)[list_ratio], df[list_ratio])
final_df <- transform(data.frame(ratio = names(MWU_pvalues),
p_value = unname(MWU_pvalues)),
significance = ifelse(p_value > 0.05, "",
ifelse(p_value <= 0.05 & p_value >= 0.01, "*",
ifelse(p_value <= 0.01 & p_value >= 0.001, "**", "***")
)
)
)
return(final_df)
}
df_list <- by(KFMARATIOS, KFMARATIOS$YEAR, proc_df)
Output
df_list$`2008`
# ratio p_value significance
# 1 CURRENT RATIO 0.20349856
# 2 DEBT TO TOTAL ASSET RATIO 0.39154322
# 3 DEBT TO FARM ASSET RATIO 0.52264808
# 4 EQUITY TO ASSET RATIO 0.42276423
# 5 DEBT TO EQUITY RATIO 0.39162003
# 6 FARM OPERATING PROFIT MARGIN RATIO 0.11726414
# 7 ASSET TURNOVER RATIO 0.01957554 *
# 8 OPERATING EXPENSE RATIO 0.24893798
# 9 DEPRECIATION EXPENSE RATIO 0.02588258 *
# 10 INTEREST EXPENSE RATIO 0.10127823
# 11 NET FARM INCOME RATIO 0.06262773
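
If both data frames contain several years, the same idea can be written with split() and lapply() so that each year of one data set is compared with the same year of the other. The sketch below uses randomly generated toy data; oratios, kfma, ratio_cols and results are hypothetical names, and the star coding via cut() uses right-closed intervals, which differs slightly from the nested if_else at the exact boundaries.

```r
set.seed(1)  # toy data only; not the poster's figures

# Hypothetical stand-ins for ORATIOS and KFMARATIOS with two years each.
oratios <- data.frame(
  YEAR = rep(c(2008, 2009), each = 6),
  `CURRENT RATIO` = runif(12),
  `DEBT TO EQUITY RATIO` = runif(12),
  check.names = FALSE
)
kfma <- data.frame(
  YEAR = rep(c(2008, 2009), each = 10),
  `CURRENT RATIO` = runif(20, 0, 2),
  `DEBT TO EQUITY RATIO` = runif(20),
  check.names = FALSE
)

ratio_cols <- grep("RATIO", names(oratios), value = TRUE)

# Split both data frames by year and keep only the years present in both.
o_by_year <- split(oratios, oratios$YEAR)
k_by_year <- split(kfma, kfma$YEAR)
years <- intersect(names(o_by_year), names(k_by_year))

# One result data frame per year.
results <- lapply(years, function(yr) {
  p <- mapply(function(a, b) wilcox.test(a, b)$p.value,
              o_by_year[[yr]][ratio_cols], k_by_year[[yr]][ratio_cols])
  data.frame(
    RATIO = names(p),
    P_VALUE = unname(p),
    # cut() maps each p-value onto a star label (intervals are right-closed)
    SIGNIFICANCE = as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
                                    labels = c("***", "**", "*", ""),
                                    include.lowest = TRUE))
  )
})
names(results) <- years
results[["2008"]]
```

In an R Markdown document, each element of the list could then be printed in a loop with print(knitr::kable(...)) inside a chunk that sets results='asis'.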

R - d3heatmap - implement breaks

I am trying to plot a heatmap using the d3heatmap package.
Unfortunately, I have not been successful yet in implementing certain breaks using the option breaks=... as in heatmap or heatmap.2.
This yields just funny results, I am not even sure whether I am doing something wrong or whether the function just ignores breaks.
For example, I tried:
breaks = c(seq(-10, -2), seq(-2, -1.65), seq(-1.65, 1.65), seq(1.65, 2), seq(2, 10))
and
breaks = c(-10, -2, -1.65, 1.65, 2, 10)
with
colors = c("red", "yellow", "green", "yellow", "red")
but nothing seems to work properly.
Any suggestions?
Here's the dput of my data:
> dput(mat)
structure(c(-0.04, NA, 0.59, NA, 0.675, 0.96, 1.09, 0.445, NA,
0.545, NA, NA, 0.09, -1.11, NA, 0.99, 0.13, 0.215, 1.425, 0,
NA, 0.69, 0.805, NA, 0.69, 1.22, NA, 0.3, NA, 0.025, NA, 0.075,
0.36, -0.94, NA, -0.31, 0.26, 1.02, -1.19, NA, NA, -0.77, NA,
-1.48, 1.05, 0.48, NA, NA, NA, 1.49, -1.285, NA, 0.76, 1.14,
-0.62, NA, NA, NA, 0.95, NA, NA, -0.12, 0.49, NA, 2.31, NA, -0.33,
0.85, NA, -1.7, -1.63, NA, -1.12, 0.135, -0.18, NA, -0.245, NA,
-0.2, -0.2, 0.23, -0.11, NA, 0.3, -0.81, 0.04, 0.18, -0.7, 0.53,
0.44, -0.49, 0.28, 0.26, 0.06, 0.265, 0.21, 0.06, -0.175, 0.365,
0.255, 1.25, -0.35, 0.16, 0.125, 0.825, 0.08, 0.02, -0.02, 0.99,
0.79, -0.23, 0.06, NA, 0.36, -0.64, -0.195, 1.19, -0.29, 0.915,
NA, NA, NA, NA, 0.2, 0.1, NA, 0.04, 0.33, NA, 1.46, 2.36, NA,
-0.92, 1.295, NA, NA, 0.8, NA, 1.09, 1.45, 5.42, NA, NA, NA,
1.69, 3.43, NA, 0.55), .Dim = c(37L, 4L), .Dimnames = list(c("AT",
"BE", "BG", "CEE", "CH", "CN", "CZ", "DE", "DK", "EA", "EE",
"EMU", "ES", "EU", "FI", "FR", "GB", "GR", "HR", "HU", "IE",
"IT", "JP", "LU", "NL", "PL", "PT", "RO", "RS", "RU", "SE", "SI",
"SK", "TR", "UA", "UK", "US"), c("Credit Risk", "Funding and liquidity Risk",
"Macro Risk", "Market Risk")))
And the code I am running:
d3heatmap(abs(mat),
dendrogram = "none",
breaks = c(0,1.65,2,10),
col = c("green", "yellow", "red"),
na.rm = TRUE)
The same function using heatmap.2 works perfectly, though.
The function d3heatmap simply does not have a 'breaks' argument. If it gets passed in as an argument it is silently ignored. (See ?d3heatmap.)
The heatmap.2 function in the gplots package on the other hand does have a "breaks" argument. That explains the difference in behaviour.
Luckily, it is still possible to get the desired behaviour by passing an appropriate 'colors' function to d3heatmap. It works as follows.
First the example data:
mat <- structure(c(-0.04, NA, 0.59, NA, 0.675, 0.96, 1.09, 0.445, NA,
0.545, NA, NA, 0.09, -1.11, NA, 0.99, 0.13, 0.215, 1.425, 0,
NA, 0.69, 0.805, NA, 0.69, 1.22, NA, 0.3, NA, 0.025, NA, 0.075,
0.36, -0.94, NA, -0.31, 0.26, 1.02, -1.19, NA, NA, -0.77, NA,
-1.48, 1.05, 0.48, NA, NA, NA, 1.49, -1.285, NA, 0.76, 1.14,
-0.62, NA, NA, NA, 0.95, NA, NA, -0.12, 0.49, NA, 2.31, NA, -0.33,
0.85, NA, -1.7, -1.63, NA, -1.12, 0.135, -0.18, NA, -0.245, NA,
-0.2, -0.2, 0.23, -0.11, NA, 0.3, -0.81, 0.04, 0.18, -0.7, 0.53,
0.44, -0.49, 0.28, 0.26, 0.06, 0.265, 0.21, 0.06, -0.175, 0.365,
0.255, 1.25, -0.35, 0.16, 0.125, 0.825, 0.08, 0.02, -0.02, 0.99,
0.79, -0.23, 0.06, NA, 0.36, -0.64, -0.195, 1.19, -0.29, 0.915,
NA, NA, NA, NA, 0.2, 0.1, NA, 0.04, 0.33, NA, 1.46, 2.36, NA,
-0.92, 1.295, NA, NA, 0.8, NA, 1.09, 1.45, 5.42, NA, NA, NA,
1.69, 3.43, NA, 0.55), .Dim = c(37L, 4L),
.Dimnames = list(c("AT", "BE", "BG", "CEE", "CH", "CN", "CZ", "DE", "DK", "EA", "EE", "EMU", "ES", "EU", "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IT", "JP", "LU", "NL", "PL", "PT", "RO", "RS", "RU", "SE", "SI", "SK", "TR", "UA", "UK", "US"), c("Credit Risk", "Funding and liquidity Risk", "Macro Risk", "Market Risk")))
Suppose we want the following three color bins: blue for values < 0, green for values >= 0 but < 2, and red for values >= 2. We then define the corresponding ordered list of colors.
palette <- c("blue", "green", "red")
We also define the boundary values of the color bins. These values must include the domain boundaries.
mi <- min(mat, na.rm = TRUE)
ma <- max(mat, na.rm = TRUE)
breaks <- c(mi, 0, 2, ma)
We can now define a color interpolation function which maps a value in [0,1] onto a color, respecting our color bins. The 'scales' package comes to help here.
install.packages('scales') # if needed
library(scales)
colorFunc <- col_bin(palette, bins = rescale(breaks))
The breaks originally defined in the domain of our data needed to be rescaled to [0,1]. The 'rescale' function in the 'scales' package handled that.
Small detail: the low boundary of a bin is included in the bin, but the high boundary is excluded. So the value 0 will be green, anything between 0 and 2 will be green too, but 2 will be red.
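
The boundary rule can be checked without any extra packages. The following base R sketch mimics the binning with findInterval(); bin_colour and breaks01 are illustrative names, the domain limits are the minimum and maximum of the sample matrix, and closing the last interval (so the maximum still gets a colour) is an assumption of this sketch.

```r
palette <- c("blue", "green", "red")

# Domain limits taken from the sample matrix (its minimum and maximum).
breaks <- c(-1.7, 0, 2, 5.42)

# Rescale the breaks to [0, 1], as scales::rescale() does by default.
breaks01 <- (breaks - min(breaks)) / (max(breaks) - min(breaks))

# Base R stand-in for the binning step: left-closed, right-open intervals,
# with the last interval closed so that the maximum still gets a colour.
bin_colour <- function(x) {
  x01 <- (x - min(breaks)) / (max(breaks) - min(breaks))
  palette[findInterval(x01, breaks01, rightmost.closed = TRUE)]
}

bin_colour(c(-0.5, 0, 1.99, 2, 5.42))
# "blue" "green" "green" "red" "red"
```

The values 0 and 2 each fall into the bin that starts at them, which matches the rule described above.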
We can now plot the heat map.
d3heatmap(mat, dendrogram = "none", colors = colorFunc, na.rm = TRUE)
The result looks like this:
