Plot histogram in R

Given the following data frame:
time frequency
0000 - 0059 8
0100 - 0159 4
0200 - 0259 17
0300 - 0359 5
0400 - 0459 71
0500 - 0559 477
0600 - 0629 325
0630 - 0659 661
0700 - 0714 558
0715 - 0729 403
0730 - 0744 671
0745 - 0759 444
0800 - 0814 641
0815 - 0829 356
0830 - 0844 427
I need to plot a histogram with 15 bins, where the x axis is labelled with the "time" interval for each bin and the y axis is titled "frequency". Is there a good way to do this?
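Since the counts are already binned, base R's barplot() is a better fit than hist() (hist expects raw observations, not per-bin frequencies). A minimal sketch, assuming the data frame is called df with columns time and frequency (only the first three bins are reproduced here):

```r
# The counts are already binned: draw one bar per row with barplot()
df <- data.frame(
  time = c("0000 - 0059", "0100 - 0159", "0200 - 0259"),  # ... remaining bins
  frequency = c(8, 4, 17),
  stringsAsFactors = FALSE
)
barplot(df$frequency,
        names.arg = df$time,  # label each bar with its time interval
        las = 2,              # rotate labels so the long intervals fit
        ylab = "frequency")
```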


Automate portfolio volatility computation in R

Thanks for reading my post. I have a series of portfolios created from combinations of several stocks. I need to compute the volatility of those portfolios using the historical daily performance of each stock. Since I have all the combinations in one data frame (called final_output) and all stock returns in another data frame (called perf, where the columns are stocks and the rows are days), I don't know what would be the most efficient way to automate the process. Below you can find an extract:
> Final_output
ISIN_1 ISIN_2 ISIN_3 ISIN_4
2 CH0595726594 CH1111679010 XS1994697115 CH0587331973
3 CH0595726594 CH1111679010 XS1994697115 XS2027888150
4 CH0595726594 CH1111679010 XS1994697115 XS2043119358
5 CH0595726594 CH1111679010 XS1994697115 XS2011503617
6 CH0595726594 CH1111679010 XS1994697115 CH1107638921
7 CH0595726594 CH1111679010 XS1994697115 XS2058783270
8 CH0595726594 CH1111679010 XS1994697115 JE00BGBBPB95
> perf
CH0595726594 CH1111679010 XS1994697115 CH0587331973
626 0.0055616769 -0.0023656130 1.363791e-03 1.215922e-03
627 0.0086094443 0.0060037334 0.000000e+00 2.519220e-03
628 0.0053802380 0.0009027081 0.000000e+00 7.508635e-04
629 -0.0025213543 -0.0022046297 4.864050e-05 1.800720e-04
630 0.0192416817 0.0093401627 -6.079767e-03 3.800836e-03
631 -0.0101224820 0.0051741294 6.116956e-03 -1.345184e-03
632 -0.0013293793 -0.0100475153 -4.494163e-03 -1.746106e-03
633 0.0036350604 0.0012999350 3.801130e-03 -5.997121e-05
634 0.0030097434 -0.0011484496 -1.187614e-03 -2.069131e-03
635 0.0002034381 0.0030493901 -1.851762e-03 -3.806280e-04
636 -0.0035594427 0.0167455769 -2.148123e-04 -4.709560e-04
637 0.0007654623 -0.0051958237 -3.711191e-04 1.604010e-04
638 0.0107592678 -0.0016260163 4.298764e-04 3.397951e-03
639 0.0050953486 -0.0007403020 2.011738e-03 8.790770e-04
640 0.0008532851 -0.0071121648 -9.746114e-04 5.389598e-04
641 -0.0068204614 0.0133810874 -9.755622e-05 -1.346674e-03
642 0.0091395678 0.0102591793 1.717157e-03 -1.977785e-03
643 0.0027520640 -0.0157912638 1.256440e-03 -1.301119e-04
644 -0.0048902196 0.0039494471 -1.624514e-03 -3.373340e-03
645 -0.0116838833 0.0062450826 6.625549e-04 1.205255e-03
646 0.0004566442 -0.0018570102 -3.456636e-03 4.474138e-03
647 0.0041586368 0.0085679315 4.435933e-03 1.957455e-03
648 0.0007575758 0.0002912621 0.000000e+00 2.053306e-03
649 0.0046429473 -0.0138309230 -4.435798e-03 1.541798e-03
650 0.0049731250 -0.0488164953 4.181975e-03 -9.733133e-04
651 0.0008497451 -0.0033110870 2.724477e-04 -7.555498e-04
652 0.0004494831 0.0049831300 -8.657588e-04 -1.790813e-04
653 -0.0058905751 0.0020143588 8.178287e-04 -1.213991e-03
654 0.0000000000 0.0167525773 4.864050e-05 9.365068e-04
655 0.0010043186 0.0048162231 0.000000e+00 -2.110146e-03
656 -0.0024079462 -0.0100403633 -2.431907e-03 -9.176600e-04
657 -0.0095544604 -0.0193670047 0.000000e+00 -8.935435e-03
658 0.0008123477 0.0114339172 2.437835e-03 5.530483e-03
659 0.0022828734 -0.0015415446 -3.239300e-03 2.765060e-03
660 0.0049096523 -0.0001029283 3.199079e-02 2.327835e-03
661 -0.0027702226 -0.0357198003 9.456712e-04 3.189602e-04
662 -0.0008081216 -0.0139311449 -2.891020e-02 -1.295363e-03
663 -0.0033867462 0.0068745264 -2.529552e-03 -1.496588e-04
664 -0.0015216068 -0.0558572120 -3.023653e-03 -7.992975e-03
665 0.0052829422 0.0181072771 4.304652e-03 -3.319519e-03
666 0.0084386054 0.0448545861 -8.182748e-04 4.279284e-03
667 -0.0076664829 -0.0059415480 -2.047362e-04 6.059936e-03
668 -0.0062108665 -0.0039847073 7.313506e-04 5.993467e-04
669 -0.0053350948 0.0068119154 -1.042631e-02 -2.056524e-03
670 -0.0263588067 0.0245395479 -2.188962e-02 -6.732491e-03
671 -0.0021511018 0.0220649895 1.412435e-02 1.702085e-03
672 0.0205058100 -0.0007179119 3.057527e-03 -1.002423e-02
673 0.0096862280 -0.0194488633 1.207407e-03 -1.553899e-03
674 0.0007143951 -0.0068557672 6.227450e-03 1.790274e-03
675 -0.0021926470 -0.0051114507 -6.267498e-03 -1.035691e-03
676 0.0076655765 -0.0139300847 6.583825e-03 3.059472e-03
677 -0.0032457653 0.0180480206 -4.635495e-03 1.064002e-03
678 0.0036633764 0.0060676410 -2.762676e-04 5.364970e-04
679 -0.0008111122 -0.0013635410 -1.065898e-03 1.214059e-03
680 0.0050228311 0.0055141267 3.003507e-03 1.121643e-03
681 -0.0007067495 0.0147281558 -2.699002e-03 -1.514035e-04
682 -0.0024248548 0.0002573473 -2.113685e-03 -1.423409e-03
683 -0.0002025624 0.0138417207 -4.374895e-03 1.415328e-04
684 -0.0141822418 -0.0169517332 -3.578920e-03 -1.799234e-03
685 -0.0005651749 -0.0259693324 -5.926428e-03 -3.635333e-03
686 0.0004112688 0.0133043570 -1.545642e-03 1.981828e-03
687 -0.0150565262 -0.0107757493 -1.717916e-02 -1.328749e-02
688 0.0039129754 -0.0441013167 -8.376631e-03 -5.653841e-04
689 0.0019748467 0.0115063340 -2.835394e-02 7.868428e-03
690 0.0072614108 0.0358764014 3.586897e-02 7.960077e-03
691 -0.0003604531 0.0106119001 1.024769e-04 -2.733651e-04
What I should do is look up each portfolio (each row of final_output is a portfolio, i.e. a 4-stock portfolio) in perf and compute the volatility (standard deviation) of that portfolio using the stocks' historical daily performance over the last three months. (Of course, here I have pasted only 4 stocks' performance for simplicity.) Once done for the first, I should do the same for all the other rows (portfolios).
Below is the formula I used for computing the volatility:
# formula for computing the volatility
sqrt(t(weights) %*% covariance_matrix %*% weights)
# where covariance_matrix is
cov(portfolio_component_monthly_returns)
# all the portfolios are equally weighted
weights <- c(0.25, 0.25, 0.25, 0.25)
What I have been trying to do since yesterday is to automate the process for all the rows; I have more than 10,000 of them. I'm an R novice, so even after trying and searching the web I have no results and no idea how to automate it. Would someone have a clue how to do it?
I hope to have been as clear as possible; if not, do not hesitate to ask me.
Many thanks
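One way to automate this is to apply the question's volatility formula to each row of final_output, picking the matching columns out of perf. A hedged sketch on made-up data (the ISINs and returns below are toy stand-ins, not the real data):

```r
# Equal weights for a 4-stock portfolio, as in the question
weights <- rep(0.25, 4)

# Toy stand-ins for perf (daily returns) and final_output (portfolios)
set.seed(1)
perf <- as.data.frame(matrix(rnorm(60 * 4, sd = 0.01), ncol = 4,
                             dimnames = list(NULL, c("A", "B", "C", "D"))))
final_output <- data.frame(ISIN_1 = "A", ISIN_2 = "B",
                           ISIN_3 = "C", ISIN_4 = "D",
                           stringsAsFactors = FALSE)

# For each row (portfolio), select the matching return columns and
# apply the formula sqrt(t(w) %*% cov %*% w).
# To restrict to the last three months, subset perf first,
# e.g. with tail(perf, 63) for ~63 trading days.
volatility <- apply(final_output, 1, function(isins) {
  rets <- as.matrix(perf[, isins])   # returns of the 4 components
  sqrt(t(weights) %*% cov(rets) %*% weights)[1, 1]
})
final_output$volatility <- volatility
```

With 10,000+ rows this apply() loop is still fast, because each iteration only subsets columns and computes a 4x4 covariance matrix.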

Capture the column index in R or Excel for a series of data meeting a condition

I would like to capture the index of any value less than 500 in a series of data.
Below is what my data looks like:
Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788
For example, in the first row, Price6 is the first element in the series Price1 to Price6 whose value is less than 500, hence in the output "First" has 6.
Similarly, in the second row, Price4 is less than 500 and then Price5 is less than 500, hence the values for First and Second are 4 and 5 respectively for the series between Price1 and Price6.
When nothing is captured by the logic, I want to place a "-" instead.
Below is the output I am looking for.
Category,Price1,Price2,Price3,Price4,Price5,Price6,First,Second,Third,Fourth,Fifth,Sixth
Product1,967,855,929,811,501,387,6,-,-,-,-,-
Product2,526,809,723,304,315,671,4,5,-,-,-,-
Product3,412,133,369,930,400,337,1,2,3,5,6,-
Product4,709,241,625,822,967,952,2,-,-,-,-,-
Product5,395,506,110,280,829,817,1,3,4,-,-,-
Product6,803,618,794,214,605,788,4,-,-,-,-,-
Not sure how to do this in R or Excel.
Any leads would be highly appreciated.
Thanks,
Using data.table
dt[, when := melt(dt, id.vars = "Category")[, toString(which(value < 500)), Category][, V1]]
cbind(dt, dt[, tstrsplit(when, ", ", fill = "-")])
Gives
Category Price1 Price2 Price3 Price4 Price5 Price6 when V1 V2 V3 V4 V5
1: Product1 967 855 929 811 501 387 6 6 - - - -
2: Product2 526 809 723 304 315 671 4, 5 4 5 - - -
3: Product3 412 133 369 930 400 337 1, 2, 3, 5, 6 1 2 3 5 6
4: Product4 709 241 625 822 967 952 2 2 - - - -
5: Product5 395 506 110 280 829 817 1, 3, 4 1 3 4 - -
6: Product6 803 618 794 214 605 788 4 4 - - - -
Now you just need to replace the names V1-V5 and drop column when.
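In base R terms, that final cleanup amounts to renaming and dropping columns. A sketch on a hypothetical one-row frame (the real dt would come from the steps above):

```r
# Toy stand-in for the intermediate result with when and V1-V5 columns
out <- data.frame(Category = "Product1", when = "6",
                  V1 = "6", V2 = "-", V3 = "-", V4 = "-", V5 = "-",
                  stringsAsFactors = FALSE)

# Rename V1-V5 to the requested names and drop the helper column
names(out)[names(out) %in% paste0("V", 1:5)] <-
  c("First", "Second", "Third", "Fourth", "Fifth")
out$when <- NULL
```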
Data:
dt <- fread("Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788")
One can try an apply- and tidyr::separate-based solution:
# First, collapse the indices of the values < 500 into one column.
# The empty places are filled with `-`
df$DesiredData <- apply(df[2:7], 1, function(x){
  idx <- which(x < 500)
  paste0(c(idx, rep("-", length(x) - length(idx))), collapse = ",")
})
library(tidyverse)
# Now use the `separate` function to split the column into the 6 desired columns
df %>% separate("DesiredData",
                c("First","Second","Third","Fourth","Fifth","Sixth"), sep = ",")
#   Category Price1 Price2 Price3 Price4 Price5 Price6 First Second Third Fourth Fifth Sixth
# 1 Product1    967    855    929    811    501    387     6      -     -      -     -     -
# 2 Product2    526    809    723    304    315    671     4      5     -      -     -     -
# 3 Product3    412    133    369    930    400    337     1      2     3      5     6     -
# 4 Product4    709    241    625    822    967    952     2      -     -      -     -     -
# 5 Product5    395    506    110    280    829    817     1      3     4      -     -     -
# 6 Product6    803    618    794    214    605    788     4      -     -      -     -     -
Data:
df <- read.table(text="
Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788",
header = TRUE, stringsAsFactors = FALSE, sep=",")

Natural Neighbor Interpolation in R

I need to conduct Natural Neighbor Interpolation (NNI) via R in order to smooth my numeric data. For example, say I have very noisy data; my goal is to use NNI to model the data neatly.
I have several hundred rows of data (one observation for each postcode), alongside latitudes and longitudes. I've made up some data below:
Postcode lat lon Value
200 -35.277272 149.117136 7
221 -35.201372 149.095065 38
800 -12.801028 130.955789 27
801 -12.801028 130.955789 3
804 -12.432181 130.84331 29
810 -12.378451 130.877014 20
811 -12.376597 130.850489 3
812 -12.400091 130.913672 42
814 -12.382572 130.853877 32
820 -12.410444 130.856124 39
821 -12.426641 130.882367 39
822 -12.799278 131.131697 49
828 -12.474896 130.907378 38
829 -14.460879 132.280002 34
830 -12.487233 130.972637 8
831 -12.480066 130.984006 49
832 -12.492269 130.990891 29
835 -12.48138 131.029173 33
836 -12.525546 131.103025 40
837 -12.460094 130.842663 39
838 -12.709507 130.995407 28
840 -12.717562 130.351316 22
841 -12.801028 130.955789 8
845 -13.038663 131.072091 19
846 -13.226806 131.098416 50
847 -13.824123 131.835799 11
850 -14.464497 132.262021 2
851 -14.464497 132.262021 23
852 -14.92267 133.064654 36
854 -16.81839 137.14707 17
860 -19.648306 134.186642 3
861 -18.94406 134.318373 8
862 -20.231104 137.762232 28
870 -12.436101 130.84059 24
871 -12.436101 130.84059 16
Is there any kind of package that will do this? I should mention that the only predictors I am using in this model are latitude and longitude. If there isn't a package that can do this, how can I implement it manually? I've searched extensively and I can't figure out how to implement this in R. I have seen one or two other SO posts, but they haven't helped me figure this out.
Please let me know if there's anything I should add to the question. Thanks.
I suggest the following:
Reproject the data to the corresponding UTM zone.
Use the R whitebox package (a front end to WhiteboxTools) to process the data using natural neighbour interpolation.
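To illustrate the idea without any package dependency, here is inverse-distance weighting, which is a different, simpler interpolator than natural neighbour but shows the same shape of problem: estimate a value at a query point from weighted nearby samples. (As the answer notes, lon/lat should be reprojected to a metric CRS such as UTM before computing distances; the raw-degree distances below are for illustration only.)

```r
# Inverse-distance weighting: NOT natural neighbour, but a
# dependency-free stand-in to sketch spatial interpolation.
idw <- function(x, y, z, x0, y0, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(z[which.min(d)])  # exact hit: return that sample
  w <- 1 / d^power                          # nearer points weigh more
  sum(w * z) / sum(w)
}

# Toy usage with three of the postcode points from the question
x <- c(149.117136, 149.095065, 130.955789)
y <- c(-35.277272, -35.201372, -12.801028)
z <- c(7, 38, 27)
idw(x, y, z, x0 = 149.10, y0 = -35.25)
```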

Formatting phone number based on country code

I am working on a shiny app where I want to display telephone numbers of contacts. If the number is a US number, I want to show it in a specific format, for example (XXX) XXX-XXXX; otherwise I just want to return the number as it is.
I tried the simplest way, using substr. This is the function I have:
telFormat <- function(x){
  if (is.na(x)){
    return("")
  }
  if (substr(x, 1, 3) %in% c("+1 ")){
    p1 <- substr(x, 4, 6)
    p2 <- substr(x, 8, 10)
    p3 <- substr(x, 12, 15)
    return(paste("(", p1, ") ", p2, "-", p3, sep = ""))
  }
  else
    return(x)
}
The sample data I have is:
sample <- c("+1 312 252 7546", "+1 678 538 1919", "+44 (0) 207 743 4052",
"+44 (0) 207 743 3000", "+1 212 810 5300", NA, "+44 (0) 207 591 6630",
"+61 2 9272 2200", "+852 3903 2448", "+1 415 670 6267", "+44 (0) 207 743 3000",
"+1 212 810 5300", "+1 919 743 2500", "+1 919 743 2500", "+1 919 743 2500",
"+1 919 743 2500")
The phone numbers starting with +1 get converted correctly, but something is wrong with the other numbers.
telFormat(sample)
#output
 [1] "(312) 252-7546" "(678) 538-1919" "( (0) 20- 743"  "( (0) 20- 743"  "(212) 810-5300" "(NA) NA-NA"     "( (0) 20- 591"
 [8] "( 2 ) 272-2200" "(2 3) 03 -448"  "(415) 670-6267" "( (0) 20- 743"  "(212) 810-5300" "(919) 743-2500" "(919) 743-2500"
[15] "(919) 743-2500" "(919) 743-2500"
and I get this warning message too
Warning messages:
1: In if (is.na(x)) { :
the condition has length > 1 and only the first element will be used
2: In if (substr(x, 1, 3) %in% c("+1 ")) { :
the condition has length > 1 and only the first element will be used
What am I doing wrong here? Is there an efficient way to get the desired output?
If all US numbers in your data have a specific format, i.e. +1 XXX XXX XXXX, you can use regex ^\\+1 (\\d{3}) (\\d{3}) (\\d{4})$ to reformat it:
sub("^\\+1 (\\d{3}) (\\d{3}) (\\d{4})$", "(\\1) \\2-\\3", sample)
# [1] "(312) 252-7546" "(678) 538-1919" "+44 (0) 207 743 4052"
# [4] "+44 (0) 207 743 3000" "(212) 810-5300" NA
# [7] "+44 (0) 207 591 6630" "+61 2 9272 2200" "+852 3903 2448"
#[10] "(415) 670-6267" "+44 (0) 207 743 3000" "(212) 810-5300"
#[13] "(919) 743-2500" "(919) 743-2500" "(919) 743-2500"
#[16] "(919) 743-2500"
This uses capture groups (parentheses) to match the first three, second three, and last four digits of a US number, then refers to those groups in the replacement with backreferences \\1, \\2 and \\3.
Maybe something like this helps, with stringr:
library(stringr)
as.data.frame(do.call(rbind, lapply(
  str_match_all(sample[!is.na(sample)],
                "(\\+1|.*)[^\\d]?(\\d+)[^\\d]+(\\d+)[^\\d]+(\\d+)$"),
  function(x) x[, 2:5])))
V1 V2 V3 V4
1 +1 312 252 7546
2 +1 678 538 1919
3 +44 (0) 20 7 743 4052
4 +44 (0) 20 7 743 3000
5 +1 212 810 5300
6 +44 (0) 20 7 591 6630
7 +61 2 9272 2200
8 +85 2 3903 2448
9 +1 415 670 6267
10 +44 (0) 20 7 743 3000
11 +1 212 810 5300
12 +1 919 743 2500
13 +1 919 743 2500
14 +1 919 743 2500
15 +1 919 743 2500

How to ask R not to combine the X-axis values for a bar chart?

I am a beginner with R. My data looks like this:
id count date
1 210 2009.01
2 400 2009.02
3 463 2009.03
4 465 2009.04
5 509 2009.05
6 861 2009.06
7 872 2009.07
8 886 2009.08
9 725 2009.09
10 687 2009.10
11 762 2009.11
12 748 2009.12
13 678 2010.01
14 699 2010.02
15 860 2010.03
16 708 2010.04
17 709 2010.05
18 770 2010.06
19 784 2010.07
20 694 2010.08
21 669 2010.09
22 689 2010.10
23 568 2010.11
24 584 2010.12
25 592 2011.01
26 548 2011.02
27 683 2011.03
28 675 2011.04
29 824 2011.05
30 637 2011.06
31 700 2011.07
32 724 2011.08
33 629 2011.09
34 446 2011.10
35 458 2011.11
36 421 2011.12
37 459 2012.01
38 256 2012.02
39 341 2012.03
40 284 2012.04
41 321 2012.05
42 404 2012.06
43 418 2012.07
44 520 2012.08
45 546 2012.09
46 548 2012.10
47 781 2012.11
48 704 2012.12
49 765 2013.01
50 571 2013.02
51 371 2013.03
I would like to make a bar-chart-like graph that shows the count for each date (dates in the format Month-Y, e.g. Jan-2009). I have two issues:
1- I cannot find a good format for a bar-chart-like graph like that.
2- I want all of my data points to be present on the X axis (date), while R aggregates them by year only (so I only have four data points there). Below is the current command that I am using:
plot(df$date,df$domain_count,col="red",type="h")
and my current plot is like this:
Ok, I see some issues in your original data. May I suggest the following:
Add the days in your date column
df$date=paste(df$date,'.01',sep='')
Convert the date column to be of date type:
df$date=as.Date(df$date,format='%Y.%m.%d')
Plot the data again:
plot(df$date,df$domain_count,col="red",type="h")
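To get the Jan-2009-style labels the question asks for, one option is to suppress the default x axis and format the labels yourself. A base-R sketch on a toy subset of the question's data:

```r
# Toy data in the question's format
toy <- data.frame(date = c("2009.01", "2009.02", "2009.03"),
                  domain_count = c(210, 400, 463))
toy$date <- as.Date(paste0(toy$date, ".01"), format = "%Y.%m.%d")

plot(toy$date, toy$domain_count, col = "red", type = "h",
     xaxt = "n", xlab = "Date", ylab = "Count")   # suppress the default x axis
axis(1, at = toy$date, labels = format(toy$date, "%b-%Y"), las = 2)
```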
Also, may I add one more suggestion: have you used ggplot for plotting charts? I think you will find it much easier, and it results in better-looking charts. Your example could be visualized like this:
library(ggplot2) #if you don't have the package, run install.packages('ggplot2')
ggplot(df,aes(date, count))+geom_bar(stat='identity')+labs(x="Date", y="Count")
First, you should transform your date column in a real date:
library(plyr) # for mutate
d <- mutate(d,
            month = as.numeric(gsub("[0-9]*\\.([0-9]*)", "\\1", as.character(date))),
            year  = as.numeric(gsub("([0-9]*)\\.[0-9]*", "\\1", as.character(date))),
            Date  = ISOdate(year, month, 1))
Then, you could use ggplot to create a decent barchart:
library(ggplot2)
ggplot(d, aes(x = Date, y = count)) + geom_bar(fill = "red", stat = "identity")
You can also use basic R to create a barchart, which is however less nice:
dd <- setNames(d$count, format(d$Date, "%m-%Y"))
barplot(dd)
The former plot shows you the "holes" in your data, i.e. months where there is no count, while in the latter it is even quite difficult to see which bar corresponds to which month (this could, however, be tweaked, I assume).
Hope that helps.
