Dates Messed Up in R

I made a list of dates called newDat that looks like the following:
> newDat
[1] 4.2.20 4.3.20 4.4.20 4.5.20 4.6.20 4.7.20 4.8.20 4.9.20
[9] 4.10.20 4.11.20 4.12.20 4.13.20 4.14.20 4.15.20 4.16.20 4.17.20
[17] 4.18.20 4.19.20 4.20.20 4.21.20 4.22.20 4.23.20 4.24.20 4.25.20
[25] 4.26.20 4.27.20 4.28.20 4.29.20 4.30.20 5.1.20 5.2.20 5.3.20
[33] 5.4.20 5.5.20 5.6.20 5.7.20 5.8.20 5.9.20 5.10.20 5.11.20
[41] 5.12.20 5.13.20 5.14.20 5.15.20 5.16.20 5.17.20 5.18.20 5.19.20
...
I then plot my data using the following code:
plot.ts(as.Date(newDat,"%m.%d.%y"), casesDifferenced, type = "l",
xlab = "Date")
But the dates on my x-axis are not showing up properly.
What am I missing here?

The fix was to use plot() instead of plot.ts():
plot(as.Date(newDat, "%m.%d.%y"), casesDifferenced, type = "l", xlab = "Date")
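A self-contained sketch of the working approach, with a few made-up values standing in for newDat and casesDifferenced (both hypothetical here). as.Date() with the format "%m.%d.%y" parses the month.day.year strings, and when the x values carry the Date class, plot() labels the x axis with dates. The pdf(NULL) call is only there so the sketch runs without a screen; leave it out in an interactive session.

```r
# Hypothetical stand-ins for the question's newDat and casesDifferenced
newDat <- c("4.2.20", "4.3.20", "4.10.20", "5.1.20")
casesDifferenced <- c(3, -1, 4, 2)

# "%m.%d.%y": month, day and two-digit year separated by dots
dates <- as.Date(newDat, format = "%m.%d.%y")

pdf(NULL)  # null device so the sketch runs headless; omit interactively
# With Date x values, plot() draws a date-labelled x axis
plot(dates, casesDifferenced, type = "l", xlab = "Date")
invisible(dev.off())
```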

Related

Inserting Previous Dates in R Vector

I'm trying to insert the previous date for every date in a vector in R.
This is my current vector:
[1] "1990-02-08" "1990-03-28" "1990-05-16" "1990-07-05" "1990-07-13" "1990-08-22" "1990-10-03"
[8] "1990-10-29" "1990-11-14" "1990-12-07" "1990-12-18" "1991-01-08" "1991-02-01" "1991-02-07"
I'm trying to get the following:
[1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15" "1990-05-16" "1990-07-04"
etc.
I tried the following:
dates_lagged = as.Date(dates)-1
dates_combined = c(dates, dates_lagged)
However, with this method, some dates are not getting lagged.
Is there a better way to do this?
Edit: to answer the comment, here is my code (with the CSV replaced by its first few values):
FOMC <- read_csv(file = c("x", "1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13", "1990-08-22", "1990-10-03",
"1990-10-29", "1990-11-14", "1990-12-07"))
FOMC$x <- as.Date(FOMC$x, format = "%Y-%m-%d")
colnames(FOMC) <- "Date"
dates_vector <- FOMC[["Date"]]
FOMC = as.vector(as.Date(dates_vector))
dates_lagged = as.Date(FOMC)-1
dates_combined = c(FOMC, dates_lagged)
as.Date(dates_combined)
For some reason, there is no "1990-10-28" before "1990-10-29" for example, and I can't figure out why.
You could try:
as.Date(c(rbind(dates - 1, dates)), origin = "1970-01-01")
#> [1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15"
#> [6] "1990-05-16" "1990-07-04" "1990-07-05" "1990-07-12" "1990-07-13"
#> [11] "1990-08-21" "1990-08-22" "1990-10-02" "1990-10-03" "1990-10-28"
#> [16] "1990-10-29" "1990-11-13" "1990-11-14" "1990-12-06" "1990-12-07"
#> [21] "1990-12-17" "1990-12-18" "1991-01-07" "1991-01-08" "1991-01-31"
#> [26] "1991-02-01" "1991-02-06" "1991-02-07"
Data
dates <- c("1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13",
"1990-08-22", "1990-10-03", "1990-10-29", "1990-11-14", "1990-12-07",
"1990-12-18", "1991-01-08", "1991-02-01", "1991-02-07")
dates <- as.Date(dates)
Created on 2021-11-04 by the reprex package (v2.0.0)
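An alternative sketch, using the same dates as the Data block (rebuilt here so it runs on its own): concatenating the lagged and original dates with c() and sorting keeps the Date class throughout, whereas rbind() returns a plain numeric matrix, which is why the answer above converts back with origin = "1970-01-01". This assumes the input dates are sorted and at least two days apart, as they are here; otherwise sorting and interleaving can give different orders.

```r
dates <- as.Date(c("1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13",
                   "1990-08-22", "1990-10-03", "1990-10-29", "1990-11-14", "1990-12-07",
                   "1990-12-18", "1991-01-08", "1991-02-01", "1991-02-07"))

# Subtracting 1 from a Date moves it back one day and keeps the Date class;
# c() of two Date vectors is still a Date vector, so sort() can order them
dates_combined <- sort(c(dates - 1, dates))

# The rbind() route from the answer, for comparison
interleaved <- as.Date(c(rbind(dates - 1, dates)), origin = "1970-01-01")
print(dates_combined)
```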

Transforming data to create generalized, quasi-proportional Venn diagrams using Package ‘nVennR’

I have the dataset below and would like help transforming it so that I can plot a Venn diagram with the package ‘nVennR’ by Pérez-Silva et al. 2018.
Here's the dataset:
dput(data)
structure(list(Employee = c("A001", "A002", "A003", "A004", "A005",
"A006", "A007", "A008", "A009", "A010", "A011", "A012", "A013",
"A014", "A015", "A016", "A017", "A018"), SAS = c("Y", "N", "Y",
"Y", "Y", "Y", "N", "Y", "N", "N", "Y", "Y", "Y", "Y", "N", "N",
"N", "N"), Python = c("Y", "Y", "Y", "Y", "N", "N", "N", "N",
"N", "N", "Y", "Y", "N", "N", "N", "N", "Y", "Y"), R = c("Y",
"Y", "N", "Y", "N", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y",
"Y", "Y", "N", "N")), .Names = c("Employee", "SAS", "Python",
"R"), row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"
))
See below an example of the Venn diagram I would like to get:
Update:
After installing the updated versions of nVennR and rsvg, when I run the package's example code, I get the following warning:
Warning message:
In checkValidSVG(doc, warn = warn) :
This picture was not generated by the 'grConvert' package, errors may result
Below is my session info:
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] nVennR_0.2.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 lattice_0.20-35 XML_3.98-1.10
[4] png_0.1-7 rsvg_1.1 grid_3.4.2
[7] plyr_1.8.4 gtable_0.2.0 scales_0.5.0.9000
[10] ggplot2_2.2.1.9000 pillar_1.2.1 rlang_0.2.0.9001
[13] grImport2_0.1-2 lazyeval_0.2.1 Matrix_1.2-12
[16] tools_3.4.2 munsell_0.4.3 jpeg_0.1-8
[19] compiler_3.4.2 base64enc_0.1-3 colorspace_1.3-2
[22] tibble_1.4.2
I would appreciate any ideas to address this issue.
Here is one way using the limma package in Bioconductor with your data loaded in from the dput as the variable z:
install.packages("BiocManager")
BiocManager::install("limma")
library(limma)
Change all Y to TRUE and all N to FALSE:
z2 <- data.frame(lapply(z, function(x) { gsub("Y", "TRUE", x) }))
z3 <- data.frame(lapply(z2, function(x) { gsub("N", "FALSE", x) }),stringsAsFactors=FALSE)
Make sure they are all of logical type:
z3$SAS <- as.logical(z3$SAS)
z3$Python <- as.logical(z3$Python)
z3$R <- as.logical(z3$R)
Now tally up all the totals for each Venn region using vennCounts:
> ( venn.totals <- vennCounts(z3[,-1]) )
SAS Python R Counts
1 0 0 0 1
2 0 0 1 4
3 0 1 0 2
4 0 1 1 1
5 1 0 0 2
6 1 0 1 3
7 1 1 0 1
8 1 1 1 4
attr(,"class")
[1] "VennCounts"
Producing the diagram is just one more step:
vennDiagram(venn.totals)
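If installing limma is not an option, a base-R sketch gives the same region counts. It rebuilds the question's data as a plain data.frame (the vectors are copied from the dput output), converts the Y/N columns with a comparison instead of gsub(), and cross-tabulates with table(), where each TRUE/FALSE combination is one region of the diagram. The counts should match the vennCounts() output above, for example 4 employees use all three tools.

```r
# The question's data, rebuilt as a plain data.frame
z <- data.frame(
  Employee = sprintf("A%03d", 1:18),
  SAS    = c("Y","N","Y","Y","Y","Y","N","Y","N","N","Y","Y","Y","Y","N","N","N","N"),
  Python = c("Y","Y","Y","Y","N","N","N","N","N","N","Y","Y","N","N","N","N","Y","Y"),
  R      = c("Y","Y","N","Y","N","Y","N","N","Y","Y","Y","Y","Y","Y","Y","Y","N","N"),
  stringsAsFactors = FALSE
)

# Convert every Y/N column to logical in one step
z3 <- z
z3[-1] <- lapply(z[-1], function(col) col == "Y")

# Cross-tabulate membership; each row of the result is one Venn region
region_counts <- as.data.frame(table(SAS = z3$SAS, Python = z3$Python, R = z3$R))
print(region_counts)
```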
Just a quick note to let you know that the new version of nVennR is ready. Input and output control is different now, and toVenn is deprecated, to be replaced by plotVenn. There is a vignette with several examples, one of which uses the data in this question.
Nice to have feedback so fast. Perhaps we should have stated in the documentation that this version of nVennR is preliminary. Some researchers had asked for a quick way to run nVenn, so I just wrapped the C++ code into a couple of R functions. As you can see, the result is shown in the viewer window instead of the plot window. I am learning as I go.
Since I see some interest in this package, I am compiling a list of features to add to the next version. Better input options are definitely on that list, as is more control over the output (by the way, if colors are in the way, you can just set opacity to 0).
Regarding the question, @mysteRious is right: you send lists to the function. A quick way to do it would be:
sas <- subset(data, SAS == "Y")$Employee
python <- subset(data, Python == "Y")$Employee
rr <- subset(data, R == "Y")$Employee
mySVG <- toVenn(sas, python, rr)
showSVG(mySVG = mySVG, opacity = 0.1)
The next version will have a method to enter names separately (sorry about that).
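The three subset() calls above generalize to any number of tool columns. A sketch, with the data rebuilt from the question's dput output as a plain data.frame, that builds a named list holding one vector of employee IDs per tool, so the list names follow the column names. Whether a particular nVennR version accepts such a list directly (plotVenn() is mentioned as the replacement for toVenn()) is an assumption to check against the package vignette; the sketch stops at building the list.

```r
# The question's data, rebuilt as a plain data.frame
data <- data.frame(
  Employee = sprintf("A%03d", 1:18),
  SAS    = c("Y","N","Y","Y","Y","Y","N","Y","N","N","Y","Y","Y","Y","N","N","N","N"),
  Python = c("Y","Y","Y","Y","N","N","N","N","N","N","Y","Y","N","N","N","N","Y","Y"),
  R      = c("Y","Y","N","Y","N","Y","N","N","Y","Y","Y","Y","Y","Y","Y","Y","N","N"),
  stringsAsFactors = FALSE
)

# For each tool column, keep the employees marked "Y";
# lapply() over the columns returns a list named SAS, Python, R
sets <- lapply(data[-1], function(col) data$Employee[col == "Y"])
str(sets)
```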
Regarding labels, the short answer is that you can edit them yourself with an SVG editor, such as Inkscape. If you have it installed, you can open the figure in the editor by running showSVG(mySVG = mySVG, opacity = 0.1, systemShow=T). You can also save the figure by providing an output file (outFile) or just open the temporary file that is generated.
The somewhat longer answer is that name1, name2,... can be replaced with the names of the lists. Unfortunately, due to my limitations in R, I did not realise that this might not be straightforward. It would be easier to load each variable as a table and set the colNames. For instance,
sas <- as.table(subset(data, SAS == "Y")$Employee)
names(sas) <- 'SAS'
That label will be used at the legend. Regarding the small labels, currently there is no way for a user to change them. Those are meant to help read the location of specific regions, and when those regions are small it does not seem feasible to use longer labels. My advice would be to always use an external editor to change them. The future version will have at least the ability to remove those labels, like in the Web version.

Change origin for time series in r

I have a time series in R that I would like to work with, spanning from 01-01-52 to 01-01-88 (1952 to 1988), with 37 observations.
However, when I read it into R, the observations from 01-01-52 to 01-01-68 are interpreted as being in 2052 to 2068 rather than 1952 to 1968.
How do I force R to read in all the data as being from 1952 to 1988?
Link to my data: https://www.dropbox.com/s/93foyc238skt3xj/AgricIndus.csv?dl=0
This is the code I have used. What do I need to change to make it read the dates properly?
library(xts)  # provides xts()
agri <- read.table("AgricIndus.csv",
sep = ",", header = TRUE, skip = 0,
stringsAsFactors = FALSE)
agri$time <- as.Date(agri$time, "%m-%d-%y")
agri.xts <- xts(agri[, 2:3], order.by = agri$time)
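The 2052 dates come from how %y handles two-digit years: according to ?strptime, on input the values 00 to 68 get the prefix 20 and 69 to 99 the prefix 19. A sketch with a few illustrative strings in the question's mm-dd-yy layout shows the cutoff and one way around it, assuming every year in the file belongs to the 1900s.

```r
raw <- c("01-01-52", "01-01-68", "01-01-69", "01-01-88")

# With %y, 00-68 become 20xx and 69-99 become 19xx
naive <- as.Date(raw, format = "%m-%d-%y")
print(naive)

# Insert "19" before the trailing two-digit year, then parse with %Y
fixed <- as.Date(sub("([0-9]{2})$", "19\\1", raw), format = "%m-%d-%Y")
print(fixed)
```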
The raw values of agri$time look like this:
agri$time
# [1] "01-01-52" "01-01-53" "01-01-54" "01-01-55" "01-01-56" "01-01-57" "01-01-58" "01-01-59" "01-01-60" "01-01-61" "01-01-62" "01-01-63" "01-01-64" "01-01-65"
# [15] "01-01-66" "01-01-67" "01-01-68" "01-01-69" "01-01-70" "01-01-71" "01-01-72" "01-01-73" "01-01-74" "01-01-75" "01-01-76" "01-01-77" "01-01-78" "01-01-79"
# [29] "01-01-80" "01-01-81" "01-01-82" "01-01-83" "01-01-84" "01-01-85" "01-01-86" "01-01-87" "01-01-88"
One way (a hack) is to paste "19" in front of each two-digit year and parse with the four-digit %Y format:
agri$time <- as.Date(paste0(substring(agri$time,1,6), '19', substring(agri$time,7,8)),
"%m-%d-%Y")
If you can be sure that your time series is regular, then it is probably easiest to generate a regular date sequence like so:
agri$time <- seq.Date(as.Date("1952-01-01"), as.Date("1988-01-01"), by = "years")
Another easy solution, which would also work for irregular time series, is to read your data as years 52 to 88 with format = "%m-%d-%Y" (capital "Y"!) and add 1900 years:
df$time <- as.POSIXlt(as.Date(df$time,format = '%m-%d-%Y'))
df$time$year <-df$time$year + 1900
df$time <- as.Date(df$time)
df$time
[1] "1952-01-01" "1953-01-01" "1954-01-01" "1955-01-01"
[5] "1956-01-01" "1957-01-01" "1958-01-01" "1959-01-01"
[9] "1960-01-01" "1961-01-01" "1962-01-01" "1963-01-01"
[13] "1964-01-01" "1965-01-01" "1966-01-01" "1967-01-01"
[17] "1968-01-01" "1969-01-01" "1970-01-01" "1971-01-01"
[21] "1972-01-01" "1973-01-01" "1974-01-01" "1975-01-01"
[25] "1976-01-01" "1977-01-01" "1978-01-01" "1979-01-01"
[29] "1980-01-01" "1981-01-01" "1982-01-01" "1983-01-01"
[33] "1984-01-01" "1985-01-01" "1986-01-01" "1987-01-01"
[37] "1988-01-01"

Error with R dplyr left_join

So I've been trying to use left_join to add the columns of a new dataset to my main dataset (called employee).
I've double-checked the column names and the cleaning I've done, and nothing seems to work. Here is my code; I'd appreciate any help.
library(readr)   # read_csv()
library(dplyr)   # %>%, select(), left_join()
library(janitor) # clean_names()
library(stringr) # str_detect()
job_codes <- read_csv("Quest_UMMS_JobCodes.csv")
job_codes <- job_codes %>%
clean_names() %>%
select(job_code, pos_desc = pos_des_desc)
job_codes$is_nurse <- str_detect(tolower(job_codes$pos_desc), "nurse")
employee <- employee %>%
left_join(job_codes, by = "job_code")
The error I keep getting:
Error in eval(substitute(expr), envir, enclos) :
'job_code' column not found in rhs, cannot join
Here are the column names of both tables:
> names(job_codes)
[1] "job_code" "pos_desc" "is_nurse"
> names(employee)
[1] "REC_NUM" "ZIP" "STATE"
[4] "SEX" "EEO_CLASS" "BIRTH_YEAR"
[7] "EMP_STATUS" "PROCESS_LEVEL" "DEPARTMENT"
[10] "JOB_CODE" "UNION_CODE" "SUPERVISOR"
[13] "DATE_HIRED" "R_SHIFT" "SALARY_CLASS"
[16] "EXEMPT_EMP" "PAY_RATE" "ADJ_HIRE_DATE"
[19] "ANNIVERS_DATE" "TERM_DATE" "NBR_FTE"
[22] "PENSION_PLAN" "PAY_GRADE" "SCHEDULE"
[25] "OT_PLAN_CODE" "DECEASED" "POSITION"
[28] "WORK_SCHED" "SUPERVISOR_IND" "FTE_TOTAL"
[31] "PRO_RATE_TOTAL" "PRO_RATE_A_SAL" "NEW_HIRE_DATE"
[34] "COUNTY" "FST_DAY_WORKED" "date_hired"
[37] "date_hired_adj" "term_date" "employment_duration"
[40] "current" "age" "emp_duration_years"
[43] "DESCRIPTION.x" "PAY_STATUS.x" "DESCRIPTION.y"
[46] "PAY_STATUS.y"
Now that the OP has added the column names of both tables to the question, it is evident that the join columns are spelled differently (upper vs. lower case): JOB_CODE in employee and job_code in job_codes.
If the column names are different, help("left_join") suggests:
To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
So, in this case it should read:
employee <- employee %>% left_join(job_codes, by = c("JOB_CODE" = "job_code"))
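For readers who want to check the idea without dplyr, base R's merge() expresses the same left join through by.x, by.y and all.x = TRUE; unlike left_join(), merge() sorts the result by the key by default. The sketch uses two tiny hypothetical tables that only borrow the question's column names. It also shows why lower-casing all of employee's names is risky: that table already contains both DATE_HIRED and date_hired (and TERM_DATE and term_date).

```r
# Tiny hypothetical tables mirroring the question's column names
employee <- data.frame(REC_NUM = 1:3, JOB_CODE = c("N100", "A200", "X999"),
                       stringsAsFactors = FALSE)
job_codes <- data.frame(job_code = c("N100", "A200"),
                        pos_desc = c("Staff Nurse", "Accountant"),
                        stringsAsFactors = FALSE)
job_codes$is_nurse <- grepl("nurse", tolower(job_codes$pos_desc))

# Left join on differently named keys; all.x = TRUE keeps every employee,
# filling pos_desc and is_nurse with NA where no job code matches
joined <- merge(employee, job_codes, by.x = "JOB_CODE", by.y = "job_code",
                all.x = TRUE)
print(joined)

# Lower-casing every name would collide on pairs like DATE_HIRED / date_hired
anyDuplicated(tolower(c("DATE_HIRED", "date_hired")))  # 2
```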

Change scaling of data on the x-axis

I am plotting my data like this:
(dput(sale))
structure(c(-0.049668136, 0.023675638, -0.032249731, -0.071487224,
-0.034017265, -0.031278933, -0.052070721, -0.034305542, -0.019041209,
-0.050459175, -0.017315808, -0.012787003, -0.03341208, -0.045078144,
-0.036638132, -0.036533367, -0.012683656, -0.014388251, -0.006775188,
-0.037153807, -0.008941402, -0.011760677, -0.005077979, -0.041187417,
-0.001966554, -0.028822067, 0.021828558, 0.016208791, -0.026897492,
-0.032107207, -0.008496522, -0.028027096, -0.013746662, -0.004545603,
-0.005679941, -0.004614187, 0.004083014, -0.012624954, -0.016362079,
-0.006350167, -0.019551277), na.action = structure(42:45, class = "omit"))
[1] -0.049668136 0.023675638 -0.032249731 -0.071487224 -0.034017265
[6] -0.031278933 -0.052070721 -0.034305542 -0.019041209 -0.050459175
[11] -0.017315808 -0.012787003 -0.033412080 -0.045078144 -0.036638132
[16] -0.036533367 -0.012683656 -0.014388251 -0.006775188 -0.037153807
[21] -0.008941402 -0.011760677 -0.005077979 -0.041187417 -0.001966554
[26] -0.028822067 0.021828558 0.016208791 -0.026897492 -0.032107207
[31] -0.008496522 -0.028027096 -0.013746662 -0.004545603 -0.005679941
[36] -0.004614187 0.004083014 -0.012624954 -0.016362079 -0.006350167
[41] -0.019551277
attr(,"na.action")
[1] 42 43 44 45
attr(,"class")
[1] "omit"
(dput(purchase))
structure(c(0.042141187, 0.075875128, 0.090953485, 0.050951625,
0.082566915, 0.184396833, 0.136625887, 0.042725409, 0.135028692,
0.13201904, 0.093634104, 0.16776844, 0.13645719, 0.201365036,
0.227589832, 0.236473792, 0.269064385, 0.200981722, 0.144739536,
0.145256493, 0.040205545, 0.031577107, 0.014767345, 0.005843065,
0.034805051, 0.082493053, 0.010572227, 0.000645763, 0.033368236,
0.024326153, 0.038601182, 0.025446045, 0.000556418, 0.017201608,
0.008316872, 0.059722053, 0.059695415, 0.076940829, 0.067650014,
0.002029566, 0.008466334), na.action = structure(42:45, class = "omit"))
[1] 0.042141187 0.075875128 0.090953485 0.050951625 0.082566915 0.184396833
[7] 0.136625887 0.042725409 0.135028692 0.132019040 0.093634104 0.167768440
[13] 0.136457190 0.201365036 0.227589832 0.236473792 0.269064385 0.200981722
[19] 0.144739536 0.145256493 0.040205545 0.031577107 0.014767345 0.005843065
[25] 0.034805051 0.082493053 0.010572227 0.000645763 0.033368236 0.024326153
[31] 0.038601182 0.025446045 0.000556418 0.017201608 0.008316872 0.059722053
[37] 0.059695415 0.076940829 0.067650014 0.002029566 0.008466334
attr(,"na.action")
[1] 42 43 44 45
attr(,"class")
[1] "omit"
timeLine <- c(-20 , +20)
plot(sale,type="b", xlim=timeLine, ylim=c(-.1,.4) )
lines( purchase, type="b")
abline(v=0, col="black")
What's wrong with the resulting plot is the scaling. My graphs should start at -20 and go to +20, with each value -20, -19, -18, ..., +19, +20 as one point in the graph. My exported CSV sheet has a row with these values. My question is: how do I make the x values start at -20 and go up to +20 in integer steps? Is it also possible to display every integer from -20 to +20 on the axis?
I really appreciate your answer!
UPDATE
The scaling of the axis:
By default, when x is not specified in plot(), the values are plotted against their index (starting at 1). You have to create a vector for the x axis.
timeLine <- c(-20 , 20)
# this command generates a sequence from -20 to 20
timeSeq <- Reduce(seq, timeLine)
# now, this sequence is passed to `x`
plot(sale, x = timeSeq, type = "b", xlim = timeLine, ylim = c(-.1, .4) )
lines(purchase, x = timeSeq, type = "b")
abline(v = 0, col = "black")
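The index behaviour can be checked directly with grDevices::xy.coords(), the helper that plot.default() uses to work out point positions. A sketch with a short made-up series: without x the positions are 1 to n, and with an explicit x they follow it. For the 41 values of sale and purchase, -20:20 plays the role of x.

```r
y <- c(0.5, 0.2, -0.1, 0.3, 0.4)  # made-up values

# Without x, the x positions are the index 1, 2, ..., length(y)
grDevices::xy.coords(y)$x

# With an explicit x, the points sit at the supplied positions
grDevices::xy.coords(x = -2:2, y = y)$x
```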
Update: how to show all x axis labels?
You can show all x axis labels if you decrease their size (cex.axis) and increase the width of the plot. Here's an example.
png("plot.png", width = 1000)
plot(sale,type="b", x = timeSeq, xlim=timeLine, ylim=c(-.1,.4),
xaxt = "n")
lines( purchase, type="b", x = timeSeq)
abline(v=0, col="black")
axis(side = 1, at = timeSeq, cex.axis = 0.75)
dev.off()
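Rotating the labels is another way to fit all 41 of them, as an alternative to widening the plot: the graphical parameter las = 2 draws axis labels perpendicular to the axis. A sketch with placeholder data (not the question's series); pdf(NULL) only lets it run without a screen, and png() as in the answer works the same way.

```r
timeSeq <- -20:20
values <- sin(timeSeq / 4)  # placeholder data, not sale or purchase

pdf(NULL)  # null device so the sketch runs headless
plot(timeSeq, values, type = "b", xaxt = "n")
# las = 2: labels perpendicular to the axis, so each needs little width
axis(side = 1, at = timeSeq, las = 2, cex.axis = 0.75)
invisible(dev.off())
```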
