How to determine the zodiac symbol in R?

I did some statistics today using Excel and R. Part of the work requires determining the zodiac symbol from the date of birth. I did this in Excel:
Sample data:
DOB
25-Jan-1985
25-Jul-1983
28-Aug-1982
24-Feb-1984
13-Jan-1985
24-Jan-1982
15-Feb-1984
14-Oct-1983
08-Sep-1984
04-Mar-1983
04-Apr-1984
31-Mar-1985
04-Aug-1984
29-Jan-1984
20-Jul-1984
...
Here is the rule to determine zodiac symbols:
Dec. 22 - Jan. 19 - Capricorn
Jan. 20 - Feb. 17 - Aquarius
Feb. 18 - Mar. 19 - Pisces
March 20 - April 19 - Aries
April 20 - May 19 - Taurus
May 20 - June 20 - Gemini
June 21 - July 21 - Cancer
July 22 - Aug. 22 - Leo
Aug 23 - Sept. 21 - Virgo
Sept. 22 - Oct. 22 - Libra
Oct. 23 - Nov. 21 - Scorpio
Nov. 22 - Dec. 21 - Sagittarius
Excel formula:
=LOOKUP(--TEXT(A2,"mdd"),{101,"Capricorn";120,"Aquarius";219,"Pisces";321,"Aries";420,"Taurus";521,"Gemini";621,"Cancer";723,"Leo";823,"Virgo";923,"Libran";1023,"Scorpio";1122,"Sagittarius";1222,"Capricorn"})
How can I do this in R? Also, data.table is great; is it possible to do it with data.table?

Assuming your data is called dat.
First, convert it to a date format:
dat$DOB <- as.Date(dat$DOB, format = "%d-%b-%Y")
Then use the Zodiac function from DescTools, as mentioned by @Josh in the comments:
library(DescTools)
dat$Zodiac <- Zodiac(dat$DOB)
dat
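If adding DescTools is not an option, the rule table from the question can also be applied directly. The following is a minimal base R sketch, not part of the original answer: zodiac_breaks, zodiac_labels and zodiac_sign are names made up for illustration, and the cut-offs are taken from the rule table in the question, encoded as month * 100 + day. ISO date strings are used here so that the example does not depend on the locale's month abbreviations.

```r
# Lower bound of each sign as month * 100 + day, following the question's rule table.
# Capricorn appears twice because it spans the turn of the year.
zodiac_breaks <- c(101, 120, 218, 320, 420, 520, 621, 722, 823, 922, 1023, 1122, 1222)
zodiac_labels <- c("Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
                   "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius",
                   "Capricorn")

# Hypothetical helper: map a Date vector to zodiac sign names.
zodiac_sign <- function(dob) {
  key <- as.integer(format(dob, "%m%d"))          # e.g. 25 January -> 125
  zodiac_labels[findInterval(key, zodiac_breaks)] # index of the last break <= key
}

dob <- as.Date(c("1985-01-25", "1983-07-25", "1982-08-28", "1984-02-24", "1985-01-13"))
zodiac_sign(dob)
```

With a data.table, the same helper could presumably be applied by reference, e.g. dat[, Zodiac := zodiac_sign(DOB)], assuming DOB has already been converted with as.Date.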

Related

Reading a date, time text file and converting to string using strptime()?

I have a text file of many rows containing date and time and the end goal is for me to group together the number of rows per week that their date values are in. This is so that I can plot a scatter diagram with x values being the week number and y values being the frequency. For example the text file (dates.txt):
Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013
So from here, week 1 will have a frequency of 6 and week 2 will have a frequency of 1
As I want to plot a scatter diagram for this, I want to convert them to text values first using strptime() with the format %a %b.
My attempt so far has been:
time_stamp <- strptime(time_stamp, format='%a.%b')
However, it reports that the input string is too long. I'm very new to RStudio, so could somebody please help me figure this out?
Thank you
Example of final output graph : https://imgur.com/a/3o3DivA
You could use readLines() to avoid the data frame, then read time using strptime, and finally strftime to format the output.
strftime(strptime(readLines('dates.txt'), '%c'), '%a.%b')
# [1] "Sat.May" "Sat.May" "Mon.May" "Tue.May" "Thu.May" "Thu.May" "Fri.May" "Fri.May" "Sat.May"
Edit
So it appears that your dates contain a time zone abbreviation, e.g. "Mon Apr 06 23:49:29 PDT 2009". Since it is the same for all dates, we can specify it literally in the pattern.
We use '%d_%m' in strftime to get the day and month as numbers separated by _, which we then split with strsplit and convert to numerics with type.convert.
Finally we unlist, create a matrix that we fill byrow, and plot it.
strptime(readLines('timestamp.txt'), '%a %b %d %H:%M:%S PDT %Y') |>
  strftime('%d_%m') |>
  strsplit('_') |>
  type.convert(as.is=TRUE) |>
  unlist() |>
  matrix(ncol=2, byrow=TRUE) |>
  plot(pch=20, col=4, main='My Plot', xlab='day', ylab='month')
Note: Please use R>=4.1 for the |> pipes.
You need to first read (or assign) the data, parse it to a date type and then use that to e.g. get the number of the week.
Here is one example
text <- "Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013"
data <- read.table(text=text, sep='\n', col.names="dates")
data$parse <- anytime::anytime(data$dates)
data$week <- as.integer(format(data$parse, "%V"))
data
The result is a new data.frame object:
> data
dates parse week
1 Mon May 11 22:51:27 2013 2013-05-11 22:51:27 19
2 Mon May 11 22:58:34 2013 2013-05-11 22:58:34 19
3 Wed May 13 23:15:27 2013 2013-05-13 23:15:27 20
4 Thu May 14 04:11:22 2013 2013-05-14 04:11:22 20
5 Sat May 16 19:46:55 2013 2013-05-16 19:46:55 20
6 Sat May 16 22:29:54 2013 2013-05-16 22:29:54 20
7 Sun May 17 02:08:45 2013 2013-05-17 02:08:45 20
8 Sun May 17 23:55:15 2013 2013-05-17 23:55:15 20
9 Mon May 18 00:42:07 2013 2013-05-18 00:42:07 20
>
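To get from the week column to the frequencies the question asks for, the rows can be counted per week. The following is a base R sketch under a few assumptions: the names stamps, parts, dates and week_counts are illustrative, weeks are ISO 8601 weeks ("%V") as in the answer above, and the dates are assembled by hand so that parsing does not depend on the locale's month names.

```r
stamps <- c(
  "Mon May 11 22:51:27 2013", "Mon May 11 22:58:34 2013",
  "Wed May 13 23:15:27 2013", "Thu May 14 04:11:22 2013",
  "Sat May 16 19:46:55 2013", "Sat May 16 22:29:54 2013",
  "Sun May 17 02:08:45 2013", "Sun May 17 23:55:15 2013",
  "Mon May 18 00:42:07 2013"
)

# Split each stamp into five fields: weekday, month, day, time, year.
parts <- do.call(rbind, strsplit(stamps, " ", fixed = TRUE))

# Build ISO dates (yyyy-mm-dd) from the year, month name and day fields.
dates <- as.Date(sprintf("%s-%02d-%02d",
                         parts[, 5],
                         match(parts[, 2], month.abb),
                         as.integer(parts[, 3])))

# ISO week number of each date, then the number of rows per week.
week <- as.integer(format(dates, "%V"))
week_counts <- table(week)
week_counts
```

The counts can then be plotted as points, for example with plot(as.integer(names(week_counts)), as.vector(week_counts), xlab = "week", ylab = "frequency").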

Find the weeks of a month, with the start date and end date of each, using moment.js

let currentDate = moment();
let weekStart = currentDate.clone().startOf('week');
let weekEnd = currentDate.clone().endOf('week');
I want to know the start date and end date of every week in a given month.
Expected output for August, as an array of objects:
1. 1 Aug 2021 - 7 Aug 2021
2. 8 Aug 2021 - 14 Aug 2021
3. 15 Aug 2021 - 21 Aug 2021
4. 22 Aug 2021 - 28 Aug 2021
5. 29 Aug 2021 - 31 Aug 2021
moment().startOf('week');
moment().endOf('week');
Reference:
https://www.itsolutionstuff.com/post/moment-js-get-current-week-start-and-end-date-exampleexample.html

Converting Month character to date for time series without "0" before Month

How do I convert this data set into a time series format in R? Let's call the data set Bob. This is what it looks like:
1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24
Are you looking for something like this....?
> dat <- read.table(text = "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24
", header=FALSE) # your data
> ts(dat$V2, start=c(2013, 1), frequency = 12) # time series object
Jan Feb Mar Apr May Jun
2013 25 865 26 33 74 24
Assuming that your starting point is the data frame DF, defined reproducibly in the Note at the end, this converts it to a zoo series z as well as a ts series tt.
library(zoo)
z <- read.zoo(DF, FUN = as.yearmon, format = "%m/%Y")
tt <- as.ts(z)
z
## Jan 2013 Feb 2013 Mar 2013 Apr 2013 May 2013 Jun 2013
## 25 865 26 33 74 24
tt
## Jan Feb Mar Apr May Jun
## 2013 25 865 26 33 74 24
Note
Lines <- "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24"
DF <- read.table(text = Lines)
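If real Date values are wanted rather than a zoo index, the month strings can also be parsed in base R. This is a sketch, not part of the answers above: the column names month and value are assumptions made for illustration. Prepending a day ("1/") gives as.Date a complete date to parse, and %m accepts months written without a leading zero on input.

```r
Lines <- "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24"
DF <- read.table(text = Lines, col.names = c("month", "value"))

# "1/2013" -> "1/1/2013" -> first day of that month as a Date.
DF$date <- as.Date(paste0("1/", DF$month), format = "%d/%m/%Y")

# Take the start year and month from the first date instead of hard-coding them.
start_year  <- as.integer(format(DF$date[1], "%Y"))
start_month <- as.integer(format(DF$date[1], "%m"))
tt <- ts(DF$value, start = c(start_year, start_month), frequency = 12)
tt
```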

What is wrong with my R code for transforming a wide data frame into the long format?

I am running the following R code in RStudio with the aim of converting a wide data frame (called 'merged') into a long one.
> merged
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2017 (A) 5980 5341 5890 5596 5753 5470 5589 5545 5749 5938 5844 5356
2017 (P) 5762 5275 5733 5411 5406 4954 5464 5536 5805 5819 5903 5630
I'm after the following output:
Description Month RN
2017 (A) Jan 5980
2017 (P) Jan 5762
2017 (A) Feb 5341
2017 (P) Feb 5275
... ... ...
I have tried the following (but with no success):
library(reshape2)
merged_long <- melt(data=merged,
id.vars="Description",
variable.name="Month",
value.name="RN")
I'm getting the following error message:
Error: id variables not found in data: Description
What am I doing wrong?
As noted by @Sotos in the comments, the row names of the merged data set are needed to uniquely identify an observation in the melted data set. To include the row names in the melted data set, add the following to your code:
merged$Description <- rownames(merged)
Then your original code should produce the expected result.
library(reshape2)
merged_long <- melt(data=merged,
id.vars="Description",
variable.name="Month",
value.name="RN")
It's easiest to just use melt(as.matrix(...)) given the nature of your data. Omit the as.matrix part if your data is already a matrix, obviously.
melt(as.matrix(mydf))
You can use setNames to rename the columns at the same time:
setNames(melt(as.matrix(mydf)), c("Description", "Month", "RN"))
# Description Month RN
# 1 2017 (A) Jan 5980
# 2 2017 (P) Jan 5762
# 3 2017 (A) Feb 5341
# .........................
# .........................
# 23 2017 (A) Dec 5356
# 24 2017 (P) Dec 5630
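The same reshaping can be sketched without reshape2, using only base R. This is an illustrative alternative rather than what the answers above use: it rebuilds the first three months of merged from the question and relies on as.table() treating the row and column names of the matrix as the two dimensions.

```r
# First three months of the question's data, with the descriptions as row names.
merged <- data.frame(Jan = c(5980, 5762),
                     Feb = c(5341, 5275),
                     Mar = c(5890, 5733),
                     row.names = c("2017 (A)", "2017 (P)"))

# matrix -> table -> long data frame with one row per (row name, column name) pair.
long <- as.data.frame(as.table(as.matrix(merged)), stringsAsFactors = FALSE)
names(long) <- c("Description", "Month", "RN")
long
```

The rows come out month by month, with both descriptions for Jan first, which matches the layout requested in the question.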

Number of days per month between two dates

diff(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by="month"))
Time differences in days
[1] 31 31 28
The above code gives the number of days in the months Dec, Jan and Feb.
However, my requirement is as follows:
#Results that I need
#monthly days from date 2016-12-21 to 2017-04-05
11, 31, 28, 31, 5
#i.e 11 days of Dec, 31 of Jan, 28 of Feb, 31 of Mar and 5 days of Apr.
I even tried days_in_month from lubridate, but was not able to get the result I need:
library(lubridate)
days_in_month(c(as.Date("2016-12-21"), as.Date("2017-04-05")))
Dec Apr
31 30
Try this:
x <- rle(format(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by=1), '%b'))
setNames(x$lengths, x$values)
# Dec Jan Feb Mar Apr
# 11 31 28 31 5
Although we have seen a clever replacement of table by rle and a pure table solution, I want to add two approaches using grouping. All approaches have in common that they create a sequence of days between the two given dates and aggregate by month but in different ways.
aggregate()
This one uses base R:
# create sequence of days
days <- seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by = 1)
# aggregate by month
aggregate(days, list(month = format(days, "%b")), length)
# month x
#1 Apr 5
#2 Dez 11
#3 Feb 28
#4 Jan 31
#5 Mrz 31
Unfortunately, the months are ordered alphabetically, as happened with the simple table() approach (the abbreviations Dez and Mrz come from a German locale). In these situations, I prefer the ISO 8601 way of unambiguously naming the months:
aggregate(days, list(month = format(days, "%Y-%m")), length)
# month x
#1 2016-12 11
#2 2017-01 31
#3 2017-02 28
#4 2017-03 31
#5 2017-04 5
data.table
Now that I've got used to the data.table syntax, this is my preferred approach:
library(data.table)
data.table(days)[, .N, .(month = format(days, "%b"))]
# month N
#1: Dez 11
#2: Jan 31
#3: Feb 28
#4: Mrz 31
#5: Apr 5
The order of months is kept as they have appeared in the input vector.
