How to use third, fourth etc. decimal values in a calculator - r

I want to calculate the Body Mass Index for four persons, store the results as an object, and append it to a data frame. That works so far, but I can only do it with the results rounded:
Body Mass Index <- round(as.numeric(data[, 4],
digits=2)/(as.numeric(data[, 3])/100)^2)
Body Mass Index
data <- as.data.frame(cbind(data, Body Mass Index))
data
How can I do the same but round the resulting values to the third or fourth decimal place? I know it has something to do with round, and I did try changing it, but the other options do not work...

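The digits argument of round sets the number of decimal places the values are rounded to. In the question's code, digits=2 sits inside as.numeric() rather than round(), so it is ignored and round() falls back to its default of 0 decimal places. Be careful as well, because certain objects or functions round the data only for printing. Here is an example.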
Libraries
library(tidyverse)
Example Data
df <-
#Random data
tibble(
height = rnorm(10,1.7,sd = .2),
mass = rnorm(10,70)
) %>%
mutate(
# Calculating index
bmi_index = mass/(height^2),
bmi_index_round4 = round(bmi_index,4),
bmi_index_round3 = round(bmi_index,3),
bmi_index_round2 = round(bmi_index,2)
)
Output
> df$bmi_index
[1] 21.69752 33.96611 19.23116 23.32622 20.93689 26.45366 23.50817 33.12299 27.41969 22.49677
> df$bmi_index_round4
[1] 21.6975 33.9661 19.2312 23.3262 20.9369 26.4537 23.5082 33.1230 27.4197 22.4968
> df$bmi_index_round3
[1] 21.698 33.966 19.231 23.326 20.937 26.454 23.508 33.123 27.420 22.497
> df$bmi_index_round2
[1] 21.70 33.97 19.23 23.33 20.94 26.45 23.51 33.12 27.42 22.50
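Applied to the layout in the question, the fix is to move digits out of as.numeric() and into round(). The following is a minimal base R sketch under stated assumptions: the data frame, its column names and its values are hypothetical, with height in cm in column 3 and weight in kg in column 4, as the question's formula implies.

```r
# Hypothetical data: column 3 = height in cm, column 4 = weight in kg
data <- data.frame(
  id        = 1:4,
  name      = c("A", "B", "C", "D"),
  height_cm = c(170, 182, 165, 158),
  weight_kg = c(65, 90, 54, 70)
)

# Unrounded BMI: weight / (height in metres)^2
bmi <- as.numeric(data[, 4]) / (as.numeric(data[, 3]) / 100)^2

# digits belongs to round(), not to as.numeric()
data$bmi_round3 <- round(bmi, digits = 3)
data$bmi_round4 <- round(bmi, digits = 4)

# Raise the print precision so the stored decimals are visible
print(data, digits = 7)
```

The values stored in the data frame keep the requested number of decimals. The digits argument of print only affects how many significant digits are displayed.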

Related

Recursive / Expanding Window forecasts

I am having a small issue with my R code and will try to reproduce it here, although unfortunately I have no easy data to share. It concerns the forecast package. What I am looking for is somewhat simpler than what is in the manual, but I have not been able to work it out.
The issue is with an expanding window forecast. I have a dependent variable Y and 3 regressors (X), and I am trying to build a recursive one-step-ahead forecast for each X.
Here is my code.
library(forecast)
library(zoo)
library(timeDate)
library(xts)
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by=tt)
##########################################################################
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
print(fcast)
# print(fcast)
# newfcast[k+window+1,]=fcast
}
print(newfcast)
return(newfcast)
}
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
{
try({
X=data[,i]
Y=data[,1]
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
StoreMatrix=cbind(StoreMatrix,RecModel)
print(StoreMatrix)
}, silent=T)
}
The lines commented out with # were different ways I tried to cross-check my data and may not be useful. I have tried many things but cannot get my head around it. In the end I want a matrix (StoreMatrix) whose first column is the realization and whose remaining columns hold the corresponding one-step-ahead forecasts.
The main lines where there seems to be an issue are these ones:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Not sure how to solve this. Thank you very much.
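The expanding-window procedure described above can be sketched in base R. This is a sketch on simulated data, not a fix verified against the original dataset. It swaps forecast.lm for lm() plus predict(), and the names rec_forecast, dat, y and x are hypothetical. One likely source of trouble in the original code is the formula sample[,1]~sample[,2]: predict() matches new data by variable name, so the model is easier to forecast from when it is fitted with named columns.

```r
set.seed(1)

# Simulated stand-in for one (Y, X) pair; the real data would replace this
n   <- 100
x   <- rnorm(n)
y   <- 0.5 * x + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, x = x)

# Expanding window: fit on rows 1..k, forecast row k + 1, then grow k by one
rec_forecast <- function(dat, window) {
  fc <- rep(NA_real_, nrow(dat))
  for (k in window:(nrow(dat) - 1)) {
    fit <- lm(y ~ x, data = dat[1:k, ])
    fc[k + 1] <- predict(fit, newdata = dat[k + 1, , drop = FALSE])
  }
  fc
}

fc <- rec_forecast(dat, window = 60)

# Realization in the first column, one-step-ahead forecast next to it
store <- cbind(realized = dat$y, forecast_x = fc)
tail(store)
```

The first 60 forecasts are NA because no model has been estimated for those rows. Calling rec_forecast once per regressor and binding the results with cbind gives the matrix layout the question asks for.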

Convert DMS coordinates to decimal degrees in R

I have the following coordinates in DMS format. I need to convert them to decimal degrees.
# Libraries
> library(sp)
> library(magrittr)
# Latitude & Longitude as strings
> lat <- '21d11m24.32s'
> lng <- '104d38m26.88s'
I tried:
> lat_d <- char2dms(lat, chd='d', chm='m', chs='s') %>% as.numeric()
> lng_d <- char2dms(lng, chd='d', chm='m', chs='s') %>% as.numeric()
> print(c(lat_d, lng_d))
[1] 21.18333 104.63333
Although close, this result is different from the output I get from this website. According to this site, the correct output should be:
Latitude: 21.190089
Longitude: 104.6408
It seems that sp::char2dms and as.numeric are rounding the coordinates. I noticed this issue when converting a large batch of DMS coordinates using this method because the number of unique values decreases drastically after the conversion.
You are right! To tell you the truth, I didn't notice this problem.
To get around this, here is a solution with the use of the package measurements:
REPREX:
install.packages("measurements")
library(measurements)
lat <- conv_unit('21 11 24.32', from = "deg_min_sec", to = "dec_deg")
long <- conv_unit('104 38 26.88' , from = "deg_min_sec", to = "dec_deg")
print(c(lat, long))
#> [1] "21.1900888888889" "104.6408"
Created on 2021-10-07 by the reprex package (v2.0.1)
Edit from OP
This can also be solved by adding 'N' or 'S' to latitude and 'E' or 'W' to longitude.
# Add character to lat & long strings
> lat_d <- char2dms(paste0(lat,'N'), chd='d', chm='m', chs='s') %>% as.numeric()
> lng_d <- char2dms(paste0(lng,'W'), chd='d', chm='m', chs='s') %>% as.numeric()
> print(c(lat_d, lng_d))
[1] 21.19009 -104.64080
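For reference, the conversion itself is degrees + minutes/60 + seconds/3600, so it can also be done without any package. The helper below is a minimal sketch: dms_to_decimal is a hypothetical name, and it assumes unsigned strings in exactly the 'DdMmSs' form used in the question, with no hemisphere letter.

```r
# Hypothetical helper: parse strings like '21d11m24.32s' into decimal degrees
dms_to_decimal <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]+)d([0-9]+)m([0-9.]+)s$", x))
  vapply(parts, function(p) {
    # p[1] is the full match; p[2], p[3], p[4] are degrees, minutes, seconds
    as.numeric(p[2]) + as.numeric(p[3]) / 60 + as.numeric(p[4]) / 3600
  }, numeric(1))
}

res <- dms_to_decimal(c("21d11m24.32s", "104d38m26.88s"))
print(res, digits = 10)
```

Strings that do not match the pattern come back as NA. Note also that conv_unit in the answer above returns character values, as its printed output shows, so wrap the result in as.numeric() before doing arithmetic on it.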

SMA using R & TTR Package

Afternoon! I'm just starting out with R and learning about data frames, packages, etc... read a lot of the messages here but couldn't find an answer.
I have a table I'm accessing with R that has the following fields:
[Symbol],[Date],[Open],[High],[Low],[Close],[Volume]
And, I'm calculating SMAs on the close prices:
sqlQuery <- "Select * from [dbo].[Stock_Data]"
conn <- odbcDriverConnect(connectionString)
dfSMA <- sqlQuery(conn, sqlQuery)
sma20 <- SMA(dfSMA$Close, n = 20)
dfSMA["SMA20"] <- sma20
When I look at the output, it appears to be calculating the SMA without any regard for what the symbol is. I haven't tried to replicate the calculation, but I would suspect it's just doing it by 20 moving rows, regardless of date/symbol.
How do I restrict the calculation to a given symbol?
Any help is appreciated - just need to be pointed in the right direction.
Thanks
You're far more likely to get answers if you provide reproducible examples. First, let's replicate your data:
library(quantmod)
library(stringr) # provides str_replace(), used below
symbols <- c("GS", "MS")
getSymbols(symbols)
# Create example data:
dGS <- data.frame("Symbol" = "GS", "Date" = index(GS), coredata(OHLCV(GS)))
names(dGS) <- str_replace(names(dGS), "GS\\.", "")
dMS <- data.frame("Symbol" = "MS", "Date" = index(MS), coredata(OHLCV(MS)))
names(dMS) <- str_replace(names(dMS), "MS\\.", "")
dfSMA <- rbind(dGS, dMS)
> head(dfSMA)
Symbol Date Open High Low Close Volume Adjusted
1 GS 2007-01-03 200.60 203.32 197.82 200.72 6494900 178.6391
2 GS 2007-01-04 200.22 200.67 198.07 198.85 6460200 176.9748
3 GS 2007-01-05 198.43 200.00 197.90 199.05 5892900 177.1528
4 GS 2007-01-08 199.05 203.95 198.10 203.73 7851000 181.3180
5 GS 2007-01-09 203.54 204.90 202.00 204.08 7147100 181.6295
6 GS 2007-01-10 203.40 208.44 201.50 208.11 8025700 185.2161
What you want to do is subset your long data object, and then apply technical indicators to each symbol in isolation. Here is one approach to guide you toward achieving your desired result.
You could do this using a list, and build the indicators on xts data objects for each symbol, not on a data.frame as in your example. (You can apply the TTR functions to columns of a data.frame, but it is ugly; working with xts objects is much better.) This is a template for how you could do it. The final output l.data should be intuitive to work with. Keep each symbol in a separate "container" (an element of the list) rather than combining all the symbols in one data.frame, which isn't easy to work with.
make_xts_from_long_df <- function(x) {
# Subset the symbol you desire
res <- dfSMA[dfSMA$Symbol == x, ]
#Create xts, then allow easy merge of technical indicators
x_res <- xts(OHLCV(res), order.by = res$Date)
merge(x_res, SMA(Cl(x_res), n = 20))
}
l.data <- setNames(lapply(symbols, make_xts_from_long_df), symbols)
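If the result has to stay in the long data.frame, as in the question, the same per-symbol idea can be sketched with base R's ave(), which applies a function within each group and returns the values in the original row order. This is a sketch under stated assumptions: the toy data and the helper sma are hypothetical, and sma uses stats::filter as a stand-in so the example runs without TTR. With TTR loaded, SMA(x, n) could be passed as FUN instead.

```r
# Hypothetical stand-in for TTR::SMA: trailing mean over n rows, NA until n rows exist
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# Toy long-format data with two symbols, already sorted by date within symbol
df <- data.frame(
  Symbol = rep(c("GS", "MS"), each = 5),
  Close  = as.numeric(c(1:5, 11:15))
)

# Compute the moving average separately within each symbol
df$SMA3 <- ave(df$Close, df$Symbol, FUN = function(x) sma(x, 3))
df
```

The first n - 1 rows of every symbol are NA, which confirms that the window never reaches across a symbol boundary. The sketch assumes each symbol has at least n rows and that the rows are in date order.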

How to write rownames into a spreadsheet with the googlesheets package in R?

I would like to write a data frame to a Google spreadsheet with the googlesheets package, but the row names are not written to the first column.
My data frame looks like this:
> str(stats)
'data.frame': 4 obs. of 2 variables:
$ Offensive: num 194.7 87 62.3 10.6
$ Defensive: num 396.28 51.87 19.55 9.19
> stats
Offensive Defensive
Annualized Return 194.784261 396.278385
Annualized Standard Deviation 87.04125 51.872826
Worst Drawdown 22.26618 9.546208
Annualized Sharpe Ratio (Rf=0%) 1.61126 0.9193734
I load the library as recommended in the documentation, create a spreadsheet and a worksheet, then write the data with the gs_edit_cells command:
> install.packages("googlesheets")
> library("googlesheets")
> suppressPackageStartupMessages(library("dplyr"))
> mySpreadsheet <- gs_new("mySpreadsheet")
> mySpreadsheet <- mySpreadsheet %>% gs_ws_new("Stats")
> mySpreadsheet <- mySpreadsheet %>% gs_edit_cells(ws = "Stats", input = stats, trim = TRUE)
Everything goes well, but googlesheets doesn't create a column with the row names. Only two columns are created with their data (Offensive and Defensive).
I have tried converting the data frame to a matrix, but the result is the same.
Any idea how I could achieve this?
Thank you
It doesn't look like there is a row names argument for gs_edit_cells(). If you just want the row names to show up in the first column of the sheet, you could try:
stats$Rnames = rownames(stats) ## add a column equal to the row names
stats <- stats[, c("Rnames", "Offensive", "Defensive")] ## reorder so the names come first (assign the result back)
# names(stats) = c("", "Offensive", "Defensive") ## optional, if you want the names column to have no header
From here, just pass stats to the functions from the googlesheets package as you did before.
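The row-name step can be checked locally before anything is sent to Google. The sketch below is self-contained base R: the values are illustrative numbers rounded from the question's printout, and the column name Metric and the object name out are hypothetical.

```r
# Illustrative data frame with the same shape as the one in the question
stats <- data.frame(
  Offensive = c(194.78, 87.04, 22.27, 1.61),
  Defensive = c(396.28, 51.87, 9.55, 0.92),
  row.names = c("Annualized Return", "Annualized Standard Deviation",
                "Worst Drawdown", "Annualized Sharpe Ratio (Rf=0%)")
)

# Put the row names into a real first column, then drop the row names
out <- cbind(Metric = rownames(stats), stats)
rownames(out) <- NULL
out
```

Because the labels are now an ordinary column, any writer that ignores row names will still carry them along.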

Scrape number of articles on a topic per year from NYT and WSJ?

I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
NYT WSJ
2011 2 3
2012 10 7
I found this tutorial for the NYT, but it is not working for me :_(. When I get to line 30 I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
Thanks!
PS: This is my code that is not working (A NYT api key is needed http://developer.nytimes.com/apps/register)
# Need to install from source http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[1], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F") # get them
res <- fromJSON(raw.data) # tokenize
dat <- unlist(res$results) # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
# concatenate URL for each offset
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[i], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F")
res <- fromJSON(raw.data)
dat <- append(dat, unlist(res$results)) # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot (freqs, type="l", xaxt="n", main=paste("Search term(s):",q), ylab="# of articles", xlab="date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f=.2), col = 2)
UPDATE: the repo is now at https://github.com/rOpenGov/rtimes
There is a RNYTimes package created by Duncan Temple-Lang https://github.com/omegahat/RNYTimes - but it is outdated because the NYTimes API is on v2 now. I've been working on one for political endpoints only, but not relevant for you.
I'm rewiring RNYTimes right now... Install it from GitHub. You need to install devtools first to get install_github:
install.packages("devtools")
library(devtools)
install_github("rOpenGov/RNYTimes")
Then try your search with that, e.g.,
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you number of articles found
moocs$response$meta$hits
[1] 121
You could get word counts for each article by
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
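Once the publication dates are in hand, the per-year counts the question asks for take only a few lines. The sketch below is base R on a hypothetical vector of dates in the %Y%m%d format the question's code expects; it does not call any API. The emptiness check is there because one plausible cause of the provideDimnames error, and this is an assumption, is that dat came back empty from the request.

```r
# Hypothetical publication dates, in the format used by the question's code
dat <- c("20110315", "20111102", "20120120", "20120607", "20121230")

# Guard against an empty result before tabulating
if (length(dat) == 0) stop("no dates returned; check the API response")

# Extract the year and count articles per year
years <- format(as.Date(dat, format = "%Y%m%d"), "%Y")
cts   <- as.data.frame(table(year = years), responseName = "NYT")
cts
```

Building a second table of the same form for the WSJ and combining the two with merge(..., by = "year", all = TRUE) would give the two-column layout shown in the question.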

Resources