Dynamically specify column name in spread() - r

I am attempting to automate a simple process of importing some data and using the spread function from the tidyr package to make it wide format data.
Below is a simplified example:
Ticker <- c(rep("GOOG",5), rep("AAPL",5))
Prices <- rnorm(10, 95, 5)
Date <- rep(as.Date(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05")), 2) # as.Date is vectorized; wrapping it in sapply would collapse the dates to numeric
exStockData <- data.frame(Ticker, Date, Prices)
After reading in a data frame like exStockData, I'd like to be able to create a data frame like the one below
library(tidyr)
#this is the data frame I'd like to be able to create
desiredDataFrame <- spread(exStockData, Ticker, Prices)
However, the column used for the key argument of the spread function will not always be called Ticker and the column used for the value argument of the function will not always be called Prices. The column names are read in from a different portion of the file that gets imported.
#these vectors are removed because, given the way my text
#file is read in, I don't actually have them
rm(Ticker, Prices, Date)
#the name of the first column (which serves as the key in
#the spread function) of the exStockData data frame will
#vary, and is read in from the file and stored as a one
#element character vector
secID <- "Ticker"
#the name of the last column in the data frame
#(which serves as the value in the spread function)
#is also stored as a one element character vector
fields <- "Prices"
#I'd like to be able to dynamically specify the column
#names using these other character vectors
givesAnError <- spread(exStockData, get(secID), get(fields))

The "See also" section of the documentation for the spread function mentions spread_, the standard-evaluation variant intended for exactly this situation: it takes the key and value column names as strings.
In this case the solution is to use:
solved <- spread_(exStockData, secID, fields)
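Note that in tidyr 1.0 and later, spread() and its underscore variant are superseded by pivot_wider(). A sketch of the same dynamic reshaping with pivot_wider(), whose names_from/values_from arguments accept column names stored in character vectors via the tidyselect helper all_of():

```r
library(tidyr)

# Rebuilding the example data from the question
exStockData <- data.frame(
  Ticker = c(rep("GOOG", 5), rep("AAPL", 5)),
  Date   = rep(as.Date("2015-01-01") + 0:4, 2),
  Prices = rnorm(10, 95, 5)
)
secID  <- "Ticker"  # key column name, read in at run time
fields <- "Prices"  # value column name, read in at run time

# all_of() turns the character vectors into column selections
desiredDataFrame <- pivot_wider(
  exStockData,
  names_from  = all_of(secID),
  values_from = all_of(fields)
)
```

A side benefit of all_of() is that it errors if the named column is missing, so a typo in secID or fields fails loudly rather than silently.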

Related

Data extraction in R - multiple columns

Hello, I have a table consisting of a single row and several columns. I have tried some code to extract my KD_PL parameters without success. Do you know a way in R to extract all the KD_PL columns and store them in a vector or data frame?
I tried this:
KDPL <- select("KD_PL.", which(substr(colnames(max_LnData), start=1, stop=6)))
This should do the trick:
library(tidyverse)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
This selects all columns of your old data set whose names start with "KD_PL." and stores them in a new data frame, KDPL.
If you only want the names of the columns to be saved, you could use the following:
KDPL_names <- colnames(KDPL)
This saves the column names in the vector KDPL_names.
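As a self-contained check, here is the same starts_with() selection run against a small mock table (the column names and values are hypothetical stand-ins for the asker's data):

```r
library(dplyr)

# Hypothetical one-row table standing in for max_LnData
max_LnData <- data.frame(KD_PL.1 = 0.12, KD_PL.2 = 0.34, LnQ = 99)

# starts_with() matches a literal prefix (the "." is not a regex here)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
KDPL_names <- colnames(KDPL)
```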

clean way to download multiple time series from Bloomberg in R

I am trying to download some time series data about euro swaps (EUSA10 Curncy, for example) in R using the blpapi, but I am encountering the following problems:
If I try to download, for example, 2y, 5y, 10y and 30y swap rates using the include.non.trading.days=FALSE option, the resulting time series are for some reason of different lengths and I receive an error message about it. If, on the other hand, I set the non-trading-day option to TRUE, I get series of similar length that can then be cleaned up using the na.omit() function.
The format in which the data is downloaded is messy... I would like a data frame in which the first column is the date, the second column is the first security, the third column is the second security, and so forth. Instead what I get is [date][security1][date][security2]......[date][securityN]. Any suggestions on how to solve this?
Below a quick few lines i wrote as an example
# Load package
library(Rblpapi)
# Connect to Bloomberg
blpConnect()
# Declaring securities
sec<-c("eusa2 curncy", "eusa5 curncy", "eusa10 curncy")
# Declaring field to be downloaded
flds<-"PX_LAST"
data<-as.data.frame(bdh(sec,flds,start.date=as.Date("2019-08-18"),end.date=as.Date("2020-08-18"), include.non.trading.days=TRUE))
The Rblpapi manual states that Rblpapi::bdh returns:
A list with as a many entries as there are entries in securities; each list contains a data.frame with
one row per observations and as many columns as entries in fields. If the list is of length one, it
is collapsed into a single data frame. Note that the order of securities returned is determined by the
backend and may be different from the order of securities in the securities field.
So I'd suggest you rbind the data and then reshape it to get the result you want. A fast way to do this is data.table::rbindlist, which takes a list as input and returns a data.table containing all entries; if idcol=TRUE it appends a .id column showing which data.frame each row came from. This method also works even if the data.frames returned by the Rblpapi::bdh call have different numbers of rows.
# Declaring field to be downloaded
flds<-"PX_LAST"
# LOADING THE DATA FROM THE API
l <- bdh(sec,flds,start.date=as.Date("2019-08-18"),end.date=as.Date("2020-08-18"), include.non.trading.days=TRUE)
# the names of the securities columns as returned by the api
securities <- paste0("eusa", c(2,5,10,15,30), ".curncy.",flds)
# row binding the resulting list
dt <- data.table::rbindlist(l, idcol=TRUE, use.names=FALSE)
# idcol=TRUE appends an id column (.id) to the resulting data.table
# use.names=FALSE because the columns of the data.frames have different names
# remaking the .id column so it reflects the name of the column that it already had
dt[, .id:= securities[.id] ]
# making a wider data.table
data.table::dcast(dt, eusa2.curncy.date ~ .id, value.var=securities[1])
# eusa2.curncy.date is the column that defines a group of observation
# .id the name of the columns
# securities[1] or eusa2.curncy.PX_LAST is the column that contains the values
Data used
As I don't have access to a Bloomberg API endpoint, I created this mock data, which resembles the output of bdh:
col.names <- paste0("eusa", rep(c(2,5,10,15,30),each=2), ".curncy.", rep(c(flds,"date"), 5))
l<-rep(list(data.frame(rnorm(200), 1:200)), 5)
for (i in 1:length(l)) colnames(l[[i]]) <- col.names[(2*i-1):(2*i)]
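Putting the mock data and the reshaping steps together, a self-contained sketch (no Bloomberg connection required; the numbers are random placeholders) looks like:

```r
library(data.table)

flds <- "PX_LAST"
securities <- paste0("eusa", c(2, 5, 10, 15, 30), ".curncy.", flds)

# Mock list resembling bdh() output: one data.frame per security,
# with a value column and a date column each
col.names <- paste0("eusa", rep(c(2, 5, 10, 15, 30), each = 2),
                    ".curncy.", rep(c(flds, "date"), 5))
l <- rep(list(data.frame(rnorm(200), 1:200)), 5)
for (i in seq_along(l)) colnames(l[[i]]) <- col.names[(2 * i - 1):(2 * i)]

# Stack by position, relabel the .id column, then cast to wide format
dt <- data.table::rbindlist(l, idcol = TRUE, use.names = FALSE)
dt[, .id := securities[.id]]
wide <- data.table::dcast(dt, eusa2.curncy.date ~ .id,
                          value.var = securities[1])
```

The result has one date column and one column per security, which is the [date][sec1][sec2]... layout the question asks for.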

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a high number of single objects each one containing a mean value for a year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this using a loop or something else.
I think this could be an easy way to do it. Provided all those single objects are in your current environment.
First we create a character vector of the object names using the paste0 function:
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply and get to retrieve the values of these objects as a list.
Using do.call and rbind we can bind all these values into a single column, and finally create the data frame you requested:
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
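An equivalent approach, assuming the same objects exist in the environment, is mget(), which fetches all of them in one call (the assigned values below are mock stand-ins so the example runs on its own):

```r
YearRange <- 1950:2019

# Mock the single objects cddmean1950 ... cddmean2019 (illustrative values)
for (y in YearRange) assign(paste0("cddmean", y), y / 100)

# mget() returns a named list of all the objects at once
df <- data.frame(
  year = YearRange,
  Mean = unlist(mget(paste0("cddmean", YearRange)))
)
```

mget() also errors by default when a name is missing, which catches a gap in the year sequence early.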

How can I create a simple dataframe from nested, JSON format API content

Using a JSON format file pulled from the SeatGeek API, I'd like to convert the data into a data frame. I've managed to create a frame with all variables + data using the function below:
library(httr)
library(jsonlite)
vpg <- GET("https://api.seatgeek.com/2/venues?country=US&per_page=5000&page=1&client_id=NTM2MzE3fDE1NzM4NTExMTAuNzU&client_secret=77264dfa5a0bc99095279fa7b01c223ff994437433c214c8b9a08e6de10fddd6")
vpgc <- content(vpg)
vpgcv <- (vpgc$venues)
json_file <- sapply(vpgcv, function(x) {
x[sapply(x, is.null)] <- NA
unlist(x)
as.data.frame(t(x))
})
From this point, I can create a data frame using:
venues.dataframe <- as.data.frame(t(json_file), flatten = TRUE)
But my resulting data is a data frame with the correct number of 23 variables and 5000 rows, but each entry is a list rather than just a value. How can I pull the value out of each list?
I've also attempted to pull the values out using data tables in the following code:
library(data.table)
data.table::rbindlist(json_file, fill= TRUE)
But the output data frame flows almost diagonally, placing 1 stored variable + 22 NULL values per row. While all the data exists here, Rows 1-23 (and 24-46, and so on) should be a single row.
Of these two dead ends, which is the easiest/cleanest solution to produce my desired data frame output of [5000 observations, in simple value form of 23 variables]?
Your URL returns the JSON directly, so there is no need for the GET function; the jsonlite library can handle the download itself.
library(jsonlite)
output<-fromJSON("https://api.seatgeek.com/2/venues?country=US&per_page=5000&page=1&client_id=NTM2MzE3fDE1NzM4NTExMTAuNzU&client_secret=77264dfa5a0bc99095279fa7b01c223ff994437433c214c8b9a08e6de10fddd6")
df<-output$venues
flatdf<-flatten(df)
#remove first column of empty lists
flatdf<-flatdf[,-1]
The variable "output" is a list of data frames parsed from the JSON object; use "$" to retrieve the part of interest.
df still has some embedded data frames; to flatten them, use the flatten function from the jsonlite package.
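The same flatten step can be sketched offline with a small mock payload (no API key needed; the field names below are illustrative, not the real SeatGeek schema):

```r
library(jsonlite)

# Mock JSON resembling a venues payload with a nested "location" object
txt <- '{"venues":[{"name":"A","location":{"lat":1,"lon":2}},
                   {"name":"B","location":{"lat":3,"lon":4}}]}'

output <- fromJSON(txt)
df     <- output$venues   # data frame with an embedded data frame column
flatdf <- flatten(df)     # nested columns become location.lat, location.lon
```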

integer function converting row names in to numbers

I used this:
mydata3 <- data.frame(sapply(mydata2, as.integer))
But now I see that the row names, which are gene names, have been converted to numbers (1-200). The same command worked fine some time ago, so I thought there was a problem with my file; but when I reran it on an old file where the command had worked before, I saw the same problem: gene names converted to numbers. Here is the full script:
countsTable<-read.table("JW.txt",header=TRUE,stringsAsFactors=TRUE,row.names=1)
mydata2 <- countsTable/1000
mydata3 <- data.frame(sapply(mydata2, as.integer))
str(mydata3)
Please let me know.
sapply works over the columns of your data.frame mydata2 and returns output per column. As such, it does not return the row names of your data.frame, so you either have to re-assign those, or assign the new column data back into your original data.frame, like:
mydata2[] <- sapply(mydata2, as.integer)
This way you keep all of the original attributes, including the row names.
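A minimal demonstration with mock data (hypothetical gene names and counts) shows that the in-place assignment preserves the row names:

```r
# Mock count table with gene names as row names (illustrative values)
mydata2 <- data.frame(
  s1 = c(1500.7, 2300.2),
  s2 = c(900.9, 1200.1),
  row.names = c("geneA", "geneB")
)

# Assigning into mydata2[] replaces the column data but keeps the
# data.frame's attributes, including its row names
mydata2[] <- sapply(mydata2, as.integer)
```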
