FIRST QUESTION EVER ;)
Here's the point: I have this dataset, and I initially read it without stringsAsFactors=FALSE in the read.csv function. I can't work with the data because I get the warning message "NAs introduced by coercion". Thank you for the help :)
rm(list=ls())
path <- "....."
file <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors=FALSE)
str(file)
#'data.frame': 33 obs. of 11 variables:
#$ Var1: chr "01/09/2021" "02/09/2021" "09/09/2021" "10/09/2021" ...
#$ Var2: chr "mercoledì" "giovedì" "giovedì" "venerdì" ...
#$ Var3: chr "2,5" "2,5" "2,5" "3,0" ...
#$ Var4: chr "4,0" "0,0" "2,0" "3,0" ...
#$ Var5: chr "2,0" "5,0" "5,0" "5,0" ...
#$ Var5: chr "0,0" "0,0" "0,0" "0,0" ...
#$ Var6: chr "6,0" "5,0" "7,0" "8,0" ...
#$ Var7: chr "23,5" "25,0" "28,0" "32,0" ...
#$ Var8: chr "0,0" "1,0" "5,0" "5,5" ...
#$ Var9: chr "23,5" "26,0" "33,0" "37,5" ...
#$ Var10: chr "67,0" "0,0" "0,0" "0,0" ...
as.numeric(file$Var7)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
CSV FILE
I managed to recreate your problem. Your file uses a comma both as the field separator and as the decimal separator (which is uncommon).
You can fix this by telling read.csv() that decimals are written with commas (dec = ","), as follows:
file <- read.csv(
  path,
  header = TRUE,
  sep = ",",
  dec = ",", # I've added this line
  stringsAsFactors = FALSE
)
Change this, run str(file) again, and you should see that most columns are numeric.
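To see the effect without the original file, here is a minimal sketch on a made-up three-column excerpt (the inline csv_text is an assumption modelled on the str() output above, with the decimal fields quoted). It also shows how to repair a column that has already been read as character.

```r
# Hypothetical excerpt: quoted fields, comma as decimal mark
csv_text <- 'Var1,Var3,Var7
"01/09/2021","2,5","23,5"
"02/09/2021","2,5","25,0"'

# dec = "," makes read.csv() parse "23,5" as the number 23.5
fixed <- read.csv(text = csv_text, header = TRUE, sep = ",",
                  dec = ",", stringsAsFactors = FALSE)
str(fixed)

# If the data is already loaded as character, swap the decimal mark first
already_chr <- c("23,5", "25,0")
repaired <- as.numeric(sub(",", ".", already_chr, fixed = TRUE))

# The date column stays character; it can be parsed explicitly
fixed$Var1 <- as.Date(fixed$Var1, format = "%d/%m/%Y")
```

Calling as.numeric() directly on "23,5" is what produces the NAs, because R only accepts "." as the decimal mark when converting strings.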
I have an Excel file with 77 columns of different lengths (43 of them are empty NA columns), 12 of which are dates. Ideally, I want to import the dataset into R with the date columns in Date format and the other columns in numeric format. There is a lot of material on Stack Overflow and I tried all the options, but nothing works.
The first option would be to do it directly from excel:
dataset <- read_xlsx("Data.xlsx", col_types = "numeric") #it gives everything numeric but column date always in this format "36164"
#I also tried something like this:
dataset <- read_xlsx("Data.xlsx", col_types = c("date", rep("numeric", n))) #where "n" stands for all the columns with numbers I have but it did not work
I can import the data with the incorrect date columns. After some cleaning (removing NA columns) I get a tbl with columns of different lengths. I tried the following code to transform the incorrect date columns into Date format:
dataset <- janitor::remove_empty(dataset, which = "cols") #remove NA columns
dataset <- dataset[-c(1),] #remove the first row of all columns
# Now using this command I could transform each incorrect date column into a date format:
date <- as.Date(as.numeric(dataset$column1), origin = "1899-12-30")
# I would like to do it for all the date columns in one shot but when I try to do it in this way
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
# I get an error, probably because the columns have different length
# the error is: Error in as.Date(as.numeric(var_dataset[, c(1, 3, 5, 7, 14, 16, 18, 20, :
#   'list' object cannot be coerced to type 'double'
# unlisting the object doesn't solve the problem
I am aware that I have not provided data to reproduce my problem, but in the first scenario I don't know how to approximate my fairly large Excel file, while in the second I don't know how to create a tbl with many columns of different lengths without wasting a lot of time. Sorry.
Do you have any solution, either for importing directly from Excel or for working with the data frame?
Thanks so much
I attach here the structure of my dataset:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5500 obs. of 77 variables:
$ Name...1 : chr "Code" "36164" "36165" "36166" ...
$ VSTOXX VOLATILITY INDEX - PRICE INDEX : chr "VSTOXXI(PI)" "18.2" "29.69" "25.17" ...
$ ...3 : logi NA NA NA NA NA NA ...
$ ...4 : logi NA NA NA NA NA NA ...
$ ...5 : logi NA NA NA NA NA NA ...
$ ...6 : logi NA NA NA NA NA NA ...
$ Name...7 : chr "Code" "36799" "36830" "36860" ...
$ EM COMPOSITE INDICATOR OF SOVEREIGN STRESS: GDP WEIGHTS NADJ : chr "EMEBSCGWR" "7.8255999999999992E-2" "8.9886999999999995E-2" "8.0714999999999995E-2" ...
$ ...9 : logi NA NA NA NA NA NA ...
$ Name...10 : chr "Code" "36168" "36175" "36182" ...
$ CISS BOND MKT: GOV & NFC VOLATILITY - ECONOMIC SERIES : chr "EMCIBMG" "4.4651999999999997E-2" "6.6535999999999998E-2" "4.9789E-2" ...
$ ...12 : logi NA NA NA NA NA NA ...
$ Name...13 : chr "Code" "36168" "36175" "36182" ...
$ CISS MONEY MKT: 3M RATE+ VOLATILITY - ECONOMIC SERIES : chr "EMECM3E" "5.7435999999999994E-2" "7.463199999999999E-2" "7.2263999999999995E-2" ...
$ CISS FX MKT: EUR VOLATILITY - ECONOMIC SERIES : chr "EMECFEM" "7.2139999999999996E-2" "8.6049E-2" "4.5948999999999997E-2" ...
$ CISS FIN INTERM: BANK+ VOLATILITY - ECONOMIC SERIES : chr "EMCIFIN" "4.5384999999999995E-2" "0.11820399999999999" "0.11516499999999999" ...
$ CISS NF EQUITY: VOLATILITY - ECONOMIC SERIES : chr "EMCIEMN" "7.7453999999999995E-2" "0.12733" "0.11918899999999999" ...
$ CISS: CROSS SUBINDEXCORRELATION - ECONOMIC SERIES : chr "EMCICRO" "-0.21210999999999999" "-0.29791000000000001" "-0.2369" ...
$ SYSTEMIC STRESS COMPINDICATOR - ECONOMIC SERIES : chr "EMCISSI" "8.4954000000000002E-2" "0.174844" "0.16546" ...
$ ...20 : logi NA NA NA NA NA NA ...
$ ...21 : logi NA NA NA NA NA NA ...
$ ...22 : logi NA NA NA NA NA NA ...
$ ...23 : logi NA NA NA NA NA NA ...
$ ...24 : logi NA NA NA NA NA NA ...
$ ...25 : logi NA NA NA NA NA NA ...
$ Name...26 : chr "Code" "33253" "33284" "33312" ...
$ Z8 IPI: MFG., VOLUME INDEX OF PRODUCTION, 2015=100 (WDA) VOLA: chr "Z8ES493KG" "81" "79.7" "79.400000000000006" ...
$ ...28 : logi NA NA NA NA NA NA ...
$ ...29 : logi NA NA NA NA NA NA ...
$ ...30 : logi NA NA NA NA NA NA ...
$ ...31 : logi NA NA NA NA NA NA ...
$ ...32 : logi NA NA NA NA NA NA ...
$ ...33 : logi NA NA NA NA NA NA ...
$ ...34 : logi NA NA NA NA NA NA ...
$ Name...35 : chr "Code" "35779" "35810" "35841" ...
$ EH HICP: ALL-ITEMS NADJ : chr "EHES795WR" "1.7" "1.6" "1.6" ...
$ ...37 : logi NA NA NA NA NA NA ...
$ ...38 : logi NA NA NA NA NA NA ...
$ Name...39 : chr "Code" "35110" "35139" "35170" ...
$ EH HICP: ALL-ITEMS (%MOM) NADJ : chr "EHESPQ93R" "0.4" "0.4" "0.3" ...
$ ...41 : logi NA NA NA NA NA NA ...
$ ...42 : logi NA NA NA NA NA NA ...
$ ...43 : logi NA NA NA NA NA NA ...
$ Name...44 : chr "Code" "35445" "35476" "35504" ...
$ EH HICP: ALL-ITEMS HICP (%YOY) NADJ : chr "EHESAKZER" "2.2000000000000002" "2" "1.7" ...
$ ...46 : logi NA NA NA NA NA NA ...
$ ...47 : logi NA NA NA NA NA NA ...
$ ...48 : logi NA NA NA NA NA NA ...
$ ...49 : logi NA NA NA NA NA NA ...
$ Name...50 : chr "Code" "36206" "36234" "36265" ...
$ EM EUROSYSTEM: BASE MONEY CURN : chr "EMEBSMYBA" "426.64374199999997" "430.51499999999999" "432.34064499999999" ...
$ ...52 : logi NA NA NA NA NA NA ...
$ ...53 : logi NA NA NA NA NA NA ...
$ ...54 : logi NA NA NA NA NA NA ...
$ ...55 : logi NA NA NA NA NA NA ...
$ Name...56 : chr "Code" "35703" "35734" "35762" ...
$ EM EUROSYSTEM: TOTAL ASSETS/LIABILITIES (EP) CURN : chr "EMECBSALA" "710257.53500000003" "711193.47100000002" "714957.58900000004" ...
$ ...58 : logi NA NA NA NA NA NA ...
$ ...59 : logi NA NA NA NA NA NA ...
$ ...60 : logi NA NA NA NA NA NA ...
$ ...61 : logi NA NA NA NA NA NA ...
$ ...62 : logi NA NA NA NA NA NA ...
$ ...63 : logi NA NA NA NA NA NA ...
$ Name...64 : chr "Code" "41548" "41579" "41609" ...
$ TR EU FWD INFL-LKD SWAP 10YF20Y - MIDDLE RATE : chr "TREFSTT" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 10YF10Y - MIDDLE RATE : chr "TREFS1T" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 2YF2Y - MIDDLE RATE : chr "TREFS22" "1.5158" "1.4669000000000001" "1.4715" ...
$ TR EU FWD INFL-LKD SWAP 1YF1Y - MIDDLE RATE : chr "TREFS11" "1.4509000000000001" "1.2338" "1.1225000000000001" ...
$ TR EU FWD INFL-LKD SWAP 2YF3Y - MIDDLE RATE : chr "TREFS23" "1.5906000000000002" "1.5453000000000001" "1.5283000000000002" ...
$ TR EU FWD INFL-LKD SWAP 5YF10Y - MIDDLE RATE : chr "TREFS5T" "2.3516000000000004" "2.3323" "2.3070000000000004" ...
$ ...71 : logi NA NA NA NA NA NA ...
$ ...72 : logi NA NA NA NA NA NA ...
$ ...73 : logi NA NA NA NA NA NA ...
$ ...74 : logi NA NA NA NA NA NA ...
$ ...75 : logi NA NA NA NA NA NA ...
$ Name...76 : chr "Code" "41255" "41286" "41317" ...
$ TR EU FWD INFL-LKD SWAP 5YF5Y - MIDDLE RATE : chr "TREFS55" "2.2027000000000001" "2.2637" "2.383" ...
You have to specify the col_types correctly in the read_excel (or read_xlsx) command. For example:
dataset <- read_xlsx("Data.xlsx",
col_types=c("numeric","date","numeric","date","numeric", "date", ...))
Edit: Finally after much interrogation, the problem is that your data starts in row 3, not 2. So skip the first row (skip=1) and try again.
dataset <- read_xlsx("Data.xlsx", skip=1)
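As a sketch of how the full col_types vector could be built instead of typed out by hand: the 12 date positions below are read off the Name...N columns in the str() output of the question, and the total of 77 columns comes from the same output. Treating every other column as "numeric" is an assumption, and the read is only attempted if readxl is installed and Data.xlsx exists.

```r
# Positions of the Name...N columns in the str() output (assumed to be the dates)
date_cols <- c(1, 7, 10, 13, 26, 35, 39, 44, 50, 56, 64, 76)
n_cols <- 77

# Start with everything numeric, then mark the date columns
col_types <- rep("numeric", n_cols)
col_types[date_cols] <- "date"

xlsx_path <- "Data.xlsx"
if (requireNamespace("readxl", quietly = TRUE) && file.exists(xlsx_path)) {
  # skip = 1 follows the edit above, so the first row is not read
  dataset <- readxl::read_xlsx(xlsx_path, skip = 1, col_types = col_types)
}
```

readxl requires col_types to have either length one or one entry per column, which is why the vector is sized to all 77 columns.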
edit: While this will most likely solve the error you're getting, I agree with Edward's advice to use readxl::read_excel which should preserve the dates.
The problem with
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
is that you apply as.numeric to a tibble, which internally is a list. Instead, convert each column separately:
dplyr::mutate(
  dataset,
  dplyr::across(
    c(1, 3, 5, 7, 14, 16, 18, 20, 21, 23, 25, 32),
    # as.numeric() parses the serial number, as.Date() applies the Excel origin
    ~ as.Date(as.numeric(.x), origin = "1899-12-30")
  )
)
You say the columns have a different length but that's not possible in R's table-like structures (tibble, data.frame, data.table).
Lesson: always be aware of what data type you're working with, e.g. by running str(dataset). as.numeric does not work on whole tables; it needs to be applied to specific columns, e.g. with mutate.
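The same column-by-column idea can be sketched in base R with lapply(). The small data frame below is made up for illustration (its serial numbers are taken from the str() output in the question), and date_cols is a hypothetical index vector.

```r
# Toy stand-in for the imported tibble: Excel serial dates stored as text
dataset <- data.frame(
  Name...1 = c("36164", "36165", "36166"),
  value    = c("18.2", "29.69", "25.17"),
  Name...3 = c("36799", "36830", NA),
  stringsAsFactors = FALSE
)

date_cols <- c(1, 3)  # hypothetical positions of the date columns

# lapply() visits one column (a plain vector) at a time, so as.numeric() works
dataset[date_cols] <- lapply(
  dataset[date_cols],
  function(x) as.Date(as.numeric(x), origin = "1899-12-30")
)

str(dataset)
```

Missing values stay NA after the conversion, so shorter series padded with NA are handled without special treatment.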
I have a large dataframe consisting of five columns.
When I filter on one of the columns, every row in another column is changed to NA. The column I'm filtering on is VehicleEvent; Location is the column that ends up as NA.
str(datain)
'data.frame': 7551105 obs. of 19 variables:
$ DiagnosticIDs : chr "2,0,3,1,774,775,810,744,951,947" "2,0,3,1,774,775,7,718,720,951,837,810,744,947" "2,0,3,1,774,775,7,810,744,951,947" NA ...
$ DiagnosticValues: chr "28.211,48284.435,31647,7650.75,0,0,0,1,1,-73" "28.272,48290.34,31650,7651.2,0,0,550,0,0,54,0,0,1,-81" "28.272,48290.34,31650,7651.2,0,0,550,0,1,1,-81" NA ...
$ DriverName : chr "" "" "" NA ...
$ IgnitionOn : chr "true" "true" "true" NA ...
$ Latitude : num 51.5 51.5 51.5 51.5 51.5 ...
$ Longitude : num -0.462 -0.462 -0.463 -0.463 -0.463 ...
$ Location : chr "" "Parking area" "Dispatch" NA ...
$ Time : num 1.52e+09 1.52e+09 1.52e+09 1.52e+09 1.52e+09 ...
some columns not of interest omitted
$ AlertId : chr NA NA NA "6fbc400e-1ae5-11e8-9eee-7845c4f0a3d7" ...
$ AlertType : chr NA NA NA "Exited" ...
$ VehicleEvent : chr NA NA NA "fabb4fcb-c254-4a13-8f9c-a3307a4ba63b" ...
$ MessageType : chr NA NA NA "InsightAlertMessage" ...
str(datadf)
'data.frame': 104136 obs. of 6 variables:
$ Location : chr NA NA NA NA ...
$ Longitude : num -0.483 -0.462 -0.466 -0.464 -0.464 ...
$ Latitude : num 51.5 51.5 51.5 51.5 51.5 ...
$ AlertId : chr "ae22e47c-47c4-11e8-9513-7845c4f0a3d7" "3e13ccbc-47c6-11e8-a72e-7845c4f0a3d7" "5428da40-47c8-11e8-b59f-7845c4f0a3d7" "2fcd3fa8-47df-11e8-85a9-7845c4f0a3d7" ...
$ AlertType : chr "Exited" "Exited" "Exited" "Exited" ...
$ VehicleEvent: chr "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" ...
There are no non-ASCII characters in the data (it's all extracted from XML, if that means anything). All commas, trailing spaces, full stops (periods) and slashes have been removed from Location in case they had caused this.
The columns have been renamed (just in case there was something else going on using the same names).
I have tried pretty much everything I can think of including ...
datadf <- datain %>%
  filter(AlertType == "Exited" &
           VehicleEvent == "792d6964-6ba1-4f98-9b63-5c9e194fff6d") %>%
  select(Location, Latitude, Longitude)
datadf <- datain[datain$VehicleEvent == "792d6964-6ba1-4f98-9b63-5c9e194fff6d",]
That last one changes all columns to 'NA'.
Is the data in VehicleEvent so strange that it can't be handled...surely not. I have run out of ideas hence my request to the wider community.
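One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: in base R, a logical index that contains NA returns an all-NA row for each NA. Since VehicleEvent is NA on most rows of datain, the bracket subset would behave that way. The toy data frame below is made up to show the effect; only the column names and the event ID come from the question.

```r
# Toy data: VehicleEvent is NA on the rows that carry a Location
datain <- data.frame(
  Location     = c("Parking area", "Dispatch", NA, NA),
  VehicleEvent = c(NA, NA, "792d6964-6ba1-4f98-9b63-5c9e194fff6d", "other"),
  stringsAsFactors = FALSE
)
target <- "792d6964-6ba1-4f98-9b63-5c9e194fff6d"

# NA == target is NA, and an NA index yields a row full of NA
with_na_rows <- datain[datain$VehicleEvent == target, ]

# which() drops the NA comparisons and keeps only the TRUE positions
clean <- datain[which(datain$VehicleEvent == target), ]
```

dplyr::filter() already drops rows where the condition is NA, so if Location is still NA after filter(), it may simply be that the matching alert rows never had a Location in datain, as the first str() output suggests.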
I have some data that is formatted in a way that's difficult to use, so I'm trying to flatten it out. The minimum reproducible example is here.
> str(sampleData)
List of 4
$ Events :'data.frame': 2 obs. of 3 variables:
..$ CateringOptions:List of 2
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "SpicyOWing"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105031"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "New York"
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "BaconEggs"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105032"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "Seattle"
..$ Action : num [1:2] 1 1
..$ Volume : num [1:2] 1000 2000
$ Host :List of 5
..$ Identity : chr "John"
..$ Schema : logi NA
..$ ElementId : chr "101505"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ Sender :List of 5
..$ Identity : chr "Jane"
..$ Schema : logi NA
..$ ElementId : chr "101005"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ CompletedDate: chr "/Date(1490112000000)/"
Expected
> expectedOutcome
Events.CateringOptions.Agreed Events.CateringOptions.Tnc.Identity Events.CateringOptions.Tnc.Schema Events.CateringOptions.Tnc.ElementId
1 NA SpicyOWing TRUE 105031
2 NA BaconEggs TRUE 105032
Events.CateringOptions.Tnc.ElementType Events.CateringOptions.Tnc.ElementVersion Events.CateringOptions.Address Events.Action Events.Volume Host.Identity
1 NA NA New York 1 1000 John
2 NA NA Seattle 1 2000 John
Host.Schema Host.ElementId Host.ElementType Host.ElementVersion Sender.Identity Sender.Schema Sender.ElementId Sender.ElementType Sender.ElementVersion
1 NA 101505 NA NA Jane NA 101005 NA NA
2 NA 101505 NA NA Jane NA 101005 NA NA
CompletedDate
1 /Date(1490112000000)/
2 /Date(1490112000000)/
The check function
check<-function(li){
  areDF<-sapply(1:length(li), function(i) class(li[[i]]) == "data.frame")
  areList<-sapply(1:length(li), function(i) class(li[[i]]) == "list")
  tmp1 <- NULL
  tmp2 <- NULL
  if(any(areDF)){
    for(j in which(areDF)){
      columns <- jsonlite::flatten(li[[j]])
      li[[j]] <- check(columns)
    }
    tmp1<-plyr::rbind.fill(li[areDF])
    #return(tmp1)
  }
  if(any(areList)){
    for(j in which(areList)){
      li[[j]]<-check(li[[j]])
    }
    tmp2<-do.call(cbind,li)
    #return(tmp2)
  }
  if(!is.null(tmp1) & !is.null(tmp2)){
    return (cbind(tmp1,tmp2))
  }
  else if(!is.null(tmp1)){
    return (tmp1)
  }
  else if(!is.null(tmp2)){
    return (tmp2)
  }
  return(li)
}
Results
> str(check(sampleData))
'data.frame': 2 obs. of 29 variables:
$ CateringOptions.Agreed : logi TRUE TRUE
$ CateringOptions.Address : chr "New York" "Seattle"
$ CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ CateringOptions.Tnc.Schema : logi NA NA
$ CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ CateringOptions.Tnc.ElementType : logi NA NA
$ CateringOptions.Tnc.ElementVersion : logi NA NA
$ Action : num 1 1
$ Volume : num 1000 2000
$ Events.CateringOptions.Agreed : logi TRUE TRUE
$ Events.CateringOptions.Address : chr "New York" "Seattle"
$ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ Events.CateringOptions.Tnc.Schema : logi NA NA
$ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ Events.CateringOptions.Tnc.ElementType : logi NA NA
$ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
$ Events.Action : num 1 1
$ Events.Volume : num 1000 2000
$ Host.Identity : Factor w/ 1 level "John": 1 1
$ Host.Schema : logi NA NA
$ Host.ElementId : Factor w/ 1 level "101505": 1 1
$ Host.ElementType : logi NA NA
$ Host.ElementVersion : logi NA NA
$ Sender.Identity : Factor w/ 1 level "Jane": 1 1
$ Sender.Schema : logi NA NA
$ Sender.ElementId : Factor w/ 1 level "101005": 1 1
$ Sender.ElementType : logi NA NA
$ Sender.ElementVersion : logi NA NA
$ CompletedDate : Factor w/ 1 level "/Date(1490112000000)/": 1 1
I almost have it, but the nested data frame is being duplicated. Also, my code takes fairly long to run. Does anyone have any idea how I can go about flattening this?
Edit:
I added my solution at the end of the gist.
Here is my take on it, with help from purrr.
The idea is similar to yours, only with a different syntax: flatten() the most deeply nested data frames, then rbind() them.
If I understand your code properly, mine differs slightly at the end, since I try to build a more "jsonlite::flatten-friendly" structure so that flatten() can be applied once more to the end result:
library(jsonlite)
library(purrr)
res <-
  sampleData %>%
  modify_if(
    is.list,
    .f = ~ modify_if(
      .x,
      .p = function(x) all(sapply(x, is.data.frame)),
      .f = ~ do.call("rbind", lapply(.x, jsonlite::flatten))
    )
  ) %>%
  as.data.frame() %>%
  jsonlite::flatten()
str(res)
# 'data.frame': 2 obs. of 20 variables:
# $ Events.Action : num 1 1
# $ Events.Volume : num 1000 2000
# $ Host.Identity : chr "John" "John"
# $ Host.Schema : logi NA NA
# $ Host.ElementId : chr "101505" "101505"
# $ Host.ElementType : logi NA NA
# $ Host.ElementVersion : logi NA NA
# $ Sender.Identity : chr "Jane" "Jane"
# $ Sender.Schema : logi NA NA
# $ Sender.ElementId : chr "101005" "101005"
# $ Sender.ElementType : logi NA NA
# $ Sender.ElementVersion : logi NA NA
# $ CompletedDate : chr "/Date(1490112000000)/" "/Date(1490112000000)/"
# $ Events.CateringOptions.Agreed : logi TRUE TRUE
# $ Events.CateringOptions.Address : chr "New York" "Seattle"
# $ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
# $ Events.CateringOptions.Tnc.Schema : logi NA NA
# $ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
# $ Events.CateringOptions.Tnc.ElementType : logi NA NA
# $ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
I've got one mismatch with your expectedOutcome but if I may, it might be on your side:
all.equal(expectedOutcome[sort(names(expectedOutcome))], res[sort(names(res))])
# [1] "Component “Events.CateringOptions.Agreed”: 'is.NA' value mismatch: 0 in current 2 in target"
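The inner step of the pipeline above (flatten each one-row data frame, then rbind the results) can be tried in isolation. This is only a sketch: make_option() is a hypothetical helper that rebuilds two CateringOptions entries from the sample data, with the NA-only Tnc fields left out for brevity.

```r
library(jsonlite)

# Hypothetical helper: one CateringOptions entry with a nested Tnc data frame
make_option <- function(identity, element_id, address) {
  opt <- data.frame(Agreed = TRUE, Address = address, stringsAsFactors = FALSE)
  opt$Tnc <- data.frame(Identity = identity, ElementId = element_id,
                        stringsAsFactors = FALSE)
  opt
}

catering <- list(
  make_option("SpicyOWing", "105031", "New York"),
  make_option("BaconEggs", "105032", "Seattle")
)

# flatten() lifts Tnc.* to top-level columns; rbind stacks the two rows
flat <- do.call(rbind, lapply(catering, jsonlite::flatten))
str(flat)
```

Because every entry flattens to the same column names, a plain rbind() is enough here; plyr::rbind.fill() would only be needed if the entries had different fields.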
Not sure if this over-simplifies your problem, but with the sample you shared, it seems to work. Basically, if the column is not already a vector when you do data.frame(your_list), it unlists the data and makes a matrix.
FLAT <- function(inlist) {
  A <- data.frame(inlist)
  out <- lapply(A, function(y) {
    if (is.list(y)) {
      y <- unlist(y)
      m <- matrix(y, nrow(A), byrow = TRUE, dimnames = list(NULL, unique(names(y))))
      y <- data.frame(m, stringsAsFactors = FALSE)
      y[] <- lapply(y, type.convert)
    }
    y
  })
  do.call(cbind, out)
}
FLAT(sampleData)
Here's the str on your sample data:
str(FLAT(sampleData))
## 'data.frame': 2 obs. of 20 variables:
## $ Events.CateringOptions.Agreed : logi TRUE TRUE
## $ Events.CateringOptions.Tnc.Identity : Factor w/ 2 levels "BaconEggs","SpicyOWing": 2 1
## $ Events.CateringOptions.Tnc.Schema : logi NA NA
## $ Events.CateringOptions.Tnc.ElementId : int 105031 105032
## $ Events.CateringOptions.Tnc.ElementType : logi NA NA
## $ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
## $ Events.CateringOptions.Address : Factor w/ 2 levels "New York","Seattle": 1 2
## $ Events.Action : num 1 1
## $ Events.Volume : num 1000 2000
## $ Host.Identity : Factor w/ 1 level "John": 1 1
## $ Host.Schema : logi NA NA
## $ Host.ElementId : Factor w/ 1 level "101505": 1 1
## $ Host.ElementType : logi NA NA
## $ Host.ElementVersion : logi NA NA
## $ Sender.Identity : Factor w/ 1 level "Jane": 1 1
## $ Sender.Schema : logi NA NA
## $ Sender.ElementId : Factor w/ 1 level "101005": 1 1
## $ Sender.ElementType : logi NA NA
## $ Sender.ElementVersion : logi NA NA
## $ CompletedDate : Factor w/ 1 level "/Date(1490112000000)/": 1 1
The getFin() function returns an object of class "financials", which contains a list of lists.
getFin("AAPL")
structure of resulting object
I need to create tables for each of the following:
Balance Sheet
Income Statement
Cash Flow
End goal is to display these tables on a dashboard.
Here's what I tried, but it doesn't seem right:
df <- data.frame(AAPL.f[[2]][2])
df2 <- data.frame(viewFin(AAPL.f,"BS", "A"))
How can I get the above statements into data frames?
This should give you what you want.
require(quantmod)
setwd("C:/Users/your_path_here/downloads")
stocks <- c("AXP","BA","CAT","CSCO","CVX","DD","DIS","GE","GS","HD","IBM","INTC","JNJ","JPM","KO","MCD","MMM","MRK","MSFT","NKE","PFE","PG","T","TRV","UNH","UTX","V","VZ","WMT","XOM")
# equityList <- read.csv("EquityList.csv", header = FALSE, stringsAsFactors = FALSE)
# names(equityList) <- c ("Ticker")
for (i in seq_along(stocks)) {
  temp <- getFinancials(stocks[i], src = "google", auto.assign = FALSE)
  # Annual statements
  write.csv(temp$IS$A, paste(stocks[i], "_Income_Statement(Annual).csv", sep = ""))
  write.csv(temp$BS$A, paste(stocks[i], "_Balance_Sheet(Annual).csv", sep = ""))
  write.csv(temp$CF$A, paste(stocks[i], "_Cash_Flow(Annual).csv", sep = ""))
  # Quarterly statements (the Q element, not A)
  write.csv(temp$IS$Q, paste(stocks[i], "_Income_Statement(Quarterly).csv", sep = ""))
  write.csv(temp$BS$Q, paste(stocks[i], "_Balance_Sheet(Quarterly).csv", sep = ""))
  write.csv(temp$CF$Q, paste(stocks[i], "_Cash_Flow(Quarterly).csv", sep = ""))
}
Here's what I was looking for....
I'm sure there are better ways to do this.
library(quantmod)
library(xlsx)
getFin("GS")
gs_BS <- GS.f$BS$A
str(gs_BS)
#num [1:42, 1:4] 106533 NA 113003 71883 NA ...
#- attr(*, "dimnames")=List of 2
# ..$ : chr [1:42] "Cash & Equivalents" "Short Term Investments" "Cash and Short Term Investments" "Accounts Receivable - Trade, Net" ...
# ..$ : chr [1:4] "2015-12-31" "2014-12-31" "2013-12-31" "2012-12-31"
#- attr(*, "col_desc")= chr [1:4] "As of 2015-12-31" "As of 2014-12-31" "As of 2013-12-31" "As of 2012-12-31"
transposed <- t(gs_BS)
write.xlsx(transposed, "C:\\Users\\abc\\Desktop\\bal_sheet.xlsx", row.names=FALSE)
transp <- read.xlsx("C:\\Users\\abc\\Desktop\\bal_sheet.xlsx" , sheetName="Sheet1")
transp$year <- c("2015","2014","2013","2012")
#> str(transp)
#'data.frame': 4 obs. of 43 variables:
#$ Cash...Equivalents : num 106533 90406 94224 65919
#$ Short.Term.Investments : logi NA NA NA NA
#$ Cash.and.Short.Term.Investments : num 113003 96196 98364 72669
#$ Accounts.Receivable...Trade..Net : num 71883 94479 97880 91354
#$ Receivables...Other : logi NA NA NA NA
#$ Total.Receivables..Net : num 71883 94479 97880 91354
#$ Total.Inventory : logi NA NA NA NA
#$ Prepaid.Expenses : logi NA NA NA NA
#$ Other.Current.Assets..Total : logi NA NA NA NA
#$ Total.Current.Assets : logi NA NA NA NA
#$ Property.Plant.Equipment..Total...Gross : num 17726 18324 18236 17267
#$ Accumulated.Depreciation..Total : num -7770 -8980 -9040 -9050
#$ Goodwill..Net : num 3657 3645 3705 3702
#$ Intangibles..Net : num 491 515 671 1397
#$ Long.Term.Investments : num 548317 547272 615841 602819
#$ Other.Long.Term.Assets..Total : num 5548 5181 5241 55291
#$ Total.Assets : num 861395 855842 911507 938555
#$ Accounts.Payable : num 210362 213572 204765 194485
#$ Accrued.Expenses : num 8149 8368 7874 8292
#$ Notes.Payable.Short.Term.Debt : num 196752 186133 250283 241931
#$ Current.Port..of.LT.Debt.Capital.Leases : num 29623 29501 47288 67349
#$ Other.Current.liabilities..Total : num 1280 1533 1974 2724
#$ Total.Current.Liabilities : logi NA NA NA NA
#$ Long.Term.Debt : num 268652 257954 245227 176270
#$ Capital.Lease.Obligations : logi NA NA NA NA
#$ Total.Long.Term.Debt : num 268652 257954 245227 176270
#$ Total.Debt : num 495027 473588 542798 485550
#$ Deferred.Income.Tax : logi NA NA NA NA
#$ Minority.Interest : num 459 404 326 508
#$ Other.Liabilities..Total : num 51035 70829 70120 152289
#$ Total.Liabilities : num 774667 773045 833040 862839
#$ Redeemable.Preferred.Stock..Total : logi NA NA NA NA
#$ Preferred.Stock...Non.Redeemable..Net : num 11200 9200 7200 6200
#$ Common.Stock..Total : num 9 9 8 8
#$ Additional.Paid.In.Capital : num 51340 50049 48998 48030
#$ Retained.Earnings..Accumulated.Deficit. : num 83386 78984 71961 65223
#$ Treasury.Stock...Common : num -62640 -58468 -53015 -46850
#$ Other.Equity..Total : num -718 -743 -524 -520
#$ Total.Equity : num 86728 82797 78467 75716
#$ Total.Liabilities...Shareholders..39..Equity: num 861395 855842 911507 938555
#$ Shares.Outs...Common.Stock.Primary.Issue : logi NA NA NA NA
#$ Total.Common.Shares.Outstanding : num 419 430 467 465
#$ year : chr "2015" "2014" "2013" "2012"
So the financial statement object has been transposed: each item on the statement (the balance sheet in this case) becomes a column, and the result can be written to a database table.
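The round trip through an Excel file is probably not required: as.data.frame() on the transposed matrix gives a data frame directly. A minimal sketch, using a made-up 3 x 2 excerpt of the balance sheet matrix (values taken from the output above) in place of GS.f$BS$A:

```r
# Small stand-in for GS.f$BS$A: items in rows, period end dates in columns
gs_BS <- matrix(
  c(106533, NA, 113003,
    90406,  NA, 96196),
  nrow = 3,
  dimnames = list(
    c("Cash & Equivalents", "Short Term Investments",
      "Cash and Short Term Investments"),
    c("2015-12-31", "2014-12-31")
  )
)

# Transpose so each statement item becomes a column, then convert
transp <- as.data.frame(t(gs_BS))

# Derive the year from the row names instead of typing it by hand
transp$year <- substr(rownames(transp), 1, 4)

str(transp)
```

Unlike the xlsx round trip, this keeps the original item names ("Cash & Equivalents") rather than the dotted versions; make.names() can be applied to names(transp) if syntactic column names are preferred.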
library(quantmod)
ticker = "AAPL"
period = 'A'
statements = getFin(ticker, auto.assign=FALSE)
bs = viewFin(statements, type="BS", period=period)
is = viewFin(statements, type="IS", period=period)
cf = viewFin(statements, type="CF", period=period)
bs.df = data.frame(bs)
is.df = data.frame(is)
cf.df = data.frame(cf)
You can then check to make sure they're all of data.frame class like so.
> is.data.frame(bs.df)
[1] TRUE
> is.data.frame(is.df)
[1] TRUE
> is.data.frame(cf.df)
[1] TRUE