Convert character variables into numeric with R - r

FIRST QUESTION EVER ;)
Here's the point: I have this dataset, and I first read it without "stringsAsFactors=FALSE" in the read.csv function. I can't work with the data because as.numeric gives the warning message "NAs introduced by coercion". Thank you for the help :)
rm(list=ls())
path <- "....."
file <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors=FALSE)
str(file)
#'data.frame': 33 obs. of 11 variables:
#$ Var1: chr "01/09/2021" "02/09/2021" "09/09/2021" "10/09/2021" ...
#$ Var2: chr "mercoledì" "giovedì" "giovedì" "venerdì" ...
#$ Var3: chr "2,5" "2,5" "2,5" "3,0" ...
#$ Var4: chr "4,0" "0,0" "2,0" "3,0" ...
#$ Var5: chr "2,0" "5,0" "5,0" "5,0" ...
#$ Var5: chr "0,0" "0,0" "0,0" "0,0" ...
#$ Var6: chr "6,0" "5,0" "7,0" "8,0" ...
#$ Var7: chr "23,5" "25,0" "28,0" "32,0" ...
#$ Var8: chr "0,0" "1,0" "5,0" "5,5" ...
#$ Var9: chr "23,5" "26,0" "33,0" "37,5" ...
#$ Var10: chr "67,0" "0,0" "0,0" "0,0" ...
as.numeric(file$Var7)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

I managed to recreate your problem. Your file uses a comma both as the field separator and as the decimal separator, which is uncommon.
You can fix this by telling read.csv() that the decimal mark is a comma (dec = ","), as follows:
read.csv(
path,
header = TRUE,
sep = ",",
dec = ",", # I've added this line
stringsAsFactors = FALSE
)
Change this, run str(file) again, and you should see that most columns are numeric.
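The same fix can be sketched on a tiny inline example. The sample text and the column names Var7 and Var8 below are made up to mimic the str() output in the question; they are not taken from the real file, and the quoting of the fields is an assumption about how that file looks.

```r
# Inline stand-in for the CSV: fields are quoted because the comma is
# also the field separator (an assumption about the real file).
csv_text <- 'Var7,Var8\n"23,5","0,0"\n"25,0","1,0"'

# Option 1: tell read.csv() that the decimal mark is a comma.
fixed <- read.csv(text = csv_text, header = TRUE, sep = ",", dec = ",",
                  stringsAsFactors = FALSE)

# Option 2: read everything as character and convert afterwards by
# swapping the decimal comma for a point before calling as.numeric().
raw <- read.csv(text = csv_text, header = TRUE, sep = ",",
                colClasses = "character")
raw[] <- lapply(raw, function(x) as.numeric(sub(",", ".", x, fixed = TRUE)))

str(fixed)
str(raw)
```

Option 2 may be handy when the data frame is already in memory and re-reading the file is inconvenient.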

Related

How to convert in Date format the columns of a particular excel file?

I have an Excel file with 77 columns (43 of them entirely NA) of different lengths, 12 of which are dates. Ideally, I want to import the dataset into R with the date columns in Date format and the other columns in numeric format. There is a lot of material on Stack Overflow and I tried all the options, but none of them works.
The first option would be to do it directly from Excel:
dataset <- read_xlsx("Data.xlsx", col_types = "numeric") #it gives everything numeric but column date always in this format "36164"
#I also tried something like this:
dataset <- read_xlsx("Data.xlsx", col_types = c("date", rep("numeric", n))) #where "n" stands for all the columns with numbers I have but it did not work
I can import the data with the incorrect date columns. After some cleaning (removing NA columns) I get a tbl with columns of different lengths. I tried the following code to transform the incorrect date columns into Date format:
dataset <- janitor::remove_empty(dataset, which = "cols") #remove NA columns
dataset <- dataset[-c(1),] #remove the first row of all columns
# Now using this command I could transform each incorrect date column into a date format:
date <- as.Date(as.numeric(dataset$column1), origin = "1899-12-30")
# I would like to do it for all the date columns in one shot but when I try to do it in this way
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
# I get an error, probably because the columns have different length
# the error is: Error in as.Date(as.numeric(var_dataset[, c(1, 3, 5, 7, 14, 16, 18, 20, :
#   'list' object cannot be coerced to type 'double'
# unlisting the object doesn't solve the problem
I am aware that I haven't provided data to reproduce the problem: in the first scenario I don't know how to approximate my quite large Excel file, and in the second I don't know how to create a tbl with many columns of different lengths without spending a lot of time on it. Sorry.
Do you have any solution, either for importing directly from Excel or for working with the data frame?
Thanks so much
I attach here the structure of my dataset:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5500 obs. of 77 variables:
$ Name...1 : chr "Code" "36164" "36165" "36166" ...
$ VSTOXX VOLATILITY INDEX - PRICE INDEX : chr "VSTOXXI(PI)" "18.2" "29.69" "25.17" ...
$ ...3 : logi NA NA NA NA NA NA ...
$ ...4 : logi NA NA NA NA NA NA ...
$ ...5 : logi NA NA NA NA NA NA ...
$ ...6 : logi NA NA NA NA NA NA ...
$ Name...7 : chr "Code" "36799" "36830" "36860" ...
$ EM COMPOSITE INDICATOR OF SOVEREIGN STRESS: GDP WEIGHTS NADJ : chr "EMEBSCGWR" "7.8255999999999992E-2" "8.9886999999999995E-2" "8.0714999999999995E-2" ...
$ ...9 : logi NA NA NA NA NA NA ...
$ Name...10 : chr "Code" "36168" "36175" "36182" ...
$ CISS BOND MKT: GOV & NFC VOLATILITY - ECONOMIC SERIES : chr "EMCIBMG" "4.4651999999999997E-2" "6.6535999999999998E-2" "4.9789E-2" ...
$ ...12 : logi NA NA NA NA NA NA ...
$ Name...13 : chr "Code" "36168" "36175" "36182" ...
$ CISS MONEY MKT: 3M RATE+ VOLATILITY - ECONOMIC SERIES : chr "EMECM3E" "5.7435999999999994E-2" "7.463199999999999E-2" "7.2263999999999995E-2" ...
$ CISS FX MKT: EUR VOLATILITY - ECONOMIC SERIES : chr "EMECFEM" "7.2139999999999996E-2" "8.6049E-2" "4.5948999999999997E-2" ...
$ CISS FIN INTERM: BANK+ VOLATILITY - ECONOMIC SERIES : chr "EMCIFIN" "4.5384999999999995E-2" "0.11820399999999999" "0.11516499999999999" ...
$ CISS NF EQUITY: VOLATILITY - ECONOMIC SERIES : chr "EMCIEMN" "7.7453999999999995E-2" "0.12733" "0.11918899999999999" ...
$ CISS: CROSS SUBINDEXCORRELATION - ECONOMIC SERIES : chr "EMCICRO" "-0.21210999999999999" "-0.29791000000000001" "-0.2369" ...
$ SYSTEMIC STRESS COMPINDICATOR - ECONOMIC SERIES : chr "EMCISSI" "8.4954000000000002E-2" "0.174844" "0.16546" ...
$ ...20 : logi NA NA NA NA NA NA ...
$ ...21 : logi NA NA NA NA NA NA ...
$ ...22 : logi NA NA NA NA NA NA ...
$ ...23 : logi NA NA NA NA NA NA ...
$ ...24 : logi NA NA NA NA NA NA ...
$ ...25 : logi NA NA NA NA NA NA ...
$ Name...26 : chr "Code" "33253" "33284" "33312" ...
$ Z8 IPI: MFG., VOLUME INDEX OF PRODUCTION, 2015=100 (WDA) VOLA: chr "Z8ES493KG" "81" "79.7" "79.400000000000006" ...
$ ...28 : logi NA NA NA NA NA NA ...
$ ...29 : logi NA NA NA NA NA NA ...
$ ...30 : logi NA NA NA NA NA NA ...
$ ...31 : logi NA NA NA NA NA NA ...
$ ...32 : logi NA NA NA NA NA NA ...
$ ...33 : logi NA NA NA NA NA NA ...
$ ...34 : logi NA NA NA NA NA NA ...
$ Name...35 : chr "Code" "35779" "35810" "35841" ...
$ EH HICP: ALL-ITEMS NADJ : chr "EHES795WR" "1.7" "1.6" "1.6" ...
$ ...37 : logi NA NA NA NA NA NA ...
$ ...38 : logi NA NA NA NA NA NA ...
$ Name...39 : chr "Code" "35110" "35139" "35170" ...
$ EH HICP: ALL-ITEMS (%MOM) NADJ : chr "EHESPQ93R" "0.4" "0.4" "0.3" ...
$ ...41 : logi NA NA NA NA NA NA ...
$ ...42 : logi NA NA NA NA NA NA ...
$ ...43 : logi NA NA NA NA NA NA ...
$ Name...44 : chr "Code" "35445" "35476" "35504" ...
$ EH HICP: ALL-ITEMS HICP (%YOY) NADJ : chr "EHESAKZER" "2.2000000000000002" "2" "1.7" ...
$ ...46 : logi NA NA NA NA NA NA ...
$ ...47 : logi NA NA NA NA NA NA ...
$ ...48 : logi NA NA NA NA NA NA ...
$ ...49 : logi NA NA NA NA NA NA ...
$ Name...50 : chr "Code" "36206" "36234" "36265" ...
$ EM EUROSYSTEM: BASE MONEY CURN : chr "EMEBSMYBA" "426.64374199999997" "430.51499999999999" "432.34064499999999" ...
$ ...52 : logi NA NA NA NA NA NA ...
$ ...53 : logi NA NA NA NA NA NA ...
$ ...54 : logi NA NA NA NA NA NA ...
$ ...55 : logi NA NA NA NA NA NA ...
$ Name...56 : chr "Code" "35703" "35734" "35762" ...
$ EM EUROSYSTEM: TOTAL ASSETS/LIABILITIES (EP) CURN : chr "EMECBSALA" "710257.53500000003" "711193.47100000002" "714957.58900000004" ...
$ ...58 : logi NA NA NA NA NA NA ...
$ ...59 : logi NA NA NA NA NA NA ...
$ ...60 : logi NA NA NA NA NA NA ...
$ ...61 : logi NA NA NA NA NA NA ...
$ ...62 : logi NA NA NA NA NA NA ...
$ ...63 : logi NA NA NA NA NA NA ...
$ Name...64 : chr "Code" "41548" "41579" "41609" ...
$ TR EU FWD INFL-LKD SWAP 10YF20Y - MIDDLE RATE : chr "TREFSTT" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 10YF10Y - MIDDLE RATE : chr "TREFS1T" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 2YF2Y - MIDDLE RATE : chr "TREFS22" "1.5158" "1.4669000000000001" "1.4715" ...
$ TR EU FWD INFL-LKD SWAP 1YF1Y - MIDDLE RATE : chr "TREFS11" "1.4509000000000001" "1.2338" "1.1225000000000001" ...
$ TR EU FWD INFL-LKD SWAP 2YF3Y - MIDDLE RATE : chr "TREFS23" "1.5906000000000002" "1.5453000000000001" "1.5283000000000002" ...
$ TR EU FWD INFL-LKD SWAP 5YF10Y - MIDDLE RATE : chr "TREFS5T" "2.3516000000000004" "2.3323" "2.3070000000000004" ...
$ ...71 : logi NA NA NA NA NA NA ...
$ ...72 : logi NA NA NA NA NA NA ...
$ ...73 : logi NA NA NA NA NA NA ...
$ ...74 : logi NA NA NA NA NA NA ...
$ ...75 : logi NA NA NA NA NA NA ...
$ Name...76 : chr "Code" "41255" "41286" "41317" ...
$ TR EU FWD INFL-LKD SWAP 5YF5Y - MIDDLE RATE : chr "TREFS55" "2.2027000000000001" "2.2637" "2.383" ...
You have to specify the col_types correctly in the read_excel (or read_xlsx) command. For example:
dataset <- read_xlsx("Data.xlsx",
col_types=c("numeric","date","numeric","date","numeric", "date", ...))
Edit: Finally after much interrogation, the problem is that your data starts in row 3, not 2. So skip the first row (skip=1) and try again.
dataset <- read_xlsx("Data.xlsx", skip=1)
Edit: While this will most likely solve the error you're getting, I agree with Edward's advice to use readxl::read_excel, which should preserve the dates.
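Typing out 77 entries for col_types by hand is error-prone, so the vector could be built programmatically. The sketch below assumes the date columns are the twelve "Name..." columns at the positions shown in the str() output of the question; that reading, and the choice of "numeric" for everything else, are assumptions rather than something confirmed by the original poster. The read_xlsx() call is left commented out because it needs the real file.

```r
# Total number of columns reported in the question.
n_cols <- 77

# Positions of the "Name..." columns in the str() output (assumed to be
# the date columns).
date_cols <- c(1, 7, 10, 13, 26, 35, 39, 44, 50, 56, 64, 76)

# Start with every column numeric, then mark the date columns.
col_types <- rep("numeric", n_cols)
col_types[date_cols] <- "date"

table(col_types)

# dataset <- readxl::read_xlsx("Data.xlsx", col_types = col_types)
```

If the extra "Code" row is still present when reading, the typed columns would likely contain NA in that row, so the skip advice above still applies.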
The problem with
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
is that you apply as.numeric to a tibble, which internally is a list. Instead, convert each column in turn, for example:
# across() needs dplyr >= 1.0.0
dplyr::mutate(
dataset,
dplyr::across(
c(1,3,5,7,14,16,18,20,21,23,25,32),
~ as.Date(as.numeric(.x), origin = "1899-12-30")
)
)
You say the columns have different lengths, but that's not possible in R's table-like structures (tibble, data.frame, data.table).
Lesson: always be aware of which data type you're working with, e.g. by running str(dataset). as.numeric does not work on a whole table; it needs to be applied to specific columns, e.g. with mutate.
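The column-by-column idea also works in base R without dplyr. This is a minimal sketch on a made-up three-column data frame; the column names date_a, value_a and date_b are hypothetical, and only the serial numbers 36164 and 36165 are borrowed from the question's output.

```r
# Toy stand-in for the imported sheet: Excel serial dates stored as text.
dataset <- data.frame(
  date_a  = c("36164", "36165"),
  value_a = c("18.2", "29.69"),
  date_b  = c("36526", NA),
  stringsAsFactors = FALSE
)

# Columns that hold Excel serial dates (hypothetical names).
date_cols <- c("date_a", "date_b")

# lapply() visits each column separately, so as.numeric() receives a
# plain character vector instead of the whole list-like table.
dataset[date_cols] <- lapply(
  dataset[date_cols],
  function(x) as.Date(as.numeric(x), origin = "1899-12-30")
)

str(dataset)
```

Missing entries stay NA, which is how a shorter column shows up once it sits in the same table as longer ones.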

Issues with `addPolylines()` not appearing with data from `osmdata` package

I'm having issues with leaflet::addPolylines when using sf objects with Leaflet for R.
Below is the code I'm using to extract, as a random example, a railway in London.
library(osmdata)
library(leaflet)
library(sf)
library(ggplot2)
# Get Data
dlr <-
opq("London, UK") %>%
add_osm_feature(key = "line", value = "DLR") %>%
osmdata_sf()
str(dlr$osm_lines)
# Classes ‘sf’ and 'data.frame': 213 obs. of 25 variables:
# $ osm_id : chr "3636480" "3663203" "4005749" "4005750" ...
# $ name : chr "Docklands Light Railway" "Docklands Light Railway" "Docklands Light Railway" "Docklands Light Railway" ...
# $ bridge : chr "viaduct" "viaduct" NA NA ...
# $ covered : chr NA NA NA NA ...
# $ cutting : chr NA NA NA NA ...
# $ disused.railway: chr NA NA NA NA ...
# $ electrified : chr "rail" "rail" "rail" "rail" ...
# $ fixme : chr NA NA NA NA ...
# $ frequency : chr "0" "0" "0" "0" ...
# $ gauge : chr "1435" "1435" "1435" "1435" ...
# $ layer : chr "1" "1" "-2" "-2" ...
# $ level : chr NA NA NA NA ...
# $ line : chr "DLR" "DLR" "DLR" "DLR" ...
# $ note : chr NA NA "Route guessed" "Route guessed" ...
# $ oneway : chr NA NA NA NA ...
# $ railway : chr "light_rail" "light_rail" "light_rail" "light_rail" ...
# $ service : chr NA NA NA NA ...
# $ short_name : chr NA NA NA NA ...
# $ source : chr NA NA NA NA ...
# $ source_ref : chr NA NA NA NA ...
# $ start_date : chr NA NA NA NA ...
# $ track_detail : chr NA NA NA NA ...
# $ tunnel : chr NA NA "yes" "yes" ...
# $ voltage : chr "750" "750" "750" "750" ...
# $ geometry :sfc_LINESTRING of length 213; first list element: 'XY' num [1:4, 1:2] -0.0673 -0.0669 -0.0664 -0.0661 51.5111 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "18019994" "1842525419" "1752475375" "18019985"
# .. ..$ : chr "lon" "lat"
# - attr(*, "sf_column")= chr "geometry"
# - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "names")= chr "osm_id" "name" "bridge" "covered" ...
Then, plotting using ggplot() and geom_sf() is fine:
dlr$osm_lines %>%
ggplot() + geom_sf()
But not with Leaflet:
dlr$osm_lines %>%
leaflet() %>%
addProviderTiles("Stamen.Watercolor") %>%
addPolylines()
Apologies for the unnecessary watercolour - just wanted to make it abundantly clear that the lines were not there.
This seems to be a problem with the names being set in the geometry of the lines, following recent updates - see the discussion here https://github.com/r-spatial/sf/issues/880 - which suggests just removing them.
This works for me with your example...
names(st_geometry(dlr$osm_lines)) = NULL
dlr$osm_lines %>%
leaflet() %>%
addProviderTiles("Stamen.Watercolor") %>%
addPolylines()
This will hopefully be dealt with by a leaflet update - see https://github.com/rstudio/leaflet/issues/631.

all rows change to NA after filtering R dataframe

I have a large dataframe consisting of five columns.
When I filter on one of the columns, every row in another column is changed to NA. The column I'm filtering on is VehicleEvent; Location is the column receiving the NA substitution.
str(datain)
'data.frame': 7551105 obs. of 19 variables:
$ DiagnosticIDs : chr "2,0,3,1,774,775,810,744,951,947" "2,0,3,1,774,775,7,718,720,951,837,810,744,947" "2,0,3,1,774,775,7,810,744,951,947" NA ...
$ DiagnosticValues: chr "28.211,48284.435,31647,7650.75,0,0,0,1,1,-73" "28.272,48290.34,31650,7651.2,0,0,550,0,0,54,0,0,1,-81" "28.272,48290.34,31650,7651.2,0,0,550,0,1,1,-81" NA ...
$ DriverName : chr "" "" "" NA ...
$ IgnitionOn : chr "true" "true" "true" NA ...
$ Latitude : num 51.5 51.5 51.5 51.5 51.5 ...
$ Longitude : num -0.462 -0.462 -0.463 -0.463 -0.463 ...
$ Location : chr "" "Parking area" "Dispatch" NA ...
$ Time : num 1.52e+09 1.52e+09 1.52e+09 1.52e+09 1.52e+09 ...
some columns not of interest omitted
$ AlertId : chr NA NA NA "6fbc400e-1ae5-11e8-9eee-7845c4f0a3d7" ...
$ AlertType : chr NA NA NA "Exited" ...
$ VehicleEvent : chr NA NA NA "fabb4fcb-c254-4a13-8f9c-a3307a4ba63b" ...
$ MessageType : chr NA NA NA "InsightAlertMessage" ...
str(datadf)
'data.frame': 104136 obs. of 6 variables:
$ Location : chr NA NA NA NA ...
$ Longitude : num -0.483 -0.462 -0.466 -0.464 -0.464 ...
$ Latitude : num 51.5 51.5 51.5 51.5 51.5 ...
$ AlertId : chr "ae22e47c-47c4-11e8-9513-7845c4f0a3d7" "3e13ccbc-47c6-11e8-a72e-7845c4f0a3d7" "5428da40-47c8-11e8-b59f-7845c4f0a3d7" "2fcd3fa8-47df-11e8-85a9-7845c4f0a3d7" ...
$ AlertType : chr "Exited" "Exited" "Exited" "Exited" ...
$ VehicleEvent: chr "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" "792d6964-6ba1-4f98-9b63-5c9e194fff6d" ...
There are no non-ASCII characters in the data (it's all extracted from XML, if that means anything). All commas, trailing spaces, full stops (periods) and slashes have been removed from Location in case they had caused this.
The columns have been renamed (just in case there was something else going on using the same names).
I have tried pretty much everything I can think of including ...
datadf <- datain %>%
filter(AlertType == "Exited" &
VehicleEvent == "792d6964-6ba1-4f98-9b63-5c9e194fff6d") %>%
select(Location, Latitude, Longitude)
datadf <- datain[datain$VehicleEvent == "792d6964-6ba1-4f98-9b63-5c9e194fff6d",]
That last one changes all columns to 'NA'.
Is the data in VehicleEvent so strange that it can't be handled...surely not. I have run out of ideas hence my request to the wider community.

importing financial statements from getFin() to data.frame or data.table?

The getFin() function returns an object of class "financials", which contains a list of lists.
getFin("AAPL")
I need to create tables for each of the following:
Balance Sheet
Income Statement
Cash Flow
End goal is to display these tables on a dashboard.
Here's what I tried, but it doesn't seem right:
df <- data.frame(AAPL.f[[2]][2])
df2 <- data.frame(viewFin(AAPL.f,"BS", "A"))
How can I get the above statements into Data frames?
This should give you what you want.
require(quantmod)
setwd("C:/Users/your_path_here/downloads")
stocks <- c("AXP","BA","CAT","CSCO","CVX","DD","DIS","GE","GS","HD","IBM","INTC","JNJ","JPM","KO","MCD","MMM","MRK","MSFT","NKE","PFE","PG","T","TRV","UNH","UTX","V","VZ","WMT","XOM")
# equityList <- read.csv("EquityList.csv", header = FALSE, stringsAsFactors = FALSE)
# names(equityList) <- c ("Ticker")
for (i in 1 : length(stocks)) {
temp<-getFinancials(stocks[i],src="google",auto.assign=FALSE)
write.csv(temp$IS$A,paste(stocks[i],"_Income_Statement(Annual).csv",sep=""))
write.csv(temp$BS$A,paste(stocks[i],"_Balance_Sheet(Annual).csv",sep=""))
write.csv(temp$CF$A,paste(stocks[i],"_Cash_Flow(Annual).csv",sep=""))
write.csv(temp$IS$Q,paste(stocks[i],"_Income_Statement(Quarterly).csv",sep=""))
write.csv(temp$BS$Q,paste(stocks[i],"_Balance_Sheet(Quarterly).csv",sep=""))
write.csv(temp$CF$Q,paste(stocks[i],"_Cash_Flow(Quarterly).csv",sep=""))
}
Here's what I was looking for....
I'm sure there are better ways to do this.
library(quantmod)
library(xlsx)
getFin("GS")
gs_BS <- GS.f$BS$A
str(gs_BS)
#num [1:42, 1:4] 106533 NA 113003 71883 NA ...
#- attr(*, "dimnames")=List of 2
# ..$ : chr [1:42] "Cash & Equivalents" "Short Term Investments" "Cash and Short Term Investments" "Accounts Receivable - Trade, Net" ...
# ..$ : chr [1:4] "2015-12-31" "2014-12-31" "2013-12-31" "2012-12-31"
#- attr(*, "col_desc")= chr [1:4] "As of 2015-12-31" "As of 2014-12-31" "As of 2013-12-31" "As of 2012-12-31"
transposed <- t(gs_BS)
write.xlsx(transposed, "C:\\Users\\abc\\Desktop\\bal_sheet.xlsx", row.names=FALSE)
transp <- read.xlsx("C:\\Users\\abc\\Desktop\\bal_sheet.xlsx" , sheetName="Sheet1")
transp$year <- c("2015","2014","2013","2012")
#> str(transp)
#'data.frame': 4 obs. of 43 variables:
#$ Cash...Equivalents : num 106533 90406 94224 65919
#$ Short.Term.Investments : logi NA NA NA NA
#$ Cash.and.Short.Term.Investments : num 113003 96196 98364 72669
#$ Accounts.Receivable...Trade..Net : num 71883 94479 97880 91354
#$ Receivables...Other : logi NA NA NA NA
#$ Total.Receivables..Net : num 71883 94479 97880 91354
#$ Total.Inventory : logi NA NA NA NA
#$ Prepaid.Expenses : logi NA NA NA NA
#$ Other.Current.Assets..Total : logi NA NA NA NA
#$ Total.Current.Assets : logi NA NA NA NA
#$ Property.Plant.Equipment..Total...Gross : num 17726 18324 18236 17267
#$ Accumulated.Depreciation..Total : num -7770 -8980 -9040 -9050
#$ Goodwill..Net : num 3657 3645 3705 3702
#$ Intangibles..Net : num 491 515 671 1397
#$ Long.Term.Investments : num 548317 547272 615841 602819
#$ Other.Long.Term.Assets..Total : num 5548 5181 5241 55291
#$ Total.Assets : num 861395 855842 911507 938555
#$ Accounts.Payable : num 210362 213572 204765 194485
#$ Accrued.Expenses : num 8149 8368 7874 8292
#$ Notes.Payable.Short.Term.Debt : num 196752 186133 250283 241931
#$ Current.Port..of.LT.Debt.Capital.Leases : num 29623 29501 47288 67349
#$ Other.Current.liabilities..Total : num 1280 1533 1974 2724
#$ Total.Current.Liabilities : logi NA NA NA NA
#$ Long.Term.Debt : num 268652 257954 245227 176270
#$ Capital.Lease.Obligations : logi NA NA NA NA
#$ Total.Long.Term.Debt : num 268652 257954 245227 176270
#$ Total.Debt : num 495027 473588 542798 485550
#$ Deferred.Income.Tax : logi NA NA NA NA
#$ Minority.Interest : num 459 404 326 508
#$ Other.Liabilities..Total : num 51035 70829 70120 152289
#$ Total.Liabilities : num 774667 773045 833040 862839
#$ Redeemable.Preferred.Stock..Total : logi NA NA NA NA
#$ Preferred.Stock...Non.Redeemable..Net : num 11200 9200 7200 6200
#$ Common.Stock..Total : num 9 9 8 8
#$ Additional.Paid.In.Capital : num 51340 50049 48998 48030
#$ Retained.Earnings..Accumulated.Deficit. : num 83386 78984 71961 65223
#$ Treasury.Stock...Common : num -62640 -58468 -53015 -46850
#$ Other.Equity..Total : num -718 -743 -524 -520
#$ Total.Equity : num 86728 82797 78467 75716
#$ Total.Liabilities...Shareholders..39..Equity: num 861395 855842 911507 938555
#$ Shares.Outs...Common.Stock.Primary.Issue : logi NA NA NA NA
#$ Total.Common.Shares.Outstanding : num 419 430 467 465
#$ year : chr "2015" "2014" "2013" "2012"
So the financial statement object has been transposed: each item on the statement (the Balance Sheet in this case) becomes a column, and the result can be written to a database table.
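The round trip through an Excel file is probably not required for the transposition itself. Below is a sketch in base R on a small hand-made matrix that imitates the shape of GS.f$BS$A (items in rows, period end dates in columns); the matrix is a hypothetical stand-in using two rows and two dates from the output above, not the result of a live getFin() call.

```r
# Hand-made stand-in for a statement matrix: items x period end dates.
bs <- matrix(
  c(106533, NA, 90406, NA),
  nrow = 2,
  dimnames = list(
    c("Cash & Equivalents", "Short Term Investments"),
    c("2015-12-31", "2014-12-31")
  )
)

# Transpose so that every statement item becomes a column.
bs_df <- as.data.frame(t(bs))

# Make the item names syntactically valid column names.
names(bs_df) <- make.names(names(bs_df))

# Derive the year from the row names (the period end dates).
bs_df$year <- substr(rownames(bs_df), 1, 4)

str(bs_df)
```

Taking the year from the row names avoids typing the years by hand, so the column stays correct if the set of periods changes.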
ticker = "AAPL"
period = 'A'
statements = getFin(ticker, auto.assign=FALSE)
bs = viewFin(statements, type="BS", period=period)
is = viewFin(statements, type="IS", period=period)
cf = viewFin(statements, type="CF", period=period)
bs.df = data.frame(bs)
is.df = data.frame(is)
cf.df = data.frame(cf)
You can then check to make sure they're all of data.frame class like so.
> is.data.frame(bs.df)
[1] TRUE
> is.data.frame(is.df)
[1] TRUE
> is.data.frame(cf.df)
[1] TRUE

why when I remove specific rows, my output is all NA?

I have data that I uploaded here
https://gist.github.com/anonymous/0bc36ec5f46757de7c2c
I load it in R using following command
df <- read.delim("path to the data", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='')
Then I check for a specific column to see how many + are there like this
length(which(df$Potential.contaminant == "+"))
which shows 9 in this case. Then I try to remove all the rows that have a + in that column using the following command
Newdf <- df[df$Potential.contaminant != "+", ]
The output is all NA. What am I doing wrong here?
As @akrun suggested, I have tried many different ways to do it, but without success
df[!grepl("[+]", df$Potential.contaminant),]
df[ is.na(df$Potential.contaminant),]
subset(df, Potential.contaminant != "+")
df[-(which(df$Potential.contaminant == "+")),]
None of the above commands solved it. One idea was that Potential.contaminant contains NA and that this is the reason. I replaced all NA with zero using
df[c("Potential.contaminant")][is.na(df[c("Potential.contaminant")])] <- 0
but still the same.
I copy-pasted your gist into a file c:/input.txt and then used your code:
df <- read.delim("c:/input.txt", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='')
Now:
> str(df)
'data.frame': 21 obs. of 11 variables:
$ Intensityhenya : int 0 NA NA NA NA 0 0 0 0 0 ...
$ Only.identified.by.site: chr "+" NA NA NA ...
$ Reverse : logi NA NA NA NA NA NA ...
$ Potential.contaminant : chr "+" NA NA NA ...
$ id : int 0 NA NA NA NA 1 2 3 4 5 ...
$ IDs.1 : chr "16182;22925;28117;28534;28538;29309;36387;36889;42536;49151;49833;52792;54591;54592" NA NA NA ...
$ razor : chr "True;True;False;False;False;False;False;True;False;False;False;False;False;False" NA NA NA ...
$ Mod.IDs : chr "16828;23798;29178;29603;29607;30404;38270;38271;38793;44633;51496;52211;55280;57146;57147;57148;57149" NA NA NA ...
$ Evidence.IDs : chr "694702;694703;694704;1017531;1017532;1017533;1017534;1017535;1017536;1017537;1017538;1017539;1017540;1017541;1017542;1017543;10"| __truncated__ NA NA NA ...
$ GHSIDs : chr NA NA NA NA ...
$ BestGSFD : chr NA NA NA NA ...
If I try to subset:
> df2 <- df[is.na(df$Potential.contaminant),]
> str(df2)
'data.frame': 12 obs. of 11 variables:
$ Intensityhenya : int NA NA NA NA NA NA NA NA NA NA ...
$ Only.identified.by.site: chr NA NA NA NA ...
$ Reverse : logi NA NA NA NA NA NA ...
$ Potential.contaminant : chr NA NA NA NA ...
$ id : int NA NA NA NA NA NA NA NA NA NA ...
$ IDs.1 : chr NA NA NA NA ...
$ razor : chr NA NA NA NA ...
$ Mod.IDs : chr NA NA NA NA ...
$ Evidence.IDs : chr NA NA NA NA ...
$ GHSIDs : chr NA NA NA NA ...
$ BestGSFD : chr NA NA NA NA ...
But your data is so messy that it's nearly impossible to visualize, so let's try something else to get a glance at it.
> colnames(df)
[1] "Intensityhenya" "Only.identified.by.site" "Reverse" "Potential.contaminant" "id" "IDs.1" "razor" "Mod.IDs"
[9] "Evidence.IDs" "GHSIDs" "BestGSFD"
Your header is a pain to follow, let's have a look at it:
IDs Intensityhenya Only identified by site Reverse Potential contaminant id IDs razor Mod.IDs Evidence IDs GHSIDs BestGSFD
Along with a line of data where long data are cut to get a glance:
CON__A2A4G1 0 + + 0 16182;[...];4592 True;[..];False 16828;[...];57149 694702;[...];2208697;
208698;[...];2441826
3;2433194;[...];4682766
I've just stripped the extra numbers where possible while keeping the tabs and newlines.
I hope this shows why the data loads so badly: run some checks on your input file and sanitize it before trying to load it into R again.
For illustration purposes, here is your gist with ellipses and %T% in place of tabs:
IDs%T%Intensityhenya%T%Only identified by site%T%Reverse%T%Potential contaminant%T%id%T%IDs%T%razor%T%Mod.IDs%T%Evidence IDs%T%GHSIDs%T%BestGSFD
CON__A2A4G1%T%0%T%+%T%%T%+%T%0%T%1618[...]4592%T%Tru[...]alse%T%1682[...]7149%T%69470[...]208697;%T%%T%
20869[...]441826%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
[...]20%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
00[...]%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1271[...]682766%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
CON__A2A5Y0%T%0%T%%T%%T%+%T%1%T%443[...]5777%T%Fals[...]rue%T%464[...]8377%T%21071[...]489947%T%40503[...]780178%T%40505[...]780175
CON__A2AB72%T%0%T%%T%%T%+%T%2%T%443[...]0447%T%Tru[...]alse%T%464[...]2842%T%21070[...]232341%T%40502[...]250729%T%40502[...]250728
CON__ENSEMBL:ENSBTAP00000014147%T%0%T%%T%%T%+%T%3%T%53270%T%TRUE%T%55779%T%238286[...]382871%T%457377[...]573778%T%4573776
CON__ENSEMBL:ENSBTAP00000024146%T%0%T%%T%%T%+%T%4%T%186[...]5835%T%Tru[...]rue%T%194[...]8438%T%8382[...]492132%T%15455[...]783465%T%15455[...]783465
CON__ENSEMBL:ENSBTAP00000024466;CON__ENSEMBL:ENSBTAP00000024462%T%0%T%%T%%T%+%T%5%T%939[...]5179%T%Tru[...]rue%T%978[...]7757%T%41149[...]468480%T%78212[...]739209%T%78217[...]739209
CON__ENSEMBL:ENSBTAP00000025008%T%0%T%+%T%%T%+%T%6%T%1564[...]8580%T%Fals[...]alse%T%1627[...]9651%T%66672[...]269215%T%125151[...]439696%T%125151[...]439691
CON__ENSEMBL:ENSBTAP00000038253%T%0%T%%T%%T%+%T%7%T%120[...]5703%T%Fals[...]alse%T%125[...]8300%T%5326[...]25602%T%%T%
;125602[...]178%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1[...]483384%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
22838[...]23247%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;123247[...]411%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
4[...]7%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
603[...]790126;%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
79012[...]13848%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;413848[...]765024%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
sp|O43790|KRT86_HUMAN;CON__O43790%T%0%T%%T%%T%+%T%8%T%121[...]5716%T%Tru[...]rue%T%126[...]8315%T%5455[...]484318%T%10404[...]426334%T%
It seems that the data rows which are not marked as contaminants have no values. The NA values come from the na.strings='' argument used in the read.delim call. So, for example, if you do:
df <- read.delim("https://gist.githubusercontent.com/anonymous/0bc36ec5f46757de7c2c/raw/517ef70ab6a68e600f57308e045c2b4669a7abfc/example.txt", header=TRUE, row.names=1, sep="\t")
df<-df[df$Potential.contaminant!='+',]
summary(df)
you should see empty cells.
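Separately from the malformed input, the question's idea that NA in Potential.contaminant plays a role matches how R treats NA in a logical row index: comparing NA with "+" gives NA, and an NA index produces a row filled with NA. The sketch below uses a made-up four-row data frame with hypothetical columns id and flag to show the effect and one way around it.

```r
# Toy data: two rows flagged "+", two rows where the flag is missing.
df <- data.frame(
  id   = 1:4,
  flag = c("+", NA, NA, "+"),
  stringsAsFactors = FALSE
)

# NA != "+" evaluates to NA, not TRUE.
keep <- df$flag != "+"
print(keep)

# An NA in a logical index yields a row of NA values.
bad <- df[keep, ]
print(bad)

# Treat a missing flag explicitly as "not a contaminant".
good <- df[is.na(df$flag) | df$flag != "+", ]
print(good)
```

With na.strings = '' every empty cell becomes NA at read time, so this situation would arise for any row whose contaminant cell is blank.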
