R: quantmod - getSymbols.csv file format?

I'm struggling to read a local CSV file with quantmod's getSymbols. The format of the file (wkn_541779.csv) I'm trying to read is like this:
Date;Open;High;Low;Close;Volume;Ajdusted
2012-09-06;104,62;105,95;104,62;105,95;1248065,00;105,95
2012-09-05;104,78;104,78;104,45;104,48;1176371,00;104,48
2012-09-04;104,73;104,73;104,26;104,26;13090,00;104,26
> getSymbols("wkn_541779", src="csv", header = TRUE, sep=";", dec=",")
This gives me the error "more columns than column names", though.
> count.fields("wkn_541779.csv", sep = ";", skip = 0, blank.lines.skip = TRUE)
Results in "7" for each line (including the header!), which is exactly the number of columns in the header.
Can anybody please help me track down the problem here?

getSymbols.csv calls read.csv with its default arguments, i.e. sep = ",", so the header, sep, and dec arguments you pass to getSymbols never reach read.csv. With sep = "," the header line "Date;Open;..." is read as a single field, while the data lines split at the decimal commas, which is exactly why you get "more columns than column names".
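Since the extra arguments are not forwarded, one workaround is to read the file yourself and build the xts object manually; a sketch, not quantmod's own loader (read.csv2 defaults to sep = ";" and dec = ","):
library(xts)
# Read the semicolon-separated, decimal-comma file from the question.
df <- read.csv2("wkn_541779.csv", stringsAsFactors = FALSE)
# Turn it into an xts series indexed by the Date column.
wkn_541779 <- xts(df[, -1], order.by = as.Date(df$Date))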

Related

NAs introduced by coercion

I have a Notepad txt file, inflation.txt, that looks something like this:
1950-1 0.0084490544865279
1950-2 −0.0050487986543660
1950-3 0.0038461526886055
1950-4 0.0214293914558992
1951-1 0.0232839389540449
1951-2 0.0299121323429455
1951-3 0.0379293285389640
1951-4 0.0212773984472849
From a previous Stack Overflow post, I learned how to import this file into R:
data <- read.table("inflation.txt", sep = "" , header = F ,
na.strings ="", stringsAsFactors= F, encoding = "UTF-8")
However, this code reads the second column as character. When I try to convert that column to numeric, all negative values are replaced with NA:
b=as.numeric(data$V2)
Warning message:
In base::as.numeric(x) : NAs introduced by coercion
> head(b)
[1] 0.008449054 NA 0.003846153 0.021429391 0.023283939 0.029912132
Can someone please show me what I am doing wrong? Is it possible to save the inflation.txt file as a data.frame?
I would read the file using space as a separator, then split out separate year and quarter columns in R:
data <- read.table("inflation.txt", sep = " ", header=FALSE,
na.strings="", stringsAsFactors=FALSE, encoding="UTF-8")
names(data) <- c("ym", "vals")
data$year <- as.numeric(sub("-.*$", "", data$ym))
data$month <- as.numeric(sub("^\\d+-", "", data$ym))
data <- data[, c("year", "month", "vals")]
The issue is that the "−" in your data is not the ASCII minus sign "-"; it is the Unicode minus sign (U+2212), hence the column is being read as character.
You have two options.
Open the file in any text editor and find-and-replace every "−" with the ASCII minus sign "-"; read.table will then work directly.
data <- read.table("inflation.txt")
If you can't change the data in the original file, then replace the character with sub after reading the data into R:
data$V2 <- as.numeric(sub('−', '-', data$V2, fixed = TRUE))
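Putting the two answers together, a minimal end-to-end sketch (assuming the two-column layout shown above):
data <- read.table("inflation.txt", header = FALSE,
                   stringsAsFactors = FALSE, encoding = "UTF-8")
names(data) <- c("ym", "vals")
# Replace the Unicode minus sign (U+2212) before converting to numeric.
data$vals <- as.numeric(sub("\u2212", "-", data$vals, fixed = TRUE))
# Split the "year-quarter" key into two numeric columns.
data$year <- as.numeric(sub("-.*$", "", data$ym))
data$quarter <- as.numeric(sub("^\\d+-", "", data$ym))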

Problem importing with read.csv (NULL in database manager is not recognized)

I'm trying to import a data frame into R with read.csv. I exported it from my database manager (DBeaver) with UTF-8 encoding.
In some factor columns, what is NULL in the database manager is not recognized as such. I think NULL is replaced by blank space(s), and when I try, I can't turn them into NULL or NA.
I'm using:
tb1 <- read.csv("pacientes.csv", header = TRUE, sep = ",", dec = ".")
I identified the problem when I use
table(tb1$var2, useNA="ifany")
With factor variables that I know have missing values, I get a table with "blank space(s)" as a category (along with the correct categories).
I have 59 columns, so using some features of read.csv is impractical. And I really believe there's an easier way to fix the problem. Can anyone help me? Thank you very much!
Try
read.csv("pacientes.csv", header = TRUE, sep = ",", dec = ".", na.strings = c("NA", "", " "))

Error in type.convert when reading data from CSV

I am working on a basketball project. I am struggling to open my data in R:
https://www.basketball-reference.com/leagues/NBA_2019_totals.html
I have imported the data into Excel and then saved it as CSV (for Macintosh).
When I import the data into R, I get an error message:
"Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : invalid multibyte string at '<e7>lex<20>Abrines' "
The following seems to work. The readHTMLTable function does give warnings due to the presence of null characters in column Player.
library(XML)
uri <- "https://www.basketball-reference.com/leagues/NBA_2019_totals.html"
data <- readHTMLTable(readLines(uri), which = 1, header = TRUE)
i <- grep("Player", data$Player, ignore.case = TRUE)
data <- data[-i, ]
cols <- c(1, 4, 6:ncol(data))
data[cols] <- lapply(data[cols], function(x) as.numeric(as.character(x)))
Check whether there are NA values. This is needed because the table at the link repeats its header row every now and then, so character strings become mixed in with the numeric entries. The grep above is meant to detect such cases, but maybe there are others.
sapply(data, function(x) sum(is.na(x)))
If no NAs turn up, everything is all right, and you can write the data set out as a CSV file:
write.csv(data, "nba.csv")
Setting fileEncoding to "latin1" can also help.
For example, to read a CSV file while dropping its second row:
Test <- read.csv("IMDB.csv", header = TRUE, sep = ",", fileEncoding = "latin1")[-2, ]
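Applied to the original problem: a CSV saved from Excel as "CSV (for Macintosh)" is not UTF-8, which is what triggers the invalid multibyte string error. Declaring an encoding when reading may fix it; a sketch with an assumed file name:
# File name is hypothetical; try fileEncoding = "macintosh" if "latin1"
# still garbles the accented player names.
stats <- read.csv("nba_totals.csv", fileEncoding = "latin1",
                  stringsAsFactors = FALSE)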

Issues reading data as CSV in R

I have a large data set (~20000x1). Not all the fields are filled; in other words, the data does have missing values. Each feature is a string.
I have run the following code:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output for the second code:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
I run into essentially four problems, as I see it. Two of them are the error messages stated above. The third: when no error is produced, the global environment window shows that not all of my rows are accounted for (roughly 14000 samples are missing), though the number of features is right. The fourth: again, not all the samples are accounted for, and the number of features is wrong as well.
How can I solve this?
Try the argument comment.char = "" as well as quote = "". The hash (#) is read by R as a comment character and will cut a line short.
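For example, revisiting the read.table attempt from the question (read.csv already defaults to comment.char = "", but read.table does not, and read.table also needs sep = "," for a CSV file):
datan <- read.table("data.csv", header = TRUE, sep = ",", fill = TRUE,
                    quote = "", comment.char = "")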
Can you open the CSV in Notepad++? This will let you see 'invisible' characters and any other non-printable characters; that file may not contain what you think it contains! Once the sourcing issue is resolved, you can choose the CSV file with a file selector:
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
Or hard-code the path and read the data into R:
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")

R read.table skip not working. Why?

I have a file similar to
ColA ColB ColC
A 1 0.1
B 2 0.2
But with many more columns.
I want to read the table and set the correct type of data for each column.
I am doing the following:
data <- read.table("file.dat", header = FALSE, na.string = "",
dec = ".",skip = 1,
colClasses = c("character", "integer","numeric"))
But I get the following error:
Error in scan(...): scan() expected 'an integer', got 'ColB'
What am I doing wrong? Why is it trying to parse the header line according to colClasses as well, despite skip = 1?
Thanks for your help.
Some notes: this file was generated in a Linux environment and is being worked on in a Windows environment. I suspect a problem with newline characters, but I have no idea what to do.
Also, if I read the table without colClasses, the table is read correctly (skipping the first line), but all columns are of factor type. I can probably change the classes later, but I would still like to understand what is happening.
Instead of skipping the first line, you can set header = TRUE and it should work fine: the header row is then consumed as column names, so colClasses is applied only to the data rows.
data <- read.table("file.dat", header = TRUE, na.string = "",
dec = ".",colClasses = c("character", "integer","numeric"), sep = ",")
