I have a Notepad txt file, inflation.txt, that looks something like this:
1950-1 0.0084490544865279
1950-2 −0.0050487986543660
1950-3 0.0038461526886055
1950-4 0.0214293914558992
1951-1 0.0232839389540449
1951-2 0.0299121323429455
1951-3 0.0379293285389640
1951-4 0.0212773984472849
From a previous stackoverflow post, I learned how to import this file into R:
data <- read.table("inflation.txt", sep = "" , header = F ,
na.strings ="", stringsAsFactors= F, encoding = "UTF-8")
However, this code reads the second column as character. When I try to convert it to numeric, all negative values are replaced with NA:
b=as.numeric(data$V2)
Warning message:
In base::as.numeric(x) : NAs introduced by coercion
> head(b)
[1] 0.008449054 NA 0.003846153 0.021429391 0.023283939 0.029912132
Can someone please show me what I am doing wrong? Is it possible to save the inflation.txt file as a data.frame?
I would read the file using a space as the separator, then split the first column into separate year and quarter columns in your R script:
data <- read.table("inflation.txt", sep = " ", header = FALSE,
                   na.strings = "", stringsAsFactors = FALSE, encoding = "UTF-8")
names(data) <- c("ym", "vals")
# The file uses a Unicode minus sign (U+2212); replace it before converting
data$vals <- as.numeric(sub("\u2212", "-", data$vals, fixed = TRUE))
data$year    <- as.numeric(sub("-.*$", "", data$ym))
data$quarter <- as.numeric(sub("^\\d+-", "", data$ym))
data <- data[, c("year", "quarter", "vals")]
The issue is that the "−" in your data is not the ASCII minus sign but the Unicode minus sign (U+2212), hence the column is being read as character.
You have two options.
Open the file in any text editor and find-and-replace all the "−" characters with the ASCII minus sign "-"; then read.table will work directly.
data <- read.table("inflation.txt")
If you can't change the data in the original file, then replace them with sub after reading the data into R.
data$V2 <- as.numeric(sub('−', '-', data$V2, fixed = TRUE))
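Putting it together, a minimal sketch (the inline text stands in for inflation.txt; "\u2212" is the Unicode minus sign):

```r
# Two sample lines in the same layout as the file; "\u2212" is the Unicode minus
txt <- "1950-1 0.0084490544865279\n1950-2 \u22120.0050487986543660"
data <- read.table(text = txt, stringsAsFactors = FALSE)

# Swap the Unicode minus for the ASCII one, then convert to numeric
data$V2 <- as.numeric(sub("\u2212", "-", data$V2, fixed = TRUE))
str(data$V2)  # now numeric, with the negative value intact
```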
I am working on a basketball project. I am struggling to open my data in R:
https://www.basketball-reference.com/leagues/NBA_2019_totals.html
I have imported the data into Excel and then saved it as CSV (for Macintosh).
When I import the data into R I get an error message:
"Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : invalid multibyte string at '<e7>lex<20>Abrines' "
The following seems to work. The readHTMLTable function does give warnings due to the presence of null characters in column Player.
library(XML)
uri <- "https://www.basketball-reference.com/leagues/NBA_2019_totals.html"
data <- readHTMLTable(readLines(uri), which = 1, header = TRUE)

# Drop the header rows that the table repeats every few screens
i <- grep("Player", data$Player, ignore.case = TRUE)
data <- data[-i, ]

# Convert the numeric columns (Rk, Age, and the stat columns from G onward)
cols <- c(1, 4, 6:ncol(data))
data[cols] <- lapply(data[cols], function(x) as.numeric(as.character(x)))
Check if there are NA values. This is needed because the table in the link restarts the headers every now and then and character strings become mixed with numeric entries. The grep above is meant to detect such cases but maybe there are others.
sapply(data, function(x) sum(is.na(x)))
There are none, so everything is alright. Write the data set out as a CSV file.
write.csv(data, "nba.csv")
Setting the file encoding to "latin1" can help.
For example, to read a csv file while skipping the second row:
Test <- read.csv("IMDB.csv", header = TRUE, sep = ",", fileEncoding = "latin1")[-2, ]
I am trying to read a .dat file separated by ";". I want to read only the lines that start with certain characters, like "B"; the other lines are not of interest. Can anyone guide me?
I have tried read_delim, read.table and read.csv2, but since some lines are not of equal length, I am getting errors.
file <- read.table(file = '~/file.DAT',header = FALSE, quote = "\"'",dec = ".",numerals = c("no.loss"),sep = ';',text)
I am expecting a r dataframe out of this file which I can write it to a csv file again.
You should be able to do that through readLines:
allLines <- readLines("~/file.DAT")
grepB <- function(x) grepl("^B", x)
BLines <- Filter(grepB, allLines)
# Split each matching line on ";" and bind the pieces into rows
df <- as.data.frame(do.call(rbind, strsplit(BLines, ";")),
                    stringsAsFactors = FALSE)
And if your file contains a header, then you can specify:
names(df) <- strsplit(allLines[1], ";")[[1]]
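For instance, with a made-up three-line file (hypothetical content, since the real file.DAT isn't shown), the whole round trip back to a csv might look like:

```r
# Hypothetical lines; only those starting with "B" are of interest
allLines <- c("A;header;junk", "B;1;2", "C;9;9", "B;3;4")

# Keep the "B" lines only
BLines <- Filter(function(x) grepl("^B", x), allLines)

# One data frame row per matching line, split on ";"
df <- as.data.frame(do.call(rbind, strsplit(BLines, ";")),
                    stringsAsFactors = FALSE)

# Write the result back out as a csv, as requested
write.csv(df, "b_lines.csv", row.names = FALSE)
```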
When trying to read a local csv file I'm getting the error:
Error in xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y")) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
I'm trying out the example from https://rpubs.com/mohammadshadan/288218, which is the following:
tmp_file <- "test.csv"
# Create dat by reading tmp_file
dat <- read.csv(tmp_file,header=FALSE)
# Convert dat into xts
xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))
# Read tmp_file using read.zoo
dat_zoo <- read.zoo(tmp_file, index.column = 0, sep = ",", format = "%m/%d/%Y")
# Convert dat_zoo to xts
dat_xts <- as.xts(dat_zoo)
The thing is, when I read the file from the server as in the example, it works somehow, but not when I try with a local csv file, even if it contains the same info as the file on the web.
I have tried creating the csv file with Notepad, Notepad++ and Excel with no luck.
Any idea what I'm missing? I have also tried using read.table instead of read.csv with the same results...
The file can be found at: https://ufile.io/zfqje
If header=TRUE I get the following warnings:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'test.csv'
2: In read(file, ...) :
  incomplete final line found by readTableHeader on 'test.csv'
The problem is the header=FALSE argument in read.csv.
read.csv will choose the first column as the row names if there is a header and the first row contains one fewer field than the number of columns. When header = FALSE, it doesn't create the row names.
Here is an example of the problem:
dat <- read.csv(text = "a,b
1/02/2015,1,3
2/03/2015,2,4", header = F)
as.Date(rownames(dat), "%m/%d/%Y")
#> [1] NA NA NA
By removing header = F, the problem is fixed:
dat <- read.csv(text = "a,b
1/02/2015,1,3
2/03/2015,2,4")
as.Date(rownames(dat), "%m/%d/%Y")
#> [1] "2015-01-02" "2015-02-03"
I have a txt file (remove.txt) with this kind of data (RGB hex colors):
"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"...
These are colors I don't want in my analysis.
I also have an R object (nacreHEX) with data like in the file, but it contains both the good colors and the colors I don't want in my analysis. So I use this code to remove them:
nacreHEX <- nacreHEX[!nacreHEX %in% remove]
It works when remove is an R object like remove <- c("#DDDEE0", "#D8D9DB", ...), but it doesn't work when it comes from a txt file that I convert into a data.frame, nor when I try remove2 <- as.vector(t(remove)).
So there is my code:
remove <- read.table("remove.txt", sep=",")
remove2 <-as.vector(t(remove))
nacreHEX <- nacreHEX [! nacreHEX %in% remove2]
head(nacreHEX)
With this, there are no commas in the result of as.vector, so maybe that's why it doesn't work.
How can I make an R vector from this kind of data?
What step did I forget?
The problem is that your txt file is separated by ", " (comma plus space), not ",". The spaces end up in your strings:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"', sep = ",")
(rr = as.vector(t(rr)))
# [1] "#DDDEE0" " #D8D9DB" " #F5F6F8" " #C9CBCA"
You can see the leading spaces before the #. We can trim these spaces with trimws().
trimws(rr)
# [1] "#DDDEE0" "#D8D9DB" "#F5F6F8" "#C9CBCA"
Even better, you can use the argument strip.white to have read.table do it for you:
rr = read.table(text = '"#DDDEE0", "#D8D9DB", "#F5F6F8", "#C9CBCA"',
sep = ",", strip.white = TRUE)
I'm trying to read a .csv file into R where all the columns are numeric. However, they get converted to factor every time I import them.
Here's a sample of what my CSV looks like:
This is my code:
options(StringsAsFactors=F)
data<-read.csv("in.csv", dec = ",", sep = ";")
As you can see, I set dec to "," and sep to ";". Still, all the vectors that should be numeric are factors!
Can someone give me some advice? Thanks!
Your NA strings in the csv file, N/A, are interpreted as character and then the whole column is converted to character. If you have stringsAsFactors = TRUE in options or in read.csv (default), the column is further converted to factor. You can use the argument na.strings to tell read.csv which strings should be interpreted as NA.
A small example:
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";")
str(df)
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)
Update following comment
Although not apparent from the sample data provided, there is also a problem with instances of '$' concatenated to the numbers, e.g. '$3,3'. Such values will be interpreted as character, and then the dec = "," doesn't help us. We need to replace both the '$' and the ',' before the variable is converted to numeric.
df <- read.csv(text = "x;y;z
N/A;1,1;2,2$
$3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)
df[] <- lapply(df, function(x){
x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
as.numeric(x3)
}
)
df
str(df)
You could actually have gotten your original code to work: there's a tiny typo ('stringsAsFactors', not 'StringsAsFactors'). The options command won't complain about the wrong name, but it just won't work. When done correctly, the columns will be read as character instead of factor, and you can then convert them to whatever format you want.
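A small sketch of that idea, passing the correctly spelled argument straight to read.csv (inline text stands in for in.csv; note that since R 4.0.0, stringsAsFactors = FALSE is already the default):

```r
# "N/A" forces column x to be non-numeric; with stringsAsFactors = FALSE
# it stays character instead of becoming a factor
df <- read.csv(text = "x;y\nN/A;2,2\n3,3;4,4",
               dec = ",", sep = ";", stringsAsFactors = FALSE)
str(df)  # x: chr, y: num -- character, not factor

# Convert x to numeric once the decimal comma is replaced
df$x <- as.numeric(sub(",", ".", df$x, fixed = TRUE))
```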
I just had this same issue and tried all the fixes on this and other duplicate posts; none really worked all that well. The way I fixed it was actually on the Excel side: if you highlight all the columns in your source file (in Excel), right click ==> Format Cells, then select 'Number', it will import perfectly fine (so long as you have no non-numeric characters below the header).