I'm new to R and really need some help with an assignment I have for school.
I've created an xls file containing returns for companies as decimals, e.g. 0.023 (a 2.3% return).
The data is in 3 columns with some negative values, and the title of each column is in the first row. There are no row names, just 130 observations of returns with the company names (column names) at the top. All the cells are formatted as General.
I converted the xls file to csv on my Mac, so the file type became CSV UTF-8 (comma delimited).
To create a dataset in R, I imported the csv using the read.table command:
read.table("filename.csv", header = TRUE, sep = ";", row.names = NULL)
The dataset looks good, with all the individual numbers in the right place, but when I try
sapply(dataset, class)
all columns are character. I've tried as.numeric and it says: 'list' object cannot be coerced to type 'double'.
The issue comes from the fact that your imported data contains commas, and R cannot interpret those values as numeric (it requires a dot as the decimal separator).
There are two ways to fix this:
You import as you did and then convert your data frame:
# replace decimal commas with dots in every column, then coerce to numeric
dataset <- as.data.frame(apply(dataset, 2, function(x) as.numeric(gsub(",", ".", x))))
You import the dataset directly with read.csv2 (base R, no extra package needed), which interprets commas as the decimal separator and ";" as the field separator by default:
dataset <- read.csv2("filename.csv", fill = TRUE, header = TRUE)
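Either way, you can rerun the check from the question to confirm the conversion; the mean-return line is just an illustrative extra:
sapply(dataset, class)   # each column should now report "numeric"
colMeans(dataset)        # numeric operations work again, e.g. mean return per company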
I have a table in Excel with numeric, date, and character type columns. I use the read_excel() function from the readxl package to load the data into R. For most of the columns, read_excel by default does a good job of recognizing the column type.
Problem:
As the number of columns in the table can increase or decrease, I don't want to define col_types in read_excel to load data.
Two Excel numeric columns are cost and revenue with '$' in front of the value such as $200.0541. The dollar sign '$' seems to cause the function to mistakenly identify the cost and revenue column as POSIXct type.
Since new numeric columns might be added later with '$', is it possible to change the column types after loading the data (without using df$cost <- as.numeric(df$cost) for each column) through a loop?
Edit: link to sample - https://ethercalc.org/ogiqi9s51o45
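One way to handle this, sketched under the assumption that reading everything as text first is acceptable (genuine date columns would then also need converting separately, and the file name here is hypothetical): readxl recycles a single col_types entry across all columns, so you never have to know how many columns the sheet has.
library(readxl)

# read every column as text so values like "$200.0541" arrive unmangled
# (a single col_type is recycled across however many columns exist)
df <- read_excel("sales.xlsx", col_types = "text")

# convert each column whose non-missing values all parse as numbers once
# "$" and thousands commas are stripped; everything else stays text
for (col in names(df)) {
  stripped <- gsub("[$,]", "", df[[col]])
  parsed <- suppressWarnings(as.numeric(stripped))
  if (any(!is.na(parsed)) && all(is.na(parsed) == is.na(stripped))) {
    df[[col]] <- parsed
  }
}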
I am trying to import a csv dataset of the number of benefit recipients per month and district. The table has 43 variables (months) and 88 observations (districts).
Unfortunately, when I import the dataset with the following code:
D <- read.csv2(file = "D.csv", header = TRUE, sep = ";", dec = ".")
all my numbers get converted to characters.
I tried the as.is=T argument, and to use read.delim, as suggested by Sam in this post: Imported a csv-dataset to R but the values becomes factors
but it did not help.
I also tried deleting the first two columns in the original csv file to get rid of the district names (the only genuinely non-numeric data), but I still get characters in the imported data frame. Can you please help me figure out how to retain my numeric values?
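Since a single stray value (a thousands separator, a quote character, a footnote marker) is enough to turn a whole column into character, here is a hedged diagnostic sketch, reusing the file and settings from the question, that lists the values that refuse to parse:
D <- read.csv2(file = "D.csv", header = TRUE, sep = ";", dec = ".", stringsAsFactors = FALSE)

# for each character column, collect the distinct values that fail as.numeric
offenders <- lapply(D, function(x) {
  if (!is.character(x)) return(character(0))
  unique(x[is.na(suppressWarnings(as.numeric(x))) & !is.na(x)])
})
offenders[sapply(offenders, length) > 0]   # only columns with problem values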
I have a CSV file with all values in double quotes. One column is the id column and it contains values such as this:
01100170109835
The problem I am having is that no matter what options I specify (as.is=T, stringsAsFactors=F, or numerals='no.loss'), it always reads this id column in as numeric and drops the leading 0's. This is such a fundamental operation that I am really baffled that I can't find a solution.
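None of those arguments override type guessing for a particular column, but colClasses does. A minimal sketch, assuming the file is "data.csv" and the ID column's header is literally "id" (colClasses entries can be matched to columns by name; unnamed columns are still guessed as usual):
# force the id column to be read as character so leading zeros survive
df <- read.csv("data.csv", colClasses = c(id = "character"))
df$id[1]   # e.g. "01100170109835", leading zero intact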
I am importing several tab-separated files into R. One of the columns contains numeric IDs that are 18+ digits long. When I use read.table, it automatically reads that column as numeric and converts the IDs to scientific notation (e.g. x.xxE10), so two different original IDs can come out as the same string after as.character.
Is there any way by which I can define in R how to read the data before reading the data? Or in general, how do I solve this problem?
I am simply using the read.table command
df <- read.table(file="data/myfile.txt",sep="\t",header=T, stringsAsFactors=F, encoding="UTF-8")
Here is the solution: read the ID column as character so it never passes through double precision (which only preserves about 15 significant digits). My file contains 20 columns:
df <- read.table(file = "data/myfile.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE, encoding = "UTF-8", colClasses = rep("character", 20))
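If the column count can vary, a small variant (a sketch, reusing the same path) avoids hard-coding the 20 by peeking at one row first:
# read a single row to learn the width, then read the full file as character
n <- ncol(read.table(file = "data/myfile.txt", sep = "\t", header = TRUE, nrows = 1))
df <- read.table(file = "data/myfile.txt", sep = "\t", header = TRUE,
                 stringsAsFactors = FALSE, encoding = "UTF-8",
                 colClasses = rep("character", n))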
I am importing a csv of stock data into R, with column names that are stock tickers starting with a number and containing a space, e.g. "5560 JP". After reading into R, the column names get an "X" prepended and the space replaced by ".", e.g. "X5560.JP". After all the work is done in R, I want to write the processed data back to a new csv with the original column names, e.g. "5560 JP" instead of "X5560.JP". How can I do that?
Thank you!
When you save your data with write.table, you can set the column names to whatever you like by passing a character vector as the col.names argument. (Note that write.csv ignores attempts to change col.names, with a warning.)
But that assumes you still have the original column names available.
Once you've read in the data and R has converted the names, you've lost that information. To get around this, you can suppress the conversion, stash the original names, and restore them before writing:
df <- read.csv("mydata.csv", check.names = FALSE)   # keep "5560 JP" as-is
orig.cols <- colnames(df)                           # stash the originals
colnames(df) <- make.names(colnames(df))            # syntactic names like "X5560.JP" for processing
[your original code]
colnames(df) <- orig.cols                           # put the real names back
write.csv(df, "mydata_out.csv", row.names = FALSE)  # output file name is just an example
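Alternatively, if you'd rather not rename df at the end: write.table (unlike write.csv) accepts a character vector for col.names, so the original names can be supplied at write time. A sketch, with the same illustrative output file name:
write.table(df, "mydata_out.csv", sep = ",", row.names = FALSE,
            col.names = orig.cols, qmethod = "double")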