Importing xlsx into R, header contains dates that are converted - r

I imported data into R, but the column headers in the xlsx file contain Date values; see a sample here:
GrowthValue 15-May 15-Jun 15-Jul 15-Aug 15-Sep 15-Oct
So in the table header of the spreadsheet 15-May gets translated to the variable name X42505 in R.
I could not find anything in my searches. How do you preserve the Date in the header?

R doesn't like numbers as column names, so it adds X as a prefix.
You should avoid column names that start with numbers, but if that's what you want, here is a solution (source):
read.table(file, check.names=FALSE)
If you want to reference these columns, quote the name with backticks:
df$`15-May`
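A minimal reproducible sketch (the file below is a throwaway stand-in, not the asker's spreadsheet) showing what check.names=FALSE changes:

```r
# Write a tiny CSV whose headers look like dates (invented data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("GrowthValue,15-May,15-Jun", "1.2,3.4,5.6"), tmp)

mangled  <- read.table(tmp, sep = ",", header = TRUE)  # default check.names = TRUE
verbatim <- read.table(tmp, sep = ",", header = TRUE, check.names = FALSE)

names(mangled)   # "GrowthValue" "X15.May" "X15.Jun"
names(verbatim)  # "GrowthValue" "15-May"  "15-Jun"
verbatim$`15-May`
```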

Related

Data import from excel to R (all columns are character classed)

I'm new to R and really need some help with an assignment I have for school.
I've created an xls file containing returns for companies as decimals, i.e. 0.023 (a 2.3% return).
The data is in 3 columns with some negative values, and the titles for each column are in the first row. There are no row names, just 130 observations of returns with the company names (column names) at the top. All the cells are formatted as General.
I converted the xls file to CSV on my Mac, so the file type became CSV UTF-8 (comma delimited).
When I try to create a dataset in R, I import the CSV using the read.table command:
read.table("filename.csv", header = TRUE, sep = ";", row.names = NULL)
The dataset looks good, with all the individual numbers in the right place, but when I try
sapply(dataset, class)
all columns are character. I've tried as.numeric and it says "'list' object cannot be coerced to type 'double'".
The issue comes from the fact that your dataset uses commas as the decimal separator, and R cannot interpret those values as numeric (it expects a dot).
Two ways to avoid this:
You import as you did and then convert your dataframe:
dataset <- apply(apply(dataset, 2, gsub, pattern = ",", replacement = "."), 2, as.numeric)
You directly import the dataset, interpreting commas as the decimal separator, with read.csv2 (part of base R, so no extra package is needed):
read.csv2("filename.csv", fill = TRUE, header = TRUE)
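Both fixes can be tried end-to-end on a throwaway file (the data below is invented; the thread's actual CSV isn't shown):

```r
# A semicolon-separated file with comma decimals, like the asker's export.
tmp <- tempfile(fileext = ".csv")
writeLines(c("A;B", "0,023;-0,15", "0,041;0,007"), tmp)

# Way 1: import as-is (columns arrive as character), then swap , for . and convert.
d1 <- read.table(tmp, header = TRUE, sep = ";")
d1 <- apply(apply(d1, 2, gsub, pattern = ",", replacement = "."), 2, as.numeric)

# Way 2: read.csv2 assumes ';' separators and ',' decimals out of the box.
d2 <- read.csv2(tmp)

sapply(d2, class)  # both columns come back numeric
```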

How to separate one column into many columns in a .txt file?

I've been given a data set for a project that I need to reformat in order to work with it.
The problem is that all of the column names and corresponding values are mashed into one column in the file. As shown in the picture.
I'm new to R so I hardly know how to work with complex commands.
My Questions:
Is there a simple way to separate this from 1 column into 12 columns?
Desired output:
I'll also need to remove the periods between the column names and the semicolons between the values.
I just need to be able to do basic statistical analysis on the table.
Thanks
Although your data is in one column, it is semicolon-separated. The read.csv function accepts a column separator:
df <- read.csv(file="path/to/your/file.txt", skip=1, header=FALSE, sep=";")
The above call will split columns on the ; separator. I skip the first line and ignore the header because it is a single mashed-together string. You may manually assign the column names via:
names(df) <- c("name1", "name2", ..., "name12")
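Since the original file isn't available, here is the same idea on a made-up file with one mashed header line and semicolon-separated values:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("name.age.score", "alice;30;91", "bob;25;84"), tmp)

# Skip the mashed header line and split the remaining rows on ';'.
df <- read.csv(tmp, skip = 1, header = FALSE, sep = ";")
names(df) <- c("name", "age", "score")
df$age  # 30 25
```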

R - Read.csv importing incorrectly due to column names missing

I have a CSV file that's missing column names. The CSV file has 16 columns, but only 9 columns have column names and the rest do not. Additionally, the 7 columns without column names also do not have any data in the first 8 rows.
When I use read.csv(my_file), R loads a dataframe with only 9 columns. It takes the other 7 columns and puts them at the bottom of 7 of the first 9 columns, which is a pain. Any thoughts on how to fix this?
Best,
EDIT: Let me know if I should provide my code / the CSV file. I didn't attach them at first on the off-chance that this is a common problem and could be solved without the CSV.
Take this example csv 'tmp.csv':
a,b,c
1,2,3
4,5,6
7,8,9,10,11
Then you can parse it with the following command:
read.csv('tmp.csv', col.names=letters[1:5], fill=TRUE, header=TRUE)
This reads the header but ignores it, replacing it with your custom column names. Rows with missing values are automatically padded with NA.
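To make the example self-contained (writing the toy CSV to a temp file rather than assuming 'tmp.csv' exists on disk):

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9,10,11"), tmp)

# col.names fixes the column count at 5; fill = TRUE pads short rows with NA.
df <- read.csv(tmp, col.names = letters[1:5], fill = TRUE, header = TRUE)
dim(df)  # 3 rows, 5 columns
df$d     # NA NA 10
```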
You can read it in by specifying the column names (and therefore the number of columns) like this:
df <- read.table(file = "file", fill = TRUE, sep = ",", col.names = paste("column", 1:16, sep = "_"))

Retaining numerical data from csv file

I am trying to import a csv dataset which is about the number of benefit recipients per month and district. The table looks like this:
There are 43 variables (months) and 88 observations (districts).
Unfortunately, when I import the dataset with the following code:
D=read.csv2(file="D.csv", header=TRUE, sep=";", dec=".")
all my numbers get converted to characters.
I tried the as.is=T argument, and to use read.delim, as suggested by Sam in this post: Imported a csv-dataset to R but the values becomes factors
but it did not help.
I also tried deleting the first two columns in the original csv file to get rid of the district names (which is the only real non-numeric column), but I still get characters in the imported data frame. Can you please help me figure out how to retain my numerics?
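This question has no posted answer above; as an assumption (not from the thread), a usual culprit is a stray non-numeric cell that drags a whole column to character on import. A quick way to diagnose and convert, with invented data:

```r
# One "n/a" cell is enough to make a whole column character on import.
D <- data.frame(m1 = c("10", "12", "n/a"), m2 = c("3", "4", "5"))
sapply(D, class)  # both columns character

# Convert every column; anything non-numeric (like "n/a") becomes NA.
D[] <- lapply(D, function(x) suppressWarnings(as.numeric(x)))
sapply(D, class)  # both columns numeric
```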

Defining variable type before/while importing data

I am importing several tab separated files into R. In one of the columns, there are numeric IDs which are 18+ digits long. When I use read.table for this, it automatically reads that column as numeric, converts the ID into scientific format (e.g. x.xxE10) and then when I use as.character on this, it results in the same string even if the original IDs were two different numbers.
Is there any way by which I can define in R how to read the data before reading the data? Or in general, how do I solve this problem?
I am simply using the read.table command
df <- read.table(file="data/myfile.txt",sep="\t",header=T, stringsAsFactors=F, encoding="UTF-8")
Here is the solution; my file contains 20 columns:
df <- read.table(file="data/myfile.txt",sep="\t",header=T, stringsAsFactors=F, encoding="UTF-8", colClasses=c(rep("character",20)))
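A self-contained sketch (the IDs and file below are invented) showing why colClasses matters for long numeric IDs:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "123456789012345678\t1", "123456789012345679\t2"), tmp)

# Read as numeric: both 18-digit IDs round to the same double.
bad <- read.table(tmp, sep = "\t", header = TRUE)
length(unique(as.character(bad$id)))  # 1 -- the two IDs collapse into one

# Read as character: the IDs stay distinct.
good <- read.table(tmp, sep = "\t", header = TRUE,
                   colClasses = c("character", "numeric"))
good$id  # "123456789012345678" "123456789012345679"
```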
