I have a .csv file which I have read into R as a dataframe (say df).
The first column is date in mm/dd/yyyy format. The second column is a double number. What I want to do is to create a new dataframe like:
df2<-data.frame(date=c(df[10,1],df[15,2]),num=c(111,222))
When I try to do this I get very messy df2. Most probably I am doing it wrong because I do not understand the data frame concept.
Whenever I try to do df[10,1], the output is the 10th row and 1st column of df, including all the levels of column 1.
You can control how R will interpret the classes of data being read in by specifying a vector of column classes as an argument to read.table with colClasses. Otherwise R will use type.convert which will convert a character vector in a "logical" fashion, according to R's definition of logical. That obviously has some potential quirks to it if you aren't familiar with them.
You can also prevent R from creating factors by specifying stringsAsFactors = FALSE as an argument to read.table. This is generally easier than specifying all of the colClasses.
You can parse the date with strptime(). Taking all of this into consideration, I would recommend reading your data into R without turning character data into factors and then using strptime to convert the date column.
df <- read.csv("myFile.csv", stringsAsFactors = FALSE)
#Convert time to proper time format
df$time <- strptime(df$time, "%m/%d/%Y")
If you don't want to type out stringsAsFactors = FALSE each time you read in or construct a data frame, you can specify this at the outset:
options(stringsAsFactors=FALSE)
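As a minimal sketch of the whole workflow, the example below reads a small CSV, keeps the date column as character, converts it, and then builds the new data frame from individual cells. The CSV content and the column names `date` and `value` are made up for illustration, and as.Date is swapped in for strptime here because it returns a plain Date vector that is easy to combine with c().

```r
# Hypothetical two-column CSV, inlined so the example is self-contained
csv_text <- "date,value
01/15/2020,1.5
02/20/2020,2.5
03/25/2020,3.5"

# Keep text columns as character instead of factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the mm/dd/yyyy strings into Date objects
df$date <- as.Date(df$date, format = "%m/%d/%Y")

# Pick individual cells; no factor levels come along any more
df2 <- data.frame(date = c(df[1, 1], df[3, 1]), num = c(111, 222))
str(df2)
```

Because the column is no longer a factor, df[1, 1] returns a single Date rather than a value plus all the levels.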
I'm a rookie with R. I have read a data frame into R with the read.csv2 call (after converting the Excel file to CSV).
I changed every date in the table to a Y-M-D format and wanted to use:
lapply(df$dates, as.Date, Format = "%Y/%m/%d")
but it produces NAs for every date.
When I ask for the mode, it says the dates are "numeric".
I tried converting to character before converting to dates with:
lapply(df$dates, as.character)
I don't know why it produces the NAs. Can someone help?
If you want to avoid the pain of finding the right format, the dataPreparation package provides a function that does this easily.
require(dataPreparation)
df <- setColAsDate(df, cols = "dates")
It will try to guess the format from among thousands of candidate formats.
(NB: Please note that I'm the developer of this package.)
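For comparison, here is a base R sketch that needs no extra package. It assumes the column was read in as a factor, which is one way mode() can report "numeric" for something that looks like text; the sample values are hypothetical. Note also that R argument names are case-sensitive, so `Format =` is not matched to as.Date's `format` argument.

```r
# Hypothetical data: dates stored as a factor
df <- data.frame(dates = factor(c("2015/03/01", "2015/04/15")))
mode(df$dates)   # "numeric", because a factor is stored as integer codes

# Convert the whole column at once: factor -> character -> Date
# (lowercase 'format', and no lapply needed since as.Date is vectorized)
df$dates <- as.Date(as.character(df$dates), format = "%Y/%m/%d")
class(df$dates)
```

Converting the whole column in one call also keeps the result as a Date vector, whereas lapply returns a list with one element per row.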
I have run into an issue I do not understand, and I have not been able to find an answer to this issue on this website (I keep running into answers about how to convert dates to numeric or vice versa, but that is exactly what I do not want to know).
The issue is that R converts values that are formatted as a date (for instance "20-09-1992") to numeric values when you assign them to a matrix or data frame.
For example, we have "20-09-1992" stored as a date, which we have checked using class().
as.Date("20-09-1992", format = "%d-%m-%Y")
class(as.Date("20-09-1992", format = "%d-%m-%Y"))
We now assign this value to a matrix, imaginatively called Matrix:
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.Date("20-09-1992", format = "%d-%m-%Y")
Matrix[1,1]
class(Matrix[1,1])
Suddenly the previously date formatted "20-09-1992" has become a numeric with the value 8298. I don't want a numeric with the value 8298, I want a date that looks like "20-09-1992" in date format.
So I was wondering whether this is simply how R works, and we are not allowed to assign dates to matrices and data frames (somehow I have managed to have dates in other matrices/data frames, but it beats me why those other times were different)? Is there a special method to assigning dates to data frames and matrices that I have missed and have failed to deduce from previous (somehow successful) attempts at assigning dates to data frames/matrices?
I don't think you can store dates in a matrix. Use a data frame or data table. If you must store dates in a matrix, you can use a matrix of lists.
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.list(as.Date("20-09-1992", format = "%d-%m-%Y"))
Matrix
[[1]]
[1] "1992-09-20"
Edited: I also just re-read that you had this issue with a data frame. I'm not sure why, since this works:
mydate<-as.Date("20-09-1992", format = "%d-%m-%Y")
mydf<-data.frame(mydate)
mydf
      mydate
1 1992-09-20
Edited: This has been a learning experience for me with R and dates. Apparently the date you supplied was converted to the number of days since the origin, which is January 1st, 1970. To convert it back to a date at some point, pass that origin to as.Date:
Matrix
     [,1]
[1,] 8298
as.Date(Matrix, origin ="1970-01-01")
[1] "1992-09-20"
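The behaviour described above can be reproduced in a few lines. This is only a sketch, with variable names (`d`, `m`, `df`, `when`) chosen for illustration: a Date is a number of days with a class attribute, assigning it into a matrix cell keeps only the number, and a data frame column keeps the class.

```r
d <- as.Date("20-09-1992", format = "%d-%m-%Y")
unclass(d)          # 8298: days since 1970-01-01

# Assigning into a matrix cell keeps the number but drops the Date class
m <- matrix(NA, 1, 1)
m[1, 1] <- d
class(m[1, 1])      # "numeric"

# The number can be turned back into a Date by supplying the origin
restored <- as.Date(m[1, 1], origin = "1970-01-01")

# A data frame column keeps its class, also when single cells are assigned
df <- data.frame(when = d)
df[1, "when"] <- as.Date("1992-09-21")
class(df$when)      # "Date"
```

If a date column in a data frame turns numeric, a likely cause is that the column was created as numeric (or NA) first and dates were assigned into it afterwards.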
Try the following: first specify your date vector, then use
rownames(mat) <- as.character(date_vector)
The dates will then appear as text.
This mostly happens when loading an Excel workbook.
You need to add detectDates = TRUE to the function call:
DataFrame <- read.xlsx("File_Name", sheet = 3, detectDates = TRUE)
I generally like R, but the type conversion issues are driving me crazy.
Following issue:
I read a data frame from a database connection. The result is a data frame with character columns.
I know that the first column is a date format - all the others are numeric. However, no matter how I tried to convert the character columns of the data frame into the correct types, it didn't work out.
Upon conversion of the data frame into a matrix and then back into a data frame, all columns became type factor, and casting factors into numerics created wrong results because the indices of the factor levels were converted instead of the real values.
Moreover, if the table is big in size - I do not want to convert each column manually. Isn't there a way to get this done automatically?
We can use type.convert by looping over the columns of the dataset with lapply. Convert each column to character and apply type.convert. A column that is still character after conversion is turned into a factor, which we can then convert to Date class (this works here because only one column holds character data). The date format is not known, so if it is not a standard one, specify the format argument in as.Date.
df1[] <- lapply(df1, function(x) {
  x1 <- type.convert(as.character(x), as.is = FALSE)
  if (is.factor(x1)) as.Date(x1) else x1
})
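To see the idea end to end, here is a small self-contained variant. The data frame and its column names are hypothetical, it assumes the date column is the only one that stays non-numeric, and it assumes the dates are in the standard yyyy-mm-dd form. It uses as.is = TRUE so that text stays character instead of taking a detour through factor.

```r
# Hypothetical result of a database query: every column arrives as character
df1 <- data.frame(day    = c("2020-01-01", "2020-01-02"),
                  price  = c("1.5", "2.25"),
                  volume = c("10", "20"),
                  stringsAsFactors = FALSE)

df1[] <- lapply(df1, function(x) {
  # Numbers become numeric or integer; anything else stays character
  x1 <- type.convert(x, as.is = TRUE)
  # Assumption: whatever is still character is a date column
  if (is.character(x1)) as.Date(x1) else x1
})

sapply(df1, class)
```

Assigning into df1[] keeps the object a data frame, so no round trip through a matrix is needed.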
Here is the existing data:
I have 2 columns of data. Each row of the first column has data, whereas only certain rows of the second column have data (the others are blank). I want to convert the format of the data with the help of as.POSIXct(). For the first column I used the following code (I named the data frame 'mrkt'):
mrkt[1]<-lapply(mrkt[1],as.POSIXct)
This worked well in terms of converting the existing data to the right format.
For the second column the above code won't work, as as.POSIXct() cannot handle "" values. So I wrote a loop instead:
for (i in 1:dim(mrkt[2])[1]){
  if (!as.character(mrkt[[2]][i])==""){
    mrkt$open_time[i]<-as.POSIXct(mrkt$open_time[i])
  }
}
However this is giving me weird outputs in the form of a number. How can I avoid that? Here is the output:
An easy way to do this would be:
library(plyr)
library(dplyr)
mrkt %>%
  mutate(send_time = send_time %>%
           as.POSIXct,
         open_time = open_time %>%
           mapvalues("", NA) %>%
           as.POSIXct)
This is due to implicit typecasting from POSIXct to numeric. It only happens in the loop because the vector already has a type, and single values assigned into it are cast to that type. When the whole vector is replaced, a new vector is created with the right type.
The simplest solution is to use as.POSIXct(strptime(mrkt$open_time, format=yourformat)), with a correctly defined format; see ?strptime for the format codes. This is vectorized, and strptime handles empty strings correctly (returning NA).
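A short sketch of that vectorized approach follows. The sample timestamps, the format string and the UTC time zone are assumptions made for illustration; adjust them to the actual data.

```r
# Hypothetical column with a blank entry in the middle
open_time <- c("2016-03-01 10:15:00", "", "2016-03-02 08:30:00")

# Parse the whole vector in one call; the empty string becomes NA
parsed <- as.POSIXct(strptime(open_time,
                              format = "%Y-%m-%d %H:%M:%S",
                              tz = "UTC"))

is.na(parsed)
class(parsed)
```

The result can be assigned back in one step, for example mrkt$open_time <- parsed, which replaces the whole column and so avoids the element-wise coercion seen in the loop.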
I have two columns I am trying to subtract and put into a new one, but one of them contains values that read '#NULL!' after converting over from SPSS and Excel, so R reads it as a factor and will not let me subtract. What is the easiest way to fix it, knowing I have 19,000+ rows of data?
While reading the dataset using read.table/read.csv, we can specify the na.strings argument for those values that need to be transformed to 'NA' or missing values. With read.table, also set comment.char = "", because its default comment character is "#" and '#NULL!' would otherwise be read as the start of a comment (read.csv already uses comment.char = ""). So, for your dataset it would be
dat <- read.table('yourfile.txt', na.strings=c("#NULL!", "-99999", "-88888"),
                  header=TRUE, stringsAsFactors=FALSE, comment.char="")
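As an illustration of the effect, the sketch below reads a tiny inline CSV. The data and the column names `a`, `b` and `diff` are hypothetical. It uses read.csv, whose default comment.char is "", so the leading "#" in "#NULL!" is not treated as a comment.

```r
# Hypothetical export where missing values appear as "#NULL!"
txt <- "a,b
10,4
#NULL!,5
30,#NULL!"

# Treat "#NULL!" as missing so both columns are read as numbers
dat <- read.csv(text = txt, na.strings = "#NULL!", stringsAsFactors = FALSE)

# Subtraction now works; rows with a missing value give NA
dat$diff <- dat$a - dat$b
dat
```

Because the conversion happens while reading, it does not matter whether the file has three rows or 19,000.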