Hello everyone!
As part of my clinical study I created an xlsx spreadsheet containing a data set. Only columns 2 to 12 and rows 1 to 307 are useful to me. I now work with the spreadsheet in R, after importing it (read_excel, etc.).
In my columns 11 and 12 ('data' and 'raw_data'), some cells correspond to dates (for example, the first 2 rows of 'data' and 'raw_data'); these are the patient's visit dates. However, as you can see, these dates are stored as the number of days since the origin "1899-12-30", and I would like to convert them into a standard date format (e.g. 2019-07-05).
My problem is that these columns do not contain only dates; they also contain other numerical results (times, means, scores, etc.).
I started by converting the class of my columns from character to factor/numeric so that I could manipulate them more easily later. However, I cannot find a way to change the format of only the cells that correspond to a date.
Is it possible to transform only the cells concerned, and if so, how?
I attach my code and a preview of my data frame.
In the "Unsuccessful trial" part, I tried something like this. The date does change format there, but as soon as I try to apply this change to the data frame, it doesn't work.
Thank you for your help!
library(readxl)  # read_excel() and cell ranges
library(dplyr)   # %>%, mutate_at(), vars(), as_tibble()
# Indicate the id of the patient
id <- "01_AA"
# Get protocol data of patient
idlst <- dir("/data/protocolData", full.names = TRUE, pattern = id)
# Convert the xlsx database into a data frame.
# readxl gives `range` precedence over `n_max`, so the row limit goes in the range:
# columns B to M, rows 1 to 307 (just keep the table)
idData <- data.table::rbindlist(lapply(
  idlst,
  read_excel,
  range = "B1:M307"
), fill = TRUE)
idData <- as_tibble(idData)
idData <- idData %>%
  mutate_at(vars(1:10), as.factor) %>%
  mutate_at(vars(11:length(idData)), as.numeric)
# Unsuccessful trial
as.Date.character(data[1:2,11:12], origin ='1899-12-30')
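Thank you for your comments; indeed, this is one of the constraints of R: all the cells of a column share a single type, so a column cannot hold both Date values and plain numbers.
I solved my problem with the following code, where idData is my data frame: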
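# Change the date cells of the columns Data and Raw_data into "YYYY-MM-DD" text
# (the whole column becomes character). Rows whose Measure mentions "date" hold
# Excel serial numbers, i.e. days since 1899-12-30.
is_date <- grepl("date", idData$Measure)
idData$Data[is_date] <- as.character(as.Date(
  as.numeric(idData$Data[is_date]), origin = "1899-12-30"))
idData$Raw_data[is_date] <- as.character(as.Date(
  as.numeric(idData$Raw_data[is_date]), origin = "1899-12-30"))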
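A minimal check of the same idea on a small toy data frame (the Measure labels and values below are made up; 43651 is the Excel serial number for 2019-07-05). After the assignment the whole Data column is character, because a column can hold only one type:

```r
# Toy data frame; Measure flags the rows whose Data value is an Excel date
idData <- data.frame(
  Measure = c("visit date", "visit date", "mean time", "score"),
  Data = c(43651, 43652, 12.5, 3),
  stringsAsFactors = FALSE
)
# Logical index of the date rows
is_date <- grepl("date", idData$Measure)
# Excel (Windows) serial numbers count days from 1899-12-30
idData$Data[is_date] <- as.character(as.Date(idData$Data[is_date], origin = "1899-12-30"))
# The other rows keep their values, now as text
print(idData$Data)
```

If you later need numbers back for the non-date rows, as.numeric(idData$Data[!is_date]) recovers them.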
I am working from a NetCDF (.nc) file. After extracting the data to a matrix, time is the column variable, but the columns are simply numbered 1:2087 across the time range of the dataset. I would like to rename them to the dates they represent (from 1981/12/31 to 2021/12/31, where each column is one week). I tried to change the names using
colnames(tmp_mat) <- rep(seq(as.Date('1982-01-05'), as.Date('2021-12-28'), by = 'weeks'))
This changed the column names, but to numbers (the number of days since 1970-01-01 for each date).
Does anyone have any suggestions on how to make this work?
Your data is a matrix; you have to convert it to a data.frame and then apply your code:
tmp_mat = data.frame(tmp_mat)
colnames(tmp_mat) <- seq(as.Date('1982-01-05'), as.Date('2021-12-28'), by = 'weeks')
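Alternatively, tmp_mat can stay a matrix if the dates are converted to text with as.character() before being used as column names: colnames<- on a matrix turns a Date vector into its underlying day counts, which is why numbers appeared. A sketch with a placeholder matrix of zeros standing in for the real NetCDF data (an assumption; only the column count matters here):

```r
# Weekly dates covering the 2087 time steps
weeks <- seq(as.Date("1982-01-05"), as.Date("2021-12-28"), by = "weeks")
# Placeholder matrix (2 rows, one column per week) standing in for the extracted data
tmp_mat <- matrix(0, nrow = 2, ncol = length(weeks))
# Convert to character first so the names keep the "YYYY-MM-DD" form
colnames(tmp_mat) <- as.character(weeks)
head(colnames(tmp_mat), 3)
```

This keeps matrix arithmetic available, whereas a data.frame is more convenient if you plan to use dplyr-style tools.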
I am a newbie to Stack Overflow, stats and R, so apologies for the simple nature of my question/request for advice:
I am completing an analysis of a large data set comprising 2 files: a .txt file containing internal temperature data and an SPSS data file.
To kick off, I exported the SPSS data to CSV format and stripped it back to just the few columns I think I need: house type and occupant type. I imported all the temperature data and merged the two using a common identifier.
So now I have a merged data frame containing all the data I need (to begin with) to start the analysis.
First question: I have year, date and time as separate columns. However, the time column was imported with an incorrect date (30/12/1899) in front of every time. How can I delete the date part of all observations in this column but keep the time?
Second question: Similarly, the date column shows the correct date, but it is followed by a time that is not correct (every observation shows 00:00:00). How can I delete all the times from this column?
Third question: How can I combine the correct time with the correct date, to end up with DD/MM/YYYY HH:MM:SS?
Fourth question: Should I create subsets of the merged data to facilitate the analysis, i.e. each house type (as separate subsets) vs. temperature, time and occupant type?
Dates can be read in as they are (as text) instead of as factors via the parameter as.is = TRUE, i.e.:
data <- read.csv(choose.files(), as.is = TRUE)  # choose.files() is Windows-only; file.choose() works on all platforms
I would try reading the CSV file again and then working with the date-time. It will come in as text, and you'll need to convert it to POSIXct (well, I do anyway). To view the help for a function, type a question mark followed by the function name, i.e. ?as.POSIXct.
# Date.Time as read in on my system (character text): "2018/08/04 10:10:00", ...
# i.e. the current format is '%Y/%m/%d %H:%M:%S'.
# as.POSIXct() parses text, so `format` must describe the INPUT, not the output you want.
# tz = '' means the current (local) time zone; see ?as.POSIXct for details.
# On the left side of the assignment <- I am creating a new column, Date.
# You could overwrite the old column, Date.Time, but it can't hurt to learn how to
# delete a column.
data$Date <- as.POSIXct(data$Date.Time, tz = '', format = '%Y/%m/%d %H:%M:%S')
# The display format you want is '%d/%m/%Y %H:%M:%S'; format() returns it as text:
# format(data$Date, '%d/%m/%Y %H:%M:%S')
# Now remove the original column. -Date.Time takes out Date.Time; if you leave the
# minus out, the result will contain only Date.Time and no other columns.
data <- subset(data, select = -Date.Time)
Try this first, and I will look into removing the time within a date field. I have an idea, but I'd rather see whether this helps with the problem first.
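For the first three questions, here is a minimal sketch, assuming the merged data were read with as.is = TRUE so that the columns are text, and that they are called Date and Time (hypothetical names; adjust them to your data). The tz = "UTC" choice is also an assumption, made to sidestep daylight-saving gaps:

```r
# Toy columns as they might look after import (character text)
merged <- data.frame(
  Date = c("04/08/2018 00:00:00", "05/08/2018 00:00:00"),
  Time = c("30/12/1899 10:10:00", "30/12/1899 23:45:30"),
  stringsAsFactors = FALSE
)
# Question 1: drop the bogus 30/12/1899 date, keep the time (text after the first space)
merged$Time <- sub("^[^ ]+ +", "", merged$Time)
# Question 2: drop the 00:00:00 time, keep the date (text before the first space)
merged$Date <- sub(" .*$", "", merged$Date)
# Question 3: combine both parts and parse them as one POSIXct date-time
merged$DateTime <- as.POSIXct(paste(merged$Date, merged$Time),
                              format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
# Display as DD/MM/YYYY HH:MM:SS (format() returns text)
format(merged$DateTime, "%d/%m/%Y %H:%M:%S")
```

Keeping DateTime as POSIXct (and only formatting it for display) lets you sort, subset by time ranges and plot it directly.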
If you do want to merge the Year, Month and Day columns, you could try something like this; it seems like a logical thing to do. You can always keep the original columns and delete them later; they aren't hurting anything.
data$YMD <- paste(data$Year, data$Month, data$Day, sep = "-")  # e.g. "2018-8-4"
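If you want an actual Date rather than a piece of text, the pasted string can be parsed with as.Date(). A small sketch with made-up Year, Month and Day values:

```r
# Made-up example columns
data <- data.frame(Year = c(2018, 2018), Month = c(8, 12), Day = c(4, 31))
# Build "YYYY-M-D" strings, then parse them; %m and %d accept one or two digits
data$YMD <- as.Date(paste(data$Year, data$Month, data$Day, sep = "-"), format = "%Y-%m-%d")
data$YMD
```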
Also, while you are at it, install a package called dplyr, written by Hadley Wickham, the same person who wrote ggplot2:
install.packages("dplyr")
# Then add it to the top of your file, like ggplot2.
library(dplyr)
This question is quite difficult to describe, but easy to understand when visualized. I would therefore suggest looking at the two images that I linked to this post to help facilitate understanding the issue.
Here is a link to my practice data frame:
sample.data <-read.table("https://pastebin.com/uAQD6nnM", header=T, sep="\t")
I don't know why I get the error "more columns than column names": using the same file from my desktop works just fine, and clicking on the link does take me to my dataset.
I received very large data frames that are arranged in rows, and I want the data in columns. However, it is not that 'easy', because I do not necessarily want (or need) to transpose all of the data.
This link appears to be close to what I would like to do, but it is not quite the right answer for me: "Python Pandas: Transpose or Stack?"
I have a header with GPS data (Coords_Y, Coords_X), followed by a list of 100+ plant species names. If a species is present at a certain location, the author used the term TRUE, and if not present, they used the term FALSE.
I would like to take this data set and create a new column called "species" that stacks the species listed in rows on top of each other, keeping only the entries set to TRUE. Therefore, as my images show, if 2 plants are both present at the same location, the GPS points need to be duplicated so that no data point is lost; at the same time, if a species is present at many locations, its name needs to be repeated multiple times in the column. In the end, I will have a data set that is thousands of rows long but has only 5 columns in the header row.
[Images: "Before" and "After" views of the data]
Here is a way to do it using base R:
# Notice that the link works if you include the /raw/ part
sample.data <-read.table("https://pastebin.com/raw/uAQD6nnM", header=T, sep="\t")
vars <- c("var0", "Var.1", "Coords_y", "Coords_x")
# Just selects the ones marked TRUE for each
alf <- sample.data[ sample.data$Alfaroa.williamsii, vars ]
aln <- sample.data[ sample.data$Alnus.acuminata, vars ]
alf$species <- "Alfaroa.williamsii"
aln$species <- "Alnus.acuminata"
final <- rbind(alf,aln)
final
var0 Var.1 Coords_y Coords_x species
192 191 7.10000 -73.00000 Alfaroa.williamsii
101 100 -13.18000 -71.59000 Alfaroa.williamsii
36 35 10.18234 -84.10683 Alnus.acuminata
38 37 10.26787 -84.05528 Alnus.acuminata
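The same base R idea can be generalised to any number of species columns with lapply(). This sketch assumes that every column not listed in vars is a TRUE/FALSE species column (an assumption about the pastebin file); the toy data reuse the values printed above:

```r
vars <- c("var0", "Var.1", "Coords_y", "Coords_x")
# Toy data mirroring the output above (assumed layout: vars + logical species columns)
sample.data <- data.frame(
  var0 = c(192, 36, 38, 101), Var.1 = c(191, 35, 37, 100),
  Coords_y = c(7.1, 10.18234, 10.26787, -13.18),
  Coords_x = c(-73, -84.10683, -84.05528, -71.59),
  Alfaroa.williamsii = c(TRUE, FALSE, FALSE, TRUE),
  Alnus.acuminata = c(FALSE, TRUE, TRUE, FALSE)
)
species_cols <- setdiff(names(sample.data), vars)
final <- do.call(rbind, lapply(species_cols, function(sp) {
  # which() skips FALSE and NA, keeping only rows where the species is present
  rows <- sample.data[which(sample.data[[sp]]), vars]
  # rep() also works when a species has no TRUE rows (zero-row data frame)
  rows$species <- rep(sp, nrow(rows))
  rows
}))
final
```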
To do it more generally, using dplyr and tidyr, you can use the gather function:
library(dplyr)
library(tidyr)
tidyr::gather(sample.data, key = "species", value = "keep", 5:6) %>%
dplyr::filter(keep) %>%
dplyr::select(-keep)
Just replace 5:6 with the indices of your species columns.
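In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same reshaping with it, selecting the species columns as everything except the four location columns, so you do not need to count indices for 100+ species. The toy data again mirror the output shown above and assume that layout:

```r
library(dplyr)
library(tidyr)
# Toy data (assumed layout: four location columns + logical species columns)
sample.data <- data.frame(
  var0 = c(192, 36, 38, 101), Var.1 = c(191, 35, 37, 100),
  Coords_y = c(7.1, 10.18234, 10.26787, -13.18),
  Coords_x = c(-73, -84.10683, -84.05528, -71.59),
  Alfaroa.williamsii = c(TRUE, FALSE, FALSE, TRUE),
  Alnus.acuminata = c(FALSE, TRUE, TRUE, FALSE)
)
long <- sample.data %>%
  pivot_longer(cols = -c(var0, Var.1, Coords_y, Coords_x),
               names_to = "species", values_to = "present") %>%
  filter(present) %>%   # keep only the species present at each location
  select(-present)
long
```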
I could not download the data so I made some:
sample.data=data.frame(var0=c(192,36,38,101),var1=c(191,35,37,100),y=c(7.1,10.1,10.2,-13.8),x=c(-73,-84,-84,-71),
Alfaroa=c(T,F,F,T),Alnus=c(T,T,T,F))
The code that gives the requested result is:
library(dplyr)
# For each species: keep the rows where it is present, drop the TRUE/FALSE columns, add a Species label
dfAlfaroa <- sample.data %>% filter(Alfaroa) %>% select(-Alfaroa, -Alnus) %>% mutate(Species = "Alfaroa")
dfAlnus <- sample.data %>% filter(Alnus) %>% select(-Alfaroa, -Alnus) %>% mutate(Species = "Alnus")
rbind(dfAlfaroa, dfAlnus)
I have run into an issue I do not understand, and I have not been able to find an answer to this issue on this website (I keep running into answers about how to convert dates to numeric or vice versa, but that is exactly what I do not want to know).
The issue is that R converts values that are formatted as a date (for instance "20-09-1992") to numeric values when you assign them to a matrix or data frame.
For example, we create "20-09-1992" as a date and check this using class():
as.Date("20-09-1992", format = "%d-%m-%Y")
class(as.Date("20-09-1992", format = "%d-%m-%Y"))
We now assign this value to a matrix, imaginatively called Matrix:
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.Date("20-09-1992", format = "%d-%m-%Y")
Matrix[1,1]
class(Matrix[1,1])
Suddenly the previously date formatted "20-09-1992" has become a numeric with the value 8298. I don't want a numeric with the value 8298, I want a date that looks like "20-09-1992" in date format.
So I was wondering whether this is simply how R works, and we are not allowed to assign dates to matrices and data frames (somehow I have managed to have dates in other matrices/data frames, but it beats me why those other times were different)? Is there a special method for assigning dates to data frames and matrices that I have missed and have failed to deduce from previous (somehow successful) attempts?
A plain matrix cannot store Date objects: it is a single atomic vector with dimensions, so assigning a Date into it keeps only the underlying number and drops the Date class. Use a data frame or a data.table instead. If you must store dates in a matrix, you can use a matrix of lists.
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.list(as.Date("20-09-1992", format = "%d-%m-%Y"),1)
Matrix
[[1]]
[1] "1992-09-20"
Edit: I also just re-read that you had this issue with a data frame too. I'm not sure why; creating a data frame from a Date keeps the Date class:
mydate<-as.Date("20-09-1992", format = "%d-%m-%Y")
mydf<-data.frame(mydate)
mydf
mydate
1 1992-09-20
Edit: This has been a learning experience for me with R and dates. The date you supplied was stored as the number of days since the origin, which R defines as 1970-01-01. To convert it back to a date format at some point:
Matrix
[,1]
[1,] 8298
as.Date(Matrix, origin ="1970-01-01")
[1] "1992-09-20"
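A short sketch that shows both behaviours side by side: a plain (atomic) matrix keeps only the number of days, while a data frame column created as Date keeps its class, including when a value is assigned into it later. The names m and holder are hypothetical:

```r
d <- as.Date("20-09-1992", format = "%d-%m-%Y")
# Atomic matrix: the Date class is dropped, only the day count survives
m <- matrix(NA, 1, 1)
m[1, 1] <- d
m[1, 1]
# Data frame: pre-allocate a Date column, then assign into it
holder <- data.frame(when = as.Date(NA))
holder$when[1] <- d
holder
# Recover a Date from the matrix value
as.Date(m[1, 1], origin = "1970-01-01")
```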
Try the following: first create your date vector, then use
rownames(mat) <- as.character(date_vector)
The dates will appear as text.
This happens mostly when loading an Excel workbook.
If you read it with read.xlsx() from the openxlsx package, add detectDates = TRUE to the call:
DataFrame <- openxlsx::read.xlsx("File_Name", sheet = 3, detectDates = TRUE)
I'm new to R (having worked in C++ and Python before), so this is probably just a matter of me not knowing some of R's nuances.
The program I'm working on is supposed to construct matrices of data by date. Here's how I might initialize such a matrix:
dates <- seq(as.Date("1980-01-01"), as.Date("2013-12-31"), by="days")
HN3 <- matrix(nrow=length(dates), ncol = 5, dimnames = list(as.character(dates), c("Value1", "Value2", "Value3", "Value4", "Value5")))
Notice that dates includes every day between 1980 and 2013.
So, from there, I have files containing certain dates and measurements of Value1, etc. for those dates, and I need to read those files' contents into HN3. The problem is that most of the files don't contain measurements for every day.
So what I want to do is read a file into a dataframe (say, v1read) with column 1 being dates and column 2 being the desired data. Then I'd match the dates of v1read to that date's row in HN3 and copy all of the relevant v1read values that way. Here is my attempt at doing so:
for (i in 1:nrow(v1read)) {
HN3[as.character(v1read[i,1]),Value1] <- v1read[i,4]
}
This gives me an index-out-of-range error partway through the loop, when i reaches an unexpected value. I understand that R doesn't like iterating over dates, but since the iterator itself is numeric rather than a date, I was hoping I'd found a loophole.
Any tips on how to accomplish this would be enormously appreciated.
Let's use library(dplyr). Start with
dates = seq(as.Date("1980-01-01"), as.Date("2013-12-31"), by="days")
HN3 = data.frame(Date=dates)
Now, load in your first file, the one that has a date and Value1.
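file1 = read.csv(value1.file)  # assumes a CSV with columns already named "Date" and "Value1"; value1.file is your file path
file1$Date = as.Date(file1$Date)  # the join key must be of class Date in both data frames (assumes YYYY-MM-DD dates)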
HN3 = left_join(HN3,file1,by="Date")
This does an SQL-style left join: every date in HN3 is kept, Value1 is filled in where file1 has a matching date, and the remaining rows get NA. Now you have a data frame with two columns, Date and Value1. Load your other files, do a left_join with each, and you'll be done.
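If you would rather keep HN3 as the matrix from the question, it can also be filled without a loop by indexing rows with the dates converted to text, since its row names are as.character(dates). Note that the column name has to be quoted ("Value1"). A sketch, where the small v1read data frame (a Date column and a Value1 column) is made up for illustration:

```r
dates <- seq(as.Date("1980-01-01"), as.Date("2013-12-31"), by = "days")
HN3 <- matrix(NA_real_, nrow = length(dates), ncol = 5,
              dimnames = list(as.character(dates), paste0("Value", 1:5)))
# Hypothetical file contents: only some days have measurements
v1read <- data.frame(Date = as.Date(c("1980-01-03", "1995-06-15")), Value1 = c(1.5, 2.5))
# Keep only dates that exist as row names (unknown names would be out of bounds)
ok <- as.character(v1read$Date) %in% rownames(HN3)
# Assign all matching rows in one vectorised step
HN3[as.character(v1read$Date[ok]), "Value1"] <- v1read$Value1[ok]
HN3[c("1980-01-03", "1995-06-15"), "Value1"]
```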