How to convert data between long and wide format in a datatable - r

I want to take the values of the KPI column and use them as the column headers of a new data table. In this app, the admin creates the KPIs and the user populates the values in a second data table.
So I want to have a table like the one below:
server.r
GetTableMetadata <- function() {
  fields <- c(id = "Id",
              name = "Name",
              used_shiny = "Used Shiny",
              r_num_years = "R Years")
  result <- list(fields = fields)
  return (result)
}
# display table in wide format
output$viewresponses <- DT::renderDataTable({
  viewDF <- (as.data.frame(responses))
  viewDF %>% spread(GetTableMetadata()$fields$name, GetTableMetadata()$fields$used_shiny)[-1]
})
The error I get is:
Error : Invalid column specification

I used the tidyr package (library(tidyr)) to convert the long format to wide format:
KPI <- c('cost','time','quality','time','time')
measurements <- c(1, 2, 3,2,1)
kpi.data <- data.frame(KPI, measurements)
kpi.data looks as follows (long format):
      KPI measurements
1    cost            1
2    time            2
3 quality            3
4    time            2
5    time            1
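The spread function from tidyr converts the long format to wide format. Because "time" occurs more than once, each row first needs an identifier that is unique within its KPI; otherwise spread stops with an error about duplicate row identifiers.
kpi.data$row <- ave(seq_along(kpi.data$KPI), kpi.data$KPI, FUN = seq_along)
kpi.data %>% spread(KPI, measurements)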

Related

How to use the age_calc function with a "date of birth" column?

How can you use the age_calc function in RStudio to calculate the age of employees (in years), given a large data set (more than 19,000 entries) where "Date of Birth" is a column name, and then add the ages to the data set as a new column?
1. Create reproducible example data
data <- data.frame(ID = 1:5,
                   Date_of_Birth = c("1999-03-05", "1999-05-19", "1999-07-01", "1999-02-27", "1999-06-11"))
2. Convert the date string column to actual dates using as.Date
data$Date_of_Birth <- as.Date(data$Date_of_Birth, "%Y-%m-%d")
3. Define a calc_age_days function that takes dates as its argument and returns the number of days since each date
calc_age_days <- function(date_value) {
  return(Sys.Date() - date_value)
}
4. Use the calc_age_days function to calculate the Age column
data$Age <- calc_age_days(date_value = data$Date_of_Birth)

How to assign columns based on character position?

My data file doesn't have any column labels, and row 1 looks like this:
AB365960091120112011311260000005311300000001ES020000040036ES1400N
I know that characters 1 to 8 refer to the ID, characters 9 to 15 to the year of birth, characters 16 to 28 to the year of death, and so on. How can I create a table by splitting each row according to character position? For example, how do I indicate in R that ID = characters 1 to 8?
I want my table to look like this:
ID       birth date  death date
AB36596  9112011     201131126
You can use read_fwf from the readr package.
library(readr)
library(dplyr)
df <- read_fwf(file = "test.txt", fwf_widths(c(9, 7, 9))) %>%
  `colnames<-`(c("id", "birth date", "death date"))
df
Output is:
  id        `birth date` `death date`
1 AB3659600      9112011    201131126
Sample data:
test.txt contains:
AB365960091120112011311260000005311300000001ES020000040036ES1400N
Here is a solution based on your example:
Input data:
x<-"AB365960091120112011311260000005311300000001ES020000040036ES1400N"
Split the string into the individual variables and put them in a data.frame:
df <- data.frame(ID = substr(x, 1, 7),
                 birth_date = substr(x, 10, 16),
                 death_date = substr(x, 17, 25))
Your desired output:
df
       ID birth_date death_date
1 AB36596    9112011  201131126
Using the same approach with the substr function, you can extract the remaining fields.

Rearranging rows of data with same value

I have the following data:
Data <- data.frame(Project=c(123,123,123,123,123,123,124,124,124,124,124,124),
Date=c("12/27/2016 15:16","12/27/2016 15:20","12/27/2016 15:24","12/27/2016 15:28","12/27/2016 15:28","12/27/2016 15:42","12/28/2016 7:22","12/28/2016 7:26","12/28/2016 7:35","12/28/2016 11:02","12/28/2016 11:02","12/28/2016 11:28"),
OldValue=c("","Open","In Progress","Open","System Declined","In Progress","System Declined","Open","In Progress","Open","Complete","In Progress"),
NewValue=c("Open","In Progress","System Declined","In Progress","Open","System Declined","Open","In Progress","Complete","In Progress","Open","Complete"))
The data is already ordered by Project, then Date.
However, if there are two rows with the same Date (such as rows 4,5 and 10,11) I want to designate the order based on OldValue. So I'd like row 5 ahead of row 4, and row 11 ahead of row 10.
How can I go about doing this?
# Assign the desired order to each OldValue; change "y" if necessary
OldValue_order = data.frame(OldValue = c("","Open","In Progress","System Declined","Complete"), y = c(0,4,2,1,3))
# lookup() from qdapTools copies the desired order into "Data"
library(qdapTools)
Data$OV_order = lookup(Data$OldValue, OldValue_order) # adds a new column to "Data"
# Arrange the data.frame in the desired order
Data = Data[with(Data, order(Project, as.POSIXct(Date, format = "%m/%d/%Y %H:%M"), OV_order)),]
# Remove the added column
Data = Data[1:4]

Time series: resample from hourly to quarter-hourly, last rows missing

Problem:
I have a data frame with 4 rows (one per hour), indexed by datetime. I want 16 (4*4) rows instead, with each value repeated three more times by forward filling.
My question: how can I tell pandas to also write the last three rows?
That's what I want:
My attempt:
import pandas as pd
# create dataframe
df = pd.DataFrame(
    {'A' : [4,5,6,7], 'B' : [10,20,30,40],'C' : [100,50,-30,-50]})
# create date
date_60min = pd.date_range(
    '1/1/2013', periods=4, freq='60min', tz='Europe/Berlin')
# add date
df['Date'] = date_60min
# set date to index
df_date = df.set_index('Date')
# show df_date
df_date
# Variation 1 with resample
df_resample15min = df_date.resample(
    '15Min',fill_method='ffill', label='left', closed='right')
df_resample15min
# Variation 2 with asfreq
df_asfreq15min = df_date.asfreq('15Min', method='pad')
df_asfreq15min
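The rows after the last hourly timestamp are missing because asfreq and resample only generate timestamps between the first and the last existing index entry, so nothing later than 03:00 is created. One possible way around this, sketched under the assumption that the last hour should be filled up to 03:45, is to build the 15-minute index explicitly and reindex onto it with forward fill. The names full_index and df_15min and the 45-minute extension are illustrative choices, not part of the original post.

```python
import pandas as pd

# Same hourly data as in the question, with the dates set as the index
df = pd.DataFrame(
    {'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df.index = pd.date_range(
    '1/1/2013', periods=4, freq='60min', tz='Europe/Berlin')

# Assumption: the last hour should also be expanded, so the target index
# ends 45 minutes after the last hourly timestamp
full_index = pd.date_range(
    df.index[0], df.index[-1] + pd.Timedelta(minutes=45), freq='15min')

# Reindex onto the 15-minute grid and forward-fill each hourly value
df_15min = df.reindex(full_index, method='ffill')
print(df_15min)
```

The result has 16 rows, with every hourly value repeated for the four quarter-hour slots that start in that hour.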

How to subset a list of dataframes in R?

I have multiple datasets of physical variables that I want to process with R, and I would like to handle them as a list. Here is my code for one of my data frames:
# Table definition
df.jannuary <- read.table("C:\\...file1.csv", sep=";")
# Subset of the table containing only variables of interest
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12))
# Column names
colnames(df.jannuary_sub)<-c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct objet through the temporary column "timestamp"
df.jannuary_sub$timestamp <- as.POSIXct(paste(df.jannuary_sub$year, df.jannuary_sub$day, df.jannuary_sub$hour, df.jannuary_sub$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
df.jannuary_sub$date <- format(df.jannuary_sub$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
df.jannuary_sub <- subset(df.jannuary_sub, select=c(11, 5:9))
This code works. The problem is that I have data for every month of the year, for several years.
So I started using a list. Here is the example for the year 2011:
df.jannuary <- read.table("C:\\...\file1.dat", sep=",")
#...
df.december <- read.table("C:\\...\file12.dat", sep=",")
# Creation of a list containing the month datasets, with a subset of the tables containing only variables of interest
list.dataset_2011<-list(
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12)),
#...
df.december_sub <- subset(df.december, select=c(2:8, 11:12))
)
# Column names for all variables of the list
for (j in 1:12)
{
  colnames(list.dataset_2011[[j]])<-c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
}
# Conversion of the list into a data.frame called "list.dataset_2011"
for (i in 1:9)
{
  list.dataset_2011[[i]]<-as.data.frame(list.dataset_2011[[i]])
}
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct objet through the temporary column "timestamp"
list.dataset_2011$timestamp <- as.POSIXct(paste(list.dataset_2011$year, list.dataset_2011$day, list.dataset_2011$hour, list.dataset_2011$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
list.dataset_2011$date <- format(list.dataset_2011$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
list.dataset_2011 <- subset(list.dataset_2011, select=c(11, 5:9))
I run into a problem at the end of my code (assuming the rest works) with the subset command, which does not appear to work on an object of class "list".
