I have the following data:
Data <- data.frame(Project=c(123,123,123,123,123,123,124,124,124,124,124,124),
Date=c("12/27/2016 15:16","12/27/2016 15:20","12/27/2016 15:24","12/27/2016 15:28","12/27/2016 15:28","12/27/2016 15:42","12/28/2016 7:22","12/28/2016 7:26","12/28/2016 7:35","12/28/2016 11:02","12/28/2016 11:02","12/28/2016 11:28"),
OldValue=c("","Open","In Progress","Open","System Declined","In Progress","System Declined","Open","In Progress","Open","Complete","In Progress"),
NewValue=c("Open","In Progress","System Declined","In Progress","Open","System Declined","Open","In Progress","Complete","In Progress","Open","Complete"))
The data is already ordered by Project, then Date.
However, if there are two rows with the same Date (such as rows 4,5 and 10,11) I want to designate the order based on OldValue. So I'd like row 5 ahead of row 4, and row 11 ahead of row 10.
How can I go about doing this?
#Assign Desired order to the OldValue, CHANGE "y" IF NECESSARY
OldValue_order = data.frame(OldValue = c("","Open","In Progress","System Declined","Complete"), y = c(0,4,2,1,3))
# We'll need lookup command to copy desired order to the "Data"
library(qdapTools)
Data$OV_order = lookup(Data$OldValue, OldValue_order) # Adds new column to "Data"
# Arrange the data.frame in desired order
Data = Data[with(Data, order(Project, as.POSIXct(Date, format = "%m/%d/%Y %H:%M"), OV_order)),]
#Remove the added column
Data = Data[1:4]
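If you'd rather not pull in the qdapTools dependency, the same lookup can be done in base R with match(). A minimal sketch of the idea (the two-row data frame here is a made-up excerpt of the question's data):

```r
# Small example mirroring the question's tied-date rows
Data <- data.frame(Project = c(123, 123),
                   Date = c("12/27/2016 15:28", "12/27/2016 15:28"),
                   OldValue = c("Open", "System Declined"),
                   stringsAsFactors = FALSE)

# Map each OldValue onto its desired rank with match(), then order on it
ov_levels <- c("", "Open", "In Progress", "System Declined", "Complete")
ov_rank   <- c(0, 4, 2, 1, 3)

ov_order <- ov_rank[match(Data$OldValue, ov_levels)]
Data <- Data[order(Data$Project,
                   as.POSIXct(Data$Date, format = "%m/%d/%Y %H:%M"),
                   ov_order), ]
```

With the ranks above, the "System Declined" row sorts ahead of the tied "Open" row, and no extra column needs to be added and removed afterwards.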
I want to take the values of the kpi column and set them as the headers of a new data table. In this app, the admin creates the KPIs and the user populates the values in the second data table.
So I want to have a table such as the one below:
server.R
GetTableMetadata <- function() {
  fields <- c(id = "Id",
              name = "Name",
              used_shiny = "Used Shiny",
              r_num_years = "R Years")
  result <- list(fields = fields)
  return(result)
}
# display table in wide format
output$viewresponses <- DT::renderDataTable({
  viewDF <- as.data.frame(responses)
  viewDF %>% spread(GetTableMetadata()$fields$name, GetTableMetadata()$fields$used_shiny)[-1]
})
The error that I got is:
Error : Invalid column specification
I have used the tidyr package to convert the long format to wide format:
KPI <- c('cost','time','quality','time','time')
measurements <- c(1, 2, 3,2,1)
kpi.data <- data.frame(KPI, measurements)
The kpi.data is as following (long format):
KPI measurements
1 cost 1
2 time 2
3 quality 3
4 time 2
5 time 1
Using the spread function from tidyr, the long format will be converted to wide format:
kpi.data %>% spread(KPI,measurements)
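Note that with the sample data above, spread() will stop with a duplicate-identifiers error, because KPI == "time" occurs three times and the rows cannot be told apart. One common workaround (a sketch, assuming dplyr is also available) is to number the repeats within each KPI before spreading:

```r
library(dplyr)
library(tidyr)

KPI <- c('cost', 'time', 'quality', 'time', 'time')
measurements <- c(1, 2, 3, 2, 1)
kpi.data <- data.frame(KPI, measurements)

# row_number() within each KPI gives every row a unique identifier,
# so spread() has something to pivot on
wide <- kpi.data %>%
  group_by(KPI) %>%
  mutate(row = row_number()) %>%
  spread(KPI, measurements)
```

This yields one column per KPI (cost, quality, time) plus the row index; KPIs with fewer repeats are padded with NA.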
Problem:
I have a dataframe of 4 rows (one per hour) with values, which are datetime indexed. Now I want to have 16 (4*4) rows, with each value copied 3 more times and forward filled.
My Question: How can I tell Pandas/Python to write the last three lines?
That's what I want:
My try:
create dataframe
df = pd.DataFrame(
{'A' : [4,5,6,7], 'B' : [10,20,30,40],'C' : [100,50,-30,-50]})
create date
date_60min = pd.date_range(
'1/1/2013', periods=4, freq='60min', tz='Europe/Berlin')
add date
df['Date'] = date_60min
set date to index
df_date = df.set_index('Date')
show df_date
df_date
Variation 1 with resample
df_resample15min = df_date.resample(
    '15Min', label='left', closed='right').ffill()
df_resample15min
Variation 2 with asfreq
df_asfreq15min = df_date.asfreq('15Min', method='pad')
df_asfreq15min
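Both variations stop at the last hourly timestamp, which is why the final three 15-minute rows are missing. One way to get all 16 rows (a sketch, not the only approach) is to build the full 15-minute index yourself, extended 45 minutes past the last hourly stamp, and forward-fill with reindex:

```python
import pandas as pd

# Recreate the hourly frame from the question
df = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df['Date'] = pd.date_range('1/1/2013', periods=4, freq='60min', tz='Europe/Berlin')
df_date = df.set_index('Date')

# Full 15-minute index from the first stamp to 45 minutes after the last,
# then forward-fill so each hourly value is repeated three more times
full_index = pd.date_range(df_date.index[0],
                           df_date.index[-1] + pd.Timedelta('45min'),
                           freq='15min')
df_15min = df_date.reindex(full_index, method='ffill')
```

`df_15min` now has 16 rows, with the last hour's values (A=7, B=40, C=-50) filling the final four rows.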
I have two dataframes, one which contains a timestamp and air_temperature
air_temp time_stamp
85.1 1396335600
85.4 1396335860
And another, which contains startTime, endTime, location coordinates, and a canonical name.
startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work
For each row in the first data frame, I want to identify which time range in the second data frame it lies in, i.e. if the timestamp 1396335600 is between the startTime 1396334278 and endTime 1396374621, add the location and name values to that row in the first data frame.
The start and end time in the second data frame don't overlap, and are linearly increasing. However they are not perfectly continuous, so if the timestamp falls between two time bands, I need to mark the location as NA. If it does fit between the start and end times, I want to add the location.lat, location.lon, and name columns to the first data frame.
Appreciate your help.
Try this. Not tested.
newdata <- data2[data1$timestamp>=data2$startTime & data1$timestamp<=data2$endTime ,3:5]
data1 <- cbind(data1[data1$timestamp>=data2$startTime & data1$timestamp<=data2$endTime,],newdata)
This won't return any values if timestamp isn't between startTime and endTime, so in theory your returned dataset could be shorter than the original. Just in case I treated data1 with the same TRUE FALSE vector as data2 so they will be the same length.
Interesting problem... Turned out to be more complicated than I originally thought!!
Step1: Set up the data!
DF1 <- read.table(text="air_temp time_stamp
85.1 1396335600
85.4 1396335860",header=TRUE)
DF2 <- read.table(text="startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work",header=TRUE)
Step2: For each time_stamp in DF1 compute appropriate index in DF2:
index <- sapply(DF1$time_stamp,
                function(i) {
                  dec <- which(i >= DF2$startTime & i <= DF2$endTime)
                  ifelse(length(dec) == 0, NA, dec)
                })
index
Step3: Merge the two data frames:
DF1 <- cbind(DF1,DF2[index,3:5])
row.names(DF1) <- 1:nrow(DF1)
DF1
Hope this helps!!
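As an aside, since the question says the intervals don't overlap and increase monotonically, the same row-matching can be vectorized with base R's findInterval(), avoiding the sapply loop. A sketch, using the same DF1/DF2 setup as above:

```r
DF1 <- read.table(text = "air_temp time_stamp
85.1 1396335600
85.4 1396335860", header = TRUE)
DF2 <- read.table(text = "startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work", header = TRUE)

# findInterval() returns, for each time_stamp, the last interval whose
# startTime is <= time_stamp (0 if none); a second check against endTime
# turns timestamps falling in a gap between bands into NA
idx <- findInterval(DF1$time_stamp, DF2$startTime)
idx[idx == 0] <- NA
idx[!is.na(idx) & DF1$time_stamp > DF2$endTime[idx]] <- NA

DF1 <- cbind(DF1, DF2[idx, 3:5])
```

Indexing DF2 with an NA produces an all-NA row, which is exactly the "mark the location as NA" behavior the question asks for.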
rowidx <- sapply(dfrm1$time_stamp, function(x) which(dfrm2$startTime <= x & dfrm2$endTime >= x))
cbind(dfrm1$time_stamp, dfrm2[rowidx, c("location.lat", "location.lon", "name")])
Mine's not tested either and looks substantially similar to CCurtis's, so give him the check if it works.
I have data over 3 years that I would like to plot.
However I would like to plot each year side by side.
In order to do this, I'd like to make the date 03/17/2010 become 03/17, so that it lines up with 03/17/2011.
Any ideas how to do that in R?
Here is an image of what I'd like it to look like:
R has its own Date representation, which you should use. Once you convert the data to Date it is easy to manipulate its format using the format function.
http://www.statmethods.net/input/dates.html
as an example
> d <- as.Date( "2010-03-17" )
> d
[1] "2010-03-17"
> format( d, format="%m/%d")
[1] "03/17"
or with your date style
> format( as.Date("03/17/2010", "%m/%d/%Y"), format="%m/%d")
[1] "03/17"
You can use R's built in style for dates, using as.Date() and format to choose only month and day:
> dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
> format(as.Date(dates, "%m/%d/%y"), "%m/%d")
[1] "02/27" "02/27" "01/14" "02/28" "02/01"
For your example, just use your own dates.
I found this out using R's help where the previous was the example:
> ?as.Date
> ?format
Here's my solution:
It involves formatting the date to a string (without the year) and then back to a date, which defaults all of the dates to the same (current) year.
The code and sample input file are below:
Code
# Clear all
rm(list = ls())
# Get the data in (read.csv is base R, so no extra package is needed)
data = read.csv('Readings.csv')
# Extract each Column
readings = data[,"Reading"]
dates = as.Date(data[,"Reading.Date"], format="%m/%d/%y")
# Order the data correctly
readings = readings[order(dates)]
dates = dates[order(dates)]
# Calculate the difference between each date (in days) and readings
diff.readings = diff(readings)
diff.dates = as.numeric(diff(dates)) # Convert from days to an integer
# Calculate the usage per reading period
usage.per.period = diff.readings/diff.dates
# Get Every single day between the very first reading and the very last
# seq will create a sequence: first argument is min, second is max, and third is the step size (which in this case is 1 day)
days = seq(min(dates),max(dates), 1)
# This creates an empty vector to get data from the for loop below
usage.per.day = numeric()
# The length of the diff.dates is the number of periods that exist.
for (period in 1:(length(diff.dates))){
# to convert usage.per.period to usage.per.day, we need to replicate the
# value for the number of days in that period. the function rep will
# replicate a number: first argument is the number to replicate, and the
# second number is the number of times to replicate it. the function c will
# concatenate the current vector and the new period, sort of
# like value = value + 6, but with vectors.
usage.per.day = c(usage.per.day, rep(usage.per.period[period], diff.dates[period]))
}
# The for loop above misses out on the last day, so I add that single value manually
usage.per.day[length(usage.per.day)+1] = usage.per.period[period]
# Get the list of years present in the data
years = names(table(format(dates, "%Y")))
# Now break down the usages and the days by year
# list() creates an empty list
usage.per.day.grouped.by.year = list()
year.day = list()
# This defines some colors for plotting; rainbow(n) will give you n evenly spaced colors
colors = rainbow(length(years))
for (year.index in 1:length(years)){
# This is a vector of trues and falses, to say whether a day is in a particular
# year or not
this.year = (days >= as.Date(paste(years[year.index],'/01/01',sep="")) &
days <= as.Date(paste(years[year.index],'/12/31',sep="")))
usage.per.day.grouped.by.year[[year.index]] = usage.per.day[this.year]
# We only care about the month and day, so drop the year
year.day[[year.index]] = as.Date(format(days[this.year], format="%m/%d"),"%m/%d")
# In the first year, we need to set up the whole plot
if (year.index == 1){
# create a png file with file name image.png
png('image.png')
plot(year.day[[year.index]], # x coords
usage.per.day.grouped.by.year[[year.index]], # y coords
"l", # as a line
col=colors[year.index], # with this color
ylim = c(min(usage.per.day),max(usage.per.day)), # this y max and y min
ylab='Usage', # with this label for y axis
xlab='Date', # with this label for x axis
main='Usage Over Time') # and this title
}
else {
# After the plot is set up, we just need to add each year
lines(year.day[[year.index]], # x coords
usage.per.day.grouped.by.year[[year.index]], # y coords
col=colors[year.index]) # color
}
}
# add a legend to the whole thing
legend("topright" , # where to put the legend
legend = years, # what the legend names are
lty=c(1,1), # what the symbol should look like
lwd=c(2.5,2.5), # what the symbol should look like
col=colors) # the colors to use for the symbols
dev.off() # save the png to file
Input file
Reading Date,Reading
1/1/10,10
2/1/10,20
3/6/10,30
4/1/10,40
5/7/10,50
6/1/10,60
7/1/10,70
8/1/10,75
9/22/10,80
10/1/10,85
11/1/10,90
12/1/10,95
1/1/11,100
2/1/11,112.9545455
3/1/11,120.1398601
4/1/11,127.3251748
5/1/11,134.5104895
6/1/11,141.6958042
7/1/11,148.8811189
8/1/11,156.0664336
9/17/11,190
10/1/11,223.9335664
11/1/11,257.8671329
12/1/11,291.8006993
1/1/12,325.7342657
2/1/12,359.6678322
3/5/12,375
4/1/12,380
5/1/12,385
6/1/12,390
7/1/12,400
8/1/12,410
9/1/12,420
seasonplot() from the forecast package does this very well!
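For illustration, a minimal sketch with made-up monthly data (assuming the forecast package is installed and the usage series can be expressed as a ts object):

```r
library(forecast)

# Three years of monthly usage values, invented for illustration
usage <- ts(c(10, 12, 14, 13, 15, 16, 18, 17, 16, 14, 12, 11,
              11, 13, 15, 14, 16, 17, 19, 18, 17, 15, 13, 12,
              12, 14, 16, 15, 17, 18, 20, 19, 18, 16, 14, 13),
            start = c(2010, 1), frequency = 12)

# One line per year, plotted against month of year, so the years overlay
seasonplot(usage, year.labels = TRUE, col = rainbow(3))
```

This gives the side-by-side-years view directly, without any manual date reformatting.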
I have multiple datasets of physical variables, and I want to do some work on them with R. However, I would like to use a list. Here is my code for one of my data frames:
# Table definition
df.jannuary <- read.table("C:\\...file1.csv", sep=";")
# Subset of the table containing only variables of interest
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12))
# Column names
colnames(df.jannuary_sub)<-c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct objet through the temporary column "timestamp"
df.jannuary_sub$timestamp <- as.POSIXct(paste(df.jannuary_sub$year, df.jannuary_sub$day, df.jannuary_sub$hour, df.jannuary_sub$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
df.jannuary_sub$date <- format(df.jannuary_sub$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
df.jannuary_sub <- subset(df.jannuary_sub, select=c(11, 5:9))
This code works. The thing is, I have all the months of the year, for several years. So I started to use a list; here is the example for the year 2011:
df.jannuary <- read.table("C:\\...\file1.dat", sep=",")
#...
df.december <- read.table("C:\\...\file12.dat", sep=",")
# Creation of a list containing the month datasets, with a subset of the tables containing only variables of interest
list.dataset_2011<-list(
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12)),
#...
df.december_sub <- subset(df.december, select=c(2:8, 11:12))
)
# Column names for all variables of the list
for (j in 1:12)
{
  colnames(list.dataset_2011[[j]]) <- c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
}
# Conversion of each element of "list.dataset_2011" into a data.frame
for (i in 1:9)
{
  list.dataset_2011[[i]] <- as.data.frame(list.dataset_2011[[i]])
}
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct objet through the temporary column "timestamp"
list.dataset_2011$timestamp <- as.POSIXct(paste(list.dataset_2011$year, list.dataset_2011$day, list.dataset_2011$hour, list.dataset_2011$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
list.dataset_2011$date <- format(list.dataset_2011$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
list.dataset_2011 <- subset(list.dataset_2011, select=c(11, 5:9))
I encounter a problem at the end of my code (hoping the rest is working!) with the subset command, which doesn't appear to work on a list.
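That is expected: subset(), as.POSIXct() and format() operate on a single data frame, not on the list itself, so the per-month steps have to be applied to each element, typically with lapply(). A minimal sketch of the pattern (the make_month() helper is a made-up stand-in for the real read.table() calls, with only two of the nine columns):

```r
# Two tiny stand-in data frames; the real ones come from read.table()
make_month <- function() {
  data.frame(year = 2011, day = c(1, 1), hour = c(0, 0), minute = c(0, 10),
             temp_air = c(1.2, 1.4))
}
list.dataset_2011 <- list(make_month(), make_month())

# Apply the whole timestamp/date/subset pipeline to every month at once
list.dataset_2011 <- lapply(list.dataset_2011, function(df) {
  # Build the timestamp from the Year-Day-Hour-Minute columns (julian day)
  df$timestamp <- as.POSIXct(paste(df$year, df$day, df$hour, df$minute),
                             format = "%Y %j %H %M", tz = "GMT")
  df$date <- format(df$timestamp, "%d/%m/%Y %H:%M", tz = "GMT")
  # Keep the new date column first, then the variables of interest
  subset(df, select = c(date, temp_air))
})
```

Each element of the list comes back as a cleaned data frame, and no code has to be duplicated twelve times.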