Printing row names for values in a matrix - r

I'm having problems printing the row names for specific values within a matrix. The following two questions have been difficult:
On which day(s) did she arrive the fastest in the first week? (Only the day(s) of the week should print.) (Hint: Use the row names.)
Determine the day(s) of the second week on which she arrived at work within half an hour. (Only the day(s) of the week should print.)
This is the data set, called commutes (commute times in minutes):
          Week1 Week2
Monday       26    22
Tuesday      35    23
Wednesday    24    36
Thursday     31    32
Friday       34    25

1) You can use the which() function to find the index (or indices, if there are ties) of the smallest value in the first column. You provide which() with a logical object (in this case, a vectorized equality test). Supposing you have your matrix bound to m:
ind = which(m[,'Week1'] == min(m[,'Week1']))
You can then use the index to get the matching row name with rownames():
day = rownames(m)[ind]
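2) This is essentially the same thing, except you will be expecting a vector of indices rather than a single index. Again, use which() to find the indices that match the desired logical expression. Because m is a matrix, select the column with m[, 'Week2'] (the $ operator only works on data frames and lists):
inds = which(m[, 'Week2'] <= 30)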
days = rownames(m)[inds]
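For a self-contained version of both answers, the sketch below rebuilds the commutes matrix from the numbers in the question (the matrix construction itself is an assumption about how the data was loaded) and indexes the row names with a logical vector directly, which gives the same result as going through which():

```r
# Rebuild the commutes matrix from the question (commute times in minutes)
commutes <- matrix(c(26, 35, 24, 31, 34,   # Week1
                     22, 23, 36, 32, 25),  # Week2
                   ncol = 2,
                   dimnames = list(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                                   c("Week1", "Week2")))

# 1) Fastest day(s) in week 1: comparing against min() keeps any ties
fastest_week1 <- rownames(commutes)[commutes[, "Week1"] == min(commutes[, "Week1"])]
print(fastest_week1)     # "Wednesday"

# 2) Days in week 2 with a commute of at most 30 minutes
within_half_hour <- rownames(commutes)[commutes[, "Week2"] <= 30]
print(within_half_hour)  # "Monday" "Tuesday" "Friday"
```

which.min() would also work for question 1, but it returns only the first minimum, so the equality test is the safer choice when ties are possible.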

Related

R: Function to repeat counting words in strings with different arguments

I am summing the word counts of strings that match a specific argument, e.g. the week (week 1 = 1, week 2 = 2, and so on), with the following command:
sum(data[which(data[,17]==1), 19])
Column 17 of the data frame holds the week number (which has to be 1 for week 1), and column 19 holds the number of words in each string.
I have 31 weeks and 228,000 strings, and I do not want to run the command for each week separately, so I am looking for a function that can do it automatically for weeks 1-31 and give me the results.
Thanks for helping!
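One way to avoid running the command 31 times is to let base R group the sums by week. The sketch below uses a small made-up data frame in place of the real one (only columns 17 and 19 matter) and shows two options: tapply(), which sums column 19 for every week that occurs, and sapply() over 1:31, which repeats the original command for each week and returns 0 for weeks with no strings:

```r
# Hypothetical toy data: column 17 = week number, column 19 = word count
data <- as.data.frame(matrix(0, nrow = 10, ncol = 19))
data[, 17] <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 31)
data[, 19] <- c(5, 7, 3, 4, 6, 10, 1, 2, 8, 9)

# Option 1: one grouped sum per week that appears in column 17
words_per_week <- tapply(data[, 19], data[, 17], sum)
print(words_per_week)  # weeks 1, 2, 3, 31 -> 12, 13, 21, 9

# Option 2: the original command applied to every week 1..31 (0 if a week is empty)
all_weeks <- sapply(1:31, function(w) sum(data[data[, 17] == w, 19]))
print(all_weeks)
```

With 228,000 rows, tapply() makes a single pass over the data, whereas the sapply() version scans column 17 once per week; both should be fast at that size.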

Nested loop in R not giving expected outputs

I wanted to use the nested loop below to work out a variable 'data' for every day within a number of years.
x is a vector of length 20 (number of years) and each of the 20 entries is the number of days the inner loop is to run for.
I also have a vector 'start' that has 20 dates in the format "1981-02-01".
I wanted to create a matrix of the output (data) that would have the data for each day in rows and then one column per year.
The code I am using below however does not seem to be updating the counters (yrcntr and daycntr) which is causing the whole thing to not work.
Also, when I try to assign values to 'data' within the loop using the counters as indices (data[daycntr, yrcntr]), it's not working.
I'm not even getting an error.
I'm not sure how to write out the format of 'data' used below here, but I'll give it a go:
datamat=
tmax tmin date
11 4 "1981-03-31"
13 6 "1981-04-01"
12 7 "1981-04-02"
and 'start' is a vector of dates in the format: "1981-04-02" "1981-04-03"
tmax<-datamat[,1]
tmin<-datamat[,2]
tdates<-datamat[,3]
yrcntr<-0;
daycntr<-0;
for (yr in 1:length(x)){
  yrcntr<-yrcntr+1
  #find the row in the temp data that matches the startdate each year
  tempidx<- (which(tdates==start[yrcntr]))-1
  for (days in 1:numdays[yr]){
    daycntr<-daycntr+1
    dlytempidx=tempidx+1
    data[daycntr, yrcntr]<- (tmax[dlytempidx]+tmin[dlytempidx])
  }
  rm(tempidx)
}
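Two things in the loop above would explain the symptoms: daycntr is never reset at the start of each year, so the row index keeps growing across years, and dlytempidx is always tempidx + 1 (the start row), so every day of a year reads the same temperatures. A minimal sketch of a corrected loop follows. It assumes the data sits in a data frame with columns tmax, tmin and date (Date class), that start holds Dates, and that numdays gives the days per year; the toy values are made up, and the output matrix is preallocated with one row per day and one column per year:

```r
# Hypothetical toy inputs shaped like the question's datamat / start / numdays
datamat <- data.frame(
  tmax = c(11, 13, 12, 14, 15, 10),
  tmin = c(4, 6, 8, 5, 8, 3),
  date = as.Date(c("1981-03-31", "1981-04-01", "1981-04-02",
                   "1982-03-31", "1982-04-01", "1982-04-02"))
)
start   <- as.Date(c("1981-04-01", "1982-03-31"))  # first day to use in each year
numdays <- c(2, 3)                                 # number of days to collect per year

# Preallocate the result: rows = days, columns = years (NA where a year is shorter)
data <- matrix(NA_real_, nrow = max(numdays), ncol = length(numdays))

for (yr in seq_along(numdays)) {
  # Row of the temperature data matching this year's start date (NA if absent)
  first_row <- which(datamat$date == start[yr])[1]
  if (is.na(first_row)) next
  for (day in seq_len(numdays[yr])) {
    # The day counter restarts every year, and the data row advances with it
    idx <- first_row + day - 1
    data[day, yr] <- datamat$tmax[idx] + datamat$tmin[idx]
  }
}
print(data)
```

Using the loop variables yr and day directly as indices removes the need for the separate yrcntr and daycntr counters.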

Efficient and Succinct Vector Transformation of Weekly to Daily hourly Data in R

I've got a working function, but I'm hoping there is a more succinct way of going about this.
I have a data set of events that are recorded with the hour of the week in which they occurred. For example, 4 AM on Sunday = 4, 4 AM on Monday = 28, etc. I want to analyze this data on a daily basis, for instance, all of the events that happen between 8 and 10 AM on any day.
To do this, I have built a function that returns a dichotomous (0/1) value for the given range over an ordered list. The function two_break accepts an ordered list of integers in 0:168 representing the hours of a week, plus a range (b1 and b2) marking the desired period of a 24-hour day, with b1 and b2 as exclusive bounds. For example, if b1=8 and b2=10, two_break returns 1 for all values of 9, (9+24)=33, (9+48)=57, etc., and 0 for all others.
two_break <- function(test_hr,b1,b2){
  test_hr<-ifelse(test_hr==1,1.1,test_hr)
  for(i in 0:6){
    test_hr<-ifelse(test_hr> (b1+24*i) & test_hr< (b2+24*i), 1 ,test_hr)
  }
  test_hr<-ifelse(test_hr==1,1,0)
  return(test_hr)
}
This function works fine, but I'm wondering if anybody out there could do it more efficiently/succinctly.
See the full code and data set on my GitHub: anthonyjp87, 168 hr transformation file/data.
Cheers!
You can use integer division %/% to capture the day of the week, and modulus, %% to capture the hour in the day:
weekHours <- 1:168
# keep the hours of the week whose hour of day is between 8 AM and 10 AM, inclusive
test_hr <- weekHours[weekHours %% 24 %in% 8:10]
Note that midnight is represented by 0. If you want to wrap this into a function, you might use
getTest_hr <- function(weekHours, startTime, stopTime) {
  weekHours[weekHours %% 24 %in% seq(startTime, stopTime)]
}
To get the day of the week, you can use integer division:
# get all indices for the third day of the week
dayOfWeek3 <- weekHours[(weekHours %/% 24 + 1) == 3]
To get a binary (logical) vector of the selected time periods, use the logical test itself instead of indexing with it:
allTimesBinary <- (weekHours %% 24) %in% 8:10
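Putting the modulus idea back into the shape of the original function gives a one-line, loop-free replacement. The sketch below (the name two_break_vec is hypothetical) keeps the question's exclusive bounds (hour of day strictly between b1 and b2) rather than the inclusive 8:10 used above, and returns 0/1 integers like two_break does; for hours 0 to 167 it should agree with the original:

```r
# Loop-free version of two_break(): 1 when the hour of day (test_hr %% 24)
# lies strictly between b1 and b2, otherwise 0
two_break_vec <- function(test_hr, b1, b2) {
  hour_of_day <- test_hr %% 24
  as.integer(hour_of_day > b1 & hour_of_day < b2)
}

two_break_vec(c(9, 33, 57, 8, 10, 4), 8, 10)  # 1 1 1 0 0 0
```

Because %% wraps every hour back into 0-23, there is no need to loop over the seven days or to protect the value 1 with the 1.1 placeholder.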

Calculate Running Difference in Dates as New Dataframe Column

I've searched for several days and am still stumped.
Given a dataset defined by the following:
ids = c("a","b","c")
dates = c(as.Date("2015-01-01"), as.Date("2015-02-01"), as.Date("2015-02-15"))
test = data.frame(ids, dates)
I am trying to dynamically add new columns to the data frame whose values will be the difference, in days, between a reference date (here 2015-03-01, which the column is named after) and the value in the dates column. I would expect the result to look like the following, but with a better column name:
d20150301 = c(59, 28, 14)
result = data.frame(ids, dates, d20150301)
Many thanks in advance.
You can subtract a vector of dates from a single date, so
test$d2015_03_01 <- as.Date('2015-03-01')-test$dates
makes test look like
> test
ids dates d2015_03_01
1 a 2015-01-01 59 days
2 b 2015-02-01 28 days
3 c 2015-02-15 14 days
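To add several such columns dynamically, the same subtraction can run in a loop over a vector of reference dates. In the sketch below, the second reference date is a made-up example, the column names follow the d20150301 pattern from the question, and difftime() with units = "days" wrapped in as.numeric() stores plain numbers instead of "59 days" difftime values:

```r
ids   <- c("a", "b", "c")
dates <- as.Date(c("2015-01-01", "2015-02-01", "2015-02-15"))
test  <- data.frame(ids, dates)

# Reference dates to compare against (example values; add as many as needed)
ref_dates <- as.Date(c("2015-03-01", "2015-04-01"))

# Loop over positions: for (d in ref_dates) would drop the Date class
for (i in seq_along(ref_dates)) {
  col_name <- format(ref_dates[i], "d%Y%m%d")  # e.g. "d20150301"
  test[[col_name]] <- as.numeric(difftime(ref_dates[i], test$dates, units = "days"))
}
print(test)
```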

Selecting Specific Dates in R

I am wondering how to create a subset of data in R based on a list of dates, rather than by a date range.
For example, I have the following data set, data, which contains 3 years of 6-minute data:
date zone month day year hour minute temp speed gust dir
1 09/06/2009 00:00 PDT 9 6 2009 0 0 62 2 15 156
2 09/06/2009 00:06 PDT 9 6 2009 0 6 62 13 16 157
I have used breeze<-subset(data, ws>=15 & wd>=247.5 & wd<=315, select=date:dir) to select the rows which meet my criteria for a sea breeze, which is fine, but what I want to do is create a subset of the days which contain those times that meet my criteria.
I have used...
as.character(breeze$date)
trimdate<-strtrim(breeze$date, 10)
breezedate<-as.Date(trimdate, "%m/%d/%Y")
breezedate<-format(breezedate, format="%m/%d/%Y")
...to extract the dates from each row that meets my criteria, so I have a variable called breezedate that contains a list of the dates that I want (not the most elegant code for this, I'm sure). There are about two hundred dates in the list. With the next command, I am trying to create a subset of my original data set, data, that contains only the days that meet the sea breeze criteria, not just the specific times.
breezedays<-(data$date==breezedate)
I think one of my issues here is that I am comparing one value to a list of values, but I am not sure how to make it work.
Let's assume your breezedate list looks like this and data$date is a simple string:
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))
This is probably what you want (note the comma, which selects rows of the data frame):
breezedays <- data[as.Date(data$date, '%m/%d/%Y') %in% breezedate, ]
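A self-contained sketch of the whole %in% approach is below. The four rows of data are made up, and it assumes the question's ws and wd correspond to the speed and dir columns shown in the header. as.Date() with format "%m/%d/%Y" ignores the trailing time of day, so each row maps to its calendar day; the breeze days are collected first, and then every row falling on one of those days is kept, including rows that did not meet the criteria themselves:

```r
# Toy 6-minute data in the question's layout (values are made up)
data <- data.frame(
  date  = c("09/06/2009 00:00", "09/06/2009 00:06",
            "09/07/2009 12:00", "09/07/2009 12:06"),
  speed = c(2, 13, 16, 10),
  dir   = c(156, 157, 260, 300),
  stringsAsFactors = FALSE
)

# Calendar day of every row (the " HH:MM" part is ignored by as.Date)
data_day <- as.Date(data$date, format = "%m/%d/%Y")

# Rows meeting the sea-breeze criteria (assuming ws = speed and wd = dir)
is_breeze <- data$speed >= 15 & data$dir >= 247.5 & data$dir <= 315

# Days with at least one breeze row, then all rows from those days
breezedate <- unique(data_day[is_breeze])
breezedays <- data[data_day %in% breezedate, ]
print(breezedays)
```

Here only row 3 meets the criteria, but row 4 is also returned because it falls on the same day.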
The base intersect() function compares two vectors and returns the values they have in common. Note that it returns the shared values themselves, not the matching rows of data, and the values must be in the same format (here, data$date also contains the time of day) to match.
To use, run the following:
breezedays <- intersect(data$date,breezedate) # returns the values shared between data$date and breezedate

Resources