Interpolation in R when there are 3 columns - r

I need to find an interpolated value for fuel consumption from both speed and weather.
I have tried the approx function, but it only works with two variables and won't accept three or more.
Speed  Weather  Fuel
10     2        30
12     3        35
14     8        38
15     9        65
I need to find the fuel for speed_new = 13 and weather = 7.
approx(x = Speed, y = Fuel, z = Weather, xout = speed_new, rule = 2)$y  # but approx() has no argument for a second predictor like Weather
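With two predictors, one-dimensional approx() no longer applies; a bivariate interpolator is needed, for example akima::interp(), or a hand-rolled scheme. Below is a minimal sketch using inverse-distance weighting over (Speed, Weather); the data frame, the idw() helper and the power p = 2 are illustrative assumptions rather than a fixed recipe.
dat <- data.frame(Speed   = c(10, 12, 14, 15),
                  Weather = c(2, 3, 8, 9),
                  Fuel    = c(30, 35, 38, 65))

# inverse-distance-weighted estimate of Fuel at a new (Speed, Weather) point
idw <- function(speed_new, weather_new, data, p = 2) {
  d <- sqrt((data$Speed - speed_new)^2 + (data$Weather - weather_new)^2)
  if (any(d == 0)) return(data$Fuel[d == 0][1])  # exact match in the data
  w <- 1 / d^p
  sum(w * data$Fuel) / sum(w)
}

idw(13, 7, dat)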

Related

R new variable based on other column

Using the dataset cars in R, I would like to add a new column that holds the average of dist within each group defined by speed, i.e. with speed used as the grouping variable.
So first I need 19 groups, reflecting the unique speeds in cars$speed:
4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
For each of these 19 groups I would like to know the average dist, but only if at least one of the entries in that group meets a criterion (e.g. at least one dist value is above 20).
With the cars dataset I would get something like this back for the cars with speed 4 to 12:
speed dist avr_dist_if_one_speed_is_above20
4 2 none
4 10 none
7 4 13
7 22 13
8 16 none
9 10 none
10 18 26
10 26 26
10 34 26
11 17 22.5
11 28 22.5
12 14 21.5
12 20 21.5
12 24 21.5
12 28 21.5
...
Since the 2 cars that have speed 4 both have a dist below 20, I do not get an average for these two entries. For the cars that have speed 7 I get an average dist of 13, since at least one car with speed 7 has a dist above 20.
For the cars with speed 8 and 9 I do not get an average, as both of these cars have a dist below 20. The cars with speed 10 should return an average of 26,
since two of the cars with speed 10 have a dist above 20.
For cars with speed 11 I get 22.5, and for cars with speed 12 I get 21.5.
The R code should calculate an average dist for all the remaining speed categories, as those all include cars with dist > 20.
This will do what you are looking for, if I understand your question correctly; note that it returns one summary row per speed rather than a new column on cars.
library(dplyr)
cars %>%
  group_by(speed) %>%
  summarise(n = n(),
            avg_dist = ifelse(any(dist > 20), mean(dist, na.rm = TRUE), NA))
Try this:
library(dplyr)
cars %>%
  group_by(speed) %>%
  # dist[max(dist) > 20] is all of dist when the group maximum exceeds 20,
  # and empty (giving NaN) otherwise
  mutate(avr_dist_if_one_speed_is_above20 = mean(dist[max(dist) > 20]))
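A base R sketch along the same lines, assuming the built-in cars data as in the question; it keeps one row per car and leaves NA where no dist in the group exceeds 20:
# per-row group average via ave(); NA where the group has no dist above 20
cars$avr_dist_if_one_speed_is_above20 <- ave(
  cars$dist, cars$speed,
  FUN = function(d) if (any(d > 20)) mean(d) else NA_real_
)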

Adding extreme value distributed noise (with µ=0,σ=10) to a vector of numbers in R

I have the following data frame:
Measurement Treatment
38 A
14 A
54 A
69 A
20 B
36 B
35 B
10 B
11 C
98 C
88 C
14 C
I want to add extreme value distributed noise (with mean = 0 and sd = 10) to the Measurement values. How can I achieve that in R?
I found revd in the extRemes package, but it does not work as expected. Does devd from the same package do what I want? (It does not seem to allow a mean and sd to be specified.)
If you want to use your measure as the mean for the noise, then you can do this:
measure <- round(runif(10, 0, 30), 0)
data <- data.frame(measure)
for (i in 1:nrow(data)) {
  data$measure1[i] <- rnorm(1, data$measure[i], 10)
}
data
measure measure1
1 6 6.281557
2 12 -5.780177
3 18 13.529773
4 26 33.665584
5 14 12.666614
6 24 41.146132
7 5 -1.850390
8 14 16.728703
9 13 26.082601
10 13 14.066475
EDIT: You can avoid the for loop with this instead (note that rnorm needs one draw per row, not a single value recycled across rows):
data$measure1 <- data$measure + rnorm(nrow(data), 0, 10)
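If genuinely extreme value (Gumbel) noise is wanted rather than the normal noise above, one sketch is to rescale the Gumbel parameters so the draws have mean 0 and sd 10; this assumes the extRemes package and a data frame called df holding the Measurement column:
library(extRemes)

sigma <- 10
beta  <- sigma * sqrt(6) / pi    # Gumbel sd is scale * pi / sqrt(6)
mu    <- -beta * 0.5772156649    # shift by scale * Euler-Mascheroni constant so the mean is 0

noise <- revd(nrow(df), loc = mu, scale = beta, shape = 0, type = "GEV")
df$Measurement_noisy <- df$Measurement + noise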

How to create range x values with basic R

I have just begun using R and have gone through multiple books and sources; they get more and more complex, yet I still cannot find a solution to what I think should be quite a basic task.
I have data with 3 columns as shown (I am simplifying everything to try to get a really clear answer that can be applied to multiple situations):
min max value
1 5 23
8 15 9
33 35 30
I would like to plot this data on a graph.
By this I mean that every value between 1 and 5, for example, on the x axis should equal 23 on the y axis.
I have tried several things, including assigning the columns to vectors a, b and c respectively
and generating the correct number of y values with:
y <- rep(c, (b - a + 1))
which works as expected.
The problem comes with getting the matching x values. I tried:
x <- (a:b)
but the : operator only uses the first element of each vector.
Now I can make this work by manually typing everything in, like:
x <- c(1:5, 8:15, 33:35)
but I really need an automated way to do this because I am working with huge datasets of this structure.
I have seen other people with similar issues, but the underlying principle always seems to be buried in vast datasets and entire scripts, so I have not been able to extract a good solution to this problem.
If anyone with a little more experience could clear up this issue I would be hugely grateful!
dat <- read.table(text=
"min max value
1 5 23
8 15 9
33 35 30",
header=TRUE)
I'm still not quite sure what you mean, but maybe:
newdat <- with(dat,data.frame(x=c(min,max),y=rep(value,2)))
newdat <- plyr::arrange(newdat,x)
plot(y~x,type="s",data=newdat)
It's not clear what you want to do between 5 and 8, or 15 and 33 ... another possibility is to plot each piece as a separate segment:
# set up an empty plotting region, then draw one horizontal segment per row
plot(value ~ min, data = dat, xlim = range(c(dat$min, dat$max)), type = "n")
apply(dat, 1, function(x) segments(x[1], x[3], x[2], x[3]))
How about this:
# your data.frame
df <- data.frame(min = c(1, 8, 33), max = c(5, 15, 35), value = c(23, 9, 30))
x <- unlist(apply(df, 1, function(x) x[1]:x[2]))
y <- unlist(apply(df, 1, function(x) rep(x[3], x[2] - x[1] + 1)))
plotdata <- data.frame(x = x, y = y)
plotdata
x y
1 1 23
2 2 23
3 3 23
4 4 23
5 5 23
6 8 9
7 9 9
8 10 9
9 11 9
10 12 9
11 13 9
12 14 9
13 15 9
14 33 30
15 34 30
16 35 30
Something like this?
a <- c(1:5, 8:15, 33:35)
b <- c(rep(23, 5), rep(9, 8), rep(30, 3))
plot(a, b, type = "l")
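For larger tables, a vectorized sketch in the same spirit (assuming the df built in the answer above) avoids typing the ranges by hand:
# one seq() per min/max pair, and rep() with a 'times' vector for the y values
x <- unlist(Map(seq, df$min, df$max))
y <- rep(df$value, times = df$max - df$min + 1)
plot(x, y, type = "l")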

Frequency distribution with custom format data

I need help with an R plot, using a data format I have not worked with before.
NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3
I need a bar plot with the numbers on the x axis (treated as continuous values, not histogram bins) and the frequency on the y axis, with duplicate numbers combined, like:
10 46
11 3
12 6
It seems simple enough, but I have 10,000 rows and large numbers in the real data, so I am looking for a good solution in R without doing it manually.
What about:
## tapply splits dd$FREQUENCY by dd$NUMBER and sums each group
barplot(tapply(dd$FREQUENCY, dd$NUMBER, sum))
to get the combined bar plot.
Read in your data:
dd = read.table(textConnection("NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3"), header=TRUE)

Count of element in data.frame

I have data that illustrates hurricane tracks crossing through a series of "gates". How would I code it to output the GateID, and the count of times that each GateID occurs in the total data frame?
track_id day hour month year rate gate_id pres_inter vmax_inter
9 10 0 7 1 9.6451E-06 2 97809 23.545
9 10 0 7 1 9.6451E-06 17 100170 13.843
10 3 6 7 1 9.6451E-06 2 96662 31.568
13 22 12 8 1 9.6451E-06 1 94449 48.466
13 22 12 8 1 9.6451E-06 17 96749 30.55
16 13 0 8 1 9.6451E-06 4 98702 19.205
16 13 0 8 1 9.6451E-06 16 98585 18.143
19 27 6 9 1 9.6451E-06 9 98838 20.053
header <- read.table(fname_in, nrows=1)
track <- read.table(fname_in, sep=',', skip=1)
colnames(track) <- c("ID", "day", "month", "year", "hour", "rate", "gate_id", "pres_inter", "vmax_inter")
I think I would like to count the occurrence of each gate_id, and also perhaps output the maximum wind per gate (vmax_inter), etc....
Totally reading your mind, since you provide nothing concrete to go on. But if GateID is one of your data frame columns, you can get the count for each unique GateID along with other parameters using count from package plyr.
install.packages("plyr")
library("plyr")
count(mydf, vars = "GateID")
See ?count after installing for further details.
For the 2nd part of your question, see ?aggregate and consider the formula interface. For example,
aggregate(vmax_inter ~ gate_id, data = mydf, FUN = max)
gives the maximum vmax_inter for each gate_id. By the way, you can combine your two read.table steps into a single read.csv call.
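A dplyr sketch that combines the per-gate count and the maximum wind in one pass, assuming the track data frame built above:
library(dplyr)
track %>%
  group_by(gate_id) %>%
  summarise(n_crossings = n(), max_vmax = max(vmax_inter))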
