Resizing and interpolating middle values in a column in R

I have a data frame:
df <- data.frame(level = c(1:10), values = c(3,4,5,6,8,9,4,2,1,6))
I would like to resize it to fewer levels, let's say 6 levels, so that level 1 and level 10 correspond to level 1 and level 6 in the new data frame. The desired output would look something like this (I just guessed some floats in between; I'm not sure what the result would actually be):
level value
1 3
2 3.4
3 4.6
4 6.2
5 2.2
6 6
How would I go about doing this?

Maybe you want to use approxfun for interpolation like below?
data.frame(
level = 1:6,
values = approxfun(df$level, df$values)(seq(1, nrow(df), length.out = 6))
)
which gives
level values
1 1 3.0
2 2 4.8
3 3 7.2
4 4 7.0
5 5 1.8
6 6 6.0
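The same resampling can also be written with approx() directly, because its n argument asks for that many equally spaced points between the smallest and largest x. A minimal sketch; the helper name resize_values is made up for illustration and is not part of the answer above:

```r
df <- data.frame(level = 1:10, values = c(3, 4, 5, 6, 8, 9, 4, 2, 1, 6))

# Hypothetical helper: linearly resample a vector to n_out points,
# keeping the first and last values where they are.
resize_values <- function(values, n_out) {
  approx(seq_along(values), values, n = n_out)$y
}

resized <- data.frame(level = 1:6, values = resize_values(df$values, 6))
resized
```

This should reproduce the values column shown above; it assumes the original levels are equally spaced, so the row index can stand in for df$level.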

Related

na.approx Interpolation in R

I'm using zoo's na.approx to fill NA values.
library(zoo)
Bus_data<-data.frame(Action = c("Boarding", "Alighting",NA, NA,"Boarding", "Alighting",NA, NA,"Boarding", "Alighting"),
Distance=c(1,1,2,2,3,3,4,4,5,5),
Time = c(1,2,NA,NA,5,6,NA,NA,9,10))
I'd like the resulting data.frame to look like the following:
Action Distance Time
1 Boarding 1 1
2 Alighting 1 2
3 NA 2 3.5
4 NA 2 3.5
5 Boarding 3 5
6 Alighting 3 6
7 NA 4 7.5
8 NA 4 7.5
9 Boarding 5 9
10 Alighting 5 10
However, when I use
na.approx(Bus_data$Time,Bus_data$Distance,ties = "ordered" )
I get the following instead:
1 Boarding 1 2 <-Value Changes
2 Alighting 1 2
3 NA 2 3.5
4 NA 2 3.5
5 Boarding 3 6 <-Value Changes
6 Alighting 3 6
7 NA 4 7.5
8 NA 4 7.5
9 Boarding 5 10 <-Value Changes
10 Alighting 5 10
Any idea how I could get the desired outcome through na.approx? Note that in the example "Distance" is evenly spaced for simplicity; the real dataset has varying distances.
You can use approx from base R:
Time = c(1,2,NA,NA,5,6,NA,NA,9,10)
approx(Time, method = "constant", n = length(Time), f = .5)$y
Result
# [1] 1.0 2.0 3.5 3.5 5.0 6.0 7.5 7.5 9.0 10.0
From ?approx
f :
for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
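To see what f does, here is a small sketch on two made-up points (x and y are illustrative values, not from the question), evaluated halfway between them:

```r
# Two known points; we ask for the step-function value at x = 1.5.
x <- c(1, 2)
y <- c(10, 20)

left  <- approx(x, y, xout = 1.5, method = "constant", f = 0)$y    # value to the left
mid   <- approx(x, y, xout = 1.5, method = "constant", f = 0.5)$y  # average of both sides
right <- approx(x, y, xout = 1.5, method = "constant", f = 1)$y    # value to the right

c(left = left, mid = mid, right = right)
```

With f = .5 every point inside a gap gets the average of the two neighbouring known values, which is why both NA rows in each gap of Time receive the same number.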
With na.approx it would be similar
library(zoo)
na.approx(Time, method = "constant", f = .5)
Alternatively, run na.approx, set the positions that were non-NA in the original column back to NA, and then coalesce the result with the original column:
library(dplyr)
library(zoo)
coalesce(Bus_data$Time, replace(na.approx(Bus_data$Time,Bus_data$Distance,
ties = "ordered" ),
!is.na(Bus_data$Time), NA))
#[1] 1.0 2.0 3.5 3.5 5.0 6.0 7.5 7.5 9.0 10.0
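If you would rather avoid the extra packages, a similar fill can be sketched in base R by interpolating Time against Distance and using the interpolated values only where Time is missing. Note that ties = mean is an assumption on my part about how tied distances should be combined; it matches the desired output for this example, but may not be what you want for the real data:

```r
Bus_data <- data.frame(
  Distance = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Time     = c(1, 2, NA, NA, 5, 6, NA, NA, 9, 10)
)

# Interpolate Time against Distance. approx() drops the pairs with NA,
# and tied distances are averaged (ties = mean is an assumption).
interp <- approx(Bus_data$Distance, Bus_data$Time,
                 xout = Bus_data$Distance, ties = mean)$y

# Keep every observed Time and use the interpolation only to fill the gaps.
filled <- ifelse(is.na(Bus_data$Time), interp, Bus_data$Time)
filled
```

Because the x values come from Distance, this version also respects unevenly spaced distances.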

Changing duplicated coordinate values by adding a decimal place R

I have UTM coordinate values from GPS collared leopards, and my analysis gets messed up if there are any points that are identical. What I want to do is add a 1 to the end of the decimal string to make each value unique.
What I have:
> View(coords)
> coords
X Y
1 623190.9 4980021
2 618876.6 4980729
3 618522.7 4980896
4 618522.7 4980096
5 618522.7 4980096
6 622674.1 4976161
I want something like this, or something that will make each number unique (doesn't have to be a +1)
> coords
X Y
1 623190.9 4980021
2 618876.6 4980729
3 618522.7 4980896
4 618522.71 4980096.1
5 618522.72 4977148.2
6 622674.1 4976161
I've looked at existing questions and got this to work for a simulated data set, but not when a value is duplicated more than once.
DF <- data.frame(A=c(5,5,6,6,7,7), B=c(1, 1, 2, 2, 2, 3))
>View(DF)
A B
1 5 1
2 5 1
3 6 2
4 6 2
5 7 2
6 7 3
DF <- do.call(rbind, lapply(split(DF, list(DF$A, DF$B)),
function(x) {
x$A <- x$A + seq(0, by=0.1, length.out=nrow(x))
x$B <- x$B + seq(0, by=0.1, length.out=nrow(x))
x
}))
>View(DF)
A B
5.1.1 5.0 1.0
5.1.2 5.1 1.1
6.2.3 6.0 2.0
6.2.4 6.1 2.1
7.2 7.0 2.0
7.3 7.0 3.0
The '2's in column B don't continue to get a decimal added when there are more than two of them. I also had a problem accomplishing this when the number had more than 4 digits (i.e. XXXXX vs XX). There's probably a better way to do this, but I would love help with adding these decimals, and possibly with altering them in the original data frame, which has 12 columns of various data.
It is easier to use make.unique
DF[] <- lapply(DF, function(x) as.numeric(make.unique(as.character(x))))
DF
# A B
#1 5.0 1.0
#2 5.1 1.1
#3 6.0 2.0
#4 6.1 2.1
#5 7.0 2.2
#6 7.1 3.0
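One caveat: make.unique appends ".1", ".2", ... to the string, so a value that already has a decimal point, such as "618522.7", becomes "618522.7.1", which as.numeric turns into NA. For coordinates like the ones in the question, a numeric offset may be safer. A minimal sketch; the helper name nudge_duplicates and the step of 0.01 are assumptions for illustration:

```r
coords <- data.frame(
  X = c(623190.9, 618876.6, 618522.7, 618522.7, 618522.7, 622674.1),
  Y = c(4980021, 4980729, 4980896, 4980096, 4980096, 4976161)
)

# Hypothetical helper: the k-th repeat of a value gets k * step added,
# so the first occurrence is left unchanged.
nudge_duplicates <- function(x, step = 0.01) {
  k <- ave(seq_along(x), x, FUN = seq_along) - 1
  x + k * step
}

# Apply column by column, as the make.unique answer does.
coords[] <- lapply(coords, nudge_duplicates)
print(coords, digits = 10)
```

R prints 7 significant digits by default, so the offsets only become visible with a larger digits setting. The sketch does not check whether a nudged value collides with another value that already exists in the column.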

How to do this in R

I have a dataset that looks like this:
groups <- c(1:20)
A <- c(1,3,2,4,2,5,1,6,2,7,3,5,2,6,3,5,1,5,3,4)
B <- c(3,2,4,1,5,2,4,1,3,2,6,1,4,2,5,3,7,1,4,2)
position <- c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1)
sample.data <- data.frame(groups,A,B,position)
head(sample.data)
groups A B position
1 1 1 3 2
2 2 3 2 1
3 3 2 4 2
4 4 4 1 1
5 5 2 5 2
6 6 5 2 1
The "position" column always alternates between 2 and 1. I want to do this calculation in R: starting from the first row, if it's in position 1, ignore it. If it starts at 2 (as in this example), then calculate as follows:
Take the first 2 values of column A that are at position 2, average them, then subtract the first value that is at position 1 (in this example: (1+2)/2 - 3 = -1.5). Then repeat the calculation for the next set of values, using the last position 2 value as the starting point, i.e. the next calculation would be (2+2)/2 - 4 = -2.
So basically, in this example, the calculations are done for the values of these sets of groups: 1-2-3, 3-4-5, 5-6-7, etc. (the last value of the previous is the first value of the next set of calculation)
Repeat the calculation until the end. Also do the same for column B.
Since I need the original data frame intact, put the newly calculated values in a new data frame(s), with columns dA and dB corresponding to the calculated values of column A and B, respectively (if not possible then they can be created as separated data frames, and I will extract them into one afterwards).
Desired output (from the example):
dA dB
1 -1.5 1.5
2 -2 3.5
3 -3.5 2.5
4 -4.5 2.5
5 -4.5 2.5
6 -2.5 4
groups <- c(1:20)
A <- c(1,3,2,4,2,5,1,6,2,7,3,5,2,6,3,5,1,5,3,4)
B <- c(3,2,4,1,5,2,4,1,3,2,6,1,4,2,5,3,7,1,4,2)
position <- c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1)
sample.data <- data.frame(groups,A,B,position)
# first row that is at position 2
start <- match(2, sample.data$position)
# left edge of each window; stop two rows early so that i + 2 stays in range
twos <- seq(from = start, to = nrow(sample.data) - 2, by = 2)
# per column: mean of the two position-2 values minus the position-1 value between them
df <-
sapply(c("A", "B"), function(l) {
sapply(twos, function(i) {
mean(sample.data[c(i, i+2), l]) - sample.data[i+1, l]
})
})
df <- setNames(as.data.frame(df), c('dA', 'dB'))
As your values in position always alternate between 1 and 2, you can define an index of odd rows i1 and an index of even rows i2, and do your calculations:
library(dplyr) # for lead()
## If the first row has position == 1, shift both indexes by one row
inc <- 0
if (sample.data$position[1] == 1) inc <- 1
i1 <- seq(1 + inc, nrow(sample.data), by = 2) # rows at position 2
i2 <- seq(2 + inc, nrow(sample.data), by = 2) # rows at position 1
res <- data.frame(dA = (lead(sample.data$A[i1]) + sample.data$A[i1]) / 2 - sample.data$A[i2],
dB = (lead(sample.data$B[i1]) + sample.data$B[i1]) / 2 - sample.data$B[i2])
This returns:
dA dB
1 -1.5 1.5
2 -2.0 3.5
3 -3.5 2.5
4 -4.5 2.5
5 -4.5 2.5
6 -2.5 4.0
7 -3.5 2.5
8 -3.0 3.0
9 -3.0 4.5
10 NA NA
The last row is NA because there is no following position-2 value to average with; you can remove it if needed:
res=na.omit(res)
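The same windowed calculation can also be sketched in base R with plain vector indexing, which avoids both the nested sapply and the dplyr dependency. The names left, window_diff and res_base are made up for illustration, and the sketch assumes there are at least three rows from the first position-2 row onward:

```r
sample.data <- data.frame(
  groups   = 1:20,
  A        = c(1,3,2,4,2,5,1,6,2,7,3,5,2,6,3,5,1,5,3,4),
  B        = c(3,2,4,1,5,2,4,1,3,2,6,1,4,2,5,3,7,1,4,2),
  position = rep(c(2, 1), 10)
)

# Left edge of every window: each position-2 row that still has
# two more rows after it.
start <- match(2, sample.data$position)
left  <- seq(start, nrow(sample.data) - 2, by = 2)

# Mean of the two position-2 values minus the position-1 value between them.
window_diff <- function(x) (x[left] + x[left + 2]) / 2 - x[left + 1]

res_base <- data.frame(dA = window_diff(sample.data$A),
                       dB = window_diff(sample.data$B))
res_base
```

Since the windows stop two rows before the end, no trailing NA row is produced and na.omit is not needed.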

How to find the measurements of same type in R

I am new to R. I have a question regarding my data set.
S.NO Type Measurements
1 1 2.1
2 2 3.3
3 2 3.1
4 3 2.7
5 3 2.6
6 3 4.5
7 2 1.1
8 3 2.2
Suppose we have measurements in column 3, with their types given in column 2. Each measurement is either type 1, type 2 or type 3. If we are interested in only the measurements corresponding to type 2 (say), how can we do this in R?
I look forward to your response.
This is a basic subsetting question covered in most introductory R guides:
with(mydf, mydf[Type == 2, ])
# S.NO Type Measurements
# 2 2 2 3.3
# 3 3 2 3.1
# 7 7 2 1.1
with(mydf, mydf[Type == 2, "Measurements"])
# [1] 3.3 3.1 1.1
You can also look at the subset function:
subset(mydf, subset = Type == 2, select = "Measurements")
# Measurements
# 2 3.3
# 3 3.1
# 7 1.1
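The snippets above assume a data frame called mydf, which is never defined. A minimal sketch that rebuilds it from the table in the question and does the same selection with plain logical indexing (the names type2 and by_type are made up for illustration):

```r
mydf <- data.frame(
  S.NO         = 1:8,
  Type         = c(1, 2, 2, 3, 3, 3, 2, 3),
  Measurements = c(2.1, 3.3, 3.1, 2.7, 2.6, 4.5, 1.1, 2.2)
)

# Logical indexing: keep the measurements whose Type equals 2.
type2 <- mydf$Measurements[mydf$Type == 2]
type2

# To get the measurements of every type at once, split by Type.
by_type <- split(mydf$Measurements, mydf$Type)
by_type[["2"]]
```

split returns a list with one element per type, named after the type, which is convenient if you later need type 1 and type 3 as well.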
# make some data
testData <- data.frame(measurement = 1:10,
Type = sample(1:3, 10, replace = TRUE))
# fetch only type 2
testData[testData$Type==2,]
# now only the measurements
testData[testData$Type==2,"measurement"]

Using melt / cast with variables of uneven length in R

I'm working with a large data frame that I want to pivot, so that variables in a column become rows across the top.
I've found the reshape package very useful in such cases, except that the cast function defaults to fun.aggregate=length. Presumably this is because I'm performing these operations by "case" and the number of variables measured varies among cases.
I would like to pivot so that missing variables are denoted as "NA"s in the pivoted data frame.
So, in other words, I want to go from a molten data frame like this:
Case | Variable | Value
1 1 2.3
1 2 2.1
1 3 1.3
2 1 4.3
2 2 2.5
3 1 1.8
3 2 1.9
3 3 2.3
3 4 2.2
To something like this:
Case | Variable 1 | Variable 2 | Variable 3 | Variable 4
1 2.3 2.1 1.3 NA
2 4.3 2.5 NA NA
3 1.8 1.9 2.3 2.2
The code dcast(data,...~Variable) again defaults to fun.aggregate=length, which does not preserve the original values.
Thanks for your help, and let me know if anything is unclear!
It is just a matter of including all of the variables in the cast call. Reshape expects the Value column to be called value, so it throws a warning, but still works fine. The reason that it was using fun.aggregate=length is because of the missing Case in the formula. It was aggregating over the values in Case.
Try: cast(data, Case~Variable)
library(reshape)
data <- data.frame(Case=c(1,1,1,2,2,3,3,3,3),
Variable=c(1,2,3,1,2,1,2,3,4),
Value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))
cast(data,Case~Variable)
Using Value as value column. Use the value argument to cast to override this choice
Case 1 2 3 4
1 1 2.3 2.1 1.3 NA
2 2 4.3 2.5 NA NA
3 3 1.8 1.9 2.3 2.2
Edit: in response to the comment from @Jon: what do you do if there is one more variable in the data frame?
data <- data.frame(expt=c(1,1,1,1,2,2,2,2,2),
func=c(1,1,1,2,2,3,3,3,3),
variable=c(1,2,3,1,2,1,2,3,4),
value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))
cast(data,expt+variable~func)
expt variable 1 2 3
1 1 1 2.3 4.3 NA
2 1 2 2.1 NA NA
3 1 3 1.3 NA NA
4 2 1 NA NA 1.8
5 2 2 NA 2.5 1.9
6 2 3 NA NA 2.3
7 2 4 NA NA 2.2
Here is one solution. It does not use the package or function you mention, but it could be of use. Suppose your data frame is called df:
cases <- unique(df$Case)
vars <- sort(unique(df$Variable))
M <- matrix(NA,
nrow = length(cases),
ncol = length(vars) + 1,
dimnames = list(NULL, c('Case', paste('Variable', vars))))
irow <- match(df$Case, cases)
icol <- match(df$Variable, vars) + 1 # same sorted order as the column names
M[cbind(irow, icol)] <- df$Value
M[,1] <- cases
To avoid the warning message, you could subset the data frame by another variable, e.g. a categorical variable with three levels a, b and c. If your current data has 70 cases for category a, 80 for b and 90 for c, the cast function doesn't know how to aggregate them.
Hope this helps.
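For readers who do not have the reshape package installed, the same long-to-wide pivot can be sketched with base R's reshape() function. This is an alternative to cast, not what the answers above use; the column names follow the example data in the question:

```r
data <- data.frame(Case     = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   Variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                   Value    = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# One row per Case, one column per Variable; combinations that do not
# occur in the long data are filled with NA.
wide <- reshape(data, idvar = "Case", timevar = "Variable", direction = "wide")
wide
```

The value columns are named Value.1 to Value.4, and no aggregation function is involved, so the original values are preserved as long as each Case and Variable pair occurs at most once.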
