Matching 2 date columns with unequal length - r

I have the following sample data:
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
The output I want looks like this:
X Y
11/12/2016 11/12/2016
12/12/2016 NA
13/12/2016 13/12/2016
14/12/2016 14/12/2016
15/12/2016 NA
16/12/2016 NA
17/12/2016 NA
I have tried the following code, but it does not give the desired output:
> X <- as.Date(data$X)
> Y <- as.Date(data$Y)
> Z <- NA
> for (i in 1:length(X)) {
+ if(X[i] == Y){
+ Z <- Y}
+ else NA }

Try this:
Your data:
> X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
> Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
Creating new vector of NA's and doing the match:
> Z<-rep(NA,length(X))
> Z[which(X %in% Y)]<-X[which(X %in% Y)]
> Z
[1] "11/12/2016" NA "13/12/2016" "14/12/2016" NA NA NA
Your data frame:
> data.frame(X,Y=Z)
X Y
1 11/12/2016 11/12/2016
2 12/12/2016 <NA>
3 13/12/2016 13/12/2016
4 14/12/2016 14/12/2016
5 15/12/2016 <NA>
6 16/12/2016 <NA>
7 17/12/2016 <NA>

You could use merge:
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
df_X <- data.frame(X)
df_Y <- data.frame(X = Y, Y = Y)
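merge(df_X, df_Y, all.x = TRUE) # all.x = TRUE keeps only the rows of X; all = TRUE would also add the unmatched 18/12/2016 from Y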
Or, if you like a tidyverse-approach:
library(tidyverse)
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
df_X <- tibble(X)
df_Y <- tibble(X = Y, Y = Y)
left_join(df_X, df_Y, by = "X") # keeps only the rows of X; full_join would also add the unmatched 18/12/2016 from Y
The important part is that you duplicate the column you want to match on and name it accordingly, or use the by argument.

Got the answer!
What you want is to "match" the values of the long vector against your other vector. The function match is perfect for that, because it returns the positions of the matched elements in the second vector (and NA where there is no match). First, input the data (I added some corrections):
# Input data
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
# Transform into dates
X <- as.Date(X,"%d/%m/%Y")
Y <- as.Date(Y, "%d/%m/%Y")
Then, I create the data.frame Z based on the long vector X and I add the matched values of the vector Y:
# Run match on its own so you can see what kind of output it generates
match(x = X, table = Y)
# Create data.frame
Z <- data.frame(X = X,
# add matched values
Y = Y[match(x = X, table = Y)])
Hope this helps.
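As a minimal sketch of what match() returns for this data (the intermediate name idx is only for illustration and is not part of the answer above), the index vector can be inspected before it is used to subset Y:

```r
# Parse the day/month/year strings explicitly; as.Date() without a format
# would not read "11/12/2016" as 11 December 2016.
X <- as.Date(c("11/12/2016", "12/12/2016", "13/12/2016", "14/12/2016",
               "15/12/2016", "16/12/2016", "17/12/2016"), format = "%d/%m/%Y")
Y <- as.Date(c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016"),
             format = "%d/%m/%Y")

# idx (illustrative name): position in Y of each element of X, NA if absent
idx <- match(X, Y)
idx
# [1]  1 NA  2  3 NA NA NA

# Subsetting Y with idx lines it up with X and keeps the Date class
Z <- data.frame(X = X, Y = Y[idx])
```

Because 18/12/2016 never appears in X, it is simply not picked up, which is what the desired output asks for.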

Related

Specifying R to take one argument at a time when passing multiple arguments using '...'

I am a novice in R and am required by my superior to do things a certain way. I want to compute two descriptive statistics: setup count and heavy-dominance setup count. Setup count is the number of setups found within a location, while heavy-dominance setup count is the number of setups within that location whose dominance value of population x is ≥ 50%. This is how I would normally calculate these statistics:
##Normal Approach
#Sample Data 1
v <- c(53, 2, 97) #let vector "v" represent Location 1
w <- c(7, 16, 31, 44, 16) #let vector "w" represent Location 2
#Setup Count
sc_v <- length(v)
sc_w <- length(w)
sc <- c(sc_v, sc_w)
sc
#Heavy-Dominance Setup Count
hd_v <- length(which(v >= 50))
hd_w <- length(which(w >= 50))
hd <- c(hd_v, hd_w)
hd
I am tasked with developing a function that can both determine said statistical values from raw data and concatenate the outputs into a single vector. Here are the working functions I developed:
#Setup Count (2 vectors at a time only)
setup.count <- function(x, y){
a <- length(x)
b <- length(y)
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (2 vectors at a time only)
heavy.dominance <- function(x, y){
a <- length(which(x >= 50))
b <- length(which(y >= 50))
d <- c(a, b)
d
}
y <- setup.count(v, w)
y
z <- heavy.dominance(v, w)
z
Suppose there are more than two locations:
#Sample Data 2
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74) #let vector "x" represent the additional Location 3
How can I specify R to take one argument at a time when passing multiple arguments using '...'? Here are the failed attempts to revise the abovementioned functions, to give an idea:
##Attempt 1
#Setup Count (incorrect v1)
setup.count <- function(x, ...){
data <- list(...)
a <- length(x)
b <- length(data) #will return the number of locations other than x, not the separate number of setups within each of these locations
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (incorrect v1)
heavy.dominance <- function(x, ...){
data <- list(...)
a <- length(which(x >= 50))
b <- length(which(data >= 50)) #will return the error "'list' object cannot be coerced to type 'double'"
d <- c(a, b)
d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
##Attempt 2
#Setup Count (incorrect v2)
setup.count <- function(x, ...){
data <- list(...)
a <- length(x)
b <- length(unlist(data)) #will return the total number of setups in all locations other than x, not as separate values
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (incorrect v2)
heavy.dominance <- function(x, ...){
data <- list(...)
a <- length(which(x >= 50))
b <- length(which(unlist(data) >= 50)) #will return the total number of setups with dominance ≥ 50% in all locations other than x, not as separate values
d <- c(a, b)
d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
You may just list() elements in the ellipsis. Use sapply() to loop over the list elements. Add a type= argument to have one function for both purposes, and a thresh= argument.
setup.fun <- function(..., type=c('count', 'dominance'), thresh=50) {
x <- list(...)
type <- match.arg(type)
if (type == 'count') sapply(x, length)
else sapply(x, function(x) length(which(x >= thresh)))
}
setup.fun(v, w, x)
# [1] 3 5 4
setup.fun(v, w, x, type='count')
# [1] 3 5 4
setup.fun(v, w, x, type='dominance')
# [1] 2 0 2
setup.fun(v, w, x, type='d')
# [1] 2 0 2
setup.fun(v, w, x, v)
# [1] 3 5 4 3
setup.fun(v)
# [1] 3
setup.fun(v, w, x, type='dominance', thresh=40)
# [1] 2 1 3
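A possible variation, sketched under the assumption that both statistics are wanted at once: the hypothetical helper setup.stats below (not part of the answer above) collects the locations passed through ... into a list and returns one row per location. It uses lengths() for the counts and vapply() for the threshold test.

```r
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74)

# setup.stats is a hypothetical name used only for this sketch
setup.stats <- function(..., thresh = 50) {
  locations <- list(...)  # one list element per location
  data.frame(
    count     = lengths(locations),  # number of setups per location
    dominance = vapply(locations,
                       function(loc) sum(loc >= thresh),  # setups at or above thresh
                       integer(1))
  )
}

res <- setup.stats(v, w, x)
res
#   count dominance
# 1     3         2
# 2     5         0
# 3     4         2
```

vapply() is used instead of sapply() so that the result type is fixed even when no locations are passed.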

Find variables that occur only in one cluster in data.frame in R

Using BASE R, I wonder how to answer the following question:
Is there any value of X or Y (i.e., the variables of interest) that occurs in only one element of m (a cluster) but not in the others? If yes, produce my desired output below.
For example:
Here we see that X == 3 occurs only in element m[[3]], but not in m[[1]] or m[[2]].
We also see that Y == 99 occurs only in m[[1]], but not in the others.
Note: the following is a toy example; a functional answer is appreciated. Also, X and Y may or may not be numeric (e.g., they may be strings).
f <- data.frame(id = c(rep("AA",4), rep("BB",2), rep("CC",2)), X = c(1,1,1,1,1,1,3,3),
Y = c(99,99,99,99,6,6,6,6))
m <- split(f, f$id) # Here is `m`
mods <- names(f)[-1] # variables of interest names
Desired output:
list(AA = c(Y = 99), CC = c(X = 3))
# $AA
# Y
# 99
# $CC
# X
# 3
This is a solution based on rapply() and table().
ux <- rapply(m, unique)
tb <- table(uxm <- ux[gsub(rx <- "^.*\\.(.*)$", "\\1", names(ux)) %in% mods])
r <- Map(setNames, n <- uxm[uxm %in% names(tb)[tb == 1]], gsub(rx, "\\1", names(n)))
setNames(r, gsub("^(.*)\\..*$", "\\1", names(r)))
# $AA
# Y
# 99
#
# $CC
# X
# 3
tmp = do.call(rbind, lapply(names(f)[-1], function(x){
d = unique(f[c("id", x)])
names(d) = c("id", "val")
transform(d, nm = x)
}))
tmp = tmp[ave(as.numeric(as.factor(tmp$val)), tmp$val, FUN = length) == 1,]
lapply(split(tmp, tmp$id), function(a){
setNames(a$val, a$nm)
})
#$AA
# Y
#99
#$BB
#named numeric(0)
#$CC
#X
#3
This utilizes @jay.sf's idea of rapply() with an idea from a previous answer:
vec <- rapply(lapply(m, '[', , mods), unique)
unique_vec <- vec[!duplicated(vec) & !duplicated(vec, fromLast = TRUE)]
vec_names <- do.call(rbind, strsplit(names(unique_vec), '.', fixed = T))
names(unique_vec) <- vec_names[, 2]
split(unique_vec, vec_names[, 1])
$AA
Y
99
$CC
X
3
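For readers who find the compact answers above hard to follow, here is a more explicit base R sketch of the same idea. The helper name only_in_one and its arguments are made up for illustration. It returns one small data frame per variable rather than the exact named-vector list from the question.

```r
f <- data.frame(id = c(rep("AA", 4), rep("BB", 2), rep("CC", 2)),
                X = c(1, 1, 1, 1, 1, 1, 3, 3),
                Y = c(99, 99, 99, 99, 6, 6, 6, 6))
mods <- names(f)[-1]  # variables of interest

# only_in_one is a hypothetical helper written for this sketch
only_in_one <- function(data, var, id = "id") {
  combos <- unique(data[c(id, var)])      # one row per (cluster, value) pair
  n_clusters <- table(combos[[var]])      # number of clusters each value occurs in
  keep <- combos[[var]] %in% names(n_clusters)[n_clusters == 1]
  combos[keep, , drop = FALSE]            # values seen in exactly one cluster
}

res <- setNames(lapply(mods, function(v) only_in_one(f, v)), mods)
res$X  # the row with id "CC" and X equal to 3
res$Y  # the row with id "AA" and Y equal to 99
```

The comparison goes through the character names of the table, so it should also work when X or Y are strings.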

Wrong value occur when converting points from UTM to WGS84 in R

I am using Stanislav's method from an earlier question on this forum about converting latitude and longitude points to UTM. I reversed the function so that it converts points from UTM to WGS84:
library(sp); library(rgdal)
#Function
UTMToLongLat<-function(x,y,zone){
xy <- data.frame(ID = 1:length(x), X = x, Y = y)
coordinates(xy) <- c("X", "Y")
proj4string(xy) <- CRS(paste("+proj=utm +zone=",zone," ellps=WGS84",sep=''))
res <- spTransform(xy, CRS("+proj=longlat +datum=WGS84"))
return(as.data.frame(res))
}
I tried it on the example from that earlier question:
x2 <- c(-48636.65, 1109577); y2 <- c(213372.05, 5546301)
The expected result is (118, 10) and (119, 50) in WGS84; Colin's example is in UTM zone 51.
So I ran the following call:
done2 <- UTMToLongLat(x2,y2,51)
However, it produced: (118.0729, 1.92326), (131.4686, 49.75866).
What is wrong? Also, how can I control the number of decimal digits in the output?
First, you mixed up the coordinates: x must hold the x-coordinates of both points and y the y-coordinates of both points. It should be:
x <- c(-48636.65, 213372.05)
y <- c(1109577, 5546301)
In the function, it will be transformed and stored as:
> data.frame(ID = 1:length(x), X = x, Y = y)
# ID X Y
# 1 1 -48636.65 1109577
# 2 2 213372.05 5546301
Then run your function again:
> UTMToLongLat(x, y, 51)
# ID X Y
# 1 1 118 9.999997
# 2 2 119 50.000001
To control the decimal digits, use round() (its digits argument sets how many decimals to keep; the default is 0):
> round(UTMToLongLat(x, y, 51))
# ID X Y
# 1 1 118 10
# 2 2 119 50
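If only the coordinate columns should be rounded and some decimals kept, a sketch along these lines may help. The data frame coords below is a hypothetical stand-in for the function's result, with made-up values close to the output shown above, so that the example runs without sp or rgdal.

```r
# Hypothetical stand-in for the output of UTMToLongLat(); values are illustrative
coords <- data.frame(ID = 1:2,
                     X = c(118.0000004, 119.0000002),
                     Y = c(9.999997, 50.000001))

rounded <- coords
# Round only the coordinate columns, keeping 4 decimal places; ID is untouched
rounded[c("X", "Y")] <- round(coords[c("X", "Y")], digits = 4)
rounded
```

Note that round() changes the stored values; to change only how they are displayed, format() or print(..., digits = ) can be used instead.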

number elements in a vector with constraints

Given x and y I wish to create the desired.result below:
x <- 1:10
y <- c(2:4,6:7,8:9)
desired.result <- c(1,2,2,2,3,4,4,5,5,6)
where, in effect, each sequence in y is replaced in x by the first element of that sequence, and then the elements of the new x are numbered.
The intermediate step for x would be:
x.intermediate <- c(1,2,2,2,5,6,6,8,8,10)
Below is code that does this. However, the code is not general and is overly complex:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y1 <- rep(min(unlist(y[1])), length(unlist(y[1])))
y2 <- rep(min(unlist(y[2])), length(unlist(y[2])))
y3 <- rep(min(unlist(y[3])), length(unlist(y[3])))
new.x <- x
new.x[unlist(y[1])] <- y1
new.x[unlist(y[2])] <- y2
new.x[unlist(y[3])] <- y3
rep(unique.x, rle(new.x)$lengths)
[1] 1 2 2 2 3 4 4 5 5 6
Below is my attempt to generalize the code. However, I am stuck on the second lapply.
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y2 <- lapply(y, function(i) rep(min(i), length(i)))
new.x <- x
lapply(y2, function(i) new.x[i[1]:(i[1]-1+length(i))] = i)
rep(unique.x, rle(new.x)$lengths)
Thank you for any advice. I suspect there is a much simpler solution I am overlooking. I prefer a solution in base R.
A solution like this should work:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
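# Overwrite each run in y with its first element, repeated once per run member
x[unlist(y)] <- rep(sapply(y, '[', 1), lengths(y))
# Number the resulting runs of equal values consecutively
rep(seq_along(rle(x)$lengths), rle(x)$lengths)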
## [1] 1 2 2 2 3 4 4 5 5 6
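The numbering step can also be written without rle(). This is a sketch rather than a replacement for the answer above: a new group starts wherever the value differs from the previous one, and cumsum() over those start flags gives the group number. The names x.intermediate and result follow the question's wording.

```r
x <- 1:10
y <- list(2:4, 6:7, 8:9)

# Intermediate step: replace each run listed in y by its smallest element
x.intermediate <- x
x.intermediate[unlist(y)] <- rep(vapply(y, min, integer(1)), lengths(y))
x.intermediate
# [1]  1  2  2  2  5  6  6  8  8 10

# TRUE marks the first element of each group; the running total numbers the groups
starts <- c(TRUE, x.intermediate[-1] != x.intermediate[-length(x.intermediate)])
result <- cumsum(starts)
result
# [1] 1 2 2 2 3 4 4 5 5 6
```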

How to combine two vectors into a data frame

I have two vectors like this
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
I'd like to output the dataframe like this:
> print(df)
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
What's the way to do it?
While this does not answer the question asked, it answers a related question that many people have had:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
df <- data.frame(x,y)
names(df) <- c(x_name,y_name)
print(df)
cond rating
1 1 100
2 2 200
3 3 300
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
library(reshape2)
df <- melt(data.frame(x,y))
colnames(df) <- c(x_name, y_name)
print(df)
UPDATE (2017-02-07):
As an answer to @cdaringe's comment: there are multiple possible solutions; one of them is below.
library(dplyr)
library(magrittr)
x <- c(1, 2, 3)
y <- c(100, 200, 300)
z <- c(1, 2, 3, 4, 5)
x_name <- "cond"
y_name <- "rating"
# Helper function to create data.frame for the chunk of the data
prepare <- function(name, value, xname = x_name, yname = y_name) {
tibble(rep(name, length(value)), value) %>% # tibble() replaces the deprecated data_frame()
set_colnames(c(xname, yname))
}
bind_rows(
prepare("x", x),
prepare("y", y),
prepare("z", z)
)
This should do the trick, to produce the data frame you asked for, using only base R:
df <- data.frame(cond=c(rep("x", times=length(x)),
rep("y", times=length(y))),
rating=c(x, y))
df
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
However, from your initial description, I'd say that this is perhaps a more likely usecase:
df2 <- data.frame(x, y)
colnames(df2) <- c(x_name, y_name)
df2
cond rating
1 1 100
2 2 200
3 3 300
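You can use the expand.grid() function. Note that it returns every combination of x and y (9 rows here), not the stacked layout shown in the question.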
x <-c(1,2,3)
y <-c(100,200,300)
expand.grid(cond=x,rating=y)
Here's a simple function. It generates a data frame and automatically uses the names of the vectors as values for the first column.
myfunc <- function(a, b, names = NULL) {
setNames(data.frame(c(rep(deparse(substitute(a)), length(a)),
rep(deparse(substitute(b)), length(b))), c(a, b)), names)
}
An example:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
myfunc(x, y, c(x_name, y_name))
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
df = data.frame(cond=c(rep("x",3),rep("y",3)),rating=c(x,y))
Alternative simplification of the answer by https://stackoverflow.com/users/1969435/gx1sptdtda above:
cond <-c(1,2,3)
rating <-c(100,200,300)
df <- data.frame(cond, rating)
df
cond rating
1 1 100
2 2 200
3 3 300
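For the long layout originally asked for (one row per value, labelled by the vector it came from), base R's stack() is another option. The sketch below assumes the vectors are put in a named list; the intermediate name long is illustrative.

```r
x <- c(1, 2, 3)
y <- c(100, 200, 300)
x_name <- "cond"
y_name <- "rating"

# stack() turns a named list into two columns: values and ind (the source name)
long <- stack(list(x = x, y = y))

# Put the label column first and apply the requested column names
df <- setNames(long[c("ind", "values")], c(x_name, y_name))
df
#   cond rating
# 1    x      1
# 2    x      2
# 3    x      3
# 4    y    100
# 5    y    200
# 6    y    300
```

The cond column comes back as a factor; wrap it in as.character() if plain strings are preferred. The vectors in the list do not need to have equal lengths.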
