I have a CSV file loaded as a data frame (call it 'csv') and want to use a lag function. My code is below (ColA and ColB are column names in csv).
X <- subset(csv, ColA == 1)
Y <- c(NA, lag(X$ColB, 1))
Say 10 rows satisfy ColA == 1. I want a vector of length 10, but after applying the lag function the output has length 11. How can I fix this?
You can use the lagpad function from the ecm package. It pads the front with NA and drops the last element, so the result keeps the same length as the input.
library(ecm)
X <- 1:10
Y <- lagpad(X)
Y
[1] NA 1 2 3 4 5 6 7 8 9
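If an extra package is not wanted, the same padding can be sketched in base R. The helper name lag_keep_length is made up for this illustration, and it assumes k is positive and smaller than the length of the input. In the question's code, c(NA, ...) adds one element in front without removing one at the end, which is where the eleventh value comes from.

```r
# Hypothetical helper (not from any package): shift a vector by k
# positions, pad the front with NA, and drop the last k elements so the
# length is unchanged. Assumes 0 < k < length(v).
lag_keep_length <- function(v, k = 1) {
  c(rep(NA, k), head(v, -k))
}

x <- 1:10
y <- lag_keep_length(x)
length(y)  # same length as x
```

The helper could be applied to the subset from the question as lag_keep_length(X$ColB).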
I have a matrix with two columns and many rows. The first column is named idCombinaison and the second accuarcy. The accuarcy column holds floating-point values.
Now I want to get all rows where accuarcy equals the maximum value. In some cases (as depicted in the picture), several rows share the maximum, and I want all of them.
I tried this:
maxAccuracy <- subset(accuarcyMatrix, accuarcyMatrix['accuarcy'] == max(accuarcyMatrix['accuarcy']))
But this returns an empty result. Any ideas, please?
A reproducible data simulating your matrix:
set.seed(123)
x <- matrix(sample(1:9, 30, T), 10, 3)
row.names(x) <- 1:10
colnames(x) <- LETTERS[1:3]
# A B C
# 1 3 9 9
# 2 8 5 7
# 3 4 7 6
# ...
For matrix objects, you need two-index subsetting such as data[a, b] to extract rows or columns. With the data above, x["C"] returns NA, whereas x[, "C"] returns all elements of column C. The following two calls therefore produce different outputs.
subset(x, x["C"] == max(x["C"]))
# A B C (Empty)
subset(x, x[, "C"] == max(x[, "C"]))
# A B C
# 1 3 9 9
# 4 8 6 9
Maybe something like this?
library(dplyr)
# filter_at() works on data frames, so convert the matrix first
as.data.frame(accuarcyMatrix) %>%
  filter_at(vars(accuarcy),
            any_vars(. == max(.)))
Base R solution (although this is very likely a duplicate):
# `$` needs a data frame; for a matrix use accuarcyMatrix[, "accuarcy"] instead
accuarcyMatrix[which(accuarcyMatrix$accuarcy == max(accuarcyMatrix$accuarcy)), ]
I'm guessing you will want to change "accuarcy" to "accuracy"
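For an actual matrix, as the question describes, rather than a data frame, a minimal sketch with logical row indexing might look like this. The sample values are invented for illustration; drop = FALSE keeps the result a matrix even when only one row matches.

```r
# Invented sample data with a tie for the maximum accuarcy
accuarcyMatrix <- cbind(
  idCombinaison = 1:5,
  accuarcy      = c(0.80, 0.95, 0.70, 0.95, 0.60)
)

# Logical vector marking the rows whose accuarcy equals the maximum
is_max <- accuarcyMatrix[, "accuarcy"] == max(accuarcyMatrix[, "accuarcy"])

# Keep every matching row; drop = FALSE preserves the matrix shape
maxAccuracy <- accuarcyMatrix[is_max, , drop = FALSE]
maxAccuracy
```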
I am using dplyr to manipulate data. I have two columns, x and y. In a third column (say z), I'd like the index of the first occurrence of each y value within the whole x column.
For instance:
For the first row I get 4 because 7 is in 4th position in x.
So I have tried
df <- df %>%
mutate(z = which (x==y)[1])
But the comparison is made element-wise (i.e. I get only fives in z). Hence my question: inside dplyr's mutate, how do I distinguish between a vector that should be used element by element and one that should be used as a whole?
dplyr does not decide whether or not the function is applied element-wise. mutate only provides a syntax that lets you use other functions more concisely by recognising that if you refer to x inside mutate, you probably mean the column df$x in df. It also does one simple broadcasting step, where if you supply it a function that returns only a single value it will copy it to the whole output.
We can show the same behaviour with which and match outside of dplyr below. Because == does an element-wise comparison your first method returns all 5. match on the other hand, "returns a vector of the positions of (first) matches of its first argument in its second" (from the documentation) which is what you want. I compare the two syntaxes at the bottom to show that the key is the function you supply that determines how inputs are read, not mutate.
x = c(1,2,3,7,9)
y = c(7,3,9,1,9)
x == y
#> [1] FALSE FALSE FALSE FALSE TRUE
which(x == y)
#> [1] 5
match(y, x)
#> [1] 4 3 5 1 5
library(dplyr)
df <- data.frame(x, y)
df$z1 = match(df$y, df$x) # a base R syntax that forces you to specify the data frame name
df <- df %>% mutate(z2 = match(y, x)) # dplyr syntax that is more concise
df # they produce the same result
#> x y z1 z2
#> 1 1 7 4 4
#> 2 2 3 3 3
#> 3 3 9 5 5
#> 4 7 1 1 1
#> 5 9 9 5 5
Created on 2018-06-29 by the reprex package (v0.2.0).
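To connect this back to the which() attempt in the question: a rough sketch of the per-element lookup the asker seems to have intended is to loop over y and compare each single value against all of x. This is only an illustration of why match is the vectorised equivalent, not a recommended replacement for it.

```r
x <- c(1, 2, 3, 7, 9)
y <- c(7, 3, 9, 1, 9)

# For each single value v of y, find the first position in x where it occurs.
z_loop <- sapply(y, function(v) which(x == v)[1])

# match() performs the same lookup in one vectorised call.
z_match <- match(y, x)

identical(z_loop, z_match)
```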
Given this data.frame
x y z
1 1 3 5
2 2 4 6
I'd like to add the values of columns x and z plus a coefficient of 10, for every row in dat.
The intended result is this
x y z result
1 1 3 5 16 #(1+5+10)
2 2 4 6 18 #(2+6+10)
Why doesn't this code produce the desired result?
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
Coeff <- 10
# Function
process.xz <- function(v1,v2,cf) {
return(v1+v2+cf)
}
# It breaks here
sm <- apply(dat[,c('x','z')], 1, process.xz(dat$x,dat$y,Coeff ))
# Later I'd do this:
# cbind(dat,sm);
I wouldn't use an apply here. Since the addition + operator is vectorized, you can get the sum using
> process.xz(dat$x, dat$z, Coeff)
[1] 16 18
To write this in your data.frame, don't use cbind, just assign it directly:
dat$result <- process.xz(dat$x, dat$z, Coeff)
It fails because apply doesn't work like that: you must pass the function itself (not the result of calling it), followed by any additional parameters. Each row of the data frame is then passed, as a single vector, as the first argument to that function.
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
Coeff <- 10
# Function
process.xz <- function(x,cf) {
return(x[1]+x[2]+cf)
}
sm <- apply(dat[,c('x','z')], 1, process.xz,cf=Coeff)
I completely agree that there's no point in using apply here though - but it's good to understand anyway.
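If a row-by-row call is still wanted while keeping the original three-argument process.xz, one possible sketch uses mapply, which walks over several vectors in parallel. This is an alternative to the answers above, not something they propose; the MoreArgs argument carries the constant coefficient.

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
Coeff <- 10

# Same three-argument function as in the question
process.xz <- function(v1, v2, cf) {
  v1 + v2 + cf
}

# mapply() calls process.xz once per pair of elements from x and z;
# cf is passed unchanged to every call through MoreArgs.
sm <- mapply(process.xz, dat$x, dat$z, MoreArgs = list(cf = Coeff))

dat$result <- sm
dat
```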
I've got a seemingly simple question that I can't answer: I've got three vectors:
x <- c(1,2,3,4)
weight <- c(5,6,7,8)
y <- c(1,1,1,2,2,2)
I want to create a new vector that replicates the values of weight for each time an element in x matches y such that it produces the following new weight vector associated with y:
y_weight <- c(5,5,5,6,6,6)
Any thoughts on how to do this (either loop or vectorized)? Thanks
You want the match function.
match(y, x)
returns the indices of the matches; then use them to build your new weight vector:
weight[match(y, x)]
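A short sketch of the lookup end to end, using the vectors from the question. The second part, with the made-up vector y2, shows what happens when a value has no counterpart in x: match returns NA for it, so the looked-up weight is NA as well.

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

# Position in x of each element of y
idx <- match(y, x)

# Use those positions to pull the corresponding weights
y_weight <- weight[idx]
y_weight

# Illustrative extra case: 99 does not occur in x, so its weight is NA
y2 <- c(2, 99)
w2 <- weight[match(y2, x)]
w2
```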
#Using plyr
library(plyr)
df<-as.data.frame(cbind(x,weight)) # converting to dataframe
df<-rename(df,c(x="y")) # rename x as y for joining dataframes
y<-as.data.frame(y) # converting to dataframe
mydata <- join(df, y, by = "y",type="right")
> mydata
y weight
1 1 5
2 1 5
3 1 5
4 2 6
5 2 6
6 2 6
I have a dataset like this:
x
A B
1 x 2
2 y 4
3 z 4
4 x 4
5 x 4
6 x 3
......
I want to know which values of column A occur more than a given number of times (for example, 3).
I will probably need to group these values in a temporary table like this:
X Y z
4 1 1
and after this I would call another method (which I don't know) that gives me this result
X
because only the value X occurs more than 3 times in my original table.
Can R optimise this operation?
data<-data.frame(factor(c("x","y","z","x","x","x")),c(2,4,4,4,4,3))
To get the count of each letter, use
table(data[,1])
and to get the names of the levels that occur more than 3 times
names(table(data[,1]))[table(data[,1]) > 3]
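The same idea can be sketched with the table computed once and stored, which avoids counting twice. The column names A and B and the variable names counts and frequent are assumptions added here for readability; the threshold of 3 comes from the question.

```r
# Same values as above, with column names added for illustration
data <- data.frame(
  A = c("x", "y", "z", "x", "x", "x"),
  B = c(2, 4, 4, 4, 4, 3)
)

# Count how often each value of A occurs (computed once)
counts <- table(data$A)

# Keep the names whose count exceeds the threshold
frequent <- names(counts)[as.vector(counts) > 3]
frequent
```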
I'm not sure I understand you correctly... what is the B column for?
Does this work for you?
set.seed(1234)
A <- sample(c("x", "y", "z"), 20, replace = TRUE)
Ad <- data.frame(table(A))
with(Ad, A[Freq >= 7])
[1] x y