Creating new variable based on index position and value of other variable - r

I could use some help. I need to add a new variable to a dataframe based on whether or not the value of a variable in a dataframe equals the index value of another vector. Below is a simplified example:
vector [2 7 15 4 5]
dataframe (4 variables; Index, Site, Quad, Count)
Index Site Quad Count
1 2 3 0
1 3 7 2
2 1 8 0
2 3 3 1
3 2 3 0
4 3 7 2
5 1 8 0
5 3 3 1
The variable I would like to create would match value of df$Index from the dataframe with the matching position in the vector. That is, when df$Index = 1, the new variable would be 2 (position 1 in the vector), when df$Index = 2, the new variable would be 7 (position 2 in the vector), when df$Index = 3, the new variable would be 3 (position 3 in the vector).
I've ended up in a R wormhole, and know the solution is simple, but I cannot seem to get it. Thanks for any help.

If your indexes are atually integer indices, for example
dd<-read.table(text="Index Site Quad Count
1 2 3 0
1 3 7 2
2 1 8 0
2 3 3 1
3 2 3 0
4 3 7 2
5 1 8 0
5 3 3 1", header=TRUE)
vec <- c(2, 7, 15, 4, 5)
Then you can create the new column with
dd$value <- vec[dd$Index]
dd
# Index Site Quad Count value
# 1 1 2 3 0 2
# 2 1 3 7 2 2
# 3 2 1 8 0 7
# 4 2 3 3 1 7
# 5 3 2 3 0 15
# 6 4 3 7 2 4
# 7 5 1 8 0 5
# 8 5 3 3 1 5

Related

R left_join() replacing joined values rather than adding in new columns

I have the following dataframes:
A<-data.frame(AgentNo=c(1,2,3,4,5,6),
N=c(2,5,6,1,9,0),
Rarity=c(1,2,1,1,2,2))
AgentNo N Rarity
1 1 2 1
2 2 5 2
3 3 6 1
4 4 1 1
5 5 9 2
6 6 0 2
B<-data.frame(Rank=c(1,5),
AgentNo.x=c(2,5),
AgentNo.y=c(1,4),
N=c(3,1),
Rarity=c(1,2))
Rank AgentNo.x AgentNo.y N Rarity
1 1 2 1 3 1
2 5 5 4 1 2
I would like to left join B onto A by columns "AgentNo"="AgentNo.y" and "N"="N" but rather than add new columns to A from B I want the same columns from A but where joined values have been updated and taken from B.
For any joined rows I want A.AgentNo to now be B.AgentNo.x, A.N to be B.N and A.Rarity to be B.Rarity. I would like to drop B.Rank and B.Agent.y completely.
The result should be:
Result<-data.frame(AgentNo=c(2,2,3,5,5,6), N=c(2,5,6,1,9,0), Rarity=c(1,2,1,1,2,2))
AgentNo N Rarity
1 2 3 1
2 2 5 2
3 3 6 1
4 5 1 2
5 5 9 2
6 6 0 2
After some data wrangling, you can use rows_update to update the rows of A by the values of B:
library(dplyr)
A <- A %>%
mutate(AgentNo.y = AgentNo)
B <- select(B, AgentNo = AgentNo.x, AgentNo.y, N, Rarity)
rows_update(A, B, by = "AgentNo.y") %>%
select(-AgentNo.y)
output
AgentNo N Rarity
1 2 3 1
2 2 5 2
3 3 6 1
4 5 1 1
5 5 9 2
6 6 0 2

subset.data.frame in R

I have a data frame of raw data:
raw <- data.frame(subj = c(1,1,1,2,2,2,3,3,3,4,4,4),
blah = c(0,0,0,1,1,1,1,0,1,0,0,0))
From it, I want to remove the bad subj.
badsubj <- c(1,4)
trim <- subset.data.frame(raw, subj != badsubj)
But for some reason, all the badsubj values are not removed:
subj blah
2 1 0
4 2 1
5 2 1
6 2 1
7 3 1
8 3 0
9 3 1
11 4 0
What am I doing wrong? Obersvations 2 and 11 should be excluded because they are members of badsubj.
raw[!raw$subj %in% badsubj, ]
wrong use of !=
The problem is that subj and badsubj do not have the same length. Therefore badsubj will be recycled until both vectors have the same length. Then your code compares elementwise the values in the output below.
subj badsubj
1 1 1
2 1 4
3 1 1
4 2 4
5 2 1
6 2 4
7 3 1
8 3 4
9 3 1
10 4 4
11 4 1
12 4 4

changing values in dataframe in R based on criteria

I have a data frame that looks like
> mydata
ID Observation X
1 1 3
1 2 3
1 3 3
1 4 3
2 1 4
2 2 4
3 1 8
3 2 8
3 3 8
I have some code that counts the number of observations per ID, determines which IDs have a number of observations that meet a certain criteria (in this case, >=3 observations), and returns a vector with these IDs:
> vals
[1] 1 3
Now I want to manipulate the X values associated with these IDs, e.g. by adding 1 to each value, giving a data frame like this:
> mydata
ID Observation X
1 1 4
1 2 4
1 3 4
1 4 4
2 1 4
2 2 4
3 1 9
3 2 9
3 3 9
I'm pretty new to R and am uncertain how I might do this. It might help to know that X is constant for each ID.
The call mydata$ID %in% vals returns TRUE or FALSE to indicate whether the ID value for each row is in the vals vector. When you add this to the data currently in mydata$X, the TRUE and FALSE are converted to 1 and 0, respectively, yielding the desired result:
mydata$X <- mydata$X + mydata$ID %in% vals
# mydata
# ID Observation X
# 1 1 1 4
# 2 1 2 4
# 3 1 3 4
# 4 1 4 4
# 5 2 1 4
# 6 2 2 4
# 7 3 1 9
# 8 3 2 9
# 9 3 3 9

Combine minimum values of row and column in matrix

Suppose I have a vector of size n=8 v=(5,8,2,7,9,12,2,1). I would like to know how to build a N x N matrix that compares every pair of values of v and returns the minimum value of each comparation. In this example, it would be like this:
5 5 2 5 5 5 2 1
5 8 2 7 8 8 2 1
2 2 2 2 2 2 2 1
5 7 2 7 7 7 2 1
5 8 2 7 9 9 2 1
5 8 2 7 9 12 2 1
2 2 2 2 2 2 2 1
1 1 1 1 1 1 1 1
Could you help me with this, please?
outer(v, v, pmin)
Notice the use of pmin, not min, as the former is vectorised but not the latter.

Summing two dataframes based on common value

I have a dataframe that looks like
day.of.week count
1 0 3
2 3 1
3 4 1
4 5 1
5 6 3
and another like
day.of.week count
1 0 17
2 1 6
3 2 1
4 3 1
5 4 5
6 5 1
7 6 13
I want to add the values from df1 to df2 based on day.of.week. I was trying to use ddply
total=ddply(merge(total, subtotal, all.x=TRUE,all.y=TRUE),
.(day.of.week), summarize, count=sum(count))
which almost works, but merge combines rows that have a shared value. For instance in the example above for day.of.week=5. Rather than being merged to two records each with count one, it is instead merged to one record of count one, so instead of total count of two I get a total count of one.
day.of.week count
1 0 3
2 0 17
3 1 6
4 2 1
5 3 1
6 4 1
7 4 5
8 5 1
9 6 3
10 6 13
There is no need to merge. You can simply do
ddply(rbind(d1, d2), .(day.of.week), summarize, sum_count = sum(count))
I have assumed that both data frames have identical column names day.of.week and count
In addition to the suggestion Ben gave you about using merge, you could also do this simply using subsetting:
d1 <- read.table(textConnection(" day.of.week count
1 0 3
2 3 1
3 4 1
4 5 1
5 6 3"),sep="",header = TRUE)
d2 <- read.table(textConnection(" day.of.week count1
1 0 17
2 1 6
3 2 1
4 3 1
5 4 5
6 5 1
7 6 13"),sep = "",header = TRUE)
d2[match(d1[,1],d2[,1]),2] <- d2[match(d1[,1],d2[,1]),2] + d1[,2]
> d2
day.of.week count1
1 0 20
2 1 6
3 2 1
4 3 2
5 4 6
6 5 2
7 6 16
This assumes no repeated day.of.week rows, since match will return only the first match.

Resources