Adding column with a value based on entry order of specific factor [duplicate] - r

I would like to add a column to a data frame where the values in the column are based upon the entry order for a specific factor in another column. So specifically for my data I would like to have a "1" for the first visit to a point, a "2" for the second visit, a "3" for the third etc. However, some points have repetitive visits for a given date and should share the same visit number.
The data frame is pre-sorted and looks something like this:
Transect Point Date
1 BEN 1 5/7/12
2 BEN 1 5/10/12
3 BEN 1 5/10/12
4 BEN 2 5/8/12
5 BEN 2 5/11/12
6 BEN 2 5/13/12
I would like to get something like this:
Transect Point Date Visit
1 BEN 1 5/7/12 1
2 BEN 1 5/10/12 2
3 BEN 1 5/10/12 2
4 BEN 2 5/8/12 1
5 BEN 2 5/11/12 2
6 BEN 2 5/13/12 3

Assuming your data.frame is called SODF, use ave:
within(SODF, {
Visit <- ave(Point, Point, FUN = seq_along)
})
# Transect Point Date Visit
# 1 BEN 1 5/7/12 1
# 2 BEN 1 5/10/12 2
# 3 BEN 1 5/13/12 3
# 4 BEN 2 5/8/12 1
# 5 BEN 2 5/11/12 2
If you are grouping by more than one column, for example "Transect" and "Point", change the ave statement to:
ave(Point, Transect, Point, FUN = seq_along)
There are, of course, other approaches, both using base R and using packages. Several of these are summarized and benchmarked by @Arun in his answer here.
Update to address new question requirements
One quick solution that comes to mind considering your new requirement is to first extract the unique cases, perform the index generation as done above, and merge the resulting table with your original table.
SODFunique <- SODF[!duplicated(SODF), ]
SODFunique <- within(SODFunique, {
Visit <- ave(Point, Transect, Point, FUN = seq_along)
})
merge(SODF, SODFunique, sort = FALSE)
# Transect Point Date Visit
# 1 BEN 1 5/7/12 1
# 2 BEN 1 5/10/12 2
# 3 BEN 1 5/10/12 2
# 4 BEN 2 5/8/12 1
# 5 BEN 2 5/11/12 2
# 6 BEN 2 5/13/12 3
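If the extract-and-merge step feels roundabout, the same numbering can probably be produced in one pass. This is only a sketch: it rebuilds the sample data from the question under the name SODF and assumes, as the question states, that the rows are pre-sorted within each point. Each distinct Date inside a Transect/Point group is numbered by its order of first appearance, so repeated dates share a visit number.

```r
# Sample data rebuilt from the question (assumed already sorted by visit order)
SODF <- data.frame(
  Transect = "BEN",
  Point = c(1, 1, 1, 2, 2, 2),
  Date = c("5/7/12", "5/10/12", "5/10/12", "5/8/12", "5/11/12", "5/13/12"),
  stringsAsFactors = FALSE
)

# ave() passes the row indices of each Transect/Point group to FUN;
# match() against the unique dates gives the position of first appearance,
# so rows with the same date in a group get the same number.
SODF$Visit <- ave(
  seq_along(SODF$Date), SODF$Transect, SODF$Point,
  FUN = function(i) match(SODF$Date[i], unique(SODF$Date[i]))
)

SODF$Visit
# [1] 1 2 2 1 2 3
```

Unlike the merge approach, this does not depend on the whole row being duplicated, only on the Date repeating within a group.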

Related

R: Matching and repeating occurrence [duplicate]

(Sample code below.) I have two data sets. One is a library of products; the other contains customer id, date, viewed product and one other detail. I want a merge where, for each id AND date, I see the whole library of products as well as where the match was. I have tried full_join, merge, and right and left joins, but they do not repeat the rows. Below is a sample of what I am trying to achieve.
id=c(1,1,1,1,2,2)
date=c(1,1,2,2,1,3)
offer=c('a','x','y','x','y','a')
section=c('general','kitchen','general','general','general','kitchen')
t=data.frame(id,date,offer,section)
offer=c('a','x','y','z')
library=data.frame(offer)
######
t table
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 2 y general
4 1 2 x general
5 2 1 y general
6 2 3 a kitchen
library table
offer
1 a
2 x
3 y
4 z
and I want to get this:
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 1 y NA
4 1 1 z general
...
(there would have to be 6*4 observations)
I realize because I match by offer it is not going to repeat the values like so, but what is another option to do that? Thanks a lot!!
You can use complete to get all combinations of library$offer for each id and date.
tidyr::complete(t, id, date, offer = library$offer)
# A tibble: 24 x 4
# id date offer section
# <dbl> <dbl> <chr> <chr>
# 1 1 1 a general
# 2 1 1 x kitchen
# 3 1 1 y NA
# 4 1 1 z NA
# 5 1 2 a NA
# 6 1 2 x general
# 7 1 2 y general
# 8 1 2 z NA
# 9 1 3 a NA
#10 1 3 x NA
# … with 14 more rows
You can use tidyr and dplyr to get the data. The crossing() function will create all combinations of the variables you pass in.
library(dplyr)
library(tidyr)
t %>%
select(id, date) %>%
{crossing(id=.$id, date=.$date, library)} %>%
left_join(t)
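Both answers above expand to every id x date x offer combination (2 x 3 x 4 = 24 rows), including id/date pairs that never occur in the data. If what is wanted is the full library only for the id/date pairs that were actually observed, a base R sketch could look like the following. The names views and offers are my own stand-ins for the question's t and library, chosen to avoid masking the base functions of the same name.

```r
# Data from the question, under hypothetical names
views <- data.frame(
  id = c(1, 1, 1, 1, 2, 2),
  date = c(1, 1, 2, 2, 1, 3),
  offer = c("a", "x", "y", "x", "y", "a"),
  section = c("general", "kitchen", "general", "general", "general", "kitchen"),
  stringsAsFactors = FALSE
)
offers <- data.frame(offer = c("a", "x", "y", "z"), stringsAsFactors = FALSE)

# Observed id/date pairs only (4 of them in this sample)
pairs <- unique(views[c("id", "date")])

# by = NULL makes merge() return the Cartesian product: 4 pairs x 4 offers
grid <- merge(pairs, offers, by = NULL)

# Left join back onto the views; unmatched offers get section = NA
result <- merge(grid, views, by = c("id", "date", "offer"), all.x = TRUE)
nrow(result)
# [1] 16
```

Whether 16 or 24 rows is the right answer depends on whether unobserved id/date pairs should appear at all; the question is not explicit about that.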

Subsetting columns by the name of rows of another dataframe

I need to subset the columns of a dataframe taking into account the rownames of another dataframe (in R).
I'm trying to select the representative species of the Brazilian Amazon by subsetting a large Brazilian database according to the percentage of representative locations, information which is in another dataframe.
> a <- data.frame("John" = c(2,1,1,2), "Dora" = c(1,1,3,2), "camilo" = c(1:4),"alex"=c(1,2,1,2))
> a
John Dora camilo alex
1 2 1 1 1
2 1 1 2 2
3 1 3 3 1
4 2 2 4 2
> b <- data.frame("SN" = 1:3, "Age" = c(15,31,2), "Name" = c("John","Dora","alex"))
> b
SN Age Name
1 1 15 John
2 2 31 Dora
3 3 2 alex
> result <- a[,rownames(b)[1:3]]
Error in `[.data.frame`(a, , rownames(b)[1:3]) :
undefined columns selected
I want to get this dataframe
John Dora alex
1 2 1 1
2 1 1 2
3 1 3 1
4 2 2 2
The simple a[,b$Name] does not work because b$Name is considered a factor. Be careful: it won't throw an error, but you will get the wrong answer!
But this is easy to fix by using a[,as.character(b$Name)] instead!
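A short sketch of the fix using the data from the question. Note that since R 4.0.0 data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly here to reproduce the situation described; on a current R without that argument, b$Name would already be a character vector.

```r
a <- data.frame(John = c(2, 1, 1, 2), Dora = c(1, 1, 3, 2),
                camilo = 1:4, alex = c(1, 2, 1, 2))
# Force the factor behaviour that older R versions had by default
b <- data.frame(SN = 1:3, Age = c(15, 31, 2),
                Name = c("John", "Dora", "alex"),
                stringsAsFactors = TRUE)

# Converting to character makes R select columns by name
# rather than by the factor's internal integer codes
result <- a[, as.character(b$Name)]
names(result)
# [1] "John" "Dora" "alex"
```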

Create a ID value based on an incremental value when a value in a column changes in R [duplicate]

I would like to create a 'segment' ID so that:
If the value (in one column) is the same as the row before you maintain the same segment ID
However, if the value (in one column) is different than the row before the segment ID increments by one
I am currently trying to achieve this via:
require(dplyr)
person <- c("Mark","Mark","Mark","Mark","Mark","Steve","Steve","Tim", "Tim", "Tim","Mark")
df <- data.frame(person,stringsAsFactors = FALSE)
df$segment = 1
df$segment <- ifelse(df$person == dplyr::lag(df$person),dplyr::lag(df$segment),dplyr::lag(df$segment)+1)
But I am not getting the desired result through this method.
Any help would be appreciated
If you want to increment on change, try this
df %>% mutate(segment = cumsum(person != lag(person, default="")))
# person segment
# 1 Mark 1
# 2 Mark 1
# 3 Mark 1
# 4 Mark 1
# 5 Mark 1
# 6 Steve 2
# 7 Steve 2
# 8 Tim 3
# 9 Tim 3
# 10 Tim 3
# 11 Mark 4
A base R solution might look like this
c(1, cumsum(person[-1] != person[-length(person)]) +1)
[1] 1 1 1 1 1 2 2 3 3 3 4
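Another base R option, offered as a sketch rather than a recommendation, is run-length encoding: rle() finds the runs of identical consecutive values, and repeating each run's index by its length gives the segment ID.

```r
person <- c("Mark", "Mark", "Mark", "Mark", "Mark",
            "Steve", "Steve", "Tim", "Tim", "Tim", "Mark")

# rle() returns the length of each run of identical consecutive values
r <- rle(person)

# Number the runs 1, 2, 3, ... and repeat each number for the run's length
segment <- rep(seq_along(r$lengths), r$lengths)
segment
# [1] 1 1 1 1 1 2 2 3 3 3 4
```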

How to use "cast" in reshape without aggregation

In many uses of cast I've seen, an aggregation function such as mean is used.
How about if you simply want to reshape without information loss?
For example, if I want to take this long format:
ID condition Value
John a 2
John a 3
John b 4
John b 5
John a 6
John a 2
John b 1
John b 4
To this wide-format without any aggregation:
ID a b
John 2 4
John 3 5
Alex 6 1
Alex 2 4
I suppose this assumes that observations are paired and that a missing value would mess this up, but any insight is appreciated.
In such cases you can add a sequence number:
library(reshape2)
DF$seq <- with(DF, ave(Value, ID, condition, FUN = seq_along))
dcast(ID + seq ~ condition, data = DF, value.var = "Value")
The last line gives:
ID seq a b
1 John 1 2 4
2 John 2 3 5
3 John 3 6 1
4 John 4 2 4
(Note that we used the sample input from the question but the sample output in the question does not correspond to the sample input.)
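For anyone without reshape2 installed, the same sequence-number idea should also work with base R's reshape(). This is a sketch that rebuilds the sample input from the question as DF; note that reshape() names the new columns Value.a and Value.b rather than a and b.

```r
# Sample input from the question (all rows are John, as posted)
DF <- data.frame(
  ID = "John",
  condition = c("a", "a", "b", "b", "a", "a", "b", "b"),
  Value = c(2, 3, 4, 5, 6, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Running count within each ID/condition group, so every row is
# uniquely identified by ID + seq + condition
DF$seq <- with(DF, ave(Value, ID, condition, FUN = seq_along))

# Spread condition into columns; no aggregation is needed because
# each ID/seq/condition combination occurs exactly once
wide <- reshape(DF, idvar = c("ID", "seq"), timevar = "condition",
                direction = "wide")
wide[, c("ID", "seq", "Value.a", "Value.b")]
```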

Match dataframe rows according to two variables (Indexing)

I am essentially trying to get disorganized data into long form for linear modeling.
I have 2 data.frames "rec" and "book"
Each row in "book" needs to be pasted onto the end of several of the rows of "rec" according to two variables in the row: "MRN" and "COURSE" which match.
I have tried the following and variations thereon to no avail:
i=1
newlist=list()
colnames(newlist)=colnames(book)
for ( i in 1:dim(rec)[1]) {
mrn=as.numeric(as.vector(rec$MRN[i]));
course=as.character(rec$COURSE[i]);
get.vector<-as.vector(((as.numeric(as.vector(book$MRN))==mrn) & (as.character(book$COURSE)==course)))
newlist[i]<-book[get.vector,]
i=i+1;
}
If anyone has any suggestions on
1)getting this to work
2) making it more elegant (or perhaps just less clumsy)
If I have been unclear in any way I beg your pardons.
I do understand I haven't combined any data above, I think if I can generate a long-format data.frame I can combine them all on my own
Sounds like you need to merge the two data-frames. Try this:
merge(rec, book, by = c('MRN', 'COURSE'))
and do read the help for merge (by doing ?merge at the R console) for more options on how to merge these.
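To make the merge call concrete, here is a small sketch with made-up data. The question does not show rec or book, so the columns score and dose and all values below are purely hypothetical.

```r
# Hypothetical stand-ins for the question's data frames
rec <- data.frame(
  MRN = c(101, 101, 102, 102, 103),
  COURSE = c("A", "A", "A", "B", "A"),
  score = c(5, 7, 6, 8, 9),
  stringsAsFactors = FALSE
)
book <- data.frame(
  MRN = c(101, 102, 102),
  COURSE = c("A", "A", "B"),
  dose = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Inner join: each book row is repeated for every matching rec row;
# rec rows without a match (MRN 103) are dropped
merged <- merge(rec, book, by = c("MRN", "COURSE"))

# all.x = TRUE keeps every rec row and fills unmatched book columns with NA
merged_all <- merge(rec, book, by = c("MRN", "COURSE"), all.x = TRUE)
```

The choice between the two depends on whether rows of rec with no corresponding entry in book should survive into the long-format data.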
I've created a simple example that may help you. In my case I wanted to paste the 'value' column from df1 onto each row of df2, according to the variables x1 and x2:
df1 <- read.table(textConnection("
x1 x2 value
1 2 12
1 3 56
2 1 35
2 2 68
"),header=T)
df2 <- read.table(textConnection("
test x1 x2
1 1 2
2 1 3
3 2 1
4 2 2
5 1 2
6 1 3
7 2 1
"),header=T)
library(sqldf)
sqldf("select df2.*, df1.value from df2 join df1 using(x1,x2)")
test x1 x2 value
1 1 1 2 12
2 2 1 3 56
3 3 2 1 35
4 4 2 2 68
5 5 1 2 12
6 6 1 3 56
7 7 2 1 35
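If you would rather not depend on sqldf, the same result can likely be had with merge() in base R. A sketch using the same df1 and df2 values; since merge() sorts by the key columns, the rows are put back in the original df2 order afterwards.

```r
df1 <- data.frame(x1 = c(1, 1, 2, 2),
                  x2 = c(2, 3, 1, 2),
                  value = c(12, 56, 35, 68))
df2 <- data.frame(test = 1:7,
                  x1 = c(1, 1, 2, 2, 1, 1, 2),
                  x2 = c(2, 3, 1, 2, 2, 3, 1))

# Join on both key columns
out <- merge(df2, df1, by = c("x1", "x2"))

# Restore the df2 row order and column layout
out <- out[order(out$test), c("test", "x1", "x2", "value")]
out$value
# [1] 12 56 35 68 12 56 35
```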
