Hello, I am very new to programming and data science, and I am trying to work my way through it.
I am trying to assign values to a column in a data frame using a for loop, so that the data frame is divided into ten groups and every row in a group is assigned the same rank: rows 1 to 10 get rank 1, rows 11 to 20 get rank 2, and so on. The original dimensions of the subset data frame are 100 x 6.
My data frame looks like this:
[Image: Data Frame]
The code I have written is:
x <- round(nrow(subset) / 10)
a = 1
for(j in 1:10){
  for(i in a:x){
    subset[i, "rank"] = j
  }
  j = j + 1
  a = x + 1
  x = x * j
}
However, the loop appears to run forever and keeps adding rows to the data frame. I had to stop the loop manually, and the resulting dimensions of the subset data frame were 17926 x 6.
Please help me understand where I am going wrong in writing the loop.
P.S. subset is the name of a data frame here, not the subset function in R.
Thanks in advance!
It might be better for you to start working with vectorized calculations instead of loops. This will help you in the future.
For example:
df <- data.frame(x = 1:100)
df$rank <- (df$x-1)%/%10 + 1
df
results in (first 25 rows shown):
x rank
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
11 11 2
12 12 2
13 13 2
14 14 2
15 15 2
16 16 2
17 17 2
18 18 2
19 19 2
20 20 2
21 21 3
22 22 3
23 23 3
24 24 3
25 25 3
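As for why the original loop misbehaves: x is multiplied by the incremented j on every pass, so the upper bound of the inner loop goes 10, 20, 60, 240, 1200, ... instead of 10, 20, 30, .... Assigning to row indices beyond nrow(subset) makes R append rows, which is why the data frame keeps growing. If a loop is still preferred, a minimal sketch that computes each group's bounds directly from j might look like this (dat, size and n_groups are hypothetical names standing in for the asker's 100-row data frame and its grouping):

```r
# Hypothetical stand-in for the asker's 100-row data frame
dat <- data.frame(id = 1:100)

size <- 10                        # rows per group
n_groups <- nrow(dat) / size      # 10 groups for 100 rows

for (j in seq_len(n_groups)) {
  first <- (j - 1) * size + 1     # first row of group j
  last  <- j * size               # last row of group j
  dat[first:last, "rank"] <- j    # whole block assigned at once
}
```

Because the bounds are derived from j rather than accumulated across passes, they never exceed nrow(dat) and no rows are added.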
How about something like this:
subset$Rank <- ceiling(as.numeric(rownames(subset))/10)
as.numeric converts each row name to a number; dividing it by 10 and rounding up with ceiling should give you the group you need. This assumes the row names are still the default 1, 2, 3, .... Let me know if I've misunderstood.
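If the data frame has been reordered or filtered, its row names no longer match row positions, so the row-name approach would group by the old positions. A small sketch of a position-based variant, using seq_len(nrow(...)) instead of row names (dat is a hypothetical example, not the asker's data):

```r
# Hypothetical data, sorted in descending order so that the
# row names become 100, 99, ..., 1 instead of 1, 2, ..., 100
dat <- data.frame(value = 101:200)
dat <- dat[order(-dat$value), , drop = FALSE]

# Rank by current row position rather than by row name
dat$Rank <- ceiling(seq_len(nrow(dat)) / 10)
```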
I am using R and working with this sample dataframe.
library(tibble)
library(stats)
set.seed(111)
conditions <- factor(c("1","2","3"))
df_sim <-
  tibble::tibble(StudentID = 1:10,
                 Condition = sample(conditions,
                                    size = 10,
                                    replace = TRUE),
                 XP = stats::rpois(n = 10,
                                   lambda = 15))
This creates the following tibble.
StudentID  Condition  XP
        1          2   8
        2          3  11
        3          3  16
        4          3  12
        5          1  22
        6          3  16
        7          1  18
        8          3   8
        9          2  14
       10          1  17
I am trying to create a new column in my dataframe called DyadID. The purpose of this column is to create a variable that is uniquely shared by two students in the dataframe. In other words, two students (e.g. Student 1 and Student 9) would share the same value (e.g. 4) in the DyadID column.
However, I only want observations linked together if they share the same Condition value. Condition contains three unique values (1, 2, 3). I want condition 1 observations linked with other condition 1 observations, 2 with 2, and 3 with 3.
Importantly, I'd like the students to be linked together randomly.
Ideally, I would like to stay within the tidyverse as that is what I am most familiar with. However, if that's not possible or ideal, any solution would be appreciated.
Here is a possible outcome I am hoping to achieve.
StudentID  Condition  XP  DyadID
        1          2   8       4
        2          3  11       1
        3          3  16       2
        4          3  12       1
        5          1  22       3
        6          3  16      NA
        7          1  18       3
        8          3   8       2
        9          2  14       4
       10          1  17      NA
Note that two students did not receive a pairing, because there was an odd number in condition 1 and condition 3. If there is an odd number, the DyadID can be NA.
Thank you for your help with this!
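You can use match to get an id for each Condition value and sample for randomness:
library(dplyr)
df_sim <- df_sim %>% mutate(dyad_id = match(Condition, sample(unique(Condition))))
Note that this gives every student in the same Condition the same dyad_id (a randomly permuted label per condition); it does not split each condition into pairs.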
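To get actual pairs within each condition, one possible approach is to shuffle pair slots (1, 1, 2, 2, ...) inside every condition and then turn each condition/slot combination into a globally unique id. The following is only a sketch in base R rather than the tidyverse; the data frame is a hand-typed stand-in for df_sim, and local_pair and key are hypothetical helper names:

```r
set.seed(111)

# Hand-typed stand-in for df_sim (values copied from the question's table)
df_sim <- data.frame(
  StudentID = 1:10,
  Condition = factor(c(2, 3, 3, 3, 1, 3, 1, 3, 2, 1)),
  XP        = c(8, 11, 16, 12, 22, 16, 18, 8, 14, 17)
)

# Within each condition: build slots 1,1,2,2,... (plus one NA when the
# group size is odd) and hand them out in random order
local_pair <- ave(seq_len(nrow(df_sim)), df_sim$Condition, FUN = function(i) {
  n <- length(i)
  slots <- c(rep(seq_len(n %/% 2), each = 2), if (n %% 2 == 1) NA)
  slots[sample.int(n)]
})

# Combine condition and slot into one key, then number the keys 1, 2, 3, ...
key <- ifelse(is.na(local_pair), NA, paste(df_sim$Condition, local_pair))
df_sim$DyadID <- match(key, unique(key[!is.na(key)]))
```

With the condition sizes in the question (3, 2 and 5 students), this yields four dyads and two unpaired students with DyadID NA. Which students end up together depends on the seed.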
I've got a list of data frames created as follows:
#create the database
vect_date <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)
vect <- c(48,40,32,36,37,37,20,15,15,24,24,10,10,10)
vect <- as.data.frame(cbind(vect_date, vect))
vect <- vect[order(vect$vect_date),]
#create levels depending on vect$vect value
vect$level <- 1
for(i in 2:length(vect$vect)){
  vect$level[i] <- ifelse(vect$vect[i]==vect$vect[i-1], vect$level[i-1], vect$level[i-1]+1)
}
#create the list
monotone <- split(vect, f=vect$level)
Now, I would like to change the vect$vect values of each of these levels (list elements) depending on the vect$vect value of the subsequent element. I guess this comes down to indexing elements and using for loops, but I don't know how to do that.
As an example, I would like to change vect$vect in any level whose subsequent level has the value 10. In that case, the vect$vect values of that level should be multiplied by 100, giving:
vect <- c(48,40,32,36,37,37,20,15,15,2400,2400,10,10,10)
Any help would be great!
I think you can use factor in R first to get your levels:
vect_date <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)
vect <- c(48,40,32,36,37,37,20,15,15,24,24,10,10,10)
vect <- as.data.frame(cbind(vect_date, vect))
vect <- vect[order(vect$vect_date),]
vect$level = factor(vect$vect,levels=unique(vect$vect))
vect$level = as.numeric(vect$level)
So if we want to change the level that comes just before the level where vect equals 10, we can do:
level_tochange = vect$level[vect$vect==10] - 1
level_tochange
[1] 8 8 8
This tells us we need to change rows with level == 8. Note that I use %in% so that this still works when several levels have vect == 10:
rows_tochange = which(vect$level %in% level_tochange)
vect$vect[rows_tochange] = vect$vect[rows_tochange]*100
vect
vect_date vect level
1 1 48 1
2 2 40 2
3 3 32 3
4 4 36 4
5 5 37 5
6 6 37 5
7 7 20 6
8 8 15 7
9 9 15 7
10 10 2400 8
11 11 2400 8
12 12 10 9
13 13 10 9
14 14 10 9
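One caveat with the factor approach: it gives the same level to equal values even when they are not adjacent, whereas the loop in the question starts a new level at every change. A sketch of a run-based alternative using diff and cumsum, applied to the same data (the multiplication step mirrors the answer above):

```r
vect <- data.frame(
  vect_date = 1:14,
  vect      = c(48, 40, 32, 36, 37, 37, 20, 15, 15, 24, 24, 10, 10, 10)
)

# A new level starts wherever the value differs from the previous row
vect$level <- cumsum(c(TRUE, diff(vect$vect) != 0))

# Levels that sit immediately before a run of 10s
level_tochange <- unique(vect$level[vect$vect == 10]) - 1

# Multiply the values in those levels by 100
rows_tochange <- vect$level %in% level_tochange
vect$vect[rows_tochange] <- vect$vect[rows_tochange] * 100
```

For this data the two approaches agree, since no value recurs after a gap.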
I would like to add a new column containing a sequence of numbers to a data frame. The sequence restarts based on the value in another column (i.e. it counts up from 1 until that value changes to another value).
My problem is how to define the end point of each sequence in R.
Part of my data frame, with the column "V2" that I intend to add:
V1 V2(new added column with sequential numbers)
12 1
12 2
12 3
12 4
12 5
13 1
13 2
13 3
13 4
13 5
13 6
14 1
14 2
14 3
14 4
I tried the following code, which did not work:
count <- table(df$V1)
c <- as.integer(names(count)[df$V1==12])
repeat{
df$V2<- seq(1,c, by=1)
if(df$V1!=12){
break
}
}
It sounds like you might be looking for rle since you're interested in any time the "V1" variable changes.
Try the following:
> sequence(rle(df$V1)$lengths)
[1] 1 2 3 4 5 1 2 3 4 5 6 1 2 3 4
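To make the two steps explicit, here is a small self-contained sketch: rle reports the length of each run of identical values, and sequence turns those lengths into 1..n counters. The data frame is rebuilt from the V1 values shown in the question.

```r
# V1 as in the question: five 12s, six 13s, four 14s
df <- data.frame(V1 = rep(c(12, 13, 14), times = c(5, 6, 4)))

runs <- rle(df$V1)                 # run-length encoding of V1
runs$lengths                       # lengths of the consecutive runs
df$V2 <- sequence(runs$lengths)    # counter restarting at 1 for each run
```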
rle is a very good solution but you could also have used ave:
tab$V2 <- ave(tab$V1, tab$V1, FUN=seq_along)
hth
Well, Ananda's answer beats my effort:
vec = numeric(0)
for(i in unique(df$V1)){
n = length(df$V1[df$V1 == i])
vec = c(vec, 1:n)
}
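The three approaches agree on the question's data, where equal values are contiguous, but they differ if a value reappears later. A short sketch on a made-up vector shows the difference: rle counts per run, ave counts per value across the whole column, and the loop concatenates counters in order of unique values, so its output is only row-aligned when V1 is sorted.

```r
# Made-up input in which 12 reappears after 13
V1 <- c(12, 12, 13, 12)

by_run   <- sequence(rle(V1)$lengths)       # restarts at every change
by_value <- ave(V1, V1, FUN = seq_along)    # continues counting per value

# The loop from the answer above
by_loop <- numeric(0)
for (i in unique(V1)) {
  n <- length(V1[V1 == i])
  by_loop <- c(by_loop, 1:n)
}
```

Here by_run is 1 2 1 1, by_value is 1 2 1 3, and by_loop is 1 2 3 1.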
I am hoping to create a random sample from panel data based on the unique id.
For instance if you start with:
e = data.frame(id=c(1,1,1,2,2,3,3,3,4,4,4,4), data=c(23,34,45,1,23,45,6,2,9,39,21,1))
And you want a random sample of 2 unique ids:
out = data.frame(id=c(1,1,1,3,3,3), data=c(23,34,45,45,6,2))
Although sample gives me random ids
sample( e$id ,2) # gives e.g. c(1,3)
I can't figure out how to use logical indexing to return all the desired data.
I have tried a number of things, including:
e[ e$id == sample( e$id ,2), ] # only returns 1/2 the data
Any ideas? It's killing me.
I'm not entirely sure what your expected result should be, but does this work for what you're trying to do?
> e[e$id %in% sample(e$id, 2), ]
id data
6 3 45
7 3 6
8 3 2
9 4 9
10 4 39
11 4 21
12 4 1
Or maybe you want this, which samples from the unique ids so that two distinct ids are always drawn:
> e[e$id %in% sample(unique(e$id), 2), ]
id data
1 1 23
2 1 34
3 1 45
9 4 9
10 4 39
11 4 21
12 4 1
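The == attempt returns only part of the data because the length-2 vector on the right is recycled along e$id, so each row is compared with just one of the two sampled ids, alternating row by row. %in% instead asks whether each row's id is any of the sampled ids. A small sketch that stores the sampled ids first, so they can be inspected or reused (keep and out are illustrative names):

```r
e <- data.frame(id   = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
                data = c(23, 34, 45, 1, 23, 45, 6, 2, 9, 39, 21, 1))

set.seed(1)                        # only to make the draw repeatable
keep <- sample(unique(e$id), 2)    # two distinct ids
out  <- e[e$id %in% keep, ]        # every row belonging to those ids
```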
In R I find myself doing something like this a lot:
adataframe[adataframe$col==something]<-adataframe[adataframe$col==something]+1
This is kind of long and tedious. Is there some way for me to reference the object I am trying to change, such as
adataframe[adataframe$col==something]<-$self+1
?
Try package data.table and its := operator. It's very fast and very short.
DT[col1==something, col2:=col3+1]
The first part, col1==something, is the subset. You can put anything here and use the column names as if they are variables; i.e., no need to use $. The second part, col2:=col3+1, assigns the RHS to the LHS within that subset, where the column names can be assigned to as if they are variables. := is assignment by reference. No copies of any object are taken, so it is faster than <-, =, within and transform.
Also, one end goal of allowing := in j like this is combining it with by, which is to be implemented soon in v1.8.1; see the question "When should I use the := operator in data.table".
UPDATE: That was indeed released (:= by group) in July 2012.
You should be paying more attention to Gabor Grothendieck (and not just in this instance). The cited inc function on Matt Asher's blog does all of what you are asking:
(And the obvious extension works as well.)
add <- function(x, inc=1) {
eval.parent(substitute(x <- x + inc))
}
# Testing the `inc` function behavior
EDIT: After my temporary annoyance at the lack of approval in the first comment, I took the challenge of adding yet a further function argument. Supplied with one argument that is a portion of a dataframe, it will still increment that range of values by one. So far it has only been very lightly tested on infix dyadic operators, but I see no reason it wouldn't work with any function which accepts only two arguments:
transfn <- function(x, func="+", inc=1) {
  eval.parent(substitute(x <- do.call(func, list(x , inc))))
}
(Guilty admission: This somehow "feels wrong" from the traditional R perspective of returning values for assignment.) The earlier testing on the inc function is below:
df <- data.frame(a1 =1:10, a2=21:30, b=1:2)
inc <- function(x) {
eval.parent(substitute(x <- x + 1))
}
#---- examples===============>
> inc(df$a1) # works on whole columns
> df
a1 a2 b
1 2 21 1
2 3 22 2
3 4 23 1
4 5 24 2
5 6 25 1
6 7 26 2
7 8 27 1
8 9 28 2
9 10 29 1
10 11 30 2
> inc(df$a1[df$a1>5]) # testing on a restricted range of one column
> df
a1 a2 b
1 2 21 1
2 3 22 2
3 4 23 1
4 5 24 2
5 7 25 1
6 8 26 2
7 9 27 1
8 10 28 2
9 11 29 1
10 12 30 2
> inc(df[ df$a1>5, ]) #testing on a range of rows for all columns being transformed
> df
a1 a2 b
1 2 21 1
2 3 22 2
3 4 23 1
4 5 24 2
5 8 26 2
6 9 27 3
7 10 28 2
8 11 29 3
9 12 30 2
10 13 31 3
# and even in selected rows and grepped names of columns meeting a criterion
> inc(df[ df$a1 <= 3, grep("a", names(df)) ])
> df
a1 a2 b
1 3 22 1
2 4 23 2
3 4 23 1
4 5 24 2
5 8 26 2
6 9 27 3
7 10 28 2
8 11 29 3
9 12 30 2
10 13 31 3
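The transfn function from the EDIT earlier in this answer can be exercised in the same way. This is a lightly checked sketch on a fresh, made-up data frame, first with the defaults and then with "*" and 2 passed as the operator and operand:

```r
# transfn as defined earlier in this answer
transfn <- function(x, func = "+", inc = 1) {
  eval.parent(substitute(x <- do.call(func, list(x, inc))))
}

df <- data.frame(a1 = 1:10, a2 = 21:30)

transfn(df$a1)                        # defaults: adds 1 to the whole column
transfn(df$a2[df$a1 > 5], "*", 2)     # doubles a2 where the updated a1 > 5
```

After these two calls a1 runs from 2 to 11, and a2 is doubled in rows 5 to 10 (the rows where the incremented a1 exceeds 5).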
Here is what you can do. Let us say you have a dataframe
df = data.frame(x = 1:10, y = rnorm(10))
And you want to increment all the y by 1. You can do this easily by using transform
df = transform(df, y = y + 1)
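The question is about changing only a subset of rows. With transform, one way to sketch that is to wrap the update in ifelse so that rows failing the condition keep their old value. The condition x > 5 and the zero-filled y column are made up for illustration:

```r
df <- data.frame(x = 1:10, y = rep(0, 10))

# Increment y only where x > 5; other rows keep their current y
df <- transform(df, y = ifelse(x > 5, y + 1, y))
```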
I'd be partial to (presumably the subset is on rows)
ridx <- adataframe$col==something
adataframe[ridx,] <- adataframe[ridx,] + 1
which doesn't rely on any fancy / fragile parsing, is reasonably expressive about the operation being performed, and is not too verbose. Also tends to break lines into nicely human-parse-able units, and there is something appealing about using standard idioms -- R's vocabulary and idiosyncrasies are already large enough for my taste.
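Note that adataframe[ridx,] + 1 increments every column of the matching rows, including col itself. If only one column should change, the same idiom can be narrowed to that column. A small sketch with made-up data (val is a hypothetical column name, and something is set to 1 for the example):

```r
adataframe <- data.frame(col = c(1, 2, 1, 3), val = c(10, 20, 30, 40))
something  <- 1

ridx <- adataframe$col == something               # rows to update
adataframe$val[ridx] <- adataframe$val[ridx] + 1  # touch only the val column
```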