Getting subset of data based on conditional values from two columns - r

I have a data frame called combined_df_color_locat_cues where I need to select rows based on the values of two columns. Specifically, I need to select the rows where c1Pos and c2Pos have consecutive values (e.g., 4 and 5, 3 and 2, etc.).
I have tried the following, but it doesn't work:
combined_df_locat_short_cues_consec<-subset(combined_df_color_locat_cues, subset=!(c2Pos==c1Pos+1|c1Pos==c1Pos-1))
Any help would be very much appreciated.
Thanks in advance,
Mikel

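Please replace:
subset=! with subset= (the ! negates the whole condition, so your code kept the rows that are NOT consecutive)
c1Pos==c1Pos-1 with c2Pos==c1Pos-1 (a value never equals itself minus 1, so that half of the condition was always FALSE)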
combined_df_locat_short_cues_consec<-subset(combined_df_color_locat_cues, subset=(c2Pos==c1Pos+1|c2Pos==c1Pos-1))
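A slightly shorter way to say "consecutive in either direction" is to test whether the absolute difference is 1. The sketch below uses a small made-up data frame in place of combined_df_color_locat_cues (the values are hypothetical) and checks that both forms select the same rows:

```r
# Hypothetical toy data standing in for combined_df_color_locat_cues
df <- data.frame(c1Pos = c(4, 3, 1, 2, 5),
                 c2Pos = c(5, 2, 3, 2, 4))

# Keep rows whose two positions differ by exactly 1, in either direction
consec <- subset(df, abs(c2Pos - c1Pos) == 1)

# Same result as the corrected condition from the answer
consec_or <- subset(df, c2Pos == c1Pos + 1 | c2Pos == c1Pos - 1)
stopifnot(identical(consec, consec_or))

print(consec)  # rows 1, 2 and 5
```

This assumes c1Pos and c2Pos hold whole numbers; with non-integer doubles, exact == comparisons can be unreliable.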

Related

Sum rows after a specific column for uncertain number of columns

Let's say I want to sum the values in each row of several data frames. I want to start with column 2 and sum every value from that column onward. The data frames may have different numbers of columns, though. I guess it can work with
rowSums(df[2:X])
I just don't know what to replace the X with. Or is there a totally different way of doing it?
Regards
If you only want to exclude the first column, you can write:
rowSums(df[-1])
or
rowSums(df[,-1])
Use ncol() to get the number of columns:
rowSums(df[2:ncol(df)])
You can also use length(), since a data frame is a list of its columns:
rowSums(df[2:length(df)])
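A small sketch with two made-up data frames (df_a, df_b and the helper sum_after_first are hypothetical names) showing that df[-1] adapts to however many columns there are. One design note: df[-1] stays a data frame even when only one column remains, whereas df[,-1] drops to a plain vector in that case (and rowSums() then errors), and 2:ncol(df) becomes 2:1, i.e. c(2, 1), when the data frame has a single column.

```r
# Two hypothetical data frames with different numbers of columns
df_a <- data.frame(id = 1:3, x = c(1, 2, 3), y = c(10, 20, 30))
df_b <- data.frame(id = 1:3, x = c(1, 2, 3), y = c(10, 20, 30),
                   z = c(100, 200, 300))

# Sum every column after the first, however many there are
sum_after_first <- function(df) rowSums(df[-1])

sum_after_first(df_a)
sum_after_first(df_b)
```

The row sums come out as 11, 22, 33 for df_a and 111, 222, 333 for df_b, named by row.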

Replacing part of dataset A with dataset B

I'm trying to replace a chunk of a dataset A (say, columns 7 to 25 for the rows listed in a vector "rows") with a dataset B of the same size. Dataset B repeats a single row whose values are contained in the vector "new_values". I tried the following code:
A[rows, 7:25] <- sapply(A[rows, 7:25], replace, values=new_values, list= 1:ncol(A[rows, 7:25]))
It's not working, however. Instead, the replaced columns in A all end up identical, and each row gets a different value from "new_values".
Any idea how to fix that?
Thank you!
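The symptom is consistent with sapply() applying replace() to each column of the block separately, so every column receives the whole of new_values and the values run down the rows instead of across the columns. A minimal sketch of one possible fix, assuming new_values holds exactly one value per target column (19 values for columns 7 to 25); A, rows and new_values below are hypothetical stand-ins for the question's objects:

```r
# Hypothetical stand-ins for the question's objects
A <- as.data.frame(matrix(0, nrow = 6, ncol = 25))
rows <- c(2, 4, 5)
new_values <- 101:119  # one value per column in 7:25 (19 columns)

# Repeat new_values once per target row; byrow = TRUE lays the
# values across the columns rather than down them
block <- matrix(new_values, nrow = length(rows),
                ncol = length(new_values), byrow = TRUE)
A[rows, 7:25] <- block
```

Each row listed in rows now holds new_values in columns 7 to 25, and all other cells are untouched.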

Subset dataframe by unique id variables with certain number of rows

I have not found a clear answer to this question, so hopefully someone can put me in the right direction!
I have a nested data frame (panel data), with multiple observations within multiple individuals. I want to subset my data frame by those individuals (id) which have at least 20 rows of data.
I have tried the following:
subset1 = subset(df, table(df$id)[df$id] >= 20)
However, I still find individuals with fewer than 20 rows of data.
Can anyone supply a solution?
Thanks in advance
subset1 = subset(df, as.logical(table(df$id)[df$id] >= 20))
Now, it should work.
The subset() function expects the condition to evaluate to a series of TRUE and FALSE values, one per row, indicating whether that row should be kept.
However, if you run table(df$id)[df$id] >= 20 in the console, you will see that it returns a named array rather than a plain logical vector. Wrapping it in as.logical() turns it into one, and then it works.
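A caveat worth checking: if id is numeric, table(df$id)[df$id] indexes the table by position, not by name, so an id of 205 picks the 205th entry of the table (or NA) rather than the count for id 205. A sketch that sidesteps this by computing each individual's row count group-wise with ave() (the toy df below is hypothetical):

```r
# Hypothetical panel data: id 101 has 25 rows, 205 has 10, 309 has 20
df <- data.frame(id = c(rep(101, 25), rep(205, 10), rep(309, 20)),
                 y  = seq_len(55))

# For every row, the number of rows sharing its id
n_per_id <- ave(seq_len(nrow(df)), df$id, FUN = length)

# Keep only individuals with at least 20 rows
subset1 <- df[n_per_id >= 20, ]
```

Here subset1 keeps the 45 rows belonging to ids 101 and 309.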

Editing randomly sampled subset of an indexed subset in R?

I have a question about indexing and editing data structures in R. For instance, suppose I have a data frame myDF:
myDF=data.frame(a=rep(c(1,2),10), b=rep(0,20), c=rep(0,20), d=rep(0,20))
I know that I can use column a to index other columns and edit them like this:
myDF$b[myDF$a==1]=3
And I know I can use sample() to get 5 cells at random from a column and edit them like this:
myDF$c[sample(1:20,5)]=6
But how can I select a specific number of cells at random from among those selected based on another column, for editing purposes? E.g. what if I want to set the value of 5 random cells from d to 4 with the constraint that all of these cells also be from rows in which a==1?
You can combine sample and subsetting like this:
myDF$d[sample(which(myDF$a==1),5)]<-4
which() returns the indices of the rows that meet the condition, sample() then picks five of them at random, and the assignment updates d in those rows.
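One pitfall with sample(which(...), 5): if which() ever returns a single index k, sample() treats it as sample(1:k) and can pick rows that do not meet the condition. Indexing into the candidate vector with sample.int() avoids that. A sketch using the question's myDF (set.seed() is added only for reproducibility):

```r
set.seed(1)  # only to make the random choice reproducible
myDF <- data.frame(a = rep(c(1, 2), 10), b = rep(0, 20),
                   c = rep(0, 20), d = rep(0, 20))

# Row indices that satisfy the condition
candidates <- which(myDF$a == 1)

# Pick 5 of them at random by position, so a single candidate
# is never mistaken for "sample from 1:k"
chosen <- candidates[sample.int(length(candidates), 5)]
myDF$d[chosen] <- 4
```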

Adding across a dataset

I have a dataset and I am trying to add across the columns. For example, say there are 50 rows and 100 columns. For each row I want to go through specific columns (not all) and add the results.
Thanks for any help!
apply(df[,c(1,5,10,11,15)],1,sum) will add columns 1,5,10,11, and 15.
rowSums is generally faster than apply(dat, 1, sum). Furthermore, both may need an additional argument to prevent NA values from sabotaging the results:
rowSums( dat[ , cols_to_sum] , na.rm=TRUE )
If you want to have an irregular selection of columns, i.e. different columns from different rows, then that too is possible but you will need to clarify the question.
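A sketch comparing the two approaches on a hypothetical 50 x 100 data frame of random values with one NA inserted; cols_to_sum reuses the column selection from the first answer:

```r
set.seed(42)  # hypothetical data, reproducible
dat <- as.data.frame(matrix(sample(0:9, 50 * 100, replace = TRUE),
                            nrow = 50, ncol = 100))
dat[3, 5] <- NA  # one missing value to show the effect of na.rm

cols_to_sum <- c(1, 5, 10, 11, 15)

# Vectorised row sums over the selected columns, ignoring NAs
totals <- rowSums(dat[, cols_to_sum], na.rm = TRUE)

# Equivalent (usually slower) apply() version
totals_apply <- apply(dat[, cols_to_sum], 1, sum, na.rm = TRUE)

# Without na.rm, row 3 comes out as NA
totals_na <- rowSums(dat[, cols_to_sum])
```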
