Formula to count information of 2 different columns - count

I have a column A with the names of various areas, let's say Area 1 to Area 10 (repeated throughout the column, one per cell). Column B holds the dates on which something was done in that area; some cells have no date yet because nothing has been done. I need to create a summary that counts how many times that something was done in each area. That means I need to take each area (Area 1, Area 2, Area 3, etc.) and count how many times the action was done. I know it was done when there is a date in column B. I need a formula that can help me calculate this.

Is this what you're looking for?
library(tidyverse)
# create sample data
df <- tibble(A=rep(c(1:10),3), B=rep(c(Sys.Date(), NA),15))
df
A B
1 1 2019-02-06
2 2 NA
3 3 2019-02-06
4 4 NA
5 5 2019-02-06
6 6 NA
7 7 2019-02-06
8 8 NA
9 9 2019-02-06
10 10 NA
...
# group by column A and count the rows that have a date in B
df %>%
  mutate(count = ifelse(!is.na(B), 1, 0)) %>%
  group_by(A) %>%
  summarise(count = sum(count, na.rm = TRUE))
A count
1 1 3
2 2 0
3 3 3
4 4 0
5 5 3
6 6 0
7 7 3
8 8 0
9 9 3
10 10 0
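If you'd rather not load the tidyverse, a base R sketch along these lines should give the same per-area counts. It assumes area names like "Area 1" in column A and a date (or NA) in column B; the sample data is made up for illustration. Because tapply() groups on every area present in A, areas with no dates still appear with a count of 0. (In a spreadsheet such as Excel, the analogous formula would be something like =COUNTIFS(A:A,"Area 1",B:B,"<>"), which counts rows where A is "Area 1" and B is not empty.)

```r
# Made-up sample data: area names in A, a date (or NA when nothing was done yet) in B
df <- data.frame(
  A = rep(paste("Area", 1:10), 3),
  B = rep(c(Sys.Date(), NA), 15)
)

# TRUE where a date is present; summing the logicals per area counts the actions.
# Areas are listed alphabetically, so "Area 10" comes right after "Area 1".
counts <- tapply(!is.na(df$B), df$A, sum)
counts
```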

If I understand you correctly:
SELECT area_name, COUNT(action_date)
FROM your_table -- placeholder: replace with the name of your table
WHERE action_date <> ''
GROUP BY area_name;

Related

R - How do you count the number of rows associated with two group_by() variables?

I have a dataset (see example below) in which each individual underwent two sessions, each with 4 trials. In each trial they could pick either correctly (1) or incorrectly (0), as recorded in the y variable. I am trying to calculate the rate of correct choices per individual per session. (This is an example dataset; the real one is much larger, so I don't want to do this by hand.)
df
head(df, 16)
row name session_number y
1 1 Tom 1 1
2 2 Tom 1 1
3 3 Tom 1 0
4 4 Tom 1 0
5 5 Tom 2 1
6 6 Tom 2 0
7 7 Tom 2 1
8 8 Tom 2 0
9 9 Rob 1 0
10 10 Rob 1 1
11 11 Rob 1 0
12 12 Rob 1 1
13 13 Rob 2 0
14 14 Rob 2 1
15 15 Rob 2 0
16 16 Rob 2 1
For example, I want to know that Tom, in his first session, picked correctly in 0.50 of his trials. This is calculated by summing y and dividing by the number of rows associated with "Tom" AND "Session 1". I can't figure out how to count those rows in a larger dataset, though.
I tried using group_by() and mutate(), but I still can't get it to work because count() is not working.
by_name_by_session <- df %>%
group_by(df$name) %>%
group_by(session_number) %>%
mutate(rate = (sum(df$y)/count(df$name)))
Thanks in advance to anyone who can help!
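A minimal dplyr sketch of one way to do this, rebuilding the 16-row example by hand (the data frame below is re-typed from the printed output above). A few things trip up the attempt in the question: group_by(df$name) groups by an expression rather than the name column, the second group_by() call replaces the first grouping instead of adding to it, sum(df$y) sums the whole column regardless of the groups, and count() expects a data frame rather than a single column. Passing both columns to one group_by() call and using n() for the group size avoids all four. (The .groups argument needs dplyr 1.0.0 or later.)

```r
library(dplyr)

# Re-typed example data: Tom and Rob, 2 sessions x 4 trials each
df <- data.frame(
  row = 1:16,
  name = rep(c("Tom", "Rob"), each = 8),
  session_number = rep(rep(1:2, each = 4), times = 2),
  y = c(1, 1, 0, 0, 1, 0, 1, 0,
        0, 1, 0, 1, 0, 1, 0, 1)
)

# Group by both columns in a single call, then count rows with n()
by_name_by_session <- df %>%
  group_by(name, session_number) %>%
  summarise(n_trials = n(),              # rows in this name/session group
            n_correct = sum(y),          # y is 0/1, so the sum counts correct picks
            rate = n_correct / n_trials, # same as mean(y)
            .groups = "drop")

print(by_name_by_session)
```

Because y is 0/1, mean(y) gives the same rate as sum(y) / n(). If you want the rate attached to every original row instead of one row per group, use mutate() in place of summarise().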

R Studio: Time Syncing Data Sets

I have a simple problem, and a bit more complicated twist at the end.
I have 2 datasets, A and B (separate when imported into R):
Dataset A is pulled from a DAQ sampling 2000 times a second, while dataset B is pulled from a scope sampling 500 times a second. I have a test that records data from the DAQ and the scope for 5 seconds.
In R Studio I want to time-synchronize this data. For the sake of learning, how can I do it in both of the following ways?
1) Without duplicating values, so filtering doesn't stair-step:
A B
1 1 1
2 2 NA
3 3 NA
4 4 NA
5 5 2
6 6 NA
7 7 NA
8 8 NA
9 9 3
10 10 NA
11 11 NA
12 12 NA
2) With duplicated values, if I don't want NAs in the functions I apply to the frame:
A B
1 1 1
2 2 1
3 3 1
4 4 1
5 5 2
6 6 2
7 7 2
8 8 2
9 9 3
10 10 3
11 11 3
12 12 3
Now here is the twist that makes it a rather unusual problem. Let's say Dataset A records a bit before and after the 5-second test. Dataset A also has an extra "Trigger" column, which is either 0 or 1. A 1 (high) means recording is active and marks where Dataset B starts; when it switches back to 0, Dataset B has finished recording.
Is there a way to do the above time sync within Dataset A using the trigger? The reason I want to keep the data before and after the "true" recording section is to make sure a filter or a filtfilt sweep levels out before the data truly starts.
Thanks for any help!
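The question doesn't say exactly how the two files line up beyond the trigger, so this base R sketch assumes that the DAQ rate is a whole multiple of the scope rate (2000/500 = 4), that the trigger goes high exactly once, and that the first scope sample coincides with the DAQ row where the trigger rises. The daq and scope data frames are made-up stand-ins. It builds both layouts from the question, a sparse column with NA between scope samples and a sample-and-hold column, while keeping the DAQ rows before and after the triggered window with B set to NA. If there is no pre/post data (Trigger is 1 on every row), the window simply spans all rows, which covers the plain case.

```r
# Made-up DAQ data (dataset A): 2 samples before and after the triggered window
daq <- data.frame(
  A = 1:16,
  Trigger = c(0, 0, rep(1, 12), 0, 0)
)
# Made-up scope data (dataset B): 3 samples recorded while Trigger was high
scope <- data.frame(B = 1:3)

# Assumed rates from the question: 4 DAQ rows per scope sample
ratio <- 2000 / 500

# First rising edge (0 -> 1) and first falling edge (1 -> 0) of the trigger
start  <- which(diff(c(0, daq$Trigger)) == 1)[1]
end    <- which(diff(c(daq$Trigger, 0)) == -1)[1]
window <- start:end

# 1) No duplication: one scope sample every `ratio` DAQ rows from the rising edge, NA elsewhere
daq$B_sparse <- NA
daq$B_sparse[seq(start, by = ratio, length.out = nrow(scope))] <- scope$B

# 2) Duplication (sample-and-hold): repeat each scope sample `ratio` times inside the window
daq$B_held <- NA
daq$B_held[window] <- rep(scope$B, each = ratio, length.out = length(window))

# Rows outside the window keep their DAQ data for filter settling, with B = NA
print(daq)
```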

How to merge data correctly

I'm trying to merge 7 complete data frames into one large wide data frame. I figured I have to do this stepwise: merge 2 frames into 1, then merge that frame with the next, and so on until all 7 original frames become one.
fil2005: "ID" "abr_2005" "lop_2005" "ins_2005"
fil2006: "ID" "abr_2006" "lop_2006" "ins_2006"
The variables "abr_2006" "lop_2006" "ins_2006" and their 2005 counterparts are all either 0 or 1.
Now the thing is, I want to either merge or do a dcast of some sort (I think) to turn these two long data frames into one wide data frame where both "abr_2005" "lop_2005" "ins_2005" and "abr_2006" "lop_2006" "ins_2006" are in the final file.
When I try
fil_2006.1 <- merge(x=fil_2005, y=fil_2006, by="ID__", all.y=T)
all the variables ending in _2005 are saved to fil_2006.1, but the variables ending in _2006 are not.
I'm apparently doing something wrong. Any idea?
Is there a reason you put those underscores after ID in "ID__"? Otherwise, the code you provided will work.
An example:
# create data frames of differing lengths
# (dat2's ID = 5:14 is recycled to 20 rows, so each ID appears twice in dat2)
dat1 <- data.frame("ID"=seq(1,20,by=2),"varx2005"=1:10, "vary2005"=2:11)
dat2 <- data.frame("ID"=5:14,"varx2006"=1:20, "vary2006"=21:40)
head(dat1)
ID varx2005 vary2005
1 1 1 2
2 3 2 3
3 5 3 4
4 7 4 5
5 9 5 6
6 11 6 7
head(dat2)
ID varx2006 vary2006
1 5 1 21
2 6 2 22
3 7 3 23
4 8 4 24
5 9 5 25
6 10 6 26
merged <- merge(dat1,dat2,by="ID",all=T)
head(merged)
ID varx2005 vary2005 varx2006 vary2006
1 1 1 2 NA NA
2 3 2 3 NA NA
3 5 3 4 1 21
4 5 3 4 11 31
5 7 4 5 13 33
6 7 4 5 3 23
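Since the question describes merging 7 frames stepwise, base R's Reduce() can chain those pairwise merges in one call. In this sketch the seven yearly frames (2005 to 2011) and their values are invented for illustration; only the ID key and the abr/lop/ins naming pattern come from the question.

```r
# Made-up stand-ins for the 7 yearly frames; each has an ID key plus 3 year-suffixed columns
years <- 2005:2011
frames <- lapply(years, function(yr) {
  d <- data.frame(ID = 1:3, abr = c(0, 1, 1), lop = c(1, 0, 1), ins = c(0, 0, 1))
  # Rename the value columns to abr_<year>, lop_<year>, ins_<year>
  names(d)[-1] <- paste0(names(d)[-1], "_", yr)
  d
})

# Reduce() merges pairwise: ((frame1 + frame2) + frame3) + ... + frame7
wide <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), frames)
dim(wide)  # 3 rows, 1 + 7 * 3 = 22 columns
```

With all = TRUE, IDs missing from some years are kept and get NA in those years' columns.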

R: How to use intervals as input data for histograms?

I would like to import the data into R as intervals, then count all the numbers falling within these intervals and draw a histogram from these counts.
Example:
start end freq
1 8 3
5 10 2
7 11 5
...
Result:
number freq
1 3
2 3
3 3
4 3
5 5
6 5
7 10
8 10
9 7
10 7
11 5
Some suggestions?
Thank you very much!
Assuming your data is in df, you can create a data set that has each number in the range repeated by freq. Once you have that it's trivial to use the summarizing functions in R. This is a little roundabout, but a lot easier than explicitly computing the sum of the overlaps (though that isn't that hard either).
dat <- unlist(apply(df, 1, function(x) rep(x[[1]]:x[[2]], x[[3]])))
hist(dat, breaks=0:max(df$end))
You can also use table(dat):
dat
1 2 3 4 5 6 7 8 9 10 11
3 3 3 3 5 5 10 10 7 7 5
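For comparison, here is a rough sketch of the direct approach the answer mentions, summing the overlaps without expanding the data. It uses the three example intervals from the question; because the counts are already tallied, barplot() draws them rather than hist().

```r
# The three example intervals from the question
df <- data.frame(start = c(1, 5, 7), end = c(8, 10, 11), freq = c(3, 2, 5))

# All whole numbers from the smallest start to the largest end
numbers <- seq(min(df$start), max(df$end))

# For each number, add up freq over every interval that contains it
counts <- sapply(numbers, function(n) sum(df$freq[df$start <= n & n <= df$end]))
result <- data.frame(number = numbers, freq = counts)
print(result)

# One bar per number, drawn from the precomputed counts
barplot(result$freq, names.arg = result$number, xlab = "number", ylab = "freq")
```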

Compute differences between rows in R and set the first difference to zero

Hi everybody, I am trying to solve a little problem in R. I want to compute the differences between rows of a data frame. My data frame looks like this:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
I want to create a new column named diff.var that stores the differences between consecutive rows of a variable (here x4). One possible solution is the diff() function. When I used it, I got this:
diff(df$x4)
[1] 2 6 -10 1 0 8 3
That works fine, but when I try to apply it to my data frame using df$diff.var=diff(df$x4), I get this:
Error in `$<-.data.frame`(`*tmp*`, "diff.var", value = c(2, 6, -10, 1, :
replacement has 7 rows, data has 8
Because the first row doesn't have a previous row to compute the difference from, I want to set that value to zero. I would like to get something like this:
ID x2 x3 x4 diff.var
1 8 11 2 0
2 7 12 4 2
3 6 13 10 6
4 5 14 0 -10
5 4 15 1 1
6 3 16 1 0
7 2 17 9 8
8 1 18 12 3
The first element of diff.var is zero because that element has no previous element. I would like to build a function that sets the first element of diff.var to zero and computes the differences for the following rows. I want a new data frame with all the variables plus diff.var, because ID is used for later analysis together with diff.var. diff() alone doesn't let me create this new variable. Thanks for your help.
This question has been asked on this forum before. Anyway, do what Frank suggests:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
df$vardiff <- c(0, diff(df$x4))
df
ID x2 x3 x4 vardiff
1 1 8 11 2 0
2 2 7 12 4 2
3 3 6 13 10 6
4 4 5 14 0 -10
5 5 4 15 1 1
6 6 3 16 1 0
7 7 2 17 9 8
8 8 1 18 12 3
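Since the first answer on this page uses the tidyverse, here is an equivalent dplyr sketch for comparison. lag() shifts x4 down one row; supplying the first value as the default makes the first difference come out as zero, matching c(0, diff(df$x4)). The column name diff.var follows the question.

```r
library(dplyr)

df <- data.frame(ID = 1:8, x2 = 8:1, x3 = 11:18, x4 = c(2, 4, 10, 0, 1, 1, 9, 12))

# lag(x4) is the previous row's value; default = first(x4) fills the first row,
# so its difference is x4[1] - x4[1] = 0
df <- df %>%
  mutate(diff.var = x4 - lag(x4, default = first(x4)))

print(df)
```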
