Sum each four rows in a column - r

I'm a beginner in R and I really need your help. I'm trying to get a new dataframe that stores the sum of each five rows in my columns.
For example, I have a dataframe (delta) with two columns (A,B)
A B
2 3
1 2
3 2
4 5
3 7
5 6
2 5
and the output I'm looking for is
AA BB
13 19
16 22
17 25
where
13 = row1+row2+row3+row4+row5
16 = row2+row3+row4+row5+row6
and so on ...
I have no idea where to start. Thanks a lot for your help guys.

The subject refers to 4 rows but the example in the question refers to 5. We have used 5 below but if you intended 4 just replace 5 with 4 in the code.
1) rollsum. Using the reproducible input in the Note at the end, use rollsum from the zoo package. Omit the as.data.frame if a matrix is OK as output.
library(zoo)
as.data.frame(rollsum(DF, 5))
## A B
## 1 13 19
## 2 16 22
## 3 17 25
2) filter. filter in base R (the stats package) works too. Note that dplyr, if loaded, masks filter, so in that case use stats::filter in place of filter to ensure you get the correct version.
setNames(as.data.frame(na.omit(filter(DF, rep(1, 5)))), names(DF))
## A B
## 1 13 19
## 2 16 22
## 3 17 25
Note
Lines <- "
A B
2 3
1 2
3 2
4 5
3 7
5 6
2 5"
DF <- read.table(text = Lines, header = TRUE)
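If adding a package is not an option, the same rolling sums can be sketched in base R with cumulative sums: each window sum is the difference between two entries of the running total. The helper name roll_sum is hypothetical, and the sketch assumes every column has at least k values and no NAs.

```r
# Input from the question
DF <- data.frame(A = c(2, 1, 3, 4, 3, 5, 2),
                 B = c(3, 2, 2, 5, 7, 6, 5))

# roll_sum is a hypothetical helper, not part of base R or zoo.
# Assumes length(x) >= k and no missing values.
roll_sum <- function(x, k) {
  cs <- c(0, cumsum(x))                      # running total with a leading 0
  n <- length(x)
  cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]    # window sum = difference of totals
}

res <- as.data.frame(lapply(DF, roll_sum, k = 5))
names(res) <- c("AA", "BB")                  # column names asked for in the question
res
```

Changing k = 5 to k = 4 gives the four-row version mentioned in the subject line.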

Here is a data.table option using frollsum, e.g.,
> na.omit(setDT(df)[,lapply(.SD,frollsum,5)])
A B
1: 13 19
2: 16 22
3: 17 25
or
> na.omit(setDT(df)[,setNames(frollsum(.SD,5),names(.SD))])
A B
1: 13 19
2: 16 22
3: 17 25

Related

Randomly select number (without repetition) for each group in R

I have the following dataframe containing a variable "group" and a variable "number of elements per group"
group elements
1 3
2 1
3 14
4 10
.. ..
.. ..
30 5
then I have a bunch of numbers going from 1 to (let's say) 30
Summing "elements" gives 900. What I want is to randomly draw numbers from 1 to 30 and assign them to each group until I have filled the number of elements for that group. Each of the 30 numbers should appear 30 times in total.
Thus, for group 1, I want to randomly select 3 numbers from 1 to 30;
for group 2, 1 number from 1 to 30, etc., until all of the groups are filled.
the final table should look like this:
group number(randomly selected)
1 7
1 20
1 7
2 4
3 21
3 20
...
any suggestions on how I can achieve this?
In base R, if you have df like this...
df
group elements
1 3
2 1
3 14
Then you can do this...
data.frame(group = rep(df$group, #repeat group no...
df$elements), #elements times
number = unlist(sapply(df$elements, #for each elements...
sample.int, #...sample <elements> numbers
n=30, #from 1 to 30
replace = FALSE))) #without duplicates
group number
1 1 19
2 1 15
3 1 28
4 2 15
5 3 20
6 3 18
7 3 27
8 3 10
9 3 23
10 3 12
11 3 25
12 3 11
13 3 14
14 3 13
15 3 16
16 3 26
17 3 22
18 3 7
Give this a try:
df <- read.table(text = "group elements
1 3
2 1
3 14
4 10
30 5", header = TRUE)
# reproducibility
set.seed(1)
df_split2 <- do.call("rbind",
(lapply(split(df, df$group),
function(m) cbind(m,
`number(randomly selected)` =
sample(1:30, replace = TRUE,
size = m$elements),
row.names = NULL
))))
# drop the elements column
df_split2$elements <- NULL
head(df_split2)
#> group number(randomly selected)
#> 1.1 1 25
#> 1.2 1 4
#> 1.3 1 7
#> 2 2 1
#> 3.1 3 2
#> 3.2 3 29
The split function splits df into chunks based on the group column. We then take those smaller data frames and add a column to each by sampling from 1:30 a total of elements times (with replacement, so a number can repeat within a group). Finally, do.call with rbind binds the list back together into one data frame.
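You have to generate a new data frame that repeats each group elements times; then, using sample, you can generate exactly that many random numbers. Note that sampling without replacement from 1:30 only works while the total number of elements is at most 30: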
data<-data.frame(group=c(1,2,3,4,5),
elements=c(2,5,2,1,3))
data.elements<-data.frame(group=rep(data$group,data$elements),
number=sample(1:30,sum(data$elements)))
The result:
group number
1 1 9
2 1 4
3 2 29
4 2 28
5 2 18
6 2 7
7 2 25
8 3 17
9 3 22
10 4 5
11 5 3
12 5 8
13 5 26
I solved it as follows:
random_sample <- rep(1:30, each=30)
random_sample <- sample(random_sample)
Then I create a data frame from this variable and a variable containing one group per row, repeated by the number of elements in the group itself.
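The last step of that approach is described but not shown. A minimal sketch with a scaled-down, hypothetical input (3 groups, 3 distinct numbers each used twice, instead of 30 numbers each used 30 times) might look like this; the names n_values, times_each, pool and result are made up for illustration.

```r
set.seed(42)  # only for reproducibility

# Hypothetical small input; sum(elements) must equal n_values * times_each
df <- data.frame(group = 1:3, elements = c(3, 1, 2))
n_values   <- 3   # numbers 1..3 (the question uses 1..30)
times_each <- 2   # each number appears twice (the question uses 30)

# Shuffle a pool in which every number appears exactly times_each times
pool <- sample(rep(seq_len(n_values), each = times_each))

# One row per element: group repeated 'elements' times, paired with the pool
result <- data.frame(group  = rep(df$group, df$elements),
                     number = pool)
result
```

Because the pool is fixed before shuffling, every number is used the same number of times overall, while repeats inside a single group remain possible.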

How to use less than or equal to a value of a column as a condition to select the row in another column?

Simple question, I think. Basically, I want to use the concept "less than or equal to a number" as the condition to select the row of one column, and then find the value on the same row in another column. But what happens if the number stated in the condition isn't found in the first column?
Let's assume this is my data frame:
df<-as.data.frame((matrix(c(1:10,11:20), nrow = 10, ncol = 2)))
df
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Let's assume I want to use the condition <=5 in df$V1 to obtain the row that is used to find the value of the same row in df$V2.
df[which(df$V1 <= 5),2]
15
But what happens if the number used in the condition isn't found? Let's assume this is my new data.frame
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 6 15
6 7 16
7 8 17
8 9 18
9 10 19
10 11 20
Using the same above command df[which(df$V1 <= 5),2], I obtain a different answer. For some reason I obtain the entire column instead of one number.
11 12 13 14 15 16 17 18 19 20
Any suggestions?
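Use logical subsetting, testing column 1 and returning column 2:
df[df[, 1] <= 5, 2]
Note that this returns the V2 value of every row whose V1 is at most 5, not just one value.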
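If the goal is a single value, namely the V2 entry on the row with the largest V1 that does not exceed the threshold, one possible sketch follows. The function name value_at_or_below is hypothetical, and returning NA when nothing matches is an assumption about the desired behaviour.

```r
# Second data frame from the question: V1 has no value equal to 5
df <- data.frame(V1 = c(1:4, 6:11), V2 = 11:20)

# value_at_or_below is a hypothetical helper, not a base R function
value_at_or_below <- function(data, threshold) {
  hits <- which(data$V1 <= threshold)        # rows meeting the condition
  if (length(hits) == 0) return(NA)          # assumption: NA when no row qualifies
  best <- hits[which.max(data$V1[hits])]     # row with the largest qualifying V1
  data$V2[best]
}

value_at_or_below(df, 5)
```

Here the largest V1 not exceeding 5 is 4, so the call returns the V2 value on that row; with the original data frame, where V1 contains 5, the same call picks the row where V1 is 5.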

How to give a "/" in a column name to a dataframe in R?

I wish to put a "/" (forward slash) in a column name of a data frame. Any idea how?
I tried the following, to no avail:
tmp1 <- data.frame("Cost/Day"=1:10,"Days"=11:20)
tmp1
Cost.Day Days
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
I then tried this, and it worked.
tmp <- data.frame(1:10,11:20)
colnames(tmp) <- c("Cost/Day","Days")
tmp
Cost/Day Days
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
I would prefer giving the name while constructing the dataframe itself. I tried escaping it but it still didn't work.
tmp2 <- data.frame("Cost\\/Day"=1:10,"Days"=11:20)
tmp2
You can use check.names=FALSE in the data.frame call. By default it is TRUE, and when it is TRUE, data.frame passes the column names through make.names, which replaces invalid characters with a dot, i.e.
make.names('Cost/Day')
#[1] "Cost.Day"
So, try
dat <- data.frame("Cost/Day"=1:10,"Days"=11:20, check.names=FALSE)
head(dat,2)
# Cost/Day Days
#1 1 11
#2 2 12
The specific lines in the data.frame function that change the column names are
--------
if (check.names)
vnames <- make.names(vnames, unique = TRUE)
names(value) <- vnames
--------
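Once a column has a non-syntactic name such as "Cost/Day", it has to be quoted with backticks (or accessed by string) wherever it is used. A short sketch, reusing the dat example with fewer rows:

```r
dat <- data.frame("Cost/Day" = 1:3, "Days" = 11:13, check.names = FALSE)

names(dat)                  # the slash is preserved

cost_dollar  <- dat$`Cost/Day`      # backticks are needed with $
cost_bracket <- dat[["Cost/Day"]]   # a plain string works with [[ ]]

# Backticks are also needed inside expressions and formulas
dat$Total <- dat$`Cost/Day` * dat$Days
dat
```

Whether the extra quoting is worth it is a judgment call; some prefer to keep syntactic names in the data and use the slash only in labels for output.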

R - indices of matching values of two data.tables

This is my first post at StackOverflow. I am relatively a newbie in programming and trying to work with the data.table in R, for its reputation in speed.
I have a very large data.table, named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, with the unique values of Actions in columns k1 and k2, named "States".
For every row in Actions, I would like to find the unique index for columns 4 and 5, matching with States. A reproducible code is as follows:
library(data.table)
S.disc <- c(2000,2000)
S.max <- c(6200,2300)
S.min <- c(700,100)
Traces.num <- 3
Class.str <- lapply(1:2,function(x) seq(S.min[x],S.max[x],S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]], Class.str[[2]], Class.str[[1]])[,c(5,4,1,3,2)])
setnames(Actions,c("k1","k2","i","l1","l2"))
States <- unique(Actions[,list(k1,k2,i)])
If I were using a data.frame, the line would be something like:
index <- apply(Actions,1,function(x) {which((States[,1]==x[4]) & (States[,2]==x[5]))})
How can I do the same with data.table efficiently ?
This is relatively simple once you get the hang of keys and the special symbols which may be used in the j expression of a data.table. Try this...
# First make an ID for each row for use in the `dcast`
# because you are going to have multiple rows with the
# same key values and you need to know where they came from
Actions[ , ID := 1:.N ]
# Set the keys to join on
setkeyv( Actions , c("l1" , "l2" ) )
setkeyv( States , c("k1" , "k2" ) )
# Join States to Actions, using '.I', which
# is the row locations in States in which the
# key of Actions are found and within each
# group the row number ( 1:.N - a repeating 1,2,3)
New <- States[ J(Actions) , list( ID , Ind = .I , Row = 1:.N ) ]
# k1 k2 ID Ind Row
#1: 700 100 1 1 1
#2: 700 100 1 2 2
#3: 700 100 1 3 3
#4: 700 100 2 1 1
#5: 700 100 2 2 2
#6: 700 100 2 3 3
# reshape using 'dcast.data.table'
dcast.data.table( Row ~ ID , data = New , value.var = "Ind" )
# Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27...
#1: 1 1 1 1 4 4 4 7 7 7 10 10 10 13 13 13 16 16 16 1 1 1 4 4 4 7 7 7...
#2: 2 2 2 2 5 5 5 8 8 8 11 11 11 14 14 14 17 17 17 2 2 2 5 5 5 8 8 8...
#3: 3 3 3 3 6 6 6 9 9 9 12 12 12 15 15 15 18 18 18 3 3 3 6 6 6 9 9 9...

How to automatically shrink down row numbers in R data frame when removing rows in R

I'm having a difficulty properly shrinking down the row numbers in a data frame.
I have a data set named "mydata" which I imported from a text file using R. The data frame has about 200 rows with 10 columns.
I removed the row number 3, 7, 9, 199 by using:
mydata <- mydata[-c(3, 7, 9, 199),]
When I run this command, rows 3, 7, 9 and 199 are gone from the list, but the row numbers don't automatically shrink down to 196; they stay at 200. I feel like somehow these row numbers are attached to each "row" as part of the data frame?
How do I fix this problem?
What puzzles me even more is that when I import the text file using RStudio, I don't have any problem (I see 196 when I run the above command). But when using R, I can't get the row numbers in the data frame to match the actual number of rows.
Can anyone please tell me how to fix this??
You can simply do:
rownames(mydata) <- NULL
after performing the subsetting.
For example:
> mydata = data.frame(a=1:10, b=11:20)
> mydata = mydata[-c(6, 8), ]
> mydata
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
7 7 17
9 9 19
10 10 20
> rownames(mydata) <- NULL
> mydata
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 7 17
7 9 19
8 10 20
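It may help to see that the printed numbers are row names (labels carried over from the original data frame), not a row count. A small sketch on the same example data; the variable names n_rows, labels_before and labels_after are only for illustration:

```r
mydata <- data.frame(a = 1:10, b = 11:20)
mydata <- mydata[-c(6, 8), ]

n_rows        <- nrow(mydata)       # the actual number of rows after subsetting
labels_before <- rownames(mydata)   # original labels, with "6" and "8" missing

rownames(mydata) <- NULL            # reset the labels to 1..nrow(mydata)
labels_after  <- rownames(mydata)

n_rows
labels_before
labels_after
```

So the data frame already has the right number of rows after subsetting; resetting the row names only changes how the rows are labelled.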
You could also use the data.table package, which does not store row.names in the same way (see the data.table intro); instead, it prints with the row number.
See the section on keys for how data.table works with row names and keys.
data.table inherits from data.frame, so a data.table still works as a data.frame with functions and packages that accept only data.frames.
For example:
library(data.table)
mydata <- data.table(a = 1:10, b = 11:20)
mydata
## a b
## 1: 1 11
## 2: 2 12
## 3: 3 13
## 4: 4 14
## 5: 5 15
## 6: 6 16
## 7: 7 17
## 8: 8 18
## 9: 9 19
## 10: 10 20
mydata = mydata[-c(6, 8), ]
mydata
## a b
## 1: 1 11
## 2: 2 12
## 3: 3 13
## 4: 4 14
## 5: 5 15
## 6: 7 17
## 7: 9 19
## 8: 10 20
