Split data.frame function by a field

I have a function that returns a data frame, and its output is too long to show as a single table in an R Shiny app. I want to split it by the field fac, so that I get one table for fac = A, and so on for each unique value in fac. How could I do this? Thank you.
prod()
x y fac
1 1 1 C
2 1 2 B
3 1 3 B
4 1 4 B
5 1 5 A
6 1 6 B
7 1 7 B
8 1 8 C
9 1 9 C
10 1 10 C
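
One possible approach, sketched with base R's split(). The data frame df below is a hypothetical stand-in for whatever prod() returns, rebuilt from the printed output above; in the app it would presumably be replaced by the result of calling prod().

```r
# Hypothetical stand-in for the data frame returned by prod()
df <- data.frame(
  x   = rep(1, 10),
  y   = 1:10,
  fac = c("C", "B", "B", "B", "A", "B", "B", "C", "C", "C")
)

# split() returns a named list holding one data frame per unique value of fac
tables <- split(df, df$fac)

names(tables)  # "A" "B" "C"
tables$A       # the rows where fac is "A"
```

Each element of tables can then be rendered as its own table, for example by looping over names(tables).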

Related

How to repeat query on different parts of a dataset in R?

I want to repeat a particular query on a large dataset and I am sure the answer to my question is quite basic, but after reading various sources on 'for' loops, repeat and replicate functions for about 2 hours, I still can't find any examples which appear to do what I need to do.
The dataset contains survey data from particular sites which are split into plots and each plot contains multiple species entries so the data looks like this:
SITE PLOT SPECIES
1 1 a
1 1 b
1 2 a
1 2 c
1 3 b
1 3 c
1 3 d
1 4 a
1 5 a
1 5 b
2 1 b
2 1 c
2 3 a
2 3 b
2 4 b
2 4 c
2 4 d
2 5 e
The actual data is over 6500 rows as there are hundreds of sites and each should contain 20 plots - the issue is some plots are missing from some sites, so what I need to do is establish how many plots are missing in total. I can use the following code to query how many unique plots are on each site so in the example below I query how many unique plots are in site number 7:
NROW(unique(df$PLOT[df$SITE=="7"]))
[1] 20
But I have hundreds of sites, so is there a function that will allow me to query each site automatically without manually changing the site number each time?

Here is a base R way with tapply.
x <- '
SITE PLOT SPECIES
1 1 a
1 1 b
1 2 a
1 2 c
1 3 b
1 3 c
1 3 d
1 4 a
1 5 a
1 5 b
2 1 b
2 1 c
2 3 a
2 3 b
2 4 b
2 4 c
2 4 d
2 5 e'
df1 <- read.table(textConnection(x), header = TRUE)
num_plots <- with(df1, tapply(PLOT, SITE, \(x) length(unique(x))))
which(num_plots != max(num_plots))
#> 2
#> 2
Created on 2022-05-26 by the reprex package (v2.0.1)
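
To get the total number of missing plots that the question asks for, the per-site counts can be compared with the expected number of plots. This is a minimal sketch on the sample data: the value 20 comes from the question, the names expected_plots, missing_per_site and total_missing are made up for illustration, and sites that are absent from the data entirely are not counted.

```r
# Sample data from the question
df1 <- read.table(text = "SITE PLOT SPECIES
1 1 a
1 1 b
1 2 a
1 2 c
1 3 b
1 3 c
1 3 d
1 4 a
1 5 a
1 5 b
2 1 b
2 1 c
2 3 a
2 3 b
2 4 b
2 4 c
2 4 d
2 5 e", header = TRUE)

# Assumption taken from the question: every site should contain 20 plots
expected_plots <- 20

# Number of distinct plots recorded at each site
num_plots <- tapply(df1$PLOT, df1$SITE, function(p) length(unique(p)))

# Missing plots per site, then summed over all sites
missing_per_site <- expected_plots - num_plots
total_missing <- sum(missing_per_site)
total_missing  # 31 for the sample data (15 for site 1, 16 for site 2)
```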

Not quite sure what you're going for but does this help?
Using data.table:
df <- read.table(text='SITE PLOT SPECIES
1 1 a
1 1 b
1 2 a
1 2 c
1 3 b
1 3 c
1 3 d
1 4 a
1 5 a
1 5 b
2 1 b
2 1 c
2 3 a
2 3 b
2 4 b
2 4 c
2 4 d
2 5 e', header=TRUE)
library(data.table)
setDT(df)[, .(plots=uniqueN(PLOT)), by=.(SITE)]
## SITE plots
## 1: 1 5
## 2: 2 4

Increment variable based on specific string change in r

I am looking to create a variable that increments when specific strings appear in a column. If strings "x", "y", or "z" appear in Event, I want the sequence to increment, otherwise I would like it to stay constant. Any help would be appreciated!
See table below:
Event Seq
1 a 1
2 b 1
3 x 2
4 c 2
5 a 2
6 b 2
7 y 3
8 a 3
9 z 4
10 b 4
11 y 5
12 a 5
13 b 5

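This for loop builds Seq as requested in your question. It starts at 1 and carries the previous value forward, adding 1 whenever Event is 'x', 'y' or 'z':
df$Seq <- 1
for(i in seq_len(nrow(df))[-1]){
# TRUE counts as 1, so Seq only goes up on a matching Event
df$Seq[i] <- df$Seq[i - 1] + (df$Event[i] %in% c('x','y','z'))
}
> df
Event Seq
1 a 1
2 b 1
3 x 2
4 c 2
5 a 2
6 b 2
7 y 3
8 a 3
9 z 4
10 b 4
11 y 5
12 a 5
13 b 5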
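
A loop-free alternative, given as a sketch: the desired Seq in the question goes up by one at each 'x', 'y' or 'z', so it can be written as a running count of those events plus a starting value of 1. The data frame below is rebuilt from the Event column in the question.

```r
# Event column from the question
df <- data.frame(
  Event = c("a", "b", "x", "c", "a", "b", "y", "a", "z", "b", "y", "a", "b")
)

# TRUE where the row should trigger an increment
is_trigger <- df$Event %in% c("x", "y", "z")

# cumsum() counts the triggers seen so far; adding 1 makes the sequence start at 1
df$Seq <- cumsum(is_trigger) + 1
df$Seq  # 1 1 2 2 2 2 3 3 4 4 5 5 5
```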

How to reverse a column in R

I have a data frame as shown below. I want to reverse the order of column B without disturbing the order of the rest of the data frame. Column B currently holds 5,4,3,2,1 and I want to change it to 1,2,3,4,5. I don't want to sort, as that would change the overall row order.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3

You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).

R cumulative sum based upon other columns

I have a data.frame as below. The data is sorted by column txt and then by column val. The summ column is the sum of the value in the val column and the summ value from the earlier row, provided that the current row and the earlier row have the same value in the txt column. How could I do this in R?
txt=c(rep("a",4),rep("b",5),rep("c",3))
val=c(1,2,3,4,1,2,3,4,5,1,2,3)
summ=c(1,3,6,10,1,3,6,10,15,1,3,6)
dd=data.frame(txt,val,summ)
> dd
txt val summ
1 a 1 1
2 a 2 3
3 a 3 6
4 a 4 10
5 b 1 1
6 b 2 3
7 b 3 6
8 b 4 10
9 b 5 15
10 c 1 1
11 c 2 3
12 c 3 6

If by "most earlier" (which in English is more properly written "earliest") you mean the nearest, which is what is implied by your expected output, then what you're talking about is a cumulative sum. You can apply cumsum() separately to each group of txt with ave():
dd <- data.frame(txt = c(rep("a", 4), rep("b", 5), rep("c", 3)), val = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3))
dd$summ <- ave(dd$val, dd$txt, FUN = cumsum)
dd
## txt val summ
## 1 a 1 1
## 2 a 2 3
## 3 a 3 6
## 4 a 4 10
## 5 b 1 1
## 6 b 2 3
## 7 b 3 6
## 8 b 4 10
## 9 b 5 15
## 10 c 1 1
## 11 c 2 3
## 12 c 3 6

Specifying Order of Variables for SAS Report

I am working on a project where I need to specify the order of objects in the data for a custom SAS report. I am having trouble with something that should be easy. Here is an example of the data I am working with.
obs ord ord2 name
1 3 1 A
2 3 . B
3 3 . C
4 3 . D
5 4 1 E
6 4 . F
7 5 1 G
8 5 . H
9 5 . I
10 5 . J
What I'd like is...
obs ord ord2 name
1 3 1 A
2 3 2 B
3 3 3 C
4 3 4 D
5 4 1 E
6 4 2 F
7 5 1 G
8 5 2 H
9 5 3 I
10 5 4 J
So that for every first occurrence of ord, ord2 = 1,...,n_i.
Thanks for the help!

Just apply a group numbering to the original data set, provided that the table has been sorted by ord.
data table1;
set table1;
by ord;
ord2_ + 1;
if first.ord then ord2_ = 1;
drop ord2;
rename ord2_=ord2;
Run;
