This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
In R I have a matrix with several categorical columns: size (2 sqm, 4 sqm, or 6 sqm), number of units (1 to 3), number of persons (1 to 4), and a final column holding the summarized count of occurrences of each combination.
Example:
Size;Units;Pers;Count
4;3;4;3 # this row three times
2;1;1;2 # this row two times
6;2;2;1 # this row one time
How can I use the last column to repeat the rows so that it prints out:
Size;Units;Pers;Count
4;3;4;1
4;3;4;1
4;3;4;1
2;1;1;1
2;1;1;1
6;2;2;1
Either in a spreadsheet or in R.
This is an assignment for school, and I just cannot find a way to use the last vector as a constant that multiplies the first three columns while still keeping a 1 in the last column of each row.
We can replicate the row indices by the 'Count' column and then use transform to set the 'Count' column to 1.
transform(df1[rep(1:nrow(df1), df1$Count),-4], Count=1)
This can also be done with the wrapper function expandRows from the splitstackshape package.
library(splitstackshape)
transform(expandRows(df1, 'Count'), Count=1)
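For reference, here is a minimal self-contained sketch of the base R approach using the sample data from the question. The name `expanded` is only illustrative, and `df1` is rebuilt here by hand as an assumption about how the data is stored.

```r
# Sample data from the question (rebuilt by hand for illustration)
df1 <- data.frame(Size  = c(4, 2, 6),
                  Units = c(3, 1, 2),
                  Pers  = c(4, 1, 2),
                  Count = c(3, 2, 1))

# Repeat each row index as many times as its Count says
expanded <- df1[rep(seq_len(nrow(df1)), df1$Count), ]

# Every expanded row now stands for a single occurrence
expanded$Count <- 1

# Reset the row names, which otherwise look like 1, 1.1, 1.2, ...
rownames(expanded) <- NULL
print(expanded)
```

Using seq_len(nrow(df1)) instead of 1:nrow(df1) also behaves sensibly when the data frame has zero rows.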
This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 1 year ago.
I have this data frame of about 35'000 observations. The problem is that there are about 5'000 occurrences (as exemplified by the first two and last two rows of the image) where I have two observations relating to the same COD_DOM but with differing values of RENDIMENTO. What I would like is to calculate the average RENDIMENTO for every COD_DOM that appears twice, and thus keep only one observation with the average value.
If your data.frame is just these two columns, you should be able to use:
library(dplyr)
# df is a placeholder name for your data frame
new_df <- df %>%
  group_by(COD_DOM) %>%
  summarize(RENDIMENTO = mean(RENDIMENTO))
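The same grouping can be sketched in base R without dplyr. The data below is invented purely for illustration (the real values are only shown in the image), and `df` and `avg` are hypothetical names.

```r
# Made-up sample: some COD_DOM values appear twice with different RENDIMENTO
df <- data.frame(COD_DOM    = c("A01", "A01", "B02", "C03", "C03"),
                 RENDIMENTO = c(100, 200, 50, 10, 30))

# One row per COD_DOM holding the mean RENDIMENTO of that group
avg <- aggregate(RENDIMENTO ~ COD_DOM, data = df, FUN = mean)
print(avg)
```

A COD_DOM that appears only once keeps its single value, since the mean of one number is that number.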
This question already has answers here:
R keep rows with at least one column greater than value
(3 answers)
Delete rows in R if a cell contains a value larger than x
(1 answer)
Closed 2 years ago.
I have a matrix, and I want to remove all rows that contain at least one element less than a given value, 3 in this example. Sample data:
A=matrix(c(10,2,4,8,5,4,8,10,5),byrow=TRUE,ncol=3)
#Remove rows that contain at least one value less than 3
final_matrix=matrix(c(8,5,4,8,10,5), byrow=TRUE,ncol=3)
How do I get from the initial matrix A to the final matrix? My real matrix contains thousands of rows and tens of columns; this is a toy example. I tried A=A[A>3,] but I get the error "logical subscript too long".
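One possible approach, sketched here on the toy matrix: A>3 is a logical matrix with one entry per cell, so it is too long to serve as a row index, which is why the attempt fails. Reducing it to one logical value per row avoids that. The names `threshold`, `keep`, and `result` are illustrative only.

```r
A <- matrix(c(10, 2, 4, 8, 5, 4, 8, 10, 5), byrow = TRUE, ncol = 3)
threshold <- 3

# Count, per row, how many elements fall below the threshold;
# a row is kept only if that count is zero
keep <- rowSums(A < threshold) == 0

# drop = FALSE keeps the result a matrix even if only one row survives
result <- A[keep, , drop = FALSE]
print(result)
```

rowSums works on the whole matrix at once, so this should scale reasonably to thousands of rows and tens of columns.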
This question already has answers here:
Remove last N rows in data frame with the arbitrary number of rows
(4 answers)
Closed 2 years ago.
I have a dataset consisting of 250 observations. I want to select all observations except the last one. I know I can do this with the following code, but how can I do it if I do not know the exact number of observations?
data(mtcars)
mtcars_lag<-mtcars[1:31,]
## skipping first observation and selecting all
mtcars_forward<-mtcars[2:32,]
Using nrow() gets you the number of observations in the dataset. mtcars_subset <- mtcars[1:(nrow(mtcars)-1), ] will fetch you all observations except the last one.
EDIT: Added parenthesis in line with suggestion from MrFlick.
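As an alternative sketch, head() and tail() accept a negative n, which drops rows from the end or the start without needing the row count at all. The variable names follow the question.

```r
data(mtcars)

# All rows except the last one
mtcars_lag <- head(mtcars, -1)

# All rows except the first one
mtcars_forward <- tail(mtcars, -1)

# Both subsets are one row shorter than the original
nrow(mtcars)
nrow(mtcars_lag)
nrow(mtcars_forward)
```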
This question already has answers here:
Find duplicate values in R [duplicate]
(5 answers)
Closed 4 years ago.
I have an issue selecting duplicate rows in R. A data frame has 14 columns and 1 million rows. I have to compare rows, i.e. find identical rows, which would be duplicates. I want to get the duplicate rows by this method. My data frame looks like this:
Data frame sample
The last two rows are identical, so they need to be marked with a flag value of 1.
I don't know how to start with this.
I have tried this code:
df <- unique(data[,1:97]) # this gives me the unique set, not the number of duplicates
dim(data[duplicated(data),])[1] # this gives me the number of duplicates, but not the ids
I need to know the duplicate ids.
My intention is to check each row and return the total number of duplicate rows or their line numbers.
Look into the duplicated() function. It can be used to remove the duplicated rows or, inversely, to keep them.
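A minimal sketch of how duplicated() could produce the flag, the row numbers, and the count. The sample data and the column names `id` and `value` are invented for illustration, since the real data is only shown in the image.

```r
# Made-up sample in which the last two rows are identical
data <- data.frame(id    = c(1, 2, 3, 3),
                   value = c("a", "b", "c", "c"))

# TRUE for every row that repeats an earlier row
dup_later <- duplicated(data)

# TRUE for every row that has an identical twin anywhere, first copy included
dup_all <- dup_later | duplicated(data, fromLast = TRUE)

# Row numbers of the repeated rows, and how many there are
dup_rows <- which(dup_later)
n_dups   <- sum(dup_later)

# Add the flag last, so it does not take part in the comparison itself
data$flag <- as.integer(dup_all)
print(data)
```

Whether the first copy of a duplicated row should be flagged as well depends on the assignment; use dup_later instead of dup_all to flag only the repeats.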
This question already has answers here:
Subset data to contain only columns whose names match a condition
(10 answers)
Closed 6 years ago.
I have a data frame with a set of species IDs in the ID column, and sample IDs as separate columns with the motif CA_**. The data look like this:
ID <- c('A','B','C')
CA_01 <- c(3,9,54)
CA_56 <- c(2,7,12)
CA_92 <- c(45,4,47)
d<- data.frame(ID,CA_01,CA_56,CA_92)
ID CA_01 CA_56 CA_92
A      3     2    45
B      9     7     4
C     54    12    47
I want to sum across the columns within each row and generate a new column holding the total abundance of each species ID across the sample columns (final values 50, 20, 113). Furthermore, there are many other columns in my real data frame. I only want to sum across columns that start with CA_**.
NOTE: this is different from the question asked here, as that asker knows the positions of the columns to sum. In my example I only know that the columns start with the motif CA_. I don't know the positions. It's also different from the question here, as I specifically ask how to sum across columns based on the grep command.
We can use grep to subset the columns having column names that start with CA_ and get the sum of the rows with rowSums.
d$newCol <- rowSums(d[grep('^CA\\_', names(d))])
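As a self-contained sketch on the sample data, the same selection can also be written with startsWith() (available in base R since version 3.3.0), which avoids the regular expression entirely. The names `ca_cols` and `total` are illustrative.

```r
ID    <- c('A', 'B', 'C')
CA_01 <- c(3, 9, 54)
CA_56 <- c(2, 7, 12)
CA_92 <- c(45, 4, 47)
d <- data.frame(ID, CA_01, CA_56, CA_92)

# Logical vector marking the columns whose names begin with "CA_"
ca_cols <- startsWith(names(d), "CA_")

# Sum those columns within each row
d$total <- rowSums(d[ca_cols])
print(d)
```

Because the columns are picked by name, their positions in the data frame do not matter, and other columns are left out of the sum.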