This question already has answers here:
How to sum a variable by group
Aggregate / summarize multiple variables per group (e.g. sum, mean)
I am trying to get the sum of each of the flags F1-F3 in my data frame while keeping the same column names, so that I end up with a single row. It looks like I am missing some data frame/numeric conversion here. Can you please advise? I'm not sure why I get dim(dfs) = NULL.
df <- data.frame(label=2017, F1=1:4, F2=2:5, F3=3:6)
df
dfs <- c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3))
#dfs <- data.frame(c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3)) )
dfs
str(dfs)
dim(dfs)
colnames(dfs) <-c('Label', 'F1','F2','F3')
## Error in `colnames<-`(`*tmp*`, value = c("Label", "F1", "F2", "F3")) :
## attempt to set 'colnames' on an object with less than two dimensions
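Your c() call creates an atomic vector, not a data frame. A plain vector has no dim attribute, which is why dim(dfs) returns NULL and colnames<- fails. If you convert your vector to a one-row data frame with as.data.frame(t(dfs)), you'll be able to set the column names.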
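Here is a minimal sketch of that fix, reusing the question's df. t() turns the length-4 vector into a 1 x 4 matrix, and as.data.frame() converts that matrix into a one-row data frame. Because the data frame has dimensions, it accepts column names:

```r
df <- data.frame(label = 2017, F1 = 1:4, F2 = 2:5, F3 = 3:6)

# c() yields a plain numeric vector: dim() is NULL, so colnames<- fails
dfs <- c(max(df$label), sum(df$F1), sum(df$F2), sum(df$F3))

# t() gives a 1 x 4 matrix; as.data.frame() turns it into a one-row data frame
dfs <- as.data.frame(t(dfs))
colnames(dfs) <- c("Label", "F1", "F2", "F3")

dim(dfs)  # 1 row, 4 columns
dfs
```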
You might also be interested in colSums(), or maybe even the How to sum variables by group? R-FAQ.
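With colSums() the column names come along automatically. The following is a sketch that takes the maximum of label and the sums of the flag columns. It assumes those flag columns can be picked with the regular expression "^F", which is an assumption about your naming:

```r
df <- data.frame(label = 2017, F1 = 1:4, F2 = 2:5, F3 = 3:6)

# Assumption: every flag column name starts with "F"
flag_cols <- grep("^F", names(df), value = TRUE)

# colSums() returns a named vector: F1 = 10, F2 = 14, F3 = 18
sums <- colSums(df[flag_cols])

# as.list() makes each named element its own column of the one-row result
dfs <- data.frame(Label = max(df$label), as.list(sums))
dfs
```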
This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
I have this code that generates a sequence of random values. I need to keep the duplicated values and remove the values that are NOT repeated. Any help with this?
Apparently, a solution is supposed to use '%/%', 'as.numeric', 'names', 'table' and '>'.
Here is the original code.
x <- sample(c(-10:10), sample(20:40, 1), replace = TRUE)
Any help would be appreciated!
I would use table(). It returns the number of times each distinct value occurs, with the values themselves stored (as strings) in the names of the result.
vec <- c(1,2,3,4,5,3,4,5,4,5)
valueset <- unique(vec) # all unique values in vec, will be used later
# now determine which values occur more than once
valuecounts <- table(vec) # count of each unique value; the values are stored as strings in names(), whatever their original type
morethanone <- names(valuecounts)[valuecounts > 1] # values with count > 1
morethanone <- as.numeric(morethanone) # convert the strings back to numbers
valueset[valueset %in% morethanone] # the original values that pass the above criterion
As a function:
duplicates <- function(vector){
  # Returns all unique values that occur more than once in a vector
  valueset <- unique(vector)
  valuecounts <- table(vector)
  morethanone <- names(valuecounts)[valuecounts > 1]
  morethanone <- as.numeric(morethanone)
  return(valueset[valueset %in% morethanone])
}
Now try running duplicates(x)
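duplicates() returns each repeated value only once. The question asks to keep the duplicates and drop the singletons. If that means keeping every occurrence of the repeated values, you can subset the original vector instead. The sketch below uses the same table()/names()/as.numeric() idea, and also shows the base duplicated() alternative from the linked duplicate question. It uses the answer's example vector vec; apply it to x the same way:

```r
vec <- c(1, 2, 3, 4, 5, 3, 4, 5, 4, 5)

# Values whose count in table() is greater than 1
counts <- table(vec)
repeated <- as.numeric(names(counts)[counts > 1])

# Subset the original vector, so every occurrence is kept in its original order
keep_all <- vec[vec %in% repeated]
keep_all  # 3 4 5 3 4 5 4 5

# Base R alternative: duplicated() from both ends flags every element that has a twin
keep_all2 <- vec[duplicated(vec) | duplicated(vec, fromLast = TRUE)]
```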
This question already has answers here:
R Reshape data frame from long to wide format? [duplicate]
Complete dataframe with missing combinations of values
I'm working on some bilateral trade data with each row consisting of the ID for exporter and importer, as well as their trade amount. I then want to map the trade amount of each row onto its corresponding cell in a matrix object which has the IDs of "exporter" and "importer" listed as "row" and "column" dimnames.
I am wondering whether there is an easier way to do this. Below is my current code.
# import data
mat <- readRDS(url("https://www.dropbox.com/s/aj1607s975c5gf6/mat.rds?dl=1"))
head(mat, 10)
# import ID
id <- readRDS(url("https://www.dropbox.com/s/6weala2j0idb16i/id.rds?dl=1"))
# create matrix (there are a total of 161 possible IDs, though not all of them appear in the data)
matrix <- matrix(rep( 0, len=161*161), nrow = 161)
dimnames(matrix) <- list(unique(id), unique(id))
# how can I fill the trade value (in mat[, 3]) into the corresponding cell of the matrix by matching mat[, 1] and mat[, 2] against dimnames(matrix)?
Try complete() and pivot_wider() from tidyr:
library(tidyr)
mat %>%
  complete(pid = unique(id), rid = unique(id)) %>%
  pivot_wider(names_from = pid, values_from = TradeValue)
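pivot_wider() returns a data frame (tibble) rather than a matrix. If you need the matrix with dimnames described in the question, base R can fill it directly with matrix indexing. The sketch below uses made-up data. The column names exporter, importer and value are hypothetical stand-ins for mat's first three columns, and the IDs are placeholders for id:

```r
# Hypothetical toy data standing in for the question's mat and id objects
id <- c("A", "B", "C", "D")                      # all possible IDs
trade <- data.frame(exporter = c("A", "B", "C"),
                    importer = c("B", "C", "A"),
                    value    = c(10, 20, 30))

ids <- unique(id)
m <- matrix(0, nrow = length(ids), ncol = length(ids),
            dimnames = list(exporter = ids, importer = ids))

# A two-column character matrix indexes cells by their row and column names.
# as.character() guards against factor columns, whose integer codes would be used instead.
m[cbind(as.character(trade$exporter), as.character(trade$importer))] <- trade$value
m
```

If an exporter-importer pair occurs more than once, the last value assigned wins, so aggregate duplicate pairs first if that matters for your data.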
This question already has answers here:
Dynamically select data frame columns using $ and a character value
I am trying to access a column in a data frame using a variable; this variable will be populated in a loop:
atr<-"yield_kgha"
What I want is for $atr in the second line below to act as if it were $yield_kgha.
I tried $get(atr) with no luck. How do I get the value of atr to be used as the column name?
meanis=MEAN = mean(zones[[zonename]]$yield_kgha , na.rm = TRUE) #get the mean yield_kgha in the zone
meanis=MEAN = mean(zones[[zonename]]$atr , na.rm = TRUE) #get the mean yield_kgha in the zone
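If you want to use an object that holds the column name, use [[ instead of $, just as you already do for 'zonename'. $ does not evaluate what follows it, so zones[[zonename]]$atr looks for a column literally named "atr".
mean(zones[[zonename]][[atr]], na.rm = TRUE)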
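Here is a small self-contained sketch of [[ with a variable, including the loop case. The zones list, its zone names and the moisture column are made-up stand-ins for the question's data:

```r
# Hypothetical data: a list of per-zone data frames, as in the question's zones
zones <- list(
  north = data.frame(yield_kgha = c(100, 120, NA), moisture = c(10, 12, 11)),
  south = data.frame(yield_kgha = c(90, 110, 130), moisture = c(9, NA, 13))
)
atr <- "yield_kgha"

# $atr finds no column named "atr" and returns NULL
is.null(zones[["north"]]$atr)

# [[ ]] evaluates atr, so this is the same as $yield_kgha
mean(zones[["north"]][[atr]], na.rm = TRUE)

# Several attribute names across all zones: rows = zones, columns = attributes
atrs <- c("yield_kgha", "moisture")
res <- sapply(atrs, function(a) sapply(zones, function(z) mean(z[[a]], na.rm = TRUE)))
res
```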
This question already has answers here:
Replace all occurrences of a string in a data frame
I would like to replace a series of "99"s in my dataframe with NA. To do this for one column I am using the following line of code, which works just fine.
data$column[data$column == "99"] = NA
However, as I have a large number of columns, I want to apply this to all of them. The following line of code isn't doing it. I assume it is because the third "x" is again a reference to the data frame and not to a specific column.
data = lapply(data, function(x) {x[x == "99"] = NA})
Any advice on what I should change?
If you want to replace every "99" in the data frame, simply do:
data[data=="99"] <- NA
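If you want to stick to the apply family, note that apply() returns a matrix (coercing all columns to a common type), so convert the result back if you need a data frame:
data <- as.data.frame(apply(data, 2, function(x) replace(x, x == "99", NA)))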
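The lapply() attempt in the question can also be made to work. It fails for two reasons. First, the anonymous function's value is that of its last expression, the assignment x[x == "99"] = NA, and that assignment evaluates to NA rather than to the modified column. Second, lapply() returns a plain list. Here is a sketch of the corrected version on a small made-up data frame:

```r
# Hypothetical example data with one numeric and one character column
data <- data.frame(a = c(1, 99, 3), b = c("x", "99", "z"), stringsAsFactors = FALSE)

# Return x explicitly, and assign into data[] so the result stays a data frame
# (with its original column types) instead of becoming a list
data[] <- lapply(data, function(x) {
  x[x == "99"] <- NA
  x
})
str(data)
```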
This question already has answers here:
Selecting only numeric columns from a data frame
I would like to extract all numeric columns from a data frame in a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?
Use sapply() to test the class of each column:
numdat <- df[, sapply(df, function(x) class(x) == "numeric")]
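Testing class(x) == "numeric" misses integer columns, whose class is "integer". Also, df[, j] silently drops to a vector when only one column matches. The sketch below is a more robust variant that uses is.numeric() and drop = FALSE, with base Filter() as a shorter equivalent. The integer column n is a hypothetical addition to the question's example data, included only to show the difference:

```r
set.seed(1)  # only for reproducibility
dat <- matrix(rnorm(100), nrow = 20)
# n is a hypothetical integer column added to show why class(x) == "numeric" is too strict
df <- data.frame(id = letters[1:20], n = 1:20, dat)

# is.numeric() is TRUE for both double and integer columns
is_num <- vapply(df, is.numeric, logical(1))
numdat <- df[, is_num, drop = FALSE]  # drop = FALSE keeps a data frame even if only one column matches

# Same selection with base Filter()
numdat2 <- Filter(is.numeric, df)
names(numdat)
```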