I have a data frame that consists of municipality names (factors) in the first column and number of projects (integers) in columns two and three.
Var.1<-c("Andover", "Avon", "Bethany")
Freq.x<-c(2,NA,10)
Freq.y<-c(4,2,9)
Projects<-data.frame(Var.1,as.integer(as.numeric(Freq.y)),as.integer(as.numeric(Freq.x)))
[Note: I am making the second and third columns as integers here because that's how they are categorized in my actual data set.]
I was able to take the row sums of the rows using:
Projects$Sum<-rowSums(Projects[,2:3])
However, I'm unable to figure out how to take the column sums. I tried using the following formula:
Projects[Total,]<-colSums(Projects[2:3,])
I get the error:
Error in colSums(Projects[2:3, ]) : 'x' must be numeric
Even when I convert the second and third columns to as.numeric, I get the same response.
Can someone advise how to obtain the column sums and create a new row at the bottom to hold the results?
You can do something like this:
Var.1<-c("Andover", "Avon", "Bethany")
Freq.x<-c(2,NA,10)
Freq.y<-c(4,2,9)
freq <- cbind(Freq.x, Freq.y)
freq <- rbind(freq, colSums(freq, na.rm=TRUE))
Projects <- data.frame(name=c(Var.1, "Total"), freq)
In particular: keep the numeric part separate and compute its sums; add "Total" to the character vector before it is converted to a factor, and only then build the data.frame.
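If the data frame already exists, a total row can also be appended afterwards. The attempt in the question fails because Projects[2:3,] selects rows 2 and 3 (including the name column) rather than columns 2 and 3, and Total is not quoted. Below is a minimal sketch, assuming the names are stored as character rather than factor; the column names Freq.y and Freq.x are set explicitly here and are not what the question's data.frame call would produce.

```r
# Rebuild the example with explicit column names (an assumption for this sketch)
Projects <- data.frame(
  Var.1  = c("Andover", "Avon", "Bethany"),
  Freq.y = as.integer(c(4, 2, 9)),
  Freq.x = as.integer(c(2, NA, 10)),
  stringsAsFactors = FALSE  # keep names as character so "Total" can be added
)

# Sum only the numeric columns; na.rm = TRUE skips the missing value
totals <- colSums(Projects[, c("Freq.y", "Freq.x")], na.rm = TRUE)

# Build a one-row data frame with matching column names and append it
total_row <- data.frame(
  Var.1  = "Total",
  Freq.y = totals[["Freq.y"]],
  Freq.x = totals[["Freq.x"]],
  stringsAsFactors = FALSE
)
Projects <- rbind(Projects, total_row)
```

If Var.1 were a factor, "Total" would first have to be added to its levels, which is why the answer above adds it before the conversion.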
I have a dataframe with multiple columns that I want to group according to their names. When several column names match the same pattern, I want them combined into a single column that holds the sum of the group.
colnames(dataframe)
[1] "Départements" "01...3" "01...4" "01...5" "02...6" "02...7" "02...8" "02...9" "02...10" "03...11"
[11] "03...12" "03...13" "04...14" "04...15" "05...16" "05...17" "05...18" "06...19" "06...20" "06...21"
So I use this bit of code, which works just fine when every column is numeric. However, the first column is character, so I hit an error. How can I exclude the first column from the code?
# Group columns by pattern: store the unique name prefixes in a vector
patterns <- unique(substr(names(dataframe_2012), 1, 3))
# Loop through the patterns and sum the matching columns row-wise
dataframe <- sapply(patterns, function(xx) rowSums(dataframe[,grep(xx, names(dataframe)), drop=FALSE]))
This is the error code I get
Error in rowSums(DEPTpolicedata_2012[, grep(xx, names(DEPTpolicedata_2012)), :
'x' must be numeric
You can simply remove the first column from the data frame before computing the patterns, using
dataframe$Départements <- NULL
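If the names are still needed afterwards, an alternative is to set the first column aside, sum the rest, and bind the names back on. The following is only a sketch on made-up data: the values, the ASCII column name Departements, and the helper name numeric_part are hypothetical, and it matches on the two-digit prefix with startsWith instead of grep so that the dots in the names are not treated as regex wildcards.

```r
# Hypothetical stand-in for the real data frame
dataframe <- data.frame(
  Departements = c("Ain", "Aisne"),
  `01...3` = c(1, 2),
  `01...4` = c(3, 4),
  `02...6` = c(5, 6),
  check.names = FALSE,       # keep the numeric-looking column names as they are
  stringsAsFactors = FALSE
)

# Work only on the numeric columns (everything except the first)
numeric_part <- dataframe[, -1, drop = FALSE]

# Unique two-character prefixes, e.g. "01" and "02"
patterns <- unique(substr(names(numeric_part), 1, 2))

# One row-sum column per prefix
sums <- sapply(patterns, function(p) {
  rowSums(numeric_part[, startsWith(names(numeric_part), p), drop = FALSE])
})

# Re-attach the name column
result <- data.frame(Departements = dataframe$Departements, sums,
                     check.names = FALSE)
```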
I have a dataset called credit_df with dimensions 32561*15. It has a column native.country with 1843 missing values. Missing values are coded as "?".
I have created a factor variable with the list of countries using the below code
country <- unique(credit_df$native.country)
The result of the above code also contained one "?" value, as it was part of the dataset, so I removed it using the code below
country <- as.data.frame(country)
country %>% filter(country != "?")
Now the country factor variable has all the country names in the dataset. I would like to assign those randomly to the missing values in the column. How do I do it?
I tried the below code per one of the suggested methods
credit_df$native.country[credit_df$native.country %in% c("?")] <-
sample(country, NROW(credit_df$native.country[credit_df$native.country %in% c("?")]), replace = T)
but all the "?" turned out to be missing values
sum(is.na(credit_df$native.country))
[1] 583
NOTE: Even setting this example aside, I would be happy with any suggestion on how to impute character values randomly.
Example: I have a country column with missing values, and I have a vector/dataframe with a bunch of country names. How do I assign them randomly to the missing values in the country column?
You could try using sample()
credit_df$native.country[credit_df$native.country %in% c("?")] <-
sample(country, NROW(credit_df$native.country[credit_df$native.country %in% c("?")]), replace = T)
The sample command here creates a vector of random values drawn from country. The length of the generated vector equals the number of rows you want to replace. The replace = T argument is strictly required only if the sample is larger than the population (I didn't know how many rows there are to replace or how many values there are in country); without it, no country could be drawn more than once.
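For sample() to behave this way, country has to be a plain vector that no longer contains "?". One plausible reason for the NAs in the question is that country had been turned into a data frame, that the filtered result was never assigned back, and that the column was a factor. Here is a minimal sketch on made-up data, assuming the column is stored as character; the country values and the variable name missing are hypothetical.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical miniature version of credit_df
credit_df <- data.frame(
  native.country = c("India", "?", "Cuba", "?", "India", "Japan"),
  stringsAsFactors = FALSE  # character column, not factor
)

# Candidate values: every observed country except the "?" placeholder
country <- setdiff(unique(credit_df$native.country), "?")

# Logical index of the rows to impute
missing <- credit_df$native.country == "?"

# Draw one random country per missing row and assign in place
credit_df$native.country[missing] <- sample(country, sum(missing), replace = TRUE)
```

Sampling from unique values gives every country the same chance. Sampling from the observed non-missing entries instead would preserve the original frequencies.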
I have a data frame consisting of five character variables which represent specific bacteria. I then have thousands of observations of each variable that all begin with the letter K. eg
x <- c("K0001","K0001","K0003","K0006")
y <- c("K0001","K0001","K0002","K0003")
z <- c("K0001","K0002","K0007","K0008")
r <- c("K0001","K0001","K0001","K0001")
o <- c("K0003","K0009","K0009","K0009")
I need to identify unique observations in the first column that don't appear in any of the remaining four columns. I have tried the approach suggested here which I think would work if I could create individual vectors using select ...
How to tell what is in one vector and not another?
but when I try to create a vector for analysis using the code ...
x <- select(data$x)
I get the error
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "character"
I have tried to mutate the vectors using as.factor and as.numeric but neither of these approaches work as the first gives an equivalent error as above, and as.numeric returns NAs.
Thanks in advance
The reference that you cited recommended using setdiff. The only thing that you need to do to apply that solution is to convert the four columns into one, so that it can be treated as a set. You can do that with unlist
setdiff(data$x, unlist(data[,2:5]))
[1] "K0006"
I have a big data frame (22k rows, 400 columns) which is generated using read.csv from a csv file. It appears that every column is a factor and all the row values are the levels of this factor.
I now want to do some analysis (like PCA), but I can't work with the data unless it is a matrix. Even when I convert it with as.matrix, all I get is
> prcomp(as.matrix(my_data))
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Is there a way of transforming this data frame with factors to a simple big matrix?
I am new in R so forgive all the (maybe terrible) mistakes.
Thanks
You can do it this way:
df<-data.frame(a=as.factor(c(1,2,3)), b=as.factor(c(2,3,4)))
m<-apply(apply(df, 1, as.character), 1, as.numeric)
apply calls a function on each row (or column) of the given data.frame. It is important not to skip the conversion to character first, because otherwise as.numeric returns the factor's internal integer codes instead of the original values.
To add column names, do this:
m<-m[-1,] # removes the first 'empty' row
colnames(m)<-c("a", "b") # replace the right hand side with your desired column names, e.g. the first row of your data.frame
One more tip: you probably read the data.frame from a file. If you set the parameter header=TRUE when reading it, the first row of the file is used as the column names of the data.frame instead of being read as a data row.
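The same conversion can be written column by column, which keeps the column names and makes the pitfall with factor codes visible. This is only a sketch on made-up values; it assumes every column holds numbers stored as factor levels.

```r
# Hypothetical data: numbers that were read in as factors
df <- data.frame(
  a = factor(c("10", "20", "30")),
  b = factor(c("2.5", "3.5", "4.5"))
)

# Wrong: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(df$a)

# Right: go through character first, one column at a time.
# sapply() over a data frame returns a numeric matrix with column names kept.
m <- sapply(df, function(col) as.numeric(as.character(col)))
```

Here codes holds 1, 2, 3 while m[, "a"] holds 10, 20, 30. A matrix built this way is numeric, so it can be passed to functions such as prcomp.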
I have a data frame which I will call "abs.data" that contains 265 columns (variables). I have another data frame which I will call "corr.abs" that contains updated data on a subset of the columns in "abs.data". Both data frames have an equal number of rows, n=551. I need to replace the columns in "abs.data" with the correct observations in "corr.abs" where the column names match. I have tried the following
abs.samps <- colnames(abs.data) #vector of column names in abs. data
corr.abs.samps <- colnames(corr.abs) #vector of column names in corr.abs
abs.data[,which(abs.samps %in% corr.abs.samps==TRUE)] <- corr.abs[,which(corr.abs.samps %in% abs.samps==TRUE)] #replace columns in abs.data with correct observations in corr.abs where the column names are the same
When I run the left and right side of the last line of code R pulls the right columns, but it fails to replace the columns in abs.data with the correct data in corr.abs. Any ideas why?
You can find the common column names using
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))
Say you find that X2 is the common column.
You can first drop the outdated column (in this case X2) from abs.data using subset
x<-subset(abs.data, select = -X2)
Then you can add the corrected column (e.g. column name X2) to the new data frame
y<-cbind(X2=corr.abs$X2,x)
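With many shared columns, it may be simpler to overwrite them in place by name, which also keeps the original column order. The sketch below uses made-up data and assumes, as in the question, that both data frames have the same number of rows in the same order.

```r
# Hypothetical miniature versions of the two data frames
abs.data <- data.frame(X1 = 1:3, X2 = c(0, 0, 0), X3 = 7:9)
corr.abs <- data.frame(X2 = c(4, 5, 6))

# Column names present in both data frames
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))

# Indexing by name on both sides pairs each column with its namesake,
# regardless of the column order in either data frame
abs.data[comm_col] <- corr.abs[comm_col]
```

The attempt in the question pairs the columns by position on each side, so it could assign values to the wrong columns if the shared columns appear in a different order in the two data frames.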