Let's say I want to sum the values in each row across several data frames. I want to start with column 2 and sum every value that comes after that column. The different data frames may have different numbers of columns, though. I guess it can work with
rowSums(df[2:X]).
I just don't know what to replace the X with. Or is there a totally different way of doing it?
Regards
In case you only want to exclude the first column you can write:
rowSums(df[-1])
or
rowSums(df[,-1])
Use ncol() to get the number of columns:
rowSums(df[2:ncol(df)])
You can also use length(), which for a data frame returns the number of columns.
rowSums(df[2:length(df)])
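A small sketch of why these forms answer the question: each one adapts to the column count of whatever data frame it is given, so the same code works across data frames of different widths. The two data frames and the helper name sum_from_col2 are made up for illustration, and column 1 is assumed to be an ID column that should not be summed:

```r
# Two made-up data frames with different numbers of columns;
# column 1 (id) should be excluded from the row sums
df1 <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(10, 20, 30))
df2 <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(10, 20, 30),
                  c = c(100, 200, 300))

# Hypothetical helper: sum each row from column 2 to the last column
sum_from_col2 <- function(df) rowSums(df[-1])

sum_from_col2(df1)
sum_from_col2(df2)

# The ncol() form selects the same columns here
rowSums(df2[2:ncol(df2)])

# Apply the helper to several data frames in one go
lapply(list(df1, df2), sum_from_col2)
```

One caveat with 2:ncol(df): if a data frame has only one column, 2:ncol(df) is 2:1, which counts down to c(2, 1) and asks for a column that does not exist. df[-1] only drops column 1, so it does not have that problem.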
Related
I am still new to R and I am attempting to solve a seemingly simple problem. I would like to identify all of the unique combinations of values across 5 different columns, and update an additional column in my df to annotate whether or not each row is unique.
Given a df with columns A-Z, I have used the following code to identify unique combinations of columns A, B, C, D, and E. I am trying to update column F with this information.
unique(df[ ,c("A", "B","C","D", "E")])
This returns each of the individual rows with unique combinations, as expected, but I cannot figure out what step to take next in order to update column "F" with a value indicating that it is a unique row. Thanks in advance for any pointers!
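"Unique row" can mean two different things here, so below is a sketch of both using base R's duplicated(), which works row-wise on a data frame. The data frame is made up, and the second flag column name F_once is hypothetical:

```r
# Made-up example; only columns A-E define the combination
df <- data.frame(
  A = c(1, 1, 2, 1),
  B = c("x", "x", "y", "x"),
  C = c(TRUE, TRUE, FALSE, TRUE),
  D = c(5, 5, 6, 7),
  E = c("p", "p", "q", "p")
)
key_cols <- c("A", "B", "C", "D", "E")

# Reading 1: F is TRUE for the first occurrence of each combination,
# i.e. exactly the rows that unique(df[, key_cols]) keeps
df$F <- !duplicated(df[, key_cols])

# Reading 2: F_once (hypothetical name) is TRUE only when the combination
# occurs exactly once; checking duplicates from both ends catches every copy
dup_any <- duplicated(df[, key_cols]) |
  duplicated(df[, key_cols], fromLast = TRUE)
df$F_once <- !dup_any

df
```

Rows 1 and 2 share the same A-E values, so under the first reading only row 1 is flagged, and under the second reading neither is.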
Is there a quick way to select only specific vectors (columns)? For example, I only want to use every 4th column in my matrix and then plot the selected columns. I'm very new to R and have absolutely no idea what I'm doing. I know how to select a single vector and how to select a single value in a row, but that doesn't really help.
If you're looking to extract every 4th column from a matrix you can use seq().
Here's an example. I made a dummy dataset: foo <- matrix(rep(c(4, 3, 2, 7), 25), nrow = 10, ncol = 10)
Then you can store the column indexes that you want from your matrix like so:
colsyouwant<-seq(from = 4, to = ncol(foo), by = 4)
from = the column you'd like to start from, in your case the 4th. Then you specify where you'd like to stop, so I used the ncol function to count how many columns are in the matrix. Here the number of columns (10) isn't a multiple of 4, but that doesn't matter because seq() stops at the last value that does not exceed to. Then by = 4 because you want to select every fourth column.
colsyouwant now equals c(4, 8). Simply use brackets and the name of your variable to get the columns you want: foo[, colsyouwant]. The brackets specify which part of the matrix you want as output, in the form [rows, columns]. Since I want all the rows, I leave that spot blank and then specify the columns using the colsyouwant variable, in other words columns 4 and 8.
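Since the question also asks how to plot the selected columns, here is a minimal sketch using base R's matplot(), which draws one line per column of a matrix. It rebuilds the dummy matrix foo so the block runs on its own; the name selected is just for illustration:

```r
foo <- matrix(rep(c(4, 3, 2, 7), 25), nrow = 10, ncol = 10)

# Every 4th column, starting at column 4
colsyouwant <- seq(from = 4, to = ncol(foo), by = 4)

# drop = FALSE keeps a matrix even if only one column is selected
selected <- foo[, colsyouwant, drop = FALSE]

# One line per selected column; the x axis is the row index
matplot(selected, type = "l", lty = 1,
        xlab = "row", ylab = "value",
        main = "Every 4th column of foo")
legend("topright", legend = paste("column", colsyouwant),
       col = seq_along(colsyouwant), lty = 1)
```

With this particular foo, columns 4 and 8 happen to hold identical values, so the two lines coincide; with real data they will differ. To take every 4th column starting from the first instead, use seq(from = 1, to = ncol(foo), by = 4).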
I am working in R. What I want to do is make a table or a graph that shows, for each participant, their missing values. I have 4700+ participants, and for each question there are between 20 and 40 missing values. I would like to represent the missing values in such a way that I can see which people did not answer the questions, and possibly look for a pattern in the missing values. I have done the following:
Count of complete cases in a data frame named 'mydata'
sum(complete.cases(mydata))
Count of incomplete cases
sum(!complete.cases(mydata$Variable1))
Which cases (row numbers) are incomplete?
which(!complete.cases(mydata$Variable1))
I then got a list of numbers that I am not quite sure how to interpret. At first I thought these were the patient numbers, but then I noticed that this is not the case.
I also tried making subsets with only the missing values, but then I literally only see how many values are missing, not whom they are from.
Could somebody help me? Thanks!
Zas
If there is a column that uniquely identifies each row in the data frame mydata, say patient numbers in a column patient_no, then you can easily find the patient numbers of the participants with missing values:
> mydata <- data.frame(patient_no = 1:5, variable1 = c(NA,NA,1,2,3))
> mydata[!complete.cases(mydata$variable1),'patient_no']
[1] 1 2
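If you want to see, for each question, which participants skipped it, then this might be useful for you:
Assumption: every column except column 1 corresponds to a question.
> lapply(mydata[-1], function(x) {mydata[!complete.cases(x), 'patient_no']})
Using mydata[-1] rather than mydata[,-1] keeps a data frame even when there is only one question column; mydata[,-1] would collapse to a plain vector in that case, and lapply() would then loop over single values instead of columns.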
Remember that R automatically attaches numbers to the observations in your data set. For example, if your data has 20 observations (20 rows), R numbers them from 1 to 20; these row numbers are not part of your original data. The results of which(!complete.cases(mydata$Variable1)) correspond to those numbers: they are the rows of your data set in which Variable1 is missing. (Calling complete.cases(mydata) on the whole data frame would instead flag the rows with at least one missing value in any column.)
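For the per-participant table and a look at the pattern of missing values, one option is to apply is.na() to the question columns and label the rows by patient number instead of row number. This is a sketch under the same assumption as above (column 1 is patient_no, every other column is a question); the data and the question names q1 to q3 are made up:

```r
# Made-up data: patient_no identifies each participant, q1-q3 are questions
mydata <- data.frame(
  patient_no = c(101, 102, 103, 104),
  q1 = c(NA, 2, 3, 4),
  q2 = c(1, NA, NA, 4),
  q3 = c(1, 2, NA, 4)
)

miss <- is.na(mydata[-1])            # logical matrix: TRUE where an answer is missing
rownames(miss) <- mydata$patient_no  # label rows by patient, not by row position

# Table: number of missing answers per participant
n_missing <- data.frame(patient_no = mydata$patient_no,
                        n_missing  = rowSums(miss))
print(n_missing)

# Patients with at least one missing answer
rownames(miss)[rowSums(miss) > 0]

# Pattern: how many participants share each combination of skipped
# questions ("011" means q1 answered, q2 and q3 missing)
patterns <- table(apply(miss * 1L, 1, paste, collapse = ""))
print(patterns)

# Picture: one column per question, one row per participant, black = missing
image(1:ncol(miss), 1:nrow(miss), t(miss) * 1,
      col = c("white", "black"),
      xlab = "question", ylab = "participant (row)")
```

With 4700+ participants the image() plot is usually the quickest way to spot blocks of participants who skipped the same questions, while the patterns table gives the exact counts.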
I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the other two.
I know how to do it when the columns have names, e.g. plot(data$V1, data$V2), but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data) <- c("This", "That", "Other")
plot(data$This, data$That)
That's a better solution than using column numbers, since names are meaningful, and if your data changes to have a different number of columns, position-based code may break in several places. Give your data the correct names, and as long as you always refer to data$This, your code will keep working.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The first number in the brackets refers to rows, the second to columns. Here I didn't give a first number, so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it follows matrix notation: a 4x3 matrix has 4 rows and 3 columns. Thus, when I want the element in the 1st row of the 3rd column, I can write matrix[1, 3].
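Putting positional selection together with the original goal, here is a sketch that plots column 1 against columns 2 and 3 in a single call with matplot(). The random data stands in for the real ~10000 x 3 set; note that read.table(..., header = FALSE) names the columns V1, V2, V3 by default, so data$V1 style access also works on such files:

```r
# Made-up stand-in for the ~10000 x 3 data set that has no header row
set.seed(1)
data <- as.data.frame(matrix(rnorm(30), ncol = 3))
# as.data.frame() on an unnamed matrix also gives the names V1, V2, V3

x <- data[, 1]                 # column 1, selected by position
y <- as.matrix(data[, 2:3])    # columns 2 and 3 as a two-column matrix

# Plot column 1 against columns 2 and 3 in one call;
# matplot() draws one set of points per column of y
matplot(x, y, pch = c(1, 2), col = c("black", "red"),
        xlab = "column 1", ylab = "columns 2 and 3")
legend("topright", legend = c("column 2", "column 3"),
       pch = c(1, 2), col = c("black", "red"))
```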
I have a dataset and I am trying to add across the columns. For example, say there are 50 rows and 100 columns. For each row I want to go through specific columns (not all) and add the results.
Thanks for any help!
apply(df[,c(1,5,10,11,15)],1,sum) will add columns 1,5,10,11, and 15.
rowSums is generally faster than apply(dat, 1, sum). Furthermore, both may need the additional argument na.rm = TRUE to prevent NA values from sabotaging the results.
rowSums( dat[ , cols_to_sum] , na.rm=TRUE )
If you want to have an irregular selection of columns, i.e. different columns from different rows, then that too is possible but you will need to clarify the question.
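A short sketch of the rowSums() approach on a specific subset of columns, showing what na.rm changes. The data frame, its column names, and the new column total are made up for illustration:

```r
# Made-up data with a few missing values
dat <- data.frame(
  a = c(1, 2, NA),
  b = c(10, 20, 30),
  c = c(100, NA, 300),
  d = c(1000, 2000, 3000)
)

# Columns to add, by name here; positions such as c(1, 3, 4) work the same way
cols_to_sum <- c("a", "c", "d")

# Without na.rm, any NA in a selected column makes that row's total NA
rowSums(dat[, cols_to_sum])

# With na.rm = TRUE, missing values are skipped
dat$total <- rowSums(dat[, cols_to_sum], na.rm = TRUE)
dat
```

If cols_to_sum could ever hold a single column, use dat[cols_to_sum] or dat[, cols_to_sum, drop = FALSE]; with the default drop = TRUE the selection collapses to a plain vector, and rowSums() refuses anything with fewer than two dimensions.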