I have a data frame with a large number of observations and I want to remove NA values in 1 specific column while keeping the rest of the data frame the same. I want to do this without using na.omit(). How do I do this?
We can use is.na or complete.cases to return a logical vector for subsetting:
subset(df1, complete.cases(colnm))
where colnm is the actual column name, written without quotes.
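The is.na route mentioned above could look something like this in base R. It is only a sketch: df1, colnm and other are made-up stand-ins for your own data frame and columns.

```r
# Hypothetical example data; `colnm` stands in for your real column name
df1 <- data.frame(colnm = c(1, NA, 3),
                  other = c(NA, "b", "c"))

# Keep only the rows where colnm is not NA.
# NAs in any other column are left alone, unlike na.omit().
df1_clean <- df1[!is.na(df1$colnm), ]
df1_clean
```

The first row survives even though other is NA there, because only colnm is checked.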
This is how I would do it using dplyr:
library(dplyr)
df <- data.frame(a = c(1, 2, NA),
                 b = c(5, NA, 8))
filter(df, !is.na(a))
# output
a b
1 1 5
2 2 NA
[Picture: shows the row number order]
I am trying to add a variable to my data set that represents the row number; however, every snippet I've found numbers the rows in their current order (1, 2, 3, 4, 5) rather than in the order the View option shows (129, 98, 21, 09). I need the order shown in the View option, as I am trying to merge with another data set and need the correct ("original") row number.
I cannot add row numbers before making changes to the data set as the function doesn't work when I add the ID number.
Alternatively, being able to sort the data by row number would also help, but I don't know how to do that either (clicking on the arrow above the row number does nothing).
A bit of context
I am classifying network nodes in R. I made a matrix from the network's nodes and edges (using node2vec), and have to merge this matrix with a node labels data set (it contains one variable which shows whether each node is positive or negative). The picture above shows the created matrix; the row numbers from the network data set are no longer in the original order. I need to add a variable to the matrix, which I converted to a data frame using:
netdf1 <- as.data.frame(network.node2vec)
that represents the original row number
what I tried
netdf1 <- netdf1 %>% mutate(id = row_number())
This just numbers the rows as they are currently ordered: 1, 2, 3, 4, ...
WHAT WORKED IN THE END == CORRECT ANSWER
db$ID <- rownames(db)
If I understand your question correctly, you have a data frame whose row names are not continuous, and you want those row names in an extra column as numeric values?
You can use the row.names() function and convert the result to numeric if you like:
# just creating a DF that might show what you mean:
testDF <- data.frame(x = 1:10, y = sample((1:1000), 10))
testDF <- testDF[testDF$y < 500,]
View(testDF)
# one possible way to get the row names
testDF$rowNum <- as.numeric(row.names(testDF))
Type ?sort at the console if you would like to learn about sorting vectors.
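For sorting the rows of a data frame rather than a single vector, order() is usually the tool to reach for. A minimal sketch, assuming the row names are plain numbers; the data is made up and set.seed() is only there to make it reproducible:

```r
set.seed(1)  # assumption: fixed seed just so the example repeats
testDF <- data.frame(x = 1:10, y = sample(1:1000, 10))

# Reorder by y so the row names end up out of sequence
testDF <- testDF[order(testDF$y), ]

# Copy the row names into a numeric column ...
testDF$rowNum <- as.numeric(row.names(testDF))

# ... and use order() on that column to restore the original row order
sorted <- testDF[order(testDF$rowNum), ]
sorted
```

order() returns the permutation of row indices that puts rowNum in ascending order, which is why it is used inside the row subscript.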
Let's say you have a data frame with row names that are out of order:
my_data <- data.frame(row.names = 5:1,
                      V1 = 1:5)
#> my_data
# V1
#5 1
#4 2
#3 3
#2 4
#1 5
dplyr::row_number() will add row numbers based on the current sorting, not based on the row names. (A general practice in the tidyverse is to eschew keeping useful data in the row names and to instead incorporate any sorts of row ID info into a variable.)
So you could use @user2554330's advice and add my_data$ID <- row.names(my_data), or the tidyverse equivalent, my_data %>% tibble::rownames_to_column(var = "ID"), then sort by that column.
library(dplyr)
my_data %>%
  tibble::rownames_to_column(var = "ID") %>%
  arrange(ID)
ID V1
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
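One caveat: row names are stored as text, so the ID column is a character vector and sorts as text ("10" comes before "2"). With row names above 9, as in the question, it is probably worth converting ID to a number before sorting. A sketch with made-up data:

```r
library(dplyr)

# Made-up data whose row names go above 9
my_data <- data.frame(row.names = c("10", "2", "1"),
                      V1 = 1:3)

sorted <- my_data %>%
  tibble::rownames_to_column(var = "ID") %>%
  mutate(ID = as.integer(ID)) %>%  # character -> integer so the sort is numeric
  arrange(ID)
sorted
```

Without the mutate() step, arrange(ID) would give the order "1", "10", "2".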
I have a data frame which looks like this
where b ranges from 1 to 31 and alpha_1, alpha_2 and alpha_3 can only take the values 0 and 1. For each value of b I have 1000 observations, so 31000 observations in total. I want to group the entire data set by b and count the values in the alpha columns ONLY when the value is 1. The end result would have 31 rows (one per unique b value from 1 to 31), each with the count of 1s in every alpha column.
How do I do this in R? I have tried using pipe methods in dplyr and nothing seems to work.
We can use
library(dplyr)
df1 %>%
  group_by(b) %>%
  summarise_at(vars(starts_with("alpha")), sum)
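Summing works as a count here because the alpha columns only hold 0 and 1, so each 1 adds exactly one to the total. In dplyr 1.0.0 and later, summarise_at() is superseded by across(); a sketch of the same idea, with a small made-up df1 standing in for the real 31000-row data:

```r
library(dplyr)

# Made-up stand-in: b is the grouping column, alpha_* are 0/1 flags
df1 <- data.frame(
  b       = c(1, 1, 1, 2, 2),
  alpha_1 = c(1, 0, 1, 0, 0),
  alpha_2 = c(1, 1, 1, 0, 1),
  alpha_3 = c(0, 0, 0, 1, 1)
)

counts <- df1 %>%
  group_by(b) %>%
  summarise(across(starts_with("alpha"), sum))  # sum of 0/1 values = number of 1s
counts
```

If the columns could ever contain values other than 0 and 1, replacing sum with ~ sum(.x == 1) would count the 1s explicitly.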
I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
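If the number of columns is not always 22, the split point can be computed from ncol() instead of being hard-coded. A sketch, assuming the column count is even; half is a helper name introduced here:

```r
set.seed(42)  # assumption: fixed seed so the made-up data is reproducible
d <- as.data.frame(matrix(rnorm(100 * 22), ncol = 22))

half <- ncol(d) / 2
stopifnot(half == round(half))  # the approach only makes sense for an even column count

# Column n is added to column n + half, for n = 1..half, by position only
output <- d[, seq_len(half)] + d[, half + seq_len(half)]
dim(output)
```

The result keeps the column names of the first half, since the columns are matched by position rather than by name.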
My data contains the columns trial, sequence and message. The message Onset occurs only once in each trial, but at different sequence positions in different trials.
data<-data.frame(trial=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),sequence=c(1:10,1:10,1:10),message=c(NA,NA,NA,NA,"Onset",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Onset",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Onset",NA,NA,NA))
I want to create a new column called sequence_new so that, within each trial, the row with the message Onset gets 0 in the new column, like the following:
data_n<-data.frame(trial=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),sequence=c(1:10,1:10,1:10),message=c(NA,NA,NA,NA,'Onset',NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,'Onset',NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,'Onset',NA,NA,NA),sequence_new=c(-4,-3,-2,-1,0,1,2,3,4,5,-5,-4,-3,-2,-1,0,1,2,3,4,-6,-5,-4,-3,-2,-1,0,1,2,3))
Try
library(data.table)
setDT(data)[, sequence_new:=(1:.N)-which(message=='Onset'),trial]
Or
library(dplyr)
data %>%
  group_by(trial) %>%
  mutate(sequence_new = row_number() - which(message == 'Onset'))
Or using base R
data$sequence_new <- with(data, ave(seq_along(message), trial, FUN = seq_along) -
                                ave(message == 'Onset', trial, FUN = which))
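All three versions above rely on each row's position within its trial. If sequence might not start at 1, or the rows might not be sorted, a variant that subtracts the Onset row's own sequence value may be safer. A base R sketch under that assumption; onset is a helper name introduced here, and the data is rebuilt compactly to match the question:

```r
# Same data as in the question, built compactly
data <- data.frame(
  trial    = rep(1:3, each = 10),
  sequence = rep(1:10, times = 3),
  message  = NA_character_
)
data$message[c(5, 16, 27)] <- "Onset"

# One row per trial: the sequence value at which Onset occurs
# (which() drops the NA comparisons)
onset <- data[which(data$message == "Onset"), c("trial", "sequence")]

# Look up each row's trial in `onset` and subtract that trial's Onset position
data$sequence_new <- data$sequence - onset$sequence[match(data$trial, onset$trial)]
data
```

This assumes exactly one Onset row per trial, as stated in the question; a trial without one would get NA in sequence_new.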