I need to convert a simple data.table to a named vector.
Let's say I have this data.table:
a <- data.table(v1 = c('a', 'b', 'c'), v2 = c(1,2,3))
and I want to get the following named vector:
b <- c(1, 2, 3)
names(b) <- c('a', 'b', 'c')
Is there a simple way to do this?
Using setNames() in j:
a[, setNames(v2, v1)]
# a b c
# 1 2 3
We can use split and unlist to get it as a named vector. Note that split orders the result by the sorted values of v1, not by row order.
unlist(split(a$v2, a$v1))
#a b c
#1 2 3
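For comparison, here is a minimal base R sketch of the same idea. It assumes the data sits in a plain data.frame (a_df is a hypothetical name, not from the question) rather than a data.table. setNames() pairs the values with their names in one call:

```r
# Assumption: a plain data.frame standing in for the data.table above
a_df <- data.frame(v1 = c('a', 'b', 'c'), v2 = c(1, 2, 3),
                   stringsAsFactors = FALSE)

# setNames(object, nm) returns `object` with names(object) set to `nm`
b <- setNames(a_df$v2, a_df$v1)
b
```

Unlike the split/unlist approach, this keeps the original row order.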
I want to extract a column from a dataframe in R based on a condition on another column in the same dataframe. The dataframe is given below.
b <- c(1,2,3,4)
g <- c("a", "b" ,"b", "c")
df <- data.frame(b,g)
row.names(df) <- c("aa", "bb", "cc" , "dd")
I want to extract all values of column b, as a dataframe (with rownames), where column g has the value 'b'.
My required output is given below:
df
   b
bb 2
cc 3
I have tried several methods, such as which and subset, but they did not work. I have also searched Stack Overflow for an answer to this question but could not find one. Is there a way to do it?
Thanks,
You can use the subset function in base R -
subset(df, g == 'b', select = b)
# b
#bb 2
#cc 3
Using data.table (note that a data.table does not keep row names):
library(data.table)
setDT(df, key = 'g')['b', .(b)]
b
1: 2
2: 3
Or with collapse
library(collapse)
sbt(df, g == 'b', b)
b
1 2
2 3
This is the basic way of slicing data in R:
df[df$g == 'b',]['b']
Or the tidyverse answer:
library(dplyr)
df %>%
  filter(g == 'b') %>%
  select(b)
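Since the question asks for a dataframe that keeps its rownames, a small base R sketch may help show how the bracket form can return exactly that. The variable res is a name introduced here for illustration only:

```r
b <- c(1, 2, 3, 4)
g <- c("a", "b", "b", "c")
df <- data.frame(b, g)
row.names(df) <- c("aa", "bb", "cc", "dd")

# Logical row index plus a column name; drop = FALSE prevents the
# one-column result from collapsing to a bare vector
res <- df[df$g == "b", "b", drop = FALSE]
res
```

Without drop = FALSE the same call would return a plain numeric vector and the rownames would be lost.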
I'm looking for a simple way to remove all columns between two columns in a dataframe in R.
So let's say I have a dataframe like so:
> test = data.frame('a' = 'a', 'b' = 'b', 'c'= 'c', 'd' = 'd', 'e' = 'e')
> test
a b c d e
1 a b c d e
I'd like to be able to do the following in a dplyr chain
test %>% delete_between(a,c)
>test
d e
1 d e
We can use select with a negated column range:
test %>%
select(-(a:c))
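If dplyr is not available, the same range deletion can be sketched in base R. The delete_between function below is a hypothetical helper written for this example (it is not part of dplyr or base R), and it takes column names as strings rather than bare names:

```r
test <- data.frame(a = 'a', b = 'b', c = 'c', d = 'd', e = 'e')

# Hypothetical helper (not part of dplyr): drop every column from `from`
# through `to`, inclusive, by position
delete_between <- function(df, from, to) {
  idx <- match(c(from, to), names(df))
  stopifnot(!anyNA(idx))  # both column names must exist
  df[, -seq(idx[1], idx[2]), drop = FALSE]
}

res <- delete_between(test, "a", "c")
names(res)
```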
full = data.frame(group = c('a', 'a', 'a', 'a', 'a', 'b', 'c'), values = c(1, 2, 2, 3, 5, 3, 4))
filter = data.frame(group = c('a', 'b', 'c'), values = c(4, 3, 3))
## find rows of full where values are larger than filter for the given group
full[full$group == filter$group & full$values > filter$values, ]
prints an empty data.frame with the warning:
Warning messages:
1: In full$group == filter$group :
longer object length is not a multiple of shorter object length
2: In full$values > filter$values :
longer object length is not a multiple of shorter object length
I'm looking for all the rows in full that match that criteria, to end up with:
> full
  group values
      a      5
      c      4
Using merge
full=merge(full,filter,by='group')
full=full[full$values.x>full$values.y,]
full$values.y=NULL
names(full)=c('group','values')
> full
group values
5 a 5
7 c 4
Or match
full$Filter=filter$values[match(full$group,filter$group)]
full=full[full$values>full$Filter,]
full$Filter=NULL
> full
group values
5 a 5
7 c 4
full[unlist(sapply(1:NROW(filter), function(i)
which(full$group == filter$group[i] & full$values > filter$values[i]))),]
# group values
#5 a 5
#7 c 4
Using base R functions Map, split, unlist, and logical indexing you can do
full[unlist(Map(">", split(full$values, full$group), split(filter$values, filter$group))),]
group values
5 a 5
7 c 4
Here, you split the value vectors by group into lists and feed these to Map, which applies > pairwise. As Map returns a list, unlist turns it into a logical vector, which is fed to [ for subsetting. Note that this requires that both data.frames are sorted by group and that each has the same levels in the group variable.
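Because the Map/split approach silently returns wrong rows when those requirements are not met, it may be worth checking them first. The sketch below is one possible check; map_split_is_safe is a hypothetical helper name, and it assumes group is a character vector or a factor with default (sorted) levels:

```r
full <- data.frame(group = c('a', 'a', 'a', 'a', 'a', 'b', 'c'),
                   values = c(1, 2, 2, 3, 5, 3, 4))
filter <- data.frame(group = c('a', 'b', 'c'), values = c(4, 3, 3))

# Hypothetical helper: checks the assumptions the Map/split approach relies on
map_split_is_safe <- function(full, filter) {
  fg <- as.character(full$group)
  tg <- as.character(filter$group)
  !is.unsorted(fg) &&        # rows of full are ordered by group
    !anyDuplicated(tg) &&    # exactly one threshold per group
    setequal(fg, tg)         # same groups on both sides
}

ok_sorted   <- map_split_is_safe(full, filter)
ok_reversed <- map_split_is_safe(full[nrow(full):1, ], filter)
```

When the check fails, the merge or match answers are safer because they align rows by group value rather than by position.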
One option is to use dplyr.
library(dplyr)
dt <- full %>%
left_join(filter, by = "group") %>%
dplyr::filter(values.x > values.y) %>%
select(group, values = values.x)
dt
group values
1 a 5
2 c 4
Or purrr.
library(purrr)
dt <- full %>%
split(.$group) %>%
map2_df(filter %>% split(.$group), ~.x[.x$values > .y$values, ])
dt
group values
1 a 5
2 c 4
I have two dataframes.
selectedcustomersa is a dataframe with information about 50 customers. The first column is the name (Group.1).
selectedcustomersb is another dataframe (same structure) with information about 2000 customers, and the customers from selectedcustomersa are included there.
I want selectedcustomersb without the customers from selectedcustomersa.
I tried:
newselectedcustomersb<-filter(selectedcustomersb, Group.1!=selectedcustomersa$Group.1)
One way to do this is to use anti_join from dplyr, as follows. It also works when the rows have to be matched on multiple columns.
library(dplyr)
df1 <- data.frame(x = c('a', 'b', 'c', 'd'), y = 1:4)
df2 <- data.frame(x = c('c', 'd', 'e', 'f'), z = 1:4)
df <- anti_join(df2, df1)
df
x z
1 e 3
2 f 4
Try:
newselectedcustomersb <- filter(selectedcustomersb, !(Group.1 %in% selectedcustomersa$Group.1))
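To see why the original != attempt fails while %in% works, here is a small sketch with made-up customer names (the data below is hypothetical, only the column name Group.1 comes from the question). != compares the two vectors position by position and recycles the shorter one, whereas %in% tests membership:

```r
# Hypothetical miniature versions of the two customer tables
selectedcustomersa <- data.frame(Group.1 = c("Ann", "Bob"),
                                 stringsAsFactors = FALSE)
selectedcustomersb <- data.frame(Group.1 = c("Bob", "Ann", "Cid", "Dee"),
                                 stringsAsFactors = FALSE)

# != compares position by position, recycling the shorter vector:
# "Bob" != "Ann", "Ann" != "Bob", "Cid" != "Ann", "Dee" != "Bob"
elementwise <- selectedcustomersb$Group.1 != selectedcustomersa$Group.1

# %in% asks, for each customer in b, whether it occurs anywhere in a
membership <- !(selectedcustomersb$Group.1 %in% selectedcustomersa$Group.1)

newselectedcustomersb <- selectedcustomersb[membership, , drop = FALSE]
newselectedcustomersb
```

In this toy data the elementwise comparison is TRUE for every row, so it would keep Ann and Bob even though both appear in selectedcustomersa.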
I have a question very similar to a previous one but I am unable to generalize it to my case.
I have data that looks sort of like this
Within each ID, I have several Vis rows. Only a and b are of interest to me. For each column in the data (V1...V7), if a is present then b is present too, and wherever a has a value, b is missing, and vice versa. I would like to combine the a and b rows within each ID group into a single row (labelled a, b, or something new; it doesn't really matter) with no missing data in any of the columns.
Based on the image shown, maybe this helps. Here I am using actual NAs with only a couple of V columns.
We create a numeric index ('nm1') of the columns whose names start with 'V' followed by digits. After converting the 'data.frame' to a 'data.table' (setDT(df1)) and grouping by 'ID', we use Map to loop over the 'V' columns (.SD[, -1, with=FALSE]) together with the 'Vis' column (.SD[[1]]). In each 'V' column, the elements where 'Vis' is either 'a' or 'b' are replaced by the non-NA element (na.omit(x[y %in% c('a', 'b')])), and the output is assigned back to the columns indexed by 'nm1'.
library(data.table)
nm1 <- grep('V\\d+',colnames(df1))
setDT(df1)[, (nm1):= Map(function(x,y)
replace(x, which(y %in% c('a', 'b')), na.omit(x[y %in% c('a', 'b')])),
.SD[,-1, with=FALSE], list(.SD[[1]])), ID]
We change the 'b' values to 'a'
df1[Vis=='b', Vis := 'a']
and get the unique rows
unique(df1)
# ID Vis V1 V2
#1: 2 a 1 2
#2: 2 c 4 5
#3: 3 a 3 4
#4: 4 a 2 3
#5: 4 c 3 4
#6: 4 d 1 1
data
df1 <- data.frame(ID= rep(c(2,3,4), c(3,2,4)), Vis=c('a', 'b', 'c', 'a',
'b', 'a', 'b', 'c', 'd'), V1= c(1, NA, 4, 3, NA, NA, 2, 3, 1),
V2= c(NA, 2, 5, 4, NA, 3, NA, 4, 1), stringsAsFactors=FALSE)
Just sum the values you need while removing NAs. There are more vectorized ways to do this, but the for loop is a bit clearer.
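for (I in unique(df1$ID)) {
  # Rows for this ID whose Vis is a or b
  is_ab <- df1$ID == I & df1$Vis %in% c("a", "b")
  # Sum each V column over those rows, ignoring NAs
  new_row <- colSums(df1[is_ab, -(1:2), drop = FALSE], na.rm = TRUE)
  # Drop only the a/b rows (keep c, d, ...) and append the combined row as Vis "a"
  df1 <- rbind(df1[!is_ab, ],
               data.frame(ID = I, Vis = "a", as.list(new_row), stringsAsFactors = FALSE))
}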
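As a rough illustration of the "more vectorized" route mentioned above, the same sum-while-ignoring-NAs idea can be sketched with aggregate() from base R. This is only one possible variant; the names ab, combined and result are introduced here, and labelling the merged row "a" is an arbitrary choice:

```r
df1 <- data.frame(ID = rep(c(2, 3, 4), c(3, 2, 4)),
                  Vis = c('a', 'b', 'c', 'a', 'b', 'a', 'b', 'c', 'd'),
                  V1 = c(1, NA, 4, 3, NA, NA, 2, 3, 1),
                  V2 = c(NA, 2, 5, 4, NA, 3, NA, 4, 1),
                  stringsAsFactors = FALSE)

ab <- df1$Vis %in% c("a", "b")

# One summed row per ID over the a/b rows; na.rm = TRUE is passed on to sum()
combined <- aggregate(df1[ab, c("V1", "V2")],
                      by = list(ID = df1$ID[ab]),
                      FUN = sum, na.rm = TRUE)
combined$Vis <- "a"  # label for the merged row (arbitrary choice)

# Put the untouched rows (c, d, ...) back and restore the ID order
result <- rbind(df1[!ab, ], combined[, names(df1)])
result <- result[order(result$ID, result$Vis), ]
result
```

One caveat of summing: if both a and b are NA in some column, sum(..., na.rm = TRUE) yields 0 rather than NA, so this only matches the intent when at least one of the two values is present.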