How to subset a dataframe based on one column level? [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I would like to subset a data frame based on one level of a column, i.e. keep only the rows whose value in that column equals this level.
For this example I want a data frame with all rows that meet the criterion "blue" in column "D", without losing any of the other columns. Any approach is fine: subset, filter, etc.
A  B  C  D        E
1  2  3  "blue"   8
7  4  6  "red"    5
5  9  1  "green"  2
I have tried variations of the following:
newdf = subset(df, D == "blue")
newdf = subset(df, levels(D) == "blue")

This should work:
newdf = df[df$D == "blue", ]
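As a minimal sketch of the answer above, the following rebuilds the example data and applies the row filter. The data frame is reconstructed from the question, and `stringsAsFactors = TRUE` is an assumption made so that column D is a factor with levels. Note that `levels(D)` returns the set of levels rather than one value per row, so comparing it to "blue" does not test the rows themselves.

```r
# Example data reconstructed from the question (assumed to be a factor column D)
df <- data.frame(
  A = c(1, 7, 5),
  B = c(2, 4, 9),
  C = c(3, 6, 1),
  D = c("blue", "red", "green"),
  E = c(8, 5, 2),
  stringsAsFactors = TRUE
)

# Keep every column, but only the rows where D equals "blue"
newdf <- df[df$D == "blue", ]

# which() drops rows where the comparison is NA instead of returning NA rows
newdf_na_safe <- df[which(df$D == "blue"), ]

# Optionally drop the factor levels that no longer occur in the subset
newdf <- droplevels(newdf)
```

The `droplevels()` step is optional; without it the subset still carries "green" and "red" as unused levels.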

Related

R: how to sample rows with custom frequencies [closed]

Closed 5 years ago.
I have a data frame in R with two columns: one with last names, the other with the frequency of each last name. I would like to randomly select last names according to the frequency values (between 0 and 1).
So far I have tried the sample function, but it doesn't seem to allow a specific frequency for each value. I'm not sure if this is possible.
df1 <- data.frame(names = c("John","Mary"),freq=c(0.2,0.8))
df1
# names freq
# 1 John 0.2
# 2 Mary 0.8
set.seed(1)
sample100 <- sample(
  x = df1$names,
  size = 100,
  replace = TRUE,
  prob = df1$freq)
table(sample100)
# sample100
# John Mary
# 17 83
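If whole rows are wanted rather than just the names, one possible variation is to sample row indices with the same `prob` weights and then index the data frame. This is a sketch under that assumption; `idx`, `sampled` and `observed` are illustrative names that do not appear in the original.

```r
df1 <- data.frame(
  names = c("John", "Mary"),
  freq = c(0.2, 0.8),
  stringsAsFactors = FALSE
)

set.seed(1)

# Draw 100 row indices, each row weighted by its freq value
idx <- sample(seq_len(nrow(df1)), size = 100, replace = TRUE, prob = df1$freq)

# Index the data frame to get the sampled rows with all columns
sampled <- df1[idx, ]

# Observed share of each name in the sample
observed <- prop.table(table(sampled$names))
```

The weights passed to `prob` are used as relative weights, so they do not have to sum to one.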

First row not detected with R [closed]

Closed 5 years ago.
I have my data in an xls file. I try to read it like this:
> df = read.xls ("natgas.xls")
Output
df
Dec.2007 X2399154
1 Jan-2008 2733970
2 Feb-2008 2503421
3 Mar-2008 2278151
4 Apr-2008 1823867
5 May-2008 1576387
6 Jun-2008 1604249
7 Jul-2008 1708641
8 Aug-2008 1682924
9 Sep-2008 1460924
10 Oct-2008 1635827
Everything is OK, except the first line.
When I index the second column,
> df[,2]
[1] 2733970 2503421 2278151 1823867 1576387 1604249 1708641 1682924 1460924
the first value is missing.
How can I solve this?
Looks like you need to add header = FALSE to your read.xls call (which seems to come from the gdata package):
df1 <- read.xls("natgas.xls", header = FALSE)
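The effect of the header argument can be reproduced without the xls file. The sketch below uses base R's `read.csv` on inline text instead of `read.xls`, which is a swapped-in stand-in for illustration only; the two data lines are taken from the output shown in the question, with the date written as Dec-2007 on the assumption that this is how it appears in the file.

```r
# Two lines of data with no header row (values taken from the question)
raw <- "Dec-2007,2399154\nJan-2008,2733970"

# With header = TRUE the first data row is consumed as column names
with_header <- read.csv(text = raw, header = TRUE)

# With header = FALSE every row is kept and default names V1, V2 are assigned
no_header <- read.csv(text = raw, header = FALSE)

names(with_header)  # the first row, converted to syntactically valid names
nrow(with_header)   # one row fewer than the file contains
nrow(no_header)     # all rows
```

This is why the question's output shows `Dec.2007` and `X2399154` as column names: the first row was read as a header and then adjusted to be valid R names.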

How to add a column with constant observation and another variable with consecutive numbers with a character in R [closed]

Closed 6 years ago.
I want to add a first column of consecutive labels (a character prefix followed by a number) to an existing data frame.
I used the following code, but it does not work:
df$VARNAME_ <- paste0('COL', 1:5)(df)
I want it to look like this:
VARNAME_  old_var1  old_var2
COL1      1         2
COL2      1         2
COL3      1         2
COL4      1         2
COL5      1         2
Thanks in advance.
Sorry for the simple question; I have now figured it out.
The solution is as follows:
actual_df <- data.frame(df)  # convert the matrix to a data frame
actual_df <- cbind(VARNAME_ = paste0('COL', 1:5), actual_df)  # add COL1 to COL5 as the first column
actual_df <- cbind(ROWTYPE_ = 'PROX', actual_df)  # add a constant column in front; VARNAME_ is now the second column
This will also work, although it appends VARNAME_ as the last column rather than the first:
df$VARNAME_ = paste0('COL', 1:5)
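Putting the two answers together, a minimal sketch might look like this. The example data frame is assumed from the desired output in the question, and `seq_len(nrow(df))` replaces the hard-coded `1:5` so the labels follow the number of rows.

```r
# Example data assumed from the desired output in the question
df <- data.frame(old_var1 = rep(1, 5), old_var2 = rep(2, 5))

# Build labels COL1..COLn, where n is the number of rows
df$VARNAME_ <- paste0("COL", seq_len(nrow(df)))

# Move VARNAME_ to the front, keeping the other columns in their order
df <- df[, c("VARNAME_", setdiff(names(df), "VARNAME_"))]

# Add a constant column in front; the single value is recycled for every row
df <- cbind(ROWTYPE_ = "PROX", df)
```

The original attempt, `paste0('COL', 1:5)(df)`, fails because it tries to call the character vector returned by `paste0` as if it were a function.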

How can I subset based on multiple criteria? [closed]

Closed 7 years ago.
I have a dataframe of this nature:
id  year  levels
A   1967  cat
B   1965  dog
C   1980  cat
A   1989  dog
B   1990  mouse
C   2010  pig
I want to subset once, using all of these criteria at the same time:
1. id = A
2. year > 1980
3. levels = dog
I know how to do subset(df, year>1980), but I don't know how to combine these criteria.
When I do this,
sub<-subset(all,year>1980 & id == 'A' & levels == 'dog')
I get an empty data frame.
You can try:
df[df$id == "A" & df$year > 1980 & df$levels == "dog",]

WeiRd: R does not find value but it's just there [closed]

Closed 2 years ago.
I am trying to merge two data frames on a variable called hash_id. For some reason R does not recognize the hash_id values in one of the data frames, although it does in the other.
I have checked and I just don't get it. This is how I checked:
> head(df1[46],1) # so I take the first 'hash-id' from df1
# hash_id
# 1 abab123123
> which(df2 == "abab123123", arr.ind=TRUE) # here it shows that row 6847 contains a match
# row col
# [1,] 6847 32
> which(df1 == "abab123123", arr.ind=TRUE) # and here there is NO matching value!
# row col
#
One possibility is trailing or leading spaces in the relevant column of one of the datasets. You could do:
library(stringr)
df1[, "hash_id"] <- str_trim(df1[,"hash_id"])
df2[, "hash_id"] <- str_trim(df2[, "hash_id"])
which(df1[, "hash_id"]=="abab123123", arr.ind=TRUE)
which(df2[, "hash_id"]=="abab123123", arr.ind=TRUE)
Another way would be to use grepl:
grepl("\\babab123123\\b", df1[,"hash_id"])
grepl("\\babab123123\\b", df2[,"hash_id"])
