Data.frame filtering using %in% [closed] - r

I have the following data.frame:
qualifiers symbols values
1 Buy AAPL 326.0
2 Sell MSFT 598.3
3 Sell GOOGL 201.5
I want to keep only the rows where qualifiers is "Sell", and then remove the qualifiers column.
So the new data.frame would be:
symbols values
1 MSFT 598.3
2 GOOGL 201.5
Here is what I've tried:
# Select the rows with "Sell" qualifier
valid_symbols <- df$symbols[df$qualifiers == "Sell"]
# Keep only these
df <- df[df$symbols %in% valid_symbols]
# Remove qualifiers column
df$qualifiers <- NULL
The first line works as expected:
> valid_symbols
[1] MSFT GOOGL
Levels: AAPL GOOGL MSFT
But the second line doesn't:
> df
symbols values
1 AAPL 326.0
2 MSFT 598.3
3 GOOGL 201.5
It seems like it is filtering by column instead of by row.
So I wonder:
What is wrong with my code?
Is there a more efficient/elegant way to achieve what I want?

The reason the code is not working is that the comma is missing. Without the comma, df[...] interprets the single index as a column selection (column positions or names) rather than a row selection.
df <- df[df$symbols %in% valid_symbols,]
#OP's code
df$qualifiers <- NULL
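To see the difference the comma makes, here is a small sketch with a made-up three-column data frame (d is hypothetical, not part of the question):
d <- data.frame(a = 1:3, b = 4:6, c = 7:9)
d[c(TRUE, FALSE, TRUE)]    # no comma: the logical vector selects columns a and c
d[c(TRUE, FALSE, TRUE), ]  # comma: the logical vector selects rows 1 and 3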
If the non-numeric columns are factors, we may also need to wrap the result in droplevels to remove the unused levels in those columns:
df <- droplevels(df)
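As a quick sketch of what droplevels changes, recreating the question's data with factor columns (the default behaviour before R 4.0):
df <- data.frame(qualifiers = c("Buy", "Sell", "Sell"),
                 symbols = c("AAPL", "MSFT", "GOOGL"),
                 values = c(326.0, 598.3, 201.5),
                 stringsAsFactors = TRUE)
sold <- df[df$qualifiers == "Sell", ]
levels(sold$symbols)              # "AAPL" "GOOGL" "MSFT" -- "AAPL" is still a level
levels(droplevels(sold)$symbols)  # "GOOGL" "MSFT"        -- unused level removed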
Alternatively, the filtering and the column removal can be done in one step with subset:
subset(df, qualifiers == "Sell", select = -1)
Or with dplyr's filter:
library(dplyr)
df %>%
  filter(qualifiers == "Sell") %>%
  select(2:3)
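The positional select(2:3) depends on the column order; selecting by name is a slightly more robust variation of the same pipeline:
df %>%
  filter(qualifiers == "Sell") %>%
  select(-qualifiers)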

Related

Selecting column in dataframe returns NULL [closed]

I am trying to access a column in my dataframe using the dataframe$column format, but it returns NULL. What am I doing wrong? Please help.
As you can see from the output, you don't have a column called Ozone; the only column you have is called V1. You will have to split the data in V1 into columns, which can be done using tidyr's separate, like so:
Data:
df <- data.frame(
  V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
         "41,190,7.4,67,5,1")
)
First, get your column names:
col_names <- unlist(strsplit(df$V1[1], ","))
The column names are now stored in a vector:
col_names
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Now transform df:
library(dplyr)
library(tidyr)
df %>%
  # first rename the col to be transformed:
  rename("Ozone,Solar.R,Wind,Temp,Month,Day" = V1) %>%
  # remove the first row, which is now redundant:
  slice(2:nrow(.)) %>%
  # separate into columns using the `col_names`:
  separate(1, into = col_names, sep = ",")
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
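One caveat: separate() leaves the new columns as character. If numeric columns are wanted, the convert argument handles that; a condensed sketch of the same idea (the cosmetic rename step is dropped here):
library(dplyr)
library(tidyr)
df %>%
  slice(-1) %>%                        # drop the row that holds the header text
  separate(V1, into = col_names, sep = ",",
           convert = TRUE)             # convert = TRUE turns "41" into 41, "7.4" into 7.4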

Dividing all integers of a column in R [closed]

I am trying to divide all integers in a column by another integer. I have a database with a column that has integers that go above 1*10^20. Because of this my plots are way too big. I need to normalize the data to have a better understanding of what is going on. For example, the data that I have:
  x day   Amount
1 1   1 23440100
2 2   2 41231020
3 3   3 32012010
I am using a data.frame for my own data, so here is the data frame for the data above:
x <- c(1,2,3)
day <- c(1,2,3)
Amount <- c(23440100, 41231020, 32012010)
my.data <- data.frame(x, day, Amount)
I tried using another answer, provided here, but that doesn't seem to work.
The code that I tried:
test <- my.data[, 3]/1000
Hope someone can help me out! Cheers, Chester
I guess you are looking for this?
my.data$Amount <- my.data$Amount/1000
such that
> my.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
Use mutate from dplyr
Since you're using a data.frame, you can use this simple code:
library(dplyr)
mutated.data <- my.data %>%
  mutate(Amount = Amount / 1000)
> mutated.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
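If more than one column ever needs the same rescaling, dplyr's across() (dplyr 1.0 or later) applies the division to several columns in one mutate call. A sketch only, where Amount2 is a made-up second column added to show the pattern:
library(dplyr)
my.data$Amount2 <- my.data$Amount * 2   # hypothetical extra column, just for the example
my.data %>%
  mutate(across(c(Amount, Amount2), ~ .x / 1000))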
Hope this helps.

Recode 2 variables to one in one line [closed]

Say I have a DF like:
df=data.frame(a=c(0,0,1,1),b=c(0,1,0,1))
except that it has a large number of rows. I'd like to create a column that depends on the simultaneous values of a and b, e.g.:
df
a b c
0 0 10
0 1 11
1 0 12
1 1 13
I take it this can be done with inner joins, using sqldf or maybe dplyr; is there a quicker way, with or without libraries?
Thanks in advance, p
You could do:
library(dplyr)
df %>% mutate(newcol = paste0(a, b))
The exact form depends on how you want the new column to be labelled.
If you have a vector of desired values, let's call it lookup:
lookup <- 10:100
df %>% mutate(newcol = lookup[as.factor(paste0(a, b))])
I think what you mean is that you have some other data frame (say called dictionary) with a c column, and you want to look up each (a, b) in the dictionary and grab the corresponding c from there?
df=data.frame(a=c(0,0,1,1),b=c(0,1,0,1))
dictionary <- df
dictionary$c <- 10:13
dictionary <- dictionary[sample(4), ] # shuffle it just to prove it works
In that case you can do
merge(df, dictionary, by=c('a', 'b'), all.x=TRUE)
That will grab the matching c column from dictionary and plonk it into df. The all.x=TRUE will put an NA there if there is no matching (a, b) in dictionary.
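The same lookup can also be written with dplyr's left_join, which behaves like merge(..., all.x = TRUE); a sketch using the dictionary defined above:
library(dplyr)
df %>%
  left_join(dictionary, by = c("a", "b"))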
If speed becomes an issue, you might try data.table
library(data.table)
setDT(df) # convert to data.table
setDT(dictionary) # convert to data.table
# set key
setkey(df,a,b)
setkey(dictionary,a,b)
# merge
dictionary[df] # will be `df` with the `c` column added, `NA` if no match
Super cheaty and only applicable to this example but:
df$c <- 10 + df$b + df$a*2
otherwise, look at ?merge

R programming- find lowest value [closed]

I've just started learning R. I wanted to know how I can find the lowest value in one column for each unique value in another column. For example, in this case I want to know the lowest average price per year.
I have a data frame with about 7 columns, 2 of them being average price and year. The year is obviously recurrent and ranges from 2000 to 2009. The data also has various NAs in different columns.
I have very little idea about how to do this with a loop or otherwise.
Thank you :)
my data set looks something like this:
avgprice year
332 2002
NA 2009
5353 2004
1234 NA and so on.
To break down my problem: I want to find the five lowest values for each year from 2000-2004.
s<-subset(tx.house.sales,na.rm=TRUE,select=c(avgprice,year)
s2<-subset(s,year==2000)
s3<-arrange(s2)
tail(s2,5)
I know the code fails miserably. I wanted to first subset my data frame to the avgprice and year columns, then sort it for each year through 2000-2004, and finally print the lowest five using tail(). However, I also want to ignore the NAs.
You could try
aggregate(averageprice~year, df1, FUN=min)
Update
If you need to get the 5 lowest "averageprice" values per "year":
library(dplyr)
df1 %>%
  group_by(year) %>%
  arrange(averageprice) %>%
  slice(1:5)
Or you could use rank in place of arrange
df1 %>%
  group_by(year) %>%
  filter(rank(averageprice, ties.method='min') %in% 1:5)
This could also be done with aggregate, but the 2nd column will be a list:
aggregate(averageprice~year, df1, FUN=function(x)
  head(sort(x),5), na.action=na.pass)
data
set.seed(24)
df1 <- data.frame(year=sample(2002:2008, 50, replace=TRUE),
                  averageprice=sample(c(NA, 80:160), 50, replace=TRUE))
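For reference, the per-year minimum can also be computed in base R with tapply, removing the NAs explicitly (a sketch using the simulated df1 above):
tapply(df1$averageprice, df1$year, FUN = min, na.rm = TRUE)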

Merge 2 columns into one in dataframe [closed]

This should be simple, but I am struggling with it.
I want to combine two columns in a single dataframe into one. I have separate columns for customer ID (20227) and year (2009). I want to create a new column that has both (2009_20227).
You could use paste
transform(dat, newcol=paste(year, customerID, sep="_"))
Or use interaction
dat$newcol <- as.character(interaction(dat,sep="_"))
data
dat <- data.frame(year=2009:2013, customerID=20227:20231)
An alternative way, using the unite function from tidyr:
library(tidyr)
df = data.frame(year=2009:2013, customerID=20227:20231) # using akrun's data
unite(df, newcol, c(year, customerID), remove=FALSE)
# newcol year customerID
#1 2009_20227 2009 20227
#2 2010_20228 2010 20228
#3 2011_20229 2011 20229
#4 2012_20230 2012 20230
#5 2013_20231 2013 20231
Another alternative (using @akrun's example data):
dat <- data.frame(year=2009:2013, customerID=20227:20231)
dat$newcol <- paste(dat$year, dat$customerID, sep="_")
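If the combined key ever needs a fixed numeric format, sprintf is another base R option; a small sketch using the same data:
dat <- data.frame(year = 2009:2013, customerID = 20227:20231)
dat$newcol <- sprintf("%d_%d", dat$year, dat$customerID)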
