How do I indicate/select a certain column in tibble? [duplicate] - r

This question already has answers here:
Subset / filter rows in a data frame based on a condition in a column
(3 answers)
Closed 2 years ago.
  position price              model                url
     <int> <chr>              <chr>                <chr>
1        1 "\nab 1.699,00 €~  "\nGROUND CONTROL\n" NA
2        2 "\nab 1.999,00 €~  "\nROOT MILLER\n"    NA
3        3 "\nab 3.099,00 €~  "\nPIKES PEAK\n"     NA
4        4 "\n"               "\nTHE BRUCE\n"      NA
5        5 "\n"               "\nCOUNT SOLO\n"     NA
6        6 "\nab 1.849,00 €~  "\nPSYCHO PATH\n"    NA
7        7 "\nab 2.599,00 €~  "\nTHRILL HILL\n"    NA
8        8 "\nab 2.899,00 €~  "\nTHRILL HILL TRAI~ NA
9        9 "\nab 2.149,00 €~  "\nSOUL FIRE\n"      NA
This is a 33x4 tibble I created. I would like to drop every row that has no price information, e.g. the 4th and 5th rows (they have no data because the product is not for sale). I thought of using filter or subset with the condition nchar(...) != 0, but I am having trouble referring to that column. Can you help me?

We can use a comparison operator to keep only the rows where the 'price' column is not equal to the string "\n":
df2 <- subset(df1, price != "\n")
The nchar() test does not work here because "\n" counts as one character, so nchar is never 0 for these rows:
nchar("\n")
#[1] 1
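If some prices contain other whitespace-only strings (an assumption; the sample only shows "\n"), trimming before testing is a slightly more general variant of the same idea. A minimal base R sketch, using a made-up three-row `df1` rather than the real 33-row tibble:

```r
# Hypothetical stand-in for the scraped tibble; only three rows are reproduced.
df1 <- data.frame(
  position = 1:3,
  price = c("\nab 1.699,00 \u20ac\n", "\n", "\n"),
  model = c("\nGROUND CONTROL\n", "\nTHE BRUCE\n", "\nCOUNT SOLO\n"),
  stringsAsFactors = FALSE
)

# trimws() strips leading and trailing whitespace, including "\n";
# nzchar() is TRUE only for strings that are still non-empty afterwards.
df2 <- df1[nzchar(trimws(df1$price)), ]
```

Only the first row survives, because its price still contains text once the newlines are trimmed.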

Related

How to turn characters into numbers from a column in a dataset? [duplicate]

This question already has answers here:
Test if a vector contains a given element
(8 answers)
Replace logical values (TRUE / FALSE) with numeric (1 / 0)
(7 answers)
Closed 1 year ago.
I have a dataframe in the following form:
  Date     equity company  press Categorization  Year Month Event greenwashing
  <chr>     <dbl> <chr>    <chr> <chr>          <dbl> <chr> <dbl> <chr>
1 07/30/21   153. JPMorgan NA    NA                NA NA       NA 0
2 07/29/21   153  JPMorgan NA    NA                NA NA       NA 0
3 07/28/21   152. JPMorgan NA    NA                NA NA       NA 0
4 07/27/21   151. JPMorgan NA    NA                NA NA       NA 0
5 07/26/21   152. JPMorgan NA    NA                NA NA       NA 0
6 07/23/21   151. JPMorgan NA    NA                NA NA       NA 0
In the column 'greenwashing' some values are character strings such as "The Guardian" or "Financial Times", among others. I need to turn these strings into 1.
I already tried to define a word list and use an if/else statement:
word.list = c("Financial Times",
              "The Guardian",
              "Mena Report",
              "States News Service",
              "US Newshire",
              "DeSmogBlog",
              "PR Newshire",
              "The New York Times")
if(word.list){
  print("1")
}
We can do this by testing membership with %in% and converting the resulting logical vector to integer with the unary +:
df$greenwashing <- +(df$greenwashing %in% word.list)
Using tidyverse:
Let's say your data frame is df. Working off your attempt:
library(tidyverse)
df <- mutate(df, greenwashing = ifelse(greenwashing %in% word.list, 1, greenwashing))
You can try -
df$greenwashing <- as.integer(df$greenwashing %in% word.list)
This will change all the word.list values to 1 and rest to 0
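To see the conversion on a small case, here is a sketch with a made-up three-row data frame and a shortened `word.list` (both are assumptions for illustration, not the asker's data). Note that the `ifelse()` variant above keeps non-matching values as they are, so that column stays character, whereas the `%in%` approaches produce an integer column.

```r
# Hypothetical sample; the source names in 'greenwashing' are stored as text.
df <- data.frame(
  company = c("JPMorgan", "JPMorgan", "JPMorgan"),
  greenwashing = c("0", "The Guardian", "Financial Times"),
  stringsAsFactors = FALSE
)
word.list <- c("Financial Times", "The Guardian")

# %in% returns TRUE/FALSE per row; as.integer() maps TRUE to 1 and FALSE to 0.
df$greenwashing <- as.integer(df$greenwashing %in% word.list)
```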

Paste date in new column if condition is true in another R [duplicate]

This question already has an answer here:
Replace value using index [R]
(1 answer)
Closed 2 years ago.
I want to extract the date from a variable if the condition in another variable is true.
Example: if comorbidity1==10, extract the date from smr_01, otherwise NA
I also need to do this for the case where comorbidity1==11 OR comorbidity1==12: extract the date from smr_01, otherwise NA.
This is what I want my data to look like:
comorbidity1   smr_01  NewDate
           1 20120607       NA
          10 20120607 20120607
          10 20120613 20120613
           3 20121103       NA
           6 20150607       NA
          12 20140509       NA
          11 20120405       NA
I have tried this:
fulldata$NewDate<-ifelse(fulldata$comorbidity1==10, fulldata$smr_01, NA)
but the date does not come out in the correct format.
What I get looks like this:
comorbidity1   smr_01 NewDate
           1 20120607      NA
          10 20120607    4675
          10 20120613   17856
           3 20121103      NA
           6 20150607      NA
          12 20140509      NA
          11 20120405      NA
smr_01 has class Date.
Thank you
Try:
df$NewDate <- as.Date(NA)
inds <- df$comorbidity1 == 10
# For more than one value use %in%
# inds <- df$comorbidity1 %in% 10:12
df$NewDate[inds] <- df$smr_01[inds]
df
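The numbers in the asker's output appear because `ifelse()` does not keep the Date class of its inputs and returns the underlying day counts instead. The sketch below contrasts the two approaches on made-up data (the data frame is hypothetical, mirroring the question's columns):

```r
# Hypothetical data mirroring the question; smr_01 is stored as Date.
df <- data.frame(
  comorbidity1 = c(1, 10, 10, 3, 12),
  smr_01 = as.Date(c("2012-06-07", "2012-06-07", "2012-06-13",
                     "2012-11-03", "2014-05-09"))
)

# ifelse() returns a plain numeric vector: the Date class is lost.
lost <- ifelse(df$comorbidity1 == 10, df$smr_01, NA)
class(lost)

# Indexed assignment into a column created as Date keeps the class.
df$NewDate <- as.Date(NA)
inds <- df$comorbidity1 %in% 10
df$NewDate[inds] <- df$smr_01[inds]
class(df$NewDate)
```

Creating the column with `as.Date(NA)` first is what fixes its class; the later assignment only fills in the selected rows.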

how to extract the value from multiple columns in a specific order [duplicate]

This question already has answers here:
Get Value of last non-empty column for each row [duplicate]
(3 answers)
Closed 4 years ago.
I have this dataset that contains variables from three previous years.
data <- read.table(text="
a 2015 2016 2017
1 100 100 100
2 1000 5 NA
3 10000 NA NA", header=TRUE)
I would like to create a new column in my data which contains the value from the most recent year. The order is 2017 ->2016 ->2015.
output <- read.table(text="
a 2015 2016 2017 recent
1 100 100 100 100
2 1000 5 NA 5
3 10000 NA NA 10000", header=TRUE)
I know that I can use "if" command to achieve it, but I am wondering if there is a quick and simple way to do it.
Thanks!
Here's a simple base R solution. It assumes that the year columns are ordered from oldest to most recent, left to right, so the last non-NA value in each row is the most recent one.
data$recent <- apply(data, 1, function(x) tail(na.omit(x), 1))
  a X2015 X2016 X2017 recent
1 1   100   100   100    100
2 2  1000     5    NA      5
3 3 10000    NA    NA  10000
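Because `apply()` is given the whole data frame, the id column `a` is also a candidate value; that is harmless here only because the year columns come after it and each row has at least one non-NA year. A slightly more defensive sketch restricts the search to the year columns (`year_cols` is a name introduced here for illustration):

```r
data <- read.table(text = "
a 2015 2016 2017
1 100 100 100
2 1000 5 NA
3 10000 NA NA", header = TRUE)

# read.table() prefixes the numeric headers with 'X' (X2015, X2016, X2017).
year_cols <- grep("^X[0-9]{4}$", names(data), value = TRUE)

# For each row, drop NAs and keep the last (right-most) remaining value.
data$recent <- apply(data[year_cols], 1, function(x) tail(na.omit(x), 1))
```

If a row could have NA in every year column, this would need extra handling, since `tail()` would then return a zero-length result for that row.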

How to express a variable as a function of 2 others in a dataframe composed of 3 vectors

I know this is basic, but I can't find the trick...
Here is an example:
Species <- c("dark frog",rep(c("elephant","tiger","boa"),3),"black mamba")
Year <- c(rep(2011,4),rep(2012,3),rep(2013,4))
Abundance <- c(2,4,5,6,9,2,1,5,6,8,4)
df <- data.frame(Species, Year, Abundance)
I would like to obtain another data frame (3 rows x 5 columns) holding the abundance values, with the species as column names (each species appearing only once) and the years as row names (each year also appearing only once).
Could someone help me, please?
You mean something like this?
> xtabs(Abundance~Year+Species, data=df)
      Species
Year   black mamba boa dark frog elephant tiger
  2011           0   6         2        4     5
  2012           0   1         0        9     2
  2013           4   8         0        5     6
The class for the above is a table, so if you prefer a data.frame instead, you can try:
library(tidyr)
new.df <- spread(df, key = Species, value = Abundance)
  Year black mamba boa dark frog elephant tiger
1 2011          NA   6         2        4     5
2 2012          NA   1        NA        9     2
3 2013           4   8        NA        5     6
If you want 0s instead of NAs, add the following line:
new.df[is.na(new.df)] <- 0
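If an extra package is not wanted, one possible base R route is to convert the `xtabs()` result directly with `as.data.frame.matrix()`. This is a sketch of an alternative, not part of the original answer; it keeps the years as row names and already contains 0 for absent combinations.

```r
Species <- c("dark frog", rep(c("elephant", "tiger", "boa"), 3), "black mamba")
Year <- c(rep(2011, 4), rep(2012, 3), rep(2013, 4))
Abundance <- c(2, 4, 5, 6, 9, 2, 1, 5, 6, 8, 4)
df <- data.frame(Species, Year, Abundance)

# xtabs() sums Abundance per Year/Species pair; absent pairs become 0.
tab <- xtabs(Abundance ~ Year + Species, data = df)

# as.data.frame.matrix() keeps the wide layout (years as row names),
# unlike as.data.frame(), which would return the table in long form.
wide <- as.data.frame.matrix(tab)
```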

cross sectional sub-sets in data.table

I have a data.table which contains multiple columns and is well represented by the following:
library(data.table)
DT <- data.table(date = as.IDate(rep(c("2012-10-17", "2012-10-18", "2012-10-19"), each=10)),
                 session = c(1,2,3), price = c(10, 11, 12,13,14),
                 volume = runif(30, min=10, max=1000))
I would like to extract a multi-column table that shows the volume traded at each price in a particular type of session, with each column representing a date.
At present, I extract this data one date at a time using the following:
DT[session==1,][date=="2012-10-17", sum(volume), by=price]
and then bind the columns.
Is there a way of obtaining the end product (a table with each column referring to a particular date) without sticking all the single queries together, as I'm currently doing?
Thanks
Does the following do what you want?
It uses a combination of reshape2 and data.table:
library(reshape2)
.DT <- DT[,sum(volume),by = list(price,date,session)][, DATE := as.character(date)]
# reshape2 for casting to wide -- it doesn't seem to like IDate columns, hence
# the character DATE column
dcast(.DT, session + price ~ DATE, value.var = 'V1')
   session price 2012-10-17 2012-10-18 2012-10-19
1        1    10   308.9528   592.7259         NA
2        1    11   649.7541         NA   816.3317
3        1    12         NA   502.2700   766.3128
4        1    13   424.8113   163.7651         NA
5        1    14   682.5043         NA   147.1439
6        2    10         NA   755.2650   998.7646
7        2    11   251.3691   695.0153         NA
8        2    12   791.6882         NA   275.4777
9        2    13         NA   111.7700   240.3329
10       2    14   230.6461   817.9438         NA
11       3    10   902.9220         NA   870.3641
12       3    11         NA   719.8441   963.1768
13       3    12   361.8612   563.9518         NA
14       3    13   393.6963         NA   718.7878
15       3    14         NA   871.4986   582.6158
If you just want session 1:
dcast(.DT[session == 1L], session + price ~ DATE, value.var = 'V1')
  session price 2012-10-17 2012-10-18 2012-10-19
1       1    10   308.9528   592.7259         NA
2       1    11   649.7541         NA   816.3317
3       1    12         NA   502.2700   766.3128
4       1    13   424.8113   163.7651         NA
5       1    14   682.5043         NA   147.1439
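As a dependency-free cross-check of the same price-by-date layout, the sums for one session can also be computed with base R's `tapply()`. This is only a sketch under stated assumptions: `DF` is a plain data.frame stand-in for `DT` with character dates, and the seed is arbitrary, so the volumes will not match the numbers printed above.

```r
set.seed(1)  # arbitrary seed so the random volumes are reproducible
DF <- data.frame(
  date = rep(c("2012-10-17", "2012-10-18", "2012-10-19"), each = 10),
  session = c(1, 2, 3),              # recycled across the 30 rows
  price = c(10, 11, 12, 13, 14),     # recycled across the 30 rows
  volume = runif(30, min = 10, max = 1000)
)

# Keep one session, then sum volume for every price/date combination.
s1 <- DF[DF$session == 1, ]
wide <- tapply(s1$volume, list(price = s1$price, date = s1$date), sum)
```

`wide` is a matrix with one row per price and one column per date; combinations that never occur are NA, matching the layout of the `dcast()` result.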
