Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I am trying to access a column in my data frame using the dataframe$column format, but it returns NULL. What am I doing wrong? Please help.
As you can see from the output, you don't have a column called Ozone; the only column you have is called V1. You will have to split the data in V1 into columns. This can be done using tidyr's separate, like so:
Data:
df <- data.frame(
V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
"41,190,7.4,67,5,1")
)
First, get your column names:
col_names <- unlist(strsplit(df$V1[1], ","))
The column names are now stored in a vector:
col_names
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Now transform df:
library(dplyr)
library(tidyr)
df %>%
# first rename the col to be transformed:
rename("Ozone,Solar.R,Wind,Temp,Month,Day" = V1) %>%
# remove the first row, which is now redundant:
slice(2:nrow(.)) %>%
# separate into columns using the `col_names`:
separate(1, into = col_names, sep = ",")
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
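As a base-R alternative to the steps above, here is a minimal sketch that re-parses the single column as CSV text. It assumes, as in the example data, that every element of V1 is one comma-separated line and that the first element holds the header; the name parsed is only illustrative.

```r
# Same example data as above; stringsAsFactors = FALSE keeps V1 as character
# on older R versions too.
df <- data.frame(
  V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
         "41,190,7.4,67,5,1"),
  stringsAsFactors = FALSE
)

# Glue the lines back together and let read.csv() split them on commas.
# The first line is treated as the header row.
parsed <- read.csv(text = paste(df$V1, collapse = "\n"))

# The columns can now be accessed with $ as originally intended.
parsed$Ozone
```

If the data came from a file in the first place, it is usually simpler to re-read the file with the correct separator than to repair the data frame afterwards.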
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 2 years ago.
Creating factor levels in a dataset with NAs works for individual columns, but I need to iterate across many more columns (all starting with 'impact.') and have hit a problem inside dplyr's mutate(across()).
What am I doing wrong?
Reprex below:
library(tibble)
library(dplyr)
df <- tribble(~id, ~tumour, ~impact.chemo, ~impact.radio,
1,'lung',NA,1,
2,'lung',1,NA,
3,'lung',2,3,
4,'meso',3,4,
5,'lung',4,5)
# Factor labels
trt_labels <- c('Planned', 'Modified', 'Interrupted', 'Deferred', "Omitted")
# Such that factor levels match labels as, retaining NAs where present:
data.frame(level = 1:5,
label = trt_labels)
# Create factor works for individual columns
factor(df$impact.chemo, levels = 1:5, labels = trt_labels)
factor(df$impact.radio, levels = 1:5, labels = trt_labels)
# But fails inside mutate(across)
df %>%
mutate(across(.cols = starts_with('impact'), ~factor(levels = 1:5, labels = trt_labels)))
Just turning @27ϕ9's comment into an answer: the purrr-style lambda function you specified inside across is not correct, because the call to factor is missing its first argument, which is the object the function should operate on (in this case, each of the data frame columns selected by across).
To fix your issue, insert .x inside the lambda function. .x refers to the lambda's argument, so ~factor(.x, ...) is shorthand for function(x) factor(x, ...). See the purrr documentation for more information about purrr-style lambda functions.
df %>%
mutate(across(.cols = starts_with('impact'), ~factor(.x, levels = 1:5, labels = trt_labels)))
# A tibble: 5 x 4
# id tumour impact.chemo impact.radio
# <dbl> <chr> <fct> <fct>
# 1 1 lung NA Planned
# 2 2 lung Planned NA
# 3 3 lung Modified Interrupted
# 4 4 meso Interrupted Deferred
# 5 5 lung Deferred Omitted
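To illustrate the point about .x, the following sketch compares the lambda form with passing an ordinary function to across. The helper to_trt_factor is a hypothetical name introduced only for this example; the data and labels are the ones from the question.

```r
library(dplyr)
library(tibble)

df <- tribble(~id, ~tumour, ~impact.chemo, ~impact.radio,
              1, 'lung', NA, 1,
              2, 'lung', 1, NA,
              3, 'lung', 2, 3,
              4, 'meso', 3, 4,
              5, 'lung', 4, 5)

trt_labels <- c('Planned', 'Modified', 'Interrupted', 'Deferred', 'Omitted')

# Hypothetical helper: an ordinary function whose first argument is the column.
to_trt_factor <- function(x) factor(x, levels = 1:5, labels = trt_labels)

# Lambda form: .x stands for the current column.
res_lambda <- df %>%
  mutate(across(starts_with('impact'), ~ to_trt_factor(.x)))

# Plain-function form: across() passes each column as the first argument.
res_function <- df %>%
  mutate(across(starts_with('impact'), to_trt_factor))

# Both forms should produce the same factor columns.
all.equal(res_lambda, res_function)
```

Passing a named function avoids the lambda altogether, which makes it harder to forget the column argument.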
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame that looks like this:
a <- c(10,NA,30,40,NA,60,70,80,90,90,80,90,10,40)
b <- c("l","k","l","l","k","l","l","l","k","k","l","l","k","l")
c <- c(1,1,1,2,2,2,2,2,3,3,3,4,4,4)
I want to group the data frame by columns 'b' and 'c', then replace the values in column 'a' with the maximum value of each group. For example, the 1st and 2nd values of column 'a' would be replaced by 30. Here is my code:
df%>%group_by(b, c)%>%mutate(a = max(a, na.rm = TRUE))
The other values are replaced by the group maximum, but the NAs are not. I don't know why the mutate function replaces NA with inf. Here is the result I get with my code:
a <- c(30,inf,30,80,inf,80,80,80,90,90,90,90,10,90)
But I want it like this:
a <- c(30,30,30,80,80,80,80,80,90,90,90,90,10,90)
Assuming your data are:
Tuong_df <- data.frame(
c(10,NA,30,40,NA,60,70,80,90,90,80,90,10,40),
c("l","l","l","l","l","l","l","l","k","k","k","k","k","k"),
c(1,1,1,2,2,2,2,2,3,3,3,4,4,4))
names(Tuong_df) <- c("Var1","Var2","Var3")
You can then run the following code:
Tuong_df_mod <- Tuong_df %>%
group_by(Var2,Var3) %>%
mutate(Var1=max(Var1,na.rm=TRUE))
In any case, in future it would be better to post reproducible code.
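One possible explanation for the infinite values in the question (an assumption, since the original data are not reproducible) is that some groups contain only NA. With na.rm = TRUE nothing is left to compare, and max returns -Inf with a warning. The sketch below shows this in base R; safe_max is a hypothetical helper, and the vectors a and g are made up for illustration.

```r
# max() over values that are all NA, with na.rm = TRUE, has nothing left
# to compare and returns -Inf (R also emits a warning).
all_na <- c(NA_real_, NA_real_)
suppressWarnings(max(all_na, na.rm = TRUE))

# Hypothetical helper: return NA instead of -Inf when a group is all NA.
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# Illustrative data: group "y" contains only NA values.
a <- c(10, NA, 30, 40, NA)
g <- c("x", "y", "x", "x", "y")

# ave() applies the function per group and recycles the result
# back to the original positions.
group_max <- ave(a, g, FUN = safe_max)
group_max
```

The same helper could be used inside mutate after group_by in place of max(..., na.rm = TRUE) if all-NA groups are expected.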
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 5 years ago.
I have the following data.frame:
qualifiers symbols values
1 Buy AAPL 326.0
2 Sell MSFT 598.3
3 Sell GOOGL 201.5
I want to keep only the rows where qualifiers is "Sell", and then remove the qualifiers column.
So the new data.frame would be:
symbols values
1 MSFT 598.3
2 GOOGL 201.5
Here is what I've tried:
# Select the rows with "Sell" qualifier
valid_symbols <- df$symbols[df$qualifiers == "Sell"]
# Keep only these
df <- df[df$symbols %in% valid_symbols]
# Remove qualifiers column
df$qualifiers <- NULL
Line 1 is working as expected:
> valid_symbols
[1] MSFT GOOGL
Levels: AAPL GOOGL MSFT
But line 2 doesn't:
> df
symbols values
1 AAPL 326.0
2 MSFT 598.3
3 GOOGL 201.5
It seems like it is filtering by column instead of by row.
So I wonder:
What is wrong with my code?
Is there a more efficient/elegant way to achieve what I want?
The code does not work because the comma is missing. Without the comma, a single index inside [ ] is interpreted as a column selection (column indices, names, or a logical mask over columns), not a row selection.
df <- df[df$symbols %in% valid_symbols,]
#OP's code
df$qualifiers <- NULL
If the non-numeric columns are factors, we may also need to wrap the result in droplevels to remove the unused levels from those columns:
df <- droplevels(df)
However, all of this can be done in one step with subset:
subset(df, qualifiers == "Sell", select = -1)
Or with dplyr's filter:
library(dplyr)
df %>%
filter(qualifiers == "Sell") %>%
select(2:3)
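The difference between indexing with and without the comma can be seen in a small sketch. The data frame below is rebuilt from the table in the question; the names keep, by_col, by_row and result are only illustrative.

```r
# Data rebuilt from the table shown in the question.
df <- data.frame(
  qualifiers = c("Buy", "Sell", "Sell"),
  symbols    = c("AAPL", "MSFT", "GOOGL"),
  values     = c(326.0, 598.3, 201.5)
)

# Logical mask: FALSE TRUE TRUE
keep <- df$qualifiers == "Sell"

# Without a comma the mask is applied to COLUMNS: columns 2 and 3 are kept,
# and all three rows survive.
by_col <- df[keep]

# With a comma the mask is applied to ROWS: rows 2 and 3 are kept.
by_row <- df[keep, ]

# Filtering rows and selecting columns in a single step.
result <- df[keep, c("symbols", "values")]
result
```

This also explains the output in the question: with three rows and three columns, the row mask happened to be a valid column mask, so no error was raised.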
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
This should be simple, but I am struggling with it.
I want to combine two columns of a single data frame into one. I have separate columns for customer ID (20227) and year (2009). I want to create a new column that contains both (2009_20227).
You could use paste:
transform(dat, newcol=paste(year, customerID, sep="_"))
Or use interaction:
dat$newcol <- as.character(interaction(dat,sep="_"))
data
dat <- data.frame(year=2009:2013, customerID=20227:20231)
An alternative using the unite function from tidyr:
library(tidyr)
df = data.frame(year=2009:2013, customerID=20227:20231) # using akrun's data
unite(df, newcol, c(year, customerID), remove=FALSE)
# newcol year customerID
#1 2009_20227 2009 20227
#2 2010_20228 2010 20228
#3 2011_20229 2011 20229
#4 2012_20230 2012 20230
#5 2013_20231 2013 20231
Another alternative (using @akrun's example):
dat <- data.frame(year=2009:2013, customerID=20227:20231)
dat$newcol <- paste(dat$year, dat$customerID, sep="_")
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 2 years ago.
I am trying to merge two data frames on a variable called hash_id. For some reason R does not recognize the hash_id values in one of the data frames, although it does in the other.
I have checked and I just don't get it. Here is how I checked:
> head(df1[46],1) # so I take the first 'hash-id' from df1
# hash_id
# 1 abab123123
> which(df2 == "abab123123", arr.ind=TRUE) # here it shows that row 6847 contains a match
# row col
# [1,] 6847 32
> which(df1 == "abab123123", arr.ind=TRUE) # and here there is NO matching value!
# row col
#
One possibility is trailing or leading spaces in the concerned columns for one of the datasets. You could do:
library(stringr)
df1[, "hash_id"] <- str_trim(df1[,"hash_id"])
df2[, "hash_id"] <- str_trim(df2[, "hash_id"])
which(df1[, "hash_id"]=="abab123123", arr.ind=TRUE)
which(df2[, "hash_id"]=="abab123123", arr.ind=TRUE)
Another way would be to use grepl:
grepl("\\babab123123\\b", df1[,"hash_id"])
grepl("\\babab123123\\b", df2[,"hash_id"])
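To show how stray whitespace can make a merge silently drop matches, here is a small base-R sketch. The data frames df_a and df_b and their contents are made up for illustration; trimws is the base-R counterpart of str_trim.

```r
# Illustrative data: the first id in df_a has a trailing space.
df_a <- data.frame(hash_id = c("abab123123 ", "cdcd456456"),
                   x = 1:2,
                   stringsAsFactors = FALSE)
df_b <- data.frame(hash_id = c("abab123123", "efef789789"),
                   y = 3:4,
                   stringsAsFactors = FALSE)

# The values look identical when printed, but the lengths differ.
nchar(df_a$hash_id)

# Merging on the raw ids finds no match.
merged_raw <- merge(df_a, df_b, by = "hash_id")

# Trim leading and trailing whitespace on both sides, then merge again.
df_a$hash_id <- trimws(df_a$hash_id)
df_b$hash_id <- trimws(df_b$hash_id)
merged_clean <- merge(df_a, df_b, by = "hash_id")
merged_clean
```

Comparing nchar on the two id columns is a quick way to check whether hidden characters are the cause before changing any data.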