I have the following dataframe:
df <- data.frame(Sample_name = c("01_00H_NA_DNA", "01_00H_NA_RNA", "01_00H_NA_S", "01_00H_NW_DNA", "01_00H_NW_RNA", "01_00H_NW_S", "01_00H_OM_DNA", "01_00H_OM_RNA", "01_00H_OM_S", "01_00H_RL_DNA", "01_00H_RL_RNA", "01_00H_RL_S"),
Pair = c("","", "S1","","","S2","","","S3","", "","S5"))
I am trying to create a new variable, treatment, based on Sample_name. I used the following code:
df$treatment <- ifelse(grep("_NA_", df$sample_name, ignore.case = T), "nat",
ifelse(grep("_NW_", df$sample_name, ignore.case = T), "natH2",
ifelse(grep("_RL_", df$sample_name, ignore.case = T), "RNALat",
ifelse(grep("_OM_", df$sample_name, ignore.case = T ), "Om"))))
I don't understand what I am doing wrong here. I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "treatment", value = logical(0)) :
replacement has 0 rows, data has 12
Any suggestions?
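I found the answer: replace each grep() call with grepl(). grep() returns the indices of the matching elements, whereas grepl() returns one TRUE/FALSE value per element, which is what ifelse() expects. The column also has to be referenced as Sample_name (capital S): df$sample_name is NULL, which is why the error shows value = logical(0).
df$treatment <- ifelse(grepl("_NA_", df$Sample_name, ignore.case = TRUE), "nat",
ifelse(grepl("_NW_", df$Sample_name, ignore.case = TRUE), "natH2",
ifelse(grepl("_RL_", df$Sample_name, ignore.case = TRUE), "RNALat",
ifelse(grepl("_OM_", df$Sample_name, ignore.case = TRUE), "Om", NA))))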
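As an alternative to nesting ifelse() calls, here is a minimal base R sketch. It assumes every Sample_name follows the 01_00H_XX_... pattern, so the treatment code is the third underscore-separated field, and looks that code up in a named vector (treatment_map is a hypothetical name). toupper() plays the role of ignore.case = TRUE.

```r
# Example data (a subset of the question's Sample_name values)
df <- data.frame(Sample_name = c("01_00H_NA_DNA", "01_00H_NW_RNA",
                                 "01_00H_OM_S", "01_00H_RL_DNA"))

# Hypothetical lookup vector: treatment code -> label
# (the names are quoted because NA is a reserved word in R)
treatment_map <- c("NA" = "nat", "NW" = "natH2", "RL" = "RNALat", "OM" = "Om")

# Pull out the third underscore-separated field, e.g. "01_00H_NA_DNA" -> "NA"
code <- toupper(sub("^[^_]+_[^_]+_([^_]+)_.*$", "\\1", df$Sample_name))

# Index the lookup by code; codes not in treatment_map become NA
df$treatment <- unname(treatment_map[code])
df
```

Adding a new treatment then only means adding one entry to the lookup vector instead of another nested ifelse().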
I have a list like this:
my_data<- list(c(dummy= 300), structure(123.7, .Names = ""),
structure(143, .Names = ""), structure(113.675, .Names = ""),
structure(163.75, .Names = ""), structure(656, .Names = ""),
structure(5642, .Names = ""), structure(1232, .Names = ""))
I want the minimum and maximum values from this list.
I have tried calling min() directly on the list, but I get an error:
Error in min(weighted_mae) : invalid 'type' (list) of argument
typeof(my_data) #[1] "list"
class(my_data) #[1] "list"
What is the right way for getting the minimum and maximum from my_data?
You could do (the native pipe |> requires R 4.1 or later):
my_data |>
unlist(use.names = FALSE) |>
range()
The following is the same, without piping:
range(unlist(my_data, use.names = FALSE))
If you want to get minimum and maximum values separately, then you could do:
min(unlist(my_data, use.names = FALSE))
max(unlist(my_data, use.names = FALSE))
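Another option, shown as a sketch: min() and max() accept any number of numeric arguments, so do.call() can pass each list element as a separate argument without flattening the list first. This assumes every element of my_data is a numeric vector.

```r
my_data <- list(c(dummy = 300), structure(123.7, .Names = ""),
                structure(143, .Names = ""), structure(113.675, .Names = ""),
                structure(163.75, .Names = ""), structure(656, .Names = ""),
                structure(5642, .Names = ""), structure(1232, .Names = ""))

# Equivalent to min(my_data[[1]], my_data[[2]], ..., my_data[[8]])
do.call(min, my_data)
#> [1] 113.675
do.call(max, my_data)
#> [1] 5642
```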
When I run the code below in R, I get the error 'FUN(left, right) : non-numeric argument to binary operator'. I tried to fix this by converting the variables stored as character to numeric with as.numeric(). However, that introduces NAs by coercion: the whole column in my data frame ends up empty, showing NA in every row. Does anyone know how to fix this error? Thank you in advance!
library(readr)  # read_csv()
library(dplyr)  # tibble(), left_join(), select(), %>%
sessie_03 <- read_csv("~/Downloads/sessie_03.csv")
sessie_03 <- read.csv("~/Downloads/sessie_03.csv", header = TRUE, sep = ",")
stars_master <- read.csv("~/Downloads/stars_master.csv", header = TRUE, sep =";")
stars_numbers <- read.csv("~/Downloads/stars_numbers.csv", header = TRUE, sep = ";", dec = ",")
new_stars_master <- tibble(Title.id = sessie_03$title_id,
Title.year = sessie_03$Year,
Star1.name = sessie_03$imdb.com_star1_name,
Star1.id = sessie_03$imdb.com_star1_id,
Star2.name = sessie_03$imdb.com_star2_name,
Star2.id = sessie_03$imdb.com_star2_id,
Star3.name = sessie_03$imdb.com_star3_name,
Star3.id = sessie_03$imdb.com_star3_id)
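# read.csv() names the first column "ï..year" when the file starts with a UTF-8 byte-order mark; fileEncoding = "UTF-8-BOM" avoids this
new_stars_numbers <- tibble(Star.id = stars_numbers$imdb_com_star_id,
Title.year = stars_numbers$ï..year,
"Title.year+1" = stars_numbers$ï..year + 1,
Star.rank = stars_numbers$the_numbers_com_starpower_rank)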
STP <- tibble(Title.id = new_stars_master$Title.id,
Star1.id = new_stars_master$Star1.id,
Star1.name = new_stars_master$Star1.name,
Star1.rank = new_stars_master %>% left_join(new_stars_numbers, by = c("Star1.id" = "Star.id",
"Title.year" = "Title.year+1"))
%>% select(Star.rank),
Star2.id = new_stars_master$Star2.id,
Star2.name = new_stars_master$Star2.name,
Star2.rank = new_stars_master %>% left_join(new_stars_numbers, by = c("Star2.id" = "Star.id",
"Title.year" = "Title.year+1"))
%>% select(Star.rank),
Star3.id = new_stars_master$Star3.id,
Star3.name = new_stars_master$Star3.name,
Star3.rank = new_stars_master %>% left_join(new_stars_numbers, by = c("Star3.id" = "Star.id",
"Title.year" = "Title.year+1"))
%>% select(Star.rank),
Star.power = (Star1.rank + Star2.rank + Star3.rank) / 3)
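The message likely comes from R's arithmetic method for data frames: select() returns a one-column tibble rather than a vector (dplyr's pull() returns a vector), so Star1.rank + Star2.rank adds data frames column by column and fails when the rank column is character. A minimal base R sketch for finding which values as.numeric() cannot parse, using a hypothetical rank_raw vector and helper name non_numeric_values; in practice you would pass stars_numbers$the_numbers_com_starpower_rank.

```r
# Hypothetical rank column that was read in as character
rank_raw <- c("12", "7", "1,5", "n/a", NA)

# Return the distinct non-missing values that as.numeric() turns into NA
non_numeric_values <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[!is.na(x) & is.na(parsed)])
}

non_numeric_values(rank_raw)
#> [1] "1,5" "n/a"

# Adding two data frames whose column is character reproduces the error
res <- tryCatch(
  data.frame(r = "1", stringsAsFactors = FALSE) +
    data.frame(r = "2", stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
res
```

Once you know which strings fail (for example decimal commas or placeholder text), you can clean them before converting instead of letting every value become NA.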
I am very new to R and RStudio and am currently running code from the book Machine Learning with R Quick Start Guide to review bulk financial data. I am running the following code chunk in R:
library(readr)  # read_delim()
t <- proc.time()
# list the year folders before the loop, because the loop bounds depend on myfiles
myfiles <- list.files(path = "~/MachineLearning/Banks_model", pattern = "20", full.names = TRUE)
for (i in seq_along(myfiles)){
filelist <- list.files(path = myfiles[i], pattern = "*", full.names = TRUE)
for (h in seq_along(filelist)){
#assuming tab separated values with a header
aux = as.data.frame(read_delim(filelist[h], "\t", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE, skip = 2))
variables<-colnames(as.data.frame(read_delim(filelist[h], "\t", escape_double = FALSE, col_names = TRUE, trim_ws = TRUE, skip = 0)))
union <- Reduce(function(x,y) merge(x, y, all=TRUE,
by=c("ID RSSD","Reporting Period")), tables, accumulate=FALSE)
rm(list=ls()[! ls() %in% c(ls(pattern="year*"),"tables","t")])
proc.time() - t
and received the following error:
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
The traceback in RStudio is as follows:
stop(ngettext(sum(bad), "'by' must specify a uniquely valid column", "'by' must specify uniquely valid columns"), domain = NA)
fix.by(by.x, x)
merge.data.frame(x, y, all = TRUE, by = c("ID RSSD", "Reporting Period"))
merge(x, y, all = TRUE, by = c("ID RSSD", "Reporting Period"))
merge(x, y, all = TRUE, by = c("ID RSSD", "Reporting Period"))
f(init, x[[i]])
Reduce(function(x, y) merge(x, y, all = TRUE, by = c("ID RSSD", "Reporting Period")), tables, accumulate = FALSE)
Any idea how to fix this?
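The error means that at least one data frame passed to merge() has no column named exactly "ID RSSD" or "Reporting Period", or has one of them more than once. A minimal sketch for locating the offending tables before calling Reduce(), using a hypothetical tables list and helper name has_valid_keys:

```r
# Hypothetical stand-in for `tables`; the third table's key is named differently
tables <- list(
  data.frame(`ID RSSD` = 1:2, `Reporting Period` = "2020", a = 1:2, check.names = FALSE),
  data.frame(`ID RSSD` = 1:2, `Reporting Period` = "2020", b = 3:4, check.names = FALSE),
  data.frame(IDRSSD = 1:2, c = 5:6)
)
keys <- c("ID RSSD", "Reporting Period")

# TRUE when every key column is present exactly once
has_valid_keys <- function(tb, keys) {
  nm <- names(tb)
  all(keys %in% nm) && !any(duplicated(nm[nm %in% keys]))
}

ok <- vapply(tables, has_valid_keys, logical(1), keys = keys)
which(!ok)   # positions of the tables that would break merge()

# Merging only the valid tables succeeds
merged <- Reduce(function(x, y) merge(x, y, all = TRUE, by = keys), tables[ok])
```

Printing names(tables[[k]]) for each flagged position usually shows the cause, such as a differently spelled header or stray whitespace.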
I'm trying to work with an API from the NS (the Dutch railway company). I want the result as a data frame, but I get an error when I run the following code:
library(httr)      # GET(), add_headers(), content()
library(jsonlite)  # fromJSON()
NSspoorkaart <- GET("https://gateway.apiportal.ns.nl/Spoorkaart-API/api/v1/spoorkaart",
add_headers("Ocp-Apim-Subscription-Key" = "f354d5839ec5454fbaf1bc44304b1845"))
JSON <- fromJSON(content(NSspoorkaart, "text"), flatten = TRUE)
Data_NS <- as.data.frame(JSON)
Can someone explain what I'm doing wrong?
Would this work?
NSspoorkaart <- GET("https://gateway.apiportal.ns.nl/Spoorkaart-API/api/v1/spoorkaart", add_headers("Ocp-Apim-Subscription-Key" = "f354d5839ec5454fbaf1bc44304b1845"))
NSspoorkaart.string <- content(NSspoorkaart, as = "text", encoding = "UTF-8")
NSspoorkaart.list <- jsonlite::fromJSON(NSspoorkaart.string)
NSspoorkaart.df <- NSspoorkaart.list$payload$features
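To illustrate why selecting payload$features gives a data frame, here is a sketch with a small hypothetical JSON string shaped like the {"payload": {"features": [...]}} structure the answer assumes; the real API response may contain more fields. fromJSON() turns the array of objects into a data frame, and flatten = TRUE expands the nested "properties" object into separate columns.

```r
library(jsonlite)

# Hypothetical, simplified response body
json_text <- '{"payload": {"type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"from": "ASD", "to": "UT"}},
    {"type": "Feature", "properties": {"from": "UT", "to": "GVC"}}
  ]}}'

parsed <- fromJSON(json_text, flatten = TRUE)

# The table sits two levels down; the top level is a list, not a data frame
features <- parsed$payload$features
str(features)
```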
I use the code below very often to import a big dataset into R, forcing every column to be read as character in order to avoid the truncation of rows. The code seems to work well, but I was wondering whether anyone knows how it could be simplified or improved so that it isn't so repetitive each time I need it.
library(readr)    # read_delim()
library(stringr)  # str_c()
dataset.path <- choose.files(caption = "Select dataset", multi = FALSE)  # choose.files() is Windows-only
data.columns <- read_delim(dataset.path, delim = '\t', col_names = TRUE, n_max = 0)
data.coltypes <- c(rep("c", ncol(data.columns)))
data.coltypes <- str_c(data.coltypes, collapse = "")
dataset <- read_delim(dataset.path, delim = '\t', col_names = TRUE, col_types = data.coltypes)
As @Roland has suggested, you should write a function. Here is one possibility:
foo <- function(){
dataset.path <- choose.files(caption = "Select dataset", multi = FALSE)
data.columns <- read_delim(dataset.path, delim = '\t', col_names = TRUE, n_max = 0)
data.coltypes <- paste(rep("c", ncol(data.columns)), collapse = "")
dataset <- read_delim(dataset.path, delim = '\t', col_names = TRUE, col_types = data.coltypes)
dataset  # return the imported data
}
You can then call foo() (for example, my_data <- foo()) whenever you need to read a dataset in using this method.
Your two lines:
data.coltypes <- c(rep("c", ncol(data.columns)))
data.coltypes <- str_c(data.coltypes, collapse = "")
can be collapsed into a single line that uses base R paste() instead of str_c() from the stringr package, as in the function above.
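readr can also apply one column type to every column via cols(.default = col_character()), which removes the need to read the header first just to count the columns. A sketch, with the hypothetical function name read_all_character and a temporary file standing in for your dataset:

```r
library(readr)

# Read a tab-separated file with every column as character;
# .default applies to all columns, so no column count is needed
read_all_character <- function(path) {
  read_delim(path, delim = "\t", col_names = TRUE,
             col_types = cols(.default = col_character()))
}

# Usage with a small temporary file
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "001\t3.50", "002\t4"), tmp)
dataset <- read_all_character(tmp)
dataset
```

On Windows you could keep the interactive picker with read_all_character(choose.files(caption = "Select dataset", multi = FALSE)).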