I would like to know if there is a way to count the number of lines in an R script, ignoring comment lines.
I didn't find a solution on the Internet, but maybe I missed something.
Example script tester.R with 8 lines, one commented:
x <- 3
x+1
x+2
#x+4
x*x
Function to count lines without comments:
foo <- function(path) {
  rln <- readr::read_lines(path)
  # !grepl() is used instead of -grep(): with no comment lines, -grep() would drop every line
  rln <- rln[!grepl(pattern = '^#', x = trimws(rln))]
  rln <- rln[trimws(rln) != '']
  return(length(rln))
}
Test run:
> foo('tester.R')
[1] 7
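For a check that does not depend on a file already on disk, here is a minimal base R sketch of the same idea. The helper name count_code_lines and the contents of the temporary file are made up for illustration. It assumes a line is a comment only when its first non-blank character is #, so a line with code followed by a trailing comment still counts as code.

```r
# Hypothetical helper: count lines that are neither blank nor pure comments
count_code_lines <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  sum(lines != "" & !startsWith(lines, "#"))
}

# Write a small throwaway script to a temporary file and count its lines
tmp <- tempfile(fileext = ".R")
writeLines(c("x <- 3", "", "# a comment", "  # indented comment", "x + 1  # trailing"), tmp)
n_code <- count_code_lines(tmp)
n_code
```

Only the first and last lines of the throwaway script are counted; the blank line and both comment lines are skipped.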
You could try this:
library(magrittr)
library(stringr)
library(readr)
number_of_lines_of_code <- function(file_path){
file <- readr::read_file(file_path)
file_lines <- file %>% stringr::str_split("\n")
first_character_of_lines <- file_lines %>%
lapply(function(line)stringr::str_replace_all(line," ","")) %>%
lapply(function(line)stringr::str_sub(line,1,1)) %>%
unlist
sum(first_character_of_lines != "#" & first_character_of_lines != "\r")
}
number_of_lines_of_code("your/file/path.R")
That doesn't seem like very useful information, but you can do this:
script <- "#assign
a <- 1
b <- 2
"
nrow(read.table(text = script, sep = "°"))
[1] 2
I use ° as the separator, because it's an unlikely character in most R scripts. Adjust that as needed.
Of course, this could be done much more efficiently outside of R.
I have a problem with my R code.
I want to convert numbers stored as character strings in the vector powpow into real numbers. As usual, I used the as.numeric() function, but I have no idea why it doesn't work.
Here is my code, if anyone knows how to solve my problem, please write.
Thanks in advance.
The problematic part starts at the comment "# średnia i kwantyle powierzchni powiatów woj. wlkp." (mean and quantiles of the area of the districts of the Wielkopolskie voivodeship).
############################################################
### Zadanie 1 ###
library(rvest)
library(tidyverse)
library(magrittr)
url <- "https://pl.wikipedia.org/wiki/Wojew%C3%B3dztwo_wielkopolskie"
website_html <- url %>% read_html()
tbls <- website_html %>% html_nodes("table")
tabele <- tbls[11] %>% html_table() %>% as.data.frame()
head(tabele)
tabele <- tabele[, -1]
head(tabele)
length(colnames(tabele))
nazwy <- colnames(tabele)
nazwy[1] <- 'powiat'
nazwy[2] <- 'siedziba'
nazwy[3] <- 'ludnosc'
nazwy[4] <- 'powierzchnia'
nazwy[5] <- 'gestosc'
nazwy[6] <- 'urbanizacja'
nazwy[7] <- 'wyd_budzet'
nazwy[8] <- 'doch_budzet'
nazwy[9] <- 'zadluzenie'
nazwy[10] <- 'stopa'
nazwy -> colnames(tabele)
head(tabele)
powiaty <- tabele # rm(tabele)
# średnia i kwantyle powierzchni powiatów woj. wlkp.
str(powiaty$powierzchnia)
powpow <- powiaty$powierzchnia
str(powpow)
for(i in 1:length(powpow))
{
powpow[i] <- powpow[i] %>% gsub("\\,", "\\.", ., perl=TRUE) %>% as.numeric()
print(str(powpow[i]))
}
What I want is a powpow vector of numbers, not characters.
Depending on your global settings, you may need to replace , with . as the decimal separator. An easy solution is as.numeric():
# if your global settings accept "," as a decimal separator
powpow_numeric <- as.numeric(powpow)
# if your global settings do NOT accept "," as a decimal separator
powpow_numeric <- as.numeric(sub(",", ".", powpow, fixed = TRUE))
There is also a way to change your global settings if the first option doesn't work, but I don't know it off the top of my head. Maybe someone else can help with this.
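A likely reason the loop in the question never produces numbers: powpow is a character vector, and assigning a number into one element of a character vector coerces that number back to a string. The sketch below illustrates this with made-up values (the real table contents are not shown in the question), so treat it as an illustration rather than a diagnosis of the actual data.

```r
powpow <- c("1234,5", "987,25")  # made-up values with decimal commas

# Element-wise assignment into a character vector coerces back to character
looped <- powpow
for (i in seq_along(looped)) {
  looped[i] <- as.numeric(sub(",", ".", looped[i], fixed = TRUE))
}
class(looped)

# Converting the whole vector at once and storing it in a new object keeps it numeric
powpow_numeric <- as.numeric(sub(",", ".", powpow, fixed = TRUE))
class(powpow_numeric)
```

Both sub() and as.numeric() are vectorised, so no loop is needed in the first place.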
You already loaded the tidyverse package, so you can use the parse_number() function from readr to get a numeric vector out of powpow.
parse_number(powpow)
as.numeric(powpow) can do the same, but parse_number() will also work in cases where the vector contains non-numeric characters, like letters.
Based on what you have done, I did the following for all the other variables that you will have to convert:
powiaty <- powiaty %>%
mutate(powierzchnia = parse_number(powierzchnia),
urbanizacja = parse_number(urbanizacja),
wyd_budzet = parse_number(wyd_budzet),
doch_budzet = parse_number(doch_budzet),
# for "zadluzenie" and "stopa" we have to replace ',' with dots before parsing
zadluzenie = str_replace(zadluzenie, ",", "\\."),
stopa = str_replace(stopa, ",", "\\."),
zadluzenie = parse_number(zadluzenie),
stopa = parse_number(stopa))
glimpse(powiaty)
I'm working in R with strings like the following:
"a1_1;a1_2;a1_5;a1_6;a1_8"
"two1_1;two1_4;two1_5;two1_7"
I need to split these strings into two strings based on the last digit being less than 7 or not. For instance, the desired output for the two strings above would be:
"a1_1;a1_2;a1_5;a1_6" "a1_8"
"two1_1;two1_4;two1_5" "two1_7"
I attempted the following to no avail:
x <- "a1_1;a1_2;a1_5;a1_6;a1_8"
str_split("x", "(\\d<7);")
In an earlier version of the question I was helped by someone that provided the following function, but I don't think it's set up to handle digits both before and after the semicolon in the strings above. I'm trying to modify it but I haven't been able to get it to come out correctly.
f1 <- function(strn) {
strsplit(gsubfn("(;[A-Za-z]+\\d+)", ~ if(readr::parse_number(x) >= 7)
paste0(",", sub(";", "", x)) else x, strn), ",")[[1]]
}
Can anyone help me understand what I'd need to do to make this split as desired?
Split on ;, test the last digit of each piece with a simple regex capture, then recombine:
s <- c("a1_1;a1_2;a1_5;a1_6;a1_8", "two1_1;two1_4;two1_5;two1_7")
sp <- strsplit(s, ";")
lapply(sp,
       function(x) {
         # compare the last digit numerically rather than as a string
         l <- as.numeric(sub(".*(\\d)$", "\\1", x)) < 7
         c(paste(x[l], collapse=";"), paste(x[!l], collapse=";"))
       }
)
# [[1]]
# [1] "a1_1;a1_2;a1_5;a1_6" "a1_8"
#
# [[2]]
# [1] "two1_1;two1_4;two1_5" "two1_7"
Below is my code. I use an extra variable "tmp" to clean the "ABC_DO" data frame. Because "Location_name" can change, I use the "assign()" and "get()" functions.
Location_name <- "ABC_"
tmp <- get(paste(Location_name,"DO",sep = "")) %>% filter(log.DO != -Inf)
assign(paste(Location_name,"DO",sep = ""), tmp)
My code achieves this goal, but it does not seem concise because it introduces a temporary variable. Is there a better way?
Assuming the inputs shown reproducibly in the Note at the end (next time please make sure your question includes complete reproducible code including inputs) we can make the following changes:
use paste0 instead of paste
create a variable locname to hold the name of the data frame and a variable e to be the environment where our data frame is located
use e[[...]] instead of get and assign
use magrittr %<>% two-way pipe
possibly use filter(is.finite(log.DO)) -- not shown below
giving this code:
library(dplyr)
library(magrittr)
e <- .GlobalEnv # change if our data frame is in some other environment
locname <- paste0(Location_name, "DO")
e[[locname]] %<>%
filter(log.DO != -Inf)
The result is:
get(locname, e)
## log.DO
## 1 1
## 2 2
Alternative
This alternative only uses ordinary pipes. We use e and locname from above.
library(dplyr)
e[[locname]] <- e[[locname]] %>%
filter(log.DO != -Inf)
Note
Test input:
ABC_DO <- data.frame(log.DO = c(1, -Inf, 2))
Location_name <- "ABC_"
You only have a temporary variable because you store the data in tmp; I don't see it as a problem. But in this case, the only thing I see you can do is pass the code for tmp directly to assign, like:
assign(
paste(Location_name,"DO",sep = ""),
get(paste(Location_name,"DO",sep = "")) %>% filter(log.DO != -Inf)
)
For the last 2 hours I've been trying to figure this out, but I can't.
I have a variable whose values start with 4 letters and end with 2 digits.
Now I want to subset only those starting with KJHB and ending with a number between 20 and 33.
The code I'm trying is:
df <- mydata
x <- seq(20,33)
df2 <- subset(df, grepl('^KJHB & x$', col1))
Any idea?
Alright, I came up with a not totally correct answer, but it's working for me.
x <- paste("KJHB",seq(20,33), sep = "")
x <- as.data.frame(table(x))
df2 <- subset(df, col1 %in% x$x)
Not the most correct way, but it did the job, and the code is simple enough that a novice like me can understand it xD.
You could try stringr. This doesn't exactly check that it's the beginning or end of the string, but if it's a uniform pattern this may be useful:
library(stringr)
my_match = function(string, start_string, num_seq){
  # TRUE where the string contains start_string and at least one of the numbers
  str_detect(string, fixed(start_string)) &
    str_detect(string, str_c(num_seq, collapse = "|"))
}
is_matched = my_match(df$your_col, "KJHB", 20:33)
df2 = df[ is_matched, ]
There is probably something smarter that can be done with str_locate too.
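If the values really are four letters followed by two digits, a single anchored regular expression in base R may be enough. This is a minimal sketch, assuming the column is called col1 as in the question; the data frame below is made up for illustration.

```r
# Made-up data standing in for mydata
df <- data.frame(col1 = c("KJHB19", "KJHB20", "KJHB33", "KJHB34", "ABCD25", "KJHB250"),
                 stringsAsFactors = FALSE)

# ^KJHB anchors the prefix; (2[0-9]|3[0-3])$ allows only 20..33 at the end
df2 <- subset(df, grepl("^KJHB(2[0-9]|3[0-3])$", col1))
df2$col1
```

Because of the ^ and $ anchors, values such as KJHB250 or ABCD25 are rejected, which the unanchored approach would not guarantee.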
I am sorry, I could not find an answer to this question anywhere and would really appreciate your help.
I have .csv files for each hour of a year. The filename is written in the following way:
hh_dd_mm.csv (e.g. February 1st, 00:00 -> 00_01_02.csv). To make it easier to sort the hours of a year, I would like to change the filenames to mm_dd_hh.csv.
How can I rename the files in R from the pattern hh_dd_mm to mm_dd_hh?
a <- list.files(path = ".", pattern = "HH_DD_MM")
b<-paste(pattern="MM_DD_HH")
file.rename(a,b)
Or you could do:
a <- c("00_01_02.csv", "00_02_02.csv")
gsub("(\\d{2})\\_(\\d{2})\\_(\\d{2})(.*)", "\\3_\\2_\\1\\4", a)
#[1] "02_01_00.csv" "02_02_00.csv"
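The gsub() call above only builds the new names. To apply them on disk, something along these lines might work. It is a sketch that runs in a throwaway temporary directory with made-up file names, and the "tmp_" prefix is an assumption chosen for illustration. The rename goes through temporary names because a new name can coincide with an existing file (for example 01_05_03.csv becomes 03_05_01.csv, which is itself a valid hh_dd_mm name).

```r
# Work in a throwaway directory so no real files are touched
dir <- tempfile("rename_demo")
dir.create(dir)
file.create(file.path(dir, c("00_01_02.csv", "13_15_07.csv")))

# Collect the hh_dd_mm files and build the mm_dd_hh names
old <- list.files(dir, pattern = "^\\d{2}_\\d{2}_\\d{2}\\.csv$")
new <- sub("^(\\d{2})_(\\d{2})_(\\d{2})(\\.csv)$", "\\3_\\2_\\1\\4", old)

# Two-step rename via temporary names, in case a new name equals another old name
tmp_names <- paste0("tmp_", old)
file.rename(file.path(dir, old), file.path(dir, tmp_names))
file.rename(file.path(dir, tmp_names), file.path(dir, new))
renamed <- sort(list.files(dir))
renamed
```

For real data, replace dir with the folder that holds the hourly files, and consider testing on a copy first.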
Not sure if this is the best solution, but it seems to work:
a <- c("00_01_02.csv", "00_02_02.csv")
b <- unname(sapply(a, function(x) {temp <- strsplit(x, "(_|[.])")[[1]] ; paste0(temp[[3]], "_", temp[[2]], "_", temp[[1]], ".", temp[[4]])}))
b
## [1] "02_01_00.csv" "02_02_00.csv"
You can use chartr to create the new file name. Here's an example:
> write.csv(c(1,1), "12_34_56")
> list.files()
# [1] "12_34_56"
> file.rename("12_34_56", chartr("1256", "5612", "12_34_56"))
# [1] TRUE
> list.files()
# [1] "56_34_12"
In chartr, you can replace the elements of a string, so long as it doesn't change the number of characters in the original string. In the above code, I basically just swapped "12" with "56", which is what it looks like you are trying to do.
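One caveat worth checking before relying on this: chartr translates character by character, so the mapping above is specific to that one file name. The short sketch below uses a second, made-up name to show that the same mapping does not swap the first and last fields in general.

```r
# Works for this particular name because each field maps cleanly onto the other
swapped_ok <- chartr("1256", "5612", "12_34_56")

# A made-up name where the same mapping does not swap the first and last fields
swapped_bad <- chartr("1256", "5612", "15_34_26")
c(swapped_ok, swapped_bad)
```

The first result is "56_34_12" as intended, while the second is "51_34_62" rather than "26_34_15", so a position-based approach is safer for a whole directory of files.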
Or, you can write a short string-swapping function:
> strSwap <- function(x) paste(rev(strsplit(x, "[_]")[[1]]), collapse = "_")
> ( files <- c("84_15_45", "59_95_21", "31_51_49",
"51_88_27", "21_39_98", "35_27_14") )
# [1] "84_15_45" "59_95_21" "31_51_49" "51_88_27" "21_39_98" "35_27_14"
> sapply(files, strSwap, USE.NAMES = FALSE)
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"
You could also do it with the substr<- assignment function:
> s1 <- substr(files,1,2)
> substr(files,1,2) <- substr(files,7,8)
> substr(files,7,8) <- s1
> files
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"