How to change the order of my results in R?

I have some data pulled from a database, and I run the following code:
for (i in length(lreturns):1)
  stand[i] <- sd(lreturns[i:(i + 8)])
rolling <- stand * sqrt(252) * 100
and get output like this (numeric values first, NAs at the end):
 [1] <numeric values ...>
[33] NA NA NA NA NA NA NA NA
[41] NA NA NA NA NA NA
My question is: how can I show the NAs first and everything else afterward? I am using R. The desired output is:
 [1] NA NA NA NA NA NA NA NA
 [9] NA NA NA NA NA NA <numeric values ...>

I would just write
c(rolling[is.na(rolling)], rolling[!is.na(rolling)])
If you don't mind the non-NA values being sorted, you can use sort(), as @BenBolker suggested in the comments; it has an na.last argument:
sort(rolling, na.last = FALSE)
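For illustration, here is how the two approaches behave on a small invented vector x (arbitrary toy values, not the OP's data):
x <- c(1.5, NA, 2.7, NA, 0.3)
c(x[is.na(x)], x[!is.na(x)])   # NAs first, remaining values keep their original order: NA NA 1.5 2.7 0.3
sort(x, na.last = FALSE)       # NAs first, remaining values sorted ascending: NA NA 0.3 1.5 2.7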

Related

rbind error while performing a for-loop: duplicate 'row.names' are not allowed

The enclosed code is an attempt to extract data from an API, but when I try to paginate and bind the rows, the row index duplicates, producing the error below:
Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names'
The code is:
library(tibble)   # tibble()

df <- tibble()
for (i in seq(from = 0, to = 620, by = 24)) {
  linky <- paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION%5E94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", i, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  pge   <- jsonlite::fromJSON(linky)
  props <- pge$properties
  print(linky)
  Sys.sleep(runif(1, 2.34, 6.19))   # polite random pause between requests
  df <- rbind(df, tibble(props))
  print(paste("Page:", i))
}
HA_area_ <- df
As the error indicates, the data frames can't be bound together because the column names differ. Below are the column names for the first two data frames.
[[1]]
[1] "id" "bedrooms" "bathrooms" "numberOfImages"
[5] "numberOfFloorplans" "numberOfVirtualTours" "summary" "displayAddress"
[9] "countryCode" "location" "propertyImages" "propertySubType"
[13] "listingUpdate" "premiumListing" "featuredProperty" "price"
[17] "customer" "distance" "transactionType" "productLabel"
[21] "commercial" "development" "residential" "students"
[25] "auction" "feesApply" "feesApplyText" "displaySize"
[29] "showOnMap" "propertyUrl" "contactUrl" "staticMapUrl"
[33] "channel" "firstVisibleDate" "keywords" "keywordMatchType"
[37] "saved" "hidden" "onlineViewingsAvailable" "lozengeModel"
[41] "hasBrandPlus" "propertyTypeFullDescription" "addedOrReduced" "formattedDistance"
[45] "heading" "enhancedListing" "displayStatus" "formattedBranchName"
[49] "isRecent"
[[2]]
[1] "id" "bedrooms" "bathrooms" "numberOfImages"
[5] "numberOfFloorplans" "numberOfVirtualTours" "summary" "displayAddress"
[9] "countryCode" "location" "propertyImages" "propertySubType"
[13] "listingUpdate" "premiumListing" "featuredProperty" "price"
[17] "customer" "distance" "transactionType" "productLabel"
[21] "commercial" "development" "residential" "students"
[25] "auction" "feesApply" "feesApplyText" "displaySize"
[29] "showOnMap" "propertyUrl" "contactUrl" "staticMapUrl"
[33] "channel" "firstVisibleDate" "keywords" "keywordMatchType"
[37] "saved" "hidden" "onlineViewingsAvailable" "lozengeModel"
[41] "hasBrandPlus" "displayStatus" "formattedBranchName" "addedOrReduced"
[45] "isRecent" "formattedDistance" "propertyTypeFullDescription" "enhancedListing"
[49] "heading"
You can see that some column names appear at different positions in the two data frames.
Instead of rbind we can use lapply and store the results in a list.
We create a function f1 that returns the required data frame, and then wrap it in purrr::possibly() to skip any errors.
f1 <- function(x) {
  linky <- paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION%5E94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", x, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  pge   <- jsonlite::fromJSON(linky)
  props <- pge$properties
  print(linky)
  Sys.sleep(runif(1, 2.34, 6.19))
  print(paste("Page:", x))
  return(props)
}
library(purrr)   # possibly()

x  <- seq(from = 0, to = 620, by = 24)
df <- lapply(x, possibly(f1, NA))
Another option is data.table: rbindlist() binds by column name and, with fill = TRUE, fills any missing columns with NA.
library(data.table)

dt <- lapply(seq(from = 0, to = 620, by = 24), function(i) {
  uri <- paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION%5E94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", i, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  as.data.table(jsonlite::fromJSON(uri)$properties)
})
dt <- rbindlist(dt, fill = TRUE)
Strangely, I changed from rbind() to bind_rows() and for some reason it worked, although there was the added complication of unnesting some columns: it would not let me save the data as a CSV without unnesting the nested columns. Thank you for the answers.
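For reference, a minimal sketch of the bind_rows() route the OP mentions, assuming df is the list produced by lapply(x, possibly(f1, NA)) in the answer above; it flattens nested data-frame columns with jsonlite::flatten() and simply drops any remaining list-columns (rather than unnesting them) so the result can be written to CSV (the output file name is invented):
library(dplyr)
pages    <- Filter(is.data.frame, df)         # drop pages where possibly() returned NA
pages    <- lapply(pages, jsonlite::flatten)  # turn nested data-frame columns into flat, prefixed columns
HA_area_ <- bind_rows(pages) %>%              # bind_rows() matches columns by name, so differing order is fine
  select(!where(is.list))                     # drop remaining list-columns so write.csv() doesn't complain
write.csv(HA_area_, "rightmove_properties.csv", row.names = FALSE)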

str_split on first and second occurrence of delimiter at different locations in a character vector

I have a character vector of weather variables followed by "mean_#", where # is a number between 5 and 10. I want to subset it to just the weather variable names themselves. The mean weather variables look like this:
> mean_vars
[1] "dew_mean_10" "dew_mean_5" "dew_mean_6" "dew_mean_7"
[5] "dew_mean_8" "dew_mean_9" "humid_mean_10" "humid_mean_5"
[9] "humid_mean_6" "humid_mean_7" "humid_mean_8" "humid_mean_9"
[13] "rain_mean_10" "rain_mean_5" "rain_mean_6" "rain_mean_7"
[17] "rain_mean_8" "rain_mean_9" "soil_moist_mean_10" "soil_moist_mean_5"
[21] "soil_moist_mean_6" "soil_moist_mean_7" "soil_moist_mean_8" "soil_moist_mean_9"
[25] "soil_temp_mean_10" "soil_temp_mean_5" "soil_temp_mean_6" "soil_temp_mean_7"
[29] "soil_temp_mean_8" "soil_temp_mean_9" "solar_mean_10" "solar_mean_5"
[33] "solar_mean_6" "solar_mean_7" "solar_mean_8" "solar_mean_9"
[37] "temp_mean_10" "temp_mean_5" "temp_mean_6" "temp_mean_7"
[41] "temp_mean_8" "temp_mean_9" "wind_dir_mean_10" "wind_dir_mean_5"
[45] "wind_dir_mean_6" "wind_dir_mean_7" "wind_dir_mean_8" "wind_dir_mean_9"
[49] "wind_gust_mean_10" "wind_gust_mean_5" "wind_gust_mean_6" "wind_gust_mean_7"
[53] "wind_gust_mean_8" "wind_gust_mean_9" "wind_spd_mean_10" "wind_spd_mean_5"
[57] "wind_spd_mean_6" "wind_spd_mean_7" "wind_spd_mean_8" "wind_spd_mean_9"
And this is all I want at the end:
> var_names
"dew" "humid" "rain" "solar" "temp" "soil_moist" "soil_temp" "wind_dir" "wind_gust" "wind_spd"
Now, I figured out how to do it, but I feel my method is convoluted due to my limited ability with regular expressions. I will also have to repeat the process 20 times, substituting "mean" with other words.
var_names <- unique(str_split_fixed(mean_vars, "_", n = 3)[c(1:18,31:42),1])
var_names <- unlist(c(var_names, unique(unite(as_tibble(str_split_fixed(mean_vars, "_", n = 3)[c(19:30,43:60), 1:2])))))
I've been trying to stay within the realm of the tidyverse packages as much as possible so I was using stringr::str_split_fixed.
If you have a solution using this same function that would be ideal as I could continue the same programming style, but I'm open to all suggestions.
Thanks.
Use sub and unique. This is shorter and has no package dependencies (or use unique(str_replace(mean_vars, "_mean.*", "")) with stringr):
unique(sub("_mean.*", "", mean_vars))
giving:
[1] "dew" "humid" "rain" "soil_moist" "soil_temp"
[6] "solar" "temp" "wind_dir" "wind_gust" "wind_spd"
If for some reason you really want to use str_split then:
rmMean <- function(x) paste(head(x, -2), collapse = "_")
unique(sapply(str_split(mean_vars, "_"), rmMean))
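Since the OP mentions repeating the process with other words in place of "mean", the same pattern can be wrapped in a small helper; the function name, its word argument, and the max_vars example below are just illustrations, not objects from the question:
extract_vars <- function(x, word) unique(sub(paste0("_", word, ".*"), "", x))
extract_vars(mean_vars, "mean")   # same result as above
# extract_vars(max_vars, "max")   # hypothetical vector of "*_max_#" names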
Note
mean_vars <- c("dew_mean_10", "dew_mean_5", "dew_mean_6", "dew_mean_7", "dew_mean_8",
"dew_mean_9", "humid_mean_10", "humid_mean_5", "humid_mean_6",
"humid_mean_7", "humid_mean_8", "humid_mean_9", "rain_mean_10",
"rain_mean_5", "rain_mean_6", "rain_mean_7", "rain_mean_8", "rain_mean_9",
"soil_moist_mean_10", "soil_moist_mean_5", "soil_moist_mean_6",
"soil_moist_mean_7", "soil_moist_mean_8", "soil_moist_mean_9",
"soil_temp_mean_10", "soil_temp_mean_5", "soil_temp_mean_6",
"soil_temp_mean_7", "soil_temp_mean_8", "soil_temp_mean_9", "solar_mean_10",
"solar_mean_5", "solar_mean_6", "solar_mean_7", "solar_mean_8",
"solar_mean_9", "temp_mean_10", "temp_mean_5", "temp_mean_6",
"temp_mean_7", "temp_mean_8", "temp_mean_9", "wind_dir_mean_10",
"wind_dir_mean_5", "wind_dir_mean_6", "wind_dir_mean_7", "wind_dir_mean_8",
"wind_dir_mean_9", "wind_gust_mean_10", "wind_gust_mean_5", "wind_gust_mean_6",
"wind_gust_mean_7", "wind_gust_mean_8", "wind_gust_mean_9", "wind_spd_mean_10",
"wind_spd_mean_5", "wind_spd_mean_6", "wind_spd_mean_7", "wind_spd_mean_8",
"wind_spd_mean_9")

Error with R dplyr left_join

So I've been trying to use left_join to get the columns of a new dataset onto my main dataset (called employee).
I've double-checked the vector names and the cleaning that I've done, and nothing seems to work. Here is my code; I would appreciate any help.
library(readr)     # read_csv()
library(dplyr)     # %>%, select(), left_join()
library(janitor)   # clean_names()
library(stringr)   # str_detect()

job_codes <- read_csv("Quest_UMMS_JobCodes.csv")
job_codes <- job_codes %>%
  clean_names() %>%
  select(job_code, pos_desc = pos_des_desc)
job_codes$is_nurse <- str_detect(tolower(job_codes$pos_desc), "nurse")
employee <- employee %>%
  left_join(job_codes, by = "job_code")
The error I keep getting:
Error in eval(substitute(expr), envir, enclos) :
  'job_code' column not found in rhs, cannot join
Here are the results of names(job_codes):
> names(job_codes)
[1] "job_code" "pos_desc" "is_nurse"
and names(employee):
> names(employee)
[1] "REC_NUM" "ZIP" "STATE"
[4] "SEX" "EEO_CLASS" "BIRTH_YEAR"
[7] "EMP_STATUS" "PROCESS_LEVEL" "DEPARTMENT"
[10] "JOB_CODE" "UNION_CODE" "SUPERVISOR"
[13] "DATE_HIRED" "R_SHIFT" "SALARY_CLASS"
[16] "EXEMPT_EMP" "PAY_RATE" "ADJ_HIRE_DATE"
[19] "ANNIVERS_DATE" "TERM_DATE" "NBR_FTE"
[22] "PENSION_PLAN" "PAY_GRADE" "SCHEDULE"
[25] "OT_PLAN_CODE" "DECEASED" "POSITION"
[28] "WORK_SCHED" "SUPERVISOR_IND" "FTE_TOTAL"
[31] "PRO_RATE_TOTAL" "PRO_RATE_A_SAL" "NEW_HIRE_DATE"
[34] "COUNTY" "FST_DAY_WORKED" "date_hired"
[37] "date_hired_adj" "term_date" "employment_duration"
[40] "current" "age" "emp_duration_years"
[43] "DESCRIPTION.x" "PAY_STATUS.x" "DESCRIPTION.y"
[46] "PAY_STATUS.y"
Now that the OP has added the column names of both tables to the question, it is evident that the columns to join on are written differently (upper vs. lower case).
If the column names are different, help("left_join") suggests:
To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
So, in this case it should read
employee <- employee %>% left_join(job_codes, by = c("JOB_CODE" = "job_code"))
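A tiny self-contained illustration of the named-vector by = syntax, using invented toy data (not the OP's real tables):
library(dplyr)
toy_employee  <- tibble(REC_NUM = 1:3, JOB_CODE = c("RN1", "MD2", "RN1"))
toy_job_codes <- tibble(job_code = c("RN1", "MD2"),
                        pos_desc = c("Registered Nurse", "Physician"),
                        is_nurse = c(TRUE, FALSE))
left_join(toy_employee, toy_job_codes, by = c("JOB_CODE" = "job_code"))
# each toy_employee row now carries the matching pos_desc and is_nurse values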

How to write a list to file as one row and without quotes in R

I am trying to write a list to file as one row and without quotes in R.
Content of the list is:
[1] "X4775495036_J" "X4775495036_F" "X5147722015_F" "X5067554009_F"
[5] "X5067554063_B" "X4954590047_A" "X5067554063_G" "X5067554009_L"
[9] "X5147722015_D" "X5511045011_D" "X5067554063_A" "X4805447025_F"
[13] "X5455362015_K" "X4805447025_L" "X5147722015_B" "X5067554009_G"
[17] "X5147722014_K" "X5067554063_H" "X5147722009_G" "X5067554008_H"
[21] "X5067554054_H" "X4805447016_K" "X5147722014_E" "X4954590051_K"
[25] "X5067554008_E" "X5147722015_H" "X5147722009_H" "X5067554063_D"
[29] "X5147722015_A" "X5511045022_E" "X5067554054_I" "X5067554063_J"
[33] "X5067554007_F" "X4775495036_E" "X4775495036_H" "X4805447025_H"
[37] "X5067554009_I" "X4805447025_K" "X4954590051_C" "X4805447025_E"
[41] "X5067554063_E" "X5147722009_J" "X5067554054_C" "X5067554054_G"
[45] "X4805447016_I" "X5455362015_B" "X5067554009_H" "X5147722014_A"
[49] "X4775495036_I" "X5067554063_L" "X5455362015_J" "X4954590047_J"
[53] "X5067554009_A" "X4954590051_D" "X5455362015_I" "X5511045011_E"
[57] "X5147722014_F"
I want something like this (all elements in one row):
X4775495036_J X4775495036_F X5147722015_F X5067554009_F ...
I have tried write.table and write, but without success.
Note that you don't have a list; you have a character vector.
cat(your_vector, "\n", file="your_file.txt")
The "\n" is an optional newline at the end.
You could use the ncolumns argument of write:
n <- LETTERS[1:10] # create example values
write(n, "letters.txt", ncolumns=length(n))
Or you could concatenate the values beforehand:
nc <- paste0(n, collapse=" ")
write(nc, "letters.txt")
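Reading the file back confirms a single space-separated row; the quotes below are just R's display of the string, the file itself contains none:
readLines("letters.txt")
# [1] "A B C D E F G H I J"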

R: Using for loop on data frame

I have a data frame, deflator.
I want to get a new data frame inflation which can be calculated as:
inflation[i] = (deflator[i] - deflator[i - 4]) / deflator[i - 4] * 100
The data frame deflator has 71 numbers:
> deflator
[1] 0.9628929 0.9596746 0.9747274 0.9832532 0.9851884
[6] 0.9797770 0.9913502 1.0100561 1.0176906 1.0092516
[11] 1.0185932 1.0241043 1.0197975 1.0174097 1.0297328
[16] 1.0297071 1.0313232 1.0244618 1.0347808 1.0480411
[21] 1.0322142 1.0351968 1.0403264 1.0447121 1.0504402
[26] 1.0487097 1.0664664 1.0935239 1.0965951 1.1141851
[31] 1.1033155 1.1234482 1.1333870 1.1188136 1.1336276
[36] 1.1096461 1.1226584 1.1287245 1.1529588 1.1582911
[41] 1.1691221 1.1782178 1.1946234 1.1963453 1.1939922
[46] 1.2118189 1.2227960 1.2140535 1.2228828 1.2314258
[51] 1.2570788 1.2572214 1.2607763 1.2744415 1.2982076
[56] 1.3318808 1.3394186 1.3525902 1.3352815 1.3492751
[61] 1.3593859 1.3368135 1.3642940 1.3538567 1.3658135
[66] 1.3710932 1.3888638 1.4262185 1.4309707 1.4328823
[71] 1.4497201
This is a very tricky question for me.
I tried to do this using a for loop:
> d <- data.frame(deflator)
> for (i in 1:71) { d <- rbind(d, c(deflator)) }
I think I might be doing it wrong.
Why use data frames? This is a straightforward vector operation:
inflation <- 100 * (deflator[-(1:4)] - deflator[1:67]) / deflator[1:67]
I agree with @Fhnuzoag that your example suggests calculations on a numeric vector, not a data frame. Here's an additional way to do your calculation, taking advantage of the lag argument of the diff function (with indexes that match those in your question):
lagBy <- 4 # The number of indexes by which to lag
laggedDiff <- diff(deflator, lag = lagBy) # The numerator above
theDenom <- deflator[seq_len(length(deflator) - lagBy)] # The denominator above
inflation <- laggedDiff/theDenom
The first few results are:
head(inflation)
# [1] 0.02315470 0.02094710 0.01705379 0.02725941 0.03299085 0.03008297
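As a quick sanity check against the deflator values printed above, the first element corresponds to positions 5 and 1 of deflator:
(0.9851884 - 0.9628929) / 0.9628929         # 0.0231547, matching head(inflation) above
(0.9851884 - 0.9628929) / 0.9628929 * 100   # about 2.32, the percentage form asked for in the question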
