Using subset with a string in R

I have the following data frame in R (made up stuff to learn the program):
  country population civilised
1    Town         13         5
2    city         69         9
3    Home         24         2
4   Stuff         99         9
and I am trying to access specific rows with the subset function, like
test <- subset(t, country==Town)
But all I ever get is "object not found".

We need to quote the string.
test <- subset(t, country=='Town')
test
# country population civilised
#1 Town 13 5
NOTE: t is also the name of a base R function (check ?t). It is better to give objects names that do not clash with function names.
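As a minimal sketch of the fix above, assuming the data frame from the question is rebuilt under the hypothetical name places (so it does not mask t()), the quoted comparison and an equivalent logical-indexing form look like this:

```r
# Data from the question, stored under a name that does not clash with t()
# ("places" is a hypothetical name chosen for this example)
places <- data.frame(
  country    = c("Town", "city", "Home", "Stuff"),
  population = c(13, 69, 24, 99),
  civilised  = c(5, 9, 2, 9),
  stringsAsFactors = FALSE
)

# Quoted string: compared against the values stored in the column
town <- subset(places, country == "Town")

# The same filter written with logical indexing
town_idx <- places[places$country == "Town", ]

# Matching several values at once
some <- subset(places, country %in% c("Town", "Home"))

town
```

Without the quotes, R looks for an object called Town, which is why the original call fails with "object not found".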

Related

Return closest matching value in separate dataframe in R

DATAFRAMES
DataFrame 1
GENDER HANDICAP PRIMARYREC
male 10 Model1
female 12 Model2
male 18 Model3
DataFrame 2
MODEL1Lofts MODEL2Lofts MODEL3Lofts
9 10.5 9
10.5 10.5
12
My current filter function:
GetDriver <- function(gender, handicap) {
  driverfunc = driver_model[driver_model$Gender==gender &
    driver_model$HandicapMin<=handicap & driver_model$HandicapMax>=handicap, ]
  return(driverfunc)
}
EXAMPLE INPUT: GetDriver(gender = "Male", handicap = 5)
Note: It automatically finds the recommended model
GOAL
What I'm looking to do is add a nested function within my filter (if possible) that takes the user's "current loft" as input, looks up the possible values in DataFrame 2, and adds that output to driverfunc.
This is literally my 4th day doing R. I've searched around a lot but I keep getting stuck, so any help would be tremendously appreciated!
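One possible way to approach the lookup, sketched under several assumptions: DataFrame 2 is stored as lofts with one column per model and NA where a model offers fewer lofts (the assignment of values to columns below is a guess, since the layout in the question is ambiguous), and GetClosestLoft is a hypothetical helper name, not something from the question.

```r
# Assumed layout of DataFrame 2: one column per model, NA for missing lofts.
# Which value belongs to which column is a guess made for illustration.
lofts <- data.frame(
  MODEL1Lofts = c(9, 10.5, 12),
  MODEL2Lofts = c(10.5, 10.5, NA),
  MODEL3Lofts = c(9, NA, NA)
)

# Hypothetical helper: closest available loft for a recommended model
GetClosestLoft <- function(model, current_loft, lofts) {
  col <- paste0(toupper(model), "Lofts")      # "Model1" -> "MODEL1Lofts"
  available <- lofts[[col]]
  available <- available[!is.na(available)]   # drop the padding NAs
  # which.min() returns the first position with the smallest distance
  available[which.min(abs(available - current_loft))]
}

GetClosestLoft("Model1", current_loft = 11, lofts = lofts)
```

Inside GetDriver, the result could then be attached to the filtered rows as an extra column, for example by calling the helper once per recommended model.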

Is there a way I can use r code in order to calculate the average price for specific days? (AVERAGEIF function)

Firstly: I have seen other posts about translating AVERAGEIF from Excel into R, but I didn't see one that worked for my specific case and I couldn't manage to make one work.
I have a dataset which contains the daily prices of a bunch of listings.
It looks like this:
  listing_id     date price
1       1000 1/2/2015  $100
2       1200 2/4/2016  $150
Sample of the dataset (and desired outcome): https://send.firefox.com/download/228f31e39d18738d/#rlMmm6UeGxgbkzsSD5OsQw
The dataset I would like to have has only the date and the average prices of all listings on that date. The goal is to get a (different) dataframe which would look something like this so I can work with it:
       Date Average Price
1  4/5/2015      204.5438
2  4/6/2015      182.6439
3  4/7/2015       176.553
4  4/8/2015      182.0448
5  4/9/2015      183.3617
6 4/10/2015      205.0997
7 4/11/2015      197.0118
8 4/12/2015      172.2943
I created this in Excel using the Average.if function (and copy pasting by value) from the sample provided above.
I tried to format the data in Excel first, where I could use the AVERAGE.IF function to take the average only for a specific date. The problem is that the dataset consists of 30 million rows and Excel only allows about 1 million, so it didn't work.
What I have done so far: I created a data frame in R (where I want the average prices to go) using
Avg = data.frame("Date" =1:2, "Average Price"=1:2)
Avg[nrow(Avg) + 2036,] = list("v1","v2")
Avg$Date = seq(from = as.Date("2015-04-05"), to = as.Date("2020-11-01"), by = 'day')
I tried to create an AVERAGEIF-like function by following this article and another one, but could not get it to work.
I hope this is enough information to go on; otherwise I would be more than happy to provide more.
If your question is how to replicate the AVERAGEIF function, you can use logical indexing:
R code:
> df
  Dates Prices
1     1    100
2     2    120
3     3    150
4     1    320
5     2    250
6     3    210
7     1    102
8     2    180
9     3    150
idx <- df$Dates == 1 # Positions where condition is true
mean(df$Prices[idx]) # Prints same output as Excel
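The logical-indexing answer handles one date at a time. To get one average per date in a single step, a grouped mean is another option. The sketch below is only illustrative: listings is a hypothetical sample in the shape described in the question, and the month/day/year date order is an assumption.

```r
# Hypothetical sample in the shape described in the question
listings <- data.frame(
  listing_id = c(1000, 1200, 1000, 1200),
  date  = c("4/5/2015", "4/5/2015", "4/6/2015", "4/6/2015"),
  price = c("$100", "$150", "$120", "$180"),
  stringsAsFactors = FALSE
)

# Strip the dollar sign (and any thousands separators) so price is numeric
listings$price <- as.numeric(gsub("[$,]", "", listings$price))

# Parse the dates; month/day/year order is an assumption
listings$date <- as.Date(listings$date, format = "%m/%d/%Y")

# One row per date, holding the mean price over all listings on that date
avg <- aggregate(price ~ date, data = listings, FUN = mean)
names(avg) <- c("Date", "Average.Price")
avg
```

On 30 million rows aggregate() may be slow; grouped means in data.table or dplyr follow the same idea and are likely to scale better.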

R: exporting summarise results to HTML or Word

After figuring out how to summarise a data frame, I did it.
I can see the results in my console, shown below after the first two lines of code:
byTue <- group_by(luckyloss.3,L_byUXR)
( sumMon <- summarize(byTue,count=n()) )
Below is what I see on the console. It feels good because it shows I got what I was looking for.
The results come from a column of 234 rows in which many values are repeated.
So I summarised the 234 rows: ANA appears 8 times, ARI 14 times, and so on.
# A tibble: 30 × 2
   L_byUXR count
   <chr>   <int>
 1 ANA         8
 2 ARI        14
 3 ATL        16
 4 BAL         4
 5 BOS         6
 6 CHA        12
 7 CHN         8
 8 CIN        10
 9 CLE         4
10 COL         8
# ... with 20 more rows
What I want is to have this output of 30 rows by 2 columns in a form I can take to a Word document, or even HTML.
I tried to do a write(byTUE.csv) but what I received was the list of 234 rows of the original data frame. It's as if the summarise disappeared. I have looked into other approaches, like markdown or creating new files, and tried to see if the knitr package could help, but nothing worked.
library(stringi) # ONLY NECESSARY FOR DATA SIMULATION
library(officer) # <<= install this
library(tidyverse)
Simulate some data:
set.seed(2017-11-18)
tibble( # data_frame() is deprecated in favour of tibble()
  L_byUXR = stri_rand_strings(30, 3, pattern="[A-Z]"),
  count = sample(20, 30, replace=TRUE)
) -> sumMon
Start a new Word doc and add the table, saving to a new doc:
read_docx() %>% # a new, empty document
  body_add_table(sumMon, style = "table_template") %>%
  print(target="new.docx")
I kept looking for an answer and found the "stargazer" package for R, which let me get the contents of the data frame as text that can be edited further.
When you write the R instruction, name the file you want as output in "out = " and stargazer will place it in your session's folder.
The instruction I used was:
stargazer(count, type = "text", summary = FALSE, title="Any Title", digits=1, out="table1.txt")
Even though I found the answer, I could not have done it without the help of hrbrmstr, who showed me there was a package to do it; I just needed to work more on it.
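A note on the write attempt in the question: byTue is the grouped copy of the original 234 rows, while the 30-row summary lives in sumMon, so sumMon is the object to export. A minimal base R sketch, assuming a hypothetical stand-in column (teams) and a temporary output path; the resulting CSV opens in Excel and can be pasted into Word from there:

```r
# Hypothetical stand-in for the repeated-values column from the question
teams <- data.frame(
  L_byUXR = rep(c("ANA", "ARI", "ATL"), times = c(8, 14, 16)),
  stringsAsFactors = FALSE
)

# Base R counterpart of group_by() + summarize(count = n())
sumMon <- as.data.frame(
  table(L_byUXR = teams$L_byUXR),
  responseName = "count",
  stringsAsFactors = FALSE
)

# Export the summary object itself, not the grouped source data
out_file <- file.path(tempdir(), "sumMon.csv")
write.csv(sumMon, out_file, row.names = FALSE)

read.csv(out_file)
```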

How do I replace values in an R dataframe column with a corresponding value?

Ok, so I have a dataframe that I downloaded from Pew Research Center. One of the columns (called 'cregion') contains a series of numbers from 1-56, with each number corresponding to a geographic location in the U.S. Most of these locations are states, and the additional 6 are at the sub-state level. So, for example, the number '1' corresponds to 'Alabama', and '11' corresponds to the 'District Of Columbia'.
What I'd like to do is replace each of those numbers in the 'cregion' column with the ACTUAL name of the region it corresponds to. Unfortunately, there is no column in this data frame that I can use to swap the values, as the key for which number corresponds to which region exists completely separately (word document). I'm new to R and while I've been searching for a few hours for the best way to go about this, I can't seem to find a method that would work (or I just don't understand the explanation). Can anybody suggest a method to me?
If you have a vector of the state names as strings called statevec whose ith element corresponds to cregion i, and your data frame is named dat, just do
dat <- data.frame(cregion = sample(1:50), stuff = runif(50))
head(dat)
#  cregion       stuff
#1      25 0.665843896
#2      11 0.144631131
#3      13 0.691616240
#4      28 0.507454243
#5       9 0.416535139
#6      30 0.004196311
statevec <- state.name
dat$cregion <- statevec[dat$cregion]
head(dat)
#     cregion       stuff
#1   Missouri 0.665843896
#2     Hawaii 0.144631131
#3   Illinois 0.691616240
#4     Nevada 0.507454243
#5    Florida 0.416535139
#6 New Jersey 0.004196311
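Note that state.name only lines up when the codes run from 1 to 50 in alphabetical state order. The question's key maps 11 to the District Of Columbia, while state.name[11] is Hawaii, so a lookup keyed by code may be safer. A minimal sketch, assuming a hypothetical named vector key typed in from the codebook (only the two pairs given in the question are filled in here):

```r
# Hypothetical key built by hand from the codebook; extend it to all 56 codes
key <- c("1" = "Alabama", "11" = "District Of Columbia")

# Small example data frame; 99 stands for a code that is not in the key
dat <- data.frame(cregion = c(11, 1, 1, 99))

# Index the key by name; codes missing from the key become NA
dat$region <- unname(key[as.character(dat$cregion)])
dat
```

Keeping the names in a separate column (region here is an arbitrary name) leaves the original codes available for checking the mapping.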

How to read a one lined CSV in R?

I have been working on a dummy dataset recently and I found out that the data provided to me was all on a single line. A similar example is shown below:
Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student
I want to import the data and obtain the following output:
Name Age Gender Occupation
A    10  M      Student
B    11  M      Student
C    11  F      Student
A value might sometimes be missing, so some logic is needed to import such data. Can anyone help me work out the logic for importing data sets like this?
I tried the normal import but it really didn't help. I just imported the file with the read.csv() function and it didn't give me the expected result.
EDIT: what if the data is like:
Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student
and I want an output like:
Name    Age Gender Occupation
ABC XYZ 10  M      Student
B       11  M      Student
C       11  F      Student
You could read your file in with readLines, turn spaces into line breaks, and then read it with read.csv:
# txt <- readLines("my_data.txt") # with a real data file
txt <- readLines(textConnection("Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student"))
read.csv(text=gsub(" ","\n",txt))
output
  Name Age Gender Occupation
1    A  10      M    Student
2    B  11      M    Student
3    C  11      F    Student
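Replacing every space breaks the EDIT case, where a name such as "ABC XYZ" contains a space itself. One way around that, sketched under the assumptions that every record has exactly four fields and that the last field (Occupation) never contains a space, is to convert only the space that follows a complete record:

```r
txt <- "Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student"

# Three comma-terminated fields, then a fourth field without spaces,
# then the single space that separates two records
record_end <- "((?:[^,]*,){3}[^, ]*) "

# Turn only those record-separating spaces into line breaks
fixed <- gsub(record_end, "\\1\n", txt, perl = TRUE)

df <- read.csv(text = fixed, stringsAsFactors = FALSE)
df
```

Empty fields in the first three positions still match the pattern, so a missing Age or Gender should come through as NA or an empty string rather than shifting the columns.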
If you have millions of records, you will probably want to speed up this process, so I suggest using data.table's fread instead of read.csv. fread can also take a shell command to pre-process the file before reading it into R, and sed will be a lot faster than doing the string manipulation in R.
E.g., if you have this CSV stored at /tmp/x.csv, you can try something like:
> data.table::fread(cmd = "sed 's/ /\\n/g' /tmp/x.csv")
   Name Age Gender Occupation
1:    A  10      M    Student
2:    B  11      M    Student
3:    C  11      F    Student
