I have the following two data frames.
First, I have the occupations data frame. A sample is below:
state <- c("00","00","32","32")
codetype <- c("19","19","19","19")
code <- c ("123456","123457","123456","123457")
codetitle <- c("doctors","lawyers","doctors","lawyers")
first <- data.frame(state,codetype,code,codetitle)
The second data frame is this one:
state <- c("01","01","04","04","05","05")
codetype <- c("19","19","19","19","19","19")
code <- c("123456","123457","123456","123457","123456","123457")
pct10 <- c(12.30,12.65,14.50,14.23,15.65,25.22)
second <- data.frame(state,codetype,code,pct10)
The task is this: I need to create new rows in the first data frame. The desired result takes the unique state values from the second data frame and creates identical rows in the first, just with the new state values at the beginning. I know that I can use expand_grid. My only real perplexity is how
Desired Result
state codetype code codetitle
32 19 123456 Doctors
32 19 123457 Lawyers
00 19 123456 Doctors
00 19 123457 Lawyers
01 19 123456 Doctors
01 19 123457 Lawyers
04 19 123456 Doctors
04 19 123457 Lawyers
05 19 123456 Doctors
05 19 123457 Lawyers
Perhaps this:
library(dplyr)
second %>%
select(-pct10) %>%
distinct() %>%
left_join(distinct(first, code, codetitle), by = "code") %>%
bind_rows(first)
# state codetype code codetitle
# 1 01 19 123456 doctors
# 2 01 19 123457 lawyers
# 3 04 19 123456 doctors
# 4 04 19 123457 lawyers
# 5 05 19 123456 doctors
# 6 05 19 123457 lawyers
# 7 00 19 123456 doctors
# 8 00 19 123457 lawyers
# 9 32 19 123456 doctors
# 10 32 19 123457 lawyers
Alternatively, you can use plyr::rbind.fill along with a left join:
library(dplyr)
third <- plyr::rbind.fill(first,second) %>% select(-codetitle,-pct10) %>%
left_join(first %>% select(code, codetitle) %>% unique(), by=c('code'))
Created on 2023-02-06 with reprex v2.0.2
state codetype code codetitle
1 00 19 123456 doctors
2 00 19 123457 lawyers
3 32 19 123456 doctors
4 32 19 123457 lawyers
5 01 19 123456 doctors
6 01 19 123457 lawyers
7 04 19 123456 doctors
8 04 19 123457 lawyers
9 05 19 123456 doctors
10 05 19 123457 lawyers
You can use the expand.grid() function to create the desired result. Expand over the row indices of the distinct occupations rather than over code and codetitle separately, so that each code stays paired with its own title:
state_codes <- unique(second$state)
occupations <- unique(first[c("codetype", "code", "codetitle")])
# one (occupation row, state) pair for every combination
idx <- expand.grid(occ = seq_len(nrow(occupations)), state = state_codes,
stringsAsFactors = FALSE)
expanded_grid <- cbind(state = idx$state, occupations[idx$occ, ])
result <- rbind(first, expanded_grid)
rownames(result) <- NULL
result
state codetype code codetitle
1 00 19 123456 doctors
2 00 19 123457 lawyers
3 32 19 123456 doctors
4 32 19 123457 lawyers
5 01 19 123456 doctors
6 01 19 123457 lawyers
7 04 19 123456 doctors
8 04 19 123457 lawyers
9 05 19 123456 doctors
10 05 19 123457 lawyers
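Another base R option is a cross join with merge(). The following is only a sketch, assuming the sample data frames from the question; the names occupations, new_states and new_rows are made up for illustration. It relies on the documented behaviour that merge() returns the Cartesian product of its inputs when by has length zero.

```r
# Sample data frames from the question
first <- data.frame(
  state = c("00", "00", "32", "32"),
  codetype = "19",
  code = c("123456", "123457", "123456", "123457"),
  codetitle = c("doctors", "lawyers", "doctors", "lawyers"),
  stringsAsFactors = FALSE
)
second <- data.frame(
  state = c("01", "01", "04", "04", "05", "05"),
  codetype = "19",
  code = rep(c("123456", "123457"), 3),
  pct10 = c(12.30, 12.65, 14.50, 14.23, 15.65, 25.22),
  stringsAsFactors = FALSE
)

# One row per distinct occupation, without the state column
occupations <- unique(first[c("codetype", "code", "codetitle")])

# States that appear in `second` but not yet in `first`
new_states <- data.frame(
  state = setdiff(unique(second$state), first$state),
  stringsAsFactors = FALSE
)

# by = NULL asks merge() for every (state, occupation) combination
new_rows <- merge(new_states, occupations, by = NULL)

# Append to the original rows and sort for readability
result <- rbind(first, new_rows)
result <- result[order(result$state, result$code), ]
rownames(result) <- NULL
result
```

Because the occupations are kept together as rows of one data frame, a code can never end up next to the wrong title.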
I am using the httr package to retrieve data from our reporting system using its REST API. I am specifying the content to be an xlsx file. The response contains the raw (binary?) file.
Here's what my request looks like:
request = GET("http://server/.../documents/123456",
add_headers(.headers = c('Accept'= 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'authtoken' = paste0('', logonToken,''))) ,
content_type("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
encode = 'raw'
)
content(request)
[1] 50 4b 03 04 0a 00 08 08 08 00 19 88 79 54 00 00 00 00 00 00 00
[44] 2f 73 68 65 65 74 31 2e 78 6d 6c a5 57 5b 6f 9b 30 14 7e 9f b4
[87] 00 23 43 2f db af 9f c1 94 d8 c6 58 93 92 87 54 f5 77 f1 39 fe
... etc
The result can be saved as a .xlsx and opened in Excel. However, I would like to read this data directly into a data frame. Is there a way to turn the response into readable input within the same script?
I am able to pass an extra parameter write_disk to save the response directly as a file. Specifying a path is required. I tried testing with tempfile() to write and read the response directly back in, but wasn't able to get it to work.
Is there any way to read a raw file from an R environment object?
Yes, here's a fully reproducible example url:
url <- paste0('https://file-examples.com/storage/fe91183158623ded19eb446/',
'2017/02/file_example_XLSX_100.xlsx')
Now download our file and get its raw contents:
raw_xlsx <- httr::GET(url)$content
Let's create a temporary file to store it:
tmp <- tempfile(fileext = '.xlsx')
Now write the raw data to the file:
writeBin(raw_xlsx, tmp)
Our Excel file is now saved in the temporary file, which we can read however we would normally read Excel files into R:
my_excel <- readxl::read_excel(tmp)
And the result is:
my_excel
#> # A tibble: 100 x 8
#> `0` `First Name` `Last Name` Gender Country Age Date Id
#> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 Dulce Abril Female United States 32 15/10/2017 1562
#> 2 2 Mara Hashimoto Female Great Britain 25 16/08/2016 1582
#> 3 3 Philip Gent Male France 36 21/05/2015 2587
#> 4 4 Kathleen Hanner Female United States 25 15/10/2017 3549
#> 5 5 Nereida Magwood Female United States 58 16/08/2016 2468
#> 6 6 Gaston Brumm Male United States 24 21/05/2015 2554
#> 7 7 Etta Hurn Female Great Britain 56 15/10/2017 3598
#> 8 8 Earlean Melgar Female United States 27 16/08/2016 2456
#> 9 9 Vincenza Weiland Female United States 40 21/05/2015 6548
#> 10 10 Fallon Winward Female Great Britain 28 16/08/2016 5486
#> # ... with 90 more rows
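The steps above can be wrapped in a small helper so the temporary file is cleaned up automatically. This is only a sketch: read_raw_via_tempfile is a hypothetical name, not part of httr or readxl, and the demo uses the raw bytes of a tiny CSV as a stand-in for a downloaded file so that it runs without a network connection.

```r
# Hypothetical helper: write raw bytes to a temporary file, pass the path to
# `reader`, and delete the file again when the function exits.
read_raw_via_tempfile <- function(raw_bytes, reader, fileext = "", ...) {
  stopifnot(is.raw(raw_bytes), is.function(reader))
  tmp <- tempfile(fileext = fileext)
  on.exit(unlink(tmp), add = TRUE)
  writeBin(raw_bytes, tmp)
  reader(tmp, ...)
}

# Stand-in for a downloaded file: the raw bytes of a small CSV
csv_bytes <- charToRaw("id,name\n1,Dulce\n2,Mara\n")
demo <- read_raw_via_tempfile(csv_bytes, read.csv, fileext = ".csv")
demo
```

For the Excel case, the same helper would presumably be called as read_raw_via_tempfile(raw_xlsx, readxl::read_excel, fileext = ".xlsx"), with raw_xlsx being the raw vector obtained from the response.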
My dataset has several variables and I want to build a subset as well as create new variables based on those conditions
dat1
S1 S2 H1 H2 Month1 Year1 Month2 Year2
16 17 81 70 09 2017 07 2017
17 16 80 70 08 2017 08 2016
14 16 81 81 09 2016 05 2016
18 15 70 81 07 2016 09 2017
17 16 80 80 08 2016 05 2016
18 18 81 70 05 2017 04 2016
I want to subset such that if S1 is 16, 17 or 18 and H1 is 81 or 80, then I create new variables Hist=H1, Date=paste(Month1,Year1) and Sip=S1.
The same goes for the set of S2, H2.
My output should be as follows (the first 4 rows come from the set S1, H1, Month1, Year1 and the last 2 rows come from S2, H2, Month2, Year2):
Hist Sip Date
81 16 09-2017
80 17 08-2017
80 17 08-2016
81 18 05-2017
81 16 05-2016
80 16 05-2016
My Code :
datnew <- dat1 %>%
mutate(Date=ifelse((S1==16|S1==17|S1==18)&(H1==80|H1==81),paste(Month1,Year1,sep="-"),
ifelse((S2==16|S2==17|S2==18)&(H2==80|H2==81),paste(Month2,Year2,sep="-"),"NA")),
hist=ifelse((S1==16|S1==17|S1==18)&(H1==80|H1==81),H1,
ifelse((S2==16|S2==17|S2==18)&(H2==80|H2==81),H2,"NA")),
sip=ifelse((S1==16|S1==17|S1==18)&(H1==80|H1==81),S1,
ifelse((S2==16|S2==17|S2==18)&(H2==80|H2==81),S2,"NA")))
In the original data I have 10 sets of such columns, i.e. S1-S10, H1-H10, Month1-Month10... And for each variable I have a lot more conditions on the numbers.
In this method it is going on and on. Is there any better way to do this?
Thanks in advance
Here is a tidyverse solution. Separate into two data frames and bind the rows together.
library(tidyverse)
bind_rows(
dat1 %>% select(patientId, ends_with("1")) %>% rename_all(str_remove, "1"),
dat1 %>% select(patientId, ends_with("2")) %>% rename_all(str_remove, "2")
) %>%
transmute(
patientId,
Hist = H,
Sip = S,
date = paste0(Month, "-", Year)
) %>%
filter(
Sip %in% 16:18,
Hist %in% 80:81
)
#> # A tibble: 6 x 4
#> patientId Hist Sip date
#> <int> <dbl> <dbl> <chr>
#> 1 1 81 16 09-2017
#> 2 2 80 17 08-2017
#> 3 5 80 17 08-2016
#> 4 6 81 18 05-2017
#> 5 3 81 16 05-2016
#> 6 5 80 16 05-2016
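For the full data with ten sets of columns, the same stack-then-filter idea can be written as a loop over the numeric suffix. This is a base R sketch, assuming the column naming from the question (S, H, Month and Year followed by the set number) and months stored as text so the leading zeros survive; stack_set and set_ids are made-up names.

```r
# Copy of dat1 from the question (months kept as character)
dat1 <- data.frame(
  S1 = c(16, 17, 14, 18, 17, 18),
  S2 = c(17, 16, 16, 15, 16, 18),
  H1 = c(81, 80, 81, 70, 80, 81),
  H2 = c(70, 70, 81, 81, 80, 70),
  Month1 = c("09", "08", "09", "07", "08", "05"),
  Year1 = c(2017, 2017, 2016, 2016, 2016, 2017),
  Month2 = c("07", "08", "05", "09", "05", "04"),
  Year2 = c(2017, 2016, 2016, 2017, 2016, 2016),
  stringsAsFactors = FALSE
)

# Pull set number i out of the wide data as a three-column data frame
stack_set <- function(i, data) {
  data.frame(
    Hist = data[[paste0("H", i)]],
    Sip = data[[paste0("S", i)]],
    Date = paste(data[[paste0("Month", i)]], data[[paste0("Year", i)]], sep = "-"),
    stringsAsFactors = FALSE
  )
}

set_ids <- 1:2  # would be 1:10 for the full data

# Stack all sets on top of each other, then filter once
long <- do.call(rbind, lapply(set_ids, stack_set, data = dat1))
datnew <- subset(long, Sip %in% 16:18 & Hist %in% 80:81)
rownames(datnew) <- NULL
datnew
```

The conditions live in a single subset() call, so adding more allowed values only means extending the two vectors on the right of %in%.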
I am using R for my time series analysis and I have the following csv file that I have loaded into R:
CSV file:
I have used the zoo package to convert my data frame into a ts object:
library(zoo)
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon))
Running:
class(df1_ts)
# [1] "mts" "ts" "matrix"
However when I run head(df1_ts), I get the following results:
head(df1_ts)
# Time Series:
# Start = 2014
# End = 2018
# Frequency = 1
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 2014 4621 3569 4249 4593 3320 1970 2483 3474 4302 5670 5788 5570
# 2015 5747 4346 5176 5362 5360 3707 3883 5138 5568 6034 5989 5648
# 2016 5821 5164 5781 5346 5339 4743 5417 5514 5880 5899 6014 5641
# 2017 5980 5341 5890 5596 5753 5470 5589 5545 5749 5938 5864 5567
# 2018 5655 5392 5766 5268 5680 5337 5197 5714 5802 5935 5955 5637
Why am I getting Frequency=1? I am expecting the Frequency to be 12 as these are monthly data?
How can I fix this?
I have tried the following, without success:
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon), freq=12)
The code shown in the question is creating a multivariate time series consisting of 12 series (one for each month column) whose time index is the year; however, what is wanted is a single univariate monthly series.
Using df1 shown reproducibly in the Note at the end, first convert the data.frame df1 to a matrix using transpose and then unravel this transposed matrix column by column into a single vector using c. Now we can define the ts series directly:
tt <- ts(c(t(df1[-1])), start = df1$Year[1], freq = 12)
giving:
frequency(tt)
## [1] 12
tt
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2014 1 2 3 4 5 6 7 8 9 10 11 12
## 2015 13 14 15 16 17 18 19 20 21 22 23 24
## 2016 25 26 27 28 29 30 31 32 33 34 35 36
## 2017 37 38 39 40 41 42 43 44 45 46 47 48
## 2018 49 50 51 52 53 54 55 56 57 58 59 60
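To see why the transpose matters, here is a small self-contained sketch with made-up quarterly data (two years, four quarters) rather than the monthly data above. R stores matrices column by column, so c() on the untransposed matrix would interleave the years.

```r
# Rows are years, columns are quarters; values run 1..8 in time order
m <- matrix(1:8, nrow = 2, byrow = TRUE,
            dimnames = list(c("2014", "2015"), c("Q1", "Q2", "Q3", "Q4")))

# c() unravels a matrix column by column, which mixes up the time order
by_column <- c(m)

# Transposing first puts each year in its own column, so c() walks through
# 2014 Q1..Q4 and then 2015 Q1..Q4
by_row <- c(t(m))

# Build the quarterly series from the correctly ordered vector
qt <- ts(by_row, start = 2014, frequency = 4)
frequency(qt)
```

Here by_column comes out as 1 5 2 6 3 7 4 8 while by_row is 1 to 8 in order, which is the ordering ts() needs.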
Note
Please do not use images to show your input data as it means that anyone wanting to answer with it would need to retype it. Provide it reproducibly as R code. I have done this for you this time, changing the data to avoid typing all those numbers.
df1 <- as.data.frame(cbind(2014:2018, matrix(1:60, ncol = 12, byrow = TRUE)))
names(df1) <- c("Year", month.abb)
How can I find out if the final number in the list is even or odd and then put that row into a dataframe?
I have multiple csv files that end in either odd or even 'lap numbers' (i.e. #17 and 26 below)
Total time 10:00.61
Lap times
01 00:07.46
02 00:05.64
03 00:01.07
04 00:01.04
05 00:04.71
06 00:06.43
07 00:12.52
08 00:07.34
09 00:05.46
10 00:05.81
11 00:05.52
12 00:06.51
13 00:10.75
14 00:00.83
15 00:03.64
16 00:02.75
17 00:01.20
and this...
Total time 10:00.61
Lap times
01 00:07.46
02 00:05.64
03 00:01.07
04 00:01.04
05 00:04.71
06 00:06.43
07 00:12.52
08 00:07.34
09 00:05.46
10 00:05.81
11 00:05.52
12 00:06.51
13 00:10.75
14 00:00.83
15 00:03.64
16 00:02.75
17 00:01.20
18 00:06.17
19 00:04.40
20 00:00.75
21 00:00.84
22 00:01.29
23 00:02.31
24 00:03.04
25 00:02.85
26 00:05.86
I use this loop to go through the csv files
output = lapply(files, function(x) {
dat = read.csv(x, header= TRUE)
dat = dat[-c(1),]
dat = as.data.frame(dat)
dat = separate(data = dat, col = dat, into = c("lap", "duration"), sep = "\\ ")
})
the output then looks like this
[[1]]
lap duration
1 01 00:07.46
2 02 00:05.64
3 03 00:01.07
4 04 00:01.04
5 05 00:04.71
6 06 00:06.43
7 07 00:12.52
8 08 00:07.34
9 09 00:05.46
10 10 00:05.81
11 11 00:05.52
12 12 00:06.51
13 13 00:10.75
14 14 00:00.83
15 15 00:03.64
16 16 00:02.75
17 17 00:01.20
[[2]]
lap duration
1 01 00:07.46
2 02 00:05.64
3 03 00:01.07
4 04 00:01.04
5 05 00:04.71
6 06 00:06.43
7 07 00:12.52
8 08 00:07.34
9 09 00:05.46
10 10 00:05.81
11 11 00:05.52
12 12 00:06.51
13 13 00:10.75
14 14 00:00.83
15 15 00:03.64
16 16 00:02.75
17 17 00:01.20
18 18 00:06.17
19 19 00:04.40
20 20 00:00.75
21 21 00:00.84
22 22 00:01.29
23 23 00:02.31
24 24 00:03.04
25 25 00:02.85
26 26 00:05.86
How can I see if the last row is even or odd (i.e. row 17 and 26 respectively)? Then I possibly want to take those last rows and put them into a separate dataframe.
First of all, you can make your read procedure much, much simpler.
output <- lapply(files, read.csv, skip = 1)
Now, as for an odd/even number of rows.
n <- sapply(output, nrow)
ifelse(n %% 2 == 0, "even", "odd")
Note that instead of the character values "even" and "odd" you can have the ifelse return anything of your choice.
Haven't tested, but this can give you some hints. Just made few modifications to your code:
output = lapply(files, function(x) {
dat = read.csv(x, header= TRUE)
last_row = nrow(dat)
# to see if even or odd number
ifelse(last_row %% 2==0, 'this is even','this is odd')
# insert last row into a new data frame
last_row_values = unlist(dat[last_row,])
dat = as.data.frame(last_row_values)
colnames(dat) <- c('lap','duration')
return (dat)
})
I think since output will be a list, later you might need to do:
df = do.call('rbind', output)
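As a sketch of the second half of the question (collecting the last rows), the following assumes output is already a list of data frames with lap and duration columns, as printed in the question. The toy list is shortened to three and four laps, and the names last_rows, odd_laps and even_laps are made up.

```r
# Shortened stand-in for `output` from the question
output <- list(
  data.frame(lap = c("01", "02", "03"),
             duration = c("00:07.46", "00:05.64", "00:01.07"),
             stringsAsFactors = FALSE),
  data.frame(lap = c("01", "02", "03", "04"),
             duration = c("00:07.46", "00:05.64", "00:01.07", "00:01.04"),
             stringsAsFactors = FALSE)
)

# Last row of each file, stacked into one data frame
last_rows <- do.call(rbind, lapply(output, function(dat) dat[nrow(dat), ]))
rownames(last_rows) <- NULL

# The lap column is text ("03"), so convert it before testing parity
last_lap <- as.integer(last_rows$lap)
last_rows$parity <- ifelse(last_lap %% 2 == 0, "even", "odd")

# Optional: separate data frames for odd and even final laps
odd_laps <- last_rows[last_rows$parity == "odd", ]
even_laps <- last_rows[last_rows$parity == "even", ]
last_rows
```

Testing the lap number itself rather than the row count keeps the result correct even if header or summary lines slip into a file.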
I want to produce a graphic that looks something like this (with percentages and a legend) in R:
My original data is:
AIRBUS BOEING EMBRAER
2002 18 21 30
2003 20 23 31
2004 23 26 29
2005 22 25 26
2006 22 25 25
2007 22 27 17
2008 21 21 16
2009 17 19 22
2010 14 22 24
2011 17 27 22
2012 16 22 19
2013 11 24 19
There are similar questions on SO already, but I seem to lack the sufficient amount of intelligence (or understanding of R) to extrapolate from them to a solution to my particular problem.
First, gather or melt your data into long format. Then it's easy.
library(tidyverse)
df <- read.table(
text = "
YEAR AIRBUS BOEING EMBRAER
2002 18 21 30
2003 20 23 31
2004 23 26 29
2005 22 25 26
2006 22 25 25
2007 22 27 17
2008 21 21 16
2009 17 19 22
2010 14 22 24
2011 17 27 22
2012 16 22 19
2013 11 24 19",
header = TRUE
)
df_long <- df %>%
gather(company, percentage, AIRBUS:EMBRAER)
ggplot(df_long, aes(x = YEAR, y = percentage, fill = company)) +
geom_col() +
ggtitle("Departure delays by company and Year") +
scale_x_continuous(breaks = 2002:2013)
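The question also asks for percentages on the bars. One way to sketch that, assuming ggplot2 is installed, is to add geom_text() with position_stack() so each label sits in the middle of its segment. The reshaping here uses base R stack() instead of gather() purely to keep the example self-contained; the legend title and label size are arbitrary choices.

```r
library(ggplot2)

df <- data.frame(
  YEAR = 2002:2013,
  AIRBUS = c(18, 20, 23, 22, 22, 22, 21, 17, 14, 17, 16, 11),
  BOEING = c(21, 23, 26, 25, 25, 27, 21, 19, 22, 27, 22, 24),
  EMBRAER = c(30, 31, 29, 26, 25, 17, 16, 22, 24, 22, 19, 19)
)

# stack() returns the values in column order plus an `ind` factor naming the
# source column, so the years just need repeating once per company
stacked <- stack(df[c("AIRBUS", "BOEING", "EMBRAER")])
df_long <- data.frame(
  YEAR = rep(df$YEAR, times = 3),
  company = stacked$ind,
  percentage = stacked$values
)

p <- ggplot(df_long, aes(x = YEAR, y = percentage, fill = company)) +
  geom_col() +
  # vjust = 0.5 centres each label vertically inside its stacked segment
  geom_text(aes(label = paste0(percentage, "%")),
            position = position_stack(vjust = 0.5), size = 3) +
  scale_x_continuous(breaks = 2002:2013) +
  labs(title = "Departure delays by company and year", fill = "Company")

print(p)
```

The legend comes for free from the fill aesthetic; labs(fill = ...) only renames its title.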