I have a table of the following format:
[Image: initial table formatting]
And I'm seeking an output resembling the following:
Date            Value
January 1659    Value 1
February 1659   Value 2
March 1659      Value 3
April 1659      Value 4
and so on (numerical representations of the month and year are perfectly fine too).
I've attempted using merge operations but I'm thinking there must be an easier way (possibly using packages). I've found somewhat similar questions asked but none obviously applicable yet.
You can use pivot_longer and unite, both from the tidyr package:
library(tidyr)
pivot_longer(df, -Year) |>
  unite(date, name, Year, sep = " ")
#> # A tibble: 120 x 2
#>    date     value
#>    <chr>    <int>
#>  1 Jan 1659    68
#>  2 Feb 1659    97
#>  3 Mar 1659    89
#>  4 Apr 1659    74
#>  5 May 1659    44
#>  6 Jun 1659     2
#>  7 Jul 1659    81
#>  8 Aug 1659    22
#>  9 Sep 1659    87
#> 10 Oct 1659     1
#> # ... with 110 more rows
Data used
set.seed(1)
df <- cbind(1659:1668, replicate(12, sample(99, 10))) |>
  as.data.frame() |>
  setNames(c("Year", month.abb))
df
#> Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1 1659 68 97 89 74 44 2 81 22 87 1 76 43
#> 2 1660 39 85 37 42 25 45 13 93 83 43 39 1
#> 3 1661 1 21 34 38 70 18 40 28 90 59 24 29
#> 4 1662 34 54 99 20 39 22 89 48 48 26 53 78
#> 5 1663 87 74 44 28 51 78 48 33 64 15 92 22
#> 6 1664 43 7 79 96 42 65 96 45 94 58 86 70
#> 7 1665 14 73 33 44 6 70 23 21 60 29 40 28
#> 8 1666 82 79 84 87 24 87 84 31 51 24 83 37
#> 9 1667 59 98 35 70 32 93 29 17 34 42 90 61
#> 10 1668 51 37 70 40 14 75 98 73 10 48 35 46
Created on 2022-11-29 with reprex v2.0.2
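Since the question says numeric months and years are fine too, here is a minimal base R sketch of the same wide-to-long step that also builds a real Date column. The small data frame and the names `month_cols` and `long` are made up for illustration; it assumes the month columns are named with the abbreviations in `month.abb`.

```r
# Small wide table: one row per year, one column per month (values are made up)
df <- data.frame(Year = 1659:1660,
                 Jan = c(68, 39), Feb = c(97, 85), Mar = c(89, 37))

month_cols <- setdiff(names(df), "Year")

# One row per (Year, month) pair, ordered by year and then by month.
# t() puts months in rows, so as.vector() walks through one year at a time.
long <- data.frame(
  Year  = rep(df$Year, each = length(month_cols)),
  Month = match(rep(month_cols, times = nrow(df)), month.abb),
  value = as.vector(t(as.matrix(df[month_cols])))
)

# First day of each month as a Date, which sorts and plots correctly
long$date <- as.Date(sprintf("%d-%02d-01", long$Year, long$Month))
long
```

A Date column is usually easier to sort and plot than a "Jan 1659" string, at the cost of picking an arbitrary day of the month.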
I'm importing a dataset of numbers. It has just one column, Numbers_Picked, with 600,000 rows. Each row consists of 20 integers ranging from 01 to 80, separated by single spaces. My problem is that R will only read the column as character data. When it is read as characters, all 20 numbers show up.
library(readr)
numbers_picked <- read_delim("C:/Users/HP/Desktop/csv/numbers_picked.csv",
" ", escape_double = FALSE, col_types = cols(numbers_picked =
col_character()))
View(numbers_picked)
When I use the white space delimiter and set the column type to integer, the data preview shows that the column only takes one value.
library(readr)
numbers_picked <- read_delim("C:/Users/HP/Desktop/csv/numbers_picked.csv",
" ", escape_double = FALSE, col_types = cols(numbers_picked =
col_integer()))
View(numbers_picked)
Basically, I want to represent 20 integers in one column.
Here is a sample of the dataset:
numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76
Environment: Windows 10, RStudio (latest version)
File: .csv, large (600,000+ lines)
I created a .csv file with the numbers you provided using ' ' as a separator and this worked like a charm.
numbers_picked <- read.table("C:/Users/HP/Desktop/csv/numbers_picked.csv",
sep = " ")
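A minimal sketch of the same `read.table` idea on inline text rather than a file. It assumes the file starts with the `numbers_picked` header line shown in the question, which is skipped here with `skip = 1`, and it asks for integer columns via `colClasses`. The names `txt` and `nums` are made up for illustration.

```r
# Two sample rows from the question, preceded by the header line
txt <- "numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67"

# skip = 1 drops the single-field header; every remaining field is read as an integer
nums <- read.table(text = txt, sep = " ", skip = 1, colClasses = "integer")

dim(nums)         # one row per draw, one column (V1..V20) per number
unlist(nums[1, ]) # the first draw as an integer vector
```

This gives 20 integer columns rather than one column holding 20 values, which is often the more convenient shape for counting or filtering.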
If your intention was to get a data frame with a single list-column of integer vectors, then you can do the following.
Read in the column as a character vector, then str_split it into a list of character vectors. We can then map each of the character vectors to an integer vector.
library('tidyverse')
csv_text <- 'numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76'
read_csv(csv_text) %>%
  mutate(numbers_picked = stringr::str_split(numbers_picked, ' ') %>% map(as.integer))
# numbers_picked
# <list>
# 1 <int [20]>
# 2 <int [20]>
# 3 <int [20]>
# 4 <int [20]>
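For comparison, the same list-column idea can be sketched in base R without the tidyverse. This is only an illustration under the assumption that the rows are already available as a character vector; the names `lines_in` and `picked` are hypothetical.

```r
# Two sample rows from the question, as plain strings
lines_in <- c(
  "06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79",
  "03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67"
)

# Split each string on single spaces, then convert every piece to integer
picked <- data.frame(draw = seq_along(lines_in))
picked$numbers_picked <- lapply(strsplit(lines_in, " ", fixed = TRUE), as.integer)

picked$numbers_picked[[1]]    # integer vector of the first draw
lengths(picked$numbers_picked) # how many numbers each row holds
```

Each element of the list column is an ordinary integer vector, so functions such as `sum()` or `%in%` can be applied row by row with `sapply()`.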
I have an Excel file as shown below.
        A    B    C    D    E
2010   25   74   85   88   89
2011   27   86   97   99
2012   37  115  131
2013   47  146
2014   56
But when I load it into R, I get the following output:
R Output:
  X__1  A   B   C  D  E
1 2010 25  74  85 88 89
2 2011 27  86  97 99
3 2012 37 115 131
4 2013 47 146
5 2014 56
For my calculations, the output in R needs to be in the format shown below:
        A    B    C    D    E
2010   25   74   85   88   89
2011   27   86   97   99
2012   37  115  131
2013   47  146
2014   56
Can anyone please help me solve this issue?
We can use the following code to set the row name
# Set the row name using X__1
rownames(dt) <- dt$X__1
# Remove X__1
dt$X__1 <- NULL
Or we can use the tidyverse to do this (both helper functions come from the tibble package).
library(tidyverse)
dt <- dt %>%
  # Remove row name
  remove_rownames() %>%
  # Set column as row name
  column_to_rownames("X__1")
Data
dt <- read.table(text = " X__1 A B C D E
1 2010 25 74 85 88 89
2 2011 27 86 97 99
3 2012 37 115 131
4 2013 47 146
5 2014 56 ",
header = TRUE, fill = TRUE)
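As a quick check of the base R version, here is a minimal sketch on a cut-down copy of the data (three years, two columns; the object name `dt_small` is made up for illustration). Once the years are row names, a cell can be looked up by its year label, and `as.matrix()` yields a purely numeric matrix for calculations.

```r
# Cut-down stand-in for the imported sheet
dt_small <- data.frame(X__1 = 2010:2012, A = c(25, 27, 37), B = c(74, 86, 115))

# Move the year column into the row names, then drop it
rownames(dt_small) <- dt_small$X__1
dt_small$X__1 <- NULL

dt_small["2011", "B"]   # look up a value by year label and column name

# With the years out of the way, every remaining column is numeric
m <- as.matrix(dt_small)
colSums(m)
```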
I have a large data frame which contains 12 monthly columns for each of two types of values, Rested and Active. I want to convert the columns of each month into rows, bringing all the month columns (Jan, Feb, Mar, ...) under a single 'Month' column.
My data is as follows:
ID L1 L2 Year JR FR MR AR MYR JR JLR AGR SR OR NR DR JA FA MA AA MYA JA JLA AGA SA OA NA DA
1234 89 65 2003 11 34 6 7 8 90 65 54 3 22 55 66 76 86 30 76 43 67 13 98 67 0 127 74
1234 45 76 2004 67 87 98 5 4 3 77 8 99 76 56 4 3 2 65 78 44 53 67 98 79 53 23 65
I'm trying to make it look like the table below. Column R holds the Rested values and column A the Active values; JR, FR, MR, ... mean Jan Rested, Feb Rested, Mar Rested, and so on, while JA, FA, MA, ... mean Jan Active, Feb Active, Mar Active, and so on.
In other words, I want to turn each monthly column into rows under a new Month column, keeping the R and A values side by side.
ID L1 L2 Year Month R A
1234 89 65 2003 Jan 11 76
1234 89 65 2003 Feb 34 86
1234 89 65 2003 Mar 6 30
1234 89 65 2003 Apr 7 76
1234 89 65 2003 May 8 43
1234 89 65 2003 Jun 90 67
1234 89 65 2003 Jul 65 13
1234 89 65 2003 Aug 54 98
1234 89 65 2003 Sep 3 67
1234 89 65 2003 Oct 22 0
1234 89 65 2003 Nov 55 127
1234 89 65 2003 Dec 66 74
1234 45 76 2004 Jan 67 3
1234 45 76 2004 Feb 87 2
1234 45 76 2004 Mar 98 65
1234 45 76 2004 Apr 5 78
1234 45 76 2004 May 4 44
1234 45 76 2004 Jun 3 53
1234 45 76 2004 Jul 77 67
1234 45 76 2004 Aug 8 98
1234 45 76 2004 Sep 99 79
1234 45 76 2004 Oct 76 53
1234 45 76 2004 Nov 56 23
1234 45 76 2004 Dec 4 65
I've tried various things, such as stack, melt, and unlist:
data_reshape <- reshape(df,direction="long", varying=list(c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA")), v.names="Precipitation", timevar="Month")
data_stacked <- stack(data, select = c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA"))
but the result is not what I expected: I get the Jan values of all years, then the Feb values of all years, then the Mar values of all years, and so on. I want the months arranged in calendar order within each Year for each ID in the entire dataset.
How can I achieve this in R?
Here's a base reshape approach:
res <- reshape(mydf, direction="long", varying=list(5:16, 17:28), v.names=c("R", "A"), times = month.name, timevar = "Month")
res[with(res, order(ID, -L1, L2, Year)), -8]
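To see what the `varying` list does, here is a minimal, self-contained sketch on a cut-down table with only two months per group. The data values are taken from the question, but the reduced column set and the final column selection by name are choices made for this illustration. Each element of `varying` lists the wide columns that collapse into one long column, and `v.names` names those long columns in the same order.

```r
# Cut-down version of the question's data: Jan and Feb only
mydf <- data.frame(ID = c(1234, 1234), L1 = c(89, 45), L2 = c(65, 76),
                   Year = c(2003, 2004),
                   JR = c(11, 67), FR = c(34, 87),   # Rested: Jan, Feb
                   JA = c(76, 3),  FA = c(86, 2))    # Active: Jan, Feb

res <- reshape(mydf, direction = "long",
               varying = list(5:6, 7:8),        # columns 5:6 -> R, columns 7:8 -> A
               v.names = c("R", "A"),
               times = month.abb[1:2],          # labels for the new Month column
               timevar = "Month")

# reshape() stacks month by month, so reorder by ID and Year;
# ties keep their existing (calendar) order. Also drop the helper "id" column.
res <- res[order(res$ID, res$Year), c("ID", "L1", "L2", "Year", "Month", "R", "A")]
rownames(res) <- NULL
res
```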
Here's a possible solution using the devel version of data.table
library(data.table) ## v >= 1.9.5
res <- melt(setDT(df),
            id.vars = 1:4,                          ## id variables
            measure.vars = list(5:16, 17:ncol(df)), ## a list of two groups of measure variables
            variable.name = "Month",                ## the name of the additional variable
            value.name = c("R", "A"))               ## the names of the grouped value columns
setorder(res, ID, -L1, L2, Year) ## Reordering the data to match the desired output
res[, Month := month.abb[Month]] ## You don't really need this part as you already have the months numbers
# ID L1 L2 Year Month R A
# 1: 1234 89 65 2003 Jan 11 76
# 2: 1234 89 65 2003 Feb 34 86
# 3: 1234 89 65 2003 Mar 6 30
# 4: 1234 89 65 2003 Apr 7 76
# 5: 1234 89 65 2003 May 8 43
# 6: 1234 89 65 2003 Jun 90 67
# 7: 1234 89 65 2003 Jul 65 13
# 8: 1234 89 65 2003 Aug 54 98
# 9: 1234 89 65 2003 Sep 3 67
# 10: 1234 89 65 2003 Oct 22 0
# 11: 1234 89 65 2003 Nov 55 127
# 12: 1234 89 65 2003 Dec 66 74
# 13: 1234 45 76 2004 Jan 67 3
# 14: 1234 45 76 2004 Feb 87 2
# 15: 1234 45 76 2004 Mar 98 65
# 16: 1234 45 76 2004 Apr 5 78
# 17: 1234 45 76 2004 May 4 44
# 18: 1234 45 76 2004 Jun 3 53
# 19: 1234 45 76 2004 Jul 77 67
# 20: 1234 45 76 2004 Aug 8 98
# 21: 1234 45 76 2004 Sep 99 79
# 22: 1234 45 76 2004 Oct 76 53
# 23: 1234 45 76 2004 Nov 56 23
# 24: 1234 45 76 2004 Dec 4 65
Installation instructions:
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
This is an inelegant solution, but I'm going to post it just to show how problems can be solved with basic tools, without relying on high-level functions when the task doesn't necessarily require them. I think that the more tools you have, the better you can approach problems. Here we are:
# extract the data part (months become rows, one column per original row)
data <- t(as.matrix(df[, 5:28]))
# build the data.frame by cbinding the needed columns
res <- cbind(df[rep(1:nrow(df), each = 12), 1:4], # each row of the first 4 columns, repeated 12 times
             Month = month.abb,                   # the month column (recycled for every original row)
             R = as.vector(data[1:12, ]),         # the R column, from the first 12 rows of data
             A = as.vector(data[13:24, ]))        # the A column, from the last 12 rows of data
rownames(res) <- NULL                             # just to remove the row names
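The transpose matters because R flattens matrices column by column. A tiny sketch with made-up object names (`m`, `by_month`, `by_year`) and two values per year from the question shows the difference: without `t()` you get all January values first, which is the ordering the question complains about.

```r
# Rows are years, columns are months (Rested values for Jan and Feb)
m <- matrix(c(11, 67, 34, 87), nrow = 2,
            dimnames = list(c("2003", "2004"), c("Jan", "Feb")))

by_month <- as.vector(m)     # column-major: Jan of every year, then Feb of every year
by_year  <- as.vector(t(m))  # after transposing: all months of 2003, then all of 2004

by_month
by_year
```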
I have some monthly data like this:
1/1/90,620
2/1/90,591
3/1/90,574
4/1/90,542
5/1/90,534
6/1/90,545
#...etc
If I use the ts() function, it's easy to turn the data into a time series structure like:
Jan Feb Mar ... Nov Dec
1990 620 591 574 ... 493 464
1991 100 200 300 ...........
Is there any way to change it into a quarterly layout like this:
1st 2nd 3rd 4th
1990-Q1 620 591 574 464
1990-Q2 100 200 300 400
1990-Q3 ...
1990-Q4 ...
1991-Q1 ...
I tried to change
ts(mydata,start=c(1990,1),frequency=12)
to
ts(mydata,start=c(as.yearqrt("1990-1",1)),frequency=4)
but it doesn't seem to work.
Could anyone help me? Thank you very much.
monthly <- ts(mydata, start = c(1990, 1), frequency = 12)
quarterly <- aggregate(monthly, nfrequency = 4)
I don't agree with Hyndman on this one, which is rare, as Hyndman can usually do no wrong. However, I can show that his solution doesn't give the OP what he wants.
test<-c(1:100)
test_ts <- ts(test, start=c(2000,1), frequency=12)
test_ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1 2 3 4 5 6 7 8 9 10 11 12
2001 13 14 15 16 17 18 19 20 21 22 23 24
2002 25 26 27 28 29 30 31 32 33 34 35 36
2003 37 38 39 40 41 42 43 44 45 46 47 48
2004 49 50 51 52 53 54 55 56 57 58 59 60
2005 61 62 63 64 65 66 67 68 69 70 71 72
2006 73 74 75 76 77 78 79 80 81 82 83 84
2007 85 86 87 88 89 90 91 92 93 94 95 96
2008 97 98 99 100
test_agg <- aggregate(test_ts, nfrequency=4)
test_agg
Qtr1 Qtr2 Qtr3 Qtr4
2000 6 15 24 33
2001 42 51 60 69
2002 78 87 96 105
2003 114 123 132 141
2004 150 159 168 177
2005 186 195 204 213
2006 222 231 240 249
2007 258 267 276 285
2008 294
Well, wait: that first quarter isn't the average of the 3 months, it's the sum (1+2+3 = 6, but you want it to show the mean, 2). So you will need to modify that a tad.
test_agg <- aggregate(test_ts, nfrequency=4)/3
# divisor is (old freq)/(new freq) = 12/4 = 3
Qtr1 Qtr2 Qtr3 Qtr4
2000 2 5 8 11
2001 14 17 20 23
2002 26 29 32 35
2003 38 41 44 47
2004 50 53 56 59
2005 62 65 68 71
2006 74 77 80 83
2007 86 89 92 95
2008 98
This now shows the mean of the monthly data at quarterly frequency.
The divisor is the trick here. If you had weekly (freq=52) and wanted quarterly (freq=4) you'd divide by 52/4=13.
If you want the mean instead of the sum, just add "mean":
quarterly <- aggregate(monthly, nfrequency = 4, FUN = mean)
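A short sketch to confirm that the two routes to a quarterly mean agree, using a made-up two-year monthly series (the names `test_ts`, `q_sum` and `q_mean` are for illustration only). The default `FUN` of `aggregate()` on a `ts` object is `sum`, so dividing by 12/4 = 3 should match passing `FUN = mean` directly.

```r
# Two years of monthly data: 1, 2, ..., 24
test_ts <- ts(1:24, start = c(2000, 1), frequency = 12)

q_sum  <- aggregate(test_ts, nfrequency = 4)              # quarterly sums (default FUN = sum)
q_mean <- aggregate(test_ts, nfrequency = 4, FUN = mean)  # quarterly means

all.equal(q_sum / 3, q_mean)  # sums divided by 3 months per quarter vs. means
frequency(q_mean)             # the result is a quarterly series
q_mean
```

Passing `FUN = mean` avoids having to work out the divisor by hand when the frequencies change.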