In R how to select rows from a dataframe? - r

I have a column "Year" in my dataframe ("import") and I need to select only 2015 out of some 30 years. However, none of the steps I tried worked. Things I tried include:
iy2015<-subset(import, import$year==2015)
iy2015<-import[which(import$year==2015),]
iy2015<-import[import$year==2015,]
All of them gave me an empty dataframe.

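For me your last option works. Check that 2015 actually occurs in the column, and check that year is really the column's name. I used: iy2015 = import[import$Year==2015,]
EDIT:
You need to use Year instead of year. R is case-sensitive, so import$year is NULL, the comparison has length zero, and no rows are selected.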
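A minimal sketch of why the lowercase name silently returns nothing. The data frame contents here are made up for illustration; only the names import and Year come from the question.

```r
# Hypothetical stand-in for the asker's data frame
import <- data.frame(Year = c(2014, 2015, 2015, 2016),
                     Value = c(10, 20, 30, 40))

# Column names are case-sensitive: "year" is not a column
"year" %in% names(import)
is.null(import$year)

# NULL == 2015 is a zero-length logical, so no rows are selected
wrong <- import[import$year == 2015, ]
right <- import[import$Year == 2015, ]

nrow(wrong)
nrow(right)
```

Running names(import) first is a quick way to confirm the exact spelling of a column before subsetting.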

Related

How to exclude specific row from filtering in grepl in R?

I have the following dataset:
I want to filter out the rows that contain the year 2019, except for the row "30.12.2019 bis 05.01.2020".
I tried to do it with the grepl function, but I don't understand how the exclamation mark (negation) works there.
Thank you for helping.
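The dataset is not shown, so the following is only a sketch under assumptions: the data frame dat, its column week, and the example values are hypothetical. It illustrates how ! negates the logical vector returned by grepl, and how an exception can be combined with | (or).

```r
# Hypothetical data: the name `week` and these values are assumptions
dat <- data.frame(
  week = c("23.12.2019 bis 29.12.2019",
           "30.12.2019 bis 05.01.2020",
           "06.01.2020 bis 12.01.2020"),
  n = c(5, 7, 9)
)

keep_row <- "30.12.2019 bis 05.01.2020"

# TRUE for every row whose label mentions 2019
has_2019 <- grepl("2019", dat$week, fixed = TRUE)

# Keep rows that do NOT mention 2019, plus the one explicit exception
result <- dat[!has_2019 | dat$week == keep_row, ]
```

Here !has_2019 flips TRUE and FALSE, so the 2019 rows are dropped, and the second condition lets the one named row back in.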

R coding, I'm trying to correctly order the variables in my dataframe from 1 to 13 but it goes like 201501, 2015010, 011,012,013, 02...09

I have a large dataframe sorted by fiscal year and fiscal period. I am trying to create a time plot starting at fiscal period 1 of 2015, ending at fiscal period 13 of 2019. I have two columns, one for FY, one for FP. They look like this.
I merged the two columns together separated by a 0 in a new column (C) using the code:
MarkP$C = paste(MarkP$FY, MarkP$FP, sep="0")
I expected this to give me a numeric column, although paste() actually returns a character vector.
It looks like this (check column C)
Then since I want to plot a time plot of total sales per period, I aggregated all sales to the level of C, so all rows ending with the same C aggregate together. I used this code for the aggregation.
library(dplyr)
MarkP11 <- MarkP %>%
  group_by(C) %>%
  summarise(Sales = sum(Sales))
This is what MarkP11 looks like.
The problem I'm having is that the rows are out of order, so when I plot them I get an incorrect plot. Period 10 comes right after period 1.
I've done some research and discovered that the sprintf function may work, but I'm not sure how to incorporate it into the code for my data frame.
The code below is how my C column is created by merging two columns. I believe I need to edit this line to use sprintf, but I'm not sure how to get that to work.
MarkP$C = paste(MarkP$FY, MarkP$FP, sep="0")
I expect the ordering of the MarkP dataframe to look something like this:
sprintf is indeed what you want:
sprintf("%0.0f%02.0f", 2019, c(1,10))
# [1] "201901" "201910"
This assumes that FP's range is 0-99. It would not be incorrect to use sprintf("%d%02d", 2019, c(1,10)) since you're intending to use integers, but sometimes I find that seemingly-integer values can trigger Error ... invalid format '%02d', so I just strong-arm it. You could also use as.integer on each set of values ... another workaround.
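As a sketch of how this could be applied to the whole column: the small MarkP data frame below is invented for illustration, and as.integer is used as the workaround mentioned above so that %d is safe.

```r
# Hypothetical stand-in for the MarkP data frame from the question
MarkP <- data.frame(
  FY    = c(2015, 2015, 2015, 2016),
  FP    = c(1, 10, 2, 1),
  Sales = c(100, 250, 175, 90)
)

# Zero-pad the period to two digits so that text order matches time order
MarkP$C <- sprintf("%d%02d", as.integer(MarkP$FY), as.integer(MarkP$FP))

# Sorting on the padded key now puts period 2 before period 10
MarkP <- MarkP[order(MarkP$C), ]
print(MarkP$C)
```

Because every key has the same width, sorting the strings gives the same order as sorting by year and then by period.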
I was speaking with a colleague of mine and he helped me figure out the solution. As r2evans commented, sprintf is the correct function. The syntax that worked for me was:
MarkP$C = paste(MarkP$FY, sprintf("%02d", MarkP$FP), sep="")
This concatenates FY and FP into a new column named "C":
-FY comes first.
-Since sep="" there is no separator character, so FY and FP are simply joined together.
-sprintf("%02d", ...) pads FP with a leading zero to two digits before it is appended to FY.

Subsetting dates from colnames

I have a dataframe as follows:
TAS1 2000 obs. of 9862 variables
Each of these variables (columns) represents daily temperatures for one date from 1979-01-01 to 2005-12-31. The colnames have been set to these dates. I now wish to split the dataframe into twelve separate monthly data frames, containing Jan, Feb, Mar etc.
I have tried:
TAS1.JAN = subset(TAS1, grepl("-01-"), colnames(TAS1))
But get the error:
Error in grepl("-01-") : argument "x" is missing, with no default
Is there a relatively quick solution for this? I feel there must be but haven't cracked it despite trying various solutions.
I would subset the January data like this:
Jan_df <- subset(MyDatSet, select = grepl("-01-", colnames(MyDatSet)))
I have assumed that your parent dataset is called MyDatSet and that the pattern "-01-" identifies January data.
You can repeat the process for the other 11 months or write a loop over the months.
Like Roland suggested in the comments, I would also prefer melting the data to long format. However, since I do not know your use case, this answer sticks to what you posted and asked for.
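One way the loop over months might look, as a sketch only. The tiny TAS1 below is a made-up stand-in (3 rows, 4 daily columns), and the list name monthly is an assumption; the real data would have 2000 rows and 9862 columns.

```r
# Hypothetical small stand-in for TAS1 with date strings as column names
dates <- seq(as.Date("1979-01-30"), as.Date("1979-02-02"), by = "day")
TAS1 <- as.data.frame(matrix(seq_len(3 * length(dates)), nrow = 3))
colnames(TAS1) <- format(dates)

# Two-digit month codes "01" .. "12"
months <- sprintf("%02d", 1:12)

# One data frame per month, selected by matching "-MM-" in the column names
monthly <- lapply(months, function(m) {
  is_month <- grepl(paste0("-", m, "-"), colnames(TAS1), fixed = TRUE)
  TAS1[, is_month, drop = FALSE]
})
names(monthly) <- month.abb

colnames(monthly$Jan)
```

Keeping the twelve results in one named list (monthly$Jan, monthly$Feb, ...) avoids creating twelve separate variables by hand; drop = FALSE keeps each result a data frame even if only one column matches.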
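As your error says, grepl is missing its x argument there. Assuming the dates are stored in a column tas1 of a data frame df (rather than in the column names), you can filter rows like this: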
tas1.jan <- subset(df, grepl("-01-", df$tas1))
Another way to do it with the help of stringr and dplyr would be:
library(stringr)
library(dplyr)
tas1.jan <- df %>% filter(str_detect(tas1, "-01-"))
The downside of this approach: you need to run a loop or do this 12 times to cover all months.

Create a stack of n subset data frames from a single data frame based on date column

I need to create a set of subset data frames out of a single big df, based on a date column (e.g. "Aug 2015", in month-year format). It should work like the subset function, except that the number of subset dfs should change dynamically with the values present in the date column.
All the subset data frames need to have the same structure, and within each subset df the date column should hold a single value.
Suppose my big df currently has the last 10 months of data: I need 10 subset data frames now, and 11 dfs if I run the same command next month (with 11 months of base data).
I have tried something like the code below, but on each iteration the subset subdf_i is overwritten. So I end up with only one subset df, which holds the last value of the month column.
I expected it to create 45 subset dfs, subdf_1, subdf_2, ... subdf_45, one for each of the 45 unique values of the month column.
uniqmnth <- unique(df$mnth)
for (i in seq_along(uniqmnth)) {
  # the same name subdf_i is reused, so each pass overwrites the previous subset
  subdf_i <- subset(df, mnth == uniqmnth[i])
}
I hope there is some option in the subset function, or some kind of loop, that can do this. I am a beginner in R and not sure how to get there.
I think the best solution here is to use assign(), so that the loop index i is appended to the name of each of the 45 subsets. Thanks to a friend for the hint. This keeps the subset data frame from being overwritten on each pass of the loop. The i==i+1 line is not needed, because the for loop advances i by itself.
uniqmnth <- unique(df$mnth)
for (i in seq_along(uniqmnth)) {
  # build the name subdf_<i> and bind the matching subset to it
  assign(paste("subdf_", i, sep = ""), subset(df, mnth == uniqmnth[i]))
}
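One possible alternative to assign(), not part of the original answer, is base R's split(), which returns a named list with one data frame per distinct value. The example df below is invented for illustration; only the column name mnth comes from the question.

```r
# Hypothetical example data with a month-year column named mnth
df <- data.frame(
  mnth  = c("Aug 2015", "Sep 2015", "Aug 2015", "Oct 2015"),
  value = c(1, 2, 3, 4)
)

# One list element per distinct month; the count adapts to the data
subdfs <- split(df, df$mnth)

length(subdfs)
subdfs[["Aug 2015"]]
```

A list grows and shrinks with the data automatically, and it can be processed with lapply() instead of looking up generated variable names.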

extracting corresponding values from lists in a data frame

I have a very simple data frame ("newDF") consisting of 2 lists "year" and "value". The list "year" is a simple list from 1850 to 2011. I wish to extract the "value" corresponding to the year 1990 for use in another package. I suspect it is a very simple question. Any help would be appreciated. Many thanks.
This should work:
newDF$value[newDF$year==1990]
The $ selects a column of the data frame; the brackets subset that column, and inside the brackets you put a logical condition that is TRUE for the row (or rows) you want. So you could get the values for all years from 1990 onward with a very simple modification:
newDF$value[newDF$year>=1990]
subset(newDF, year==1990, select="value")
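The two approaches above return different kinds of objects, which may matter when passing the result to another package. A small sketch, with made-up values in newDF:

```r
# Hypothetical data: years 1850..2011 with placeholder values 1..162
newDF <- data.frame(year = 1850:2011, value = seq_along(1850:2011))

# Bracket indexing on the column returns a plain vector
v <- newDF$value[newDF$year == 1990]

# subset() with select returns a one-column data frame
s <- subset(newDF, year == 1990, select = "value")

is.numeric(v)
is.data.frame(s)
```

If a function expects a bare number, the vector form (or s$value) is usually the one to pass.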
