R language: read the value in Excel and check a range

I am new to the R language.
I have a dataset where the first column is a name and the second is a value.
I want to read each value in the second column and check whether it falls into a certain range.
For example:
Name  value
AA    123
and the range is (100, 150).
Which function can be used?
Thank you in advance.

You can use this code:
for (i in 1:nrow(df)) {
  if (df[i, 2] > 100 && df[i, 2] < 150) {
    print(df[i, ])
  }
}
where df is your dataset.
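For reference, the same check can be done without a loop, since comparison operators in R are vectorized; a sketch using the same df and second column as above:
# Keep only the rows whose value lies strictly between 100 and 150
df[df[, 2] > 100 & df[, 2] < 150, ]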

It is not entirely clear what the desired result is.
Option 1: create a new column that records (e.g. TRUE/FALSE) whether an observation (row) is within the limits. For example, with tidyverse-style code, to create a column called "is_valid" for the (100, 150) range from the question:
df %>% mutate(is_valid = value >= 100 & value <= 150)
Option 2: filter the data to create a subset that has only the desired observations, again with tidyverse-style code:
df %>% filter(value >= 100, value <= 150)
In any case, I'd recommend spending some time learning how to do this the tidyverse way with dplyr (see the free online book R for Data Science).
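As a side note, dplyr also provides a between() helper for exactly this kind of range check; a minimal sketch, assuming columns named Name and value as in the question:
library(dplyr)

df <- data.frame(Name = c("AA", "BB"), value = c(123, 200))

# between(x, left, right) tests left <= x <= right
df %>% mutate(is_valid = between(value, 100, 150))
df %>% filter(between(value, 100, 150))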

Related

Create new dataframe column in R that conditions on row values without iterating?

So let's say I have the following dataframe "df":
names <- c("Bob","Mary","Ben","Lauren")
number <- c(1:4)
age <- c(20,33,34,45)
df <- data.frame(names,number,age)
Let's say I have another dataframe ("df2") with thousands of people. For each row "i" of "df", I want to create a fourth column "TotalIncome" that is the sum of the income of all the people in "df2" who have that row's name, number and age. In other words, for each row "i":
df$TotalIncome[i] <- sum(
  df2$Income[df2$Name    == df$names[i] &
             df2$Numbers == df$number[i] &
             df2$Age     == df$age[i]], na.rm = TRUE)
Is there a way to do this without iterating over each row "i" in a for loop? Is there a way to use apply() to calculate this for the entire vector rather than row by row? The actual dataset I am working with is huge, iterating takes quite a while, and I am hoping there is a more efficient way to do this in R.
Thanks!
Have you considered using the dplyr package? Its SQL-style grammar makes this job quick and easy.
The code will be something like:
library(dplyr)

df %>%
  left_join(df2, by = c("names" = "Name", "number" = "Numbers", "age" = "Age")) %>%
  group_by(names, number, age) %>%
  summarize(TotalIncome = sum(Income, na.rm = TRUE))
I suggest the cheat sheets available on the dplyr site, or the Wickham and Grolemund book (R for Data Science).
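To make that concrete, here is a self-contained sketch; df2 and its Income values are made up for illustration:
library(dplyr)

names  <- c("Bob", "Mary", "Ben", "Lauren")
number <- c(1:4)
age    <- c(20, 33, 34, 45)
df <- data.frame(names, number, age)

# Hypothetical df2 with several income records per person
df2 <- data.frame(Name    = c("Bob", "Bob", "Mary", "Ben"),
                  Numbers = c(1, 1, 2, 3),
                  Age     = c(20, 20, 33, 34),
                  Income  = c(100, 250, 300, 80))

df %>%
  left_join(df2, by = c("names" = "Name", "number" = "Numbers", "age" = "Age")) %>%
  group_by(names, number, age) %>%
  summarize(TotalIncome = sum(Income, na.rm = TRUE), .groups = "drop")
# Lauren has no match in df2, so her TotalIncome is 0 (the sum of an empty set)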

Remove rows based on data in another dataframe?

I currently have a dataset with more than 186k observations (rows), shown in figure 1. These are all companies in the BVDID column, and each should contain data for all years 2013 to 2017.
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)
However, using the code above I found 87k rows containing only zero values, stored in the missingdata object.
How do I delete the rows of the series object whose BVDID (company code) appears in the missingdata dataframe? Also, there should be a way to make those years look better under str(series) and sort them ascending within each company code.
Best regards
There are many ways; here is one.
Use the tidyverse anti_join function, which behaves like the set operation A - B: it returns the rows of the first data frame that have no match in the second, and therefore removes all matching rows.
series %>% anti_join(missingdata, by = "BVDID")
Or directly: LIABILITIES == 0 returns logical values, and sum() counts how many rows per company are zero. Keeping only the companies where that count is zero removes every company that has any zero-liability row:
series %>% group_by(BVDID) %>% filter(sum(LIABILITIES == 0) == 0) %>% ungroup()
Another option is to filter out those IDs directly and then order the data:
series %>%
  # filter out the BVDIDs from missingdata
  filter(!BVDID %in% pull(missingdata)) %>%
  # order the df by company code and year
  arrange(BVDID, year)
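A quick toy check of the anti_join() behaviour (made-up data, not the asker's):
library(dplyr)

series <- data.frame(BVDID       = c("A", "A", "B", "C"),
                     LIABILITIES = c(10, 0, 5, 7))
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

# Company "A" appears in missingdata, so both of its rows are dropped
series %>% anti_join(missingdata, by = "BVDID")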

How can I create subsets from these data frame?

I want to aggregate my data. The goal is to have one point in a diagram for each time interval. I have a data frame with 2 columns: the first column is a timestamp, the second is a value. I want to evaluate each time period; that is, all values within a time period (for example 1 second) should be summed up.
I don't know how to work with the aggregate function, because that function does not support time.
0.000180 8
0.000185 8
0.000474 32
It is not easy to tell from your question what you're specifically trying to do. Your data has no column headings, we do not know the data types, you did not include the error message, and you contradicted yourself between your original question and your comment (is the first column the time stamp, or the second?).
I'm trying to understand. Are you trying to:
Split your original data.frame into multiple data.frames?
View a specific subset of your data? Effectively, you want to filter your data?
Group your data.frame into specific increments of a set time interval and then aggregate the results?
Assuming the variables in your dataframe are named time and value, I've addressed these three cases below.
#Set Data
num <- 100
set.seed(4444)
tempdf <- data.frame(time  = sample(seq(0.000180, 0.000500, 0.000005), num, TRUE),
                     value = sample(1:100, num, TRUE))

#Example 1: Split your data into multiple dataframes (using base functions)
temp1 <- tempdf[tempdf$time > 0.0003, ]
temp2 <- tempdf[tempdf$time > 0.0003 & tempdf$time < 0.0004, ]

#Example 2: Filter your data (using the dplyr::filter() function)
dplyr::filter(tempdf, time > 0.0003 & time < 0.0004)

#Example 3: Chain the functions together using dplyr to group and summarise your data
library(dplyr)
tempdf %>%
  mutate(group = floor(time * 10000) / 10000) %>%  # bin times into 0.0001-wide intervals
  group_by(group) %>%
  summarise(avg = mean(value),
            num = n())
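If you want the interval width to be explicit rather than implied by the rounding, cut() with a breaks sequence is an alternative; a sketch on the same tempdf, where the 0.00005 width is just an illustration:
breaks <- seq(0.000175, 0.000525, by = 0.00005)
tempdf %>%
  mutate(group = cut(time, breaks)) %>%
  group_by(group) %>%
  summarise(total = sum(value))  # the question asked for the sum per interval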
I hope that helps!

Finding counts of each occurrence in R

I am trying to find the number of occurrences of each string within a certain row of a data frame in R. I assume I would use the unique() function.
For example, if I wanted a count of how many times each type of dog showed up within a data frame, how would I go about this?
Thanks!
It would be best if you gave a reproducible example, but...
sum(df[row_num, ] %in% "Golden Retriever")
would give the number of occurrences of "Golden Retriever" in row row_num. Iterating with a for loop would work for the whole data frame.
Using the dplyr package you can do a rowwise operation to populate a new column with the count, e.g.:
df %>% rowwise() %>% mutate(gold_count = sum(c(col_name1, col_name2, ...) %in% "Golden Retriever"))
You can do the same for each of the other breeds as well.
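For reference, if the goal is a count of every distinct value at once rather than one string at a time, base R's table() does this directly; a sketch with made-up data:
dogs <- data.frame(kennel1 = c("Golden Retriever", "Poodle"),
                   kennel2 = c("Golden Retriever", "Beagle"))

# counts of each string across the whole data frame
table(unlist(dogs))

# counts within a single row
table(unlist(dogs[1, ]))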

Percentile Ranking by Grouping in R

The column ID is a sequence of 1-63 in repetition. I wish to add two new columns, Closepctl and Quantitypctl, in which I can rank each entry from 1-63, i.e. on the basis of the Close and Quantity columns, but grouped with respect to ID. Is there any way to do this in R?
I tried it in Excel but failed to find any grouping option there. An Excel approach is also appreciated.
See if the following is helpful...
library(dplyr)
data(iris)
df <- iris %>%
  group_by(Species) %>%
  mutate(RankSepal = percent_rank(Sepal.Length),
         RankPetal = percent_rank(Petal.Length))
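Mapped onto the question's columns (assuming the data frame is called df and has the columns ID, Close and Quantity as described), the same pattern would be:
library(dplyr)

df <- df %>%
  group_by(ID) %>%
  mutate(Closepctl    = percent_rank(Close),
         Quantitypctl = percent_rank(Quantity)) %>%
  ungroup()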
