How to sort a data frame by order of values in a Column? - r

I am trying to sort the following data frame by the values in the Period column.
The code that I am using is as follows:
data = read.csv("inputSample.csv")
datasub = subset(data,data$Period<41 & data$Period>0)
write.csv(datasub,"period+.csv")
new = read.csv("period+.csv")
sub = subset(new,new$NumberOfClaims>0)
sub1 = subset(new,new$NumberOfClaims==0)
opr <- function(set)
{
  return((set$LossAmt * set$SimulationCount)/set$NumberOfClaims)
}
operated = data.frame( sub$LoanID,opr(sub), sub$EndingBalance, sub$BalanceInClaims, sub$Period)
operated = operated[order("sub.Period")]
print(operated)
However, the code above simply returns the values of the first column of the data frame, and in unsorted order at that. I have tried using with() and other approaches, but none of them seem to work. Please help me out. Thanks.
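One likely cause, offered as a sketch rather than a definitive answer: order("sub.Period") orders the one-element character vector "sub.Period" and so returns 1, and indexing a data frame with a single index and no comma selects columns, which would explain why only the first column comes back. Passing the column itself to order() and putting the result in the row position sorts the rows. The data below are made up for illustration; the column names mimic the ones data.frame() would generate from sub$LoanID and sub$Period.

```r
# Made-up sample standing in for the 'operated' data frame from the question.
operated <- data.frame(
  sub.LoanID = c("A1", "B2", "C3", "D4"),
  sub.Period = c(12, 3, 40, 7)
)

# order() receives the column values; the trailing comma means "index rows".
sorted <- operated[order(operated$sub.Period), ]
print(sorted)
```

For descending order, order(operated$sub.Period, decreasing = TRUE) can be used in the same position.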

Related

Creating a function to save different columns from a list into a dataframe

I am trying to look into a dataset using a for loop. The data contains lists, and I'm trying to create a function that saves the output of the loop into a data frame. It is something similar to this example:
test = function(inputname) {
  if(is_empty(obs$company$inputname[[1]])){
    data[[i]] = obs
  } else {
    data[[i]] = obs$company %>%
      unnest(inputname)
  }
  df_inputname = data %>%
    rbindlist(fill = TRUE) %>%
    select(company:date)
}
test(address)
I'm trying to get this to return a data frame called 'df_address' containing the information for inputname = address from the list in the data, but I get an error message saying that the column doesn't exist. The idea is that I can look up different variables within the list and save the results as a data frame. Ideally I would like to be able to search for multiple variables and add them to a data frame, but on this first attempt I'm simply annoyed about why I get the error. Any ideas?
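A probable contributor to the error, sketched here in base R under assumptions: the `$` operator does not evaluate its argument, so obs$company$inputname looks for a column literally named "inputname" instead of the column whose name was passed in. Passing the name as a character string and using [[ ]] avoids that. The structure of obs below is made up (the real one is not shown), and extract_column is a hypothetical helper, not part of any package.

```r
# Made-up data standing in for 'obs'; the real structure is not shown.
obs <- list(company = data.frame(
  company = c("Acme", "Globex"),
  address = c("1 Main St", "2 High St"),
  date    = c("2020-01-01", "2020-02-01"),
  stringsAsFactors = FALSE
))

# Hypothetical helper: 'inputname' is a character string, looked up with [[ ]].
extract_column <- function(data, inputname) {
  if (!inputname %in% names(data)) {
    stop("column not found: ", inputname)
  }
  data.frame(company = data$company, value = data[[inputname]],
             stringsAsFactors = FALSE)
}

df_address <- extract_column(obs$company, "address")
print(df_address)

# `$` ignores the variable and searches for a column called "inputname".
inputname <- "address"
print(is.null(obs$company$inputname))
```

Returning the data frame and assigning it at the call site (df_address <- ...) is a deliberate choice here: a function cannot easily create a variable whose name depends on its argument, and returning the value keeps the helper reusable for other column names.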

Reading row and column index values from GRIB file via netcdf

Recently I have been working on netcdf files and I am using this library. I am able to open and read the data like this:
NetcdfFile ncFile = NetcdfFile.open(inputPath);
I am able to list the variables and get the desired variable from the data:
List<Variable> variables = ncFile.getVariables();
Variable tcc = ncFile.findVariable("tcc_0");
I am able to get rank and shape of variable too, and I can get data of the table by this:
int[] readOrigin = new int[2];
int[] readShape = new int[2];
readOrigin[0] = desiredRow;
readOrigin[1] = 0;
readShape[0] = 1;
readShape[1] = numberOfColumns;
Array arr = tcc.read(readOrigin, readShape);
This code reads all the values in the row at index desiredRow, and I can iterate over arr to find the value for each column.
However, I want to get the index values themselves for all rows and columns. I can retrieve the table's [0][0] value, but I am not able to retrieve the row and column index values. I need to get 32.035, 32.08, ... for the row index values, and the same for the columns.
Any help is appreciated.
After some research I found a way to get them. They are stored in the root group as one-dimensional variables, which is exactly what I needed.
Variable rootGroupLatitudeVar = ncFile.getRootGroup().findVariableLocal("latitude");
Array latitudeArr = rootGroupLatitudeVar.read();
int latitudeArrSize = rootGroupLatitudeVar.getShape(0);
..iteration

Integers change their values when generating a time series from a data frame in R

I have a list with dataframes inside it like this:
x = data.frame("city" = c("Madrid","Madrid","Madrid","Madrid"),
"date" = c('2018-11-01','2018-11-02','2018-11-03','2018-11-04'),
"visits" = c(100,200,80,38), "temp"=c(20,10,17,16))
list_of_cities= split(x, x$city) #In my original df there are a lot of cities
Then, to create a time series object (ts), I follow this process:
madrid_data = select(list_of_cities[['Madrid']],date,visits,temp)
madrid = ts(madrid_data[,2:3], start = c(2018,305), frequency = 365)
In this example, the problem I have does not arise. However, with my original dataframe I get this:
How could I solve it? Thank you very much in advance
The problem comes from the column type "integer64". Converting the integer64 column to numeric solves it:
x$visits = as.numeric(x$visits)

Using R, group data together based on two values in a table

I need to build a "profile" of a data set, showing the number of data entries that lie between two values. I have been able to achieve the result using the "group_by" function, however the resultant output is not in a format that I can use further down my workflow. Here is that output:
What I need is something that looks like this:
I have not been able to populate the "Data Count" column; it is there for illustration.
The code I am using is as follows:
library(formattable)
PML_Start = 0
PML_Max = 100000000
PML_Interval = 5000000
Lower_Band <- currency(seq(PML_Start, PML_Max-PML_Interval, PML_Interval),digits=0)
Upper_Band <- currency(seq(PML_Start+PML_Interval,PML_Max,PML_Interval),digits = 0)
PML_Profile <- data.frame("Lower Band"=Lower_Band,"Upper Band"=Upper_Band,"Data Count")
I now cannot figure out how to populate this table further. I gave this a go, but didn't really believe it would work.
PML_Profile <- Profiles_on_Data_Provided_26_9_17 %>%
group_by (Lower_Band) %>%
summarise("Premium" = sum(Profiles_on_Data_Provided_26_9_17$`Written Premium - Total`))
Any thoughts?
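One possible approach, sketched with base R only: assign each value to a band with cut() and count the entries per band with tabulate(), which also keeps empty bands. The values in pml are made up, and treating each band as [lower, upper) via right = FALSE is an assumption about how the boundaries should behave. The currency() formatting from the question is left out to keep the sketch self-contained.

```r
PML_Start    <- 0
PML_Max      <- 100000000
PML_Interval <- 5000000

# Band edges: 0, 5e6, ..., 1e8 (21 edges, hence 20 bands).
breaks <- seq(PML_Start, PML_Max, by = PML_Interval)

# Made-up values standing in for the real data column.
pml <- c(1200000, 4999999, 5000000, 7500000, 99000000)

# labels = FALSE returns the band number for each value;
# right = FALSE makes each band [lower, upper), and include.lowest = TRUE
# closes the last band so that PML_Max itself is counted.
band_idx <- cut(pml, breaks = breaks, right = FALSE,
                include.lowest = TRUE, labels = FALSE)

PML_Profile <- data.frame(
  Lower.Band = head(breaks, -1),
  Upper.Band = tail(breaks, -1),
  Data.Count = tabulate(band_idx, nbins = length(breaks) - 1)
)
print(head(PML_Profile, 3))
```

A per-band total such as the premium sum could be built the same way, for example by summing a second column grouped by band_idx.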

Creating a data frame using a for loop for string data in R

I have a CSV file with the following data.
I want to put this data in a data frame called "dfSubClass".
Then I want to extract the unique subjects as "uniquesubject" and the unique classes as "uniqueclass" from "dfSubClass".
Using "uniquesubject", "uniqueclass" and a for loop, I want to create all subject and class combinations, as in the expected data.
I tried the following, but it is not working.
dfSubClass <- read.csv("SubjectClass.csv",header = TRUE)
uniquesubject = unique(planningItems["Subject"])
uniqueclass = unique(planningItems["Class"])
newDF <- data.frame()
for(Subject in 1:nrow(uniquesubject)){
  for(Class in 1:nrow(uniqueclass)){
    newDF = rbind(newDF,c(uniquesubject[Subject,],uniqueclass[Class,]))
  }
}
This is not giving me the desired output. Please help.
I would suggest using the function expand.grid which will automatically generate all the combinations.
Also, unique(planningItems["Subject"]) in your code returns a data frame, which is not a good idea in this case. A vector would be better.
Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF=expand.grid(uniquesubject,uniqueclass)
If you want to use for loops, the main issue in your code is the rbind() call. Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF = data.frame()
for (Subject in 1:length(uniquesubject)){
  for (Class in 1:length(uniqueclass)){
    newDF=rbind(newDF,data.frame("Subject"=uniquesubject[Subject],"Class"=uniqueclass[Class]))
  }
}
I think the main difference from your code is that I create a data frame inside rbind() instead of a vector with c(). This makes sure the result has a data frame structure instead of a matrix.
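The two approaches give the same combinations but not necessarily in the same row order: expand.grid() varies its first argument fastest, whereas the nested loop varies the inner variable (Class) fastest. The sketch below, with made-up subject and class values, shows one way to make expand.grid() reproduce the loop's order by listing Class first and then reordering the columns.

```r
# Made-up values standing in for the columns of SubjectClass.csv.
uniquesubject <- c("Math", "Science")
uniqueclass   <- c("Class 1", "Class 2", "Class 3")

# Class is listed first so that it varies fastest, matching the inner loop;
# the column selection then puts Subject back in front.
grid <- expand.grid(Class = uniqueclass, Subject = uniquesubject,
                    stringsAsFactors = FALSE)[, c("Subject", "Class")]

# The loop version, for comparison.
looped <- data.frame()
for (s in uniquesubject) {
  for (k in uniqueclass) {
    looped <- rbind(looped, data.frame(Subject = s, Class = k,
                                       stringsAsFactors = FALSE))
  }
}

# Check that both versions list the combinations in the same order.
same_order <- identical(grid$Subject, looped$Subject) &&
  identical(grid$Class, looped$Class)
print(same_order)
```

Growing a data frame with rbind() inside a loop copies it on every iteration, so expand.grid() is usually the better fit once the number of combinations gets large.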
