Recently I have been working on netcdf files and I am using this library. I am able to open and read the data like this:
NetcdfFile ncfile = NetcdfFile.open(inputPath);
I am able to list variables and get desired variable inside the data:
List<Variable> variables = ncfile.getVariables();
Variable tcc = ncfile.findVariable("tcc_0");
I am able to get rank and shape of variable too, and I can get data of the table by this:
int[] readOrigin = new int[2];
int[] readShape = new int[2];
readOrigin[0] = desiredRow;
readOrigin[1] = 0;
readShape[0] = 1;
readShape[1] = numberOfColumns;
Array arr = tcc.read(readOrigin, readShape);
This code reads all the values in row desiredRow, and I can iterate over arr to get the value for each column.
However, I also want the values that label the row and column indexes. I can read the table's [0][0] value, but I cannot get the row and column index values themselves. I need to get 32.035, 32.08, ... for the row index values, and the same for the columns.
Any help is appreciated.
After some research I found a way to get them. They are stored in the root group as one-dimensional variables, which is exactly what I needed.
Variable rootGroupLatitudeVar = ncfile.getRootGroup().findVariableLocal("latitude");
Array latitudeArr = rootGroupLatitudeVar.read();
int latitudeArrSize = rootGroupLatitudeVar.getShape(0);
// ... iterate over latitudeArr
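How the one-dimensional coordinate variables relate to the two-dimensional table can be sketched without the NetCDF library at all. The Python snippet below uses plain lists as stand-ins. Only the two latitude values come from the question; the longitude values, the table contents and the helper name `labelled_cells` are made-up placeholders for illustration.

```python
# Coordinate variables are 1-D arrays that label the axes of the 2-D data variable.
latitude = [32.035, 32.08]        # row labels (values quoted in the question)
longitude = [34.0, 34.05, 34.1]   # column labels (placeholder values)

# Placeholder data with shape (len(latitude), len(longitude)).
tcc = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]


def labelled_cells(data, row_labels, col_labels):
    """Yield (row_label, col_label, value) for every cell of a 2-D table."""
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            yield row_labels[i], col_labels[j], value


cells = list(labelled_cells(tcc, latitude, longitude))
print(cells[0])  # (32.035, 34.0, 0.1)
```

The point is only that index i along the first axis of the table corresponds to element i of the latitude array, and likewise for the second axis and longitude.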
I have an Xarray Dataset which looks like
I would like to be able to select a data variable (which I already know how to do) and plot a quadmesh of the selected variable. The problem is that different data variables have different numbers and types of coordinates. The closest I have got is a very long hard-coded switch statement that handles each possible data variable and calls sel on it with the appropriate number and type of coordinates (for ES I would need to select on pres1 and time). How can I let the user visualize the geospatial data no matter which data variable is selected?
Here is my current attempt:
# Imports assumed from the aliases used below; xds is the Dataset described above
import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import panel as pn
import panel.widgets as pnw

select_field = pnw.Select(name="Field", options=list(xds.data_vars))
select_time = pnw.Select(name="Time", options=list(xds.coords['time'].values))
select_pres1 = pnw.Select(name="Pressure 1", options=list(xds.coords['pres1'].values))

def fieldFiltered(select_field):
    # Return the data variable chosen in the "Field" widget
    return xds[select_field]

xdsi = hvplot.bind(fieldFiltered, select_field).interactive(sizing_mode='stretch_both')
wid = xdsi.sel(pres1=select_pres1, time=select_time).widgets()
ploti = xdsi.sel(time=select_time).hvplot()
pn.Row(
    wid[1],
    wid[2],
    ploti
)
which gives me something similar to
but breaks for any other variable
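One possible way to avoid the hard-coded switch is to derive the required selections from the dimensions of whichever variable was picked. The sketch below shows only that logic, using plain tuples. The spatial dimension names `lat` and `lon`, the variable `OTHER` and the helper `dims_needing_selection` are assumptions for illustration; only the fact that ES needs pres1 and time comes from the question.

```python
# Assumed names of the two spatial dimensions that the quadmesh is drawn over.
SPATIAL_DIMS = ("lat", "lon")


def dims_needing_selection(var_dims, spatial_dims=SPATIAL_DIMS):
    """Return the dimensions that must be fixed to one value before plotting."""
    return [d for d in var_dims if d not in spatial_dims]


# Hypothetical layout; only "ES" needing pres1 and time comes from the question.
example_dims = {
    "ES": ("time", "pres1", "lat", "lon"),
    "OTHER": ("time", "lat", "lon"),
}

for name, dims in example_dims.items():
    print(name, dims_needing_selection(dims))
```

With a real dataset the tuple would come from `xds[name].dims`, and one selection widget could be built for each dimension the helper returns, rather than one branch per variable.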
I am trying to sort the following data frame by the values in the Period column.
The code that I am using is as follows:
data = read.csv("inputSample.csv")
datasub = subset(data,data$Period<41 & data$Period>0)
write.csv(datasub,"period+.csv")
new = read.csv("period+.csv")
sub = subset(new,new$NumberOfClaims>0)
sub1 = subset(new,new$NumberOfClaims==0)
opr <- function(set)
{
return((set$LossAmt * set$SimulationCount)/set$NumberOfClaims)
}
operated = data.frame( sub$LoanID,opr(sub), sub$EndingBalance, sub$BalanceInClaims, sub$Period)
operated = operated[order("sub.Period")]
print(operated)
However, the code above simply returns the values of the first column of the data frame, and in unsorted order at that. I have tried using with() and other approaches, but none of them seem to work. Please help me out. Thanks.
I have a data frame that I am trying to condense from multiple rows into one row. The data set is fairly large, but I am starting with a small subset. So here I want to turn 2 rows into 1, with the information from the second row appended after the information in the first row.
The original problem was that I had a column of data that I need to "flatten" so that I can use the bits and pieces. The column is in JSON format.
"[{\"task\":\"T0\",\"task_label\":\"Did any birds visit the feeding platform or bird feeders?\",\"value\":\"**Yes**—but there were no displacements. Next, enter all of the birds you see at the feeders. \"},{\"task\":\"T1\",\"value\":[{\"choice\":\"EUROPEANSTARLING\",\"answers\":{\"WHATISTHELARGESTNUMBEROFINDIVIDUALSTHATYOUSAWSIMULTANEOUSLY\":\"4\"},\"filters\":{}},{\"choice\":\"MOURNINGDOVE\",\"answers\":{\"WHATISTHELARGESTNUMBEROFINDIVIDUALSTHATYOUSAWSIMULTANEOUSLY\":\"2\"},\"filters\":{}}]},{\"task\":\"T6\",\"task_label\":\"Is it actively precipitating (rain or snow)?\",\"value\":[\"Yes.\"]}]"
So I used code developed by another coder to "flatten" this out by task. Then, I want to join it back up so that I have one line of information for each classification.
Currently, I have merged tasks T0 and T4, but I need to merge this to another task, T5. In order to do that, I need to reduce the data in merge of T0 and T4 to one row. So right now I'm working with a small subset of the data and have a table that essentially looks like this:
x <- data.frame("subject_ids" = c(19232716, 19232716), "classification_id" = c(120545061,120545061), "task_index.x" = c(1,1),
"task.x" = c("TO","TO"), "value" = c("Displacement","Displacement"), "task_index.y"=c(2,5), "task.y"= c("T4, T4","T4"),
"total.species"=c("2,2","1"), "choice" = c("MOURNINGDOVE, COMMONGRACKLE","MOURNINGDOVE"), "S_T"=c("Target,Target","Target,Source"))
but I want it to look like this:
y <- data.frame("subject_ids" = c(19232716), "classification_id" = c(120545061), "task_index.x" = c(1),
"task.x" = c("TO"), "value" = c("Displacement"), "task_index.y"=c(2), "task.y"= "T4, T4",
"total.species"=c("2,2"), "choice" = c("MOURNINGDOVE, COMMONGRACKLE"), "S_T"=c("Target,Target"),
"task_index.y"=c(5), "task.y"= "T4",
"total.species"=c("1"), "choice" = c("MOURNINGDOVE"), "S_T"=c("Target,Source"))
I am new to R and I am practicing writing R functions. I have 100 separate CSV data files stored in my directory, and each is labeled by its id, e.g. "1" to "100". I would like to write a function that reads some selected files into R, calculates the number of complete cases in each data file, and arranges the results into a data frame.
Below is the function that I wrote. First I list all the files in "dat". Then, using the rbind function, I read the selected files into a data.frame. Lastly, I compute the number of complete cases using sum(complete.cases()). This seems straightforward, but the function does not work. I suspect there is something wrong with the index but have not figured out why. I searched through various topics but could not find a useful answer. Many thanks!
complete = function(directory,id) {
  dat = list.files(directory, full.name=T)
  dat.em = data.frame()
  for (i in id) {
    dat.ful= rbind(dat.em, read.csv(dat[i]))
    obs = numeric()
    obs[i] = sum(complete.cases(dat.ful[dat.ful$ID == i,]))
  }
  data.frame(ID = id, count = obs)
}
complete("envi",c(1,3,5))
I get the following error message:
Error in data.frame(ID = id, count = obs) : arguments imply differing number of rows: 3, 5
One problem with your code is that you reset obs to numeric() each time you go through the loop, so obs keeps only the count from the last iteration. Because that count is assigned to position i, obs is padded with NA up to that index: with id = c(1,3,5) it ends up with length 5 while id has length 3, which is exactly the mismatch the error reports.
Another issue is that the line dat.ful = rbind(dat.em, read.csv(dat[i])) resets dat.ful to contain just the data frame being read in that iteration of the loop, because dat.em is always empty. This won't cause an error, and you don't actually need to store the previous data frames, since you're only counting the complete cases in each data frame as you read it.
Here's a different approach using lapply instead of a loop. Note that instead of giving the function a vector of indices, this function takes a vector of file names. In your example, you use the index instead of the file name as the file "id". It's better to use the file names directly, because even if the file names are numbers, using the index will give an incorrect result if, for some reason, your vector of file names is not sorted in ascending numeric order, or if the file names don't use consecutive numbers.
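The ordering problem is easy to reproduce. R's list.files returns names in alphabetical order, and alphabetical order on numbered file names is not numeric order. The short Python illustration below shows the same effect with an assumed subset of the file names; the `.csv` suffix is an assumption about how the files are named.

```python
# File names as they might appear in the directory (a small subset of "1" to "100").
names = ["1.csv", "2.csv", "3.csv", "10.csv", "100.csv"]

# Plain string sort: names are compared character by character.
alphabetical = sorted(names)

# Sort by the numeric id in front of the extension instead.
numeric = sorted(names, key=lambda n: int(n.split(".")[0]))

print(alphabetical)  # ['1.csv', '10.csv', '100.csv', '2.csv', '3.csv']
print(numeric)       # ['1.csv', '2.csv', '3.csv', '10.csv', '100.csv']
```

In the alphabetical listing the third entry is 100.csv, not 3.csv, so indexing the listing by id would silently read the wrong file.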
# Read files and return data frame with the number of complete cases in each csv file
complete = function(directory, files) {
# Read each csv file in turn and store its name and number of complete cases
# in a list
obs.list = lapply(files, function(x) {
dat = read.csv(paste0(directory,"/", x))
data.frame(fileName=x, count=sum(complete.cases(dat)))
})
# Return a data frame with the number of complete cases for each file
return(do.call(rbind, obs.list))
}
Then, to run the function, you need to give it a directory and a list of file names. For example, to read all csv files in the current working directory, you can do this:
filesToRead = list.files(pattern="\\.csv$")  # pattern is a regular expression
complete(getwd(), filesToRead)
I have a data set that is saved as a .csv file that looks like the following:
Name,Age,Password
John,9,\i1iiu1h8
Kelly,20,\771jk8
Bob,33,\kljhjj
In R I could open this file by the following:
X = read.csv("file.csv",header=TRUE)
Is there a default command in Matlab that reads .csv files with both numeric and string variables? csvread seems to only like numeric variables.
One step further, in R I could use the attach function to create variables associated with the columns and column headers of the data set, i.e.,
attach(X)
Is there something similar in Matlab?
Although this question is close to being an exact duplicate, the solution suggested in the link provided by @NathanG (i.e., using xlsread) is only one possible way to solve your problem. The author in the link also suggests using textscan, but doesn't provide any information about how to do it, so I thought I'd add an example here:
%# First we need to get the header-line
fid1 = fopen('file.csv', 'r');
Header = fgetl(fid1);
fclose(fid1);
%# Convert Header to cell array
Header = regexp(Header, '([^,]*)', 'tokens');
Header = cat(2, Header{:});
%# Read in the data
fid1 = fopen('file.csv', 'r');
D = textscan(fid1, '%s%d%s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid1);
Header should now be a row vector of cells, where each cell stores a header. D is a row vector of cells, where each cell stores a column of data.
There is no way I'm aware of to "attach" D to Header. If you wanted, you could put them both in the same structure though, i.e.:
S.Header = Header;
S.Data = D;
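For comparison, the idea of keying each column by its header can be sketched in a few lines of Python. This is an illustration of the pairing only, not MATLAB code. The file contents are inlined from the question so that the snippet is self-contained, and the name `columns` is arbitrary.

```python
import csv
import io

# Same content as file.csv in the question, inlined so the sketch is self-contained.
text = "Name,Age,Password\nJohn,9,\\i1iiu1h8\nKelly,20,\\771jk8\nBob,33,\\kljhjj\n"

rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]

# Transpose the records into columns and key each column by its header.
columns = {name: [rec[i] for rec in records] for i, name in enumerate(header)}
columns["Age"] = [int(a) for a in columns["Age"]]  # Age is the only numeric column

print(columns["Name"])  # ['John', 'Kelly', 'Bob']
print(columns["Age"])   # [9, 20, 33]
```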
Matlab's new table class makes this easy:
x = readtable('file.csv');
By default this will parse the headers, and use them as column names (also called variable names):
>> x
x =
     Name      Age     Password
    _______    ___    ___________
    'John'      9     '\i1iiu1h8'
    'Kelly'    20     '\771jk8'
    'Bob'      33     '\kljhjj'
You can select a column using its name etc.:
>> x.Name
ans =
'John'
'Kelly'
'Bob'
Available since Matlab 2013b.
See www.mathworks.com/help/matlab/ref/readtable.html
I liked this approach, supported by Matlab 2012.
path='C:\folder1\folder2\';
data = 'data.csv';
data = dataset('xlsfile', fullfile(path, data));
Of course you could also do the following:
[data,path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile', fullfile(path, data));