I have a point text file with records in separate rows, and I would like to convert it to a NetCDF file. Is there a utility or executable that can help with this?
Just read in the text file and use the netcdf module contained in scipy.
Generic example:
import numpy as np
from scipy.io import netcdf_file as Dataset  # on older SciPy: from scipy.io.netcdf import netcdf_file

# Read one value per line from the text file
data = []
with open('./temp_text.txt', 'r') as txtfile:
    for line in txtfile:
        data.append(float(line.strip()))

# Write the values along a single 'record' dimension
ncfile_out = Dataset('./temp.nc', 'w')
ncfile_out.createDimension('record', len(data))
nc_data = ncfile_out.createVariable('data', np.dtype('float').char, ('record',))
nc_data[:] = data
ncfile_out.close()
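To sanity-check the result, you can read the file straight back (a minimal sketch, assuming the file and variable names from the example above):
import numpy as np
from scipy.io import netcdf_file

ncfile_in = netcdf_file('./temp.nc', 'r')
print(ncfile_in.variables['data'][:])  # should echo the values from the text file
ncfile_in.close()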
Thanks for reading this post. I import a .csv file from the Qualtrics platform almost daily into a specific folder (i.e., "Thesis_Folder"). Whenever I import a .csv file into the Thesis_Folder, the default name of the .csv file includes today's date and time. In other words, the name of the .csv file changes every time I import the data from Qualtrics (e.g., "Thesis data June+3_2019++12:48"). However, the file name always contains the words "Thesis data". My question is: how can I import a .csv file if its name contains "Thesis data"?
list.files() lists the names of files matching a specific pattern, but it does not import them as data frames. Thank you.
files <- list.files("C:/Users/User/Desktop/csv", pattern=".*Thesis.*\\.csv$", full.names=TRUE)
list_of_frames <- lapply(files, read.csv)
Store all data.frames in a list.
Untested since you do not provide example data, but this should work:
library("rio")
library("dplyr")
data <- list.files( # find the respective files
  path = "./Thesis_Folder",
  pattern = ".csv$", # you might want a more specific regex if possible
  full.names = TRUE
) %>%
  lapply(import) %>% # import from rio usually works well
  bind_rows() # bind the list of data.frames into one big df
I am trying to read an Excel file (.xlsx) into a data frame in IBM Watson Studio. The Excel file is saved in my list of assets. I'm a bit new to Python.
I have tried creating a project token with some help I got here. I would appreciate it if someone could help with the complete code.
I tried this:
from project_lib import Project
project = Project(project_id='',
                  project_access_token='')
pc = project.project_context

file = project.get_file("xx.xlsx")
file.sheet_names
df = pd.ExcelFile(file)
df = file.parse(0)
df.head()
I need to load the Excel file into a pandas data frame (pd).
All you need to do is:
First, insert the project token as you already did.
Then fetch the file and call .seek(0) on it.
Then read it with pandas' read_excel() and you should be able to work with it.
import pandas as pd

# Fetch the file from the project's object storage
my_file = project.get_file("tests-example.xls")

# Rewind the stream, then read the Excel data into a pandas DataFrame
my_file.seek(0)
df = pd.read_excel(my_file, nrows=10)
For more information: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html
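If your workbook has several sheets, pd.ExcelFile lets you list and parse them individually. A small sketch along the same lines ("xx.xlsx" is a placeholder, and project is the Project object from above):
import pandas as pd

my_file = project.get_file("xx.xlsx")
my_file.seek(0)

workbook = pd.ExcelFile(my_file)
print(workbook.sheet_names)                   # list the available sheets
df = workbook.parse(workbook.sheet_names[0])  # parse the first sheet
df.head()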
I have 200 TSV documents to import.
I use this code:
list_file <- list.files(pattern = "*.tsv")
read.all <- lapply(list_file, read.table, header = TRUE, sep = "\t", na.strings = c("", " "), fill = TRUE)
It gives the warning "EOF within quoted string" during the import.
At first I thought it was just a warning, so I moved on; the structure of the data looked fine. But when I checked the number of rows in the last few list elements, I found that not all rows from those TSV documents had been imported. There seems to be nothing wrong with my data itself, because when I read one of the incompletely imported files on its own, rather than through lapply, it was imported without losing any information.
Since I can't tell which files were not imported properly, I can't trust that I have all the information I need. Can anyone offer suggestions?
Maybe a method that is slower but error-free? Many thanks.
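For what it's worth, "EOF within quoted string" usually means a stray quote character inside a field; in R, passing quote = "" to read.table often avoids the truncation. To find out which files lost rows, one cross-check is to compare each file's raw line count with the number of rows actually imported. A rough sketch of that check (written in Python here; it assumes one header line per file and no embedded newlines inside fields):
import csv
import glob

import pandas as pd

for path in glob.glob("*.tsv"):
    # Count raw data lines, excluding the header
    with open(path, encoding="utf-8") as fh:
        n_lines = sum(1 for _ in fh) - 1

    # Disable quote handling so a stray quote cannot swallow rows
    df = pd.read_csv(path, sep="\t", quoting=csv.QUOTE_NONE)

    if len(df) != n_lines:
        print(f"{path}: {n_lines} data lines but {len(df)} rows imported")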
I'm really new to Python, so thank you in advance for this probably stupid question:
I have a directory filled with LAS files and I would like to assign each of them to a variable, so that I can keep working with all of them afterwards. I can't seem to find an answer that works... the code below is not working, for a reason I can't figure out. Thanks for the help!
#%%
### Importing function packages ### NumPy, Pandas, PyPlot, OS, LASio, Sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import lasio as las

#%%
# Directory
my_dir = os.walk("C:\\Users\\L_R\\Desktop\\test\\")
my_list = []

# Creates a list with LAS files in the DIR
for i in my_dir:
    my_list.append(i)
print(my_list)

# Iterates through DIR and creates dataframes out of files
count = 0
current_las = my_list[count]
filename = current_las
for dirname, dirs, files in my_dir:
    while count < len(my_list):
        if filename.endswith(".LAS"):
            las_output = las.read(filename)  # module reading the file
        count = count + 1
A few things. First, I assume you have a dictionary and you want to extract the items and assign them to variables using the names saved in the keys. Is that correct?
You could load the entire dictionary and then assign its items to the correct variables.
If I am correct, this code will let you load the data and then assign the values to variables.
import numpy as np

# Assumes the dict was saved with np.save('Dict.npy', D); allow_pickle=True
# is needed to load a pickled dict, and .item() unwraps it
D = np.load('Dict.npy', allow_pickle=True).item()
for key, val in D.items():
    exec(key + '=val')  # creates a variable named after each key
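That said, dynamically created variables are usually harder to work with than a plain dict. For the original use case (a directory of LAS files), one pattern is to keep one DataFrame per file in a dict keyed by filename. A sketch, assuming the lasio package and the directory from the question:
import os
import lasio

las_dir = "C:\\Users\\L_R\\Desktop\\test\\"
frames = {}

for name in os.listdir(las_dir):
    if name.upper().endswith(".LAS"):
        las_file = lasio.read(os.path.join(las_dir, name))
        frames[name] = las_file.df()  # lasio exposes the curves as a DataFrame

# Then access each file's data by name, e.g. frames["well_1.LAS"] (hypothetical name)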
Suppose I have variable s with this code:
s <- "foo\nbar"
Then I convert it to a data.frame:
s2 <- data.frame(s)
Now s2 is a data.frame with one record. Next I export it to a CSV file with:
write.csv(s2, file = "out.csv", row.names = F)
Then I open it with Notepad, and "foo\nbar" has been split across two lines. With SAS import:
proc import datafile = "out.csv" out = out dbms = csv replace;
run;
I get two records: one is '"foo', the other is 'bar"', which is not what I expected.
After struggling for a while, I found that if I export from R with the foreign package like this:
write.dbf(s2, 'out.dbf')
Then import with SAS:
proc import datafile = "out.dbf" out = out dbms = dbf replace;
run;
Everything works nicely, and I get one record in SAS; the value appears to be 'foo bar'.
Does this mean CSV is a bad choice when dealing with data, compared with DBF? Are there any other solutions or explanations for this?
CSV stands for comma-separated values. This means that each line in the file should contain a list of values separated by commas. SAS imported the file correctly based on that definition of a CSV file (i.e., 2 lines = 2 rows).
The problem you are experiencing is due to the \n characters in your string. That escape sequence represents a newline character, and this is why the R write.csv() call creates two lines instead of putting it all on one.
I'm not an expert in R, so I can't tell you how to modify the call to write.csv() or mask the \n value in the input string to prevent it from writing out the newline character.
The reason you don't have this problem with .dbf is probably that it doesn't rely on commas or newlines to mark where new variables or rows start; it must have its own special sequence of bytes that indicates this.
DBF is a database format, and such formats are generally easier to work with because variable types and lengths are embedded in their structure.
With a CSV or any other delimited file, you need accompanying documentation to know the file structure.
The benefits of CSV are smaller file sizes and compatibility across operating systems and applications. For a while, Excel (2007?) no longer supported DBF, for example.
As Robert says you will need to mask the new line value. For example:
replace_linebreak <- function(x, ...) {
  gsub('\n', '|n', x)
}
s3 <- replace_linebreak(s2$s)
This replaces \n with |n, which you would then need to convert back when you import the data again. Obviously, what you choose to mask it with will depend on your data.