Using HDF and NetCDF files
HDF-EOS files
MODIS data and other NASA data come packaged as .hdf files. These are HDF-EOS files, a format specified by NASA and based on the HDF4 specification. More info here, and there are code examples for using HDF-EOS files with different languages here. For a MOD17 example see here.
Since HDF5 is the current and most widely supported HDF format, it may be easiest to first convert HDF-EOS files to HDF5 files using a conversion tool. Download and unpack the tool, then cd into that directory and run ./h4toh5 ~/path/to/file.hdf
Reading with Python
If using HDF5 files, h5py or PyTables can be used to access the data in the file.
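As a minimal sketch of the h5py route, reading a dataset and its attributes might look like the following. The file, group, and dataset names are made up for illustration, and a small file is written first so the example runs on its own; a converted HDF-EOS file will have its own layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a tiny HDF5 file so the example is self-contained.
# "grid/npp" and the units string are hypothetical names, not from a real product.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("grid/npp", data=np.arange(12, dtype="f4").reshape(3, 4))
    dset.attrs["units"] = "kg C m-2"

# Reopen read-only, list every dataset in the file, then read one of them.
names = []

def collect(name, obj):
    # visititems calls this for each group and dataset; keep only datasets.
    if isinstance(obj, h5py.Dataset):
        names.append(name)

with h5py.File(path, "r") as f:
    f.visititems(collect)
    npp = f["grid/npp"][:]                 # read the whole array into memory
    units = f["grid/npp"].attrs["units"]   # attributes behave like a dict

print(names, npp.shape, units)
```

Listing the datasets first is handy with converted files, since the dataset paths are often nested several groups deep.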
If using HDF4 data with Python, check these resources:
- PyHDF info
- In Anaconda, you may want the conda-forge package
- hdf4 also seems to work
Also - maybe check this out:
- http://www.pymodis.org/
Reprojecting the data
Usually these files come in a standard sinusoidal projection, and lat/lon data may or may not be provided in the file. If there is no lat/lon data, it must be created from the file metadata (corner coordinates of the tile, cell size, etc.). It is possible to do this and reproject with GDAL (which can find the file metadata using GetGeoTransform) and Basemap in Python (see examples here).
If using only pyhdf, the MODIS Reprojection Tool may help:
https://lpdaac.usgs.gov/tools/modis_reprojection_tool
or the eos2dump tool:
http://hdfeos.org/software/eos2dump.php
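When a file has no lat/lon data, the coordinates can be rebuilt from the tile metadata mentioned above. The sketch below assumes a GDAL-style geotransform tuple with no rotation terms and the spherical sinusoidal projection used for MODIS grids (sphere radius 6371007.181 m). The function name and the example corner values are illustrative only, so check them against the metadata of the actual file.

```python
import math

# Radius (in metres) of the sphere used by the MODIS sinusoidal grid.
MODIS_SPHERE_RADIUS = 6371007.181

def pixel_to_lonlat(geotransform, row, col):
    """Return (lon, lat) in degrees for the centre of pixel (row, col).

    geotransform follows the GDAL ordering:
    (upper-left x, pixel width, 0, upper-left y, 0, -pixel height).
    Rotation terms are assumed to be zero.
    """
    ulx, xres, _, uly, _, yres = geotransform
    # Projected coordinates of the pixel centre (hence the 0.5 offsets).
    x = ulx + (col + 0.5) * xres
    y = uly + (row + 0.5) * yres  # yres is negative for north-up grids
    # Inverse sinusoidal projection on a sphere.
    lat = y / MODIS_SPHERE_RADIUS
    lon = x / (MODIS_SPHERE_RADIUS * math.cos(lat))
    return math.degrees(lon), math.degrees(lat)

# Illustrative geotransform for a 1 km tile whose upper-left corner sits at 40 N.
gt = (-11119505.196667, 926.625433, 0.0, 4447802.078667, 0.0, -926.625433)
lon, lat = pixel_to_lonlat(gt, 0, 0)
print(lon, lat)
```

Looping this over every row and column (or vectorising it with numpy) gives the full lat/lon arrays for a tile.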
NetCDF files
Reading the file
Some of this is cribbed from:
http://www.hydro.washington.edu/~jhamman/hydro-logic/blog/2013/10/12/plot-netcdf-data/
NetCDF files can be opened with GDAL (though this can be a little tricky)
from osgeo import gdal
ds = gdal.Open(mstmip_dir + 'BIOME-BGC_BG1_Monthly_NEP.nc4')
Or you can just use the netCDF4-python module directly
from netCDF4 import Dataset
ncdf2 = Dataset(mstmip_dir + 'BIOME-BGC_BG1_Monthly_NEP.nc4')
# Then pull out relevant variables
nep = ncdf2.variables['NEP'][-1] # data for one month (the last time step)
lats = ncdf2.variables['lat'][:]
lons = ncdf2.variables['lon'][:]
ncdftime = ncdf2.variables['time'][:]
nep_units = ncdf2.variables['NEP'].units
Timestamps
You can use the time converter in the netcdftime module (since renamed cftime)
from netcdftime import utime
time_conv = utime('days since 1700-01-01 00:00:00')
times = time_conv.num2date(ncdftime)
print(times[-1])
Or just create a numpy array
import numpy as np
td = np.array([np.timedelta64(int(i), 'D') for i in ncdftime])
times = td + np.datetime64('1700-01-01 00:00:00')
print(times[-1])
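Rather than hard-coding the origin, it can be read from the units attribute of the time variable. This is a rough sketch using only the standard library. The function names are hypothetical, and it assumes a units string of the form 'days since YYYY-MM-DD hh:mm:ss' and a standard Gregorian calendar (files with noleap or 360-day calendars need a calendar-aware converter instead).

```python
from datetime import datetime, timedelta

def parse_time_units(units):
    """Split a string like 'days since 1700-01-01 00:00:00' into (step, origin)."""
    step, _, origin = units.partition(" since ")
    origin = origin.strip()
    # Accept an origin with or without a time-of-day part.
    fmt = "%Y-%m-%d %H:%M:%S" if " " in origin else "%Y-%m-%d"
    return step.strip(), datetime.strptime(origin, fmt)

def offsets_to_datetimes(offsets, units):
    """Convert numeric time offsets to datetime objects."""
    step, origin = parse_time_units(units)
    if step not in ("days", "hours", "minutes", "seconds"):
        raise ValueError("unsupported time step: " + step)
    # timedelta accepts the step name directly as a keyword argument.
    return [origin + timedelta(**{step: float(v)}) for v in offsets]

# In practice the string would come from something like ncdf2.variables['time'].units
units = "days since 1700-01-01 00:00:00"
times = offsets_to_datetimes([0, 1, 365], units)
print(times[-1])
```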