Data Import and Usage Example#
Here we demonstrate importing a .parquet
file and using it to create a few basic plots.
First, we’ll showcase the land cover sets by filtering the Vancouver Island basins for a subset of basins in the order of
When importing a parquet file with GeoPandas, we can switch between active geometry columns by using the set_geometry()
attribute function.
import os
import pandas as pd
import geopandas as gpd
import geoviews as gv
from shapely.geometry import Point
# import geoviews.feature as gf
gv.extension('bokeh')
BASE_DIR = os.path.dirname(os.getcwd())
DATA_DIR = os.path.join(BASE_DIR, 'notebooks/data/')
# foo = '/home/danbot2/code_5820/large_sample_hydrology/bcub/processed_data/basin_attributes/polygons'
# df = gpd.read_file(os.path.join(DATA_DIR, f'Vancouver_Island_attributes.gpkg'))
df = gpd.read_parquet(os.path.join(DATA_DIR, f'Vancouver_Island_attributes.parquet'))
print(f'There are {len(df)} basins in the attributes file. The active geometry column at import is "geometry" (basin centroid)')
print(f'The crs of the dataframe is {df.crs.to_epsg()}')
There are 1001 basins in the attributes file. The active geometry column at import is "geometry" (basin centroid)
The crs of the dataframe is 3005
Here I’ll add the basin centroids (centroid_x, centroid_y) and pour points (ppt_x, ppt_y) as point geometries.
df['centroid_geom'] = df.apply(lambda row: Point(row['centroid_x'], row['centroid_y']), axis=1)
df['ppt_geom'] = df.apply(lambda row: Point(row['ppt_x'], row['ppt_y']), axis=1)
I’ll set the active geometry column to the newly created ppt_geom
column so any automatic geometry reference will use the basin centroid.
# we need to reproject to 4326 for plotting
centroid_df = df.copy().set_geometry('centroid_geom', crs=3005).to_crs(4326)
Below, we’ll colour map the snow water equivalent (mean annual) to a point representing the basin centroid.
pt_label = 'swe'
points_element = gv.Points(centroid_df).opts(color=pt_label, cmap='RdYlGn',
colorbar=True, clabel=f'{pt_label}',
)
plot = gv.tile_sources.CartoLight() * points_element
plot.opts(width=800, height=600, title='Mean annual (max) SWE (from Daymet 1980-2022)')
Now we’ll compute the forest cover change (as a % of the basin area) for the period 2010-2020. We’ll map the change to the colour of the basin polygon fill, where red will indicate reduction in forest cover, and green will indicate increase in forest cover.
df['forest_change_2010_to_2020'] = (100*df['Land_Use_Forest_frac_2020'] - 100*df['Land_Use_Forest_frac_2010']).astype(int)
min_area, max_area = 20, 25
label = 'low_prcp_duration'
label = 'forest_change_2010_to_2020'
if df.crs.to_epsg() != 3005:
df = df.to_crs(3005)
df['area'] = df['geometry'].area / 1e6
print(df['area'].min(), df['area'].max())
filtered_basins = df[(df['area'] <= max_area) & (df['area'] > min_area)].copy()
len(filtered_basins)
0.9990161285338974 1752.681170388754
35
# attributes = attributes[(attributes['drainage_area_km2'] <= max_area) & (attributes['drainage_area_km2'] > min_area)].copy()
filtered_basins.head()
# filtered_basins.set_geometry('basin_geometry', inplace=True)
filtered_basins = filtered_basins.to_crs(4326)
# filtered_polygons = df.iloc[filtered_ids, :].copy()
# filtered_polygons.head()
polygons_element = gv.Polygons(filtered_basins).opts(color=label, cmap='RdYlGn',
line_color=None, colorbar=True, clabel=f'{label}',
)
plot = gv.tile_sources.CartoLight() * polygons_element
plot.opts(width=800, height=600)