CEREMA Île-de-France · Internship Project

Internship Hydrology Workflow

Python pipeline developed during my internship at CEREMA to clean, resample, and analyse rainfall/flow/storage time series — estimating dry-weather baseflow and separating rainfall-induced flow from raw sensor data.

Python 100% · pandas · numpy · matplotlib · Baseflow Separation · Rainfall–Runoff · CEREMA Data · 30-min time step

Overview

What the workflow does

Raw sensor exports from CEREMA field sites contain irregular timestamps, missing value codes (−6999 / −7999), and mixed rainfall, flowrate, and storage tank data in a single semicolon-separated file. This pipeline automates the full path from raw export to clean diagnostic figures, ready for model calibration.
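
The loading and cleaning step described above can be sketched as follows. This is a minimal illustration, not the code in run_pipeline.py: the column names (timestamp, rain_mm, flow_ls, volume_m3), the skiprows parameter for metadata lines, and the helper name load_raw_export are all assumptions. It also assumes the file stores the missing-value codes with an ASCII minus sign.

```python
import io

import numpy as np
import pandas as pd

# Sentinel codes named in the text; everything else here is hypothetical.
MISSING_CODES = [-6999, -7999]


def load_raw_export(source, skiprows=0):
    """Read a semicolon-separated export and turn sentinel codes into NaN."""
    df = pd.read_csv(
        source,
        sep=";",
        skiprows=skiprows,          # number of metadata lines to skip (assumed)
        parse_dates=["timestamp"],  # hypothetical timestamp column name
        index_col="timestamp",
    )
    # Enforce numeric dtypes; anything unparseable becomes NaN.
    df = df.apply(pd.to_numeric, errors="coerce")
    # Replace the missing-value codes with NaN so they never enter statistics.
    df = df.replace(MISSING_CODES, np.nan)
    return df.sort_index()


# Tiny in-memory example standing in for a raw export.
sample = io.StringIO(
    "timestamp;rain_mm;flow_ls;volume_m3\n"
    "2023-01-01 00:03:00;0.2;5.1;10.0\n"
    "2023-01-01 00:31:00;-6999;5.4;10.4\n"
    "2023-01-01 01:02:00;0.0;-7999;10.9\n"
)
clean = load_raw_export(sample)
print(clean)
```

Converting the codes to NaN before any aggregation matters because a single −6999 averaged into a 30-minute bin would otherwise dominate the result.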

01. Load & Clean: Read raw CEREMA exports, strip metadata lines, handle missing codes −6999/−7999, enforce correct dtypes.
02. Resample: Regularise irregular timestamps to a chosen time step (default: 30 min) using pandas resampling with gap-aware aggregation.
03. Rain / Dry Period Marking: Mark rain vs dry periods from rain-event tables to separate wet- and dry-weather signals in the flow record.
04. Baseflow Estimation: Detect storage-tank filling segments from dV/dt dynamics; fit baseflow from dry-weather periods; build a continuous baseflow series.
05. Rainfall-Induced Flow: Compute rainfall-induced flow = total flow − baseflow. Export intermediate and final CSVs alongside diagnostic QA figures.
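
Steps 02 and 03 might look roughly like the sketch below. It is one possible reading of "gap-aware aggregation", not the project's actual implementation: rainfall is summed with min_count=1 so that empty bins stay NaN instead of becoming a false zero, and flow is averaged. The column names, the start/end layout of the rain-event table, and the function names are assumptions.

```python
import pandas as pd


def resample_regular(df, step="30min"):
    """Put irregular records on a regular grid, leaving empty bins as NaN."""
    # Rain is a depth, so it is summed; min_count=1 keeps gaps as NaN.
    rain = df["rain_mm"].resample(step).sum(min_count=1)
    # Flow is a rate, so it is averaged; empty bins are NaN by default.
    flow = df["flow_ls"].resample(step).mean()
    return pd.DataFrame({"rain_mm": rain, "flow_ls": flow})


def mark_rain(index, events):
    """Return a boolean series that is True inside any rain event."""
    is_rain = pd.Series(False, index=index)
    for start, end in zip(events["start"], events["end"]):
        is_rain.loc[start:end] = True  # label slicing is inclusive on both ends
    return is_rain


raw = pd.DataFrame(
    {"rain_mm": [0.0, 0.5, 0.5, 0.0], "flow_ls": [5.0, 6.0, 8.0, 5.0]},
    index=pd.to_datetime(
        ["2023-01-01 00:05", "2023-01-01 00:40",
         "2023-01-01 00:55", "2023-01-01 02:10"]
    ),
)
regular = resample_regular(raw)

# Hypothetical rain-event table with one event.
events = pd.DataFrame(
    {"start": [pd.Timestamp("2023-01-01 00:30")],
     "end": [pd.Timestamp("2023-01-01 00:59")]}
)
is_rain = mark_rain(regular.index, events)
print(regular.assign(is_rain=is_rain))
```

In this example the bins starting at 01:00 and 01:30 contain no records and remain NaN, which keeps sensor gaps visible in later steps.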

Outputs

Example figures

These figures are generated from the 30-minute resampled time series. Raw input data are not included in the repository (field site data, confidential).

Figure 1. QA plot — observed total flow (blue), estimated dry-weather baseflow (orange), and computed rainfall-induced flow (green) at 30-minute resolution. The separation quality can be visually inspected against the rainfall record.
Figure 2. Rainfall time series (30-min). Two pluviometers shown. Data cleaned and resampled from raw CEREMA export.
Figure 3. Flowrate and storage tank volume (30-min). Storage dynamics (dV/dt) are used to detect dry-weather filling segments for baseflow fitting.
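
To make the dV/dt idea concrete, here is a simplified sketch of baseflow estimation followed by the subtraction total flow − baseflow. The source does not describe the fitting method, so the approach shown is an assumption: a dry-weather step with rising tank volume is treated as a filling segment whose fill rate equals the baseflow, and the samples are joined by time interpolation. Units (volume in m³, flow in L/s) and the function name estimate_baseflow are also assumed.

```python
import pandas as pd


def estimate_baseflow(volume_m3, is_rain, step_seconds=1800.0):
    """Build a continuous baseflow series (L/s) from dry-weather filling."""
    # dV/dt per step, converted from m3 per step to litres per second.
    dvdt_ls = volume_m3.diff() * 1000.0 / step_seconds
    # Assumed rule: filling means rising volume outside rain periods.
    filling = (dvdt_ls > 0) & ~is_rain
    samples = dvdt_ls.where(filling)
    # Interpolate between samples in time, then extend to both edges.
    return samples.interpolate(method="time").ffill().bfill()


idx = pd.date_range("2023-01-01", periods=6, freq="30min")
volume = pd.Series([10.0, 10.9, 11.8, 2.0, 3.8, 4.7], index=idx)  # tank emptied at step 3
is_rain = pd.Series([False, False, False, False, True, False], index=idx)
total_flow = pd.Series([0.6, 0.5, 0.5, 0.5, 2.5, 0.7], index=idx)  # L/s

baseflow = estimate_baseflow(volume, is_rain)
rain_flow = total_flow - baseflow  # rainfall-induced flow
print(pd.DataFrame({"total": total_flow, "baseflow": baseflow, "rain": rain_flow}))
```

Here the emptying step (negative dV/dt) and the rainy step are excluded from the fit, so the baseflow stays at about 0.5 L/s and the rainy step yields about 2.0 L/s of rainfall-induced flow.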

Technical Details

Stack & repository structure

Python 3 · pandas · numpy · matplotlib · Time Series Analysis · Baseflow Separation · Rainfall–Runoff · CEREMA Île-de-France · 30-min resampling
internship-hydrology-workflow/
├── scripts/
│ └── run_pipeline.py    ← entry point (410 lines): load → clean → resample → export
├── src/                 ← package modules (expand as workflow grows)
├── example_figures/
│ ├── rainfall_30min.png
│ ├── flow_and_storage_30min.png
│ └── qa_baseflow_rainflow_30min.png
├── Inputs/              ← place raw data here (git-ignored)
├── Outputs/             ← results written here (git-ignored)
├── requirements.txt
└── README.md

The pipeline is modular: each processing stage is a separate function, which makes it easy to extend. The entry script run_pipeline.py currently implements steps 1–2 (load, clean, resample) and exports the results. Steps 3–5 (rain marking, baseflow estimation, flow separation) are implemented as follow-on modules in src/.
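
One way to picture the "one function per stage" design is a list of stage functions applied in order. The sketch below is illustrative only: the stage names and the run_stages helper are hypothetical and do not reflect the actual layout of run_pipeline.py.

```python
import numpy as np
import pandas as pd


def drop_missing_codes(df):
    """Stage: replace the sentinel codes with NaN."""
    return df.replace([-6999, -7999], np.nan)


def resample_30min(df):
    """Stage: average onto a regular 30-minute grid."""
    return df.resample("30min").mean()


# Adding a new stage only means appending another function here.
STAGES = [drop_missing_codes, resample_30min]


def run_stages(df, stages=STAGES):
    """Apply each stage in turn, passing the result to the next one."""
    for stage in stages:
        df = stage(df)
    return df


raw = pd.DataFrame(
    {"flow_ls": [5.0, -6999.0, 7.0]},
    index=pd.to_datetime(
        ["2023-01-01 00:05", "2023-01-01 00:20", "2023-01-01 00:40"]
    ),
)
result = run_stages(raw)
print(result)
```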