Raw Database Schema
Overview
The raw database stores data as fetched from Eurostat APIs plus the UMP WEEE package, with only minimal normalization (e.g., UMP charts are reshaped to a long format) and added fetch metadata. This preserves traceability while keeping source-specific structures intact.
Database: CirQuant-database/raw/CirQuant_2010-2024.duckdb
Table Naming Convention
Tables follow a consistent naming pattern:
- PRODCOM:
prodcom_ds_XXXXXX_YYYY - COMEXT:
comext_ds_XXXXXX_YYYY - Waste treatment/collection:
env_<dataset>_YYYY
Where:
XXXXXXis the dataset ID (with hyphens replaced by underscores)YYYYis the 4-digit year
PRODCOM Tables
Table: prodcom_ds_059358_YYYY
Annual PRODCOM data for sold production, imports, and exports.
| Column | Type | Description |
|---|---|---|
| update_data | VARCHAR | Data update timestamp |
| time | VARCHAR | Time period (year) |
| prccode | VARCHAR | PRODCOM code without dots (e.g., “27114000”) |
| indicators | VARCHAR | Indicator code (PRODVAL, PRODQNT, etc.) |
| freq_label | VARCHAR | Frequency label (“Annual”) |
| timestamp_data | VARCHAR | Data timestamp |
| value | VARCHAR | Numeric value or unit string (stored as string) |
| timestamp_global | VARCHAR | Global update timestamp |
| indicators_label | VARCHAR | Human-readable indicator name |
| update_structure | VARCHAR | Structure update timestamp |
| decl | VARCHAR | Declarant code (country) |
| freq | VARCHAR | Frequency code (“A” for annual) |
| prccode_label | VARCHAR | Product description |
| decl_label | VARCHAR | Country name |
| dataset_updated | VARCHAR | Dataset update date |
| time_label | VARCHAR | Year label |
| prodcom_code_original | VARCHAR | PRODCOM code with dots (e.g., “27.11.40.00”) |
Indicators in DS-059358:
PRODVAL: Production value in EURPRODQNT: Production quantityEXPVAL: Export value in EUREXPQNT: Export quantityIMPVAL: Import value in EURIMPQNT: Import quantityQNTUNIT: Unit of measurement (KG, L, M3, etc.)
Table: prodcom_ds_059359_YYYY
Total production indicators (where available).
Same structure as DS-059358 but typically contains only the total production fields and units. Frequently sparse because of confidentiality restrictions on EU individual country data.
COMEXT Tables
Table: comext_ds_059341_YYYY
International trade data at 6-digit HS code level.
| Column | Type | Description |
|---|---|---|
| freq | VARCHAR | Frequency code (“A” for annual) |
| reporter | VARCHAR | Reporter country code |
| partner | VARCHAR | Partner code (INT_EU27_2020, EXT_EU27_2020) |
| product | VARCHAR | HS 6-digit code without dots |
| flow | VARCHAR | Trade flow code (1=Import, 2=Export) |
| indicators | VARCHAR | Indicator code (VALUE_EUR, QUANTITY_KG) |
| time | VARCHAR | Year |
| value | VARCHAR | Numeric value (stored as string, may use scientific notation) |
| freq_label | VARCHAR | Frequency label |
| reporter_label | VARCHAR | Reporter country name |
| partner_label | VARCHAR | Partner description |
| product_label | VARCHAR | Product description |
| flow_label | VARCHAR | Flow description (“IMPORT” or “EXPORT”) |
| indicators_label | VARCHAR | Indicator description |
| time_label | VARCHAR | Year label |
| hs_code_query | VARCHAR | HS code used in the query |
| indicator_query | VARCHAR | Indicator requested |
| partner_type | VARCHAR | Partner type (INTRA_EU, EXTRA_EU) |
| partner_code | VARCHAR | Original partner code |
| flow_type | VARCHAR | Flow type name |
| flow_code | BIGINT | Flow code (1 or 2) |
| fetch_date | TIMESTAMP | When data was fetched (format: YYYY-MM-DD HH:MM:SS.SSS) |
Indicators:
VALUE_EUR: Trade value in EURQUANTITY_KG: Trade quantity in kilograms
Partners:
INT_EU27_2020: Intra-EU27 trade (from 2020)EXT_EU27_2020: Extra-EU27 trade (from 2020)
Waste Treatment / Collection Tables
Table: env_wastrt_YYYY
Waste treatment statistics by waste category and operation.
Key dimensions/columns (as returned by EurostatAPI):
freq,unit,hazard,wst_oper,waste,geo,time,value- Additional columns:
dataset,year,fetch_date,original_key,original_value
Table: env_waseleeos_YYYY
WEEE open-scope collection/sales data (post-2018 categories).
Key dimensions/columns:
freq,waste,wst_oper,unit,geo,time,value- Additional columns:
dataset,year,fetch_date,original_key,original_value
Table: env_waselee_YYYY
Legacy WEEE collection/sales data (pre-2018 categories).
Key dimensions/columns:
freq,waste,wst_oper,unit,geo,time,value- Additional columns:
dataset,year,fetch_date,original_key,original_value
Table: env_waspb_YYYY
Portable battery sales/collection.
Key dimensions/columns:
freq,wst_oper,waste,unit,geo,time,value- Additional columns:
dataset,year,fetch_date,original_key,original_value
Urban Mine Platform Tables
Table: ump_weee_history
Urban Mine Platform (UMP) WEEE charts CSV normalized to historical observations (OBS/baseline). The dataset stores country × year × category values with the UMP stock/flow ID as the metric.
Columns:
geo(VARCHAR): Geography/country label parsed from the worksheetyear(INTEGER): Year extracted from the wide column header or an explicityearcolumnproduct_label(VARCHAR): Category label (Layer 1 when available; fallback: Waste Stream)metric(VARCHAR): UMP stock/flow identifier (e.g.,WEEE_EEEPOM,WEEE_collected)value(DOUBLE): Parsed numeric value for the year/categoryunit(VARCHAR, nullable): Unit string when present in the worksheetproduct_key(VARCHAR, nullable): CirQuant product key when UMP WEEE categories map toconfig/products.tomlWEEE codes (e.g.,EE_TEE→WEEE_Cat1)source_sheet/source_file(VARCHAR): Source file identifiers for traceabilityfetch_date(VARCHAR): Timestamp when the CSV was ingested
Table: ump_weee_sankey
Urban Mine Platform (UMP) WEEE sankey CSV filtered to historical observations (OBS/baseline). This table preserves the flow structure for downstream MFA-style analysis.
Columns:
waste_stream(VARCHAR): Waste stream label (typicallyWEEE)location(VARCHAR): Geography labelyear(INTEGER): Year extracted from the CSVscenario(VARCHAR, nullable): Scenario label (OBS retained; BAU excluded by the loader)additional_specification(VARCHAR, nullable): Additional specification field from UMPstock_flow_id(VARCHAR, nullable): Stock/flow identifierlayer_1/layer_2/layer_3/layer_4(VARCHAR, nullable): Sankey decomposition layersvalue(DOUBLE): Parsed numeric value for the rowunit(VARCHAR, nullable): Unit string when presentsource_file(VARCHAR): Source CSV filenamefetch_date(VARCHAR): Timestamp when the CSV was ingested
Notes:
- By default, UMP tables are filtered to WEEE categories mapped from
config/products.toml(matching the scoped product list). - Battery data from UMP is not yet available; a stub loader exists to be activated when the dataset is published.
Data Types and Storage Decisions
- String Storage for Values: The
valuecolumn is VARCHAR to accommodate:- Numeric values (production quantities, trade values)
- Unit strings (KG, L, M3 from QNTUNIT indicator)
- Missing or suppressed data markers
-
Preserved Original Codes: Both cleaned (no dots) and original (with dots) versions of product codes are stored
- Metadata Columns: Additional columns track query parameters and fetch timestamps for debugging and traceability
Example Queries
Get all production values for heat pumps in 2020:
SELECT * FROM prodcom_ds_059358_2020
WHERE prodcom_code_original = '28.21.13.30'
AND indicators = 'PRODVAL';
Get EU imports of batteries from outside EU in 2022:
SELECT * FROM comext_ds_059341_2022
WHERE product = '850760'
AND partner_code = 'EXT_EU27_2020'
AND flow = '1';
Data Quality Notes
- Missing Data: NULL or empty values are common due to:
- Confidentiality suppression
- No production/trade in that year
- Data not yet reported
- Value Formats: Numeric values may include:
- Scientific notation (e.g., “2.1788009e8”, “6.69011062e8”)
- Decimal separators
- Special markers (“:c” for confidential)
- Country Coverage: Not all EU countries report all products