If you're a student, researcher, data analyst, or curious explorer working on European industrial data, you’ve probably heard about the France EAE (Enquête Annuelle d’Entreprise) Industrie Survey Dataset. This powerful dataset offers detailed insights into France's industrial sectors, collected through an annual enterprise survey.
But here’s the catch: while the data is rich and incredibly useful, finding and downloading it can feel like navigating a maze—especially if French isn't your first language.
In this comprehensive guide, I’ll walk you through exactly how to find, access, and download the France EAE Industrie dataset—in simple terms, step-by-step, and without any jargon. Whether you’re working on academic research or a business project, this post will save you hours of frustration.
🧠 What Is the EAE Industrie Survey Dataset?
Before diving into the how-to part, let’s understand what the dataset is about.
EAE stands for Enquête Annuelle d’Entreprise (Annual Enterprise Survey), conducted by INSEE (the French National Institute of Statistics and Economic Studies). It’s similar to an economic census that captures firm-level data such as turnover, employment, investment, production, and energy consumption.
This dataset covers French industrial firms above a size threshold and is vital for understanding industrial structure, productivity, and growth trends over time.
📍 Where to Find the France EAE Industrie Dataset?
There are two main places where this dataset is typically hosted:
1. INSEE Official Website
https://www.insee.fr
This is the primary source. INSEE provides raw data, reports, and methodological notes.
2. CEPR or Data Archives
Websites like CEPR (Centre for Economic Policy Research) or ICPSR sometimes host versions of the dataset, especially for researchers affiliated with academic institutions.
But for the most accurate and direct data access, INSEE is your best bet.
🌍 Language Barrier? No Problem.
INSEE is a French site, and while some content is available in English, most dataset descriptions and download pages are in French. But don’t worry! We’ll include translations for key terms so you can navigate the site like a pro.
🧭 Step-by-Step Guide: How to Download the Dataset
Let’s get into the meat of this blog post.
Step 1: Go to INSEE’s Main Website
Visit: https://www.insee.fr/fr/accueil
Click on the top-right language dropdown and select “English” if you prefer.
However, the dataset pages are typically only in French, so you’ll eventually have to work with the French version.
Step 2: Use the Search Bar
In the search bar (top-right), type:
"Enquête Annuelle d'Entreprise Industrie" or simply EAE Industrie.
You’ll see a list of pages related to surveys and datasets. Look for something like:
🟢 "Enquête annuelle d'entreprise - Industrie (EAE)"
Click on it.
Step 3: Understand the Dataset Page
Once you’re on the dataset page, you’ll find several tabs.
👉 Your main focus should be:
- “Accès aux données” = Data Access
- “Données détaillées” = Detailed Data
Step 4: Choose a Year or Time Period
The EAE data was collected annually, and you may see options to download by single year or by multi-year period.
Click the year or time period you need. Some datasets are grouped by decade.
Step 5: Select the Format and Level of Detail
You’ll now see a list of downloadable files. Pay attention to the file format and the level of detail.
👉 Choose CSV format, and if you want raw data, go for “Établissements.”
Step 6: Accept Terms and Download
You may be asked to accept usage terms or fill out a quick form.
No worries: it’s usually simple and free. Just enter the details the form asks for.
✅ After that, the download link will appear.
Step 7: Translate and Open the Dataset
Once downloaded, open the file in Excel or your preferred software.
Column headers will be in French, so here’s a cheat sheet:
| French Column | English Meaning |
|---|---|
| Code APE | Industry Code |
| CA | Revenue (Chiffre d’affaires) |
| Salariés | Employees |
| Investissements | Investments |
| Production | Production Volume |
| Energie | Energy Consumption |
You can also use Google Translate or DeepL to translate headers in bulk.
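If you prefer to rename the headers in code rather than by hand, the cheat sheet above can be expressed as a small lookup table. The sketch below uses only Python's standard library. The sample text is made up for illustration, and real files may spell their headers differently, so unknown headers are simply left as they are.

```python
import csv
import io

# Cheat sheet from the table above: French header -> English meaning.
HEADER_TRANSLATIONS = {
    "Code APE": "Industry Code",
    "CA": "Revenue",
    "Salariés": "Employees",
    "Investissements": "Investments",
    "Production": "Production Volume",
    "Energie": "Energy Consumption",
}


def translate_headers(headers):
    """Return English names for known headers; leave unknown ones untouched."""
    return [HEADER_TRANSLATIONS.get(h.strip(), h.strip()) for h in headers]


# Made-up sample (not real EAE data), semicolon-delimited.
sample = "Code APE;CA;Salariés;Autre\n2410;1500;42;x\n"
reader = csv.reader(io.StringIO(sample), delimiter=";")
english_headers = translate_headers(next(reader))
rows = [dict(zip(english_headers, row)) for row in reader]
print(english_headers)
```

For a file on disk, replace the `io.StringIO(sample)` object with `open(path, newline="", encoding="utf-8")`.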
🔎 Example Use Cases for the Dataset
Here’s how real-world analysts use the EAE dataset:
- Productivity Analysis: Compare output per worker across industries.
- Energy Efficiency Studies: Track energy consumption trends over the years.
- Regional Industrial Distribution: Analyze which regions produce which products.
- Historical Growth Patterns: See how French industry evolved over the survey’s 1984–2007 run.
- Comparative European Research: Combine this data with Eurostat to analyze EU industry as a whole.
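To make the productivity use case concrete: output per worker is total production divided by total employees within each industry code. Here is a minimal sketch, assuming the rows are already loaded as dictionaries with the English column names from the cheat sheet. The figures are invented.

```python
from collections import defaultdict


def output_per_worker(rows):
    """Sum production and employees per industry, then divide."""
    production = defaultdict(float)
    employees = defaultdict(float)
    for row in rows:
        code = row["Industry Code"]
        production[code] += float(row["Production Volume"])
        employees[code] += float(row["Employees"])
    # Skip industries with zero reported employees to avoid dividing by zero.
    return {
        code: production[code] / employees[code]
        for code in production
        if employees[code] > 0
    }


# Invented figures for illustration only.
firms = [
    {"Industry Code": "A", "Production Volume": "1000", "Employees": "10"},
    {"Industry Code": "A", "Production Volume": "3000", "Employees": "30"},
    {"Industry Code": "B", "Production Volume": "500", "Employees": "25"},
]
ratios = output_per_worker(firms)
print(ratios)
```

Summing first and dividing afterwards gives the industry-level ratio. Averaging each firm's own ratio would answer a different question.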
🧰 Tools You Can Use to Work With the Data
- Excel / Google Sheets: For quick overviews.
- R / Python (pandas): For deep analysis or visualizations.
- Tableau / Power BI: If you want dashboards.
- QGIS: To map regional industrial data (if location is included).
⚠️ Common Issues and How to Solve Them
| Issue | Fix |
|---|---|
| File won’t open properly | Use Excel’s “Import from Text/CSV” feature and set delimiter as semicolon (;) |
| French headers confusing | Use online translation tools |
| Data looks too complex | Start with smaller years or summary files |
| Missing data for recent years | The EAE ended after 2007; check its successor survey, the ESA (Enquête Sectorielle Annuelle), or Eurostat |
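The first fix in the table can also be done in code. Below is a minimal sketch using Python's standard library. It assumes a semicolon-delimited file that uses a comma as the decimal separator, which is a common French convention but should be checked against your actual file. The sample text and column names are illustrative.

```python
import csv
import io


def parse_french_number(text):
    """Convert '1 234,5' style text to float; return None for blanks."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return float(cleaned) if cleaned else None


def read_semicolon_csv(handle):
    """Read a semicolon-delimited CSV into a list of dicts (values stay text)."""
    return list(csv.DictReader(handle, delimiter=";"))


# Illustrative content, not real survey data.
sample = io.StringIO("Code APE;CA\n2410;1 234,5\n2420;\n")
records = read_semicolon_csv(sample)
revenues = [parse_french_number(r["CA"]) for r in records]
print(revenues)
```

Keeping the raw values as text and converting only the numeric columns avoids silently mangling industry codes.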
🙋 FAQs
❓ Is the dataset free?
The aggregated tables on INSEE’s website are free. Firm-level micro-data are only available through CASD, which charges a project fee (see section 4).
❓ Do I need permission to use it?
Not for the public aggregated tables; just credit INSEE if you publish your analysis. Micro-data access requires project approval through CASD (see section 4).
❓ Can I get data for other sectors?
Yes! INSEE also publishes EAE datasets for services, trade, and construction.
❓ Can I get it in English?
Unfortunately, the dataset itself is in French. But you can translate it with tools like DeepL or ask for support from academic forums.
✍️ Final Thoughts
Downloading the France EAE Industrie Survey dataset might seem tricky at first—especially with the language barrier and government data portals—but once you know the path, it becomes simple and empowering.
Whether you're analyzing long-run shifts in French industry, comparing energy consumption across sectors, or just exploring Europe’s economic trends, this dataset is gold.
Now that you know how to find and download it, go make that research project shine!
1. What exactly is the “EAE-Industrie” and why would you want it?
The EAE-Industrie is the historical backbone of French industrial micro-statistics. Between 1984 and 2007 it surveyed all manufacturing firms with ≥ 20 employees and ≥ €5 million turnover, capturing a 360-degree view of their balance sheets, payroll, investment, subcontracting, regional footprint and much more. Think of it as the corporate “census” that preceded today’s ESA (Enquête Sectorielle Annuelle). The survey stopped after 2007, but those 24 vintages remain gold for anyone studying long-run productivity, regional de-industrialisation, firm finance, supply-chain localisation, or labour-market dynamics. INSEE still maintains the questionnaires and documentation, while CASD hosts the anonymised micro-files — three per year:
| File | Level | Typical row-count (2007) |
|---|---|---|
| ENT | Enterprise | ~8 500 |
| ETAB | Establishment | ~19 000 |
| BRANC | “Branch” within enterprise | ~27 000 |
Each row comes with ~300 variables: identifiers (SIREN, SIRET), NAF industry codes, turnover, wage bill, R&D, tangible and intangible investment, exports, region, etc. The micro-data are confidential, so you cannot simply click-and-download from the open web. Instead you must go through France’s secure access pipeline, CASD (Centre d’Accès Sécurisé aux Données).
2. Two download routes, one decision
| Route | What you get | Who is it for? |
|---|---|---|
| A. Aggregated tables / time-series | Already anonymised, no firm identifiers. Usually XLS/CSV or SDMX via INSEE’s BDM API. | Journalists, lecturers, quick trend plots. |
| B. Micro-data via CASD | Full anonymised firm-level files (ENT, ETAB, BRANC) plus metadata. | Academic researchers, consultants building micro-economic models, PhD students. |
Most people asking “how do I download the EAE-Industrie dataset?” mean Route B. Still, we’ll cover the lightweight Route A first because you can do it in about five minutes and it’s a nice sandbox before you request secure access.
3. The five-minute download (Route A)
- Browse to INSEE → Définitions, Méthodes & Qualité → Sources → EAE Industrie. At the bottom of the source sheet you’ll see links to “Nos sites partenaires” and “Téléchargement de fichiers”; these open zipped Excel sheets with ready-made tables (employment, turnover, investment) by NAF division and year.
- Use the BDM SDMX API (optional). INSEE exposes most of its macro tables through https://bdm.insee.fr/series/sdmx. With the pandasdmx Python library you can pull a dataflow in a few lines. If the specific EAE code is not found, try the successor ESA codes or simply fall back to the zipped Excel above.
That’s it: you’ve technically “downloaded” EAE-based statistics.
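For readers who want to see what an SDMX query looks like without installing pandasdmx, here is a sketch that builds the request URL with the standard library. The `data/SERIES_BDM/<idbank>` path and the `startPeriod` / `endPeriod` parameters follow INSEE's BDM web-service conventions as I understand them, so double-check them against INSEE's documentation. The series identifier is a placeholder, not a real EAE series.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BDM_BASE = "https://bdm.insee.fr/series/sdmx"  # base URL quoted in the text


def build_series_url(idbanks, start_period=None, end_period=None):
    """Build a BDM data URL for one or more series identifiers (idbanks)."""
    url = f"{BDM_BASE}/data/SERIES_BDM/{'+'.join(idbanks)}"
    params = {}
    if start_period is not None:
        params["startPeriod"] = str(start_period)
    if end_period is not None:
        params["endPeriod"] = str(end_period)
    return f"{url}?{urlencode(params)}" if params else url


def fetch_sdmx_xml(url, timeout=30):
    """Download the raw SDMX-ML response as bytes (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# "000000000" is a placeholder idbank, NOT a real EAE series.
url = build_series_url(["000000000"], start_period=1984, end_period=2007)
print(url)
```

Only `build_series_url` runs here. Call `fetch_sdmx_xml(url)` once you have looked up a real series identifier on insee.fr.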
4. The real deal — getting micro-data via CASD (Route B)
4.1 Understand the legal framework
French statistical law treats any record that could indirectly reveal a firm’s identity as confidential. Therefore INSEE only releases EAE micro-data inside the CASD’s encrypted environment. Researchers sign a contract, justify their project before the Statistical Confidentiality Committee, and work on a virtual machine (“SD-Box” or on-site terminal). Outputs leave the box only after disclosure control. Don’t panic — thousands of projects have gone through the pipeline. Your real hurdle is paperwork.
4.2 Step-by-step checklist
| # | What you do | Tips & gotchas |
|---|---|---|
| 1 | Draft a research protocol | 3–5 pages: research question, variables needed, years requested, expected outputs. Attach CVs of all team members. |
| 2 | Create a project on the CASD portal | https://www.casd.eu → “Submit new project”. Pick “EAE Industrie” from the data catalogue. You’ll see the 24 yearly products with DOIs (1984–2007). |
| 3 | Institutional sign-off | Your university or company signs the CASD licence. CASD charges a fee (rough guide: €3 000 per project, students get hefty discounts). |
| 4 | Confidentiality Committee approval | Meetings happen monthly. Provide clarity on anonymisation, merging with other data (e.g., DADS payroll) and intended publication. Typical turnaround: 6–8 weeks. |
| 5 | Personal accreditation and biometrics | Each user registers fingerprints + face ID at CASD (Paris 7th) or via visiting session. This unlocks the SD-Box. |
| 6 | Receive & install SD-Box | A small encrypted mini-PC. Plug into Ethernet, power, external monitor. It boots a virtual desktop with R, Python, Stata, SAS. |
| 7 | Download datasets inside the box | In the CASD launcher, go to “My data → EAE Industrie 2007 → Download ENT/ETAB/BRANC”. Files arrive as .csv.gz plus a README.yaml describing variable labels, formats and links to questionnaire PDFs. |
| 8 | Start coding | Most users start with `import pandas as pd`, then `df_ent = pd.read_csv('ENT_2007.csv.gz', compression='gzip', dtype=str)`. |
| 9 | Export results | Click “Request output export”. CASD staff run disclosure checks; simple regression outputs come back in a few hours, large tables in a day. |
| 10 | Keep an audit trail | CASD may ask for your code if something looks disclosive. Version-control everything (Git is installed). |
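Step 8 reads every column as text (`dtype=str`). A likely reason is that identifiers such as SIREN can start with a zero, which a numeric parse would drop. The sketch below shows the same idea with the standard library only. It first writes a tiny made-up gzip file so that it runs anywhere; the column names SIREN and CA and the comma delimiter are assumptions for illustration.

```python
import csv
import gzip
import os
import tempfile


def read_gz_csv(path, delimiter=","):
    """Read a gzip-compressed CSV into dicts, keeping every value as text."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Create a small made-up file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "ENT_demo.csv.gz")
with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as handle:
    handle.write("SIREN,CA\n012345678,1500\n")

records = read_gz_csv(demo_path)
print(records[0]["SIREN"])  # still text, so the leading zero is kept
```

Convert individual columns to numbers only when you need them for arithmetic.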
4.3 Typical timeline
| Week | Milestone |
|---|---|
| 0 | Submit project |
| 4 | Licence signed |
| 8 | Committee approval |
| 10 | Biometrics & SD-Box shipped |
| 11 | First log-in |
| 12 | All vintages downloaded, coding begins |
4.4 Money & hardware FAQ
- Do I have to travel to Paris for biometrics? Usually yes for first-time users; CASD sometimes organises regional sessions.
- Can I copy the CSVs off the SD-Box onto my laptop? No; the box uses full-disk encryption and blocks USB mass-storage devices.
- Can I share code with co-authors abroad? Yes, via GitHub or GitLab, but never push raw data.
5. Inside the files – reading, merging, cleaning
A typical workflow (to be run inside CASD) stacks all 24 annual ENT files, converts monetary values to 2023 euros using INSEE’s industrial deflator, then merges the ETAB file for 2007 to get establishment latitude/longitude for geospatial mapping. Three things to watch when you do this:
- Identifiers: SIREN identifies the enterprise and SIRET the establishment; use them as merge keys between ENT and ETAB.
- Industry codes: Early years use NAP-70; post-1993 use NAF Rev. 1. Use INSEE’s crosswalk tables to harmonise (look for CL_NAF_NAP.csv in the metadata bundle).
- Weights: Variable POND lets you extrapolate to the universe of firms with 20 or more employees. Multiply before you aggregate.
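The workflow described in this section (stack yearly files, express money in constant euros, apply the POND weight before aggregating) can be sketched as follows. This is a standard-library illustration under stated assumptions: the rows, the column name CA, and the price index values are invented, and only POND comes from the text above. Inside CASD you would load the real ENT files and INSEE's published deflator instead, with 2023 as the base year.

```python
def deflate(value, year, deflator, base_year):
    """Convert a nominal amount to base-year euros using a price index."""
    return value * deflator[base_year] / deflator[year]


def weighted_total_by_year(rows, deflator, base_year):
    """Sum POND-weighted, deflated turnover per year."""
    totals = {}
    for row in rows:
        year = int(row["year"])
        real_ca = deflate(float(row["CA"]), year, deflator, base_year)
        # Multiply by the weight BEFORE aggregating, as the text advises.
        totals[year] = totals.get(year, 0.0) + real_ca * float(row["POND"])
    return totals


# Invented rows standing in for stacked ENT files (one dict per firm-year).
stacked = [
    {"year": "2006", "CA": "100", "POND": "1.0"},
    {"year": "2006", "CA": "200", "POND": "1.5"},
    {"year": "2007", "CA": "300", "POND": "2.0"},
]
# Hypothetical price index, NOT INSEE's actual deflator.
price_index = {2006: 80.0, 2007: 100.0}

totals = weighted_total_by_year(stacked, price_index, base_year=2007)
print(totals)
```

The same three steps map directly onto a pandas `concat`, a column multiplication, and a `groupby(...).sum()`.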
6. Common pitfalls and how to dodge them
| Pitfall | How to avoid |
|---|---|
| “File not found” after download | The CASD launcher unzips to a project-specific folder, not your home. Use ~/DATA/PROJECT_123/ . |
| Encoding errors | All files are UTF-8 since the 2021 re-processing. Specify encoding='utf-8' in pandas’ read_csv, or locale = locale(encoding = "UTF-8") in R’s readr. |
| Inconsistent variable names across years | Load the YAML dictionary provided for each year; or build a mapping dict in Python. |
| NAF code changes | Use INSEE official concordance tables (CL_NAFA88_NAF2.csv). |
| Export rejected by CASD | Aggregate further (e.g., min 3 enterprises per cell) or random-round monetary values to nearest €10 000. |
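The last row describes two common disclosure-control steps: suppressing cells built from too few enterprises and random-rounding monetary values. Here is a minimal sketch of both, assuming the threshold of 3 enterprises and the rounding base of €10 000 mentioned in the table. CASD's actual rules may differ, so treat this as a pre-check and not as a guarantee that an export will pass.

```python
import random


def random_round(value, base=10_000, rng=random):
    """Round to a multiple of `base`, up or down at random (unbiased)."""
    lower = (value // base) * base
    remainder = value - lower
    if remainder == 0:
        return lower
    # Round up with probability remainder/base so the expected value is unchanged.
    return lower + base if rng.random() < remainder / base else lower


def protect_cells(cells, min_count=3, base=10_000, rng=random):
    """Suppress small cells (None) and random-round the remaining totals."""
    protected = {}
    for label, (count, total) in cells.items():
        if count < min_count:
            protected[label] = None
        else:
            protected[label] = random_round(total, base, rng)
    return protected


# Invented cells: label -> (number of enterprises, total in euros).
cells = {"A": (2, 1_234_567), "B": (5, 9_876_543), "C": (8, 40_000)}
safe = protect_cells(cells, rng=random.Random(0))
print(safe)
```

Passing a seeded `random.Random` makes the rounding reproducible, which helps if CASD staff ask how a table was produced.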
7. Linking EAE to modern datasets
Even though EAE stops in 2007, you can build a 40-year firm panel by:
- Stitching EAE (1984–2007) to ESA (2008–2024): same identifiers, slightly different questionnaire.
- Merging with DADS payroll micro-data: gives you worker-level info.
- Appending BRN (tax balance sheets): extends back to 1978 for large firms.
- Pulling trade data (DEB/DES): to split domestic vs. export sales.
CASD hosts all of these, so you can request them in the same project once the committee is satisfied you need them.
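Stitching two surveys into one panel mostly means renaming each source's variables to a common scheme and stacking the rows by firm identifier and year. The sketch below illustrates that idea. Every variable name in the two mappings (ANNEE, TURNOVER_OLD, TURNOVER_NEW) is hypothetical, so take the real names from each survey's dictionary or questionnaire.

```python
def harmonise(row, mapping):
    """Rename a row's keys to the common scheme; drop unmapped variables."""
    return {
        common: row[source]
        for source, common in mapping.items()
        if source in row
    }


def build_panel(sources):
    """Stack harmonised rows and index them by (siren, year)."""
    panel = {}
    for rows, mapping in sources:
        for row in rows:
            clean = harmonise(row, mapping)
            panel[(clean["siren"], int(clean["year"]))] = clean
    return panel


# Hypothetical variable names; real ones come from each survey's dictionary.
EAE_MAP = {"SIREN": "siren", "ANNEE": "year", "TURNOVER_OLD": "turnover"}
ESA_MAP = {"SIREN": "siren", "ANNEE": "year", "TURNOVER_NEW": "turnover"}

# Invented rows, one per firm-year.
eae_rows = [{"SIREN": "012345678", "ANNEE": "2007", "TURNOVER_OLD": "100"}]
esa_rows = [{"SIREN": "012345678", "ANNEE": "2008", "TURNOVER_NEW": "110"}]

panel = build_panel([(eae_rows, EAE_MAP), (esa_rows, ESA_MAP)])
firm_years = sorted(year for siren, year in panel if siren == "012345678")
print(firm_years)
```

Because the questionnaire changed between the two surveys, check that a harmonised variable measures the same thing on both sides before reading anything into a jump at 2007/2008.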
8. If you get stuck
- CASD Helpdesk: help@casd.eu, response within 48 h (business days).
- INSEE metadata fiches: every variable documented in French + English PDF.
- User community: the #casd channel on the French Economics PhD Slack often shares code snippets.
- Training: CASD offers a free half-day webinar “Débuter sur SD-Box” every month.
9. Quick recap
- Need only sector-level figures? Grab the ready-made Excel tables or query BDM SDMX in five minutes.
- Need firm-level micro-data? Write a project, apply via CASD, clear the confidentiality committee, download ENT/ETAB/BRANC inside the secure SD-Box.
- Plan for ~3 months lead time if you’re starting from scratch.
- Budget ~€3 000 unless you qualify for student rates.
- Use Python/R/Stata as usual once inside; just remember disclosure control on exit.
Follow these steps and you’ll go from zero to “EAE-Industrie power-user” without nasty surprises — and with a dataset that lets you watch French manufacturing transform from the Minitel era right up to the eve of the iPhone. Bon téléchargement !