"Testing for differences in survey-based density expectations: A compositional data approach" by Jonas Dovern, Alexander Glas and Geoff Kenny. Journal of Applied Econometrics, 2024.
This README file briefly describes the data used to derive all results in the main article and the Online Appendix. The statistical programming languages we used are R and Stata.
For the Hotelling test, we use the function hotel2T2 from the Compositional package for R written by Michail Tsagris, Giorgos Athineou, Abdulaziz Alenazi and Christos Adam.
We use data from the European Central Bank's Survey of Professional Forecasters (SPF) and the Federal Reserve Bank of New York's Survey of Consumer Expectations (SCE).
The SPF data is freely available from https://www.ecb.europa.eu/stats/ecb_surveys/survey_of_professional_forecasters/html/all_data.en.html. The raw data is processed as follows:
- close all exterior bins by assuming they have the same width as the interior intervals
- extend lower and upper bounds of interior bins by 0.05 percentage points to close gaps between bins
- exclude observations if the respondent did not report a histogram
- exclude observations if the sum of the probabilities in the histogram deviates from 100% by 0.9 percentage points or more (we permit smaller deviations from 100% for the rounder application)
- replace unused bins with zeroes
- exclude single-bin histograms
- divide all probabilities by 100 (must lie between 0 and 1 for Hotelling test)
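The exclusion and rescaling steps above can be illustrated with a minimal Python sketch. It is not the code used for the article (which was written in R and Stata), and it does not cover the closing of exterior bins or the extension of bin bounds. The function name clean_histogram, the use of None to mark an unused bin, and the tolerance parameter are assumptions made for illustration only.

```python
def clean_histogram(probs_pct, tolerance=0.9):
    """Return bin probabilities on the unit scale, or None if the histogram is excluded.

    probs_pct: list of bin probabilities in percent; None marks an unused bin
               (an assumed encoding, not necessarily that of the raw SPF files).
    tolerance: cutoff in percentage points; a deviation of this size or more
               from 100 leads to exclusion (0.9 for the SPF rule above).
    """
    # Exclude respondents who did not report a histogram at all.
    if not probs_pct or all(p is None for p in probs_pct):
        return None
    # Replace unused bins with zeroes (done first so that the sum is defined;
    # this does not change which histograms pass the checks below).
    filled = [0.0 if p is None else float(p) for p in probs_pct]
    # Exclude histograms whose probabilities deviate too much from 100%.
    if abs(sum(filled) - 100.0) >= tolerance:
        return None
    # Exclude single-bin histograms (all mass in one bin).
    if sum(1 for p in filled if p > 0) < 2:
        return None
    # Rescale from percent to the unit interval for the Hotelling test.
    return [p / 100.0 for p in filled]
```

A histogram that passes all checks is returned on the 0-1 scale; any excluded histogram yields None, so the caller can simply drop those observations.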
The SCE data is freely available from https://www.newyorkfed.org/microeconomics/databank.html. The raw data is processed as follows:
- close all exterior bins by assuming they have the same width as the adjacent interior intervals
- exclude observations if the respondent did not report a histogram
- exclude observations if the sum of the probabilities in the histogram deviates from 100%
- replace unused bins with zeroes
- exclude single-bin histograms
- divide all probabilities by 100 (must lie between 0 and 1 for Hotelling test)
- duplicate socioeconomic information for all waves (these questions are sometimes asked only when a respondent first participates)
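The last step above, carrying socioeconomic information over to all waves of a respondent, can be sketched as follows. This is a hedged Python illustration rather than the authors' R or Stata code. The function name fill_socioeconomics, the representation of the data as a list of dicts, and the use of None for "not asked in this wave" are assumptions; the column names fct_id and female are borrowed from the processed files described below, not from the raw SCE data.

```python
def fill_socioeconomics(rows, key="female"):
    """Fill missing values of `key` with the respondent's first reported value.

    rows: list of dicts, one per observation, each with 'fct_id' and `key`;
          None marks a wave in which the question was not asked (assumed encoding).
    Returns a new list; the input rows are left unchanged.
    """
    # Record the first non-missing value reported by each respondent.
    first_seen = {}
    for row in rows:
        if row[key] is not None:
            first_seen.setdefault(row["fct_id"], row[key])

    # Copy that value into every row of the same respondent where it is missing.
    filled = []
    for row in rows:
        value = row[key] if row[key] is not None else first_seen.get(row["fct_id"])
        filled.append({**row, key: value})
    return filled
```

Respondents who never report a value keep None in all waves, so they can still be identified and handled separately.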
The zip-file "Testing for differences in survey-based density expectations A compositional data approach (replication data)" contains all datasets (csv files in long format) needed to replicate the results of our simulations and applications.
Description of individual data files:
1) The file "data_simulation" (4 columns, 553 rows incl. column names) contains the one-year-ahead inflation density forecasts from the 2020Q1 wave of the SPF. This data is used to calibrate the baseline histogram in the Monte Carlo simulations in Section 3.
fct_period: Survey wave
fct_id: Respondent ID
bin_id: Bin ID
bin_pr: Bin probability
2) The file "data_application_I" (5 columns, 5449 rows incl. column names) contains the one- and five-year-ahead inflation density forecasts from the 2021Q1 to 2022Q2 waves of the SPF. This data is used for the application in Section 4.1.
fct_period: Survey wave
fct_id: Respondent ID
fct_hor: Forecast horizon (1 for one-year-ahead, 5 for five-year-ahead)
bin_id: Bin ID
bin_pr: Bin probability
3) The file "data_application_II" (6 columns, 90088 rows incl. column names) contains the one-year-ahead inflation and GDP growth density forecasts from the 1999Q1 to 2022Q2 waves of the SPF. This data is used for the application in Section 4.2.
fct_var: Outcome variable (inf = inflation, gdp = GDP growth)
fct_period: Survey wave
fct_id: Respondent ID
rounder: Rounder classification (1 for rounders, 0 for non-rounders)
bin_id: Bin ID
bin_pr: Bin probability
4) The file "data_application_III" (5 columns, 1048576 rows incl. column names) contains the one-year-ahead inflation density forecasts from the June 2013 to December 2021 waves of the SCE. This data is used for the application in Section 4.3.
fct_period: Survey wave
fct_id: Respondent ID
female: Gender (1 for women, 0 for men)
bin_id: Bin ID
bin_pr: Bin probability
5) The file "data_application_IV_inflation" (6 columns, 1048576 rows incl. column names) contains the one-year-ahead inflation density forecasts from the June 2013 to December 2021 waves of the SCE. This data is used for the application in Section 4.4.
fct_period: Survey wave
fct_id: Respondent ID
tenure: Survey tenure
firsttimer: Dummy for survey tenure (1 for first-time respondents, 0 for others)
bin_id: Bin ID
bin_pr: Bin probability
6) The file "data_application_IV_income" (6 columns, 653911 rows incl. column names) contains the one-year-ahead density forecasts for personal income growth from the June 2013 to December 2021 waves of the SCE. This data is used for the application in Section 4.4.
fct_period: Survey wave
fct_id: Respondent ID
tenure: Survey tenure
firsttimer: Dummy for survey tenure (1 for first-time respondents, 0 for others)
bin_id: Bin ID
bin_pr: Bin probability
7) The file "data_application_V" (5 columns, 973401 rows incl. column names) contains the one-year-ahead density forecasts for average house prices nationwide from the June 2013 to December 2021 waves of the SCE. This data is used for the application in Section 4.5.
fct_period: Survey wave
fct_id: Respondent ID
state: US state in which the respondent lives
bin_id: Bin ID
bin_pr: Bin probability
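Because all files are stored in long format (one row per respondent, wave and bin), a typical first step is to group the rows into one histogram per forecast. The following Python sketch shows one way to do this with the standard library only. It assumes comma-separated files whose header row uses the variable names listed above; the function name read_histograms and the sample rows are made up for illustration and are not actual survey data.

```python
import csv
import io
from collections import defaultdict


def read_histograms(handle, keys=("fct_period", "fct_id")):
    """Group long-format rows into one {bin_id: bin_pr} dict per forecast.

    handle: open text file or file-like object with a header row.
    keys:   columns that jointly identify one histogram (assumed; extend with
            e.g. 'fct_hor' or 'fct_var' for files that contain these columns).
    """
    histograms = defaultdict(dict)
    for row in csv.DictReader(handle):
        forecast = tuple(row[k] for k in keys)
        # IDs are kept as strings; only the probability is converted to a number.
        histograms[forecast][row["bin_id"]] = float(row["bin_pr"])
    return dict(histograms)


# Made-up rows in the column layout of "data_simulation" (not actual SPF data).
sample = io.StringIO(
    "fct_period,fct_id,bin_id,bin_pr\n"
    "2020Q1,1,1,0.25\n"
    "2020Q1,1,2,0.75\n"
    "2020Q1,2,1,0.5\n"
    "2020Q1,2,2,0.5\n"
)
histograms = read_histograms(sample)
```

For "data_application_I" and "data_application_II", the keys argument would need to include fct_hor or fct_var, respectively, since these files hold more than one histogram per respondent and wave.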
If you have any questions, please do not hesitate to contact us via alexander.glas@zew.de.