Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan’s 2004 article “How much should we trust differences-in-differences estimates?” (published in the Quarterly Journal of Economics) outlines several tests for assessing the robustness of difference-in-differences estimates when false positives are a concern.
One recommendation is to run a placebo simulation in two steps. First, the treatment indicator is randomly assigned to observations in the data set. Second, the regressions are run again so that the main estimates can be compared with those from the placebo regressions.
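To make the two steps concrete, here is a rough formalization. The notation is my own shorthand rather than the article's, and it assumes a unit fixed effects specification with a treatment-by-post interaction:

```latex
% Main regression: unit fixed effects \alpha_i, treatment dummy D_i,
% post-period dummy Post_t, controls X_{it}
y_{it} = \alpha_i + \beta \,(D_i \times \mathit{Post}_t) + \gamma \,\mathit{Post}_t
         + X_{it}'\delta + \varepsilon_{it}

% Placebo run r = 1, \dots, R: replace D_i with a random draw \tilde{D}_i^{(r)}
% that marks N_T units as "treated"
y_{it} = \alpha_i + \beta^{(r)} \,(\tilde{D}_i^{(r)} \times \mathit{Post}_t)
         + \gamma^{(r)} \,\mathit{Post}_t + X_{it}'\delta^{(r)} + \varepsilon_{it}^{(r)},
\qquad \sum_i \tilde{D}_i^{(r)} = N_T
```

The main estimate of \beta is then compared with the R placebo estimates of \beta^{(r)}. If randomly assigned treatment often produces "effects" of similar size, the main estimate deserves less trust.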
I have written a little Stata script that runs such a placebo simulation and compiles an Excel spreadsheet which gives the placebo coefficient estimates along with the confidence interval bounds.
Here’s that script. It assumes a panel dataset in which observations take the form of unit-years (e.g., firm-years). The only thing you need to adjust for your purposes is the set of parameters at the top.
* parameters: adjust these for your data
global project_folder = `"C:\Users\path to project"'
global depvar = "dependent variable"
global treatment = "treatment binary"
global post = "time binary which is 1 for observations after the treatment"
global idvar = "unit identifier variable (e.g., id)"
global timevar = "time identifier variable (e.g., years)"
global controls = "list of control variables (e.g., age)"
global seed = "110" // sets the seed for reproducible random draws
global treatment_groupsize = "number of observations in the treatment group (e.g., 100)"
global numruns = "#runs of the simulation (e.g., 60)"

** set excel headers (the quotes protect paths that contain spaces)
putexcel set "$project_folder", replace
putexcel A1=("DV Coefficient")
putexcel B1=("DV Lower CI")
putexcel C1=("DV Upper CI")
local cellcounter = 3
set seed $seed

* estimate "true" regression and write it to row 2
xtset $idvar $timevar
xtreg $depvar i.$treatment##i.$post $controls $timevar, fe robust
putexcel A2=(_b[1.$treatment#1.$post])
putexcel B2=(_b[1.$treatment#1.$post] - invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])
putexcel C2=(_b[1.$treatment#1.$post] + invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])

forvalues i=1/$numruns {
    * tag random observations in one reference period, then spread the tag to the whole unit
    * randomtag is a user-written command (ssc install randomtag)
    * note: awardm looks like a variable from my own data; replace this condition with one that fits yours
    randomtag if $timevar == awardm-4, count($treatment_groupsize) gen(r)
    bys $idvar: egen placebo = max(r)
    drop r
    tab placebo

    capture xtreg $depvar i.placebo##i.$post $controls $timevar, fe robust
    if _rc!=0 {
        display "Error on run `i'"
    }
    else {
        * write results only if the regression ran, so a failed run cannot reuse old estimates
        putexcel A`cellcounter'=(_b[1.placebo#1.$post])
        putexcel B`cellcounter'=(_b[1.placebo#1.$post] - invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
        putexcel C`cellcounter'=(_b[1.placebo#1.$post] + invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
        estimates store result`i'
    }
    drop placebo
    local cellcounter=`cellcounter'+1
}
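If you would rather not install randomtag from SSC, the random assignment step can probably be done with built-in commands. Below is a minimal sketch on toy data. The variable names id and year, the 50 units, and the placebo group size of 10 are made up for illustration, and the sketch draws whole units rather than observations in a reference period, so it is not an exact replica of the randomtag line.

```stata
* toy panel: 50 hypothetical units observed in 4 years each
clear
set seed 110
set obs 50
gen id = _n
expand 4
bysort id: gen year = 2000 + _n

* draw 10 placebo units at random from one row per unit
preserve
keep id
duplicates drop
gen double u = runiform()
sort u id                      // id breaks ties so the order is reproducible
gen byte placebo = _n <= 10    // the first 10 units in random order get the placebo
drop u
tempfile draw
save `draw'
restore

* attach the unit-level placebo flag to every unit-year
merge m:1 id using `draw', nogenerate

* check: 10 units x 4 years = 40 placebo observations
count if placebo == 1
assert r(N) == 40
```

Inside the simulation loop, id would become $idvar and 10 would become $treatment_groupsize, with the preserve/restore block replacing the randomtag, egen, and drop r lines.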
In one of the next blog posts, I will show how to use this generated spreadsheet for plots of the placebo confidence intervals or simple tabulation summaries for your papers.