# How to do a placebo simulation in difference-in-differences designs (part 1)

Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan’s 2004 article “How much should we trust differences-in-differences estimates?” (published in the Quarterly Journal of Economics) outlines several tests for assessing how robust difference-in-differences estimates are to false positives.

One recommendation is to run a placebo simulation. In a first step, the treatment indicator is randomly assigned to observations in the data set; in a second step, the regressions are run again so that the main estimates can be compared with those from the placebo regressions.
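To make the two steps concrete, here is a minimal sketch on simulated data. It is not the script from this post: the variable names (`id`, `year`, `post`, `placebo`, `y`), the panel size, and the number of placebo units are all assumptions chosen for illustration, and the outcome is pure noise, so there is no true effect to find.

```stata
* Minimal sketch on simulated data; all names and sizes are hypothetical
clear
set seed 110
set obs 50                         // 50 units
gen id = _n
gen u = runiform()                 // one random draw per unit
sort u
gen placebo = _n <= 20             // step 1: 20 randomly chosen units get the placebo treatment
drop u
expand 6                           // 6 years per unit
bysort id: gen year = 2000 + _n    // years 2001 to 2006
gen post = year >= 2004            // post-treatment period
gen y = rnormal()                  // outcome with no true treatment effect
xtset id year
* step 2: rerun the difference-in-differences regression with the placebo indicator
xtreg y i.placebo##i.post, fe vce(robust)
display _b[1.placebo#1.post]       // placebo DiD coefficient
```

Because the placebo assignment is random, the interaction coefficient should be indistinguishable from zero in most draws; repeating the draw many times shows how often it is not.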

I have written a little Stata script that runs such a placebo simulation and compiles an Excel spreadsheet which gives the placebo coefficient estimates along with the confidence interval bounds.

Here’s that script. It assumes a panel dataset in which observations take the form of unit-years (e.g., firm-years). The only thing you need to adjust for your purposes is the set of parameters at the top.

```
* parameters: adjust these to your project
global project_folder = `"C:\Users\path to project"' // passed as-is to putexcel set below
global depvar = "dependent variable"
global treatment = "treatment binary"
global post = "time binary which is 1 for observations after the treatment"
global idvar = "unit identifier variable (e.g., id)"
global timevar = "time identifier variable (e.g., years)"
global controls = "list of control variables (e.g., age)"
global seed = "110" // sets the seed for reproducible random number generation
global treatment_groupsize = "number of observations in the treatment group (e.g., 100)"
global numruns = "#runs of the simulation (e.g., 60)"

* set up the output spreadsheet (quotes keep paths with spaces intact)
putexcel set "$project_folder", replace
putexcel A1=("DV Coefficient")
putexcel B1=("DV Lower CI")
putexcel C1=("DV Upper CI")
local cellcounter = 3 // row 2 holds the "true" estimate, placebo runs start in row 3
set seed $seed

*estimate "true" regression
xtset $idvar $timevar
xtreg $depvar i.$treatment##i.$post $controls $timevar, fe robust
* coefficient and 95% confidence interval bounds of the DiD interaction term
putexcel A2=(_b[1.$treatment#1.$post])
putexcel B2=(_b[1.$treatment#1.$post] - invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])
putexcel C2=(_b[1.$treatment#1.$post] + invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])

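forvalues i=1/$numruns {
    * randomly tag observations as placebo-treated (randomtag: ssc install randomtag)
    * note: awardm is not defined in this script; it appears to be a variable
    * from the original dataset, so adapt this condition to your data
    randomtag if $timevar == awardm-4, count($treatment_groupsize) gen(r)
    bys $idvar: egen placebo = max(r) // spread the tag to all observations of a tagged unit
    drop r
    tab placebo

    capture xtreg $depvar i.placebo##i.$post $controls $timevar, fe robust
    if _rc!=0 {
        display "Error on run `i'"
    }
    else {
        * write results only if the regression ran, so a failed run
        * leaves an empty row instead of stale estimates
        putexcel A`cellcounter'=(_b[1.placebo#1.$post])
        putexcel B`cellcounter'=(_b[1.placebo#1.$post] - invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
        putexcel C`cellcounter'=(_b[1.placebo#1.$post] + invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
        estimates store result`i'
    }
    drop placebo
    local cellcounter=`cellcounter'+1
}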
```
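Since the script assumes a unit-year panel, it can be worth checking that structure before running it. The following is a minimal sketch on a toy panel; `id`, `year`, and `treated` are hypothetical names standing in for your own identifier, time, and treatment variables.

```stata
* Sketch with a toy panel; id, year, treated are hypothetical names
clear
set obs 4
gen id = _n
gen treated = id <= 2              // units 1 and 2 are treated
expand 3                           // 3 years per unit
bysort id: gen year = 2010 + _n
isid id year                       // errors out if a unit-year appears more than once
xtset id year
* treatment status should not vary within a unit
bysort id (year): assert treated == treated[1]
```

If `isid` or the `assert` fails on your data, the placebo assignment at the unit level would not mean what the script assumes.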

In an upcoming blog post, I will show how to use the generated spreadsheet to plot the placebo confidence intervals or to build simple summary tables for your papers.
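Until then, one simple summary is the share of placebo runs whose confidence interval excludes zero. The sketch below uses a toy stand-in for the three spreadsheet columns; the numbers are made up for illustration, and the variable names `coef`, `lower`, and `upper` are assumptions, not names the script produces.

```stata
* Toy stand-in for the spreadsheet columns (values are made up for illustration)
clear
input coef lower upper
 0.02 -0.10 0.14
-0.05 -0.18 0.08
 0.11  0.01 0.21
 0.00 -0.12 0.12
end
gen byte reject = (lower > 0) | (upper < 0)   // 1 if the CI excludes zero
summarize reject
display "Share of placebo runs rejecting zero: " r(mean)
```

With 95% intervals, a share far above 5% across many placebo runs would suggest that the standard errors are too small.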