3 frequent problems with time-series data in Stata, and how to solve them

1. Spell beginnings and ends

When dealing with firm data, we typically do not observe all firms in all periods. Some go out of business, some are no longer tracked because they fall below a size threshold, and others are simply missing because no data were recorded.

This is why we are often interested in identifying the beginning, end, and length of each firm's spell in the data. This is not entirely straightforward, but a few lines using the by prefix do the trick.

Assume we have firms (firm_id) observed over several years (year):


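* first year of the spell, stored on the firm's earliest observation
gen firstyear = .
bysort firm_id (year): replace firstyear = year if _n == 1

* last year of the spell, stored on the firm's latest observation
bysort firm_id (year): gen lastyear = year if _n == _N

* number of years in which the firm is observed
bysort firm_id: egen spell_length = count(year)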
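The commands above treat all observations of a firm as one spell, so a firm that drops out and later re-enters still gets a single first and last year. If consecutive runs are what matters, one option is to start a new spell whenever there is a gap in the years. This is only a minimal sketch: it assumes one observation per firm-year and an integer year, and the toy data and the variable names spell_id, spell_start, spell_end, and spell_years are made up for illustration.

```stata
* Toy panel (hypothetical data): firm 1 is missing 2003, firm 2 has no gap
clear
input firm_id year
1 2001
1 2002
1 2004
2 2001
2 2002
end

* A new spell starts at a firm's first observation or after a gap in years;
* the running sum of that indicator numbers the spells within each firm
bysort firm_id (year): gen spell_id = sum(_n == 1 | year - year[_n-1] > 1)

* First year, last year, and length of each consecutive spell
bysort firm_id spell_id (year): gen spell_start = year[1]
bysort firm_id spell_id (year): gen spell_end = year[_N]
gen spell_years = spell_end - spell_start + 1

list, sepby(firm_id spell_id)
```

In this toy panel, firm 1 ends up with two spells (2001 to 2002 and 2004 alone) and firm 2 with one spell (2001 to 2002).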

2. xtset does not work due to repeated values of the time variable

Assume we have firms (firm_id) observed over several years (year). We run xtset firm_id year and Stata returns the error "repeated time values within panel", meaning that at least one firm-year pair appears more than once. What do we do?

First of all, we should inspect the firm-year pairs for duplicates:

 duplicates report firm_id year 

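This reports how many firm-year pairs occur once, twice, and so on. A large number of duplicates often means that an earlier merge went wrong, in which case we should go back to the merge and check what produced them.

If we are sure that the duplicates are redundant, we can drop them right away. The force option is required when a variable list is given, and Stata keeps only the first observation of each firm-year pair, even if the copies differ in other variables: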

 duplicates drop firm_id year, force 

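Or we may find that only a few observations are affected. Then we might inspect these in more detail by numbering the copies within each firm-year pair (dup is 0 for unique pairs and 1, 2, ... for repeated ones):

bys firm_id year: gen dup = cond(_N==1,0,_n)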
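To look at the affected rows before deciding what to drop, something along the following lines may help. It is a sketch on made-up data: the variable sales and the name n_extra are hypothetical. It uses duplicates tag to flag repeated firm-year pairs and duplicates drop without a variable list, which removes only observations that are identical on every variable.

```stata
* Toy panel (hypothetical data): firm 1 has an exact copy in 2001,
* firm 2 has two conflicting records for 2001
clear
input firm_id year sales
1 2001 10
1 2001 10
1 2002 12
2 2001 7
2 2001 9
end

* n_extra = number of other observations sharing the same firm-year pair
duplicates tag firm_id year, generate(n_extra)
list if n_extra > 0, sepby(firm_id year)

* Without a variable list, only exact copies (all variables equal) are dropped
duplicates drop

* Firm 2 still has two records for 2001; these need a manual decision
duplicates report firm_id year
```

The exact copy for firm 1 is removed safely, while the conflicting records for firm 2 remain visible instead of being silently discarded by force.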

3. Filling missing values downward or upward

Sometimes a variable has missing values that we want to fill downward, that is, carry the last known value forward to later years. Let's assume we have firms with a distinct firm_id and the variable location is recorded only in some years. Sorting by year within firm ensures that "previous" really means the previous year, and missing() works for both numeric and string variables:

bys firm_id (year): replace location = location[_n-1] if missing(location)

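...or upward. Because replace works through the data from top to bottom, simply switching to [_n+1] would fill only one observation above each known value. Reversing the sort order within firm and filling downward again avoids this:

gen negyear = -year
bys firm_id (negyear): replace location = location[_n-1] if missing(location)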
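The difference between the two directions is easiest to see on a small example. The following sketch uses made-up data and fills copies of location (the names loc_down, loc_up, and negyear are hypothetical) so that the original variable stays untouched.

```stata
* Toy panel (hypothetical data) with gaps in location
clear
input firm_id year location
1 2001 .
1 2002 5
1 2003 .
1 2004 .
2 2001 .
2 2002 .
2 2003 8
end

* Downward fill: carry the last known value forward in time
gen loc_down = location
bysort firm_id (year): replace loc_down = loc_down[_n-1] if missing(loc_down)

* Upward fill: reverse the order within firm, then carry forward again
gen loc_up = location
gen negyear = -year
bysort firm_id (negyear): replace loc_up = loc_up[_n-1] if missing(loc_up)
drop negyear

* Restore the usual panel order and compare
sort firm_id year
list, sepby(firm_id)
```

For firm 1, loc_down is missing in 2001 and 5 from 2002 onward, while loc_up is 5 in 2001 and 2002 and stays missing afterward. For firm 2, loc_down is missing in 2001 and 2002, whereas loc_up is 8 in all three years.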
