How To / STATA: Draw a Random Sample from Panel Data

Assume we have a data set containing firm data across years. The variable id uniquely identifies a firm, the variable performance is some measure of the firm's financial performance, and the variable year indicates the year in which that performance was recorded. Thus, we have a small panel where the firm-year is the unit of analysis.

If you want to draw a random sample from a data set like that, you shouldn’t apply the command –sample– directly to the panel. –sample– draws individual observations, here firm-years, so it would keep some years of a firm and drop others, and you would very likely lose the panel structure of the data. What you should do instead is randomly select firm ids and then keep all the observations (all years) for each of the selected ids. Below you can see an example of STATA code that performs this operation. Remember we have three variables: id, year, performance.
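
To see the problem concretely, here is a minimal sketch on a made-up panel. The panel size (200 firms with 5 years each), the seed, and the helper variables nyears and onefirm are assumptions for illustration only; they are not part of the original example. The sketch applies –sample– directly to the firm-years and then counts how many years survive for each firm.

```stata
clear
set seed 12345                      // arbitrary seed, only for reproducibility
set obs 200                         // hypothetical: 200 firms
gen id = _n
expand 5                            // five rows per firm
bysort id: gen year = 2000 + _n     // years 2001-2005 within each firm
gen performance = rnormal()         // made-up performance values

sample 50                           // naive: samples firm-years, not firms

bysort id: gen nyears = _N          // years left for each surviving firm
egen byte onefirm = tag(id)         // flags one row per firm
tabulate nyears if onefirm          // distribution of years kept per firm
count if onefirm & nyears < 5       // firms that lost at least one year
```

If the table shows firms with fewer than five years, the sample is no longer a panel of complete firm histories. The code that follows avoids this by sampling firm ids instead of firm-years.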

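use "yourdataset.dta", clear                // open the panel (one row per firm-year)

tempfile paneldata                           // temporary copy of the full panel
save `paneldata'

collapse (mean) performance, by(id)          // reduce to one row per firm
keep id                                      // keep only the firm identifier
sample 50                                    // draw 50% of the firm ids at random

tempfile randomsampleid                      // temporary file for the sampled ids
save `randomsampleid'

use `paneldata', clear                       // back to the full panel

merge m:1 id using `randomsampleid'          // match each firm-year to the sampled ids

drop if _merge == 1                          // drop firms that were not sampled
drop _merge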

After opening the data set, we save it to a temporary file called paneldata (lines 3-4 of the code above, counting blank lines). Then we get rid of the repeated ids using –collapse– and drop every variable except id (lines 6-7). In line 8 we use the command –sample– so that STATA randomly selects, in this case, 50% of the unique ids (see –help sample– for other options, such as specifying the number of observations to draw instead of a percentage). In lines 10-11 we save this subset of ids in a temporary file called randomsampleid.
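
Two small variations on the sampling step may be useful. The sketch below stands in for the list of unique ids that is in memory after –keep id–; the number of firms (200), the seed value and the sample size (100) are assumptions chosen for illustration. It uses –set seed– so that the same ids are drawn every time the do-file runs, and the count option of –sample– to draw a fixed number of ids rather than a percentage.

```stata
clear
set obs 200            // hypothetical list of 200 unique firm ids
gen id = _n

set seed 12345         // arbitrary value; fixes the random draw
sample 100, count      // keep exactly 100 ids instead of a percentage
```

In the code from the post, this would mean placing –set seed– before line 8 and replacing –sample 50– with the count version. Reproducing the same draw should also require the ids to be in the same sort order each time, which is the case right after –collapse–.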

Finally, we return to the panel data (line 13) and merge it with randomsampleid (line 15). It is a many-to-one (m:1) merge because in the panel data the id variable does not uniquely identify each observation, but it does in the using data. The observations that are successfully merged (_merge == 3) belong to the firms that STATA randomly chose for you, so we get rid of the rest in line 17.
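
A quick way to convince yourself that the panel structure survived is to compare the number of years per firm before and after sampling. The sketch below runs the same steps on a made-up panel; the panel dimensions, the seed and the helper variables n_before, n_after and picked are assumptions for illustration, not part of the original code.

```stata
clear
set seed 2024                         // arbitrary seed for reproducibility
set obs 200                           // hypothetical: 200 firms
gen id = _n
expand 5                              // five rows per firm
bysort id: gen year = 2000 + _n       // years 2001-2005 within each firm
gen performance = rnormal()           // made-up performance values
bysort id: gen n_before = _N          // years per firm before sampling

tempfile paneldata
save `paneldata'

collapse (mean) performance, by(id)   // one row per firm
keep id
sample 50                             // sample firms, not firm-years

tempfile randomsampleid
save `randomsampleid'

use `paneldata', clear
merge m:1 id using `randomsampleid'
drop if _merge == 1                   // drop firms that were not sampled
drop _merge

bysort id: gen n_after = _N           // years per firm after sampling
assert n_after == n_before            // every sampled firm kept all its years
isid id year                          // firm-year still identifies each row
egen byte picked = tag(id)            // flags one row per firm
count if picked                       // number of firms in the sample
```

If either –assert– or –isid– stops with an error, some firm lost part of its history and the sampling step should be checked.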

4 thoughts on “How To / STATA: Draw a Random Sample from Panel Data”

  1. emilbebr

    Thank you, that was just what I needed! Clear and precise. I needed to sample from panel data for the first time ever this morning, and I thought “oh god, this is going to be a drag to figure out”. But no, one Google search, got to your blog post, and that was it. So thank you, you made my day.

    One thing that didn’t work was the tempfile and `randomsampleid'/`paneldata' thing; I got the error “invalid file specification”, but I just saved them like regular files and then everything worked just fine. Thanks again.

