The primary method for creating new variables in Stata is the generate command. Load the auto dataset and describe it:
clear
sysuse auto
describe
New Variable from Existing Variables
Let’s create a new variable that is the sum of weight and length (ignore for the moment that summing weights and lengths doesn’t make much sense). The syntax of generate is:
generate nameOfNewVariable = whateverTheNewVariableIsEqualTo
So to create a new variable called weightlength that is the sum of weight and length, we type:
generate weightlength = weight+length
Now we have a new variable called weightlength.
Suppose now that we want to create a new variable that is the square of weight.
generate weight2 = weight^2
New Variable that is a Constant
Suppose we want to create a new variable that holds a constant value. This isn’t always a good idea, since macros can also store constants, but a variable can be convenient. Let’s make a new variable x that is equal to 100.
generate x = 100
Let’s create a new variable that is equal to the mean of weight; we’ll call it meanweight.
generate meanweight = 3019.459
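You can also use the results stored by the summarize command instead of typing the mean by hand. Because meanweight already exists from the previous command, drop it before generating it again:
summarize weight
drop meanweight
generate meanweight = r(mean)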
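A value typed by hand keeps only the digits that summarize displays. The following is a minimal sketch of storing the full-precision result instead, assuming a fresh copy of the auto data. The variable name meanweight_r is hypothetical, and the double storage type and the meanonly option (which suppresses the output table) are choices made for this illustration.

```stata
* Reload the data so the example is self-contained
sysuse auto, clear

* meanonly computes the mean quietly and leaves it in r(mean)
summarize weight, meanonly

* Store the result as a double (meanweight_r is a hypothetical name)
generate double meanweight_r = r(mean)

* Show the stored value with more digits than the default display
display %12.6f r(mean)
```

Note that r(mean) is overwritten by the next command that stores r() results, so use it right after summarize.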
You can use the _N system variable to create a new variable that is equal to the number of observations in the dataset.
generate obs = _N
If you combine this with by, you can create a new variable that is equal to the number of observations within each level of the by variable. Because obs already exists, we give the new variable a different name. For example, we can type:
by foreign: generate obs_foreign = _N
This will create a variable that is constant within the levels of foreign. That is, we get the number of foreign cars and the number of domestic cars. If an observation is a foreign car, the new obs_foreign variable will have a value of 22; for a domestic car it will have a value of 52. Give it a try and see how it works.
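The by prefix expects the data to be sorted by the by variable. The auto data happens to ship sorted by foreign, but other datasets may not be. As a hedged sketch, bysort combines the sort and the by step in one line; the variable name n_in_group is hypothetical.

```stata
* Reload the data so the example is self-contained
sysuse auto, clear

* bysort sorts by foreign first, then runs generate within each group
bysort foreign: generate n_in_group = _N

* Cross-tabulate to check the group sizes
tabulate foreign n_in_group
```

Keep in mind that bysort changes the order of the observations whenever the data were not already sorted that way.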
New Variable that is a Random Draw from a Distribution
We can create a new variable that is a random draw from a distribution. Let’s create a new variable whose values are random draws from a normal distribution with a mean of 0 and a standard deviation of 1. The function that generates random normal values is rnormal() (it defaults to a mean of 0 and a standard deviation of 1, and it draws one value for each observation in the dataset).
generate random = rnormal()
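Random draws differ from run to run unless the seed is fixed. The sketch below, which assumes a fresh copy of the auto data, sets a seed and also draws from a normal distribution with a different mean and standard deviation. The seed value 12345 and the variable names z and y are arbitrary choices for illustration.

```stata
* Reload the data so the example is self-contained
sysuse auto, clear

* Fix the seed so the same draws are produced on every run
set seed 12345

* Standard normal draws: mean 0, standard deviation 1
generate z = rnormal()

* Normal draws with mean 10 and standard deviation 2
generate y = rnormal(10, 2)

* With only 74 draws, the sample means will be near 0 and 10, not exact
summarize z y
```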
Create a New Variable that Indexes the Observations
You can use the _n system variable to create a variable that holds the observation number.
generate index = _n
This will create a new variable that runs from 1 to 74. You can combine this with by to create an index within the levels of another variable. Because index already exists, we give the new variable a different name:
by foreign: generate index_foreign = _n
This will create a new variable that runs from 1 to 52 for domestic cars and 1 to 22 for foreign cars.
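The within-group index follows whatever order the observations happen to be in. If the index should follow a particular ordering, one option is to list a sort variable in parentheses after the by variable. The following is a sketch under that assumption, with the hypothetical variable name weight_rank; cars with tied weights get consecutive numbers in an arbitrary order.

```stata
* Reload the data so the example is self-contained
sysuse auto, clear

* Sort by foreign and then weight, but group by foreign only
bysort foreign (weight): generate weight_rank = _n

* List the three lightest cars in each group
list make foreign weight weight_rank if weight_rank <= 3
```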
I’ve just touched on the ways you can create new variables. You can also use the
egen command to create new variables. Try new ways to create variables and be sure to read the help files.
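As one small illustration of egen, the sketch below computes the mean of weight within each level of foreign in a single step, without running summarize separately for each group. It assumes a fresh copy of the auto data, and the variable name mean_weight_group is hypothetical.

```stata
* Reload the data so the example is self-contained
sysuse auto, clear

* Group means in one step: mean of weight within each level of foreign
bysort foreign: egen mean_weight_group = mean(weight)

* One row per group to inspect the result
tabulate foreign, summarize(mean_weight_group)
```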