# Stata: Using generate to create new variables

The primary method for creating new variables in Stata is the `generate` command. Load the `auto` dataset.

``````
clear
sysuse auto
describe
``````

## New Variable from Existing Variables

Let’s create a new variable that is the sum of `weight` and `length` (ignore for the moment that summing weights and lengths doesn’t make a ton of sense). The syntax of `generate` is:

``````
generate nameOfNewVariable = whateverTheNewVariableIsEqualTo
``````

So to create a new variable called `weightlength` that is the sum of `weight` and `length` we type:

``````
generate weightlength = weight + length
``````

Now we have a new variable called `weightlength`.

Suppose now that we want to create a new variable that is the square of weight.

``````
generate weight2 = weight^2
``````
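It can help to look at the new variables next to the ones they were built from. The following is a minimal sketch, assuming you are happy to reload a fresh copy of `auto` (which discards anything else in memory); it rebuilds the two variables above and lists the first five rows.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* Rebuild the two variables from this section
generate weightlength = weight + length
generate weight2 = weight^2

* Show the first five rows so the arithmetic can be checked by eye
list make weight length weightlength weight2 in 1/5

* Show how the new variables are stored
describe weightlength weight2
```

The `in 1/5` qualifier only limits what `list` prints; the new variables are filled in for every observation.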

## New Variable that is a Constant

Suppose we want to create a new variable that holds a constant value. This isn’t always a good idea, since macros can store constants too, but a variable can be pretty convenient. Let’s make a new variable `x` that is equal to 100 in every observation.

``````
generate x = 100
``````

Let’s create a new variable that is equal to the mean of weight; we’ll call it `meanweight`. First run `summarize` and read the mean off its output, then type that value into `generate`.

``````
summarize weight
generate meanweight = 3019.459
``````

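You can also use the stored results of the `summarize` command instead of typing the mean by hand. After `summarize` runs, the mean is available as `r(mean)`. Because `generate` refuses to overwrite an existing variable, drop `meanweight` first if you already created it.

``````
capture drop meanweight
summarize weight
generate meanweight = r(mean)
``````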
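To see what else `summarize` leaves behind, you can ask Stata to list the stored results. The sketch below assumes a fresh copy of `auto`; the variable name `weight_dev` is made up for illustration.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* quietly suppresses the output table; the r() results are still stored
quietly summarize weight

* Show every r() result left behind by summarize
return list

* Hypothetical variable: each car's deviation from the mean weight
generate weight_dev = weight - r(mean)
summarize weight_dev
```

Keep the `generate` close to the `summarize` it depends on, because the next command that stores results in `r()` replaces the old values.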

You can use the built-in system variable `_N` to create a new variable that is equal to the number of observations in the dataset.

``````
generate obs = _N
``````

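If you combine this with `by` you can create a new variable that is equal to the number of observations within each level of the `by` variable. The `by` prefix requires the data to be sorted on that variable; `bysort` does the sort for you. Since `obs` may already exist from the previous step, drop it first. For example, we can type:

``````
capture drop obs
bysort foreign: generate obs = _N
``````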

This will create a variable that is constant within the levels of `foreign`. That is, we get the number of foreign cars and the number of domestic cars. Every row for a foreign car will have `obs` equal to 22 and every row for a domestic car will have `obs` equal to 52. Give it a try and see how it works.
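One way to check the group counts is to cross-tabulate the new variable against `foreign`. This is a small sketch, assuming a fresh copy of `auto`.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* Number of observations within each level of foreign
bysort foreign: generate obs = _N

* Each row of the table should have a single non-empty cell
tabulate foreign obs
```

If the variable was built correctly, all domestic cars fall under one value of `obs` and all foreign cars under the other.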

## New Variable that is a Random Draw from a Distribution

We can create a new variable that is a random draw from a distribution. Let’s create a new variable whose values are random draws from a normal distribution with a mean of 0 and a standard deviation of 1. The function that generates normal random numbers is `rnormal()`. Called with no arguments, it uses a mean of 0 and a standard deviation of 1, and `generate` evaluates it once per observation, so each row gets its own draw.

``````
generate random = rnormal()
``````
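Random draws change every time you run the command unless you fix the seed first. The sketch below assumes a fresh copy of `auto`; the seed value and the name `random2` are arbitrary choices made for illustration.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* Fix the seed so the same draws come back on every run
set seed 12345

* Standard normal draws: mean 0, standard deviation 1
generate random = rnormal()

* rnormal(m, s) draws from a normal with mean m and standard deviation s
generate random2 = rnormal(50, 10)

* Sample means and standard deviations will be near, not equal to, the targets
summarize random random2
```

With only 74 observations the sample statistics can sit noticeably away from the values you asked for.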

## Create a New Variable that Indexes the Observations

You can use the built-in system variable `_n`, which holds the number of the current observation, to create a variable that indexes the observations.

``````
generate index = _n
``````

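This will create a new variable that runs from 1 to 74. You can combine this with `by` to create an index within the levels of another variable. Since `index` may already exist from the previous step, drop it first.

``````
capture drop index
bysort foreign: generate index = _n
``````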

This will create a new variable that runs from 1 to 52 for domestic cars and 1 to 22 for foreign cars.
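The within-group index depends on the sort order inside each group, so you can choose that order to make the index meaningful. Here is a minimal sketch, assuming a fresh copy of `auto`; the names `price_rank` and `cheapest` are made up for illustration.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* Groups are defined by foreign; within each group the rows are sorted by price
bysort foreign (price): generate price_rank = _n

* Flag the first (lowest-priced) row in each group
by foreign: generate cheapest = (_n == 1)

* Show the cheapest domestic car and the cheapest foreign car
list make foreign price price_rank if cheapest
```

Variables in parentheses after `bysort` control the ordering within groups but do not define the groups themselves.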

## Conclusion

I’ve just touched on the ways you can create new variables. You can also use the `egen` command to create new variables. Try new ways to create variables and be sure to read the help files.
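As a pointer for further reading, here is a short sketch of how `egen` can produce the mean of weight in one step, assuming a fresh copy of `auto`. The variable name `meanweight_by` is made up for illustration.

```stata
* Reload auto so the block runs on its own (this clears the data in memory)
sysuse auto, clear

* Overall mean of weight, stored in every observation
egen meanweight = mean(weight)

* Mean of weight within each level of foreign
bysort foreign: egen meanweight_by = mean(weight)

* Compare the group means
tabulate foreign, summarize(meanweight_by)
```

See `help egen` for the full list of functions it supports.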