In this tutorial I will show you how to use Stata to produce graphs, especially cross-sectional bar graphs. If you want to use the same database that I will be using for this tutorial, you can obtain it from this website (http://faculty.arts.ubc.ca). If you click on any one of these databases, you should be able to open it in Stata.

Here are the variables. This is a Demographic and Health Survey database. This is incomplete data for India: incomplete because it only has information on a few of the characteristics, and only for two states.

First let me talk about the data here. Each row in this case represents one woman who was a respondent in the survey, so this one row represents this woman, with a case identification number as given here. The different columns represent different variables, or different characteristics, for this woman. Now this woman belongs to the age group 25 to 29, and she lives in Gujarat in a rural area, the countryside. She has secondary education, and so on. So each row here represents one woman.

Let’s look at some of the health statistics that we have for a woman here. We have information about the respondent’s weight, height and body mass index, so let’s use the body mass index for the respondent. Generate a variable called BMI which is equal to v445 (gen BMI=v445). You can label this variable (label var BMI "Body mass index"). So if we go down, we have created this variable BMI, the body mass index for the respondent.

Now we want to see how this variable, the health status of these women, differs across the different states that we have information on here. (tab v024) This is for the different states, so we have information on Gujarat and Kerala. We can use a bar graph, or bar chart, to see how BMI varies across the different states in v024. These are the categories, so we get the mean of BMI across the two categories measured in this variable v024. We can give the graph a title, ‘health status’. Submit, and this is what we have: the mean BMI is higher in Kerala than it is in Gujarat.

There are various things we can do in a bar chart. We have the categories in v024, and we can divide this category into subcategories by putting another variable in group 2. We can also decide that we want the graph only for certain characteristics, and so on. Let me explain that further.

Let’s look at the wealth variable. (tab v190) This is the wealth index, a categorical variable, and we can generate wealth equal to v190 (gen wealth=v190). Tabulate wealth [?] and we see that poorest was represented by number 1 and 2 was defined as poorer, so this is the wealth index. What’s happened here is that we have lost the value labels that were attached in the case of v190. It’s easy to show that: let’s look at v190 and wealth side by side. So richer was actually number 4, and it is number 4 in wealth, but we’ve lost the label attached to number 4. We can attach the same value labels to wealth as v190 has with the following command (label values wealth v190). Now if we look at the two in the editor, you’ll see that they have the same value labels.

Now let’s look at the bar chart again. We want the categories to be wealth; we want to see how BMI varies across wealth.
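The dialog-box steps up to this point can also be typed as commands. What follows is only a minimal sketch, assuming the tutorial dataset is already open in Stata and that the value label set attached to v190 is itself named v190 (describe v190 shows the actual name). The second graph title is reused from the first purely for illustration.

```stata
* Assumes the tutorial dataset is already loaded in memory.

* Copy the respondent's body mass index into a new, labelled variable
generate BMI = v445
label variable BMI "Body mass index"

* States available in this extract (Gujarat and Kerala)
tabulate v024

* Mean BMI by state, with a graph title
graph bar (mean) BMI, over(v024) title("Health status")

* Copy the wealth index; the copy carries no value labels yet
generate wealth = v190
tabulate wealth

* Attach the value label set named v190 to wealth.
* Assumption: the label set is called v190; check with "describe v190".
label values wealth v190
tabulate wealth

* Mean BMI by wealth group (title reused here only as an illustration)
graph bar (mean) BMI, over(wealth) title("Health status")
```

Each over() option should correspond to one grouping field in the bar chart dialog, so adding a second over() is the typed counterpart of putting another variable in group 2.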
It shouldn’t be surprising for a richer woman to have a higher BMI. Submit, and this is what we have: poorest, poorer, middle, richer, richest. BMI goes up with wealth status.

Suppose we want to categorize this further by state, to see if we have the same pattern across the two states. Submit, and what we observe is that the pattern looks more or less the same across the two states.

What if you just want to look at the pattern of wealth for one state only? From the codebook (codebook v024), Gujarat is represented by number 24 in v024. Suppose we only wanted to see the bar graph for Gujarat: if v024 is equal to 24 (v024==24). Now it will calculate the mean of BMI only for the state which is Gujarat. So that’s how we can use the if condition.

The other interesting question can be how BMI varies across different religions, and religion is variable v130. We can generate a variable called religion as
equal to v130 (gen religion=v130). We want it to have the same value labels: religion should have the same value labels as v130 (label values religion v130). Let’s go back to this graph. We are looking at the mean
of BMI. We want the categories, the grouping variable, to be religion. Let’s remove the if condition: we want it for all the data values we have. Submit, and this is what we see across religions. So BMI, the health status, is the best for women belonging to this religious group.

Suppose we wanted to see how it varies across religion by state. What we see is that certain religions are absent in Kerala, and there doesn’t seem to be much variation across religion in Kerala. It does seem to exist in Gujarat, especially for these last three.

As you can see, you can’t really distinguish between the various religions on the x-axis. One way to fix that is to go into Properties and choose the angle you want for the labels: 45 degrees, 90 degrees. I’ve found that 45 degrees seems to have a nice way of centering. This is what we see: Hindu, Muslim, Christian. We don’t see that difference in Kerala.

Let’s look at son preference. We have the ideal number of boys (v627) and the ideal number of girls (v628). Let’s generate a variable called son preference, which is equal to 1 for people who prefer boys over girls, that is, where v627 is greater than v628 (gen sonpref=v627>v628). So this is our son preference. We can label this variable (label var sonpref "Preference for sons").

Now let’s create a bar chart, but instead of BMI we want to look at son preference. In this case religion is still the only grouping variable; we haven’t changed it yet. Let’s look at son preference across religion, and this is what we see: higher among Sikhs, Hindus and Muslims than among Christians. We might also want to see this across
education groups. Look for education: v149, for the mother. Let’s go back to this graph bar and say we want it for v149. What we find is that again we need to change these properties and use an angle of 45 degrees. So this is what we have: no education and incomplete primary have a high mean of son preference. So as education increases, preference for sons goes down.

Now we might want this across different religions, or we might just want this for a specific religion. Doing this for all the different religions is going to give a very tough-looking graph, but what we can do is do this for some religions. We know that there are three religions that have a higher frequency, and we can ignore the ones that do not. (codebook religion) One, two, three. So we want religion to be less than four, to capture these three religion categories. So if religion is less than four (religion