creationslasas.blogg.se - Stata collapse

Stata collapse how to#
Stata collapse code#

Basically, by adding a frequency weight, you are telling Stata that a single line represents observations for multiple people. The other weighting options are a bit more complicated. Analytic weights treat each observation as if it were a mean computed from a sample of size n, where n is the weight variable.

Please, what are the steps I should take to get this to work properly?

Following is a response from DHS Senior Stata Specialist, Tom Pullum:

If you collapse by v001, you cannot include caseid in the "by" part of the collapse command. You should replace "by(v001 caseid)" with "by(v001)". The collapsed file will have one record per cluster.

caseid is a combination of v001, v002, and v003. They are numeric variables, but caseid is a string with embedded blanks.

To merge the collapsed data back onto the individual records in the IR file, you only need to sort both files on v001. Since your cluster-level file does not contain v002 and v003, they are irrelevant for the merge. However, when I do this I sort the IR file on v001 v002 v003, even though it's not really required. Like many Stata users, I prefer the old version of the merge command, but the newer one will also work. So I recommend lines such as the following:
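The lines recommended in the response did not survive on this page, so what follows is only a minimal sketch of the advice as described: collapse by v001 alone, then merge the cluster means back onto the individual records. The toy data, the variable educ, and the name commeduc are made up for illustration, and the newer merge m:1 syntax is used rather than the old one the response prefers.

```stata
* Toy stand-in for an IR file: two clusters (v001) with a few women each.
* All values are invented for illustration only.
clear
input v001 v002 v003 educ
1 1 2 0
1 2 1 1
1 3 2 1
2 1 1 0
2 4 2 0
end
tempfile ir
save `ir'

* Collapse to one record per cluster; caseid is NOT part of by().
collapse (mean) commeduc = educ, by(v001)
sort v001
tempfile cluster
save `cluster'

* Merge the cluster-level means back onto every individual record.
use `ir', clear
sort v001 v002 v003
merge m:1 v001 using `cluster'
assert _merge == 3      // every woman should find her cluster
drop _merge
list, sepby(v001)
```

Because the cluster file has exactly one record per v001, a many-to-one merge on v001 is enough, and caseid never needs to enter the collapse.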

The community-contributed gtools suite can help a lot with speedups and, fortunately, has a faster version of collapse, called gcollapse. Note that we won't necessarily see a benefit for small(ish) datasets.

Yet there are also many problems, especially with irregular sets of observations for varying times, that do not yield easily to this approach.
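As a rough sketch of how gcollapse can stand in for collapse, the snippet below uses Stata's bundled auto dataset and falls back to the built-in command when gtools is not installed. The fallback guard and the variable name mean_price are assumptions added for illustration; gtools itself is normally installed with "ssc install gtools".

```stata
* Average price by origin, using gcollapse when it is available.
sysuse auto, clear

capture which gcollapse          // _rc is 0 only if gtools is installed
if _rc == 0 {
    gcollapse (mean) mean_price = price, by(foreign)
}
else {
    * Fallback: the built-in command takes the same arguments here.
    collapse (mean) mean_price = price, by(foreign)
}
list
```

On a dataset this small the two commands should be indistinguishable in speed; any gain is expected only with many observations or many groups.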


However, I am running into problems with keeping my caseid variable in the collapsed dataset. I need the caseid variable to stay in the dataset so that I can merge the community-level means back onto the original IR dataset. However, when I put it in the by() portion of "collapse", it causes the dataset to collapse by caseid, and not by the PSU.

Below is the code I have been working with:

collapse (mean) commresid=wherelives commregion=region meancommeduc=comm_educlvl communemp=unemployed ///
    commpoverty=poverty commpovlevel=povlevel commwealth=v190 commanc=ancvisits commpostnatal=postchk commsba=birthassist ///
    commdelivery=birthplace commfemeduc=femeduc commeneduc=meneduc commfemjob=femjob commenjob=menjob ///
    commworkprev=workprevyr commfirstmarr=agefirstunion commfirstbirth=matagefirstbirth commallkids=parity ///
    commidealkids=idealkidnum commsons=xxsonsalive commgirls=xxgirlsalive commdecision=all_decision ///
    commviolence=violence commcontrol=control, by(v001 caseid) cw

With big datasets, Stata can be slow compared to other languages, though its developers do seem to be trying to change that a bit.

Counting panels, and more generally groups, is sometimes possible in Stata through a reduction command (e.g., collapse, contract, statsby) that produces a smaller dataset or through a tabulation command.
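To make the idea of counting groups through a reduction command concrete, here is a small sketch with an invented, irregular panel (the variables id and year are hypothetical). It counts panels once with contract, which replaces the data and is therefore wrapped in preserve and restore, and once with egen's tag() function, which leaves the data untouched.

```stata
* Hypothetical irregular panel: ids observed in different years.
clear
input id year
1 2001
1 2002
1 2005
2 2003
3 2001
3 2004
end

* Reduction approach: contract leaves one record per id (plus _freq).
preserve
contract id
count                          // number of panels
local npanels = r(N)
restore

* Non-destructive alternative: tag one observation per id and count the tags.
egen byte first = tag(id)
count if first
assert r(N) == `npanels'
```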


So what I did is manually calculate a new variable, price x units, in the original data set. Then I collapsed the dataset by summing up the price x units and units variables:

collapse (sum) pricexunits units, by(week product)

And finally, I created a new variable where I divided pricexunits by units.

The following collapse command does the exact same thing as above, except that the average of age is named avgage and we have explicitly told the collapse command that we want it to compute the mean.

use clear
collapse (mean) avgage=age, by(famid)
list famid avgage

I am analyzing contextual determinants of neonatal mortality using the Nigeria 2013 DHS. I am currently trying to aggregate individual-level statistics to create community-level variables using the Stata collapse command. I would be grateful if I could get some guidance on how to tackle the following challenges:

1) According to the Stata manual, "collapse" will, by default, use all my observations to calculate the summary statistics; if I want to exclude missing observations for variables, I am to specify the "cw" option. However, when I included this option, Stata returned an error message: "no observations", and I am not sure how to get around this.

2) I want to collapse by PSU (v001) so as to get the community means for my variables.
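One plausible reading of the "no observations" message is that cw applies casewise deletion: an observation is dropped if any of the listed variables is missing, so with a long variable list there may be no complete cases left. The sketch below uses invented data (x, y, mx, and my are hypothetical names) in which every row has one missing value. It is an illustration of that mechanism, not a diagnosis of the actual DHS file.

```stata
* Invented data: every observation is missing either x or y.
clear
input v001 x y
1 1 .
1 . 2
2 3 .
2 . 4
end

* How many complete cases would cw have to work with?
egen nmiss = rowmiss(x y)
count if nmiss == 0

* With cw, no complete cases remain, so collapse is expected to fail.
preserve
capture noisily collapse (mean) mx = x my = y, by(v001) cw
display "return code with cw: " _rc
restore

* Without cw, each mean uses all nonmissing values of its own variable.
collapse (mean) mx = x my = y, by(v001)
list
```

If the count of complete cases is zero or very small, dropping cw or collapsing smaller sets of variables separately are two possible ways around the problem.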

I couldn't find a Stata command for the following issue, so I solved it manually.

According to the official manual, Stata doesn't do weights with averages in the collapse command (p. It means that I am not able to get weighted average prices paid in my sales data set at a week/product level, where the weight is the units sold.

The data set is a collection of single transactions with # of purchases and prices per unit paid at week/store/product level:

clear

Clearly, taking simple averages when collapsing ignores the number of units purchased, resulting in a wrong average price estimate.
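For what it is worth, collapse's documented syntax does accept weights, so the manual calculation can be cross-checked against an analytic-weighted mean. The sketch below uses a few invented transactions (the original example data did not survive on this page) and the hypothetical names avgprice_manual and avgprice_w. It computes the unit-weighted average price both ways and checks that they agree.

```stata
* Invented transactions: units bought and price per unit, by week/product.
clear
input week product units price
1 1 1 10
1 1 9 20
1 2 5  4
1 2 5  6
end
tempfile sales
save `sales'

* Manual route: sum(price*units) / sum(units) within week/product.
generate pricexunits = price * units
collapse (sum) pricexunits units, by(week product)
generate avgprice_manual = pricexunits / units
keep week product avgprice_manual
tempfile manual
save `manual'

* Weighted route: let collapse apply units as analytic weights.
use `sales', clear
collapse (mean) avgprice_w = price [aweight = units], by(week product)

* The two routes should give the same weighted average price.
merge 1:1 week product using `manual', nogenerate
assert abs(avgprice_w - avgprice_manual) < 1e-6
list
```

With these numbers the weighted average is 19 for product 1 and 5 for product 2, whereas the simple averages would be 15 and 5, which is the distortion the question describes.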