To get started quickly, we will work through a standard Wordscores analysis of some example documents: the Labour, Conservative and Liberal Democrat manifestos from the British elections of 1992 and 1997. The manifestos are available as a zip file from the Wordscores homepage.
Extract the contents of the zip file to your Stata working folder and launch Stata. In the transcripts below, lines beginning with a period (.) are the commands you type, and the lines that follow are Stata's responses.
Before beginning the analysis, we set the amount of memory Stata may use. 10 megabytes should be enough for this number of documents:
. set memory 10m
(10240k)
Now we compute word frequencies from all the manifestos:
. wordfreq lab92.txt con92.txt ld92.txt lab97.txt con97.txt ld97.txt
Starting WORDFREQ ...
lab92.txt --> tlab92
con92.txt --> tcon92
ld92.txt --> tld92
lab97.txt --> tlab97
con97.txt --> tcon97
ld97.txt --> tld97
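To make the idea of a word-frequency count concrete, here is a minimal Python sketch of what such a count involves. It is not how wordfreq is implemented: the tokenization rule (lowercase everything, keep runs of letters and apostrophes) is our assumption, and wordfreq may handle case, punctuation and numbers differently. The helper frequencies_for_files is hypothetical; it only mimics the naming seen in the transcript, where lab92.txt becomes tlab92.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenization: lowercase the text, keep runs of letters/apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def word_frequencies(text: str) -> Counter:
    """Count how often each token occurs in a text."""
    return Counter(TOKEN_RE.findall(text.lower()))


def frequencies_for_files(paths):
    """Hypothetical helper: map e.g. 'lab92.txt' to the key 'tlab92'."""
    return {
        "t" + Path(p).stem: word_frequencies(
            Path(p).read_text(encoding="utf-8", errors="replace")
        )
        for p in paths
    }


# Usage with an inline string, so no files are needed:
counts = word_frequencies("Lower taxes, lower spending; taxes matter.")
print(counts.most_common(2))  # [('lower', 2), ('taxes', 2)]
```

With the manifesto files in the working folder, frequencies_for_files(["lab92.txt", "con92.txt"]) would return one count table per file.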
We use the three 1992 manifestos as reference documents, so we assign a score to each of them:
. setref tlab92 5.35 tld92 8.21 tcon92 17.21
We can summarize the texts we are working with using the describetext command:
. describetext t*92 t*97
            |     Ref    Total   Unique    Mean  Median
       Text |   Score    Words    Words   Freq.   Freq.
------------+------------------------------------------
      tld92 |    8.21   17,671    3,167    5.58    1.00
     tcon92 |   17.21   29,413    4,028    7.30    2.00
     tlab92 |    5.35   11,445    2,372    4.83    1.00
      tld97 |       .   13,959    2,418    5.77    2.00
     tcon97 |       .   21,129    3,174    6.66    2.00
     tlab97 |       .   17,567    2,994    5.87    2.00
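The summary columns can be reproduced from a word-count table. In the output above, Mean Freq. equals Total Words divided by Unique Words (for tld92, 17,671 / 3,167 is about 5.58); reading Median Freq. as the median of the per-word counts is our assumption. A minimal Python sketch, where describe_counts is a hypothetical name:

```python
import statistics
from collections import Counter


def describe_counts(counts: Counter) -> dict:
    """Summarize a word-count table the way the describetext columns appear to."""
    per_word = list(counts.values())
    if not per_word:
        raise ValueError("empty text")
    total = sum(per_word)   # Total Words
    unique = len(per_word)  # Unique Words
    return {
        "total": total,
        "unique": unique,
        "mean_freq": total / unique,                 # Mean Freq. = Total / Unique
        "median_freq": statistics.median(per_word),  # assumed meaning of Median Freq.
    }


summary = describe_counts(Counter({"tax": 3, "cut": 2, "spend": 1}))
print(summary)  # {'total': 6, 'unique': 3, 'mean_freq': 2.0, 'median_freq': 2}
```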
Now we set the name of the dimension we used for the reference scores:
. wordscore economic
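In the Wordscores method, each word found in the reference texts receives a score. Let F_wr be the relative frequency of word w in reference text r, and A_r the score given to r with setref. The probability of reading text r given that we see word w is P_wr = F_wr / (sum of F_wr over all reference texts), and the word's score is S_w = sum over r of P_wr × A_r. The Python sketch below implements that formula on toy data; whether the wordscore command performs exactly this step is an assumption, and the function and text names are hypothetical.

```python
from collections import Counter


def relative_frequencies(counts: Counter) -> dict:
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(ref_counts: dict, ref_scores: dict) -> dict:
    """ref_counts: text name -> Counter; ref_scores: text name -> A_r."""
    rel = {name: relative_frequencies(c) for name, c in ref_counts.items()}
    vocabulary = set().union(*rel.values())  # every word seen in any reference text
    scores = {}
    for w in vocabulary:
        f = {name: rel[name].get(w, 0.0) for name in rel}  # F_wr for each text r
        denom = sum(f.values())  # > 0 because w occurs in some reference text
        # S_w = sum over r of P_wr * A_r, with P_wr = F_wr / denom
        scores[w] = sum(f[name] / denom * ref_scores[name] for name in rel)
    return scores


# Toy reference texts (hypothetical): one scored 0, the other scored 10.
refs = {"ref_a": Counter("tax tax cut".split()),
        "ref_b": Counter("spend spend tax".split())}
ws = word_scores(refs, {"ref_a": 0.0, "ref_b": 10.0})
print({w: round(s, 4) for w, s in sorted(ws.items())})
# {'cut': 0.0, 'spend': 10.0, 'tax': 3.3333}
```

A word used only in one reference text inherits that text's score exactly, while a shared word such as "tax" lands between the two, closer to the text that uses it more often.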
Finally, we infer the positions of the 1997 manifestos on this dimension:
. textscore economic tlab97 tld97 tcon97
Wordscore v0.36 (c) 2003 Kenneth Benoit
Dimension: ECONOMIC
            |                   Unique   Trans-  Trans-                          Total % Virgin
            |      Raw     Raw  Scored   formed  formed           Transformed    Words      Tot
       Text |    Score      SE   Words    Score      SE  [95% Conf. Interval]   Scored     Sc'd
------------+----------------------------------------------------------------------------------
     tlab97 |  10.3718  0.0149   2,247   9.1274  0.3459     8.4356     9.8192   16,616     94.6
      tld97 |  10.1934  0.0153   1,949   4.9922  0.3559     4.2804     5.7039   13,380     95.9
     tcon97 |  10.7184  0.0137   2,341  17.1640  0.3175    16.5290    17.7989   20,072     95.0
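The raw columns follow from the word scores. Only words that also occur in the reference texts are scored. A virgin text's raw score is the frequency-weighted mean of its scored words' scores, S_v = sum over w of F_wv × S_w, where F_wv is each word's share of the N_v scored words; its standard error is the square root of V_v / N_v, with V_v = sum over w of F_wv × (S_w - S_v)^2. The last two columns are consistent with this reading: for tlab97, 16,616 of its 17,567 words (94.6%) were scored. The Python sketch below illustrates these formulas on toy word scores; it is not the textscore source, and score_virgin is a hypothetical name.

```python
import math
from collections import Counter


def score_virgin(virgin_counts: Counter, word_scores: dict) -> dict:
    # Keep only words that received a score from the reference texts.
    scored = {w: c for w, c in virgin_counts.items() if w in word_scores}
    n_scored = sum(scored.values())
    if n_scored == 0:
        raise ValueError("no word in this text occurs in the reference texts")
    # F_wv = c / n_scored: relative frequency among scored words only.
    raw = sum(c / n_scored * word_scores[w] for w, c in scored.items())
    variance = sum(c / n_scored * (word_scores[w] - raw) ** 2
                   for w, c in scored.items())
    return {
        "raw_score": raw,
        "raw_se": math.sqrt(variance / n_scored),
        "unique_words_scored": len(scored),
        "total_words_scored": n_scored,
        "pct_scored": 100 * n_scored / sum(virgin_counts.values()),
    }


toy_scores = {"tax": 10 / 3, "cut": 0.0, "spend": 10.0}  # hypothetical word scores
virgin = Counter("tax spend spend cut unknown".split())  # 'unknown' is not scored
print({k: round(v, 4) for k, v in score_virgin(virgin, toy_scores).items()})
```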
Each row reports the results for one virgin text, that is, a text whose position we are estimating. The main quantities of interest are each text's inferred position on the economic dimension, in the column marked 'Transformed Score', and the standard error of that estimate, in the column marked 'Transformed SE'.
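The transformed columns can be reproduced from the raw ones with a rescaling that keeps the mean of the raw virgin scores but stretches their spread to match that of the reference scores: S*_v = (S_v - mean_v) × (SD_r / SD_v) + mean_v, where mean_v and SD_v are the mean and standard deviation of the raw virgin scores and SD_r is the standard deviation of the reference scores. The transformed SE is the raw SE times the same factor, and the printed interval is the transformed score plus or minus two transformed SEs (for tlab97, 9.1274 ± 2 × 0.3459 gives 8.4356 to 9.8192). The Python sketch below is our reconstruction, not the package's code: rescale is a hypothetical name, and using sample standard deviations is an assumption (with three reference and three virgin texts the ratio is the same either way). Applied to the printed raw values, it matches the transformed columns to within the rounding of those inputs.

```python
import statistics


def rescale(raw_scores, raw_ses, ref_scores):
    """Stretch raw virgin scores so their spread matches the reference scores."""
    mean_v = statistics.mean(raw_scores)
    # Assumption: sample (n - 1) standard deviations for both sets of scores.
    factor = statistics.stdev(ref_scores) / statistics.stdev(raw_scores)
    scores = [(s - mean_v) * factor + mean_v for s in raw_scores]
    ses = [se * factor for se in raw_ses]
    # Interval as it appears in the table: score +/- 2 standard errors.
    intervals = [(s - 2 * se, s + 2 * se) for s, se in zip(scores, ses)]
    return scores, ses, intervals


texts = ["tlab97", "tld97", "tcon97"]
raw = [10.3718, 10.1934, 10.7184]  # Raw Score column
raw_se = [0.0149, 0.0153, 0.0137]  # Raw SE column
refs = [5.35, 8.21, 17.21]         # setref scores for tlab92, tld92, tcon92
scores, ses, intervals = rescale(raw, raw_se, refs)
for name, s, se, (lo, hi) in zip(texts, scores, ses, intervals):
    print(f"{name}: {s:.2f} (SE {se:.2f}) [{lo:.2f}, {hi:.2f}]")
```

Because the factor (about 23 here) multiplies everything, small rounding in the printed raw values shows up in the third decimal of the transformed ones.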
The next section describes the commands used in this session in more detail.