================================================================================
REPLICATION PACKAGE

"Class Dismissed: The Effect of International Student Exclusion on the
 U.S. STEM Workforce and Economic Growth"

PIIE Policy Brief PB-XX
Michael A. Clemens, Jeremy Neufeld, and Amy M. Nice

Replication package version: June 2026
================================================================================


--------------------------------------------------------------------------------
1. OVERVIEW
--------------------------------------------------------------------------------

This package reproduces every figure and every table in the Policy Brief named
above, which are a subset of the figures and tables in the longer underlying 
paper (Clemens, Neufeld, and Nice 2025; hereafter CNN 2025). 

Running the single Stata script in code/ produces all six outputs:

   Figure 1  Cumulative monthly F-1 student visa issuances by year, 2017-2025
   Figure 2  Foreign-born share of the employed U.S. high-skill STEM workforce
             (Bachelor's-or-higher), 2000-2023
   Figure 3  Sankey "front door" pipeline for foreign STEM graduates of U.S.
             universities (FY 2018-2022 average)
   Figure 4  STEM OPT participation and F-1 to H-1B transitions, 2004-2019
   Table 1   Long-term impact of a one-third reduction in the annual flow of
             foreign STEM graduates from U.S. universities
   Table 2   Short-term impact of the same reduction


--------------------------------------------------------------------------------
2. PACKAGE CONTENTS AND STRUCTURE
--------------------------------------------------------------------------------

    replication package/
    |
    +-- README.txt                          This file.
    |
    +-- code/
    |   +-- run_brief_replication.do        Single Stata master script. Runs
    |                                       end-to-end and writes every figure
    |                                       and table to ../output/.
    |
    +-- raw_data/
    |   +-- niv_issuances_2017_2025.csv     U.S. Department of State, Monthly
    |   |                                   Nonimmigrant Visa Issuance
    |   |                                   Statistics, compiled to a single
    |   |                                   CSV.
    |   +-- hsip-data02-nationwide.dta      Nationwide panel: ACS workforce
    |   |                                   stocks, IPEDS graduate flows, USCIS
    |   |                                   H-1B and EB administrative data,
    |   |                                   2000-2023. Built upstream from CNN
    |   |                                   2025.
    |   +-- fy_counts.csv                   Annual counts of F-1 outcomes and
    |   |                                   status transitions from SEVIS
    |   |                                   administrative microdata, FY2004-
    |   |                                   2023. Obtained by FOIA from DHS by
    |   |                                   the Institute for Progress; the
    |   |                                   data file used here is the public
    |   |                                   aggregated extract.
    |   +-- nscg2023_full.dta               National Survey of College
    |   |                                   Graduates (NSCG), 2023 public-use
    |   |                                   file. Used to estimate the
    |   |                                   U.S.-trained share among foreign-
    |   |                                   born STEM workers, by degree.
    |   +-- nscg2003_full.dta               NSCG 2003 public-use file. Same
    |                                       role for the 2003 baseline used in
    |                                       Table 1, growth column.
    |
    +-- output/                             Generated by the script. Initially
                                            empty (or contains files from a
                                            previous run). Contents listed in
                                            Section 5 below.


--------------------------------------------------------------------------------
3. SOFTWARE AND COMPUTING REQUIREMENTS
--------------------------------------------------------------------------------

   * Stata 18 or later (the script declares version 18.0 for forward
     compatibility; it was last tested in Stata 19.5).

   * User-written Stata commands. The script attempts to install these
     automatically. Internet access is required on first run.

         palettes          SSC
         colrspace         SSC
         graphfunctions    SSC
         sankey            GitHub: asjadnaqvi/stata-sankey

   * Disk space: ~35 MB for raw_data; output is <10 MB.
   * Memory: <2 GB. The largest single file is nscg2003_full.dta (~20 MB).
   * Runtime: under five minutes on a modern laptop (Apple Silicon or x86_64).

The script does not require any operating-system-specific tools. It runs on
macOS, Linux, and Windows with no modification beyond editing one line for the
project root path.

Optional: if the font "Linux Libertine O" is installed (TeX Gyre and Libertine
families are common on TeX distributions), figures will use it. If not, Stata's
default font is used; output is otherwise identical.


--------------------------------------------------------------------------------
4. HOW TO RUN
--------------------------------------------------------------------------------

   (1) Copy this entire folder ("replication package") to any location on disk
       on which you have write permission. The folder may be renamed.

   (2) Open code/run_brief_replication.do in Stata. Find the line near the
       top of the script that reads:

           local root "..."

       and replace the contents of the quoted string with the absolute path to
       the replication package folder on your machine. Save the file.

       This is the ONLY line that needs to be edited for portability. All
       other paths in the script are constructed relative to this root.

   (3) From the code/ folder (or giving the full path to the script),
       execute:    do run_brief_replication.do

       The script will:
         (a) install user-written dependencies if they are not already present,
         (b) load and merge the source data,
         (c) compute every figure and table, and
         (d) write outputs to ../output/.

   (4) Outputs are written to the output/ folder. A Stata log file
       (replication_log.smcl) is written there as well.
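
   The steps above can be sketched as a short pre-flight check. This is a
   minimal sketch and not part of the package: the path shown is a
   hypothetical example, and the check only confirms that the master script
   sits where this README says it should. The "local root" line inside the
   script itself still has to be edited as described in step (2).

           * Hypothetical location; substitute your own absolute path.
           local root "/Users/yourname/replication package"

           * Stop with an error if the master script is not under root.
           confirm file "`root'/code/run_brief_replication.do"

           * Move to code/ and run the master script.
           cd "`root'/code"
           do run_brief_replication.do

   Run these lines together from one do-file, because a local macro defined
   in one do-file run is not visible to the next. Forward slashes in paths
   work in Stata on macOS, Linux, and Windows.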


--------------------------------------------------------------------------------
5. OUTPUTS AND MAPPING TO THE BRIEF
--------------------------------------------------------------------------------

   Output filename                                 Brief item
   -------------------------------------------     ----------------------------
   figure1_f1_cumulative_by_year.png/.pdf          Figure 1
   figure2_foreign_share_stem_workforce.png/.pdf   Figure 2
   figure3_sankey_stem.png/.pdf                    Figure 3
   figure4_opt_transitions.png/.pdf                Figure 4
   table1_long_term_impact.csv                     Table 1 (data)
   table1_long_term_impact.tex                     Table 1 (LaTeX, booktabs)
   table2_short_term_impact.csv                    Table 2 (data)
   table2_short_term_impact.tex                    Table 2 (LaTeX, booktabs)
   replication_log.smcl                            Stata log of the full run

Notes on individual outputs:

   Figure 1. Cumulative F-1 visa issuances are summed by calendar month within
   year. Source data are State Department monthly nonimmigrant visa issuance
   statistics. Data through September 2025 are included.
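
   The within-year running total can be sketched as follows. The column
   names year, month, and f1_issuances are assumptions for illustration (the
   CSV's actual headers may differ), as is the layout of one row per
   year-month. This is not necessarily how the master script does it.

       local root "..."    // replace as in Section 4
       import delimited "`root'/raw_data/niv_issuances_2017_2025.csv", clear

       * Running sum of monthly issuances, restarting each calendar year.
       * sum() treats a missing month as zero rather than breaking the total.
       bysort year (month): generate double cum_f1 = sum(f1_issuances)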

   Figure 2. Reproduces the bachelor's-or-higher panel of the foreign-born
   STEM workforce stocks figure in CNN 2025. Inline percentages report the
   foreign-born share of the 2023 stock and of growth between 2000 and 2023.

   Figure 3. Sankey for all STEM graduates. Percentages are flow shares of the
   original F-1 cohort. The OPT-to-H-1B layer uses a two-year-lagged
   denominator (people finishing OPT in the year of the H-1B petition).

   Figure 4. Two y-axes: stock of STEM OPT workers (right, in red); fraction
   of F-1s changing status to H-1B and F-1/OPT-to-H-1B (left, in green). The
   2008 STEM OPT extension is marked.

   Table 1. Reproduces Sheet 1 of "without students calc july 17.xlsx".
   For each degree level d (STEM all, STEM master's, STEM Ph.D.):
     stu_y_d  = for_y_d * p_y_d
   where for_y_d is the ACS count of foreign-born S&E STEM workers in year y
   (2003 or 2023) at degree level d, and p_y_d is the NSCG-derived fraction
   of those workers whose first degree is from a U.S. institution. The
   columns are:
     stu_2003
     stu_2023
     impact: level  = -(1/3) * stu_2023 / all_2023
     impact: growth = -(1/3) * (stu_2023 - stu_2003) / (all_2023 - all_2003)
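
   A worked instance of the Table 1 arithmetic, for one degree level. Every
   input below is a placeholder chosen for easy arithmetic, NOT a value from
   the data or the Brief. all_y is used exactly as in the formulas above;
   its construction is left to the master script.

       * Placeholder inputs (hypothetical, for illustration only).
       scalar for_2003 = 1000
       scalar for_2023 = 2500
       scalar p_2003   = 0.40
       scalar p_2023   = 0.50
       scalar all_2003 = 8000
       scalar all_2023 = 12000

       * U.S.-trained foreign-born workers in each year.
       scalar stu_2003 = for_2003 * p_2003
       scalar stu_2023 = for_2023 * p_2023

       * Impact of a one-third reduction, as share of the level and of growth.
       scalar imp_level  = -(1/3) * stu_2023 / all_2023
       scalar imp_growth = -(1/3) * (stu_2023 - stu_2003) / (all_2023 - all_2003)
       display "impact: level  = " %7.4f imp_level
       display "impact: growth = " %7.4f imp_growth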

   NSCG sample restrictions: foreign-born (BTHRGN in 10-55), STEM (stem==1),
   and first-listed degree (degnum==1). Degree level is selected by deg:
     deg in 1..3   all bachelor's+
     deg==2        master's
     deg==3        Ph.D., excluding professional
   The U.S.-trained fraction is computed as a weighted ratio directly:
       p = sum(WTSURVY * us_deg) / sum(WTSURVY)
   restricted to the subpopulation above. This gives the same point estimate
   as -svy: proportion- without depending on its version-specific output
   layout.
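
   The weighted ratio can be sketched as below for the all-bachelor's+ row
   of the 2023 file. This is a sketch under assumptions, not the script's
   code: it assumes stem, degnum, deg, and us_deg already exist in the file
   under those names and that us_deg is a 0/1 indicator. If the master
   script constructs them, build them first.

       local root "..."    // replace as in Section 4
       use "`root'/raw_data/nscg2023_full.dta", clear

       * Subpopulation: foreign-born, STEM, first-listed degree, bachelor's+.
       keep if inrange(BTHRGN, 10, 55) & stem == 1 & degnum == 1 ///
           & inrange(deg, 1, 3) & !missing(us_deg, WTSURVY)

       * p = sum(WTSURVY * us_deg) / sum(WTSURVY)
       generate double w_us = WTSURVY * us_deg
       quietly summarize w_us
       local num = r(sum)
       quietly summarize WTSURVY
       local den = r(sum)
       display "p (direct ratio)  = " %6.4f `num'/`den'

       * Cross-check: an analytic-weighted mean is the same ratio.
       quietly summarize us_deg [aweight = WTSURVY]
       display "p (weighted mean) = " %6.4f r(mean)

   The two displayed values should agree; swapping inrange(deg, 1, 3) for
   deg == 2 or deg == 3 gives the master's and Ph.D. rows.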

   Table 2. Reproduces Sheet 2 of "without students calc july 17.xlsx".
   For each degree level d:
     short_term_impact_d = -(1/3) * L_d * T_d / O_d
   where L_d and O_d are IPEDS 2023 foreign and total STEM graduates at
   degree level d, and T_d is the year-1 retention rate of F-1 STEM graduates
   into the U.S. labor force, computed from SEVIS FOIA aggregates.
   T_d = (n_opt + n_nopt_h1b + n_nopt_o1 + n_nopt_j1 + n_nopt_other) / n_grad
   averaged over FY2018-2022 for STEM-all and Ph.D. (Sankey-style mean-then-
   ratio), and over FY2017-2022 for master's (matches the longer-paper
   `frac_masters` calculation, ratio-then-mean).
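
   The two averaging conventions differ only in the order of operations, as
   the sketch below shows. It rests on assumptions not stated in this
   README: that fy_counts.csv has been reduced to one row per fiscal year
   for a single degree level, and that the fiscal-year column is named fy.
   The n_* names are those in the formula above.

       local root "..."    // replace as in Section 4
       import delimited "`root'/raw_data/fy_counts.csv", clear

       * Numerator of T_d: graduates retained in the U.S. labor force.
       generate double n_stay = n_opt + n_nopt_h1b + n_nopt_o1 ///
           + n_nopt_j1 + n_nopt_other

       * Mean-then-ratio (STEM-all, Ph.D.): average the counts, then divide.
       quietly summarize n_stay if inrange(fy, 2018, 2022)
       local mean_stay = r(mean)
       quietly summarize n_grad if inrange(fy, 2018, 2022)
       display "T (mean-then-ratio) = " %6.4f `mean_stay'/r(mean)

       * Ratio-then-mean (master's): divide within each year, then average.
       generate double t_fy = n_stay / n_grad
       quietly summarize t_fy if inrange(fy, 2017, 2022)
       display "T (ratio-then-mean) = " %6.4f r(mean)

   Mean-then-ratio weights each year by its cohort size; ratio-then-mean
   weights every year equally, so the two can differ when cohorts change
   size across years.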

If your output differs from the published Brief at the second decimal place,
the cause is almost certainly differences in NSCG public-use weights or in
the IPEDS revision used. The reported numbers in the Brief used IPEDS through
2023 and NSCG public-use 2003 and 2023.


--------------------------------------------------------------------------------
6. DATA SOURCES AND CITATIONS
--------------------------------------------------------------------------------

   niv_issuances_2017_2025.csv
       U.S. Department of State, Monthly Nonimmigrant Visa Issuance
       Statistics. Public series. URL:
       https://travel.state.gov/content/travel/en/legal/visa-law0/visa-statistics/nonimmigrant-visa-statistics/monthly-nonimmigrant-visa-issuances.html
       (Accessed 2025-10. Data through September 2025.)

   hsip-data02-nationwide.dta
       Constructed in the upstream pipeline of Clemens, Neufeld, and Nice
       (2025) from:
         * American Community Survey (ACS) public-use microdata, IPUMS USA
           (2000-2023). Steven Ruggles et al., IPUMS USA: Version 15.0.
         * Integrated Postsecondary Education Data System (IPEDS), U.S.
           Department of Education, NCES (2000-2023).
         * USCIS H-1B Employer Data Hub.
         * USCIS Adjustments of Status statistics for employment-based
           immigrant visas.

   fy_counts.csv
       Aggregated counts of F-1 and post-F-1 status transitions, FY2004-2023,
       computed from SEVIS administrative microdata. Obtained from the
       Institute for Progress, which received the underlying microdata from
       the U.S. Department of Homeland Security under the Freedom of
       Information Act. Public-release aggregates only.

   nscg2023_full.dta, nscg2003_full.dta
       National Science Foundation, National Center for Science and
       Engineering Statistics, National Survey of College Graduates,
       public-use files for 2003 and 2023. Available from NSF NCSES:
       https://www.nsf.gov/statistics/srvygrads/

If using this package, please cite the Brief and the longer paper:

   Clemens, Michael A., Jeremy Neufeld, and Amy M. Nice. 2026. "Class
   Dismissed: The Effect of International Student Exclusion on the U.S. STEM
   Workforce and Economic Growth." PIIE Policy Brief PB-XX. Washington:
   Peterson Institute for International Economics.

   Clemens, Michael A., Jeremy Neufeld, and Amy M. Nice. 2025. "Brain Freeze:
   How International Student Exclusion Will Shape the STEM Workforce and
   Economic Growth in the United States." Commissioned by the U.S. National
   Academies of Sciences, Engineering, and Medicine. IZA DP 18548.


--------------------------------------------------------------------------------
7. WHAT THIS PACKAGE DOES NOT CONTAIN
--------------------------------------------------------------------------------

This package is the minimum sufficient to reproduce the Policy Brief. It does
NOT contain:

   * The full upstream construction of hsip-data02-nationwide.dta from raw
     ACS, IPEDS, and USCIS sources. See the longer paper's full replication
     archive for that pipeline.

   * Source-level extraction of the F-1 transition counts from the SEVIS
     microdata. The Institute for Progress, which holds the FOIA extract,
     publishes the aggregated counts in fy_counts.csv.

   * The State Department PDFs and XLSX files upstream of
     niv_issuances_2017_2025.csv. The CSV is the compiled aggregation used
     by Figure 1; the underlying State Department reports are public.

   * Any of the analyses in the longer paper that are not preserved in the
     Brief, including state-level analyses, citation-based productivity
     estimates, and the entrepreneurship/patenting decompositions.


--------------------------------------------------------------------------------
8. KNOWN BEHAVIOR AND TROUBLESHOOTING
--------------------------------------------------------------------------------

   * The first time the script runs, it tries to install sankey, palettes,
     colrspace, and graphfunctions. If the automatic install fails, run the
     commands below manually before running the script. They also need
     internet access, so run them once a connection is available:

         ssc install palettes
         ssc install colrspace
         ssc install graphfunctions
         net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/")

   * If you see "command sankey is unrecognized" the install did not succeed.
     The script writes a warning to the log identifying which packages are
     missing.

   * The Sankey output uses Arial Narrow. If that font is unavailable, Stata
     falls back to its system default; label positions can shift slightly.

   * The script also runs -svy: proportion- on the NSCG data using Stata's
     survey design. Only point estimates enter the published Brief tables;
     standard errors are not reported there but can be read from the log
     output of those proportion commands.

   * The script does not delete any pre-existing files in output/. It
     overwrites only the named outputs it creates.
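
   For the "command sankey is unrecognized" case above, a quick check of
   whether Stata can find the command is sketched below. It is a minimal
   sketch and covers only sankey; the other three packages install files
   under different names, so use -ado dir- to see what is installed.

       * which returns a nonzero code if sankey.ado is not on the ado path.
       capture which sankey
       if _rc {
           display as error "sankey not found; see the install commands above"
       }
       else {
           which sankey    // prints the file location and version line
       }

       * List all user-installed packages.
       ado dir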


--------------------------------------------------------------------------------
9. CONTACT
--------------------------------------------------------------------------------

   Michael A. Clemens (corresponding author for replication)
   School of Government and Policy, Johns Hopkins University
   Peterson Institute for International Economics

   Questions on the SEVIS extract should be directed to the Institute for
   Progress.

================================================================================
END OF README
================================================================================
