Data publication: Demand-pull and technology-push: What drives the direction of technological change? Empirical data on a coupled two-layer input-output and patent-citation network

Hötte, Kerstin

The core data in this publication are empirical data on two coupled network layers inferred from cross-industrial citation links (patent citation network) and input-output flows among industries (input-output (IO) network). These data are available in quinquennial time steps for the years 1977, 1982, 1987, 1992, 1997, 2002, 2006. The data is available at the 6-digit level. The analyses in the paper mainly refer to 4-digit level results of a balanced panel of industries, i.e. industries for which both patent and IO data are available for the full time horizon. This publication also contains a sample of panel data on industry characteristics (mainly industry size by patent stock and output and network indicators). The data are available ein RData format and supplemented by the R-scripts used to compile and analyze the data. ------------------------------------------------------------------------------------------------ ### This data publication contains all data and R-code used for the paper: #### Hötte, Kerstin (2021): "Demand-pull Demand-pull and technology-push: What drives the direction of technological change? An empirical network-based approach". [Forthcoming. If you use the data, please cite the most recent version of the paper.] The paper also offers a description of the data and its compilation. ### Abstract: Demand-pull and technology-push are linked to an empirical two-layer network based on coupled cross-industrial input-output (IO) and patent-citation links among 155 4-digit (NAICS) US-industries in 1976-2006. I study the evolution of industry hierarchies and link formation. Both layers co-evolve, but differently: The patent network became denser and increasingly skewed, while market hierarchies are balanced and sluggish in change. Industries became more similar by patent-citations, but less by input use. Similar R&D capabilities as other big industries is beneficial for innovation providing access to knowledge but relying on the same market inputs is unfavorable if it intensifies competition. This may incite industries to explore other technological pathways. Growth in the market is constrained by scarcity and competition, but knowledge as innovation input is non-rival leading to increasing returns and a skewed distribution. This may strengthen existing R&D trajectories while market pressure may trigger a re-direction in both layers. This work is limited by its reliance on endogenously evolving classifications. ------------------------------------------------------------------------------------------------- ### To reproduce the results and the data from the raw data, you must run the code provided in the following order: (1) CREATING THE DATA: (a) The patent data can not be fully reconstructed from the data that are available in this data publication because one of the intermediate steps relies on proprietary data that can not be provided here. For the remainder: You can compile parts of the patent and the IO data from the raw data. To do so, please use the code and raw data provided in the folders io_data_R_files and patent_data_R_files. Further detail is provided below. (b) The folder R_scripts_both provides all code needed to create the merged panel data that is used in the analysis. (2) REPRODUCING THE ANALYSES: The folder R_script_both provides all code needed to reproduce the figures, tables, descriptive statistics and regression analyses. Further detail is provided below. This data publication also provides additional results on the analyses at different levels of data aggregation. You will find it in the folder statistical_output but you can also produce additional results running the code provided. ---------------------------------------------------------------------------------------------- ### This data publication consists of 6 folders: (1) patent_data_R_files (2) io_data_R_files (3) R_scripts_both (4) data_combined (5) statistical_output ---------------------------------------------------------------------------------------------- ### Details: ---------------------------------------------------------------------------------------------- (1) patent_data_R_files This folder contains 2 subfolders: code, data - code: This subfolder contains the R-scripts of all single steps executed to process the patent raw data. These steps are explained in detail in the Supplementary Material of the paper Hötte, K (2021): "Demand-pull and technology-push [forthcoming]" - data: This subfolder contains the processed data at different levels of aggregation and a folder where to put the source files, i.e. the original NBER patent data used in this analysis that need to be downloaded from https://sites.google.com/site/patentdataproject/Home [accessed on Mar 17, 2021]. To use the data, please follow the instructions described in the readme in the source data folder and cite the paper: Bronwyn H Hall, Adam B Jaffe, and Manuel Trajtenberg. "The NBER patent citation data file: Lessons, insights and methodological tools." National Bureau of Economic Research Working Paper w8498, 2001. DOI: 10.3386/w8498 The industry level patent data can not be fully reconstructed because the mapping from firm-level data to industry codes relies on proprietary data and can not be provided here. This subfolder also contains a folder named "temp" which is empty in this download. It will be automatically filled with intermediate data when you run the scripts in the "code" folder to reproduce the processed data from the source files. All intermediate data in this folder can be removed after having processed the data. ---------------------------------------------------------------------------------------------- (2) io_data_R_files This folder contains 3 subfolders: code, data - code: This subfolder contains the R-scripts used to compile the IO data. The single steps are explained in detail in the Supplementary Material of the paper Hötte, K (2021): "Demand-pull and technology-push [...]". - data: This subfolder contains the processed and the raw data downloaded from the Bureau of Economic Analysis (BEA) websites: https://www.bea.gov/industry/benchmark-input-output-data and https://www.bea.gov/industry/historical-benchmark-input-output-tables [accessed on Dec 21, 2020] and described in Horrowitz, K and Planting, M. "Concepts and Methods of the U.S. Input-Output Accounts. Measuring the Nation’s Economy". BEA 2006, URL: https://apps.bea.gov/papers/pdf/IOmanual_092906.pdf [accessed on Mar 17, 2021] Again, this subfolder also contains a folder "temp" which is empty but will be filled when you run the R-scripts to reproduce the data. ---------------------------------------------------------------------------------------------- (3) R_scripts_both This folder contains 1 file (settings_data.r) and 3 subfolders (create_merged_panel, descriptives, regressions): - settings_data.r: Here you can specify the data settings for the analyses. Here, it is set to the default specification of 4-digit level balanced panel data. This script also sets the output directory. Potentially, you have to adjust this for your own needs. - create_merged_panel: This subfolder contains all scripts to compile the merged panel data from the data compiled before (using the code in patent_data_R_files and io_data_R_files. To create the data, run the main script. It sources functions and subscripts from the aux folder. - descriptives: This subfolder contains all scripts needed to reproduce the descriptive statistics and exports them to latex tables. It also creates all figures. To produces these statistics, run the main scripts which sources the scripts from the src folder. - regressions: This folder contains all scripts needed to reproduce the regression analyses. It additionally contains scripts that were used for purposes of data exploration and model selection. To reproduce the regressions, run the scripts regression_industrial_hierarchies.r and regression_link_formation.r. The scripts allow for many features that vary the model specifications and types of analyses (e.g. also stepGAIC for model selection). The scripts write the regression outputs to txt and save the data of the results. These data can be processed and written to latex tables running the script process_selected_regression_results.r (in the write_results_to_tex folder). ---------------------------------------------------------------------------------------------- (4) data_combined This folder contains the merged data at different aggregation levels. These data are compiled running the code in create_merged_panel explained above. The data are: industry panel data, technological similarity networks, data on link formation processes, data on spillovers and a sample of network metrices. Some of these data are merged in the regression analyses scripts. This folder also contains a subfolder name other_data. This subfolder contain the data on trade exposure downloaded from: https://sompks4.github.io/sub_data.html [accessed on Mar 17, 2021]; (9. Import penetration, low-wage country competition and trade costs) If you want to make use of these data, please cite: Bernard, AB, Jensen, JB, Schott, PK: Survival of the Best Fit: Low Wage Competition and the (Uneven) Growth of US Manufacturing Plants, Journal of International Economics 68 (2006). It also contains a second subfolder named BLS with data on multi-factor productivity and other measures downloaded from the Bureau of Labor Statistics (BLS) website https://www.bls.gov/mfp/ [accessed on Mar 17, 2021]. This data is not used in the publication but was used for data exploration purposes. ---------------------------------------------------------------------------------------------- (5) statistical_output This folder contains the results presented in the paper and additional results for other aggregation levels and data subsets. These data in this folder are overwritten when you run the R-scripts for the regressions and descriptive statistics. ----------------------------------------------------------------------------------------------- LICENSE: This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. To give appropriate credit, please cite the most recent version of the paper: Hötte, Kerstin (2021): "Demand-pull Demand-pull and technology-push: What drives the direction of technological change? An empirical network based approach".

PUB - Publikationen an der Universität Bielefeld

Data publication: Demand-pull and technology-push: What drives the direction of technological change? Empirical data on a coupled two-layer input-output and patent-citation network

Zitieren