EconLab-Patent-Network-Centrality

Python code to process the US patent citation network over time to produce centrality measures and network visualizations of patent categories.

Dependencies

  • Python 2.7
  • msgpack-numpy
  • NetworkX
  • Matplotlib
  • pandas
  • NumPy

Folders

data

Contains all the original CSV patent citation data.

cache

Stores the msgpack serialization files used to save and load the processed data. Serializing the post-processed data is important because the processing step can take up to a day to finish (due to the size of the data).
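For illustration, here is a minimal sketch of this caching pattern using msgpack-numpy. The file name and the adjacency variable are hypothetical stand-ins; the actual scripts define their own.

```python
import msgpack
import msgpack_numpy as m
import numpy as np

m.patch()  # teach msgpack how to pack/unpack NumPy arrays

adjacency = np.zeros((100, 100))  # stand-in for a processed matrix

# Serialize once after the expensive processing step...
with open('cache/adjacency.msgpack', 'wb') as f:
    f.write(msgpack.packb(adjacency))

# ...then reload instantly on later runs.
with open('cache/adjacency.msgpack', 'rb') as f:
    adjacency = msgpack.unpackb(f.read())
```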

outputs

Stores all the outputs generated by network_analysis.py. This includes .png files of the network graphs, heatmaps, and centrality rankings over time, as well as CSV files containing centrality rankings for each patent category.

Files

network_creator.py

This file loads all the original patent citation data and processes it to create adjacency matrices and vectors. It also creates a crosswalk dictionary that links USPTO categories to IPC categories.
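As a sketch, a crosswalk dictionary like this could be built with pandas. The file name and column names below are assumptions, not necessarily those used by network_creator.py.

```python
import pandas as pd

# Hypothetical crosswalk file with one row per USPTO/IPC pair.
df = pd.read_csv('data/crosswalk.csv')  # assumed columns: 'uspto', 'ipc'

# Map each USPTO category to its IPC category.
crosswalk = dict(zip(df['uspto'], df['ipc']))
```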

At the top of the file are parameters that can be changed, such as the starting and ending years for generating the network matrices and vectors. The year_gap parameter specifies how many years of future citing patents should be considered as linked to a patent.
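The following sketch shows how the year_gap filter might work in principle; the citation table layout and column names are hypothetical.

```python
import pandas as pd

YEAR_GAP = 5  # link a patent to citing patents up to 5 years in the future

# Hypothetical citation table; column names are assumptions.
cites = pd.read_csv('data/citations.csv')
# assumed columns: 'citing_year', 'cited_year', 'citing_cat', 'cited_cat'

# Keep only citations made within YEAR_GAP years of the cited patent.
gap = cites['citing_year'] - cites['cited_year']
cites = cites[gap.between(0, YEAR_GAP)]

# Count category-to-category citations to fill an adjacency structure.
counts = cites.groupby(['cited_cat', 'citing_cat']).size()
```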

Note that this file must be run before network_analysis.py. It also only needs to be run once per parameter configuration, as its outputs are serialized as msgpack files and saved in the cache folder. Generating these adjacency matrices and vectors can take a long time due to the sheer size of the input data.

network_analysis.py

This file reads the processed data saved in "cache" and creates centrality rankings, network graphs, and heatmaps. These figures are written to the "outputs" folder.

Like network_creator.py, this script has adjustable parameters at the top of the file. The starting year, ending year, and year_gap parameters are the same as those described for network_creator.py. The years_to_graph variable is a list of the years of interest to plot in the network graphs and heatmaps. The network_to_use variable indicates whether the network to be graphed is for the uspto, ipc108, or ipc8 categorization. Finally, the years_per_aggregate variable indicates how many consecutive years should be aggregated together when plotting the centrality rankings; aggregating over more years produces rankings that are less noisy.
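This README does not state which centrality measure the script computes; as an illustration only, eigenvector centrality over a cached adjacency matrix could be computed with NetworkX as below. The matrix here is a random stand-in.

```python
import networkx as nx
import numpy as np

adjacency = np.random.rand(8, 8)  # stand-in for a cached ipc8 matrix

# Build a directed graph from the matrix and compute a centrality measure.
G = nx.from_numpy_matrix(adjacency, create_using=nx.DiGraph())
centrality = nx.eigenvector_centrality_numpy(G)

# Rank categories from most to least central.
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking)
```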

Running the code

If the cache folder is empty, the original data has not yet been processed by network_creator.py. In that case, run network_creator.py first to generate the adjacency matrices and vectors for the patent network; a crosswalk dictionary linking USPTO categories to IPC categories will also be generated.

Once the cache folder has been populated with processed data, network_analysis.py can be run. It contains multiple functions that can be called to output centrality ranking plots over time, centrality network graphs, centrality heatmaps, and centrality ranking CSVs. Each function is heavily commented in the code.
