This python file strips the Medical Education Search Heading (MeSH) XML file,
to create a csv which can be bulk uploaded into an empty SQL table.
MeSH terms are tagged to all articles submitted to pubmed and are useful
controlled vocabularly for biology and science in general. There is often
a need to use the terms in a relational database, hence this script.
The script can be RUN BY TYPING BELOW IN A TERMINAL WITH PYTHON SET UP (takes ~20sec to run) 2 WILL BE CREATED = "output.csv" & "output.txt"
python3 meshFromDescXML.py
In the python file, modify the filename 'desc2020.xml' to be whatever desc file you are using.
MODIFY TO WHATEVER MESH XML FILE YOU ARE PARSING BELOW, HAVE IN SAME DIRECTORY tree = ET.parse('desc2020.xml') root = tree.getroot() itemArrayS = []
THE BELOW LINE ADD ONE HEADER ROW TO FILE, BUT UNDESIRED IF IMPORTING INTO ALREADY CREATED EMPTY SQL TABLE
#itemArrayS.extend([['meshTerm', 'synonym']])
for QR in root.findall('.//DescriptorRecord'): conceptNameString = QR.find('ConceptList/Concept/ConceptName/String').text for element in QR.findall('ConceptList/Concept/TermList'): itemArray = [] if element.find('Term') is not None: terms = [terms.text for terms in element.findall('Term/String')] i=0 #itemArray=[] while i < len(terms): #itemArray.append(conceptNameString,terms[i]) itemArray.extend([[conceptNameString, terms[i]]]) i += 1 itemArrayS.extend(itemArray)
#print(conceptNameString , " , terms = " , terms , " , len(terms) = " , len(terms)) #print("itemArray = " , itemArray) #print("itemArrayS = " , itemArrayS)
print("itemArrayS len = " , len(itemArrayS))
print("itemArrayS = " , itemArrayS , file=open("output.txt", "a"))
with open("output.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerows(itemArrayS)