
ConnectionError: ('Connection aborted.', BadStatusLine('Error #2000\n',)) #5

Open
katelynstenger opened this issue Dec 10, 2018 · 3 comments

@katelynstenger

My script iterates through a list of patents I want to collect information on.
I initially received this error:
Exception is: ('Connection aborted.', error(10054, ''))
I introduced a time.sleep(2) between calls to pypatent.Search, which remediated this error.

On the 5th iteration of pypatent.Search(), I received this error:
ConnectionError: ('Connection aborted.', BadStatusLine('Error #2000\n',))

Any suggestions on remediating this error? Thank you for your help in advance!

@katelynstenger (Author)

Here is the full error message:


BadStatusLine Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
600 body=body, headers=headers,
--> 601 chunked=chunked)
602

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
386 # otherwise it looks like a programming error was the cause.
--> 387 six.raise_from(e, None)
388 except (SocketTimeout, BaseSSLError, SocketError) as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
382 try:
--> 383 httplib_response = conn.getresponse()
384 except Exception as e:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in getresponse(self)
1330 try:
-> 1331 response.begin()
1332 except ConnectionError:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in begin(self)
296 while True:
--> 297 version, status, reason = self._read_status()
298 if status != CONTINUE:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in _read_status(self)
278 self._close_conn()
--> 279 raise BadStatusLine(line)
280

BadStatusLine: Error #2000

During handling of the above exception, another exception occurred:

ProtocolError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
439 retries=self.max_retries,
--> 440 timeout=timeout
441 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
638 retries = retries.increment(method, url, error=e, _pool=self,
--> 639 _stacktrace=sys.exc_info()[2])
640 retries.sleep()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
356 if read is False or not self._is_method_retryable(method):
--> 357 raise six.reraise(type(error), error, _stacktrace)
358 elif read is not None:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in reraise(tp, value, tb)
684 if value.traceback is not tb:
--> 685 raise value.with_traceback(tb)
686 raise value

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
600 body=body, headers=headers,
--> 601 chunked=chunked)
602

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
386 # otherwise it looks like a programming error was the cause.
--> 387 six.raise_from(e, None)
388 except (SocketTimeout, BaseSSLError, SocketError) as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
382 try:
--> 383 httplib_response = conn.getresponse()
384 except Exception as e:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in getresponse(self)
1330 try:
-> 1331 response.begin()
1332 except ConnectionError:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in begin(self)
296 while True:
--> 297 version, status, reason = self._read_status()
298 if status != CONTINUE:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in _read_status(self)
278 self._close_conn()
--> 279 raise BadStatusLine(line)
280

ProtocolError: ('Connection aborted.', BadStatusLine('Error #2000\n',))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
in ()
13 for j in range(df.shape[1]):
14 for i in range(1):
---> 15 Patent_info(df.iloc[i, j])
16 time.sleep(2)
17

in Patent_info(patent_number)
5
6 try:
----> 7 results = pyp.Search(patent_number).as_dataframe()
8
9 # reindex

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pypatent\__init__.py in __init__(self, string, results_limit, get_patent_details, pn, isd, ttl, abst, aclm, spec, ccl, cpc, cpcl, icl, apn, apd, apt, govt, fmid, parn, rlap, rlfd, prir, prad, pct, ptad, pt3d, pppd, reis, rpaf, afff, afft, in_, ic, is_, icn, aanm, aaci, aast, aaco, aaat, lrep, an, ac, as_, acn, exp, exa, ref, fref, oref, cofc, reex, ptab, sec, ilrn, ilrd, ilpd, ilfd)
260 while (num_results_fetched < total_results) and (num_results_fetched < results_limit):
261 this_url = url_pre + str(list_num) + url_post
--> 262 thispatents = self.get_patents_from_results_url(this_url)
263 patents.extend(thispatents)
264

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pypatent\__init__.py in get_patents_from_results_url(self, url, limit)
273
274 def get_patents_from_results_url(self, url: str, limit: int = None) -> list:
--> 275 r = requests.get(url, headers=Constants.request_header).text
276 s = BeautifulSoup(r, 'html.parser')
277 patents_raw = s.find_all('a', href=re.compile('netacgi'))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
70
71 kwargs.setdefault('allow_redirects', True)
---> 72 return request('get', url, params=params, **kwargs)
73
74

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
56 # cases, and look like a memory leak in others.
57 with sessions.Session() as session:
---> 58 return session.request(method=method, url=url, **kwargs)
59
60

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
506 }
507 send_kwargs.update(settings)
--> 508 resp = self.send(prep, **send_kwargs)
509
510 return resp

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
616
617 # Send the request
--> 618 r = adapter.send(request, **kwargs)
619
620 # Total elapsed time of the request (approximately)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
488
489 except (ProtocolError, socket.error) as err:
--> 490 raise ConnectionError(err, request=request)
491
492 except MaxRetryError as e:

ConnectionError: ('Connection aborted.', BadStatusLine('Error #2000\n',))

@daneads daneads self-assigned this Dec 15, 2018
@daneads daneads added the bug label Dec 15, 2018
@daneads (Owner) commented Dec 15, 2018

@katelynstenger Thanks for submitting this issue, I get connection errors too. My guess is they've introduced rate limiting on the site. I'll take a look, introduce time.sleep(), and troubleshoot from there.

@jhc154 commented Jan 21, 2020

@daneads @katelynstenger I am wondering if time.sleep() was ever introduced as part of pypatent? When I got started with this library, I encountered issues when attempting to retrieve large amounts of data. I did not dig too deep, but I suspect rate limiting might still be an issue.

I first noticed that using Selenium really helped, but then I found this page and thought your idea was worth testing.

I tested introducing sleep(0.5) on line 328, right after the patents.append(p) in get_patents_from_results_url, and adding from time import sleep on line 8. The results seem promising with sleep() added; however, I'm not sure this is the best place to call it. There is an obvious time tradeoff (it runs longer), but the search keeps working, presumably because it is easier on the server. A sketch of the pattern follows.

Testing time.sleep() performance:

  1. Without time.sleep(), run pypatent.Search('crispr', results_limit=test, get_patent_details=True, web_connection=conn) at varying results_limits (test = 500, 200, and 5).
     • results_limit = 500: failed with a server error. Observed "error 2000...process terminated abnormally... document may be truncated" errors; the browser did not look like it recovered, so I interrupted the kernel.
  2. With the edits introducing time.sleep(0.5), run the same searches.
     • results_limit = 500: CPU times: user 31.3 s, sys: 318 ms, total: 31.7 s; wall time: 16min 41s. Observed some of the same error 2000 errors, but the search was able to keep running. Also observed some empty pages.
  3. Environment: Mac, Chrome, Jupyter Notebook, Python 3.7.3.

btw, thank you so much for this library!

  • justin
