Semantics of registered_domain property for private domains #138
(Note to self: if we need to track public vs. private at runtime, #66 is a requirement.)
Yeah, I bet most people will associate it with registration at a registrar, as you have. In my mind, tldextract has been consistent, working as designed, via a more abstract interpretation of "registered." Excluding private domains, GitHub registered github.io with a registrar, who controlled the domain. Including private domains, GitHub user tuler "registered" tuler.github.io with GitHub, who controlled the domain. I have no strong evidence that my interpretation is broadly useful. It served a very specific case when I originally wrote this lib. Or maybe both interpretations are useful.
I see your point. Nonetheless, keeping runtime information about each domain from the PSL could let the application handle this appropriately. Something like a …
Yes, at the very least we should do #66 and expose … I'd then consider a new … Renaming today's …
The PR for #66 currently tracks the source of an extraction: the official public suffix list, the private domains in the public suffix list, or user-provided extra suffixes. We haven't figured out how to expose that yet. It's tricky, since it's a …
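A minimal sketch of what tagging each match with its source might look like, assuming a tiny hard-coded rule table in place of the real public suffix list. The names `SuffixSource`, `RULES`, and `match_suffix` are hypothetical, not tldextract's API, and the PSL's wildcard and exception rules are ignored.

```python
from enum import Enum


class SuffixSource(Enum):
    """Hypothetical tag recording where a suffix rule came from."""
    ICANN = "icann"      # ICANN section of the public suffix list
    PRIVATE = "private"  # PRIVATE DOMAINS section of the public suffix list
    EXTRA = "extra"      # user-provided extra suffixes


# Hypothetical, tiny rule table; the real PSL has thousands of rules,
# plus wildcard and exception rules that this sketch ignores.
RULES = {
    "com": SuffixSource.ICANN,
    "io": SuffixSource.ICANN,
    "co.uk": SuffixSource.ICANN,
    "github.io": SuffixSource.PRIVATE,
    "internal.example": SuffixSource.EXTRA,
}


def match_suffix(hostname, include_private=True):
    """Return (suffix, source) for the longest matching rule, or (None, None)."""
    labels = hostname.lower().split(".")
    # Try the longest candidate first: "a.b.c", then "b.c", then "c".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        source = RULES.get(candidate)
        if source is None:
            continue
        # Skip private rules when the caller opted out of them.
        if source is SuffixSource.PRIVATE and not include_private:
            continue
        return candidate, source
    return None, None
```

For example, `match_suffix("tuler.github.io")` returns `("github.io", SuffixSource.PRIVATE)`, while `match_suffix("tuler.github.io", include_private=False)` falls back to `("io", SuffixSource.ICANN)`. Keeping the source next to the matched suffix is what would let an application decide at runtime how to treat a private match.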
Suppose the following URL: `tuler.github.io`

`github.io` is a private domain in the PSL. When parsed with `include_psl_private_domains=True`, we get `subdomain=''`, `domain='tuler'`, `suffix='github.io'`. The `registered_domain` property just joins `domain` and `suffix`, giving me `tuler.github.io`, but IMHO it should still be `github.io`, as this is the domain registered with the registrar, and it can be found in a whois query. One problem with implementing this is that when a URL is parsed, we can't know whether the parsed domain is a private domain or an ICANN domain, because this information is not kept internally when the PSL is read.
Any thoughts?
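The two readings of "registered" discussed in this thread can be contrasted in a small self-contained sketch. It assumes two hard-coded sets standing in for the ICANN and private sections of the PSL; `split_host`, `registered_domain`, and `registrar_domain` are hypothetical helpers for illustration, not tldextract's API. The first helper mirrors the behaviour described above (join `domain` and `suffix`); the second ignores private rules so the result is the name registered with a registrar.

```python
# Hypothetical stand-ins for the two sections of the public suffix list.
ICANN_SUFFIXES = {"com", "io", "co.uk"}
PRIVATE_SUFFIXES = {"github.io"}


def split_host(hostname, include_private):
    """Split a hostname into (subdomain, domain, suffix) by longest suffix match."""
    suffixes = (ICANN_SUFFIXES | PRIVATE_SUFFIXES) if include_private else ICANN_SUFFIXES
    labels = hostname.lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: max(i - 1, 0)])
            return subdomain, domain, suffix
    # No known suffix: treat the last label as the domain (a simplification).
    return ".".join(labels[:-1]), labels[-1], ""


def registered_domain(hostname, include_private=True):
    """Current reading: join domain and suffix from the (possibly private) split."""
    _, domain, suffix = split_host(hostname, include_private)
    return f"{domain}.{suffix}" if domain and suffix else ""


def registrar_domain(hostname):
    """Alternative reading: the name registered with a registrar (ICANN rules only)."""
    return registered_domain(hostname, include_private=False)
```

With private rules enabled, `registered_domain("tuler.github.io")` gives `tuler.github.io`, whereas `registrar_domain("tuler.github.io")` gives `github.io`, which is exactly the divergence raised in this issue. `registrar_domain` only works here because the sketch keeps the two sections separate; a single merged set of suffixes could not answer the question.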