Query tables from plain text files #8
Comments
I'd be happy to help you take this on. I've faced similar data inconsistencies when building tooling for real estate projects |
@ahiddenproxy I have assigned you, please let me know if you need any additional context! Thank you! (Sorry for the late reply) |
I should also mention that you don't need to worry about the pedantic details of the source code if you don't wish to. I can point you to the function where text input is given as a table, and you can work from there on creating a function that returns structured data. As long as some sort of querying function is created to turn the table into structured data, I can integrate that function into the code so it works properly with everything else. |
Here is the relevant code.
The code at the link may differ slightly from the code provided here, as some reliability edits have been made since. |
It's been a month since I assigned @ahiddenproxy, and there has not yet been any code committed. Therefore, I will be unassigning them. This issue is up for grabs. @ahiddenproxy Let me know if you still wish to work on this - I'll reassign you as soon as I can. |
@leftmove hello!! I'd like to try my luck on this!! has anybody tried this with regular expressions? what is the most accurate regex someone has gotten to getting this right? |
@leftmove Can I be assigned to this? |
Hi @leftmove, I've also faced a very similar problem while processing text data. Since I'm new to open-source contribution, I'd like to solve this problem. Waiting for a response from your side. |
@chickenleaf @parthmshah1302 @jass024 Sorry for the late response, guys. I'll assign all three of you for now, and I encourage you all to collaborate and discuss the issue here, or anywhere else you wish. I suspect three people is too many, but I'll let you guys decide: either give this issue your best shot, or let me know if you wish to be unassigned. If it's the latter, thanks anyway for trying. If you do decide to work on this issue though, make sure to add a comment describing what you will attempt before you attempt it. This is so that others do not waste time and effort trying to do what you've already accomplished. Thanks. |
I tried regex myself, but the data seems far too inconsistent. It might still be worth a shot, since I am a regex amateur. The closest I got was using Pandas, as that has a text table feature, but again, the data was too inconsistent. |
hi @leftmove thanks for assigning me |
I have just realized that the relevant code I linked above is no longer valid. Here is the new link and examples of current querying methods.
The objective is not necessarily to complete the |
@leftmove Hello, could you share some more details on what you've tried with Pandas? Regex is not getting it done for me, so I'm thinking of making an attempt at NLP. |
@chickenleaf Sorry to disappoint, but I didn't get far. Pandas needs a consistent layout; if you can achieve one, it's fairly easy to get structured data (see this StackOverflow question, among others if you do some Googling). Getting a consistent layout, though, is obviously easier said than done. NLP sounds like it could definitely work, but from what I know (I'm a novice) it will take some work. If you need training data, just let me know and I will gather as much as I can. Otherwise, I don't really know how to help. I commend you for taking on such a brave task, and hopefully it does not take too much of your time. I will try to help in any way I can. |
Hi @leftmove, I googled this and came to the conclusion that ML will take a lot of work here, and I think regex will work. I'm trying to solve it using regex, so let's see where it leads. If the issue gets solved, congrats to us; otherwise, at least we learn something. |
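For anyone following the regex route, here is a minimal sketch of the simplest version of the idea: treat any run of two or more spaces as a column separator. The sample table, the company names, and the helper names `split_rows` and `to_records` are all hypothetical and not taken from wallstreetlocal or from a real filing. It only works when every pair of adjacent fields is separated by at least two spaces, which the real data may well violate.

```python
import re

# Hypothetical sample table; the names and numbers are made up for illustration.
SAMPLE = """\
NAME OF ISSUER   TITLE OF CLASS  CUSIP      VALUE
ACME CORP        COM             123456789  1200
GLOBEX HOLDINGS  CL B            987654321  450
"""


def split_rows(text):
    """Split each non-empty line into fields on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


def to_records(text):
    """Use the first row as the header and return one dict per data row.

    Rows whose field count does not match the header are skipped rather
    than guessed at.
    """
    header, *body = split_rows(text)
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]


records = to_records(SAMPLE)
for record in records:
    print(record)
```

Single spaces inside a field (such as `CL B`) survive because only runs of two or more spaces are treated as separators. Skipping mismatched rows is a deliberate choice in this sketch: silently misaligned data seems worse than missing data.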
Hi @leftmove , I'm trying this as a first issue. However, while working, I noticed that in the |
A few other questions:
|
The most recent addition to wallstreetlocal was the ability to query XML files along with HTML files. The only format left to support is plain text (TXT).
The SEC's XML and HTML stock data was barely structured enough to be queried accurately, but TXT poses an even harder challenge. The problem is the inconsistency. While tables in TXT can be read fairly easily by human eyes, they are too dissimilar from one another to query effectively.
Here are some minified examples.
The column sizes, names, and overall formatting of each table change too often for any meaningful code to be written. Without writing a gargantuan amount of code, or using AI (which is expensive), there doesn't seem to be a good way to query stocks like this.
There should be a better, more effective method for taking the TXT tables and creating usable, structured data.
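One possible starting point, sketched under the assumption that columns within a single table are at least visually aligned: find the character positions that are blank in every row and treat them as column boundaries. The sample rows and the helper names `find_column_spans` and `parse_table` are hypothetical, not part of the existing code, and the sample is made up rather than copied from a filing.

```python
# Hypothetical aligned table; note that some columns are separated by one space only.
SAMPLE = [
    "NAME OF ISSUER  CLASS CUSIP     VALUE",
    "ACME CORP       COM   123456789  1200",
    "GLOBEX HOLDINGS CL B  987654321   450",
]


def find_column_spans(lines):
    """Return (start, end) spans of character positions used by at least one row.

    A position that is blank in every row is treated as a gutter between columns.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    occupied = [any(row[i] != " " for row in padded) for i in range(width)]

    spans, start = [], None
    for i, filled in enumerate(occupied):
        if filled and start is None:
            start = i                      # a column begins
        elif not filled and start is not None:
            spans.append((start, i))       # a column ends at the gutter
            start = None
    if start is not None:
        spans.append((start, width))       # last column runs to the end of the line
    return spans


def parse_table(lines):
    """Slice every row by the inferred spans; the first row is the header."""
    lines = [line for line in lines if line.strip()]
    spans = find_column_spans(lines)
    rows = [[line[a:b].strip() for a, b in spans] for line in lines]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE)
for record in records:
    print(record)
```

This avoids hard-coding column widths or names, since both are inferred per table. It fails when a space inside a field happens to line up across every row, or when a stray line (a page header, a wrapped name) fills a gutter, so real filings would likely need filtering before this step.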