Proposal and some questions #28

Open
1 task done
tphlru opened this issue Feb 2, 2024 · 1 comment

tphlru commented Feb 2, 2024

Proposal and some questions

Hello. I really like your project, and I have read through the README and the tutorials in detail. I have a few questions and some suggestions for improvement:

  • What is the difference between raw, raw alpha and the other folders in d_price?
  • Which of these folders should I put my OHLCV data in, and what file names should I give the files (or how can I change this in the program)?
  • As far as I understand, you get news data from Twitter (is that correct?). Suggestion: add basic functionality for parsing data from the listed RSS feeds.
  • How do I calculate technical indicators with your script using my own OHLCV data? (Some cryptocurrencies are missing on Alpha Vantage and Yahoo.)
  • As far as I understand, you use Yahoo data from the last 6-7 days for forecasting. If I use another data source, what frequency should the data have (1 minute, ticks, 10 minutes)? How often should it be updated relative to the present (offset)?
  • Forecasting doesn't work at all for overnight and evening data, since those periods are missing from the datasets. Do you have any thoughts on this, or plans to fix it?

Checklist

  • I have checked that there is no similar issue in the repo (required)

Thanks, I would be happy to get feedback.

tphlru changed the title from "[Proposal] Proposal title" to "Proposal and some questions" on Feb 3, 2024

Leci37 (Owner) commented May 17, 2024

Thanks for your questions.

What is the difference between raw, raw alpha and other folders in d_price?
RAW holds data downloaded from the Yahoo API, and RAW_alpha holds data downloaded with 0_API_alphavantage_get_old_history.py. min_max is deprecated and I have removed it. I have also created the alpaca folder for data downloaded from the Alpaca API with 0_API_alpaca_historical.py.
Note: in my opinion, 0_API_alpaca_historical.py works better (but you have to register with Alpaca).

In which of these folders should I put my OHLCV data, and what file names should I give them (or how can I change this in the program?)?
Search the code for the string “d_price/”. I recommend changing it in each file where it appears, since we can't know which OHLCV data provider you use (the Yahoo, Alpha Vantage and Alpaca APIs are supported by default).
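
To find every place that needs changing, a small search script can help. This is only a minimal sketch and not part of the repository: the function name `find_path_references` is made up for illustration, and it assumes you run it against a local clone of the project.

```python
from pathlib import Path


def find_path_references(root, needle="d_price/"):
    """Return (file, line number, line text) for each line in a .py file under root that contains needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_path_references(".")` from the root of the clone and printing the result gives a checklist of the files whose paths you would need to point at your own data folder.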

From what I understand, you get the news data from Twitter (is that correct?). Suggestion: add basic functionality to parse data from listed RSS feeds.
News is obtained from Yahoo and finviz.com (see https://github.com/Leci37/TensorFlow-stocks-prediction-Machine-learning-RealTime/blob/master/news_sentiment/news_get_data_NUTS.py ). There is NO NEWS FROM TWITTER in this version.
Although news is collected, the TensorFlow models do not use it as of today, because the providers offer too little data and of poor quality.
Moreover, language processing is an engineering discipline of its own, and it would take a team of roughly 8 people to get something decent. On top of that, there is not enough news, and each news provider requires its own extraction bootstrap.
If you are not a team, it is better to focus on technical patterns.

How do I calculate technical indicators with your script using my OHLCV data (some cryptocurrencies are missing on Alpha Vantage and Yahoo)?
Download the data into d_price with your own code, then take a look at these two lines:

from features_W3_old import extract_features_v3
df_bars = extract_features_v3(df_raw, extra_columns=False)  # works; technical indicator count: 292

https://github.com/Leci37/TensorFlow-stocks-prediction-Machine-learning-RealTime/blob/master/Tutorial/RUN_buy_sell_Tutorial_3W_5min_RT.py

From what I understand, you are using Yahoo data from the last 6-7 days for the forecast. If I use another data source, what frequency should this data have (1 minute, ticks, 10 minutes)? How often should this data be updated with respect to the present (offset)?
I have already tried 1 min, 5 min, 15 min and one hour, and they are all a nightmare. The frequency has to be provided by your OHLCV data provider (Yahoo, Alpaca). In my opinion, daily data works slightly better.

The forecast does not work at all for the overnight and evening data, as it was missing from the data sets. Do you have any idea or intention to fix this?
I don't understand what you mean. If you are referring to the aftermarket and premarket sessions, that data is hard to work with because the most relevant piece, the volume, disappears. How do you feed the model data in which the volume disappears and makes the technical indicators go crazy?
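
If extended-hours bars are the problem, one common workaround is to drop them before computing indicators. The sketch below is not something the repository is known to do. It assumes a US equity regular session of 09:30 to 16:00, timestamps already in exchange-local time, and bars stored as plain dictionaries; the names `keep_regular_session`, `SESSION_OPEN` and `SESSION_CLOSE` are hypothetical.

```python
from datetime import datetime, time

# Assumption: US equity regular session, timestamps already in exchange-local time.
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def keep_regular_session(bars, min_volume=1):
    """Keep bars that start inside the regular session and trade at least min_volume."""
    kept = []
    for bar in bars:
        starts_in_session = SESSION_OPEN <= bar["timestamp"].time() < SESSION_CLOSE
        if starts_in_session and bar["volume"] >= min_volume:
            kept.append(bar)
    return kept


# Made-up sample bars for illustration only.
bars = [
    {"timestamp": datetime(2024, 2, 2, 8, 0), "close": 100.0, "volume": 0},     # premarket
    {"timestamp": datetime(2024, 2, 2, 9, 30), "close": 100.5, "volume": 1200},
    {"timestamp": datetime(2024, 2, 2, 15, 55), "close": 101.2, "volume": 900},
    {"timestamp": datetime(2024, 2, 2, 17, 0), "close": 101.0, "volume": 0},    # aftermarket
]
regular = keep_regular_session(bars)
```

This filter does not apply to cryptocurrencies, which trade around the clock; for those, only the minimum-volume check is likely to be relevant.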

I recommend looking at my other project, which works with technical indicators and simply refines their threshold values using a decision tree (for example, "this stock works 89% of the time with the RSI below 11.34"). https://github.com/Leci37/Strategy-stock-Random-Forest-ML-sklearn-TraderView
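
To make the idea of an indicator threshold concrete, here is a minimal sketch of how a rule such as "RSI below some value" could be scored. It is not code from either repository: it uses the standard Wilder RSI, and the helper names `rsi` and `hit_rate_below`, the 14-bar period and the one-bar horizon are all assumptions. A decision tree would search over many such thresholds; this only evaluates one.

```python
def _to_rsi(avg_gain, avg_loss):
    # With no losses in the window the RSI is defined here as 100.
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def rsi(closes, period=14):
    """Wilder's RSI, aligned with closes; None until enough bars are available."""
    if len(closes) <= period:
        return [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = [None] * period + [_to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        # Wilder smoothing: each new bar gets weight 1/period.
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(_to_rsi(avg_gain, avg_loss))
    return values


def hit_rate_below(closes, rsi_values, threshold, horizon=1):
    """Fraction of bars with RSI below threshold whose close is higher horizon bars later."""
    wins = total = 0
    for i, value in enumerate(rsi_values):
        if value is None or value >= threshold or i + horizon >= len(closes):
            continue
        total += 1
        if closes[i + horizon] > closes[i]:
            wins += 1
    return wins / total if total else None


# Made-up prices: a steady fall followed by a bounce.
closes = [20.0 - i for i in range(16)] + [10.0]
values = rsi(closes)
print(hit_rate_below(closes, values, threshold=10.0))
```

On real data the hit rate should be measured on bars that were not used to pick the threshold, otherwise the figure will look better than it is.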
