预测股价一直是吸引投资者和研究人员的话题。投资者总是猜测股票的价格是否会上涨,因为有许多复杂的财务指标,只有投资者和具有良好财务知识的人才能理解,所以股市的走势对普通百姓来说非常难以琢磨。
对于非专家而言,机器学习是一个很好的机会,它可以准确地预测并获得稳定的财富,并且可以帮助专家获得最有用的指标并做出更好的预测。
本教程的目的是在TensorFlow 2和Keras中构建一个预测股市价格的神经网络。更具体地说,我们将使用LSTM单元构建循环神经网络,因为这是时间序列预测的最新技术。
一、安装环境
pip3 install tensorflow pandas numpy matplotlib yahoo_fin sklearn完成所有设置后,打开一个新的Python文件(或bfwstuio)并导入以下库:
import tensorflow as tf from tensorflow.keras.models import Sequential from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard from sklearn import preprocessing from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score from yahoo_fin import stock_info as si from collections import deque import numpy as np import pandas as pd import matplotlib.pyplot as plt import time import os import random
# set seed, so we can get the same results after rerunning several times np.random.seed(314) tf.random.set_seed(314) random.seed(314)
二、准备数据集
def load_data(ticker, n_steps=50, scale=True, shuffle=True, lookup_step=1, test_size=0.2, feature_columns=['adjclose', 'volume', 'open', 'high', 'low']): """ Loads data from Yahoo Finance source, as well as scaling, shuffling, normalizing and splitting. Params: ticker (str/pd.DataFrame): the ticker you want to load, examples include AAPL, TESL, etc. n_steps (int): the historical sequence length (i.e window size) used to predict, default is 50 scale (bool): whether to scale prices from 0 to 1, default is True shuffle (bool): whether to shuffle the data, default is True lookup_step (int): the future lookup step to predict, default is 1 (e.g next day) test_size (float): ratio for test data, default is 0.2 (20% testing data) feature_columns (list): the list of features to use to feed into the model, default is everything grabbed from yahoo_fin """ # see if ticker is already a loaded stock from yahoo finance if isinstance(ticker, str): # load it from yahoo_fin library df = si.get_data(ticker) elif isinstance(ticker, pd.DataFrame): # already loaded, use it directly df = ticker else: raise TypeError("ticker can be either a str or a `pd.DataFrame` instances") # this will contain all the elements we want to return from this function result = {} # we will also return the original dataframe itself result['df'] = df.copy() # make sure that the passed feature_columns exist in the dataframe for col in feature_columns: assert col in df.columns, f"'{col}' does not exist in the dataframe." if scale: column_scaler = {} # scale the data (prices) from 0 to 1 for column in feature_columns: scaler = preprocessing.MinMaxScaler() df[column] = scaler.fit_transform(np.expand_dims(df[column].values, axis=1)) column_scaler[column] = scaler # add the MinMaxScaler instances to the result returned result["column_scaler"] = column_scaler # add the target column (label) by shifting by `lookup_step` df['future'] = df['adjclose'].shift(-lookup_step) # last `lookup_step` columns contains NaN in future column # get them before droping NaNs last_sequence = np.array(df[feature_columns].tail(lookup_step)) # drop NaNs df.dropna(inplace=True) sequence_data = [] sequences = deque(maxlen=n_steps) for entry, target in zip(df[feature_columns].values, df['future'].values): sequences.append(entry) if len(sequences) == n_steps: sequence_data.append([np.array(sequences), target]) # get the last sequence by appending the last `n_step` sequence with `lookup_step` sequence # for instance, if n_steps=50 and lookup_step=10, last_sequence should be of 60 (that is 50+10) length # this last_sequence will be used to predict future stock prices not available in the dataset last_sequence = list(sequences) + list(last_sequence) last_sequence = np.array(last_sequence) # add to result result['last_sequence'] = last_sequence # construct the X's and y's X, y = [], [] for seq, target in sequence_data: X.append(seq) y.append(target) # convert to numpy arrays X = np.array(X) y = np.array(y) # reshape X to fit the neural network X = X.reshape((X.shape[0], X.shape[2], X.shape[1])) # split the dataset result["X_train"], result["X_test"], result["y_train"], result["y_test"] = train_test_split(X, y, test_size=test_size, shuffle=shuffle) # return the result return result
三、模型制作
点击查看剩余70%
网友评论0