
I have been working through CS224d recently, so this post walks through the derivation of the LSTM (Long Short-Term Memory) network and a simple implementation in Python. An LSTM is a recurrent neural network, a variant of the plain RNN, that is well suited to processing and predicting events in a time series that are separated by very long gaps. Suppose we try to predict the final word of 'I grew up in France ... (long gap) ... I speak fluent French'. The nearby context suggests that the next word is probably the name of a language (because of 'speak'), but to predict 'French' precisely we need 'France', which appears far back, as context. When this gap grows large, a plain RNN struggles, whereas an LSTM does not.

How LSTM Works

To understand how LSTM is implemented, I downloaded Alex's original paper, but the figures and formulas there left me dizzy, and in the end I had to collect material from around the web before it finally clicked. I will not cover the RNN that LSTM builds on here; readers who are not familiar with it should read up on it first.

The LSTM Forward Pass

First, look at a diagram of the internals of an LSTM node:

(Figure from a blog post explaining LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
In my opinion this is the clearest LSTM node diagram available online (far easier to follow than the figures in the paper). The forward pass is simply a matter of reading the diagram; the key function nodes are labeled in the figure, and we ignore one of the tanh computations here.

$$
\begin{aligned}
g^{(t)} &= \phi\big(W_{gx} x^{(t)} + W_{gh} h^{(t-1)} + b_g\big) \\
i^{(t)} &= \sigma\big(W_{ix} x^{(t)} + W_{ih} h^{(t-1)} + b_i\big) \\
f^{(t)} &= \sigma\big(W_{fx} x^{(t)} + W_{fh} h^{(t-1)} + b_f\big) \\
o^{(t)} &= \sigma\big(W_{ox} x^{(t)} + W_{oh} h^{(t-1)} + b_o\big) \\
s^{(t)} &= g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \\
h^{(t)} &= s^{(t)} \odot o^{(t)}
\end{aligned}
$$

Here $\phi(x) = \tanh(x)$ and $\sigma(x) = \frac{1}{1 + e^{-x}}$.
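These two nonlinearities have the convenient derivatives $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ and $\phi'(x) = 1 - \phi(x)^2$, which the backward pass later in this post relies on. A minimal NumPy sketch (not part of the original code) that checks both identities numerically:

import numpy as np

# Numerical check of the derivative identities used later in the backward pass:
# sigma'(x) = sigma(x) * (1 - sigma(x)) and phi'(x) = 1 - phi(x)**2.
def sigmoid(x):
    return 1. / (1 + np.exp(-x))

x = np.linspace(-3, 3, 7)
eps = 1e-6
num_dsig = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # central difference
num_dtanh = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)

assert np.allclose(num_dsig, sigmoid(x) * (1 - sigmoid(x)), atol=1e-8)
assert np.allclose(num_dtanh, 1 - np.tanh(x) ** 2, atol=1e-8)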

$x^{(t)}$ and $h^{(t)}$ are our input sequence and output sequence, respectively. If we concatenate the two vectors $x^{(t)}$ and $h^{(t-1)}$ into one:

$$x_c^{(t)} = [x^{(t)}, h^{(t-1)}]$$


then the system of equations above can be rewritten as:

$$
\begin{aligned}
g^{(t)} &= \phi\big(W_g x_c^{(t)} + b_g\big) \\
i^{(t)} &= \sigma\big(W_i x_c^{(t)} + b_i\big) \\
f^{(t)} &= \sigma\big(W_f x_c^{(t)} + b_f\big) \\
o^{(t)} &= \sigma\big(W_o x_c^{(t)} + b_o\big) \\
s^{(t)} &= g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \\
h^{(t)} &= s^{(t)} \odot o^{(t)}
\end{aligned}
$$
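Here each $W_\ast$ is simply the corresponding $[W_{\ast x}, W_{\ast h}]$ stacked side by side. As a quick sanity check, here is a minimal NumPy sketch (with made-up dimensions, not part of the original code) showing that the concatenated form gives exactly the same result as the two-matrix form:

import numpy as np

# Check that W_g . xc(t) equals W_gx . x(t) + W_gh . h(t-1) when W_g = [W_gx | W_gh].
# Dimensions here are arbitrary and only for illustration.
np.random.seed(0)
x_dim, hidden_dim = 3, 4
x = np.random.randn(x_dim)             # x(t)
h_prev = np.random.randn(hidden_dim)   # h(t-1)

w_gx = np.random.randn(hidden_dim, x_dim)
w_gh = np.random.randn(hidden_dim, hidden_dim)
w_g = np.hstack((w_gx, w_gh))          # concatenated weight matrix
xc = np.hstack((x, h_prev))            # xc(t) = [x(t), h(t-1)]

assert np.allclose(np.dot(w_g, xc), np.dot(w_gx, x) + np.dot(w_gh, h_prev))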

In these equations, $f^{(t)}$ is called the forget gate: it decides what information to discard from the previous state. $i^{(t)}$ and $g^{(t)}$ together form the input gate, which decides what new information gets stored in the cell state. $o^{(t)}$ is the output gate, which decides what value to output. This description is not entirely precise; interested readers can refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/, since I am not an expert on the NLP side myself.

The code for the forward pass is as follows:

def bottom_data_is(self, x, s_prev=None, h_prev=None):
    # if this is the first lstm node in the network
    if s_prev is None: s_prev = np.zeros_like(self.state.s)
    if h_prev is None: h_prev = np.zeros_like(self.state.h)
    # save data for use in backprop
    self.s_prev = s_prev
    self.h_prev = h_prev
    # concatenate x(t) and h(t-1)
    xc = np.hstack((x, h_prev))
    self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
    self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
    self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
    self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
    self.state.s = self.state.g * self.state.i + s_prev * self.state.f
    self.state.h = self.state.s * self.state.o
    self.x = x
    self.xc = xc
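As a minimal usage sketch (assuming the LstmParam, LstmState and LstmNode classes from the complete example later in this post), a single forward step on a random input looks like this:

import numpy as np

# Assumes LstmParam, LstmState, LstmNode as defined in the complete example below.
mem_cell_ct, x_dim = 8, 4
param = LstmParam(mem_cell_ct, x_dim)
node = LstmNode(param, LstmState(mem_cell_ct, x_dim))

x = np.random.random(x_dim)
node.bottom_data_is(x)       # first node: s_prev and h_prev default to zero vectors
print(node.state.h)          # hidden output h(1), shape (mem_cell_ct,)
print(node.state.s)          # cell state s(1)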

The LSTM Backward Pass

The LSTM forward pass is fairly easy; the backward pass is more involved. We first define a loss function $l^{(t)} = f(h^{(t)}, y^{(t)}) = \lVert h^{(t)} - y^{(t)} \rVert^2$,

where $h^{(t)}$ and $y^{(t)}$ are the output sequence and the sample labels, respectively. What we want is to minimize $l^{(t)}$ over the whole time series, i.e. to minimize

$$L = \sum_{t=1}^{T} l^{(t)}$$

where $T$ is the length of the sequence. We now compute gradients through $L$. Suppose we want $\frac{dL}{dw}$, where $w$ is a scalar (for example one element of the matrix $W_{gx}$). By the chain rule,

$$\frac{dL}{dw} = \sum_{t=1}^{T} \sum_{i=1}^{M} \frac{dL}{dh_i^{(t)}} \frac{dh_i^{(t)}}{dw}$$

where $h_i^{(t)}$ is the output of the $i$-th unit and $M$ is the number of LSTM units. The network propagates forward in time $t$, so a change in $h_i^{(t)}$ does not affect the loss before time $t$, and we can write:

$$\frac{dL}{dh_i^{(t)}} = \sum_{s=1}^{T} \frac{dl^{(s)}}{dh_i^{(t)}} = \sum_{s=t}^{T} \frac{dl^{(s)}}{dh_i^{(t)}}$$

To simplify notation, let $L^{(t)} = \sum_{s=t}^{T} l^{(s)}$, so that $L^{(1)}$ is the loss of the whole sequence. Rewriting the expression above:

$$\frac{dL}{dh_i^{(t)}} = \sum_{s=t}^{T} \frac{dl^{(s)}}{dh_i^{(t)}} = \frac{dL^{(t)}}{dh_i^{(t)}}$$

The gradient can then be rewritten as:

$$\frac{dL}{dw} = \sum_{t=1}^{T} \sum_{i=1}^{M} \frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{dw}$$

We know that $L^{(t)} = l^{(t)} + L^{(t+1)}$, so $\frac{dL^{(t)}}{dh_i^{(t)}} = \frac{dl^{(t)}}{dh_i^{(t)}} + \frac{dL^{(t+1)}}{dh_i^{(t)}}$. In other words, once we have the derivative at the next time step we can directly obtain the derivative at the current one, so we compute the derivative at time $T$ and work backwards. At time $T$ we have $\frac{dL^{(T)}}{dh_i^{(T)}} = \frac{dl^{(T)}}{dh_i^{(T)}}$. The y_list_is method below implements exactly this backward sweep:

def y_list_is(self, y_list, loss_layer):
    """
    Updates diffs by setting target sequence
    with corresponding loss layer.
    Will *NOT* update parameters.  To update parameters,
    call self.lstm_param.apply_diff()
    """
    assert len(y_list) == len(self.x_list)
    idx = len(self.x_list) - 1
    # first node only gets diffs from label ...
    loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
    diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
    # here s is not affecting loss due to h(t+1), hence we set equal to zero
    diff_s = np.zeros(self.lstm_param.mem_cell_ct)
    self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
    idx -= 1

    ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
    ### we also propagate error along constant error carousel using diff_s
    while idx >= 0:
        loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
        diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

    return loss

With the formulas above, the computation of diff_h is easy to follow. The loss_layer.bottom_diff used here is defined as follows:

def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[0] = 2 * (pred[0] - label)
    return diff
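As a sanity check (a sketch that assumes the ToyLossLayer from the test code below, whose bottom_diff is exactly this function), we can compare it against a numerical gradient of the loss:

import numpy as np

# Finite-difference check that bottom_diff matches the gradient of
# loss(pred, label) = (pred[0] - label)**2 with respect to pred.
pred = np.array([0.3, -0.1, 0.7])
label = 0.5
eps = 1e-6

analytic = ToyLossLayer.bottom_diff(pred, label)
numeric = np.zeros_like(pred)
for k in range(len(pred)):
    p_plus, p_minus = pred.copy(), pred.copy()
    p_plus[k] += eps
    p_minus[k] -= eps
    numeric[k] = (ToyLossLayer.loss(p_plus, label) - ToyLossLayer.loss(p_minus, label)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)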

This matches the loss function defined above. Next we derive $\frac{dL^{(t)}}{ds^{(t)}}$. From the forward equations it is easy to see that a change in $s^{(t)}$ directly affects both $h^{(t)}$ and $h^{(t+1)}$, and therefore $L^{(t)}$, so:

$$\frac{dL^{(t)}}{ds_i^{(t)}} = \frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{ds_i^{(t)}} + \frac{dL^{(t)}}{dh_i^{(t+1)}} \frac{dh_i^{(t+1)}}{ds_i^{(t)}}$$

Since $h^{(t+1)}$ does not affect $l^{(t)}$, we have $\frac{dL^{(t)}}{dh_i^{(t+1)}} = \frac{dL^{(t+1)}}{dh_i^{(t+1)}}$, and therefore:

$$\frac{dL^{(t)}}{ds_i^{(t)}} = \frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{ds_i^{(t)}} + \frac{dL^{(t+1)}}{dh_i^{(t+1)}} \frac{dh_i^{(t+1)}}{ds_i^{(t)}} = \frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{ds_i^{(t)}} + \frac{dL^{(t+1)}}{ds_i^{(t)}}$$

In the same way, each earlier derivative can be recovered step by step from the later ones; in the code, this is exactly how diff_s is computed.

Next we compute $\frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{ds_i^{(t)}}$. Since $h^{(t)} = s^{(t)} \odot o^{(t)}$, we have $\frac{dL^{(t)}}{dh_i^{(t)}} \frac{dh_i^{(t)}}{ds_i^{(t)}} = \frac{dL^{(t)}}{dh_i^{(t)}} o_i^{(t)} = o_i^{(t)} [\mathrm{diff\_h}]_i$, and therefore $\frac{dL^{(t)}}{ds_i^{(t)}} = o_i^{(t)} [\mathrm{diff\_h}]_i + [\mathrm{diff\_s}]_i$, where $[\mathrm{diff\_h}]_i$ denotes $\frac{dL^{(t)}}{dh_i^{(t)}}$ at the current time step $t$ and $[\mathrm{diff\_s}]_i$ denotes $\frac{dL^{(t+1)}}{ds_i^{(t)}}$ propagated back from time step $t+1$. Again, this should be easy to follow alongside the code above.

Next, following the forward equations, we compute the remaining derivatives one by one:

$$
\begin{aligned}
\frac{dL^{(t)}}{do^{(t)}} &= \frac{dL^{(t)}}{dh^{(t)}} \odot s^{(t)} \\
\frac{dL^{(t)}}{di^{(t)}} &= \frac{dL^{(t)}}{ds^{(t)}} \odot \frac{ds^{(t)}}{di^{(t)}} = \frac{dL^{(t)}}{ds^{(t)}} \odot g^{(t)} \\
\frac{dL^{(t)}}{dg^{(t)}} &= \frac{dL^{(t)}}{ds^{(t)}} \odot \frac{ds^{(t)}}{dg^{(t)}} = \frac{dL^{(t)}}{ds^{(t)}} \odot i^{(t)} \\
\frac{dL^{(t)}}{df^{(t)}} &= \frac{dL^{(t)}}{ds^{(t)}} \odot \frac{ds^{(t)}}{df^{(t)}} = \frac{dL^{(t)}}{ds^{(t)}} \odot s^{(t-1)}
\end{aligned}
$$

This gives the following code:

def top_diff_is(self, top_diff_h, top_diff_s):
    # notice that top_diff_s is carried along the constant error carousel
    ds = self.state.o * top_diff_h + top_diff_s
    do = self.state.s * top_diff_h
    di = self.state.g * ds
    dg = self.state.i * ds
    df = self.s_prev * ds

    # diffs w.r.t. vector inside sigma / tanh function
    di_input = (1. - self.state.i) * self.state.i * di  # sigmoid diff
    df_input = (1. - self.state.f) * self.state.f * df
    do_input = (1. - self.state.o) * self.state.o * do
    dg_input = (1. - self.state.g ** 2) * dg  # tanh diff

    # diffs w.r.t. inputs
    self.param.wi_diff += np.outer(di_input, self.xc)
    self.param.wf_diff += np.outer(df_input, self.xc)
    self.param.wo_diff += np.outer(do_input, self.xc)
    self.param.wg_diff += np.outer(dg_input, self.xc)
    self.param.bi_diff += di_input
    self.param.bf_diff += df_input
    self.param.bo_diff += do_input
    self.param.bg_diff += dg_input

    # compute bottom diff
    dxc = np.zeros_like(self.xc)
    dxc += np.dot(self.param.wi.T, di_input)
    dxc += np.dot(self.param.wf.T, df_input)
    dxc += np.dot(self.param.wo.T, do_input)
    dxc += np.dot(self.param.wg.T, dg_input)

    # save bottom diffs
    self.state.bottom_diff_s = ds * self.state.f
    self.state.bottom_diff_x = dxc[:self.param.x_dim]
    self.state.bottom_diff_h = dxc[self.param.x_dim:]

Here top_diff_h and top_diff_s are the diff_h and diff_s from above. Let us go through how wi_diff is obtained; the other parameters are handled analogously.

$$\frac{dL^{(t)}}{dW_i} = \frac{dL^{(t)}}{di^{(t)}} \cdot \frac{di^{(t)}}{d(W_i x_c^{(t)})} \cdot \frac{d(W_i x_c^{(t)})}{dW_i}$$

Simplifying the expression above gives the following line of code:

        wi_diff += np.outer((1.-i)*i*di, xc)

The other derivatives can be obtained in the same way, so we will not repeat them here. A quick finite-difference check is an easy way to convince yourself the whole derivation is correct; a sketch follows.
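Here is such a check for a single element of wi_diff over a one-step sequence (a sketch that assumes the LstmParam, LstmState, LstmNode classes and the ToyLossLayer defined below; it is not part of the original code):

import numpy as np

# Compare the analytic gradient of the loss w.r.t. wi[0, 0] (accumulated in
# param.wi_diff by top_diff_is) against a central finite difference.
np.random.seed(1)
mem_cell_ct, x_dim = 5, 3
param = LstmParam(mem_cell_ct, x_dim)
x = np.random.random(x_dim)
y = 0.7

def forward_loss(p):
    node = LstmNode(p, LstmState(mem_cell_ct, x_dim))
    node.bottom_data_is(x)          # single time step, so s_prev = h_prev = 0
    return ToyLossLayer.loss(node.state.h, y), node

# analytic gradient from one forward + one backward pass
_, node = forward_loss(param)
node.top_diff_is(ToyLossLayer.bottom_diff(node.state.h, y), np.zeros(mem_cell_ct))
analytic = param.wi_diff[0, 0]

# numerical gradient for the same weight element
eps = 1e-6
param.wi[0, 0] += eps
loss_plus, _ = forward_loss(param)
param.wi[0, 0] -= 2 * eps
loss_minus, _ = forward_loss(param)
param.wi[0, 0] += eps               # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

assert abs(analytic - numeric) < 1e-5

If the derivation and the code agree, the analytic and numerical values match to within floating-point noise; the same pattern works for any other weight or bias element.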

A Complete LSTM Example

# lstm.py: the complete LSTM implementation described above
import random
import numpy as np
import math

def sigmoid(x):
    return 1. / (1 + np.exp(-x))

# creates uniform random array w/ values in [a,b) and shape args
def rand_arr(a, b, *args):
    np.random.seed(0)
    return np.random.rand(*args) * (b - a) + a

class LstmParam:
    def __init__(self, mem_cell_ct, x_dim):
        self.mem_cell_ct = mem_cell_ct
        self.x_dim = x_dim
        concat_len = x_dim + mem_cell_ct
        # weight matrices
        self.wg = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wf = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wo = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        # bias terms
        self.bg = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bi = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bf = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bo = rand_arr(-0.1, 0.1, mem_cell_ct)
        # diffs (derivative of loss function w.r.t. all parameters)
        self.wg_diff = np.zeros((mem_cell_ct, concat_len))
        self.wi_diff = np.zeros((mem_cell_ct, concat_len))
        self.wf_diff = np.zeros((mem_cell_ct, concat_len))
        self.wo_diff = np.zeros((mem_cell_ct, concat_len))
        self.bg_diff = np.zeros(mem_cell_ct)
        self.bi_diff = np.zeros(mem_cell_ct)
        self.bf_diff = np.zeros(mem_cell_ct)
        self.bo_diff = np.zeros(mem_cell_ct)

    def apply_diff(self, lr=1):
        self.wg -= lr * self.wg_diff
        self.wi -= lr * self.wi_diff
        self.wf -= lr * self.wf_diff
        self.wo -= lr * self.wo_diff
        self.bg -= lr * self.bg_diff
        self.bi -= lr * self.bi_diff
        self.bf -= lr * self.bf_diff
        self.bo -= lr * self.bo_diff
        # reset diffs to zero
        self.wg_diff = np.zeros_like(self.wg)
        self.wi_diff = np.zeros_like(self.wi)
        self.wf_diff = np.zeros_like(self.wf)
        self.wo_diff = np.zeros_like(self.wo)
        self.bg_diff = np.zeros_like(self.bg)
        self.bi_diff = np.zeros_like(self.bi)
        self.bf_diff = np.zeros_like(self.bf)
        self.bo_diff = np.zeros_like(self.bo)

class LstmState:
    def __init__(self, mem_cell_ct, x_dim):
        self.g = np.zeros(mem_cell_ct)
        self.i = np.zeros(mem_cell_ct)
        self.f = np.zeros(mem_cell_ct)
        self.o = np.zeros(mem_cell_ct)
        self.s = np.zeros(mem_cell_ct)
        self.h = np.zeros(mem_cell_ct)
        self.bottom_diff_h = np.zeros_like(self.h)
        self.bottom_diff_s = np.zeros_like(self.s)
        self.bottom_diff_x = np.zeros(x_dim)

class LstmNode:
    def __init__(self, lstm_param, lstm_state):
        # store reference to parameters and to activations
        self.state = lstm_state
        self.param = lstm_param
        # non-recurrent input to node
        self.x = None
        # non-recurrent input concatenated with recurrent input
        self.xc = None

    def bottom_data_is(self, x, s_prev=None, h_prev=None):
        # if this is the first lstm node in the network
        if s_prev is None: s_prev = np.zeros_like(self.state.s)
        if h_prev is None: h_prev = np.zeros_like(self.state.h)
        # save data for use in backprop
        self.s_prev = s_prev
        self.h_prev = h_prev
        # concatenate x(t) and h(t-1)
        xc = np.hstack((x, h_prev))
        self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
        self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
        self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
        self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
        self.state.s = self.state.g * self.state.i + s_prev * self.state.f
        self.state.h = self.state.s * self.state.o
        self.x = x
        self.xc = xc

    def top_diff_is(self, top_diff_h, top_diff_s):
        # notice that top_diff_s is carried along the constant error carousel
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds
        # diffs w.r.t. vector inside sigma / tanh function
        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        dg_input = (1. - self.state.g ** 2) * dg
        # diffs w.r.t. inputs
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input
        # compute bottom diff
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)
        # save bottom diffs
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]

class LstmNetwork():
    def __init__(self, lstm_param):
        self.lstm_param = lstm_param
        self.lstm_node_list = []
        # input sequence
        self.x_list = []

    def y_list_is(self, y_list, loss_layer):
        """
        Updates diffs by setting target sequence
        with corresponding loss layer.
        Will *NOT* update parameters.  To update parameters,
        call self.lstm_param.apply_diff()
        """
        assert len(y_list) == len(self.x_list)
        idx = len(self.x_list) - 1
        # first node only gets diffs from label ...
        loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # here s is not affecting loss due to h(t+1), hence we set equal to zero
        diff_s = np.zeros(self.lstm_param.mem_cell_ct)
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

        ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
        ### we also propagate error along constant error carousel using diff_s
        while idx >= 0:
            loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
            diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
            self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
            idx -= 1

        return loss

    def x_list_clear(self):
        self.x_list = []

    def x_list_add(self, x):
        self.x_list.append(x)
        if len(self.x_list) > len(self.lstm_node_list):
            # need to add new lstm node, create new state mem
            lstm_state = LstmState(self.lstm_param.mem_cell_ct, self.lstm_param.x_dim)
            self.lstm_node_list.append(LstmNode(self.lstm_param, lstm_state))

        # get index of most recent x input
        idx = len(self.x_list) - 1
        if idx == 0:
            # no recurrent inputs yet
            self.lstm_node_list[idx].bottom_data_is(x)
        else:
            s_prev = self.lstm_node_list[idx - 1].state.s
            h_prev = self.lstm_node_list[idx - 1].state.h
            self.lstm_node_list[idx].bottom_data_is(x, s_prev, h_prev)

Test Code


import numpy as np
from lstm import LstmParam, LstmNetwork

class ToyLossLayer:
    """
    Computes square loss with first element of hidden layer array.
    """
    @classmethod
    def loss(self, pred, label):
        return (pred[0] - label) ** 2

    @classmethod
    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)
    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 100
    x_dim = 50
    concat_len = x_dim + mem_cell_ct
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-0.5, 0.2, 0.1, -0.5]
    input_val_arr = [np.random.random(x_dim) for _ in y_list]

    for cur_iter in range(100):
        print("cur iter: ", cur_iter)
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])
            print("y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0]))
        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        print("loss: ", loss)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()
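If the implementation above is saved as lstm.py (which the "from lstm import LstmParam, LstmNetwork" line assumes), running this script should show the loss shrinking over the 100 iterations as h[0] at each step approaches its target. As a small, hypothetical extension (not part of the original code), a helper like the following could be called after the training loop to print the fitted values next to the targets:

# Hypothetical helper (assumed name report_fit): run the trained network over the
# same inputs once more and print the predicted h[0] next to each target value.
def report_fit(lstm_net, input_val_arr, y_list):
    lstm_net.x_list_clear()
    for ind in range(len(y_list)):
        lstm_net.x_list_add(input_val_arr[ind])
    for ind, y in enumerate(y_list):
        print("target %6.3f -> predicted %6.3f" % (y, lstm_net.lstm_node_list[ind].state.h[0]))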