本文最后更新于 524 天前，其中的信息可能已经过时，如有错误请发送邮件到wuxianglongblog@163.com

Theano 在 Windows 上的配置

注意：不建议在 windows 进行 theano 的配置。

务必确认你的显卡支持 CUDA。

我个人的电脑搭载的是 Windows 10 x64 系统，显卡是 Nvidia GeForce GTX 850M。

安装 theano

首先是用 anaconda 安装 theano：

conda install mingw libpython
pip install theano

安装 VS 和 CUDA

按顺序安装这两个软件：

安装 Visual Studio 2010/2012/2013
安装对应的 x64 或 x86 CUDA

Cuda 的版本与电脑的显卡兼容。

我安装的是 Visual Studio 2012 和 CUDA v7.0v。

配置环境变量

CUDA 会自动帮你添加一个 CUDA_PATH 环境变量（环境变量在控制面板->系统与安全->系统->高级系统设置中），表示你的 CUDA 安装位置，我的电脑上为：

CUDA_PATH
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0

我们配置两个相关变量：

CUDA_BIN_PATH
- %CUDA_PATH%\bin
CUDA_LIB_PATH
- %CUDA_PATH%\lib\Win32

接下来在 Path 环境变量的后面加上：

Minicoda 中关于 mingw 的项：
- C:\Miniconda\MinGW\bin;
- C:\Miniconda\MinGW\x86_64-w64-mingw32\lib;
VS 中的 cl 编译命令：
- C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin;
- C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE;

生成测试文件：

%%file test_theano.py
from theano import config
print 'using device:', config.device

Writing test_theano.py

我们可以通过临时设置环境变量 THEANO_FLAGS 来改变 theano 的运行模式，在 linux 下，临时环境变量直接用：

THEANO_FLAGS=xxx

就可以完成，设置完成之后，该环境变量只在当前的命令窗口有效，你可以这样运行你的代码：

THEANO_FLAGS=xxx python .py

在 Windows 下，需要使用 set 命令来临时设置环境变量，所以运行方式为：

set THEANO_FLAGS=xxx && python .py

import sys

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py

using device: cpu

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

Using gpu device 0: Tesla C2075 (CNMeM is disabled)
using device: gpu

测试 CPU 和 GPU 的差异：

%%file test_theano.py

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))

t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Overwriting test_theano.py

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py

Looping 1000 times took 3.498123 seconds

Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761

  1.62323284]

Used the cpu

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.847006 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

可以看到 GPU 明显要比 CPU 快。

使用 GPU 模式的 T.exp(x) 可以获得更快的加速效果：

%%file test_theano.py

from theano import function, config, shared, sandbox
import theano.sandbox.cuda.basic_ops
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), 'float32'))
f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))

t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
print("Numpy result is %s" % (numpy.asarray(r),))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Overwriting test_theano.py

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.318359 seconds
Result is 
Numpy result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

!rm test_theano.py

配置 .theanorc.txt

我们可以在个人文件夹下配置 .theanorc.txt 文件来省去每次都使用环境变量设置的麻烦：

例如我现在的 .theanorc.txt 配置为：

[global]
device = gpu
floatX = float32

[nvcc]
fastmath = True
flags = -LC:\Miniconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin

[gcc]
cxxflags = -LC:\Miniconda\MinGW

具体这些配置有什么作用之后可以查看官网上的教程。

Theano 在 Windows 上的配置

安装 theano

安装 VS 和 CUDA

配置环境变量

配置 .theanorc.txt

发送评论 编辑评论

推荐文章

发送评论编辑评论