The previous two posts, 让代码飞起来——高性能 Julia 学习笔记(一) and 让代码飞起来——高性能 Julia 学习笔记(二) (Make Your Code Fly: High-Performance Julia Notes, parts 1 and 2), covered how to write high-performance Julia code. In this post, drawing on a recent project of mine, I'll run a simple comparison of how efficiently various languages estimate pi with a Monte Carlo method.
A disclaimer up front: this is not a rigorous benchmark, and I have no intention of starting a language war. My abilities are limited, and the implementations in the different languages are probably not the most efficient possible; they are all essentially naive implementations.
If you are not familiar with the Monte Carlo method, the two references below explain it well, so I won't repeat the details here (a one-line recap of the key formula follows the links):
- https://zh.wikipedia.org/wiki...
- http://www.ruanyifeng.com/blo...
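In short: sample N points uniformly at random in a square and count how many land inside the inscribed circle (or quarter circle). That fraction approximates π/4, so π ≈ 4 · inCircle / N, which is exactly the `4 * inCircle / N` expression every version below computes.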
The machine is a 2015 MacBook Pro:
Processor: 2.5GHz Intel Core i7
Memory: 16GB 1600MHz DDR3
OS: macOS High Sierra Version 10.13.4
JS version
```js
function pi(n) {
  let inCircle = 0;
  for (let i = 0; i < n; i++) {
    const x = Math.random();
    const y = Math.random();
    if (x * x + y * y < 1.0) {
      inCircle += 1;
    }
  }
  return (4.0 * inCircle) / n;
}

const N = 100000000;
console.log(pi(N));
```
Result:
➜ me.magicly.performance git:(master) ✗ node --version
v10.11.0
➜ me.magicly.performance git:(master) ✗ time node mc.js
3.14174988
node mc.js 10.92s user 0.99s system 167% cpu 7.091 total
Go version
```go
package main

import (
    "math/rand"
)

func PI(samples int) (result float64) {
    inCircle := 0
    r := rand.New(rand.NewSource(42))
    for i := 0; i < samples; i++ {
        x := r.Float64()
        y := r.Float64()
        if (x*x + y*y) < 1 {
            inCircle++
        }
    }
    return float64(inCircle) / float64(samples) * 4.0
}

func main() {
    samples := 100000000
    PI(samples)
}
```
Result:
➜ me.magicly.performance git:(master) ✗ go version
go version go1.11 darwin/amd64
➜ me.magicly.performance git:(master) ✗ time go run monte_carlo.go
go run monte_carlo.go 2.17s user 0.10s system 101% cpu 2.231 total
C version
```c
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>

#define SEED 42

int main(int argc, char **argv)
{
    int niter = 100000000;
    double x, y;
    int i, count = 0;
    double z;
    double pi;
    srand(SEED);
    count = 0;
    for (i = 0; i < niter; i++)
    {
        x = (double)rand() / RAND_MAX;
        y = (double)rand() / RAND_MAX;
        z = x * x + y * y;
        if (z <= 1)
            count++;
    }
    pi = (double)count / niter * 4;
    printf("# of trials= %d , estimate of pi is %g \n", niter, pi);
    return 0;
}
```
Result:
➜ me.magicly.performance git:(master) ✗ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
➜ me.magicly.performance git:(master) ✗ gcc -O2 -o mc-pi-c mc-pi.c
➜ me.magicly.performance git:(master) ✗ time ./mc-pi-c
# of trials= 100000000 , estimate of pi is 3.14155
./mc-pi-c 1.22s user 0.00s system 99% cpu 1.226 total
C++ version
```cpp
#include <iostream>
#include <cstdlib> // defines rand(), srand(), RAND_MAX
#include <cmath>   // defines math functions

using namespace std;

int main()
{
    const int SEED = 42;
    int i;
    double x, y, z, pi;
    int inCircle = 0;
    srand(SEED);
    const int N = 100000000;
    for (i = 0; i < N; i++)
    {
        x = (double)rand() / RAND_MAX;
        y = (double)rand() / RAND_MAX;
        z = x * x + y * y;
        if (z < 1)
        {
            inCircle++;
        }
    }
    pi = double(4 * inCircle) / N;
    cout << "\nFinal Estimation of Pi = " << pi << endl;
    return 0;
}
```
Result:
➜ me.magicly.performance git:(master) ✗ c++ --version
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
➜ me.magicly.performance git:(master) ✗ c++ -O2 -o mc-pi-cpp mc-pi.cpp
➜ me.magicly.performance git:(master) ✗ time ./mc-pi-cpp

Final Estimation of Pi = 3.14155
./mc-pi-cpp 1.23s user 0.01s system 99% cpu 1.239 total
Julia version
```julia
function pi(N::Int)
    inCircle = 0
    for i = 1:N
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        r2 = x*x + y*y
        if r2 < 1.0
            inCircle += 1
        end
    end
    return inCircle / N * 4.0
end

N = 100_000_000
println(pi(N))
```
Result:
➜ me.magicly.performance git:(master) ✗ julia
julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

➜ me.magicly.performance git:(master) ✗ time julia mc.jl
3.14179496
julia mc.jl 0.85s user 0.17s system 144% cpu 0.705 total
I also ran into some issues after upgrading my Rust development environment and didn't get it sorted out, but based on past experience I'd expect Rust to perform about the same as C++.
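For reference, a naive Rust version would look roughly like the sketch below. Treat it as an untested sketch: it assumes the third-party rand crate (around version 0.8), and I haven't timed it on this machine.

```rust
// Untested sketch of a naive Rust version; assumes the external `rand` crate (~0.8).
use rand::Rng;

fn pi(n: u64) -> f64 {
    let mut rng = rand::thread_rng(); // thread-local RNG from the rand crate
    let mut in_circle: u64 = 0;
    for _ in 0..n {
        let x: f64 = rng.gen(); // uniform in [0, 1)
        let y: f64 = rng.gen();
        if x * x + y * y < 1.0 {
            in_circle += 1;
        }
    }
    4.0 * in_circle as f64 / n as f64
}

fn main() {
    const N: u64 = 100_000_000;
    println!("{}", pi(N));
}
```

Built with cargo build --release, I would expect it to land in the same ballpark as the C and C++ numbers above.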
There is a comparison on GitHub that covers more languages, if you're interested: https://gist.github.com/jmoir... . Interestingly, LuaJIT is about as fast as Rust there, which is fairly consistent with the benchmarks on the Julia website: https://julialang.org/benchma... .
I also implemented two concurrent versions in Go:
```go
package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

type Job struct {
    n int
}

var threads = runtime.NumCPU()
var rands = make([]*rand.Rand, 0, threads)

func init() {
    fmt.Printf("cpus: %d\n", threads)
    runtime.GOMAXPROCS(threads)
    for i := 0; i < threads; i++ {
        rands = append(rands, rand.New(rand.NewSource(time.Now().UnixNano())))
    }
}

// MultiPI2 uses a fixed pool of worker goroutines that pull jobs from a channel;
// each worker reuses a pre-seeded rand.Rand from rands.
func MultiPI2(samples int) float64 {
    t1 := time.Now()
    threadSamples := samples / threads
    jobs := make(chan Job, 100)
    results := make(chan int, 100)
    for w := 0; w < threads; w++ {
        go worker2(w, jobs, results, threadSamples)
    }
    go func() {
        for i := 0; i < threads; i++ {
            jobs <- Job{
                n: i,
            }
        }
        close(jobs)
    }()
    var total int
    for i := 0; i < threads; i++ {
        total += <-results
    }
    result := float64(total) / float64(samples) * 4
    fmt.Printf("MultiPI2: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
    return result
}

func worker2(id int, jobs <-chan Job, results chan<- int, threadSamples int) {
    for range jobs {
        // fmt.Printf("worker id: %d, job: %v, remain jobs: %d\n", id, job, len(jobs))
        var inside int
        // r := rand.New(rand.NewSource(time.Now().UnixNano()))
        r := rands[id]
        for i := 0; i < threadSamples; i++ {
            x, y := r.Float64(), r.Float64()
            if x*x+y*y <= 1 {
                inside++
            }
        }
        results <- inside
    }
}

// MultiPI simply starts one goroutine per logical CPU, each with its own rand.Rand.
func MultiPI(samples int) float64 {
    t1 := time.Now()
    threadSamples := samples / threads
    results := make(chan int, threads)
    for j := 0; j < threads; j++ {
        go func() {
            var inside int
            r := rand.New(rand.NewSource(time.Now().UnixNano()))
            for i := 0; i < threadSamples; i++ {
                x, y := r.Float64(), r.Float64()
                if x*x+y*y <= 1 {
                    inside++
                }
            }
            results <- inside
        }()
    }
    var total int
    for i := 0; i < threads; i++ {
        total += <-results
    }
    result := float64(total) / float64(samples) * 4
    fmt.Printf("MultiPI: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
    return result
}

// PI is the single-threaded baseline.
func PI(samples int) (result float64) {
    t1 := time.Now()
    var inside int = 0
    r := rand.New(rand.NewSource(time.Now().UnixNano()))
    for i := 0; i < samples; i++ {
        x := r.Float64()
        y := r.Float64()
        if (x*x + y*y) < 1 {
            inside++
        }
    }
    ratio := float64(inside) / float64(samples)
    result = ratio * 4
    fmt.Printf("PI: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
    return
}

func main() {
    samples := 100000000
    PI(samples)
    MultiPI(samples)
    MultiPI2(samples)
}
```
Result:
➜ me.magicly.performance git:(master) ✗ time go run monte_carlo.1.go
cpus: 8
PI: 100000000 times, value: 3.141778, cost: 2.098006252s
MultiPI: 100000000 times, value: 3.141721, cost: 513.008435ms
MultiPI2: 100000000 times, value: 3.141272, cost: 485.336029ms
go run monte_carlo.1.go 9.41s user 0.18s system 285% cpu 3.357 total
As you can see, the concurrent versions are roughly 4x faster. Why only 4x when there appear to be 8 CPUs? My MacBook Pro actually has 4 physical cores; the 8 comes from Hyper-Threading, and those extra logical cores don't buy much for CPU-bound computation. For background, see this article: 物理 CPU、CPU 核数、逻辑 CPU、超线程 (physical CPUs, core counts, logical CPUs, and Hyper-Threading).
In the next post, we'll look at how to use parallelism in Julia to push performance even further.
You're welcome to join my 知识星球 to share and discuss interesting technical topics.