当前位置：首页 > news >正文

营销型网站建设php源码/深圳seo网络推广

news 2025/8/21 23:22:59

营销型网站建设php源码,深圳seo网络推广,网站建设主要考虑哪些因素,wordpress伪静态 404Word-Count 问题 WordCount是分布式计算的入门问题,但也是最基本和经典的问题. 问题是让你统计一个超级大的文件(可能上T级别), 里面的每个字符串各出现了多少次. 本文聚焦的并不是真正的WordCount问题,而是DigitCount, 即一个64G大小的Int数组,让你统计里面每个Int分别出现了…

Word-Count 问题

WordCount是分布式计算的入门问题,但也是最基本和经典的问题. 问题是让你统计一个超级大的文件(可能上T级别), 里面的每个字符串各出现了多少次.

本文聚焦的并不是真正的WordCount问题,而是DigitCount, 即一个64G大小的Int数组,让你统计里面每个Int分别出现了多少次. 这也是我们并行计算课程的大作业.

思路1: Map-Reduce

刚开始自己对MPI的API还不够熟悉,所以Map-Reduce都是自己写的, 先通过主进程通过read读取数据,再通过scatter分发数据, 分发数据之后各个进程自己计算,然后最后通过子进程send, 根进程receive来统计结果(reduce过程)

这里写图片描述

如果作为Word Count问题来思考的话,是需要维护一个map<string, int>结构的,在这个问题中就是map<int, int>, 前一个int是出现的int,后一个是出现的次数.但是这样来做的话效率会比较低. 因为每次看到一个int都需要查Map并且进行累加, 累计时间复杂度O(NlogN). 并且由于需要传输一个Map, 需要重新定义传输结构:

void DigitCount::defineTransportStruct()
{MPI_Datatype oldtypes[2];int          blockcounts[2];// MPI_Aint type used to be consistent with syntax of// MPI_Type_extent routineMPI_Aint    offsets[2], extent;// setup description of the 1 MPI_INT fields numoffsets[0] = 0;oldtypes[0] = MPI_LONG;blockcounts[0] = 1;// setup description of the 1 MPI_LONG fields cntMPI_Type_extent(MPI_LONG, &extent);offsets[1] = extent;oldtypes[1] = MPI_INT;blockcounts[1] = 1;// define structured type and commit itMPI_Type_struct(2, blockcounts, offsets, oldtypes, &MPI_NumCnt);MPI_Type_commit(&MPI_NumCnt);
}

在对数组进行计数的时候,由于访问临界区,并且因为Map需要有个判断是否存在的过程,无法使用OpenMP的原子(atomic)操作来进行并行化,只能使用critical directive,消耗较大

void DigitCount::countArray(const int *arr, long length, map<int, long> &num2cnt) {
//#pragma omp parallel shared(num2cnt, N, process_num) private(id)for (int i = 0; i < length; ++i) {if(num2cnt.count(arr[i]))++num2cnt[arr[i]];elsenum2cnt[arr[i]] = 1;}
}

从下面测试结果可以看出(在一个小的测试数据集), 子程序在Scatter部分消耗了大量时间,而这时候主进程在scatter是几乎不消耗时间的,这造成各个进程实际上并不是在并行计算,主进程会提前完成任务

这里写图片描述

思路2: 并行I/O

实际上MPI提供并行I/O的接口,可参考下面的PPT

http://www.training.prace-ri.eu/uploads/tx_pracetmo/pio1.pdf

并行I/O就避免了读取再Scatter的过程,这样增加了消息传递时间,并且还消耗了2倍内存.

这里写图片描述

MPI_File fh;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
bufsize = FILESIZE/nprocs;
nints = bufsize/sizeof(int);
MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);
MPI_File_read(fh, buf, nints, MPI_INT, &status);
MPI_File_close(&fh);

思路3: 空间换时间

查了以下天河超算的性能, 每个节点有2颗Ivy Bridge-EXeonE5 2692处理器和3个XeonPhi处理器, 每个运算节点有64GB主内存,而每个Xeon Phi板载8GB内存,因此每个节点共有88GB内存,所以来说,空间是非常大的.所以考虑可以开个UINT32_MAX 大的数组,然后用Unsigned Short来计数.
也就是通过构造索引时间复杂度为O(1)的Hash Table, 来降低原本索引复杂度为O(logN) 的map<int, int>

除此之外,这样做还有两个好处:

在使用OpenMP并行计数的时候可以使用原子操作,来降低原本临界区锁的消耗

void DigitCount::countArrayWithArray(const int *arr, long long length, short *cntArr) {omp_set_num_threads(THREAD_NUM);
#pragma omp parallel forfor (int i = 0; i < length; ++i) {
#pragma omp atomic++cntArr[getIndex(arr[i])];}
}

方便使用MPI_Reduce操作, 由于开了个UINT32_MAX大小的数组, 不能直接MPI_Reduce, MPI_Reduce的count最大支持INT32_MAX大的数据,所以需要Reduce两次.

void DigitCount::reduceResultWithArray(short *branchArr, short *masterArr) {MPI_Status stat;MPI_Reduce(branchArr, masterArr, ARRAY_SIZE>>1, MPI_SHORT, MPI_SUM, 0, MPI_COMM_WORLD);MPI_Reduce(branchArr + (ARRAY_SIZE>>1), masterArr + (ARRAY_SIZE>>1), ARRAY_SIZE>>1+1, MPI_SHORT, MPI_SUM, 0, MPI_COMM_WORLD);
}

最终实验结果

4进程, 每个进程8线程并行时:

RANK_2: Read duration = 16.1     Count duration = 146.4  Reduce duration = 21.6
RANK_3: Read duration = 17.1     Count duration = 147.6  Reduce duration = 20.6
RANK_0: Read duration = 17.6     Count duration = 147.4  Reduce duration = 13.4
RANK_1: Read duration = 18.4     Count duration = 145.6  Reduce duration = 14.8
Total Use Memory 20971520.0 KB, Average Cost 186.84 seconds, Max Duration is 190.44 seconds, Min Duration is 183.85 seconds

8进程, 每个进程8线程并行时:

RANK_4: Read duration = 8.0  Count duration = 74.5   Reduce duration = 10.9
RANK_6: Read duration = 8.1  Count duration = 74.7   Reduce duration = 12.4
RANK_7: Read duration = 7.2  Count duration = 73.5   Reduce duration = 14.2
RANK_5: Read duration = 7.7  Count duration = 74.0   Reduce duration = 12.2
RANK_2: Read duration = 8.3  Count duration = 74.7   Reduce duration = 13.8
RANK_3: Read duration = 7.7  Count duration = 74.0   Reduce duration = 11.5
RANK_0: Read duration = 7.5  Count duration = 73.6   Reduce duration = 14.7
RANK_1: Read duration = 7.2  Count duration = 74.4   Reduce duration = 13.9
Total Use Memory 20971520.0 KB, Average Cost 99.99 seconds, Max Duration is 101.97 seconds, Min Duration is 98.43 seconds

由于不太熟悉天河超算,在上面跑的程序每5整分就会被cancel一次,所以超过5分钟的程序就无法完整跑完.

整体来看,可以看到主要的时间消耗在Count阶段,Reduce阶段惊人地快, UINT32_MAX 的short数组, 大概8GB的内存,本来以为进程间通信会消耗很多时间,但是实际上没有,所以看得出MPI在性能优化上做得相当出色.从4进程到8进程, 速度提升53.51%

完整代码GitHub链接:https://github.com/ZezhongWang/BlogCode

查看全文

http://www.lbrq.cn/news/1399429.html