|Posted on October 27, 2010 at 6:51 PM|
This week I spent a few days in Toruń, invited by Prof. Jacek Kobus. I gave a seminar about the harmonium project; I leave the talk here. I love the mystical atmosphere of the old city, its people and the food. Toruń is certainly one of my favorite cities in Poland. The university is very nice, and it is growing with the help of EU funds. A quantum optics center linked to the university is just being built.
|Posted on October 26, 2010 at 9:29 AM|
The last BSC call granted us 70,000 CPU-hours to run our code. So we will be able to test the efficiency of the parallelization of our code and also to obtain some results for four-electron harmonium. An unfortunate event (the workstation in Szczecin halted) allowed me to test the performance on 16 cores. The results are very promising for the symmetry-related part of the code, yielding an efficiency of 89% on 8 cores. There are some magic numbers of processors which yield the best performance: the divisors of 64 (2, 4, 8 and 16). No surprise here; an even distribution of tasks maximizes the performance. Obviously this is related to the way tasks are distributed among processors: there is a nice linear trend relating the maximum number of tasks (out of 64) per core to the computational time.
Maximum number of tasks assigned to each processor vs. time (s).
The only warning sign shows up for the 16-processor job, which 'only' yields an efficiency of 76%. But I'm not much worried about that: with 16 processors we are saturating the machine (so the job competes with system processes) and, in addition, the task split over 16 processors only takes 15 minutes (actually the same parallelized routine is called 10 times during those 15 minutes, so the performance is effectively measured on 1.5-minute runs, with the delays accumulating ten times), which is not long enough to obtain reliable data.
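The magic-numbers observation follows directly from how 64 tasks split over P cores: the wall time is set by the busiest core, and only divisors of 64 give a perfectly even split. A minimal sketch of that arithmetic (the task count is from the post; the function name is mine):

```python
import math

def max_tasks_per_core(n_tasks: int, n_cores: int) -> int:
    """Worst-case number of tasks a single core must process."""
    return math.ceil(n_tasks / n_cores)

# With 64 symmetry tasks, divisors of 64 (2, 4, 8, 16) leave no core idle
# while another still works; non-divisors create a straggler.
for p in (2, 3, 4, 6, 8, 16):
    load = max_tasks_per_core(64, p)
    balance = "even" if 64 % p == 0 else "uneven"
    print(f"{p:2d} cores -> {load:2d} tasks on the busiest core ({balance})")
```

For example, 6 cores still need 11 tasks on the busiest core (instead of the ideal 10.67), which is exactly the linear trend of wall time versus maximum tasks per core seen in the plot.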
|Posted on October 17, 2010 at 9:49 AM|
Last week I finished the MPI implementation of the double loop in the sigma-vector calculation in Knowles' program. The performance of the parallelization is very good, with an efficiency above 85%. Most interestingly, it looks like the performance will hold for larger numbers of processors, because the computational time decreases linearly with the number of processors. The maximum number of tasks allowed for this part of the code is 64, corresponding to the double loop over the eight irreducible representations of the D2h group.
procs   t (s)    speedup (effective cpus)   efficiency (%)
6       2345     5.14                       86
5       2788     4.33                       87
4       3200     3.77                       94
3       4522     2.67                       89
2       6720     1.79                       90
1       12060    1.00                       100
These figures should be taken as a guide, because they were obtained on a workstation with other processes running at the same time. If I eventually get an idle workstation to use, I will provide new parallelization data; however, I don't expect major differences.
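The speedup and efficiency columns can be recomputed from the timings alone; a quick check (the timings are from the table above, the variable names are mine):

```python
# t[p] = wall time in seconds with p processes, from the table above
timings = {1: 12060, 2: 6720, 3: 4522, 4: 3200, 5: 2788, 6: 2345}

for p in sorted(timings, reverse=True):
    speedup = timings[1] / timings[p]    # "effective cpus"
    efficiency = 100.0 * speedup / p     # percent of the ideal p-fold speedup
    print(f"{p} procs: speedup {speedup:.2f}, efficiency {efficiency:.0f}%")
```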
|Posted on September 24, 2010 at 11:04 AM|
As I mentioned a few months ago, we want to produce benchmark calculations for few-electron harmonium. The idea is to use this information in the calibration of quantum chemical methods; in particular, we have in mind the design of a new functional using these data. To this aim, we developed a method to produce highly accurate results for few-electron harmonium. The method uses an extrapolation scheme based on a few FCI calculations. Since these calculations are prohibitive for more than three electrons, we are developing an MPI version of the code. This post is about this possibility and a few benchmark results we have obtained so far.
In order for the extrapolation scheme to work we must use a particular basis set. There are many basis sets adequate for extrapolation schemes, but they are optimized for molecular systems and are not so good for harmonium. Our basis set consists of even-tempered Gaussian functions, whose exponents (alpha, beta) are optimized for each calculation using the simplex method. The simplex method shrinks and shifts along a grid of nine points, so several calculations can run simultaneously. We always start with the nine calculations of the initial grid; every time we shrink the grid we need an additional eight calculations, and if we shift, three to five calculations are needed for the next step. Therefore, in this first step we may use between 3 and 9 processors. This part is trivial to parallelize (it is an embarrassingly parallel problem).
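This first layer is easy to sketch. A toy version, with a placeholder surface standing in for a real FCI run (which takes hours) and a thread pool standing in for MPI ranks:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def energy(alpha: float, beta: float) -> float:
    """Placeholder for one FCI calculation at even-tempered
    exponents (alpha, beta); a toy quadratic surface here."""
    return (alpha - 0.9) ** 2 + (beta - 2.1) ** 2

def evaluate_grid(center, step):
    """Evaluate the 3x3 grid around `center` in parallel:
    embarrassingly parallel, one calculation per grid point."""
    a0, b0 = center
    points = [(a0 + i * step, b0 + j * step)
              for i, j in product((-1, 0, 1), repeat=2)]
    with ThreadPoolExecutor(max_workers=9) as pool:
        energies = list(pool.map(lambda p: energy(*p), points))
    return min(zip(energies, points))  # best (E, (alpha, beta)) found

best = evaluate_grid(center=(1.0, 2.0), step=0.1)
print(best)
```

A shrink step would rerun `evaluate_grid` with a smaller `step` (eight new points, the center being already known); a shift reuses the points shared with the previous grid, which is where the three-to-five count comes from.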
In order to perform the FCI calculation we use a modified version of Peter Knowles' code (CPL 1989). This code uses D2h symmetry to perform Davidson iterations over an FCI matrix constructed from Slater determinants. Since the program uses symmetry, it can be parallelized over the different irreducible representations of the D2h point group. Namely, the calculation of the sigma vector -which involves a double loop over the eight irreducible representations- can be parallelized. We may thus use up to 64 processors in this part of the program.
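The symmetry layer then amounts to distributing the 8 x 8 = 64 irrep pairs of the double loop over the available ranks. A minimal round-robin sketch (the assignment scheme is my guess at a natural choice, not necessarily the one in the actual code):

```python
from itertools import product

IRREPS = ["Ag", "B1g", "B2g", "B3g", "Au", "B1u", "B2u", "B3u"]  # D2h

def my_tasks(rank: int, n_ranks: int):
    """Irrep pairs handled by `rank` under a round-robin split of
    the 64 sigma-vector tasks (the double loop over irreps)."""
    pairs = list(product(IRREPS, IRREPS))
    return pairs[rank::n_ranks]

# With 8 ranks each one gets exactly 8 pairs; with 16 ranks, 4 pairs.
print(len(my_tasks(0, 8)), len(my_tasks(0, 16)))
```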
The sigma vector.
Each of these 64 tasks running in parallel contains a main procedure that takes most of the time: the calculation of the sigma vector, which involves a large matrix multiplication. We can also split this last task among different processors (as simply as letting one processor handle one column at a time).
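This inner layer can be sketched with NumPy standing in for the Fortran code: each rank multiplies its own block of columns, and the results are gathered back. The column-wise split follows the post; the matrix sizes are arbitrary:

```python
import numpy as np

def matmul_by_columns(A, B, n_workers):
    """Compute A @ B with the columns of B split into n_workers
    blocks, as each MPI rank would do with its own block."""
    blocks = np.array_split(B, n_workers, axis=1)   # one block per worker
    partial = [A @ blk for blk in blocks]           # concurrent under MPI
    return np.hstack(partial)                       # gather the results

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))
assert np.allclose(matmul_by_columns(A, B, 5), A @ B)
```

Since the blocks are independent, this layer should scale until the per-block work becomes too small to hide the communication cost.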
Therefore, altogether we can expect our code to parallelize very well over 192N to 576N processors (3-9 simplex points times 64 symmetry tasks), where the factor N is the number of processors among which we split the matrix multiplication. We have requested computational time at the Barcelona Supercomputing Center (BSC) for the third time; there, the maximum number of processors we can use in one parallel job is 1024. This means we should prove that the MPI subroutine for matrix multiplication parallelizes well for 2-5 processors. Let me give you some parallelization numbers (these are lower bounds on the actual performance, because a few other processes were running on the same workstation):
                            1 proc     3 procs*        5 procs*
total time (s)              26,689     10,077 (2.6)    6,737 (4.0)
largest multiplication (s)  1,996.85   670.97 (3.0)    403.32 (5.0)
* The numbers in brackets are the effective numbers of processors used (ideally they would match the number of processors requested).
These numbers show that an efficient parallelization over up to a thousand processors is possible with our modified FCI code. So far I have implemented the matrix multiplication in parallel, and I'm working on the parallelization over the different irreducible representations (I expect to have it ready in less than two weeks). The parallelization of the optimization process is quite trivial and can be done in a few hours.
|Posted on August 2, 2010 at 7:01 PM|
It's summer time, and I will take a short break for two weeks, starting this Friday evening. I plan to spend one week by the seaside. I will bring some stuff to read to the coast, but I will be unplugged, mostly resting and meeting friends. The second week I will be taking it easy but available from time to time on the net.
Have a nice vacation!
|Posted on July 7, 2010 at 10:17 AM|
Last but not least, I'm attending the Girona Seminar, the ninth edition of the biennial conference organized in Girona by the IQC (my former lab), this year dedicated to Ramon Carbó-Dorca's 70th birthday. The organizers managed to bring an impressive list of highly recognized experts, the longest I recall in Girona. I had the opportunity to give a short talk on three-electron harmonium.
In 'el Celler': Jadwiga and Jacek Styszynski, Josep Roca, Jerzy Cioslowski, my brother Xevi and me.
|Posted on July 2, 2010 at 9:52 AM|
I'm currently attending the Electronic Structure: Principles and Applications (ESPA) conference, whose 7th edition is held in Oviedo (Asturias). Here I was given the opportunity to communicate the latest results we have obtained for few-electron harmonium. We are getting close to obtaining new benchmark data, which will be used in the next months to calibrate the performance of different electronic structure methods and DFT functionals. These results involve three-electron harmonium. We need to go to larger numbers of electrons (four and five) in order to provide more relevant data which can be used to develop new DFT and DMFT functionals. Let us hope we can find supercomputing facilities to carry out this research.
This presentation is available here.
|Posted on June 22, 2010 at 10:13 AM|
This week I'm in Paris, at the conference dedicated to twenty years of the electron localization function (ELF) and to Bernard Silvi on the occasion of his retirement. I gave a talk on an approximation to the ELF using natural orbitals. We will be submitting the work soon. Here is the pdf of the talk.
Outside the Louvre with Carlo Gatti, Jose Manuel Recio and Ángel Martín Pendás (from left to right).
|Posted on June 10, 2010 at 5:15 AM|
A peculiar course-conference in Dubrovnik gave me the opportunity to teach a short course on molecular vibrations, and to enjoy the nice weather and the excellent food in the oldest fortified city on the Mediterranean (dating from the Middle Ages).
Here you can download the presentation.
|Posted on June 2, 2010 at 1:21 AM|
I'm back from Berkeley. A nice conference dedicated to Fritz Schaefer III, with outstanding speakers (Becke, Burke, Yang, Mazziotti, Helgaker, Morokuma, Grimme, Scuseria and Ruud, among others). I presented a poster there but didn't draw much of an audience. I have to admit it's not the best poster I have ever done. Anyway... I leave it here in case it is of interest to you.
On the fun side, I went trekking in Yosemite, drove through Napa Valley (the wine region) and biked around San Francisco. A nice experience.