Despite the tremendous progress achieved over the past decade, the study of star formation is far from complete. We have not yet measured the minimum mass for star formation or the shape of the initial mass function (IMF) down to the least massive free-floating planets, nor do we know how universal this shape is. Although clusters are the building blocks of galaxies, little is known about their early dynamical evolution and dispersal into the field. The main culprit for this state of affairs is the high level of contamination and incompleteness in the sub-stellar regime, even for the best photometric and astrometric surveys. COSMIC-DANCE aims at overcoming these drawbacks and revealing the shape of the IMF with a precision and completeness surpassing current and foreseeable surveys of the next 15 years. We will:
  1. Measure: we will measure proper motions with an accuracy comparable to Gaia's but 5 magnitudes deeper, reaching the planetary-mass domain and, critically, piercing through the dust-obscured young clusters inaccessible to Gaia’s optical sensors.
  2. Discover: by feeding these proper motions and the multi-wavelength photometry to innovative hyper-dimensional data mining techniques, we will securely identify cluster members among the millions of sources of the COSMIC-DANCE database, complemented by Gaia at the bright end. This will yield the final census over the entire mass spectrum for 20 young nearby clusters, bringing a 60-year quest to an end.
  3. Understand: we will provide conclusive empirical constraints, over a broad parameter space inaccessible to current state-of-the-art surveys, on the much-debated respective contributions of evolutionary effects (dynamics, feedback and competitive accretion) and initial conditions (core properties) to the shape and bottom of the IMF. The IMF is the most fundamental and informative product of star formation, with essential bearings on many areas of general astrophysics.

Context & motivations

The ongoing Gaia space mission will provide exquisite astrometric accuracy and a complete 6-dimensional census of the sky down to G≈15 mag, and a 5-dimensional census down to G≈20 mag. Although it represents a tremendous improvement over its predecessor Hipparcos, Gaia will unfortunately not be sensitive enough to study the least massive objects and the cores of young associations. A magnitude of G≈20 mag corresponds to ≈20 MJup at 150 pc for an age of 3 Myr (typical of young nearby associations), whereas the mass function is known to extend down to at least 3–4 MJup. Additionally, young stellar clusters and associations are very often deeply embedded and contain bright H II regions. Since it operates in the visible part of the spectrum, Gaia will be mostly blind in regions of heavy extinction and bright nebular emission (see animation to the right), precisely where most star formation is taking place. Deeper and longer-wavelength observations (in the infrared, to see "through" the cloud) are required. There is therefore a strong need to complement Gaia:

1. beyond its sensitivity limit to reach the least massive sub-stellar objects, by using deeper images

2. in the embedded cores of young nearby associations, by using infrared images
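
The sensitivity argument above rests on the distance modulus, which can be checked with a few lines of Python. This is only a sketch: the 5 mag of G-band extinction is a hypothetical value chosen for illustration, and converting an absolute magnitude into a mass (such as the ≈20 MJup quoted above) additionally requires evolutionary models that are not reproduced here.

```python
import math


def absolute_magnitude(apparent_mag, distance_pc, extinction_mag=0.0):
    """Distance modulus: M = m - 5 log10(d / 10 pc) - A."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0) - extinction_mag


# Gaia's faint limit (G = 20 mag) at the distance quoted in the text (150 pc).
m_abs_clear = absolute_magnitude(20.0, 150.0)

# Same limit behind a HYPOTHETICAL 5 mag of extinction in the G band.
m_abs_embedded = absolute_magnitude(20.0, 150.0, extinction_mag=5.0)

print(f"faintest absolute magnitude, no extinction: {m_abs_clear:.2f}")
print(f"faintest absolute magnitude, 5 mag extinction: {m_abs_embedded:.2f}")
```

Every magnitude of extinction makes the faintest detectable member one magnitude brighter in absolute terms, which is why embedded regions call for infrared imaging rather than only deeper optical imaging.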


The member stars of an association were all born from the same molecular cloud, which had its own original momentum. At the end of the formation process, all the members move together with space motions similar to that of the parent cloud, which makes common motion an extremely effective means of identification. Field stars have random proper motions (i.e., motions across the plane of the sky), while background galaxies have no measurable proper motion. Any object displaying a proper motion similar to that of the group is therefore most likely a group member.

To identify the members of young nearby associations, we will measure the proper motions of millions of sources around each association. Objects that move at the same speed and in the same direction as the group, and that have spectral properties consistent with the group, are very likely genuine members.
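
As a rough illustration of the kinematic part of this selection, the sketch below flags sources whose proper motion is consistent with the group motion once measurement errors are taken into account. It is not the COSMIC-DANCE selection method: the group motion, its intrinsic dispersion, the toy sources and the function names are all hypothetical, and the spectral (photometric) consistency check mentioned above is left out.

```python
def motion_chi2(pm_ra, pm_dec, err, group_pm, group_dispersion):
    """Squared offset from the group motion, in units of the combined scatter.

    All quantities are in mas/yr. The measurement error and the intrinsic
    velocity dispersion of the group are added in quadrature.
    """
    variance = err ** 2 + group_dispersion ** 2
    d_ra = pm_ra - group_pm[0]
    d_dec = pm_dec - group_pm[1]
    return (d_ra ** 2 + d_dec ** 2) / variance


def select_candidates(sources, group_pm, group_dispersion, chi2_max=9.21):
    """Names of sources whose proper motion is consistent with the group.

    chi2_max = 9.21 is the 99% quantile of a chi-square distribution with
    2 degrees of freedom (two proper-motion components).
    """
    return [
        name
        for name, pm_ra, pm_dec, err in sources
        if motion_chi2(pm_ra, pm_dec, err, group_pm, group_dispersion) <= chi2_max
    ]


# HYPOTHETICAL toy data: (name, pm_ra, pm_dec, measurement error), in mas/yr.
sources = [
    ("A", 20.5, -44.2, 0.5),  # bright source moving with the group
    ("B", 3.0, -1.0, 0.5),    # field star with an unrelated motion
    ("C", 0.0, 0.0, 0.3),     # background galaxy: no measurable motion
    ("D", 23.0, -47.0, 2.0),  # faint source, larger errors, still compatible
]

candidates = select_candidates(sources, group_pm=(20.0, -45.0), group_dispersion=1.0)
print(candidates)
```

A hard cut like this needs the group motion as an input and says nothing about contamination, which is why the project turns to probabilistic membership instead.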

Measuring proper motions for millions of sources with an accuracy sufficient to evaluate their membership is a major technical challenge. Because the nearest young associations are far away (between 3 and 15 × 10¹⁵ km!), their members move very slowly on the plane of the sky. The fastest move by 20 to 50 milliarcseconds per year. A milliarcsecond is about the size of a 10 cent coin atop the Eiffel Tower as seen from New York. Detecting and measuring such motions requires high-quality images and a long time baseline. By combining archival visible and near-infrared images obtained 15 to 20 years ago with new images, we can derive proper motions and multi-wavelength photometry for millions of stars in young (<100 Myr), nearby (<500 pc) associations and clusters.
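
The orders of magnitude in this paragraph can be checked with a short calculation. The sketch below reads the distance range as 3 to 15 × 10¹⁵ km and uses the standard relation v_t = 4.74 μ d (μ in arcsec/yr, d in pc, v_t in km/s). The coin diameter (19.75 mm, a 10 euro cent coin) and the Paris to New York distance (about 5,840 km) are approximate values assumed for illustration.

```python
import math

KM_PER_PARSEC = 3.0857e13
MAS_PER_RADIAN = math.degrees(1.0) * 3600.0 * 1000.0


def km_to_parsec(distance_km):
    return distance_km / KM_PER_PARSEC


def tangential_velocity_km_s(pm_mas_per_yr, distance_pc):
    """v_t = 4.74 * mu * d, with mu converted from mas/yr to arcsec/yr."""
    return 4.74047 * (pm_mas_per_yr / 1000.0) * distance_pc


def displacement_mas(pm_mas_per_yr, baseline_yr):
    """Total shift on the sky accumulated over the time baseline."""
    return pm_mas_per_yr * baseline_yr


def angular_size_mas(size_m, distance_m):
    """Small-angle approximation: angle = size / distance."""
    return size_m / distance_m * MAS_PER_RADIAN


near_pc = km_to_parsec(3e15)
far_pc = km_to_parsec(15e15)
# ASSUMED values: coin diameter and Paris to New York distance.
coin_mas = angular_size_mas(0.01975, 5.84e6)

print(f"distance range: {near_pc:.0f} to {far_pc:.0f} pc")
print(f"20 mas/yr at 150 pc: {tangential_velocity_km_s(20.0, 150.0):.1f} km/s")
print(f"shift over 15 yr at 20 mas/yr: {displacement_mas(20.0, 15.0):.0f} mas")
print(f"coin seen across the Atlantic: {coin_mas:.2f} mas")
```

The quoted distances come out at roughly 100 to 500 pc, consistent with the <500 pc limit, and a 15-year baseline turns a motion of 20 mas/yr into a shift of 300 mas, far easier to measure than the yearly motion.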

Data mining: finding the needles in the haystack

Cataloguing all (and only) the members of a cluster is a major challenge, in many ways similar to the “needle in the haystack” parable. One must identify the rare cluster members (the “needles”, typically a few hundred) within the overwhelming multitude of field stars and background galaxies (the “haystack”, millions of interlopers!). The selection of cluster members is also a field in which COSMIC-DANCE will profoundly transform our ability to interpret the mass function, by delivering luminosity functions with proper uncertainties.

Until recently, the samples involved in studies of nearby clusters were relatively small. The unprecedented scale of the COSMIC-DANCE database, which includes tens of millions of entries in multiple astrometric and photometric dimensions, cannot be comprehended by humans directly and makes standard selection techniques completely obsolete. Finding the needles in the haystack and turning the extraordinarily rich COSMIC-DANCE data collections into knowledge is a complex hyper-dimensional Big-Data problem that we propose to solve using the most advanced methods from the areas of Data Mining and Probabilistic Learning. The objective is to decide on the cluster membership of sources and, at the same time, to derive the cluster’s fundamental properties (luminosity function, spatial distribution). The two problems must be solved concurrently because the membership of a source depends on the cluster properties, and the cluster properties can only be inferred from the members' properties.

A simple example can illustrate the nested nature of this problem: a source located near the cluster core is more likely to be a member than a source located far from it. The spatial location of a source with respect to the cluster tells us something about its membership, and we should use this important information to optimize the selection of members and minimize contamination. But to know the spatial distribution of the cluster (e.g. the core location in this case), we first need to know its members. Hierarchical models are designed to deal with this kind of "chicken-and-egg" problem. All of this must be accomplished in a high-dimensional space (typically ≥10-D of proper motions, colours and luminosities), with a rigorous treatment of uncertainties and incomplete data.
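
The circular dependence between memberships and cluster properties can be illustrated with a deliberately simple stand-in: expectation-maximisation (EM) on a two-component mixture in proper-motion space. This is not the hierarchical model used by the project. It is a 2-D toy that ignores measurement uncertainties and missing data, and the simulated data, starting guess and function name are hypothetical.

```python
import math
import random


def em_membership(points, box_area, init_mean, init_sigma=5.0, init_frac=0.1, n_iter=50):
    """Fit an isotropic 2-D Gaussian 'cluster' plus a uniform 'field' by EM.

    Returns (membership probabilities, cluster mean, cluster sigma, cluster fraction).
    """
    n = len(points)
    (mx, my), sigma, frac = init_mean, init_sigma, init_frac
    field_density = 1.0 / box_area
    probs = []
    for _ in range(n_iter):
        # E-step: membership of each source, given the current cluster properties.
        probs = []
        for x, y in points:
            r2 = (x - mx) ** 2 + (y - my) ** 2
            cluster = frac * math.exp(-0.5 * r2 / sigma ** 2) / (2.0 * math.pi * sigma ** 2)
            field = (1.0 - frac) * field_density
            probs.append(cluster / (cluster + field))
        # M-step: cluster properties, given the current memberships.
        weight = sum(probs)
        mx = sum(p * x for p, (x, _) in zip(probs, points)) / weight
        my = sum(p * y for p, (_, y) in zip(probs, points)) / weight
        r2_mean = sum(
            p * ((x - mx) ** 2 + (y - my) ** 2) for p, (x, y) in zip(probs, points)
        ) / weight
        sigma = max(math.sqrt(r2_mean / 2.0), 1e-3)
        frac = min(max(weight / n, 1e-6), 1.0 - 1e-6)
    return probs, (mx, my), sigma, frac


# HYPOTHETICAL simulated proper motions (mas/yr): 40 members among 400 field stars.
rng = random.Random(0)
members = [(rng.gauss(20.0, 1.0), rng.gauss(-30.0, 1.0)) for _ in range(40)]
field_stars = [(rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)) for _ in range(400)]
points = members + field_stars

# The starting guess stands in for approximate prior knowledge of the group motion.
probs, mean, sigma, frac = em_membership(points, box_area=100.0 * 100.0, init_mean=(18.0, -28.0))
print(f"cluster motion: ({mean[0]:.1f}, {mean[1]:.1f}) mas/yr, sigma = {sigma:.2f}")
print(f"cluster fraction: {frac:.3f}")
```

Each iteration alternates between the two halves of the problem: memberships are updated from the current cluster properties, and the cluster properties are then re-estimated from the weighted members. A hierarchical Bayesian model addresses the same coupling, but infers full posterior distributions in many more dimensions.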

Thanks to half a century of intense research, our knowledge of the nearby clusters is already well advanced, and good (although incomplete) samples of high-probability members exist, based in particular on spectroscopic studies. Using that knowledge to define the prior distributions of our hierarchical models' parameters facilitates the convergence of the analysis, ensures that only physically realistic parameters are probed, and makes the selection independent of evolutionary models. Special care is taken to make this technique scalable, as information and prior knowledge increase with new data (e.g. radial velocities, distances, rotational periods,…). In particular, it will be immediately applicable to the Gaia and Gaia-ESO catalogues, which we will use to complement COSMIC-DANCE and ensure complete coverage from the fragmentation limit to the massive OB stars.