CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts drawn from three sources: the Croatian web corpus hrWaC, a 400-million-token collection of newspaper texts, and MaCoCu-hr. The embeddings were trained with the fastText skip-gram model on 4,586,769,197 tokens of running text and cover 3,406,574 lowercased surface forms.
This version differs from the previous version of the embeddings in that the original training data was expanded with the MaCoCu-hr web crawl corpus (http://hdl.handle.net/11356/1516).
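As an illustration of how such embeddings might be used, the following is a minimal sketch in Python. It assumes the vectors are available in the plain-text format that fastText writes for `.vec` files (a header line with the vocabulary size and dimensionality, then one word and its values per line); the actual file format and file name of this distribution are not stated here. The function names `load_vec` and `cosine` and the tiny in-memory sample are hypothetical and serve only to keep the example self-contained. Because the vocabulary consists of lowercased surface forms, query words are lowercased before lookup.

```python
import io
import math


def load_vec(stream, limit=None):
    """Parse the fastText/word2vec text format from an open text stream.

    Assumed layout: a header line "<count> <dim>", then one line per word
    of the form "<word> <v1> ... <vdim>". Returns a dict word -> list of floats.
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        # Take the last `dim` fields as the vector, the rest as the word.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical two-dimensional sample standing in for the real file.
# With a real file one would instead pass open(path, encoding="utf-8").
sample = io.StringIO("3 2\nzagreb 1.0 0.0\nsplit 0.0 1.0\nrijeka 1.0 1.0\n")
vectors = load_vec(sample)

# The vocabulary is lowercased, so lowercase the query words as well.
query_a, query_b = "Zagreb".lower(), "Rijeka".lower()
similarity = cosine(vectors[query_a], vectors[query_b])
print(f"{query_a} ~ {query_b}: {similarity:.4f}")
```

With a vocabulary of several million forms, the `limit` argument can be used to read only the first entries of the file, which keeps memory use manageable while experimenting.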