Every second, social networks register eight new users. Every hour, the networks are replenished with millions of likes, posts, photos and videos. What can these streams of data tell us? Researchers are racing to create programs that can extract more useful information from social networks. Beware: your every like contains information about you!
1. Posts and neologisms
"Repost", "like", "comment": all these words have firmly entered our lexicon, but still remain outside the dictionaries. Scientists from the Higher School of Economics and Lomonosov Moscow State University decided to rectify the situation and compiled a list of Russian neologisms based on Facebook material.
For this, 573 million posts by 3.2 million users (almost 40% of Russian Facebook) were processed: all texts were automatically split into words, which were then looked up in OpenCorpora, an open corpus of the Russian language. Experts then manually filtered the resulting list of candidate neologisms, made up of the words not found in the corpus.
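The automatic part of this pipeline can be illustrated with a minimal sketch. It assumes the dictionary is available as a plain set of known words; the function name `neologism_candidates`, the `min_count` threshold and the sample data are hypothetical and are not taken from the study, and the manual expert filtering is not modeled.

```python
import re
from collections import Counter

def neologism_candidates(posts, known_words, min_count=2):
    """Count words that are absent from the reference dictionary."""
    counts = Counter()
    for post in posts:
        # Split each post into lowercase word tokens (letters only).
        for word in re.findall(r"[^\W\d_]+", post.lower()):
            if word not in known_words:
                counts[word] += 1
    # Keep only words seen often enough to be worth a manual review.
    return {w: c for w, c in counts.items() if c >= min_count}

# Toy dictionary and posts (illustrative only).
known = {"i", "will", "this", "photo", "later", "please", "my"}
posts = ["I will repost this photo later", "Please repost my photo"]
candidates = neologism_candidates(posts, known)
print(candidates)
```

The output of such a step is only a list of candidates: typos, names and slang would all end up in it, which is why a manual pass by experts is still needed.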
The final list contained 168 words. The vast majority of them are anglicisms related to the Internet or media ("photo", "video blog"). The formation of all the neologisms follows strict grammatical rules. The only exceptions were words like "rzhach" ("laugh"), "makhach" and "srach": it was the final "ch" as a word-forming element that became more productive thanks to social networks.
2. Hashtags and revolution
The influence of social networks on the world became apparent during the "Arab Spring", when it turned out that the rebels were coordinating their actions and mobilizing the masses through Facebook and Twitter. Is it possible to use social networks to predict such events?
This question was asked by scientists from Cambridge and Harvard, who developed a program that calculates an index of political polarization and measures the level of tension in society, that is, its proximity to a revolutionary situation. To do this, the researchers checked 7,000 Twitter messages posted by Egyptians during the unrest of 2013 for the presence of a radical hashtag like "#never forget, never forgive"; in Egypt there is an almost exact analogue of this expression.
Hashtags are tags beginning with "#" that mark the topic of a message and let users recognize "their own" in an information war. It turned out that analyzing them is quite suitable for prediction: peaks in mentions of radical hashtags really did precede real violence.
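The article does not say how the polarization index itself is calculated, so the following is only a rough proxy: the daily share of messages that contain at least one radical hashtag. The function name `radical_share_by_day`, the day labels and the hashtag are hypothetical.

```python
import re
from collections import defaultdict

def radical_share_by_day(messages, radical_tags):
    """messages: iterable of (day, text) pairs.
    Returns {day: share of that day's messages with a radical hashtag}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in messages:
        # Extract hashtags case-insensitively, e.g. "#NeverForget" -> "#neverforget".
        tags = set(re.findall(r"#\w+", text.lower()))
        totals[day] += 1
        if tags & radical_tags:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}

# Toy data (illustrative only).
messages = [
    ("day1", "calm day #news"),
    ("day1", "hello everyone"),
    ("day2", "#NeverForget what happened"),
    ("day2", "we remember #neverforget"),
    ("day2", "traffic again #news"),
]
shares = radical_share_by_day(messages, {"#neverforget"})
print(shares)
```

A sharp rise of this share from one day to the next would be the kind of peak the researchers describe as preceding violence.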
3. Likes and sexual orientation
"A man is what he likes," might say the scientists from Cambridge who studied the likes of 58 thousand Facebook users and found a relationship between fleeting preferences and deeper personality characteristics.
The program they developed distinguishes white users from Black users with an accuracy of 95%, Republicans from Democrats with 85%, and Muslims from Christians with 82%. The program is less successful at "guessing" marital status (accuracy 65%), smoking (73%) and drug use (65%). Likes also make it possible to judge sexual orientation: with an accuracy of 88% for men and 75% for women.
At the same time, the correlations are not always direct: for example, only 5% of gay men liked same-sex marriage and other equally specific topics. The program draws its conclusions from indirect evidence such as music preferences. For example, those who like Hello Kitty can be recognized by their openness and emotional instability, and fans of curly fries will almost certainly be identified as having high intelligence.
4. Facebook and mood
Nothing human is alien to social networks. Births and revolutions, catastrophes and celebrations: all the major events of real life are inevitably recorded on their pages. So Alexander Panchenko, senior researcher at the Moscow "Digital Society Laboratory", decided to write a program that determines the general mood of the Russian-speaking segment of Facebook.
His algorithm finds emotionally charged words in the texts, using a list compiled by experts (negative: "terrible", "boring"; positive: "favorite", "free"). It then calculates the proportion of positive, negative and neutral words in the text, and from these the emotion indices.
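The word-counting step can be sketched as follows. The two word lists contain only the examples quoted above, the function name `emotion_shares` is hypothetical, and the exact formula of the emotion indices is not given in the article, so only the three shares are computed.

```python
import re

# Tiny stand-in lexicons; the real lists were compiled by experts.
POSITIVE = {"favorite", "free"}
NEGATIVE = {"terrible", "boring"}

def emotion_shares(text):
    """Return the share of positive, negative and neutral words in a text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    n = len(words)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {"positive": pos / n, "negative": neg / n, "neutral": (n - pos - neg) / n}

shares = emotion_shares("My favorite cafe was boring today")
print(shares)
```

Averaging such shares over all posts of a given day would give a daily mood curve for the whole segment.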
It turned out that positive texts appear on Facebook 7.5 times more often than negative ones. Overall, the posts show that users react sensitively to current events: the chart shows bursts of joy on weekends and holidays, and dips that coincide with wars, natural disasters and mass protests.
5. Tweets and colds
The Ministry of Health announced a competition for the best program that could track the spread of disease from tweets like "I think I'm sick. Feeling completely wiped out". It was won by a team of researchers from Johns Hopkins University.
Their algorithm analyzes 5,000 short messages per minute and discards those that do not relate to the health of a specific user (for example, "Obama was not impressive today. Unwell, probably"). As a result, from the percentage of "cold" tweets the program collects, in real time, reliable information on the number of cases in the country and on the ways the infection spreads.
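The filtering idea can be shown with a deliberately naive sketch. The article does not describe the team's actual method, so the keyword rule below (an illness word plus a first-person marker) is an assumption made purely for illustration, as are the word lists and the function names.

```python
import re

# Hypothetical keyword lists; a real system would need far richer features.
ILLNESS = {"sick", "flu", "cold", "fever", "unwell"}
FIRST_PERSON = {"i", "i'm", "im", "my", "me"}

def is_personal_illness(tweet):
    """True if the tweet mentions illness and speaks in the first person."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return bool(words & ILLNESS) and bool(words & FIRST_PERSON)

def cold_tweet_share(tweets):
    """Share of tweets that report the author's own illness."""
    if not tweets:
        return 0.0
    return sum(is_personal_illness(t) for t in tweets) / len(tweets)

tweets = [
    "I think I'm sick. Feeling completely wiped out",
    "Obama was not impressive today. Unwell, probably",
    "Great weather for a walk",
    "Stuck in bed with the flu, my head hurts",
]
print(cold_tweet_share(tweets))
```

The second tweet mentions being unwell but is about someone else, so the rule drops it; simple rules like this break easily on negations or jokes, which is why this is only a sketch of the idea.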
6. Friends and breakups
Even if you prefer not to advertise your romantic relationships, a computer program can still pick out your partner from your list of "friends". Well, not always, but in 60% of cases. The algorithm was created by Jon Kleinberg of Cornell University and Facebook engineer Lars Backstrom. To test the program, they collected data on 1.3 million users who had indicated their marital status and had between 50 and 2,000 friends.
The algorithm keeps track of how many social groups the connection between two people spans. For example, she knows his work colleagues and he knows her school friends, while these communities are otherwise not connected to each other. The more such bridges there are, the higher the probability that the pair is or will be in a romantic relationship. And vice versa: the program predicts the likelihood of a breakup even if the status says "in a relationship with...".
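The "bridge" idea can be sketched as a simple count: for two people, take their mutual friends and count the pairs of those friends who do not know each other. This is a simplified illustration rather than the authors' exact measure; the function names and the toy friendship graph are hypothetical.

```python
from itertools import combinations

def dispersion(graph, u, v):
    """Count pairs of mutual friends of u and v that are not friends themselves."""
    mutual = (graph[u] & graph[v]) - {u, v}
    return sum(1 for s, t in combinations(sorted(mutual), 2) if t not in graph[s])

def likely_partner(graph, u):
    """Pick the friend whose tie with u spans the most unconnected circles."""
    return max(sorted(graph[u]), key=lambda v: dispersion(graph, u, v))

# Toy graph: Ann's colleagues, school friends and family form separate
# circles; only Bob knows someone from each of them.
graph = {
    "ann": {"bob", "colleague1", "colleague2", "school1", "school2", "family1"},
    "bob": {"ann", "colleague1", "school1", "family1"},
    "colleague1": {"ann", "bob", "colleague2"},
    "colleague2": {"ann", "colleague1"},
    "school1": {"ann", "bob", "school2"},
    "school2": {"ann", "school1"},
    "family1": {"ann", "bob"},
}
print(dispersion(graph, "ann", "bob"), likely_partner(graph, "ann"))
```

In this toy graph, Ann shares more mutual friends with nobody than with Bob, but the point of the measure is different: those mutual friends come from circles that do not overlap, which is what singles Bob out.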