6 things FELD M learned at useR!2019 in Toulouse

on 23.07.2019 by Linda Le

Hi, I’m Linda. I am part of the Data Science team at FELD M and was excited to participate in this year’s useR!2019 conference, which took place in Toulouse.

That meant 4 days full of great

  • 3h tutorials
  • keynotes
  • 30 min blocks of 6*5 min lightning talks
  • 1.5 h blocks of 5*18 min talks
  • sponsor talks
  • poster session
  • social events

… on up to 6 parallel tracks!

The complete list of talks, including slides, can be found here: http://www.user2019.fr/talk_schedule/. Video recordings of the keynotes and of all other talks are available here: https://www.youtube.com/channel/UC_R5smHVXRYGhZYDJsnXTwg/videos.

Let me tell you about the conference’s input as I guide you through a typical project’s timeline. I took advantage of a nice Machine Learning workflow diagram made of hexagons and added a sixth hexagon for the ‘Communication’ of projects.

Let’s go through the 2nd, 3rd and 6th hexagon to give some examples of what I took away from useR! and where we are now taking some deep dives to improve our workflow.

 

  • {tidyr} by Hadley Wickham has been updated (a must-read for everyone advancing in R is the recent 2nd edition of his book “Advanced R”: https://adv-r.hadley.nz/index.html). In web analytics we at FELD M receive raw data in which every touchpoint of every visitor/customer is recorded as a row. To analyse customer journeys, we need to reshape the data so that each customer is one row and all touchpoints of that customer, i.e. the customer journey, sit in another column. Reshaping data from long to wide format is therefore a regularly used transformation in Data Science projects. The current functions for reshaping are spread() and gather(), whose logic many R users have struggled with. Hadley Wickham therefore showed us the work-in-progress functions pivot_longer() and pivot_wider(), which reshape data with more intuitive function and argument names. https://tidyr.tidyverse.org/
  • When working with large data sets we usually use either data.table or SparkR (which we currently prefer over sparklyr because its syntax is closer to PySpark, making it easier to switch between Python and R). Both rely on RAM for their performance. Since our datasets often no longer fit into RAM but are still below real big data (where calculations can no longer be handled by a single machine), the newly developed package {disk.frame} (https://rpubs.com/xiaodai/intro-disk-frame) offers an interesting way to store and process medium-sized datasets. Larger-than-RAM data is split up and stored in chunks on the hard drive, and {disk.frame} provides an API for manipulating these chunks. Unlike Spark, {disk.frame} does not require a cluster and can use any function in R.
  • Before we build a model, we first analyse the data on a descriptive level to decide which assumptions the model should rest on. Visualizing high-dimensional data can be a cumbersome task at this stage. In a tutorial, Di Cook showed us her packages, such as {tourr} https://github.com/ggobi/tourr, which visualizes higher-dimensional (>3) data in an animated rotation. You can take a variable, rotate it out of the projection and see whether the structure persists or disappears. The package {nullabor} https://github.com/dicook/nullabor is a tool for graphical inference. Your data plot is displayed among several random null plots (plots representing your null hypothesis). If you can tell your data plot apart from the null plots, the structure in the plot is probably statistically significant.
  • Due to the individual advantages of Python and R, Data/Software Engineering at FELD M is mainly done in Python, while the analysis work of the Data Science team (building models, statistical tests) is more focused on R. Our Data/Software Engineering and Data Science teams already work very closely together on Advanced Analytics projects to take advantage of both areas of expertise and both languages. Of course, our general goal is to build our (data) products in one programming language. Nevertheless, we sometimes build prototypes that have to live in both worlds and require both languages. The {reticulate} package https://rstudio.github.io/reticulate/ makes it possible to call Python from R, for example within RStudio. Rounded off by the GUI developments around knitting R Markdown, this will make it easier to bridge language silos.
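To make the long-to-wide reshaping of touchpoints into customer journeys more concrete, here is a minimal sketch in plain Python, the other language in use at FELD M. The column names (customer_id, step, touchpoint), the sample rows and the helper to_journeys() are invented for illustration. This is not how pivot_wider() is implemented; it only shows the shape of the data before and after the reshape.

```python
from collections import defaultdict

# Hypothetical long-format raw data: one row per touchpoint.
# Column names and values are invented for illustration.
touchpoints = [
    {"customer_id": "A", "step": 2, "touchpoint": "search_ad"},
    {"customer_id": "A", "step": 1, "touchpoint": "newsletter"},
    {"customer_id": "B", "step": 1, "touchpoint": "direct"},
    {"customer_id": "A", "step": 3, "touchpoint": "purchase"},
    {"customer_id": "B", "step": 2, "touchpoint": "purchase"},
]


def to_journeys(rows):
    """Collapse long-format rows into one ordered journey per customer."""
    journeys = defaultdict(list)
    # Sort by customer and step so each journey comes out in order.
    for row in sorted(rows, key=lambda r: (r["customer_id"], r["step"])):
        journeys[row["customer_id"]].append(row["touchpoint"])
    return dict(journeys)


journeys = to_journeys(touchpoints)
for customer, journey in journeys.items():
    print(customer, " > ".join(journey))
```

Each customer ends up as a single entry holding the whole journey, which is the layout needed for journey analysis.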

 

  • When it comes to building a model, it is always important to know what causes a variable, since, as we all know, “correlation != causation”. Under the assumption that a causal relationship leaves a structure in the data, there are many procedures that try to detect this causation. Causaldisco summarizes the causal discovery procedures available in R and, once you have selected the properties of your data, filters for the appropriate procedures. http://biostatistics.dk/causaldisco/
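As a small illustration of the idea that causal relationships leave a structure in the data, the following sketch simulates a common cause z that drives both x and y. It then compares the plain correlation of x and y with their partial correlation given z. The data, the seed and the helper names pearson() and partial_corr() are assumptions made for this example. It is not what Causaldisco does; it only shows one building block, a conditional independence check, of the kind that constraint-based discovery procedures rely on.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of x and y after linearly controlling for z."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated data (an assumption for illustration): z drives both x and y,
# while x and y have no direct influence on each other.
rng = random.Random(2019)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"corr(x, y)     = {r_xy:.2f}")
print(f"corr(x, y | z) = {r_xy_given_z:.2f}")
```

In this setup x and y come out clearly correlated, but the correlation largely vanishes once z is controlled for. That pattern is the trace a common cause leaves in the data.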

 

All in all, the success of a project depends not only on the methods, such as those mentioned above, but also on the environment you create in your company. Julia Lowndes showed us in her keynote (https://www.youtube.com/watch?v=Z8PqwFPqn6Y&t=2806s) how she and her team work by embracing open data science, openness and the power of welcome.

FELD M is looking forward to taking some deep dives into the learnings listed above and putting them into practice to improve our workflow and smooth the journey for our customers.

If you are interested in our work, come and check out our portfolio: https://www.feld-m.de/service/data-strategy-advanced-analytics/.

Or, if you are an NGO/NPO, come and check out our contribution to Data Science for good with our “Data Ambulance”: https://www.feld-m.de/datenambulanz/
