Explore data, Assignment 1: Visualise data

Submit your assignment for the Visualise data in R module in a reply to this topic

Remember to include these five elements:

  1. A plot :bar_chart: that you’ve created during this module
  2. The code :abc: you used to create it, surrounded by ```
  3. How do you interpret :bulb: the plot; what insight can you gain?
  4. A brief reflection :thinking: next step, challenge you solved, or problem you still have
  5. Give specific and useful feedback :thumbsup: to a coursemate

Remember that each written step need only be a sentence or two

Hi Lucy,
Here is my assignment.

  1. Plot

  2. Code
    names(penguins)
    ggplot(penguins,
           aes(x = body_mass_g,
               y = flipper_length_mm)) +
      geom_point()

  3. Interpretation - as body weight increases, so does flipper length

  4. Reflection - forgot the headers, so used ‘names()’ command. Very useful.

Hi All,

My assignment

```
penguins_summary <- penguins %>%
  group_by(sex, year) %>%
  summarise(body_mass_mean = mean(body_mass_g))

ggplot(penguins_summary,
       aes(x = year, y = body_mass_mean,
           colour = sex)) +
  geom_line()

ggplot(penguins,
       aes(x = year, y = body_mass_g,
           colour = species)) +
  geom_point()
```

The Gentoo species has had the highest average body mass over the three years. Chinstrap increased in 2007 and dropped in 2009 while Adelie seems to have peaked in 2009

I’ve been trying to practise with my own data in RStudio, but I can’t access the tidyverse packages, so I can’t run ggplot.

Yes, names() is particularly useful when working with unfamiliar datasets :smile:

Is your tidyverse problem caused by the amount of time it takes to download and install the tidyverse? It has so many packages! Have you tried installing only those packages you need, e.g. ggplot2, readr, dplyr?

Or is it another error, that we can maybe help with?
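If the full tidyverse install is the sticking point, something along these lines might help. It is only a sketch: it assumes the three packages named above are all your script needs, and that you have an internet connection. The `to_install` name is just made up for the example.

```r
# Packages suggested above; adjust to what your own script needs.
needed <- c("ggplot2", "readr", "dplyr")

# Keep only the packages that are not installed yet.
to_install <- setdiff(needed, rownames(installed.packages()))

# Install the missing ones from the CRAN cloud mirror.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load each package individually instead of library(tidyverse).
library(ggplot2)
library(readr)
library(dplyr)
```

If one of the `library()` calls still fails, the error message it prints is the thing to copy into a help post.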

Thank you for this, I will try to download only the packages I need.

Hello. Here is my assignment

  1. ggplot(penguins,
    aes(x = year,
    fill = island)) +
    geom_bar(position = "dodge")

3. The largest sample size is on Biscoe island, followed by Dream, with Torgersen the smallest, across all years.

  1. I have been having trouble running this code on my own data in RStudio. I keep getting an error or “NULL” when I edit and run it. I downloaded functions, so I will try to find out exactly where I am going wrong. Kinda frustrating to be getting all these errors; however, it’s part of the learning, I suppose. A few hiccups before getting it right! :slight_smile:

Well done, Kundai!

Coding can be really frustrating, but on the flip side, when it goes smoothly it is very satisfying! We just need to get you to that point…

This week’s learning activities include importing data, assigning the appropriate data type, and correcting errors, so perhaps you’ll find some answers to your problems there, later today

Do share the error messages you’re getting, either as a screenshot or copy/paste your code and the error, with a brief description of your data and/or what you want the code to do

It’s best to ask for help in a new post in our course discussion area:

  1. Hit the C on your keyboard
  2. Select our course area where it says category…
  3. Add a descriptive title mentioning the error/problem
  4. Add the help-needed tag where it says Optional tags
  5. Describe your problem, with code/screenshots

If you need to come back to a draft post at a later point, you can find it in the Drafts area of your profile

Now that others have posted their assignments, you can have a go at the final Feedback step and complete your assignments @Alice_Laguardia, @Nicorll, @Kundai and all :smiley:

Hi! Since the boxplot exercise is now online, I thought of testing it out with your example.

Great work Alice,

I am not good at interpreting box plots

Hi all,

Here’s my assignment:

  1. Plot: Line plot of mean flipper length (mm) over sampled years, by species:


    (I am still troubleshooting another line plot of overall mean flipper lengths just by year, sans species.)

  2. Code:

penguins_summary <- penguins %>%
  group_by(year, species) %>%
  summarise(flipper_length_mean = mean(flipper_length_mm))

ggplot(penguins_summary,
       aes(x = year, y = flipper_length_mean,
           colour = species)) +
  geom_line()

  1. Interpretation: Mean flipper length slightly increased over time, for all three species, and thus overall. This could indicate that penguin sizes are potentially becoming larger.

  2. Brief reflection: My next step will be to try out these plotting methods on my own data. I’ll hopefully be able to post some results here!

  3. Feedback: @Nicorll Great line scatterplot! I find them harder to interpret than other plotting techniques. What type of plot do you think would be best suited to easily visualise this data?
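For the overall line plot mentioned under point 1 (mean flipper length by year, without species), one possible version is sketched below. It assumes `penguins` comes from the palmerpenguins package, and it adds `na.rm = TRUE` on the guess that missing measurements are part of the trouble; neither is confirmed in this thread. The `penguins_by_year` name is made up for the example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data

# Group by year only, so there is one mean per year across all species.
# na.rm = TRUE ignores missing flipper lengths instead of returning NA.
penguins_by_year <- penguins %>%
  group_by(year) %>%
  summarise(flipper_length_mean = mean(flipper_length_mm, na.rm = TRUE))

# One line, with an axis tick for each sampled year.
p_year <- ggplot(penguins_by_year,
                 aes(x = year, y = flipper_length_mean)) +
  geom_line() +
  scale_x_continuous(breaks = penguins_by_year$year)

p_year
```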

And a suggestion for all of us as we advance our plot-making skills: to keep our plots as reader-friendly as possible, let’s explore which colour schemes work best for colourblind readers! :slight_smile:
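As one starting point for the colour-scheme idea, ggplot2 ships with the viridis scales, which were designed to stay distinguishable under common forms of colourblindness. The sketch below assumes `penguins` comes from the palmerpenguins package and rebuilds the summary so that it runs on its own; `end = 0.9` is just a personal choice to avoid the palest yellow.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data

penguins_summary <- penguins %>%
  group_by(year, species) %>%
  summarise(flipper_length_mean = mean(flipper_length_mm, na.rm = TRUE),
            .groups = "drop")

# Same line plot as above, with the default colours swapped for viridis.
p_cb <- ggplot(penguins_summary,
               aes(x = year, y = flipper_length_mean,
                   colour = species)) +
  geom_line() +
  scale_colour_viridis_d(end = 0.9)

p_cb
```

`scale_colour_brewer(palette = "Dark2")` is another built-in option worth comparing.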


2. Code
library(tidyverse) # loaded to remove missing data

penguins_weight <- penguins %>% # create new dataset excluding NA for certain variables
  drop_na(body_mass_g) %>% # remove rows with NA body mass
  drop_na(sex) %>% # remove rows with NA sex
  drop_na(species) # remove rows with NA species

head(penguins_weight) # check data

ggplot(penguins_weight, # use modified dataset
       aes(x = interaction(sex, species),
           y = body_mass_g,
           colour = species)) +
  geom_boxplot()

  1. Interpretation: males are heavier than females in each of the three species. The Gentoo species (both males and females) has higher median weights than the other two species, whereas the weights of the Adelie and Chinstrap species are similar for each sex.
  2. Reflection: Good summary of the different plot types and their uses, and of how to quickly cross-check a dataset using names(). The plot is easy to interpret. It would be nice at this stage to modify the x labels as well as both axis titles.
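On modifying the x labels and axis titles: a minimal sketch of one way to do it, assuming `penguins` comes from the palmerpenguins package. The label wording is made up for illustration. `labs()` sets the titles, and `sep = "\n"` in `interaction()` puts sex and species on separate lines of each tick label.

```r
library(tidyr)
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data

# Same filtering as above, done in a single drop_na() call.
penguins_weight <- drop_na(penguins, body_mass_g, sex, species)

p_box <- ggplot(penguins_weight,
                aes(x = interaction(sex, species, sep = "\n"),
                    y = body_mass_g,
                    colour = species)) +
  geom_boxplot() +
  labs(x = "Sex and species",   # x axis title
       y = "Body mass (g)",     # y axis title
       colour = "Species")      # legend title

p_box
```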

Hi all,

here is a bar chart showing number of penguins counted every year.

  1. ggplot(penguins,
    aes(x = year, fill = species)) +
    geom_bar(position = "dodge")

(btw. if I surround the code with ```, the code is not complete once published)

  1. Adelie penguins were the most represented in the measurements each year, followed by Gentoo and then Chinstrap penguins.

  2. I wanted to put year and island on the x axis to know specifically how many individuals were measured each year on every island. I tried some combinations, but nothing worked out :slight_smile:

  3. I don’t have feedback, but rather a question. In the code:
    “penguins_summary <- penguins %>%
    group_by(sex, year) %>%”
    what does %>% do? Or what does it stand for?

Thank you and cheers,
Pavel
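On the `%>%` question: it is the pipe operator from the magrittr package, which dplyr makes available, and it is usually read as "then". It passes whatever is on its left as the first argument of the function on its right, so `x %>% f(y)` means `f(x, y)`. The sketch below writes the quoted code both ways, assuming `penguins` comes from the palmerpenguins package; the `nested` and `piped` names are made up for the comparison.

```r
library(dplyr)
library(palmerpenguins)  # assumption: source of the `penguins` data

# Without the pipe: calls are nested and read from the inside out.
nested <- summarise(group_by(penguins, sex, year),
                    body_mass_mean = mean(body_mass_g, na.rm = TRUE),
                    .groups = "drop")

# With the pipe: take penguins, then group, then summarise.
piped <- penguins %>%
  group_by(sex, year) %>%
  summarise(body_mass_mean = mean(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# Both versions should give the same table.
all.equal(nested, piped)
```

The same pattern gives the per-year, per-island counts asked about in point 2: `penguins %>% count(year, island)`.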

Hi @selenaflores , perhaps a line graph would have been the best way?

Hello,
I tried to use my own dataset, but I will have to wait until I complete the next module to know how to prepare the data for import into R.
Being extremely stubborn, I insisted and created a couple of plots with the leaf data we collected together.


I created the first plot with the following code:

ggplot(leaf_dataset, 
       aes(x= Country, fill=Angio_Gymno))+
  geom_bar(position = "dodge")

I wanted to see the Angiosperm/Gymnosperm relation by country.
I see that a misspelled word in my dataset caused R to create three categories.
Being unable to fix that error without editing the original dataset and re-uploading it, I decided to try another variable and compare the collectors by trunk size,
using this code

ggplot(leaf_dataset, 
       aes(x = Collector, fill = TrunkSize )) + 
  geom_bar(position = "fill")

  • Despite the satisfaction of creating this very colourful plot, the data are categorised wrongly because of misspellings.
  • Again, being unable to fix it in R, I tried more variables within the same dataset until I found one that seems correct (mostly because I don’t know the original size of your leaves!)

ggplot(leaf_dataset, 
       aes(x=Length_mm, y= Width_mm))+
  geom_point()

Most of the leaves we collected are less than 200 mm long and 100 mm wide.

Reflection: data are often recorded directly in the field, in difficult conditions and with many possible distractions. This exercise made me think about the importance of a solid data collection methodology designed to reduce human error, and showed me that I need to improve my data preparation skills (I constantly misspell words and make typos!)
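A misspelled category like the one described above can be corrected inside R without editing the original file. The sketch below uses a small made-up stand-in for `leaf_dataset`, since the real values are not shown in this thread; the misspelling "Angiosprem" and the country names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for leaf_dataset; replace with the imported data.
leaf_dataset <- data.frame(
  Country     = c("Italy", "Italy", "Kenya", "Kenya"),
  Angio_Gymno = c("Angiosperm", "Angiosprem", "Gymnosperm", "Angiosperm"),
  stringsAsFactors = FALSE
)

# List the spellings that exist before fixing anything.
count(leaf_dataset, Angio_Gymno)

# Overwrite only the misspelled value; everything else is kept as it is.
leaf_clean <- leaf_dataset %>%
  mutate(Angio_Gymno = if_else(Angio_Gymno == "Angiosprem",
                               "Angiosperm",
                               Angio_Gymno))

# Check that only the intended categories remain.
count(leaf_clean, Angio_Gymno)
```

Because the fix lives in the script, the file on disk stays untouched and the correction is repeated every time the script runs.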

Hi Selena, very good point about colorful plots for colorblind people! (My colleague is colorblind, and since he told me, I try to make all the graphs in my presentations with colors he can see too.)
But many papers are also published in black and white only. How can we render a line plot like yours in a black-and-white image?

Chiara

Great question! I think putting points of differing shapes on the lines would be one way to address the lack of differing colour options on such plots.
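A rough sketch of the point-shape idea, assuming `penguins` comes from the palmerpenguins package. Species is mapped to shape and line type instead of colour, so the plot stays readable in black and white; `theme_bw()` and the point size are just stylistic choices.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data

penguins_summary <- penguins %>%
  group_by(year, species) %>%
  summarise(flipper_length_mean = mean(flipper_length_mm, na.rm = TRUE),
            .groups = "drop")

# No colour aesthetic: species differ by point shape and line type only.
p_bw <- ggplot(penguins_summary,
               aes(x = year, y = flipper_length_mean,
                   shape = species, linetype = species)) +
  geom_line() +
  geom_point(size = 3) +
  theme_bw()

p_bw
```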

For the factors being compared, I agree! :slight_smile: