Dear OF Team, I would like to import a rather long species list (csv >700 MB) into Collect, using the provided R scripts (https://drive.google.com/open?id=1FKe3dlFRgw1IrUmUtF6q02j4qDr9gkki). As a reference, I am working with the PDF "Species coding approach for Open Foris Collect and Calc". However, I can't get the script to work properly, as the following step keeps resulting in an error:
Even though I am considering splitting the csv into separate species lists according to taxonomic order, I'd still need the script to work. Can anybody point me in the right direction to find the source of this error? Thank you very much in advance, Alex asked 10 Dec '20, 12:00 wexxo
Dear Alex, thanks for sharing your data. I fixed two issues in the R script: 1) Microsoft applications may add stray characters to the column names, and this was the case in your CSV (you do not see this in Excel or Notepad++), see e.g. https://stackoverflow.com/questions/22974765/weird-characters-added-to-first-column-name-after-reading-a-toad-exported-csv-fi?rq=1 so the name "family" was read as "ï..family". This is fixed in the read.csv() line. 2) The code only worked with a list that contains at least one case with a subspecies or variant name! This obvious design mistake is now fixed, so the script also works when the input data contains just "pure" species names. Regards, Lauri answered 11 Dec '20, 14:42 Lauri (OF) ♦♦
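The "ï..family" symptom is typical of a UTF-8 byte order mark (BOM) at the start of the file, which R folds into the first column name. The following is only a minimal sketch of one way to handle it in base R, not the actual change made to the Open Foris script: the temporary file and the object name sp_list are made up for illustration, and the fileEncoding = "UTF-8-BOM" argument assumes the CSV really is UTF-8 encoded.

```r
# Build a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF),
# mimicking a file exported by some Microsoft applications.
# The file and its content are illustrative only.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("family,scientific_name\nChactidae,auyantepuia amapaensis\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" tells R to drop the BOM while reading,
# so the first column name stays clean.
sp_list <- read.csv(tmp, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)

# Fail early if the expected columns are missing.
stopifnot(all(c("family", "scientific_name") %in% names(sp_list)))
print(names(sp_list))
```

Checking names(sp_list) right after reading is a cheap way to catch this kind of problem before the later steps of a script fail with a less obvious error.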
Dear Alex, please try to update the package data.table. This is probably caused by a bug in that package, which should already be fixed, see e.g. https://github.com/Rdatatable/data.table/issues/3495 Does this help? Regards, Lauri answered 10 Dec '20, 16:47 Lauri (OF) ♦♦
Dear Lauri, thank you for your quick answer. Unfortunately, updating data.table (and all other packages) and RStudio (v.1.3.1093) did not solve the problem. The error output stays the same; print(sp_dt) at this point yields:
          family              scientific_name
  1: Chactidae   auyantepuia amapaensis
  2: Chactidae   auyantepuia laurae
108: Chactidae   vachoniochactas lasallei
109: Chactidae   vachoniochactas roraima
The csv is an extract of the final species list to be imported (from the GBIF DB), made for testing purposes.
(11 Dec '20, 07:29)
wexxo
Dear Alex, is the comma the separator in your CSV input file? I noticed that this script fails if the separator is a tab or a semicolon. Indeed, it may need fixes then.
(11 Dec '20, 08:33)
Lauri (OF) ♦♦
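Since the script reportedly fails on tab- or semicolon-separated input, it can help to check the separator before reading the file. The sketch below is an assumption-laden illustration, not part of the Open Foris script: guess_sep is a hypothetical helper that simply picks the candidate character occurring most often in the header line, and the sample file is made up.

```r
# Hypothetical helper: guess the field separator by counting how often each
# candidate character appears in the first (header) line of the file.
guess_sep <- function(path, candidates = c(",", ";", "\t")) {
  header <- readLines(path, n = 1, warn = FALSE)
  chars <- strsplit(header, "", fixed = TRUE)[[1]]
  counts <- vapply(candidates, function(s) sum(chars == s), integer(1))
  candidates[which.max(counts)]
}

# Illustrative semicolon-separated file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("family;scientific_name",
             "Chactidae;auyantepuia amapaensis"), tmp)

sep <- guess_sep(tmp)
sp_list <- read.csv(tmp, sep = sep, stringsAsFactors = FALSE)
print(names(sp_list))
```

This heuristic only looks at the header, so it will be fooled by column names that themselves contain commas or semicolons; for a file you control, passing sep explicitly is the more reliable choice.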
Dear Lauri, Yes, the separator of the sp_list.csv is comma.
(11 Dec '20, 09:23)
wexxo
Dear Alex, please send us a subset of your data so we can check this. Thanks! Lauri
(11 Dec '20, 09:57)
Lauri (OF) ♦♦